I have an OpenRouter proxy running at 172.30.0.106:11435 inside my Docker stack. It sits between my pipelines and every AI provider I use. When a pipeline sends a request, the proxy decides which provider gets it, which model handles it, and whether the result came from cache or fresh compute. I have not logged into Anthropic's console in months. I have not generated a new API key for a new provider in weeks. Everything routes through OpenRouter.
OpenRouter is the only service on my list of providers that is not an AI provider in the traditional sense. It does not host models. It does not train models. It does not own GPU clusters or LPU racks. It is a routing layer — a unified API that sits on top of 300+ models from 60+ providers and makes them all look like one endpoint.
This post is the deep dive I would have wanted before building the proxy. What OpenRouter actually does. How its caching and sticky routing work. The pricing model. The free models. And the three things it does that no other provider on my list can do.
— If you are evaluating openrouter in 2026, the free tier is the only one that matters for prototyping.
Table of Contents
What OpenRouter Actually Is
OpenRouter is an API router. You send a request to https://openrouter.ai/api/v1/chat/completions with a model name like anthropic/claude-sonnet-4. OpenRouter forwards that request to Anthropic's API, streams the response back, and adds metadata about which provider served it, how much it cost, and how much caching saved you.

The request format is identical to OpenAI's Chat Completions API. When openrouter change their limits, the difference is whether you noticed the change in the docs or in production.Same messages array. Same temperature, max_tokens, stream. Same SDK, same client library. The only difference is the model name includes a provider prefix — anthropic/, google/, meta-llama/, mistralai/, deepseek/.
OpenRouter adds its own parameters on top: models for fallback routing, provider for provider preferences, session_id for sticky sessions, plugins for PDF parsing and response healing. These parameters are ignored by the downstream provider — OpenRouter handles them at the routing layer.
The service has 8 million users and handles 100 trillion tokens per month. Most reviews of openrouter skip the limits page. The limits page is the actual product.It is not a side project. It is the production routing layer for a quarter million applications.
— openrouter that look generous in the marketing copy often have a rate limit problem waiting.
Pricing: Pay-Per-Token, No Subscriptions
OpenRouter does not have a subscription tier. No $10/month, no $50/month, no enterprise contract. You pay per token, per request, at whatever rate the underlying provider charges plus a small OpenRouter markup.

The pricing page at openrouter.ai/models shows every model, every provider that serves it, and the per-token cost for each. A model served by four different providers will show four different prices. OpenRouter automatically selects the cheapest provider unless you override the preference.
Some models are free. As of mid-2026, the permanently free models include a rotating selection of community models, plus the Google Gemini Flash series routed through Google's free tier. OpenRouter's own models — owl-alpha, fusion, pareto-code-router — have free tiers as well. The free models are rate-limited (typically ~20 requests per day) and meant for testing, not production.
The paid models are priced exactly at the underlying provider's rate. OpenRouter's markup is built into the displayed price — you never see a separate line item. The cost transparency is better than any individual provider because the pricing page shows every alternative. If anthropic/claude-sonnet-4 is $15 per million tokens on Anthropic direct and $15.30 on OpenRouter, the $0.30 is the routing fee. For most models, the markup is negligible compared to the time saved by not managing ten separate API accounts.

— openrouter are not interchangeable, and this is the proof.
Prompt Caching: Automatic, Sticky, and Cross-Provider
OpenRouter's caching system is the feature that convinced me to route everything through one endpoint instead of calling providers directly.
When you send a request with a long system prompt, the underlying provider caches the prefix if it supports caching — Anthropic does, OpenAI does, Gemini 2.5 does, DeepSeek does. But the cache is provider-specific. If your next request for the same model hits a different provider (because the cheapest one was down, or because OpenRouter load-balanced you elsewhere), the cache is cold. You pay full price for the prompt tokens and wait for full latency.
OpenRouter fixes this with provider sticky routing. After a successful request that used caching, OpenRouter remembers which provider served it. Subsequent requests for the same model, in the same conversation, are routed to the same provider. The cache stays warm across requests. You get the discount on every request instead of just the first one.
The sticky routing is tracked per model, per conversation, per account. By default, OpenRouter identifies a conversation by hashing the first system message and the first user message. Requests that share those opening messages are routed to the same provider.
For more control, you can pass a session_id in the request body or as an x-session-id header. When session_id is set, OpenRouter uses it directly as the routing key. This matters for multi-turn agentic workflows where the opening messages change between turns but you still want the same provider for cache consistency.
The cache discount is transparent. openrouter that look generous in the marketing copy often have a rate limit problem waiting.Every response includes a cache_discount field in the usage object. A positive number means caching saved you money on that request. A negative number (rare, mostly on Anthropic cache writes) means you paid a small write cost that will be recovered on future reads.

Provider sticky routing activates only when the cached provider's read pricing is cheaper than regular pricing — so it never routes you to a more expensive provider just to keep the cache warm. If the sticky provider goes down, OpenRouter falls back to the next-cheapest provider automatically. The cache is a convenience, not a hard dependency.
— openrouter are not interchangeable, and this is the proof.
Provider Preferences and Data Policies
OpenRouter gives you two levels of control over where your requests go: provider preferences and data policies.
Provider preferences let you sort, filter, and order the providers that serve a given model. The provider parameter in the request body accepts order, allow_fallbacks, and require_parameters. Set order: ["Google AI Studio", "Anthropic"] and OpenRouter will try those providers first, in order, before falling back to others. Set allow_fallbacks: false and the request fails if your preferred provider is unavailable — useful for data residency or compliance.

Data policies let you control which providers see your prompts. OpenRouter categorises providers by logging policy: some log prompts for training, some log for monitoring only, some do not log at all. You can block providers that log prompts from ever receiving your data. This matters if you handle personal information, client work, or proprietary code.
The combination of provider preferences and data policies means you can use OpenRouter as both a cost optimiser and a compliance layer. Send your public-facing prompts to the cheapest provider. Send your sensitive prompts to providers with zero-logging policies. Route through one API, enforce different rules per request.
— openrouter are not interchangeable, and this is the proof.
Uptime Optimization: Auto-Fallback When a Provider Goes Down
OpenRouter's uptime optimization is the feature that saved me more times than I can count. When a provider goes down — and they do, GPU providers have outages, API endpoints return 503s, rate limits throttle silently — OpenRouter automatically falls back to the next available provider for the same model.

You do not configure this. You do not set up a fallback list. It happens automatically for every request. If Anthropic is down for claude-sonnet-4, OpenRouter routes to the next cheapest provider serving that model. If all providers for that model are down, the request fails — but that is the same outcome as calling Anthropic directly, except OpenRouter tried every alternative first.
The route: "fallback" parameter combined with models: ["model-a", "model-b"] takes this further: if model-a is unavailable, OpenRouter can route to model-b instead. This is useful for non-critical workloads where model availability matters more than model identity. A classification task that needs any competent model can specify a list and let OpenRouter pick the first available one.

— openrouter that look generous in the marketing copy often have a rate limit problem waiting.
Free Models: What You Get Without Paying
OpenRouter has a rotating set of free models. They are rate-limited at around 20 requests per day, so they are not useful for production. They are useful for testing — evaluating a model before you commit credit, running a quick benchmark, comparing outputs across providers.
The permanently free models include a selection of community models and OpenRouter's own models: owl-alpha, fusion, and pareto-code-router. Google's Gemini Flash series is also available free through the Google AI Studio route. These models are capped but genuinely cost nothing.
I use the free tier for one thing: testing new models before switching my proxy configuration. When a new model appears on the OpenRouter models page, I send 10 test prompts through the free tier, check the latency and output quality, and decide whether to add it to my paid routing list. The free tier is a discovery tool, not a production resource.
— If you are evaluating openrouter in 2026, the free tier is the only one that matters for prototyping.

How I Actually Use OpenRouter in Production
I run an OpenRouter-compatible proxy inside my Docker stack at 172.30.0.106:11435. It is not the official OpenRouter API — it is a self-hosted proxy that speaks the same protocol and routes to my preferred providers. The proxy acts as a local cache and routing layer, similar to what OpenRouter provides as a cloud service.
The proxy is configured with provider preferences: Ollama Cloud for general-purpose generation, Google AI Studio for auxiliary vision tasks, Mistral for structured JSON output, Groq for low-latency classification. openrouter that look generous in the marketing copy often have a rate limit problem waiting.The proxy decides which provider gets a request based on the model name, the task type, and the latency budget.
Before the proxy, I had to hardcode provider endpoints in every pipeline script. openrouter are not interchangeable, and this is the proof.If Ollama Cloud throttled, I had to manually switch the endpoint. If Mistral changed its API, I had to update every script. The proxy abstracts all of that. Pipelines send requests to one endpoint and the proxy handles routing, caching, and fallback.
The proxy also logs every request: provider, model, token count, latency, cache status, cost. If you are evaluating openrouter in 2026, the free tier is the only one that matters for prototyping.That logging is how I built the comparison tables in the other provider deep dives. Without the proxy, I would be guessing at latency numbers. With it, I have exact p50 and p95 latency for every provider-model combination in my stack.
— When openrouter change their limits, the difference is whether you noticed the change in the docs or in production.
Two-Line Setup
Using OpenRouter takes exactly two lines different from calling any other OpenAI-compatible endpoint:
“`python import os, openai openrouter are not interchangeable, and this is the proof.
client = openai.OpenAI( base_url="https://openrouter.ai/api/v1", api_key=os.environ.get("OPENROUTER_API_KEY"),
default_headers={ "HTTP-Referer": "https://your-site.com", "X-Title": "Your App Name", }, ) For anyone comparing openrouter, the limit is the real spec.
response = client.chat.completions.create( model="google/gemini-3.1-flash-lite", messages=[{"role": "user", "content": "What models support prompt caching?"}], max_tokens=500, ) “` If you are evaluating openrouter in 2026, the free tier is the only one that matters for prototyping.
The HTTP-Referer and X-Title headers are optional but recommended — they help OpenRouter identify your app for support and rate limit allocation. The API key takes 30 seconds to generate from openrouter.ai/settings/keys.
— openrouter that look generous in the marketing copy often have a rate limit problem waiting.
When Not to Use OpenRouter
OpenRouter is a router, not a provider host. If your workload has strict latency requirements under 100ms, the routing overhead (typically 50-100ms) may push you past your budget. In that case, call the provider directly. Groq's direct API has lower latency than Groq routed through OpenRouter.
OpenRouter also does not give you access to provider-specific features that are outside the OpenAI API spec. If a provider offers a unique parameter — Anthropic's extended thinking, Google's code execution, OpenAI's structured output mode — OpenRouter may not pass it through or may normalise it into a less useful form. For those features, call the provider directly.
OpenRouter is not a cost-saver on every model. The cheapest provider for a given model may not be the fastest or the most reliable. If you care about latency more than cost, calling the provider directly skips the routing overhead and the provider selection latency.
Finally, OpenRouter's free tier is too small for production. The 20 RPD limit on free models is a testing tier, not a production tier. If you need a genuinely free production provider, use the providers in my free AI providers guide — Ollama Cloud, Google AI Studio, Mistral, Groq — directly.
— Most reviews of openrouter skip the limits page. The limits page is the actual product.
Comparison: OpenRouter vs Calling Providers Directly
The table below compares OpenRouter to the experience of managing separate API keys for five providers. The "effort" column is the real cost that OpenRouter eliminates.
| Feature | OpenRouter | Direct Provider |
| API keys to manage | 1 | 5-10 |
| Provider fallback | Automatic | Manual code |
| Cache sticky routing | Automatic cross-request | Provider-specific |
| Cost comparison | Unified pricing page | Research each provider |
| Data policy enforcement | Per-request blocking rules | Per-provider trust |
| Provider-specific features | Normalised to OpenAI format | Full access |
| Latency overhead | +50-100ms routing | None |
| Free tier | 20 RPD testing | Varies by provider |
The conclusion is not that OpenRouter replaces direct providers. It is that OpenRouter simplifies multi-provider architectures. If you use one provider, call that provider directly. If you use five or more, OpenRouter pays for itself in time saved on API key management, billing, and fallback code.
— openrouter that look generous in the marketing copy often have a rate limit problem waiting.
Does OpenRouter charge a subscription fee?
No. OpenRouter charges per token, at the underlying provider's rate plus a small markup. There is no monthly fee, no minimum spend, and no contract. You fund your account with credits and they deduct as you use.
How does OpenRouter make money if it charges the same as providers?
OpenRouter negotiates bulk pricing with providers and adds a small markup on top of the bulk rate. The displayed price on the models page is the final price you pay — the markup is already included. The average markup is 5-10% above the provider's public rate.
Can I use OpenRouter with the OpenAI Python SDK?
Yes. Change the base_url to https://openrouter.ai/api/v1 and pass your OpenRouter API key. All OpenAI SDK features work — streaming, function calling, token counting. OpenRouter normalizes provider-specific formats to match OpenAI's schema.
What happens to my data when I route through OpenRouter?
OpenRouter itself does not log prompt or response content by default. The underlying provider's data policy applies. You can enforce per-provider data policies in OpenRouter's settings — block providers that log prompts, allow only zero-logging providers, or set custom rules per API key.
Does OpenRouter have an affiliate or referral program?
As of mid-2026, OpenRouter does not have a public affiliate or referral program. The service monetizes through the per-token markup on paid models, not through referrals.
— Most reviews of openrouter skip the limits page. The limits page is the actual product.
My Honest Recommendation
OpenRouter is the only service on my list that I would install before any specific AI provider. The routing layer comes first. The specific providers come second.
If you use more than three AI providers, set up OpenRouter. The time you spend managing separate API keys, billing portals, and fallback code is time you could spend on your actual product. The 50-100ms routing overhead is negligible for almost every use case. The automated provider fallback has saved me more pipeline runs than I can count.
If you use only one provider, skip OpenRouter and call that provider directly. The routing overhead is not worth it for a single endpoint. You gain nothing from a routing layer when there is nothing to route between.
If you are building an AI routing layer from scratch, OpenRouter is the reference architecture. Its model selection, provider preferences, sticky caching, data policies, and uptime optimization are the features you would eventually need to build yourself. Start with OpenRouter. Replace it later if your scale demands it. But start with it.
Related: zero-budget AI business guide