Pay-per-token · 400+ models · OpenAI API compatible

400+ AI models,
one API key, pay per token.

Access GPT-4, Claude, Gemini, Llama, and hundreds more through a single endpoint — without juggling separate vendor accounts, rotating API keys, or negotiating per-model billing. Your code stays the same; the model is just a parameter.

  • GPT-4 · Claude · Gemini · Llama · and more
  • OpenAI API compatible — zero code changes
  • Set a token ceiling, pay actual usage
Your AI agent
Any agent
AI Models
Model Gateway
Models available 400+ One key covers the entire AI model landscape — frontier models, open-weight models, specialized models — all responding to the same call.
API format 1 integration OpenAI chat completions format. Keep the same message structure, roles, and parameters your code already uses — just change the endpoint URL.
Upfront cost $0 No subscriptions, no per-vendor accounts. Set a token ceiling per call and pay only for the tokens actually generated — nothing reserved, nothing wasted.
Who it's for

Built for teams that need AI models without the key management.

Four shapes of team who benefit most. If your workflow touches more than one model provider today, the unified billing is already in your favor.

For Model Selection

Find the right model for every task.

Benchmark GPT-4, Claude, Gemini, and Llama on the same prompt by changing one parameter — no new SDK, no new account, no billing dashboard to reconcile. Run the comparison, pick the winner, and move on. Per-token billing means the evaluation costs only what it actually generates.

POST /v1/chat/completions { model: 'claude-3-5-sonnet', messages: [...] } → same call, swap model to 'gpt-4o'
For Multi-model Agents

Route each subtask to the best model.

Agents that classify intent, reason through plans, generate code, and summarize results don't all need the same model. Route lightweight tasks to cost-efficient models and complex reasoning to frontier ones — all through one key, one billing line, one integration. The agent picks the model; the gateway handles the rest.

POST /v1/chat/completions { model: 'llama-3.3-70b', messages: [{ role: 'user', content: 'Classify intent: ...' }] }
For Rapid Prototyping

Evaluate any model in minutes.

Build the entire proof-of-concept before committing to a specific vendor. Every model in the catalog is one parameter change away — no sign-up, no waiting for API access, no surprise bill at the end of the month. Token ceiling authorization keeps experimental runs from overspending before you've shipped anything.

POST /v1/chat/completions { model: 'gemini-2.0-flash', max_tokens: 500, messages: [...] } → try any model in the catalog
For Production AI

One integration point as the market evolves.

New models join the catalog without requiring code changes on your side. When a model gets deprecated, you swap a parameter — not a vendor integration. Centralized key management means no per-model credential rotation across production secrets, no per-vendor billing to reconcile each month.

POST /v1/chat/completions { model: 'gpt-4o', tools: [...], response_format: { type: 'json_object' } }
Start in two minutes

Stop managing API keys for every model.

Work with familiar patterns, including: - Standard chat completions - System, user, and assistant message roles - Generation controls such as temperature and top_p - Function and tool calling (on supported models) - JSON mode for structured outputs

  • 400+ models
  • OpenAI API compatible
  • Token ceiling billing
FAQ

The honest answers.

If something below doesn't cover your case, ping us — we answer directly, no SDR funnel.

How is this different from calling each model's API directly?

+

One key, one endpoint, one billing line. Instead of maintaining API accounts with OpenAI, Anthropic, Google, Meta, and dozens of others — each with their own authentication, rate limits, and invoices — you send every request to the same endpoint and change a single parameter to switch models. No vendor accounts to manage, no credential rotation across production secrets.

Which models are available?

+

400+ models including GPT-4o and o1 (OpenAI), Claude 3.5 and 3.7 (Anthropic), Gemini 2.0 and 2.5 (Google), Llama 3.3 (Meta), Mistral, Qwen, DeepSeek, and many more. The catalog grows as new models release — available to you without any code changes.

How does the token ceiling work?

+

Before each call you set a max_tokens value — the upper bound on how many tokens the response can generate. You're only charged for tokens actually generated, never the ceiling. This lets autonomous agents cap their spend per call without overpaying on short responses.

Is it really OpenAI API compatible?

+

Yes. The endpoint accepts the same JSON body as OpenAI's /v1/chat/completions — message roles (system, user, assistant), temperature, top_p, tool definitions, response_format. If your code calls the OpenAI API today, changing the base URL is all it takes to access every model in the catalog.

Does tool calling and JSON mode work?

+

Yes, on models that support them. Pass your tool definitions the same way you would for OpenAI — the gateway routes them to the model and returns the tool call response in the same format. JSON mode works identically for models that implement it.

Can autonomous AI agents use this?

+

Yes — that's a primary design point. An autonomous agent can select a model, authorize a spend ceiling, and generate a response without a human managing vendor accounts or API keys. The agent pays per token from a wallet you connect.