Hermes Agent · 400+ models · Pay-per-token

400+ AI models
in every Hermes flow, pay per token.

Register one skill in Hermes Agent and every model in the catalog — GPT-4, Claude, Gemini, Llama, and hundreds more — becomes a single parameter in your flow. No vendor accounts to manage, no API keys to rotate, no per-model billing to reconcile.

✓ Hermes-native skill — one POST
✓ GPT-4 · Claude · Gemini · Llama · and more
✓ OpenAI API compatible chat completions
✓ Per-flow budget caps respected

Try a model →

AI agent

Hermes Agent

AI Models

Model Gateway

Models available 400+ One Hermes skill connects to the entire AI model catalog. Frontier models, open-weight models, specialized models — all in one flow.

Skill registration 1 endpoint Register one HTTP skill in Hermes. Every model is a parameter — no additional skill registrations as new models join the catalog.

Time to wire it in ~5 min Add one skill to your Hermes registry, declare per-token cost, ship. No vendor accounts, no API key rotation in flow configuration.

What Hermes Agent builders ship

Real Hermes flows that need more than one model.

Each pattern below is a Hermes Agent flow with the AI model gateway registered as one skill. The math works because each model call costs only the tokens it generates.

Hermes model routing flow

Send each step to the right model.

A Hermes flow that classifies intent doesn't need the same model as the one that writes the final output. Register the gateway once and let the flow pick the model per step — lightweight classification to a fast cheap model, complex drafting to a frontier one. One skill, one budget line, regardless of how many models the flow touches.

POST /v1/chat/completions { model: 'llama-3.3-70b', messages: [{ role: 'user', content: 'Classify: ...' }] }

Hermes evaluation flow

Compare model outputs inside the flow.

A Hermes evaluation flow sends the same prompt to GPT-4, Claude, and Gemini in parallel skill calls, then scores and selects the best response. One skill registration covers all three — changing the model parameter is the entire diff between branches. Per-token billing keeps comparison runs proportional to what they actually generate.

POST /v1/chat/completions { model: 'gpt-4o', messages: [...] } → parallel call with model: 'claude-3-7-sonnet'

Hermes RAG flow

Pick the retrieval model separately from the generation model.

A Hermes RAG flow can use a fast embedding-capable model to retrieve context and a more capable model to generate the final answer — all through the same skill. The flow declares the model per step; Hermes budget caps keep total spend predictable across both retrieval and generation calls.

POST /v1/chat/completions { model: 'gemini-2.0-flash', messages: [{ role: 'system', content: 'You are a retrieval assistant...' }, ...] }

Hermes tool-calling flow

Run structured tool calls across any model.

A Hermes agent flow that needs tool calling or JSON mode can route those steps to models that support them — without registering a different skill for each. Pass tool definitions the same way you would for OpenAI; the gateway routes them and returns responses in the same Hermes-parseable format. Idle flows cost zero.

POST /v1/chat/completions { model: 'gpt-4o', tools: [...], response_format: { type: 'json_object' }, messages: [...] }

Hermes-ready in two minutes

One skill. Every model your flow will ever need.

Work with familiar patterns, including: - Standard chat completions - System, user, and assistant message roles - Generation controls such as temperature and top_p - Function and tool calling (on supported models) - JSON mode for structured outputs

Try a model →

Single Hermes skill
400+ models
Flow caps honored
OpenAI API compatible

FAQ

Hermes Agent specific questions.

If something below doesn't cover your case, ping us — we work directly with Hermes Agent builders, no SDR funnel.

How does this register as a Hermes Agent skill?

It's a standard POST endpoint that accepts the OpenAI chat completions body. Register it in Hermes the same way you'd register any HTTP skill — endpoint, schema, per-token price. Hermes uses the price to plan flow budgets and to show the user what each model call will cost before the flow fires.

Does Hermes need separate API accounts for each model?

No. The gateway holds the vendor relationships. Hermes pays per token from a payment method you choose — no OpenAI account, no Anthropic account, no Google AI account to create or maintain. One key, one billing line, regardless of how many models your flows call.

How does the token ceiling interact with Hermes budget caps?

They stack. Each skill call sets a max_tokens ceiling so no single generation step can overspend. Hermes flow budget caps then bound the total spend across all skill calls in a run. The flow stops issuing new calls when either limit is reached.

Can a Hermes flow call different models in the same run?

Yes — that's the primary design point. The model is a parameter on each skill call. A single flow can call GPT-4 for reasoning, Llama for classification, and Gemini for summarization without registering three skills or managing three API keys.

Does tool calling work in Hermes flows?

Yes, on models that support it. Pass tool definitions in the same format as OpenAI. The gateway routes them and returns tool call responses in the same structure Hermes already parses.

What happens when a model is deprecated?

Update one parameter in the flow — the model name. No skill re-registration, no new API account, no integration work. New models are added to the catalog automatically and become available to your flows without code changes.