400+ AI models
in every Hermes flow, pay per token.
Register one skill in Hermes Agent and every model in the catalog — GPT-4, Claude, Gemini, Llama, and hundreds more — becomes a single parameter in your flow. No vendor accounts to manage, no API keys to rotate, no per-model billing to reconcile.
Real Hermes flows that need more than one model.
Each pattern below is a Hermes Agent flow with the AI model gateway registered as one skill. The math works because each model call costs only the tokens it generates.
Send each step to the right model.
A Hermes flow that classifies intent doesn't need the same model as the one that writes the final output. Register the gateway once and let the flow pick the model per step — lightweight classification to a fast cheap model, complex drafting to a frontier one. One skill, one budget line, regardless of how many models the flow touches.
Compare model outputs inside the flow.
A Hermes evaluation flow sends the same prompt to GPT-4, Claude, and Gemini in parallel skill calls, then scores and selects the best response. One skill registration covers all three — changing the model parameter is the entire diff between branches. Per-token billing keeps comparison runs proportional to what they actually generate.
Pick the retrieval model separately from the generation model.
A Hermes RAG flow can use a fast embedding-capable model to retrieve context and a more capable model to generate the final answer — all through the same skill. The flow declares the model per step; Hermes budget caps keep total spend predictable across both retrieval and generation calls.
Run structured tool calls across any model.
A Hermes agent flow that needs tool calling or JSON mode can route those steps to models that support them — without registering a different skill for each. Pass tool definitions the same way you would for OpenAI; the gateway routes them and returns responses in the same Hermes-parseable format. Idle flows cost zero.
One skill. Every model your flow will ever need.
Work with familiar patterns, including: - Standard chat completions - System, user, and assistant message roles - Generation controls such as temperature and top_p - Function and tool calling (on supported models) - JSON mode for structured outputs
- Single Hermes skill
- 400+ models
- Flow caps honored
- OpenAI API compatible
Hermes Agent specific questions.
If something below doesn't cover your case, ping us — we work directly with Hermes Agent builders, no SDR funnel.
How does this register as a Hermes Agent skill?
+
It's a standard POST endpoint that accepts the OpenAI chat completions body. Register it in Hermes the same way you'd register any HTTP skill — endpoint, schema, per-token price. Hermes uses the price to plan flow budgets and to show the user what each model call will cost before the flow fires.
Does Hermes need separate API accounts for each model?
+
No. The gateway holds the vendor relationships. Hermes pays per token from a wallet you connect — no OpenAI account, no Anthropic account, no Google AI account to create or maintain. One key, one billing line, regardless of how many models your flows call.
How does the token ceiling interact with Hermes budget caps?
+
They stack. Each skill call sets a max_tokens ceiling so no single generation step can overspend. Hermes flow budget caps then bound the total spend across all skill calls in a run. The flow stops issuing new calls when either limit is reached.
Can a Hermes flow call different models in the same run?
+
Yes — that's the primary design point. The model is a parameter on each skill call. A single flow can call GPT-4 for reasoning, Llama for classification, and Gemini for summarization without registering three skills or managing three API keys.
Does tool calling work in Hermes flows?
+
Yes, on models that support it. Pass tool definitions in the same format as OpenAI. The gateway routes them and returns tool call responses in the same structure Hermes already parses.
What happens when a model is deprecated?
+
Update one parameter in the flow — the model name. No skill re-registration, no new API account, no integration work. New models are added to the catalog automatically and become available to your flows without code changes.