Hermes Agent · TTS + sound effects · Per-character billing

Text to speech inside every Hermes flow, billed per character.

Register one skill in Hermes Agent and your flows can convert text to natural-sounding speech or generate sound effects from a description — no account setup, no subscription. TTS bills per character so audio cost stays proportional to what the flow actually says. Sound effects bill at a flat rate per call.

✓ Hermes-native skill — one POST
✓ TTS — billed per character
✓ Sound effects — flat rate per call
✓ Per-flow budget caps respected

Generate audio →

TTS billing: per character. Hermes flows pay only for the characters they convert. Short spoken confirmations cost a fraction of full explanations, and there are no fixed audio credits to exhaust.

Sound effects billing: flat rate per call. One predictable charge per sound effect call, regardless of output length. Hermes budget caps can account for each audio event at a fixed cost.

Time to wire it in: ~5 min. Register one skill in your Hermes registry, declare the per-character TTS rate and the flat SFX rate, and ship. No audio vendor account to create.

What Hermes Agent builders ship

Hermes flows that speak and sound.

Four patterns where audio generation fits naturally into a Hermes flow — each relying on a different combination of TTS and sound effects.

Hermes voice response flow

Speak the flow's output to the user.

A Hermes conversational flow generates a text reply and voices it before returning — the TTS skill call sits at the end of the chain, converting the final output to speech. Per-character billing keeps the audio cost proportional to response length: a short acknowledgement costs almost nothing; a longer explanation costs more, and the user hears why.

... flow generates reply → POST /audio/tts { text: reply, voice: 'nova' } → return audio to user
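As a rough illustration of the last step in that chain, here is a minimal Python sketch of the TTS call. The `/audio/tts` path and the `text` and `voice` fields come from the line above; the host `audio.example.com`, the function names, and the absence of any authentication header are assumptions made for the example, not the documented interface.

```python
import json
import urllib.request

# Assumption: placeholder host. The page only gives the path /audio/tts.
BASE_URL = "https://audio.example.com"


def build_tts_request(text: str, voice: str = "nova") -> urllib.request.Request:
    """Build (but do not send) a POST /audio/tts request for the flow's reply."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def speak(text: str, voice: str = "nova") -> bytes:
    """Send the request and return the raw bytes of the response body."""
    with urllib.request.urlopen(build_tts_request(text, voice), timeout=30) as resp:
        return resp.read()
```

Keeping the request builder separate from the sender lets a flow inspect or log the payload, including its character count, before any billable call is made.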

Hermes content production flow

Script, voice, and add effects in one run.

A Hermes production flow writes a script, converts each segment to speech via TTS, and generates sound effects for transitions — all in a single run before handing off the assembled audio. Hermes budget caps bound total TTS character spend across the script; flat-rate SFX means each transition has a known, fixed cost before the flow fires.

generate script → POST /audio/tts { text: segment } × N → POST /audio/sfx { description: 'transition sting' } → assemble
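To show how the two billing models combine for a run like this, the sketch below estimates total audio spend before the flow fires. Both rates are invented for illustration (the page does not state prices), and the function name is hypothetical.

```python
from decimal import Decimal

# Hypothetical rates for illustration only; real pricing is not stated here.
TTS_RATE_PER_CHAR = Decimal("0.00002")
SFX_FLAT_RATE = Decimal("0.05")


def estimate_run_cost(segments: list[str], sfx_calls: int) -> Decimal:
    """Estimate audio spend: per-character TTS plus flat-rate sound effects."""
    tts_chars = sum(len(segment) for segment in segments)
    return tts_chars * TTS_RATE_PER_CHAR + sfx_calls * SFX_FLAT_RATE
```

With these example rates, a script of 1,500 characters plus two transition effects comes to 0.03 + 0.10 = 0.13 in whatever currency the rates are quoted in.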

Hermes alert flow

Generate spoken alerts when conditions change.

A Hermes monitoring flow watches a condition and fires a spoken alert when it triggers — 'Latency spiked above threshold on eu-west-1', 'New order received from Acme'. Short text, low character count, and a predictable cost per alert. The flow decides what to say based on the event; TTS handles the rest.

condition triggers → POST /audio/tts { text: 'Latency spike detected on eu-west-1.', voice: 'alloy' } → send audio

Hermes interactive narrative flow

Voice characters and generate scene audio together.

A Hermes interactive fiction flow generates NPC dialogue and voices it through TTS, then adds scene-appropriate sound effects — ambient environment, interaction cues — through the same skill. Both audio types come back through the same Hermes skill call pattern; budget caps apply across the full audio spend for the scene.

generate dialogue → POST /audio/tts { text: dialogue } → POST /audio/sfx { description: 'tavern ambient noise, low' } → play both

Hermes-ready in about five minutes

One skill. Speech and sound inside every flow.

Two operations through one skill. For speech: the flow submits text and picks a voice — billing runs per character so audio cost stays proportional to what the flow says. For sound effects: describe the sound in plain text — 'door closing firmly', 'rain on glass', 'error chime' — and get a generated audio clip back at a flat rate. Hermes budget caps apply across both.

Single Hermes skill
TTS — per character
Sound effects — flat rate
Flow caps honored

FAQ

Hermes Agent specific questions.

If something below doesn't cover your case, ping us — we work directly with Hermes Agent builders, no SDR funnel.

How does this register as a Hermes Agent skill?

It's a POST endpoint that accepts either a TTS or sound effects request body. Register it in Hermes as an HTTP skill with two declared costs: a per-character rate for TTS and a flat rate for sound effects. Hermes uses both to estimate audio spend before the flow fires and to enforce budget caps mid-run.

How do Hermes budget caps interact with per-character TTS billing?

The flow can estimate character count from the text it plans to submit before the skill call fires. Hermes uses the per-character rate to calculate expected cost and checks it against the remaining flow budget. If the call would exceed the cap, Hermes stops the flow before the call is made rather than after.
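The pre-call check described above can be sketched in a few lines of Python. This is an illustration of the arithmetic, not Hermes's actual enforcement code: the names `BudgetExceeded` and `check_tts_budget` are hypothetical, and it assumes one billed character equals one Python string character.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised when a planned call would cost more than the remaining budget."""


def check_tts_budget(text: str, rate_per_char: Decimal, remaining: Decimal) -> Decimal:
    """Return the budget left after the call, or raise before the call is made."""
    # Assumption: one billed character per Python string character.
    expected = len(text) * rate_per_char
    if expected > remaining:
        raise BudgetExceeded(
            f"TTS call needs {expected}, only {remaining} left in flow budget"
        )
    return remaining - expected
```

Because the check runs on the text the flow plans to submit, an over-budget call is rejected without incurring any audio charge.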

Can a Hermes flow use both TTS and sound effects in the same run?

Yes. Make separate skill calls for each — one for TTS, one for the sound effect. Both go through the same skill registration. Budget caps apply to the combined cost of all audio calls in the run.

What happens if the TTS text is very long?

The call returns a single audio clip for the full text. For very long scripts, the flow may want to split the text into segments and make multiple TTS calls — this also gives finer control over pacing and voice changes between sections.
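One way to do that split is to pack whole sentences into segments under a size limit, so no TTS call cuts a sentence in half. The sketch below is one possible approach; the 500-character default and the function name are arbitrary choices for illustration, not limits imposed by the skill.

```python
import re


def split_script(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments
```

Each returned segment can then go out as its own TTS call, optionally with a different voice per section.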

Does the skill return audio inline or as a URL?

Audio comes back as an MP3 payload in the response body. The flow can pass it downstream, save it to a filesystem, or return it directly to the user.
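For the save-to-filesystem case, a minimal sketch might look like the following. It assumes the flow already holds the response body as bytes; the function name and the `.mp3` naming scheme are illustrative choices.

```python
from pathlib import Path


def save_audio(payload: bytes, directory: str, name: str) -> Path:
    """Write an MP3 payload from the response body to <directory>/<name>.mp3."""
    if not payload:
        raise ValueError("empty audio payload")
    target = Path(directory) / f"{name}.mp3"
    # Create the output directory if the flow has not done so already.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target
```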

Do Hermes flows need an audio vendor account?

No. The skill handles all vendor relationships. Each call is charged to a payment method you choose (per character for TTS, flat rate for sound effects), so there is no separate audio account to create or manage.