Text to speech
Inside every Hermes flow. Pay per character.
Register one skill in Hermes Agent and your flows can convert text to natural-sounding speech or generate sound effects from a description, with no account setup and no subscription. TTS bills per character, so audio cost stays proportional to what the flow actually says. Sound effects bill at a flat rate per call.
Hermes flows that speak and sound.
Four patterns where audio generation fits naturally into a Hermes flow, each using TTS, sound effects, or both.
Speak the flow's output to the user.
A Hermes conversational flow generates a text reply and voices it before returning, with the TTS skill call at the end of the chain converting the final output to speech. Per-character billing keeps the audio cost proportional to response length: a short acknowledgement costs almost nothing, while a longer explanation costs more in proportion to its length.
Script, voice, and add effects in one run.
A Hermes production flow writes a script, converts each segment to speech via TTS, and generates sound effects for transitions — all in a single run before handing off the assembled audio. Hermes budget caps bound total TTS character spend across the script; flat-rate SFX means each transition has a known, fixed cost before the flow fires.
Generate spoken alerts when conditions change.
A Hermes monitoring flow watches a condition and fires a spoken alert when it triggers — 'Latency spiked above threshold on eu-west-1', 'New order received from Acme'. Short text, low character count, and a predictable cost per alert. The flow decides what to say based on the event; TTS handles the rest.
Voice characters and generate scene audio together.
A Hermes interactive fiction flow generates NPC dialogue and voices it through TTS, then adds scene-appropriate sound effects — ambient environment, interaction cues — through the same skill. Both audio types come back through the same Hermes skill call pattern; budget caps apply across the full audio spend for the scene.
One skill. Speech and sound inside every flow.
Two operations through one skill. For speech, the flow submits text and picks a voice; billing runs per character, so audio cost stays proportional to what the flow says. For sound effects, the flow describes the sound in plain text ('door closing firmly', 'rain on glass', 'error chime') and gets a generated audio clip back at a flat rate. Hermes budget caps apply across both.
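As a rough illustration of the two operations, the sketch below builds one request body of each kind. The field names (`type`, `text`, `voice`, `description`) and the voice name `narrator` are hypothetical assumptions; the skill's actual request schema is not specified here.

```python
import json

# Hypothetical request shapes: the field names "type", "text", "voice" and
# "description" are illustrative assumptions, not a documented schema.


def tts_request(text: str, voice: str) -> dict:
    """Build a text-to-speech request body (billed per character)."""
    if not text:
        raise ValueError("text must be non-empty")
    return {"type": "tts", "text": text, "voice": voice}


def sfx_request(description: str) -> dict:
    """Build a sound effects request body (billed at a flat rate per call)."""
    if not description:
        raise ValueError("description must be non-empty")
    return {"type": "sfx", "description": description}


if __name__ == "__main__":
    # "narrator" is a placeholder voice name.
    for body in (tts_request("Your order has shipped.", "narrator"),
                 sfx_request("door closing firmly")):
        print(json.dumps(body))
```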
- Single Hermes skill
- TTS — per character
- Sound effects — flat rate
- Flow caps honored
Hermes Agent-specific questions.
If something below doesn't cover your case, ping us — we work directly with Hermes Agent builders, no SDR funnel.
How does this register as a Hermes Agent skill?
It's a POST endpoint that accepts either a TTS or sound effects request body. Register it in Hermes as an HTTP skill with two declared costs: a per-character rate for TTS and a flat rate for sound effects. Hermes uses both to estimate audio spend before the flow fires and to enforce budget caps mid-run.
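A minimal sketch of how the two declared costs combine into a pre-run estimate. `SkillCosts` is a hypothetical name, not a Hermes API, and the rates used below are placeholders rather than real prices. Counting billable characters with Python's `len()` is also an assumption about how the skill counts characters.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class SkillCosts:
    """The two costs declared at registration (values are placeholders)."""
    tts_per_char: Decimal  # charged per character of submitted text
    sfx_flat: Decimal      # charged once per sound effects call

    def tts_cost(self, text: str) -> Decimal:
        # Assumes one billable character per Python code point.
        return self.tts_per_char * len(text)

    def estimate(self, tts_texts: list[str], sfx_calls: int) -> Decimal:
        """Estimate total audio spend for a planned run before it fires."""
        speech = sum((self.tts_cost(t) for t in tts_texts), Decimal("0"))
        return speech + self.sfx_flat * sfx_calls
```

With placeholder rates of 0.0001 per character and 0.02 per effect, the lines "Hello there." and "Goodbye." (20 characters in total) plus two effects estimate to 0.042. Using `Decimal` rather than `float` keeps small per-character amounts from accumulating rounding error.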
How do Hermes budget caps interact with per-character TTS billing?
Before the skill call fires, the flow can count the characters in the text it plans to submit. Hermes multiplies that count by the per-character rate to get the expected cost and checks it against the remaining flow budget. If the call would exceed the cap, Hermes stops the flow before the call is made rather than after.
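The pre-call check can be sketched as a small budget tracker that reserves each call's cost before the call is made and refuses the call if the cap would be exceeded. `FlowBudget` and `BudgetExceeded` are hypothetical names, not Hermes APIs. Because TTS and sound effects calls draw from the same tracker, it also illustrates a cap on combined audio spend.

```python
from decimal import Decimal


class BudgetExceeded(Exception):
    """Raised before a call whose cost would exceed the flow's cap."""


class FlowBudget:
    """Hypothetical tracker: reserves cost up front, refuses over-cap calls."""

    def __init__(self, cap: Decimal, tts_per_char: Decimal, sfx_flat: Decimal):
        self.cap = cap
        self.spent = Decimal("0")
        self.tts_per_char = tts_per_char
        self.sfx_flat = sfx_flat

    @property
    def remaining(self) -> Decimal:
        return self.cap - self.spent

    def _reserve(self, cost: Decimal) -> Decimal:
        # Check first, so an over-cap call is stopped before it is made.
        if cost > self.remaining:
            raise BudgetExceeded(f"call costs {cost}, only {self.remaining} left")
        self.spent += cost
        return cost

    def reserve_tts(self, text: str) -> Decimal:
        # Per-character cost, known from the text before the call fires.
        return self._reserve(self.tts_per_char * len(text))

    def reserve_sfx(self) -> Decimal:
        # Flat cost per sound effects call.
        return self._reserve(self.sfx_flat)
```

A refused reservation leaves `spent` unchanged, so the flow can still fall back to a cheaper call (for example, a shorter text) if it chooses.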
Can a Hermes flow use both TTS and sound effects in the same run?
Yes. Make separate skill calls for each — one for TTS, one for the sound effect. Both go through the same skill registration. Budget caps apply to the combined cost of all audio calls in the run.
What happens if the TTS text is very long?
The call returns a single audio clip for the full text. For very long scripts, the flow may want to split the text into segments and make multiple TTS calls — this also gives finer control over pacing and voice changes between sections.
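One way to do that splitting: pack whole sentences greedily into segments no longer than a chosen limit, and fall back to word boundaries for sentences that are too long on their own. `split_for_tts` is a hypothetical helper, and `max_chars` is whatever limit the flow picks, not a documented limit of the skill.

```python
import re


def split_for_tts(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    # Split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        # Fall back to word-level packing for sentences over the limit.
        pieces = [sentence] if len(sentence) <= max_chars else sentence.split()
        for piece in pieces:
            # A single word longer than the limit gets hard-cut.
            while len(piece) > max_chars:
                if current:
                    segments.append(current)
                    current = ""
                segments.append(piece[:max_chars])
                piece = piece[max_chars:]
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                segments.append(current)
                current = piece
    if current:
        segments.append(current)
    return segments
```

For example, `split_for_tts("One. Two three. Four five six.", 15)` returns `["One. Two three.", "Four five six."]`. Each segment then becomes its own TTS call, which is also the natural place to switch voices between sections.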
Does the skill return audio inline or as a URL?
Audio comes back as an MP3 payload in the response body. The flow can pass it downstream, save it to a filesystem, or return it directly to the user.
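If the flow saves the payload to disk, a cheap check on the first bytes can catch an obviously wrong body (such as a JSON error message) before it is written. The check below looks for an ID3v2 tag or an MPEG audio frame sync (11 set bits). It is a heuristic, not full validation, and `looks_like_mp3` and `save_mp3` are hypothetical helpers.

```python
from pathlib import Path


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: an ID3v2 tag, or an MPEG frame sync at the start."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: all 8 bits of byte 0 plus the top 3 bits of byte 1 set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_mp3(payload: bytes, path: Path) -> Path:
    """Write the response body to disk after a basic format check."""
    if not looks_like_mp3(payload):
        raise ValueError("response body does not look like MP3 audio")
    path.write_bytes(payload)
    return path
```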
Do Hermes flows need an audio vendor account?
No. The skill handles all vendor relationships. The flow pays per call from a wallet you connect — no separate audio account to create or manage.