Text to speech inside every OpenClaw agent, pay per character.
Register one tool in OpenClaw and your agents can convert text to natural-sounding speech or generate sound effects from a description — no account setup, no subscription. TTS bills per character so audio cost stays proportional to what the agent says. Sound effects bill at a flat rate per call.
OpenClaw agents that speak and sound.
Four patterns where audio generation fits naturally into an OpenClaw agent — each relying on a different combination of TTS and sound effects.
Speak the agent's output to the user.
An OpenClaw conversational agent generates a text reply and voices it before returning — the TTS tool call sits at the end of the chain, converting final output to speech. Per-character billing keeps audio cost proportional to response length: a short acknowledgement costs almost nothing; a longer explanation costs more, and the user hears why.
Script, voice, and add effects in one run.
An OpenClaw production agent writes a script, converts each segment to speech via TTS, and generates sound effects for transitions — all in a single run. Budget caps bound total TTS character spend across the script; flat-rate SFX means each transition has a known, fixed cost before the agent runs.
Generate spoken alerts when conditions change.
An OpenClaw monitoring agent watches a condition and fires a spoken alert when it triggers — 'Latency spiked above threshold', 'New order received'. Short text, low character count, predictable cost per alert. The agent decides what to say based on the event; TTS handles the conversion.
Voice characters and generate scene audio together.
An OpenClaw interactive fiction agent generates NPC dialogue and voices it through TTS, then adds scene-appropriate sound effects — ambient environment, interaction cues — through the same tool. Both audio types use the same tool registration; budget caps apply to the combined audio spend for the scene.
One tool. Speech and sound inside every agent.
Two operations through one tool. For speech: the agent submits text and picks a voice — billing runs per character so audio cost stays proportional to what the agent says. For sound effects: describe the sound in plain text — 'door closing firmly', 'rain on glass', 'error chime' — and get a generated audio clip back at a flat rate. OpenClaw budget caps apply across both.
- Single OpenClaw tool
- TTS — per character
- Sound effects — flat rate
- Budget caps honored
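As a rough illustration of the two operations, the sketch below builds one request body for speech and one for a sound effect. The field names (`operation`, `text`, `voice`, `description`) and the voice name are assumptions made for this example, not the tool's documented schema.

```python
import json


def tts_request(text: str, voice: str) -> dict:
    """Build a hypothetical TTS request body (field names are assumed)."""
    return {"operation": "tts", "text": text, "voice": voice}


def sfx_request(description: str) -> dict:
    """Build a hypothetical sound effect request body (field names are assumed)."""
    return {"operation": "sound_effect", "description": description}


if __name__ == "__main__":
    # "narrator" is a placeholder voice name, not a real voice ID.
    print(json.dumps(tts_request("New order received", "narrator")))
    print(json.dumps(sfx_request("door closing firmly")))
```

Both bodies would go to the same registered tool; only the operation differs.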
OpenClaw-specific questions.
If something below doesn't cover your case, ping us — we work directly with OpenClaw builders, no SDR funnel.
How does this register as an OpenClaw tool?
It's a POST endpoint that accepts either a TTS or sound effects request body. Register it in OpenClaw as an HTTP tool with two declared costs: a per-character rate for TTS and a flat rate for sound effects. OpenClaw uses both to enforce budget caps and to show the user what each audio call will cost before the agent runs.
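One way to picture the two declared costs is as a small pricing record plus a function that quotes a call from its request body. This is a sketch only: `AudioToolPricing`, `quote`, the request field names, and the rates used in the example are all hypothetical and do not reflect OpenClaw's actual registration format or any published price.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class AudioToolPricing:
    """Hypothetical declared costs for one registered audio tool."""
    tts_rate_per_char: Decimal  # placeholder, not a published price
    sfx_flat_rate: Decimal      # placeholder, not a published price


def quote(pricing: AudioToolPricing, request: dict) -> Decimal:
    """Return the expected cost of one call, based on its request body."""
    operation = request["operation"]
    if operation == "tts":
        # Per-character billing: cost scales with text length.
        return pricing.tts_rate_per_char * len(request["text"])
    if operation == "sound_effect":
        # Flat rate: cost is the same for every call.
        return pricing.sfx_flat_rate
    raise ValueError(f"unknown operation: {operation!r}")


if __name__ == "__main__":
    pricing = AudioToolPricing(Decimal("0.001"), Decimal("0.05"))
    print(quote(pricing, {"operation": "tts", "text": "hello"}))
    print(quote(pricing, {"operation": "sound_effect", "description": "error chime"}))
```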
How do OpenClaw budget caps interact with per-character TTS billing?
Before the tool call fires, the agent can count the characters in the text it plans to submit. OpenClaw multiplies that count by the declared per-character rate to get the expected cost, then checks it against the remaining run budget. If the call would exceed the cap, OpenClaw stops the agent before the call is made.
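The check described here comes down to one multiplication and one comparison. The sketch below shows that arithmetic under stated assumptions: the function names and the rate are placeholders, and `len(text)` counts Unicode code points, which may differ from how the service counts billable characters.

```python
from decimal import Decimal


def estimate_tts_cost(text: str, rate_per_char: Decimal) -> Decimal:
    """Expected cost of one TTS call: character count times the declared rate."""
    return rate_per_char * len(text)


def fits_budget(text: str, rate_per_char: Decimal, remaining_budget: Decimal) -> bool:
    """True if the call's expected cost does not exceed the remaining budget."""
    return estimate_tts_cost(text, rate_per_char) <= remaining_budget


if __name__ == "__main__":
    rate = Decimal("0.0001")  # placeholder rate, not a published price
    alert = "Latency spiked above threshold"
    print(estimate_tts_cost(alert, rate), fits_budget(alert, rate, Decimal("0.01")))
```

Using `Decimal` rather than floats keeps small per-character amounts exact when many calls are summed.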
Can an OpenClaw agent use both TTS and sound effects in the same run?
Yes. Make separate tool calls for each — one for TTS, one for the sound effect. Both go through the same tool registration. Budget caps apply to the combined cost of all audio calls in the run.
What happens if the TTS text is very long?
The call returns a single audio clip for the full text. For very long scripts, the agent may want to split the text into segments and make multiple TTS calls — this also allows voice or pacing changes between sections.
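A minimal sketch of that splitting step, assuming the agent breaks text at sentence boundaries and packs sentences into segments up to a chosen size. The `max_chars` limit is the agent's own choice here, not a limit documented for the tool, and the sentence detection is deliberately simple.

```python
import re


def split_into_segments(text: str, max_chars: int) -> list[str]:
    """Pack whole sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole as its own segment.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                segments.append(current)
            current = sentence
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    script = "Welcome back. Today we cover budgets. Then we add sound."
    for segment in split_into_segments(script, 40):
        print(len(segment), segment)
```

Each segment can then go out as its own TTS call, optionally with a different voice.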
Does the tool return audio inline or as a URL?
Audio comes back as an MP3 payload in the response body. The agent can pass it downstream, save it to a file, or return it directly to the user.
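For the save-to-file case, a sketch along these lines could work, assuming the agent already holds the response body as raw bytes. The helper names are hypothetical, and the MP3 check is only a rough sanity test (an ID3 tag or an MPEG frame sync at the start), not full validation.

```python
from pathlib import Path


def looks_like_mp3(payload: bytes) -> bool:
    """Rough check: starts with an ID3 tag or an MPEG audio frame sync."""
    if payload[:3] == b"ID3":
        return True
    # Frame sync is 11 set bits: 0xFF followed by a byte whose top 3 bits are set.
    return len(payload) >= 2 and payload[0] == 0xFF and (payload[1] & 0xE0) == 0xE0


def save_audio(payload: bytes, path: str) -> int:
    """Write the audio bytes to disk and return the number of bytes written."""
    return Path(path).write_bytes(payload)
```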
Do OpenClaw agents need an audio vendor account?
No. The tool handles all vendor relationships. The agent pays per call from a wallet you connect — no separate audio account to create or manage.