TTS · Sound effects · No account setup

Text to speech.
Sound effects. Pay for what you generate.

Convert text to natural-sounding speech or generate sound effects from a description — through one API with no account setup. TTS bills by character count so cost scales with content length. Sound effects bill at a flat rate per call regardless of output length.

  • Text-to-Speech — billed per character
  • Sound Effects — flat rate per call
  • No account setup, no subscription
  • One API, two operations

TTS billing: per character
Speech costs scale with what you actually convert: a one-sentence alert costs a fraction of a long paragraph. No credit bundles to budget against.

Sound effects billing: flat rate per call
Each sound effect call costs the same regardless of how long or complex the output is. Budget per event, not per second of audio.

Setup required: $0
No account, no subscription, no API key to request. Start generating audio with a single API call.

Who it's for

Audio generation for agents that need to speak or sound.

TTS and sound effects serve different needs — these four patterns cover the most common reasons to reach for each.

For Voice Agents

Speak LLM responses aloud.

A conversational agent generates a text reply and immediately converts it to speech before returning it to the user. Per-character TTS billing keeps short responses cheap and long ones proportional — a 20-word confirmation costs far less than a full explanation. No pre-recorded clips to maintain, no voice actor to commission.

POST /audio/tts { text: 'Your order ships tomorrow.', voice: 'nova' } → audio/mp3
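
As a rough illustration, the request above could be built in Python like this. It is a minimal sketch, not the official client: the host `api.example.com` is a placeholder (this page does not state one), and the JSON body shape is assumed from the one-line example. The request is built but not sent.

```python
import json
import urllib.request

# Placeholder host: an assumption for illustration, not a documented URL.
BASE_URL = "https://api.example.com"


def build_tts_request(text: str, voice: str = "nova") -> urllib.request.Request:
    """Build (but do not send) a POST /audio/tts request with a JSON body."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/tts",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_tts_request("Your order ships tomorrow.")
# Sending it would be: audio = urllib.request.urlopen(req).read()
# assuming the endpoint returns the MP3 bytes as the response body.
```
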
For Content Pipelines

Voice scripts and add effects in one flow.

An automated production pipeline takes a written script, converts each segment to speech via TTS, and stitches in generated sound effects — intro stings, scene transitions, ambient sound — through the same API. Character-count billing keeps voiceover cost proportional to script length across high-volume runs.

POST /audio/tts { text: '[full script]' } → POST /audio/sfx { description: 'upbeat intro sting, 3 seconds' } → combine
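
One way to picture the flow above is as an ordered list of calls that a pipeline works through. The sketch below only plans the calls; it does not send anything. `Call` and `plan_episode` are hypothetical names, and the payload fields are assumed from the one-line example.

```python
from dataclasses import dataclass


@dataclass
class Call:
    path: str      # endpoint path, as shown in the examples on this page
    payload: dict  # assumed JSON body for that call


def plan_episode(segments: list[str], transitions: list[str],
                 voice: str = "nova") -> list[Call]:
    """One TTS call per script segment, with an SFX call after each
    segment for as long as transition descriptions remain."""
    calls: list[Call] = []
    for i, segment in enumerate(segments):
        calls.append(Call("/audio/tts", {"text": segment, "voice": voice}))
        if i < len(transitions):
            calls.append(Call("/audio/sfx", {"description": transitions[i]}))
    return calls


calls = plan_episode(
    ["Welcome back.", "Here is today's story."],
    ["upbeat intro sting, 3 seconds"],
)
```

Splitting the script into segments keeps each TTS call small, and the returned clips can then be stitched in order by whatever audio tool the pipeline already uses.
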
For Game and Interactive Apps

Generate dialogue and ambient audio from game state.

A game agent generates NPC dialogue dynamically via TTS — no pre-recorded line for every possible conversation branch. Sound effects for environmental cues — footsteps on gravel, distant thunder, door mechanisms — come from text descriptions of the game state. Flat-rate SFX billing makes per-event audio cost predictable regardless of clip length.

POST /audio/tts { text: 'The merchant eyes you carefully...' } → POST /audio/sfx { description: 'wooden door creaking open' } → play both
For Notification Systems

Speak alerts and status updates.

An agent monitoring a process generates spoken alerts when conditions change — 'Build failed on main', 'Payment received from Acme'. Short text, low character count, low cost per notification. TTS handles the conversion; the agent decides when to speak based on event severity, not on a fixed script.

POST /audio/tts { text: 'Deployment complete. 3 errors found in logs.', voice: 'alloy' } → audio/mp3
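
The "agent decides when to speak" part might look something like the sketch below. The severity levels, the threshold, and the function name are all assumptions made for illustration; only the `text` and `voice` fields come from the example above.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def alert_payload(message: str, severity: Severity,
                  speak_at: Severity = Severity.WARNING,
                  voice: str = "alloy") -> Optional[dict]:
    """Return a TTS payload for events at or above the threshold, else None."""
    if severity < speak_at:
        return None  # stay silent: nothing is sent, so nothing is billed
    return {"text": message, "voice": voice}


spoken = alert_payload("Build failed on main", Severity.CRITICAL)
silent = alert_payload("Cache warmed", Severity.INFO)
```
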
Start in two minutes

Audio generation without the studio setup.

Two operations through one API. For speech: submit text and pick a voice. Billing runs per character, so cost scales with what you actually say. For sound effects: describe what you need in plain text ('thunder crack fading to rain', '8-bit coin pickup', 'crowd noise in a small stadium') and get a generated audio clip back at a flat rate per call.

  • TTS — per character
  • Sound effects — flat rate
  • No account setup
  • One API, two operations
FAQ

The honest answers.

If something below doesn't cover your case, ping us — we answer directly, no SDR funnel.

What's the difference between TTS and sound effects?

TTS converts written text to spoken audio: you supply the words, choose a voice, and get back a speech clip. Sound effects generate audio from a description of a sound: you describe what you want to hear, not a script. Same API, separate endpoints (/audio/tts and /audio/sfx), different billing models.

How does character-count billing work for TTS?

You're billed for the number of characters in the text you submit. A 50-character sentence costs half what a 100-character sentence costs. Whitespace and punctuation count. There's no minimum per call, so short alerts are proportionally cheap.
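
A quick way to sanity-check that arithmetic is sketched below. The rate is a made-up number for illustration only, since no price appears on this page. Python's `len` counts Unicode code points, and whether the API counts characters the same way is an assumption.

```python
from decimal import Decimal

# Hypothetical rate, for illustration only. Not a published price.
RATE_PER_CHAR = Decimal("0.00001")


def tts_cost(text: str, rate_per_char: Decimal = RATE_PER_CHAR) -> Decimal:
    """Estimated cost: every character counts, including spaces and punctuation."""
    return len(text) * rate_per_char


short = tts_cost("a" * 50)
long_ = tts_cost("a" * 100)
# short is exactly half of long_, and there is no per-call minimum in the formula.
```
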

What does the flat rate cover for sound effects?

One flat rate per call, regardless of how long the generated audio turns out to be. A two-second clip and a thirty-second clip cost the same. This makes per-event audio budgeting straightforward — each game event, each UI cue, each notification sound is one predictable charge.
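
Per-event budgeting then reduces to counting calls. The sketch below assumes one sound effect call per event and uses a made-up flat rate; neither the rate nor the function name comes from this page.

```python
from collections import Counter
from decimal import Decimal

# Hypothetical flat rate per sound effect call, for illustration only.
SFX_FLAT_RATE = Decimal("0.01")


def sfx_budget(events: list[str],
               flat_rate: Decimal = SFX_FLAT_RATE) -> dict[str, Decimal]:
    """Cost per event type: one call per event; clip length never enters the sum."""
    counts = Counter(events)
    return {name: n * flat_rate for name, n in counts.items()}


budget = sfx_budget(["footstep", "footstep", "door", "thunder"])
total = sum(budget.values(), Decimal("0"))
```
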

What voices are available for TTS?

Several voices are available, differing in tone, gender, and pace. Pass a voice identifier in the request body; the examples on this page use 'nova' and 'alloy'. The full list of voices is in the API reference.

What audio format does the API return?

MP3. Both TTS and sound effect responses return an audio/mp3 payload you can stream, save, or pipe directly into an audio playback pipeline.
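
If you save the payload to disk, a cheap check that the bytes look like MP3 can catch an error body being written as audio. This is a sketch with hypothetical helper names. It relies only on two general facts about MP3 files: they start either with an ID3v2 tag (`ID3`) or with an MPEG frame sync (eleven set bits).

```python
from pathlib import Path


def looks_like_mp3(data: bytes) -> bool:
    """Cheap sanity check: an ID3v2 tag or an MPEG frame sync at the start."""
    if data[:3] == b"ID3":
        return True
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def save_clip(data: bytes, path: str) -> Path:
    """Write the response body to disk, refusing payloads that are clearly not MP3."""
    if not looks_like_mp3(data):
        raise ValueError("response body does not look like MP3 audio")
    target = Path(path)
    target.write_bytes(data)
    return target
```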

Can I generate TTS and sound effects in the same request?

No. Each call handles one operation. Submit a TTS request for the speech and a separate sound effects request for the effect. Combining the two clips happens on your side, either in the agent or in a downstream audio pipeline.
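
In practice that means two calls and a merge step you own. The sketch below takes the HTTP function as a parameter, so it makes no claims about any particular client library. `speech_and_effect` and the `post` callable are hypothetical, and the payload fields are assumed from the examples earlier on this page.

```python
from typing import Callable

# post(path, payload) -> response body bytes. Supplied by the caller.
PostFn = Callable[[str, dict], bytes]


def speech_and_effect(post: PostFn, text: str, description: str,
                      voice: str = "nova") -> dict[str, bytes]:
    """Issue two separate calls, one per operation, and return both clips."""
    speech = post("/audio/tts", {"text": text, "voice": voice})
    effect = post("/audio/sfx", {"description": description})
    return {"speech": speech, "effect": effect}
```

The returned clips are independent MP3 payloads. Layering or sequencing them is left to the agent or to a downstream audio tool.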