
Speech endpoints

/v1/speech/*

Text-to-speech (TTS), speech-to-text (STT), dialogue, dubbing, voice isolation, and related speech helpers.

What this section covers

Endpoint-level reference for the current /v1/speech/* surface.

Speech routes accept a mix of JSON and multipart/form-data request bodies, so the most common integration mistake is sending the wrong request format, not calling the wrong path.
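One way to catch a format mistake early is to check the Content-Type against a per-route table before the request leaves the client. The following is a minimal Python sketch. The table is an assumption except for the speech-to-text entry, which this page describes as multipart audio; `EXPECTED_FORMATS`, `check_request_format`, and `WrongRequestFormat` are hypothetical names.

```python
# Hypothetical route -> expected request format table. Only the
# speech-to-text entry (multipart audio) comes from this page; the
# text-to-speech entry ASSUMES a JSON body. Confirm each route's
# format on its endpoint page before relying on this.
EXPECTED_FORMATS = {
    ("POST", "/v1/speech/text-to-speech"): "application/json",
    ("POST", "/v1/speech/speech-to-text"): "multipart/form-data",
}


class WrongRequestFormat(ValueError):
    """Raised before sending when a body's Content-Type does not match the route."""


def check_request_format(method: str, path: str, content_type: str) -> None:
    """Fail fast if the Content-Type does not match the route's expected format.

    Routes missing from the table are not checked.
    """
    expected = EXPECTED_FORMATS.get((method.upper(), path))
    if expected is None:
        return
    # Drop parameters such as "; boundary=..." or "; charset=utf-8".
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != expected:
        raise WrongRequestFormat(
            f"{method.upper()} {path} expects {expected}, got {media_type}"
        )
```

Comparing only the media type, not the full header, lets the check accept the boundary and charset parameters that real clients add.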

Some speech workflows are asynchronous, some are mixed, and the reserved streaming routes still return 501 Not Implemented. Handle each of these cases explicitly in your client.
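A client can treat these cases differently: poll async jobs until they leave a pending state, and surface a 501 as a permanent "not available yet" error instead of retrying it. The following is a sketch under stated assumptions. The job body is assumed to carry a `status` field whose pending values are `queued` and `processing`. Those names, `wait_for_job`, `raise_if_not_implemented`, and `EndpointNotImplemented` are hypothetical, not documented API.

```python
import time
from typing import Callable, Mapping


class EndpointNotImplemented(RuntimeError):
    """Raised when a reserved route answers 501 Not Implemented."""


def raise_if_not_implemented(status_code: int, path: str) -> None:
    # Reserved streaming routes currently return 501; treat that as a
    # feature gap, not a transient error worth retrying.
    if status_code == 501:
        raise EndpointNotImplemented(f"{path} is reserved and not implemented yet")


def wait_for_job(
    fetch_status: Callable[[], Mapping[str, object]],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, object]:
    """Poll an async job until it leaves the pending state.

    `fetch_status` is a hypothetical callable that returns the job's JSON
    body. The "status" field and its "queued"/"processing" values are
    ASSUMED names, not documented ones.
    """
    waited = 0.0
    while True:
        job = fetch_status()
        if job.get("status") not in ("queued", "processing"):
            return job
        if waited >= timeout_s:
            raise TimeoutError(f"job still {job.get('status')} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing the status fetcher and the sleep function in as callables keeps the polling logic independent of any HTTP library and easy to test without a network.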

Start here

Recommended starting points

Use POST /v1/speech/text-to-speech for standard async TTS.
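A TTS call can be prepared with only the standard library. This sketch builds the request without sending it. The host is a placeholder, and the `text` and `voice_id` body fields and the Bearer auth scheme are assumptions; check the endpoint page for the real schema. `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /v1/speech/text-to-speech.

    The "text" and "voice_id" fields and the Bearer auth header are
    ASSUMPTIONS; the real body schema is on the endpoint page.
    """
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/speech/text-to-speech",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Because this route is async, expect the response to describe work in progress rather than to contain finished audio, and route it through your job-handling code.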


Use POST /v1/speech/speech-to-text with multipart audio for transcription.
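The Python standard library has no multipart encoder, so this sketch writes one file part by hand and builds the transcription request without sending it. The placeholder host, the form field name `file`, the default `audio/mpeg` type, and Bearer auth are assumptions. `encode_multipart` and `build_stt_request` are hypothetical helpers.

```python
import uuid
import urllib.request

BASE_URL = "https://api.example.com"  # placeholder host, not the real one


def encode_multipart(field: str, filename: str, data: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode one file part as multipart/form-data; returns (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random, so it will not collide with the audio bytes
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_stt_request(
    audio: bytes, filename: str, api_key: str, mime_type: str = "audio/mpeg"
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST to /v1/speech/speech-to-text.

    The form field name "file" and the Bearer auth header are ASSUMPTIONS.
    """
    body, content_type = encode_multipart("file", filename, audio, mime_type)
    return urllib.request.Request(
        f"{BASE_URL}/v1/speech/speech-to-text",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
```

The Content-Type header must carry the same boundary that separates the parts in the body. Setting `application/json` here instead is exactly the format mistake described above.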


Use GET /v1/speech/voices to list available voices before building a voice picker UI.
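A picker needs a stable value and a readable label for each voice. This sketch converts a voices response body into sorted options. It assumes a body shaped like `{"voices": [{"voice_id": ..., "name": ...}]}`; those field names, `VoiceOption`, and `voices_to_options` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceOption:
    value: str  # what the picker submits, e.g. as a TTS voice id
    label: str  # what the user sees


def voices_to_options(response_body: str) -> list[VoiceOption]:
    """Turn a GET /v1/speech/voices JSON body into sorted picker options.

    ASSUMES a body shaped like {"voices": [{"voice_id": ..., "name": ...}]};
    those field names are hypothetical. Entries without an id are skipped,
    and entries without a name fall back to their id as the label.
    """
    payload = json.loads(response_body)
    options = [
        VoiceOption(value=v["voice_id"], label=v.get("name") or v["voice_id"])
        for v in payload.get("voices", [])
        if v.get("voice_id")
    ]
    # Case-insensitive sort keeps the picker order stable between fetches.
    return sorted(options, key=lambda o: o.label.lower())
```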

Synthesis

Turn text into voice or multi-speaker audio.

Transcription and cleanup

Transcribe files or isolate clean voice audio.

Localization and advanced flows

Dubbing and future voice transformation surfaces.
