Kokoro
Kokoro is a local Text-to-Speech (TTS) engine powered by the Kokoro-82M model.
It generates lifelike voice audio directly on your machine, without requiring an internet connection or external API.
Designed for low-latency and real-time use cases, Kokoro is ideal for projects that need fast, offline voice synthesis, such as on-device agents, prototypes, or privacy-sensitive applications.
The KokoroTTS plugin in the Stream Python AI SDK allows you to use the Kokoro model with configurable voices, playback speed, and device support (CPU/GPU).
Initialisation
The Kokoro plugin for Stream exists in the form of the KokoroTTS class:
tts = KokoroTTS()
Parameters
These are the parameters available in the KokoroTTS plugin for you to customise:
Name | Type | Default | Description |
---|---|---|---|
lang_code | str | "a" | Language code for the TTS model. "a" refers to American English. |
voice | str | "af_heart" | The voice style or speaker preset to use. |
speed | float | 1.0 | Playback speed multiplier. Use values like 0.9 (slower) or 1.2 (faster). |
sample_rate | int | 24000 | Audio sample rate in Hz. |
device | str or None | None | The device to run synthesis on (e.g., "cuda" or "cpu"). Auto-detected if not specified. |
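As a rough illustration of how these parameters fit together, the sketch below collects the documented defaults in a small settings object and turns them into keyword arguments. KokoroSettings and resolve_device are hypothetical helpers, not part of the SDK, and the KokoroTTS call is left as a comment because it assumes the constructor accepts these names as keywords. The device fallback shown (CUDA if PyTorch reports it, otherwise CPU) is only a guess at what auto-detection might look like, not the plugin's confirmed behaviour.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class KokoroSettings:
    """Hypothetical container mirroring the documented KokoroTTS parameters."""

    lang_code: str = "a"           # "a" refers to American English
    voice: str = "af_heart"        # voice style or speaker preset
    speed: float = 1.0             # playback speed multiplier
    sample_rate: int = 24000       # audio sample rate in Hz
    device: Optional[str] = None   # None means "let the plugin decide"

    def __post_init__(self) -> None:
        # Basic sanity checks; these limits are assumptions, not SDK rules.
        if self.speed <= 0:
            raise ValueError("speed must be a positive multiplier")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be a positive number of Hz")


def resolve_device(device: Optional[str]) -> str:
    """Assumed fallback: an explicit value wins, else CUDA if available, else CPU."""
    if device is not None:
        return device
    try:
        import torch  # optional dependency, only used to probe for a GPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Slightly faster speech, pinned to the CPU.
settings = KokoroSettings(speed=1.2, device="cpu")
kwargs = asdict(settings)
# tts = KokoroTTS(**kwargs)  # assumption: parameters are accepted as keywords
```

Keeping the settings in one immutable object makes it easy to log or compare configurations before the model is loaded.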
Functionality
Set output track
The set_output_track() method sets the audio output track for the synthesized speech.
tts.set_output_track(track)
Send text to convert to speech
The send() method sends the given text to the service for synthesis.
The resulting audio is then played through the configured output track.
tts.send("Demo text you want AI voice to say")
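The two calls above can be combined into a small helper, sketched here under some assumptions. speak_all and RecordingTTS are hypothetical names introduced for illustration: the helper accepts any object exposing set_output_track() and send(), and because this page does not say whether send() is a plain method or a coroutine, it awaits the result only when it is awaitable. The stand-in class records calls so the flow can be run without the model; its check that a track is set before sending is an assumed ordering, not documented SDK behaviour.

```python
import asyncio
import inspect
from typing import Any, Iterable, List


async def speak_all(tts: Any, track: Any, lines: Iterable[str]) -> int:
    """Attach the output track once, then send each non-empty line in order."""
    tts.set_output_track(track)
    sent = 0
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines rather than synthesizing silence
        result = tts.send(text)
        if inspect.isawaitable(result):
            await result  # only needed if send() turns out to be a coroutine
        sent += 1
    return sent


class RecordingTTS:
    """Hypothetical stand-in for KokoroTTS that records calls instead of speaking."""

    def __init__(self) -> None:
        self.track: Any = None
        self.sent: List[str] = []

    def set_output_track(self, track: Any) -> None:
        self.track = track

    def send(self, text: str) -> None:
        # Assumed ordering: a track must be configured before sending text.
        if self.track is None:
            raise RuntimeError("set_output_track() must be called before send()")
        self.sent.append(text)


fake = RecordingTTS()
count = asyncio.run(
    speak_all(fake, track="demo-track", lines=["Hello.", "", "Goodbye."])
)
```

With a real KokoroTTS instance and an SDK audio track in place of the stand-ins, the same helper should apply unchanged, provided the method names match those documented above.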
Example
Check out our Kokoro example to see a practical implementation of the plugin and get inspiration for your own projects.