Cartesia

Cartesia is a service that provides Speech-to-Text (STT) and Text-to-Speech (TTS) capabilities. It is designed for real-time voice applications, which makes it well suited to voice AI agents, transcription pipelines, and conversational interfaces. The Cartesia plugin for the Stream Python AI SDK lets you add Cartesia's TTS functionality to your project.

Initialisation

The Cartesia plugin for Stream is exposed as the `CartesiaTTS` class:

tts = CartesiaTTS()

To initialise without passing in the API key, make sure `CARTESIA_API_KEY` is set as an environment variable, either by defining it in a `.env` file or by exporting it directly in your terminal.
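
The fallback from an explicit key to the environment variable can be sketched as below. This is only an illustration of the documented behaviour, not the plugin's own code; the helper name `resolve_cartesia_api_key` is hypothetical and is not part of the SDK.

```python
import os
from typing import Optional


def resolve_cartesia_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else CARTESIA_API_KEY."""
    # An explicitly passed key wins; otherwise read the environment variable.
    key = api_key or os.environ.get("CARTESIA_API_KEY")
    if not key:
        # Fail early with a clear message instead of at the first request.
        raise ValueError("Set CARTESIA_API_KEY or pass api_key explicitly.")
    return key
```

Running a check like this at startup surfaces a missing key immediately, rather than when the first synthesis request is made.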

Parameters

The `CartesiaTTS` plugin accepts the following parameters:

| Name | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` or `None` | `None` | Your Cartesia API key. If not provided, the plugin will look for the `CARTESIA_API_KEY` environment variable. |
| `model_id` | `str` | `"sonic-2"` | ID of the Cartesia STT or TTS model to use. |
| `voice_id` | `str` or `None` | `"f9836c6e-a0bd-460e-9d3c-f7299fa60f94"` | ID of the voice to use for TTS responses. |
| `sample_rate` | `int` | `16000` | Sample rate (in Hz) used for audio processing. |
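
As a sketch of how these parameters might be collected before constructing the plugin, the snippet below builds a keyword-argument dictionary from the documented defaults. The helper `cartesia_tts_kwargs` and the constant names are hypothetical, and the voice ID `"your-voice-id"` is a placeholder; only the parameter names and default values come from the table above.

```python
from typing import Any, Dict, Optional

# Defaults copied from the parameter table above.
DEFAULT_MODEL_ID = "sonic-2"
DEFAULT_VOICE_ID = "f9836c6e-a0bd-460e-9d3c-f7299fa60f94"
DEFAULT_SAMPLE_RATE = 16000


def cartesia_tts_kwargs(
    api_key: Optional[str] = None,
    model_id: str = DEFAULT_MODEL_ID,
    voice_id: Optional[str] = DEFAULT_VOICE_ID,
    sample_rate: int = DEFAULT_SAMPLE_RATE,
) -> Dict[str, Any]:
    """Hypothetical helper: gather constructor arguments in one place."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number of Hz")
    return {
        "api_key": api_key,
        "model_id": model_id,
        "voice_id": voice_id,
        "sample_rate": sample_rate,
    }


# Override only what differs from the defaults (placeholder voice ID).
kwargs = cartesia_tts_kwargs(voice_id="your-voice-id")
# tts = CartesiaTTS(**kwargs)  # assumes the Cartesia plugin is installed and imported
```

Keeping the arguments in a plain dictionary makes it easy to log or test a configuration without touching the network.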

Functionality

Set output track

The set_output_track() method sets the audio output track for the synthesized speech.

tts.set_output_track(track)

Send text to convert to speech

The send() method sends the given text to Cartesia for synthesis. The resulting audio is then played through the configured output track.

tts.send("Demo text you want AI voice to say")
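
The two calls are meant to be used in order: configure the output track first, then send text. The sketch below shows that ordering with a stand-in object so it runs without the SDK. `RecordingTTS` and `speak` are hypothetical names and are not part of the plugin; the stand-in is synchronous for simplicity, so check the real method signatures (for example, whether `send()` must be awaited) before adapting it.

```python
class RecordingTTS:
    """Stand-in for CartesiaTTS (not the real class); it only records calls."""

    def __init__(self) -> None:
        self.track = None
        self.sent = []

    def set_output_track(self, track) -> None:
        self.track = track

    def send(self, text: str) -> None:
        # Assumption for this sketch: sending without a track is an error.
        if self.track is None:
            raise RuntimeError("call set_output_track() before send()")
        self.sent.append(text)


def speak(tts, track, text: str) -> bool:
    """Hypothetical helper: set the track, then send non-empty text."""
    text = text.strip()
    if not text:
        return False  # nothing to synthesize
    tts.set_output_track(track)
    tts.send(text)
    return True


demo = RecordingTTS()
speak(demo, track="audio-track-placeholder", text="Demo text you want AI voice to say")
```

With the real plugin, `track` would be the audio track object your call or session provides rather than a placeholder string.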

Example

Check out our Cartesia example to see a practical implementation of the plugin and get inspiration for your own projects.

© Getstream.io, Inc. All Rights Reserved.