Quickstart

This guide will help you get started with the Stream Video ESP32 SDK.

Prerequisites

  • ESP-IDF v5.4 or higher installed and environment sourced (see Installation).
  • ESP32-S3 board, e.g. one with a camera and microphone.
  • WiFi network and a backend that can issue Stream auth tokens.

Application flow

The app uses a small high-level API (stream_video.h):

API                                       Purpose
stream_video_init()                       Initialize the SDK (call once at startup)
stream_video_join_call(params, &client)   Join a call; handles coordinator, SFU, WebRTC, and media publishing automatically
stream_video_leave_call(client)           Leave and clean up
stream_video_deinit()                     Tear down the SDK
stream_video_error_to_string(err)         Convert error codes to human-readable strings

Token issuance is handled by your auth service or backend, not by the SDK. Given the auth data, the SDK connects to the coordinator, joins the call, and connects to the SFU. Publishing starts automatically once you have joined.

Basic example

See the examples/minimal/ directory in the stream-video-esp32 repository for a complete working example. The SDK handles camera and microphone capture internally on ESP32-S3. WiFi credentials are configured via sdkconfig.defaults or idf.py menuconfig (under Stream Video Example), while user, environment, and call settings are #define constants in main.c.

Build and run

cd examples/minimal
idf.py set-target esp32s3
idf.py menuconfig
idf.py build flash monitor

Before building, you need to configure two things in idf.py menuconfig:

1. Set WiFi credentials

Navigate to Component config → Stream Video Example and set your WiFi SSID and WiFi password:

[Screenshot: Component config → Stream Video Example, WiFi password configuration]

2. Select your board

Navigate to Component config → Stream Video SDK → Camera board pin map and select the board that matches your hardware:

  • ESP32-S3 WROOM (default wiring), selected by default
  • XIAO ESP32-S3 Sense (OV2640/OV3660)

[Screenshot: Board pin map selection]

Configure Stream settings

Edit the #define constants near the top of main/main.c:

#define STREAM_AUTH_BASE_URL "https://pronto.getstream.io/"  // Your backend or Stream's token service
#define STREAM_ENVIRONMENT   "pronto"   // "production", "staging", or "pronto"
#define STREAM_USER_ID       "esp32_user"  // User ID (can be NULL for auto-generated)
#define STREAM_CALL_TYPE     "default"     // Call type (e.g., "default", "livestream")
#define STREAM_CALL_ID       "call123" // Call ID to join (can be NULL to create new call)

Ensure your backend serves a token endpoint (e.g. GET .../api/auth/create-token?environment=...&user_id=...&exp=...). See Client auth for the request/response format.
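As a rough illustration of the request shape mentioned above, the following sketch assembles the query string with snprintf. The helper name build_token_url is hypothetical and not part of the SDK; the path and parameter names are copied from the example endpoint, and the sketch assumes that exp is a lifetime in seconds and that the environment and user ID need no percent-encoding. The authoritative format is the one described in Client auth.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of the SDK): writes
 *   <base>/api/auth/create-token?environment=<env>&user_id=<id>&exp=<secs>
 * into out. Returns 0 on success, -1 if an argument is missing or the
 * buffer is too small. Assumes environment and user_id contain only
 * URL-safe characters, so no percent-encoding is applied. */
static int build_token_url(char *out, size_t cap, const char *base_url,
                           const char *environment, const char *user_id,
                           unsigned long expiry_seconds)
{
    if (out == NULL || cap == 0 || base_url == NULL ||
        environment == NULL || user_id == NULL) {
        return -1;
    }

    /* Avoid a double slash when the base URL already ends with '/'. */
    size_t base_len = strlen(base_url);
    const char *sep = (base_len > 0 && base_url[base_len - 1] == '/') ? "" : "/";

    int n = snprintf(out, cap,
                     "%s%sapi/auth/create-token?environment=%s&user_id=%s&exp=%lu",
                     base_url, sep, environment, user_id, expiry_seconds);

    /* snprintf returns the length it wanted to write; >= cap means truncation. */
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Calling build_token_url(url, sizeof url, STREAM_AUTH_BASE_URL, STREAM_ENVIRONMENT, STREAM_USER_ID, expiry) with a buffer of a couple of hundred bytes is enough for the constants shown earlier. Checking the return value catches a truncated URL before it reaches the HTTP client.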

Join a call with the high-level API

In your own app, initialize the SDK once (e.g. in app_main after WiFi is up), fetch auth data from your token service, then join:

#include "esp_log.h"    // ESP_LOGE
#include "stream_video.h"
#include "app_token.h"  // or your own token fetch

void app_main(void)
{
    // ... WiFi init ...

    stream_video_init();

    stream_video_auth_data_t auth_data = {0};
    stream_video_error_t err = app_request_auth_data(
        "https://pronto.getstream.io/",
        "pronto",
        "esp32_user",
        STREAM_VIDEO_DEFAULT_TOKEN_EXPIRY_SECONDS,
        &auth_data);
    if (err != STREAM_VIDEO_ERR_OK) {
        ESP_LOGE("app", "Auth failed: %s", stream_video_error_to_string(err));
        return;
    }

    stream_video_join_call_params_t params = {
        .auth_data   = &auth_data,
        .call_type   = "default",
        .call_id     = "your_call_id",
        .create      = true,
        .result_cb   = your_result_callback,  // your own callback, defined elsewhere in the app
    };

    stream_video_client_handle_t client = NULL;
    err = stream_video_join_call(&params, &client);
    if (err != STREAM_VIDEO_ERR_OK) {
        ESP_LOGE("app", "Join failed: %s", stream_video_error_to_string(err));
        return;
    }
    // Publishing starts automatically after SFU connect
}
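The example above stops once the join succeeds. The sketch below illustrates one way to order teardown, using stand-in functions so that it compiles off-target: the fake_* names are placeholders for the SDK calls listed in the API table, not real signatures. It assumes that leave should run only after a successful join and that deinit should run whenever init succeeded; confirm both against the SDK reference before relying on them.

```c
#include <stdbool.h>
#include <string.h>

/* Stand-ins for the SDK calls so the sketch compiles without the SDK.
 * Each one only appends its name to a trace buffer. */
static char g_trace[64];

static void trace(const char *step)
{
    strncat(g_trace, step, sizeof g_trace - strlen(g_trace) - 1);
}

static bool fake_init(void)         { trace("init,");  return true; }
static bool fake_join(bool succeed) { trace("join,");  return succeed; }
static void fake_leave(void)        { trace("leave,"); }
static void fake_deinit(void)       { trace("deinit"); }

/* Run one session: set up in order, tear down in reverse.
 * leave is skipped when join failed; deinit runs whenever init succeeded. */
static bool run_session(bool join_succeeds)
{
    bool ok = false;
    g_trace[0] = '\0';

    if (!fake_init()) {
        return false;
    }
    if (!fake_join(join_succeeds)) {
        goto out_deinit;
    }

    /* ... the call is live here and media is being published ... */
    ok = true;

    fake_leave();
out_deinit:
    fake_deinit();
    return ok;
}
```

On the device, the stand-ins would be replaced by stream_video_init(), stream_video_join_call(), stream_video_leave_call(), and stream_video_deinit(). The single exit label keeps the failure path and the normal path on the same teardown code.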

Next steps