Build an Audio Room with Stream

This tutorial will teach you how to build an audio room experience like Twitter Spaces or Clubhouse. The end result will support the following features:

Backstage mode: you can start the call with your co-hosts and chat a bit before going live
Calls run on Stream's global edge network for optimal latency and scalability
There is no cap on how many listeners you can have in a room
Listeners can raise their hand and be invited to speak by the host
Audio tracks are sent multiple times for optimal reliability

Time to get started building an audio-room for your app.

Step 0 - Prepare your environment

For this tutorial, you'll need a few tools installed on your device. You can skip this step if you already have them installed.

Node.js (version 18 or higher)
Yarn (version 1.22 or higher)

Step 1 - Create a new web app and install the Stream Video SDK

In this step, we will create a new web application using the Vite CLI and install Stream's Video SDK. We recommend Vite because it's fast and easy to use.

Terminal (bash)

yarn create vite audio-rooms --template vanilla-ts
cd audio-rooms
yarn add @stream-io/video-client

Step 2 - Create & Join a call

Open up src/main.ts and replace its contents with this code:

src/main.ts (tsx)

import { StreamVideoClient, User } from '@stream-io/video-client';

const apiKey = 'REPLACE_WITH_API_KEY';
const token = 'REPLACE_WITH_TOKEN';
const userId = 'REPLACE_WITH_USER_ID';
const callId = 'REPLACE_WITH_CALL_ID';

// set up the user object
const user: User = {
  id: userId,
  name: 'Oliver',
  image: 'https://getstream.io/random_svg/?id=oliver&name=Oliver',
};

const client = new StreamVideoClient({ apiKey, token, user });
const call = client.call('audio_room', callId);
call.microphone.enable();
call.join({
  create: true,
  data: {
    members: [/* { user_id: 'john_smith' }, { user_id: 'jane_doe' } */],
    custom: {
      title: 'Audio Rooms',
      description: 'Talking about technology',
    },
  },
});

Let's review the example above and go over the details.

User setup

First, we create a user object. You typically sync your users via a server-side integration from your own backend. Alternatively, you can use guest or anonymous users.

import type { User } from '@stream-io/video-client';

const user: User = {
  id: userId,
  name: 'Oliver',
  image: 'https://getstream.io/random_svg/?id=oliver&name=Oliver',
};

Client setup

Next, we initialize the client by passing the API Key, user and user token.

import { StreamVideoClient } from '@stream-io/video-client';

const client = new StreamVideoClient({ apiKey, user, token });

Create and join call

After the user and client are created, we create a call like this:

tsx

const call = client.call('audio_room', callId);
call.microphone.enable();
await call.join({
  create: true,
  data: {
    members: [{ user_id: 'john_smith' }, { user_id: 'jane_doe' }],
    custom: {
      title: 'Audio Rooms',
      description: 'Talking about technology',
    },
  },
});

call.microphone.enable() turns on the microphone before joining the call
call.join() creates the call if it doesn't exist yet (create: true), with the type audio_room and the specified callId, and then joins it
The users with the IDs john_smith and jane_doe are added as members of the call
The title and description custom fields are set on the call object
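The custom object is free-form, so nothing guarantees that title and description are present or are strings when you read them back later. A minimal sketch of a defensive reader, assuming only the two fields used in this tutorial. parseRoomDetails and its fallback values are hypothetical and not part of the SDK.

```typescript
// The custom fields this tutorial stores on the call.
interface RoomDetails {
  title: string;
  description: string;
}

// Hypothetical helper: reads title and description from the free-form
// custom object, falling back to defaults for missing or non-string values.
function parseRoomDetails(
  custom: Record<string, unknown> | undefined,
  fallback: RoomDetails = { title: 'Untitled room', description: '' },
): RoomDetails {
  const title = custom?.title;
  const description = custom?.description;
  return {
    title: typeof title === 'string' ? title : fallback.title,
    description: typeof description === 'string' ? description : fallback.description,
  };
}
```

Inside the call.state.custom$ subscription added in Step 3, you could then write const { title, description } = parseRoomDetails(custom); instead of reading the fields directly.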

Read more in our Joining and Creating Calls guide.

To actually run this sample, we need a valid user token. The user token is typically generated by your server-side API: when a user logs in to your app, you return a user token that gives them access to the call. To make this tutorial easier to follow, we've generated the credentials for you. Replace the REPLACE_WITH_* placeholders in src/main.ts with them, then start the dev server with yarn dev.
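The following Node.js sketch shows how a backend could produce such a token. It assumes that Stream user tokens are HS256-signed JSON Web Tokens whose payload carries a user_id claim and which are signed with your app's API secret. createUserToken is a hypothetical helper. In production, prefer Stream's official server-side SDK, and never ship the API secret to the browser.

```typescript
import { createHmac } from 'node:crypto';

// base64url-encode without padding, as the JWT format requires.
function base64url(input: string | Buffer): string {
  const buf = typeof input === 'string' ? Buffer.from(input, 'utf8') : input;
  return buf.toString('base64url');
}

// Hypothetical helper: builds an HS256-signed JWT with a user_id claim.
// Assumption: Stream user tokens use this format and are signed with
// your app's API secret. Run this on your server only.
function createUserToken(userId: string, apiSecret: string, expiresInSeconds?: number): string {
  const header = { alg: 'HS256', typ: 'JWT' };
  const payload: Record<string, string | number> = { user_id: userId };
  if (expiresInSeconds !== undefined) {
    // exp is the standard JWT expiry claim, in seconds since the Unix epoch
    payload.exp = Math.floor(Date.now() / 1000) + expiresInSeconds;
  }
  const signingInput = `${base64url(JSON.stringify(header))}.${base64url(JSON.stringify(payload))}`;
  const signature = base64url(createHmac('sha256', apiSecret).update(signingInput).digest());
  return `${signingInput}.${signature}`;
}
```

Your login endpoint would return the result of createUserToken(user.id, apiSecret) to the browser, which then passes it to new StreamVideoClient({ apiKey, token, user }).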

Step 3 - Adding audio room UI elements

In this next step, we'll add:

Room title and description
Controls to toggle live mode on/off
A list of participants with their speaking status

Room Title & Description

Replace the contents of the index.html file with the following code:

index.html (html)

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Audio Rooms Tutorial</title>
  </head>
  <body>
    <div id="app">
      <div class="description-panel">
        <h2 class="title"></h2>
        <h3 class="description"></h3>
        <p class="participant-count"></p>
      </div>
      <div class="participants-panel">
        <h4>Speakers</h4>
        <ul class="speakers-list">
          <!-- we'll add the participants here -->
        </ul>
      </div>
      <div class="controls-panel">
        <button type="button" class="toggle-live-button">Go Live</button>
        <button type="button" class="toggle-mute-button">Mute</button>
      </div>
    </div>
    <script type="module" src="/src/main.ts"></script>
  </body>
</html>

To fill in the data, we subscribe to the observables exposed on call.state. Read more about it in Call & Participant State.

Now, let's add the following code to the bottom of src/main.ts:

src/main.ts (tsx)

// ... previous code

// utility function to get DOM elements
const $$ = (selector: string) => document.querySelector<HTMLElement>(selector);

// references to the DOM elements
const titleElement = $$('.description-panel .title');
const descriptionElement = $$('.description-panel .description');
const participantCountElement = $$('.description-panel .participant-count');

// subscribe for call custom data changes
call.state.custom$.subscribe((custom) => {
  titleElement!.textContent = custom.title;
  descriptionElement!.textContent = custom.description;
});

// subscribe for call participant count changes
call.state.participantCount$.subscribe((count) => {
  participantCountElement!.textContent = `${count} participants`;
});
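Each subscribe call above returns a subscription that keeps its callback running until you unsubscribe. A minimal sketch of collecting subscriptions so they can be released together when the user leaves the room. SubscriptionBag is a hypothetical class. It only assumes that each subscription exposes an unsubscribe() method, which RxJS Subscription objects do.

```typescript
// Anything with an unsubscribe() method, such as an RxJS Subscription.
interface Unsubscribable {
  unsubscribe(): void;
}

// Hypothetical helper that collects subscriptions and releases them at once.
class SubscriptionBag {
  private subscriptions: Unsubscribable[] = [];

  add(subscription: Unsubscribable): void {
    this.subscriptions.push(subscription);
  }

  // Unsubscribes everything that was added, then empties the bag,
  // so calling dispose() twice does not release anything twice.
  dispose(): void {
    for (const subscription of this.subscriptions) {
      subscription.unsubscribe();
    }
    this.subscriptions = [];
  }

  get size(): number {
    return this.subscriptions.length;
  }
}
```

For example, you could wrap each call to call.state.custom$.subscribe(...) in bag.add(...), and then call bag.dispose() after await call.leave(). If you already depend on RxJS, its Subscription.add() method covers the same need.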

To make this a little more interactive, let's join the audio room from the browser.

For testing, you can join the call using our web app.

Backstage & Live mode control

As you probably noticed when opening the same room in the browser, audio rooms are not live by default. Regular users can only join an audio room when it is in live mode. Let's expand the controls panel and add a button that switches the room between backstage and live mode.

src/main.ts (tsx)

// ... previous code

// go live button
const toggleLiveButton = $$('.controls-panel .toggle-live-button');
toggleLiveButton!.addEventListener('click', async () => {
  const isLive = !call.state.backstage;
  if (isLive) {
    await call.stopLive();
  } else {
    await call.goLive();
  }
});

// subscribe for call backstage changes: backstage is false when the call is live
call.state.backstage$.subscribe((backstage) => {
  const isLive = !backstage;
  toggleLiveButton!.textContent = isLive ? 'Stop Live' : 'Go Live';
});
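The click handler above awaits a network request, so a quick double click can call goLive() twice before backstage$ has emitted. A minimal sketch of guarding an async handler so that clicks are ignored while a previous one is still pending. singleFlight is a hypothetical helper, not part of the SDK.

```typescript
// Wraps an async function so that calls made while a previous call is still
// pending are skipped: they resolve to undefined instead of running again.
function singleFlight<T>(fn: () => Promise<T>): () => Promise<T | undefined> {
  let inFlight = false;
  return async () => {
    if (inFlight) return undefined;
    inFlight = true;
    try {
      return await fn();
    } finally {
      // allow the next call once this one has settled (fulfilled or rejected)
      inFlight = false;
    }
  };
}
```

You would register it as toggleLiveButton!.addEventListener('click', singleFlight(async () => { /* goLive or stopLive */ })). The same wrapper fits the mute button.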

While we're at it, let's also add a button that mutes and unmutes the local audio track:

src/main.ts (tsx)

// ... previous code

// mute button
const toggleMuteButton = $$('.controls-panel .toggle-mute-button');
toggleMuteButton!.addEventListener('click', async () => {
  await call.microphone.toggle();
});

// subscribe for call microphone changes
call.microphone.state.status$.subscribe((state) => {
  const isMuted = state !== 'enabled';
  toggleMuteButton!.textContent = isMuted ? 'Unmute' : 'Mute';
});

Now the app exposes a microphone control button and a button that toggles live mode on and off. If you try the web demo of the audio room, you should be able to join as a regular user.

List Participants

As a next step, let's render the actual list of participants and show an indicator when they are speaking.

src/main.ts (tsx)

// ... previous code

// participant list
const speakerListElement = $$('.participants-panel .speakers-list');
call.state.participants$.subscribe((participants) => {
  // clear the list
  speakerListElement!.innerHTML = '';

  // loop over the participants and add them to the list
  participants.forEach((participant) => {
    const name = participant.name || participant.userId;
    const isSpeaking = participant.isSpeaking ? ' (speaking)' : '';

    const participantElement = document.createElement('li');
    participantElement.textContent = `${name}${isSpeaking}`;
    speakerListElement!.appendChild(participantElement);
  });
});

With these changes, things get more interesting: the app now shows a list of all participants connected to the call and adds a (speaking) suffix to those who are speaking.

However, you might have noticed that you can't hear the audio from the browser. To enable this, you need to render an audio element for every participant and provide it to our SDK, so it can bind the appropriate audio track to it. Let's update our participant list rendering code:

src/main.ts (tsx)

// participant list (replaces the previous participant list code)
const speakerListElement = $$('.participants-panel .speakers-list');
// cleanup functions returned by call.bindAudioElement() during the last render
let unbindAudioElements: Array<() => void> = [];
call.state.participants$.subscribe((participants) => {
  // release the previous audio bindings and clear the list
  unbindAudioElements.forEach((unbind) => unbind());
  unbindAudioElements = [];
  speakerListElement!.innerHTML = '';

  // loop over the participants and add them to the list
  participants.forEach((participant) => {
    const name = participant.name || participant.userId;
    const isSpeaking = participant.isSpeaking ? ' (speaking)' : '';

    const participantElement = document.createElement('li');
    participantElement.textContent = `${name}${isSpeaking}`;

    // create an audio element for the participant
    const audioElement = document.createElement('audio');
    participantElement.appendChild(audioElement);

    speakerListElement!.appendChild(participantElement);

    // bind this participant's audio track to the provided audioElement;
    // the returned cleanup function may be undefined if nothing was bound
    const unbind = call.bindAudioElement(audioElement, participant.sessionId, 'audioTrack');
    if (unbind) unbindAudioElements.push(unbind);
  });
});
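The subscription above rebuilds every list item and audio element on each emission of participants$, which can fire often, for example whenever isSpeaking changes. A minimal sketch of an alternative is to diff the session IDs between emissions, so you only create and bind elements for participants who joined and clean up those who left. diffParticipants is a hypothetical helper that works on plain session ID arrays.

```typescript
interface ParticipantDiff {
  joined: string[]; // session IDs present now but not before
  left: string[]; // session IDs present before but not now
  kept: string[]; // session IDs present in both emissions
}

// Hypothetical helper: compares the session IDs of two emissions.
function diffParticipants(previous: readonly string[], current: readonly string[]): ParticipantDiff {
  const before = new Set(previous);
  const now = new Set(current);
  return {
    joined: current.filter((id) => !before.has(id)),
    left: previous.filter((id) => !now.has(id)),
    kept: current.filter((id) => before.has(id)),
  };
}
```

In the participants$ callback, you could keep a Map from session ID to its list item and unbind function. You would then create and bind elements for joined IDs, unbind and remove them for left IDs, and only update the text of kept ones.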

Step 4 - Go live and join from the browser

If you now join the call from the browser, you will see that the participant list updates as you open and close the browser tab.

Note that the web interface won't let you share your audio or video. This is because, by default, the audio_room call type only allows moderators and admins to speak; regular participants can request permission. If different defaults make sense for your app, you can edit the call type in the dashboard or create your own.

Step 4.1 - Enable Noise Cancellation

Background noise in an audio session is never a pleasant experience for the listeners and the speaker.

Our SDK provides a plugin that greatly reduces the unwanted noise picked up by your microphone. Read more about how to enable it in our noise cancellation documentation.

Step 5 - Requesting permission to speak

Requesting permission to speak is straightforward. Let's first take a quick look at how the SDK's call object exposes this:

Requesting permission to speak

tsx

import { OwnCapability } from '@stream-io/video-client';

await call.requestPermissions({
  permissions: [OwnCapability.SEND_AUDIO],
});
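Before showing a "raise hand" button, it helps to check whether the user can already send audio. A minimal sketch, assuming that the user's own capabilities are available as an array of strings and that OwnCapability.SEND_AUDIO corresponds to the string 'send-audio'. In the SDK, you would read the capabilities from call.state.ownCapabilities$. SEND_AUDIO, SpeakerControl and speakerControlFor are hypothetical names.

```typescript
// Assumption: mirrors OwnCapability.SEND_AUDIO from @stream-io/video-client.
const SEND_AUDIO = 'send-audio';

type SpeakerControl = 'mute-toggle' | 'raise-hand' | 'hand-raised';

// Hypothetical helper: picks which control to show to the local user.
function speakerControlFor(ownCapabilities: readonly string[], hasPendingRequest: boolean): SpeakerControl {
  // users who may already speak get the regular mute/unmute button
  if (ownCapabilities.includes(SEND_AUDIO)) return 'mute-toggle';
  // everyone else can ask once; show a waiting state after the request is sent
  return hasPendingRequest ? 'hand-raised' : 'raise-hand';
}
```

Once the host grants the request, ownCapabilities$ should emit again with the new capability, and the same function switches the UI back to the mute toggle.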

Handling permission requests

Permission requests are delivered to the call object as an event that you can subscribe to:

import type { PermissionRequestEvent } from '@stream-io/video-client';

const unsubscribe = call.on('call.permission_request', async (request: PermissionRequestEvent) => {
  // get the permission request data
  const { user, permissions } = request;

  // decide how to handle the request (replace with your own approval logic)
  const shouldGrant = true;

  if (shouldGrant) {
    // grant the requested permissions
    await call.grantPermissions(user.id, permissions);
  } else {
    // or reject them
    await call.revokePermissions(user.id, permissions);
  }
});

// remember to unsubscribe when you're done (for example, after leaving the call)
unsubscribe();

Let's add another view that shows incoming permission requests, along with buttons to grant or reject them.

We start by updating the index.html file:

index.html (html)

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Audio Rooms Tutorial</title>
  </head>
  <body>
    <div id="app">
      <div class="permission-requests" style="display: none">
        <h4>Permission requests</h4>
        <ul class="permission-requests-list">
          <!-- we'll add the permission requests here -->
        </ul>
      </div>
      <div class="description-panel">
        <h2 class="title"></h2>
        <h3 class="description"></h3>
        <p class="participant-count"></p>
      </div>
      <div class="participants-panel">
        <h4>Speakers</h4>
        <ul class="speakers-list">
          <!-- we'll add the participants here -->
        </ul>
      </div>
      <div class="controls-panel">
        <button type="button" class="toggle-live-button">Go Live</button>
        <button type="button" class="toggle-mute-button">Mute</button>
      </div>
    </div>
    <script type="module" src="/src/main.ts"></script>
  </body>
</html>

Next, let's add the code to render and update the permission requests:

src/main.ts (tsx)

import type { PermissionRequestEvent } from '@stream-io/video-client';

// ... previous code

// permission requests
const permissionRequestsElement = $$('.permission-requests');
const permissionRequestListElement = $$('.permission-requests .permission-requests-list');

// subscribe for permission requests
call.on('call.permission_request', (permissionRequest) => {
  const {
    user: { id: userId, name },
    permissions,
  } = permissionRequest as PermissionRequestEvent;

  // create the list item
  const requestElement = document.createElement('li');

  // create the request elements
  const requestTextElement = document.createElement('span');
  const allowButton = document.createElement('button');
  const denyButton = document.createElement('button');

  // fall back to the user ID when the user has no display name
  requestTextElement.textContent = `${name || userId} wants to ${permissions.join(', ')}`;
  allowButton.textContent = 'Allow';
  denyButton.textContent = 'Deny';

  // allow button click handler
  allowButton.addEventListener('click', async () => {
    await call.grantPermissions(userId, permissions);
    permissionRequestListElement!.removeChild(requestElement);
    permissionRequestsElement!.style.display = permissionRequestListElement!.childElementCount ? 'block' : 'none';
  });

  // deny button click handler
  denyButton.addEventListener('click', async () => {
    await call.revokePermissions(userId, permissions);
    permissionRequestListElement!.removeChild(requestElement);
    permissionRequestsElement!.style.display = permissionRequestListElement!.childElementCount ? 'block' : 'none';
  });

  // bridge everything together
  requestElement.appendChild(requestTextElement);
  requestElement.appendChild(allowButton);
  requestElement.appendChild(denyButton);
  permissionRequestListElement!.appendChild(requestElement);
  permissionRequestsElement!.style.display = 'block';
});
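As written, a listener who asks twice gets two entries in the list. A minimal sketch of keeping at most one pending request per user, keyed by user ID, so the list can be re-rendered from a single source of truth. PendingRequests is a hypothetical class, and PendingRequest is a simplified stand-in for the fields this tutorial reads from PermissionRequestEvent.

```typescript
// Simplified stand-in for the fields read from PermissionRequestEvent.
interface PendingRequest {
  userId: string;
  name: string;
  permissions: string[];
}

// Hypothetical store that keeps at most one pending request per user.
class PendingRequests {
  private readonly byUser = new Map<string, PendingRequest>();

  // Adds a request, merging permissions if the user already has one pending.
  add(request: PendingRequest): void {
    const existing = this.byUser.get(request.userId);
    const permissions = existing
      ? Array.from(new Set([...existing.permissions, ...request.permissions]))
      : [...request.permissions];
    this.byUser.set(request.userId, { ...request, permissions });
  }

  // Removes and returns a request once it has been granted or denied.
  resolve(userId: string): PendingRequest | undefined {
    const request = this.byUser.get(userId);
    this.byUser.delete(userId);
    return request;
  }

  // Pending requests ordered by each user's first request
  // (re-setting an existing Map key keeps its original position).
  list(): PendingRequest[] {
    return Array.from(this.byUser.values());
  }
}
```

The permission request handler would call add() and re-render from list(), and the Allow and Deny handlers would call resolve(userId) before re-rendering.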

Step 6 - Group participants

It is common for audio rooms and similar interactive audio/video experiences to show users in separate groups. Let's see how we can update this application to render participants in two separate sections: Speakers and Listeners.

Building a custom layout is straightforward: all we need to do is filter the participants emitted by the call.state.participants$ observable.
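The filtering step can be expressed as a pure function. A minimal sketch that splits participants by whether they publish an audio track. The Participant type and the TrackType object are simplified local stand-ins for StreamVideoParticipant and SfuModels.TrackType, so the numeric values are placeholders and not necessarily the SDK's.

```typescript
// Local stand-in for SfuModels.TrackType (values are placeholders).
const TrackType = { AUDIO: 1, VIDEO: 2 } as const;
type TrackType = (typeof TrackType)[keyof typeof TrackType];

// Simplified stand-in for StreamVideoParticipant.
interface Participant {
  sessionId: string;
  name: string;
  publishedTracks: TrackType[];
}

// Splits participants into speakers (publishing audio) and listeners,
// preserving the order in which they were emitted.
function groupParticipants(participants: readonly Participant[]): {
  speakers: Participant[];
  listeners: Participant[];
} {
  const speakers: Participant[] = [];
  const listeners: Participant[] = [];
  for (const participant of participants) {
    if (participant.publishedTracks.includes(TrackType.AUDIO)) {
      speakers.push(participant);
    } else {
      listeners.push(participant);
    }
  }
  return { speakers, listeners };
}
```

Keeping the grouping separate from the DOM code makes it easy to test. The rendering code then only iterates over the two resulting arrays.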

Let's update the participants panel section of index.html:

index.html (html)

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <title>Audio Rooms Tutorial</title>
  </head>
  <body>
    <div id="app">
      <div class="permission-requests" style="display: none">
        <h4>Permission requests</h4>
        <ul class="permission-requests-list">
          <!-- we'll add the permission requests here -->
        </ul>
      </div>
      <div class="description-panel">
        <h2 class="title"></h2>
        <h3 class="description"></h3>
        <p class="participant-count"></p>
      </div>
      <div class="participants-panel">
        <h4>Speakers</h4>
        <ul class="speakers-list">
          <!-- we'll add the participants here -->
        </ul>
        <h4>Listeners</h4>
        <ul class="listeners-list">
          <!-- we'll add the participants here -->
        </ul>
      </div>
      <div class="controls-panel">
        <button type="button" class="toggle-live-button">Go Live</button>
        <button type="button" class="toggle-mute-button">Mute</button>
      </div>
    </div>
    <script type="module" src="/src/main.ts"></script>
  </body>
</html>

Now, let's update the code accordingly:

src/main.ts (tsx)

import { SfuModels, type StreamVideoParticipant } from '@stream-io/video-client';

// ... previous code

// participant lists (replace the previous participant list code with this)
const speakerListElement = $$('.participants-panel .speakers-list');
const listenersListElement = $$('.participants-panel .listeners-list');
// cleanup functions returned by call.bindAudioElement() during the last render
let unbindAudioElements: Array<() => void> = [];
call.state.participants$.subscribe((participants) => {
  // release the previous audio bindings and clear the lists
  unbindAudioElements.forEach((unbind) => unbind());
  unbindAudioElements = [];
  speakerListElement!.innerHTML = '';
  listenersListElement!.innerHTML = '';

  // a helper function to check if a participant publishes audio
  const hasAudio = (p: StreamVideoParticipant) =>
    p.publishedTracks.includes(SfuModels.TrackType.AUDIO);

  // loop over the participants and add them to the matching list
  participants.forEach((participant) => {
    const name = participant.name || participant.userId;
    const isSpeaking = participant.isSpeaking ? ' (speaking)' : '';

    const participantElement = document.createElement('li');
    participantElement.textContent = `${name}${isSpeaking}`;
    if (hasAudio(participant)) {
      const audioElement = document.createElement('audio');
      participantElement.appendChild(audioElement);
      speakerListElement!.appendChild(participantElement);
      // bind this participant's audio track to the provided audioElement
      const unbind = call.bindAudioElement(audioElement, participant.sessionId, 'audioTrack');
      if (unbind) unbindAudioElements.push(unbind);
    } else {
      listenersListElement!.appendChild(participantElement);
    }
  });
});

// ... remaining code

With these changes applied, the app now renders participants in two separate groups: Speakers and Listeners.

For the sake of simplicity, this tutorial skips some best practices for building a production-ready app. Take a look at our sample app linked at the end of this tutorial for a more complete example.

Other built-in features

There are a few more exciting features that you can use to build audio rooms:

Query Calls: You can query calls to easily show upcoming calls, recently finished calls, and call previews.
Reactions & Custom events: Reactions and custom events are supported.
Recording & Broadcasting: You can record and broadcast your calls.
Chat: Stream's Chat SDKs are fully featured, and you can integrate them into the call
Moderation: Moderation capabilities are built into the product
Transcriptions: You can enable transcriptions for your calls

Recap

It was fun to see just how quickly you can build an audio room for your app. Please let us know if you ran into any issues. Our team is also happy to review your UI designs and offer recommendations on how to achieve them with Stream.

To recap what we've learned:

You set up a call with const call = client.call('audio_room', '123')
The call type audio_room controls which features are enabled and how permissions are set up
The audio_room call type enables backstage mode by default and only allows admins and the creator of the call to join before the call goes live
When you join a call, realtime communication is set up for audio: await call.join()
The observables exposed on call.state make it easy to build your own UI
For audio rooms, we use Opus RED and Opus DTX for optimal audio quality.

We've used Stream's Audio Rooms API, which means calls run on a global edge network of video servers. Because these servers are closer to your users, calls have lower latency and better reliability. The JavaScript Video SDK enables you to build in-app video calling, audio rooms, and livestreaming in days.

We hope you've enjoyed this tutorial. Please feel free to reach out if you have any suggestions or questions. You can find the code and the stylesheet for this tutorial in this CodeSandbox.

The source code for the companion audio room app, together with all of its features, is available on GitHub.

Final Thoughts

In this tutorial, we built a functioning audio room app with the Stream Video SDK for JavaScript. We also showed how the observables exposed on call.state make it easy to build and customize your own UI with minimal code.

Both the Video SDK for JavaScript and the API have plenty more features available to support more advanced use cases.
