How to Build an Android Video Calling App
This tutorial teaches you how to build Zoom/WhatsApp-style video calling into your app.
- Calls run on Stream's global edge network for optimal latency & reliability.
- Permissions give you fine-grained control over who can do what.
- Video quality and codecs are automatically optimized.
- Powered by Stream's Video Calling API.
Step 1 - Create a new project in Android Studio
- Create a new project
- Select Phone & Tablet -> Empty Activity
- Name your project VideoCall.
Note that this tutorial was written using Android Studio Giraffe. Setup steps can vary slightly across Android Studio versions. We recommend using Android Studio Giraffe or newer.
Step 2 - Install the SDK & set up the client
Add the Video Compose SDK and Jetpack Compose dependencies to your app's build.gradle.kts file (found in app/build.gradle.kts). If you're new to Android, note that there are two build.gradle.kts files; you want to open the one in the app folder.
dependencies {
    // Stream Video Compose SDK
    implementation("io.getstream:stream-video-android-compose:0.3.4")

    // Optionally add Jetpack Compose if Android Studio didn't automatically include them
    implementation(platform("androidx.compose:compose-bom:2023.08.00"))
    implementation("androidx.activity:activity-compose:1.7.2")
    implementation("androidx.compose.ui:ui")
    implementation("androidx.compose.ui:ui-tooling")
    implementation("androidx.compose.runtime:runtime")
    implementation("androidx.compose.foundation:foundation")
    implementation("androidx.compose.material:material")
}
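Stream's artifacts are published to MavenCentral. If your project doesn't already resolve dependencies from there, add it to your settings.gradle.kts (a sketch; your repository setup may differ):

dependencyResolutionManagement {
    repositories {
        google()
        mavenCentral()
    }
}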
There are two versions of Stream's SDK:
- Video Compose SDK: the io.getstream:stream-video-android-compose dependency, which includes the core video SDK plus Compose UI components.
- Video Core SDK: io.getstream:stream-video-android-core, which only includes the core parts of the video SDK.
For this tutorial, we'll use the Compose UI components.
Step 3 - Create & Join a call
To keep this tutorial short and easy to understand, we'll place all the code in MainActivity.kt. For a production app, you'd want to initialize the client in your Application class or DI module, and use a ViewModel.
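For reference, here's a minimal sketch of what that production-style initialization might look like (the Application class name is illustrative; the builder arguments are the same ones used later in this tutorial):

class VideoCallApp : Application() {
    override fun onCreate() {
        super.onCreate()
        // build the client once for the whole app lifetime
        StreamVideoBuilder(
            context = this,
            apiKey = "YOUR_API_KEY",
            geo = GEO.GlobalEdgeNetwork,
            user = User(id = "user-id", name = "Tutorial"),
            token = "USER_TOKEN",
        ).build()
    }
}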
Open up MainActivity.kt and replace the MainActivity class with:
class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        val userToken = "REPLACE_WITH_TOKEN"
        val userId = "REPLACE_WITH_USER_ID"
        val callId = "REPLACE_WITH_CALL_ID"

        // step1 - create a user.
        val user = User(
            id = userId, // any string
            name = "Tutorial" // name and image are used in the UI
        )

        // step2 - initialize StreamVideo. For a production app we recommend adding
        // the client to your Application class or DI module.
        val client = StreamVideoBuilder(
            context = applicationContext,
            apiKey = "hd8szvscpxvd", // demo API key
            geo = GEO.GlobalEdgeNetwork,
            user = user,
            token = userToken,
        ).build()

        // step3 - join a call of type `default` with the given call id.
        val call = client.call("default", callId)
        lifecycleScope.launch {
            val result = call.join(create = true)
            result.onError {
                Toast.makeText(applicationContext, it.message, Toast.LENGTH_LONG).show()
            }
        }

        setContent {
            // step4 - apply VideoTheme
            VideoTheme {
                // step5 - define required properties.
                val participants by call.state.participants.collectAsState()
                val connection by call.state.connection.collectAsState()

                // step6 - render text that displays the connection status.
                Box(
                    contentAlignment = Alignment.Center,
                    modifier = Modifier.fillMaxSize()
                ) {
                    if (connection != RealtimeConnection.Connected) {
                        Text("loading...", fontSize = 30.sp)
                    } else {
                        Text("Call ${call.id} has ${participants.size} participants", fontSize = 30.sp)
                    }
                }
            }
        }
    }
}
To actually run this sample, we need a valid user token. The user token is typically generated by your server-side API: when a user logs in to your app, you return a user token that gives them access to the call. To make this tutorial easier to follow, we'll generate a user token for you. Please update REPLACE_WITH_USER_ID, REPLACE_WITH_TOKEN and REPLACE_WITH_CALL_ID with the credentials (API key, token, user ID and call ID) generated for you on the tutorial page.
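For reference, a Stream user token is a standard JWT signed with your API secret. Here's a minimal server-side sketch using the jjwt library (an illustration only; in practice you'd typically use one of Stream's server-side SDKs, which handle this for you):

import io.jsonwebtoken.Jwts
import io.jsonwebtoken.SignatureAlgorithm
import javax.crypto.spec.SecretKeySpec

// generates a Stream user token; this code belongs on your server,
// never ship your API secret inside the app
fun generateUserToken(userId: String, apiSecret: String): String {
    val key = SecretKeySpec(apiSecret.toByteArray(), SignatureAlgorithm.HS256.jcaName)
    return Jwts.builder()
        .claim("user_id", userId)
        .signWith(key, SignatureAlgorithm.HS256)
        .compact()
}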
Now when you run the sample app, it will connect successfully. The text will show that the call has 1 participant (you). Let's review what we did in the above code.
Create a user. First we create a user object. You typically sync these users via a server side integration from your own backend. Alternatively, you can also use guest or anonymous users.
val user = User(
    id = userId, // any string
    name = "Tutorial" // name and image are used in the UI
)
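If you don't sync users from a backend yet, a guest user could look roughly like this (the type parameter and UserType enum are assumptions here; check the SDK's authentication docs for the exact API):

val guest = User(
    id = "guest-id",      // any string
    name = "Guest",
    type = UserType.Guest // assumed parameter; see the authentication docs
)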
Initialize the Stream Client. Next we initialize the client by passing the API Key, user and user token.
val client = StreamVideoBuilder(
    context = applicationContext,
    apiKey = "hd8szvscpxvd", // demo API key
    geo = GEO.GlobalEdgeNetwork,
    user = user,
    token = userToken,
).build()
Create and Join Call. After the user and client are created, we create a call like this:
val call = client.call("default", callId)
lifecycleScope.launch {
    val result = call.join(create = true)
    result.onError {
        Toast.makeText(applicationContext, it.message, Toast.LENGTH_LONG).show()
    }
}
As soon as you call call.join, the connection for video & audio is set up.
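call.join returns a result object, so you can also react to a successful join. A small sketch, assuming the onSuccess callback on Stream's result type:

lifecycleScope.launch {
    call.join(create = true)
        .onSuccess { Log.d("VideoCall", "Joined call ${call.id}") }
        .onError { error ->
            Toast.makeText(applicationContext, error.message, Toast.LENGTH_LONG).show()
        }
}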
Lastly, the UI is rendered by observing call.state (participants and connection states):

val participants by call.state.participants.collectAsState()
val connection by call.state.connection.collectAsState()

You'll find all relevant state for the call in call.state and call.state.participants.
The documentation on Call state and Participant state explains this in further detail.
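Because these are regular StateFlow objects, you can also observe them outside of Compose. A small sketch that logs the participant count whenever it changes:

lifecycleScope.launch {
    call.state.participants.collect { participants ->
        Log.d("VideoCall", "Call now has ${participants.size} participants")
    }
}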
Step 4 - Joining from the web
To make this a little more interactive, let's join the call from your browser (the tutorial page lets you join the same call from the web). On your Android device, you'll see the text update to 2 participants. Keep the browser tab open as you go through the rest of the tutorial.
Step 5 - Rendering Video
In this next step we're going to:
- Request Android Runtime permissions (to capture video and audio)
- Render your local & remote participant video
A. Requesting Android Runtime Permissions
To capture microphone and camera output, we need to request Android runtime permissions. In MainActivity.kt, just below setContent, add the line LaunchCallPermissions(call = call):
setContent {
    LaunchCallPermissions(call = call)
    ...
}
The LaunchCallPermissions composable requests the required permissions when you open the call. Review the permissions docs to learn more about how you can easily request permissions.
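If you'd rather request permissions yourself, here's a minimal sketch using the standard Activity Result API (plain Android, not part of Stream's SDK):

@Composable
fun RequestCallPermissions() {
    val launcher = rememberLauncherForActivityResult(
        ActivityResultContracts.RequestMultiplePermissions()
    ) { grants ->
        // grants[Manifest.permission.CAMERA] and grants[Manifest.permission.RECORD_AUDIO]
        // indicate what the user allowed
    }
    LaunchedEffect(Unit) {
        launcher.launch(arrayOf(Manifest.permission.CAMERA, Manifest.permission.RECORD_AUDIO))
    }
}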
B. Render the video
In the MainActivity.kt file, replace the code inside setContent with the example below:
setContent {
    LaunchCallPermissions(call = call)

    VideoTheme {
        val remoteParticipants by call.state.remoteParticipants.collectAsState()
        val remoteParticipant = remoteParticipants.firstOrNull()
        val me by call.state.me.collectAsState()
        val connection by call.state.connection.collectAsState()
        var parentSize: IntSize by remember { mutableStateOf(IntSize(0, 0)) }

        Box(
            contentAlignment = Alignment.Center,
            modifier = Modifier
                .fillMaxSize()
                .background(VideoTheme.colors.appBackground)
                .onSizeChanged { parentSize = it }
        ) {
            if (remoteParticipant != null) {
                val remoteVideo by remoteParticipant.video.collectAsState()

                Column(modifier = Modifier.fillMaxSize()) {
                    VideoRenderer(
                        modifier = Modifier.weight(1f),
                        call = call,
                        video = remoteVideo
                    )
                }
            } else {
                if (connection != RealtimeConnection.Connected) {
                    Text(
                        text = "loading...",
                        fontSize = 30.sp,
                        color = VideoTheme.colors.textHighEmphasis
                    )
                } else {
                    Text(
                        modifier = Modifier.padding(30.dp),
                        text = "Join call ${call.id} in your browser to see the video here",
                        fontSize = 30.sp,
                        color = VideoTheme.colors.textHighEmphasis,
                        textAlign = TextAlign.Center
                    )
                }
            }

            // floating video UI for the local video participant
            me?.let { localParticipant ->
                FloatingParticipantVideo(
                    modifier = Modifier.align(Alignment.TopEnd),
                    call = call,
                    participant = localParticipant,
                    parentBounds = parentSize
                )
            }
        }
    }
}
Now when you run the app, you'll see your local video in a floating video element, plus the video from your browser.
Let's review the changes we made.
VideoRenderer is one of our primary low-level components.
VideoRenderer(
    modifier = Modifier.weight(1f),
    call = call,
    video = remoteVideo
)
It only displays the video and doesn't add any other UI elements. The video is lazily loaded, and only requested from the video infrastructure if you're actually displaying it. So if you have a video call with 200 participants, and you show only 10 of them, you'll only receive video for 10 participants. This is how software like Zoom and Google Meet make large calls work.
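To illustrate, here's a small sketch that renders only the first ten participants; video tracks are requested only for the rows that are actually composed (it reuses the call.state and VideoRenderer APIs from above):

val participants by call.state.participants.collectAsState()

LazyColumn {
    items(participants.take(10)) { participant ->
        // each row collects and renders only this participant's video track
        val video by participant.video.collectAsState()
        VideoRenderer(
            modifier = Modifier.height(180.dp),
            call = call,
            video = video
        )
    }
}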
FloatingParticipantVideo renders a draggable display of your own video.
me?.let { localParticipant ->
    FloatingParticipantVideo(
        modifier = Modifier.align(Alignment.TopEnd),
        call = call,
        participant = localParticipant,
        parentBounds = parentSize
    )
}
Step 6 - A Full Video Calling UI
The above example showed how to use the call state object and Compose to build a basic video UI. For a production version of calling, you'd want a few more UI elements:
- Indicators of when someone is speaking
- Quality of their network
- Layout support for >2 participants
- Labels for the participant names
- Call header and controls
Stream ships with several Compose components to make this easy. You can customize the components with theming, arguments, and by swapping out parts of them. This is convenient if you want to quickly build a production-ready calling experience for your app. (And if you need more flexibility, many customers use the low-level approach shown above to build a UI from scratch.)
To render a full calling UI, we'll leverage the CallContent component. This includes sensible defaults for a call header, video grid, call controls, picture-in-picture, and everything that you need to build a video call screen.
Open MainActivity.kt and update the code inside VideoTheme to use CallContent. The code will be a lot smaller than before, since all the UI logic is handled by CallContent:
VideoTheme {
    CallContent(
        modifier = Modifier.fillMaxSize(),
        call = call,
        onBackPressed = { onBackPressed() },
    )
}
When you now run your app, you'll see a more polished video UI. It supports reactions, screensharing, active speaker detection, network quality indicators and more. The most commonly used UI components are:
- VideoRenderer: For rendering video and automatically requesting video tracks when needed. Most of the Video components are built on top of this.
- ParticipantVideo: The participant's video + some UI elements for network quality, reactions, speaking etc.
- ParticipantsGrid: A grid of participant video elements.
- FloatingParticipantVideo: A draggable version of the participant video. Typically used for your own video.
- ControlActions: A set of buttons for controlling your call, such as changing audio and video states.
- RingingCallContent: UI for displaying incoming and outgoing calls.
The full list of UI components is available in the docs.
Step 7 - Customizing the UI
You can customize the UI by:
- Building your own UI components (the most flexibility, build anything).
- Mixing and matching with Stream's UI Components (speeds up how quickly you can build common video UIs).
- Theming (basic customization of colors, fonts etc).
The example below shows how to swap out the call controls for your own controls:
override fun onCreate(savedInstanceState: Bundle?) {
    super.onCreate(savedInstanceState)

    // user, client and call are created exactly as in Step 3
    lifecycleScope.launch {
        val result = call.join(create = true)
        result.onError {
            Toast.makeText(applicationContext, it.message, Toast.LENGTH_LONG).show()
        }
    }

    setContent {
        VideoTheme {
            val isCameraEnabled by call.camera.isEnabled.collectAsState()
            val isMicrophoneEnabled by call.microphone.isEnabled.collectAsState()

            CallContent(
                modifier = Modifier.background(color = VideoTheme.colors.appBackground),
                call = call,
                onBackPressed = { onBackPressed() },
                controlsContent = {
                    ControlActions(
                        call = call,
                        actions = listOf(
                            {
                                ToggleCameraAction(
                                    modifier = Modifier.size(52.dp),
                                    isCameraEnabled = isCameraEnabled,
                                    onCallAction = { call.camera.setEnabled(it.isEnabled) }
                                )
                            },
                            {
                                ToggleMicrophoneAction(
                                    modifier = Modifier.size(52.dp),
                                    isMicrophoneEnabled = isMicrophoneEnabled,
                                    onCallAction = { call.microphone.setEnabled(it.isEnabled) }
                                )
                            },
                            {
                                FlipCameraAction(
                                    modifier = Modifier.size(52.dp),
                                    onCallAction = { call.camera.flip() }
                                )
                            },
                        )
                    )
                }
            )
        }
    }
}
Stream's Video SDK provides fully polished UI components, allowing you to build a video call quickly and customize them as needed. As you've seen, you can implement a complete video call screen with the CallContent composable in Jetpack Compose. The CallContent composable consists of three major parts:
- appBarContent: content shown at the top of the call, displaying call information or additional actions.
- controlsContent: content that allows users to trigger actions to control a joined call.
- videoContent: content rendered when you're successfully connected to a call.
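Since these are composable slots, you can replace any of them individually. For example, here's a small sketch that swaps in a custom app bar (the slot's lambda signature is an assumption here; check the CallContent docs for the exact parameters):

CallContent(
    modifier = Modifier.fillMaxSize(),
    call = call,
    onBackPressed = { onBackPressed() },
    appBarContent = { call ->
        Text(text = "My call: ${call.id}")
    },
)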
Theming gives you control over the colors and fonts:

VideoTheme(
    colors = StreamColors.defaultColors().copy(appBackground = Color.Black),
    dimens = StreamDimens.defaultDimens().copy(callAvatarSize = 72.dp),
    typography = StreamTypography.defaultTypography().copy(title1 = TextStyle()),
    shapes = StreamShapes.defaultShapes().copy(avatar = CircleShape)
) {
    // ...
}
Recap
Please do let us know if you ran into any issues while building a video calling app with Kotlin. Our team is also happy to review your UI designs and offer recommendations on how to achieve them with Stream.
To recap what we've learned about Android video calling:
- You set up a call: (val call = client.call("default", "123"))
- The call type ("default" in the above case) controls which features are enabled and how permissions are set up
- When you join a call, realtime communication is set up for audio & video calling: (call.join())
- StateFlow objects in call.state and call.state.participants make it easy to build your own UI
- VideoRenderer is the low-level component that renders video
We've used Stream's Video Calling API, which means calls run on a global edge network of video servers. Being closer to your users improves both the latency and the reliability of calls. The Kotlin SDK enables you to build in-app video calling, audio rooms and livestreaming in days.
We hope you've enjoyed this tutorial; please feel free to reach out if you have any suggestions or questions.