iOS Video Calling Tutorial


This tutorial teaches you how to build Zoom/WhatsApp-style video calling for your app.
- Calls run on Stream's global edge network for optimal latency & reliability.
- Permissions give you fine-grained control over who can do what.
- Video quality and codecs are automatically optimized.
- Powered by Stream's Video Calling API.
Step 1 - Create a new SwiftUI Application in Xcode
- Make sure you have Xcode installed and that you are running version 14.3 or later
- Open Xcode and select "Create a new Project"
- Select "iOS" as the platform and "App" as the type of Application
- Name your project "VideoCall" and select "SwiftUI" as the interface
Step 2 - Install the SDK & Set up permissions
Next you need to add our SDK dependencies to your project using Swift Package Manager from Xcode.
- Click on "Add packages..." from the File menu
- Enter https://github.com/GetStream/stream-video-swift in the search bar
- Select "StreamVideo" and "StreamVideoSwiftUI" and then click Add Package
App Permissions
Making a video call requires the device's camera and microphone, so your app needs to request permission to use them. To do this, add the following keys and values to your Info.plist file:
- Privacy - Camera Usage Description (NSCameraUsageDescription): "VideoCall requires camera access in order to capture and transmit video"
- Privacy - Microphone Usage Description (NSMicrophoneUsageDescription): "VideoCall requires microphone access in order to capture and transmit audio"
Step 3 - Create & Join a call
Open up VideoCall/VideoCallApp.swift and replace it with this code:
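The original listing is not reproduced here, so the following is a minimal sketch of what VideoCallApp.swift might contain, assuming the StreamVideo package added in Step 2. The user name "Martin" and the REPLACE_WITH_API_KEY placeholder are illustrative assumptions, and the exact initializers may differ between SDK versions.

```swift
import SwiftUI
import StreamVideo

@main
struct VideoCallApp: App {
    @State var call: Call
    @ObservedObject var state: CallState
    @State var callCreated: Bool = false

    private var client: StreamVideo
    // Placeholders: replace with the credentials provided for this tutorial.
    private let apiKey: String = "REPLACE_WITH_API_KEY" // assumed placeholder name
    private let token: String = "REPLACE_WITH_TOKEN"
    private let userId: String = "REPLACE_WITH_USER_ID"
    private let callId: String = "REPLACE_WITH_CALL_ID"

    init() {
        // Create a user ("Martin" is an illustrative display name).
        let user = User(id: userId, name: "Martin")

        // Initialize the Stream Video client with the API key, user and token.
        self.client = StreamVideo(
            apiKey: apiKey,
            user: user,
            token: .init(stringLiteral: token)
        )

        // Create the call object; joining happens later, in onAppear.
        let call = client.call(callType: "default", callId: callId)
        self.call = call
        self.state = call.state
    }

    var body: some Scene {
        WindowGroup {
            VStack {
                if callCreated {
                    // Re-renders whenever the observed call state changes.
                    Text("Call \(call.callId) has \(state.participants.count) participant")
                        .font(.system(size: 30))
                        .foregroundColor(.blue)
                } else {
                    Text("loading...")
                }
            }
            .onAppear {
                Task {
                    guard !callCreated else { return }
                    do {
                        // Creates the call if it does not exist yet, then joins it.
                        try await call.join(create: true)
                        callCreated = true
                    } catch {
                        print("Failed to join call: \(error)")
                    }
                }
            }
        }
    }
}
```

The sketch keeps all credentials as constants so that the three placeholders mentioned in this step are easy to find and replace.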
To actually run this sample, we need a valid user token. The user token is typically generated by your server-side API. When a user logs in to your app, you return the user token that gives them access to the call. To make this tutorial easier to follow, we'll generate a user token for you:
Please update REPLACE_WITH_USER_ID, REPLACE_WITH_TOKEN and REPLACE_WITH_CALL_ID with the actual values shown below:
Here are credentials to try out the app with:
Property | Value |
---|---|
API Key | Waiting for an API key ... |
Token | Token is generated ... |
User ID | Loading ... |
Call ID | Creating random call ID ... |
Now when you run the sample app it will connect successfully. The text will say "Call ... has 1 participant" (yourself). Let's review what we did in the above code.
Create a user. First we create a user object. You typically sync these users via a server-side integration from your own backend. Alternatively, you can also use guest or anonymous users.
Initialize the Stream Client. Next we initialize the client by passing the API Key, user and user token.
Create and join call. After the user and client are created, we create a call like this:
As soon as you use call.join, the connection for video & audio is set up.
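As a rough illustration of the create-and-join step, the helper below wraps both calls in one function. The function name joinDefaultCall is hypothetical, and the sketch assumes an already initialized StreamVideo client.

```swift
import StreamVideo

/// Hypothetical helper (not part of the SDK): creates a call object of type
/// "default" and joins it, creating the call on the backend if needed.
func joinDefaultCall(using client: StreamVideo, callId: String) async throws -> Call {
    // Creating the call object does not connect to anything yet.
    let call = client.call(callType: "default", callId: callId)

    // Joining is what sets up the audio and video connection.
    _ = try await call.join(create: true)
    return call
}
```

Passing create: true lets the first participant create the call on the fly; a client that should only join existing calls could leave that argument out.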
Lastly, the UI is rendered by observing call.state and participants state:
You'll find all relevant state for the call in call.state and call.state.participants. The documentation on Call state and Participant state explains this in further detail.
Step 4 - Joining from the web
To make this a little more interactive, let's join the call from your browser.
On your iOS device, you'll see the text update to 2 participants. Let's keep the browser tab open as you go through the tutorial.
Step 5 - Rendering Video
In this next step we're going to render your local & remote participant video.
Let's update the body of our VideoCallApp View with the following code.
We will now create the ParticipantsView and FloatingParticipantVideo views.
Now when you run the app, you'll see your local video in a floating video element and the video from your browser. The end result should look somewhat like this:
Let's review the changes we made.
We added the changeTrackVisibility method to our app and pass it down to the subviews we create. When this method is called, it asks the Call object to make the participant's track visible or not visible. This matters when a view goes off-screen (e.g. while scrolling through participants during a call), because hiding the track reduces energy and data consumption.
VideoRendererView only displays the video and doesn't add any other UI elements. The video is lazily loaded, and only requested from the video infrastructure if you're actually displaying it. So if you have a video call with 200 participants, and you show only 10 of them, you'll only receive video for 10 participants. This is how software like Zoom and Google Meet make large calls work.
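To make the 200-versus-10 arithmetic concrete, here is a toy model in plain Swift. It is not SDK code: TrackVisibilityModel and its members are hypothetical names that only mimic the idea of toggling track visibility as views appear and disappear.

```swift
/// Toy model (not SDK code): records which participants are on-screen and
/// therefore which video tracks would be requested.
struct TrackVisibilityModel {
    private(set) var visibleIDs: Set<String> = []

    /// Mirrors the idea of changeTrackVisibility: mark a participant's
    /// track as visible or hidden.
    mutating func changeTrackVisibility(for participantID: String, isVisible: Bool) {
        if isVisible {
            visibleIDs.insert(participantID)
        } else {
            visibleIDs.remove(participantID)
        }
    }

    /// Number of video tracks that would be received in this model.
    var subscribedTrackCount: Int { visibleIDs.count }
}

// A call with 200 participants where only the first 10 are on-screen.
let participantIDs = (1...200).map { "user-\($0)" }
var model = TrackVisibilityModel()
for id in participantIDs.prefix(10) {
    model.changeTrackVisibility(for: id, isVisible: true)
}
print(model.subscribedTrackCount) // prints "10"

// Scrolling one participant off-screen drops its track.
model.changeTrackVisibility(for: "user-1", isVisible: false)
print(model.subscribedTrackCount) // prints "9"
```

In the real app the same toggling is driven by SwiftUI's onAppear and onDisappear callbacks rather than by a loop.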
FloatingParticipantVideo renders a display of your own video. It uses VideoRendererView, which is the component used by VideoCallParticipantView, to simply display the video without adding any other UI elements.
ParticipantsView renders a scroll view of all remoteParticipants.
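The views reviewed above could be sketched roughly as follows. This is not the original listing: CallContentView is a hypothetical wrapper standing in for the updated body of VideoCallApp, the sizes are arbitrary, and the view initializers shown are assumptions that may differ between SDK versions.

```swift
import SwiftUI
import StreamVideo
import StreamVideoSwiftUI

/// Hypothetical wrapper; in the tutorial this layout lives in the app's body.
struct CallContentView: View {
    var call: Call
    @ObservedObject var state: CallState

    var body: some View {
        ZStack {
            ParticipantsView(
                call: call,
                participants: state.remoteParticipants,
                onChangeTrackVisibility: changeTrackVisibility(_:isVisible:)
            )
            FloatingParticipantVideo(participant: state.localParticipant)
        }
    }

    /// Asks the call to show or hide a participant's track.
    private func changeTrackVisibility(_ participant: CallParticipant?, isVisible: Bool) {
        guard let participant else { return }
        Task { await call.changeTrackVisibility(for: participant, isVisible: isVisible) }
    }
}

/// Scrollable list of remote participants; each row reports its visibility.
struct ParticipantsView: View {
    var call: Call
    var participants: [CallParticipant]
    var onChangeTrackVisibility: (CallParticipant?, Bool) -> Void

    var body: some View {
        GeometryReader { proxy in
            ScrollView {
                LazyVStack {
                    ForEach(participants) { participant in
                        VideoCallParticipantView(
                            participant: participant,
                            availableFrame: proxy.frame(in: .global),
                            contentMode: .scaleAspectFit,
                            customData: [:],
                            call: call
                        )
                        .frame(width: proxy.size.width, height: proxy.size.height / 2)
                        .onAppear { onChangeTrackVisibility(participant, true) }
                        .onDisappear { onChangeTrackVisibility(participant, false) }
                    }
                }
            }
        }
        .edgesIgnoringSafeArea(.all)
    }
}

/// Small floating preview of the local participant's video.
struct FloatingParticipantVideo: View {
    var participant: CallParticipant?
    var size: CGSize = .init(width: 120, height: 120)

    var body: some View {
        if let participant {
            VStack {
                HStack {
                    Spacer()
                    // Renders only the video, with no extra UI elements.
                    VideoRendererView(id: participant.id, size: size) { videoRenderer in
                        videoRenderer.handleViewRendering(for: participant, onTrackSizeUpdate: { _, _ in })
                    }
                    .frame(width: size.width, height: size.height)
                    .clipShape(RoundedRectangle(cornerRadius: 8))
                }
                Spacer()
            }
            .padding()
        }
    }
}
```

Using a LazyVStack means rows are created only as they scroll into view, which is what makes the onAppear and onDisappear visibility callbacks meaningful.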
Step 6 - A Full Video Calling UI
The above example showed how to use the call state object and SwiftUI to build a basic video UI. For a production version of calling you'd want a few more UI elements:
- Indicators of when someone is speaking
- Quality of their network
- Layout support for >2 participants
- Labels for the participant names
- Call header and controls
Stream ships with several SwiftUI components to make this easy. You can customize the components with theming, arguments and swapping parts of them. This is convenient if you want to quickly build a production-ready calling experience for your app. If you need more flexibility, many customers use the above low-level approach to build a UI from scratch.
To render a full calling UI, we'll leverage the CallContainer component. This includes sensible defaults for a call header, video grid, call controls, picture-in-picture, and everything that you need to build a video call screen.
Let's update the code in VideoCall/VideoCallApp.swift.
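A minimal sketch of the updated file, assuming the StreamVideoSwiftUI package from Step 2, might look like this. The use of CallViewModel and DefaultViewFactory.shared, the user name "Martin" and the REPLACE_WITH_API_KEY placeholder are assumptions about the setup, not a reproduction of the original listing.

```swift
import SwiftUI
import StreamVideo
import StreamVideoSwiftUI

@main
struct VideoCallApp: App {
    @ObservedObject var viewModel: CallViewModel

    private var client: StreamVideo
    // Placeholders: replace with the credentials provided for this tutorial.
    private let apiKey: String = "REPLACE_WITH_API_KEY" // assumed placeholder name
    private let token: String = "REPLACE_WITH_TOKEN"
    private let userId: String = "REPLACE_WITH_USER_ID"
    private let callId: String = "REPLACE_WITH_CALL_ID"

    init() {
        let user = User(id: userId, name: "Martin") // illustrative display name

        // Initialize the Stream Video client first.
        self.client = StreamVideo(
            apiKey: apiKey,
            user: user,
            token: .init(stringLiteral: token)
        )

        // Created after the client so the view model can use it.
        self.viewModel = .init()
    }

    var body: some Scene {
        WindowGroup {
            VStack {
                if viewModel.call != nil {
                    // Full calling UI: header, video grid and call controls.
                    CallContainer(viewFactory: DefaultViewFactory.shared, viewModel: viewModel)
                } else {
                    Text("loading...")
                }
            }
            .onAppear {
                Task {
                    guard viewModel.call == nil else { return }
                    viewModel.joinCall(callType: "default", callId: callId)
                }
            }
        }
    }
}
```

Compared with the earlier steps, the view model now owns the call, so the app no longer tracks its own callCreated flag.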
The result will be:
When you now run your app, you'll see a more polished video UI. It supports reactions, screensharing, active speaker detection, network quality indicators etc. The most commonly used UI components are:
- VideoRendererView: For rendering video and automatically requesting video tracks when needed. Most of the Video components are built on top of this.
- VideoCallParticipantView: The participant's video + some UI elements for network quality, reactions, speaking etc.
- ParticipantsGridLayout: A grid of participant video elements.
- CallControls: A set of buttons for controlling your call, such as changing audio and video states.
- IncomingCall: UI for displaying incoming and outgoing calls.
The full list of UI components is available in the docs.
Step 7 - Customizing the UI
You can customize the UI by:
- Building your own UI components (the most flexibility, build anything).
- Mixing and matching with Stream's UI Components (speeds up how quickly you can build common video UIs).
- Theming (basic customization of colors, fonts etc).
You can see an example of how to swap out the call controls for your own in the related UI Cookbook section.
Recap
Please do let us know if you ran into any issues while building a video calling app with Swift. Our team is also happy to review your UI designs and offer recommendations on how to achieve them with Stream.
To recap what we've learned:
- You set up a call: (let call = streamVideo.call(callType: "default", callId: "123")).
- The call type ("default" in the above case) controls which features are enabled and how permissions are set up.
- When you join a call, realtime communication is set up for audio & video calling: (call.join()).
- Published objects in call.state and call.state.participants make it easy to build your own UI.
- VideoRendererView is the low level component that renders video.
- We've used Stream's Video Calling API, which means calls run on a global edge network of video servers. By being closer to your users the latency and reliability of calls are better. The Swift SDK enables you to build in-app video calling, audio rooms and livestreaming in days.
We hope you've enjoyed this tutorial and please do feel free to reach out if you have any suggestions or questions.
Final Thoughts
In this video app tutorial we built a fully functioning iOS video calling app with our iOS SDK component library. We also showed how easy it is to customize the behavior and the style of the iOS video app components with minimal code changes.
Both the video SDK for iOS and the API have plenty more features available to support more advanced use-cases.