How Low-Latency Video Streaming Works

Building real-time video at scale isn’t simple, but with the right architecture, sub-second streaming is within reach.

Raymond F
Published November 5, 2025

Low-latency video streaming is now so deeply embedded in everyday culture that, for anyone accustomed to TikTok, Twitch, YouTube, or even a simple video call, the idea that video could be anything other than near-instant seems ridiculous.

Of course, all those platforms, and really the entire concept of video streaming and video chat, are only possible because of technical advances in video streaming: WebRTC, STUN servers, TURN servers, ICE candidates, codec optimizations, and adaptive bitrate streaming. These technologies work together to solve fundamental challenges: compressing video efficiently, connecting peers directly to each other, and adapting to fluctuating network conditions in real time.

So, how does it work? Here, we will guide you through the video streaming process - including the technology and techniques behind it - as well as how to integrate low-latency video streaming into your platform.

What is Low-Latency Video Streaming?

Video streaming latency is the end-to-end delay from capture through encoding, transmission, decoding, and display. Sometimes it's called "glass to glass" time (as in, the glass of the camera to the glass of the screen).

There is no exact definition of "low" latency, as it depends on the use case. If you are watching the Super Bowl, being a few seconds behind the live event might not bother you, but in a video chat, that delay would cause chaos. For conversational use cases like that, low latency means a few hundred milliseconds at most; for the strictest applications, it can mean staying under 50ms.

This is different from video-on-demand. If you are watching Netflix, there is a delay between when Netflix sends the data and when it plays on your screen, but there is no live moment to lag behind, so latency in this sense doesn't really apply; what matters there is startup time and smooth playback.

Common Use Cases for Low-Latency Video

Different applications demand different levels of latency. Here's what various real-world use cases require:

  • Video Chat & Conferencing: One-on-one and group video calls need 150-300ms round-trip latency to feel natural and conversational. Any higher and conversations start to feel stilted with people talking over each other.

  • Interactive Livestreams: Co-hosting, call-ins, and bringing viewers "on stage" require 200-500ms latency for participants to interact naturally. The audience watching can tolerate higher latency if they're not participating directly.

  • Live Shopping & Auctions: Real-time bidding and product demonstrations need 200-500ms latency to keep chat reactions and purchases synchronized with what's happening on screen. Delays here mean lost sales and confused customers.

  • Telehealth Consultations: Remote medical appointments require 200-300ms latency for natural doctor-patient interaction. Reliability matters more than absolute minimum latency for these critical conversations.

  • Customer Support: Remote troubleshooting and co-browsing sessions work well with 200-400ms latency. Support agents need to see customer actions quickly enough to provide real-time guidance.

  • Online Education: Virtual classrooms and tutoring sessions need 200-500ms latency for students and teachers to interact naturally. Higher latency makes discussions feel disconnected and reduces engagement.

  • Sports Watch Parties: Synchronized viewing experiences require 300-500ms latency, but more importantly, all viewers need to be synchronized with each other. Nothing ruins a watch party like someone cheering before others see the goal.

  • Cloud Gaming: Game streaming demands 60-100ms total latency from input to screen for responsive gameplay. Fighting games and shooters need the absolute minimum, while turn-based games can tolerate slightly more.

  • Security & Surveillance: Live monitoring with two-way audio requires near-real-time video feeds and sub-300ms audio latency for guards to issue verbal warnings or instructions.

The pattern is clear: anything conversational needs under 300ms, interactive experiences can tolerate up to 500ms, and specialized applications like music or gaming have even stricter requirements. The key is matching the technology to the use case. 
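
As a rough reference, those budgets can be expressed in code. The following TypeScript sketch is purely illustrative - the names and thresholds are lifted from the list above and are not part of any SDK:

typescript
// Approximate latency budgets in milliseconds, taken from the use cases above.
// These names and the helper are illustrative only.
const latencyBudgetMs: Record<string, number> = {
  videoChat: 300,
  interactiveLivestream: 500,
  liveShopping: 500,
  telehealth: 300,
  customerSupport: 400,
  onlineEducation: 500,
  watchParty: 500,
  cloudGaming: 100,
  surveillanceAudio: 300,
};

function meetsBudget(useCase: string, measuredMs: number): boolean {
  return measuredMs <= (latencyBudgetMs[useCase] ?? Infinity);
}

// A 250ms round trip is fine for chat, but far too slow for cloud gaming
console.log(meetsBudget("videoChat", 250));   // true
console.log(meetsBudget("cloudGaming", 250)); // false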

Why Traditional Streaming Protocols Are Slow

Video-on-demand services and many live streaming platforms rely on HTTP-based protocols that prioritize reliability and quality over speed.

The multi-second delays in HLS and DASH aren't bugs. They're architectural consequences of how these protocols work. Both protocols rely on segmented delivery. The server chops the live stream into small files, typically 2-10 seconds long. These segments are delivered via HTTP and listed in a playlist (HLS) or manifest (DASH). The player must download complete segments and buffer several ahead to ensure smooth playback.

HLS/DASH Segmented Delivery in Streaming

This approach creates a delay in three ways:

  • Segment creation time: The encoder must produce a complete segment before making it available

  • Download time: The player must fetch the entire segment before playback

  • Buffer depth: Players typically hold 2-3 segments in reserve to prevent stuttering

Large buffers prioritize smoothness over latency. To prevent stuttering from network variations, HLS and DASH players maintain substantial buffers. These multi-second buffers absorb network fluctuations, but at the cost of delay. The player is always showing video from seconds ago.
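
A quick back-of-the-envelope calculation shows how those seconds stack up. This TypeScript sketch is only an estimate under the assumptions above (roughly segment-sized download time, a few buffered segments), not a property of any particular player:

typescript
// Rough estimate of HLS/DASH latency: the player waits for complete segments
// and keeps several buffered ahead of the playhead.
function estimateSegmentedLatencySeconds(
  segmentDurationSec: number,
  bufferedSegments: number
): number {
  const segmentCreation = segmentDurationSec;           // encoder must finish the segment
  const download = segmentDurationSec;                  // worst case: fetching takes about one segment
  const buffer = bufferedSegments * segmentDurationSec; // segments held in reserve
  return segmentCreation + download + buffer;
}

// 6-second segments with a 3-segment buffer put the viewer ~30 seconds behind live
console.log(estimateSegmentedLatencySeconds(6, 3)); // 30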

What these protocols get in exchange is scalability. HTTP segments work beautifully with CDNs and standard web infrastructure. Caching these segments allows services to reach millions of viewers efficiently. The protocol prioritizes quality and reliability over speed.

The Mechanics of Low-Latency Video Streaming

Understanding how to achieve low latency requires examining where delay accumulates in the video pipeline, why different protocols produce vastly different latencies, and what makes scaling low-latency systems technically complex.

The Glass-to-Glass Latency Pipeline

Every millisecond of delay in video streaming comes from somewhere specific in the pipeline. From the moment light hits the camera sensor to when pixels appear on the viewer's screen, latency accumulates through several distinct stages:

  • Camera Capture & Processing: The sensor integrates light over the exposure time (roughly 33 milliseconds at 30 fps) and processes the image. Typical latency ranges from 30-50 milliseconds, though some devices take 150-200 milliseconds.

  • Encoding: Compresses the raw frame, typically buffering one or two frames for prediction. Standard H.264 encoders add about 67 milliseconds, while hardware low-latency encoders can reduce this to 5-30 milliseconds.

  • Packetization & Network Transmission: Splitting the stream into packets adds only a few milliseconds. Network propagation is the real variable, ranging from 5 milliseconds locally to 50-100 milliseconds transcontinentally, or 150+ milliseconds for intercontinental connections.

  • Jitter Buffering: Real-time protocols like WebRTC use aggressive buffers of just 10-50 milliseconds. Traditional streaming protocols buffer 2-10 seconds, which is where most of their latency comes from.

  • Decoding: Decompresses the video in 5-15 milliseconds per frame. Codecs using B-frames may buffer frames for reordering, adding one frame period of delay.

  • Display Rendering: Graphics processing and screen refresh synchronization add 8-16 milliseconds at 60Hz.

Glass-to-Glass Latency Pipeline

In a well-optimized system, these stages combine to produce 200-500 milliseconds of glass-to-glass latency. A benchmark of a conventional 30fps video link showed roughly 67 milliseconds from capture through encoding, 5 milliseconds for local network transit, and 150 milliseconds for decoding with buffering, totaling roughly 220 milliseconds end-to-end.
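
The same stage estimates can be summed into a simple budget. The figures in this TypeScript sketch are mid-range values taken from the list above; any real pipeline will differ:

typescript
// Hypothetical glass-to-glass budget using mid-range figures from above (milliseconds)
const pipelineBudgetMs = {
  captureAndProcessing: 40, // sensor exposure plus image processing
  encoding: 67,             // standard H.264 encoder
  packetization: 3,
  networkPropagation: 50,   // transcontinental path
  jitterBuffer: 30,         // aggressive real-time buffer
  decoding: 15,
  displayRendering: 16,     // 60Hz refresh synchronization
};

const totalMs = Object.values(pipelineBudgetMs).reduce((sum, ms) => sum + ms, 0);
console.log(`Estimated glass-to-glass latency: ${totalMs} ms`); // ~220 ms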

UDP vs TCP

The choice between UDP and TCP at the transport layer directly determines whether sub-second latency is achievable.

TCP guarantees that data arrives in order. If any packet is lost, TCP will not deliver subsequent data until the lost packet is retransmitted and received. In practice, this means a single lost packet can freeze the video stream until recovery completes. Later frames get stuck behind the missing packet, adding unpredictable delay.

The diagram below illustrates this head-of-line blocking problem.

TCP vs. UDP

When a data packet is lost in TCP, the protocol must wait for retransmission before delivering subsequent packets. The receiver sits idle while earlier packets are recovered, introducing latency spikes that make conversational video impossible.

UDP takes the opposite approach: it delivers whatever arrives, as soon as it arrives. UDP is connectionless and provides no delivery guarantees. Packets may arrive out of order or disappear entirely, but critically, later packets are never held up by earlier losses. The stream continues without waiting for retransmission. This immediate delivery is why WebRTC and other real-time protocols build on UDP rather than TCP. The show must go on.
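
To make the fire-and-forget behavior concrete, here is a minimal Node.js sketch using the built-in dgram module. The port, addresses, and packet format are made up for illustration, and this is nowhere near a real media pipeline - it only shows that a UDP receiver can keep going when packets are lost:

typescript
import * as dgram from "node:dgram";

// Receiver: hand whatever arrives to the decoder immediately; note gaps, never wait for them.
const receiver = dgram.createSocket("udp4");
let lastSeq = -1;

receiver.on("message", (msg) => {
  const seq = msg.readUInt32BE(0); // first 4 bytes: sequence number
  if (seq > lastSeq + 1) {
    console.log(`Packets ${lastSeq + 1}-${seq - 1} lost; continuing anyway`);
  }
  lastSeq = Math.max(lastSeq, seq);
  // ...pass the payload (msg.subarray(4)) straight to the decoder
});
receiver.bind(5004);

// Sender: no acknowledgements, no retransmissions.
const sender = dgram.createSocket("udp4");
function sendChunk(seq: number, payload: Buffer) {
  const packet = Buffer.alloc(4 + payload.length);
  packet.writeUInt32BE(seq, 0);
  payload.copy(packet, 4);
  sender.send(packet, 5004, "127.0.0.1");
}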

Scaling Low-Latency Systems

Achieving low latency for two people is straightforward. Maintaining that latency for hundreds or thousands of participants requires solving the peer-to-peer problem.

A video call with N participants using peer-to-peer requires each user to send their video to N-1 others. A 10-person call would require each participant to encode and send 9 separate video streams - 90 total video streams. Most residential internet connections can't handle that upload load.

Selective Forwarding Units

Modern conferencing systems solve this with SFUs, video routers that forward streams without decoding them. Each participant sends one stream to the SFU, which forwards it to all other participants. Forwarding adds only a few milliseconds of delay while dramatically reducing bandwidth requirements.

Large SFUs optimize further by sending only the most relevant streams to each client (such as active speakers) and using simulcast, where clients send multiple quality versions so the SFU can forward lower resolutions to participants on poor connections.
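
The back-of-the-envelope math makes the difference obvious. In this illustrative TypeScript sketch (the function names are just for demonstration), a full mesh costs each participant N-1 uploads, while an SFU needs one upload per participant, or a handful of simulcast layers:

typescript
// Full mesh: every participant uploads a stream to every other participant.
function meshUploadsPerParticipant(participants: number): number {
  return participants - 1;
}

function meshTotalStreams(participants: number): number {
  return participants * (participants - 1);
}

// SFU: each participant uploads once (or a few simulcast layers),
// and the SFU fans that stream out to everyone else.
function sfuUploadsPerParticipant(simulcastLayers = 1): number {
  return simulcastLayers; // independent of call size
}

console.log(meshUploadsPerParticipant(10)); // 9 uploads per person
console.log(meshTotalStreams(10));          // 90 streams in total
console.log(sfuUploadsPerParticipant(3));   // e.g. 720p/360p/180p layers per person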

TURN Relays

Approximately 20% of connections can't establish direct peer-to-peer links due to restrictive firewalls. TURN servers relay media between peers in these cases, adding 10-30 milliseconds of latency but ensuring connectivity. The real challenge is bandwidth consumption, as all relayed traffic flows through servers that the provider must host and pay for.

Providers deploy TURN server clusters in major regions with intelligent routing so users reach the nearest relay, minimizing added latency while distributing load.
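
In the browser's WebRTC API, TURN is simply another ICE server entry in the peer connection configuration; ICE tries direct and STUN-derived routes first and falls back to the relay only when it has to. The URLs and credentials below are placeholders, not real servers:

typescript
// Standard browser WebRTC API: declare STUN and TURN servers for ICE to use.
const peerConnection = new RTCPeerConnection({
  iceServers: [
    { urls: "stun:stun.example.com:3478" }, // placeholder STUN server
    {
      urls: "turn:turn.example.com:3478?transport=udp", // placeholder TURN relay
      username: "demo-user",                            // placeholder credentials
      credential: "demo-password",
    },
  ],
});

peerConnection.addEventListener("icecandidate", (event) => {
  if (event.candidate) {
    // Each gathered candidate (host, server reflexive, or relay) goes to the
    // remote peer over your signaling channel.
    console.log("ICE candidate:", event.candidate.candidate);
  }
});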

With these technologies handling the complexity of connectivity and scale, developers can focus on building features rather than infrastructure.

Building a Livestream with Stream

Understanding the mechanics of low-latency video, from UDP transport to SFU architecture, reveals why building production systems is complex. You need signaling servers, STUN and TURN infrastructure, codec negotiation, adaptive bitrate logic, and strategies for scaling beyond peer-to-peer.

Building all of this from scratch is possible, but for most applications, using an SDK that handles these complexities makes more sense. Stream's WebRTC-based livestreaming SDK abstracts this infrastructure while preserving WebRTC's real-time performance, so you can build and deploy low-latency livestreaming experiences without managing the underlying transport or signaling layers.

Here's how to build a production livestream that scales to thousands of viewers while maintaining sub-second latency for interactive participants.

Setting Up the Stream Client

Instead of managing peer connections and signaling servers, Stream handles everything through a unified client:

typescript
import { StreamVideoClient, User } from "@stream-io/video-client";

const user: User = {
  id: userId,
  name: "Oliver",
  image: "https://getstream.io/random_svg/?id=oliver&name=Oliver",
};

const client = new StreamVideoClient({
  apiKey,
  token,
  user,
});

const call = client.call("livestream", callId);

The client manages all WebRTC connections, ICE candidates, and signaling automatically. The call object represents a livestream session that multiple users can join.

Going Live as a Host

The host controls when the stream starts, using a "backstage" mode for preparation:

typescript
// Enable camera and microphone
call.camera.enable();
call.microphone.enable();

// Join backstage (not yet visible to viewers)
call.join({ create: true });

// When ready, go live
call.goLive();

// Optional: Enable HLS for massive scale
call.goLive({ start_hls: true });

Backstage mode lets hosts set up their stream before viewers can see it. This addresses a common issue with raw WebRTC, where streams begin immediately upon connection. We weren't complimentary about HLS above, but it is a great option when you need to stream on a massive scale and some latency is acceptable.

Managing Host State

Stream provides reactive state management through observables:

typescript
call.state.backstage$.subscribe((backstage) => {
  if (backstage) {
    goLiveButton.disabled = false;
    endLiveButton.disabled = true;
  } else {
    goLiveButton.disabled = true;
    endLiveButton.disabled = false;
  }
});

call.state.participantCount$.subscribe((count) => {
  countElement.innerText = (count || 0).toString();
});

These subscriptions tell your UI whether the stream is live and how many viewers are watching, while Stream handles state synchronization across clients.

Rendering Video Streams

Instead of manually handling WebRTC ontrack events and media streams, we can use helper methods:

typescript
const renderVideo = (
  call: Call,
  participant: StreamVideoParticipant,
  parentContainer: HTMLElement
) => {
  const videoEl = document.createElement("video");
  videoEl.id = `video-${participant.sessionId}`;
  videoEl.width = 333;
  videoEl.height = 250;
  parentContainer.appendChild(videoEl);

  // Stream handles all WebRTC complexity
  const unbind = call.bindVideoElement(
    videoEl,
    participant.sessionId,
    "videoTrack"
  );
};

The bindVideoElement method manages the entire lifecycle: attaching streams, handling track changes, and cleaning up when participants leave.

Viewer Experience

Viewers have a simpler flow. They wait for the stream to go live, then automatically join:

typescript
// Viewers disable their camera/mic
call.camera.disable();
call.microphone.disable();

call.state.backstage$.subscribe((backstage) => {
  if (!backstage && call.state.callingState === CallingState.IDLE) {
    // Auto-join when stream goes live
    call.join();
  }
});

// Render all remote participants (the hosts)
call.state.remoteParticipants$.subscribe((participants) => {
  participants.forEach((participant) => {
    renderParticipant(call, participant, parentContainer);
  });
});

The SDK automatically manages viewer connections, handles network changes, and reconnects without manual intervention.

Hybrid Streaming: WebRTC + HLS

For massive scale, we can use both WebRTC (low latency) and HLS (high scale) simultaneously:

typescript
// Host enables HLS when going live
call.goLive({ start_hls: true });

// Viewers can watch via HLS
call.state.egress$.subscribe((egress) => {
  if (egress?.hls?.playlist_url) {
    const videoEl = document.createElement("video");
    const hls = new Hls();
    hls.loadSource(egress.hls.playlist_url);
    hls.attachMedia(videoEl);
    hlsContainer.appendChild(videoEl);
  }
});

This hybrid approach delivers:

  • WebRTC for interactive participants (hosts, co-hosts, featured viewers) with <500ms latency

  • HLS for thousands of passive viewers with 5-10 second latency

RTMP Ingestion

Professional streamers using OBS or similar software can stream via RTMP:

typescript
call.state.ingress$.subscribe((ingress) => {
  if (ingress?.rtmp.address) {
    const rtmpURL = ingress.rtmp.address;
    const streamKey = token;
    console.log("RTMP url:", rtmpURL, "Stream key:", streamKey);
  }
});

RTMP input is transcoded to WebRTC, maintaining low latency for viewers while supporting professional broadcasting tools.

The Stream approach provides:

  • 50 lines for a complete livestream

  • Automatic signaling and connection management

  • Built-in backstage mode

  • Hybrid WebRTC/HLS delivery

  • Scales to 100,000+ viewers

  • RTMP ingestion support

  • Automatic reconnection and error recovery

The tradeoff is control versus simplicity. Raw WebRTC gives complete control over every aspect of the connection. Stream abstracts the complexity while maintaining WebRTC's core benefit: sub-second latency for real-time interaction. For most production applications, the SDK approach dramatically reduces development time while providing features that would take months to build from scratch.

Low Latency is High Value

Low-latency video streaming has evolved from a technical challenge to a solved problem with well-established patterns. The key principles remain constant across all implementations: minimize network hops, eliminate unnecessary buffering, and adapt continuously to changing conditions.

Understanding where latency accumulates in the pipeline, why UDP enables sub-second delivery while TCP cannot, and how SFUs and TURN relays solve scaling challenges provides the foundation for building any real-time video application.

Whether using raw WebRTC for complete control or SDKs like Stream that abstract the complexity, the underlying mechanics are the same. The result is video that arrives fast enough to enable genuine real-time experiences, from video chats to interactive livestreams reaching millions.
