WHIP Protocol: What Is It and How Does It Work?

Most modern livestreaming applications need real-time, interactive broadcasting without the latency introduced by traditional streaming pipelines.

WHIP allows you to ingest live content into your platform with minimal setup, near-instant delivery, and TLS-protected signaling.

What Is the WHIP Protocol?

The WebRTC-HTTP Ingestion Protocol (WHIP) is an open, IETF-defined protocol that standardizes the signaling needed to use WebRTC as a media ingest method in livestreaming services. It provides a simple, HTTP-based handshake between a media publisher and the server's ingestion endpoint, so broadcasting software can push a live WebRTC stream to a server after a single HTTP request-response exchange.

WHIP enables software and hardware encoders to support WebRTC streaming without the need for custom signaling implementations. It's free to implement and offers a modern alternative to the Real-Time Messaging Protocol (RTMP) for broadcast ingestion.

How WHIP Works

WHIP uses a one-time HTTP exchange to perform the WebRTC signaling between the broadcaster and the ingest server. After the handshake, the media flows over WebRTC transport directly to the server.

If you're building a livestreaming app, this is what the flow looks like at a high level:

1. The encoder, whether OBS, a browser app, or a headless service, receives an ingest URL and a token from your platform.
2. It captures the camera and microphone, prepares a short session description (including codecs, tracks, and network details), and sends a single HTTP POST to start the session.
3. The server responds with its own session description and a unique session URL for controlling the broadcast. ICE then handles connectivity and DTLS-SRTP handles encryption, allowing the client to send the livestream data to the server.
4. To end the livestream, the client sends an HTTP DELETE request to that session URL.

In more technical terms, the protocol works as follows:

Initial SDP Exchange

The WHIP client (a WebRTC media encoder or producer) sends an initial HTTP POST request to the WHIP endpoint URL of the media server. The request body contains the Session Description Protocol (SDP) offer, sent with a Content-Type: application/sdp header, which describes the media formats, codecs, and network information of the broadcaster.
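As a rough illustration, the initial request could be assembled like this with Python's standard library. The endpoint URL, the token, and the truncated SDP offer are placeholders invented for this sketch; a real offer comes from the WebRTC stack, and the request is only built here, not sent.

```python
import urllib.request

# Hypothetical values for illustration; your platform issues the real ones.
WHIP_ENDPOINT = "https://ingest.example.com/whip/live"
BEARER_TOKEN = "example-token"


def build_whip_offer_request(endpoint: str, token: str, sdp_offer: str) -> urllib.request.Request:
    """Build (but do not send) the initial WHIP POST carrying the SDP offer."""
    return urllib.request.Request(
        url=endpoint,
        data=sdp_offer.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/sdp",
            "Authorization": f"Bearer {token}",
        },
    )


# Truncated placeholder offer, not a usable WebRTC session description.
offer = "v=0\r\no=- 0 0 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\n"
request = build_whip_offer_request(WHIP_ENDPOINT, BEARER_TOKEN, offer)
```

Passing the request to urllib.request.urlopen would perform the actual exchange against a live endpoint.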

The WHIP endpoint responds with a "201 Created" message containing the SDP answer in the body and, when the server supports Trickle ICE or ICE restarts, an entity tag (ETag) header. This tag identifies the current ICE session and is sent back in an If-Match header on subsequent requests that update it, such as PATCH requests adding more Interactive Connectivity Establishment (ICE) candidates.

The SDP answer contains the media server's ICE and codec parameters, completing the WebRTC offer/answer negotiation in a single HTTP round trip. The response also carries a Content-Type: application/sdp header and a Location header pointing to the WHIP session URL for the ingest session.
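A client has to pull three things out of that response: the session URL, the entity tag if one is present, and the SDP answer. The helper below is a minimal sketch; the function name, the sample header values, and the example URLs are assumptions made for illustration.

```python
from urllib.parse import urljoin


def parse_whip_created(endpoint: str, status: int, headers: dict, body: str) -> dict:
    """Extract the session URL, ETag, and SDP answer from a 201 Created response."""
    if status != 201:
        raise ValueError(f"expected 201 Created, got {status}")
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "location" not in lowered:
        raise ValueError("response is missing the Location header")
    return {
        # Location may be a relative reference, so resolve it against the endpoint.
        "session_url": urljoin(endpoint, lowered["location"]),
        "etag": lowered.get("etag"),  # None when the server sends no ETag
        "sdp_answer": body,
    }


# Hypothetical response values, not output from a real server.
session = parse_whip_created(
    "https://ingest.example.com/whip/live",
    201,
    {
        "Content-Type": "application/sdp",
        "Location": "/whip/session/abc123",
        "ETag": '"xyz"',
    },
    "v=0\r\n",
)
```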

ICE Connection

After the SDP negotiation, the WHIP client and the media server establish a direct connection using ICE for NAT traversal.

ICE gathers candidates, which are IP address and port pairs, from both the client and the server. The ICE framework then runs STUN connectivity checks across candidate pairs, falling back to a TURN relay when no direct path works, to find the best route for the media stream through NATs and firewalls.
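To make the idea of a candidate concrete, here is a small sketch that parses SDP candidate lines into address, port, and type fields. The sample lines use documentation-only IP addresses and are invented for this example. Sorting by priority is a simplification: a real ICE agent ranks and tests candidate pairs rather than single candidates.

```python
from dataclasses import dataclass


@dataclass
class IceCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    type: str  # host, srflx, prflx, or relay


def parse_candidate(line: str) -> IceCandidate:
    """Parse one 'a=candidate:...' (or 'candidate:...') SDP line."""
    text = line.strip()
    if text.startswith("a="):
        text = text[2:]
    if not text.startswith("candidate:"):
        raise ValueError(f"not a candidate line: {line!r}")
    fields = text[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError(f"malformed candidate line: {line!r}")
    return IceCandidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        type=fields[7],
    )


# Invented sample candidates: local host, server-reflexive (STUN), relayed (TURN).
sample_lines = [
    "a=candidate:3 1 udp 16777215 198.51.100.7 3478 typ relay raddr 203.0.113.5 rport 61000",
    "a=candidate:1 1 udp 2130706431 192.0.2.10 50000 typ host",
    "a=candidate:2 1 udp 1694498815 203.0.113.5 61000 typ srflx raddr 192.0.2.10 rport 50000",
]
candidates = sorted(
    (parse_candidate(line) for line in sample_lines),
    key=lambda candidate: candidate.priority,
    reverse=True,
)
```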

WHIP can also use Trickle ICE, whereby the initial SDP offer may be sent before ICE gathering has finished. The client then shares candidates incrementally via HTTP PATCH requests to the WHIP session URL, which shortens connection setup time. These requests must have a Content-Type: application/trickle-ice-sdpfrag header and contain an SDP fragment with the new candidates.
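A Trickle ICE update could be assembled along these lines. The session URL, ICE credentials, token, and candidate are placeholders, and the fragment layout is simplified. Sending the entity tag in an If-Match header reflects my reading of the specification, so treat it as an assumption and check it against your server's behavior.

```python
import urllib.request


def build_sdp_fragment(ufrag: str, pwd: str, media_line: str, mid: str, candidates: list) -> str:
    """Assemble a simplified SDP fragment carrying newly gathered candidates."""
    lines = [
        f"a=ice-ufrag:{ufrag}",
        f"a=ice-pwd:{pwd}",
        media_line,
        f"a=mid:{mid}",
    ]
    lines += [f"a={candidate}" for candidate in candidates]
    return "\r\n".join(lines) + "\r\n"


def build_trickle_patch(session_url: str, token: str, etag: str, fragment: str) -> urllib.request.Request:
    """Build (but do not send) the PATCH request that trickles candidates."""
    return urllib.request.Request(
        url=session_url,
        data=fragment.encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/trickle-ice-sdpfrag",
            "If-Match": etag,  # assumption: ties the update to the current ICE session
            "Authorization": f"Bearer {token}",
        },
    )


# All values below are hypothetical.
fragment = build_sdp_fragment(
    ufrag="EsAw",
    pwd="examplepassword0123456789",
    media_line="m=audio 9 UDP/TLS/RTP/SAVPF 111",
    mid="0",
    candidates=["candidate:1 1 udp 2130706431 192.0.2.10 50000 typ host"],
)
patch = build_trickle_patch(
    "https://ingest.example.com/whip/session/abc123", "example-token", '"xyz"', fragment
)
```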

Media Ingestion

Once ICE finds a working network path, the WHIP client and the server run a Datagram Transport Layer Security (DTLS) handshake to agree on keys, and the client begins sending media encrypted with the Secure Real-Time Transport Protocol (SRTP).
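The parameters for that DTLS handshake travel in the SDP itself: an a=fingerprint attribute carries the hash of each side's certificate, and a=setup indicates which side starts the handshake. The sketch below reads both from an SDP string. The sample SDP and its fingerprint are fabricated placeholders, and the function name is hypothetical.

```python
def dtls_parameters(sdp: str) -> dict:
    """Return the first DTLS fingerprint and setup role found in an SDP string."""
    params = {"algorithm": None, "fingerprint": None, "setup": None}
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("a=fingerprint:") and params["fingerprint"] is None:
            algorithm, _, digest = line[len("a=fingerprint:"):].partition(" ")
            params["algorithm"] = algorithm.lower()
            params["fingerprint"] = digest.strip().upper()
        elif line.startswith("a=setup:") and params["setup"] is None:
            params["setup"] = line[len("a=setup:"):]
    return params


# Placeholder fingerprint: 32 identical bytes, not a real certificate hash.
placeholder_digest = ":".join(["ab"] * 32)
sample_sdp = (
    "v=0\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    f"a=fingerprint:sha-256 {placeholder_digest}\r\n"
    "a=setup:actpass\r\n"
)
dtls = dtls_parameters(sample_sdp)
```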

The media flows one-way from the WHIP client to the server, similar to RTMP. The WebRTC media server receives the livestream and routes it to the viewers.
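In SDP terms, one-way publishing shows up as a sendonly direction on each media section of the client's offer. The following sketch checks for that. It is a simplification that only looks at media-level direction attributes and ignores session-level ones, and the sample offer is an invented fragment.

```python
DIRECTION_ATTRIBUTES = ("a=sendonly", "a=recvonly", "a=sendrecv", "a=inactive")


def media_directions(sdp: str) -> list:
    """Return (media type, direction) for each m= section in the SDP."""
    sections = []
    for raw_line in sdp.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # SDP treats a section without a direction attribute as sendrecv.
            sections.append([line[2:].split()[0], "sendrecv"])
        elif sections and line in DIRECTION_ATTRIBUTES:
            sections[-1][1] = line[2:]
    return [(media, direction) for media, direction in sections]


def is_publish_only(sdp: str) -> bool:
    """True when every media section in the offer is marked sendonly."""
    directions = media_directions(sdp)
    return bool(directions) and all(d == "sendonly" for _, d in directions)


# Invented offer fragment with one audio and one video section.
sample_offer = (
    "v=0\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
    "a=sendonly\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=sendonly\r\n"
)
publish_only = is_publish_only(sample_offer)
```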

WHIP only enables ingest and does not play a part in how users receive the stream.

Closing the Session

When the broadcast is completed or needs to be terminated, the WHIP client sends an HTTP DELETE request to the WHIP session URL. The server responds with "200 OK", releases the resources held for the session, and the ingest session ends.
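Teardown is the simplest request of the exchange. A minimal sketch, assuming the session URL was taken from the Location header of the initial response and that the same bearer token is still valid (both values below are placeholders):

```python
import urllib.request


def build_whip_delete(session_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the DELETE request that ends a WHIP session."""
    return urllib.request.Request(
        url=session_url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


# Hypothetical session URL and token.
delete_request = build_whip_delete(
    "https://ingest.example.com/whip/session/abc123", "example-token"
)
```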

Benefits of WHIP

WHIP offers several benefits for livestreaming apps, such as:

Ultra-Low Latency

WebRTC can deliver livestreams with sub-second (often hundreds of milliseconds) latency, far lower than traditional protocols. By using WHIP, broadcasters can achieve real-time streaming, enabling interactive experiences where delays are unacceptable, such as in gaming and live Q&A sessions.

High-Quality, Adaptive Streaming

Because WHIP carries standard WebRTC media, you can use modern codecs and layered encoding on the ingest side to maintain high-quality video and audio. Simulcast is negotiated in the SDP exchange, while Scalable Video Coding (SVC) is configured in the codec or at the application level.

A broadcaster can send multiple quality layers and let the media server or client adapt to network conditions. WebRTC's built-in bandwidth estimation and congestion control dynamically adjust quality to avoid stalls.
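For simulcast, the quality layers appear in the SDP offer as a=rid lines plus one a=simulcast line. The helper below generates those attributes. The layer names q, h, and f (quarter, half, full) are a common convention assumed here, not something WHIP requires, and the function name is hypothetical.

```python
def simulcast_attributes(rids: list) -> list:
    """Build the SDP attribute lines that offer several send-side simulcast layers."""
    if not rids:
        raise ValueError("at least one layer identifier is required")
    # One a=rid line per layer, then a single a=simulcast line listing them all.
    lines = [f"a=rid:{rid} send" for rid in rids]
    lines.append("a=simulcast:send " + ";".join(rids))
    return lines


# Assumed layer names: q (quarter), h (half), f (full resolution).
attributes = simulcast_attributes(["q", "h", "f"])
```

In practice the WebRTC stack writes these lines when the sender is configured with multiple encodings; building them by hand is only useful for understanding or inspecting an offer.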

TLS Security by Default

WebRTC encrypts all media streams in transit using DTLS-SRTP. WHIP signaling runs over HTTPS and supports bearer token authentication, so only authorized broadcasters can publish.

Open Standard

WHIP is an open protocol freely available for developers building broadcasting applications. Additionally, any WHIP-compatible encoder can communicate with any WHIP-compatible server, eliminating vendor lock-in.

Easy to Implement

WHIP removes most of the complexity of WebRTC signaling by reducing it to a single SDP offer/answer over HTTP, which makes it practical to implement even in embedded devices. Developers building cameras or encoders can use any WHIP client library to achieve full WebRTC ingest capabilities without writing a custom signaling system.

Challenges and Limitations

While WHIP has many strengths, it also has two notable limitations:

Lack of Renegotiation

WHIP trades off some features for simplicity, such as mid-stream adjustments.

For example, it does not support renegotiating the session after initial setup. Once the stream is published, you cannot add new tracks or change codecs via WHIP without starting a new session. This can be a limitation compared to custom WebRTC signaling channels, which can renegotiate on the fly.

Early Adoption and Ecosystem

WHIP is still in early adoption, and many major platforms do not yet support it as an ingestion option. This means broadcasters often still rely on RTMP while WHIP gains traction. Developers may also find fewer libraries, examples, and community resources than for older protocols.

WHIP Protocol Use Cases

WHIP has several use cases, including:

RTMP Alternative

RTMP was originally designed for Flash and shows its age with limited codec support and no built-in encryption (secure ingest relies on the RTMPS variant, which wraps RTMP in TLS). Many streaming platforms view WHIP as a more modern replacement with greater longevity, offering additional advantages like ultra-low latency and a reduced need for ingest-side transcoding when viewers also receive the stream over WebRTC.

Live Production Feeds

WHIP is suitable for remote contribution feeds, such as sports and news. Software or hardware encoders in the field can use WHIP for live ingestion to send camera feeds over the internet with minimal latency. This lets studio producers switch feeds or interact with on-site crew without the usual satellite uplink delay.

In-Browser Livestreaming

Browsers natively support WebRTC, so you can build apps that publish directly to a WHIP endpoint using standard APIs and a simple HTTP POST for signaling. It works without plugins or custom signaling services.

Unified Streaming Architecture

WHIP is often used alongside WebRTC-HTTP Egress Protocol (WHEP) to enable a real-time, end-to-end WebRTC streaming pipeline. In this case, a livestream is ingested via WHIP and delivered to viewers via WebRTC, rather than HLS.

This architecture is particularly useful for large-scale interactive webinars or online classrooms where you want minimal delay while serving large, global audiences.

Some platforms, such as Dolby.io (which acquired Millicast), support this model, effectively creating real-time CDNs.

OBS and Broadcast Software Integration

OBS Studio has built-in WHIP support that enables creators to publish sub-second, secure streams to compatible platforms using a simple server URL and bearer token.

If your users prefer the command line, FFmpeg or GStreamer can publish to the same endpoint with a similar URL-and-token setup.

Frequently Asked Questions

What Is a WHIP Stream?

A WHIP stream is a live audio or video contribution feed published to a platform using the WHIP protocol. It is the upstream part of the workflow, not the stream that viewers play back.

What Does WHIP Stand For?

It stands for the WebRTC-HTTP Ingestion Protocol, an IETF standard that defines a simple HTTP handshake. The client sends a POST request carrying an SDP offer and receives an SDP answer in the response, so encoders can publish live audio/video over WebRTC without custom signaling.

What Is WHIP and WHEP?

WHIP is for ingest, while WHEP handles egress with similarly minimal signaling, enabling player clients to receive the stream over WebRTC.

What Is WHIP in Software?

In software, WHIP is a protocol that defines the roles of a client and a server. The WHIP client is a feature within broadcasting software like OBS Studio, a hardware encoder, or a custom web application that sends a media stream. A WHIP endpoint is a specific URL on a media server designed to accept that stream.

What Is the WebRTC Protocol Used For?

WebRTC is used for real-time video, audio, and data delivery with sub-second latency, in applications such as video calls, interactive livestreams, and remote device control. It's built into modern browsers, adapts to changing network conditions, and traverses NATs with ICE.