RTMP Streaming without Flash: Why the Real-Time Messaging Protocol Still Matters

...

The Real-Time Messaging Protocol (RTMP) laid the groundwork for modern multimedia streaming. Here’s how it works and why it still matters.

When Macromedia introduced the Real-Time Messaging Protocol in the early 2000s as the streaming backbone of Flash, it established what would become the near-universal standard for multimedia streaming over the internet for the better part of two decades. And although newer protocols like HLS and MPEG-DASH now handle the content distribution stages of the streaming pipeline, RTMP still plays an important backend role (more on what that looks like below).

These newer protocols gained popularity with the decline of Flash support and the rise of streaming platforms outside the traditional web browser, including mobile and desktop native apps as well as smart TVs and some IoT devices. Today’s developers may not need to know every detail of how to build a traditional RTMP stream from initial capture to delivery — but a high-level understanding of how the RTMP framework functions is critical to building and maintaining any modern app or platform designed to deliver synchronized audio, video, and data to end users. All of these platforms rely on the foundational streaming architecture and concepts that RTMP standardized, so it’s worth reviewing those concepts before you start to build a live streaming platform.

What is the Real-Time Messaging Protocol (RTMP)?

The Real-Time Messaging Protocol (RTMP) is a standardized set of specifications developed to transmit streaming audio, video, and data from a source to a user’s playback device over the internet, with synchronized timing and low latency.

With a delay time from input to delivery of about 3-5 seconds, RTMP is reasonably fast even by today’s standards. Those speeds may not be sufficient for uses like two-way video calling, which must be perceived as instantaneous, but they’re satisfactory to deliver hosted content and even for most one-way live streams. Factors like network connectivity, video resolution, added audio layers, speed of the transcoding server, and the playback device’s download speed all affect the latency of an RTMP stream.

Regardless of the total delay time, RTMP should deliver a smooth playback experience with parallel audio and video output synced to the same timing information. Developers will sometimes assign different priority levels to different message classes so that in the event of a poor connection the most important pieces of the stream still come through (so audio will still play if there’s a problem transmitting video, for example).
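
One way to picture this prioritization is as a send queue that drains in priority order when bandwidth is tight. The following is only a minimal sketch: the priority values, the `drain` helper, and the per-tick byte budget are hypothetical illustrations, not part of the RTMP specification.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical priorities for illustration: lower number = sent first.
PRIORITY = {"control": 0, "audio": 1, "video": 2}

@dataclass(order=True)
class Message:
    priority: int
    seq: int                          # tie-breaker keeps arrival order within a class
    kind: str = field(compare=False)
    size: int = field(compare=False)

def drain(messages, budget_bytes):
    """Send messages in priority order until the next one no longer fits the budget."""
    heap = [Message(PRIORITY[kind], seq, kind, size)
            for seq, (kind, size) in enumerate(messages)]
    heapq.heapify(heap)
    sent = []
    while heap and heap[0].size <= budget_bytes:
        msg = heapq.heappop(heap)
        budget_bytes -= msg.size
        sent.append(msg.kind)
    return sent

# (kind, size in bytes) pairs waiting to be sent during one tick.
queued = [("video", 900), ("audio", 100), ("video", 900), ("audio", 100)]
print(drain(queued, budget_bytes=300))  # ['audio', 'audio']
```

With only 300 bytes available, both audio messages go out and the video messages wait, which mirrors the "audio keeps playing" behavior described above.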

History of RTMP & Flash Player

Most people still associate RTMP with Flash Player, which reached its end of life (EOL) in 2020 — but the scope of RTMP use cases and implementations actually goes far beyond Flash and the protocol itself is still very much alive. A quick review of the history of RTMP and Flash clarifies how the two are related but not the same.

Flash Player debuted in 1996, and Macromedia later developed RTMP to stream audio, video, and data between Flash Player and a media server. Adobe acquired Macromedia in 2005, keeping both the renamed Adobe Flash Player and the Real-Time Messaging Protocol proprietary for the time being. But in 2012, Adobe released RTMP to the public as an open specification, allowing anyone to build products and technology to deliver video, audio, and data. With the protocol open, many new solutions could be developed using RTMP for all or part of the transmission process, without relying on Flash. This is one reason why RTMP can still be used even after the Flash EOL.

Adobe announced the Flash EOL in 2017, giving content providers three years to migrate to newer streaming protocols like HLS and MPEG-DASH and to open web technologies like HTML5, WebGL, and WebAssembly, though as recently as the summer of 2020, major webcast services like Fox Sports Go still ran on Flash. These newer open standards are more secure, and they’re also fully web-based, so they don’t require users to download and install a plugin. Many modern streams still use RTMP as part of the transmission pipeline, but not for the content’s entire journey from source to playback as it was used with Flash. We’ll take a closer look at the architecture of these modern pipelines in the sections below.

How Does the Real-Time Messaging Protocol Work?

The basic idea behind RTMP is to establish and maintain a persistent network connection between the playback device and the source, with a few extra steps in between. Connections follow the Transmission Control Protocol (TCP), which governs how endpoints establish contact and exchange messages over the internet.

Establishing a Secure Connection

To initiate an RTMP connection, two endpoints first conduct a handshake before any data is exchanged. On its own, the RTMP handshake doesn’t provide the same type of security as modern certificates and encryption, so additional protection is recommended. Secure variations on RTMP include RTMPS, which adds transport layer security (TLS, formerly SSL), and RTMPE, Adobe’s proprietary encryption method.

The open RTMP specification doesn’t require developers to use Adobe’s security measures — or any in particular — allowing the freedom to secure connections as desired and also adapt RTMP to work with evolving security practices in the future.
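
To make the handshake more concrete, here is a minimal sketch of the bytes a client prepares, following the layout in Adobe’s published specification (a one-byte version packet, C0, followed by 1536-byte packets). The function names are hypothetical, no network I/O is performed, and nothing here adds security; RTMPS would wrap the same exchange in TLS.

```python
import os
import struct

RTMP_VERSION = 3        # version byte carried in C0
HANDSHAKE_SIZE = 1536   # length of C1, S1, C2, and S2 in bytes

def build_c0_c1(timestamp_ms=0):
    """Client's opening bytes: C0 (1 byte) followed by C1 (1536 bytes)."""
    c0 = bytes([RTMP_VERSION])
    # C1 = 4-byte timestamp + 4 zero bytes + 1528 random bytes
    c1 = struct.pack(">II", timestamp_ms & 0xFFFFFFFF, 0) + os.urandom(HANDSHAKE_SIZE - 8)
    return c0 + c1

def build_c2(s1, read_time_ms=0):
    """C2 echoes S1: its timestamp, the time S1 was read, then S1's random bytes."""
    if len(s1) != HANDSHAKE_SIZE:
        raise ValueError("S1 must be exactly 1536 bytes")
    return s1[:4] + struct.pack(">I", read_time_ms & 0xFFFFFFFF) + s1[8:]

opening = build_c0_c1()
print(len(opening))  # 1537
```

A real client would send `opening` over a TCP socket, read S0, S1, and S2 from the server, and reply with the result of `build_c2` before exchanging any media.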

Multiplexing & Packetization

Once the RTMP handshake has been completed and a persistent connection has been established, the streaming content can be transmitted. RTMP’s standards for multiplexing and packetizing audio and video data are critical to delivering a smooth media streaming experience. The terms may sound complicated, but the concepts behind them are fairly straightforward.

Multiplexing merges separate audio and video data (and interactive content data like users’ chat messages) into a single interleaved stream so that everything can be transmitted together over one connection. If you’ve ever worked with video or audio editing software, you can think of this process as analogous to (though not the same as) mixing down and exporting a multi-track project to one media file. In theory, multiplexing allows any number of audio and video data streams to be combined, though in practice, there are usually just one or two of each. Upon receipt, interleaved data must be disassembled, or demultiplexed, in order to play back the original audio and video data.
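
The idea can be sketched in a few lines. This example assumes each message is a hypothetical `(timestamp_ms, kind, payload)` tuple and that each track is already sorted by time; it illustrates interleaving in general rather than RTMP’s actual wire format.

```python
import heapq

# Hypothetical (timestamp_ms, kind, payload) tuples standing in for encoded media.
audio = [(0, "audio", b"a0"), (23, "audio", b"a1"), (46, "audio", b"a2")]
video = [(0, "video", b"v0"), (33, "video", b"v1")]
chat = [(40, "data", b"hello")]

def mux(*tracks):
    """Interleave already time-sorted tracks into one stream ordered by timestamp."""
    return list(heapq.merge(*tracks, key=lambda msg: msg[0]))

def demux(stream):
    """Split the interleaved stream back into one list per message kind."""
    tracks = {}
    for msg in stream:
        tracks.setdefault(msg[1], []).append(msg)
    return tracks

stream = mux(audio, video, chat)
print([(ts, kind) for ts, kind, _ in stream])
print(demux(stream)["audio"] == audio)  # True
```

Because every message keeps its timestamp, the receiver can demultiplex the stream and still play audio and video in sync.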

As in most network protocols, packetization breaks up large pieces of data into smaller chunks, or packets, which can be more easily transmitted. When they reach their destination, packets are reassembled into their previous larger form. Compression and decompression also play a key role in making sure audio and video data (which in their raw forms create huge, unwieldy files) can travel smoothly over the internet. You can view the preferred codecs to optimize both quality and bandwidth in the RTMP Specifications at a Glance section below.
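
Here is a minimal sketch of that split-and-reassemble cycle. It uses 128 bytes, RTMP’s default maximum chunk size, but deliberately leaves out the chunk headers that real RTMP chunks carry; the helper names are hypothetical.

```python
DEFAULT_CHUNK_SIZE = 128  # RTMP's default maximum chunk payload size, in bytes

def packetize(message, chunk_size=DEFAULT_CHUNK_SIZE):
    """Split one message into consecutive chunks of at most chunk_size bytes."""
    return [message[i:i + chunk_size] for i in range(0, len(message), chunk_size)]

def reassemble(chunks):
    """Join chunks back into the original message, in order."""
    return b"".join(chunks)

frame = bytes(300)                # stand-in for one encoded video frame
chunks = packetize(frame)
print([len(c) for c in chunks])   # [128, 128, 44]
print(reassemble(chunks) == frame)  # True
```

Small chunks let the sender interleave pieces of a large video frame with audio chunks, so one big message doesn’t hold up everything behind it.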

Steps in an RTMP Streaming Pipeline

Here’s how the above processes work together to deliver a multimedia stream from its source, or camera input, to an end user’s playback device.

1. Camera Input

A camera does the work of capturing light and sound and converting these analog inputs to a raw (uncompressed) digital format. With audio, for example, an audible waveform is captured by a microphone, which puts out an electrical current analogous to the waveform. An analog-to-digital converter within the capture device samples that electrical signal thousands of times per second and records each sample as a digital ordered pair of time and amplitude. As noted above, this raw digital data creates very large files and must then be compressed (encoded) before streaming.
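
A quick calculation shows just how large raw data gets. The capture settings below (1080p video at 30 frames per second with 24 bits per pixel, plus 16-bit stereo audio sampled at 48 kHz) are assumptions chosen for illustration, not values from the RTMP specification.

```python
def raw_video_bps(width, height, bits_per_pixel, fps):
    """Bits per second of uncompressed video."""
    return width * height * bits_per_pixel * fps

def raw_audio_bps(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

video_bps = raw_video_bps(1920, 1080, 24, 30)   # assumed 1080p30, 24-bit color
audio_bps = raw_audio_bps(48_000, 16, 2)        # assumed 48 kHz, 16-bit stereo

print(f"raw video: {video_bps / 1e6:.0f} Mbit/s")
print(f"raw audio: {audio_bps / 1e6:.3f} Mbit/s")
print(f"one minute: {(video_bps + audio_bps) / 8 * 60 / 1e9:.1f} GB")
```

Under these assumptions the raw video alone runs at roughly 1.5 Gbit/s, far more than a typical upload connection can carry, which is why encoding comes next.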

(For hosted media that isn’t streaming live but is still delivered over the internet for near-real-time playback on a user’s device, the equivalent step is to upload a media file. Unlike raw data from a live streaming camera, this file will already be encoded. Depending on the hosting and distribution platform, it may still be transcoded into different formats and resolutions, following essentially the same steps outlined here, or it may skip a few of these steps.)

2. Encoding

A codec, short for coder-decoder, compresses raw audio and video data into a smaller, more easily handled file size, ideally without any perceptible loss in quality. In a live stream, encoding is usually handled by the capture device, whereas hosted media is usually encoded upon export from video editing software. H.264 is today’s most popular format for video encoding, in part because of its optimized balance between quality and compression.

3. Distribution (Upload)

Encoded media must now be distributed to media servers, and this is where RTMP comes in. RTMP establishes a persistent connection between the capture and encoding device and the server, allowing the data to be uploaded quickly.
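
In practice, an encoder is usually pointed at an ingest URL supplied by the streaming provider. The sketch below parses such a URL into the pieces an RTMP client needs. The host, app name, and stream key are made up; 1935 is RTMP’s standard TCP port, while 443 for RTMPS is a common default that is assumed here, and the single-segment app name is a simplification.

```python
from urllib.parse import urlsplit

# Common default ports; individual providers may use different ones.
DEFAULT_PORTS = {"rtmp": 1935, "rtmps": 443}

def parse_ingest_url(url):
    """Split an RTMP ingest URL into host, port, app name, and stream key."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not an RTMP URL: {url}")
    # Assumes the first path segment is the app and the rest is the stream key.
    app, _, stream_key = parts.path.lstrip("/").partition("/")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        "app": app,
        "stream_key": stream_key,
    }

# Hypothetical ingest endpoint for illustration only.
target = parse_ingest_url("rtmp://ingest.example.com/live/my-stream-key")
print(target["host"], target["port"], target["app"])
```

The encoder opens a TCP connection to that host and port, completes the handshake, and then publishes the stream under the given app and key.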

4. Media Server (Transcoding & Transmuxing)

Users expect a reasonable media playback experience no matter what type of device they’re using, what operating system that device runs, or whether their internet connection is stellar or just okay.

The media server does important transcoding work behind the scenes to make this smooth cross-platform experience possible. It re-encodes (transcodes) the original media into many different versions of varying resolution, quality, and bitrate. In a related process known as transmuxing, the server repackages those versions, without re-encoding them, into the formats required by delivery protocols like HLS (which uses .m3u8 playlists) and MPEG-DASH to meet the needs of different playback devices. When you switch from 480p to 720p while watching a YouTube video, for example, you’re actually asking YouTube’s server to play back a different file (of the same video) that has been encoded at a different resolution.
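
For HLS output, the set of available versions is advertised to players in a master playlist. The sketch below builds one from a hypothetical rendition ladder and picks a version for a given connection speed; the resolutions, bitrates, and file paths are illustrative assumptions, not recommendations.

```python
# Hypothetical rendition ladder: (name, width, height, bits per second).
LADDER = [
    ("480p", 854, 480, 1_400_000),
    ("720p", 1280, 720, 2_800_000),
    ("1080p", 1920, 1080, 5_000_000),
]

def master_playlist(ladder):
    """Build an HLS master playlist listing one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for name, width, height, bandwidth in ladder:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}")
        lines.append(f"{name}/index.m3u8")
    return "\n".join(lines) + "\n"

def pick_rendition(ladder, available_bps):
    """Choose the highest-bitrate rendition that fits, else fall back to the lowest."""
    fitting = [r for r in ladder if r[3] <= available_bps]
    if fitting:
        return max(fitting, key=lambda r: r[3])
    return min(ladder, key=lambda r: r[3])

print(master_playlist(LADDER))
print(pick_rendition(LADDER, 3_000_000)[0])  # 720p
```

Each entry points to a separate playlist for that version, so switching quality simply means fetching segments from a different entry.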

5. Content Delivery Network (CDN)

The right version of the encoded media must then be sent from the media server to the playback device. A content delivery network (CDN) isn’t always required for this, but it does reduce latency and the load on the streaming server, especially for media that needs to travel long distances from its source. A CDN works by mirroring the streaming data to numerous servers across the globe, handling end-user demand regionally.

Traditionally, the last-mile connection from the media server or CDN to the user’s device was handled by RTMP, with Flash Player taking care of playback. This process is essentially the reverse of Step 3 above. Today, however, more modern protocols have replaced RTMP for faster and more secure last-mile media delivery. We’ll introduce these modern protocols in the Live Streaming Today section below.

6. Playback

On the receiving end, the user’s playback device decodes the incoming data and sends the pieces to the appropriate hardware. With audio, a digital-to-analog converter translates the audio file’s numerical ordered pairs back into an analogous electrical signal, which then drives the device’s speaker cones after being amplified. Similar processes convert video data into the arrays of colored pixels we perceive as moving images on a screen.

RTMP Specifications at a Glance

Here are the most important RTMP specs to know. For complete documentation, you can download the Real-Time Messaging Protocol (RTMP) specification PDF directly from Adobe.

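Supported video formats: H.264 (preferred) & VP8 are the most popular, but other codecs like Screen Video v2 are also supported. Note that AMF (Action Message Format) is the encoding RTMP uses for command and metadata messages, and SWF, FLV, and F4V are Flash file formats rather than video codecs.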

Supported audio formats: AAC (preferred, including the AAC-LC and HE-AAC profiles) & MP3

Alternative versions of RTMP:

  • RTMPS connects using transport layer security (TLS) (formerly secure sockets layer, or SSL).
  • RTMPT encapsulates data within HTTP requests.
  • RTMPE uses Adobe’s proprietary encryption.
  • RTMFP transmits over the user datagram protocol (UDP) instead of TCP; UDP doesn’t require establishing a persistent connection.

RTMP & Live Streaming Today

Now that we’ve walked through the basics of how a traditional RTMP stream works, we’re ready to explore how the protocol is used today, how newer protocols are replacing it, and where in the pipeline those changes are happening.

RTMP is still widely used for the first leg of a piece of content’s streaming journey (Step 3 above), to transmit content from a source to a server. But it’s no longer considered secure or efficient for last-mile delivery of that same content to the user (Steps 5 & 6 above). We sometimes refer to the front part of the pipeline, where RTMP still plays a major role, as the ingest stage, and the delivery portion, where RTMP has effectively reached its EOL, as egress.

Today’s most popular RTMP alternative for client-side content delivery is Apple’s HTTP Live Streaming, or HLS. Instead of building support for Flash and players like JW Player and Flowplayer on iOS, Apple created its own protocol for frontend content distribution. HLS is supported by most modern devices, browsers, and apps, including Google Chrome, Android, macOS, Linux, and Windows, along with TV casting devices and smart TVs.

Unlike RTMP, HLS doesn’t establish a continuous two-way connection between client and server. That means it doesn’t actually create a stream in the traditional sense. Using HTTP to progressively download short media segments instead, HLS works more like “regular” web content delivery, which makes it easy to scale through ordinary web servers and CDNs. The tradeoff has traditionally been higher latency than RTMP, since players buffer several segments before playback begins, but the latest versions of the protocol (Low-Latency HLS) can bring the delay down to about two seconds or less.
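
To see why HLS behaves like regular web content, it helps to look at what a player actually reads. The sketch below parses a small, made-up HLS media playlist into the list of segment files a player would then fetch one by one with ordinary HTTP GET requests; the `parse_media_playlist` helper is hypothetical and handles only the tags shown.

```python
def parse_media_playlist(text):
    """Return (duration_seconds, uri) pairs from an HLS media playlist."""
    segments, duration = [], None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>" precedes each segment URI.
            duration = float(line[len("#EXTINF:"):].split(",")[0])
        elif line and not line.startswith("#") and duration is not None:
            segments.append((duration, line))
            duration = None
    return segments

# Made-up playlist for illustration only.
PLAYLIST = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:6.0,
segment0.ts
#EXTINF:6.0,
segment1.ts
#EXTINF:4.5,
segment2.ts
#EXT-X-ENDLIST
"""

segments = parse_media_playlist(PLAYLIST)
print(segments)
print(sum(duration for duration, _ in segments))  # 16.5
```

For a live stream, the server keeps appending new segments to the playlist and the player re-downloads it periodically, so no persistent connection is ever needed.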

MPEG-DASH works similarly to HLS and serves as another preferred alternative to RTMP for last-mile delivery. On the ingest side, protocols like SRT and WebRTC are gaining ground and may eventually replace RTMP.

Adding Real-Time Chat to Audio/Video Livestreams

With so much streaming content now available, interactive experiences that go above and beyond a one-sided TV-like transmission have been shown to delight users and drive retention. Livestream chat is one tool that helps modern broadcasts keep users engaged by letting them talk to each other or to event hosts in near-real-time alongside the streaming content. Conveniently, chat messages can be bundled with audio and video data as part of the multiplexing process described above, then decoded and displayed as a third playback channel. That said, nailing livestream chat UX best practices can be easier said than done, and it’s often not practical to build and scale that type of functionality from scratch.

If your team is working to develop an impressive modern live streaming experience with synchronized video and text chat, discover how much time and effort you could save by integrating the Stream Chat API.