How to Build Accessible Video Experiences

Roughly 16% of the world's population lives with some form of long-term disability. If you include temporary disabilities, like a broken arm, that number rises significantly.

If your in-app video content isn't accessible, you're potentially excluding a large part of the population and missing an opportunity to create better experiences for everyone.

As more people consume information through media, it is increasingly important to plan for accessible video content from the start. Accessible video expands the user base, enhances the experience for existing users, and helps ensure compliance with forthcoming accessibility laws.

But what does accessibility mean in the context of video?

Many teams struggle with where to start and find integrating inclusive practices challenging. This post breaks down the guidelines and considerations for building video experiences that people of all abilities can enjoy.

Understanding Web Accessibility

Accessibility is a broad topic and can be overwhelming for teams to plan for and implement. The World Wide Web Consortium (W3C) is an international organization that creates web standards around accessibility, internationalization, and privacy.

Through its Web Accessibility Initiative (WAI), the W3C creates and maintains a set of guidelines and resources known as the Web Content Accessibility Guidelines (WCAG). Organizations can use WCAG to benchmark their product’s accessibility levels.

Web accessibility, in particular, refers to how easy it is for people of all abilities to use a web-based product. Video accessibility refers to the same concept for audiovisual experiences. It includes a set of checks and guidelines designed to help determine what is objectively accessible for all audiences.

The POUR principles

At the heart of web accessibility is the idea that the web is for everyone, and everyone should be able to use it with as little friction as possible. WCAG has a lot of guidelines and rules, but these are the main principles informing the specific implementations:

Perceivable

User interfaces (UIs) and content should be perceivable through more than one sense, whether seen, heard, or otherwise taken in. For example, your UI should be usable by both sighted and non-sighted users.

You can achieve this by developing your UI to make it easy for screen readers to scan and relay information back to blind users.

For video content, provide captions and transcripts as needed to make it perceivable by people who cannot hear the audio.

Here is an example of what captions look like:

YouTube video screenshot with a dark arrow pointing to the captions: "information access which is similar to chart GPT or deep seek chat." The video progress bar shows 0:43 of a 7:26 runtime.

Other considerations for making your UI perceivable include making it responsive, handling zoom gracefully, and ensuring text is large enough and has enough contrast to increase readability.
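
To make "enough contrast" concrete, here is a minimal TypeScript sketch of the contrast-ratio calculation that WCAG 2.x defines in terms of relative luminance. The type and function names (Rgb, contrastRatio, meetsAaForNormalText) are illustrative and not part of any library. The 4.5:1 threshold is the WCAG Level AA minimum for normal-size text, and you could run a check like this on caption text against its background.

```typescript
// An sRGB color with 8-bit channels (0-255). Illustrative type, not a library type.
type Rgb = { r: number; g: number; b: number };

// Linearize one sRGB channel as described in the WCAG 2.x relative luminance definition.
function linearizeChannel(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: 0 for black, 1 for white.
function relativeLuminance(color: Rgb): number {
  return (
    0.2126 * linearizeChannel(color.r) +
    0.7152 * linearizeChannel(color.g) +
    0.0722 * linearizeChannel(color.b)
  );
}

// Contrast ratio between two colors, from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG 2.x Level AA asks for at least 4.5:1 for normal-size text.
function meetsAaForNormalText(text: Rgb, background: Rgb): boolean {
  return contrastRatio(text, background) >= 4.5;
}
```

White text on a black caption background gives the maximum ratio of 21:1, while a mid-gray such as #777777 on white falls just short of 4.5:1.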

Operable

User interfaces (UIs), content, and navigation should be usable through multiple input devices. For example, your UI should work as expected if the user interacts with it using a keyboard instead of a trackpad or voice command instead of a mouse.
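
As a rough sketch of what keyboard operability can look like for a video player, the TypeScript function below maps the key value of a keyboard event to a player action. The bindings and action names are assumptions modeled on common player conventions. They are not a standard and not any specific SDK's API.

```typescript
// Hypothetical player actions; a real player would map these to its own API.
type PlayerAction =
  | "togglePlay"
  | "seekBackward"
  | "seekForward"
  | "toggleCaptions"
  | "toggleMute";

// Takes the `key` value of a KeyboardEvent and returns the action to run,
// or null when the key is not bound to anything.
function actionForKey(key: string): PlayerAction | null {
  switch (key) {
    case " ": // the Space bar reports a single space as its key value
    case "k":
      return "togglePlay";
    case "ArrowLeft":
      return "seekBackward";
    case "ArrowRight":
      return "seekForward";
    case "c":
      return "toggleCaptions";
    case "m":
      return "toggleMute";
    default:
      return null;
  }
}
```

In a browser, you might call this from a keydown listener on the player container and ignore keys that return null so that default browser and screen reader shortcuts keep working. Attaching the listener to the player container rather than the whole page also keeps single-character shortcuts active only while the player has focus.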

You can make your UI and apps operable by ensuring logical and consistent navigation. This way, users always know where they are on the site or product and can quickly skip links or return to a previous screen, as illustrated.

YouTube video screenshot showing a timestamp of 18:13 / 21:26, with a white arrow pointing to the timestamp navigation. This allows navigation within the video to jump to specific moments. The red progress bar always indicates where the user is in the video.

Additionally, content should be searchable, and callouts and alerts should not disappear automatically on a timer.

Understandable

User interfaces (UIs) and content should be easy for the primary audience to understand, regardless of their language proficiency or cognitive skills.

The UI should follow established patterns to help users understand expected actions, and the content should use simple language and avoid jargon.

For example, forms should have descriptive labels and helpful error messages, as illustrated. Large forms should be broken up into multi-step forms. Video content should use everyday language and explain any abbreviations or jargon it cannot avoid.

Other ways to increase your UI’s understandability are to use consistent labeling for components across the app, avoid significant UI changes without alerting users, and give users enough notice before changes take effect.

Robust

User interfaces (UIs) should be as future-proof as possible. They should be compatible with current and future user tools. That is, they should work correctly across as many browsers and assistive technologies as possible while keeping up with future changes.

This may sound complicated, but it comes down to using standard markup (HTML) and standard web APIs, and giving an accessible name or title to any non-standard UI element.

For example, use <button> instead of styling a <div> to look and work like a button.

Two code snippets compare HTML button implementations. The left shows correct use with a semantic <button> tag, marked with a green check. The right shows incorrect use that mimics a button with a styled <div>, marked with a red cross.

What to consider when building accessible video experiences

When building accessible video experiences, there are certain aspects you need to pay particular attention to that are unique to audio-visual experiences. Video accessibility considerations are divided into the following main buckets:

User Research

Step one is always to understand your users and the tools they use to consume your visual products. This keeps your efforts from turning into checkbox compliance and gives you the flexibility to implement the guidelines that are most meaningful to your users.

Planning Audio-Visual Content and Media

The next step is to address accessibility at the planning stage. This entails understanding the video's goal and deciding whether it will need audio descriptions, sign language interpretation, and transcripts to support it.

This helps make implementing the WCAG guidelines for audio-visual content a lot simpler.

Audio Descriptions (AD)

Audio descriptions narrate the visual parts of a video that are needed to understand the content in context. They are required when actions shown in the video are integral to understanding it, because captions or transcripts alone cannot convey the complete message.

They are mainly for blind and low-vision users and can be provided as part of the main script, a standalone file, or even a separate video.

For example, audio descriptions would be necessary in a YouTube tutorial in which the instructor walks through how to create an index.html page.

They say, “Open VSCode, create a new file called index.html, and then paste the following into it.” However, listeners who can’t see the screen don’t know what “the following” means, making captions insufficient.

In that case, an audio description might be: “The instructor opens VS Code and creates a new file named ‘index.html.’ They paste the following code: the opening <html> tag, then the <head> section, where they paste a <title> tag with the text ‘My First Web Page.’ Next, they paste the <body> tag, and within it, they paste a heading using the <h1> tag with the text ‘Hello, world!’”

Captions

Captions are the text version of a video's audio content. They are shown in the media player and synced with the audio.

Captions should be provided for deaf and hard-of-hearing users and should include a dialog transcript, nonverbal sounds like sound effects, and speaker identification where necessary.

Building off the previous example, the YouTube instructor’s video captions would read: “[instructor] Open VSCode, create a new file called index.html, and then paste the following into it.”
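
Captions like this are usually stored as timed text. The TypeScript sketch below serializes caption cues to WebVTT, a W3C caption format that browsers can load through the HTML <track> element. The CaptionCue type, the function names, and any timings you pass in are made up for illustration. Speaker identification uses WebVTT's voice tag (<v Name>), and nonverbal sounds can be written as their own cues, such as "[keyboard clicking]".

```typescript
// One caption cue: start/end in seconds, an optional speaker, and the caption text.
// Illustrative type, not part of any SDK.
type CaptionCue = { start: number; end: number; speaker?: string; text: string };

// Left-pad a non-negative integer with zeros to the given width.
function padNumber(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as a WebVTT timestamp: hh:mm:ss.ttt
function formatVttTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return padNumber(h, 2) + ":" + padNumber(m, 2) + ":" + padNumber(s, 2) + "." + padNumber(ms, 3);
}

// Escape the characters that have special meaning in WebVTT cue text.
function escapeVttText(text: string): string {
  return text.split("&").join("&amp;").split("<").join("&lt;").split(">").join("&gt;");
}

// Build a complete WebVTT file from a list of cues.
function toWebVtt(cues: CaptionCue[]): string {
  const blocks = cues.map(function (cue) {
    const timing = formatVttTime(cue.start) + " --> " + formatVttTime(cue.end);
    const text = escapeVttText(cue.text);
    const body = cue.speaker ? "<v " + escapeVttText(cue.speaker) + ">" + text : text;
    return timing + "\n" + body;
  });
  return "WEBVTT\n\n" + blocks.join("\n\n") + "\n";
}
```

A cue for the instructor's line above would carry the speaker "Instructor" and produce a block that starts with its timing line, followed by "<v Instructor>Open VSCode, create a new file called index.html, and then paste the following into it."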

Types of Captions

There are three types of captions:

Closed captions: Captions that the listener can turn on or off as they are separate from the video.
Open captions: Captions that are always displayed because they are embedded in the video, so the user cannot turn them off.
Live captions: Captions that are transcribed in real time as the audio plays. The transcriber can be physically present or remote.

While there are tools that can help auto-create captions, results should not be relied on until they have been proofread and confirmed to be accurate.

Transcripts

Transcripts are text versions of audio content. They include all the information needed to understand the content entirely.

Types of Transcripts

There are three types of transcripts:

Basic transcripts: These are similar to captions, except they are provided separately from the video and are not auto-synced with the video as it plays.
Descriptive transcripts: These include the audio descriptions integrated with the transcribed audio. Ideally, the user should be able to interact with the transcript and use it to jump to relevant parts of the video.
Interactive transcripts: These are time-stamped transcripts that allow users to click on the text and jump directly to the corresponding part of the video. The time stamps are clickable and sync with the video’s timeline.
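
The lookup behind an interactive transcript can be sketched as follows in TypeScript, assuming the segments are sorted by start time and do not overlap. TranscriptSegment and both function names are hypothetical and only meant to show the idea.

```typescript
// One time-stamped transcript segment; times are in seconds. Illustrative type.
type TranscriptSegment = { start: number; end: number; text: string };

// Returns the index of the segment that contains `time`, or -1 if none does.
// Assumes segments are sorted by start time and do not overlap (binary search).
function findActiveSegment(segments: TranscriptSegment[], time: number): number {
  let low = 0;
  let high = segments.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    const segment = segments[mid];
    if (time < segment.start) {
      high = mid - 1;
    } else if (time >= segment.end) {
      low = mid + 1;
    } else {
      return mid;
    }
  }
  return -1;
}

// Returns the playback position to jump to when the user activates a segment,
// or null when the index is out of range.
function seekTimeForSegment(segments: TranscriptSegment[], index: number): number | null {
  if (index < 0 || index >= segments.length) return null;
  return segments[index].start;
}
```

In a browser, a player could call findActiveSegment on the video element's timeupdate event to highlight the current line, and set the element's currentTime to the result of seekTimeForSegment when a user activates a line. Rendering each line as a <button> keeps the transcript operable from the keyboard.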

The Stream Video SDK provides captions and transcripts in several languages and can be configured to your needs.

Sign language and translation considerations

Sign languages are visual, gesture-based languages used by deaf communities, some of whose members might find text-based transcripts hard to understand.

Providing a sign-language interpretation of your video content is the gold standard, especially if you know you have deaf users or if your content could be helpful to them.

However, providing sign language interpretation is not always straightforward and requires extra effort to do correctly.

Another point is internationalization. Although it is not a WCAG requirement, providing accurate captions and transcripts in widely used languages, especially the languages your users commonly use, is an excellent way to make your video content more accessible.
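
Once captions exist in several languages, the player still has to choose one. The TypeScript sketch below picks a caption track from a user's ordered language preferences, trying an exact language-tag match first and then a match on the primary language (so "pt-PT" can fall back to "pt-BR"). The CaptionTrackInfo type, the function names, and the fallback to the first available track are all assumptions for illustration.

```typescript
// Minimal description of an available caption track. Illustrative type.
type CaptionTrackInfo = { language: string; label: string };

// "pt-BR" -> "pt": the primary language part of a language tag.
function primarySubtag(tag: string): string {
  return tag.toLowerCase().split("-")[0];
}

// Picks the best caption track for an ordered list of preferred language tags.
// Falls back to the first available track (an assumption), or null if there are none.
function pickCaptionTrack(
  available: CaptionTrackInfo[],
  preferred: string[]
): CaptionTrackInfo | null {
  for (const wanted of preferred) {
    // 1. Exact tag match, ignoring case.
    for (const track of available) {
      if (track.language.toLowerCase() === wanted.toLowerCase()) return track;
    }
    // 2. Same primary language, for example "es-MX" matching "es".
    for (const track of available) {
      if (primarySubtag(track.language) === primarySubtag(wanted)) return track;
    }
  }
  return available.length > 0 ? available[0] : null;
}
```

In a browser, the preferred list could come from navigator.languages, which returns the user's languages in order of preference.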

User customizations (Media Players)

Choosing a media player that provides UX accessibility features out of the box, where applicable, is a significant step up when building accessible video experiences.

Accessible players support keyboard navigation and right-to-left (RTL) languages, have good contrast, and label their controls.

They also allow users to change the speed of the video, control how captions are shown, and support interactive transcripts.

For example, Stream’s media player and SDKs allow users to turn captions on or off, customize theming, and enable noise cancellation, all of which improve video experiences.

Optimizing pre-existing video experiences

It is generally challenging to go back and make pre-existing video content accessible, which is why planning ahead is essential. However, you can get some quick wins by:

Providing transcripts.
Revising and providing accurate captions.
Assessing whether you need Audio Descriptions.
Optimizing the media player you use to display your video content.

Additionally, consider adding translated versions of the captions and transcripts.

Compliance considerations

Accessibility isn’t just best practice. It’s increasingly becoming a legal requirement regulated by several acts.

The following key regulations should be noted to ensure compliance:

Americans with Disabilities Act (ADA)

This regulation has been interpreted by courts in the US to apply to digital products and spaces and should be considered by organizations operating within US law.

This applies especially to organizations that sell consumer-focused products.

European Accessibility Act (EAA)

This regulation focuses on digital accessibility and is mainly based on WCAG. Its requirements apply from June 28, 2025, and affect organizations operating within the EU as well as those wishing to target that market.

Complying with these and other relevant accessibility laws reduces the risk of litigation, builds trust, and can increase profitability.

Organizations can use this checklist to measure their compliance levels and dedicate resources to preparing for the upcoming laws.

Notes and resources

Building accessible video experiences requires much more than providing captions. It is a process that requires careful planning and consideration. However, improving audiovisual experiences to be more helpful to a broader audience is a worthwhile investment.

Here are some resources you can use to learn more about web accessibility and making video accessible:

Want to try building accessible video experiences? Check out how Stream’s Video API can help you get started.