
Moderation Certification Course

Reviewing flagged content details

This lesson explains how to review flagged content in Stream’s moderation queue. Moderators get more than just a preview: they see conversation context, harm labels, severity levels, confidence scores, and user metadata. With these details, moderators can make informed, consistent decisions, confirm or override automated actions, and collaborate with notes for accountability.

When content is flagged by Stream’s AI Moderation, it flows into the moderation queue. At first glance, you’ll see a preview of the flagged text, image, or video, but moderators don’t have to rely on the preview alone. Each item contains detailed context and metadata that explain why it was flagged and help guide your decision-making.
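If it helps to picture what a queue item carries, the sketch below models one as a single record. It is illustrative only; the field names are assumptions made for this lesson, not Stream’s actual schema.

```typescript
// Illustrative model of a review queue item. Field names are assumptions
// for this lesson, not Stream's actual API schema.
type ContentType = "text" | "image" | "video";

interface ReviewQueueItem {
  id: string;
  contentType: ContentType;
  preview: string;               // flagged text or a media URL
  harmLabels: string[];          // e.g. ["harassment", "hate_speech"]
  severity?: "low" | "medium" | "high" | "critical"; // text harms only
  confidence?: number;           // 0-100, media harms only
  actionTaken?: "flag" | "block" | "shadow_block" | "bounce" | "mask";
  user: {
    id: string;
    role: string;
    createdAt: string;           // ISO timestamp of account creation
    priorViolations: number;
  };
}
```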

Conversation Context

Every flagged message or media file includes surrounding context. For text, this means the previous few messages in the conversation, which are critical for understanding tone, sarcasm, or ongoing harassment.

A message like “ready to get some kills today” looks severe in isolation, but in context, it may be a joke about a video game. Conversely, subtle repeated comments may escalate into harassment only when viewed together.
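If you are pulling context programmatically rather than reading it in the dashboard, a rough sketch like the one below can fetch the messages that preceded a flagged message. It assumes the stream-chat Node SDK with server-side credentials; the exact pagination options may differ across SDK versions.

```typescript
import { StreamChat } from "stream-chat";

// Rough sketch: fetch the few messages that preceded a flagged message so it
// can be read in context. Assumes server-side credentials.
async function getSurroundingContext(
  apiKey: string,
  apiSecret: string,
  channelType: string,
  channelId: string,
  flaggedMessageId: string,
  limit = 5
) {
  const client = StreamChat.getInstance(apiKey, apiSecret);
  const channel = client.channel(channelType, channelId);

  // Request up to `limit` messages posted before the flagged one.
  const state = await channel.query({
    messages: { limit, id_lt: flaggedMessageId },
  });

  return state.messages; // the conversation leading up to the flag
}
```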

Harm Labels and Categories

Each item is tagged with one or more harm labels that indicate what type of violation the system detected, such as harassment, self-harm, scams, or hate speech. These labels serve as the system’s best assessment, but moderators must confirm whether they truly apply to the flagged content.
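One practical use of these labels is grouping the queue so you can review one violation type at a time. The helper below is a hypothetical sketch; the function name and the trimmed-down item shape are assumptions, not part of Stream’s SDK.

```typescript
// Hypothetical helper: bucket queued items by harm label so a moderator can
// work through one violation type at a time.
interface QueueItem {
  id: string;
  harmLabels: string[]; // e.g. ["harassment", "scam"]
}

function groupByHarmLabel(items: QueueItem[]): Map<string, QueueItem[]> {
  const groups = new Map<string, QueueItem[]>();
  for (const item of items) {
    for (const label of item.harmLabels) {
      let bucket = groups.get(label);
      if (!bucket) {
        bucket = [];
        groups.set(label, bucket);
      }
      bucket.push(item);
    }
  }
  return groups;
}
```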

Actions Already Taken

Depending on the configured policy, the system may have already taken an action on the flagged item. This could include flagging it for review, blocking it from being posted, shadow-blocking it so it stays visible only to the sender, bouncing it back to the sender for revision, or masking offensive terms. Moderators can approve the action, override it, or escalate further.
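As a small sketch, a review tool might summarize the automated action before you confirm or override it. The action names below mirror the policy options just described; they are labels chosen for this lesson, not Stream’s API constants.

```typescript
// Hypothetical sketch: describe the automated action so a moderator can
// decide whether to approve or override it. Names are illustrative.
type AutomatedAction = "flag" | "block" | "shadow_block" | "bounce" | "mask";

function describeAction(action: AutomatedAction): string {
  switch (action) {
    case "flag":
      return "Sent to the queue; content is still visible.";
    case "block":
      return "Content was prevented from being posted.";
    case "shadow_block":
      return "Hidden from everyone except the sender.";
    case "bounce":
      return "Returned to the sender for revision.";
    case "mask":
      return "Offensive terms were masked; the rest is visible.";
  }
}
```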

Severity Levels (Text)

For text-based harms, flagged items are assigned a severity level: low, medium, high, or critical. These levels help moderators triage their work. Critical harms like terrorism, CSAM, or self-harm should be handled immediately, while low-severity items may be addressed after higher priorities are cleared.
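Triage can be as simple as sorting the queue so critical items surface first. The sketch below assumes a trimmed-down item shape; only the severity names come from the lesson.

```typescript
// Minimal triage sketch: order text items so critical harms surface first.
type Severity = "low" | "medium" | "high" | "critical";

const severityRank: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

interface TextQueueItem {
  id: string;
  severity: Severity;
}

function triage(items: TextQueueItem[]): TextQueueItem[] {
  // Sort a copy so the original queue order is left untouched.
  return [...items].sort(
    (a, b) => severityRank[a.severity] - severityRank[b.severity]
  );
}
```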

Confidence Scores (Media)

For images and videos, the system provides a confidence score between 0 and 100. This score represents how certain the AI is that the content contains a violation. Higher scores indicate stronger certainty, while mid- or lower-range scores should be examined more carefully.
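If you want to turn scores into a quick review hint, a simple thresholding sketch like the one below works. The cutoffs (85 and 40) are illustrative assumptions to tune against your own policy, not values recommended by Stream.

```typescript
// Illustrative thresholds only; tune them to your own policy.
// Scores are the 0-100 confidence values described above.
type MediaReviewHint = "likely_violation" | "needs_review" | "likely_safe";

function hintFromConfidence(confidence: number): MediaReviewHint {
  if (confidence >= 85) return "likely_violation"; // strong certainty
  if (confidence >= 40) return "needs_review";     // examine carefully
  return "likely_safe";                            // weak signal
}
```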

User Metadata and History

The dashboard also shows metadata about the user who posted the content. This includes their user ID, account role, account creation date, and any previous violations. Repeat offenders are more likely to require stronger enforcement actions, while first-time violators may receive a lighter response.
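As a hypothetical illustration, an escalation helper might suggest a starting response from the violation count alone; a real policy should also weigh severity, context, and the nature of the violation.

```typescript
// Hypothetical escalation helper: suggest a starting point from the user's
// prior violations. Thresholds are assumptions, not Stream policy.
interface UserHistory {
  id: string;
  priorViolations: number;
}

type SuggestedResponse = "warn" | "timeout" | "ban";

function suggestResponse(user: UserHistory): SuggestedResponse {
  if (user.priorViolations === 0) return "warn";  // first-time violator
  if (user.priorViolations < 3) return "timeout"; // repeat, but limited
  return "ban";                                   // persistent offender
}
```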

Notes and Collaboration

Moderators can attach notes to flagged items to explain decisions, clarify edge cases, or flag policy gaps for admins to review. These notes are saved in the audit log, creating accountability and serving as a training resource for future moderators.
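The dashboard stores these notes for you, but conceptually each one is a small record tied to the flagged item. The shape below is an assumption for illustration, not Stream’s audit-log schema.

```typescript
// Hypothetical shape for a moderator note attached to a flagged item.
interface ModeratorNote {
  itemId: string;      // the flagged item the note belongs to
  moderatorId: string;
  createdAt: string;   // ISO timestamp
  text: string;        // decision rationale, edge case, or policy gap
}

const example: ModeratorNote = {
  itemId: "flag_123",
  moderatorId: "mod_ava",
  createdAt: new Date().toISOString(),
  text: "Approved removal: repeated harassment across three messages.",
};
```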

Reviewing flagged content effectively means piecing together all of these elements (context, harm labels, actions already taken, severity or confidence, and user history) so that every decision is consistent, fair, and well-informed.

Now that you know how to interpret flagged content, we’ll examine severity levels and confidence scores and how they guide moderator decision-making.