When content is flagged by Stream’s AI Moderation, it flows into the moderation queue. At first glance, you’ll see a preview of the flagged text, image, or video, but moderators don’t have to rely on the preview alone. Each item contains detailed context and metadata that explain why it was flagged and help guide your decision-making.
Conversation Context
Every flagged message or media file includes surrounding context. For text, this means the previous few messages in the conversation, which are critical for understanding tone, sarcasm, or ongoing harassment.
A message like “ready to get some kills today” looks severe in isolation, but in context, it may be a joke about a video game. Conversely, subtle repeated comments may escalate into harassment only when viewed together.
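As a rough sketch of how this context might be represented (the `Message` and `FlaggedMessage` shapes below are illustrative assumptions, not Stream's actual data model), a flagged item carries both the offending message and the messages that preceded it:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Message:
    user_id: str
    text: str

@dataclass
class FlaggedMessage:
    message: Message
    # The few messages that preceded the flagged one, oldest first,
    # so a reviewer can judge tone, sarcasm, or an ongoing pattern.
    context: List[Message] = field(default_factory=list)

item = FlaggedMessage(
    message=Message("user_42", "ready to get some kills today"),
    context=[
        Message("user_17", "anyone up for ranked tonight?"),
        Message("user_42", "count me in, been practicing all week"),
    ],
)
```

Read together, the context makes it clear the flagged line is gaming banter rather than a threat.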
Harm Labels and Categories
Each item is tagged with one or more harm labels that indicate what type of violation the system detected, such as harassment, self-harm, scams, or hate speech. These labels serve as the system’s best assessment, but moderators must confirm whether they truly apply to the flagged content.
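One way to picture the review step (again a sketch, with illustrative label names rather than Stream's exact taxonomy) is as two sets: the labels the system proposed and the labels the moderator confirms:

```python
from enum import Enum

class HarmLabel(Enum):
    HARASSMENT = "harassment"
    SELF_HARM = "self_harm"
    SCAM = "scam"
    HATE_SPEECH = "hate_speech"

# Labels proposed by the system are a starting point, not a verdict.
flagged_labels = {HarmLabel.HARASSMENT, HarmLabel.HATE_SPEECH}

# During review, the moderator keeps the labels that truly apply.
confirmed_labels = {HarmLabel.HARASSMENT}
```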
Actions Already Taken
Depending on the configured policy, the system may have already taken an action on the flagged item. This could include flagging it for review, blocking it from being posted, shadow blocking it so it remains visible only to the sender, bouncing it back to the author for revision, or masking offensive terms. Moderators can approve the action, override it, or escalate further.
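The possible automatic actions and moderator responses can be summarized as two small enumerations (a minimal sketch; the names below are illustrative, not Stream's API):

```python
from enum import Enum

class AutoAction(Enum):
    FLAG = "flag"                   # held for review, still visible
    BLOCK = "block"                 # rejected before it is posted
    SHADOW_BLOCK = "shadow_block"   # visible only to the sender
    BOUNCE = "bounce"               # returned to the author for revision
    MASK = "mask"                   # offensive terms redacted

class ModeratorDecision(Enum):
    APPROVE = "approve"     # keep what the system already did
    OVERRIDE = "override"   # e.g. restore content that was wrongly blocked
    ESCALATE = "escalate"   # hand off to an admin or trust & safety lead
```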
Severity Levels (Text)
For text-based harms, flagged items are assigned a severity level: low, medium, high, or critical. These levels help moderators triage their work. Critical harms like terrorism, CSAM, or self-harm should be handled immediately, while low-severity items may be addressed after higher priorities are cleared.
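In practice, triage is just an ordering problem: review the most severe items first. A minimal sketch (the queue items and field names are assumptions for illustration) might look like this:

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def triage(queue):
    """Order flagged text items so critical harms are reviewed first."""
    return sorted(queue, key=lambda item: SEVERITY_ORDER[item["severity"]])

queue = [
    {"id": "m1", "severity": "low"},
    {"id": "m2", "severity": "critical"},
    {"id": "m3", "severity": "medium"},
]
print([item["id"] for item in triage(queue)])  # ['m2', 'm3', 'm1']
```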
Confidence Scores (Media)
For images and videos, the system provides a confidence score between 0 and 100. This score represents how certain the AI is that the content contains a violation. Higher scores indicate stronger certainty, while mid- or lower-range scores should be examined more carefully.
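A common way to act on a score like this is with thresholds. The cutoffs below are assumptions chosen for illustration, not Stream defaults; your own policy determines where the lines sit:

```python
def route_media(confidence: int, act_threshold: int = 90, review_threshold: int = 60) -> str:
    """Route a flagged image or video based on the AI confidence score (0-100)."""
    if confidence >= act_threshold:
        return "act"           # high certainty: apply the configured enforcement
    if confidence >= review_threshold:
        return "review"        # mid-range: a moderator should look closely
    return "low_priority"      # weak signal: review after higher priorities clear
```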
User Metadata and History
The dashboard also shows metadata about the user who posted the content. This includes their user ID, account role, account creation date, and any previous violations. Repeat offenders are more likely to require stronger enforcement actions, while first-time violators may receive a lighter response.
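As a rough illustration of how history can shape the response (this ladder is a hypothetical policy, not a Stream recommendation), enforcement might escalate with the number of prior violations:

```python
def suggest_enforcement(prior_violations: int) -> str:
    """Suggest a starting point for enforcement based on the user's record."""
    if prior_violations == 0:
        return "warn"            # first-time violator: lighter response
    if prior_violations < 3:
        return "temporary_ban"   # repeat behavior: stronger action
    return "permanent_ban"       # persistent offender: remove from the platform
```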
Notes and Collaboration
Moderators can attach notes to flagged items to explain decisions, clarify edge cases, or flag policy gaps for admins to review. These notes are saved in the audit log, creating accountability and serving as a training resource for future moderators.
Reviewing flagged content effectively means piecing together all of these elements (context, harm labels, actions already taken, severity or confidence, and user history) so that every decision is consistent, fair, and well-informed.
Now that you know how to interpret flagged content, we’ll examine severity levels and confidence scores and how they guide moderator decision-making.