When Stream’s AI Moderation flags content, you’ll often see either a severity level (for text harms) or a confidence score (for image and video harms). These signals are designed to help moderators prioritize their workload and decide how much weight to give the AI’s judgment. Understanding the difference is key to reviewing flagged content effectively.
Severity Levels (Text Harms)
The AI Text engine assigns a severity level to each harm it detects. The level reflects how urgent or serious the violation is.
- Low: Minor or borderline cases, such as playful insults (“You’re such a nerd lol”) or light spam. These often require human judgment to decide if action is needed.
- Medium: Clearer violations that may still need review, such as targeted ridicule (“Nobody likes you, go away”) or repetitive spam links.
- High: Serious violations that likely warrant immediate action, such as explicit threats (“I’m going to punch you in the face”) or strong scam attempts (“Claim your free prize here, just enter your credit card”).
- Critical: Zero-tolerance harms, including child exploitation, imminent self-harm, or terrorism threats. These must be acted on immediately and usually require escalation or permanent enforcement.
As a moderator, severity levels act as a triage guide; you should review Critical and High items first, while Low and Medium can wait until the urgent cases are cleared.
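As a rough illustration, a review tool might sort its text queue by severity like this. This is a minimal sketch only; the `FlaggedText` shape and field names are assumptions for illustration, not Stream's actual API types.

```typescript
// Hypothetical shape of a flagged text item; not Stream's actual API type.
type Severity = "low" | "medium" | "high" | "critical";

interface FlaggedText {
  id: string;
  text: string;
  severity: Severity;
}

// Lower number = reviewed first: Critical and High before Medium and Low.
const severityPriority: Record<Severity, number> = {
  critical: 0,
  high: 1,
  medium: 2,
  low: 3,
};

function triageTextQueue(queue: FlaggedText[]): FlaggedText[] {
  return [...queue].sort(
    (a, b) => severityPriority[a.severity] - severityPriority[b.severity]
  );
}
```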
Confidence Scores (Image & Video Harms)
For visual content, Stream’s AI models assign a confidence score between 0 and 100. This number represents how certain the system is that the content contains a violation.
- High confidence (95%+): The system is very certain. A photo flagged as “Explicit Nudity” at 98% is almost always a true violation.
- Medium confidence (70–94%): The system is somewhat confident. For example, a boxing match might be flagged for “Violence” at 78%. These cases usually need human review to confirm.
- Low confidence (<70%): The system is unsure. Many of these will be false positives, such as a swimsuit photo flagged as “Underwear/Swimwear” at 65%.
Confidence scores don’t indicate severity of harm, only certainty of detection. It’s your job to look at the flagged image or video in context and decide if it truly violates policy.
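To make the ranges concrete, here is a small sketch of how a tool might bucket visual detections by confidence. The `FlaggedMedia` shape is hypothetical; the thresholds simply mirror the bands above.

```typescript
// Hypothetical shape of a flagged image/video detection.
interface FlaggedMedia {
  id: string;
  label: string;      // e.g. "Explicit Nudity", "Violence"
  confidence: number; // 0-100, the model's certainty of detection
}

type ReviewBucket =
  | "almost-certain"          // 95%+: very likely a true detection
  | "needs-review"            // 70-94%: confirm by hand
  | "likely-false-positive";  // <70%: treat skeptically

function bucketByConfidence(item: FlaggedMedia): ReviewBucket {
  if (item.confidence >= 95) return "almost-certain";
  if (item.confidence >= 70) return "needs-review";
  return "likely-false-positive";
}
```

Note that even an “almost-certain” detection still gets a human look; the bucket only determines how skeptically you approach it.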
How to Use Severity and Confidence Together
- Use severity levels to decide which text items to prioritize.
- Use confidence scores to decide how much trust to place in the AI’s visual detections.
- Always check the conversation context or media details before confirming the AI’s decision. (One way to combine both signals into a single queue is sketched after this list.)
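Putting the two signals together, one simple (and entirely hypothetical) queue ordering reuses the types from the sketches above: text items are ranked by severity, media items by confidence, and everything still passes through a human before action is taken.

```typescript
// Hypothetical combined queue; builds on FlaggedText and FlaggedMedia above.
type QueueItem =
  | { kind: "text"; item: FlaggedText }
  | { kind: "media"; item: FlaggedMedia };

function reviewPriority(q: QueueItem): number {
  if (q.kind === "text") {
    // Text: severity drives the order (Critical = 0, reviewed first).
    return severityPriority[q.item.severity];
  }
  // Media: higher-confidence detections are confirmed first. This is an
  // assumption about workflow, not a claim about how severe the harm is.
  if (q.item.confidence >= 95) return 0;
  if (q.item.confidence >= 70) return 2;
  return 3;
}

function orderReviewQueue(queue: QueueItem[]): QueueItem[] {
  return [...queue].sort((a, b) => reviewPriority(a) - reviewPriority(b));
}
```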
Moderator Role
Your role isn’t to blindly follow severity or confidence, but to use them as decision aids. Critical harms and high-confidence detections should move to the top of your queue, but context always matters. False positives and borderline cases should be documented with notes so admins can adjust policies and thresholds over time.
Next, we’ll walk through how moderators can use Channel Explorer to browse conversations, review context in bulk, and even communicate directly with communities to reinforce guidelines.