Why This Matters
Stream's moderation engine provides two distinct ways to tune sensitivity:
- Severity Levels → for text harms detected by the AI Text engine.
- Confidence Scores → for image and video harms detected by vision models.
Understanding how to configure each helps you find the right balance between automation and human oversight across different content types.
Severity Levels (Text Moderation)
Severity levels classify how harmful or urgent a text-based violation is. Stream uses four levels: Low, Medium, High, Critical.
How It’s Used
- Severity is assigned by the LLM for each text harm.
- Admins can map severity levels to different actions (e.g., Low = Flag, High = Shadowblock, Critical = Block).
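As a minimal sketch of such a mapping, the lookup table below pairs each of the four severity levels with an action. The enums and the `action_for` helper are illustrative assumptions, not part of Stream's SDK:

```python
from enum import Enum

class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"

class Action(Enum):
    FLAG = "flag"
    SHADOWBLOCK = "shadowblock"
    BLOCK = "block"

# Hypothetical severity-to-action policy, mirroring the example mapping above.
SEVERITY_POLICY = {
    Severity.LOW: Action.FLAG,
    Severity.MEDIUM: Action.FLAG,
    Severity.HIGH: Action.SHADOWBLOCK,
    Severity.CRITICAL: Action.BLOCK,
}

def action_for(severity: Severity) -> Action:
    """Look up the configured action for a detected text harm's severity."""
    return SEVERITY_POLICY[severity]
```

Keeping the policy in a single table makes it easy to audit and to adjust as your community's needs change.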
Confidence Scores (Image & Video Moderation)
Confidence scores indicate how certain the AI vision model is that an uploaded image or video contains a specific harm. Scores range from 0 to 100.
Higher thresholds make moderation more lenient: the system acts only when it is highly confident the harm is present. This reduces false positives but may allow some harmful content to slip through.
Lower thresholds make moderation stricter: the system acts even on lower-confidence detections. This helps catch more potential violations but increases the risk of false positives.
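The relationship between threshold and strictness can be summed up in a few lines; `should_act` is a hypothetical helper for illustration, not Stream's API:

```python
def should_act(confidence: float, threshold: float) -> bool:
    """Act only when the model's confidence meets or exceeds the threshold.

    A higher threshold means fewer actions (more lenient moderation);
    a lower threshold means more actions (stricter moderation).
    """
    return confidence >= threshold

# With a threshold of 70, an 80-confidence detection triggers the action;
# with a threshold of 95, the same detection is allowed through.
assert should_act(80, 70) and not should_act(80, 95)
```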
How It’s Used
- Confidence scores are set per harm category (nudity, violence, drugs, etc.).
- Admins define thresholds for each harm:
  - Above the threshold → take the configured action (flag, block, etc.).
  - Below the threshold → allow content through.
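Combining per-category thresholds with that decision rule might look like the sketch below; the category names, values, and `decide` helper are assumptions for illustration rather than Stream's configuration format:

```python
# Hypothetical per-harm thresholds on the 0-100 confidence scale.
THRESHOLDS = {
    "nudity": 95,
    "violence": 90,
    "drugs": 85,
}

def decide(category: str, confidence: float) -> str:
    """Return the moderation outcome for a single detection."""
    threshold = THRESHOLDS.get(category, 100)  # unknown categories: never act
    return "take_configured_action" if confidence >= threshold else "allow"

print(decide("nudity", 97))    # take_configured_action
print(decide("violence", 60))  # allow
```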
Best Practices
For Severity (Text)
- Keep categories narrow and map severity to consistent actions.
- Treat Critical harms (CSAM, terrorism, self-harm threats) as automatic blocks.
- Use severity filters in the queue so moderators handle urgent issues first.
For Confidence (Image/Video)
- Start conservatively, with high thresholds (95%+) for blocking explicit categories.
- Use lower thresholds for flag-only categories like spammy images.
- Review false positives/negatives regularly and adjust thresholds.
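That last point, reviewing outcomes and adjusting thresholds, can be operationalized as a simple feedback loop. The sketch below assumes you collect (confidence, actually_harmful) labels from human review; `suggest_threshold` and the deliberately naive adjustment rule are hypothetical, not a Stream feature:

```python
def suggest_threshold(reviews: list[tuple[float, bool]],
                      max_fp_rate: float = 0.02) -> float:
    """Pick the lowest threshold whose false-positive rate stays under max_fp_rate.

    `reviews` holds (confidence, actually_harmful) pairs from human review
    of previously actioned content.
    """
    for t in sorted({conf for conf, _ in reviews}):
        acted = [harmful for conf, harmful in reviews if conf >= t]
        fp_rate = sum(1 for harmful in acted if not harmful) / len(acted)
        if fp_rate <= max_fp_rate:
            return t
    return 100.0  # no threshold meets the target: effectively never auto-act
```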
Both tools give you control over sensitivity and help balance automation with human oversight.
Next, we’ll move into moderation workflows, following content from detection to the review queue and understanding how moderators act on flagged items.