
Moderation Certification Course

Interpreting insights to refine policies

This lesson explores how to use insights from Stream’s audit logs and reports to refine moderation policies. It explains how data on category volume, false positives, moderator overrides, and emerging harms can highlight policy gaps and inefficiencies. You’ll learn strategies for reducing false positives, closing detection gaps, adjusting severity mappings, and identifying new risks. The lesson also shares best practices like weekly reviews, staging tests, and documenting changes, helping teams create adaptive moderation policies that evolve with their communities.

Why Insights Matter

Moderation is not a set-it-and-forget-it process. Communities evolve, language shifts, and bad actors constantly invent new ways to bypass filters. Insights from your audit logs and reports give you the feedback loop you need to spot gaps, reduce false positives, and adapt your policies as your platform grows.

What Insights Reveal

From Stream’s dashboard, you’ll see metrics and trends that can highlight areas for improvement:

  • Category Volume: Which harms are being flagged most often (e.g., 60% spam, 25% harassment).
  • False Positives: Where moderators consistently override AI decisions.
  • Moderator Overrides: Categories where human review often disagrees with AI.
  • Emerging Harms: Sudden increases in categories like scams or hate speech.
  • Reviewer Activity: How quickly and consistently moderators act on flagged content.

Each of these signals is a clue that your policies or rules may need updating.

Using Insights to Improve Policies

  1. Reduce False Positives
    • If harmless content is frequently flagged (e.g., jokes misclassified as harassment), refine your LLM prompts for that harm.
    • Raise confidence thresholds for image/video categories where too many safe items are being blocked (see the first sketch after this list).
    • Consider reassigning mild violations to Flag instead of Block.
  2. Close Detection Gaps
    • If certain harms rarely appear in reports but you suspect under-detection, test your policies.
    • Add semantic filters for new slang, coded language, or platform-specific terms.
    • Expand blocklists and regex for spam or scam patterns slipping through (see the regex sketch after this list).
  3. Adjust Severity Mappings
    • If moderators spend too much time on low-severity issues, remap them to lighter actions (e.g., Auto-approve or Bounce).
    • Make sure Critical harms (self-harm, child exploitation) always escalate with immediate actions.
  4. Identify Emerging Risks
    • Use reporting trends to detect surges in certain categories (e.g., seasonal scams, political hate speech during elections), as in the surge-check sketch after this list.
    • Update prompts, blocklists, or reviewer training to get ahead of new threats.
  5. Improve Moderator Efficiency
    • If reports show uneven workloads, consider better reviewer assignment rules.
    • Use tagging and notes to capture edge cases that should feed back into policy adjustments.
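To make the threshold and severity tuning in items 1 and 3 concrete, here is a minimal sketch in Python. The category names, confidence thresholds, and action labels are illustrative assumptions, not Stream's configuration API; in practice you would adjust the equivalent settings in the Stream dashboard or SDK.

```python
# Illustrative sketch only: the category names, thresholds, and action labels
# are assumptions, not Stream's actual configuration API. In practice these
# mappings are set in the Stream dashboard or via its moderation SDK.

from dataclasses import dataclass

@dataclass
class PolicyRule:
    action: str            # "block", "flag", "bounce", or "auto-approve"
    min_confidence: float   # only act when the model is at least this confident

# Example mapping: raise thresholds where false positives are common,
# and keep critical harms at aggressive, low-threshold blocking.
POLICY = {
    "self_harm":  PolicyRule(action="block", min_confidence=0.50),
    "harassment": PolicyRule(action="flag",  min_confidence=0.85),  # was "block"; moderators kept overriding
    "spam":       PolicyRule(action="bounce", min_confidence=0.70),
    "profanity":  PolicyRule(action="auto-approve", min_confidence=1.01),  # effectively ignored
}

def decide(category: str, confidence: float) -> str:
    """Return the moderation action for a single AI classification."""
    rule = POLICY.get(category)
    if rule is None or confidence < rule.min_confidence:
        return "auto-approve"
    return rule.action

print(decide("harassment", 0.80))  # auto-approve: below the raised threshold
print(decide("self_harm", 0.60))   # block: critical harms stay aggressive
```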
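For the detection gaps in item 2, an expanded regex blocklist might look like the sketch below. The patterns are examples only, not a vetted production list; tune them to the abuse you actually observe.

```python
# Illustrative regex blocklist for spam/scam patterns slipping past AI filters.
import re

SPAM_PATTERNS = [
    re.compile(r"(?i)\bfree\s+crypto\b"),
    re.compile(r"(?i)\bdm\s+me\s+for\s+prizes?\b"),
    re.compile(r"(https?://\S+)(\s+\1){2,}"),   # same link repeated 3+ times
    re.compile(r"(.)\1{9,}"),                   # a single character repeated 10+ times
]

def matches_spam(text: str) -> bool:
    return any(p.search(text) for p in SPAM_PATTERNS)

print(matches_spam("FREE CRYPTO!!! dm me for prizes"))  # True
print(matches_spam("See you at the meetup tonight"))    # False
```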
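For the emerging risks in item 4, even a simple week-over-week comparison of category counts exported from your reports can surface surges. The doubling ratio and noise floor below are assumptions to tune for your platform.

```python
# Illustrative surge check over weekly category volumes exported from reports.
# The ratio and minimum-count threshold are assumptions, not Stream defaults.

def find_surges(last_week: dict[str, int], this_week: dict[str, int],
                ratio: float = 2.0, min_count: int = 20) -> list[str]:
    """Return categories whose volume at least doubled and clears a noise floor."""
    surging = []
    for category, count in this_week.items():
        previous = last_week.get(category, 0)
        if count >= min_count and count >= ratio * max(previous, 1):
            surging.append(category)
    return surging

last_week = {"spam": 300, "harassment": 80, "scam": 15}
this_week = {"spam": 320, "harassment": 85, "scam": 60}
print(find_surges(last_week, this_week))  # ['scam']
```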

Example Scenarios

  • Spam Overload: Reporting shows 70% of flagged items are spam.
    • Add regex for repeated patterns and auto-block known phrases to reduce manual load.
  • Harassment False Positives: Moderators often unblock flagged “teasing” comments.
    • Rewrite the harassment prompt to better distinguish friendly banter from abuse.
  • Missed Self-Harm Signals: Logs show moderators manually flagging self-harm posts that AI missed.
    • Lower detection thresholds for self-harm or add a more explicit harm label.
  • Policy Drift: Reporting shows one moderator frequently overrides “hate speech” flags.
    • Review policies with the team to clarify boundaries and re-align enforcement.

Best Practices

  • Review Reports Weekly: Don’t let trends go unnoticed.
  • Pair Quantitative + Qualitative: Use numbers (volume, overrides) and moderator notes together.
  • Iterate in Staging: Test policy refinements in a dev environment before rolling to production.
  • Document Changes: Keep a changelog of rule and threshold updates so you can measure their impact (a minimal entry format is sketched after this list).
  • Feedback Loop: Encourage moderators to flag edge cases with notes so policy adjustments are evidence-based.
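A lightweight way to keep that changelog is one structured entry per rule or threshold update. The field names and file format below are an assumption for illustration, not a Stream feature.

```python
# Illustrative policy changelog entry; field names are assumptions, not a
# Stream feature. Appending one JSON line per change keeps history diff-able.
import json
import datetime

entry = {
    "date": datetime.date.today().isoformat(),
    "category": "harassment",
    "change": "raised min_confidence from 0.75 to 0.85; action Block -> Flag",
    "reason": "moderators overrode ~40% of harassment blocks in the last month",
    "author": "trust-and-safety team",
}

with open("policy_changelog.jsonl", "a") as f:
    f.write(json.dumps(entry) + "\n")
```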

Insights aren’t just numbers; they’re signals. By interpreting moderation data, you can reduce false positives, close gaps, catch new risks early, and keep your policies aligned with both your community standards and evolving online threats.

Next, we’ll dive into admin settings for system tuning, exploring the key global configurations that shape how Stream’s moderation system behaves, from default preferences and queue behavior to notifications, permissions, and templates.