Advanced Moderation Beta

LAST EDIT Oct 08 2021
Our advanced moderation Beta has finished and these docs are deprecated. Refer to the current Advanced Moderation docs.

Summary 

Advanced Chat Moderation adds two main features to Stream’s existing chat moderation feature set: improved AI Moderation and our Karma System. These features complement the existing chat moderation dashboard, image moderation, and blocklist functionality.

This page will cover the existing features in the moderation dashboard and the new advanced moderation features.

Existing Moderation Features

Moderation Dashboard

You can access the latest version of our moderation dashboard within v2 of your Stream dashboard. Log in to the dashboard, then follow the link in the banner at the top to switch to v2.

Once you are in the v2 dashboard, select your app, then navigate to Chat > Moderation.

The moderation dashboard has three tabs for flagged, blocked, and reviewed messages, respectively. Flagged messages are the most urgent, because you may want to act on them immediately by deleting messages or banning users. Blocked messages are worth reviewing to calibrate your automated moderation settings: if the AI moderation or blocklist is blocking messages that you do not want blocked, relax your AI thresholds or update your blocklist. Finally, the reviewed tab lets you look back at past moderation decisions.

Moderation Dashboard Access

The Stream dashboard supports creating users with a "moderator" role, which grants read-only permissions for all sections of the dashboard except the moderation dashboard, where the user has full admin permissions to delete messages, ban users, etc.

Moderator Role Selection

Blocklist

The Blocklist feature lets you upload a blocklist of keywords for moderation, or use a default one. Messages containing any of these keywords can be automatically blocked or flagged, as you see fit. You can manage your blocklists from the chat overview screen.

Scroll down on the chat overview screen and you’ll find the blocklist management settings. 

Here you can add or remove blocklists, view the contents of each blocklist, and add new words to existing lists. Once you’ve created the blocklists you plan to use, you’ll need to configure a channel type to use one of them. To do this, open the settings for a channel type (the top of your chat overview screen lists all of your channel types). In the channel type settings, scroll down to the Blocklist section.

You’ll need to enable the feature, select a blocklist, and decide whether to flag or block messages. If you are using the default blocklist (profanity_en_2020_v1), we recommend flagging only, as this blocklist is very aggressive. You can review the contents of the default list, but you cannot change them.
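The flag-or-block choice can be pictured with a small sketch. This is not Stream’s implementation: the helper name `check_blocklist`, the whole-word, case-insensitive matching rule, and the sample words are all assumptions made for illustration.

```python
import re
from typing import Iterable


def check_blocklist(message: str, blocklist: Iterable[str], behavior: str = "flag") -> str:
    """Return "allow", "flag", or "block" for a message.

    Hypothetical helper: whole-word, case-insensitive matching is an
    assumption, not documented Stream behavior.
    """
    if behavior not in ("flag", "block"):
        raise ValueError("behavior must be 'flag' or 'block'")
    # Split the message into lowercase word tokens.
    words = set(re.findall(r"[\w']+", message.lower()))
    # Any overlap with the blocklist triggers the configured behavior.
    if words & {entry.lower() for entry in blocklist}:
        return behavior
    return "allow"


# Example with a made-up one-word blocklist.
result = check_blocklist("You are a Jerk!", ["jerk"], behavior="flag")
```

With an aggressive list, choosing `"flag"` keeps the message visible while still sending it to moderators for review, which is the trade-off behind the recommendation above.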

Advanced Chat Moderation Features

AI Moderation

AI moderation is powered by a machine learning model that assigns each message a confidence score between 0 and 1 in each of three categories: Spam, Explicit, and Toxic. In addition to enabling AI moderation, you must set thresholds for each category, telling the system when to flag and when to block messages based on the model’s scores. You can enable and configure AI moderation from the channel type settings screen (found on the chat overview screen).

You’ll see an example configuration above. These settings might make sense for an adult community, where explicit language does not need to be moderated but toxic speech and spam do. In this example, the flag threshold for toxic speech (.65) is lower than the flag threshold for spam (.75), so the moderator wants to flag potential toxic speech for review more aggressively than spam. The block thresholds for spam and toxic speech are both set to .95, so messages in either category are blocked at the same point: when the model is 95% confident that the message is spam or toxic. Explicit content is set to never flag and never block messages.
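The thresholds in this example can be read as a simple decision rule. The sketch below is an illustration only: the `decide` function, the dictionary layout, the use of `>=` for comparisons, and the rule that the strictest triggered action wins are assumptions, not Stream’s documented behavior.

```python
from typing import Dict, Optional

# Thresholds from the example configuration; None means "never".
THRESHOLDS: Dict[str, Dict[str, Optional[float]]] = {
    "spam": {"flag": 0.75, "block": 0.95},
    "toxic": {"flag": 0.65, "block": 0.95},
    "explicit": {"flag": None, "block": None},
}


def decide(scores: Dict[str, float],
           thresholds: Dict[str, Dict[str, Optional[float]]] = THRESHOLDS) -> str:
    """Return "block", "flag", or "allow" (hypothetical helper)."""
    action = "allow"
    for category, score in scores.items():
        limits = thresholds.get(category, {})
        block_at = limits.get("block")
        flag_at = limits.get("flag")
        # A block in any category ends the check immediately.
        if block_at is not None and score >= block_at:
            return "block"
        # Otherwise remember that at least one category asked for a flag.
        if flag_at is not None and score >= flag_at:
            action = "flag"
    return action


# Made-up scores: toxic crosses its flag threshold, explicit is ignored.
outcome = decide({"spam": 0.10, "toxic": 0.70, "explicit": 0.99})
```

Here the high explicit score has no effect because both of its thresholds are disabled, while the toxic score of 0.70 falls between the flag (.65) and block (.95) thresholds.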

Testing and Calibration

To get a feel for how the AI model scores messages, use the “Test Messages” button in the top right corner of the AI moderation settings panel (on the Channel Type settings screen). Clicking it opens a modal window where you can enter as many test messages as you like and see the scores the model gives each message in each category. The responses are also color coded to show how each message would be handled under your currently configured thresholds.

As in the example above, it’s not uncommon for messages to score highly in multiple categories. The lewd message in that example not only contains explicit language but is also phrased as an attack on someone or something, so it has a high toxic score as well. (Technically speaking, this system is a `Multi-label classifier`, not a `Multi-class classifier`.)
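The difference between the two kinds of classifier can be shown with a little arithmetic. The sketch below assumes a common setup in which a multi-label model applies an independent sigmoid to each category and a multi-class model applies a softmax across categories; the source does not say which functions Stream’s model uses, and the logits are made up.

```python
import math
from typing import List


def sigmoid_scores(logits: List[float]) -> List[float]:
    """Multi-label: each category is scored independently in (0, 1)."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def softmax_scores(logits: List[float]) -> List[float]:
    """Multi-class: categories compete and the scores sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for (spam, explicit, toxic); not real model output.
logits = [-2.0, 3.0, 2.5]
multi_label = sigmoid_scores(logits)
multi_class = softmax_scores(logits)
```

With these logits, both the explicit and toxic scores exceed 0.9 under the independent sigmoids, whereas under softmax at most one category can ever exceed 0.5. That is why a single message can score highly as both explicit and toxic in a multi-label system.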

It may also be useful to note that this AI moderation system is based on a deep learning algorithm. There are no clear heuristics deciding how messages are scored. Instead, a relational network of information maps training data into a complex “brain” that generates confidence scores for new messages based on the messages it was trained on.

In other words, each score is a complex amalgamation of many data points. For the curious reader, the model was built like this:

  1. We use a pre-trained, open-source, multilingual DistilBERT model, trained on Wikipedia text in 104 languages.

  2. We selected 16 public datasets containing millions of examples of hate speech, spam, and toxic text.

  3. We added a classification layer to the DistilBERT model so that it learns to classify a message as spam, toxic, and explicit.

Additional Moderation Rules

Moderators and Admins are automatically exempted from all AI moderation and Blocklist flagging/blocking. 

Messages sent from server-side SDKs are exempted from all AI moderation. 
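These two exemption rules can be summarized in a short sketch. The function name `checks_to_run`, the role strings, and the check labels are hypothetical. The text above only says that server-side messages skip AI moderation, so keeping the blocklist check for them is an assumption here.

```python
from typing import Set

# Role names are assumed for illustration.
EXEMPT_ROLES = {"moderator", "admin"}


def checks_to_run(role: str, sent_server_side: bool) -> Set[str]:
    """Return which automated checks apply to a message (hypothetical helper)."""
    # Moderators and admins skip both AI moderation and the blocklist.
    if role in EXEMPT_ROLES:
        return set()
    checks = {"ai_moderation", "blocklist"}
    if sent_server_side:
        # Server-side messages skip AI moderation. Whether the blocklist
        # still applies is not stated in the source; it is kept here as
        # an assumption.
        checks.discard("ai_moderation")
    return checks


regular_user_checks = checks_to_run("user", sent_server_side=False)
```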

Limitations

AI moderation is a powerful tool that helps flag offensive content and lets a small team of moderators be more productive than they would be on their own. However, what counts as offensive depends on context and changes over time. While we have tried to create as inclusive and comprehensive a base list as possible, we're not the ultimate authority on objectionable content.

Please don’t hesitate to email us with any concerns around what is and isn’t being caught by our AI moderation system.