Advanced Filters

LAST EDIT Oct 01 2024

Advanced Filters can be used to filter out and moderate chat messages using different types of rules. Currently, we support domain, email and regex filters. Each app can configure up to 15 different filters via the Stream dashboard under the ‘Advanced Filters’ section, or through the API.

Domain Filters

Blocklists of type domain filter web domains by matching the suffix of the domain name of any URL found in a message.

The URL scheme (e.g. http or https) and any part of the URL after the domain name are ignored.

For example, with a domain filter such as gmail.com, the following messages will be flagged or blocked:

  • http://www.gmail.com/

  • https://gmail.com/index.html

  • http://support.gmail.com/docs?language=english

  • https://yet.another.subdomain.gmail.com/

Note: for convenience, a domain filter like www.gmail.com will match both http://www.gmail.com and http://gmail.com.

A narrower domain filter containing one or more sub-domains, such as messenger.facebook.com, will flag or block the following messages:

  • https://messenger.facebook.com/index.html

  • http://download.messenger.facebook.com/

But it won’t flag or block the following messages:

  • http://facebook.com/

  • http://www.facebook.com/index.html
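
The matching behaviour described above can be approximated with a short Python sketch. This is only an illustration, not Stream's implementation: the function names are hypothetical, and stripping a leading "www." from the filter is an assumption based on the note above.

```python
import re
from urllib.parse import urlparse

# Rough URL finder for the sketch; real URL detection is more involved.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def normalise_filter(domain_filter: str) -> str:
    """Lower-case the filter and drop a leading 'www.' (assumed behaviour)."""
    domain = domain_filter.lower().strip(".")
    return domain[4:] if domain.startswith("www.") else domain


def host_matches(host: str, domain_filter: str) -> bool:
    """True if host equals the filter or is a sub-domain of it."""
    suffix = normalise_filter(domain_filter)
    host = host.lower().rstrip(".")
    return host == suffix or host.endswith("." + suffix)


def message_matches(message: str, domain_filter: str) -> bool:
    """Check every URL in the message; scheme and path are ignored."""
    for url in URL_PATTERN.findall(message):
        host = urlparse(url).hostname
        if host and host_matches(host, domain_filter):
            return True
    return False
```

Comparing on a "." boundary is what keeps a filter such as gmail.com from matching an unrelated host like notgmail.com.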

Setup Example

Email Filters

Exact email addresses can be flagged or blocked using blocklists of type email.
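
As a rough illustration of exact matching, the Python sketch below extracts email addresses from a message and keeps only those listed in the filter. The function name, the simplified address pattern and the case-insensitive comparison are all assumptions made for this example, not documented behaviour.

```python
import re

# Simplified address pattern, good enough for a sketch.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def blocked_emails(message: str, email_filter: set[str]) -> list[str]:
    """Return the addresses in the message that exactly match a filter entry."""
    blocked = {entry.lower() for entry in email_filter}  # assumed case-insensitive
    return [found for found in EMAIL_PATTERN.findall(message) if found.lower() in blocked]
```

With a filter containing only info@hotmail.com, a message mentioning sales@hotmail.com is not caught by this check.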

To flag or block all email addresses under the domain name hotmail.com, a domain filter should be used. Specific email addresses like info@hotmail.com can be added to an email filter. However, an email filter for a specific address won’t affect messages containing other email addresses.

Setup Example

Regex Filters

Regular expression filters let users create custom filters by defining rules based on complex patterns. Unlike word filters, a regex filter allows for precise and versatile matching of textual input. Regex filters work by pattern matching, so rules can go beyond the simple word matching found in regular blocklists and capture nuanced patterns in the text, providing a high level of customisation.

Here are best practices and considerations to maximise the effectiveness of your regex filters:

  • Consider using a “word” type of blocklist for single words, a domain type of blocklist for URLs and an email type of blocklist for specific email addresses.

  • A regex filter can contain up to 100 rules.

  • Each regex rule can be up to 60 characters long.

  • Wildcard regex patterns like .* are not allowed since they will match most messages.
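
The limits above can be checked locally before rules are submitted. The following Python sketch is hypothetical: the function name is made up, the wildcard check is a plain substring test for ".*", and Python's regex dialect may differ from the one used by the service.

```python
import re

MAX_RULES = 100        # limit stated above
MAX_RULE_LENGTH = 60   # limit stated above


def validate_rules(rules: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(rules) > MAX_RULES:
        problems.append(f"too many rules: {len(rules)} > {MAX_RULES}")
    for rule in rules:
        if len(rule) > MAX_RULE_LENGTH:
            problems.append(f"rule too long ({len(rule)} chars): {rule!r}")
        # Rough check: this also rejects an escaped dot followed by "*".
        if ".*" in rule:
            problems.append(f"wildcard pattern not allowed: {rule!r}")
        try:
            re.compile(rule)
        except re.error as exc:
            problems.append(f"invalid regex {rule!r}: {exc}")
    return problems
```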

For example, a case-insensitive regex filter such as (?i)stream will flag or block the following messages:

  • best chat company provider in the world is getstream for sure!

  • i heard strava started using stream's chat messaging system

  • i will StReaM my gaming content tonight on twitch

  • stream's ai moderation feature is neat
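
The example can be reproduced with Python's re module, which supports the same (?i) inline flag. This is only a local check of the pattern; the last message in the list is an extra non-matching sample added for contrast.

```python
import re

rule = re.compile(r"(?i)stream")

messages = [
    "best chat company provider in the world is getstream for sure!",
    "i heard strava started using stream's chat messaging system",
    "i will StReaM my gaming content tonight on twitch",
    "stream's ai moderation feature is neat",
    "the river was calm today",  # extra sample, should not match
]

# search() looks for the pattern anywhere in the message.
flagged = [message for message in messages if rule.search(message)]
```

Because the rule is not bounded by word boundaries, it also matches inside longer words such as "getstream".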

Considerations when building regex filters

  1. Expression Clarity:

    Crafting regex expressions should prioritise clarity to avoid confusion. Complex or overly cryptic expressions may be challenging to understand and maintain. Aim for expressive yet concise regex patterns.

  2. Handling Variable Length Text:

    Regex filters can efficiently handle variable length text, making them suitable for scenarios where the length of the matched content may vary. This flexibility allows for capturing diverse patterns within messages.

  3. Managing Escaping Characters:

    Special characters within regex patterns must be escaped with a preceding backslash (\) to be matched literally. The special characters that require escaping are: ., +, *, ?, ^, $, (, ), [, ], {, }, |, \. For example, \. matches ".", \+ matches "+", and \( matches "(". The same principle applies to the backslash itself: \\ matches "\".

  4. Balancing Specificity and Generality:

    Achieving the right balance between a specific and a general regex pattern is crucial. Overly specific patterns may miss variations, while overly general patterns may result in false positives. Examples of overly broad rules are .* (matches any sequence of characters), .a (matches any character followed by “a”) and ah* (matches both “a” and “ahhhhh”). Such simple and broad patterns increase the chance of false positives: .a, for example, will match almost any word containing the letter “a”.
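
Considerations 3 and 4 can be tried out locally. The sketch below uses Python's re module, whose escaping rules match those described above; the helper false_positive_count and the sample messages are hypothetical and only serve the illustration.

```python
import re

# Consideration 3: an unescaped "." matches any character,
# while an escaped "\." matches only a literal dot.
unescaped = re.compile(r"example.com")
escaped = re.compile(r"example\.com")

# re.escape produces the escaped form of a literal string automatically.
price = re.compile(re.escape("$5.00 (today only)"))


# Consideration 4: count how many harmless messages a rule would flag.
def false_positive_count(rule: str, harmless: list[str]) -> int:
    pattern = re.compile(rule)
    return sum(1 for message in harmless if pattern.search(message))


harmless_messages = ["what a great game", "thanks a lot", "see you later"]
broad_hits = false_positive_count(r".a", harmless_messages)
specific_hits = false_positive_count(r"\bprofit\b", harmless_messages)
```

Running a candidate rule against a sample of known-harmless messages like this is a cheap way to spot patterns that are too general before they go live.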

Optimising for Performance:

Complex regex patterns may impact processing speed. The more you know about the data being searched, the better you can optimise the regex for performance and for your success criteria. Consider the following tips as general guidelines when optimising regex rules:

Order of alternations:

Place the most likely regex matches first. For example, if web address domains are to be matched, use \.(?:com|net|biz)\b instead of \.(?:biz|net|com)\b, as com is the most likely match in the list.
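
If match frequencies are known, the ordering can be generated rather than written by hand. The Python sketch below is a hypothetical helper; the function name and the counts are invented for illustration.

```python
import re


def build_domain_suffix_rule(tld_counts: dict[str, int]) -> str:
    """Build an alternation with the most frequently seen suffix first."""
    ordered = sorted(tld_counts, key=tld_counts.get, reverse=True)
    alternatives = "|".join(re.escape(tld) for tld in ordered)
    return r"\.(?:" + alternatives + r")\b"


# Made-up frequencies: "com" was seen most often, "biz" least often.
observed = {"biz": 3, "com": 120, "net": 17}
rule = build_domain_suffix_rule(observed)
```

Both orderings flag the same messages; the ordering only changes how many alternatives the engine tries before it finds a match.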

Usage of anchors:

Use anchors wherever possible, particularly the start-of-string and end-of-string anchors ^ and $. They can save the regex a lot of backtracking in cases where the match is bound to fail: when the anchors indicate that the pattern must be found at the beginning or end of the string, the regex engine can quickly “fail” if the expected pattern is not found at the anchor position.

Example: ^(?:word1|word2) is preferred to ^word1|^word2
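
A quick local check in Python confirms that the two forms accept and reject the same inputs, so the factored form can be used as a drop-in replacement. The sample strings are invented for this illustration.

```python
import re

factored = re.compile(r"^(?:word1|word2)")
repeated = re.compile(r"^word1|^word2")

samples = [
    "word1 at the start",
    "word2 at the start",
    "contains word1 later",
    "",
]

# One (factored, repeated) pair of booleans per sample.
results = [(bool(factored.search(s)), bool(repeated.search(s))) for s in samples]
```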

Usage of non-capturing groups:

A non-capturing group, written (?:...), groups multiple tokens together without creating a capture group, so the engine does not have to store the matched text.
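
The difference is visible in Python's re module, where a compiled pattern reports its number of capture groups. The pattern text is only an example.

```python
import re

capturing = re.compile(r"(earn|make) money")
non_capturing = re.compile(r"(?:earn|make) money")

text = "make money today"

# Both forms match the same text...
same_match = capturing.search(text).group(0) == non_capturing.search(text).group(0)

# ...but only the capturing form stores the word that matched.
capturing_groups = capturing.groups
non_capturing_groups = non_capturing.groups
```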

Lazy quantifiers:

Greedy quantifiers can often be replaced with lazy quantifiers, which may give the regex a performance boost without changing whether a message matches.

  • Greedy Quantifier (*):

    • May lead to backtracking if the subsequent part of the pattern doesn't match immediately.

    • Matches as much as possible.

  • Lazy Quantifier (*?):

    • Matches as little as possible.

    • Tries to satisfy the overall pattern with minimal consumption.

In scenarios where the matched content is relatively small or close to the beginning of the text, using a lazy quantifier can avoid unnecessary processing and potentially result in faster regex execution.

For example, suppose a spam message such as “Heyyyyyyyyyyyy! Check out this amazinggggggggggg offer!!!!!!” is posted repeatedly, with extra “y” and “g” characters added each time to bypass the filters:

  • Greedy: (.)\1+ - will match "yyyyyyyyyyy" and "ggggggggggg"

  • Lazy: (.)\1+? - will match "yy" and "gg"

With the lazy quantifier, the match stops as soon as a single repeated character is found instead of consuming the whole run. The message is still flagged, but the regex does less work on long repeated sequences.
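
The difference between the two quantifiers can be observed with Python's re module, which supports backreferences such as \1. The message below is built programmatically so that the run lengths are explicit; it is a stand-in for the spam example, not the exact text.

```python
import re

greedy = re.compile(r"(.)\1+")
lazy = re.compile(r"(.)\1+?")

# Six "y" characters and six "g" characters, mimicking the spam message.
message = "He" + "y" * 6 + "! Check out this amazin" + "g" * 6 + " offer"

# First match of each pattern: the greedy form consumes the whole run,
# the lazy form stops after the first repeated character.
greedy_first = greedy.search(message).group(0)
lazy_first = lazy.search(message).group(0)

# Either way, the message is detected.
both_detect = bool(greedy.search(message)) and bool(lazy.search(message))
```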

Building Effective Custom Regex Filters

Specific Patterns and Context:

Develop regex filters with specific patterns in mind, tailoring them to focus on distinct content types or language patterns that are pertinent to your community. Examples include:

  1. \bprofit\b: Matches the word "profit" as a whole word.

  2. @[A-Za-z0-9_]+: Matches Twitter-like usernames.

  3. (?i)hate: Matches “hate” in any letter case, including inside longer words.

Strive for patterns that align with your specific use case, as these are the most effective at capturing potential violations.
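
The three patterns can be tested locally with Python's re module before they are added to a filter. The sample messages are invented, and the stricter variant of the third pattern with \b word boundaries is a suggestion for this sketch, not part of the list above.

```python
import re

whole_word = re.compile(r"\bprofit\b")
username = re.compile(r"@[A-Za-z0-9_]+")
hate_anywhere = re.compile(r"(?i)hate")
hate_word_only = re.compile(r"(?i)\bhate\b")  # stricter variant, an assumption

# \b keeps "profit" from matching inside "profitable".
profit_hit = bool(whole_word.search("huge profit today"))
profitable_hit = bool(whole_word.search("a profitable idea"))

# findall returns every username-like token in the message.
mentions = username.findall("ping @stream_dev and @bob42!")

# Without word boundaries, "hate" is also found inside "whatever".
loose_false_positive = bool(hate_anywhere.search("whatever works"))
strict_false_positive = bool(hate_word_only.search("whatever works"))
strict_true_positive = bool(hate_word_only.search("I HATE mondays"))
```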

Exploring Variations with Regex:

Regex filters provide flexibility to account for variations in content. Include variations in your regex rules to enhance coverage. For example:

  1. (earn|make) money fast: Matches “earn money fast” and “make money fast”.

  2. (\d{1,3},)*\d{3}: Matches numbers with or without commas (e.g., 1,000 or 1000).
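
Both rules can be tried out with Python's re module. The sample messages are invented for this sketch.

```python
import re

money_rule = re.compile(r"(earn|make) money fast")
number_rule = re.compile(r"(\d{1,3},)*\d{3}")

# Either verb satisfies the alternation.
earn_hit = bool(money_rule.search("earn money fast from home"))
make_hit = bool(money_rule.search("you can make money fast"))
other_hit = bool(money_rule.search("save money fast"))

# The number rule accepts digits with or without thousands separators.
with_commas = number_rule.search("won 1,000,000 dollars").group(0)
without_commas = bool(number_rule.search("won 1000 dollars"))
too_short = bool(number_rule.search("won 12 dollars"))
```

Note that the number rule needs at least three consecutive digits, so short numbers such as 12 are not matched.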

Combine with other Filters or Harm Engines:

Enhance your filtering strategy by combining regex filters with our other filters. This lets you address a broader range of content while maintaining precision. For instance, specific words, emails, or domains can be added to the Advanced Filters alongside regex rules, and combining any of our harm engines with well-crafted regex patterns can significantly increase the number of messages that are moderated.

Setup Example
