Advanced Rule Builder
The Rule Builder is an advanced moderation feature that automatically takes action against users based on their behavior patterns. Instead of moderating each message individually, the Rule Builder tracks user violations over time and triggers actions when users reach certain thresholds.
How It Works
The Rule Builder monitors user behavior across all your moderation tools and automatically responds when users violate your community guidelines repeatedly. For example:
- Spam Detection: Ban users who send 5+ spam messages within 1 hour
- Toxic Behavior: Flag users who post 3+ hate speech messages within 24 hours
- New User Abuse: Shadow ban new accounts that violate rules within their first day
- Content Frequency: Flag users who post 50+ messages within 1 hour
Key Benefits
- Automated Response: No manual intervention needed for common violation patterns
- Flexible Rules: Create custom rules that match your community’s specific needs
- Time-Based Tracking: Consider user behavior over time, not just individual messages
- Multiple Actions: Choose from ban, flag, shadow ban, or other actions
- Real-Time Processing: Rules are evaluated immediately as users post content
Creating Your First Rule
Dashboard
You can create rules in the dashboard by navigating to Moderation > Policies > [Select a policy] > Rule Builder.
API
You can create rules using the UpsertConfig API.
await client.moderation.upsertConfig({
  key: "chat:messaging",
  rule_builder_config: {
    enabled: true,
    rules: [
      {
        id: "spam-detection",
        name: "Spam Detection",
        rule_type: "user",
        enabled: true,
        cooldown_period: "24h",
        conditions: [
          {
            type: "text_rule",
            text_rule_params: {
              threshold: 5,
              time_window: "1h",
              harm_labels: ["SPAM", "ADS"],
            },
          },
        ],
        logic: "AND",
        action: {
          type: "ban_user",
          ban_options: {
            duration: 3600,
            reason: "Spam behavior detected",
            shadow_ban: false,
            ip_ban: false,
          },
        },
      },
    ],
  },
});
Basic Rule Structure
Every rule has three main parts:
- Conditions: What behavior to watch for
- Threshold: How many violations before taking action
- Action: What to do when the threshold is reached
Example: Spam Detection Rule
{
  "id": "spam-detection",
  "name": "Spam Detection",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "24h",
  "conditions": [
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 5,
        "time_window": "1h",
        "harm_labels": ["SCAM", "PLATFORM_BYPASS"]
      }
    },
    {
      "type": "content_count_rule",
      "content_count_rule_params": {
        "threshold": 50,
        "time_window": "1h"
      }
    }
  ],
  "logic": "OR",
  "action": {
    "type": "ban_user",
    "ban_options": {
      "duration": 3600,
      "reason": "Spam behavior detected",
      "shadow_ban": false,
      "ip_ban": false
    }
  }
}
This rule:
- Watches for scam and platform-bypass content
- Triggers when a user posts 5+ spam messages within 1 hour or 50+ messages within 1 hour
- Bans the user for 1 hour when triggered
User-Type Rules (Track User Behavior Over Time)
User-type rules track violations across multiple pieces of content and trigger actions when users reach certain thresholds over time.
Text-Based Rules
Track violations in text content like messages, comments, or posts over time.
Available Labels:
- Harassment: SEXUAL_HARASSMENT, MORAL_HARASSMENT, BULLYING
- Hate Speech: RACISM, HOMOPHOBIA, MISOGYNY, ABLEISM
- Threats: THREAT, TERRORISM, SELF_HARM
- Inappropriate Content: SEXUALLY_EXPLICIT, DRUG_EXPLICIT, WEAPON_EXPLICIT
- Spam: SCAM, ADS, FLOOD
Example:
{
  "type": "text_rule",
  "text_rule_params": {
    "threshold": 3,
    "time_window": "24h",
    "harm_labels": ["HATE_SPEECH", "THREAT"],
    "severity": "HIGH"
  }
}
This rule tracks how many hate speech or threat messages a user posts within 24 hours and triggers when they reach 3 violations.
Image-Based Rules
Track violations in uploaded images over time.
Available Labels:
- Explicit Content: Explicit, Non-Explicit Nudity
- Violence: Violence, Visually Disturbing
- Inappropriate: Drugs & Tobacco, Alcohol, Rude Gestures
- Hate Symbols: Hate Symbols
Example:
{
  "type": "image_rule",
  "image_rule_params": {
    "threshold": 1,
    "time_window": "24h",
    "harm_labels": ["Explicit", "Violence"]
  }
}
This rule tracks how many explicit or violent images a user uploads within 24 hours and triggers when they reach 1 violation.
User-Based Rules
Check user account properties.
Example:
{
  "type": "user_rule",
  "user_rule_params": {
    "max_age": "24h"
  }
}
This condition is true for users who created their account within the last 24 hours.
Content Count Rules
Track how many messages a user posts over time.
Example:
{
  "type": "content_count_rule",
  "content_count_rule_params": {
    "threshold": 50,
    "time_window": "1h"
  }
}
This triggers when a user posts 50+ messages within 1 hour.
Available Actions
These actions affect the user account and are typically used with user-type rules that track behavior over time.
Ban User
Temporarily or permanently ban a user from your platform.
{
  "type": "ban_user",
  "ban_options": {
    "duration": 86400,
    "reason": "Multiple violations detected",
    "shadow_ban": false,
    "ip_ban": false
  }
}
Options:
- duration: Ban length in seconds (0 = permanent)
- reason: Reason shown to moderators
- shadow_ban: User can post but content is hidden
- ip_ban: Also ban the user's IP address
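For example, combining these options, a permanent shadow ban that also blocks the offender's IP could look like this (illustrative values, not taken from the guide):

```json
{
  "type": "ban_user",
  "ban_options": {
    "duration": 0,
    "reason": "Repeat offender",
    "shadow_ban": true,
    "ip_ban": true
  }
}
```

Because duration is 0, the ban never expires; because shadow_ban is true, the user can keep posting without realizing their content is hidden.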
Flag User
Create a review item for manual moderator review of the user.
{
  "type": "flag_user",
  "flag_user_options": {
    "reason": "Suspicious behavior pattern detected"
  }
}
Examples
Example 1: Toxic User Detection
This rule identifies users who consistently engage in harmful behavior over time, targeting both hate speech and excessive profanity.
{
  "id": "toxic-user",
  "name": "Toxic User Detection",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "7d",
  "logic": "OR",
  "conditions": [
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 5,
        "time_window": "24h",
        "harm_labels": ["HATE_SPEECH", "HARASSMENT", "THREAT"]
      }
    },
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 50,
        "time_window": "24h",
        "blocklist_match": ["profanity_en_2020_v1"]
      }
    }
  ],
  "action": {
    "type": "ban_user",
    "ban_options": {
      "duration": 604800,
      "reason": "Toxic behavior detected",
      "shadow_ban": false,
      "ip_ban": true
    }
  }
}
What it does:
- First condition: Triggers if a user posts 5 or more messages containing hate speech, harassment, or threats within 24 hours
- Second condition: Triggers if a user posts 50 or more messages that match the profanity blocklist within 24 hours
- Logic: Uses “OR” logic, meaning either condition can trigger the rule
- Action: Bans the user for 7 days (604,800 seconds) and also bans their IP address to prevent them from creating new accounts
When to use this rule:
- Communities with strict content policies
- Platforms that need to quickly remove toxic users
- Situations where you want to prevent users from circumventing bans with new accounts
Example 2: New User Spam Protection
This rule specifically targets spam from newly created accounts, which are often used by bots or malicious users.
{
  "id": "new-user-spam",
  "name": "New User Spam Protection",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "6h",
  "logic": "AND",
  "conditions": [
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 3,
        "time_window": "1h",
        "harm_labels": ["SPAM", "ADS"]
      }
    },
    {
      "type": "user_rule",
      "user_rule_params": {
        "max_age": "24h"
      }
    }
  ],
  "action": {
    "type": "ban_user",
    "ban_options": {
      "duration": 3600,
      "reason": "New user spam detected",
      "shadow_ban": true,
      "ip_ban": false
    }
  }
}
What it does:
- First condition: Triggers if a user posts 3 or more spam or advertising messages within 1 hour
- Second condition: Only applies to users whose accounts are less than 24 hours old
- Logic: Uses “AND” logic, meaning both conditions must be true for the rule to trigger
- Action: Shadow bans the user for 1 hour (3,600 seconds), meaning they can still post but their content is hidden from other users
When to use this rule:
- Platforms with high bot activity
- Communities that want to give new users a chance but prevent immediate spam
- Situations where you want to test if a user is legitimate before fully banning them
Example 3: Message Frequency Abuse
This rule catches users who are posting too many messages too quickly, especially when combined with spam content.
{
  "id": "message-flood",
  "name": "Message Frequency Abuse",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "12h",
  "logic": "AND",
  "conditions": [
    {
      "type": "content_count_rule",
      "content_count_rule_params": {
        "threshold": 50,
        "time_window": "1h"
      }
    },
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 5,
        "time_window": "1h",
        "harm_labels": ["SCAM", "PLATFORM_BYPASS"],
        "contains_url": true
      }
    }
  ],
  "action": {
    "type": "flag_user",
    "flag_user_options": {
      "reason": "Excessive messaging with spam content"
    }
  }
}
What it does:
- First condition: Triggers if a user posts 50 or more messages within 1 hour (regardless of content)
- Second condition: Triggers if a user posts 5 or more URL-containing messages classified as scam or platform-bypass content within 1 hour
- Logic: Uses “AND” logic, meaning both conditions must be true for the rule to trigger
- Action: Flags the user for manual review by moderators instead of automatically banning them
When to use this rule:
- Communities where legitimate users might post frequently (like gaming chats)
- Situations where you want human oversight before taking action
- Platforms that want to distinguish between active users and spam bots
Example 4: Severe User Violation Pattern
This rule provides user-level action for severe violations tracked over time.
{
  "id": "severe-user-violation",
  "name": "Severe User Violation",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "30d",
  "logic": "OR",
  "conditions": [
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 1,
        "time_window": "24h",
        "harm_labels": ["TERRORISM"]
      }
    },
    {
      "type": "image_rule",
      "image_rule_params": {
        "threshold": 1,
        "time_window": "24h",
        "harm_labels": ["Violence", "Hate Symbols"]
      }
    }
  ],
  "action": {
    "type": "ban_user",
    "ban_options": {
      "duration": 0,
      "reason": "Severe content violation",
      "shadow_ban": false,
      "ip_ban": true
    }
  }
}
What it does:
- First condition: Triggers if a user posts even 1 message containing terrorism-related content within 24 hours
- Second condition: Triggers if a user uploads even 1 image containing violence or hate symbols within 24 hours
- Logic: Uses “OR” logic, meaning either condition can trigger the rule
- Action: Permanently bans the user (a duration of 0 means permanent) and bans their IP address
When to use this rule:
- Platforms with zero-tolerance policies for certain content
- Communities that need to comply with legal requirements
- Situations where you want to permanently remove users who post severe violations
Example 5: Coordinated Behavior Detection
This rule identifies patterns that suggest coordinated abuse or bot activity.
{
  "id": "coordinated-behavior",
  "name": "Coordinated Behavior Detection",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "48h",
  "logic": "AND",
  "conditions": [
    {
      "type": "content_count_rule",
      "content_count_rule_params": {
        "threshold": 100,
        "time_window": "24h"
      }
    },
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 10,
        "time_window": "24h",
        "harm_labels": ["SPAM", "ADS", "SCAM"]
      }
    },
    {
      "type": "text_rule",
      "text_rule_params": {
        "threshold": 5,
        "time_window": "24h",
        "contains_url": true
      }
    }
  ],
  "action": {
    "type": "flag_user",
    "flag_user_options": {
      "reason": "Potential coordinated spam or bot activity"
    }
  }
}
What it does:
- First condition: Triggers if a user posts 100 or more messages within 24 hours
- Second condition: Triggers if a user posts 10 or more spam, advertising, or scam messages within 24 hours
- Third condition: Triggers if a user posts 5 or more messages containing URLs within 24 hours
- Logic: Uses “AND” logic, meaning all three conditions must be true for the rule to trigger
- Action: Flags the user for manual review by moderators
When to use this rule:
- Platforms experiencing coordinated spam attacks
- Communities that want to identify potential bot networks
- Situations where you need to investigate before taking action
Content-Type Rules (Evaluate Individual Content)
Content-type rules evaluate individual pieces of content and can trigger immediate actions or be combined with user-type rules.
Text Content Rules
Evaluate individual text content for immediate action.
Example:
{
  "type": "text_content",
  "text_content_params": {
    "harm_labels": ["TERRORISM"],
    "severity": "HIGH"
  }
}
This rule evaluates each individual text message and triggers immediately if it contains terrorism-related content.
Complete Rule Example:
{
  "id": "immediate-text-filter",
  "name": "Immediate Text Filter",
  "rule_type": "content",
  "enabled": true,
  "logic": "OR",
  "conditions": [
    {
      "type": "text_content",
      "text_content_params": {
        "harm_labels": ["TERRORISM", "THREAT"],
        "severity": "HIGH"
      }
    }
  ],
  "action": {
    "type": "block_content",
    "remove_content_options": {
      "reason": "Immediate removal of threatening content"
    }
  }
}
Image Content Rules
Evaluate individual images for immediate action.
Example:
{
  "type": "image_content",
  "image_content_params": {
    "harm_labels": ["Explicit", "Violence"]
  }
}
This rule evaluates each individual image and triggers immediately if it contains explicit or violent content.
Complete Rule Example:
{
  "id": "immediate-image-filter",
  "name": "Immediate Image Filter",
  "rule_type": "content",
  "enabled": true,
  "logic": "OR",
  "conditions": [
    {
      "type": "image_content",
      "image_content_params": {
        "harm_labels": ["Explicit", "Violence", "Hate Symbols"]
      }
    }
  ],
  "action": {
    "type": "flag_content",
    "flag_content_options": {
      "reason": "Inappropriate image content detected"
    }
  }
}
Video Content Rules
Evaluate individual videos for immediate action.
Example:
{
  "type": "video_content",
  "video_content_params": {
    "harm_labels": ["Explicit", "Violence"]
  }
}
This rule evaluates each individual video and triggers immediately if it contains explicit or violent content.
Complete Rule Example:
{
  "id": "immediate-video-filter",
  "name": "Immediate Video Filter",
  "rule_type": "content",
  "enabled": true,
  "logic": "OR",
  "conditions": [
    {
      "type": "video_content",
      "video_content_params": {
        "harm_labels": ["Explicit", "Violence"]
      }
    }
  ],
  "action": {
    "type": "block_content",
    "remove_content_options": {
      "reason": "Inappropriate video content detected"
    }
  }
}
Available Actions
These actions affect individual pieces of content and are typically used with content-type rules that evaluate content immediately.
Flag Content
Flag the specific content for manual review.
{
  "type": "flag_content",
  "flag_content_options": {
    "reason": "Content violates community guidelines"
  }
}
Block Content
Block the specific content from the platform.
{
  "type": "block_content",
  "remove_content_options": {
    "reason": "Content violates community guidelines"
  }
}
Examples
Example 1: Zero-Tolerance Content Filter
This rule provides immediate action for the most serious violations using content-type rules.
{
  "id": "zero-tolerance-filter",
  "name": "Zero-Tolerance Content Filter",
  "rule_type": "content",
  "enabled": true,
  "logic": "OR",
  "conditions": [
    {
      "type": "text_content",
      "text_content_params": {
        "harm_labels": ["TERRORISM"]
      }
    },
    {
      "type": "image_content",
      "image_content_params": {
        "harm_labels": ["Violence", "Hate Symbols"]
      }
    }
  ],
  "action": {
    "type": "block_content",
    "remove_content_options": {
      "reason": "Severe content violation"
    }
  }
}
What it does:
- First condition: Triggers immediately if the current text message contains terrorism-related content
- Second condition: Triggers immediately if the current image contains violence or hate symbols
- Logic: Uses “OR” logic, meaning either condition can trigger the rule
- Action: Immediately removes the violating content
When to use this rule:
- Platforms with zero-tolerance policies for certain content
- Communities that need immediate content filtering
- Situations where you want to remove content instantly without affecting the user account
Example 2: Multi-Content Type Filter
This rule demonstrates how to filter multiple content types using content-type rules for immediate action.
{
  "id": "multi-content-filter",
  "name": "Multi-Content Type Filter",
  "rule_type": "content",
  "enabled": true,
  "logic": "OR",
  "conditions": [
    {
      "type": "text_content",
      "text_content_params": {
        "harm_labels": ["SPAM", "SCAM"],
        "contains_url": true
      }
    },
    {
      "type": "image_content",
      "image_content_params": {
        "harm_labels": ["Explicit"]
      }
    },
    {
      "type": "video_content",
      "video_content_params": {
        "harm_labels": ["Violence"]
      }
    }
  ],
  "action": {
    "type": "flag_content",
    "flag_content_options": {
      "reason": "Multiple content violations detected"
    }
  }
}
What it does:
- First condition: Triggers immediately if the current text message contains spam/scam content AND includes a URL
- Second condition: Triggers immediately if the current image contains explicit content
- Third condition: Triggers immediately if the current video contains violent content
- Logic: Uses “OR” logic, meaning any condition can trigger the rule
- Action: Flags the content for manual review by moderators
When to use this rule:
- Platforms that want to catch multiple types of violations in a single rule
- Communities that need immediate content filtering across different media types
- Situations where you want to flag content for review rather than removing it immediately
Example 3: Spam Link Detection
This rule catches spam and phishing attempts by detecting suspicious links in messages.
{
  "id": "spam-link-detection",
  "name": "Spam Link Detection",
  "rule_type": "content",
  "enabled": true,
  "logic": "AND",
  "conditions": [
    {
      "type": "text_content",
      "text_content_params": {
        "contains_url": true
      }
    },
    {
      "type": "text_content",
      "text_content_params": {
        "blocklist_match": ["phishing_2023", "malware_links"]
      }
    }
  ],
  "action": {
    "type": "block_content",
    "remove_content_options": {
      "reason": "Suspicious URL detected"
    }
  }
}
What it does:
- First condition: Triggers if the current message contains a link
- Second condition: Triggers if the current message matches an entry in the phishing_2023 or malware_links blocklists
- Logic: Uses “AND” logic, meaning both conditions must be true for the rule to trigger
- Action: Immediately removes messages with suspicious links
When to use this rule:
- Platforms with high spam activity
- Communities that want to protect users from malicious links
- Situations where you want to automatically remove suspicious content
Key Differences
Aspect | User-Type Rules | Content-Type Rules |
---|---|---|
Evaluation Timing | Track over time, trigger when threshold reached | Evaluate immediately per content piece |
Threshold | Required (e.g., 3 violations in 24h) | Not applicable (immediate evaluation) |
Time Window | Required (e.g., “24h”, “7d”) | Not applicable |
Use Case | Pattern detection, repeated violations | Immediate content filtering |
Actions | User actions (ban_user, flag_user) | Content actions (flag_content, block_content) |
Action Selection Guidelines
- User-Type Rules: Use user actions (ban_user, flag_user) when you want to take action against the user account based on their behavior pattern
- Content-Type Rules: Use content actions (flag_content, block_content) when you want to take action against specific content pieces
- Mixed Rules: You can use any action type, but consider whether you want to affect the user or just the content
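As an illustrative sketch of a mixed rule (this exact rule does not appear elsewhere in this guide; it simply combines the text_content condition shape with the flag_user action shown above), a content-type rule can escalate to a user-level action:

```json
{
  "id": "mixed-escalation",
  "name": "Escalate User on Severe Content",
  "rule_type": "content",
  "enabled": true,
  "logic": "OR",
  "conditions": [
    {
      "type": "text_content",
      "text_content_params": {
        "harm_labels": ["THREAT"],
        "severity": "HIGH"
      }
    }
  ],
  "action": {
    "type": "flag_user",
    "flag_user_options": {
      "reason": "User posted threatening content"
    }
  }
}
```

Here each message is evaluated immediately, but the resulting action targets the account rather than the content.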
Time Windows
Specify how long to track user behavior (only applicable to user-type rules):
- "30m": 30 minutes
- "1h": 1 hour
- "24h": 24 hours
- "7d": 7 days
- "30d": 30 days
Cooldown Periods
The Rule Builder supports cooldown periods to prevent immediate re-triggering of rules after an action has been taken. This is particularly useful when users are banned and then unbanned by administrators.
When a rule with a cooldown period is triggered and an action is taken (like banning a user), the system records this action with an expiration time. During the cooldown period, the same rule will not trigger again for that user, even if they continue to violate the conditions.
Configuration
Add a cooldown_period field to your rule configuration:
{
  "id": "spam-detection",
  "name": "Spam Detection with Cooldown",
  "rule_type": "user",
  "enabled": true,
  "cooldown_period": "24h",
  "conditions": [
    ...
  ],
  "action": {
    "type": "ban_user",
    "ban_options": {
      "duration": 3600,
      "reason": "Spam behavior detected",
      "shadow_ban": false,
      "ip_ban": false
    }
  }
}
Example Scenario
- User violates rule: User posts 5 spam messages in 1 hour
- Rule triggers: User gets banned for 1 hour
- Admin unbans user: Administrator manually unbans the user
- User posts again: User immediately posts more spam messages
- Cooldown active: Rule does not trigger again due to 24-hour cooldown
- After cooldown: User can trigger the rule again after 24 hours
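Conceptually, the cooldown gate behaves like a per-user, per-rule expiry record. The sketch below is a simplified illustration of that behavior, not the actual implementation; the in-memory Map and the parseDuration helper are assumptions for the example:

```javascript
// Simplified sketch of cooldown gating: after a rule fires, record an
// expiry; the rule is not re-evaluated for that user until it passes.
const cooldowns = new Map(); // "ruleId:userId" -> expiry timestamp (ms)

// Parse the window strings used in this guide ("30m", "1h", "24h", "7d").
function parseDuration(s) {
  const units = { m: 60_000, h: 3_600_000, d: 86_400_000 };
  const match = /^(\d+)([mhd])$/.exec(s);
  if (!match) throw new Error(`unsupported duration: ${s}`);
  return Number(match[1]) * units[match[2]];
}

// Returns true when the rule may be evaluated for this user.
function shouldEvaluate(ruleId, userId, now = Date.now()) {
  const expiry = cooldowns.get(`${ruleId}:${userId}`);
  return expiry === undefined || now >= expiry;
}

// Call after the rule's action is taken to start the cooldown.
function recordTrigger(ruleId, userId, cooldownPeriod, now = Date.now()) {
  cooldowns.set(`${ruleId}:${userId}`, now + parseDuration(cooldownPeriod));
}
```

Note that an admin unban does not clear this record, which is exactly why the rule stays quiet for the rest of the cooldown in the scenario above.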
Use Cases
- Post-Ban Protection: Prevent immediate re-banning after manual unbans
- Graduated Response: Give users time to reflect before facing consequences again
- Administrative Flexibility: Allow admins to override rules without immediate re-triggering
Best Practices
Start Simple
Begin with basic rules and gradually add complexity as you understand your community’s needs.
Set Reasonable Thresholds
- Too low: May catch legitimate users
- Too high: May miss problematic behavior
- Start conservative and adjust based on results
Use Appropriate Time Windows
- Short windows (1-6 hours): Catch immediate abuse
- Medium windows (24-48 hours): Catch persistent violators
- Long windows (7-30 days): Catch chronic offenders
Configure Cooldown Periods
- Short cooldowns (1-6 hours): For minor violations where users should get another chance quickly
- Medium cooldowns (24-48 hours): For moderate violations where users need time to reflect
- Long cooldowns (7-30 days): For serious violations where users need significant time before facing consequences again
Test Your Rules
Use the test mode to verify your rules work as expected before enabling them in production.
Monitor Performance
Watch for rules that trigger too frequently or not enough, and adjust accordingly.
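If monitoring shows a rule misfiring, one option is to keep it in the configuration but switch it off via the same UpsertConfig call shown earlier. This sketch assumes that upserting replaces the stored rule list, so the full rule (conditions, logic, and action, abbreviated here) should be resubmitted alongside the changed flag:

```javascript
await client.moderation.upsertConfig({
  key: "chat:messaging",
  rule_builder_config: {
    enabled: true,
    rules: [
      {
        id: "spam-detection",
        name: "Spam Detection",
        rule_type: "user",
        enabled: false, // rule kept for later re-enabling, but no longer evaluated
        // ...conditions, logic, and action unchanged
      },
    ],
  },
});
```

Disabling rather than deleting preserves the rule's id, so historical review items and tuning notes still line up when you turn it back on.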
Common Use Cases
Gaming Communities
- Detect toxic players who harass others repeatedly
- Identify spam bots posting promotional content
- Flag users who post inappropriate content in chat
Social Platforms
- Prevent harassment campaigns against specific users
- Detect coordinated spam or bot activity
- Flag users who post explicit content
Business Applications
- Protect customer support channels from spam
- Detect fake accounts created for abuse
- Maintain professional communication standards