AI Moderation System Implementation
Overview
This document describes the implementation of a production-ready AI-powered content moderation system for the Sojorn platform. The system integrates OpenAI's Moderation API and Google Vision API to automatically analyze text and image content for policy violations.
Architecture
Components
- Database Layer - PostgreSQL tables for storing moderation flags and user status
- AI Analysis Layer - OpenAI (text) and Google Vision (image) API integration
- Service Layer - Go backend services for content analysis and flag management
- CMS Integration - Directus interface for moderation queue management
Data Flow
User Content → Go Backend → AI APIs → Analysis Results → Database → Directus CMS → Admin Review
Database Schema
New Tables
moderation_flags
Stores AI-generated content moderation flags:
CREATE TABLE moderation_flags (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
flag_reason TEXT NOT NULL,
scores JSONB NOT NULL,
status TEXT NOT NULL DEFAULT 'pending',
reviewed_by UUID REFERENCES users(id),
reviewed_at TIMESTAMP WITH TIME ZONE,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
user_status_history
Audit trail for user status changes:
CREATE TABLE user_status_history (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
old_status TEXT,
new_status TEXT NOT NULL,
reason TEXT,
changed_by UUID REFERENCES users(id),
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
Modified Tables
users
Added status column for user moderation:
ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active'
CHECK (status IN ('active', 'suspended', 'banned'));
API Integration
OpenAI Moderation API
Endpoint: https://api.openai.com/v1/moderations
Purpose: Analyze text content for policy violations
Categories Mapped:
- Hate → Hate (hate speech)
- Self-Harm → Delusion (self-harm content)
- Sexual → Hate (explicit content)
- Violence → Hate (violent content)
Example Response:
{
"results": [{
"categories": {
"hate": 0.1,
"violence": 0.05,
"self-harm": 0.0
},
"category_scores": {
"hate": 0.1,
"violence": 0.05,
"self-harm": 0.0
},
"flagged": false
}]
}
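The category mapping above can be sketched in Go as follows. The type and function names here are illustrative, not the service's actual API; the only behavior assumed is "highest score per bucket wins":

```go
package main

import "fmt"

// ThreePoisonsScore mirrors the Three Poisons framework described later
// in this document. Field names are illustrative.
type ThreePoisonsScore struct {
	Hate, Greed, Delusion float64
}

// mapOpenAIScores folds OpenAI category_scores into the Three Poisons
// buckets, keeping the highest score seen per bucket.
func mapOpenAIScores(scores map[string]float64) ThreePoisonsScore {
	var p ThreePoisonsScore
	for category, score := range scores {
		switch category {
		case "hate", "violence", "sexual":
			if score > p.Hate {
				p.Hate = score
			}
		case "self-harm":
			if score > p.Delusion {
				p.Delusion = score
			}
		}
	}
	return p
}

func main() {
	p := mapOpenAIScores(map[string]float64{
		"hate": 0.1, "violence": 0.05, "self-harm": 0.0,
	})
	fmt.Printf("hate=%.2f delusion=%.2f\n", p.Hate, p.Delusion) // prints "hate=0.10 delusion=0.00"
}
```

Note that Greed is never set here: as described below, Greed comes from keyword detection, not from the OpenAI response.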
Google Vision API
Endpoint: https://vision.googleapis.com/v1/images:annotate
Purpose: Analyze images for inappropriate content using SafeSearch
SafeSearch Categories Mapped:
- Violence → Hate (violent imagery)
- Adult → Hate (adult content)
- Racy → Delusion (suggestive content)
Example Response:
{
"responses": [{
"safeSearchAnnotation": {
"adult": "UNLIKELY",
"spoof": "UNLIKELY",
"medical": "UNLIKELY",
"violence": "UNLIKELY",
"racy": "UNLIKELY"
}
}]
}
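Google returns likelihood enums rather than numbers, so the service must convert them before applying the 0.5 flag threshold. A minimal sketch of that conversion (the numeric scale is an assumption for illustration, not taken from the implementation):

```go
package main

import "fmt"

// likelihoodScore converts a Vision SafeSearch likelihood enum into a
// rough 0..1 score so it can be compared against the 0.5 flag threshold.
// The scale here is an illustrative choice.
func likelihoodScore(likelihood string) float64 {
	switch likelihood {
	case "VERY_UNLIKELY":
		return 0.0
	case "UNLIKELY":
		return 0.25
	case "POSSIBLE":
		return 0.5
	case "LIKELY":
		return 0.75
	case "VERY_LIKELY":
		return 1.0
	default: // "UNKNOWN" or unexpected values
		return 0.0
	}
}

func main() {
	for _, l := range []string{"UNLIKELY", "LIKELY"} {
		fmt.Printf("%s -> %.2f\n", l, likelihoodScore(l))
	}
}
```

With this scale and a strict > 0.5 threshold, only LIKELY and VERY_LIKELY results trigger a flag; POSSIBLE sits exactly at the boundary and does not.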
Three Poisons Score Mapping
The system maps AI analysis results to the Buddhist "Three Poisons" framework:
Hate (Dvesha)
- Sources: OpenAI hate, violence, sexual content; Google violence, adult
- Threshold: > 0.5
- Content: Hate speech, violence, explicit content
Greed (Lobha)
- Sources: Keyword-based detection (the OpenAI Moderation API has no spam/scam category)
- Keywords: buy, crypto, rich, scam, investment, profit, money, trading, etc.
- Threshold: > 0.5
- Content: Spam, scams, financial exploitation
Delusion (Moha)
- Sources: OpenAI self-harm; Google racy content
- Threshold: > 0.5
- Content: Self-harm, misinformation, inappropriate suggestions
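Since Greed relies on keyword matching rather than an external API, it can be sketched in a few lines of Go. The per-hit weight and the short keyword list are illustrative assumptions; the production list is longer than shown here:

```go
package main

import (
	"fmt"
	"strings"
)

// greedKeywords is a subset of the spam/scam keyword list.
var greedKeywords = []string{
	"buy", "crypto", "rich", "scam", "investment", "profit", "money", "trading",
}

// greedScore gives each keyword hit an illustrative weight of 0.3,
// capped at 1.0, so two or more hits cross the > 0.5 flag threshold.
func greedScore(body string) float64 {
	lower := strings.ToLower(body)
	hits := 0
	for _, kw := range greedKeywords {
		if strings.Contains(lower, kw) {
			hits++
		}
	}
	score := float64(hits) * 0.3
	if score > 1.0 {
		score = 1.0
	}
	return score
}

func main() {
	fmt.Println(greedScore("Get rich quick with crypto trading!") > 0.5) // prints "true"
}
```

A single incidental keyword ("money" in an ordinary sentence) stays under the threshold with this weighting, which is one way to keep false positives down.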
Service Implementation
ModerationService
Key methods:
// AnalyzeContent analyzes text and media with AI APIs
func (s *ModerationService) AnalyzeContent(ctx context.Context, body string, mediaURLs []string) (*ThreePoisonsScore, string, error)
// FlagPost creates a moderation flag for a post
func (s *ModerationService) FlagPost(ctx context.Context, postID uuid.UUID, scores *ThreePoisonsScore, reason string) error
// FlagComment creates a moderation flag for a comment
func (s *ModerationService) FlagComment(ctx context.Context, commentID uuid.UUID, scores *ThreePoisonsScore, reason string) error
// GetPendingFlags retrieves pending moderation flags for review
func (s *ModerationService) GetPendingFlags(ctx context.Context, limit, offset int) ([]map[string]interface{}, error)
// UpdateFlagStatus updates flag status after review
func (s *ModerationService) UpdateFlagStatus(ctx context.Context, flagID uuid.UUID, status string, reviewedBy uuid.UUID) error
// UpdateUserStatus updates user moderation status
func (s *ModerationService) UpdateUserStatus(ctx context.Context, userID uuid.UUID, status string, changedBy uuid.UUID, reason string) error
Configuration
Environment variables:
# Enable/disable moderation system
MODERATION_ENABLED=true
# OpenAI API key for text moderation
OPENAI_API_KEY=sk-your-openai-key
# Google Vision API key for image analysis
GOOGLE_VISION_API_KEY=your-google-vision-key
Directus Integration
Permissions
The migration grants appropriate permissions to the Directus user:
GRANT SELECT, INSERT, UPDATE, DELETE ON moderation_flags TO directus;
GRANT SELECT, INSERT, UPDATE, DELETE ON user_status_history TO directus;
GRANT SELECT, UPDATE ON users TO directus;
CMS Interface
Directus will automatically detect the new tables and allow you to build:
- Moderation Queue - View pending flags with content preview
- User Management - Manage user status (active/suspended/banned)
- Audit Trail - View moderation history and user status changes
- Analytics - Reports on moderation trends and statistics
Recommended Directus Configuration
Moderation Flags Collection
- Hide technical fields (id, updated_at)
- Create custom display for scores (JSON visualization)
- Add status workflow buttons (approve/reject/escalate)
Users Collection
- Add status field with dropdown (active/suspended/banned)
- Create relationship to status history
- Add moderation statistics panel
User Status History Collection
- Read-only view for audit trail
- Filter by user and date range
- Export functionality for compliance
Usage Examples
Analyzing Content
ctx := context.Background()
moderationService := NewModerationService(pool, openAIKey, googleKey)
// Analyze text and images
scores, reason, err := moderationService.AnalyzeContent(ctx, postContent, mediaURLs)
if err != nil {
log.Printf("Moderation analysis failed: %v", err)
return
}
// Flag content if needed
if reason != "" {
err = moderationService.FlagPost(ctx, postID, scores, reason)
if err != nil {
log.Printf("Failed to flag post: %v", err)
}
}
Managing Moderation Queue
// Get pending flags
flags, err := moderationService.GetPendingFlags(ctx, 50, 0)
if err != nil {
log.Printf("Failed to get pending flags: %v", err)
return
}
// Review and update flag status
for _, flag := range flags {
flagID := flag["id"].(uuid.UUID)
err = moderationService.UpdateFlagStatus(ctx, flagID, "approved", adminID)
if err != nil {
log.Printf("Failed to update flag status: %v", err)
}
}
User Status Management
// Suspend user for repeated violations
err = moderationService.UpdateUserStatus(ctx, userID, "suspended", adminID, "Multiple hate speech violations")
if err != nil {
log.Printf("Failed to update user status: %v", err)
}
Performance Considerations
API Rate Limits
- OpenAI: 60 requests/minute for moderation endpoint
- Google Vision: 1000 requests/minute per project
Caching
Consider implementing caching for:
- Repeated content analysis
- User reputation scores
- API responses for identical content
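Because identical content yields identical API responses, a small content-hash cache avoids repeat calls. A minimal in-memory sketch (a real deployment would likely use Redis or similar, with a TTL and eviction; all names here are illustrative):

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// ThreePoisonsScore is redeclared here to keep the sketch self-contained.
type ThreePoisonsScore struct {
	Hate, Greed, Delusion float64
}

// scoreCache memoizes analysis results keyed by a SHA-256 of the content,
// so re-posted identical text never hits the AI APIs twice.
type scoreCache struct {
	mu sync.RWMutex
	m  map[[32]byte]ThreePoisonsScore
}

func newScoreCache() *scoreCache {
	return &scoreCache{m: make(map[[32]byte]ThreePoisonsScore)}
}

func (c *scoreCache) get(body string) (ThreePoisonsScore, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.m[sha256.Sum256([]byte(body))]
	return s, ok
}

func (c *scoreCache) put(body string, s ThreePoisonsScore) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[sha256.Sum256([]byte(body))] = s
}

func main() {
	cache := newScoreCache()
	cache.put("some post body", ThreePoisonsScore{Hate: 0.1})
	if s, ok := cache.get("some post body"); ok {
		fmt.Printf("cache hit: hate=%.1f\n", s.Hate) // prints "cache hit: hate=0.1"
	}
}
```

Hashing the content rather than storing it as the key also avoids keeping raw user text in the cache layer.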
Batch Processing
For high-volume scenarios:
- Queue content for batch analysis
- Process multiple items in single API calls
- Implement background workers
Security & Privacy
Data Protection
- Content sent to third-party APIs
- Consider privacy implications
- Implement data retention policies
API Key Security
- Store keys in environment variables
- Rotate keys regularly
- Monitor API usage for anomalies
Compliance
- GDPR considerations for content analysis
- Data processing agreements with AI providers
- User consent for content analysis
Monitoring & Alerting
Metrics to Track
- API response times and error rates
- Flag volume by category
- Review queue length and processing time
- User status changes and appeals
Alerting
- High API error rates
- Queue processing delays
- Unusual flag patterns
- API quota exhaustion
Testing
Unit Tests
func TestAnalyzeContent(t *testing.T) {
service := NewModerationService(pool, "test-key", "test-key")
// Placeholder keys: this test assumes the service's HTTP clients are
// stubbed so no live API calls are made.
// Test hate content
scores, reason, err := service.AnalyzeContent(ctx, "I hate everyone", nil)
assert.NoError(t, err)
assert.Equal(t, "hate", reason)
assert.Greater(t, scores.Hate, 0.5)
}
Integration Tests
- Test API integrations with mock servers
- Verify database operations
- Test Directus integration
Load Testing
- Test API rate limit handling
- Verify database performance under load
- Test queue processing throughput
Deployment
Environment Setup
- Set required environment variables
- Run database migrations
- Configure API keys
- Test integrations
Migration Steps
- Deploy schema changes
- Update application code
- Configure Directus permissions
- Test moderation flow
- Monitor for issues
Rollback Plan
- Database migration rollback
- Previous version deployment
- Data backup and restore procedures
Future Enhancements
Additional AI Providers
- Content moderation alternatives
- Multi-language support
- Custom model training
Advanced Features
- Machine learning for false positive reduction
- User reputation scoring
- Automated escalation workflows
- Appeal process integration
Analytics & Reporting
- Moderation effectiveness metrics
- Content trend analysis
- User behavior insights
- Compliance reporting
Troubleshooting
Common Issues
API Key Errors
- Verify environment variables
- Check API key permissions
- Monitor usage quotas
Database Connection Issues
- Verify migration completion
- Check Directus permissions
- Test database connectivity
Performance Issues
- Monitor API response times
- Check database query performance
- Review queue processing
Debug Tools
- API request/response logging
- Database query logging
- Performance monitoring
- Error tracking and alerting
Support & Maintenance
Regular Tasks
- Monitor API usage and costs
- Review moderation accuracy
- Update keyword lists
- Maintain database performance
Documentation Updates
- API documentation changes
- New feature additions
- Configuration updates
- Troubleshooting guides