# AI Moderation System Implementation

## Overview

This document describes the implementation of a production-ready AI-powered content moderation system for the Sojorn platform. The system integrates OpenAI's Moderation API and Google Vision API to automatically analyze text and image content for policy violations.

## Architecture

### Components

1. **Database Layer** - PostgreSQL tables for storing moderation flags and user status
2. **AI Analysis Layer** - OpenAI (text) and Google Vision (image) API integration
3. **Service Layer** - Go backend services for content analysis and flag management
4. **CMS Integration** - Directus interface for moderation queue management

### Data Flow

```
User Content → Go Backend → AI APIs → Analysis Results → Database → Directus CMS → Admin Review
```

## Database Schema

### New Tables

#### `moderation_flags`

Stores AI-generated content moderation flags:

```sql
CREATE TABLE moderation_flags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
    comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
    flag_reason TEXT NOT NULL,
    scores JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    reviewed_by UUID REFERENCES users(id),
    reviewed_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

#### `user_status_history`

Audit trail for user status changes:

```sql
CREATE TABLE user_status_history (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    old_status TEXT,
    new_status TEXT NOT NULL,
    reason TEXT,
    changed_by UUID REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

### Modified Tables

#### `users`

Added a status column for user moderation:

```sql
ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active'
    CHECK (status IN ('active', 'suspended', 'banned'));
```

## API Integration

### OpenAI Moderation API

**Endpoint**: `https://api.openai.com/v1/moderations`

**Purpose**: Analyze text content for policy violations

**Categories Mapped**:

- Hate → Hate (violence, hate speech)
- Self-Harm → Delusion (self-harm content)
- Sexual → Hate (inappropriate content)
- Violence → Hate (violent content)

**Example Response**:

```json
{
  "results": [{
    "categories": {
      "hate": false,
      "violence": false,
      "self-harm": false
    },
    "category_scores": {
      "hate": 0.1,
      "violence": 0.05,
      "self-harm": 0.0
    },
    "flagged": false
  }]
}
```

### Google Vision API

**Endpoint**: `https://vision.googleapis.com/v1/images:annotate`

**Purpose**: Analyze images for inappropriate content using SafeSearch

**SafeSearch Categories Mapped**:

- Violence → Hate (violent imagery)
- Adult → Hate (adult content)
- Racy → Delusion (suggestive content)

**Example Response**:

```json
{
  "responses": [{
    "safeSearchAnnotation": {
      "adult": "UNLIKELY",
      "spoof": "UNLIKELY",
      "medical": "UNLIKELY",
      "violence": "UNLIKELY",
      "racy": "UNLIKELY"
    }
  }]
}
```
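For reference, below is a minimal sketch of how the backend might call the OpenAI moderation endpoint with Go's standard library and parse the scores shown above. The `moderateText` helper and `openAIModerationResult` type are illustrative names, not part of the documented service API.

```go
package moderation

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

// openAIModerationResult mirrors only the response fields the service reads.
type openAIModerationResult struct {
	Flagged        bool               `json:"flagged"`
	CategoryScores map[string]float64 `json:"category_scores"`
}

// moderateText sends text to the OpenAI Moderation endpoint and returns the
// per-category scores for the first result.
func moderateText(ctx context.Context, apiKey, text string) (*openAIModerationResult, error) {
	payload, err := json.Marshal(map[string]string{"input": text})
	if err != nil {
		return nil, err
	}

	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		"https://api.openai.com/v1/moderations", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("moderation API returned status %d", resp.StatusCode)
	}

	var parsed struct {
		Results []openAIModerationResult `json:"results"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&parsed); err != nil {
		return nil, err
	}
	if len(parsed.Results) == 0 {
		return nil, fmt.Errorf("moderation API returned no results")
	}
	return &parsed.Results[0], nil
}
```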
## Three Poisons Score Mapping

The system maps AI analysis results to the Buddhist "Three Poisons" framework:

### Hate (Dvesha)

- **Sources**: OpenAI hate, violence, sexual content; Google violence, adult
- **Threshold**: > 0.5
- **Content**: Hate speech, violence, explicit content

### Greed (Lobha)

- **Sources**: Keyword-based detection (OpenAI doesn't detect spam well)
- **Keywords**: buy, crypto, rich, scam, investment, profit, money, trading, etc.
- **Threshold**: > 0.5
- **Content**: Spam, scams, financial exploitation

### Delusion (Moha)

- **Sources**: OpenAI self-harm; Google racy content
- **Threshold**: > 0.5
- **Content**: Self-harm, misinformation, inappropriate suggestions

## Service Implementation

### ModerationService

Key methods:

```go
// AnalyzeContent analyzes text and media with AI APIs
func (s *ModerationService) AnalyzeContent(ctx context.Context, body string, mediaURLs []string) (*ThreePoisonsScore, string, error)

// FlagPost creates a moderation flag for a post
func (s *ModerationService) FlagPost(ctx context.Context, postID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// FlagComment creates a moderation flag for a comment
func (s *ModerationService) FlagComment(ctx context.Context, commentID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// GetPendingFlags retrieves pending moderation flags for review
func (s *ModerationService) GetPendingFlags(ctx context.Context, limit, offset int) ([]map[string]interface{}, error)

// UpdateFlagStatus updates flag status after review
func (s *ModerationService) UpdateFlagStatus(ctx context.Context, flagID uuid.UUID, status string, reviewedBy uuid.UUID) error

// UpdateUserStatus updates user moderation status
func (s *ModerationService) UpdateUserStatus(ctx context.Context, userID uuid.UUID, status string, changedBy uuid.UUID, reason string) error
```

### Configuration

Environment variables:

```bash
# Enable/disable moderation system
MODERATION_ENABLED=true

# OpenAI API key for text moderation
OPENAI_API_KEY=sk-your-openai-key

# Google Vision API key for image analysis
GOOGLE_VISION_API_KEY=your-google-vision-key
```
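A minimal sketch of how these variables might be wired together at startup follows. Only `NewModerationService` and the variable names above come from this document; the `pgxpool` pool, the `DATABASE_URL` variable, and the boolean parsing are illustrative assumptions.

```go
package main

import (
	"context"
	"log"
	"os"
	"strconv"

	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	ctx := context.Background()

	// MODERATION_ENABLED gates the whole pipeline; treat parse errors as disabled.
	enabled, _ := strconv.ParseBool(os.Getenv("MODERATION_ENABLED"))
	if !enabled {
		log.Println("moderation disabled, skipping service setup")
		return
	}

	// Database pool shared with the rest of the backend (pgxpool and
	// DATABASE_URL are assumptions, not part of this document).
	pool, err := pgxpool.New(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatalf("failed to connect to database: %v", err)
	}
	defer pool.Close()

	// API keys for the two providers.
	openAIKey := os.Getenv("OPENAI_API_KEY")
	googleKey := os.Getenv("GOOGLE_VISION_API_KEY")

	moderationService := NewModerationService(pool, openAIKey, googleKey)
	_ = moderationService // wire into HTTP handlers / background workers here
}
```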
## Directus Integration

### Permissions

The migration grants appropriate permissions to the Directus user:

```sql
GRANT SELECT, INSERT, UPDATE, DELETE ON moderation_flags TO directus;
GRANT SELECT, INSERT, UPDATE, DELETE ON user_status_history TO directus;
GRANT SELECT, UPDATE ON users TO directus;
```

### CMS Interface

Directus will automatically detect the new tables and allow you to build:

1. **Moderation Queue** - View pending flags with content preview
2. **User Management** - Manage user status (active/suspended/banned)
3. **Audit Trail** - View moderation history and user status changes
4. **Analytics** - Reports on moderation trends and statistics

### Recommended Directus Configuration

1. **Moderation Flags Collection**
   - Hide technical fields (id, updated_at)
   - Create custom display for scores (JSON visualization)
   - Add status workflow buttons (approve/reject/escalate)

2. **Users Collection**
   - Add status field with dropdown (active/suspended/banned)
   - Create relationship to status history
   - Add moderation statistics panel

3. **User Status History Collection**
   - Read-only view for audit trail
   - Filter by user and date range
   - Export functionality for compliance

## Usage Examples

### Analyzing Content

```go
ctx := context.Background()
moderationService := NewModerationService(pool, openAIKey, googleKey)

// Analyze text and images
scores, reason, err := moderationService.AnalyzeContent(ctx, postContent, mediaURLs)
if err != nil {
    log.Printf("Moderation analysis failed: %v", err)
    return
}

// Flag content if needed
if reason != "" {
    err = moderationService.FlagPost(ctx, postID, scores, reason)
    if err != nil {
        log.Printf("Failed to flag post: %v", err)
    }
}
```

### Managing Moderation Queue

```go
// Get pending flags
flags, err := moderationService.GetPendingFlags(ctx, 50, 0)
if err != nil {
    log.Printf("Failed to get pending flags: %v", err)
    return
}

// Review and update flag status
for _, flag := range flags {
    flagID := flag["id"].(uuid.UUID)
    err = moderationService.UpdateFlagStatus(ctx, flagID, "approved", adminID)
    if err != nil {
        log.Printf("Failed to update flag status: %v", err)
    }
}
```

### User Status Management

```go
// Suspend a user for repeated violations
err = moderationService.UpdateUserStatus(ctx, userID, "suspended", adminID, "Multiple hate speech violations")
if err != nil {
    log.Printf("Failed to update user status: %v", err)
}
```

## Performance Considerations

### API Rate Limits

- **OpenAI**: 60 requests/minute for the moderation endpoint
- **Google Vision**: 1000 requests/minute per project

### Caching

Consider implementing caching for:

- Repeated content analysis
- User reputation scores
- API responses for identical content

### Batch Processing

For high-volume scenarios:

- Queue content for batch analysis
- Process multiple items in single API calls
- Implement background workers

## Security & Privacy

### Data Protection

- Content is sent to third-party APIs
- Consider privacy implications
- Implement data retention policies

### API Key Security

- Store keys in environment variables
- Rotate keys regularly
- Monitor API usage for anomalies

### Compliance

- GDPR considerations for content analysis
- Data processing agreements with AI providers
- User consent for content analysis

## Monitoring & Alerting

### Metrics to Track

- API response times and error rates
- Flag volume by category
- Review queue length and processing time
- User status changes and appeals

### Alerting

- High API error rates
- Queue processing delays
- Unusual flag patterns
- API quota exhaustion

## Testing

### Unit Tests

```go
func TestAnalyzeContent(t *testing.T) {
    ctx := context.Background()
    service := NewModerationService(pool, "test-key", "test-key")

    // Test hate content
    scores, reason, err := service.AnalyzeContent(ctx, "I hate everyone", nil)
    assert.NoError(t, err)
    assert.Equal(t, "hate", reason)
    assert.Greater(t, scores.Hate, 0.5)
}
```

### Integration Tests

- Test API integrations with mock servers (see the sketch below)
- Verify database operations
- Test Directus integration

### Load Testing

- Test API rate limit handling
- Verify database performance under load
- Test queue processing throughput
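As referenced in the integration-test list above, here is a minimal sketch of the mock-server pattern using `httptest`. Pointing the real service at `mock.URL` would require an injectable endpoint, which the documented signatures don't show, so this example only verifies the canned response.

```go
package moderation

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"testing"
)

// TestModerationEndpointMock shows the mock-server pattern: the handler
// returns a canned moderation response, and the test points its HTTP
// calls at mock.URL instead of api.openai.com.
func TestModerationEndpointMock(t *testing.T) {
	canned := `{"results":[{"flagged":true,"categories":{"hate":true},"category_scores":{"hate":0.92}}]}`

	mock := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, canned)
	}))
	defer mock.Close()

	// In a real integration test the service would be configured to call
	// mock.URL as its moderation endpoint; here we just exercise the mock.
	resp, err := http.Post(mock.URL, "application/json", strings.NewReader(`{"input":"I hate everyone"}`))
	if err != nil {
		t.Fatalf("request to mock server failed: %v", err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	if !strings.Contains(string(body), `"flagged":true`) {
		t.Fatalf("unexpected mock response: %s", body)
	}
}
```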
## Deployment

### Environment Setup

1. Set required environment variables
2. Run database migrations
3. Configure API keys
4. Test integrations

### Migration Steps

1. Deploy schema changes
2. Update application code
3. Configure Directus permissions
4. Test moderation flow
5. Monitor for issues

### Rollback Plan

- Database migration rollback
- Previous version deployment
- Data backup and restore procedures

## Future Enhancements

### Additional AI Providers

- Content moderation alternatives
- Multi-language support
- Custom model training

### Advanced Features

- Machine learning for false positive reduction
- User reputation scoring
- Automated escalation workflows
- Appeal process integration

### Analytics & Reporting

- Moderation effectiveness metrics
- Content trend analysis
- User behavior insights
- Compliance reporting

## Troubleshooting

### Common Issues

1. **API Key Errors**
   - Verify environment variables
   - Check API key permissions
   - Monitor usage quotas

2. **Database Connection Issues**
   - Verify migration completion
   - Check Directus permissions
   - Test database connectivity

3. **Performance Issues**
   - Monitor API response times
   - Check database query performance
   - Review queue processing

### Debug Tools

- API request/response logging
- Database query logging
- Performance monitoring
- Error tracking and alerting

## Support & Maintenance

### Regular Tasks

- Monitor API usage and costs
- Review moderation accuracy
- Update keyword lists
- Maintain database performance

### Documentation Updates

- API documentation changes
- New feature additions
- Configuration updates
- Troubleshooting guides