AI Moderation System Implementation

Overview

This document describes the implementation of a production-ready AI-powered content moderation system for the Sojorn platform. The system integrates OpenAI's Moderation API and Google Vision API to automatically analyze text and image content for policy violations.

Architecture

Components

  1. Database Layer - PostgreSQL tables for storing moderation flags and user status
  2. AI Analysis Layer - OpenAI (text) and Google Vision (image) API integration
  3. Service Layer - Go backend services for content analysis and flag management
  4. CMS Integration - Directus interface for moderation queue management

Data Flow

User Content → Go Backend → AI APIs → Analysis Results → Database → Directus CMS → Admin Review

Database Schema

New Tables

moderation_flags

Stores AI-generated content moderation flags:

CREATE TABLE moderation_flags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
    comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
    flag_reason TEXT NOT NULL,
    scores JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    reviewed_by UUID REFERENCES users(id),
    reviewed_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

user_status_history

Audit trail for user status changes:

CREATE TABLE user_status_history (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    old_status TEXT,
    new_status TEXT NOT NULL,
    reason TEXT,
    changed_by UUID REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
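
For reference, the two new tables map naturally onto Go structs in the backend. This is a sketch of the shape only; the field set and struct tags are illustrative rather than the exact types used by ModerationService:

// ModerationFlag mirrors a row in moderation_flags.
type ModerationFlag struct {
    ID         uuid.UUID       `json:"id"`
    PostID     *uuid.UUID      `json:"post_id,omitempty"`    // set for post flags
    CommentID  *uuid.UUID      `json:"comment_id,omitempty"` // set for comment flags
    FlagReason string          `json:"flag_reason"`
    Scores     json.RawMessage `json:"scores"` // JSONB Three Poisons scores
    Status     string          `json:"status"` // pending by default; updated on review
    ReviewedBy *uuid.UUID      `json:"reviewed_by,omitempty"`
    ReviewedAt *time.Time      `json:"reviewed_at,omitempty"`
    CreatedAt  time.Time       `json:"created_at"`
    UpdatedAt  time.Time       `json:"updated_at"`
}

// UserStatusHistory mirrors a row in user_status_history.
type UserStatusHistory struct {
    ID        uuid.UUID  `json:"id"`
    UserID    uuid.UUID  `json:"user_id"`
    OldStatus *string    `json:"old_status,omitempty"`
    NewStatus string     `json:"new_status"`
    Reason    *string    `json:"reason,omitempty"`
    ChangedBy *uuid.UUID `json:"changed_by,omitempty"`
    CreatedAt time.Time  `json:"created_at"`
}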

Modified Tables

users

Added status column for user moderation:

ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active' 
CHECK (status IN ('active', 'suspended', 'banned'));
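
A minimal sketch of enforcing this status before accepting new content; it assumes a pgxpool connection (as suggested by the pool argument passed to NewModerationService below) and is illustrative rather than the exact backend code:

// canPost reports whether a user may create content, based on the new status
// column; suspended and banned users are rejected up front.
func canPost(ctx context.Context, pool *pgxpool.Pool, userID uuid.UUID) (bool, error) {
    var status string
    if err := pool.QueryRow(ctx, "SELECT status FROM users WHERE id = $1", userID).Scan(&status); err != nil {
        return false, err
    }
    return status == "active", nil
}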

API Integration

OpenAI Moderation API

Endpoint: https://api.openai.com/v1/moderations

Purpose: Analyze text content for policy violations

Categories Mapped:

  • Hate → Hate (hate speech)
  • Violence → Hate (violent content)
  • Sexual → Hate (explicit content)
  • Self-Harm → Delusion (self-harm content)

Example Response:

{
  "results": [{
    "categories": {
      "hate": false,
      "violence": false,
      "self-harm": false
    },
    "category_scores": {
      "hate": 0.1,
      "violence": 0.05,
      "self-harm": 0.0
    },
    "flagged": false
  }]
}
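
A minimal Go sketch of making this request; the helper name and response struct are illustrative, not the exact code inside ModerationService:

// moderationResult captures the response fields used by the service.
type moderationResult struct {
    Results []struct {
        Flagged        bool               `json:"flagged"`
        Categories     map[string]bool    `json:"categories"`
        CategoryScores map[string]float64 `json:"category_scores"`
    } `json:"results"`
}

// callOpenAIModeration sends text to the moderation endpoint and decodes the result.
func callOpenAIModeration(ctx context.Context, apiKey, text string) (*moderationResult, error) {
    payload, _ := json.Marshal(map[string]string{"input": text})
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://api.openai.com/v1/moderations", bytes.NewReader(payload))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("openai moderation: unexpected status %d", resp.StatusCode)
    }

    var out moderationResult
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return nil, err
    }
    return &out, nil
}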

Google Vision API

Endpoint: https://vision.googleapis.com/v1/images:annotate

Purpose: Analyze images for inappropriate content using SafeSearch

SafeSearch Categories Mapped:

  • Violence → Hate (violent imagery)
  • Adult → Hate (adult content)
  • Racy → Delusion (suggestive content)

Example Response:

{
  "responses": [{
    "safeSearchAnnotation": {
      "adult": "UNLIKELY",
      "spoof": "UNLIKELY",
      "medical": "UNLIKELY",
      "violence": "UNLIKELY",
      "racy": "UNLIKELY"
    }
  }]
}
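
A corresponding sketch for the SafeSearch request; again, the helper name and response struct are illustrative:

// safeSearchResult captures the SafeSearch fields used by the score mapping.
type safeSearchResult struct {
    Responses []struct {
        SafeSearchAnnotation struct {
            Adult    string `json:"adult"`
            Violence string `json:"violence"`
            Racy     string `json:"racy"`
        } `json:"safeSearchAnnotation"`
    } `json:"responses"`
}

// callVisionSafeSearch asks Vision for SafeSearch likelihoods on an image URL.
func callVisionSafeSearch(ctx context.Context, apiKey, imageURL string) (*safeSearchResult, error) {
    body := map[string]interface{}{
        "requests": []map[string]interface{}{{
            "image":    map[string]interface{}{"source": map[string]string{"imageUri": imageURL}},
            "features": []map[string]interface{}{{"type": "SAFE_SEARCH_DETECTION"}},
        }},
    }
    payload, _ := json.Marshal(body)

    url := "https://vision.googleapis.com/v1/images:annotate?key=" + apiKey
    req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(payload))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("vision safesearch: unexpected status %d", resp.StatusCode)
    }

    var out safeSearchResult
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return nil, err
    }
    return &out, nil
}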

Three Poisons Score Mapping

The system maps AI analysis results to the Buddhist "Three Poisons" framework:

Hate (Dvesha)

  • Sources: OpenAI hate, violence, sexual content; Google violence, adult
  • Threshold: > 0.5
  • Content: Hate speech, violence, explicit content

Greed (Lobha)

  • Sources: Keyword-based detection (the OpenAI Moderation API has no spam category)
  • Keywords: buy, crypto, rich, scam, investment, profit, money, trading, etc.
  • Threshold: > 0.5
  • Content: Spam, scams, financial exploitation

Delusion (Moha)

  • Sources: OpenAI self-harm; Google racy content
  • Threshold: > 0.5
  • Content: Self-harm, misinformation, inappropriate suggestions
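
Putting the mapping together, a simplified sketch might look like the following. The ThreePoisonsScore field names match the test example later in this document; the keyword weighting (two hits scoring 0.6) is purely illustrative:

// ThreePoisonsScore holds the aggregated per-poison scores in the range 0.0–1.0.
type ThreePoisonsScore struct {
    Hate     float64 `json:"hate"`
    Greed    float64 `json:"greed"`
    Delusion float64 `json:"delusion"`
}

// greedKeywords is a sample of the keyword list used for Greed detection.
var greedKeywords = []string{"buy", "crypto", "rich", "scam", "investment", "profit", "money", "trading"}

// mapToThreePoisons combines OpenAI category scores and a keyword scan into
// the Three Poisons framework and returns the flag reason, if any.
func mapToThreePoisons(categoryScores map[string]float64, text string) (*ThreePoisonsScore, string) {
    s := &ThreePoisonsScore{}

    // Hate: hate, violence and sexual content from OpenAI.
    s.Hate = math.Max(categoryScores["hate"],
        math.Max(categoryScores["violence"], categoryScores["sexual"]))

    // Delusion: self-harm content from OpenAI.
    s.Delusion = categoryScores["self-harm"]

    // Greed: simple keyword-based spam/scam heuristic.
    lower := strings.ToLower(text)
    hits := 0
    for _, kw := range greedKeywords {
        if strings.Contains(lower, kw) {
            hits++
        }
    }
    if hits >= 2 {
        s.Greed = 0.6 // illustrative score above the 0.5 threshold
    }

    const threshold = 0.5
    switch {
    case s.Hate > threshold:
        return s, "hate"
    case s.Greed > threshold:
        return s, "greed"
    case s.Delusion > threshold:
        return s, "delusion"
    }
    return s, ""
}

The same ThreePoisonsScore can also carry the Google Vision signal: a LIKELY or VERY_LIKELY violence/adult annotation can raise Hate, and racy can raise Delusion, before the threshold check.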

Service Implementation

ModerationService

Key methods:

// AnalyzeContent analyzes text and media with AI APIs
func (s *ModerationService) AnalyzeContent(ctx context.Context, body string, mediaURLs []string) (*ThreePoisonsScore, string, error)

// FlagPost creates a moderation flag for a post
func (s *ModerationService) FlagPost(ctx context.Context, postID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// FlagComment creates a moderation flag for a comment
func (s *ModerationService) FlagComment(ctx context.Context, commentID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// GetPendingFlags retrieves pending moderation flags for review
func (s *ModerationService) GetPendingFlags(ctx context.Context, limit, offset int) ([]map[string]interface{}, error)

// UpdateFlagStatus updates flag status after review
func (s *ModerationService) UpdateFlagStatus(ctx context.Context, flagID uuid.UUID, status string, reviewedBy uuid.UUID) error

// UpdateUserStatus updates user moderation status
func (s *ModerationService) UpdateUserStatus(ctx context.Context, userID uuid.UUID, status string, changedBy uuid.UUID, reason string) error

Configuration

Environment variables:

# Enable/disable moderation system
MODERATION_ENABLED=true

# OpenAI API key for text moderation
OPENAI_API_KEY=sk-your-openai-key

# Google Vision API key for image analysis
GOOGLE_VISION_API_KEY=your-google-vision-key
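
A small sketch of reading these variables at startup; the struct and function names are illustrative:

// ModerationConfig holds the moderation-related settings read at startup.
type ModerationConfig struct {
    Enabled            bool
    OpenAIAPIKey       string
    GoogleVisionAPIKey string
}

// loadModerationConfig reads the variables above from the environment.
func loadModerationConfig() ModerationConfig {
    return ModerationConfig{
        Enabled:            os.Getenv("MODERATION_ENABLED") == "true",
        OpenAIAPIKey:       os.Getenv("OPENAI_API_KEY"),
        GoogleVisionAPIKey: os.Getenv("GOOGLE_VISION_API_KEY"),
    }
}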

Directus Integration

Permissions

The migration grants appropriate permissions to the Directus user:

GRANT SELECT, INSERT, UPDATE, DELETE ON moderation_flags TO directus;
GRANT SELECT, INSERT, UPDATE, DELETE ON user_status_history TO directus;
GRANT SELECT, UPDATE ON users TO directus;

CMS Interface

Directus will automatically detect the new tables and allow you to build:

  1. Moderation Queue - View pending flags with content preview
  2. User Management - Manage user status (active/suspended/banned)
  3. Audit Trail - View moderation history and user status changes
  4. Analytics - Reports on moderation trends and statistics

Suggested configuration for each collection:

  1. Moderation Flags Collection

    • Hide technical fields (id, updated_at)
    • Create custom display for scores (JSON visualization)
    • Add status workflow buttons (approve/reject/escalate)
  2. Users Collection

    • Add status field with dropdown (active/suspended/banned)
    • Create relationship to status history
    • Add moderation statistics panel
  3. User Status History Collection

    • Read-only view for audit trail
    • Filter by user and date range
    • Export functionality for compliance

Usage Examples

Analyzing Content

ctx := context.Background()
moderationService := NewModerationService(pool, openAIKey, googleKey)

// Analyze text and images
scores, reason, err := moderationService.AnalyzeContent(ctx, postContent, mediaURLs)
if err != nil {
    log.Printf("Moderation analysis failed: %v", err)
    return
}

// Flag content if needed
if reason != "" {
    err = moderationService.FlagPost(ctx, postID, scores, reason)
    if err != nil {
        log.Printf("Failed to flag post: %v", err)
    }
}

Managing Moderation Queue

// Get pending flags
flags, err := moderationService.GetPendingFlags(ctx, 50, 0)
if err != nil {
    log.Printf("Failed to get pending flags: %v", err)
    return
}

// Review and update flag status
for _, flag := range flags {
    flagID := flag["id"].(uuid.UUID)
    err = moderationService.UpdateFlagStatus(ctx, flagID, "approved", adminID)
    if err != nil {
        log.Printf("Failed to update flag status: %v", err)
    }
}

User Status Management

// Suspend user for repeated violations
err = moderationService.UpdateUserStatus(ctx, userID, "suspended", adminID, "Multiple hate speech violations")
if err != nil {
    log.Printf("Failed to update user status: %v", err)
}

Performance Considerations

API Rate Limits

  • OpenAI: 60 requests/minute for moderation endpoint
  • Google Vision: 1000 requests/minute per project

Caching

Consider implementing caching for:

  • Repeated content analysis
  • User reputation scores
  • API responses for identical content
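
One possible shape for the first and third points is an in-memory cache keyed by a hash of the content; the type below is illustrative, and a shared store such as Redis may be preferable when multiple backend instances are running:

// contentCache memoizes analysis results by SHA-256 of the content so that
// identical text is not re-sent to the AI APIs. A production version would
// add TTLs and a size bound (e.g. an LRU policy).
type contentCache struct {
    mu      sync.RWMutex
    results map[string]*ThreePoisonsScore
}

func newContentCache() *contentCache {
    return &contentCache{results: make(map[string]*ThreePoisonsScore)}
}

func (c *contentCache) key(body string) string {
    sum := sha256.Sum256([]byte(body))
    return hex.EncodeToString(sum[:])
}

func (c *contentCache) get(body string) (*ThreePoisonsScore, bool) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    s, ok := c.results[c.key(body)]
    return s, ok
}

func (c *contentCache) set(body string, s *ThreePoisonsScore) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.results[c.key(body)] = s
}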

Batch Processing

For high-volume scenarios:

  • Queue content for batch analysis
  • Process multiple items in single API calls
  • Implement background workers
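
For the second point, the OpenAI moderation endpoint accepts an array of inputs, so several queued texts can share one request. A sketch, reusing the moderationResult struct from the OpenAI section above (the helper name is illustrative):

// analyzeBatch sends several texts in one moderation request; the API returns
// one result per input, which keeps request counts well under the rate limit.
func analyzeBatch(ctx context.Context, apiKey string, texts []string) (*moderationResult, error) {
    payload, _ := json.Marshal(map[string]interface{}{"input": texts})
    req, err := http.NewRequestWithContext(ctx, http.MethodPost,
        "https://api.openai.com/v1/moderations", bytes.NewReader(payload))
    if err != nil {
        return nil, err
    }
    req.Header.Set("Authorization", "Bearer "+apiKey)
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        return nil, fmt.Errorf("openai moderation: unexpected status %d", resp.StatusCode)
    }

    var out moderationResult
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        return nil, err
    }
    return &out, nil
}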

Security & Privacy

Data Protection

  • User content is sent to third-party APIs (OpenAI, Google) for analysis
  • Consider the privacy implications of sharing content externally
  • Implement data retention policies for flags and analysis results

API Key Security

  • Store keys in environment variables
  • Rotate keys regularly
  • Monitor API usage for anomalies

Compliance

  • GDPR considerations for content analysis
  • Data processing agreements with AI providers
  • User consent for content analysis

Monitoring & Alerting

Metrics to Track

  • API response times and error rates
  • Flag volume by category
  • Review queue length and processing time
  • User status changes and appeals

Alerting

  • High API error rates
  • Queue processing delays
  • Unusual flag patterns
  • API quota exhaustion

Testing

Unit Tests

func TestAnalyzeContent(t *testing.T) {
    ctx := context.Background()
    service := NewModerationService(pool, "test-key", "test-key")

    // Test hate content (assumes the AI APIs are mocked in the test environment)
    scores, reason, err := service.AnalyzeContent(ctx, "I hate everyone", nil)
    assert.NoError(t, err)
    assert.Equal(t, "hate", reason)
    assert.Greater(t, scores.Hate, 0.5)
}

Integration Tests

  • Test API integrations with mock servers
  • Verify database operations
  • Test Directus integration

Load Testing

  • Test API rate limit handling
  • Verify database performance under load
  • Test queue processing throughput

Deployment

Environment Setup

  1. Set required environment variables
  2. Run database migrations
  3. Configure API keys
  4. Test integrations

Migration Steps

  1. Deploy schema changes
  2. Update application code
  3. Configure Directus permissions
  4. Test moderation flow
  5. Monitor for issues

Rollback Plan

  • Database migration rollback
  • Previous version deployment
  • Data backup and restore procedures

Future Enhancements

Additional AI Providers

  • Content moderation alternatives
  • Multi-language support
  • Custom model training

Advanced Features

  • Machine learning for false positive reduction
  • User reputation scoring
  • Automated escalation workflows
  • Appeal process integration

Analytics & Reporting

  • Moderation effectiveness metrics
  • Content trend analysis
  • User behavior insights
  • Compliance reporting

Troubleshooting

Common Issues

  1. API Key Errors

    • Verify environment variables
    • Check API key permissions
    • Monitor usage quotas
  2. Database Connection Issues

    • Verify migration completion
    • Check Directus permissions
    • Test database connectivity
  3. Performance Issues

    • Monitor API response times
    • Check database query performance
    • Review queue processing

Debug Tools

  • API request/response logging
  • Database query logging
  • Performance monitoring
  • Error tracking and alerting

Support & Maintenance

Regular Tasks

  • Monitor API usage and costs
  • Review moderation accuracy
  • Update keyword lists
  • Maintain database performance

Documentation Updates

  • API documentation changes
  • New feature additions
  • Configuration updates
  • Troubleshooting guides