
# AI Moderation System Implementation

## Overview

This document describes the implementation of a production-ready AI-powered content moderation system for the Sojorn platform. The system integrates OpenAI's Moderation API and Google Vision API to automatically analyze text and image content for policy violations.

## Architecture

### Components

1. **Database Layer** - PostgreSQL tables for storing moderation flags and user status
2. **AI Analysis Layer** - OpenAI (text) and Google Vision (image) API integration
3. **Service Layer** - Go backend services for content analysis and flag management
4. **CMS Integration** - Directus interface for moderation queue management

### Data Flow

```
User Content → Go Backend → AI APIs → Analysis Results → Database → Directus CMS → Admin Review
```

## Database Schema

### New Tables

#### `moderation_flags`
Stores AI-generated content moderation flags:

```sql
CREATE TABLE moderation_flags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
    comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
    flag_reason TEXT NOT NULL,
    scores JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    reviewed_by UUID REFERENCES users(id),
    reviewed_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

#### `user_status_history`
Audit trail for user status changes:

```sql
CREATE TABLE user_status_history (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    old_status TEXT,
    new_status TEXT NOT NULL,
    reason TEXT,
    changed_by UUID REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

### Modified Tables

#### `users`
Added status column for user moderation:

```sql
ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active'
CHECK (status IN ('active', 'suspended', 'banned'));
```

## API Integration

### OpenAI Moderation API

**Endpoint**: `https://api.openai.com/v1/moderations`

**Purpose**: Analyze text content for policy violations

**Categories Mapped** (OpenAI category → Three Poisons score):
- Hate → Hate (hate speech)
- Self-Harm → Delusion (self-harm content)
- Sexual → Hate (sexually explicit content)
- Violence → Hate (violent content)
**Example Response**:
```json
{
  "results": [{
    "categories": {
      "hate": false,
      "violence": false,
      "self-harm": false
    },
    "category_scores": {
      "hate": 0.1,
      "violence": 0.05,
      "self-harm": 0.0
    },
    "flagged": false
  }]
}
```
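
Decoding this response in Go might look like the minimal sketch below. It assumes the response shape shown above and reads only `flagged` and `category_scores`. The names `moderationResponse` and `parseModeration` are hypothetical and are not taken from the service code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// moderationResponse mirrors only the fields used in this document.
// The type name is illustrative, not the service's actual type.
type moderationResponse struct {
	Results []struct {
		Flagged        bool               `json:"flagged"`
		CategoryScores map[string]float64 `json:"category_scores"`
	} `json:"results"`
}

// parseModeration decodes a raw response body and returns the category
// scores and the flagged bit of the first result.
func parseModeration(body []byte) (map[string]float64, bool, error) {
	var resp moderationResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, false, fmt.Errorf("decode moderation response: %w", err)
	}
	if len(resp.Results) == 0 {
		return nil, false, fmt.Errorf("moderation response contained no results")
	}
	return resp.Results[0].CategoryScores, resp.Results[0].Flagged, nil
}

func main() {
	raw := []byte(`{"results":[{"category_scores":{"hate":0.1,"violence":0.05,"self-harm":0.0},"flagged":false}]}`)
	scores, flagged, err := parseModeration(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("hate score:", scores["hate"], "flagged:", flagged)
}
```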

### Google Vision API

**Endpoint**: `https://vision.googleapis.com/v1/images:annotate`

**Purpose**: Analyze images for inappropriate content using SafeSearch

**SafeSearch Categories Mapped**:
- Violence → Hate (violent imagery)
- Adult → Hate (adult content)
- Racy → Delusion (suggestive content)

**Example Response**:
```json
{
  "responses": [{
    "safeSearchAnnotation": {
      "adult": "UNLIKELY",
      "spoof": "UNLIKELY",
      "medical": "UNLIKELY",
      "violence": "UNLIKELY",
      "racy": "UNLIKELY"
    }
  }]
}
```
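
SafeSearch returns likelihood labels rather than numbers, so they need converting before they can be compared against a numeric threshold. The sketch below shows one possible conversion together with the category mapping listed above. The numeric values assigned to each label and the names `likelihoodScore`, `safeSearchAnnotation`, and `mapSafeSearch` are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// likelihoodScore converts a SafeSearch likelihood label to a number in [0, 1].
// The numeric values are an assumption chosen for illustration.
func likelihoodScore(label string) float64 {
	switch label {
	case "UNLIKELY":
		return 0.25
	case "POSSIBLE":
		return 0.5
	case "LIKELY":
		return 0.75
	case "VERY_LIKELY":
		return 1.0
	default: // "VERY_UNLIKELY", "UNKNOWN", or an unrecognized label
		return 0.0
	}
}

// safeSearchAnnotation holds the SafeSearch fields used by the mapping.
type safeSearchAnnotation struct {
	Adult    string `json:"adult"`
	Violence string `json:"violence"`
	Racy     string `json:"racy"`
}

// mapSafeSearch applies the mapping described above:
// violence and adult feed Hate, racy feeds Delusion.
func mapSafeSearch(a safeSearchAnnotation) (hate, delusion float64) {
	hate = math.Max(likelihoodScore(a.Violence), likelihoodScore(a.Adult))
	delusion = likelihoodScore(a.Racy)
	return hate, delusion
}

func main() {
	hate, delusion := mapSafeSearch(safeSearchAnnotation{
		Adult:    "UNLIKELY",
		Violence: "LIKELY",
		Racy:     "POSSIBLE",
	})
	fmt.Println("hate:", hate, "delusion:", delusion)
}
```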

## Three Poisons Score Mapping

The system maps AI analysis results to the Buddhist "Three Poisons" framework:

### Hate (Dvesha)
- **Sources**: OpenAI hate, violence, sexual content; Google violence, adult
- **Threshold**: > 0.5
- **Content**: Hate speech, violence, explicit content

### Greed (Lobha)
- **Sources**: Keyword-based detection (the OpenAI Moderation API has no spam category)
- **Keywords**: buy, crypto, rich, scam, investment, profit, money, trading, etc.
- **Threshold**: > 0.5
- **Content**: Spam, scams, financial exploitation
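
A keyword-based Greed score could be computed roughly as follows. This is a sketch only: the keyword set is limited to the words listed above, and the scoring rule (distinct keyword hits divided by three, capped at 1) is an assumption, not the service's actual formula.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// greedKeywords contains only the keywords named in this document.
var greedKeywords = map[string]bool{
	"buy": true, "crypto": true, "rich": true, "scam": true,
	"investment": true, "profit": true, "money": true, "trading": true,
}

// greedScore counts distinct keyword hits and scales them into [0, 1].
// Assumption: three or more distinct hits yield the maximum score.
func greedScore(text string) float64 {
	// Split on anything that is not a letter so punctuation is ignored.
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	seen := make(map[string]bool)
	for _, w := range words {
		if greedKeywords[w] {
			seen[w] = true
		}
	}
	score := float64(len(seen)) / 3.0
	if score > 1 {
		score = 1
	}
	return score
}

func main() {
	fmt.Println(greedScore("Buy crypto now and get rich!"))
	fmt.Println(greedScore("Lovely weather today"))
}
```

Matching whole words avoids false hits on substrings (for example "buyer" does not match "buy"), at the cost of missing inflected forms.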

### Delusion (Moha)
- **Sources**: OpenAI self-harm; Google racy content
- **Threshold**: > 0.5
- **Content**: Self-harm, misinformation, inappropriate suggestions
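
The threshold check shared by the three categories can be sketched as below. The local `ThreePoisonsScore` struct is a stand-in for the service type (only its `Hate` field is confirmed elsewhere in this document), and the helper `flagReason` plus the reason strings `"greed"` and `"delusion"` are assumptions.

```go
package main

import "fmt"

// flagThreshold is the cutoff documented above: scores must exceed 0.5.
const flagThreshold = 0.5

// ThreePoisonsScore is a local stand-in for the service type.
type ThreePoisonsScore struct {
	Hate     float64
	Greed    float64
	Delusion float64
}

// flagReason returns the name of the highest-scoring category above the
// threshold, or an empty string when nothing should be flagged.
func flagReason(s ThreePoisonsScore) string {
	candidates := []struct {
		name  string
		score float64
	}{
		{"hate", s.Hate},
		{"greed", s.Greed},
		{"delusion", s.Delusion},
	}
	reason, best := "", flagThreshold
	for _, c := range candidates {
		if c.score > best {
			reason, best = c.name, c.score
		}
	}
	return reason
}

func main() {
	fmt.Printf("%q\n", flagReason(ThreePoisonsScore{Hate: 0.8, Greed: 0.6}))
	fmt.Printf("%q\n", flagReason(ThreePoisonsScore{Hate: 0.5}))
}
```

An empty reason matches the convention used in the usage examples, where content is flagged only when `reason != ""`.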

## Service Implementation

### ModerationService

Key methods:

```go
// AnalyzeContent analyzes text and media with AI APIs
func (s *ModerationService) AnalyzeContent(ctx context.Context, body string, mediaURLs []string) (*ThreePoisonsScore, string, error)

// FlagPost creates a moderation flag for a post
func (s *ModerationService) FlagPost(ctx context.Context, postID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// FlagComment creates a moderation flag for a comment
func (s *ModerationService) FlagComment(ctx context.Context, commentID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// GetPendingFlags retrieves pending moderation flags for review
func (s *ModerationService) GetPendingFlags(ctx context.Context, limit, offset int) ([]map[string]interface{}, error)

// UpdateFlagStatus updates flag status after review
func (s *ModerationService) UpdateFlagStatus(ctx context.Context, flagID uuid.UUID, status string, reviewedBy uuid.UUID) error

// UpdateUserStatus updates user moderation status
func (s *ModerationService) UpdateUserStatus(ctx context.Context, userID uuid.UUID, status string, changedBy uuid.UUID, reason string) error
```

### Configuration

Environment variables:

```bash
# Enable/disable moderation system
MODERATION_ENABLED=true

# OpenAI API key for text moderation
OPENAI_API_KEY=sk-your-openai-key

# Google Vision API key for image analysis
GOOGLE_VISION_API_KEY=your-google-vision-key
```
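
Loading these variables at startup might look like the following sketch. The `moderationConfig` type and `loadModerationConfig` function are hypothetical, and two behaviors are assumptions: an unset or unparsable `MODERATION_ENABLED` is treated as disabled, and both keys are required whenever moderation is enabled.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// moderationConfig groups the three settings documented above.
type moderationConfig struct {
	Enabled         bool
	OpenAIKey       string
	GoogleVisionKey string
}

// loadModerationConfig reads settings through getenv so it can be tested
// without touching the process environment.
func loadModerationConfig(getenv func(string) string) (moderationConfig, error) {
	cfg := moderationConfig{
		OpenAIKey:       getenv("OPENAI_API_KEY"),
		GoogleVisionKey: getenv("GOOGLE_VISION_API_KEY"),
	}
	// Assumption: an unset or unparsable value means "disabled".
	if enabled, err := strconv.ParseBool(getenv("MODERATION_ENABLED")); err == nil {
		cfg.Enabled = enabled
	}
	if !cfg.Enabled {
		return cfg, nil
	}
	// Assumption: both keys are mandatory once moderation is switched on.
	if cfg.OpenAIKey == "" {
		return cfg, errors.New("OPENAI_API_KEY is required when moderation is enabled")
	}
	if cfg.GoogleVisionKey == "" {
		return cfg, errors.New("GOOGLE_VISION_API_KEY is required when moderation is enabled")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadModerationConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("moderation enabled:", cfg.Enabled)
}
```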

## Directus Integration

### Permissions

The migration grants appropriate permissions to the Directus user:

```sql
GRANT SELECT, INSERT, UPDATE, DELETE ON moderation_flags TO directus;
GRANT SELECT, INSERT, UPDATE, DELETE ON user_status_history TO directus;
GRANT SELECT, UPDATE ON users TO directus;
```

### CMS Interface

Directus will automatically detect the new tables and allow you to build:

1. **Moderation Queue** - View pending flags with content preview
2. **User Management** - Manage user status (active/suspended/banned)
3. **Audit Trail** - View moderation history and user status changes
4. **Analytics** - Reports on moderation trends and statistics

### Recommended Directus Configuration

1. **Moderation Flags Collection**
   - Hide technical fields (id, updated_at)
   - Create custom display for scores (JSON visualization)
   - Add status workflow buttons (approve/reject/escalate)

2. **Users Collection**
   - Add status field with dropdown (active/suspended/banned)
   - Create relationship to status history
   - Add moderation statistics panel

3. **User Status History Collection**
   - Read-only view for audit trail
   - Filter by user and date range
   - Export functionality for compliance

## Usage Examples

### Analyzing Content

```go
ctx := context.Background()
moderationService := NewModerationService(pool, openAIKey, googleKey)

// Analyze text and images
scores, reason, err := moderationService.AnalyzeContent(ctx, postContent, mediaURLs)
if err != nil {
    log.Printf("Moderation analysis failed: %v", err)
    return
}

// Flag content if needed
if reason != "" {
    err = moderationService.FlagPost(ctx, postID, scores, reason)
    if err != nil {
        log.Printf("Failed to flag post: %v", err)
    }
}
```

### Managing Moderation Queue

```go
// Get pending flags
flags, err := moderationService.GetPendingFlags(ctx, 50, 0)
if err != nil {
    log.Printf("Failed to get pending flags: %v", err)
    return
}

// Review and update flag status
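for _, flag := range flags {
    // Use the checked form of the type assertion so an unexpected value cannot panic.
    flagID, ok := flag["id"].(uuid.UUID)
    if !ok {
        log.Printf("Skipping flag with unexpected id type: %T", flag["id"])
        continue
    }
    if err := moderationService.UpdateFlagStatus(ctx, flagID, "approved", adminID); err != nil {
        log.Printf("Failed to update flag status: %v", err)
    }
}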
```

### User Status Management

```go
// Suspend user for repeated violations
if err := moderationService.UpdateUserStatus(ctx, userID, "suspended", adminID, "Multiple hate speech violations"); err != nil {
    log.Printf("Failed to update user status: %v", err)
}
```

## Performance Considerations

### API Rate Limits

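- **OpenAI**: 60 requests/minute for the moderation endpoint (limits vary by account tier, so confirm the value for the account in use)
- **Google Vision**: 1000 requests/minute per project (the effective quota is configured per project, so confirm it in the Google Cloud console)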

### Caching

Consider implementing caching for:
- Repeated content analysis
- User reputation scores
- API responses for identical content
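
One way to cache results for identical content is to key them by a hash of the text, as in the in-memory sketch below. The `scoreCache` type and `contentKey` helper are hypothetical, `ThreePoisonsScore` is a local stand-in for the service type, and the cache is unbounded, so a real deployment would likely add eviction or a TTL.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// ThreePoisonsScore is a local stand-in for the service type.
type ThreePoisonsScore struct {
	Hate, Greed, Delusion float64
}

// scoreCache stores analysis results keyed by a hash of the content.
type scoreCache struct {
	mu      sync.RWMutex
	entries map[string]ThreePoisonsScore
}

func newScoreCache() *scoreCache {
	return &scoreCache{entries: make(map[string]ThreePoisonsScore)}
}

// contentKey returns the hex-encoded SHA-256 digest of the content.
func contentKey(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

// Get returns the cached score for body, if present.
func (c *scoreCache) Get(body string) (ThreePoisonsScore, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	s, ok := c.entries[contentKey(body)]
	return s, ok
}

// Put stores the score for body, replacing any previous entry.
func (c *scoreCache) Put(body string, s ThreePoisonsScore) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[contentKey(body)] = s
}

func main() {
	cache := newScoreCache()
	cache.Put("I hate everyone", ThreePoisonsScore{Hate: 0.9})
	if s, ok := cache.Get("I hate everyone"); ok {
		fmt.Println("cache hit, hate score:", s.Hate)
	}
}
```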

### Batch Processing

For high-volume scenarios:
- Queue content for batch analysis
- Process multiple items in single API calls
- Implement background workers

## Security & Privacy

### Data Protection

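- User text and images are sent to third-party APIs (OpenAI and Google Vision) for analysis
- Consider the privacy implications of sharing user content with these providers
- Implement data retention policies for stored flags and scores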

### API Key Security

- Store keys in environment variables
- Rotate keys regularly
- Monitor API usage for anomalies

### Compliance

- GDPR considerations for content analysis
- Data processing agreements with AI providers
- User consent for content analysis

## Monitoring & Alerting

### Metrics to Track

- API response times and error rates
- Flag volume by category
- Review queue length and processing time
- User status changes and appeals

### Alerting

- High API error rates
- Queue processing delays
- Unusual flag patterns
- API quota exhaustion

## Testing

### Unit Tests

```go
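func TestAnalyzeContent(t *testing.T) {
    // pool is assumed to be a test database pool created during test setup.
    // With placeholder keys, the OpenAI and Google endpoints must be mocked
    // for this test to pass.
    ctx := context.Background()
    service := NewModerationService(pool, "test-key", "test-key")

    // Test hate content
    scores, reason, err := service.AnalyzeContent(ctx, "I hate everyone", nil)
    require.NoError(t, err) // stop here on failure so scores is never nil below
    require.NotNil(t, scores)
    assert.Equal(t, "hate", reason)
    assert.Greater(t, scores.Hate, 0.5)
}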
```

### Integration Tests

- Test API integrations with mock servers
- Verify database operations
- Test Directus integration

### Load Testing

- Test API rate limit handling
- Verify database performance under load
- Test queue processing throughput

## Deployment

### Environment Setup

1. Set required environment variables
2. Run database migrations
3. Configure API keys
4. Test integrations

### Migration Steps

1. Deploy schema changes
2. Update application code
3. Configure Directus permissions
4. Test moderation flow
5. Monitor for issues

### Rollback Plan

- Database migration rollback
- Previous version deployment
- Data backup and restore procedures

## Future Enhancements

### Additional AI Providers

- Alternative content moderation providers
- Multi-language support
- Custom model training

### Advanced Features

- Machine learning for false positive reduction
- User reputation scoring
- Automated escalation workflows
- Appeal process integration

### Analytics & Reporting

- Moderation effectiveness metrics
- Content trend analysis
- User behavior insights
- Compliance reporting

## Troubleshooting

### Common Issues

1. **API Key Errors**
   - Verify environment variables
   - Check API key permissions
   - Monitor usage quotas

2. **Database Connection Issues**
   - Verify migration completion
   - Check Directus permissions
   - Test database connectivity

3. **Performance Issues**
   - Monitor API response times
   - Check database query performance
   - Review queue processing

### Debug Tools

- API request/response logging
- Database query logging
- Performance monitoring
- Error tracking and alerting

## Support & Maintenance

### Regular Tasks

- Monitor API usage and costs
- Review moderation accuracy
- Update keyword lists
- Maintain database performance

### Documentation Updates

- API documentation changes
- New feature additions
- Configuration updates
- Troubleshooting guides