sojorn/sojorn_docs/AI_MODERATION_IMPLEMENTATION.md
Patrick Britton c9d8e0c7e6 feat: comprehensive security audit and cleanup
2026-02-05 09:22:30 -06:00


# AI Moderation System Implementation
## Overview
This document describes the implementation of a production-ready AI-powered content moderation system for the Sojorn platform. The system integrates OpenAI's Moderation API and Google Vision API to automatically analyze text and image content for policy violations.
## Architecture
### Components
1. **Database Layer** - PostgreSQL tables for storing moderation flags and user status
2. **AI Analysis Layer** - OpenAI (text) and Google Vision (image) API integration
3. **Service Layer** - Go backend services for content analysis and flag management
4. **CMS Integration** - Directus interface for moderation queue management
### Data Flow
```
User Content → Go Backend → AI APIs → Analysis Results → Database → Directus CMS → Admin Review
```
## Database Schema
### New Tables
#### `moderation_flags`
Stores AI-generated content moderation flags:
```sql
CREATE TABLE moderation_flags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
    comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
    flag_reason TEXT NOT NULL,
    scores JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    reviewed_by UUID REFERENCES users(id),
    reviewed_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
#### `user_status_history`
Audit trail for user status changes:
```sql
CREATE TABLE user_status_history (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    old_status TEXT,
    new_status TEXT NOT NULL,
    reason TEXT,
    changed_by UUID REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```
### Modified Tables
#### `users`
Added status column for user moderation:
```sql
ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active'
    CHECK (status IN ('active', 'suspended', 'banned'));
```
## API Integration
### OpenAI Moderation API
**Endpoint**: `https://api.openai.com/v1/moderations`
**Purpose**: Analyze text content for policy violations
**Categories Mapped**:
- Hate → Hate (hate speech)
- Self-Harm → Delusion (self-harm content)
- Sexual → Hate (inappropriate content)
- Violence → Hate (violent content)
**Example Response**:
```json
{
  "results": [{
    "categories": {
      "hate": false,
      "violence": false,
      "self-harm": false
    },
    "category_scores": {
      "hate": 0.1,
      "violence": 0.05,
      "self-harm": 0.0
    },
    "flagged": false
  }]
}
```
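A minimal sketch of decoding such a response in Go. The type and function names (`moderationResponse`, `parseModerationScores`) are hypothetical and not taken from the Sojorn code; only the `results`, `flagged`, and `category_scores` fields are read, and everything else in the payload is ignored.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// moderationResponse mirrors only the response fields this sketch needs.
// The type name is hypothetical.
type moderationResponse struct {
	Results []struct {
		Flagged        bool               `json:"flagged"`
		CategoryScores map[string]float64 `json:"category_scores"`
	} `json:"results"`
}

// parseModerationScores returns the category scores and the flagged bit
// of the first result in the response body.
func parseModerationScores(body []byte) (map[string]float64, bool, error) {
	var resp moderationResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, false, fmt.Errorf("decode moderation response: %w", err)
	}
	if len(resp.Results) == 0 {
		return nil, false, fmt.Errorf("moderation response has no results")
	}
	return resp.Results[0].CategoryScores, resp.Results[0].Flagged, nil
}

func main() {
	sample := []byte(`{"results":[{"flagged":false,"category_scores":{"hate":0.1,"violence":0.05,"self-harm":0.0}}]}`)
	scores, flagged, err := parseModerationScores(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println("hate score:", scores["hate"], "flagged:", flagged)
}
```

Reading `category_scores` rather than the provider's own `flagged` decision keeps the thresholding under the platform's control.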
### Google Vision API
**Endpoint**: `https://vision.googleapis.com/v1/images:annotate`
**Purpose**: Analyze images for inappropriate content using SafeSearch
**SafeSearch Categories Mapped**:
- Violence → Hate (violent imagery)
- Adult → Hate (adult content)
- Racy → Delusion (suggestive content)
**Example Response**:
```json
{
  "responses": [{
    "safeSearchAnnotation": {
      "adult": "UNLIKELY",
      "spoof": "UNLIKELY",
      "medical": "UNLIKELY",
      "violence": "UNLIKELY",
      "racy": "UNLIKELY"
    }
  }]
}
```
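SafeSearch reports likelihood labels rather than numbers, so they have to be converted before they can be compared against a numeric threshold. The sketch below shows one way to do that. The numeric value assigned to each label and the names `likelihoodScore`, `safeSearch`, and `visionScores` are illustrative assumptions, not the values used in the Sojorn code.

```go
package main

import (
	"fmt"
	"math"
)

// likelihoodScore converts a SafeSearch likelihood label into a 0..1 score.
// The numeric steps are an assumption made for this sketch.
func likelihoodScore(label string) float64 {
	switch label {
	case "VERY_UNLIKELY":
		return 0.0
	case "UNLIKELY":
		return 0.25
	case "POSSIBLE":
		return 0.5
	case "LIKELY":
		return 0.75
	case "VERY_LIKELY":
		return 1.0
	default: // "UNKNOWN" or an unexpected label
		return 0.0
	}
}

// safeSearch holds the three annotation fields used by the category mapping.
type safeSearch struct {
	Adult, Violence, Racy string
}

// visionScores applies the mapping listed above:
// violence and adult feed Hate, racy feeds Delusion.
func visionScores(a safeSearch) (hate, delusion float64) {
	hate = math.Max(likelihoodScore(a.Violence), likelihoodScore(a.Adult))
	delusion = likelihoodScore(a.Racy)
	return hate, delusion
}

func main() {
	hate, delusion := visionScores(safeSearch{Adult: "LIKELY", Violence: "UNLIKELY", Racy: "POSSIBLE"})
	fmt.Println("hate:", hate, "delusion:", delusion)
}
```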
## Three Poisons Score Mapping
The system maps AI analysis results to the Buddhist "Three Poisons" framework:
### Hate (Dvesha)
- **Sources**: OpenAI hate, violence, sexual content; Google violence, adult
- **Threshold**: > 0.5
- **Content**: Hate speech, violence, explicit content
### Greed (Lobha)
- **Sources**: Keyword-based detection (OpenAI's Moderation API has no spam category)
- **Keywords**: buy, crypto, rich, scam, investment, profit, money, trading, etc.
- **Threshold**: > 0.5
- **Content**: Spam, scams, financial exploitation
### Delusion (Moha)
- **Sources**: OpenAI self-harm; Google racy content
- **Threshold**: > 0.5
- **Content**: Self-harm, misinformation, inappropriate suggestions
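The thresholding and keyword detection described above can be sketched as follows. This is an illustration under stated assumptions: the `Greed` and `Delusion` field names, the helper names `greedScore` and `flagReason`, the rule that three distinct keyword hits saturate the score, and the choice to report the highest-scoring poison are all hypothetical. The keyword list is the subset named above.

```go
package main

import (
	"fmt"
	"strings"
)

// ThreePoisonsScore holds one score per poison, each in the range 0..1.
type ThreePoisonsScore struct {
	Hate, Greed, Delusion float64
}

// greedKeywords is the subset of keywords listed in this document.
var greedKeywords = []string{"buy", "crypto", "rich", "scam", "investment", "profit", "money", "trading"}

// greedScore counts distinct keyword hits on whole words.
// Assumption: three or more distinct hits give the maximum score of 1.0.
func greedScore(text string) float64 {
	seen := make(map[string]bool)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		seen[strings.Trim(w, ".,!?;:\"'()")] = true
	}
	hits := 0
	for _, k := range greedKeywords {
		if seen[k] {
			hits++
		}
	}
	score := float64(hits) / 3.0
	if score > 1 {
		score = 1
	}
	return score
}

// flagReason returns the highest-scoring poison above the 0.5 threshold,
// or an empty string when nothing exceeds it.
func flagReason(s ThreePoisonsScore) string {
	reason, best := "", 0.5
	if s.Hate > best {
		reason, best = "hate", s.Hate
	}
	if s.Greed > best {
		reason, best = "greed", s.Greed
	}
	if s.Delusion > best {
		reason = "delusion"
	}
	return reason
}

func main() {
	s := ThreePoisonsScore{Hate: 0.2, Greed: greedScore("Buy crypto now, get rich!")}
	fmt.Println("reason:", flagReason(s))
}
```

An empty reason means no flag is created, which matches the `reason != ""` check in the usage examples.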
## Service Implementation
### ModerationService
Key methods:
```go
// AnalyzeContent analyzes text and media with AI APIs
func (s *ModerationService) AnalyzeContent(ctx context.Context, body string, mediaURLs []string) (*ThreePoisonsScore, string, error)
// FlagPost creates a moderation flag for a post
func (s *ModerationService) FlagPost(ctx context.Context, postID uuid.UUID, scores *ThreePoisonsScore, reason string) error
// FlagComment creates a moderation flag for a comment
func (s *ModerationService) FlagComment(ctx context.Context, commentID uuid.UUID, scores *ThreePoisonsScore, reason string) error
// GetPendingFlags retrieves pending moderation flags for review
func (s *ModerationService) GetPendingFlags(ctx context.Context, limit, offset int) ([]map[string]interface{}, error)
// UpdateFlagStatus updates flag status after review
func (s *ModerationService) UpdateFlagStatus(ctx context.Context, flagID uuid.UUID, status string, reviewedBy uuid.UUID) error
// UpdateUserStatus updates user moderation status
func (s *ModerationService) UpdateUserStatus(ctx context.Context, userID uuid.UUID, status string, changedBy uuid.UUID, reason string) error
```
### Configuration
Environment variables:
```bash
# Enable/disable moderation system
MODERATION_ENABLED=true
# OpenAI API key for text moderation
OPENAI_API_KEY=sk-your-openai-key
# Google Vision API key for image analysis
GOOGLE_VISION_API_KEY=your-google-vision-key
```
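One possible way to load and validate these variables at startup is sketched below. The `ModerationConfig` type and `loadModerationConfig` function are hypothetical names; the sketch assumes that moderation should refuse to start when it is enabled but a key is missing.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// ModerationConfig holds the three settings listed above.
type ModerationConfig struct {
	Enabled         bool
	OpenAIKey       string
	GoogleVisionKey string
}

// loadModerationConfig reads the settings through getenv so that tests can
// supply a fake environment. Pass os.Getenv in production code.
func loadModerationConfig(getenv func(string) string) (ModerationConfig, error) {
	cfg := ModerationConfig{
		OpenAIKey:       getenv("OPENAI_API_KEY"),
		GoogleVisionKey: getenv("GOOGLE_VISION_API_KEY"),
	}
	if raw := getenv("MODERATION_ENABLED"); raw != "" {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return cfg, fmt.Errorf("MODERATION_ENABLED: %w", err)
		}
		cfg.Enabled = enabled
	}
	// Assumption: an enabled system without both keys is a configuration error.
	if cfg.Enabled && (cfg.OpenAIKey == "" || cfg.GoogleVisionKey == "") {
		return cfg, errors.New("moderation is enabled but an API key is missing")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadModerationConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, "config error:", err)
		os.Exit(1)
	}
	fmt.Println("moderation enabled:", cfg.Enabled)
}
```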
## Directus Integration
### Permissions
The migration grants appropriate permissions to the Directus user:
```sql
GRANT SELECT, INSERT, UPDATE, DELETE ON moderation_flags TO directus;
GRANT SELECT, INSERT, UPDATE, DELETE ON user_status_history TO directus;
GRANT SELECT, UPDATE ON users TO directus;
```
### CMS Interface
Directus will automatically detect the new tables and allow you to build:
1. **Moderation Queue** - View pending flags with content preview
2. **User Management** - Manage user status (active/suspended/banned)
3. **Audit Trail** - View moderation history and user status changes
4. **Analytics** - Reports on moderation trends and statistics
### Recommended Directus Configuration
1. **Moderation Flags Collection**
   - Hide technical fields (id, updated_at)
   - Create custom display for scores (JSON visualization)
   - Add status workflow buttons (approve/reject/escalate)
2. **Users Collection**
   - Add status field with dropdown (active/suspended/banned)
   - Create relationship to status history
   - Add moderation statistics panel
3. **User Status History Collection**
   - Read-only view for audit trail
   - Filter by user and date range
   - Export functionality for compliance
## Usage Examples
### Analyzing Content
```go
ctx := context.Background()
moderationService := NewModerationService(pool, openAIKey, googleKey)

// Analyze text and images
scores, reason, err := moderationService.AnalyzeContent(ctx, postContent, mediaURLs)
if err != nil {
	log.Printf("Moderation analysis failed: %v", err)
	return
}

// Flag content if needed
if reason != "" {
	err = moderationService.FlagPost(ctx, postID, scores, reason)
	if err != nil {
		log.Printf("Failed to flag post: %v", err)
	}
}
```
### Managing Moderation Queue
```go
// Get pending flags
flags, err := moderationService.GetPendingFlags(ctx, 50, 0)
if err != nil {
	log.Printf("Failed to get pending flags: %v", err)
	return
}

// Review and update flag status
for _, flag := range flags {
	// Use the comma-ok form so an unexpected type does not panic
	flagID, ok := flag["id"].(uuid.UUID)
	if !ok {
		log.Printf("Skipping flag with unexpected id type %T", flag["id"])
		continue
	}
	if err := moderationService.UpdateFlagStatus(ctx, flagID, "approved", adminID); err != nil {
		log.Printf("Failed to update flag status: %v", err)
	}
}
```
### User Status Management
```go
// Suspend user for repeated violations
err = moderationService.UpdateUserStatus(ctx, userID, "suspended", adminID, "Multiple hate speech violations")
if err != nil {
	log.Printf("Failed to update user status: %v", err)
}
```
## Performance Considerations
### API Rate Limits
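- **OpenAI**: 60 requests/minute assumed for the moderation endpoint; actual limits depend on the account and should be confirmed in the OpenAI dashboard
- **Google Vision**: 1000 requests/minute per project assumed; the actual quota is set per project and should be confirmed in the Google Cloud console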
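When a limit is exceeded, providers typically answer with HTTP 429. A minimal sketch of retrying with exponential backoff is shown below; `withBackoff` and `retryable` are hypothetical helpers, the call is abstracted as a function returning a status code so the sketch stays independent of any particular client, and treating 5xx responses as retryable is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// retryable reports whether a status code is worth retrying.
func retryable(status int) bool {
	return status == http.StatusTooManyRequests || status >= 500
}

// withBackoff invokes call up to attempts times, doubling the delay after
// each retryable failure. It returns the last status and error seen.
func withBackoff(attempts int, base time.Duration, call func() (int, error)) (int, error) {
	var (
		status int
		err    error
	)
	delay := base
	for i := 0; i < attempts; i++ {
		status, err = call()
		if err == nil && !retryable(status) {
			return status, nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	if err == nil {
		err = fmt.Errorf("giving up after %d attempts, last status %d", attempts, status)
	}
	return status, err
}

func main() {
	calls := 0
	// Fake API call: rate limited twice, then successful.
	status, err := withBackoff(4, 10*time.Millisecond, func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusTooManyRequests, nil
		}
		return http.StatusOK, nil
	})
	fmt.Println("status:", status, "err:", err, "calls:", calls)
}
```

In production the delay would usually also honor a `Retry-After` header when the provider sends one.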
### Caching
Consider implementing caching for:
- Repeated content analysis
- User reputation scores
- API responses for identical content
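A sketch of caching results for identical content, keyed by a SHA-256 hash of the text. The `resultCache` and `cachedResult` types are hypothetical, and the cache is a plain in-memory map without expiry or size limits, which a production version would need.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// cachedResult is the analysis outcome stored per piece of content.
type cachedResult struct {
	Reason                string
	Hate, Greed, Delusion float64
}

// resultCache is a concurrency-safe in-memory cache keyed by content hash.
type resultCache struct {
	mu      sync.RWMutex
	entries map[string]cachedResult
}

func newResultCache() *resultCache {
	return &resultCache{entries: make(map[string]cachedResult)}
}

// contentKey hashes the body so raw user content is not used as a map key.
func contentKey(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

// getOrCompute returns the cached result for body, calling analyze only on a miss.
// Two concurrent misses for the same body may both call analyze; the last write wins.
func (c *resultCache) getOrCompute(body string, analyze func(string) (cachedResult, error)) (cachedResult, error) {
	key := contentKey(body)
	c.mu.RLock()
	res, ok := c.entries[key]
	c.mu.RUnlock()
	if ok {
		return res, nil
	}
	res, err := analyze(body)
	if err != nil {
		return cachedResult{}, err
	}
	c.mu.Lock()
	c.entries[key] = res
	c.mu.Unlock()
	return res, nil
}

func main() {
	cache := newResultCache()
	calls := 0
	analyze := func(string) (cachedResult, error) {
		calls++
		return cachedResult{Reason: "greed", Greed: 0.8}, nil
	}
	cache.getOrCompute("buy crypto", analyze)
	cache.getOrCompute("buy crypto", analyze)
	fmt.Println("analyze calls:", calls)
}
```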
### Batch Processing
For high-volume scenarios:
- Queue content for batch analysis
- Process multiple items in single API calls
- Implement background workers
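The background-worker idea can be sketched with a channel-fed worker pool. The `job` type and `runWorkers` function are hypothetical; in a real deployment the `analyze` callback would wrap `AnalyzeContent`, and results would be written to the database instead of an in-memory map.

```go
package main

import (
	"fmt"
	"sync"
)

// job is one piece of content waiting for analysis.
type job struct {
	PostID string
	Body   string
}

// runWorkers starts n workers that drain jobs until the channel is closed
// and returns the flag reason produced for each post ID.
func runWorkers(n int, jobs <-chan job, analyze func(job) string) map[string]string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]string)
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				reason := analyze(j)
				mu.Lock()
				results[j.PostID] = reason
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return results
}

func main() {
	jobs := make(chan job, 3)
	jobs <- job{PostID: "p1", Body: "hello"}
	jobs <- job{PostID: "p2", Body: "buy crypto"}
	jobs <- job{PostID: "p3", Body: "nice photo"}
	close(jobs)

	results := runWorkers(2, jobs, func(j job) string {
		if j.Body == "buy crypto" {
			return "greed"
		}
		return ""
	})
	fmt.Println("analyzed:", len(results), "p2 reason:", results["p2"])
}
```

Capping the worker count is also a simple way to stay under provider rate limits.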
## Security & Privacy
### Data Protection
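- User content (text and image URLs) is sent to third-party APIs (OpenAI and Google) for analysis
- Consider the privacy implications of sharing this content and disclose it to users
- Implement data retention policies for stored flags and scores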
### API Key Security
- Store keys in environment variables
- Rotate keys regularly
- Monitor API usage for anomalies
### Compliance
- GDPR considerations for content analysis
- Data processing agreements with AI providers
- User consent for content analysis
## Monitoring & Alerting
### Metrics to Track
- API response times and error rates
- Flag volume by category
- Review queue length and processing time
- User status changes and appeals
### Alerting
- High API error rates
- Queue processing delays
- Unusual flag patterns
- API quota exhaustion
## Testing
### Unit Tests
```go
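func TestAnalyzeContent(t *testing.T) {
	// pool is assumed to be a test database pool set up elsewhere (e.g. in TestMain).
	// With placeholder keys the real APIs reject requests, so run this against
	// mock servers or substitute valid keys.
	ctx := context.Background()
	service := NewModerationService(pool, "test-key", "test-key")

	// Test hate content
	scores, reason, err := service.AnalyzeContent(ctx, "I hate everyone", nil)
	if !assert.NoError(t, err) {
		return // scores may be nil when the analysis fails
	}
	assert.Equal(t, "hate", reason)
	assert.Greater(t, scores.Hate, 0.5)
}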
```
### Integration Tests
- Test API integrations with mock servers
- Verify database operations
- Test Directus integration
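A mock server for the text moderation API can be built with the standard library's `net/http/httptest` package. The sketch below is illustrative: `newMockModerationServer` and `fetchHateScore` are hypothetical helpers, and it assumes the service under test can be pointed at a configurable base URL instead of the production endpoint.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newMockModerationServer answers every request with a fixed hate score,
// shaped like the moderation response shown earlier in this document.
func newMockModerationServer(hate float64) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]interface{}{
			"results": []map[string]interface{}{{
				"flagged":         hate > 0.5,
				"category_scores": map[string]float64{"hate": hate},
			}},
		})
	}))
}

// fetchHateScore posts text to the given moderation URL and returns the hate score.
func fetchHateScore(url, text string) (float64, error) {
	payload, err := json.Marshal(map[string]string{"input": text})
	if err != nil {
		return 0, err
	}
	resp, err := http.Post(url, "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	var out struct {
		Results []struct {
			CategoryScores map[string]float64 `json:"category_scores"`
		} `json:"results"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return 0, err
	}
	if len(out.Results) == 0 {
		return 0, fmt.Errorf("no results in response")
	}
	return out.Results[0].CategoryScores["hate"], nil
}

func main() {
	srv := newMockModerationServer(0.9)
	defer srv.Close()
	score, err := fetchHateScore(srv.URL, "I hate everyone")
	fmt.Println("hate score:", score, "err:", err)
}
```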
### Load Testing
- Test API rate limit handling
- Verify database performance under load
- Test queue processing throughput
## Deployment
### Environment Setup
1. Set required environment variables
2. Run database migrations
3. Configure API keys
4. Test integrations
### Migration Steps
1. Deploy schema changes
2. Update application code
3. Configure Directus permissions
4. Test moderation flow
5. Monitor for issues
### Rollback Plan
- Database migration rollback
- Previous version deployment
- Data backup and restore procedures
## Future Enhancements
### Additional AI Providers
- Content moderation alternatives
- Multi-language support
- Custom model training
### Advanced Features
- Machine learning for false positive reduction
- User reputation scoring
- Automated escalation workflows
- Appeal process integration
### Analytics & Reporting
- Moderation effectiveness metrics
- Content trend analysis
- User behavior insights
- Compliance reporting
## Troubleshooting
### Common Issues
1. **API Key Errors**
   - Verify environment variables
   - Check API key permissions
   - Monitor usage quotas
2. **Database Connection Issues**
   - Verify migration completion
   - Check Directus permissions
   - Test database connectivity
3. **Performance Issues**
   - Monitor API response times
   - Check database query performance
   - Review queue processing
### Debug Tools
- API request/response logging
- Database query logging
- Performance monitoring
- Error tracking and alerting
## Support & Maintenance
### Regular Tasks
- Monitor API usage and costs
- Review moderation accuracy
- Update keyword lists
- Maintain database performance
### Documentation Updates
- API documentation changes
- New feature additions
- Configuration updates
- Troubleshooting guides