# AI Moderation System Implementation

## Overview

This document describes the implementation of a production-ready AI-powered content moderation system for the Sojorn platform. The system integrates OpenAI's Moderation API and Google Vision API to automatically analyze text and image content for policy violations.

## Architecture

### Components

1. **Database Layer** - PostgreSQL tables for storing moderation flags and user status
2. **AI Analysis Layer** - OpenAI (text) and Google Vision (image) API integration
3. **Service Layer** - Go backend services for content analysis and flag management
4. **CMS Integration** - Directus interface for moderation queue management

### Data Flow

```
User Content → Go Backend → AI APIs → Analysis Results → Database → Directus CMS → Admin Review
```

## Database Schema

### New Tables

#### `moderation_flags`
Stores AI-generated content moderation flags:

```sql
CREATE TABLE moderation_flags (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    post_id UUID REFERENCES posts(id) ON DELETE CASCADE,
    comment_id UUID REFERENCES comments(id) ON DELETE CASCADE,
    flag_reason TEXT NOT NULL,
    scores JSONB NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    reviewed_by UUID REFERENCES users(id),
    reviewed_at TIMESTAMP WITH TIME ZONE,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW(),
    updated_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

#### `user_status_history`
Audit trail for user status changes:

```sql
CREATE TABLE user_status_history (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    old_status TEXT,
    new_status TEXT NOT NULL,
    reason TEXT,
    changed_by UUID REFERENCES users(id),
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
```

### Modified Tables

#### `users`
Added status column for user moderation:

```sql
ALTER TABLE users ADD COLUMN status TEXT DEFAULT 'active'
    CHECK (status IN ('active', 'suspended', 'banned'));
```

## API Integration

### OpenAI Moderation API

**Endpoint**: `https://api.openai.com/v1/moderations`

**Purpose**: Analyze text content for policy violations

**Categories Mapped**:
- Hate → Hate (violence, hate speech)
- Self-Harm → Delusion (self-harm content)
- Sexual → Hate (inappropriate content)
- Violence → Hate (violent content)

**Example Response** (abridged; `categories` holds boolean verdicts, while `category_scores` holds the confidence values used for score mapping):
```json
{
  "results": [{
    "categories": {
      "hate": false,
      "violence": false,
      "self-harm": false
    },
    "category_scores": {
      "hate": 0.1,
      "violence": 0.05,
      "self-harm": 0.0
    },
    "flagged": false
  }]
}
```
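
The category mapping above can be sketched in Go as follows. This is a minimal illustration, not the service's actual code: the `ThreePoisonsScore` field layout, the helper name `mapOpenAIScores`, and the choice to take the maximum of the source scores are all assumptions. Subcategories returned by the API (for example `hate/threatening`) are ignored here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// moderationResponse mirrors only the response fields this sketch reads.
type moderationResponse struct {
	Results []struct {
		Flagged        bool               `json:"flagged"`
		CategoryScores map[string]float64 `json:"category_scores"`
	} `json:"results"`
}

// ThreePoisonsScore is an assumed shape for the type named in this document.
type ThreePoisonsScore struct {
	Hate     float64
	Greed    float64
	Delusion float64
}

// mapOpenAIScores applies the mapping listed above: hate, sexual and violence
// feed Hate; self-harm feeds Delusion. Each poison takes the maximum of its
// source scores (an assumption; other combinations are possible).
func mapOpenAIScores(raw []byte) (ThreePoisonsScore, error) {
	var resp moderationResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return ThreePoisonsScore{}, fmt.Errorf("decode moderation response: %w", err)
	}
	var out ThreePoisonsScore
	for _, r := range resp.Results {
		for category, score := range r.CategoryScores {
			switch category {
			case "hate", "sexual", "violence":
				out.Hate = math.Max(out.Hate, score)
			case "self-harm":
				out.Delusion = math.Max(out.Delusion, score)
			}
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"results":[{"flagged":false,"category_scores":{"hate":0.1,"violence":0.05,"self-harm":0.0}}]}`)
	scores, err := mapOpenAIScores(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("hate=%.2f delusion=%.2f\n", scores.Hate, scores.Delusion)
}
```

Greed stays at zero in this sketch because, as noted later in this document, it is scored by keyword detection rather than by the OpenAI response.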

### Google Vision API

**Endpoint**: `https://vision.googleapis.com/v1/images:annotate`

**Purpose**: Analyze images for inappropriate content using SafeSearch

**SafeSearch Categories Mapped**:
- Violence → Hate (violent imagery)
- Adult → Hate (adult content)
- Racy → Delusion (suggestive content)

**Example Response**:
```json
{
  "responses": [{
    "safeSearchAnnotation": {
      "adult": "UNLIKELY",
      "spoof": "UNLIKELY",
      "medical": "UNLIKELY",
      "violence": "UNLIKELY",
      "racy": "UNLIKELY"
    }
  }]
}
```
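
SafeSearch reports likelihood names rather than numbers, so comparing against a numeric threshold needs a conversion step. The sketch below shows one possible conversion. The numeric values in `likelihoodScore` and the helper name `mapSafeSearch` are illustrative assumptions, not part of the Vision API or of the service described here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// likelihoodScore maps SafeSearch likelihood names to a 0..1 value.
// The numbers are assumptions chosen for illustration.
var likelihoodScore = map[string]float64{
	"UNKNOWN":       0.0,
	"VERY_UNLIKELY": 0.0,
	"UNLIKELY":      0.25,
	"POSSIBLE":      0.5,
	"LIKELY":        0.75,
	"VERY_LIKELY":   1.0,
}

// visionResponse mirrors only the SafeSearch fields used by the mapping.
type visionResponse struct {
	Responses []struct {
		SafeSearch struct {
			Adult    string `json:"adult"`
			Violence string `json:"violence"`
			Racy     string `json:"racy"`
		} `json:"safeSearchAnnotation"`
	} `json:"responses"`
}

// mapSafeSearch follows the mapping listed above: violence and adult feed
// Hate, racy feeds Delusion. Unrecognized likelihood names score 0.
func mapSafeSearch(raw []byte) (hate, delusion float64, err error) {
	var resp visionResponse
	if err = json.Unmarshal(raw, &resp); err != nil {
		return 0, 0, fmt.Errorf("decode vision response: %w", err)
	}
	for _, r := range resp.Responses {
		a := r.SafeSearch
		hate = math.Max(hate, math.Max(likelihoodScore[a.Violence], likelihoodScore[a.Adult]))
		delusion = math.Max(delusion, likelihoodScore[a.Racy])
	}
	return hate, delusion, nil
}

func main() {
	raw := []byte(`{"responses":[{"safeSearchAnnotation":{"adult":"LIKELY","violence":"UNLIKELY","racy":"POSSIBLE"}}]}`)
	hate, delusion, err := mapSafeSearch(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("hate=%.2f delusion=%.2f\n", hate, delusion)
}
```

With these assumed values, `POSSIBLE` maps to exactly 0.5 and so would not exceed a strict `> 0.5` threshold; only `LIKELY` and `VERY_LIKELY` would.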

## Three Poisons Score Mapping

The system maps AI analysis results to the Buddhist "Three Poisons" framework:

### Hate (Dvesha)
- **Sources**: OpenAI hate, violence, sexual content; Google violence, adult
- **Threshold**: > 0.5
- **Content**: Hate speech, violence, explicit content

### Greed (Lobha)
- **Sources**: Keyword-based detection (OpenAI doesn't detect spam well)
- **Keywords**: buy, crypto, rich, scam, investment, profit, money, trading, etc.
- **Threshold**: > 0.5
- **Content**: Spam, scams, financial exploitation

### Delusion (Moha)
- **Sources**: OpenAI self-harm; Google racy content
- **Threshold**: > 0.5
- **Content**: Self-harm, misinformation, inappropriate suggestions
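
The keyword-based Greed score and the `> 0.5` threshold check could look roughly like the sketch below. The scoring rule (0.25 per distinct keyword, capped at 1.0), the helper names `greedScore` and `flagReason`, and the tie-breaking order are assumptions made for illustration; the keyword list is the one given above without its "etc." tail.

```go
package main

import (
	"fmt"
	"math"
	"strings"
	"unicode"
)

// greedKeywords is the keyword list from this document.
var greedKeywords = []string{"buy", "crypto", "rich", "scam", "investment", "profit", "money", "trading"}

// greedScore adds 0.25 per distinct keyword found as a whole word,
// capped at 1.0. The weighting is an assumption.
func greedScore(body string) float64 {
	words := strings.FieldsFunc(strings.ToLower(body), func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	seen := make(map[string]bool, len(words))
	for _, w := range words {
		seen[w] = true
	}
	hits := 0
	for _, k := range greedKeywords {
		if seen[k] {
			hits++
		}
	}
	return math.Min(1.0, float64(hits)*0.25)
}

// flagReason returns the poison with the highest score strictly above 0.5,
// or "" when nothing crosses the threshold.
func flagReason(hate, greed, delusion float64) string {
	const threshold = 0.5
	candidates := []struct {
		name  string
		score float64
	}{{"hate", hate}, {"greed", greed}, {"delusion", delusion}}

	reason, best := "", threshold
	for _, c := range candidates {
		if c.score > best {
			reason, best = c.name, c.score
		}
	}
	return reason
}

func main() {
	body := "Buy crypto now and get rich with this investment!"
	g := greedScore(body)
	fmt.Printf("greed=%.2f reason=%q\n", g, flagReason(0.1, g, 0.0))
}
```

Matching on whole words avoids counting "enrich" as "rich", at the cost of missing inflected forms such as "buying".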

## Service Implementation

### ModerationService

Key methods:

```go
// AnalyzeContent analyzes text and media with AI APIs
func (s *ModerationService) AnalyzeContent(ctx context.Context, body string, mediaURLs []string) (*ThreePoisonsScore, string, error)

// FlagPost creates a moderation flag for a post
func (s *ModerationService) FlagPost(ctx context.Context, postID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// FlagComment creates a moderation flag for a comment
func (s *ModerationService) FlagComment(ctx context.Context, commentID uuid.UUID, scores *ThreePoisonsScore, reason string) error

// GetPendingFlags retrieves pending moderation flags for review
func (s *ModerationService) GetPendingFlags(ctx context.Context, limit, offset int) ([]map[string]interface{}, error)

// UpdateFlagStatus updates flag status after review
func (s *ModerationService) UpdateFlagStatus(ctx context.Context, flagID uuid.UUID, status string, reviewedBy uuid.UUID) error

// UpdateUserStatus updates user moderation status
func (s *ModerationService) UpdateUserStatus(ctx context.Context, userID uuid.UUID, status string, changedBy uuid.UUID, reason string) error
```

### Configuration

Environment variables:

```bash
# Enable/disable moderation system
MODERATION_ENABLED=true

# OpenAI API key for text moderation
OPENAI_API_KEY=sk-your-openai-key

# Google Vision API key for image analysis
GOOGLE_VISION_API_KEY=your-google-vision-key
```
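
One way to read these variables at startup is sketched below. The `ModerationConfig` struct and `loadModerationConfig` function are hypothetical names, and the rule that both keys are required whenever moderation is enabled is an assumption; only the three variable names come from this document.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// ModerationConfig is a hypothetical container for the settings above.
type ModerationConfig struct {
	Enabled         bool
	OpenAIKey       string
	GoogleVisionKey string
}

// loadModerationConfig reads the three variables through getenv, which is
// injected so the function can be exercised without touching the real
// environment. An unset MODERATION_ENABLED is treated as disabled.
func loadModerationConfig(getenv func(string) string) (ModerationConfig, error) {
	var cfg ModerationConfig
	if raw := getenv("MODERATION_ENABLED"); raw != "" {
		enabled, err := strconv.ParseBool(raw)
		if err != nil {
			return cfg, fmt.Errorf("MODERATION_ENABLED: %w", err)
		}
		cfg.Enabled = enabled
	}
	if !cfg.Enabled {
		return cfg, nil
	}
	cfg.OpenAIKey = getenv("OPENAI_API_KEY")
	cfg.GoogleVisionKey = getenv("GOOGLE_VISION_API_KEY")
	if cfg.OpenAIKey == "" || cfg.GoogleVisionKey == "" {
		return cfg, errors.New("moderation is enabled but OPENAI_API_KEY or GOOGLE_VISION_API_KEY is empty")
	}
	return cfg, nil
}

func main() {
	cfg, err := loadModerationConfig(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("moderation enabled:", cfg.Enabled)
}
```

Failing fast on a missing key surfaces misconfiguration at startup instead of on the first moderated post.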

## Directus Integration

### Permissions

The migration grants appropriate permissions to the Directus user:

```sql
GRANT SELECT, INSERT, UPDATE, DELETE ON moderation_flags TO directus;
GRANT SELECT, INSERT, UPDATE, DELETE ON user_status_history TO directus;
GRANT SELECT, UPDATE ON users TO directus;
```

### CMS Interface

Directus will automatically detect the new tables and allow you to build:

1. **Moderation Queue** - View pending flags with content preview
2. **User Management** - Manage user status (active/suspended/banned)
3. **Audit Trail** - View moderation history and user status changes
4. **Analytics** - Reports on moderation trends and statistics

### Recommended Directus Configuration

1. **Moderation Flags Collection**
   - Hide technical fields (id, updated_at)
   - Create custom display for scores (JSON visualization)
   - Add status workflow buttons (approve/reject/escalate)

2. **Users Collection**
   - Add status field with dropdown (active/suspended/banned)
   - Create relationship to status history
   - Add moderation statistics panel

3. **User Status History Collection**
   - Read-only view for audit trail
   - Filter by user and date range
   - Export functionality for compliance

## Usage Examples

### Analyzing Content

```go
ctx := context.Background()
moderationService := NewModerationService(pool, openAIKey, googleKey)

// Analyze text and images
scores, reason, err := moderationService.AnalyzeContent(ctx, postContent, mediaURLs)
if err != nil {
	log.Printf("Moderation analysis failed: %v", err)
	return
}

// Flag content if needed
if reason != "" {
	err = moderationService.FlagPost(ctx, postID, scores, reason)
	if err != nil {
		log.Printf("Failed to flag post: %v", err)
	}
}
```

### Managing Moderation Queue

```go
// Get pending flags
flags, err := moderationService.GetPendingFlags(ctx, 50, 0)
if err != nil {
	log.Printf("Failed to get pending flags: %v", err)
	return
}

// Review and update flag status
for _, flag := range flags {
	// Use the comma-ok form so an unexpected id type does not panic.
	flagID, ok := flag["id"].(uuid.UUID)
	if !ok {
		log.Printf("Unexpected flag id type: %T", flag["id"])
		continue
	}
	err = moderationService.UpdateFlagStatus(ctx, flagID, "approved", adminID)
	if err != nil {
		log.Printf("Failed to update flag status: %v", err)
	}
}
```

### User Status Management

```go
// Suspend user for repeated violations
err = moderationService.UpdateUserStatus(ctx, userID, "suspended", adminID, "Multiple hate speech violations")
if err != nil {
	log.Printf("Failed to update user status: %v", err)
}
```
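
## Performance Considerations

### API Rate Limits

- **OpenAI**: 60 requests/minute for moderation endpoint
- **Google Vision**: 1000 requests/minute per project

These figures depend on account tier and project quota; confirm the current limits in each provider's dashboard before sizing workers.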

### Caching

Consider implementing caching for:
- Repeated content analysis
- User reputation scores
- API responses for identical content
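
Caching results for identical content can be keyed on a hash of the text. The sketch below is an in-memory illustration under stated assumptions: `cachedAnalyzer` and `contentKey` are hypothetical names, the wrapped `analyze` function stands in for the real API call, and there is no eviction or expiry.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// cachedAnalyzer memoizes analysis results by content hash.
type cachedAnalyzer struct {
	mu      sync.Mutex
	results map[string]string
	analyze func(body string) string // stand-in for the real API call
}

func newCachedAnalyzer(analyze func(string) string) *cachedAnalyzer {
	return &cachedAnalyzer{results: make(map[string]string), analyze: analyze}
}

// contentKey returns the hex SHA-256 digest of the content.
func contentKey(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

// Analyze returns the cached result for identical content, calling the
// wrapped function only on a miss. Two concurrent misses for the same
// content may both call analyze; the later result overwrites the earlier.
func (c *cachedAnalyzer) Analyze(body string) string {
	key := contentKey(body)

	c.mu.Lock()
	cached, ok := c.results[key]
	c.mu.Unlock()
	if ok {
		return cached
	}

	result := c.analyze(body)

	c.mu.Lock()
	c.results[key] = result
	c.mu.Unlock()
	return result
}

func main() {
	calls := 0
	analyzer := newCachedAnalyzer(func(body string) string {
		calls++
		return "reason-for:" + body
	})
	analyzer.Analyze("same post")
	analyzer.Analyze("same post")
	fmt.Println("upstream calls:", calls)
}
```

A production cache would also need a size bound and an expiry, since moderation models and thresholds change over time.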

### Batch Processing

For high-volume scenarios:
- Queue content for batch analysis
- Process multiple items in single API calls
- Implement background workers
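
Background workers can be sketched with goroutines and channels. In this illustration the `job` and `result` types, the `runWorkers` function, and the in-memory job slice are assumptions; a real deployment would likely read from a persistent queue and call `AnalyzeContent` inside each worker.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// job and result are hypothetical shapes for queued content.
type job struct {
	PostID string
	Body   string
}

type result struct {
	PostID string
	Reason string
}

// runWorkers fans jobs out to a fixed number of goroutines and collects
// every result. Result order is not guaranteed.
func runWorkers(workers int, jobs []job, analyze func(string) string) []result {
	in := make(chan job)
	out := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- result{PostID: j.PostID, Reason: analyze(j.Body)}
			}
		}()
	}

	// Feed the queue, then signal that no more jobs are coming.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	results := make([]result, 0, len(jobs))
	for r := range out {
		results = append(results, r)
	}
	return results
}

func main() {
	jobs := []job{{"p1", "hello"}, {"p2", "buy crypto"}, {"p3", "nice photo"}}
	analyze := func(body string) string {
		if strings.Contains(body, "crypto") {
			return "greed"
		}
		return ""
	}
	for _, r := range runWorkers(2, jobs, analyze) {
		fmt.Printf("%s -> %q\n", r.PostID, r.Reason)
	}
}
```

The worker count caps concurrent API calls, which pairs naturally with the rate limits listed above.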

## Security & Privacy

### Data Protection

- User content (text and media) is sent to third-party APIs (OpenAI and Google) for analysis
- Review the privacy implications of sharing user content with these providers
- Implement data retention policies for stored flags and scores

### API Key Security

- Store keys in environment variables
- Rotate keys regularly
- Monitor API usage for anomalies

### Compliance

- GDPR considerations for content analysis
- Data processing agreements with AI providers
- User consent for content analysis

## Monitoring & Alerting

### Metrics to Track

- API response times and error rates
- Flag volume by category
- Review queue length and processing time
- User status changes and appeals

### Alerting

- High API error rates
- Queue processing delays
- Unusual flag patterns
- API quota exhaustion

## Testing

### Unit Tests

```go
func TestAnalyzeContent(t *testing.T) {
	// pool is assumed to be a test database pool created by the suite setup.
	ctx := context.Background()
	service := NewModerationService(pool, "test-key", "test-key")

	// Test hate content
	scores, reason, err := service.AnalyzeContent(ctx, "I hate everyone", nil)
	if !assert.NoError(t, err) {
		return // scores may be nil when the call fails
	}
	assert.Equal(t, "hate", reason)
	assert.Greater(t, scores.Hate, 0.5)
}
```

### Integration Tests

- Test API integrations with mock servers
- Verify database operations
- Test Directus integration

### Load Testing

- Test API rate limit handling
- Verify database performance under load
- Test queue processing throughput

## Deployment

### Environment Setup

1. Set required environment variables
2. Run database migrations
3. Configure API keys
4. Test integrations

### Migration Steps

1. Deploy schema changes
2. Update application code
3. Configure Directus permissions
4. Test moderation flow
5. Monitor for issues

### Rollback Plan

- Database migration rollback
- Previous version deployment
- Data backup and restore procedures

## Future Enhancements

### Additional AI Providers

- Content moderation alternatives
- Multi-language support
- Custom model training

### Advanced Features

- Machine learning for false positive reduction
- User reputation scoring
- Automated escalation workflows
- Appeal process integration

### Analytics & Reporting

- Moderation effectiveness metrics
- Content trend analysis
- User behavior insights
- Compliance reporting

## Troubleshooting

### Common Issues

1. **API Key Errors**
   - Verify environment variables
   - Check API key permissions
   - Monitor usage quotas

2. **Database Connection Issues**
   - Verify migration completion
   - Check Directus permissions
   - Test database connectivity

3. **Performance Issues**
   - Monitor API response times
   - Check database query performance
   - Review queue processing

### Debug Tools

- API request/response logging
- Database query logging
- Performance monitoring
- Error tracking and alerting

## Support & Maintenance

### Regular Tasks

- Monitor API usage and costs
- Review moderation accuracy
- Update keyword lists
- Maintain database performance

### Documentation Updates

- API documentation changes
- New feature additions
- Configuration updates
- Troubleshooting guides