Troubleshooting Comprehensive Guide

Overview

This guide consolidates common issues, debugging procedures, and solutions for the Sojorn platform. It covers authentication, push notifications, E2EE chat, backend services, media uploads, performance, and deployment.


Authentication Issues

JWT Algorithm Mismatch (ES256 vs HS256)

Problem: 401 Unauthorized errors due to JWT algorithm mismatch between client and server.

Symptoms:

  • Edge Functions rejecting JWT with 401 errors
  • Authentication working in development but not production
  • Cached sessions appearing to fail

Root Cause: Supabase project issuing ES256 JWTs while backend expects HS256.

Diagnosis:

  1. Decode JWT at https://jwt.io
  2. Check header algorithm:
    {
      "alg": "ES256",  // Problem: backend expects HS256
      "kid": "b66bc58d-34b8-4..."
    }
    

Solutions:

Option A: Update Backend to Accept ES256

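// In your JWT validation middleware (e.g. github.com/golang-jwt/jwt/v5).
// publicKey is the project's ECDSA P-256 public key (*ecdsa.PublicKey),
// e.g. loaded from its JWKS and selected by the token's "kid".
token, err := jwt.Parse(tokenString, func(token *jwt.Token) (interface{}, error) {
    // Reject HMAC and any other non-ECDSA method.
    if _, ok := token.Method.(*jwt.SigningMethodECDSA); !ok {
        return nil, fmt.Errorf("unexpected signing method: %v", token.Header["alg"])
    }
    return publicKey, nil
}, jwt.WithValidMethods([]string{"ES256"}))
if err != nil || !token.Valid {
    // Respond with 401 Unauthorized.
}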

Option B: Configure Supabase to Use HS256

  1. Go to Supabase Dashboard → Settings → API
  2. Change JWT signing algorithm to HS256
  3. Regenerate API keys if needed

Verification:

# Test JWT validation (if /health does not check the token, use an authenticated endpoint instead)
curl -H "Authorization: Bearer <token>" https://api.sojorn.net/health

FCM/Push Notification Issues

Web Notifications Not Working

Symptoms:

  • "Web push is missing FIREBASE_WEB_VAPID_KEY" error
  • No notification permission prompt
  • Token registration fails

Diagnostics:

// Expected in the browser console when registration succeeds:
FCM token registered (web): d2n2ELGKel7yzPL3wZLGSe...

Solutions:

1. Check VAPID Key Configuration

File: sojorn_app/lib/config/firebase_web_config.dart

// Public key from Firebase Console → Project settings → Cloud Messaging → Web Push certificates
static const String _vapidKey = 'BNxS7_your_actual_vapid_key_here';

2. Verify Service Worker

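In Chrome DevTools, open Application > Service Workers and confirm that firebase-messaging-sw.js is registered and activated. Firebase Messaging looks for this file at the site root. In a Flutter web project, place it in web/.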

3. Test Permission Status

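// In browser console: returns 'granted', 'denied', or 'default' (not yet asked)
// If 'denied', the browser will not prompt again; reset it in the site settings.
Notification.permission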

Android Notifications Not Working

Symptoms:

  • Web notifications work, Android doesn't
  • No FCM token generated on Android
  • "Token is null after getToken()" error

Diagnostics:

# Windows
adb logcat | findstr "FCM"
# macOS/Linux
adb logcat | grep FCM

Expected Logs:

[FCM] Initializing for platform: android
[FCM] Token registered (android): eXaMpLe...
[FCM] Token synced with Go Backend successfully

Solutions:

1. Verify google-services.json

ls sojorn_app/android/app/google-services.json

Confirm that package_name in google-services.json matches the applicationId: "package_name": "com.gosojorn.app"

2. Check Build Configuration

File: sojorn_app/android/app/build.gradle.kts

plugins {
    id("com.google.gms.google-services") // alongside the existing plugins
}
android {
    defaultConfig {
        applicationId = "com.gosojorn.app" // must match google-services.json
    }
}

3. Verify Permissions

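File: AndroidManifest.xml (POST_NOTIFICATIONS is required on Android 13 / API 33 and later, where the app must also request it at runtime)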

<uses-permission android:name="android.permission.POST_NOTIFICATIONS" />

4. Reinstall App

adb uninstall com.gosojorn.app
flutter run

Backend Push Service Issues

Symptoms:

  • "Failed to initialize PushService" error
  • Notifications not being sent

Diagnostics:

# Check service account file
ls -la /opt/sojorn/firebase-service-account.json

# Check .env configuration
sudo grep FIREBASE /opt/sojorn/.env

# Validate JSON (sudo: the file should be mode 600)
sudo jq . /opt/sojorn/firebase-service-account.json

# Check logs
sudo journalctl -u sojorn-api -f | grep -i push

Solutions:

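  1. Ensure the service account JSON exists and parses as valid JSON
  2. Verify file permissions: mode 600, owned by the user the sojorn-api service runs as
  3. Confirm the service account belongs to the same Firebase project as the app (compare project_id)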

E2EE Chat Issues

Key Generation Problems

Symptoms:

  • 208-bit keys instead of 256-bit
  • All-zero signatures
  • Key upload failures

Diagnostics:

# Check database for keys
sudo -u postgres psql sojorn -c "SELECT user_id, LEFT(identity_key, 20) FROM profiles WHERE identity_key IS NOT NULL;"

Common Issues & Solutions:

1. 208-bit Key Bug

Problem: The KDF operated on strings instead of bytes.
Solution: Update the _kdf method to apply SHA-256 to byte arrays.
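The following Go sketch shows a byte-oriented KDF of this kind. It assumes _kdf is a single SHA-256 over the concatenated inputs. The app's actual construction may differ, and X3DH-style protocols normally use HKDF. The key point is that input and output stay raw bytes, so the derived key is always exactly 32 bytes (256 bits). kdf and the info label are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// kdf returns SHA-256(secret || info). The inputs are raw bytes, never a
// text or base64 rendering of the secret, and the output is always 32 bytes.
func kdf(secret, info []byte) [32]byte {
	buf := make([]byte, 0, len(secret)+len(info))
	buf = append(buf, secret...)
	buf = append(buf, info...)
	return sha256.Sum256(buf)
}

func main() {
	secret := make([]byte, 32) // placeholder for an X25519 shared secret
	key := kdf(secret, []byte("sojorn-msg-key"))
	fmt.Println(len(key)*8, "bits") // 256 bits
}
```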

2. Fake Zero Signatures

Problem: Keys were uploaded manually with fake (all-zero) signatures.
Solution: Generate real Ed25519 signatures during key upload.

3. Database Constraint Errors

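Problem: SQLSTATE 42P10 ("there is no unique or exclusion constraint matching the ON CONFLICT specification").
Solution: Make the ON CONFLICT target match the table's unique constraint: ON CONFLICT (user_id, key_id).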

Message Encryption/Decryption Failures

Symptoms:

  • Messages not decrypting
  • MAC verification failures
  • "Cannot decrypt own messages" issue

Diagnostics:

# Check message headers
sudo -u postgres psql sojorn -c "SELECT LEFT(message_header, 50) FROM encrypted_messages LIMIT 5;"

Expected Header Format:

{
  "epk": "<base64 sender ephemeral public key>",
  "n":   "<base64 nonce>",
  "m":   "<base64 MAC>",
  "v":   1
}

Solutions:

1. Verify Key Bundle Format

Identity key format: two base64 strings joined by a colon, Ed25519:X25519 (signing key first, then key-agreement key).

2. Check Signature Verification

Ensure both clients enforce signature verification. This avoids a legacy asymmetry in which one side verifies signatures and the other does not.

3. Validate OTK Management

Check that one-time prekeys are generated, uploaded, and deleted after use. Each one-time prekey should be consumed at most once.


Backend Service Issues

CORS Problems

Symptoms:

  • "Failed to fetch" errors
  • CORS policy errors in browser console
  • Pre-flight request failures

Diagnostics:

# Check Nginx configuration
sudo nginx -t

# Check Go CORS logs
sudo journalctl -u sojorn-api -f | grep -i cors

Solutions:

1. Dynamic Origin Matching

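// Parse the comma-separated origin list once at startup.
allowedOrigins := strings.Split(cfg.CORSOrigins, ",")
allowAllOrigins := false
allowedOriginSet := make(map[string]struct{})

for _, origin := range allowedOrigins {
    trimmed := strings.TrimSpace(origin)
    if trimmed == "" {
        continue // tolerate trailing commas
    }
    if trimmed == "*" {
        allowAllOrigins = true
        break
    }
    allowedOriginSet[trimmed] = struct{}{}
}

// Per request: echo the Origin header back only if it is allowed.
isAllowed := func(origin string) bool {
    if allowAllOrigins {
        return true
    }
    _, ok := allowedOriginSet[origin]
    return ok
}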

2. Nginx CORS Headers

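# Set CORS in Nginx OR in Go, not both (browsers reject duplicate headers).
# Restrict $http_origin (e.g. via a map block) rather than echoing any origin.
add_header 'Access-Control-Allow-Origin' '$http_origin' always;
add_header 'Access-Control-Allow-Credentials' 'true' always;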

Database Connection Issues

Symptoms:

  • Database connection timeouts
  • "Unable to connect to database" errors
  • Connection pool exhaustion

Diagnostics:

# Check PostgreSQL status
sudo systemctl status postgresql

# Check connection count
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"

# Check Go backend logs
sudo journalctl -u sojorn-api -f | grep -i database

Solutions:

1. Verify Connection String

# Check .env file
sudo cat /opt/sojorn/.env | grep DATABASE_URL

2. Adjust Connection Pool

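// In database connection setup
config, err := pgxpool.ParseConfig(databaseURL)
if err != nil {
    return fmt.Errorf("parse DATABASE_URL: %w", err)
}
config.MaxConns = 20 // sum across all API instances must stay below max_connections
config.MinConns = 5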

3. Check Database Resources

# Compare the connection limit with the pg_stat_activity count above
sudo -u postgres psql -c "SHOW max_connections;"
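The pool sizes of all API instances combined must fit under that limit. Below is a small Go helper for the arithmetic, offered as a sketch. poolBudget is hypothetical. The reserve of 3 + 7 assumes PostgreSQL's default superuser_reserved_connections (3) plus a guessed 7 connections for psql sessions, cron jobs, and migrations.

```go
package main

import "fmt"

// poolBudget splits the Postgres connection limit across API instances.
// reserved covers superuser_reserved_connections plus any other clients.
func poolBudget(maxConnections, reserved, instances int) int {
	if instances <= 0 {
		return 0
	}
	avail := maxConnections - reserved
	if avail <= 0 {
		return 0
	}
	return avail / instances
}

func main() {
	// PostgreSQL's default max_connections is 100.
	fmt.Println(poolBudget(100, 3+7, 2)) // 45
}
```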

Service Startup Issues

Symptoms:

  • Service fails to start
  • Port already in use errors
  • Configuration file errors

Diagnostics:

# Check service status
sudo systemctl status sojorn-api

# Check port usage
sudo ss -tlnp | grep :8080   # or: sudo netstat -tlnp | grep :8080

# Check logs
sudo journalctl -u sojorn-api -n 50

Solutions:

1. Fix Port Conflicts

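# Identify the process first; kill it only if it is a stale instance
sudo fuser -v 8080/tcp
sudo fuser -k 8080/tcp

# Or change port in .env (and update the Nginx proxy_pass to match)
PORT=8081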

2. Verify Configuration

# Run in the foreground to see configuration errors directly
# (stop the service first so port 8080 is free)
cd /opt/sojorn/go-backend
go run ./cmd/api

Media Upload Issues

File Upload Failures

Symptoms:

  • Upload timeouts
  • File size limit errors
  • Permission denied errors

Diagnostics:

# Check upload directory
ls -la /opt/sojorn/uploads/

# Check Nginx limits (default is 1m; the directive may live in a site file)
sudo grep -R client_max_body_size /etc/nginx/

# Check disk space
df -h /opt/sojorn/

Solutions:

1. Fix Directory Permissions

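# Owner should be the user the sojorn-api service runs as
sudo chown -R patrick:patrick /opt/sojorn/uploads/
sudo find /opt/sojorn/uploads/ -type d -exec chmod 755 {} +
sudo find /opt/sojorn/uploads/ -type f -exec chmod 644 {} +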

2. Increase Upload Limits

# In Nginx config (http, server, or location block), then: sudo systemctl reload nginx
client_max_body_size 50M;

3. Configure Go Limits

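// In main.go (gin.Engine): memory buffer for multipart parsing, not a size limit
r.MaxMultipartMemory = 32 << 20 // 32 MiB; larger parts spill to temp files
// Enforce a hard cap per request (here matching Nginx's 50M), e.g. in middleware:
// c.Request.Body = http.MaxBytesReader(c.Writer, c.Request.Body, 50<<20)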

R2/Cloud Storage Issues

Symptoms:

  • R2 upload failures
  • Authentication errors
  • CORS issues with direct uploads

Diagnostics:

# Check R2 configuration
sudo cat /opt/sojorn/.env | grep R2

# Test R2 endpoint reachability (any HTTP status, even 400/403, shows the network path works)
curl -I https://<account-id>.r2.cloudflarestorage.com

Solutions:

1. Verify R2 Credentials

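  • Check that the R2 API token has read and write access to the bucket
  • Verify that the bucket named in .env exists
  • Test API access with an S3-compatible client pointed at the R2 endpoint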

2. Fix CORS for Direct Uploads

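Configure a CORS policy on the R2 bucket that allows the web app's origin, plus the methods and headers the browser sends for direct uploads (typically PUT and Content-Type).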


Performance Issues

Slow API Response Times

Symptoms:

  • Requests taking > 2 seconds
  • Database query timeouts
  • High CPU usage

Diagnostics:

# Check system resources
top
htop

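# Check slowest queries (requires the pg_stat_statements extension;
# PostgreSQL 12 and older name the column mean_time)
sudo -u postgres psql sojorn -c "SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

# Check Go goroutines (requires net/http/pprof to be registered)
curl 'http://localhost:8080/debug/pprof/goroutine?debug=1'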

Solutions:

1. Database Optimization

-- Add indexes (CONCURRENTLY avoids blocking writes but cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_posts_created_at ON posts(created_at DESC);
CREATE INDEX CONCURRENTLY idx_posts_author_id ON posts(author_id);

2. Connection Pool Tuning

config.MaxConns = 25
config.MaxConnLifetime = time.Hour
config.HealthCheckPeriod = time.Minute * 5

3. Enable Query Logging

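// pgx v4: LogLevel has no effect unless a Logger is also set
config.ConnConfig.Logger = logger // any pgx.Logger implementation
config.ConnConfig.LogLevel = pgx.LogLevelInfo
// pgx v5 removed both fields; set config.ConnConfig.Tracer instead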

Memory Leaks

Symptoms:

  • Memory usage increasing over time
  • Out of memory errors
  • Service crashes

Diagnostics:

# Monitor memory usage
watch -n 1 'ps aux | grep sojorn-api'

# Save a heap profile (binary; inspect with go tool pprof heap.pprof)
curl -o heap.pprof http://localhost:8080/debug/pprof/heap

Solutions:

1. Profile Memory Usage

go tool pprof http://localhost:8080/debug/pprof/heap

2. Fix Goroutine Leaks

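// Ensure proper cleanup. Defers run last-in-first-out, so register
// wg.Wait before cancel: cancel() then fires first and lets workers exit.
defer wg.Wait()
defer cancel()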

Deployment Issues

SSL/TLS Certificate Problems

Symptoms:

  • Certificate expired errors
  • SSL handshake failures
  • Mixed content warnings

Diagnostics:

# Check certificate status
sudo certbot certificates

# Validate Nginx configuration (also fails if a referenced certificate file is missing)
sudo nginx -t

# Check certificate expiry (prints notAfter=...)
sudo openssl x509 -in /etc/letsencrypt/live/api.sojorn.net/cert.pem -noout -enddate

Solutions:

1. Renew Certificates

sudo certbot renew --dry-run
sudo certbot renew
sudo systemctl reload nginx

2. Fix Nginx SSL Config

ssl_certificate /etc/letsencrypt/live/api.sojorn.net/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/api.sojorn.net/privkey.pem;

DNS Propagation Issues

Symptoms:

  • Domain not resolving
  • Domain pointing to the wrong IP
  • Resolvers still returning the old record until its TTL expires

Diagnostics:

# Check DNS resolution
nslookup api.sojorn.net
dig api.sojorn.net

# Check propagation
for i in {1..10}; do echo "Attempt $i:"; dig api.sojorn.net +short; sleep 30; done

Solutions:

1. Verify DNS Records

# Check A record
dig api.sojorn.net A

# Check with multiple DNS servers
dig @8.8.8.8 api.sojorn.net
dig @1.1.1.1 api.sojorn.net
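The following Go sketch compares resolvers programmatically, for example in a script that waits for a change to propagate. It queries 8.8.8.8 and 1.1.1.1 directly. resolverAt and sameSet are hypothetical helpers.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"sort"
	"time"
)

// resolverAt returns a resolver that sends every query to server (host:port).
func resolverAt(server string) *net.Resolver {
	return &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 3 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

// sameSet reports whether two address lists hold the same entries in any order.
func sameSet(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

func main() {
	const host = "api.sojorn.net"
	servers := []string{"8.8.8.8:53", "1.1.1.1:53"}
	results := make([][]string, 0, len(servers))
	for _, server := range servers {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		addrs, err := resolverAt(server).LookupHost(ctx, host)
		cancel()
		if err != nil {
			fmt.Printf("%s: %v\n", server, err)
			continue
		}
		fmt.Printf("%s: %v\n", server, addrs)
		results = append(results, addrs)
	}
	if len(results) == 2 && !sameSet(results[0], results[1]) {
		fmt.Println("resolvers disagree; the change may still be propagating")
	}
}
```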

2. Reduce TTL Before Changes

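Lower the record's TTL to 300 seconds at least one full old-TTL period before making DNS changes. Resolvers then pick up the new record within about five minutes.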


Debugging Tools & Commands

Essential Commands

# Service Management
sudo systemctl status sojorn-api
sudo systemctl restart sojorn-api
sudo journalctl -u sojorn-api -f

# Database
sudo -u postgres psql sojorn
sudo -u postgres psql sojorn -c "SELECT count(*) FROM users;"

# Network
sudo ss -tlnp | grep :8080   # or: sudo netstat -tlnp | grep :8080
curl -I https://api.sojorn.net/health

# Logs
sudo tail -f /var/log/nginx/access.log
sudo tail -f /var/log/nginx/error.log

# File System
ls -la /opt/sojorn/
df -h /opt/sojorn/

Monitoring Scripts

#!/bin/bash
# monitor.sh - Basic health check

echo "=== Service Status ==="
sudo systemctl is-active sojorn-api

echo "=== Database Connections ==="
sudo -u postgres psql -c "SELECT count(*) FROM pg_stat_activity;"

echo "=== Disk Space ==="
df -h /opt/sojorn/

echo "=== Memory Usage ==="
free -h

echo "=== Recent Errors ==="
sudo journalctl -u sojorn-api --since "1 hour ago" | grep -i error

Emergency Procedures

Service Recovery

  1. Immediate Response (restart only the failing component; restarting PostgreSQL drops every open connection):

    sudo systemctl restart sojorn-api
    sudo systemctl restart nginx
    sudo systemctl restart postgresql
    
  2. Check Logs:

    sudo journalctl -u sojorn-api -n 100
    sudo journalctl -u nginx -n 100
    
  3. Verify Health:

    curl https://api.sojorn.net/health
    

Database Recovery

  1. Check Database Status:

    sudo systemctl status postgresql
    sudo -u postgres psql -c "SELECT 1;"
    
  2. Restore from Backup (plain-SQL dumps only; use pg_restore for pg_dump -Fc archives):

    sudo -u postgres psql sojorn < backup.sql
    
  3. Verify Data Integrity:

    sudo -u postgres psql sojorn -c "SELECT COUNT(*) FROM users;"
    

Contact & Support

Information to Gather

When reporting issues, include:

  1. Environment Details:

    • OS version
    • Service versions
    • Configuration files (redacted)
  2. Error Messages:

    • Full error messages
    • Stack traces
    • Log entries
  3. Reproduction Steps:

    • What triggers the issue
    • Frequency
    • Impact assessment
  4. Diagnostic Output:

    • Service status
    • Resource usage
    • Network tests

Escalation Procedures

  1. Level 1: Check this guide and run basic diagnostics
  2. Level 2: Collect detailed logs and metrics
  3. Level 3: Contact infrastructure provider if needed

Last Updated: January 30, 2026
Version: 1.0
Next Review: February 15, 2026