gitea-mirror/docs/GRACEFUL_SHUTDOWN.md
Arunavo Ray daf4ab6a93 feat: Implement graceful shutdown and enhanced job recovery
- Added shutdown handler in docker-entrypoint.sh to manage application termination signals.
- Introduced shutdown manager to track active jobs and ensure state persistence during shutdown.
- Enhanced cleanup service to support stopping and status retrieval.
- Integrated signal handlers for proper response to termination signals (SIGTERM, SIGINT, SIGHUP).
- Updated middleware to initialize shutdown manager and cleanup service.
- Created integration tests for graceful shutdown functionality, verifying job state preservation and recovery.
- Documented graceful shutdown process and configuration in GRACEFUL_SHUTDOWN.md and SHUTDOWN_PROCESS.md.
- Added new scripts for testing shutdown behavior and cleanup.
2025-05-24 23:06:28 +05:30

Graceful Shutdown and Enhanced Job Recovery

This document describes the graceful shutdown and enhanced job recovery features available in gitea-mirror v2.8.0 and later.

Overview

gitea-mirror handles termination signals gracefully and recovers interrupted jobs on restart. Both mechanisms are designed for containerized environments and aim to provide:

  • No data loss during container restarts or shutdowns
  • Automatic job resumption after application restarts
  • Clean termination of all active processes and connections
  • Container-aware design optimized for Docker/LXC deployments

Features

1. Graceful Shutdown Manager

The shutdown manager (src/lib/shutdown-manager.ts) provides centralized coordination of application termination:

Key Capabilities:

  • Active Job Tracking: Monitors all running mirroring/sync jobs
  • State Persistence: Saves job progress to database before shutdown
  • Callback System: Allows services to register cleanup functions
  • Timeout Protection: Prevents hanging shutdowns with configurable timeouts
  • Signal Coordination: Works with signal handlers for proper container lifecycle

Configuration:

  • Shutdown Timeout: 30 seconds maximum (configurable)
  • Job Save Timeout: 10 seconds per job (configurable)
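
To make the job tracking and callback system concrete, here is a minimal sketch of what such a registry could look like. The class name `ShutdownRegistry` and its methods are hypothetical and chosen for illustration; they are not taken from src/lib/shutdown-manager.ts.

```typescript
// Hypothetical sketch: class and method names are illustrative and are not
// taken from src/lib/shutdown-manager.ts.
type ShutdownCallback = () => Promise<void> | void;

class ShutdownRegistry {
  private readonly activeJobs = new Set<string>();
  private readonly callbacks: ShutdownCallback[] = [];
  private shuttingDown = false;

  /** Track a job so its state can be saved if shutdown starts. */
  registerJob(jobId: string): void {
    this.activeJobs.add(jobId);
  }

  /** Stop tracking a job that finished normally. */
  unregisterJob(jobId: string): void {
    this.activeJobs.delete(jobId);
  }

  /** Let a service register a cleanup function. */
  onShutdown(callback: ShutdownCallback): void {
    this.callbacks.push(callback);
  }

  /** Returns true only for the first caller, so shutdown runs once. */
  beginShutdown(): boolean {
    if (this.shuttingDown) return false;
    this.shuttingDown = true;
    return true;
  }

  /** Snapshot of the counts a shutdown log line would report. */
  status(): { shuttingDown: boolean; activeJobs: number; callbacks: number } {
    return {
      shuttingDown: this.shuttingDown,
      activeJobs: this.activeJobs.size,
      callbacks: this.callbacks.length,
    };
  }
}
```

Guarding `beginShutdown()` so that only the first call succeeds is one way to keep a second signal (for example SIGINT arriving after SIGTERM) from starting a parallel shutdown.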

2. Signal Handlers

The signal handler system (src/lib/signal-handlers.ts) ensures proper response to container lifecycle events:

Supported Signals and Events:

  • SIGTERM: Docker stop, Kubernetes pod termination
  • SIGINT: Ctrl+C, manual interruption
  • SIGHUP: Terminal hangup, service reload
  • Uncaught Exceptions: Emergency shutdown on critical errors
  • Unhandled Rejections: Graceful handling of promise failures
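
As an illustration of the list above, the following sketch wires the three signals and the two error events to a single shutdown function. The `SignalSource` interface, the function name `installSignalHandlers`, and the exit codes (0 for signals, 1 for errors) are assumptions made for this example, not the contents of src/lib/signal-handlers.ts.

```typescript
// Hypothetical sketch: names and exit codes are assumptions for illustration.
interface SignalSource {
  on(event: string, listener: (...args: unknown[]) => void): unknown;
}

const SHUTDOWN_SIGNALS = ["SIGTERM", "SIGINT", "SIGHUP"] as const;

function installSignalHandlers(
  source: SignalSource,
  shutdown: (reason: string, exitCode: number) => void,
): void {
  // Termination signals request a normal, successful shutdown.
  for (const signal of SHUTDOWN_SIGNALS) {
    source.on(signal, () => shutdown(signal, 0));
  }
  // Error events also trigger shutdown, but with a non-zero exit code.
  source.on("uncaughtException", () => shutdown("uncaughtException", 1));
  source.on("unhandledRejection", () => shutdown("unhandledRejection", 1));
}
```

In a Node.js or Bun process, the global `process` object would play the role of `source`. Passing it in as a parameter keeps the wiring testable without sending real signals.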

3. Enhanced Job Recovery

Building on the existing recovery system, new enhancements include:

Shutdown-Aware Processing:

  • Jobs check for shutdown signals during execution
  • Automatic state saving when shutdown is detected
  • Interrupted jobs are kept distinct from failed jobs so that they can be resumed

Container Integration:

  • Docker entrypoint script forwards signals correctly
  • Startup recovery runs before main application
  • Recovery timeouts keep a slow recovery from delaying startup indefinitely
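
Shutdown-aware processing can be sketched as a loop that checks a shutdown flag between items and returns a checkpoint when the flag is set. The `Checkpoint` shape and the function `runJob` are hypothetical; the actual job code in gitea-mirror may be structured differently.

```typescript
// Hypothetical sketch: the checkpoint shape and function names are assumptions.
interface Checkpoint {
  jobId: string;
  completed: string[];
  remaining: string[];
  interrupted: boolean;
}

async function runJob(
  jobId: string,
  items: string[],
  processItem: (item: string) => Promise<void>,
  isShuttingDown: () => boolean,
): Promise<Checkpoint> {
  const completed: string[] = [];
  for (let i = 0; i < items.length; i++) {
    // Check between items so that no item is abandoned halfway through.
    if (isShuttingDown()) {
      return { jobId, completed, remaining: items.slice(i), interrupted: true };
    }
    await processItem(items[i]);
    completed.push(items[i]);
  }
  return { jobId, completed, remaining: [], interrupted: false };
}
```

The returned checkpoint carries the items that were not started, which is the information a later recovery pass needs in order to resume the job.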

Usage

Basic Operation

The graceful shutdown system is automatically initialized when the application starts. No manual configuration is required for basic operation.

Testing

Test the graceful shutdown functionality:

# Run the integration test
bun run test-shutdown

# Clean up test data
bun run test-shutdown-cleanup

# Run unit tests
bun test src/lib/shutdown-manager.test.ts
bun test src/lib/signal-handlers.test.ts

Manual Testing

  1. Start the application:

    bun run dev
    # or in production
    bun run start
    
  2. Start a mirroring job through the web interface

  3. Send shutdown signal:

    # Send SIGTERM (recommended)
    kill -TERM <process_id>
    
    # Or use Ctrl+C for SIGINT
    
  4. Verify job state is saved and can be resumed on restart

Container Testing

Test with Docker:

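# Build and run the container (publish port 4321 so the web interface is reachable)
docker build -t gitea-mirror .
docker run -d --name test-shutdown -p 4321:4321 gitea-mirror

# Start a job, then stop the container.
# docker stop waits only 10 seconds by default before sending SIGKILL;
# -t raises that limit above the 30-second shutdown timeout.
docker stop -t 45 test-shutdown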

# Restart and verify recovery
docker start test-shutdown
docker logs test-shutdown

Implementation Details

Shutdown Flow

  1. Signal Reception: Signal handlers detect termination request
  2. Shutdown Initiation: Shutdown manager begins graceful termination
  3. Job State Saving: All active jobs save current progress to database
  4. Service Cleanup: Registered callbacks stop background services
  5. Connection Cleanup: Database connections and resources are released
  6. Process Termination: Application exits with appropriate code
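
The flow above can be sketched as an ordered list of steps run under one overall timeout, with the result mapped to an exit code. The names `withTimeout` and `runShutdown` and the choice of exit codes are assumptions for illustration only.

```typescript
// Hypothetical sketch: step names, helper names and exit codes are assumptions.
type ShutdownStep = { name: string; run: () => Promise<void> };

function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

async function runShutdown(steps: ShutdownStep[], timeoutMs: number): Promise<number> {
  const runAll = async (): Promise<void> => {
    // Steps run strictly in order: save jobs, run callbacks, close connections.
    for (const step of steps) {
      await step.run();
    }
  };
  try {
    await withTimeout(runAll(), timeoutMs, "shutdown");
    return 0; // clean shutdown
  } catch {
    return 1; // a step failed or the overall timeout was reached
  }
}
```

The caller would pass the returned value to the process exit call. Running the steps sequentially keeps job state saved before any database connection is closed.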

Job State Management

During shutdown, active jobs are updated with:

  • inProgress: false - Mark as not currently running
  • lastCheckpoint: <timestamp> - Record shutdown time
  • message: "Job interrupted by application shutdown - will resume on restart"
  • status: left unchanged as "imported" (not "failed") so the job stays eligible for recovery
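
A minimal sketch of the update described above, assuming the field names shown in the list map directly to object properties. The interface `ShutdownUpdate` and the function `buildShutdownUpdate` are hypothetical.

```typescript
// Hypothetical sketch: the type and function names are assumptions; the field
// names and message text follow the list above.
interface ShutdownUpdate {
  inProgress: false;
  lastCheckpoint: Date;
  message: string;
}

const SHUTDOWN_MESSAGE =
  "Job interrupted by application shutdown - will resume on restart";

function buildShutdownUpdate(now: Date = new Date()): ShutdownUpdate {
  // `status` is deliberately absent: leaving it untouched keeps the job
  // eligible for recovery instead of marking it as failed.
  return { inProgress: false, lastCheckpoint: now, message: SHUTDOWN_MESSAGE };
}
```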

Recovery Integration

The existing recovery system automatically detects and resumes interrupted jobs:

  • Jobs with inProgress: false and incomplete status are candidates for recovery
  • Recovery runs during application startup (before serving requests)
  • Jobs resume from their last checkpoint with remaining items
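
The selection rule can be illustrated with a small filter. The `StoredJob` shape below is an assumption; in particular, "incomplete status" is modelled here as "has items that were not completed", which may differ from how the real recovery system decides.

```typescript
// Hypothetical sketch: the StoredJob shape is an assumption for illustration.
interface StoredJob {
  id: string;
  inProgress: boolean;
  allItems: string[];
  completedItems: string[];
}

/** Items that still need processing when the job resumes. */
function remainingItems(job: StoredJob): string[] {
  const done = new Set(job.completedItems);
  return job.allItems.filter((item) => !done.has(item));
}

/** Jobs that are not running and still have work left. */
function findRecoverableJobs(jobs: StoredJob[]): StoredJob[] {
  return jobs.filter((job) => !job.inProgress && remainingItems(job).length > 0);
}
```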

Configuration

Environment Variables

# Optional: Adjust shutdown timeout (default: 30000ms)
SHUTDOWN_TIMEOUT=30000

# Optional: Adjust job save timeout (default: 10000ms)
JOB_SAVE_TIMEOUT=10000
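
One way an application could read these variables is sketched below. The helper `readTimeoutMs` is hypothetical, and falling back to the default on missing or invalid values is an assumption about the desired behavior.

```typescript
// Hypothetical sketch: the helper name and fallback behavior are assumptions.
function readTimeoutMs(
  env: Record<string, string | undefined>,
  name: string,
  fallbackMs: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallbackMs;
  const parsed = Number(raw);
  // Reject NaN, infinities, zero and negative values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

With `process.env` passed as `env`, `readTimeoutMs(process.env, "SHUTDOWN_TIMEOUT", 30000)` would yield the documented default whenever the variable is unset.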

Docker Configuration

The Docker entrypoint script includes proper signal handling:

# Signals are forwarded to the application process
# SIGTERM is handled gracefully with a 30-second timeout
# Container stops cleanly without force-killing processes

Kubernetes Configuration

For Kubernetes deployments, set a termination grace period longer than the shutdown timeout:

apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 45  # Allow time for graceful shutdown
  containers:
  - name: gitea-mirror
    # ... other configuration
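
The reason for choosing 45 seconds is that Kubernetes sends SIGKILL once the grace period expires, so the grace period has to exceed the application's own shutdown timeout. The helper below is a hypothetical sketch of that arithmetic; its name is not part of gitea-mirror.

```typescript
// Hypothetical sketch: computes how much headroom the orchestrator leaves
// after the application's own shutdown timeout. Negative means the process
// may be force-killed before graceful shutdown can finish.
function gracePeriodMarginMs(
  terminationGracePeriodSeconds: number,
  shutdownTimeoutMs: number,
): number {
  return terminationGracePeriodSeconds * 1000 - shutdownTimeoutMs;
}
```

With the values used in this document, `gracePeriodMarginMs(45, 30000)` leaves 15000 ms of headroom.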

Monitoring and Debugging

Logs

The application provides detailed logging during shutdown:

🛑 Graceful shutdown initiated by signal: SIGTERM
📊 Shutdown status: 2 active jobs, 1 callbacks
📝 Step 1: Saving active job states...
Saving state for job abc-123...
✅ Saved state for job abc-123
🔧 Step 2: Executing shutdown callbacks...
✅ Shutdown callback 1 completed
💾 Step 3: Closing database connections...
✅ Graceful shutdown completed successfully

Status Endpoints

Check shutdown manager status via API:

# Get current status (if application is running)
curl http://localhost:4321/api/health
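
After a restart, the same endpoint can be polled until the application responds. The sketch below is hypothetical and relies only on a caller-supplied probe; the response body of /api/health is not described in this document, so nothing is assumed about it.

```typescript
// Hypothetical sketch: polls a probe until it reports healthy or attempts run out.
// A probe could be: async () => (await fetch("http://localhost:4321/api/health")).ok
async function waitUntilHealthy(
  probe: () => Promise<boolean>,
  attempts: number,
  delayMs: number,
): Promise<boolean> {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await probe()) return true;
    } catch {
      // Connection errors are expected while the application is still starting.
    }
    if (i < attempts - 1) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```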

Troubleshooting

Problem: Jobs not resuming after restart

  • Check: Startup recovery logs for errors
  • Verify: Database contains interrupted jobs with correct status
  • Test: Run bun run startup-recovery manually

Problem: Shutdown timeout reached

  • Check: Job complexity and database performance
  • Adjust: Increase SHUTDOWN_TIMEOUT environment variable
  • Monitor: Database connection and disk I/O during shutdown

Problem: Container force-killed

  • Check: Container orchestrator termination grace period
  • Adjust: Increase grace period to allow shutdown completion
  • Monitor: Application shutdown logs for timing issues

Best Practices

Development

  • Always test graceful shutdown during development
  • Use the provided test scripts to verify functionality
  • Monitor logs for shutdown timing and job state persistence

Production

  • Set appropriate container termination grace periods
  • Monitor shutdown logs for performance issues
  • Use health checks to verify application readiness after restart
  • Consider job complexity when planning maintenance windows

Monitoring

  • Track job recovery success rates
  • Monitor shutdown duration metrics
  • Alert on forced terminations or recovery failures
  • Analyze logs to identify and optimize shutdown patterns

Future Enhancements

Planned improvements for future versions:

  1. Configurable Timeouts: Environment variable configuration for all timeouts
  2. Shutdown Metrics: Prometheus metrics for shutdown performance
  3. Progressive Shutdown: Graceful degradation of service capabilities
  4. Job Prioritization: Priority-based job saving during shutdown
  5. Health Check Integration: Readiness probes during shutdown process