# Graceful Shutdown and Enhanced Job Recovery
This document describes the graceful shutdown and enhanced job recovery capabilities implemented in gitea-mirror v2.8.0+.
## Overview
The gitea-mirror application now includes comprehensive graceful shutdown handling and enhanced job recovery mechanisms designed specifically for containerized environments. These features ensure:
- No data loss during container restarts or shutdowns
- Automatic job resumption after application restarts
- Clean termination of all active processes and connections
- Container-aware design optimized for Docker/LXC deployments
## Features
### 1. Graceful Shutdown Manager

The shutdown manager (`src/lib/shutdown-manager.ts`) provides centralized coordination of application termination.

**Key Capabilities:**

- **Active Job Tracking**: Monitors all running mirroring/sync jobs
- **State Persistence**: Saves job progress to the database before shutdown
- **Callback System**: Allows services to register cleanup functions
- **Timeout Protection**: Prevents hanging shutdowns with configurable timeouts
- **Signal Coordination**: Works with signal handlers for proper container lifecycle

**Configuration:**

- **Shutdown Timeout**: 30 seconds maximum (configurable)
- **Job Save Timeout**: 10 seconds per job (configurable)
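To make the coordination concrete, here is a minimal, self-contained sketch of the pattern in TypeScript. It is not the project's actual implementation (that lives in `src/lib/shutdown-manager.ts`); all names are illustrative, and the default timeout mirrors the configuration above:

```typescript
type Cleanup = () => Promise<void>;

class ShutdownManagerSketch {
  private activeJobs = new Set<string>();
  private callbacks: Cleanup[] = [];

  registerJob(id: string): void { this.activeJobs.add(id); }
  completeJob(id: string): void { this.activeJobs.delete(id); }
  onShutdown(cb: Cleanup): void { this.callbacks.push(cb); }

  // Bound the whole shutdown with a timeout so the process can never hang.
  async shutdown(timeoutMs = 30_000): Promise<void> {
    const work = (async () => {
      // Step 1: persist state for every job that is still running.
      for (const id of this.activeJobs) await saveJobState(id);
      // Step 2: let registered services clean up after themselves.
      for (const cb of this.callbacks) await cb();
    })();
    await Promise.race([
      work,
      new Promise<void>((resolve) => setTimeout(resolve, timeoutMs)),
    ]);
  }
}

// Stub for illustration; the real code writes job progress to the database.
async function saveJobState(id: string): Promise<void> {
  console.log(`Saving state for job ${id}...`);
}
```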
### 2. Signal Handlers

The signal handler system (`src/lib/signal-handlers.ts`) ensures proper response to container lifecycle events.

**Supported Signals:**

- **SIGTERM**: Docker stop, Kubernetes pod termination
- **SIGINT**: Ctrl+C, manual interruption
- **SIGHUP**: Terminal hangup, service reload
- **Uncaught Exceptions**: Emergency shutdown on critical errors
- **Unhandled Rejections**: Graceful handling of promise failures
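A sketch of how such handlers might be installed, using the standard `process.on` API. The function name and exit codes are assumptions; the project's version lives in `src/lib/signal-handlers.ts`:

```typescript
function installSignalHandlers(shutdown: (reason: string) => Promise<void>): void {
  // Graceful paths: run the shutdown routine, then exit cleanly.
  for (const sig of ["SIGTERM", "SIGINT", "SIGHUP"] as const) {
    process.on(sig, () => {
      void shutdown(`signal: ${sig}`).then(() => process.exit(0));
    });
  }
  // Emergency paths: attempt a shutdown, then exit non-zero.
  process.on("uncaughtException", (err) => {
    void shutdown(`uncaught exception: ${err.message}`).finally(() => process.exit(1));
  });
  process.on("unhandledRejection", (reason) => {
    void shutdown(`unhandled rejection: ${String(reason)}`).finally(() => process.exit(1));
  });
}
```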
### 3. Enhanced Job Recovery

Building on the existing recovery system, new enhancements include:

**Shutdown-Aware Processing:**

- Jobs check for shutdown signals during execution
- Automatic state saving when shutdown is detected
- Proper job status management (interrupted vs. failed)
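The check itself can be as simple as polling a flag between work items. A sketch with hypothetical hook names:

```typescript
type JobHooks = {
  isShuttingDown: () => boolean;
  saveCheckpoint: (jobId: string, nextIndex: number) => Promise<void>;
  mirrorItem: (item: string) => Promise<void>;
};

async function processJob(jobId: string, items: string[], hooks: JobHooks): Promise<void> {
  for (const [index, item] of items.entries()) {
    if (hooks.isShuttingDown()) {
      // Persist progress so recovery can resume from this index later.
      await hooks.saveCheckpoint(jobId, index);
      return; // exit cleanly; the job stays recoverable rather than "failed"
    }
    await hooks.mirrorItem(item);
  }
}
```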
**Container Integration:**

- The Docker entrypoint script forwards signals correctly
- Startup recovery runs before the main application starts serving requests
- Recovery timeouts prevent startup delays
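The implied startup order, sketched with assumed function names and an assumed 60-second bound (the actual recovery timeout is not documented here):

```typescript
async function startUp(recover: () => Promise<void>, serve: () => void): Promise<void> {
  // Bound recovery so a slow database cannot delay startup indefinitely.
  await Promise.race([
    recover(), // resume interrupted jobs
    new Promise<void>((resolve) => setTimeout(resolve, 60_000)),
  ]);
  serve(); // only now begin accepting requests
}
```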
## Usage

### Basic Operation

The graceful shutdown system is initialized automatically when the application starts. No manual configuration is required for basic operation.
### Testing

Test the graceful shutdown functionality:

```bash
# Run the integration test
bun run test-shutdown

# Clean up test data
bun run test-shutdown-cleanup

# Run unit tests
bun test src/lib/shutdown-manager.test.ts
bun test src/lib/signal-handlers.test.ts
```
### Manual Testing

1. Start the application:

   ```bash
   bun run dev    # or, in production: bun run start
   ```

2. Start a mirroring job through the web interface.

3. Send a shutdown signal:

   ```bash
   # Send SIGTERM (recommended)
   kill -TERM <process_id>

   # Or use Ctrl+C for SIGINT
   ```

4. Verify that the job state is saved and can be resumed on restart.
### Container Testing

Test with Docker:

```bash
# Build and run the container
docker build -t gitea-mirror .
docker run -d --name test-shutdown gitea-mirror

# Start a job, then stop the container
docker stop test-shutdown

# Restart and verify recovery
docker start test-shutdown
docker logs test-shutdown
```
## Implementation Details

### Shutdown Flow

1. **Signal Reception**: Signal handlers detect the termination request
2. **Shutdown Initiation**: The shutdown manager begins graceful termination
3. **Job State Saving**: All active jobs save current progress to the database
4. **Service Cleanup**: Registered callbacks stop background services
5. **Connection Cleanup**: Database connections and resources are released
6. **Process Termination**: The application exits with an appropriate code
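Step 3 is additionally bounded per job: each save gets its own window (the job save timeout) so a single slow write cannot consume the whole shutdown budget. A sketch, assuming a generic save function:

```typescript
async function saveWithTimeout(save: () => Promise<void>, ms = 10_000): Promise<void> {
  await Promise.race([
    save(),
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ]);
}
```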
### Job State Management

During shutdown, active jobs are updated with:

- `inProgress: false` - mark the job as not currently running
- `lastCheckpoint: <timestamp>` - record the shutdown time
- `message: "Job interrupted by application shutdown - will resume on restart"`
- Status remains `"imported"` (not `"failed"`) to enable recovery
### Recovery Integration

The existing recovery system automatically detects and resumes interrupted jobs:

- Jobs with `inProgress: false` and an incomplete status are candidates for recovery
- Recovery runs during application startup (before serving requests)
- Jobs resume from their last checkpoint with the remaining items
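The recovery predicate this implies, sketched with field and status names assumed from this document:

```typescript
type JobRow = { inProgress: boolean; status: string };

function isRecoveryCandidate(job: JobRow): boolean {
  // Not currently running, but never reached a terminal state.
  return !job.inProgress && job.status !== "completed" && job.status !== "failed";
}
```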
## Configuration

### Environment Variables

```bash
# Optional: adjust the shutdown timeout (default: 30000 ms)
SHUTDOWN_TIMEOUT=30000

# Optional: adjust the per-job save timeout (default: 10000 ms)
JOB_SAVE_TIMEOUT=10000
```
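These might be read at startup along these lines (values in milliseconds, defaults as documented above):

```typescript
const SHUTDOWN_TIMEOUT = Number(process.env.SHUTDOWN_TIMEOUT ?? 30_000);
const JOB_SAVE_TIMEOUT = Number(process.env.JOB_SAVE_TIMEOUT ?? 10_000);
```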
### Docker Configuration

The Docker entrypoint script includes proper signal handling:

- Signals are forwarded to the application process
- SIGTERM is handled gracefully within the 30-second timeout
- The container stops cleanly without force-killing processes
### Kubernetes Configuration

For Kubernetes deployments, configure an appropriate termination grace period:

```yaml
apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 45  # allow time for graceful shutdown
  containers:
    - name: gitea-mirror
      # ... other configuration
```
## Monitoring and Debugging

### Logs

The application provides detailed logging during shutdown:

```text
🛑 Graceful shutdown initiated by signal: SIGTERM
📊 Shutdown status: 2 active jobs, 1 callbacks
📝 Step 1: Saving active job states...
   Saving state for job abc-123...
   ✅ Saved state for job abc-123
🔧 Step 2: Executing shutdown callbacks...
   ✅ Shutdown callback 1 completed
💾 Step 3: Closing database connections...
✅ Graceful shutdown completed successfully
```
### Status Endpoints

Check the shutdown manager status via the API:

```bash
# Get current status (while the application is running)
curl http://localhost:4321/api/health
```
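If the endpoint is to reflect shutdown state, a hypothetical route might look like the following. This assumes an Astro-style API route; the real `/api/health` response shape is not documented here:

```typescript
import type { APIRoute } from "astro";

export const GET: APIRoute = async () => {
  const body = { status: "ok", shuttingDown: false, activeJobs: 0 }; // illustrative fields
  return new Response(JSON.stringify(body), {
    headers: { "Content-Type": "application/json" },
  });
};
```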
## Troubleshooting

**Problem: Jobs not resuming after restart**

- **Check**: Startup recovery logs for errors
- **Verify**: The database contains interrupted jobs with the correct status
- **Test**: Run `bun run startup-recovery` manually

**Problem: Shutdown timeout reached**

- **Check**: Job complexity and database performance
- **Adjust**: Increase the `SHUTDOWN_TIMEOUT` environment variable
- **Monitor**: Database connections and disk I/O during shutdown

**Problem: Container force-killed**

- **Check**: The container orchestrator's termination grace period
- **Adjust**: Increase the grace period to allow shutdown to complete
- **Monitor**: Application shutdown logs for timing issues
## Best Practices

### Development

- Always test graceful shutdown during development
- Use the provided test scripts to verify functionality
- Monitor logs for shutdown timing and job state persistence

### Production

- Set appropriate container termination grace periods
- Monitor shutdown logs for performance issues
- Use health checks to verify application readiness after restart
- Consider job complexity when planning maintenance windows

### Monitoring

- Track job recovery success rates
- Monitor shutdown duration metrics
- Alert on forced terminations or recovery failures
- Analyze logs to optimize shutdown patterns
## Future Enhancements

Planned improvements for future versions:

- **Configurable Timeouts**: Environment variable configuration for all timeouts
- **Shutdown Metrics**: Prometheus metrics for shutdown performance
- **Progressive Shutdown**: Graceful degradation of service capabilities
- **Job Prioritization**: Priority-based job saving during shutdown
- **Health Check Integration**: Readiness probes during the shutdown process