# Graceful Shutdown and Enhanced Job Recovery
This document describes the graceful shutdown and enhanced job recovery capabilities implemented in gitea-mirror v2.8.0+.
## Overview
The gitea-mirror application now includes comprehensive graceful shutdown handling and enhanced job recovery mechanisms designed specifically for containerized environments. These features ensure:
- No data loss during container restarts or shutdowns
- Automatic job resumption after application restarts
- Clean termination of all active processes and connections
- Container-aware design optimized for Docker/LXC deployments
## Features
### 1. Graceful Shutdown Manager

The shutdown manager (`src/lib/shutdown-manager.ts`) provides centralized coordination of application termination.

**Key Capabilities:**

- **Active Job Tracking**: Monitors all running mirroring/sync jobs
- **State Persistence**: Saves job progress to the database before shutdown
- **Callback System**: Allows services to register cleanup functions
- **Timeout Protection**: Prevents hanging shutdowns with configurable timeouts
- **Signal Coordination**: Works with signal handlers for proper container lifecycle

**Configuration:**

- **Shutdown Timeout**: 30 seconds maximum (configurable)
- **Job Save Timeout**: 10 seconds per job (configurable)
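To make the coordination concrete, here is a minimal, self-contained sketch of the pattern in TypeScript. It is not the project's actual implementation (that lives in `src/lib/shutdown-manager.ts`); all names are illustrative, and the default timeout mirrors the configuration above:

```typescript
type Cleanup = () => Promise<void>;

class ShutdownManagerSketch {
  private activeJobs = new Set<string>();
  private callbacks: Cleanup[] = [];

  registerJob(id: string): void { this.activeJobs.add(id); }
  completeJob(id: string): void { this.activeJobs.delete(id); }
  onShutdown(cb: Cleanup): void { this.callbacks.push(cb); }

  // Bound the whole shutdown with a timeout so the process can never hang.
  async shutdown(timeoutMs = 30_000): Promise<void> {
    const work = (async () => {
      // Step 1: persist state for every job that is still running.
      for (const id of this.activeJobs) await saveJobState(id);
      // Step 2: let registered services clean up after themselves.
      for (const cb of this.callbacks) await cb();
    })();
    await Promise.race([
      work,
      new Promise<void>((resolve) => setTimeout(resolve, timeoutMs)),
    ]);
  }
}

// Stub for illustration; the real code writes job progress to the database.
async function saveJobState(id: string): Promise<void> {
  console.log(`Saving state for job ${id}...`);
}
```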
### 2. Signal Handlers

The signal handler system (`src/lib/signal-handlers.ts`) ensures proper response to container lifecycle events.

**Supported Signals:**

- **SIGTERM**: Docker stop, Kubernetes pod termination
- **SIGINT**: Ctrl+C, manual interruption
- **SIGHUP**: Terminal hangup, service reload
- **Uncaught Exceptions**: Emergency shutdown on critical errors
- **Unhandled Rejections**: Graceful handling of promise failures
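A sketch of how such handlers might be installed, using the standard `process.on` API. The function name and exit codes are assumptions; the project's version lives in `src/lib/signal-handlers.ts`:

```typescript
function installSignalHandlers(shutdown: (reason: string) => Promise<void>): void {
  // Graceful paths: run the shutdown routine, then exit cleanly.
  for (const sig of ["SIGTERM", "SIGINT", "SIGHUP"] as const) {
    process.on(sig, () => {
      void shutdown(`signal: ${sig}`).then(() => process.exit(0));
    });
  }
  // Emergency paths: attempt a shutdown, then exit non-zero.
  process.on("uncaughtException", (err) => {
    void shutdown(`uncaught exception: ${err.message}`).finally(() => process.exit(1));
  });
  process.on("unhandledRejection", (reason) => {
    void shutdown(`unhandled rejection: ${String(reason)}`).finally(() => process.exit(1));
  });
}
```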
### 3. Enhanced Job Recovery

Building on the existing recovery system, new enhancements include:

**Shutdown-Aware Processing:**

- Jobs check for shutdown signals during execution
- Automatic state saving when shutdown is detected
- Proper job status management (interrupted vs. failed)
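The check itself can be as simple as polling a flag between work items. A sketch with hypothetical hook names:

```typescript
type JobHooks = {
  isShuttingDown: () => boolean;
  saveCheckpoint: (jobId: string, nextIndex: number) => Promise<void>;
  mirrorItem: (item: string) => Promise<void>;
};

async function processJob(jobId: string, items: string[], hooks: JobHooks): Promise<void> {
  for (const [index, item] of items.entries()) {
    if (hooks.isShuttingDown()) {
      // Persist progress so recovery can resume from this index later.
      await hooks.saveCheckpoint(jobId, index);
      return; // exit cleanly; the job stays recoverable rather than "failed"
    }
    await hooks.mirrorItem(item);
  }
}
```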
**Container Integration:**

- The Docker entrypoint script forwards signals correctly
- Startup recovery runs before the main application starts serving requests
- Recovery timeouts prevent startup delays
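The implied startup order, sketched with assumed function names and an assumed 60-second bound (the actual recovery timeout is not documented here):

```typescript
async function startUp(recover: () => Promise<void>, serve: () => void): Promise<void> {
  // Bound recovery so a slow database cannot delay startup indefinitely.
  await Promise.race([
    recover(), // resume interrupted jobs
    new Promise<void>((resolve) => setTimeout(resolve, 60_000)),
  ]);
  serve(); // only now begin accepting requests
}
```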
## Usage

### Basic Operation

The graceful shutdown system is initialized automatically when the application starts. No manual configuration is required for basic operation.
### Testing

Test the graceful shutdown functionality:

```bash
# Run the integration test
bun run test-shutdown

# Clean up test data
bun run test-shutdown-cleanup

# Run unit tests
bun test src/lib/shutdown-manager.test.ts
bun test src/lib/signal-handlers.test.ts
```
### Manual Testing

1. Start the application:

   ```bash
   bun run dev    # or, in production: bun run start
   ```

2. Start a mirroring job through the web interface.

3. Send a shutdown signal:

   ```bash
   # Send SIGTERM (recommended)
   kill -TERM <process_id>

   # Or use Ctrl+C for SIGINT
   ```

4. Verify that the job state is saved and can be resumed on restart.
### Container Testing

Test with Docker:

```bash
# Build and run the container
docker build -t gitea-mirror .
docker run -d --name test-shutdown gitea-mirror

# Start a job, then stop the container
docker stop test-shutdown

# Restart and verify recovery
docker start test-shutdown
docker logs test-shutdown
```
## Implementation Details

### Shutdown Flow

1. **Signal Reception**: Signal handlers detect the termination request
2. **Shutdown Initiation**: The shutdown manager begins graceful termination
3. **Job State Saving**: All active jobs save current progress to the database
4. **Service Cleanup**: Registered callbacks stop background services
5. **Connection Cleanup**: Database connections and resources are released
6. **Process Termination**: The application exits with an appropriate code
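Step 3 is additionally bounded per job: each save gets its own window (the job save timeout) so a single slow write cannot consume the whole shutdown budget. A sketch, assuming a generic save function:

```typescript
async function saveWithTimeout(save: () => Promise<void>, ms = 10_000): Promise<void> {
  await Promise.race([
    save(),
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  ]);
}
```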
### Job State Management

During shutdown, active jobs are updated with:

- `inProgress: false` - mark the job as not currently running
- `lastCheckpoint: <timestamp>` - record the shutdown time
- `message: "Job interrupted by application shutdown - will resume on restart"`
- Status remains `"imported"` (not `"failed"`) to enable recovery
### Recovery Integration

The existing recovery system automatically detects and resumes interrupted jobs:

- Jobs with `inProgress: false` and an incomplete status are candidates for recovery
- Recovery runs during application startup (before serving requests)
- Jobs resume from their last checkpoint with the remaining items
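The recovery predicate this implies, sketched with field and status names assumed from this document:

```typescript
type JobRow = { inProgress: boolean; status: string };

function isRecoveryCandidate(job: JobRow): boolean {
  // Not currently running, but never reached a terminal state.
  return !job.inProgress && job.status !== "completed" && job.status !== "failed";
}
```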
## Configuration

### Environment Variables

```bash
# Optional: adjust the shutdown timeout (default: 30000 ms)
SHUTDOWN_TIMEOUT=30000

# Optional: adjust the per-job save timeout (default: 10000 ms)
JOB_SAVE_TIMEOUT=10000
```
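These might be read at startup along these lines (values in milliseconds, defaults as documented above):

```typescript
const SHUTDOWN_TIMEOUT = Number(process.env.SHUTDOWN_TIMEOUT ?? 30_000);
const JOB_SAVE_TIMEOUT = Number(process.env.JOB_SAVE_TIMEOUT ?? 10_000);
```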
### Docker Configuration

The Docker entrypoint script includes proper signal handling:

- Signals are forwarded to the application process
- SIGTERM is handled gracefully within the 30-second timeout
- The container stops cleanly without force-killing processes
### Kubernetes Configuration

For Kubernetes deployments, configure an appropriate termination grace period:

```yaml
apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 45  # allow time for graceful shutdown
  containers:
    - name: gitea-mirror
      # ... other configuration
```
## Monitoring and Debugging

### Logs

The application provides detailed logging during shutdown:

```text
🛑 Graceful shutdown initiated by signal: SIGTERM
📊 Shutdown status: 2 active jobs, 1 callbacks
📝 Step 1: Saving active job states...
   Saving state for job abc-123...
   ✅ Saved state for job abc-123
🔧 Step 2: Executing shutdown callbacks...
   ✅ Shutdown callback 1 completed
💾 Step 3: Closing database connections...
✅ Graceful shutdown completed successfully
```
### Status Endpoints

Check the shutdown manager status via the API:

```bash
# Get current status (while the application is running)
curl http://localhost:4321/api/health
```
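If the endpoint is to reflect shutdown state, a hypothetical route might look like the following. This assumes an Astro-style API route; the real `/api/health` response shape is not documented here:

```typescript
import type { APIRoute } from "astro";

export const GET: APIRoute = async () => {
  const body = { status: "ok", shuttingDown: false, activeJobs: 0 }; // illustrative fields
  return new Response(JSON.stringify(body), {
    headers: { "Content-Type": "application/json" },
  });
};
```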
## Troubleshooting

**Problem: Jobs not resuming after restart**

- **Check**: Startup recovery logs for errors
- **Verify**: The database contains interrupted jobs with the correct status
- **Test**: Run `bun run startup-recovery` manually

**Problem: Shutdown timeout reached**

- **Check**: Job complexity and database performance
- **Adjust**: Increase the `SHUTDOWN_TIMEOUT` environment variable
- **Monitor**: Database connections and disk I/O during shutdown

**Problem: Container force-killed**

- **Check**: The container orchestrator's termination grace period
- **Adjust**: Increase the grace period to allow shutdown to complete
- **Monitor**: Application shutdown logs for timing issues
## Best Practices

### Development

- Always test graceful shutdown during development
- Use the provided test scripts to verify functionality
- Monitor logs for shutdown timing and job state persistence

### Production

- Set appropriate container termination grace periods
- Monitor shutdown logs for performance issues
- Use health checks to verify application readiness after restart
- Consider job complexity when planning maintenance windows

### Monitoring

- Track job recovery success rates
- Monitor shutdown duration metrics
- Alert on forced terminations or recovery failures
- Analyze logs to optimize shutdown patterns
## Future Enhancements

Planned improvements for future versions:

- **Configurable Timeouts**: Environment variable configuration for all timeouts
- **Shutdown Metrics**: Prometheus metrics for shutdown performance
- **Progressive Shutdown**: Graceful degradation of service capabilities
- **Job Prioritization**: Priority-based job saving during shutdown
- **Health Check Integration**: Readiness probes during the shutdown process