# Graceful Shutdown and Enhanced Job Recovery

This document describes the graceful shutdown and enhanced job recovery capabilities implemented in gitea-mirror v2.8.0+.

## Overview

The gitea-mirror application includes graceful shutdown handling and enhanced job recovery mechanisms designed specifically for containerized environments. These features provide:

- **No data loss** during container restarts or shutdowns
- **Automatic job resumption** after application restarts
- **Clean termination** of all active processes and connections
- **Container-aware design** optimized for Docker/LXC deployments

## Features

### 1. Graceful Shutdown Manager

The shutdown manager (`src/lib/shutdown-manager.ts`) provides centralized coordination of application termination:

#### Key Capabilities:
- **Active Job Tracking**: Monitors all running mirroring/sync jobs
- **State Persistence**: Saves job progress to the database before shutdown
- **Callback System**: Allows services to register cleanup functions
- **Timeout Protection**: Prevents hanging shutdowns with configurable timeouts
- **Signal Coordination**: Works with signal handlers for proper container lifecycle handling

#### Configuration:
- **Shutdown Timeout**: 30 seconds maximum (configurable via `SHUTDOWN_TIMEOUT`)
- **Job Save Timeout**: 10 seconds per job (configurable via `JOB_SAVE_TIMEOUT`)

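
The callback system and timeout protection described above can be sketched as follows. This is a minimal illustration, not the code in `src/lib/shutdown-manager.ts`: the class name `SketchShutdownManager`, the helper `withTimeout`, and every method name are assumptions made for the example.

```typescript
// Hypothetical sketch: names and structure are assumptions, not the project's API.
type ShutdownCallback = () => Promise<void>;

// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

class SketchShutdownManager {
  private readonly activeJobs = new Set<string>();
  private readonly callbacks: ShutdownCallback[] = [];

  registerJob(jobId: string): void {
    this.activeJobs.add(jobId);
  }

  unregisterJob(jobId: string): void {
    this.activeJobs.delete(jobId);
  }

  registerCallback(callback: ShutdownCallback): void {
    this.callbacks.push(callback);
  }

  activeJobCount(): number {
    return this.activeJobs.size;
  }

  // Run all callbacks concurrently, bounding the whole phase by `timeoutMs`.
  // A failing callback is swallowed so that it cannot block the others.
  async runCallbacks(timeoutMs: number): Promise<"completed" | "timed-out"> {
    const all = Promise.all(
      this.callbacks.map((callback) =>
        Promise.resolve().then(callback).catch(() => undefined),
      ),
    );
    try {
      await withTimeout(all, timeoutMs, "shutdown callbacks");
      return "completed";
    } catch {
      return "timed-out";
    }
  }
}
```

Racing the work against a timer means a stuck cleanup function delays shutdown by at most the configured timeout instead of hanging the process.
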
### 2. Signal Handlers

The signal handler system (`src/lib/signal-handlers.ts`) ensures proper response to container lifecycle events:

#### Supported Signals:
- **SIGTERM**: Docker stop, Kubernetes pod termination
- **SIGINT**: Ctrl+C, manual interruption
- **SIGHUP**: Terminal hangup, service reload
- **Uncaught Exceptions**: Emergency shutdown on critical errors
- **Unhandled Rejections**: Graceful handling of promise failures

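
As a rough illustration of how the three termination signals listed above might be wired to a single shutdown routine, consider the sketch below. The names `installSignalHandlers` and `SignalSource` are hypothetical and do not come from `src/lib/signal-handlers.ts`; the sketch also leaves out the uncaught-exception and unhandled-rejection paths.

```typescript
// Hypothetical sketch: illustrative names, not the project's implementation.
type TerminationSignal = "SIGTERM" | "SIGINT" | "SIGHUP";

// Minimal shape of an event source such as Node's `process` object.
interface SignalSource {
  on(event: string, listener: () => void): unknown;
}

function installSignalHandlers(
  source: SignalSource,
  onShutdown: (signal: TerminationSignal) => void,
): void {
  let shuttingDown = false;
  const signals: TerminationSignal[] = ["SIGTERM", "SIGINT", "SIGHUP"];

  for (const signal of signals) {
    source.on(signal, () => {
      // Ignore repeated signals once shutdown has started.
      if (shuttingDown) return;
      shuttingDown = true;
      onShutdown(signal);
    });
  }
}
```

In a real process the source would be the runtime's `process` object, for example `installSignalHandlers(process, (signal) => { /* start graceful shutdown */ })`. Taking the source as a parameter keeps the handler logic testable without sending real signals.
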
### 3. Enhanced Job Recovery

Building on the existing recovery system, new enhancements include:

#### Shutdown-Aware Processing:
- Jobs check for shutdown signals during execution
- Automatic state saving when shutdown is detected
- Proper job status management (interrupted vs failed)

#### Container Integration:
- Docker entrypoint script forwards signals correctly
- Startup recovery runs before main application
- Recovery timeouts prevent startup delays

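
The shutdown-aware processing described above amounts to checking a shutdown flag between units of work. The sketch below shows that pattern under simplifying assumptions: `processItems` and `JobProgress` are hypothetical names, and the loop is synchronous for brevity, whereas a real mirroring job would be asynchronous.

```typescript
// Hypothetical sketch of a job loop that stops cleanly when shutdown is requested.
interface JobProgress {
  completed: string[];
  remaining: string[];
  interrupted: boolean;
}

function processItems(
  items: string[],
  isShuttingDown: () => boolean,
  handle: (item: string) => void,
): JobProgress {
  const completed: string[] = [];

  for (const item of items) {
    // Check before starting each item so no item is left half-processed.
    if (isShuttingDown()) {
      return {
        completed,
        remaining: items.slice(completed.length),
        interrupted: true,
      };
    }
    handle(item);
    completed.push(item);
  }

  return { completed, remaining: [], interrupted: false };
}
```

The returned `remaining` list is what a job would persist so that it can be marked as interrupted rather than failed and picked up again later.
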
## Usage

### Basic Operation

The graceful shutdown system is automatically initialized when the application starts. No manual configuration is required for basic operation.

### Testing

Test the graceful shutdown functionality:

```bash
# Run the integration test
bun run test-shutdown

# Clean up test data
bun run test-shutdown-cleanup

# Run unit tests
bun test src/lib/shutdown-manager.test.ts
bun test src/lib/signal-handlers.test.ts
```

### Manual Testing

1. **Start the application**:
   ```bash
   bun run dev
   # or in production
   bun run start
   ```

2. **Start a mirroring job** through the web interface

3. **Send shutdown signal**:
   ```bash
   # Send SIGTERM (recommended)
   kill -TERM <process_id>

   # Or use Ctrl+C for SIGINT
   ```

4. **Verify job state** is saved and can be resumed on restart

### Container Testing

Test with Docker:

```bash
# Build and run container
docker build -t gitea-mirror .
docker run -d --name test-shutdown gitea-mirror

# Start a job, then stop container
docker stop test-shutdown

# Restart and verify recovery
docker start test-shutdown
docker logs test-shutdown
```

## Implementation Details

### Shutdown Flow

1. **Signal Reception**: Signal handlers detect the termination request
2. **Shutdown Initiation**: The shutdown manager begins graceful termination
3. **Job State Saving**: All active jobs save their current progress to the database
4. **Service Cleanup**: Registered callbacks stop background services
5. **Connection Cleanup**: Database connections and other resources are released
6. **Process Termination**: The application exits with an appropriate exit code

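
The ordering of steps 3 to 6 can be illustrated with a small runner that executes each phase in sequence and derives the exit code from the results. This is only a sketch: `runShutdownSteps` and `ShutdownStep` are hypothetical names, the steps are synchronous for brevity, and the choice of exit code 0 for a clean shutdown and 1 when any step fails is an assumption rather than documented behavior.

```typescript
// Hypothetical sketch: sequential shutdown phases with a derived exit code.
interface ShutdownStep {
  name: string;
  run: () => void;
}

function runShutdownSteps(
  steps: ShutdownStep[],
  log: (line: string) => void = console.log,
): number {
  let exitCode = 0;

  steps.forEach((step, index) => {
    try {
      step.run();
      log(`Step ${index + 1} ok: ${step.name}`);
    } catch {
      // A failed step is recorded, but later steps still run so that
      // resources such as database connections are released anyway.
      exitCode = 1;
      log(`Step ${index + 1} failed: ${step.name}`);
    }
  });

  return exitCode;
}
```

Continuing after a failed step is a deliberate choice in this sketch: skipping connection cleanup because a callback threw would leave more state behind, not less.
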
### Job State Management

During shutdown, active jobs are updated with:
- `inProgress: false` - Mark as not currently running
- `lastCheckpoint: <timestamp>` - Record shutdown time
- `message: "Job interrupted by application shutdown - will resume on restart"`
- Status remains as `"imported"` (not `"failed"`) to enable recovery

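
Expressed as code, the update above might look like the following. The `JobRecord` shape and the `markInterrupted` function are assumptions made for illustration; only the field names `inProgress`, `lastCheckpoint`, and `message` and the message text come from this document.

```typescript
// Hypothetical sketch: the record shape is assumed, not taken from the schema.
interface JobRecord {
  id: string;
  status: string;
  inProgress: boolean;
  lastCheckpoint: Date | null;
  message: string;
}

function markInterrupted(job: JobRecord, now: Date): JobRecord {
  return {
    ...job,
    inProgress: false, // no longer running
    lastCheckpoint: now, // record the shutdown time
    message: "Job interrupted by application shutdown - will resume on restart",
    // `status` is intentionally left unchanged so recovery can pick the job up.
  };
}
```

Returning a new object instead of mutating the input keeps the original record available for comparison or logging before the update is written.
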
### Recovery Integration

The existing recovery system automatically detects and resumes interrupted jobs:
- Jobs with `inProgress: false` and incomplete status are candidates for recovery
- Recovery runs during application startup (before serving requests)
- Jobs resume from their last checkpoint with remaining items

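
The candidate selection described above can be sketched as a simple filter. How the real recovery system decides that a job is incomplete is not specified here, so this example assumes a hypothetical `remainingItems` list and treats any job with items left as incomplete.

```typescript
// Hypothetical sketch: `remainingItems` is an assumed stand-in for "incomplete".
interface RecoverableJob {
  id: string;
  inProgress: boolean;
  remainingItems: string[];
}

function findRecoveryCandidates(jobs: RecoverableJob[]): RecoverableJob[] {
  // A candidate is not currently running and still has work left.
  return jobs.filter((job) => !job.inProgress && job.remainingItems.length > 0);
}
```

A job resumed from this list would continue with `remainingItems` only, which matches the idea of resuming from the last checkpoint rather than starting over.
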
## Configuration

### Environment Variables

```bash
# Optional: Adjust shutdown timeout (default: 30000ms)
SHUTDOWN_TIMEOUT=30000

# Optional: Adjust job save timeout (default: 10000ms)
JOB_SAVE_TIMEOUT=10000
```

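
One way to read these variables with their documented defaults is sketched below. The helper name `readTimeoutMs` is hypothetical, and falling back to the default on empty, non-numeric, or non-positive values is an assumption about sensible behavior, not a description of what the application does.

```typescript
// Hypothetical sketch: parse a millisecond timeout with a safe fallback.
function readTimeoutMs(
  env: Record<string, string | undefined>,
  name: string,
  fallbackMs: number,
): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallbackMs;

  const parsed = Number(raw);
  // Reject NaN, Infinity, zero, and negative values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

With the runtime environment this would be called as `readTimeoutMs(process.env, "SHUTDOWN_TIMEOUT", 30000)` and `readTimeoutMs(process.env, "JOB_SAVE_TIMEOUT", 10000)`.
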
### Docker Configuration

The Docker entrypoint script includes proper signal handling:

```dockerfile
# Signals are forwarded to the application process
# SIGTERM is handled gracefully with 30-second timeout
# Container stops cleanly without force-killing processes
```

### Kubernetes Configuration

For Kubernetes deployments, configure an appropriate termination grace period:

```yaml
apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 45  # Allow time for graceful shutdown
  containers:
  - name: gitea-mirror
    # ... other configuration
```

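
The grace period needs to exceed the application's shutdown timeout, because Kubernetes force-kills the container with SIGKILL once the grace period expires. The arithmetic behind the 45-second example can be checked with a small helper; `gracePeriodMarginSeconds` is a hypothetical name used only for this illustration.

```typescript
// Hypothetical sketch: how much headroom the grace period leaves.
function gracePeriodMarginSeconds(
  terminationGracePeriodSeconds: number,
  shutdownTimeoutMs: number,
): number {
  // Convert the application timeout to seconds before comparing.
  return terminationGracePeriodSeconds - shutdownTimeoutMs / 1000;
}

// With the values from this document: 45 s grace period, 30000 ms timeout.
const margin = gracePeriodMarginSeconds(45, 30000); // 15 seconds of headroom
```

A negative margin means the orchestrator may kill the process before the shutdown manager reaches its own timeout, so either the grace period should be raised or `SHUTDOWN_TIMEOUT` lowered.
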
## Monitoring and Debugging

### Logs

The application provides detailed logging during shutdown:

```
🛑 Graceful shutdown initiated by signal: SIGTERM
📊 Shutdown status: 2 active jobs, 1 callbacks
📝 Step 1: Saving active job states...
Saving state for job abc-123...
✅ Saved state for job abc-123
🔧 Step 2: Executing shutdown callbacks...
✅ Shutdown callback 1 completed
💾 Step 3: Closing database connections...
✅ Graceful shutdown completed successfully
```

### Status Endpoints

Check shutdown manager status via API:

```bash
# Get current status (if application is running)
curl http://localhost:4321/api/health
```

### Troubleshooting

**Problem**: Jobs not resuming after restart
- **Check**: Startup recovery logs for errors
- **Verify**: Database contains interrupted jobs with correct status
- **Test**: Run `bun run startup-recovery` manually

**Problem**: Shutdown timeout reached
- **Check**: Job complexity and database performance
- **Adjust**: Increase the `SHUTDOWN_TIMEOUT` environment variable
- **Monitor**: Database connection and disk I/O during shutdown

**Problem**: Container force-killed
- **Check**: Container orchestrator termination grace period
- **Adjust**: Increase the grace period to allow shutdown to complete
- **Monitor**: Application shutdown logs for timing issues

## Best Practices

### Development
- Always test graceful shutdown during development
- Use the provided test scripts to verify functionality
- Monitor logs for shutdown timing and job state persistence

### Production
- Set appropriate container termination grace periods
- Monitor shutdown logs for performance issues
- Use health checks to verify application readiness after restart
- Consider job complexity when planning maintenance windows

### Monitoring
- Track job recovery success rates
- Monitor shutdown duration metrics
- Alert on forced terminations or recovery failures
- Analyze logs to optimize shutdown patterns

## Future Enhancements

Planned improvements for future versions:

1. **Configurable Timeouts**: Environment variable configuration for all timeouts
2. **Shutdown Metrics**: Prometheus metrics for shutdown performance
3. **Progressive Shutdown**: Graceful degradation of service capabilities
4. **Job Prioritization**: Priority-based job saving during shutdown
5. **Health Check Integration**: Readiness probes during shutdown process