feat: Implement comprehensive job recovery and resume process improvements

- Added a startup recovery script to handle interrupted jobs before application startup.
- Enhanced recovery system with database connection validation and stale job cleanup.
- Improved middleware to check for recovery needs and handle recovery during requests.
- Updated health check endpoint to include recovery system status and metrics.
- Introduced test scripts for verifying recovery functionality and job state management.
- Enhanced logging and error handling throughout the recovery process.
This commit is contained in:
Arunavo Ray
2025-05-24 13:45:25 +05:30
parent 98610482ae
commit a988be1028
10 changed files with 1006 additions and 111 deletions

View File

@@ -118,6 +118,27 @@ bun scripts/fix-interrupted-jobs.ts <userId>
Use this script if you're having trouble cleaning up activities due to "interrupted" jobs that won't delete.
### Startup Recovery (startup-recovery.ts)
Runs job recovery during application startup to handle any interrupted jobs from previous runs.
```bash
# Run startup recovery (normal mode)
bun scripts/startup-recovery.ts
# Force recovery even if recent attempt was made
bun scripts/startup-recovery.ts --force
# Set custom timeout (default: 30000ms)
bun scripts/startup-recovery.ts --timeout=60000
# Using npm scripts
bun run startup-recovery
bun run startup-recovery-force
```
This script is automatically run by the Docker entrypoint during container startup. It ensures that any jobs interrupted by container restarts or application crashes are properly recovered or marked as failed.
## Deployment Scripts
### Docker Deployment