feat: implement automatic database cleanup with cron jobs for events and mirror jobs

Arunavo Ray
2025-05-23 12:15:34 +05:30
parent 3bb85a4cdb
commit 7d32112369
12 changed files with 204 additions and 58 deletions


@@ -2,7 +2,7 @@
FROM oven/bun:1.2.9-alpine AS base
WORKDIR /app
RUN apk add --no-cache libc6-compat python3 make g++ gcc wget sqlite openssl
RUN apk add --no-cache libc6-compat python3 make g++ gcc wget sqlite openssl cron
# ----------------------------
FROM base AS deps


@@ -14,7 +14,7 @@
```bash
# Using Docker (recommended)
docker compose --profile production up -d
docker compose up -d
# Using Bun
bun run setup && bun run dev
@@ -115,7 +115,7 @@ Gitea Mirror provides multi-architecture Docker images that work on both ARM64 (
```bash
# Start the application using Docker Compose
docker compose --profile production up -d
docker compose up -d
# For development mode (requires configuration)
# Ensure you have run bun run setup first
@@ -162,7 +162,7 @@ cp .env.example .env
./scripts/build-docker.sh --push
# Then run with Docker Compose
docker compose --profile production up -d
docker compose up -d
```
See [Docker build documentation](./scripts/README-docker.md) for more details.
@@ -470,7 +470,7 @@ Try the following steps:
> ghcr.io/arunavo4/gitea-mirror:latest
> ```
>
> For homelab/self-hosted setups, you can use the provided Docker Compose file with automatic event cleanup:
> For homelab/self-hosted setups, you can use the standard Docker Compose file, which includes automatic database cleanup:
>
> ```bash
> # Clone the repository
@@ -478,10 +478,10 @@ Try the following steps:
> cd gitea-mirror
>
> # Start the application with Docker Compose
> docker-compose -f docker-compose.homelab.yml up -d
> docker compose up -d
> ```
>
> This setup includes a cron job that runs daily to clean up old events and prevent the database from growing too large.
> This setup includes automatic database maintenance that runs daily to clean up old events and mirror jobs, preventing the database from growing too large. You can customize the retention periods by setting the `EVENTS_RETENTION_DAYS` and `JOBS_RETENTION_DAYS` environment variables.
#### Database Maintenance
@@ -504,14 +504,29 @@ Try the following steps:
>
> # Clean up old events with custom retention period (e.g., 30 days)
> bun run cleanup-events 30
>
> # Clean up old mirror jobs (keeps last 7 days by default)
> bun run cleanup-jobs
>
> # Clean up old mirror jobs with custom retention period (e.g., 30 days)
> bun run cleanup-jobs 30
>
> # Clean up both events and mirror jobs
> bun run cleanup-all
> ```
>
> For automated maintenance, consider setting up a cron job to run the cleanup script periodically:
> For automated maintenance, consider setting up cron jobs to run the cleanup scripts periodically:
>
> ```bash
> # Add this to your crontab (runs daily at 2 AM)
> # Add these to your crontab
> # Clean up events daily at 2 AM
> 0 2 * * * cd /path/to/gitea-mirror && bun run cleanup-events
>
> # Clean up mirror jobs daily at 3 AM
> 0 3 * * * cd /path/to/gitea-mirror && bun run cleanup-jobs
> ```
>
> **Note:** When using Docker, these cleanup jobs are automatically scheduled inside the container with the default retention period of 7 days. You can customize the retention periods by setting the `EVENTS_RETENTION_DAYS` and `JOBS_RETENTION_DAYS` environment variables in your docker-compose file.
> [!NOTE]


@@ -1,4 +0,0 @@
# Run event cleanup daily at 2 AM
0 2 * * * cd /app && bun run cleanup-events 30 >> /app/data/cleanup-events.log 2>&1
# Empty line at the end is required for cron to work properly


@@ -1,38 +0,0 @@
version: '3.8'
services:
gitea-mirror:
image: ghcr.io/arunavo4/gitea-mirror:latest
container_name: gitea-mirror
restart: unless-stopped
ports:
- "4321:4321"
volumes:
- gitea-mirror-data:/app/data
# Mount the crontab file
- ./crontab:/etc/cron.d/gitea-mirror-cron
environment:
- NODE_ENV=production
- HOST=0.0.0.0
- PORT=4321
- DATABASE_URL=sqlite://data/gitea-mirror.db
- DELAY=${DELAY:-3600}
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:4321/api/health"]
interval: 1m
timeout: 10s
retries: 3
start_period: 30s
# Install cron in the container and set up the cron job
command: >
sh -c "
apt-get update && apt-get install -y cron curl &&
chmod 0644 /etc/cron.d/gitea-mirror-cron &&
crontab /etc/cron.d/gitea-mirror-cron &&
service cron start &&
bun dist/server/entry.mjs
"
# Define named volumes for database persistence
volumes:
gitea-mirror-data: # Database volume


@@ -1,8 +1,7 @@
# Gitea Mirror deployment configuration
# - production: Standard deployment with real data
# Standard deployment with automatic database maintenance
services:
# Production service with real data
gitea-mirror:
image: ${DOCKER_REGISTRY:-ghcr.io}/${DOCKER_IMAGE:-arunavo4/gitea-mirror}:${DOCKER_TAG:-latest}
build:
@@ -42,13 +41,15 @@ services:
- GITEA_ORGANIZATION=${GITEA_ORGANIZATION:-github-mirrors}
- GITEA_ORG_VISIBILITY=${GITEA_ORG_VISIBILITY:-public}
- DELAY=${DELAY:-3600}
# Database maintenance settings
- EVENTS_RETENTION_DAYS=${EVENTS_RETENTION_DAYS:-7}
- JOBS_RETENTION_DAYS=${JOBS_RETENTION_DAYS:-7}
healthcheck:
test: ["CMD", "wget", "--no-verbose", "--tries=3", "--spider", "http://localhost:4321/api/health"]
interval: 30s
timeout: 10s
retries: 5
start_period: 15s
profiles: ["production"]
# Define named volumes for database persistence
volumes:


@@ -30,6 +30,25 @@ if [ "$JWT_SECRET" = "your-secret-key-change-this-in-production" ] || [ -z "$JWT
echo "JWT_SECRET has been set to a secure random value"
fi
# Set up automatic database cleanup cron job
# Default to 7 days retention for events and mirror jobs unless specified by environment variables
EVENTS_RETENTION_DAYS=${EVENTS_RETENTION_DAYS:-7}
JOBS_RETENTION_DAYS=${JOBS_RETENTION_DAYS:-7}
# Create cron directory if it doesn't exist
mkdir -p /app/data/cron
# Create the cron job file
cat > /app/data/cron/cleanup-cron <<EOF
# Run event cleanup daily at 2 AM
0 2 * * * cd /app && bun dist/scripts/cleanup-events.js ${EVENTS_RETENTION_DAYS} >> /app/data/cleanup-events.log 2>&1
# Run mirror jobs cleanup daily at 3 AM
0 3 * * * cd /app && bun dist/scripts/cleanup-mirror-jobs.js ${JOBS_RETENTION_DAYS} >> /app/data/cleanup-mirror-jobs.log 2>&1
# Empty line at the end is required for cron to work properly
EOF
# Skip dependency installation entirely for pre-built images
# Dependencies are already installed during the Docker build process
@@ -204,6 +223,33 @@ if [ -f "package.json" ]; then
echo "Setting application version: $npm_package_version"
fi
# Set up cron if it's available
if command -v crontab >/dev/null 2>&1; then
echo "Setting up automatic database cleanup cron jobs..."
# Install the cron job
crontab /app/data/cron/cleanup-cron
# Start the cron daemon. The Alpine base image has no apt-get and ships
# BusyBox `crond`; Debian-based images use `service cron start` or `cron`.
if command -v crond >/dev/null 2>&1; then
crond
echo "Cron daemon started"
elif command -v service >/dev/null 2>&1; then
service cron start
echo "Cron service started"
elif command -v cron >/dev/null 2>&1; then
cron
echo "Cron daemon started"
else
echo "Warning: Could not start cron service. Automatic database cleanup will not run."
fi
else
echo "Warning: crontab command not found. Automatic database cleanup will not be set up."
echo "Consider setting up external scheduled tasks to run cleanup scripts."
fi
# Start the application
echo "Starting Gitea Mirror..."
exec bun ./dist/server/entry.mjs
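
The cron file that the entrypoint writes with a heredoc can also be expressed as a small pure function, which makes the generated schedule easy to inspect or test. This is only a sketch: `buildCleanupCrontab` is a hypothetical helper that does not exist in the repository, and it simply reproduces the two entries and log paths shown in the script above.

```typescript
// Hypothetical helper (not part of this commit): builds the same crontab
// text that the entrypoint's heredoc writes to /app/data/cron/cleanup-cron.
function buildCleanupCrontab(
  eventsRetentionDays: number,
  jobsRetentionDays: number
): string {
  const lines: string[] = [
    "# Run event cleanup daily at 2 AM",
    `0 2 * * * cd /app && bun dist/scripts/cleanup-events.js ${eventsRetentionDays} >> /app/data/cleanup-events.log 2>&1`,
    "# Run mirror jobs cleanup daily at 3 AM",
    `0 3 * * * cd /app && bun dist/scripts/cleanup-mirror-jobs.js ${jobsRetentionDays} >> /app/data/cleanup-mirror-jobs.log 2>&1`,
  ];
  // End the file with a newline so the last entry is terminated.
  return lines.join("\n") + "\n";
}

// Example: the defaults used by the entrypoint (7 days for both tables).
const defaultCrontab: string = buildCleanupCrontab(7, 7);
```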


@@ -18,6 +18,8 @@
"fix-db": "bun scripts/manage-db.ts fix",
"reset-users": "bun scripts/manage-db.ts reset-users",
"cleanup-events": "bun scripts/cleanup-events.ts",
"cleanup-jobs": "bun scripts/cleanup-mirror-jobs.ts",
"cleanup-all": "bun scripts/cleanup-events.ts && bun scripts/cleanup-mirror-jobs.ts",
"preview": "bunx --bun astro preview",
"start": "bun dist/server/entry.mjs",
"start:fresh": "bun run cleanup-db && bun run manage-db init && bun run update-db && bun dist/server/entry.mjs",


@@ -47,12 +47,12 @@ The script uses environment variables from the `.env` file in the project root:
# First build the image
./scripts/build-docker.sh --load
# Then run using docker-compose for development
docker-compose -f ../docker-compose.dev.yml up -d
# Or for production
docker-compose --profile production up -d
docker compose up -d
```
## Diagnostics Script


@@ -80,7 +80,7 @@ bun scripts/cleanup-events.ts
bun scripts/cleanup-events.ts 14
```
This script can be scheduled to run periodically (e.g., daily) using cron or another scheduler.
This script can be scheduled to run periodically (e.g., daily) using cron or another scheduler. When using Docker, this is automatically scheduled to run daily.
### Mark Events as Read (mark-events-read.ts)
@@ -94,6 +94,20 @@ bun scripts/mark-events-read.ts
For testing purposes, this script modifies event timestamps to make them appear older.
### Mirror Jobs Cleanup (cleanup-mirror-jobs.ts)
Removes old mirror jobs from the database to prevent it from growing too large.
```bash
# Remove mirror jobs older than 7 days (default)
bun scripts/cleanup-mirror-jobs.ts
# Remove mirror jobs older than X days
bun scripts/cleanup-mirror-jobs.ts 14
```
This script can be scheduled to run periodically (e.g., daily) using cron or another scheduler. When using Docker, this is automatically scheduled to run daily.
```bash
bun scripts/make-events-old.ts
```


@@ -0,0 +1,102 @@
#!/usr/bin/env bun
/**
* Script to clean up old mirror jobs from the database
* This script should be run periodically (e.g., daily) to prevent the mirror_jobs table from growing too large
*
* Usage:
* bun scripts/cleanup-mirror-jobs.ts [days]
*
* Where [days] is the number of days to keep mirror jobs (default: 7)
*/
import { db, mirrorJobs } from "../src/lib/db";
import { lt, and, eq } from "drizzle-orm";
// Parse command line arguments
const args = process.argv.slice(2);
const daysToKeep = args.length > 0 ? parseInt(args[0], 10) : 7;
if (isNaN(daysToKeep) || daysToKeep < 1) {
console.error("Error: Days to keep must be a positive number");
process.exit(1);
}
/**
* Cleans up old mirror jobs to prevent the database from growing too large
* Should be called periodically (e.g., daily via a cron job)
*
* @param maxAgeInDays Number of days to keep mirror jobs (default: 7)
* @returns Object containing the number of completed and in-progress jobs deleted
*/
async function cleanupOldMirrorJobs(
maxAgeInDays: number = 7
): Promise<{ completedJobsDeleted: number; inProgressJobsDeleted: number }> {
try {
console.log(`Cleaning up mirror jobs older than ${maxAgeInDays} days...`);
// Calculate the cutoff date for completed jobs
const cutoffDate = new Date();
cutoffDate.setDate(cutoffDate.getDate() - maxAgeInDays);
// Delete completed jobs older than the cutoff date
// Only delete jobs that are not in progress (inProgress = false)
const completedResult = await db
.delete(mirrorJobs)
.where(
and(
eq(mirrorJobs.inProgress, false),
lt(mirrorJobs.timestamp, cutoffDate)
)
);
const completedJobsDeleted = completedResult.changes || 0;
console.log(`Deleted ${completedJobsDeleted} completed mirror jobs`);
// Calculate a much older cutoff date for in-progress jobs (3x the retention period)
// This is to handle jobs that might have been abandoned or crashed
const inProgressCutoffDate = new Date();
inProgressCutoffDate.setDate(inProgressCutoffDate.getDate() - (maxAgeInDays * 3));
// Delete in-progress jobs that are significantly older
// This helps clean up jobs that might have been abandoned due to crashes
const inProgressResult = await db
.delete(mirrorJobs)
.where(
and(
eq(mirrorJobs.inProgress, true),
lt(mirrorJobs.timestamp, inProgressCutoffDate)
)
);
const inProgressJobsDeleted = inProgressResult.changes || 0;
console.log(`Deleted ${inProgressJobsDeleted} abandoned in-progress mirror jobs`);
return { completedJobsDeleted, inProgressJobsDeleted };
} catch (error) {
console.error("Error cleaning up old mirror jobs:", error);
// Rethrow so runCleanup reports the failure and exits non-zero
// instead of logging "completed successfully" after an error
throw error;
}
}
// Run the cleanup
async function runCleanup() {
try {
console.log(`Starting mirror jobs cleanup (retention: ${daysToKeep} days)...`);
// Call the cleanupOldMirrorJobs function
const result = await cleanupOldMirrorJobs(daysToKeep);
console.log(`Cleanup summary:`);
console.log(`- Completed jobs deleted: ${result.completedJobsDeleted}`);
console.log(`- Abandoned in-progress jobs deleted: ${result.inProgressJobsDeleted}`);
console.log(`- Total jobs deleted: ${result.completedJobsDeleted + result.inProgressJobsDeleted}`);
console.log("Mirror jobs cleanup completed successfully");
} catch (error) {
console.error("Error running mirror jobs cleanup:", error);
process.exit(1);
}
}
// Run the cleanup
runCleanup();
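
The two cutoff dates used by `cleanupOldMirrorJobs` (the retention period for completed jobs, and three times that period for abandoned in-progress jobs) can be checked in isolation from the database. The sketch below assumes a hypothetical helper, `computeMirrorJobCutoffs`, which is not part of the commit; it only mirrors the date arithmetic shown in the script above.

```typescript
// Hypothetical helper (not in the repository): reproduces the cutoff
// arithmetic from cleanupOldMirrorJobs without touching the database.
const IN_PROGRESS_MULTIPLIER = 3; // in-progress jobs are kept 3x longer

interface MirrorJobCutoffs {
  completedCutoff: Date; // completed jobs older than this are deleted
  inProgressCutoff: Date; // in-progress jobs older than this count as abandoned
}

function computeMirrorJobCutoffs(
  maxAgeInDays: number,
  now: Date = new Date()
): MirrorJobCutoffs {
  if (!Number.isInteger(maxAgeInDays) || maxAgeInDays < 1) {
    throw new RangeError("maxAgeInDays must be a positive integer");
  }

  // Copy `now` so the caller's Date object is never mutated.
  const completedCutoff = new Date(now.getTime());
  completedCutoff.setDate(completedCutoff.getDate() - maxAgeInDays);

  const inProgressCutoff = new Date(now.getTime());
  inProgressCutoff.setDate(
    inProgressCutoff.getDate() - maxAgeInDays * IN_PROGRESS_MULTIPLIER
  );

  return { completedCutoff, inProgressCutoff };
}

// Example: with the default 7-day retention, completed jobs are removed
// after 7 days and in-progress jobs only after 21 days.
const cutoffs = computeMirrorJobCutoffs(7, new Date(2025, 4, 23, 12, 0, 0));
```

Taking `now` as a parameter is a design choice for testability only; the script itself always uses the current time.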


@@ -153,10 +153,18 @@ bun scripts/check-events.ts
# Clean up old events (default: older than 7 days)
bun scripts/cleanup-events.ts
# Clean up old mirror jobs (default: older than 7 days)
bun scripts/cleanup-mirror-jobs.ts
# Clean up both events and mirror jobs
bun run cleanup-all
# Mark all events as read
bun scripts/mark-events-read.ts
```
When using Docker, database cleanup is automatically scheduled to run daily. You can customize the retention periods by setting the `EVENTS_RETENTION_DAYS` and `JOBS_RETENTION_DAYS` environment variables in your docker-compose file.
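
As an illustration of how the retention settings resolve, the sketch below applies the same rules seen elsewhere in this commit: an unset or empty variable falls back to 7 days (as `${EVENTS_RETENTION_DAYS:-7}` does in the entrypoint), and a value that is not a positive number is rejected (as the cleanup script does for its argument). `resolveRetentionDays` and `exampleEnv` are hypothetical names used only for this example.

```typescript
// Hypothetical helper (not in the repository): resolves a retention period
// from an environment-style string, defaulting to 7 days.
const DEFAULT_RETENTION_DAYS = 7;

function resolveRetentionDays(
  raw: string | undefined,
  fallback: number = DEFAULT_RETENTION_DAYS
): number {
  // Unset or empty behaves like the shell's ${VAR:-7} expansion.
  if (raw === undefined || raw.trim() === "") {
    return fallback;
  }
  const days = Number.parseInt(raw, 10);
  if (Number.isNaN(days) || days < 1) {
    throw new RangeError(
      `Retention must be a positive number of days, got "${raw}"`
    );
  }
  return days;
}

// A plain object stands in for the container environment here.
const exampleEnv: Record<string, string | undefined> = {
  EVENTS_RETENTION_DAYS: "30",
};
const eventsDays = resolveRetentionDays(exampleEnv.EVENTS_RETENTION_DAYS); // 30
const jobsDays = resolveRetentionDays(exampleEnv.JOBS_RETENTION_DAYS); // 7 (fallback)
```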
### Health Check Endpoint
Gitea Mirror includes a built-in health check endpoint at `/api/health` that provides:


@@ -37,7 +37,7 @@ Docker provides the easiest way to get started with minimal configuration.
2. Start the application in production mode:
```bash
docker-compose --profile production up -d
docker compose up -d
```
3. Access the application at [http://localhost:4321](http://localhost:4321)