mirror of
https://github.com/RayLabsHQ/gitea-mirror.git
synced 2025-12-06 11:36:44 +03:00
chore: remove duplicate fix summary
This commit is contained in:
@@ -1,348 +0,0 @@
|
||||
# Fix for Issue #115: Duplicate Timestamped Repositories
|
||||
|
||||
## Problem Summary
|
||||
|
||||
Users were experiencing hundreds/thousands of duplicate repositories with timestamps in their names (e.g., `repo-owner-1729123456789`). The issue was caused by a combination of:
|
||||
|
||||
1. **Timestamp fallback** creating infinite unique names
|
||||
2. **Race conditions** in concurrent mirror operations
|
||||
3. **Retry logic** compounding the duplication
|
||||
4. **No idempotency** checks to prevent duplicate operations
|
||||
|
||||
## Root Cause Analysis
|
||||
|
||||
### Primary Issue: Timestamp Fallback (Line 736-738)
|
||||
```typescript
|
||||
// OLD CODE (REMOVED)
|
||||
const timestamp = Date.now();
|
||||
return `${baseName}-${githubOwner}-${timestamp}`;
|
||||
```
|
||||
|
||||
When the `generateUniqueRepoName` function exhausted all 10 naming attempts:
|
||||
- It fell back to using `Date.now()` timestamp
|
||||
- Each retry got a different timestamp → unique name
|
||||
- Retry logic (2 retries + exponential backoff) multiplied the problem
|
||||
- Result: 100s-1000s of timestamped duplicates per repository
|
||||
|
||||
### Contributing Factors
|
||||
|
||||
1. **Concurrent Processing**: 3-10 repositories mirrored in parallel
|
||||
2. **Periodic Sync**: Users running sync while previous sync still in progress
|
||||
3. **Long-Running Mirrors**: Large repos timing out, then retrying
|
||||
4. **No Race Protection**: Multiple processes could start mirroring same repo
|
||||
|
||||
## Solution Implemented
|
||||
|
||||
### 1. ✅ Removed Timestamp Fallback
|
||||
**Location**: `src/lib/gitea.ts:817-826`
|
||||
|
||||
```typescript
|
||||
// NEW CODE
|
||||
throw new Error(
|
||||
`Unable to generate unique repository name for "${baseName}". ` +
|
||||
`All ${maxAttempts} naming attempts resulted in conflicts. ` +
|
||||
`Please manually resolve the naming conflict or adjust your duplicate strategy.`
|
||||
);
|
||||
```
|
||||
|
||||
**Impact**:
|
||||
- Prevents infinite duplicate creation
|
||||
- Forces manual resolution of genuine conflicts
|
||||
- Provides clear error message to user
|
||||
|
||||
### 2. ✅ Added Database Idempotency Check
|
||||
**Location**: `src/lib/gitea.ts:207-291`
|
||||
|
||||
```typescript
|
||||
export const isRepoCurrentlyMirroring = async ({
|
||||
config,
|
||||
repoName,
|
||||
expectedLocation,
|
||||
}: {
|
||||
config: Partial<Config>;
|
||||
repoName: string;
|
||||
expectedLocation?: string;
|
||||
}): Promise<boolean>
|
||||
```
|
||||
|
||||
**Features**:
|
||||
- Checks database for repos in "mirroring" or "syncing" status
|
||||
- Validates expected location to catch same-name conflicts
|
||||
- **Stale operation detection**: Ignores operations stuck >2 hours
|
||||
- Safe error handling: Returns `false` on errors (allows operation to proceed)
|
||||
|
||||
### 3. ✅ Dual Idempotency Guards with Early Location Assignment
|
||||
**Locations**:
|
||||
- `mirrorGithubRepoToGitea`: Lines 363-378 & 447-473
|
||||
- `mirrorGitHubRepoToGiteaOrg`: Lines 876-891 & 984-1010
|
||||
|
||||
```typescript
|
||||
// FIRST CHECK: Before starting mirror operation
|
||||
const expectedLocation = `${repoOwner}/${targetRepoName}`;
|
||||
const isCurrentlyMirroring = await isRepoCurrentlyMirroring({
|
||||
config,
|
||||
repoName: targetRepoName,
|
||||
expectedLocation,
|
||||
});
|
||||
|
||||
if (isCurrentlyMirroring) {
|
||||
console.log(`[Idempotency] Skipping - already being mirrored`);
|
||||
return;
|
||||
}
|
||||
|
||||
// ... check if repo exists in Gitea ...
|
||||
|
||||
// SECOND CHECK: Right before setting "mirroring" status
|
||||
const finalCheck = await isRepoCurrentlyMirroring({
|
||||
config,
|
||||
repoName: targetRepoName,
|
||||
expectedLocation,
|
||||
});
|
||||
|
||||
if (finalCheck) {
|
||||
console.log(`[Idempotency] Race condition detected. Skipping.`);
|
||||
return;
|
||||
}
|
||||
|
||||
// CRITICAL FIX: Set mirroredLocation IMMEDIATELY when setting status
|
||||
// This allows idempotency checks to detect concurrent operations on first mirror
|
||||
await db.update(repositories).set({
|
||||
status: "mirroring",
|
||||
mirroredLocation: expectedLocation, // ← Set target location NOW
|
||||
updatedAt: new Date(),
|
||||
});
|
||||
```
|
||||
|
||||
**Why This Works**:
|
||||
- **First check**: Catches obvious duplicates early (saves resources)
|
||||
- **Second check**: Catches race conditions in the tiny window between first check and status update
|
||||
- **Early location set**: Makes `mirroredLocation` the "target location" so concurrent checks can match it
|
||||
- **Together**: Provides bulletproof protection against concurrent operations
|
||||
|
||||
**Critical Fix Explained**:
|
||||
Previously, `mirroredLocation` was only set AFTER successful mirror completion. This meant:
|
||||
- Process A sets status="mirroring", mirroredLocation=""
|
||||
- Process B checks and finds status="mirroring" BUT mirroredLocation="" ≠ expectedLocation
|
||||
- Process B proceeds anyway, creating duplicate!
|
||||
|
||||
Now we set `mirroredLocation` immediately, making it the "intent" field that blocks concurrent operations.
|
||||
|
||||
### 4. ✅ Stale Operation Recovery
|
||||
**Location**: `src/lib/gitea.ts:240-264`
|
||||
|
||||
```typescript
|
||||
const TWO_HOURS_MS = 2 * 60 * 60 * 1000;
|
||||
const activeRepos = inProgressRepos.filter((repo) => {
|
||||
if (!repo.updatedAt) return true;
|
||||
const updatedTime = new Date(repo.updatedAt).getTime();
|
||||
const isStale = (now - updatedTime) > TWO_HOURS_MS;
|
||||
|
||||
if (isStale) {
|
||||
console.warn(`Repository has been stuck for over 2 hours. Allowing retry.`);
|
||||
}
|
||||
|
||||
return !isStale;
|
||||
});
|
||||
```
|
||||
|
||||
**Prevents**:
|
||||
- Repos permanently stuck in "mirroring" status after crashes
|
||||
- Blocking legitimate retries for genuinely failed operations
|
||||
- Manual database cleanup requirement
|
||||
|
||||
## Testing
|
||||
|
||||
### Regression Testing
|
||||
✅ All 11 existing tests in `gitea.test.ts` pass
|
||||
- No breaking changes to existing functionality
|
||||
- Backward compatible implementation
|
||||
|
||||
### Edge Cases Handled
|
||||
1. ✅ Missing `userId` in config → Returns `false` (safe)
|
||||
2. ✅ Database query failure → Caught and returns `false` (safe)
|
||||
3. ✅ Stale operations (>2h) → Allows retry
|
||||
4. ✅ Race conditions → Dual-check system prevents duplicates
|
||||
5. ✅ Concurrent operations → Database queries are atomic
|
||||
6. ✅ Error in `generateUniqueRepoName` → Propagates up, repo marked "failed"
|
||||
|
||||
## Expected Behavior After Fix
|
||||
|
||||
### Scenario 1: First Mirror Attempt
|
||||
1. Check if repo is being mirrored → NO
|
||||
2. Check if repo exists in Gitea → NO
|
||||
3. Double-check before status update → NO
|
||||
4. Set status to "mirroring"
|
||||
5. ✅ Mirror proceeds normally
|
||||
|
||||
### Scenario 2: Concurrent Mirror Attempts (Race Condition)
|
||||
1. **Process A**: First check → NO (not being mirrored)
|
||||
2. **Process B**: First check → NO (not being mirrored yet)
|
||||
3. **Process A**: Double-check → NO, sets status to "mirroring"
|
||||
4. **Process B**: Double-check → **YES** (Process A set status)
|
||||
5. **Process B**: Skips with log message
|
||||
6. ✅ Only one mirror operation proceeds
|
||||
|
||||
### Scenario 3: Retry After Failure
|
||||
1. Previous attempt failed:
|
||||
- Status changed from "mirroring" → "failed"
|
||||
- mirroredLocation remains set (e.g., "org/repo")
|
||||
2. User clicks retry
|
||||
3. Check if currently being mirrored:
|
||||
- Query filters by status="mirroring" OR status="syncing"
|
||||
- Repo has status="failed" → NOT matched
|
||||
- Returns FALSE
|
||||
4. ✅ Retry proceeds normally
|
||||
5. Sets status="mirroring", mirroredLocation="org/repo"
|
||||
6. ✅ Now protected from concurrent retries
|
||||
|
||||
### Scenario 4: Stale Operation Recovery
|
||||
1. Previous mirror attempt crashed 3 hours ago (still status="mirroring")
|
||||
2. New sync attempt starts
|
||||
3. Check detects "mirroring" but sees it's > 2 hours old
|
||||
4. Considers it stale and returns `false`
|
||||
5. ✅ New mirror proceeds (recovers stuck operation)
|
||||
|
||||
### Scenario 5: Naming Conflict (10 Attempts Exhausted)
|
||||
1. Tries: `repo-owner`, `repo-owner-1`, ..., `repo-owner-9`
|
||||
2. All 10 attempts conflict with existing repos
|
||||
3. ❌ Throws error with clear message
|
||||
4. User sees: "Unable to generate unique repository name..."
|
||||
5. ✅ No timestamp duplicates created
|
||||
|
||||
## Upgrade Path
|
||||
|
||||
### For Users With Existing Duplicates
|
||||
1. The fix prevents NEW duplicates from being created
|
||||
2. Existing timestamped duplicates will remain
|
||||
3. Recommended cleanup process:
|
||||
```bash
|
||||
# Identify timestamped repos
|
||||
# Pattern: *-1234567890* (ends with timestamp)
|
||||
|
||||
# Keep only the latest timestamp version
|
||||
# Delete older duplicates manually via Gitea UI or API
|
||||
```
|
||||
|
||||
### Configuration Recommendations
|
||||
- **Mirror Strategy**: Review `starredDuplicateStrategy` setting
|
||||
- **Concurrency**: Reduce if still experiencing conflicts
|
||||
- **Sync Frequency**: Ensure previous sync completes before starting new one
|
||||
|
||||
## Logging & Monitoring
|
||||
|
||||
### New Log Messages
|
||||
```
|
||||
[Idempotency] Repository X is already being mirrored at Y
|
||||
[Idempotency] Skipping X - already being mirrored to Y
|
||||
[Idempotency] Race condition detected - X is now being mirrored by another process
|
||||
[Idempotency] Repository X has been in "mirroring" status for over 2 hours
|
||||
[Idempotency] All in-progress operations for X are stale (>2h). Allowing retry.
|
||||
```
|
||||
|
||||
### Error Messages
|
||||
```
|
||||
Unable to generate unique repository name for "X".
|
||||
All 10 naming attempts resulted in conflicts.
|
||||
Please manually resolve the naming conflict or adjust your duplicate strategy.
|
||||
```
|
||||
|
||||
## Performance Impact
|
||||
|
||||
- **Minimal overhead**: 2 additional database queries per mirror operation
|
||||
- **Reduced load**: Prevents hundreds of unnecessary Gitea API calls
|
||||
- **Improved reliability**: Fewer failed operations due to conflicts
|
||||
- **Faster completion**: Skips duplicate work immediately
|
||||
|
||||
## Breaking Changes
|
||||
|
||||
**None** - This is a backward-compatible fix that only changes error behavior:
|
||||
- Before: Created timestamped duplicates on conflict
|
||||
- After: Throws error on genuine conflicts
|
||||
|
||||
## Code Quality
|
||||
|
||||
- ✅ Follows existing code patterns in the project
|
||||
- ✅ Consistent error handling with `createSecureErrorResponse`
|
||||
- ✅ Clear, descriptive logging for debugging
|
||||
- ✅ Comprehensive inline documentation
|
||||
- ✅ No hardcoded values (timeout configurable via constant)
|
||||
|
||||
## Critical Review Finding & Fix
|
||||
|
||||
### Issue Discovered During Code Review
|
||||
|
||||
During thorough code review, a critical flaw was identified in the initial idempotency implementation:
|
||||
|
||||
**The Problem:**
|
||||
- Idempotency check relied on matching `mirroredLocation === expectedLocation`
|
||||
- But `mirroredLocation` was only set AFTER successful mirror completion
|
||||
- On first mirror attempt, `mirroredLocation` was `""` (empty)
|
||||
- Concurrent processes could both see status="mirroring" but fail the location match
|
||||
- Result: Race condition still existed!
|
||||
|
||||
**The Timeline:**
|
||||
```
|
||||
t=0ms: Process A checks → mirroredLocation="" → No match → Proceeds
|
||||
t=5ms: Process A sets status="mirroring", mirroredLocation="" (still empty!)
|
||||
t=10ms: Process B checks → Finds status="mirroring"
|
||||
t=11ms: Process B checks mirroredLocation: "" === "org/repo"? → FALSE
|
||||
t=12ms: Process B proceeds anyway → DUPLICATE CREATED!
|
||||
```
|
||||
|
||||
**The Fix:**
|
||||
Set `mirroredLocation = expectedLocation` immediately when setting status to "mirroring":
|
||||
|
||||
```typescript
|
||||
await db.update(repositories).set({
|
||||
status: "mirroring",
|
||||
mirroredLocation: expectedLocation, // Critical addition
|
||||
updatedAt: new Date(),
|
||||
});
|
||||
```
|
||||
|
||||
**Semantic Change:**
|
||||
- **Before**: `mirroredLocation` = confirmed location (set after success)
|
||||
- **After**: `mirroredLocation` = target/intent location (set at start)
|
||||
|
||||
This is acceptable because:
|
||||
1. Status field still indicates success/failure
|
||||
2. Failed mirrors have status="failed" making the location irrelevant
|
||||
3. Actually improves debugging (can see WHERE mirror was attempted)
|
||||
4. Required for idempotency checks to work correctly
|
||||
|
||||
### Files Modified
|
||||
- `src/lib/gitea.ts:466-473` - mirrorGithubRepoToGitea
|
||||
- `src/lib/gitea.ts:1003-1010` - mirrorGitHubRepoToGiteaOrg
|
||||
|
||||
## Related Issues
|
||||
|
||||
- Fixes #115: "Large quantity of timestamped duplicate repos"
|
||||
- Prevents future occurrences of similar race conditions
|
||||
- Improves overall system reliability and data integrity
|
||||
|
||||
## Future Improvements (Optional)
|
||||
|
||||
1. **Database-level unique constraint**: Add unique index on (userId, name, mirroredLocation)
|
||||
2. **Configurable timeout**: Make 2-hour stale timeout user-configurable
|
||||
3. **Automatic cleanup**: Background job to detect and merge/delete duplicates
|
||||
4. **Metrics**: Track duplicate prevention statistics
|
||||
5. **Admin dashboard**: Show currently in-progress mirrors
|
||||
|
||||
## Verification Steps
|
||||
|
||||
To verify the fix is working:
|
||||
|
||||
1. **Check logs** for `[Idempotency]` messages
|
||||
2. **Monitor database** for repos stuck in "mirroring" status
|
||||
3. **Run concurrent syncs** - should see skip messages
|
||||
4. **Trigger naming conflicts** - should see error messages (not timestamps)
|
||||
5. **Check Gitea** - no new timestamped repositories created
|
||||
|
||||
---
|
||||
|
||||
**Branch**: `fix/duplicate-repos-issue-115`
|
||||
**Files Changed**: `src/lib/gitea.ts`
|
||||
**Lines Added**: ~140
|
||||
**Lines Removed**: ~3
|
||||
**Tests**: 11/11 passing
|
||||
**Status**: ✅ Ready for review
|
||||
Reference in New Issue
Block a user