#1200: Race condition in parallel git-fetch • .git Internal

Sarah Kim opened this issue 2 hours ago • Updated 15 mins ago

Problem Description

We are seeing intermittent failures in the deployment pipeline when multiple edge nodes attempt to git-fetch simultaneously from the same high-churn repository. The lock mechanism appears to be releasing prematurely, causing corrupted index files on ~3% of concurrent pulls.

Steps to Reproduce

1. Spin up 10+ edge nodes using the git-v2.0.4 image.
2. Trigger a massive parallel fetch job against the repo-large target.
3. Observe the logs for "index-pack: fatal: index corruption" errors.

Expected Behavior

Parallel fetches should serialize access to the index safely or use separate temp directories without collision.
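
To illustrate the "separate temp directories" option, here is a minimal sketch using only the Rust standard library. It is not taken from the git_engine codebase: the function names (`worker_scratch_dir`, `publish`), the `fetch-<pid>-<n>` directory naming, and the `index.tmp` file name are all hypothetical. It assumes the scratch directory and the destination live on the same filesystem, so that the final rename is atomic on POSIX systems.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

// Process-wide counter so two workers in one process never share a directory.
static NEXT_WORKER: AtomicU64 = AtomicU64::new(0);

/// Create a private scratch directory for one fetch worker.
/// The naming scheme is an assumption made for this sketch.
pub fn worker_scratch_dir(root: &Path) -> io::Result<PathBuf> {
    let id = NEXT_WORKER.fetch_add(1, Ordering::Relaxed);
    let dir = root.join(format!("fetch-{}-{}", std::process::id(), id));
    // `create_dir` (unlike `create_dir_all`) fails if the directory already
    // exists, so a name collision surfaces as an error, not silent sharing.
    fs::create_dir(&dir)?;
    Ok(dir)
}

/// Write `data` inside the worker's own directory, then publish it to
/// `dest` with a single rename.
pub fn publish(scratch: &Path, dest: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = scratch.join("index.tmp");
    fs::write(&tmp, data)?;
    // On POSIX, a rename within one filesystem replaces `dest` atomically,
    // so readers see either the old file or the new one, never a partial write.
    fs::rename(&tmp, dest)
}
```

With this layout, concurrent workers never write to the same temporary file. The last rename wins, so it avoids partially written files but does not by itself merge concurrent updates.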

Stack Trace

stack-trace.log

thread 'main' panicked at 'lock_file.rs:45: failed to acquire write lock':
  Error: Resource temporarily unavailable

   at git_engine::index::lock::acquire
             at src/index/lock.rs:45
   at git_engine::fetch::parallel::worker
             at src/fetch/parallel.rs:112
   at std::sys_common::backtrace::__rust_begin_short_backtrace

Priority elevated to P0 due to impact on production deployments during peak hours.

Alex Rivera commented 45 mins ago

Reproduced this locally. The issue stems from the file-lock timeout being set too low for high-latency network calls. Increasing the timeout to 5000 ms stabilizes the test, but I think a better approach is to use the O_EXCL flag on Linux to avoid the retry loop entirely.

I'll open a PR shortly to refactor the lock acquisition logic.
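
For reference, a minimal sketch of what O_EXCL-based acquisition could look like in Rust. This is not the planned PR and not the existing code in src/index/lock.rs; the `IndexLock` type and `try_acquire` name are hypothetical. It relies on `OpenOptions::create_new(true)`, which the standard library maps to O_CREAT | O_EXCL on Unix, so creating the lock file fails atomically if another process already holds it.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical guard type: owning it means owning the lock file,
/// which is removed again when the guard is dropped.
pub struct IndexLock {
    path: PathBuf,
}

impl IndexLock {
    /// Try to take the lock exactly once, with no retry loop.
    /// Returns `Ok(None)` if another holder already created the file.
    pub fn try_acquire(path: &Path) -> io::Result<Option<IndexLock>> {
        // `create_new(true)` corresponds to O_CREAT | O_EXCL on Unix:
        // the existence check and the creation are one atomic step.
        match OpenOptions::new().write(true).create_new(true).open(path) {
            Ok(mut file) => {
                // Record the owner PID to help diagnose stale locks.
                writeln!(file, "{}", std::process::id())?;
                Ok(Some(IndexLock { path: path.to_path_buf() }))
            }
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(None),
            Err(e) => Err(e),
        }
    }
}

impl Drop for IndexLock {
    fn drop(&mut self) {
        // Best-effort release; an error here is ignored in this sketch.
        let _ = fs::remove_file(&self.path);
    }
}
```

A caller that gets `Ok(None)` can decide whether to wait, back off, or fail the fetch. A process that crashes while holding the lock leaves the file behind, so stale-lock cleanup would still need separate handling.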

Sarah Kim added labels P0: Critical, performance

Sarah Kim assigned Alex Rivera

Race condition in parallel git-fetch on edge nodes #1200 In Progress