tractor/ai/prompt-io/claude/20260406T222648Z_f086222_pr...

Here's the problem: the per-key locking doesn't fix this race because the lock's lifetime is decoupled from `run_ctx`'s lifetime. Trace through what happens:

`maybe_open_context` finally (lines 468-495):

1. `users[ctx_key] -= 1` → 0
2. `no_more_users.set()` → wakes `run_ctx` (but it doesn't run yet, it just becomes ready)
3. `_Cache.locks.pop(ctx_key)` → lock is gone
4. `maybe_open_context` exits

`run_ctx` now gets scheduled (lines 237-241):

5. Inner finally: `values.pop(ctx_key)` → values gone
6. `mng.__aexit__()` runs → `in_aexit.set()`, `await trio.sleep(10)` → CHECKPOINT

`reenter_during_teardown` wakes:

7. Enters `maybe_open_context` → no lock exists for `ctx_key` → creates a fresh lock, acquires it immediately
8. `values[ctx_key]` → `KeyError`
9. `resources.get(ctx_key)` → still exists! (step 6 hasn't reached `resources.pop()` in `run_ctx`'s outer finally yet)
10. `raise RuntimeError('Caching resources ALREADY exist?!')`

The core issue: `no_more_users.set()` (step 2) and `locks.pop()` (step 3) happen in the exiting caller's task, but the actual resource cleanup (`values.pop()`, the acm's `__aexit__()`, `resources.pop()`) happens in the `run_ctx` task inside `service_tn`. There's no synchronization between "lock removed, new callers welcome" and "`run_ctx` has actually finished tearing down". The per-`ctx_key` lock change just made the lock more granular; it didn't close this gap.
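The interleaving can be reproduced in miniature. The sketch below is hypothetical and heavily simplified (invented dict names and timings, no nursery, and `asyncio` standing in for trio since only the event/checkpoint ordering matters); it is not the real `maybe_open_context`, just the shape of the bug:

```python
import asyncio

locks, values, resources = {}, {}, {}
errors = []

async def run_ctx(key: str, no_more_users: asyncio.Event):
    # Background task that owns the cached resource's lifetime.
    resources[key] = object()
    values[key] = 'cached'
    await no_more_users.wait()
    values.pop(key)             # "inner finally": cached value dropped
    await asyncio.sleep(0.05)   # stand-in for the acm __aexit__ checkpoint
    resources.pop(key)          # "outer finally": teardown truly complete

async def main():
    no_more_users = asyncio.Event()
    locks['k'] = asyncio.Lock()
    task = asyncio.create_task(run_ctx('k', no_more_users))
    await asyncio.sleep(0.01)   # let run_ctx populate the cache

    # Exiting caller (last user): wakes run_ctx and pops the lock,
    # but run_ctx has not actually torn anything down yet.
    no_more_users.set()         # step 2
    locks.pop('k')              # step 3

    await asyncio.sleep(0.01)   # run_ctx pops values, parks mid-__aexit__

    # Re-entering caller: no lock exists, so it creates a fresh one,
    # acquires it immediately, and observes the half-torn-down state.
    async with locks.setdefault('k', asyncio.Lock()):
        if 'k' not in values and 'k' in resources:
            errors.append('Caching resources ALREADY exist?!')
    await task

asyncio.run(main())
```

The two `sleep` calls just force the same scheduling order the trace above describes; in the real code the `trio.sleep(10)` checkpoint inside `__aexit__` plays that role.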

The fix needs to ensure a new caller can't acquire a lock for `ctx_key` until `run_ctx` has fully exited (including `resources.pop()`). Either:

- don't pop the lock in `maybe_open_context`; let `run_ctx` release/pop it after `resources.pop()`
- wait on a "teardown complete" event in the cache-miss path when stale resources exist
- pop resources before the acm `__aexit__()` (move `resources.pop()` into the inner finally alongside `values.pop()`)
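As a sketch of the second option, here is a hedged `asyncio` model (hypothetical names again; asyncio standing in for trio) in which `run_ctx` owns a per-key "teardown complete" event that a cache-miss caller awaits whenever stale resources remain, instead of raising:

```python
import asyncio

locks, values, resources, teardowns = {}, {}, {}, {}

async def run_ctx(key: str, no_more_users: asyncio.Event):
    resources[key] = object()
    values[key] = 'cached'
    teardowns[key] = asyncio.Event()
    await no_more_users.wait()
    try:
        values.pop(key)             # inner finally
        await asyncio.sleep(0.05)   # acm __aexit__ checkpoint
    finally:
        resources.pop(key)          # outer finally
        teardowns.pop(key).set()    # only now admit new cache-miss callers

async def reenter(key: str) -> str:
    async with locks.setdefault(key, asyncio.Lock()):
        if key not in values:
            # Cache miss with a stale teardown in flight: wait it out
            # rather than raising 'Caching resources ALREADY exist?!'.
            if key in resources:
                await teardowns[key].wait()
            assert key not in resources
            return 'safe to start a fresh run_ctx'
        return 'cache hit'

async def main() -> str:
    no_more_users = asyncio.Event()
    task = asyncio.create_task(run_ctx('k', no_more_users))
    await asyncio.sleep(0.01)
    no_more_users.set()             # last user leaves ...
    locks.pop('k', None)            # ... and the early lock-pop still happens
    await asyncio.sleep(0.01)       # run_ctx parks mid-__aexit__
    outcome = await reenter('k')    # blocks until teardown fully completes
    await task
    return outcome

result = asyncio.run(main())
```

Note the caller grabs the event object before `run_ctx` pops it from the dict, so the pop-then-set in the finally is safe. The first option (letting `run_ctx` pop the lock itself) is arguably cleaner since it keeps one owner for both the lock and the resource lifetime.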