First draft at resolving,
https://github.com/goodboy/tractor/issues/424
`tests/conftest.py::daemon()` previously used a blind
`time.sleep(_PROC_SPAWN_WAIT + uds_bonus + ci_bonus)` to "wait for the
daemon to come up" before yielding the proc to the test.
Two problems:
1. **Racy under load** — the sleep duration is fixed at design time;
   loaded boxes, cold starts, and fork-spawn cost spikes blow past it,
   leading to `ConnectionRefusedError`/`OSError: connect failed` flakes
   in `test_register_duplicate_name`.
2. **Wasteful when the daemon comes up fast** — the happy path pays the
   FULL sleep regardless: ~3s of dead time per fixture invocation,
   ~10-20s per full suite run.
Replace with `_wait_for_daemon_ready()` — an active poll via stdlib
`socket.create_connection()` (TCP) or `socket.connect()` (UDS) on the
daemon's bind addr, with 50ms backoff and a 10s/15s deadline (CI gets
the extra headroom). An early exit on daemon-death-during-startup
catches startup crashes that `_PROC_SPAWN_WAIT` was silently masking.
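A minimal sketch of the shape of that helper — the 50ms backoff, 10s
deadline, and died-during-startup early exit are from the patch, but
the exact signature, exception handling, and addr plumbing here are
illustrative, not the committed code:

```python
import socket
import time


def _wait_for_daemon_ready(
    proc,                         # `subprocess.Popen` handle for the daemon
    addr: tuple[str, int] | str,  # ('host', port) for TCP, bind path for UDS
    timeout: float = 10.0,        # bumped toward 15s when on CI
    backoff: float = 0.05,        # 50ms between probe attempts
) -> None:
    '''
    Block until the daemon's transport socket accepts a connection,
    raising early if the daemon dies mid-startup or the deadline passes.
    '''
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # early exit: a blind sleep silently masks startup crashes.
        if proc.poll() is not None:
            raise RuntimeError(
                f'daemon exited during startup with code {proc.returncode}'
            )
        try:
            if isinstance(addr, str):
                # UDS: probe the bind path directly.
                with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                    s.settimeout(backoff)
                    s.connect(addr)
            else:
                # TCP: stdlib one-shot connect with its own timeout.
                with socket.create_connection(addr, timeout=backoff):
                    pass
            return  # listen-side accepted -> runtime is up.
        except OSError:
            # covers ConnectionRefusedError, FileNotFoundError (UDS
            # path not bound yet), and connect timeouts.
            time.sleep(backoff)

    raise TimeoutError(f'daemon never came up on {addr!r}')
```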
Why stdlib `socket` (Option 2 from the conc-anal doc) instead of
`tractor`'s own `_root.ping_tpt_socket` closure or trio?
- `tractor.run_daemon()` doesn't return from bootstrap until the runtime
is fully ready to handle IPC, so probing listen-side acceptance is
sufficient.
- no need to do the full IPC handshake just to validate readiness;
  this also sidesteps the `trio.run()` bootstrap cost (~50ms) per
  fixture.
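Wired into the fixture, the change is then roughly as below —
hypothetical spawn cmd and fixture args, for shape only; the real
`daemon()` fixture differs:

```python
import subprocess

import pytest


@pytest.fixture
def daemon(reg_addr):  # `reg_addr` stands in for the suite's bind-addr fixture
    proc = subprocess.Popen(
        ['python', '-m', 'some_daemon_module'],  # placeholder spawn cmd
    )
    try:
        # replaces: time.sleep(_PROC_SPAWN_WAIT + uds_bonus + ci_bonus)
        _wait_for_daemon_ready(proc, reg_addr)
        yield proc
    finally:
        proc.terminate()
        proc.wait()
```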
`claude`'s verification: 10/10 runs of `tests/test_multi_program.py`
pass on both `--tpt-proto=tcp` and `--tpt-proto=uds`. Per-test wall-time
for `test_register_duplicate_name`: 4.31s → 1.10s. Full file: ~12s →
3.27s per transport.
Doc-tracked at:
`ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md`
Future work — a session-scoped trio runtime in a bg thread to share
fixture-side trio operations across many fixtures (currently overkill
for the one fixture that needs it); see the sketch below.
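A hedged sketch of one shape that could take — nothing below is in
this patch and `TrioRuntimeThread` is a hypothetical name:

```python
import threading

import trio


class TrioRuntimeThread:
    '''
    Session-scoped trio loop in a daemon thread; sync fixture code
    submits async work via `.run()` instead of paying `trio.run()`
    bootstrap (~50ms) on every invocation.
    '''
    def __init__(self) -> None:
        self._token: trio.lowlevel.TrioToken | None = None
        self._started = threading.Event()
        self._stop: trio.Event | None = None
        self._thread = threading.Thread(target=self._main, daemon=True)
        self._thread.start()
        self._started.wait()

    def _main(self) -> None:
        async def _serve() -> None:
            # capture the token so foreign threads can re-enter this loop.
            self._token = trio.lowlevel.current_trio_token()
            self._stop = trio.Event()
            self._started.set()
            await self._stop.wait()

        trio.run(_serve)

    def run(self, async_fn, *args):
        # called from sync fixture code; blocks until `async_fn` completes.
        return trio.from_thread.run(async_fn, *args, trio_token=self._token)

    def shutdown(self) -> None:
        trio.from_thread.run_sync(self._stop.set, trio_token=self._token)
        self._thread.join()
```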
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code