All `daemon` fixture consumers are discovery-
protocol tests now living under `tests/discovery/`.
Move the fixture, its `_wait_for_daemon_ready`
helper, and `test_multi_program.py` into that subdir
so scope matches usage.
Also,
- add `pytestmark` for `track_orphaned_uds_per_test`
+ `detect_runaway_subactors_per_test` to `test_multi_program` as
regression net.
- drop now-unused `_PROC_SPAWN_WAIT` + `socket` import from root
conftest.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
First draft at resolving,
https://github.com/goodboy/tractor/issues/424
`tests.conftest.py.daemon()` previously used a blind
`time.sleep(_PROC_SPAWN_WAIT + uds_bonus + ci_bonus)` to "wait for the
daemon to come up" before yielding the proc to the test.
Two problems:
1. **Racy under load** — sleep is fixed at design time; loaded boxes
/ cold starts / fork-spawn cost spikes blow past it, leading to
`ConnectionRefusedError` /`OSError: connect failed` flakes in
`test_register_duplicate_name`.
2. **Wasteful when daemon comes up fast** — happy-path pays the FULL
sleep regardless. ~3s of dead time per fixture invocation, ~10-20s
per full suite run.
Replace with `_wait_for_daemon_ready()` — active poll via stdlib
`socket.create_connection` (TCP) or `socket.connect` (UDS) on the
daemon's bind addr, with 50ms backoff and a 10s/15s deadline (CI gets
extra headroom). Daemon-died-during-startup early-exit catches the case
where `_PROC_SPAWN_WAIT` was silently masking daemon startup crashes.
Why stdlib `socket` (Option 2 from the conc-anal doc) instead of
`tractor`'s own `_root.ping_tpt_socket` closure or trio?
- `tractor.run_daemon()` doesn't return from bootstrap until the runtime
is fully ready to handle IPC, so probing listen-side acceptance is
sufficient.
- no need to do the full IPC handshake just to validate readiness.
Sidesteps the `trio.run()` bootstrap cost (~50ms) per fixture too.
`claude`'s verification: 10/10 runs of `tests/test_multi_program.py`
pass on both `--tpt-proto=tcp` and `--tpt-proto=uds`. Per-test wall-time
`test_register_duplicate_name`: 4.31s → 1.10s. Full file: ~12s → 3.27s
per transport.
Doc-tracked at:
`ai/conc-anal/test_register_duplicate_name_daemon_connect_race_issue.md`
Future work — session-scoped trio runtime in a bg thread to share
fixture-side trio operations across many fixtures (currently overkill
for the one fixture that needs it).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
In top level `daemon`-fixture that is..
Use a local `bg_daemon_spawn_delay` instead of
mutating the module-level `_PROC_SPAWN_WAIT` —
previously each `daemon` fixture invocation would
permanently add 1.6s (UDS) or 1s (CI) to the
global, inflating delays across the session.
Also, emit a `test_log.warning()` when verbose
loglevel is silently reduced to `'info'`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Only override `tractor.log._default_loglevel` when
the flag is explicitly passed — lets per-spawn and
per-example `loglevel` kwargs take effect instead
of being clobbered by the hard-coded `'ERROR'`
default.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Move `--capture=sys` enforcement from a static ini
flag to a `pytest_load_initial_conftests()` bootstrap
hook that dynamically flips capture mode only when a
fork-based spawner (like `main_thread_forkserver`) is
detected; non-fork backends keep `--capture=fd`.
Also,
- load `tractor._testing.pytest` via `-p` in ini
(bc bootstrapping hooks must register before
conftest `pytest_plugins` runs).
- register `_reap` as sub-plugin via `pytest_plugins`
tuple in `._testing.pytest`.
- drop now-duplicate reap fixtures (already in `_reap`
per 1cdc7fb3).
- rename `tractor_enable_stackscope` dest -> `enable_stackscope`
and pop env var on disable.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adopt the `@pytest.mark.skipon_spawn_backend('subint',
reason=...)` marker (a617b521) across the suites
reproducing the `subint` GIL-contention / starvation
hang classes doc'd in `ai/conc-anal/subint_*_issue.md`.
Deats,
- Module-level `pytestmark` on full-file-hanging suites:
- `tests/test_cancellation.py`
- `tests/test_inter_peer_cancellation.py`
- `tests/test_pubsub.py`
- `tests/test_shm.py`
- Per-test decorator where only one test in the file
hangs:
- `tests/discovery/test_registrar.py
::test_stale_entry_is_deleted` — replaces the
inline `if start_method == 'subint': pytest.skip`
branch with a declarative skip.
- `tests/test_subint_cancellation.py
::test_subint_non_checkpointing_child`.
- A few per-test decorators are left commented-in-
place as breadcrumbs for later finer-grained unskips.
Also, some nearby tidying in the affected files:
- Annotate loose fixture / test params
(`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in
`tests/conftest.py`, `tests/devx/conftest.py`, and
`tests/test_cancellation.py`.
- Normalize `"""..."""` → `'''...'''` docstrings per
repo convention on a few touched tests.
- Add `timeout=6` / `timeout=10` to
`@tractor_test(...)` on `test_cancel_infinite_streamer`
and `test_some_cancels_all`.
- Drop redundant `spawn_backend` param from
`test_cancel_via_SIGINT`; use `start_method` in the
`'mp' in ...` check instead.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deats,
- use `proc.poll() is None` in `sig_prog()` to
distinguish "still running" from exit code 0;
drop stale `breakpoint()` from fallback kill
path (would hang CI).
- add missing `raise` on the `RuntimeError` in
`async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
`_registry` when addr eviction empties the
addr list.
- update `discovery.__init__` docstring to match
the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
report log msg.
Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Retry signal delivery in `sig_prog()` up to `tries`
times (default 3) w/ `canc_timeout` sleep between
attempts; only fall back to `_KILL_SIGNAL` after all
retries exhaust. Bump default timeout 0.1 -> 0.2.
Also,
- `test_multi_nested_subactors_error_through_nurseries`
gives the first prompt iteration a 5s timeout even
on linux bc the initial crash sequence can be slow
to arrive at a `pdb` prompt
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).
Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled
Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
`conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
`FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
clarity on intent.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Forward the `tpt_proto` fixture val into spawned daemon
subprocesses via `run_daemon(enable_transports=..)` and
sync `_runtime_vars['_enable_tpts']` in the `tpt_proto`
fixture so sub-actors inherit the transport setting.
Deats,
- add `enable_transports={enable_tpts}` to the daemon
spawn-cmd template in `tests/conftest.py`.
- set `_state._runtime_vars['_enable_tpts']` in the
`tpt_proto` fixture in `_testing/pytest.py`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
For more reliability with the oob registrar using tests
via the `daemon` fixture,
- increase spawn-wait to `2` in CI, `1` OW; drop
the old py<3.7 branch.
- move `_ci_env` to module-level (above `_non_linux`)
so `_PROC_SPAWN_WAIT` can reference it at parse time.
- add `test_log` fixture param to `daemon()`, use it
for the error-on-exit log line instead of bare `log`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
There's a very sloppy registrar-actor-bootup syncing approach used in
this fixture (basically just guessing how long to sleep to wait for it
to init and bind the registry socket) using a `global _PROC_SPAWN_WAIT`
that needs to be made more reliable. But, for now i'm just playing along
with what's there to try and make less CI runs flaky by,
- sleeping *another* 1s when run from non-linux CI.
- reporting stdout (if any) alongside stderr on teardown.
- not strictly requiring a `proc.returncode == -2` indicating successful
graceful cancellation via SIGINT; instead we now error-log and only
raise the RTE on `< 0` exit code.
* though i can't think of why this would happen other then an
underlying crash which should propagate.. but i don't think any test
suite does this intentionally rn?
* though i don't think it should ever happen, having a CI run
"error"-fail bc of this isn't all that illuminating, if there is
some weird `.returncode == 0` termination case it's likely not
a failure?
For later, see the new todo list; we should sync to some kind of "ping"
polling of the tpt address if possible which is already easy enough for
TCP reusing an internal closure from `._root.open_root_actor()`.
Use new implicit module-name detection throughout codebase to simplify
logger creation and leverage auto-naming from caller mod .
Main changes,
- drop `name=__name__` arg from all `get_logger()` calls
(across 29 modules).
- update `get_console_log()` calls to include `name='tractor'` for
enabling root logger in test harness and entry points; this ensures
logic in `get_logger()` triggers so that **all** `tractor`-internal
logging emits to console.
- add info log msg in test `conftest.py` showing test-harness
log level
Also,
- fix `.actor.uid` ref to `.actor.aid.uid` in `._trace`.
- adjust a `._context` log msg formatting for clarity.
- add TODO comments in `._addr`, `._uds` for when we mv to
using `multiaddr`.
- add todo for `RuntimeVars` type hint TODO in `.msg.types` (once we
eventually get that all going obvi!)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Namely any CLI driven runtime-config fixtures such as,
- `--spawn-backend` and `start_method`,
- `--tpdb` and `debug_mode`,
- `--tpt-proto` and `tpt_protos`/`tpt_proto`,
- `reg_addr` as driven by the above.
This moves all fixtures and necessary hook funcs (CLI parsing,
configuring and test-gen) to the `._testing.pytest` module and thus
allows any dependent project to leverage these fixtures in their own
test suites after pointing to that plugin mod using,
```python
# conftest.py
pytest_plugins: tuple[str] = (
"tractor._testing.pytest",
)
```
Also, add a new `._testing.addr` helper mod which now contains
a factored `get_rando_addr()` helper for creating test-sesh unique
tpt-specific registry (or other) IPC endpoint addrs.
Via a new accumulative `--tpt-proto` arg you can select which
`tpt_protos: list[str]`-fixture protocol keys will be delivered to
opting in tests!
B)
Also includes,
- CLI quote handling/stripping.
- default of 'tcp'.
- only support one selection per session at the moment (until we figure
out how we want to support multiples, either simultaneously or
sequentially).
- draft a (masked) dynamic-`metafunc` parametrization in the
`pytest_generate_tests()` hook.
- first proven and working use in the `test_advanced_faults`-suite (and
thus its underlying
`examples/advanced_faults/ipc_failure_during_stream.py` script)!
|_ actually needed this to prove that the suite only has 2 failures on
'uds' seemingly due to low-level `trio` error semantics translation
differences to do with with calling `socket.close()`..
On a very nearly related topic,
- draft an (also commented out) `set_script_runtime_args()` fixture idea
for a std way of `partial`-ling in runtime args to `examples/`
scripts-as-modules defining a `main()` which would proxy to
`tractor.open_nursery()`.
Such that we can run (opting-in) tests on both TCP and UDS backends and
ensure the `reg_addr` fixture and various timeouts are adjusted
accordingly.
Impl deats,
- add a new `tpc_proto` CLI option and fixture to allow choosing which
"transport protocol" will be used in the test suites (either globally
or contextually).
- rm `_reg_addr` instead opting for a `_rando_port` which will only be
used for `reg_addr`s which are net-tpt-protos.
- rejig `reg_addr` fixture to set a ideally session-unique `testrun_reg_addr`
based on the `tpt_proto` setting making appropriate calls to `._addr`
APIs as needed.
- refine `daemon` fixture a bit with typing, `tpt_proto` timings, and
stderr capture.
- in `test_discovery` do a ton of type-annots, add `debug_mode` fixture
opt ins, augment `spawn_and_check_registry()` with `psutil.Process`
passing for introspection (when things go wrong..).
Since there's no way to activate `greenback`'s portal in such cases, we
should at least have a test verifying our very loud error about the
inability to support this usage..
Allows tests (including any `@tractor_test`s) to subscribe to a CLI flag
`--tpdb` (for "tractor python debugger") which the session can provide
to tests which can then proxy the value to `open_root_actor()` (via
`open_nursery()`) when booting the runtime - thus enabling our debug
mode globally to any subscribers B)
This is real handy if you have some failures but can't determine the
root issue without jumping into a `pdbp` REPL inside a (sub-)actor's
spawned-task.
Since importing from our top level `conftest.py` is not scaleable
or as "future forward thinking" in terms of:
- LoC-wise (it's only one file),
- prevents "external" (aka non-test) example scripts from importing
content easily,
- seemingly(?) can't be used via abs-import if using
a `[tool.pytest.ini_options]` in a `pyproject.toml` vs.
a `pytest.ini`, see:
https://docs.pytest.org/en/8.0.x/reference/customize.html#pyproject-toml)
=> Go back to having an internal "testing" pkg like `trio` (kinda) does.
Deats:
- move generic top level helpers into pkg-mod including the new
`expect_ctxc()` (which i needed in the advanced faults testing script.
- move `@tractor_test` into `._testing.pytest` sub-mod.
- adjust all the helper imports to be a `from tractor._testing import <..>`
Rework `test_ipc_channel_break_during_stream()` and backing script:
- make test(s) pull `debug_mode` from new fixture (which is now
controlled manually from `--tpdb` flag) and drop the previous
parametrized input.
- update logic in ^ test for "which-side-fails" cases to better match
recently updated/stricter cancel/failure semantics in terms of
`ClosedResouruceError` vs. `EndOfChannel` expectations.
- handle `ExceptionGroup`s with expected embedded errors in test.
- better pendantics around whether to expect a user simulated KBI.
- for `examples/advanced_faults/ipc_failure_during_stream.py` script:
- generalize ipc breakage in new `break_ipc()` with support for diff
internal `trio` methods and a #TODO for future disti frameworks
- only make one sub-actor task break and the other just stream.
- use new `._testing.expect_ctxc()` around ctx block.
- add a bit of exception handling with `print()`s around ctxc (unused
except if 'msg' break method is set) and eoc cases.
- don't break parent side ipc in loop any more then once
after first break, checked via flag var.
- add a `pre_close: bool` flag to control whether
`MsgStreama.aclose()` is called *before* any ipc breakage method.
Still TODO:
- drop `pytest.ini` and add the alt section to `pyproject.py`.
-> currently can't get `--rootdir=` opt to work.. not showing in
console header.
-> ^ also breaks on 'tests' `enable_modules` imports in subactors
during discovery tests?
Including mostly tweaking asserts on relayed `ContextCancelled`s and
the new pub ctx properties: `.outcome`, `.maybe_error`, etc. as it
pertains to graceful (absorbed) remote cancellation vs. loud ctxc cases
expected to be raised by any `Portal.cancel_actor()` style teardown.
Start checking a variety internals like `._remote/local_error`,
`._is_self_cancelled()`, `._is_final_result_set()`, `._cancel_msg`
where applicable.
Also factor out the new `expect_ctxc()` checker to our `conftest.py` for
use in other suites.
- ease up on first stream test run deadline
- skip streaming tests in CI for mp backend, period
- give up on > 1 depth nested spawning with mp
- completely give up on slow spawning on windows