Commit Graph

658 Commits (90c46288ad6b0c5c517fa549bc185654feca54ac)

Author SHA1 Message Date
Gud Boi 94d233a2f7 Add opt-in `reap_subactors_per_test` fixture
Function-scoped, NON-autouse zombie-subactor reaper for
modules whose teardown is known-leaky enough to cascade-
fail every following test in a session.

Sibling to the autouse session-scoped `_reap_orphaned_subactors`. The
session-scoped one fires at session end — too late to save tests that
follow a hung/leaky test in the suite. The new fixture, opted into via
`pytestmark = pytest.mark.usefixtures(...)`, runs between tests in
a problem-module so a leftover subactor from test N can't squat on
registrar ports / UDS paths / shm segments needed by tests N+1,
N+2, ...

Intentionally NOT autouse — the fixture's presence on a module signals
"this module's teardown leaks; please root-cause instead of relying
forever on cleanup". A visibility-vs-convenience trade picked in favor
of the former.

Apply to `tests/test_infected_asyncio.py` since both recent full-suite
runs (parallel-tpt-proto + TCP-only) showed the cascade originating in
this file's KBI- and SIGINT-flavored tests under
`main_thread_forkserver`. Module-comment names the specific offenders so
future de-flake work has a starting point.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit b376eb0332)
2026-06-09 23:24:18 -04:00
Gud Boi a0e2c08119 Wall-cap `test_stale_entry_is_deleted` via `pytest-timeout`
Add a hard process-level wall-clock bound on a test
known to wedge un-Ctrl-C-ably under an in-dev spawn
backend, so an unattended suite run can't hang
indefinitely.

Deats,
- New `testing` dep: `pytest-timeout>=2.3`.
- `test_stale_entry_is_deleted`:
  `@pytest.mark.timeout(3, method='thread')`. The
  `method='thread'` choice is deliberate —
  `method='signal'` routes via `SIGALRM` which can be
  starved by the same GIL-hostage path that drops
  `SIGINT`, so it'd never actually fire in the
  starvation case.

At timeout, `pytest-timeout` hard-kills the pytest
process itself — that's the intended behavior here;
the alternative is the suite never returning.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 189f4e3f72e9f1eda5d24bcbab5743f7e35bd913)
(factored: kept pyproject + tests/discovery/test_registrar.py parts of
 "Wall-cap `subint` audit tests via `pytest-timeout`"; dropped
 tests/test_subint_cancellation.py)
2026-06-09 23:24:18 -04:00
Gud Boi 0e03c6815b Add `supervise_run_process` to `trionics._subproc`
A `trio.Nursery.start()`-style wrapper around
`trio.run_process()` that surfaces rc!=0 errors
deterministically, ALWAYS isolates the parent
controlling-tty, and optionally live-relays the child's
std-streams to `log.<level>` per-line. Suits both
short-lived test-runners + long-lived daemons.

`supervise_run_process()`,
- Deterministic rc!=0: pass `check=False` to `trio`
  and do our OWN post-drain rc-check from the
  supervisor coro body AFTER `own_tn.__aexit__` — NOT
  inside the internal nursery, since that would
  race-cancel the still-draining relay reader and lose
  stderr lines. (Re)build + raise a BARE
  `subprocess.CalledProcessError`: `.stderr=` for
  programmatic callers + an `add_note()`'d
  `|_.stderr:` block for human teardown logs. No
  nursery-eg-wrapped CPE to `collapse_eg` around.
- Parent controlling-tty isolation: `stdin=DEVNULL`
  always, `stdout=DEVNULL` unless relayed/overridden
  (via `stdout=` kwarg w/ `_UNSET` sentinel so explicit
  `None` = inherit still works). Prevents a spawned
  program from clobbering the launching tty's scrollback
  w/ control-seqs.
- Live per-line relay: `relay_stdout=True`/
  `relay_stderr=True` → relayed to `log.<relay_level>`
  (default `'io'`, our custom level 21). Picked to sort
  just above stdlib `INFO`=20 so it shows at usual
  `info`/`devx` levels yet stays separately filterable;
  `runtime`=15 was REJECTED as a default since it'd be
  silently filtered at usual verbosity — footgun for
  daemon supervisors whose whole point is visibility.
  STREAMED, not buffered-until-exit.
- Non-blocking `tn.start()` semantics: live
  `trio.Process` handed up via
  `task_status.started()` immediately (else
  `tn.start()` would block till child exit, losing
  the long-lived-daemon use case). Supervise/relay bg
  tasks run to completion in this coro.
- `**run_process_kwargs` forwarded verbatim (env, shell,
  cwd, start_new_session, executable, ...); MANAGED keys
  (`stdin`/`stdout`/`stderr`/`check`) win on conflict.
- Crash-handling layer intentionally NOT baked in —
  compose `maybe_open_crash_handler()` ON TOP at the
  call-site.

`_relay_stream_lines()` helper,
- Concurrent pipe-drain reader. MANDATORY whenever piping
  w/o `capture_*` since nothing else drains the OS pipe —
  child blocks on `write()` once kernel buf (~64KiB) fills
  → deadlock.
- Modes (combine freely): `emit`-only live relay,
  `accum`-only silent drain+capture (for the CPE note),
  or both. Per-line splitting handles cross-chunk
  residuals + flushes any trailing un-newline-term'd line
  at EOF.

`_add_stderr_note()` helper,
- Attaches an indented `|_.stderr:` note to a CPE via
  `add_note()` for legible rc!=0 reporting at teardown.

Tests (`tests/trionics/test_subproc.py`),
- Hermetic `trio`-only (no actor-runtime).
- `test_stdout_relayed_per_line`: per-line stdout relay.
- `test_parent_tty_isolated`: child fd1 is OUR pipe (no
  `/dev/pts/*`), fd0 pinned to `/dev/null`.
- `test_no_deadlock_on_big_unnewlined_output`: 200KiB
  no-newline output completes under `fail_after(2)` —
  exercises the concurrent drain (without it, the child
  blocks at ~64KiB).
- `test_stderr_relay_and_cpe_rebuild`: rc!=0 w/
  `relay_stderr=True` → bare `CalledProcessError` w/ the
  `.stderr` note + per-line live relay.
- `test_nonrelay_cpe_note`: rc!=0 w/o relay → same
  deterministic post-drain CPE w/ `.stderr` note (silent
  drain+capture path).

Re-export `supervise_run_process` from `tractor.trionics`.

Prompt-IO: ai/prompt-io/claude/20260601T231429Z_0e3e008b_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit f595acc76c)
2026-06-09 23:24:18 -04:00
Gud Boi 9c905b390b Add `add_log_level()` factory + register `IO`=21
Follow-up to f595acc7 (`supervise_run_process`) which
called `log.io(...)` for std-stream relay assuming an
`IO=21` level existed. Add the registration via a new
factory + tests covering both the factory and the new
level.

`add_log_level()` factory,
- One call wires the four (otherwise hand-synced) pieces:
  - `CUSTOM_LEVELS[NAME]` — drives the `stacklevel` bump
    in `StackLevelAdapter.log()` + `get_logger()`'s
    per-level audit.
  - `logging.addLevelName()` — stdlib name registration.
  - `STD_PALETTE[NAME]` + `BOLD_PALETTE['bold'][NAME]` —
    color entries consumed by `get_console_log()`'s
    `ColoredFormatter` build.
  - Same-named (lowercase) emit method bound on
    `StackLevelAdapter` so `log.<name>('msg')` works +
    `get_logger()`'s per-level method audit passes.
- Idempotent: re-registering an existing name is a
  no-op-ish refresh that won't clobber an already-bound
  method.
- Method binding uses a default-arg `_level=value` so
  the level int is captured (not late-bound across
  multiple registrations).

`IO=21` level (first user),
- Purple. Used by `tractor.trionics._subproc`'s
  std-stream relay (see f595acc7).
- Value 21 picked to sit just ABOVE stdlib `INFO`=20 so
  it's SHOWN BY DEFAULT at usual `info`/`devx` console
  levels — a `runtime`=15 relay would be silently
  filtered (footgun for daemon supervisors whose whole
  point is visibility). Still distinctly labeled +
  filterable.

Tests (`tests/test_log_sys.py`),
- `test_io_custom_level_registered`: validates the IO
  level is fully wired (`CUSTOM_LEVELS`, `addLevelName`,
  both palettes, `StackLevelAdapter.io()` callable);
  emits a record + sanity-asserts `21 >= INFO(20)`.
- `test_add_log_level_pluggable`: registers a fresh
  `XLVL=19` (cyan) via `add_log_level()`, asserts all
  four wires + the bound `xlog.xlvl()` emit, then
  try/finally cleans up the module-global mutations so
  later `get_logger()` audits don't trip on a
  half-removed level.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 7bd7dd50c7)
2026-06-09 23:24:18 -04:00
Gud Boi 11b9a87077 Fix `get_logger()` collapse of nested sub-pkgs
Strip the trailing `pkg_path` token ONLY when it duplicates the
caller's leaf-*module* name (which the console header already
shows via `{filename}`), instead of blindly dropping the last
token. This keeps genuine, possibly-*nested* sub-PACKAGE parts
addressable as their own sub-loggers.

- detect a true leaf-mod by comparing the caller's `__name__`
  vs `__package__` (a pkg `__init__` has them equal -> its
  trailing token is a real sub-pkg, NOT a leaf to strip).
- `name='devx.debug'` now -> `tractor.devx.debug`, DISTINCT
  from a bare `devx` -> `tractor.devx`; the old unconditional
  `pkg_path = subpkg_path` collapsed both to `tractor.devx` and
  silently broke per-sub-pkg level control via the logging-spec.
- `get_logger(__name__)` leaf-strip still works (cosmetic, bc
  the leaf-mod is in the `{filename}` header field).

Also,
- update the `LogSpec` caveat: sub-PACKAGE granularity now
  addressable at ANY depth; leaf *modules* intentionally aren't
  (they're the `{filename}`); top-level mods (eg. `to_asyncio`)
  still emit on the root logger.
- adjust `test_root_pkg_not_duplicated_in_logger_name` to the
  new literal explicit-`name` contract (no leaf-collapse).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 9c36363b01)
2026-06-09 23:24:18 -04:00
Gud Boi 7478478038 Lift `--ll`/`--tl` to plugin + `LogSpec` API
Two coupled changes that let downstream projects (eg. `modden`) inherit
the test-harness loglevel plumbing for free via
`tractor._testing.pytest`:

Plugin lift (`tests/conftest.py` → `_testing/pytest.py`),
- mv `pytest_addoption(--ll)`, the `loglevel` autouse
  fixture, and `test_log` fixture out of the test-suite-
  local conftest into the reusable plugin.
- add `--tl`/`--tractor-loglevel` as a DISTINCT flag from
  `--ll`: `--ll` is the consuming-project's OWN app
  loglevel (scoped to its pkg-hierarchy), `--tl` is the
  `tractor.*` runtime loglevel. `--tl` falls back to
  `--ll` when unset (preserves current `tractor`-suite
  behavior).
- add `testing_pkg_name` session fixture (default
  `'tractor'`) — downstream projects override to e.g.
  `'modden'` so `--ll` scopes to their own hierarchy
  instead of `tractor.*`.
- `loglevel` fixture now yields the resolved
  tractor-runtime level (passed to
  `open_root_actor(loglevel=<.>)` by `@tractor_test`)
  AND separately applies `--ll` to the
  `testing_pkg_name` hierarchy when that isn't
  `tractor`. `test_log` scopes the per-test logger to
  `testing_pkg_name`.

`tractor.log` "logging-spec" mini-DSL,
- `LogSpec = str|bool`. Accepted forms:
  - `True` → enable `pkg_name` root at `default_level`
    (fallback `'cancel'`).
  - `False` → no-op.
  - bare level eg. `'info'` → root-logger at that level.
  - `'sub:info,x:cancel'` → per-sub-logger filter-spec;
    each `<name>` is RELATIVE to `pkg_name` (must NOT
    include the pkg-token).
- `parse_logspec()` → `{sublog|None: level}` mapping.
  `None` key = root-logger. Mixed bare-level + filters
  in one spec is rejected w/ a helpful err msg; so is
  embedding the `pkg_name` token in a sub-name.
- `apply_logspec()` → `(primary_level, {name: log})`:
  parses then enables a `colorlog` stderr handler per
  named (sub)logger. Authoritative sub-logger filters
  get `propagate=False` so they don't double-emit
  through a parallel root-level handler.
- !GRANULARITY CAVEAT! sub-logger names match at
  sub-pkg granularity, not leaf-module — so `devx.debug`
  collapses to the same `tractor.devx` logger as a bare
  `devx`, and top-level lib modules (eg.
  `tractor.to_asyncio`) emit under the *root* logger
  rather than a phantom `to_asyncio` child. Documented
  inline on `LogSpec`.

Other,
- `tests/conftest.py` keeps a NOTE pointing to the
  plugin for future-debugging clarity (don't remove
  silently — the lift is the relevant signal).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 19a77708ba)
2026-06-09 23:24:18 -04:00
Gud Boi 9e09dc5eee Default `--ll` to `None` in test harness
Only override `tractor.log._default_loglevel` when
the flag is explicitly passed — lets per-spawn and
per-example `loglevel` kwargs take effect instead
of being clobbered by the hard-coded `'ERROR'`
default.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 72a0465c52)
2026-06-09 23:24:18 -04:00
Gud Boi c4ec664bfa Fix `SIGUSR1` tree-dump ordering in `_stackscope`
Factor the sub-actor relay loop out of
`dump_tree_on_sig()` into `_relay_sig_to_subactors()`
and chain both dump + relay in a single
`run_sync_soon` callback (`_dump_then_relay`) so the
parent's task-tree flushes BEFORE any sub receives
the signal — fixes a hierarchical-ordering race
where subs could dump ahead of the parent in the
muxed pty stream.

Also,
- gate file/tty sink writes behind `write_file` +
  `write_tty` params on `dump_task_tree()`.
- use `actor.aid.uid` instead of deprecated `.uid`.
- update `test_shield_pause` expects to match the
  new sequential parent -> relay-log -> sub ordering.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit e2b790a70d)
2026-06-09 23:24:18 -04:00
Gud Boi d3b1a68ff9 Add `enable_transports`/`registry_addrs` proto guard
Raise `ValueError` from `open_root_actor()` when any
`registry_addrs` entry uses a transport proto not in
`enable_transports` — historically this caused a
silent indefinite hang during the registrar handshake
(the actor could never connect to register/discover).

Also,
- update `test_root_passes_tpt_to_sub` to detect a
  proto mismatch between parametrized `tpt_proto_key`
  and CLI `tpt_proto`, asserting the new guard raises
  `ValueError` with expected msg content.
- replace old commented-out notes with a clearer
  explanation of the mismatch foot-gun.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit d036ef7d7f)
2026-06-09 23:08:40 -04:00
Gud Boi aaf696ba4c Drop global mutation of `_PROC_SPAWN_WAIT`
In top level `daemon`-fixture that is..

Use a local `bg_daemon_spawn_delay` instead of
mutating the module-level `_PROC_SPAWN_WAIT` —
previously each `daemon` fixture invocation would
permanently add 1.6s (UDS) or 1s (CI) to the
global, inflating delays across the session.

Also, emit a `test_log.warning()` when verbose
loglevel is silently reduced to `'info'`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit c4885f9d99)
2026-06-09 23:07:44 -04:00
Gud Boi 2d8bcbb1c4 Add `tractor.trionics.patches` subpkg + first fix
With a seminal patch fixing `trio`'s `WakeupSocketpair.drain()` which
can busy-loop due to lack of handling `EOF`.

New `tractor.trionics.patches` subpkg housing defensive monkey-patches
for upstream `trio` bugs we've encountered while running `tractor`
— particularly as of recent, fork-survival edge cases that haven't been
filed/fixed upstream yet. Each patch is idempotent, version-gated via
`is_needed()`, and carries a `# REMOVE WHEN:` marker pointing at the
upstream release whose adoption allows deletion.

Subpkg layout + per-patch contract documented in
`tractor/trionics/patches/README.md` — `apply()` / `is_needed()`
/ `repro()` API, registry pattern via `_PATCHES` in `__init__.py`,
single-call entry point `apply_all()`.

First patch, `_wakeup_socketpair`:
- `trio`'s `WakeupSocketpair.drain()` loops on `recv(64KB)` and exits
  ONLY on `BlockingIOError`, NEVER on `recv() == b''` (peer-closed FIN).
- under `fork()`-spawning backends the COW-inherited socketpair fds
  & `_close_inherited_fds()` teardown can leave a `WakeupSocketpair`
  instance whose write-end is closed, and `drain()` then **spins forever
  in C with no Python checkpoints**,
- this obviously burns 100% CPU and no signal delivery.

Standalone repro:

    from trio._core._wakeup_socketpair import WakeupSocketpair
    ws = WakeupSocketpair()
    ws.write_sock.close()
    ws.drain()  # spins forever

Patch is one-line — break the drain loop on b'' EOF.

Manifested as two distinct test failures:

- `tests/test_multi_program.py::test_register_duplicate_name` hung at
  100% CPU on the busy-loop directly (fork child's worker thread)
- `tests/test_infected_asyncio.py::test_aio_simple_error` Mode-A
  deadlock — busy-loop wedged trio's scheduler inside `start_guest_run`,
  both threads parked in `epoll_wait`, no TCP connect-back to parent
  ever happened.

Same patch fixes both. Restored 99.7% pass rate on full
suite under `--spawn-backend=main_thread_forkserver`
(was hanging indefinitely before).

Wired into `tractor._child._actor_child_main` via `apply_all()` BEFORE
any trio runtime init. Harmless on non-fork backends.

Conc-anal write-ups, including strace + py-spy evidence:

- `ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md`
- `ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md`

Regression tests in `tests/trionics/test_patches.py`: each test asserts
(a) the bug exists pre-patch (or is fixed upstream — skip cleanly), (b)
the patch fixes it with a SIGALRM wall-clock cap so a regression hangs
loud instead of silently.

TODO:
- [ ] file the upstream `python-trio/trio` issue + PR.
- [ ] use the `repro()` callable in `_wakeup_socketpair.py` IS the issue
      body's evidence section.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 0ef549fadb)
(factored: dropped spawn-backend-only paths: ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md)
2026-06-09 23:07:44 -04:00
Gud Boi 400ed77ab1 Allow per-call `start_method`/`loglevel` overrides
In `tests/devx/conftest.py::spawn`, refactor the
fixture-internal closures so consumer tests can pass
explicit `start_method`/`loglevel` to each `_spawn()`
invocation rather than only inheriting the fixture-
scoped parametrize values.

Deats,
- promote `set_spawn_method()` and `set_loglevel()`
  to take their respective values as fn params (vs
  closing over the fixture-scope vars).
- give `_spawn()` `start_method=start_method` and
  `loglevel: str|None = None` kwargs so callers
  override one-off without re-parametrizing the
  suite. NOTE: this drops the implicit fixture-
  scoped `loglevel` forward — `_spawn()` callers
  now must pass `loglevel=...` explicitly.
- TODO: figure out how `--ll <level>` should map to
  the default (currently `None` → uses env-var or
  tractor default).
- add a docstring to `_spawn()` so its role as the
  consumer-facing closure is obvious from `help()`.

Also,
- `assert_before()` now returns the `.before` output
  on success (was `None`); add a one-line docstring
  describing the new return contract.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 486249d74f)
2026-06-09 23:07:06 -04:00
Gud Boi c794d5ef44 Honor `TRACTOR_LOGLEVEL`+`TRACTOR_SPAWN_METHOD` env-vars
Add env-var overrides inside `._root.open_root_actor()` so
devs/test-runs can swap the actor-spawn backend or crank
console verbosity *without* touching application code.

In `._root.open_root_actor()`,
- read `TRACTOR_LOGLEVEL` early, overriding any caller-passed
  `loglevel` and stashing an `env_ll_report` to emit once the
  console log is set up.
- pull the `loglevel` fallback (`or _default_loglevel`) and
  `log.get_console_log()` init *up* so the env-var report
  routes through tractor's own logger.
- read `TRACTOR_SPAWN_METHOD`, overriding any caller-passed
  `start_method` and warn-logging when the env-var clobbers
  an explicit caller value.

Wire the same vars through `tests/devx/conftest.py::spawn`,
- request the `loglevel` fixture, set both `TRACTOR_LOGLEVEL`
  and `TRACTOR_SPAWN_METHOD` in `os.environ` before each
  `pexpect.spawn()` (inherited by the example subproc).
- expand `supported_spawners` to include
  `main_thread_forkserver` and `subint_forkserver` bc
  example scripts no longer need per-script CLI plumbing.
- pop both vars in fixture teardown so a leaked value can't
  re-route a later in-process tractor test's spawn-backend
  or loglevel.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 208e7c0926)
2026-06-09 23:07:06 -04:00
Gud Boi ffad2f91ec Mk `test_no_runtime()` not require `pytest-trio`
(cherry picked from commit 79dda4cb4a)
2026-06-09 23:05:19 -04:00
Gud Boi 1d960193f8 Import-or-skip `.devx.` tests requiring `greenback`
Which is for sure true on py3.14+ rn since `greenlet` didn't want to
build for us (yet).

(cherry picked from commit d6e70e9de4)
2026-06-09 23:05:19 -04:00
Gud Boi 09da8d42fd Strip ANSI + accept `_create(...)` in devx tests
Two version-compat fixes for the `devx` debugger
test-suite, both about matching upstream output that
got more verbose w/ recent lib releases.

ANSI stripping (`tests/devx/conftest.py`),
- Add `ansi_strip(text)` helper + `_ansi_re` pattern
  (regex per https://stackoverflow.com/a/14693789).
- Apply inside `in_prompt_msg()` + `assert_before()` so
  substring matches against REPL/traceback output stay
  robust to color leakage.
- Motivated by py3.13's colored tracebacks +
  `pdbp`/pygments highlighting leaking ANSI even when
  `PYTHON_COLORS=0` is set in the `spawn` fixture (not
  every renderer in the spawned subproc honors it).
- Replaces the longstanding inline TODO that linked
  the SO answer w/o impl'ing.

trio 0.30+ `Cancelled._create(` match (`test_debugger`),
- In `test_shield_pause` swap the two
  `"raise Cancelled._create()"` assertion patterns →
  `"raise Cancelled._create("` (open-paren form, no
  closing).
- trio >=0.30 raises a multi-line
  `raise Cancelled._create(source=.., reason=..,
  source_task=..)` w/ cancel-reason metadata, so the
  legacy bare-`()` form no longer matches. Inline
  comment documents the trio-version pivot.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 3854cf5ecb)
2026-06-09 22:54:57 -04:00
Gud Boi 680cad2cac Avoid skip `.ipc._ringbuf` import when no `cffi`
(cherry picked from commit 03bf2b931e)
2026-06-09 22:53:23 -04:00
Gud Boi 89f7ed4794 Handle py3.14+ incompats as test skips
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
  stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
  wheeled yet
  * on nixos the sdist build was failing due to lack of `g++` which
    i don't care to figure out rn since we don't need `.devx` stuff
    immediately for this subints prototype.
  * [ ] we still need to adjust any dependent suites to skip.

Adjust `test_ringbuf` to skip on import failure.

Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.

(cherry picked from commit d2ea8aa2de)
2026-06-09 22:53:23 -04:00
Gud Boi 3867403fab Scale `test_open_local_sub_to_stream` timeout by CPU factor
Import and apply `cpu_scaling_factor()` from
`conftest`; bump base from 3.6 -> 4 and multiply
through so CI boxes with slow CPUs don't flake.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-16 20:03:32 -04:00
Gud Boi ed65301d32 Fix misc bugs caught by Copilot review
Deats,
- use `proc.poll() is None` in `sig_prog()` to
  distinguish "still running" from exit code 0;
  drop stale `breakpoint()` from fallback kill
  path (would hang CI).
- add missing `raise` on the `RuntimeError` in
  `async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
  `_registry` when addr eviction empties the
  addr list.
- update `discovery.__init__` docstring to match
  the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
  report log msg.

Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi cd287c7e93 Fix `test_registrar_merge_binds_union` for UDS collision
`get_random()` can produce the same UDS filename for a given
pid+actor-state, so the "disjoint addrs" premise doesn't always hold.
Gate the `len(bound) >= 2` assertion on whether the registry and bind
addrs actually differ via `expect_disjoint`.

Also,
- drop unused `partial` import

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi 86d4e0d3ed Harden `sig_prog()` retries, adjust debugger test timeouts
Retry signal delivery in `sig_prog()` up to `tries`
times (default 3) w/ `canc_timeout` sleep between
attempts; only fall back to `_KILL_SIGNAL` after all
retries exhaust. Bump default timeout 0.1 -> 0.2.

Also,
- `test_multi_nested_subactors_error_through_nurseries`
  gives the first prompt iteration a 5s timeout even
  on linux bc the initial crash sequence can be slow
  to arrive at a `pdb` prompt

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi c3d6cc9007 Rename `discovery._discovery` to `._api`
Adjust all imports to match.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi e90241baaa Add `parse_endpoints()` to `_multiaddr`
Provide a service-table parsing API for downstream projects (like
`piker`) to declare per-actor transport bind addresses as a config map
of actor-name -> multiaddr strings (e.g. from a TOML `[network]`
section).

Deats,
- `EndpointsTable` type alias: input `dict[str, list[str|tuple]]`.
- `ParsedEndpoints` type alias: output `dict[str, list[Address]]`.
- `parse_endpoints()` iterates the table and delegates each entry to the
  existing `tractor.discovery._discovery.wrap_address()` helper, which
  handles maddr strings, raw `(host, port)` tuples, and pre-wrapped
  `Address` objs.
- UDS maddrs use the multiaddr spec name `/unix/...` (not tractor's
  internal `/uds/` proto_key)

Also add new tests,
- 7 new pure unit tests (no trio runtime): TCP-only, mixed tpts,
  unwrapped tuples, mixed str+tuple, unsupported proto (`/udp/`),
  empty table, empty actor list
- all 22 multiaddr tests pass rn.

Prompt-IO:
ai/prompt-io/claude/20260413T205048Z_269d939c_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 7079a597c5 Add `test_tpt_bind_addrs.py` + fix type-mixing bug
Add 9 test variants (6 fns) covering all three
`tpt_bind_addrs` code paths in `open_root_actor()`:
- registrar w/ explicit bind (eq, subset, disjoint)
- non-registrar w/ explicit bind (same/diff
  bindspace) using `daemon` fixture
- non-registrar default random bind (baseline)
- maddr string input parsing
- registrar merge produces union
- `open_nursery()` forwards `tpt_bind_addrs`

Fix type-mixing bug at `_root.py:446` where the
registrar merge path did `set(Address + tuple)`,
preventing dedup and causing double-bind `OSError`.
Wrap `uw_reg_addrs` before the set union so both
sides are `Address` objs.

Also,
- add prompt-io output log for this session
- stage original prompt input for tracking

Prompt-IO: ai/prompt-io/claude/20260413T192116Z_f851f28_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi f881683c97 Tweak timeouts and rm `arbiter_addr` in tests
Use `cpu_scaling_factor()` headroom in
`test_peer_spawns_and_cancels_service_subactor`'s `fail_after` to avoid
flaky timeouts on throttled CI runners. Rename `arbiter_addr=` ->
`registry_addrs=[..]` throughout `test_spawning` and
`test_task_broadcasting` suites to match the current `open_root_actor()`
/ `open_nursery()` API.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 490fac432c Preserve absolute UDS paths in `parse_maddr()`
Drop the `.lstrip('/')` on the unix protocol value
so the lib-prepended `/` restores the absolute-path
semantics that `mk_maddr()` strips when encoding.
Pass `Path` components (not `str`) to `UDSAddress`.

Also, update all UDS test params to use absolute
paths (`/tmp/tractor_test/...`, `/tmp/tractor_rt/...`)
matching real runtime sockpath behavior; tighten
`test_parse_maddr_uds` to assert exact `filedir`.

Review: PR #429 (copilot-pull-request-reviewer[bot])
https://github.com/goodboy/tractor/pull/429#pullrequestreview-4018448152

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 5c4438bacc Add `parse_maddr()` tests + registrar maddr integ test
Cover `parse_maddr()` with unit tests for tcp/ipv4,
tcp/ipv6, uds, and unsupported-protocol error paths,
plus full `addr -> mk_maddr -> str -> parse_maddr`
roundtrip verification.

Adds,
- a `_maddr_to_tpt_proto` inverse-mapping assertion.
- an `wrap_address()` maddr-string acceptance test.
- a `test_reg_then_unreg_maddr` end-to-end suite which audits passing
  the registry addr as multiaddr str through the entire runtime.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 1f1e09a786 Move `test_discovery` to `tests/discovery/test_registrar`
All tests are registrar-actor integration scenarios
sharing intertwined helpers + `enable_modules=[__name__]`
task fns, so keep as one mod but rename to reflect
content. Now lives alongside `test_multiaddr.py` in
the new `tests/discovery/` subpkg.

Also,
- update 5 refs in `/run-tests` SKILL.md to match
  the new path
- add `discovery/` subdir to the test directory
  layout tree

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 7cf3b5d00d Add `test_multiaddr.py` suite for `mk_maddr()`
Cover `_tpt_proto_to_maddr` mapping, TCP (ipv4/ipv6),
UDS, unsupported `proto_key` error, and round-trip
re-parse for both transport types.

Deats,
- new `tests/discovery/` subpkg w/ empty `__init__.py`
- `test_tpt_proto_to_maddr_mapping`: verify `tcp` and
  `uds` entries
- `test_mk_maddr_tcp_ipv4`: full assertion on
  `/ip4/127.0.0.1/tcp/1234` incl protocol iteration
- `test_mk_maddr_tcp_ipv6`: verify `/ip6/::1/tcp/5678`
- `test_mk_maddr_uds`: relative `filedir` bc the
  multiaddr parser rejects double-slash from abs paths
- `test_mk_maddr_unsupported_proto_key`: `ValueError`
  on `proto_key='quic'` via `SimpleNamespace` mock
- `test_mk_maddr_roundtrip`: parametrized over tcp +
  uds, re-parse `str(maddr)` back through `Multiaddr`

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:14 -04:00
Gud Boi 570c975f14 Fix test typo + use `wait_for_result()` API
- "boostrap" → "bootstrap" in mod docstring
- replace deprecated `portal.result()` with
  `portal.wait_for_result()` + value assertion
  inside the nursery block

Review: PR #1 (Copilot)
https://github.com/mahmoudhas/tractor/pull/1#pullrequestreview-4091096072

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
Gud Boi e8f1eca8d2 Tighten `test_spawning` types, parametrize loglevel
Parametrize `test_loglevel_propagated_to_subactor`
across `'debug'`, `'cancel'`, `'critical'` levels
(was hardcoded to just `'critical'`) and move it
above the parent-main tests for logical grouping.

Also,
- add `start_method: str` annotations throughout
- use `portal.wait_for_result()` in
  `test_most_beautiful_word` (replaces `.result()`)
- expand mod docstring to describe test coverage
- reformat `check_parent_main_inheritance` docstr

Review: PR #438 (Copilot)
https://github.com/goodboy/tractor/pull/438

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
Gud Boi c6c591e61a Drop `spawn_test_support` pkg, inline parent-main tests
Replace the subproc-based test harness with inline
`tractor.open_nursery()` calls that directly check
`actor._parent_main_data` instead of comparing
`__main__.__name__` across a process boundary
(which is a no-op under pytest bc the parent
`__main__` is `pytest.__main__`).

Deats,
- delete `tests/spawn_test_support/` pkg (3 files)
- add `check_parent_main_inheritance()` helper fn
  that asserts on `_parent_main_data` emptiness
- rewrite both `run_in_actor` and `start_actor`
  parent-main tests as inline async fns
- drop `tmp_path` fixture and unused imports

Review: PR #434 (goodboy, Copilot)
https://github.com/goodboy/tractor/pull/434

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-10 20:58:54 -04:00
mahmoud b883b27646 Exercise parent-main inheritance through spawn test support
Move the subprocess probe into dedicated spawn test support files so the inheritance tests cover the real __main__ replay path without monkeypatching or inline script strings.
2026-04-10 20:58:54 -04:00
mahmoud 00637764d9 Address review follow-ups for parent-main inheritance opt-out
Clean up mutable defaults, give parent-main bootstrap data a named type, and add direct start_actor coverage so the opt-out change is clearer to review.
2026-04-10 20:58:54 -04:00
mahmoud ea971d25aa Rename parent-main inheritance flag.
Use `inherit_parent_main` across the actor APIs and helper to better describe the behavior, and restore the reviewer note at child bootstrap where the inherited `__main__` data is copied from `SpawnSpec`.
2026-04-10 20:58:54 -04:00
mahmoud 83b6c4270a Simplify parent-main replay opt-out.
Keep actor-owned parent-main capture and let `_mp_figure_out_main()` decide whether to return `__main__` bootstrap data, avoiding the extra SpawnSpec plumbing while preserving the per-actor flag.
2026-04-10 20:58:54 -04:00
mahmoud 6309c2e6fc Route parent-main replay through SpawnSpec
Keep trio child bootstrap data in the spawn handshake instead of stashing it on Actor state so the replay opt-out stays explicit and avoids stale-looking runtime fields.
2026-04-10 20:58:54 -04:00
mahmoud f5301d3fb0 Add per-actor parent-main replay opt-out
Let actor callers skip replaying the parent __main__ during child startup so downstream integrations can avoid inheriting incompatible bootstrap state without changing the default spawn behavior.
2026-04-10 20:58:54 -04:00
Gud Boi febe587c6c Drop `xfail` from `test_moc_reentry_during_teardown`
The per-`ctx_key` locking fix in f086222d intended to resolve the
teardown race reproduced by the new test suite, so the test SHOULD now
pass. TLDR, it doesn't Bp

Also add `collapse_eg()` to the test's ctx-manager stack so that when
run with `pytest <...> --tpdb` we'll actually `pdb`-REPL the RTE when it
hits (previously an assert-error).

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-09 14:42:42 -04:00
Gud Boi cab366cd65 Add xfail test for `_Cache.run_ctx` teardown race
Reproduce the piker `open_cached_client('kraken')` scenario: identical
`ctx_key` callers share one cached resource, and a new task re-enters
during `__aexit__` — hitting `assert not resources.get()` bc `values`
was popped but `resources` wasn't yet.

Deats,
- `test_moc_reentry_during_teardown` uses an `in_aexit` event to
  deterministically land in the teardown window.
- marked `xfail(raises=AssertionError)` against unpatched code (fix in
  `9e49eddd` or wtv lands on the `maybe_open_ctx_locking` or thereafter
  patch branch).

Also, add prompt-io log for the session.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T193125Z_85f9c5d_prompt_io.md
2026-04-06 18:17:04 -04:00
Gud Boi 85f9c5df6f Add per-`ctx_key` isolation tests for `maybe_open_context()`
Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user
tracking correctly tears down resources independently - exercises the
fix from 02b2ef18 where a global `_Cache.users` counter caused stale
cache hits when the same `acm_func` was called with different kwargs.

Also, add a paired `acm_with_resource()` helper `@acm` that yields its
`resource_id` for per-key testing in the above suite.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
2026-04-06 14:37:47 -04:00
Gud Boi ebe9d5e4b5 Parametrize `test_resource_cache.test_open_local_sub_to_stream`
Namely with multiple pre-sleep `delay`-parametrizations before either,

- parent-scope cancel-calling (as originally) or,
- depending on the  new `cancel_by_cs: bool` suite parameter, optionally
  just immediately exiting from (the newly named)
  `maybe_cancel_outer_cs()` a checkpoint.

In the latter case we ensure we **don't** inf sleep to avoid leaking
those tasks into the `Actor._service_tn` (though we should really have
a better soln for this)..

Deats,
- make `cs` args optional and adjust internal logic to match.
- add some notes around various edge cases and issues with using the
  actor-service-tn as the scope by default.
2026-04-06 14:37:47 -04:00
Gud Boi 8f44efa327 Drop stale `.cancel()`, fix docstring typo in tests
- Remove leftover `await an.cancel()` in
  `test_registered_custom_err_relayed`; the
  nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
  `test_unregistered_err_still_relayed` docstring.

Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 5968a3c773 Use `'<unknown>'` for unresolvable `.boxed_type_str`
Add a teensie unit test to match.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 80597b80bf Add passing test for unregistered err relay
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.

Deats,
- add `test_unregistered_err_still_relayed`: verify the
  `RemoteActorError` IS raised with `.boxed_type`
  as `None` but `.src_type_str`, `.boxed_type_str`,
  and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
  since the new above case covers it and we don't need to have
  an effectively entirely repeated test just with an inverse assert
  as it's last line..

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 9c37b3f956 Add `reg_err_types()` test suite for remote exc relay
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.

Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
  (register, lookup, builtins, tractor-native types, unregistered
  returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
  `CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
  (multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
  `.boxed_type` can't resolve without `reg_err_types()` registration

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi b14dbde77b Skip `test_empty_mngrs_input_raises` on UDS tpt
The `open_actor_cluster()` teardown hangs
intermittently on UDS when `gather_contexts(mngrs=())`
raises `ValueError` mid-setup; likely a race in the
actor-nursery cleanup vs UDS socket shutdown. TCP
passes reliably (5/5 runs).

- Add `tpt_proto` fixture param to the test
- `pytest.skip()` on UDS with a TODO for deeper
  investigation of `._clustering`/`._supervise`
  teardown paths

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 93d99ed2eb Move `get_cpu_state()` to `conftest` as shared latency headroom
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).

Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled

Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
  `conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
  `FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
  clarity on intent.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 6215e3b2dd Adjust `test_a_quadruple_example` time-limit for CPU scaling
Add `get_cpu_state()` helper to read CPU freq settings
from `/sys/devices/system/cpu/` and use it to compensate
the perf time-limit when `auto-cpufreq` (or similar)
scales down the max frequency.

Deats,
- read `*_pstate_max_freq` and `scaling_max_freq`
  to compute a `cpu_scaled` ratio.
- when `cpu_scaled != 1.`, increase `this_fast` limit
  proportionally (factoring dual-threaded cores).
- log a warning via `test_log` when compensating.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00