Make the sub-actor proc-title prefix a single
authoritative constant (`_proctitle._def_prefix`) so
the reap-recognition markers and `xontrib` banner pick
it up automatically — one place to flip the prefix
shape going fwd.
Deats,
- `_proctitle._def_prefix: str = '_subactor'`. New
module-level const consumed by everything that needs
to know the prefix.
- `set_actor_proctitle(actor, prefix=_def_prefix)`:
takes an explicit `prefix` arg (default = the const)
so callers can override per-spawn if they want.
- Default proc-title format:
`'tractor[<reprol>]'` → `f'{prefix}[<reprol>]'`
i.e. `_subactor[<reprol>]` by default.
- `_testing/_reap.py`: cmdline + comm markers source
the prefix from `_proctitle._def_prefix` instead of
the hardcoded `'tractor['`. So
`_is_tractor_subactor()` tracks the const
automatically.
- `xontrib/tractor_diag.xsh`: `acli.reap` orphan-mode
banner now interpolates the
`_TRACTOR_PROC_CMDLINE_MARKERS` tuple directly so
the human-readable mode line stays in sync if the
prefix shape changes again.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 3a45dbd503)
(cherry picked from commit fd8d39c0ce)
Follow-up to f595acc7 (`supervise_run_process`) which
called `log.io(...)` for std-stream relay assuming an
`IO=21` level existed. Add the registration via a new
factory + tests covering both the factory and the new
level.
`add_log_level()` factory,
- One call wires the four (otherwise hand-synced) pieces:
- `CUSTOM_LEVELS[NAME]` — drives the `stacklevel` bump
in `StackLevelAdapter.log()` + `get_logger()`'s
per-level audit.
- `logging.addLevelName()` — stdlib name registration.
- `STD_PALETTE[NAME]` + `BOLD_PALETTE['bold'][NAME]` —
color entries consumed by `get_console_log()`'s
`ColoredFormatter` build.
- Same-named (lowercase) emit method bound on
`StackLevelAdapter` so `log.<name>('msg')` works +
`get_logger()`'s per-level method audit passes.
- Idempotent: re-registering an existing name is a
no-op-ish refresh that won't clobber an already-bound
method.
- Method binding uses a default-arg `_level=value` so
the level int is captured (not late-bound across
multiple registrations).
`IO=21` level (first user),
- Purple. Used by `tractor.trionics._subproc`'s
std-stream relay (see f595acc7).
- Value 21 picked to sit just ABOVE stdlib `INFO`=20 so
it's SHOWN BY DEFAULT at usual `info`/`devx` console
levels — a `runtime`=15 relay would be silently
filtered (footgun for daemon supervisors whose whole
point is visibility). Still distinctly labeled +
filterable.
Tests (`tests/test_log_sys.py`),
- `test_io_custom_level_registered`: validates the IO
level is fully wired (`CUSTOM_LEVELS`, `addLevelName`,
both palettes, `StackLevelAdapter.io()` callable);
emits a record + sanity-asserts `21 >= INFO(20)`.
- `test_add_log_level_pluggable`: registers a fresh
`XLVL=19` (cyan) via `add_log_level()`, asserts all
four wires + the bound `xlog.xlvl()` emit, then
try/finally cleans up the module-global mutations so
later `get_logger()` audits don't trip on a
half-removed level.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 7bd7dd50c7)
(cherry picked from commit 93558fe3c9)
Two version-compat fixes for the `devx` debugger
test-suite, both about matching upstream output that
got more verbose w/ recent lib releases.
ANSI stripping (`tests/devx/conftest.py`),
- Add `ansi_strip(text)` helper + `_ansi_re` pattern
(regex per https://stackoverflow.com/a/14693789).
- Apply inside `in_prompt_msg()` + `assert_before()` so
substring matches against REPL/traceback output stay
robust to color leakage.
- Motivated by py3.13's colored tracebacks +
`pdbp`/pygments highlighting leaking ANSI even when
`PYTHON_COLORS=0` is set in the `spawn` fixture (not
every renderer in the spawned subproc honors it).
- Replaces the longstanding inline TODO that linked
the SO answer w/o impl'ing.
trio 0.30+ `Cancelled._create(` match (`test_debugger`),
- In `test_shield_pause` swap the two
`"raise Cancelled._create()"` assertion patterns →
`"raise Cancelled._create("` (open-paren form, no
closing).
- trio >=0.30 raises a multi-line
`raise Cancelled._create(source=.., reason=..,
source_task=..)` w/ cancel-reason metadata, so the
legacy bare-`()` form no longer matches. Inline
comment documents the trio-version pivot.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 3854cf5ecb)
(cherry picked from commit c07cf2546b)
A `trio.Nursery.start()`-style wrapper around
`trio.run_process()` that surfaces rc!=0 errors
deterministically, ALWAYS isolates the parent
controlling-tty, and optionally live-relays the child's
std-streams to `log.<level>` per-line. Suits both
short-lived test-runners + long-lived daemons.
`supervise_run_process()`,
- Deterministic rc!=0: pass `check=False` to `trio`
and do our OWN post-drain rc-check from the
supervisor coro body AFTER `own_tn.__aexit__` — NOT
inside the internal nursery, since that would
race-cancel the still-draining relay reader and lose
stderr lines. (Re)build + raise a BARE
`subprocess.CalledProcessError`: `.stderr=` for
programmatic callers + an `add_note()`'d
`|_.stderr:` block for human teardown logs. No
nursery-eg-wrapped CPE to `collapse_eg` around.
- Parent controlling-tty isolation: `stdin=DEVNULL`
always, `stdout=DEVNULL` unless relayed/overridden
(via `stdout=` kwarg w/ `_UNSET` sentinel so explicit
`None` = inherit still works). Prevents a spawned
program from clobbering the launching tty's scrollback
w/ control-seqs.
- Live per-line relay: `relay_stdout=True`/
`relay_stderr=True` → relayed to `log.<relay_level>`
(default `'io'`, our custom level 21). Picked to sort
just above stdlib `INFO`=20 so it shows at usual
`info`/`devx` levels yet stays separately filterable;
`runtime`=15 was REJECTED as a default since it'd be
silently filtered at usual verbosity — footgun for
daemon supervisors whose whole point is visibility.
STREAMED, not buffered-until-exit.
- Non-blocking `tn.start()` semantics: live
`trio.Process` handed up via
`task_status.started()` immediately (else
`tn.start()` would block till child exit, losing
the long-lived-daemon use case). Supervise/relay bg
tasks run to completion in this coro.
- `**run_process_kwargs` forwarded verbatim (env, shell,
cwd, start_new_session, executable, ...); MANAGED keys
(`stdin`/`stdout`/`stderr`/`check`) win on conflict.
- Crash-handling layer intentionally NOT baked in —
compose `maybe_open_crash_handler()` ON TOP at the
call-site.
`_relay_stream_lines()` helper,
- Concurrent pipe-drain reader. MANDATORY whenever piping
w/o `capture_*` since nothing else drains the OS pipe —
child blocks on `write()` once kernel buf (~64KiB) fills
→ deadlock.
- Modes (combine freely): `emit`-only live relay,
`accum`-only silent drain+capture (for the CPE note),
or both. Per-line splitting handles cross-chunk
residuals + flushes any trailing un-newline-term'd line
at EOF.
`_add_stderr_note()` helper,
- Attaches an indented `|_.stderr:` note to a CPE via
`add_note()` for legible rc!=0 reporting at teardown.
Tests (`tests/trionics/test_subproc.py`),
- Hermetic `trio`-only (no actor-runtime).
- `test_stdout_relayed_per_line`: per-line stdout relay.
- `test_parent_tty_isolated`: child fd1 is OUR pipe (no
`/dev/pts/*`), fd0 pinned to `/dev/null`.
- `test_no_deadlock_on_big_unnewlined_output`: 200KiB
no-newline output completes under `fail_after(2)` —
exercises the concurrent drain (without it, the child
blocks at ~64KiB).
- `test_stderr_relay_and_cpe_rebuild`: rc!=0 w/
`relay_stderr=True` → bare `CalledProcessError` w/ the
`.stderr` note + per-line live relay.
- `test_nonrelay_cpe_note`: rc!=0 w/o relay → same
deterministic post-drain CPE w/ `.stderr` note (silent
drain+capture path).
Re-export `supervise_run_process` from `tractor.trionics`.
Prompt-IO: ai/prompt-io/claude/20260601T231429Z_0e3e008b_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit f595acc76c)
(cherry picked from commit 6df9ee11bc)
Matches the explicit `dict.pop(uid, None)` contract one
line above; same semantics as the prior truthy check.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 0e3e008b0c)
(cherry picked from commit 13ed668512)
`30e15925` ("Add `start_or_cancel()` to `trionics._taskc`")
inserted `async def start_or_cancel()` — whose body opens its
own col-4 `try:` — immediately before the trailing `else:
raise`. Because the edit was a pure insertion (0 deletions),
the *same* `else: raise` lines were silently REPARENTED: they
used to be the `for exc_match in matching: ... else: raise`
of `maybe_raise_from_masking_exc`, but now bind to
`start_or_cancel`'s `try/except` where they're unreachable
dead code.
Net effect: `maybe_raise_from_masking_exc` lost the `for/else`
re-raise of the un-masked exception, so a masked child
cancellation gets swallowed instead of surfaced.
- restore the `for/else: raise` to `maybe_raise_from_masking_exc`
- drop the now-dead `else: raise` from `start_or_cancel`
Surfaced as 2 deterministic failures in
`test_sigint_closes_lifetime_stack[wait_for_ctx-bg_aio_task-
send_SIGINT_to=child-*]` (the SIGINT-to-child "silent-abandon"
regime). Bisected with `trio` held at `0.29.0`: clean at
`9c36363b` (0/8), broken at `30e15925` (8/8), fixed (0/8).
NOT a `trio` (0.29↔0.33 identical) nor logging-plugin
regression.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 325574cc07)
(cherry picked from commit 0b8033fdaa)
Wrapper around `trio.Nursery.start()` that DOESN'T mask
out-of-band cancellation as a lossy startup failure.
Picks the right re-raise: ambient `Cancelled` when
present, the genuine startup-protocol `RuntimeError`
otherwise.
The problem,
- `trio.Nursery.start()` raises a generic
`RuntimeError("child exited without calling
task_status.started()")` whenever the started task
exits BEFORE calling `task_status.started()` —
INCLUDING the common case where the child was
cancelled out-of-band by an *ancestor* cancel-scope
erroring/cancelling.
- In that case the original `trio.Cancelled` is
swallowed and the caller is left w/ an opaque,
root-cause-detached `RuntimeError`.
The fix,
- Catch the "...started" RTE.
- `await trio.lowlevel.checkpoint_if_cancelled()` —
re-raises the in-flight `Cancelled` IFF we're under
effective cancellation (ancestor-inclusive), carrying
trio's auto-generated reason which points at the true
root exc.
- If we're NOT cancelled the `checkpoint_if_cancelled()`
is a cheap no-op and we fall through to re-raise the
genuine startup-protocol RTE.
Re-export from `tractor.trionics` so callers don't have
to reach into `_taskc`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 30e15925ba)
(cherry picked from commit 2b4589b1ee)
New `tractor.devx._proctitle` mod sets each
sub-actor's `argv[0]` (and kernel `comm`) to
`tractor[<aid.reprol()>]` — e.g.
`tractor[doggy@1027301b]` — so `ps`/`top`/`htop`
and `acli.pytree`/reaper tooling can identify
actors at a glance without parsing full cmdlines.
Deats,
- `set_actor_proctitle()` wraps the `setproctitle`
pkg with `ImportError` guard; optional at runtime
but listed in `pyproject.toml` so default installs
benefit.
- called early in `_child._actor_child_main()` after
`Actor` construction, before `_trio_main()` entry.
- tests in `tests/devx/test_proctitle.py`: format
unit test, `/proc/{cmdline,comm}` integration
test, negative detection test.
Resolves#457
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit d60245777e)
Reshuffle `pyproject.toml` deps into per-python-version
`[tool.uv.dependency-groups]`:
- `subints` group: `msgspec>=0.21.0`, py>=3.14
- `eventfd` group: `cffi>=1.17.1`, py>=3.13,<3.14
- `sync_pause` group: `greenback`, py>=3.13,<3.14
(was in `devx`; moved out bc no 3.14 yet)
Bump top-level `msgspec>=0.20.0` too.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 34d9d482e4)
(factored: kept only the pyproject dep-group parts of
"Raise `subint` floor to py3.14 and split dep-groups"; dropped
tractor/spawn/_spawn.py + tractor/spawn/_subint.py)
The `cpu_scaling_factor()` CI bump (2x) wasn't enough for the
macOS runners — slower + noisier than linux for our multi-actor
cancel-cascade timing — so `test_nested_multierrors[depth=3]` and
`test_fast_graceful_cancel_*` flaked there (linux + sdist green),
- make the CI headroom platform-aware: 3x on macOS, 2x on linux
(keeps the proven linux budget; depth=3 inner 12*3=36s still
fits under the 40s outer wall).
- give `test_fast_graceful_cancel_*` the `cpu_scaling_factor()`
headroom it was missing entirely (was a bare `timeout=2.9`).
Verified locally w/ `CI=1` (2x linux path; 3x macOS path confirmed
via forced `_non_linux`).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
All 5 flagged items were valid (4 real bugs + 1 dead assert),
- fix an inverted `sys.version_info < (3, 14)` guard in
`ipc._linux` — the "`cffi` has no 3.14 support" import note now
fires on 3.14+ (where it applies) instead of on older pys.
- use `os.environ.get('PYTHON_COLORS')` in the `sync_bp` example
so it doesn't `KeyError` when run outside the test harness.
- correct `dump_task_tree()`'s docstring: the `/tmp` + `/dev/tty`
tee is gated on `write_file`/`write_tty`, not "unconditional".
- tidy the `ActorTooSlowError` message spacing in `cancel_actor`.
- replace a tautological `applied is True or applied is False` in
`test_patches` with `isinstance(applied, bool)` (the value is
order-dependent across the module).
Review: PR #462 (copilot-pull-request-reviewer)
https://github.com/goodboy/tractor/pull/462#pullrequestreview-4527179852
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
GH Actions (and most shared) CI runners are slow + noisy and —
unlike a throttled local box — don't expose CPU-freq scaling via
sysfs, so `cpu_scaling_factor()` read `1.0` and the timing-
sensitive deadlines/asserts that key off it got NO headroom on
CI (a class of `TooSlowError` / `assert diff < this_fast` flakes),
- add a flat `_ci_env` x2 bump inside `cpu_scaling_factor()` so
every test already using it (quad streaming, SIGINT-cancel,
docs examples, ...) gets CI headroom for free — compounds with
any local-throttle factor.
- route the `time_quad_ex` cancel-deadline through it instead of
a bespoke per-test `ci_env` bump.
- fix a real bug in `test_nested_multierrors`: its OUTER
`@tractor_test(timeout=10)` was *smaller* than the inner
`fail_after_w_trace` budget (trio depth=3 = 12s), so the outer
wall fired first and pre-empted the snapshot-capturing inner
deadline -> `FAILED` instead of dumping. Bump the outer to `40`
(> max inner budget) and scale the inner budgets by
`cpu_scaling_factor()` too.
Verified locally with `CI=1` (quad + both `test_nested_multierrors`
depths + `test_cancel_via_SIGINT_other_task` green).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
The `TRACTOR_LOGLEVEL`/`TRACTOR_SPAWN_METHOD` override-notice
branches were unreachable: `loglevel`/`start_method` were
reassigned to the env value BEFORE the `!=` compare, so the
"OVERRIDES caller-passed" message never fired. Capture the
caller value first, then compare. Rel. `208e7c09`/`d4eac06d`
"Honor env-vars" (`trionics.start_or_cancel`); surfaced by
`/code-review high` on #462.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Two defensive fixes around the `Portal.cancel_actor()` +
`_try_cancel_then_kill()` escalation from `34f333a0`
"Escalate cancel-ack timeouts to `proc.terminate()`" (the
`trionics.start_or_cancel` follow-up); surfaced by
`/code-review high` on #462,
- guard `proc.terminate()` for backends whose `proc` slot
isn't a `Process` — the future `subint` backend stores an
`int` interp-id, so escalation would `AttributeError`
instead of hard-killing; now it logs + no-ops.
- swap `assert cs.cancelled_caught` for an
`if cs.cancelled_caught and raise_on_timeout:` guard so an
unexpected shielded-scope exit returns a soft `False`
rather than crashing `cancel_actor()` mid-teardown.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Two fixes to the hang-debug SIGUSR1 task-tree dump path,
surfaced by `/code-review high` on #462,
- re-add `_debug_mode` to the sub-actor handler-install gate
in `_runtime.py`. Dropping it (rel. `3a386ba5`/`3d9c75b6`
"Drop debug_mode gate", from the `custom_log_levels_api`
follow-up) was meant to *also* enable non-pdb runs, but
nothing sets `use_stackscope` from `debug_mode`, so
debug-mode subs were left with NO handler — and the default
SIGUSR1 disposition then *kills* them. Now additive:
`_debug_mode OR use_stackscope OR env`.
- pass `write_file=True` at both `dump_task_tree()` SIGUSR1
call sites so the advertised `/tmp/tractor-stackscope-<pid>`
`.log` tee is actually written (was dead under
`--capture=fd`). Matches `1b1ef10a` "Re-enable writing
`stackscope` to file by default"; param from `0df90500`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Left-over debug trap from the `_runtime_vars` pure get/set
refactor — it fired on *every* struct-form rt-var write (e.g.
via `.update()`), hanging any non-tty / CI / forked actor on
`pdb` stdin.
Surfaced by a `/code-review high` pass on #462.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Regenerate the lockfile so it's consistent with the
post-rebase `pyproject.toml` — which now carries both #461's
landed tooling (`pytest>=9.0.3`, …) and this branch's
tractor deps (`setproctitle`, `pytest-timeout`, `psutil`),
- `uv lock` resolves the merged dep set against the landed
`main` baseline.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`open_root_actor()` writes `_enable_tpts` (and friends) into
the process-global `_state._runtime_vars` dict but nothing
resets it on actor teardown. Under the in-proc `pytest`
launchpad a uds-using test leaks `_enable_tpts=['uds']` into
a sibling tcp test, tripping the
`registry_addrs`×`enable_transports` proto-guard in
`open_root_actor()` with a `ValueError`.
New `_reset_runtime_vars` fixture snapshots + restores the
dict around every test so no runtime-var state crosses a
test boundary.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Same trio 0.29 → 0.33 cancel-cascade slowdown that hit
`test_nested_multierrors` (ea67f1b6) — bumps the
`trio`-backend (non-debug, non-forking) budget in
`test_echoserver_detailed_mechanics` from 1s → 4s.
- The 1s budget raced the ~1s teardown deadline. On a
deadline-fire trio 0.33 injects
`Cancelled(source='deadline')` (cancel-reason
metadata) that wraps the mid-stream KBI in a
`BaseExceptionGroup`, breaking the bare
`pytest.raises(KeyboardInterrupt)` below.
- Bump matches the forking-spawner branch (4s).
- Inline NOTE references the tracking issue
`ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit d7da502d93)
(cherry picked from commit d0144e52cb)
trio 0.29 → 0.33 lock bump (c7741bba) slowed the
depth=3 cancel-cascade in `test_nested_multierrors`
from <6s to ~7-8s; the 6s deadline was firing and its
`Cancelled(source='deadline')` (trio 0.33's new
cancel-reason metadata) collapsed a BEG branch,
breaking the `RemoteActorError` assertion downstream.
- Split the `('trio', _)` case-match into per-depth
arms: `('trio', 1)` keeps 6s (still finishes in
~3s); `('trio', 3)` → 12s.
- Updated inline NOTE explains the version pivot +
links the tracking issue
`ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
- Existing MTF/`subint_forkserver` budgets unchanged.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit ea67f1b67b)
(cherry picked from commit 57b3ea59ea)
Strip the trailing `pkg_path` token ONLY when it duplicates the
caller's leaf-*module* name (which the console header already
shows via `{filename}`), instead of blindly dropping the last
token. This keeps genuine, possibly-*nested* sub-PACKAGE parts
addressable as their own sub-loggers.
- detect a true leaf-mod by comparing the caller's `__name__`
vs `__package__` (a pkg `__init__` has them equal -> its
trailing token is a real sub-pkg, NOT a leaf to strip).
- `name='devx.debug'` now -> `tractor.devx.debug`, DISTINCT
from a bare `devx` -> `tractor.devx`; the old unconditional
`pkg_path = subpkg_path` collapsed both to `tractor.devx` and
silently broke per-sub-pkg level control via the logging-spec.
- `get_logger(__name__)` leaf-strip still works (cosmetic, bc
the leaf-mod is in the `{filename}` header field).
Also,
- update the `LogSpec` caveat: sub-PACKAGE granularity now
addressable at ANY depth; leaf *modules* intentionally aren't
(they're the `{filename}`); top-level mods (eg. `to_asyncio`)
still emit on the root logger.
- adjust `test_root_pkg_not_duplicated_in_logger_name` to the
new literal explicit-`name` contract (no leaf-collapse).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 9c36363b01)
Two coupled changes that let downstream projects (eg. `modden`) inherit
the test-harness loglevel plumbing for free via
`tractor._testing.pytest`:
Plugin lift (`tests/conftest.py` → `_testing/pytest.py`),
- mv `pytest_addoption(--ll)`, the `loglevel` autouse
fixture, and `test_log` fixture out of the test-suite-
local conftest into the reusable plugin.
- add `--tl`/`--tractor-loglevel` as a DISTINCT flag from
`--ll`: `--ll` is the consuming-project's OWN app
loglevel (scoped to its pkg-hierarchy), `--tl` is the
`tractor.*` runtime loglevel. `--tl` falls back to
`--ll` when unset (preserves current `tractor`-suite
behavior).
- add `testing_pkg_name` session fixture (default
`'tractor'`) — downstream projects override to e.g.
`'modden'` so `--ll` scopes to their own hierarchy
instead of `tractor.*`.
- `loglevel` fixture now yields the resolved
tractor-runtime level (passed to
`open_root_actor(loglevel=<.>)` by `@tractor_test`)
AND separately applies `--ll` to the
`testing_pkg_name` hierarchy when that isn't
`tractor`. `test_log` scopes the per-test logger to
`testing_pkg_name`.
`tractor.log` "logging-spec" mini-DSL,
- `LogSpec = str|bool`. Accepted forms:
- `True` → enable `pkg_name` root at `default_level`
(fallback `'cancel'`).
- `False` → no-op.
- bare level eg. `'info'` → root-logger at that level.
- `'sub:info,x:cancel'` → per-sub-logger filter-spec;
each `<name>` is RELATIVE to `pkg_name` (must NOT
include the pkg-token).
- `parse_logspec()` → `{sublog|None: level}` mapping.
`None` key = root-logger. Mixed bare-level + filters
in one spec is rejected w/ a helpful err msg; so is
embedding the `pkg_name` token in a sub-name.
- `apply_logspec()` → `(primary_level, {name: log})`:
parses then enables a `colorlog` stderr handler per
named (sub)logger. Authoritative sub-logger filters
get `propagate=False` so they don't double-emit
through a parallel root-level handler.
- !GRANULARITY CAVEAT! sub-logger names match at
sub-pkg granularity, not leaf-module — so `devx.debug`
collapses to the same `tractor.devx` logger as a bare
`devx`, and top-level lib modules (eg.
`tractor.to_asyncio`) emit under the *root* logger
rather than a phantom `to_asyncio` child. Documented
inline on `LogSpec`.
Other,
- `tests/conftest.py` keeps a NOTE pointing to the
plugin for future-debugging clarity (don't remove
silently — the lift is the relevant signal).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 19a77708ba)
Factor the "deliver an exc to a running aio task" pattern out of
`translate_aio_errors()` + `open_channel_from()` into a shared
`maybe_signal_aio_task()` helper. Add a cause-chain matrix comment
+ relay-echo guard so the final-raise block can't cycle
`trio_err.__cause__` back onto its own derivative relay.
`maybe_signal_aio_task()`,
- Delivers `exc` via `aio_task._fut_waiter.set_exception()` — NOT
`aio_task.set_exception()` which on py3.13+ ALWAYS raises
`RuntimeError("Task does not support set_exception")` (dead code as
a relay mechanism).
- Returns `(delivered: bool, report: str)`. Caller uses `delivered` to
flip `wait_on_aio_task` when delivery failed (avoids hanging on
`_aio_task_complete.wait()`).
- `pre_captured_fut=`: required when the caller crosses a trio
checkpoint between capturing `_fut_waiter` and invoking the helper.
`Task._wakeup` clears `_fut_waiter = None` so re-reading
post-checkpoint loses the ref even though the exc is still in-flight
on the (now-`done()`) original fut.
- `cause=`: sets `exc.__cause__ = cause` so the relay carries
a "trio_err -> caused -> relay" chain through `set_exception()`
→ `Task._wakeup` → coro raise → `wait_on_coro_final_result`
→ `signal_trio_when_done` → `task.result()`-raise.
- `allow_cancel_fallback=True`: opt-in `aio_task.cancel()` for the
narrow case where `_fut_waiter is None` AND task is runnable (sitting
in asyncio's ready queue, not parked on a poke-able future). NEVER
cancels when `_fut_waiter` carries an in-flight exc — that would race
+ mask the real terminating exc.
`translate_aio_errors()`,
- Replace the two ad-hoc `_fut_waiter.set_exception()`
/ `aio_task.set_exception()` call sites w/ the helper.
- Capture `pre_cp_fut = aio_task._fut_waiter` BEFORE the post-shutdown
`trio.lowlevel.checkpoint()` (critical: `_wakeup` clears the ref).
- New "cross-loop cause-chain matrix" comment block on the final-raise
— tabulates every `(trio_err, aio_err, trio_to_raise)` combo into
exactly one terminal `raise X [from Y]` or early `return`. Covers the
sibling `signal_trio_when_done()` resolution + the relay-echo
INVARIANT.
- New relay-echo guard: if `aio_err` is one of OUR OWN signals
(`TrioTaskExited`/`TrioCancelled`) AND `aio_err.__cause__ is
trio_err`, raise the bare `trio_err` instead of `trio_err from
aio_err` (which would CYCLE the cause chain since the relay was itself
caused-by `trio_err`).
- Drop the stale "the `task.set_exception(aio_taskc)` call MUST NOT
EXCEPT or this WILL HANG" warning — the helper handles the failure
path explicitly via `delivered=False` → `wait_on_aio_task = False`.
- Carry `cause=trio_err` on both the cancel-relay (`TrioCancelled`) and
the graceful-exit relay (`TrioTaskExited`) so the aio-side traceback
shows the real root.
`open_channel_from()`,
- Adopt the same helper; drop the dead "SHOULD NEVER GET HERE !?!?"
+ `tractor.pause(shield=True)` panic branch.
- Capture in-flight trio-side exc via `sys.exc_info()[1]` and pass as
`cause=` — non-`None` only when the `try` body raised (graceful exit
→ None).
Other,
- Top-level import: `sys` (for `sys.exc_info()`).
- `run_as_asyncio_guest()`: add commented-out alt `out: Outcome = await
trio_done_fute` next to the shielded version — exploratory note for
the longstanding "why is `.shield()` needed?" TODO.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit acd1cbeec4)
SIGUSR1 task-tree dumps via `stackscope` should work in
plain (non-pdb) runs too — esp. in infected-`asyncio`
processes where the kernel-default SIGUSR1 disposition is
`Term` (proc dies on `kill -USR1` w/o an installed
handler). Ungate the install path from `_debug_mode` in
both root and sub-actor init; the `use_stackscope` rt-var
+ `TRACTOR_ENABLE_STACKSCOPE` env-var checks remain as
the actual opt-in (e.g. via `--enable-stackscope`).
Deats,
- `_root.open_root_actor`: drop the `debug_mode and ...`
conjunction around the `enable_stack_on_sig()` call;
now gated only on the `enable_stack_on_sig` arg itself.
- `_runtime.Actor` sub-actor init: lift the
`use_stackscope`/`TRACTOR_ENABLE_STACKSCOPE` branch out
of the `if rvs['_debug_mode']:` block to peer-level.
The `use_greenback` branch stays inside `_debug_mode`
(pdb-specific).
- Refresh inline comments on both sites to call out the
infected-`asyncio` "default SIGUSR1 = terminate proc"
rationale.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 3d9c75b6ed)
Adopt the `_testing.trace` CM helpers in two MTF-hang-prone
tests so on-timeout we get a fresh
`ptree`/`wchan`/`py-spy` diag snapshot on disk instead of
opaque pytest timeout-kills. Same shape as bd07a95d for
`test_dynamic_pub_sub`.
Deats,
- `test_echoserver_detailed_mechanics`:
* inner `trio.fail_after` → `fail_after_w_trace`. Adds
`fail_after_w_trace: FailAfterWTraceFactory` fixture
param.
* mv per-backend `timeout` calc to top of test body (was
interleaved w/ helper defs).
* factor deep
`open_nursery`/`open_context`/`open_stream` body into
`_body()` so the wrapping `main()` stays a 2-liner —
keeps the nested-CM block at its natural indent level
instead of pushing it under yet another `async with`.
* drop `with_timeout: bool` knob + `fa_main()` helper
(knob was hard-coded `True`).
- `test_sigint_closes_lifetime_stack`:
* outer `signal.alarm`/`try`/`finally` → single
`afk_alarm_w_trace(10)` CM. Adds
`afk_alarm_w_trace: AfkAlarmWTraceFactory` fixture
param.
* drop `_AFK_CAP_S` + `armed_alarm` vars (CM owns both).
* explanatory comment refreshed to mention
`AFKAlarmTimeout` + the disk-snapshot side effect.
Other,
- Drop debug `return 1e3` short-circuit from `delay()`
fixture — snuck in as a scratch line, was clobbering the
proper `debug_mode`-branched return.
- Top-level import: `FailAfterWTraceFactory`,
`AfkAlarmWTraceFactory` from `tractor._testing.trace`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 1cafaecf52)
Per-terminal optimized `watch`-like xonsh alias that
runs an arbitrary callable alias in a loop inside the
alt-screen buffer with flicker-free repaint. Supersedes
the inline `acli.ptree` polling .xsh snippet (removed
from `_ptree` docstr in favor of
`acli.watch acli.ptree pytest`).
Deats,
- alt-screen entry/exit (`\033[?1049h/l`) + cursor-hide
(`\033[?25l/h`) wrapped in try/finally so Ctrl-C always
returns to a pristine shell.
- per-frame draw uses cursor-home (`\033[H`) + per-line
EL (`\033[K` before each `\n`) + post-draw erase-down
(`\033[J`) → stale tail chars from a longer prior
frame are obvi cleared; no full-screen flash.
- SIGWINCH-aware: terminal resize sets a flag, next
frame does a full clear (`\033[H\033[2J`) instead of
the cheap cursor-home path.
- Ctrl-C handling: install `signal.default_int_handler`
so `KeyboardInterrupt` lands cleanly; prior handler
restored on exit.
- Output capture: redirect the alias's stdout to
`StringIO` per frame so we can post-process the EL
fix. Aliases writing directly to `sys.stdout.buffer`
/ `os.write(1)` bypass capture — EL-fix won't apply
but loop still works.
- Alias unwrap: xonsh stores callables as either a bare
callable OR `[fn, *preset_args]`. Both forms handled;
subprocess-style aliases rejected w/ a friendly err
msg.
- `argparse` w/ `-n`/`--interval` (default 0.3s); rest
of argv forwarded as alias args.
- Reg `'acli.watch': watch` in `_TCLI_ALIASES`.
Other,
- Tn `_ptree` `args: list[str]` param.
- Mod-header `Provides:` block updated w/ `acli.watch`
entry.
- Top-level imports: `os`, `sys`, `signal`, `time`,
`typing.Callable`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit bb239e847f)
Only flag `tractor._child` procs as cross-test ghosts of
THIS run if `ppid==1` (init-adopted real leak) or `ppid`
is in the walk's `seen` set (descendant we missed via
race).
Previously, procs whose `ppid` points to some OTHER live non-`pytest`
(in the use of `acli.ptree pytest`) process belong to a different
tractor app (`piker`, another `pytest` shell, a long-running tractor
daemon) and were being falsely flagged as cross-test ghosts.
Deats,
- post-cmdline-match check via `_ppid_from_proc(pid)`,
short-circuit on `None` (proc died in-flight).
- expand module docstring to spell out the ownership
filter rule + its rationale.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit a6d4ac3aac)
Replace inline `trio.fail_after` + manual `signal.alarm` guard with the
`_testing.trace` CM helpers that auto-capture a full ptree/wchan/py-spy
diag snapshot to disk on timeout.
Deats,
- inner guard: `trio.fail_after` → `fail_after_w_trace` (async CM,
captures on `TooSlowError`).
- outer AFK guard: raw `signal.alarm` → `afk_alarm_w_trace` (sync
CM, captures on `SIGALRM`), only armed under fork backends.
Extracts `_run_and_match()` helper to keep branching clean.
- bump `fail_after_s` from 4/12 → 8/20 to stop borderline flakes
while diag harness accumulates evidence.
- drop `_DIAG_CAP_S` var + manual signal import (now internal to
`afk_alarm_w_trace`).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit bd07a95d80)
Deats,
- `pytestmark`: enrich `skipon_spawn_backend('subint')` reason with
conc-anal doc refs + GH#379 link, add `reap_subactors_per_test`,
`track_orphaned_uds_per_test`,
`detect_runaway_subactors_per_test` fixtures
- `test_nested_multierrors`: parametrize over `depth` `{1, 3}`, add
MTF `xfail(strict=False)` with detailed race-window comment
explaining the BEG shape mismatch, wrap body in
`fail_after_w_trace` with per-backend timeout budget, bump
`@tractor_test(timeout=10)`, drop old multiprocessing depth
special-casing
- `test_multierror_fast_nursery`: wrap in
`fail_after_w_trace(30.0)`, accept `TooSlowError` in
`pytest.raises`, surface explicit `pytest.fail` on hang
- `test_cancel_while_childs_child_in_sync_sleep`: swap
`spawn_backend` param for `is_forking_spawner`, widen
`fail_after` delay for fork-based spawners
- `test_remote_error`, `test_multierror`,
`test_cancel_infinite_streamer`, `test_some_cancels_all`: add
`set_fork_aware_capture` fixture param
- Drop commented-out per-test `skipon_spawn_backend` blocks (now
covered by module-level `pytestmark`)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 32955db02e)
Post-yield now also reaps init-adopted (`ppid==1`) tractor procs
that appeared during the test — leaked subactors whose mid-tier
parent died during cascade teardown, reparenting them to init.
Pre-yield snapshot of existing orphans scopes reap to THIS test's
leaks only, avoiding reap of unrelated tractor uses (piker, etc.)
on the box.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 01ce2857ea)
`reap(include_descendants=True)` now expands each orphan-root pid
into its full psutil subtree before delivering SIGINT, so a
multi-level leaked actor-tree gets torn down in a single pass
instead of requiring repeated calls (each pass kills the current
`ppid==1` level, the level below becomes init-adopted, etc.).
Falls back to the original flat `pids` list when `psutil` is
unavailable. Emits a log line when expansion adds descendant pids.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 8de684f5de)
- `_testing/trace.py`: add `_SNAPSHOT_INDEX` session- scoped list
populated by `_do_capture_snapshot()` on each successful dump;
add TODO for future `TRACTOR_TRACE_HOLD=1` pause-on-hang mode
- `_testing/pytest.py`: add `pytest_terminal_summary` hook that
prints all captured snapshot dirs at end-of-session so paths
don't get buried in scrollback
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit fb87c36263)
Deats,
- `_find_tractor_strays()`: scan `/proc/*/cmdline` for
`tractor._child` procs NOT in the walk's `seen` set — surfaces
ghost subactor trees from prior test runs (cross-test launchpad
contamination).
- `dump_proc_tree(include_strays=True)`: refactor classification
into `_classify_walk()` closure, walk stray roots as additional
trees, emit stray-root summary in header. Also: `tractor._child`
procs reparented to init are now always classified as orphans
regardless of cgroup-slice (leaked subactor ≠ desktop-launched
app).
- `_do_capture_snapshot()`: use `sys.__stderr__` to bypass pytest
`--capture=sys` redirection so snapshot paths always land on the
real terminal
- `fail_after_w_trace()`: capture diag snapshot on
non-`TooSlowError` exceptions when the `fail_after` scope's
cancel had already fired (e.g. nursery wraps `Cancelled` into a
`BaseExceptionGroup` that escapes before `TooSlowError` can be
raised).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 3a243a1fd4)
Extract all pure-Python diagnostic helpers (`dump_proc_tree`,
`dump_hung_state`, `scan_bindspace`, `dump_all`, `resolve_pids`,
`ensure_sudo_cached`, etc.) from the xonsh xontrib into a new
`tractor/_testing/trace.py` module so the same logic is callable
from both the `acli.*` terminal aliases AND in-test capture-on-hang
fixtures.
Deats,
- `_testing/trace.py`: new module (1171 lines) — proc-tree walker,
hung-state dumper, bindspace scanner, `dump_all()` snapshot
archiver, `AFKAlarmTimeout` exc, `fail_after_w_trace()` async CM
(trio `fail_after` + auto-snapshot on `TooSlowError`),
`afk_alarm_w_trace()` sync CM (`signal.alarm` + snapshot on
`SIGALRM`), plus pytest fixture wrappers for both.
- `_testing/pytest.py`: re-export the two fixtures via `from .trace
import` so pytest plugin-discovery picks them up.
- `tractor_diag.xsh`: thin terminal wrappers that import from
`_testing.trace` — drops ~627 lines of inline impl. Add
`acli.dump_all` alias for full snapshot-bundle CLI access.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 7509e313ff)
Deats,
- `test_echoserver_detailed_mechanics`: add `is_forking_spawner`
param, wrap `main()` in `fa_main()` with per-backend
`trio.fail_after` (4s fork / 1s trio) to cap cancel-cascade
teardown that compounds under forkserver.
- `test_sigint_closes_lifetime_stack`: swap `start_method` param
for `is_forking_spawner`, pre-init `tmp_file`/`ctx` to `None` so
KBI firing before `open_context` body doesn't `UnboundLocalError`,
add `pytest.fail` guard for the spawn-time IPC race case, arm
`signal.alarm` AFK-safety cap (10s) under fork backends
Also,
- `pytestmark`: add `track_orphaned_uds_per_test` +
`detect_runaway_subactors_per_test` fixtures.
- `delay()`: hardcode `return 1e3` at top (debug override still in
place).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 7ee0dc2e8f)
For forking spawner backends that is.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit b10011a36e)
- `skipon_spawn_backend('subint')`: expand reason with specific
analysis doc refs + GH issue #379 umbrella link.
- add `track_orphaned_uds_per_test` fixture via `usefixtures` to
blame-attribute UDS sock-file orphans left by SIGKILL cancel
cascades.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 7d0a53d205)
- `test_ext_types_over_ipc`: wrap `main()` in `fa_main()` with
`trio.fail_after(2)` + commented `capfd.disabled()` investigation
(pytest#14444).
- `test_basic_payload_spec`: add fixture param with note on fork-spawner
hang prevention.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 8aa07a7932)
Outer `signal.alarm` cap that fires even when trio's
`fail_after` is blocked by a shielded-await deadlock
(the bug-class-3 hang under MTF backends). Only armed
for fork-based spawners where the bug lives.
Deats,
- `_DIAG_CAP_S = fail_after_s + 5` — slightly larger than the
trio-native guard so it always loses when the in-band path works.
- `test_log.cancel()` breadcrumbs at each cancel-scope boundary so the
last-fired breadcrumb names the swallow point on hang.
- try/finally wrapping around each scope level for deterministic
breadcrumb emission.
- add `is_forking_spawner`, `set_fork_aware_capture` fixture params.
- rework `fail_after_s`: 4s for fork, 12s for trio (was 30/12).
Also,
- `test_sigint_both_stream_types`: `assert 0` -> `pytest.fail()`, add
TODO re `pytest.raises()`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 10db117864)
Split the old `live`/`orphans` sock classification
into three ppid-aware buckets: `live-active` (PID
alive, parent owns it), `orphaned-alive` (PID alive
but `ppid==1`, init-adopted — `acli.reap` candidate),
and `orphaned-dead` (PID gone, sock stale).
Deats,
- new `_ppid()` helper reads `/proc/<pid>/stat` field [3] for parent
PID, handles the tricky `(comm)` field (can contain spaces/parens) by
splitting from last `)`.
- live-active rows now show `(ppid=<N>)` for ctx.
- orphaned-alive rows flagged `(adopted by init)`.
- cleanup suggestion: `acli.reap --uds` for both
alive-orphan graceful cancel + dead-sock cleanup
in one shot; manual `rm` kept as fallback.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 9bbb6f796b)
New `ai/conc-anal/spawn_time_boot_death_dup_name_issue.md`
documenting the spawn-time rc=2 race under rapid
same-name spawning against a forkserver + registrar
— the `wait_for_peer_or_proc_death` helper now surfaces
the death instead of parking forever on the handshake
wait.
Also,
- extract inline `xfail` into module-level
`_DOGGY_BOOT_RACE_XFAIL` marker.
- apply it to `n_dups=8` too (previously bare) bc
larger N widens the race window enough to fire
occasionally.
- link to tracking issue #456.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 92443dc4ef)
Forking spawner + UDS transport has different timing
vs `trio_proc` — streaming example completes faster
in some cases, slower in others depending on fork
overhead + sock setup.
Deats,
- add `expect_cancel` param to `cancel_after()`, raise
`ActorTooSlowError` when cancel scope fires unexpectedly instead of
silently returning `None`.
- `time_quad_ex` fixture: bump timeout +1 for forking+UDS, explicit
`ActorTooSlowError` on `None` result instead of bare `assert results`.
- `test_not_fast_enough_quad`: `xfail` for forking+UDS being "too fast"
(cancel doesn't fire bc streaming finishes before delay).
- add `is_forking_spawner`, `tpt_proto` fixture params throughout.
Also,
- `_testing/pytest.py`: widen `start_method` parametrize and
`is_forking_spawner` fixture to `scope='session'`.
- `"""` -> `'''` docstring style throughout.
- hoist `_non_linux` to module scope (was redefined locally in two
places).
- type hints, kwarg-style `partial()` calls.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit d3cbc92751)
`acli.bindspace_scan piker` now resolves `<name>` to
`$XDG_RUNTIME_DIR/<name>` — useful for projects like
`piker` that bind sibling sub-dirs alongside tractor's
default. Full paths still work as-is.
Also,
- rename "unparseable" section to "non-tractor" with
clearer desc (filename lacks `@<pid>` suffix)
- print per-sock `ss -lpx 'src = <path>'` cmds for
non-tractor socks so callers can manually resolve
listener-PID liveness
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 099104e0af)
Add module-level `pytestmark` applying per-test
`reap_subactors_per_test`, `track_orphaned_uds_per_test`, and
`detect_runaway_subactors_per_test` fixtures — registrar tests stress
discovery roundtrips that historically left orphaned UDS sock-files.
Deats,
- drop unused `say_hello()` fn, keep only `say_hello_use_wait`;
rename param `func` -> `ria_fn`.
- use `@tractor_test(timeout=7)` instead of separate
`@pytest.mark.timeout(7, method='thread')` decorator.
- add `with_timeout()` helper, wire into
`test_subactors_unregister_on_cancel_remote_daemon`.
- uncomment `_timeout_main()` in `test_stale_entry_is_deleted`, use
configurable `timeout` var + `debug_mode` guard for `tractor.pause()`
on cancel.
- `dump_on_hang(seconds=timeout*2)` instead of hardcoded `20`.
- fix typo "oustanding" -> "outstanding".
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit abd3950ba6)
Rework reap/diag tooling to identify tractor sub-actors via
intrinsic proc signals — cmdline/comm markers from `setproctitle` —
instead of env-var or cwd matching.
Deats,
- new `_is_tractor_subactor()` checks cmdline for `tractor[` /
`tractor._child` markers, falls back to `/proc/<pid>/comm` for
zombie-resilient detection (kernel preserves `comm` past exit
until reap)
- `_read_comm()` reads kernel per-task name set by `setproctitle()`
— the zombie-safe ID signal
- `_read_status_state()` reads single-letter proc state from
`/proc/<pid>/status` (`Z` = zombie)
- `find_orphans()` drops `repo_root` requirement, uses
`_is_tractor_subactor()` for intrinsic sub-actor ID instead of
cwd coincidence-matching
- new `find_zombies()` with optional `parent_pid` filter for
zombie-state sub-actors
Also,
- rename `pytree` -> `ptree` throughout xontrib
- add `_which_cgroup_slice()` — reads `/proc/<pid>/cgroup` to
distinguish `system.slice` services vs `user.slice` desktop apps
from genuinely leaked orphans
- `_ptree` classifies `ppid==1` procs into `system-slice`,
`user-slice`, and `orphans` buckets with per-section output
- `_tractor_reap` drops `git rev-parse` / `sys.path` hack — assumes
tractor importable from active venv
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 522b57570b)