Same trio 0.29 → 0.33 cancel-cascade slowdown that hit
`test_nested_multierrors` (ea67f1b6) — bumps the
`trio`-backend (non-debug, non-forking) budget in
`test_echoserver_detailed_mechanics` from 1s → 4s.
- The 1s budget raced the ~1s teardown deadline. On a
deadline-fire trio 0.33 injects
`Cancelled(source='deadline')` (cancel-reason
metadata) that wraps the mid-stream KBI in a
`BaseExceptionGroup`, breaking the bare
`pytest.raises(KeyboardInterrupt)` below.
- Bump matches the forking-spawner branch (4s).
- Inline NOTE references the tracking issue
`ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
trio 0.29 → 0.33 lock bump (c7741bba) slowed the
depth=3 cancel-cascade in `test_nested_multierrors`
from <6s to ~7-8s; the 6s deadline was firing and its
`Cancelled(source='deadline')` (trio 0.33's new
cancel-reason metadata) collapsed a BEG branch,
breaking the `RemoteActorError` assertion downstream.
- Split the `('trio', _)` case-match into per-depth
arms: `('trio', 1)` keeps 6s (still finishes in
~3s); `('trio', 3)` → 12s.
- Updated inline NOTE explains the version pivot +
links the tracking issue
`ai/conc-anal/trio_033_cancel_cascade_slowdown_depth3_issue.md`.
- Existing MTF/`subint_forkserver` budgets unchanged.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Two version-compat fixes for the `devx` debugger
test-suite, both about matching upstream output that
got more verbose w/ recent lib releases.
ANSI stripping (`tests/devx/conftest.py`),
- Add `ansi_strip(text)` helper + `_ansi_re` pattern
(regex per https://stackoverflow.com/a/14693789).
- Apply inside `in_prompt_msg()` + `assert_before()` so
substring matches against REPL/traceback output stay
robust to color leakage.
- Motivated by py3.13's colored tracebacks +
`pdbp`/pygments highlighting leaking ANSI even when
`PYTHON_COLORS=0` is set in the `spawn` fixture (not
every renderer in the spawned subproc honors it).
- Replaces the longstanding inline TODO that linked
the SO answer w/o impl'ing.
trio 0.30+ `Cancelled._create(` match (`test_debugger`),
- In `test_shield_pause` swap the two
`"raise Cancelled._create()"` assertion patterns →
`"raise Cancelled._create("` (open-paren form, no
closing).
- trio >=0.30 raises a multi-line
`raise Cancelled._create(source=.., reason=..,
source_task=..)` w/ cancel-reason metadata, so the
legacy bare-`()` form no longer matches. Inline
comment documents the trio-version pivot.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Make the sub-actor proc-title prefix a single
authoritative constant (`_proctitle._def_prefix`) so
the reap-recognition markers and `xontrib` banner pick
it up automatically — one place to flip the prefix
shape going fwd.
Deats,
- `_proctitle._def_prefix: str = '_subactor'`. New
module-level const consumed by everything that needs
to know the prefix.
- `set_actor_proctitle(actor, prefix=_def_prefix)`:
takes an explicit `prefix` arg (default = the const)
so callers can override per-spawn if they want.
- Default proc-title format:
`'tractor[<reprol>]'` → `f'{prefix}[<reprol>]'`
i.e. `_subactor[<reprol>]` by default.
- `_testing/_reap.py`: cmdline + comm markers source
the prefix from `_proctitle._def_prefix` instead of
the hardcoded `'tractor['`. So
`_is_tractor_subactor()` tracks the const
automatically.
- `xontrib/tractor_diag.xsh`: `acli.reap` orphan-mode
banner now interpolates the
`_TRACTOR_PROC_CMDLINE_MARKERS` tuple directly so
the human-readable mode line stays in sync if the
prefix shape changes again.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Follow-up to f595acc7 (`supervise_run_process`) which
called `log.io(...)` for std-stream relay assuming an
`IO=21` level existed. Add the registration via a new
factory + tests covering both the factory and the new
level.
`add_log_level()` factory,
- One call wires the four (otherwise hand-synced) pieces:
- `CUSTOM_LEVELS[NAME]` — drives the `stacklevel` bump
in `StackLevelAdapter.log()` + `get_logger()`'s
per-level audit.
- `logging.addLevelName()` — stdlib name registration.
- `STD_PALETTE[NAME]` + `BOLD_PALETTE['bold'][NAME]` —
color entries consumed by `get_console_log()`'s
`ColoredFormatter` build.
- Same-named (lowercase) emit method bound on
`StackLevelAdapter` so `log.<name>('msg')` works +
`get_logger()`'s per-level method audit passes.
- Idempotent: re-registering an existing name is a
no-op-ish refresh that won't clobber an already-bound
method.
- Method binding uses a default-arg `_level=value` so
the level int is captured (not late-bound across
multiple registrations).
`IO=21` level (first user),
- Purple. Used by `tractor.trionics._subproc`'s
std-stream relay (see f595acc7).
- Value 21 picked to sit just ABOVE stdlib `INFO`=20 so
it's SHOWN BY DEFAULT at usual `info`/`devx` console
levels — a `runtime`=15 relay would be silently
filtered (footgun for daemon supervisors whose whole
point is visibility). Still distinctly labeled +
filterable.
Tests (`tests/test_log_sys.py`),
- `test_io_custom_level_registered`: validates the IO
level is fully wired (`CUSTOM_LEVELS`, `addLevelName`,
both palettes, `StackLevelAdapter.io()` callable);
emits a record + sanity-asserts `21 >= INFO(20)`.
- `test_add_log_level_pluggable`: registers a fresh
`XLVL=19` (cyan) via `add_log_level()`, asserts all
four wires + the bound `xlog.xlvl()` emit, then
try/finally cleans up the module-global mutations so
later `get_logger()` audits don't trip on a
half-removed level.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
A `trio.Nursery.start()`-style wrapper around
`trio.run_process()` that surfaces rc!=0 errors
deterministically, ALWAYS isolates the parent
controlling-tty, and optionally live-relays the child's
std-streams to `log.<level>` per-line. Suits both
short-lived test-runners + long-lived daemons.
`supervise_run_process()`,
- Deterministic rc!=0: pass `check=False` to `trio`
and do our OWN post-drain rc-check from the
supervisor coro body AFTER `own_tn.__aexit__` — NOT
inside the internal nursery, since that would
race-cancel the still-draining relay reader and lose
stderr lines. (Re)build + raise a BARE
`subprocess.CalledProcessError`: `.stderr=` for
programmatic callers + an `add_note()`'d
`|_.stderr:` block for human teardown logs. No
nursery-eg-wrapped CPE to `collapse_eg` around.
- Parent controlling-tty isolation: `stdin=DEVNULL`
always, `stdout=DEVNULL` unless relayed/overridden
(via `stdout=` kwarg w/ `_UNSET` sentinel so explicit
`None` = inherit still works). Prevents a spawned
program from clobbering the launching tty's scrollback
w/ control-seqs.
- Live per-line relay: `relay_stdout=True`/
`relay_stderr=True` → relayed to `log.<relay_level>`
(default `'io'`, our custom level 21). Picked to sort
just above stdlib `INFO`=20 so it shows at usual
`info`/`devx` levels yet stays separately filterable;
`runtime`=15 was REJECTED as a default since it'd be
silently filtered at usual verbosity — footgun for
daemon supervisors whose whole point is visibility.
STREAMED, not buffered-until-exit.
- Non-blocking `tn.start()` semantics: live
`trio.Process` handed up via
`task_status.started()` immediately (else
`tn.start()` would block till child exit, losing
the long-lived-daemon use case). Supervise/relay bg
tasks run to completion in this coro.
- `**run_process_kwargs` forwarded verbatim (env, shell,
cwd, start_new_session, executable, ...); MANAGED keys
(`stdin`/`stdout`/`stderr`/`check`) win on conflict.
- Crash-handling layer intentionally NOT baked in —
compose `maybe_open_crash_handler()` ON TOP at the
call-site.
`_relay_stream_lines()` helper,
- Concurrent pipe-drain reader. MANDATORY whenever piping
w/o `capture_*` since nothing else drains the OS pipe —
child blocks on `write()` once kernel buf (~64KiB) fills
→ deadlock.
- Modes (combine freely): `emit`-only live relay,
`accum`-only silent drain+capture (for the CPE note),
or both. Per-line splitting handles cross-chunk
residuals + flushes any trailing un-newline-term'd line
at EOF.
`_add_stderr_note()` helper,
- Attaches an indented `|_.stderr:` note to a CPE via
`add_note()` for legible rc!=0 reporting at teardown.
Tests (`tests/trionics/test_subproc.py`),
- Hermetic `trio`-only (no actor-runtime).
- `test_stdout_relayed_per_line`: per-line stdout relay.
- `test_parent_tty_isolated`: child fd1 is OUR pipe (no
`/dev/pts/*`), fd0 pinned to `/dev/null`.
- `test_no_deadlock_on_big_unnewlined_output`: 200KiB
no-newline output completes under `fail_after(2)` —
exercises the concurrent drain (without it, the child
blocks at ~64KiB).
- `test_stderr_relay_and_cpe_rebuild`: rc!=0 w/
`relay_stderr=True` → bare `CalledProcessError` w/ the
`.stderr` note + per-line live relay.
- `test_nonrelay_cpe_note`: rc!=0 w/o relay → same
deterministic post-drain CPE w/ `.stderr` note (silent
drain+capture path).
Re-export `supervise_run_process` from `tractor.trionics`.
Prompt-IO: ai/prompt-io/claude/20260601T231429Z_0e3e008b_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Matches the explicit `dict.pop(uid, None)` contract one
line above; same semantics as the prior truthy check.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`30e15925` ("Add `start_or_cancel()` to `trionics._taskc`")
inserted `async def start_or_cancel()` — whose body opens its
own col-4 `try:` — immediately before the trailing `else:
raise`. Because the edit was a pure insertion (0 deletions),
the *same* `else: raise` lines were silently REPARENTED: they
used to be the `for exc_match in matching: ... else: raise`
of `maybe_raise_from_masking_exc`, but now bind to
`start_or_cancel`'s `try/except` where they're unreachable
dead code.
Net effect: `maybe_raise_from_masking_exc` lost the `for/else`
re-raise of the un-masked exception, so a masked child
cancellation gets swallowed instead of surfaced.
- restore the `for/else: raise` to `maybe_raise_from_masking_exc`
- drop the now-dead `else: raise` from `start_or_cancel`
Surfaced as 2 deterministic failures in
`test_sigint_closes_lifetime_stack[wait_for_ctx-bg_aio_task-
send_SIGINT_to=child-*]` (the SIGINT-to-child "silent-abandon"
regime). Bisected with `trio` held at `0.29.0`: clean at
`9c36363b` (0/8), broken at `30e15925` (8/8), fixed (0/8).
NOT a `trio` (0.29↔0.33 identical) nor logging-plugin
regression.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Wrapper around `trio.Nursery.start()` that DOESN'T mask
out-of-band cancellation as a lossy startup failure.
Picks the right re-raise: ambient `Cancelled` when
present, the genuine startup-protocol `RuntimeError`
otherwise.
The problem,
- `trio.Nursery.start()` raises a generic
`RuntimeError("child exited without calling
task_status.started()")` whenever the started task
exits BEFORE calling `task_status.started()` —
INCLUDING the common case where the child was
cancelled out-of-band by an *ancestor* cancel-scope
erroring/cancelling.
- In that case the original `trio.Cancelled` is
swallowed and the caller is left w/ an opaque,
root-cause-detached `RuntimeError`.
The fix,
- Catch the "...started" RTE.
- `await trio.lowlevel.checkpoint_if_cancelled()` —
re-raises the in-flight `Cancelled` IFF we're under
effective cancellation (ancestor-inclusive), carrying
trio's auto-generated reason which points at the true
root exc.
- If we're NOT cancelled the `checkpoint_if_cancelled()`
is a cheap no-op and we fall through to re-raise the
genuine startup-protocol RTE.
Re-export from `tractor.trionics` so callers don't have
to reach into `_taskc`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Follow-up note documenting why the deeper "Route B" fix
for `LogSpec`/`apply_logspec()` true per-leaf-MODULE level
control was NOT taken — in favor of the smaller
sub-PACKAGE fix that shipped in 9c36363b.
Doc covers,
- Status: what 9c36363b already gives (per-sub-pkg
control at any nesting depth, `devx.debug` ≠ `devx`)
vs. what remains unaddressed (per-leaf-mod levels,
top-level lib mods like `tractor.to_asyncio` on the
root logger).
- "Route B" sketch: make logger *identity* the full
dotted module path; mv the cosmetic leaf-trim out of
logger-naming into the *formatter's* `{name}`
rendering.
- 6 breaking-change costs: every logger name changes,
formatter rewrite, propagation/double-emit surface
grows, level-inheritance semantics shift,
`modden`/`piker` contract churn, `get_logger()`
refactor risk.
- Migration plan if pursued: extract a pure
`_mk_logger_name()` helper w/ an exhaustive name-shape
test matrix, swap `get_logger()` to use it for
identity, swap formatter to use the display string,
golden-diff rendered headers, coordinate w/
downstreams.
- "Route A" alternative: a `logging.Filter` keyed on
`record.module`/`pathname` for per-leaf control w/o
name churn — lower risk, narrower power.
- Recommendation: defer Route B; prefer Route A if
per-leaf is needed soon; the shipped sub-PKG fix
covers the common ask.
Lives under `ai/tooling-todos/` since it's a deferred-
work decision record, not a triage/conc-anal doc.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Strip the trailing `pkg_path` token ONLY when it duplicates the
caller's leaf-*module* name (which the console header already
shows via `{filename}`), instead of blindly dropping the last
token. This keeps genuine, possibly-*nested* sub-PACKAGE parts
addressable as their own sub-loggers.
- detect a true leaf-mod by comparing the caller's `__name__`
vs `__package__` (a pkg `__init__` has them equal -> its
trailing token is a real sub-pkg, NOT a leaf to strip).
- `name='devx.debug'` now -> `tractor.devx.debug`, DISTINCT
from a bare `devx` -> `tractor.devx`; the old unconditional
`pkg_path = subpkg_path` collapsed both to `tractor.devx` and
silently broke per-sub-pkg level control via the logging-spec.
- `get_logger(__name__)` leaf-strip still works (cosmetic, bc
the leaf-mod is in the `{filename}` header field).
Also,
- update the `LogSpec` caveat: sub-PACKAGE granularity now
addressable at ANY depth; leaf *modules* intentionally aren't
(they're the `{filename}`); top-level mods (eg. `to_asyncio`)
still emit on the root logger.
- adjust `test_root_pkg_not_duplicated_in_logger_name` to the
new literal explicit-`name` contract (no leaf-collapse).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Two coupled changes that let downstream projects (eg. `modden`) inherit
the test-harness loglevel plumbing for free via
`tractor._testing.pytest`:
Plugin lift (`tests/conftest.py` → `_testing/pytest.py`),
- mv `pytest_addoption(--ll)`, the `loglevel` autouse
fixture, and `test_log` fixture out of the test-suite-
local conftest into the reusable plugin.
- add `--tl`/`--tractor-loglevel` as a DISTINCT flag from
`--ll`: `--ll` is the consuming-project's OWN app
loglevel (scoped to its pkg-hierarchy), `--tl` is the
`tractor.*` runtime loglevel. `--tl` falls back to
`--ll` when unset (preserves current `tractor`-suite
behavior).
- add `testing_pkg_name` session fixture (default
`'tractor'`) — downstream projects override to e.g.
`'modden'` so `--ll` scopes to their own hierarchy
instead of `tractor.*`.
- `loglevel` fixture now yields the resolved
tractor-runtime level (passed to
`open_root_actor(loglevel=<.>)` by `@tractor_test`)
AND separately applies `--ll` to the
`testing_pkg_name` hierarchy when that isn't
`tractor`. `test_log` scopes the per-test logger to
`testing_pkg_name`.
`tractor.log` "logging-spec" mini-DSL,
- `LogSpec = str|bool`. Accepted forms:
- `True` → enable `pkg_name` root at `default_level`
(fallback `'cancel'`).
- `False` → no-op.
- bare level eg. `'info'` → root-logger at that level.
- `'sub:info,x:cancel'` → per-sub-logger filter-spec;
each `<name>` is RELATIVE to `pkg_name` (must NOT
include the pkg-token).
- `parse_logspec()` → `{sublog|None: level}` mapping.
`None` key = root-logger. Mixed bare-level + filters
in one spec is rejected w/ a helpful err msg; so is
embedding the `pkg_name` token in a sub-name.
- `apply_logspec()` → `(primary_level, {name: log})`:
parses then enables a `colorlog` stderr handler per
named (sub)logger. Authoritative sub-logger filters
get `propagate=False` so they don't double-emit
through a parallel root-level handler.
- !GRANULARITY CAVEAT! sub-logger names match at
sub-pkg granularity, not leaf-module — so `devx.debug`
collapses to the same `tractor.devx` logger as a bare
`devx`, and top-level lib modules (eg.
`tractor.to_asyncio`) emit under the *root* logger
rather than a phantom `to_asyncio` child. Documented
inline on `LogSpec`.
Other,
- `tests/conftest.py` keeps a NOTE pointing to the
plugin for future-debugging clarity (don't remove
silently — the lift is the relevant signal).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Factor the "deliver an exc to a running aio task" pattern out of
`translate_aio_errors()` + `open_channel_from()` into a shared
`maybe_signal_aio_task()` helper. Add a cause-chain matrix comment
+ relay-echo guard so the final-raise block can't cycle
`trio_err.__cause__` back onto its own derivative relay.
`maybe_signal_aio_task()`,
- Delivers `exc` via `aio_task._fut_waiter.set_exception()` — NOT
`aio_task.set_exception()` which on py3.13+ ALWAYS raises
`RuntimeError("Task does not support set_exception")` (dead code as
a relay mechanism).
- Returns `(delivered: bool, report: str)`. Caller uses `delivered` to
flip `wait_on_aio_task` when delivery failed (avoids hanging on
`_aio_task_complete.wait()`).
- `pre_captured_fut=`: required when the caller crosses a trio
checkpoint between capturing `_fut_waiter` and invoking the helper.
`Task._wakeup` clears `_fut_waiter = None` so re-reading
post-checkpoint loses the ref even though the exc is still in-flight
on the (now-`done()`) original fut.
- `cause=`: sets `exc.__cause__ = cause` so the relay carries
a "trio_err -> caused -> relay" chain through `set_exception()`
→ `Task._wakeup` → coro raise → `wait_on_coro_final_result`
→ `signal_trio_when_done` → `task.result()`-raise.
- `allow_cancel_fallback=True`: opt-in `aio_task.cancel()` for the
narrow case where `_fut_waiter is None` AND task is runnable (sitting
in asyncio's ready queue, not parked on a poke-able future). NEVER
cancels when `_fut_waiter` carries an in-flight exc — that would race
+ mask the real terminating exc.
`translate_aio_errors()`,
- Replace the two ad-hoc `_fut_waiter.set_exception()`
/ `aio_task.set_exception()` call sites w/ the helper.
- Capture `pre_cp_fut = aio_task._fut_waiter` BEFORE the post-shutdown
`trio.lowlevel.checkpoint()` (critical: `_wakeup` clears the ref).
- New "cross-loop cause-chain matrix" comment block on the final-raise
— tabulates every `(trio_err, aio_err, trio_to_raise)` combo into
exactly one terminal `raise X [from Y]` or early `return`. Covers the
sibling `signal_trio_when_done()` resolution + the relay-echo
INVARIANT.
- New relay-echo guard: if `aio_err` is one of OUR OWN signals
(`TrioTaskExited`/`TrioCancelled`) AND `aio_err.__cause__ is
trio_err`, raise the bare `trio_err` instead of `trio_err from
aio_err` (which would CYCLE the cause chain since the relay was itself
caused-by `trio_err`).
- Drop the stale "the `task.set_exception(aio_taskc)` call MUST NOT
EXCEPT or this WILL HANG" warning — the helper handles the failure
path explicitly via `delivered=False` → `wait_on_aio_task = False`.
- Carry `cause=trio_err` on both the cancel-relay (`TrioCancelled`) and
the graceful-exit relay (`TrioTaskExited`) so the aio-side traceback
shows the real root.
`open_channel_from()`,
- Adopt the same helper; drop the dead "SHOULD NEVER GET HERE !?!?"
+ `tractor.pause(shield=True)` panic branch.
- Capture in-flight trio-side exc via `sys.exc_info()[1]` and pass as
`cause=` — non-`None` only when the `try` body raised (graceful exit
→ None).
Other,
- Top-level import: `sys` (for `sys.exc_info()`).
- `run_as_asyncio_guest()`: add commented-out alt `out: Outcome = await
trio_done_fute` next to the shielded version — exploratory note for
the longstanding "why is `.shield()` needed?" TODO.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
SIGUSR1 task-tree dumps via `stackscope` should work in
plain (non-pdb) runs too — esp. in infected-`asyncio`
processes where the kernel-default SIGUSR1 disposition is
`Term` (proc dies on `kill -USR1` w/o an installed
handler). Ungate the install path from `_debug_mode` in
both root and sub-actor init; the `use_stackscope` rt-var
+ `TRACTOR_ENABLE_STACKSCOPE` env-var checks remain as
the actual opt-in (e.g. via `--enable-stackscope`).
Deats,
- `_root.open_root_actor`: drop the `debug_mode and ...`
conjunction around the `enable_stack_on_sig()` call;
now gated only on the `enable_stack_on_sig` arg itself.
- `_runtime.Actor` sub-actor init: lift the
`use_stackscope`/`TRACTOR_ENABLE_STACKSCOPE` branch out
of the `if rvs['_debug_mode']:` block to peer-level.
The `use_greenback` branch stays inside `_debug_mode`
(pdb-specific).
- Refresh inline comments on both sites to call out the
infected-`asyncio` "default SIGUSR1 = terminate proc"
rationale.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adopt the `_testing.trace` CM helpers in two MTF-hang-prone
tests so on-timeout we get a fresh
`ptree`/`wchan`/`py-spy` diag snapshot on disk instead of
opaque pytest timeout-kills. Same shape as bd07a95d for
`test_dynamic_pub_sub`.
Deats,
- `test_echoserver_detailed_mechanics`:
* inner `trio.fail_after` → `fail_after_w_trace`. Adds
`fail_after_w_trace: FailAfterWTraceFactory` fixture
param.
* mv per-backend `timeout` calc to top of test body (was
interleaved w/ helper defs).
* factor deep
`open_nursery`/`open_context`/`open_stream` body into
`_body()` so the wrapping `main()` stays a 2-liner —
keeps the nested-CM block at its natural indent level
instead of pushing it under yet another `async with`.
* drop `with_timeout: bool` knob + `fa_main()` helper
(knob was hard-coded `True`).
- `test_sigint_closes_lifetime_stack`:
* outer `signal.alarm`/`try`/`finally` → single
`afk_alarm_w_trace(10)` CM. Adds
`afk_alarm_w_trace: AfkAlarmWTraceFactory` fixture
param.
* drop `_AFK_CAP_S` + `armed_alarm` vars (CM owns both).
* explanatory comment refreshed to mention
`AFKAlarmTimeout` + the disk-snapshot side effect.
Other,
- Drop debug `return 1e3` short-circuit from `delay()`
fixture — snuck in as a scratch line, was clobbering the
proper `debug_mode`-branched return.
- Top-level import: `FailAfterWTraceFactory`,
`AfkAlarmWTraceFactory` from `tractor._testing.trace`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Per-terminal optimized `watch`-like xonsh alias that
runs an arbitrary callable alias in a loop inside the
alt-screen buffer with flicker-free repaint. Supersedes
the inline `acli.ptree` polling .xsh snippet (removed
from `_ptree` docstr in favor of
`acli.watch acli.ptree pytest`).
Deats,
- alt-screen entry/exit (`\033[?1049h/l`) + cursor-hide
(`\033[?25l/h`) wrapped in try/finally so Ctrl-C always
returns to a pristine shell.
- per-frame draw uses cursor-home (`\033[H`) + per-line
EL (`\033[K` before each `\n`) + post-draw erase-down
(`\033[J`) → stale tail chars from a longer prior
frame are obvi cleared; no full-screen flash.
- SIGWINCH-aware: terminal resize sets a flag, next
frame does a full clear (`\033[H\033[2J`) instead of
the cheap cursor-home path.
- Ctrl-C handling: install `signal.default_int_handler`
so `KeyboardInterrupt` lands cleanly; prior handler
restored on exit.
- Output capture: redirect the alias's stdout to
`StringIO` per frame so we can post-process the EL
fix. Aliases writing directly to `sys.stdout.buffer`
/ `os.write(1)` bypass capture — EL-fix won't apply
but loop still works.
- Alias unwrap: xonsh stores callables as either a bare
callable OR `[fn, *preset_args]`. Both forms handled;
subprocess-style aliases rejected w/ a friendly err
msg.
- `argparse` w/ `-n`/`--interval` (default 0.3s); rest
of argv forwarded as alias args.
- Reg `'acli.watch': watch` in `_TCLI_ALIASES`.
Other,
- Tn `_ptree` `args: list[str]` param.
- Mod-header `Provides:` block updated w/ `acli.watch`
entry.
- Top-level imports: `os`, `sys`, `signal`, `time`,
`typing.Callable`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Only flag `tractor._child` procs as cross-test ghosts of
THIS run if `ppid==1` (init-adopted real leak) or `ppid`
is in the walk's `seen` set (descendant we missed via
race).
Previously, procs whose `ppid` points to some OTHER live non-`pytest`
(in the use of `acli.ptree pytest`) process belong to a different
tractor app (`piker`, another `pytest` shell, a long-running tractor
daemon) and were being falsely flagged as cross-test ghosts.
Deats,
- post-cmdline-match check via `_ppid_from_proc(pid)`,
short-circuit on `None` (proc died in-flight).
- expand module docstring to spell out the ownership
filter rule + its rationale.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Replace inline `trio.fail_after` + manual `signal.alarm` guard with the
`_testing.trace` CM helpers that auto-capture a full ptree/wchan/py-spy
diag snapshot to disk on timeout.
Deats,
- inner guard: `trio.fail_after` → `fail_after_w_trace` (async CM,
captures on `TooSlowError`).
- outer AFK guard: raw `signal.alarm` → `afk_alarm_w_trace` (sync
CM, captures on `SIGALRM`), only armed under fork backends.
Extracts `_run_and_match()` helper to keep branching clean.
- bump `fail_after_s` from 4/12 → 8/20 to stop borderline flakes
while diag harness accumulates evidence.
- drop `_DIAG_CAP_S` var + manual signal import (now internal to
`afk_alarm_w_trace`).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deats,
- `pytestmark`: enrich `skipon_spawn_backend('subint')` reason with
conc-anal doc refs + GH#379 link, add `reap_subactors_per_test`,
`track_orphaned_uds_per_test`,
`detect_runaway_subactors_per_test` fixtures
- `test_nested_multierrors`: parametrize over `depth` `{1, 3}`, add
MTF `xfail(strict=False)` with detailed race-window comment
explaining the BEG shape mismatch, wrap body in
`fail_after_w_trace` with per-backend timeout budget, bump
`@tractor_test(timeout=10)`, drop old multiprocessing depth
special-casing
- `test_multierror_fast_nursery`: wrap in
`fail_after_w_trace(30.0)`, accept `TooSlowError` in
`pytest.raises`, surface explicit `pytest.fail` on hang
- `test_cancel_while_childs_child_in_sync_sleep`: swap
`spawn_backend` param for `is_forking_spawner`, widen
`fail_after` delay for fork-based spawners
- `test_remote_error`, `test_multierror`,
`test_cancel_infinite_streamer`, `test_some_cancels_all`: add
`set_fork_aware_capture` fixture param
- Drop commented-out per-test `skipon_spawn_backend` blocks (now
covered by module-level `pytestmark`)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Append "Snapshot evidence (2026-05-13)" section to
`cancel_cascade_too_slow_under_main_thread_forkserver_issue.md`
documenting `fail_after_w_trace` diag capture results for
`test_nested_multierrors` under the MTF backend — reproduction cmd,
ptree analysis, observed hang signature, and updated triage plan.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Post-yield now also reaps init-adopted (`ppid==1`) tractor procs
that appeared during the test — leaked subactors whose mid-tier
parent died during cascade teardown, reparenting them to init.
Pre-yield snapshot of existing orphans scopes reap to THIS test's
leaks only, avoiding reap of unrelated tractor uses (piker, etc.)
on the box.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`reap(include_descendants=True)` now expands each orphan-root pid
into its full psutil subtree before delivering SIGINT, so a
multi-level leaked actor-tree gets torn down in a single pass
instead of requiring repeated calls (each pass kills the current
`ppid==1` level, the level below becomes init-adopted, etc.).
Falls back to the original flat `pids` list when `psutil` is
unavailable. Emits a log line when expansion adds descendant pids.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- `_testing/trace.py`: add `_SNAPSHOT_INDEX` session- scoped list
populated by `_do_capture_snapshot()` on each successful dump;
add TODO for future `TRACTOR_TRACE_HOLD=1` pause-on-hang mode
- `_testing/pytest.py`: add `pytest_terminal_summary` hook that
prints all captured snapshot dirs at end-of-session so paths
don't get buried in scrollback
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deats,
- `_find_tractor_strays()`: scan `/proc/*/cmdline` for
`tractor._child` procs NOT in the walk's `seen` set — surfaces
ghost subactor trees from prior test runs (cross-test launchpad
contamination).
- `dump_proc_tree(include_strays=True)`: refactor classification
into `_classify_walk()` closure, walk stray roots as additional
trees, emit stray-root summary in header. Also: `tractor._child`
procs reparented to init are now always classified as orphans
regardless of cgroup-slice (leaked subactor ≠ desktop-launched
app).
- `_do_capture_snapshot()`: use `sys.__stderr__` to bypass pytest
`--capture=sys` redirection so snapshot paths always land on the
real terminal
- `fail_after_w_trace()`: capture diag snapshot on
non-`TooSlowError` exceptions when the `fail_after` scope's
cancel had already fired (e.g. nursery wraps `Cancelled` into a
`BaseExceptionGroup` that escapes before `TooSlowError` can be
raised).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Extract all pure-Python diagnostic helpers (`dump_proc_tree`,
`dump_hung_state`, `scan_bindspace`, `dump_all`, `resolve_pids`,
`ensure_sudo_cached`, etc.) from the xonsh xontrib into a new
`tractor/_testing/trace.py` module so the same logic is callable
from both the `acli.*` terminal aliases AND in-test capture-on-hang
fixtures.
Deats,
- `_testing/trace.py`: new module (1171 lines) — proc-tree walker,
hung-state dumper, bindspace scanner, `dump_all()` snapshot
archiver, `AFKAlarmTimeout` exc, `fail_after_w_trace()` async CM
(trio `fail_after` + auto-snapshot on `TooSlowError`),
`afk_alarm_w_trace()` sync CM (`signal.alarm` + snapshot on
`SIGALRM`), plus pytest fixture wrappers for both.
- `_testing/pytest.py`: re-export the two fixtures via `from .trace
import` so pytest plugin-discovery picks them up.
- `tractor_diag.xsh`: thin terminal wrappers that import from
`_testing.trace` — drops ~627 lines of inline impl. Add
`acli.dump_all` alias for full snapshot-bundle CLI access.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deats,
- `test_echoserver_detailed_mechanics`: add `is_forking_spawner`
param, wrap `main()` in `fa_main()` with per-backend
`trio.fail_after` (4s fork / 1s trio) to cap cancel-cascade
teardown that compounds under forkserver.
- `test_sigint_closes_lifetime_stack`: swap `start_method` param
for `is_forking_spawner`, pre-init `tmp_file`/`ctx` to `None` so
KBI firing before `open_context` body doesn't `UnboundLocalError`,
add `pytest.fail` guard for the spawn-time IPC race case, arm
`signal.alarm` AFK-safety cap (10s) under fork backends
Also,
- `pytestmark`: add `track_orphaned_uds_per_test` +
`detect_runaway_subactors_per_test` fixtures.
- `delay()`: hardcode `return 1e3` at top (debug override still in
place).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
For forking spawner backends that is.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- `skipon_spawn_backend('subint')`: expand reason with specific
analysis doc refs + GH issue #379 umbrella link.
- add `track_orphaned_uds_per_test` fixture via `usefixtures` to
blame-attribute UDS sock-file orphans left by SIGKILL cancel
cascades.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- `test_ext_types_over_ipc`: wrap `main()` in `fa_main()` with
`trio.fail_after(2)` + commented `capfd.disabled()` investigation
(pytest#14444).
- `test_basic_payload_spec`: add fixture param with note on fork-spawner
hang prevention.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Outer `signal.alarm` cap that fires even when trio's
`fail_after` is blocked by a shielded-await deadlock
(the bug-class-3 hang under MTF backends). Only armed
for fork-based spawners where the bug lives.
Deats,
- `_DIAG_CAP_S = fail_after_s + 5` — slightly larger than the
trio-native guard so it always loses when the in-band path works.
- `test_log.cancel()` breadcrumbs at each cancel-scope boundary so the
last-fired breadcrumb names the swallow point on hang.
- try/finally wrapping around each scope level for deterministic
breadcrumb emission.
- add `is_forking_spawner`, `set_fork_aware_capture` fixture params.
- rework `fail_after_s`: 4s for fork, 12s for trio (was 30/12).
Also,
- `test_sigint_both_stream_types`: `assert 0` -> `pytest.fail()`, add
TODO re `pytest.raises()`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Split the old `live`/`orphans` sock classification
into three ppid-aware buckets: `live-active` (PID
alive, parent owns it), `orphaned-alive` (PID alive
but `ppid==1`, init-adopted — `acli.reap` candidate),
and `orphaned-dead` (PID gone, sock stale).
Deats,
- new `_ppid()` helper reads `/proc/<pid>/stat` field [3] for parent
PID, handles the tricky `(comm)` field (can contain spaces/parens) by
splitting from last `)`.
- live-active rows now show `(ppid=<N>)` for ctx.
- orphaned-alive rows flagged `(adopted by init)`.
- cleanup suggestion: `acli.reap --uds` for both
alive-orphan graceful cancel + dead-sock cleanup
in one shot; manual `rm` kept as fallback.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `capture` dimension to CI matrix so fork-based
backends run `--capture=sys` (fork-child × `--capture=fd`
is a known deadlock). Non-fork backends keep `fd`.
Deats,
- two `include:` rows for `main_thread_forkserver` on
linux py3.13: tcp + uds, both `capture: 'sys'`
- job name updated to show `capture=` mode
- timeout bumped 16 -> 20 min to accommodate the
additional matrix cells
- `--capture=${{ matrix.capture }}` replaces hardcoded
`--capture=fd`
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
New `ai/conc-anal/spawn_time_boot_death_dup_name_issue.md`
documenting the spawn-time rc=2 race under rapid
same-name spawning against a forkserver + registrar
— the `wait_for_peer_or_proc_death` helper now surfaces
the death instead of parking forever on the handshake
wait.
Also,
- extract inline `xfail` into module-level
`_DOGGY_BOOT_RACE_XFAIL` marker.
- apply it to `n_dups=8` too (previously bare) bc
larger N widens the race window enough to fire
occasionally.
- link to tracking issue #456.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Forking spawner + UDS transport has different timing
vs `trio_proc` — streaming example completes faster
in some cases, slower in others depending on fork
overhead + sock setup.
Deats,
- add `expect_cancel` param to `cancel_after()`, raise
`ActorTooSlowError` when cancel scope fires unexpectedly instead of
silently returning `None`.
- `time_quad_ex` fixture: bump timeout +1 for forking+UDS, explicit
`ActorTooSlowError` on `None` result instead of bare `assert results`.
- `test_not_fast_enough_quad`: `xfail` for forking+UDS being "too fast"
(cancel doesn't fire bc streaming finishes before delay).
- add `is_forking_spawner`, `tpt_proto` fixture params throughout.
Also,
- `_testing/pytest.py`: widen `start_method` parametrize and
`is_forking_spawner` fixture to `scope='session'`.
- `"""` -> `'''` docstring style throughout.
- hoist `_non_linux` to module scope (was redefined locally in two
places).
- type hints, kwarg-style `partial()` calls.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`acli.bindspace_scan piker` now resolves `<name>` to
`$XDG_RUNTIME_DIR/<name>` — useful for projects like
`piker` that bind sibling sub-dirs alongside tractor's
default. Full paths still work as-is.
Also,
- rename "unparseable" section to "non-tractor" with
clearer desc (filename lacks `@<pid>` suffix)
- print per-sock `ss -lpx 'src = <path>'` cmds for
non-tractor socks so callers can manually resolve
listener-PID liveness
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add module-level `pytestmark` applying per-test
`reap_subactors_per_test`, `track_orphaned_uds_per_test`, and
`detect_runaway_subactors_per_test` fixtures — registrar tests stress
discovery roundtrips that historically left orphaned UDS sock-files.
Deats,
- drop unused `say_hello()` fn, keep only `say_hello_use_wait`;
rename param `func` -> `ria_fn`.
- use `@tractor_test(timeout=7)` instead of separate
`@pytest.mark.timeout(7, method='thread')` decorator.
- add `with_timeout()` helper, wire into
`test_subactors_unregister_on_cancel_remote_daemon`.
- uncomment `_timeout_main()` in `test_stale_entry_is_deleted`, use
configurable `timeout` var + `debug_mode` guard for `tractor.pause()`
on cancel.
- `dump_on_hang(seconds=timeout*2)` instead of hardcoded `20`.
- fix typo "oustanding" -> "outstanding".
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Comment/docstring updates: `subint_forkserver` is a clean
`NotImplementedError` stub — not an alias to variant-1
(`main_thread_forkserver`). Key reserved in-place (not aliased) so
the subint-hosted-child impl can flip without API churn once
jcrist/msgspec#1026 unblocks PEP 684 subints.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Rework reap/diag tooling to identify tractor sub-actors via
intrinsic proc signals — cmdline/comm markers from `setproctitle` —
instead of env-var or cwd matching.
Deats,
- new `_is_tractor_subactor()` checks cmdline for `tractor[` /
`tractor._child` markers, falls back to `/proc/<pid>/comm` for
zombie-resilient detection (kernel preserves `comm` past exit
until reap)
- `_read_comm()` reads kernel per-task name set by `setproctitle()`
— the zombie-safe ID signal
- `_read_status_state()` reads single-letter proc state from
`/proc/<pid>/status` (`Z` = zombie)
- `find_orphans()` drops `repo_root` requirement, uses
`_is_tractor_subactor()` for intrinsic sub-actor ID instead of
cwd coincidence-matching
- new `find_zombies()` with optional `parent_pid` filter for
zombie-state sub-actors
Also,
- rename `pytree` -> `ptree` throughout xontrib
- add `_which_cgroup_slice()` — reads `/proc/<pid>/cgroup` to
distinguish `system.slice` services vs `user.slice` desktop apps
from genuinely leaked orphans
- `_ptree` classifies `ppid==1` procs into `system-slice`,
`user-slice`, and `orphans` buckets with per-section output
- `_tractor_reap` drops `git rev-parse` / `sys.path` hack — assumes
tractor importable from active venv
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
New `tractor.devx._proctitle` mod sets each
sub-actor's `argv[0]` (and kernel `comm`) to
`tractor[<aid.reprol()>]` — e.g.
`tractor[doggy@1027301b]` — so `ps`/`top`/`htop`
and `acli.pytree`/reaper tooling can identify
actors at a glance without parsing full cmdlines.
Deats,
- `set_actor_proctitle()` wraps the `setproctitle`
pkg with `ImportError` guard; optional at runtime
but listed in `pyproject.toml` so default installs
benefit.
- called early in `_child._actor_child_main()` after
`Actor` construction, before `_trio_main()` entry.
- tests in `tests/devx/test_proctitle.py`: format
unit test, `/proc/{cmdline,comm}` integration
test, negative detection test.
Resolves#457
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Extend `test_register_duplicate_name` w/ cancel-level log
breadcrumbs and `try/finally` for better diag on the cancel-cascade
hang.
Add `test_dup_name_cancel_cascade_escalates_to_hard_kill` as a
regression test for the TCP+MTF duplicate-name cancel-cascade
deadlock. Spawns N same-name actors, calls `an.cancel()`, and
asserts teardown completes within a `trio.fail_after()` budget that
scales w/ `n_dups`.
Deats,
- parametrize `n_dups` (2, 4, 8) to widen the race window for
concurrent `register_actor` RPCs.
- `n_dups=4` xfail'd — exposes a separate boot-race bug (doggy
`rc=2` under rapid same-name spawn), tracked in #456.
- post-teardown asserts all `Portal` chans disconnect, verifying
hard-kill escalation worked.
Relates to https://github.com/goodboy/tractor/issues/456
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Race `IPCServer.wait_for_peer(uid)` against the sub-proc's
`.wait()` inside a `trio` nursery; whichever completes first
cancels the other.
Prevents the spawning task from parking forever on an unsignalled
`_peer_connected[uid]` event when a sub-actor dies during boot
(e.g. crashed on import before reaching `_actor_child_main`).
Instead of hanging, raises `ActorFailure` w/ the proc's exit code
for clean supervisor error reporting.
Also,
- use the new racer in `main_thread_forkserver_proc()` spawn path.
- keep `proc_wait` generic so each backend passes its own callable
(`trio.Process.wait`, `_ForkedProc.wait`, etc.).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Group all xontrib aliases under an `acli.` prefix
so xonsh prefix-completion treats them as a sub-cmd
group — `acli.<TAB>` lists the full set. No parent
`acli` cmd exists; the dot is purely naming.
Renames (incl `-` -> `_` in suffixes for shell-
identifier-friendliness):
- `pytree` -> `acli.pytree`
- `hung-dump` -> `acli.hung_dump`
- `bindspace-scan` -> `acli.bindspace_scan`
Add new `acli.reap` wrapping `scripts/tractor-reap`:
Deats,
- 3 opt-in phases via flags:
1. process reap — `find_orphans()` (default,
PPid=1 + cwd=repo + cmdline `python`) or
`find_descendants(--parent PID)`. SIGINT
first, SIGKILL after `--grace` (def 3.0s).
2. `/dev/shm` sweep (`--shm`/`--shm-only`) —
`find_orphaned_shm()` + `reap_shm()`. needed
bc `tractor` disables `mp.resource_tracker`.
3. UDS sock-file sweep (`--uds`/`--uds-only`) —
`find_orphaned_uds()` + `reap_uds()` for stale
`${XDG_RUNTIME_DIR}/tractor/<name>@<pid>.sock`
entries. See #452.
- `--dry-run` lists matches without signalling/
unlinking; survivor pids or sweep errors flip
the alias rc to `1`.
- lazy-imports `tractor._testing._reap` after
`git rev-parse --show-toplevel` (with
`Path(__file__).parent.parent` fallback) so the
contrib is loadable before the venv is on
`sys.path`.
- `argparse.SystemExit` on `-h`/bad-args is
caught + returned as the alias rc instead of
killing xonsh.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Wires SC-discipline cancel-then-escalate into
`ActorNursery.cancel()`:
graceful cancel-req -> bounded wait -> hard-kill
Deats,
- add `raise_on_timeout: bool = False` kwarg to `Portal.cancel_actor()`.
When `True`, bounded- wait expiry raises `ActorTooSlowError` instead
of the legacy DEBUG-log + return-`False` path. Default stays `False`
for callers that handle their own escalation (e.g.
`_spawn.soft_kill()` polling `proc.poll()`).
- add `_try_cancel_then_kill()` helper in `_supervise` used by per-child
cancel tasks. On `ActorTooSlowError`, escalates via `proc.terminate()`
(SIGTERM) so a non-acking sub doesn't park `soft_kill()` forever
waiting on `proc.poll()`.
- replace `tn.start_soon(portal.cancel_actor)` in
`ActorNursery.cancel()` with the helper.
Debug-mode bypass:
-----------------
skip escalation (fall back to legacy fire-and-forget cancel) when ANY
of:
- `Lock.ctx_in_debug is not None` (some actor is currently
REPL-locked)
- `_runtime_vars['_debug_mode']` (root opened with `debug_mode=True`).
- `ActorNursery._at_least_one_child_in_debug` (per-child `debug_mode=`
opt-in).
ORing covers root-debug, child-debug, and active- REPL-lock cases
without false-positively SIGTERM- ing a sub-tree proxying stdio for
a REPL session.
Motivated by the `subint_forkserver` dup-name hang where a same-named
sibling subactor's cancel-RPC failed to ack within
`Portal.cancel_timeout` (TCP+ forkserver register-RPC contention) and
the nursery `__aexit__` deadlocked.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Distinct from `trio.TooSlowError` so that existing
`except trio.TooSlowError:` blocks don't silently
mask actor-cancel timeouts — these must propagate
to let a supervisor escalate to
`proc.terminate()` per SC-discipline:
graceful cancel-req -> bounded wait -> hard-kill
Motivated by #subint_forkserver dup-name hang
where `Portal.cancel_actor()` silently swallowed
the timeout and the supervisor never escalated,
leaving a same-named sibling subactor parked
forever.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code