Commit Graph

1482 Commits (2917b74ba4dbdee2bfda5d3a28bb5b76a10ff0bc)

Author SHA1 Message Date
Gud Boi 2d4995e08d Route `stackscope` SIGUSR1 onto trio loop
Signal handlers fire in a non-trio stack frame; calling
`stackscope.extract(recurse_child_tasks=True)` from there
only walks the `<init>` task and misses everything inside
`async_main`'s nurseries — exactly the part you want to
see during a hang.

Fix: capture `trio.lowlevel.current_trio_token()` at
`enable_stack_on_sig()` time and stash it as a module-
level `_trio_token`. The SIGUSR1 handler then dispatches
the dump *onto* the trio loop via
`_trio_token.run_sync_soon(_safe_dump_task_tree)`, so
`stackscope.extract` runs from a real trio-task context
and walks the full nursery tree.

Late-binding: pytest's `pytest_configure` calls
`enable_stack_on_sig()` outside any `trio.run`, so token
capture there is a `RuntimeError` — left at `None`. The
runtime re-calls `enable_stack_on_sig()` from inside
`async_main` (subactor side) where the token IS
available, so subactors get the full-tree path.
`dump_tree_on_sig` falls back to a direct call when
`_trio_token is None` (parent process pre-trio.run, or
signal delivered after `trio.run` returns).

`_safe_dump_task_tree()` is a `run_sync_soon`-friendly
wrapper that swallows any exception from
`dump_task_tree()` — trio prints + crashes on uncaught
exceptions in scheduled callbacks; better to log + keep
the run alive so the user can re-trigger.

Other,
- emit `capture-bypass tee: <fpath>` line + `tail -f`
  hint in the rendered dump header so users know where
  to find the artifact even when stdio is captured.
- swap the inline `f'     |_{actor}'` line for a
  `_pformat.nest_from_op` rendering of `actor_repr`
  (matches the rest of the runtime's nested-op style).
- log lines on handler install + already-installed
  branches now note `(trio_token captured: <bool>)`
  so it's obvious from the log whether the full-tree
  path is wired.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-29 12:01:03 -04:00
Gud Boi 8c730193f9 Refine fork-survival docs + `EBADF` handling
Two cleanup tweaks in `_main_thread_forkserver`:

Doc, "what survives the fork?" section — expand the
"non-calling threads are gone in the child" claim with
the precise execution-vs-memory split that reconciles
this module's prior framing with trio's (canonical
[python-trio/trio#1614][trio-1614]) "leaked stacks"
framing:

- execution-side: only the calling thread runs
  post-fork; all others never execute another
  instruction.
- memory-side: those non-running threads' stacks +
  per-thread heap structures are still COW-inherited
  as orphaned bytes — what trio means by "leaked".

Same POSIX reality, opposite sides; the table is
extended to a 4-col `parent | child (executing) |
child (memory)` layout to make both views explicit.
Also blank-line-padded the bulleted hazard classes
for cleaner markdown rendering.

[trio-1614]: https://github.com/python-trio/trio/issues/1614

Code, `_close_inherited_fds()` log noise — split the
catch-all `except OSError` into:

- `EBADF` — benign race where the dirfd that
  `os.listdir('/proc/self/fd')` itself opened ends up
  in `candidates`, then auto-closes before the loop
  reaches it. Demote to `log.debug()` + `continue`;
  prior `log.exception` drowned the post-fork log
  channel with stack traces every spawn.
- other errnos (EIO / EPERM / EINTR / ...) keep the
  loud `log.exception` surface — those ARE genuinely
  unexpected.
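
A minimal sketch of the errno split described above, not the actual `_close_inherited_fds()` body. The `close_fds()` name, its `candidates` argument, its return value, and the stdlib `logging` logger are stand-ins chosen for illustration.

```python
import errno
import logging
import os

log = logging.getLogger(__name__)


def close_fds(candidates: list[int]) -> tuple[int, int]:
    """Close each fd, returning `(closed, skipped)` counts.

    Hypothetical helper: only the errno split mirrors the commit.
    """
    closed = skipped = 0
    for fd in candidates:
        try:
            os.close(fd)
        except OSError as err:
            if err.errno == errno.EBADF:
                # Benign: the fd was already closed before we got to it.
                log.debug('fd %d already closed', fd)
                skipped += 1
                continue
            # Anything else is unexpected: keep the loud traceback.
            log.exception('failed to close fd %d', fd)
            skipped += 1
        else:
            closed += 1
    return closed, skipped
```

Keeping the `EBADF` branch at debug level means a routine spawn stays quiet, while a genuinely odd errno still produces a full traceback.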

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-29 10:34:33 -04:00
Gud Boi 5418f2dc3c Add `--enable-stackscope` pytest plugin flag
New `--enable-stackscope` CLI flag installs a SIGUSR1 →
trio-task-tree-dump handler in pytest itself + every
spawned subactor for live stack visibility during hang
investigations. Lighter than `--tpdb` (no pdb machinery
/ tty-lock contention) — pure stack-only triage.

Plumbing:
- `_testing.pytest.pytest_addoption()` adds the flag.
- `_testing.pytest.pytest_configure()` (when flag set):
  * exports `TRACTOR_ENABLE_STACKSCOPE=1` so fork-children
    inherit it via environ,
  * installs the handler in pytest itself via
    `enable_stack_on_sig()`.
- `runtime._runtime.Actor.async_main()` extends the
  existing `_debug_mode` gate to ALSO fire when
  `TRACTOR_ENABLE_STACKSCOPE` is in env — so subactors
  install the same handler at runtime startup.

Capture-bypass tee in `dump_task_tree()`:
Pytest's default `--capture=fd` swallows `log.devx()`
output, making SIGUSR1 dumps invisible right when you
need them. Render the dump once to a `full_dump` str,
then unconditionally tee to:

- `/tmp/tractor-stackscope-<pid>.log` (append-mode,
  always written) — guaranteed-readable artifact even
  under CI / `nohup` / no-tty. `tail -f` to follow.
- `/dev/tty` (best-effort) — pytest never captures the
  tty; ignored if device is missing.
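
A rough sketch of the two-sink tee described above, assuming the dump has already been rendered to a string. `tee_dump()` and `default_log_path()` are hypothetical names, not the functions in `dump_task_tree()`; only the path pattern and the append / best-effort behavior come from the text.

```python
import os


def default_log_path() -> str:
    # Per-process artifact path, following the pattern named above.
    return f'/tmp/tractor-stackscope-{os.getpid()}.log'


def tee_dump(full_dump: str, log_path: str) -> None:
    """Write `full_dump` where pytest's fd capture cannot hide it."""
    # Always-written artifact; append so repeated dumps accumulate.
    with open(log_path, 'a') as f:
        f.write(full_dump + '\n')

    # Best-effort copy to the controlling terminal, if there is one.
    try:
        with open('/dev/tty', 'w') as tty:
            tty.write(full_dump + '\n')
    except OSError:
        pass  # no tty (CI, `nohup`, ...): the file copy is enough
```

A caller would do something like `tee_dump(rendered, default_log_path())` once per signal.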

Other,
- squelch the benign `RuntimeWarning` ("coroutine method
  'asend'/'athrow' was never awaited") from
  `stackscope._glue`'s import-time async-gen type
  introspection so `--enable-stackscope` setup stays
  quiet.
- log msg in the `_runtime` ImportError branch now
  mentions `--enable-stackscope` alongside debug-mode.

Usage,
  pytest --enable-stackscope -k <hang-test>
  # in another shell, find the pid + signal:
  kill -USR1 <pytest-or-subactor-pid>
  # tail the artifact:
  tail -f /tmp/tractor-stackscope-<pid>.log

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-29 10:32:23 -04:00
Gud Boi f8178df0fd Return parent `pid: int` from new `reap_subactors_per_test` fixture 2026-04-27 23:27:19 -04:00
Gud Boi b376eb0332 Add opt-in `reap_subactors_per_test` fixture
Function-scoped, NON-autouse zombie-subactor reaper for
modules whose teardown is known-leaky enough to cascade-
fail every following test in a session.

Sibling to the autouse session-scoped `_reap_orphaned_subactors`. The
session-scoped one fires at session end — too late to save tests that
follow a hung/leaky test in the suite. The new fixture, opted into via
`pytestmark = pytest.mark.usefixtures(...)`, runs between tests in
a problem-module so a leftover subactor from test N can't squat on
registrar ports / UDS paths / shm segments needed by tests N+1,
N+2, ...

Intentionally NOT autouse — the fixture's presence on a module signals
"this module's teardown leaks; please root-cause instead of relying
forever on cleanup". A visibility-vs-convenience trade picked in favor
of the former.

Apply to `tests/test_infected_asyncio.py` since both recent full-suite
runs (parallel-tpt-proto + TCP-only) showed the cascade originating in
this file's KBI- and SIGINT-flavored tests under
`main_thread_forkserver`. Module-comment names the specific offenders so
future de-flake work has a starting point.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 21:41:02 -04:00
Gud Boi 7c5dd4d033 Fix `_testing.addr.get_rando_addr` cross-process collisions
Previously the random port was a default-arg expression
(`_rando_port: str = random.randint(1000, 9999)`) — evaluated
ONCE at module import time, making it a per-process singleton.
Two parallel pytest sessions had a 1/9000 birthday-pair chance
of picking the same port; when it hit, every `reg_addr`-using
test in BOTH runs would cascade-fail with "Address already in
use".

Switch to per-call `random.randint()` salted with `os.getpid()`
so:

- within one session: two calls return distinct ports — e.g.
  `test_tpt_bind_addrs::bind-subset-reg` now actually gets two
  different reg addrs on the TCP backend (it was silently
  duplicating before),
- across parallel sessions: pid salt biases each process's
  port choices apart, making cross-run collisions
  vanishingly rare.

Drop the bogus `: str` annotation (was always `int`). UDS already gets
per-process isolation via `UDSAddress.get_random()`'s `@<pid>`
socket-path suffix, so no change needed there.
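
To illustrate the default-arg pitfall, here is a small standalone sketch. `frozen_port()` and `fresh_port()` are hypothetical names, and the commit message does not show how the real helper mixes in the pid; seeding a private `random.Random` with `os.getpid()` is just one assumed way to do it.

```python
import os
import random


def frozen_port(_port: int = random.randint(1000, 9999)) -> int:
    # The default expression ran once, when this `def` was executed,
    # so every call in this process sees the same value.
    return _port


# Assumption for illustration: a per-process generator seeded by pid.
_rng = random.Random(os.getpid())


def fresh_port() -> int:
    # Drawn on every call, so repeated calls can differ.
    return _rng.randint(1000, 9999)
```

Calling `frozen_port()` repeatedly always yields the same port, while `fresh_port()` draws a new one each time.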

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 20:15:20 -04:00
Gud Boi 205382a39b Sweep `subint_forkserver` → `main_thread_forkserver` in code
After the variant-1 / variant-2 backend split, update remaining
string-match refs to the variant-1 backend so user-visible gates
+ skip-marks + comments name the working backend correctly:

- `tractor._root._DEBUG_COMPATIBLE_BACKENDS`: include
  `main_thread_forkserver`, drop the stub-only `subint_forkserver`
  entry.
- `tests/test_spawning.py::test_loglevel_propagated_to_subactor`:
  capfd-skip flips to `main_thread_forkserver`.
- `tests/test_infected_asyncio.py::test_sigint_closes_lifetime_stack`:
  xfail-condition flips to `main_thread_forkserver`.
- `tests/test_shm.py`: drop stale "broken on `main_thread_forkserver`"
  reason-text since the `mp.SharedMemory(track=False)`
  + resource-tracker monkey-patch in `.ipc._mp_bs` makes the tests pass;
  the skip-mark only fires on plain `subint` now.
- Comment / docstring sweep: `runtime._state`, `runtime._runtime`,
  `_testing.pytest`, `_subint.py`, `pyproject.toml`,
  `test_cancellation.py`, `test_registrar.py` — refs to variant-1
  backend updated.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 19:55:37 -04:00
Gud Boi 9f0709eee2 Migrate test/smoketest imports + rename test file
Rename `tests/spawn/test_subint_forkserver.py` →
`test_main_thread_forkserver.py` and migrate its imports +
internal refs to the new canonical names:

- `fork_from_worker_thread`, `wait_child` → from
  `tractor.spawn._main_thread_forkserver`.
- `run_subint_in_worker_thread` → still from `_subint_forkserver`
  (variant-2 primitive).
- Module docstring + tier-3 fixture + the `*_spawn_basic` test fn
  renamed for variant-1-honesty.
- Orphan-harness subprocess argv flipped from `'subint_forkserver'`
  → `'main_thread_forkserver'`.

`ai/conc-anal/subint_fork_from_main_thread_smoketest.py` imports split
the same way.

`tractor/spawn/_subint_forkserver.py` drops the backward-compat
re-exports of the fork primitives — the only consumers (test file
+ smoketest) now import from `_main_thread_forkserver` directly.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 19:47:44 -04:00
Gud Boi 5e83881f10 Add `subint_forkserver_proc` stub, flip dispatch, prune
Reduce `_subint_forkserver.py` to its variant-2 placeholder shape:

- Add `subint_forkserver_proc` async stub raising `NotImplementedError`
  with a redirect msg pointing at the working variant-1 backend
  (`main_thread_forkserver`), jcrist/msgspec#1026 (upstream PEP 684
  blocker), and #379 (subint umbrella).

- `tractor.spawn._spawn._methods['subint_forkserver']` now dispatches to
  the stub instead of aliasing the variant-1 coroutine
  — `--spawn-backend=subint_forkserver` errors cleanly.

- Drop now-dead module-scope: `ChildSigintMode`
  / `_DEFAULT_CHILD_SIGINT` defs, `_has_subints` try/except (replaced
  with import from `._subint`), unused imports (`partial`, `Literal`,
  `sys`, msgtypes/pretty_struct, `current_actor`,
  `cancel_on_completion`/`soft_kill`, `_server` TYPE_CHECKING).

- Backward-compat re-exports of fork primitives kept until the follow-up
  commit migrates external test imports.

- `tests/spawn/test_subint_forkserver.py::forkserver_spawn_method`
  fixture: flip hardcoded `'subint_forkserver'`
  → `'main_thread_forkserver'` so the test still exercises the working
  backend (full file rename comes in the test-import migration commit).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 19:36:08 -04:00
Gud Boi 57dae0e4a6 Split forkserver backend into variant 1/2 mods
The `subint_forkserver` name was always aspirational —
today's impl forks from a regular main-interp worker
thread and the child runs trio on its own main interp;
NO subinterp anywhere in parent or child. Splitting the
backend into two clearly-named variants drops the lie:

- **variant 1** — `main_thread_forkserver` (the working
  impl). New `SpawnMethodKey` literal + `_methods`
  dispatch entry + `_runtime.Actor._from_parent()`
  match-arm. The spawn-coro `subint_forkserver_proc`
  moves to `_main_thread_forkserver` and is renamed
  `main_thread_forkserver_proc()`.

- **variant 2** — `subint_forkserver` (future, reserved).
  Module shrinks to a placeholder describing the
  variant-2 design (subint-isolated child runtime, gated
  on jcrist/msgspec#1026 + PEP 684). Today the legacy
  `'subint_forkserver'` key aliases to
  `main_thread_forkserver_proc` so existing
  `--spawn-backend=subint_forkserver` invocations keep
  working; flipped to a `NotImplementedError` stub in a
  follow-up.

Deats,
- `Actor._from_parent()` spawn-method gate now accepts
  both `'main_thread_forkserver'` and
  `'subint_forkserver'` (both go through the
  IPC-`SpawnSpec` path).
- the variant-1 spawn-coro stamps its own `SpawnSpec` /
  log lines with `spawn_method='main_thread_forkserver'`
  so subactor renders reflect the actual mechanism.
- docstring reorg: trio×fork hazard breakdown, POSIX
  fork-survival semantics, in-process-vs-stdlib
  forkserver design notes, and the TODO/cleanup section
  all move from `_subint_forkserver` to
  `_main_thread_forkserver` (lives with the working
  code). `_subint_forkserver` keeps a tight forward-
  looking doc that motivates the reserved key.
- `run_subint_in_worker_thread()` stays in
  `_subint_forkserver` as the companion primitive — it's
  the subint counterpart to `fork_from_worker_thread()`
  and will plug into the future variant-2 spawn-coro.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 19:28:11 -04:00
Gud Boi 99dade0fb3 Extract fork primitives into `_main_thread_forkserver`
Move the truly-generic main-interp-worker-thread fork primitives
(`fork_from_worker_thread`, `_close_inherited_fds`, `_ForkedProc`,
`wait_child`, `_format_child_exit`) out of `_subint_forkserver.py` into
a sibling `_main_thread_forkserver.py` module so the primitive layer is
honestly named — none of these helpers touch a subint, they just fork
from a main-interp worker thread.

`_subint_forkserver.py` keeps its public surface intact via re-export so
any existing `from tractor.spawn._subint_forkserver import ...` callsite
still resolves.

Net: zero behavior change, preps the way for the upcoming spawn-method
key split where `main_thread_forkserver` ships as the working backend
and `subint_forkserver` becomes reserved for the future
subint-isolated-child variant (gated on jcrist/msgspec#1026).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 19:04:26 -04:00
Gud Boi 4b5176e2c3 Doc future-subint payoffs for `_subint_forkserver`
Adds a "Future arch — what subints would buy us" section to
the module docstring, complementing the prior commit's
current-state rationale. Code is unchanged.

Frames the `subint` prefix as family-naming today (no actual
subinterp is created yet), then lays out the three concrete
wins that land once jcrist/msgspec#1026 unblocks PEP 684
isolated-mode subints:

- Cheaper forks — moving the parent's `trio.run()` into a
  subint shrinks the main-interp COW image the child inherits.
  The main interp becomes the literal forkserver: an
  intentionally-empty execution ctx whose only job is to call
  `os.fork()` cleanly.

- True parallelism — per-interp GIL means the forkserver
  thread on main and the trio thread on subint actually run in
  parallel. Spawn latency stops stalling the trio loop.

- Multi-actor-per-process — the architectural payoff. With
  per-interp-GIL subints, one process can host main + N
  subint-resident actor `trio.run()`s, and `os.fork()` reverts
  to the last-resort spawn (only when OS-level isolation is
  actually needed). Joins the story with the in-thread
  `_subint.py` backend: `subint` → in-process spawn,
  `subint_forkserver` → cross-process when a real OS boundary
  is required.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 18:20:10 -04:00
Gud Boi 3ab99d557a Doc `_subint_forkserver` design + fork semantics
Major expansion of the module docstring. Code is
unchanged; this lands the architectural reasoning that
was previously implicit, plus the POSIX/trio fork
mechanics the design relies on.

New sections:
- "Design rationale" — answers two implicit questions:
  (1) why a forkserver pattern at all (vs. forking
  directly from a trio task), (2) why in-process (vs.
  stdlib `mp.forkserver`'s sidecar process). Documents
  the three costs the in-process design avoids
  (sidecar lifecycle, per-spawn IPC, cold-start child)
  and the tradeoffs we accept in exchange (3.14-only,
  heavier than `to_thread.run_sync`).
- "Implementation status" — clarifies what's actually
  landed today vs. the envisioned arch: parent's
  `trio.run()` still lives on main interp (subint-
  hosted root gated on jcrist/msgspec#1026). Names
  why the "subint" prefix is correct anyway — same PR
  series as `_subint.py` / `_subint_fork.py`.
- "What survives the fork? — POSIX semantics" — POSIX
  preserves only the calling thread, so the
  `trio.run()` thread is gone in the child. Includes
  a small parent/child thread-survival table and
  covers the four artifact classes that DO cross the
  fork boundary (inherited fds, COW memory, Python
  thread state, user-level locks) and how each is
  handled.
- "FYI: how this dodges the `trio.run()` × `fork()`
  hazards" — itemizes each class of trio process-
  global state (wakeup-fd, `epoll`/`kqueue`,
  threadpool, cancel scopes / nurseries, `atexit`,
  foreign-language I/O) and explains how the
  forkserver-thread design avoids each.

Also,
- bump the gated msgspec issue link from
  `jcrist/msgspec#563` to `jcrist/msgspec#1026` (the
  PEP 684 isolated-mode tracker).

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 18:16:50 -04:00
Gud Boi 54561959e6 Log subint bootstrap excs + cancel-leak state
Two diagnostic gaps in `tractor.spawn._subint.subint_proc()` that hid
otherwise-silent failures, plus tracking-issue links on the two open
`subint_forkserver` follow-ups.

Deats,
- bootstrap-exc visibility: wrap the call to
  `_interpreters.exec(interp_id, bootstrap)` with
  `try/except BaseException` + `log.exception(...)`.
  * Without it, an `ImportError` / `SyntaxError` raised inside the
    dedicated driver thread goes only to Python's default thread
    excepthook — invisible to the parent, which then waits forever on
    `subint_exited.wait()`.
  * `?TODO` notes `anyio`'s `to_interpreter._interp_call` +
    `(retval, is_exception)` pattern as the next step for re-raising;
    skipped now bc it must coordinate with the `trio.Cancelled` paths
    around the existing `.wait()` calls.

- cancel-leak disambiguation: when the driver thread doesn't exit within
  `_HARD_KILL_TIMEOUT`, also log `_interpreters.is_running(interp_id)`
  as `subint_still_running=...` so the operator can tell "thread leaked,
  subint already done" apart from "thread alive bc subint is wedged".
  * pattern borrowed from `trio-parallel`'s `_sint.SintWorker.is_alive()`.

- `?TODO` near the `bootstrap` literal: future switch to
  `_interpreters.set___main___attrs()` — same API `anyio`
  uses in `to_interpreter._Worker.call()` — for passing
  non-`repr()`-roundtrippable values (`SpawnSpec` struct, callables,
  etc).
  * adds a cross-ref to tracking issue `#379`.
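
The bootstrap-exc visibility point can be sketched with plain threads. This is not the `subint_proc()` code: `run_in_driver_thread()` is a hypothetical helper, a plain callable stands in for the `_interpreters.exec(...)` call, and the `finally: exited.set()` plus the `errors` list are assumptions added so the waiter in the sketch cannot hang.

```python
import logging
import threading
from typing import Callable

log = logging.getLogger(__name__)


def run_in_driver_thread(
    target: Callable[[], None],
) -> tuple[threading.Event, list[BaseException]]:
    """Run `target` in a thread; log and record anything it raises."""
    exited = threading.Event()
    errors: list[BaseException] = []

    def _drive() -> None:
        try:
            target()
        except BaseException as err:
            # Without this, the error only reaches the default
            # thread excepthook and the waiter never hears of it.
            log.exception('driver thread bootstrap failed')
            errors.append(err)
        finally:
            exited.set()  # always wake whoever waits on exit

    threading.Thread(target=_drive, daemon=True).start()
    return exited, errors
```

The caller waits on the returned event and then inspects `errors` to decide whether to re-raise.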

Also,
- `Tracked at: [#449]` link on
  `subint_forkserver_test_cancellation_leak_issue.md`.
- `Tracked at: [#450]` link on
  `subint_forkserver_thread_constraints_on_pep684_issue.md`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 15:57:55 -04:00
Gud Boi 4f12d69b41 Add `--shm` orphan sweep to `tractor-reap`
Since `tractor.ipc._mp_bs.disable_mantracker()` turns off
`mp.resource_tracker` entirely (see the conc-anal doc
`subint_forkserver_mp_shared_memory_issue.md`), a
hard-crashing actor can leave `/dev/shm/<key>` segments
that nothing else GCs. New `tractor-reap` phase 2 sweeps
them.

Deats,
- `tractor/_testing/_reap.py`: add `find_orphaned_shm()`
  + `reap_shm()` helpers. Match criteria: regular file
  under `/dev/shm`, owned by current uid, AND no live
  proc has it open (mmap'd or fd-held). In-use
  enumeration via `psutil.Process.memory_maps()` +
  `.open_files()` — xplatform, kernel-canonical (same
  answer `lsof` would give), no reliance on
  tractor-specific shm-key naming.
- `_ensure_shm_supported()` guard: helpers raise
  `NotImplementedError` outside Linux/FreeBSD bc macOS
  POSIX shm has no fs-visible path (`shm_open` only)
  and Windows is a different story.
- `scripts/tractor-reap`: new `--shm` (run after
  process reap) and `--shm-only` (skip process phase)
  flags. `-n` dry-runs both phases. Exit code is `1`
  if either phase had survivors/errors.
- `pyproject.toml` + `uv.lock`: add `psutil>=7.0.0` to
  the `testing` dep group; lazy-imported in `_reap.py`
  so the process-reap path stays import-clean without
  it.
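
The three match criteria can be sketched with the stdlib alone, assuming the set of in-use paths has already been collected (the commit gathers it via `psutil`). `find_orphaned()` and its `in_use` argument are hypothetical, not the real `find_orphaned_shm()` signature.

```python
import os
import stat


def find_orphaned(shm_dir: str, in_use: set[str]) -> list[str]:
    """Paths under `shm_dir` that match all three criteria."""
    uid = os.getuid()
    orphans = []
    for entry in os.scandir(shm_dir):
        st = entry.stat(follow_symlinks=False)
        if not stat.S_ISREG(st.st_mode):
            continue  # only regular files
        if st.st_uid != uid:
            continue  # only segments we own
        if entry.path in in_use:
            continue  # some live process still maps or holds it
        orphans.append(entry.path)
    return sorted(orphans)
```

Because the filter never looks at file names, segments belonging to unrelated applications are skipped purely by being in use or owned by someone else.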

Also,
- doc `--shm` in `.claude/skills/run-tests/SKILL.md`
  (new section 10c) — covers match criteria + the
  preservation guarantee for unrelated apps.
- flip mitigation status in
  `subint_forkserver_mp_shared_memory_issue.md` from
  "could extend `tractor-reap`" to "implemented", with
  a note that callers should still UUID-pin shm keys to
  avoid cross-session collisions.

Verified locally vs 81 in-use segments held by `piker`,
`lttng-ust-*`, `aja-shm-*` — all preserved; only the
genuinely-orphaned tractor segments got unlinked.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 11:35:33 -04:00
Gud Boi aa3e230926 Fix `SharedMemory` under `subint_forkserver`
Implements the resolution described in c99d475d's
`subint_forkserver_mp_shared_memory_issue.md` (now
updated with the resolution post-mortem). Two-part
fix that side-steps `mp.resource_tracker` entirely
rather than try to make it fork-safe — turns out
that's both simpler AND more correct given tractor
already SC-manages allocation lifetimes.

Deats,
- `tractor/ipc/_mp_bs.py::disable_mantracker()`: drop the
  `platform.python_version_tuple()[:-1] >= ('3', '13')` branch — patches
  now run unconditionally:
  * monkey-patch `mp.resource_tracker._resource_tracker` to a no-op
    `ManTracker` subclass (empty `register` / `unregister`
    / `ensure_running`).
  * return `partial(SharedMemory, track=False)` for the per-allocation
    opt-out.
  * belt + suspenders: even if something dodges the wrapper, the
    singleton can't talk to the inherited (broken) parent fd.

- `tractor/ipc/_shm.py::open_shm_list()`: drop the 3.13+ conditional
  skip of the unlink-callback; install a `try_unlink()` wrapper that
  swallows `FileNotFoundError` (sibling-already-cleaned race in
  shared-key setups). Without `mp.resource_tracker` doing it for us, we
  own the unlink; `actor.lifetime_stack` is the right place since
  tractor already controls actor lifecycle.

- `tests/test_shm.py`: drop `subint_forkserver` from the
  module-level skip-list (tests pass now). Inline comment cross-refs
  the two `_mp_bs` / `_shm` workarounds.

- `ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`: heavy
  rewrite — flips status from "open / unresolvable in tractor" to
  "resolved, kept as decision record". Adds Resolution section, "Why
  this is the right call" rationale (mp tracker is widely criticized;
  tractor already owns lifecycle), trade-offs (crash-leaked segments,
  lost mp leak warning), verification (7 passed under both
  `subint_forkserver` and `trio` backends), and upstream issue links.
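
A minimal sketch of the `try_unlink()` idea, under the assumption that the unlink is registered as a teardown callback. `contextlib.ExitStack` stands in here for the actor's lifetime stack, `os.unlink` on a temp file stands in for the shm unlink, and `register_cleanup()` is a hypothetical helper.

```python
import os
from contextlib import ExitStack
from typing import Callable


def try_unlink(unlink: Callable[[], None]) -> None:
    """Call `unlink`, tolerating a sibling having removed it first."""
    try:
        unlink()
    except FileNotFoundError:
        pass  # already cleaned up elsewhere: nothing left to do


def register_cleanup(stack: ExitStack, path: str) -> None:
    # Stand-in for pushing the callback onto an actor-lifetime stack.
    stack.callback(try_unlink, lambda: os.unlink(path))
```

Registering the same path twice mimics the shared-key race: the second callback finds the file gone and returns quietly instead of failing teardown.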

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-27 10:51:28 -04:00
Gud Boi eae478f3d5 Add `_testing._reap` + auto-reap fixture
Zombie-subactor cleanup for the test suite, SC-polite discipline
(`SIGINT` first, bounded grace, `SIGKILL` only on survivors). Two parts:
a shared reaper module + an autouse session-end fixture that runs it.

Deats,
- new `tractor/_testing/_reap.py` (+230 LOC): Linux-only reaper using
  `/proc/<pid>/{status,cwd,cmdline}` inspection. Two detection modes:
  - `find_descendants(parent_pid)` for the in-session case
    (PPid-direct-match while pytest is still alive).
  - `find_orphans(repo_root)` for the CLI / post-mortem case (`PPid==1`
    reparented to init + `cwd` filter to repo root + `python` cmdline
    filter).
- `reap(pids, *, grace=3.0, poll=0.25)` does the signal ladder: SIGINT
  all, poll up to `grace` for exit, SIGKILL any survivors. Returns
  `(signalled, killed)` for caller-side reporting.
- new `_reap_orphaned_subactors` session-scoped autouse fixture in
  `tractor/_testing/pytest.py` — after `yield`, runs
  `find_descendants(os.getpid())` + `reap(...)` so each pytest session
  leaves no surviving forks.
- companion CLI scaffolding lives at `scripts/tractor-reap` (separate
  commit) for the pytest-died-mid-session case where the in-session
  fixture didn't get to run.
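
The signal ladder can be sketched as follows. The `reap()` signature and return shape follow the description above, but the body is an illustration rather than the `_reap.py` code; in particular the `_alive()` probe (`os.waitpid` plus `os.kill(pid, 0)`) is an assumption standing in for the `/proc` inspection.

```python
import os
import signal
import time


def _alive(pid: int) -> bool:
    """Best-effort liveness probe (assumed, not the /proc-based one)."""
    try:
        # If it is our own child, collect it so zombies count as dead.
        done, _ = os.waitpid(pid, os.WNOHANG)
        if done == pid:
            return False
    except ChildProcessError:
        pass  # not our child: fall through to the signal-0 probe
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def reap(
    pids: list[int], *, grace: float = 3.0, poll: float = 0.25,
) -> tuple[list[int], list[int]]:
    signalled: list[int] = []
    for pid in pids:
        try:
            os.kill(pid, signal.SIGINT)  # polite request first
            signalled.append(pid)
        except ProcessLookupError:
            continue  # already gone
    deadline = time.monotonic() + grace
    survivors = [p for p in signalled if _alive(p)]
    while survivors and time.monotonic() < deadline:
        time.sleep(poll)
        survivors = [p for p in survivors if _alive(p)]
    killed: list[int] = []
    for pid in survivors:
        try:
            os.kill(pid, signal.SIGKILL)  # last resort
            killed.append(pid)
        except ProcessLookupError:
            pass
    return signalled, killed
```

Returning both lists lets the caller report how many processes needed the hard kill, which is a useful hint that some teardown path is not responding to cancellation.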

Also,
- promote `from tractor.spawn._spawn import SpawnMethodKey` to
  module-top in `pytest.py` (was inline-imported inside
  `pytest_generate_tests`), and reuse it in
  `pytest_collection_modifyitems` to assert each `skipon_spawn_backend`
  mark arg is a valid spawn-method literal — catches typos at collection
  time.
- inline `# ?TODO` flags running these through the `try_set_backend`
  checker for stronger validation.

Cross-refs `feedback_sc_graceful_cancel_first.md` for the
SIGINT-before-SIGKILL discipline rationale.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-25 00:05:58 -04:00
Gud Boi 4c133ab541 Default `pytest` to use `--capture=sys`
Lands the capture-pipe workaround from the prior cluster of diagnosis
commits: switch pytest's `--capture` mode from the default `fd`
(redirects fd 1,2 to temp files, which fork children inherit and can
deadlock writing into) to `sys` (only `sys.stdout` / `sys.stderr` — fd
1,2 left alone).

Trade-off documented inline in `pyproject.toml`:
- LOST: per-test attribution of raw-fd output (C-ext writes,
  `os.write(2, ...)`, subproc stdout). Still goes to terminal / CI
  capture, just not per-test-scoped in the failure report.
- KEPT: `print()` + `logging` capture per-test (tractor's logger uses
  `sys.stderr`).
- KEPT: `pytest -s` debugging behavior.

This allows us to re-enable `test_nested_multierrors` without
skip-marking + clears the class of pytest-capture-induced hangs for any
future fork-based backend tests.

Deats,
- `pyproject.toml`: `'--capture=sys'` added to `addopts` w/ ~20 lines of
  rationale comment cross-ref'ing the post-mortem doc

- `test_cancellation`: drop `skipon_spawn_backend('subint_forkserver')`
  from `test_nested_multierrors`, no longer needed.
  * file-level `pytestmark` covers any residual.

- `tests/spawn/test_subint_forkserver.py`: orphan-SIGINT test's xfail
  mark loosened from `strict=True` to `strict=False` + reason rewritten.
  * it passes in isolation but is session-env-pollution sensitive
    (leftover subactor PIDs competing for ports / inheriting harness
    FDs).
  * tolerate both outcomes until suite isolation improves.

- `test_shm`: extend the existing
  `skipon_spawn_backend('subint', ...)` to also skip
  `'subint_forkserver'`.
  * Different root cause from the cancel-cascade class:
    `multiprocessing.SharedMemory`'s `resource_tracker` + internals
    assume fresh-process state, don't survive fork-without-exec cleanly.

- `tests/discovery/test_registrar.py`: bump timeout 3→7s on one test
  (unrelated to forkserver; just a flaky-under-load bump).

- `tractor.spawn._subint_forkserver`: inline comment-only future-work
  marker right before `_actor_child_main()` describing the planned
  conditional stdout/stderr-to-`/dev/null` redirect for cases where
  `--capture=sys` isn't enough (no code change — the redirect logic
  itself is deferred).

EXTRA NOTEs
-----------
The `--capture=sys` approach is the least-invasive fix: just a pytest
ini change, no runtime code change, works for all fork-based backends,
trade-offs well-understood (terminal-level capture still happens, just
not pytest's per-test attribution of raw-fd output).

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-24 14:17:23 -04:00
Gud Boi e312a68d8a Bound peer-clear wait in `async_main` finally
Fifth diagnostic pass pinpointed the hang to
`async_main`'s finally block — every stuck actor
reaches `FINALLY ENTER` but never `RETURNING`.
Specifically `await ipc_server.wait_for_no_more_peers()`
never returns when a peer-channel handler
is stuck: the `_no_more_peers` Event is set only
when `server._peers` empties, and stuck handlers
keep their channels registered.

Wrap the call in `trio.move_on_after(3.0)` + a
warning-log on timeout that records the still-
connected peer count. 3s is enough for any
graceful cancel-ack round-trip; beyond that we're
in bug territory and need to proceed with local
teardown so the parent's `_ForkedProc.wait()` can
unblock. Defensive-in-depth regardless of the
underlying bug — a local finally shouldn't block
on remote cooperation forever.

Verified: with this fix, ALL 15 actors reach
`async_main: RETURNING` (up from 10/15 before).

Test still hangs past 45s though — there's at
least one MORE unbounded wait downstream of
`async_main`. Candidates enumerated in the doc
update (`open_root_actor` finally /
`actor.cancel()` internals / trio.run bg tasks /
`_serve_ipc_eps` finally). Skip-mark stays on
`test_nested_multierrors[subint_forkserver]`.

Also updates
`subint_forkserver_test_cancellation_leak_issue.md`
with the new pinpoint + summary of the 6-item
investigation win list:
1. FD hygiene fix (`_close_inherited_fds`) —
   orphan-SIGINT closed
2. pidfd-based `_ForkedProc.wait` — cancellable
3. `_parent_chan_cs` wiring — shielded parent-chan
   loop now breakable
4. `wait_for_no_more_peers` bound — THIS commit
5. Ruled-out hypotheses: tree-kill missing, stuck
   socket recv, capture-pipe fill (all wrong)
6. Remaining unknown: at least one more unbounded
   wait in the teardown cascade above `async_main`

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 22:34:49 -04:00
Gud Boi 458a35cf09 Surface silent failures in `_subint_forkserver`
Three places that previously swallowed exceptions silently now log via
`log.exception()` so they surface in the runtime log when something
weird happens — easier to track down sneaky failures in the
fork-from-worker-thread / subint-bootstrap primitives.

Deats,
- `_close_inherited_fds()`: post-fork child's per-fd `os.close()`
  swallow now logs the fd that failed to close. The comment notes the
  expected failure modes (already-closed-via-listdir-race,
  otherwise-unclosable) — both still fine to ignore semantically, but
  worth flagging in the log.
- `fork_from_worker_thread()` parent-side timeout branch: the
  `os.close(rfd)` + `os.close(wfd)` cleanup now logs each pipe-fd close
  failure separately before raising the `worker thread didn't return`
  RuntimeError.
- `run_subint_in_worker_thread._drive()`: when
  `_interpreters.exec(interp_id, bootstrap)` raises a `BaseException`,
  log the full call signature (interp_id + bootstrap) along with the
  captured exception, before stashing into `err` for the outer caller.

Behavior unchanged — only adds observability.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 8ac3dfeb85 Break parent-chan shield during teardown
Completes the nested-cancel deadlock fix started in
0cd0b633 (fork-child FD scrub) and fe540d02 (pidfd-
cancellable wait). The remaining piece: the parent-
channel `process_messages` loop runs under
`shield=True` (so normal cancel cascades don't kill
it prematurely), and relies on EOF arriving when the
parent closes the socket to exit naturally.

Under exec-spawn backends (`trio_proc`, mp) that EOF
arrival is reliable — parent's teardown closes the
handler-task socket deterministically. But fork-
based backends like `subint_forkserver` share enough
process-image state that EOF delivery becomes racy:
the loop parks waiting for an EOF that only arrives
after the parent finishes its own teardown, but the
parent is itself blocked on `os.waitpid()` for THIS
actor's exit. Mutual wait → deadlock.

Deats,
- `async_main` stashes the cancel-scope returned by
  `root_tn.start(...)` for the parent-chan
  `process_messages` task onto the actor as
  `_parent_chan_cs`
- `Actor.cancel()`'s teardown path (after
  `ipc_server.cancel()` + `wait_for_shutdown()`)
  calls `self._parent_chan_cs.cancel()` to
  explicitly break the shield — no more waiting for
  EOF delivery, unwinding proceeds deterministically
  regardless of backend
- inline comments on both sites explain the mutual-
  wait deadlock + why the explicit cancel is
  backend-agnostic rather than a forkserver-specific
  workaround

With this + the prior two fixes, the
`subint_forkserver` nested-cancel cascade unwinds
cleanly end-to-end.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi c20b05e181 Use `pidfd` for cancellable `_ForkedProc.wait`
Two coordinated improvements to the `subint_forkserver` backend:

1. Replace `trio.to_thread.run_sync(os.waitpid, ...,
   abandon_on_cancel=False)` in `_ForkedProc.wait()`
   with `trio.lowlevel.wait_readable(pidfd)`. The
   prior version blocked a trio cache thread on a
   sync syscall — outer cancel scopes couldn't
   unwedge it when something downstream got stuck.
   Same pattern `trio.Process.wait()` and
   `proc_waiter` (the mp backend) already use.

2. Drop the `@pytest.mark.xfail(strict=True)` from
   `test_orphaned_subactor_sigint_cleanup_DRAFT` —
   the test now PASSES after 0cd0b633 (fork-child
   FD scrub). Same root cause as the nested-cancel
   hang: inherited IPC/trio FDs were poisoning the
   child's event loop. Closing them lets SIGINT
   propagation work as designed.

Deats,
- `_ForkedProc.__init__` opens a pidfd via
  `os.pidfd_open(pid)` (Linux 5.3+, Python 3.9+)
- `wait()` parks on `trio.lowlevel.wait_readable()`,
  then non-blocking `waitpid(WNOHANG)` to collect
  the exit status (correct since the pidfd signal
  IS the child-exit notification)
- `ChildProcessError` swallow handles the rare race
  where someone else reaps first
- pidfd closed after `wait()` completes (one-shot
  semantics) + `__del__` belt-and-braces for
  unexpected-teardown paths
- test docstring's `@xfail` block replaced with a
  `# NOTE` comment explaining the historical
  context + cross-ref to the conc-anal doc; test
  remains in place as a regression guard

The two changes are interdependent — the
cancellable `wait()` matters for the same nested-
cancel scenarios the FD scrub fixes, since the
original deadlock had trio cache workers wedged in
`os.waitpid` swallowing the outer cancel.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 9993db0193 Scrub inherited FDs in fork-child prelude
Implements fix-direction (1)/blunt-close-all-FDs from
b71705bd (`subint_forkserver` nested-cancel hang
diag), targeting the multi-level cancel-cascade
deadlock in
`test_nested_multierrors[subint_forkserver]`.

The diagnosis doc voted for surgical FD cleanup via
`actor.ipc_server` handle as the cleanest approach,
but going blunt is actually the right call: after
`os.fork()`, the child immediately enters
`_actor_child_main()` which opens its OWN IPC
sockets / wakeup-fd / epoll-fd / etc. — none of the
parent's FDs are needed. Closing everything except
stdio is safe AND defends against future
listener/IPC additions to the parent inheriting
silently into children.

Deats,
- new `_close_inherited_fds(keep={0,1,2}) -> int`
  helper. Linux fast-path enumerates `/proc/self/fd`;
  POSIX fallback uses `RLIMIT_NOFILE` range. Matches
  the stdlib `subprocess._posixsubprocess.close_fds`
  strategy. Returns close-count for sanity logging
- wire into `fork_from_worker_thread._worker()`'s
  post-fork child prelude — runs immediately after
  the pid-pipe `os.close(rfd/wfd)`, before the user
  `child_target` callable executes
- docstring cross-refs the diagnosis doc + spells
  out the FD-inheritance-cascade mechanism and why
  the close-all approach is safe for our spawn shape

Validation pending: re-run `test_nested_multierrors[subint_forkserver]`
to confirm the deadlock is gone.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 1e357dcf08 Mv `test_subint_cancellation.py` to `tests/spawn/` subpkg
Also, some slight touchups in `.spawn._subint`.
2026-04-23 18:48:34 -04:00
Gud Boi e31eb8d7c9 Label forkserver child as `subint_forkserver`
Follow-up to 72d1b901 (was prev commit adding `debug_mode` for
`subint_forkserver`): that commit wired the runtime-side
`subint_forkserver` SpawnSpec-recv gate in `Actor._from_parent`, but the
`subint_forkserver_proc` child-target was still passing
`spawn_method='trio'` to `_trio_main` — so `Actor.pformat()` / log lines
would report the subactor as plain `'trio'` instead of the actual
parent-side spawn mechanism. Flip the label to `'subint_forkserver'`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 8bcbe730bf Enable `debug_mode` for `subint_forkserver`
The `subint_forkserver` backend's child runtime is trio-native (uses
`_trio_main` + receives `SpawnSpec` over IPC just like `trio`/`subint`),
so `tractor.devx.debug._tty_lock` works in those subactors. Wire the
runtime gates that historically hard-coded `_spawn_method == 'trio'` to
recognize this third backend.

Deats,
- new `_DEBUG_COMPATIBLE_BACKENDS` module-const in `tractor._root`
  listing the spawn backends whose subactor runtime is trio-native
  (`'trio'`, `'subint_forkserver'`). Both the enable-site
  (`_runtime_vars['_debug_mode'] = True`) and the cleanup-site reset
  key off the same tuple — keep them in lockstep when adding backends
- `open_root_actor`'s `RuntimeError` for unsupported backends now
  reports the full compatible-set + the rejected method instead of the
  stale "only `trio`" msg.
- `runtime._runtime.Actor._from_parent`'s SpawnSpec-recv gate adds
  `'subint_forkserver'` to the existing `('trio', 'subint')` tuple
  — fork child-side runtime receives the same SpawnSpec IPC handshake as
  the others.
- `subint_forkserver_proc` child-target now passes
  `spawn_method='subint_forkserver'` (was hard-coded `'trio'`) so
  `Actor.pformat()` / log lines reflect the actual parent-side spawn
  mechanism rather than masquerading as plain `trio`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 5e85f184e0 Drop unneeded f-str prefixes 2026-04-23 18:48:34 -04:00
Gud Boi a72deef709 Refine `subint_forkserver` orphan-SIGINT diagnosis
Empirical follow-up to the xfail'd orphan-SIGINT test:
the hang is **not** "trio can't install a handler on a
non-main thread" (the original hypothesis from the
`child_sigint` scaffold commit). On py3.14:

- `threading.current_thread() is threading.main_thread()`
  IS True post-fork — CPython re-designates the
  fork-inheriting thread as "main" correctly
- trio's `KIManager` SIGINT handler IS installed in the
  subactor (`signal.getsignal(SIGINT)` confirms)
- the kernel DOES deliver SIGINT to the thread

But `faulthandler` dumps show the subactor wedged in
`trio/_core/_io_epoll.py::get_events` — trio's
wakeup-fd mechanism (which turns SIGINT into an epoll-wake)
isn't firing. So the `except KeyboardInterrupt` at
`tractor/spawn/_entry.py::_trio_main:164` — the runtime's
intentional "KBI-as-OS-cancel" path — never fires.

Deats,
- new `ai/conc-anal/subint_forkserver_orphan_sigint_hang_issue.md`
  (+385 LOC): full writeup — TL;DR, symptom reproducer,
  the "intentional cancel path" the bug defeats,
  diagnostic evidence (`faulthandler` output +
  `getsignal` probe), ruled-out hypotheses
  (non-main-thread issue, wakeup-fd inheritance,
  KBI-as-trio-check-exception), and fix directions
- `test_orphaned_subactor_sigint_cleanup_DRAFT` xfail
  `reason` + test docstring rewritten to match the
  refined understanding — old wording blamed the
  non-main-thread path, new wording points at the
  `epoll_wait` wedge + cross-refs the new conc-anal doc
- `_subint_forkserver` module docstring's
  `child_sigint='trio'` bullet updated: now notes trio's
  handler is already correctly installed, so the flag may
  end up a no-op / doc-only mode once the real root cause
  is fixed

Closing the gap aligns with existing design intent (make
the already-designed "KBI-as-OS-cancel" behavior actually
fire), not a new feature.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi dcd5c1ff40 Scaffold `child_sigint` modes for forkserver
Add configuration surface for future child-side SIGINT
plumbing in `subint_forkserver_proc` without wiring up the
actual trio-native SIGINT bridge — lifting one entry-guard
clause will flip the `'trio'` branch live once the
underlying fork-prelude plumbing is implemented.

Deats,
- new `ChildSigintMode = Literal['ipc', 'trio']` type +
  `_DEFAULT_CHILD_SIGINT = 'ipc'` module-level default.
  Docstring block enumerates both:
  - `'ipc'` (default, currently the only implemented mode):
    no child-side SIGINT handler — `trio.run()` is on the
    fork-inherited non-main thread where
    `signal.set_wakeup_fd()` is main-thread-only, so
    cancellation flows exclusively via the parent's
    `Portal.cancel_actor()` IPC path. Known gap: orphan
    children don't respond to SIGINT
    (`test_orphaned_subactor_sigint_cleanup_DRAFT`)
  - `'trio'` (scaffolded only): manual SIGINT → trio-cancel
    bridge in the fork-child prelude so external Ctrl-C
    reaches stuck grandchildren even w/ a dead parent
- `subint_forkserver_proc` pulls `child_sigint` out of
  `proc_kwargs` (matches how `trio_proc` threads config to
  `open_process`, keeps `start_actor(proc_kwargs=...)` as
  the ergonomic entry point); validates membership + raises
  `NotImplementedError` for `'trio'` at the backend-entry
  guard
- `_child_target` grows a `match child_sigint:` arm that
  slots in the future `'trio'` impl without restructuring
  — today only the `'ipc'` case is reachable
- module docstring "Still-open work" list grows a bullet
  pointing at this config + the xfail'd orphan-SIGINT test
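
The config surface above could look roughly like the following sketch. `ChildSigintMode` and `_DEFAULT_CHILD_SIGINT` come from the commit; the helper name `pop_child_sigint` and the error wording are hypothetical.

```python
from __future__ import annotations

from typing import Any, Literal, get_args

ChildSigintMode = Literal['ipc', 'trio']
_DEFAULT_CHILD_SIGINT: ChildSigintMode = 'ipc'


def pop_child_sigint(proc_kwargs: dict[str, Any]) -> ChildSigintMode:
    '''
    Pull `child_sigint` out of `proc_kwargs`, validate it and
    reject modes that are only scaffolded.

    '''
    mode = proc_kwargs.pop('child_sigint', _DEFAULT_CHILD_SIGINT)
    if mode not in get_args(ChildSigintMode):
        raise ValueError(
            f'Invalid child_sigint={mode!r}, '
            f'expected one of {get_args(ChildSigintMode)}'
        )
    if mode == 'trio':
        # Entry guard: remove this clause once the fork-prelude
        # SIGINT bridge actually exists.
        raise NotImplementedError(
            "child_sigint='trio' is scaffolded but not implemented"
        )
    return mode
```

Validating against `get_args()` keeps the runtime check in sync with the `Literal` type, so adding a mode only needs one edit.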

No behavioral change on the default path — `'ipc'` is the
existing flow. Scaffolding only.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 7804a9feac Refactor `_runtime_vars` into pure get/set API
Post-fork `_runtime_vars` reset in `subint_forkserver_proc`
was previously done via direct mutation of
`_state._runtime_vars` from an external module + an inline
default dict duplicating the `_state.py`-internal defaults.
Split the access surface into a pure getter + explicit
setter so the reset call site becomes a one-liner
composition.

Deats `tractor/runtime/_state.py`,
- extract initial values into a module-level
  `_RUNTIME_VARS_DEFAULTS: dict[str, Any]` constant; the
  live `_runtime_vars` is now initialised from
  `dict(_RUNTIME_VARS_DEFAULTS)`
- `get_runtime_vars()` grows a `clear_values: bool = False`
  kwarg. When True, returns a fresh copy of
  `_RUNTIME_VARS_DEFAULTS` instead of the live dict —
  still a **pure read**, never mutates anything
- new `set_runtime_vars(rtvars: dict | RuntimeVars)` —
  atomic replacement of the live dict's contents via
  `.clear()` + `.update()`, so existing references to the
  same dict object remain valid. Accepts either the
  historical dict form or the `RuntimeVars` struct

Deats `tractor/spawn/_subint_forkserver.py`,
- collapse the prior ad-hoc `.update({...})` block into
  `set_runtime_vars(get_runtime_vars(clear_values=True))`
- drop the `_state._current_actor = None` line —
  `_trio_main` unconditionally overwrites it downstream,
  so no explicit reset needed (noted in the XXX comment)

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 63ab7c986b Reset post-fork `_state` in forkserver child
`os.fork()` inherits the parent's entire memory image,
including `tractor.runtime._state` globals that encode
"this process is the root actor" — `_runtime_vars`'s
`_is_root=True`, pre-populated `_root_mailbox` +
`_registry_addrs`, and the parent's `_current_actor`
singleton.

A fresh `exec`-based child starts with those globals at
their module-level defaults (all falsey/empty). The
forkserver child needs to match that shape BEFORE calling
`_actor_child_main()`, otherwise `Actor.__init__()` takes
the `is_root_process() == True` branch and pre-populates
`self.enable_modules`, which then trips
`assert not self.enable_modules` at the top of
`Actor._from_parent()` on the subsequent parent→child
`SpawnSpec` handshake.

Fix: at the start of `_child_target`, null
`_state._current_actor` and overwrite `_runtime_vars` with
a cold-root blank (`_is_root=False`, empty mailbox/addrs,
`_debug_mode=False`) before `_actor_child_main()` runs.

Found-via: `test_subint_forkserver_spawn_basic` hitting
the `enable_modules` assert on child-side runtime boot.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 26914fde75 Wire `subint_forkserver` as first-class backend
Promote `_subint_forkserver` from primitives-only into a
registered spawn backend: `'subint_forkserver'` is now a
`SpawnMethodKey` literal, dispatched via `_methods` to
the new `subint_forkserver_proc()` target, feature-gated
under the existing `subint`-family py3.14+ case, and
selectable via `--spawn-backend=subint_forkserver`.

Deats,
- new `subint_forkserver_proc()` spawn target in
  `_subint_forkserver`:
  - mirrors `trio_proc()`'s supervision model — real OS
    subprocess so `Portal.cancel_actor()` + `soft_kill()`
    on graceful teardown, `os.kill(SIGKILL)` on hard-reap
    (no `_interpreters.destroy()` race to fuss over bc the
    child lives in its own process)
  - only real diff from `trio_proc` is the spawn mechanism:
    fork from a main-interp worker thread via
    `fork_from_worker_thread()` (off-loaded to trio's
    thread pool) instead of `trio.lowlevel.open_process()`
  - child-side `_child_target` closure runs
    `tractor._child._actor_child_main()` with
    `spawn_method='trio'` — the child is a regular trio
    actor, "subint_forkserver" names how the parent
    spawned, not what the child runs
- new `_ForkedProc` class — thin `trio.Process`-compatible
  shim around a raw OS pid: `.poll()` via
  `waitpid(WNOHANG)`, async `.wait()` off-loaded to a trio
  cache thread, `.kill()` via `SIGKILL`, `.returncode`
  cached for repeat calls. `.stdin`/`.stdout`/`.stderr`
  are `None` (fork-w/o-exec inherits parent FDs; we don't
  marshal them) which matches `soft_kill()`'s `is not None`
  guards

Also, new backend-tier test
`test_subint_forkserver_spawn_basic` drives the registered
backend end-to-end via `open_root_actor` + `open_nursery` +
`run_in_actor` w/ a trivial portal-RPC round-trip. Uses a
`forkserver_spawn_method` fixture to flip
`_spawn_method`/`_ctx` for the test's duration + restore on
teardown (so other session-level tests don't observe the
global flip). Test module docstring reworked to describe
the three tiers now covered: (1) primitive-level, (2)
parent-trio-driven primitives, (3) full registered backend.

Status: still-open work (tracked on `tractor#379`) doc'd
inline in the module docstring — no cancel/hard-kill stress
coverage yet, child-side subint-hosted root runtime still
future (gated on `msgspec#563`), thread-hygiene audit
pending the same unblock.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 25e400d526 Add trio-parent tests for `_subint_forkserver`
New pytest module `tests/spawn/test_subint_forkserver.py`
drives the forkserver primitives from inside a real
`trio.run()` in the parent — the runtime shape tractor will
actually use when we wire up a `subint_forkserver` spawn
backend proper. Complements the standalone no-trio-in-parent
`ai/conc-anal/subint_fork_from_main_thread_smoketest.py`.

Deats,
- new test pkg `tests/spawn/` (+ empty `__init__.py`)
- two tests, both `@pytest.mark.timeout(30, method='thread')`
  for the GIL-hostage safety reason doc'd in
  `ai/conc-anal/subint_sigint_starvation_issue.md`:
  - `test_fork_from_worker_thread_via_trio` — parent-side
    plumbing baseline. `trio.run()` off-loads forkserver
    prims via `trio.to_thread.run_sync()` + asserts the
    child reaps cleanly
  - `test_fork_and_run_trio_in_child` — end-to-end: forked
    child calls `run_subint_in_worker_thread()` with a
    bootstrap str that does `trio.run()` in a fresh subint
- both tests wrap the inner `trio.run()` in a
  `dump_on_hang()` for post-mortem if the outer
  `pytest-timeout` fires
- intentionally NOT using `--spawn-backend` — the tests
  drive the primitives directly rather than going through
  tractor's spawn-method registry (which the forkserver
  isn't plugged into yet)

Also, rename `run_trio_in_subint()` →
`run_subint_in_worker_thread()` for naming consistency with
the sibling `fork_from_worker_thread()`. The action is really
"host a subint on a worker thread", not specifically "run
trio" — trio just happens to be the typical payload.
Propagate the rename to the smoketest.

Further, add a "TODO — cleanup gated on msgspec PEP 684
support" section to the `_subint_forkserver` module
docstring: flags the dedicated-`threading.Thread` design as
potentially-revisable once isolated-mode subints are viable
in tractor. Cross-refs `msgspec#563` + `tractor#379` and
points at an audit-plan conc-anal doc we'll add next.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 82332fbceb Lift fork prims into `_subint_forkserver` mod
The smoketest (prior commit) empirically validated the
"fork-from-main-interp-worker-thread" arch on py3.14. Promote
the validated primitives out of the `ai/conc-anal/` smoketest
into `tractor.spawn._subint_forkserver` so they can eventually
be wired into a real "subint forkserver" spawn backend.

Deats,
- new module `tractor/spawn/_subint_forkserver.py` (337 LOC):
  - `fork_from_worker_thread(child_target, thread_name)` —
    spawn a main-interp `threading.Thread`, call `os.fork()`
    from it, shuttle the child pid back to main via a pipe
  - `run_trio_in_subint(bootstrap, ...)` — post-fork helper:
    create a fresh subint + drive `_interpreters.exec()` on
    a dedicated worker thread running the `bootstrap` str
    (typically imports `trio`, defines an async entry, calls
    `trio.run()`)
  - `wait_child(pid, expect_exit_ok)` — `os.waitpid()` +
    pass/fail classification reusable from harness AND the
    eventual real spawn path
- feature-gated py3.14+ via the public
  `concurrent.interpreters` presence check; matches the gate
  in `tractor.spawn._subint`
- module docstring doc's the CPython-block context
  (cross-refs `_subint_fork` stub + the two `conc-anal/`
  docs) and status: EXPERIMENTAL, not yet registered in
  `_spawn._methods`
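
For orientation, the fork-from-worker-thread primitive might be sketched as below, using only the stdlib. It follows the description above (plain `threading.Thread`, `os.fork()` inside it, pid shuttled back over a pipe), but the exit-code convention for `child_target` and the error handling are assumptions of this sketch, not the module's actual behavior.

```python
from __future__ import annotations

import os
import struct
import threading
from typing import Callable, Optional


def fork_from_worker_thread(
    child_target: Callable[[], Optional[int]],
    thread_name: str = 'fork-worker',
) -> int:
    '''
    Fork from a dedicated thread and return the child pid.

    In the child, `child_target()` runs and its return value
    (assumed here to be an exit code) is passed to `os._exit()`.

    '''
    rfd, wfd = os.pipe()

    def _worker() -> None:
        pid = os.fork()
        if pid == 0:
            # Child: drop both pipe ends, run the target, never return.
            os.close(rfd)
            os.close(wfd)
            code = 1
            try:
                code = int(child_target() or 0)
            finally:
                os._exit(code)
        # Parent-side worker thread: report the child pid.
        os.write(wfd, struct.pack('<q', pid))

    thread = threading.Thread(target=_worker, name=thread_name)
    thread.start()
    thread.join()
    os.close(wfd)  # so the read below sees EOF if nothing was written
    try:
        data = os.read(rfd, 8)
    finally:
        os.close(rfd)
    if len(data) != 8:
        raise RuntimeError('worker thread did not report a child pid')
    return struct.unpack('<q', data)[0]
```

Note that Python 3.12+ emits a `DeprecationWarning` when `os.fork()` is called while other threads are running, which is the situation this helper creates on purpose.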

Also, refactor the smoketest
`ai/conc-anal/subint_fork_from_main_thread_smoketest.py` to
import the primitives from the new module rather than inline
its own copies. Keeps the smoketest and the tractor-side
impl in sync as the forkserver design evolves; the smoketest
remains a zero-`tractor`-runtime CPython-level check
(imports ONLY the three primitives, no runtime bring-up).

Status: next step is to drive these from a parent-side
`trio.run()` and hook the returned child pid into the normal
actor-nursery/IPC flow — then register `subint_forkserver`
as a `SpawnMethodKey` in `_spawn.py`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:34 -04:00
Gud Boi 0f48ed2eb9 Doc `subint_fork` as blocked by CPython post-fork
Empirical finding: the WIP `subint_fork_proc` scaffold
landed in `cf0e3e6f` does *not* work on current CPython.
The `fork()` syscall succeeds in the parent, but the
CHILD aborts immediately during
`PyOS_AfterFork_Child()` →
`_PyInterpreterState_DeleteExceptMain()`, which gates
on the current tstate belonging to the main interp —
the child dies with `Fatal Python error: not main
interpreter`.

CPython devs acknowledge the fragility with an in-source
comment (`// Ideally we could guarantee tstate is running
main.`) but expose no user-facing hook to satisfy the
precondition — so the strategy is structurally dead until
upstream changes.

Rather than delete the scaffold, reshape it into a
documented dead-end so the next person with this idea
lands on the reason rather than rediscovering the same
CPython-level refusal.

Deats,
- Move `subint_fork_proc` out of `tractor.spawn._subint`
  into a new `tractor.spawn._subint_fork` dedicated
  module (153 LOC). Module + fn docstrings now describe
  the blockage directly; the fn body is trimmed to a
  `NotImplementedError` pointing at the analysis doc —
  no more dead-code `bootstrap` sketch bloating
  `_subint.py`.
- `_spawn.py`: keep `'subint_fork'` in `SpawnMethodKey`
  + the `_methods` dispatch so
  `--spawn-backend=subint_fork` routes to a clean
  `NotImplementedError` rather than "invalid backend";
  comment calls out the blockage. Collapse the duplicate
  py3.14 feature-gate in `try_set_start_method()` into a
  combined `case 'subint' | 'subint_fork':` arm.
- New 337-line analysis:
  `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
  Annotated walkthrough from the user-visible fatal
  error down to the specific `Modules/posixmodule.c` +
  `Python/pystate.c` source lines enforcing the refusal,
  plus an upstream-report draft.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:06 -04:00
Gud Boi eee79a0357 Add WIP `subint_fork_proc` backend scaffold
Experimental third spawn backend: use a fresh
sub-interpreter purely as a trio-free launchpad from
which to `os.fork()` + exec back into
`python -m tractor._child`. Per issue #379's
"fork()-workaround/hacks" thread.

Intent is to sidestep both,
- the trio+fork hazards hitting `trio_proc` (python-trio/trio#1614 et
  al.), since the forking interp is guaranteed trio-free.

- the shared-GIL abandoned-thread hazards hitting `subint_proc`
  (`ai/conc-anal/subint_sigint_starvation_issue.md`), since we don't
  *stay* in the subint — it only lives long enough to call `os.fork()`

Downstream of the fork+exec, all the existing `trio_proc` plumbing is
reused verbatim: `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal`
yield, soft-kill.

Status: NOT wired up beyond scaffolding. The fn raises
`NotImplementedError` immediately; the `bootstrap` fork/exec string
builder and the `# TODO: orchestrate driver thread` block are kept
in-tree as deliberate dead code so the next iteration starts from
a concrete shape rather than a blank page.

Docstring calls out three open questions that need
empirical validation before wiring this up:
1. Does CPython permit `os.fork()` from a non-main
   legacy subint?
2. Can the child stay fork-without-exec and
   `trio.run()` directly from within the launchpad
   subint?
3. How do `signal.set_wakeup_fd()` handlers and other
   process-global state interact when the forking
   thread is inside a subint?

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:48:06 -04:00
Gud Boi 3b26b59dad Add `skipon_spawn_backend` pytest marker
A reusable `@pytest.mark.skipon_spawn_backend( '<backend>' [, ...],
reason='...')` marker for backend-specific known-hang / -borked cases
— avoids scattering `@pytest.mark.skipif(lambda ...)` branches across
tests that misbehave under a particular `--spawn-backend`.

Deats,
- `pytest_configure()` registers the marker via
  `addinivalue_line('markers', ...)`.
- New `pytest_collection_modifyitems()` hook walks
  each collected item with `item.iter_markers(
  name='skipon_spawn_backend')`, checks whether the
  active `--spawn-backend` appears in `mark.args`, and
  if so injects a concrete `pytest.mark.skip(
  reason=...)`. `iter_markers()` makes the decorator
  work at function, class, or module (`pytestmark =
  [...]`) scope transparently.
- First matching mark wins; default reason is
  `f'Borked on --spawn-backend={backend!r}'` if the
  caller doesn't supply one.
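
The marker-to-skip logic above can be sketched as a small pure helper plus the hook that calls it. `skip_reason_for` is a hypothetical name, and the `pytest` import is deferred into the hook only so the helper can be exercised without pytest installed.

```python
from __future__ import annotations

from typing import Any, Iterable


def skip_reason_for(marks: Iterable[Any], backend: str) -> str | None:
    '''
    Return the skip reason from the first `skipon_spawn_backend`
    mark that names `backend`, else `None`.

    '''
    for mark in marks:
        if backend in mark.args:
            return mark.kwargs.get(
                'reason',
                f'Borked on --spawn-backend={backend!r}',
            )
    return None


def pytest_collection_modifyitems(config, items) -> None:
    import pytest  # deferred: keeps the helper above pytest-free

    backend: str = config.getoption('--spawn-backend')
    for item in items:
        # `iter_markers()` also yields class- and module-level marks.
        reason = skip_reason_for(
            item.iter_markers(name='skipon_spawn_backend'),
            backend,
        )
        if reason is not None:
            item.add_marker(pytest.mark.skip(reason=reason))
```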

Also, tighten type annotations on nearby `pytest`
integration points — `pytest_configure`, `debug_mode`,
`spawn_backend`, `tpt_protos`, `tpt_proto` — now taking
typed `pytest.Config` / `pytest.FixtureRequest` params.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi 34d9d482e4 Raise `subint` floor to py3.14 and split dep-groups
The private `_interpreters` C module ships since 3.13, but that vintage
wedges under our `threading.Thread` + multi-trio usage pattern:
`_interpreters.exec()` silently never makes progress. 3.14 fixes it.
So gate on the presence of the public `concurrent.interpreters` wrapper
(3.14+ only) even tho we still call into the private module at runtime.

Deats,
- `try_set_start_method('subint')` error msg + `_subint` module
  docstring/comments rewritten to document the 3.14 floor and why 3.13
  can't work.
- `_subint._has_subints` gate now imports `concurrent.interpreters` (not
  `_interpreters`) as the version sentinel.
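
A sketch of what such a version sentinel can look like. `has_subints` is a hypothetical function name, and the real `_subint._has_subints` may be computed differently.

```python
def has_subints() -> bool:
    '''
    True when the public `concurrent.interpreters` wrapper
    (added in Python 3.14) is importable.

    '''
    try:
        # Only used as a version sentinel; callers may still go
        # through the private `_interpreters` module at runtime.
        from concurrent import interpreters  # noqa: F401
    except ImportError:
        return False
    return True
```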

Also, reshuffle `pyproject.toml` deps into
per-python-version `[tool.uv.dependency-groups]`:
- `subints` group: `msgspec>=0.21.0`, py>=3.14
- `eventfd` group: `cffi>=1.17.1`, py>=3.13,<3.14
- `sync_pause` group: `greenback`, py>=3.13,<3.14
  (was in `devx`; moved out bc no 3.14 yet)

Bump top-level `msgspec>=0.20.0` too.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi 09466a1e9d Add `._debug_hangs` to `.devx` for hang triage
Bottle up the diagnostic primitives that actually cracked the
silent mid-suite hangs in the `subint` spawn-backend bringup (issue #379)
so the next such session has them on the shelf instead of reinventing
from scratch.

Deats,
- `dump_on_hang(seconds, *, path)` — context manager wrapping
  `faulthandler.dump_traceback_later()`. Critical gotcha baked in:
  dumps go to a *file*, not `sys.stderr`, bc pytest's stderr
  capture silently eats the output and you can spend an hour
  convinced you're looking at the wrong thing
- `track_resource_deltas(label, *, writer)` — context manager
  logging per-block `(threading.active_count(),
  len(_interpreters.list_all()))` deltas; quickly rules out
  leak-accumulation theories when a suite progressively worsens (if
  counts don't grow, it's not a leak, look for a race on shared
  cleanup instead)
- `resource_delta_fixture(*, autouse, writer)` — factory returning
  a `pytest` fixture wrapping `track_resource_deltas` per-test; opt
  in by importing into a `conftest.py`. Kept as a factory (not a
  bare fixture) so callers own `autouse` / `writer` wiring
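
A minimal sketch of the `dump_on_hang` idea, assuming nothing beyond the stdlib `faulthandler` API. The default `path` is a placeholder, and the real helper's signature and defaults may differ.

```python
import faulthandler
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def dump_on_hang(
    seconds: float,
    *,
    path: str = '/tmp/hang_dump.txt',  # placeholder default
) -> Iterator[str]:
    '''
    Dump all thread tracebacks to `path` if the wrapped block
    is still running after `seconds`.

    '''
    # Write to a real file: pytest's capture would otherwise
    # swallow anything sent to `sys.stderr`.
    with open(path, 'w') as f:
        faulthandler.dump_traceback_later(
            seconds,
            repeat=False,
            file=f,
            exit=False,
        )
        try:
            yield path
        finally:
            # Must run before `f` closes; the watchdog holds its fd.
            faulthandler.cancel_dump_traceback_later()
```

Usage is `with dump_on_hang(30, path='/tmp/my_test.dump'): trio.run(main)`, after which the file is empty unless the deadline was hit.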

Also,
- export the three names from `tractor.devx`
- dep-free on py<3.13 (swallows `ImportError` for `_interpreters`)
- link back to the provenance in the module docstring (issue #379 /
  commit `26fb820`)

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi 99541feec7 Bound subint teardown shields with hard-kill timeout
Unbounded `trio.CancelScope(shield=True)` at the
soft-kill and thread-join sites can wedge the parent
trio loop indefinitely when a stuck subint ignores
portal-cancel (e.g. bc the IPC channel is already
broken).

Deats,
- add `_HARD_KILL_TIMEOUT` (3s) module-level const
- wrap both shield sites with
  `trio.move_on_after()` so we abandon a stuck
  subint after the deadline
- flip driver thread to `daemon=True` so proc-exit
  also isn't blocked by a wedged subint
- pass `abandon_on_cancel=True` to
  `trio.to_thread.run_sync(driver_thread.join)`
  — load-bearing for `move_on_after` to actually
  fire
- log warnings when either timeout triggers
- improve `InterpreterError` log msg to explain
  the abandoned-thread scenario

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi 31cbd11a5b Fix subint destroy race via dedicated OS thread
`trio.to_thread.run_sync(_interpreters.exec, ...)` runs `exec()` on
a cached worker thread — and when that thread is returned to the
cache after the subint's `trio.run()` exits, CPython still keeps
the subint's tstate attached to the (now idle) worker. Result: the
teardown `_interpreters.destroy(interp_id)` in the `finally` block
can block the parent's trio loop indefinitely, waiting for a tstate
release that only happens when the worker either picks up a new job
or exits.

Manifested as intermittent mid-suite hangs under
`--spawn-backend=subint` — caught by a
`faulthandler.dump_traceback_later()` showing the main thread stuck
in `_interpreters.destroy()` at `_subint.py:293` with only an idle
trio-cache worker as the other live thread.

Deats,
- drive the subint on a plain `threading.Thread` (not
  `trio.to_thread`) so the OS thread truly exits after
  `_interpreters.exec()` returns, releasing tstate and unblocking
  destroy
- signal `subint_exited.set()` back to the parent trio loop from
  the driver thread via `trio.from_thread.run_sync(...,
  trio_token=...)` — capture the token at `subint_proc` entry
- swallow `trio.RunFinishedError` in that signal path for the case
  where parent trio has already exited (proc teardown)
- in the teardown `finally`, off-load the sync
  `driver_thread.join()` to `trio.to_thread.run_sync` (cache thread
  w/ no subint tstate → safe) so we actually wait for the driver to
  exit before `_interpreters.destroy()`
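
The worker-reuse problem above can be sketched by analogy, assuming
stdlib `ThreadPoolExecutor` as a stand-in for trio's worker-thread
cache (this says nothing about how CPython tracks tstate, it only
shows which thread outlives its job):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def job() -> threading.Thread:
    # report which OS thread actually ran the work
    return threading.current_thread()


# pooled: the worker is parked for reuse once the job is done.
pool = ThreadPoolExecutor(max_workers=1)
pooled_worker: threading.Thread = pool.submit(job).result()
pooled_alive_after_job: bool = pooled_worker.is_alive()

# dedicated: the thread exits as soon as its target returns.
ran_on: list[threading.Thread] = []
dedicated = threading.Thread(
    target=lambda: ran_on.append(job()),
)
dedicated.start()
dedicated.join()
dedicated_alive_after_job: bool = dedicated.is_alive()

pool.shutdown(wait=True)
print(pooled_alive_after_job, dedicated_alive_after_job)
```

Anything pinned to the pooled worker stays pinned until the pool
reuses or retires it, whereas the dedicated thread is gone after
`join()`.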

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi 8a8d01e076 Doc the `_interpreters` private-API choice in `_subint`
Expand the comment block above the `_interpreters`
import explaining *why* we use the private C mod
over `concurrent.interpreters`: the public API only
exposes PEP 734's `'isolated'` config which breaks
`msgspec` (missing PEP 684 slot). Add reference
links to PEP 734, PEP 684, cpython sources, and
the msgspec upstream tracker (jcrist/msgspec#563).

Also,
- update error msgs in both `_spawn.py` and
  `_subint.py` to say "3.13+" (matching the actual
  `_interpreters` availability) instead of "3.14+".
- tweak the mod docstring to reflect py3.13+
  availability via the private C module.

Review: PR #444 (copilot-pull-request-reviewer)
https://github.com/goodboy/tractor/pull/444

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi b8f243e98d Impl min-viable `subint` spawn backend (B.2)
Replace the B.1 scaffold stub w/ a working spawn
flow driving PEP 734 sub-interpreters on dedicated
OS threads.

Deats,
- use private `_interpreters` C mod (not the public
  `concurrent.interpreters` API) to get `'legacy'`
  subint config — avoids PEP 684 C-ext compat
  issues w/ `msgspec` and other deps missing the
  `Py_mod_multiple_interpreters` slot
- bootstrap subint via code-string calling new
  `_actor_child_main()` from `_child.py` (shared
  entry for both CLI and subint backends)
- drive subint lifetime on an OS thread using
  `trio.to_thread.run_sync(_interpreters.exec, ..)`
- full supervision lifecycle mirrors `trio_proc`:
  `ipc_server.wait_for_peer()` → send `SpawnSpec`
  → yield `Portal` via `task_status.started()`
- graceful shutdown awaits the subint's inner
  `trio.run()` completing; cancel path sends
  `portal.cancel_actor()` then waits for thread
  join before `_interpreters.destroy()`

Also,
- extract `_actor_child_main()` from the `_child.py`
  `__main__` block as a callable entry point bc the
  subint needs it for code-string bootstrap
- add `"subint"` to the `_runtime.py` spawn-method
  check so child accepts `SpawnSpec` over IPC

Prompt-IO: ai/prompt-io/claude/20260417T124437Z_5cd6df5_prompt_io.md

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi d2ea8aa2de Handle py3.14+ incompats as test skips
Since we're devving subints we require the 3.14+ stdlib API, which
a couple of compiled libs don't support yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
  stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
  wheeled yet
  * on nixos the sdist build was failing due to lack of `g++` which
    i don't care to figure out rn since we don't need `.devx` stuff
    immediately for this subints prototype.
  * [ ] we still need to adjust any dependent suites to skip.

Adjust `test_ringbuf` to skip on import failure.

Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- bump to `msgspec>=0.20.0`, the first release with py3.14 support.
2026-04-23 18:47:49 -04:00
Gud Boi d318f1f8f4 Add `'subint'` spawn backend scaffold (#379)
Land the scaffolding for a future sub-interpreter (PEP 734
`concurrent.interpreters`) actor spawn backend per issue #379. The
spawn flow itself is not yet implemented; `subint_proc()` raises a
placeholder `NotImplementedError` pointing at the tracking issue —
this commit only wires up the registry, the py-version gate, and
the harness.

Deats,
- bump `pyproject.toml` `requires-python` to `>=3.12, <3.15` and
  list the `3.14` classifier — the new stdlib
  `concurrent.interpreters` module only ships on 3.14+
- extend `SpawnMethodKey = Literal[..., 'subint']`
- `try_set_start_method('subint')` grows a new `match` arm that
  feature-detects the stdlib module and raises `RuntimeError` with
  a clear banner on py<3.14
- `_methods` registers the new `subint_proc()` via the same
  bottom-of-module late-import pattern used for `._trio` / `._mp`

Also,
- new `tractor/spawn/_subint.py` — top-level `try: from concurrent
  import interpreters` guards `_has_subints: bool`; `subint_proc()`
  signature mirrors `trio_proc`/`mp_proc` so the Phase B.2 impl can
  drop in without touching the registry
- re-add `import sys` to `_spawn.py` (needed for the py-version msg
  in the gate-error)
- `_testing.pytest.pytest_configure` wraps `try_set_start_method()`
  in a `pytest.UsageError` handler so `--spawn-backend=subint` on
  py<3.14 prints a clean banner instead of a traceback

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:47:49 -04:00
Gud Boi a7b1ee34ef Restore fn-arg `_runtime_vars` in `trio_proc` teardown
During the Phase A extraction of `trio_proc()` out of
`spawn._spawn` into its own submod, the
`debug.maybe_wait_for_debugger(child_in_debug=...)` call site in
the hard-reap `finally` got refactored from the original
`_runtime_vars.get('_debug_mode', ...)` (the fn parameter — the
dict that was constructed by the *parent* for the *child*'s
`SpawnSpec`) to `get_runtime_vars().get(...)` (a global getter that
returns the *parent's* live `_state`). Those are semantically
different — the first asks "is the child we just spawned in debug
mode?", the second asks "are *we* in debug mode?". Under
mixed-debug-mode trees the swap can incorrectly skip (or
unnecessarily delay) the debugger-lock wait during teardown.

Revert to the fn-parameter lookup and add an inline `NOTE` comment
calling out the distinction so it's harder to regress again.

Deats,
- `spawn/_trio.py`: `child_in_debug=get_runtime_vars().get(...)` →
  `child_in_debug=_runtime_vars.get(...)` at the
  `debug.maybe_wait_for_debugger(...)` call in the hard-reap block;
  add 4-line `NOTE` explaining the parent-vs-child distinction.
- `spawn/__init__.py`: drop trailing whitespace after the
  `'mp_forkserver'` docstring bullet.
- `ai/prompt-io/prompts/subints_spawner.md`: drop duplicated `with`
  in `"as with with subprocs"` prose (copilot grammar catch).

Review: PR #444 (Copilot)
https://github.com/goodboy/tractor/pull/444#pullrequestreview-4165928469

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:30:11 -04:00
Gud Boi f75865fb2e Tidy `spawn/` subpkg docstrings and imports
Drop unused `TYPE_CHECKING` imports (`Channel`,
`_server`), remove commented-out `import os` in
`_entry.py`, and use `get_runtime_vars()` accessor
instead of bare `_runtime_vars` in `_trio.py`.

Also,
- freshen `__init__.py` layout docstring for the
  new per-backend submod structure
- update `_spawn.py` + `_trio.py` module docstrings

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-17 19:03:00 -04:00
Gud Boi d7ca68cf61 Mv `trio_proc`/`mp_proc` to per-backend submods
Split the monolithic `spawn._spawn` into a slim
"core" + per-backend submodules so a future
`._subint` backend (per issue #379) can drop in
without piling more onto `_spawn.py`.

`._spawn` retains the cross-backend supervisor
machinery: `SpawnMethodKey`, `_methods` registry,
`_spawn_method`/`_ctx` state, `try_set_start_method()`,
the `new_proc()` dispatcher, and the shared helpers
`exhaust_portal()`, `cancel_on_completion()`,
`hard_kill()`, `soft_kill()`, `proc_waiter()`.

Deats,
- mv `trio_proc()` → new `spawn._trio`
- mv `mp_proc()` → new `spawn._mp`, reads `_ctx` and
  `_spawn_method` via `from . import _spawn` for
  late binding bc both get mutated by
  `try_set_start_method()`
- `_methods` wires up the new submods via late
  bottom-of-module imports to side-step circular
  dep (both backend mods pull shared helpers from
  `._spawn`)
- prune now-unused imports from `_spawn.py` — `sys`,
  `is_root_process`, `current_actor`,
  `is_main_process`, `_mp_main`, `ActorFailure`,
  `pretty_struct`, `_pformat`
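
Why `from . import _spawn` matters for late binding, as a small
sketch. The module object here is a hypothetical stand-in built
with `types.ModuleType`, and the setter is a simplified imitation:

```python
import types

# hypothetical stand-in for the `_spawn` core module
_spawn = types.ModuleType('_spawn')
_spawn._spawn_method = 'trio'

# `from ._spawn import _spawn_method` would copy the binding
# as it is at import time, like this:
copied: str = _spawn._spawn_method


def try_set_start_method(key: str) -> None:
    # rebinds the module-level name, as a setter would
    _spawn._spawn_method = key


try_set_start_method('mp_forkserver')

stale: str = copied               # still the import-time value
live: str = _spawn._spawn_method  # attr lookup sees the update
print(stale, live)
```

Reading through the module object on each access is what lets a
backend submod observe the mutation.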

Also,
- `_testing.pytest.pytest_generate_tests()` now
  drives the valid-backend set from
  `typing.get_args(SpawnMethodKey)` so adding a
  new backend (e.g. `'subint'`) doesn't require
  touching the harness
- refresh `spawn/__init__.py` docstring for the
  new layout
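
A sketch of deriving the valid-backend set from the `Literal` type.
The member list below is assumed for illustration and
`check_backend()` is a hypothetical helper:

```python
from typing import Literal, get_args

# assumed member set, for illustration only
SpawnMethodKey = Literal[
    'trio',
    'mp_spawn',
    'mp_forkserver',
    'subint',
]


def valid_backends() -> tuple[str, ...]:
    # single source of truth: the type alias itself
    return get_args(SpawnMethodKey)


def check_backend(name: str) -> str:
    if name not in valid_backends():
        raise ValueError(
            f'Unknown spawn backend {name!r}, '
            f'expected one of {valid_backends()}'
        )
    return name
```

Adding a member to the `Literal` then widens the accepted set
without any change to the checking code.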

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-17 16:48:22 -04:00
Gud Boi ed65301d32 Fix misc bugs caught by Copilot review
Deats,
- use `proc.poll() is None` in `sig_prog()` to
  distinguish "still running" from exit code 0;
  drop stale `breakpoint()` from fallback kill
  path (would hang CI).
- add missing `raise` on the `RuntimeError` in
  `async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
  `_registry` when addr eviction empties the
  addr list.
- update `discovery.__init__` docstring to match
  the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
  report log msg.
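
The `proc.poll()` pitfall from the first bullet, sketched with
stdlib `subprocess`. This is not the actual `sig_prog()` body, just
the truthiness issue in isolation:

```python
import subprocess
import sys

proc = subprocess.Popen([sys.executable, '-c', 'pass'])
proc.wait()  # child exits cleanly w/ code 0

rc = proc.poll()

# buggy: `0` is falsy so a clean exit looks like "still running"
looks_running_buggy: bool = not rc

# fixed: only `None` means the proc has not exited yet
still_running: bool = rc is None
print(rc, looks_running_buggy, still_running)
```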

Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00
Gud Boi 8817032c90 Prefer fresh conn for unreg, fallback to `_parent_chan`
The prior approach eagerly reused `_parent_chan` when
parent IS the registrar, but that channel may still
carry ctx/stream teardown protocol traffic —
concurrent `unregister_actor` RPC causes protocol
conflicts. Now try a fresh `get_registry()` conn
first; only fall back to the parent channel on
`OSError` (listener already closed/unlinked).

Deats,
- fresh `get_registry()` is the primary path for
  all addrs regardless of `parent_is_reg`
- `OSError` handler checks `parent_is_reg` +
  `rent_chan.connected()` before fallback
- fallback catches `OSError` and
  `trio.ClosedResourceError` separately
- drop unused `reg_addr: Address` annotation
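
The fresh-conn-first ordering can be sketched synchronously.
`FakeChan`, `unregister()` and the returned labels are all
hypothetical, and the sketch only handles `OSError` where the real
fallback also catches `trio.ClosedResourceError`:

```python
from typing import Callable, Optional


class FakeChan:
    '''Hypothetical stand-in for an IPC channel.'''

    def __init__(self, up: bool = True) -> None:
        self._up = up
        self.sent: list[str] = []

    def connected(self) -> bool:
        return self._up

    def send(self, msg: str) -> None:
        if not self._up:
            raise OSError('channel closed')
        self.sent.append(msg)


def unregister(
    open_fresh: Callable[[], FakeChan],
    parent_chan: Optional[FakeChan],
    parent_is_reg: bool,
) -> str:
    try:
        # primary path: always try a fresh registry conn first
        open_fresh().send('unregister_actor')
        return 'fresh'
    except OSError:
        # listener already closed/unlinked
        if not (
            parent_is_reg
            and parent_chan is not None
            and parent_chan.connected()
        ):
            return 'skipped'

    try:
        parent_chan.send('unregister_actor')
        return 'fallback'
    except OSError:
        return 'failed'


def listener_gone() -> FakeChan:
    raise OSError('listener unlinked')


rent_chan = FakeChan()
outcome: str = unregister(
    listener_gone,
    rent_chan,
    parent_is_reg=True,
)
print(outcome)
```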

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-14 19:54:15 -04:00