Fifth diagnostic pass pinpointed the hang to
`async_main`'s finally block — every stuck actor
reaches `FINALLY ENTER` but never `RETURNING`.
Specifically `await ipc_server.wait_for_no_more_
peers()` never returns when a peer-channel handler
is stuck: the `_no_more_peers` Event is set only
when `server._peers` empties, and stuck handlers
keep their channels registered.
Wrap the call in `trio.move_on_after(3.0)` + a
warning-log on timeout that records the still-
connected peer count. 3s is enough for any
graceful cancel-ack round-trip; beyond that we're
in bug territory and need to proceed with local
teardown so the parent's `_ForkedProc.wait()` can
unblock. Defensive-in-depth regardless of the
underlying bug — a local finally shouldn't block
on remote cooperation forever.
Verified: with this fix, ALL 15 actors reach
`async_main: RETURNING` (up from 10/15 before).
Test still hangs past 45s though — there's at
least one MORE unbounded wait downstream of
`async_main`. Candidates enumerated in the doc
update (`open_root_actor` finally /
`actor.cancel()` internals / trio.run bg tasks /
`_serve_ipc_eps` finally). Skip-mark stays on
`test_nested_multierrors[subint_forkserver]`.
Also updates
`subint_forkserver_test_cancellation_leak_issue.md`
with the new pinpoint + summary of the 6-item
investigation win list:
1. FD hygiene fix (`_close_inherited_fds`) —
orphan-SIGINT closed
2. pidfd-based `_ForkedProc.wait` — cancellable
3. `_parent_chan_cs` wiring — shielded parent-chan
loop now breakable
4. `wait_for_no_more_peers` bound — THIS commit
5. Ruled-out hypotheses: tree-kill missing, stuck
socket recv, capture-pipe fill (all wrong)
6. Remaining unknown: at least one more unbounded
wait in the teardown cascade above `async_main`
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit e312a68d8a)
(factored: dropped subint_forkserver conc-anal doc update)
Completes the nested-cancel deadlock fix started in
0cd0b633 (fork-child FD scrub) and fe540d02 (pidfd-
cancellable wait). The remaining piece: the parent-
channel `process_messages` loop runs under
`shield=True` (so normal cancel cascades don't kill
it prematurely), and relies on EOF arriving when the
parent closes the socket to exit naturally.
Under exec-spawn backends (`trio_proc`, mp) that EOF
arrival is reliable — parent's teardown closes the
handler-task socket deterministically. But fork-
based backends like `subint_forkserver` share enough
process-image state that EOF delivery becomes racy:
the loop parks waiting for an EOF that only arrives
after the parent finishes its own teardown, but the
parent is itself blocked on `os.waitpid()` for THIS
actor's exit. Mutual wait → deadlock.
Deats,
- `async_main` stashes the cancel-scope returned by
`root_tn.start(...)` for the parent-chan
`process_messages` task onto the actor as
`_parent_chan_cs`
- `Actor.cancel()`'s teardown path (after
`ipc_server.cancel()` + `wait_for_shutdown()`)
calls `self._parent_chan_cs.cancel()` to
explicitly break the shield — no more waiting for
EOF delivery, unwinding proceeds deterministically
regardless of backend
- inline comments on both sites explain the mutual-
wait deadlock + why the explicit cancel is
backend-agnostic rather than a forkserver-specific
workaround
With this + the prior two fixes, the
`subint_forkserver` nested-cancel cascade unwinds
cleanly end-to-end.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 8ac3dfeb85)
Resetting `_runtime_vars` post-(forking-)spawn was
previously only possible via direct mutation of
`_state._runtime_vars` from an external module + an
inline default dict duplicating the
`_state.py`-internal defaults. Split the access
surface into a pure getter + explicit setter so such
a reset call site becomes a one-liner composition:
`set_runtime_vars(get_runtime_vars(clear_values=True))`.
Deats `tractor/runtime/_state.py`,
- extract initial values into a module-level
`_RUNTIME_VARS_DEFAULTS: dict[str, Any]` constant; the
live `_runtime_vars` is now initialised from
`dict(_RUNTIME_VARS_DEFAULTS)`
- `get_runtime_vars()` grows a `clear_values: bool = False`
kwarg. When True, returns a fresh copy of
`_RUNTIME_VARS_DEFAULTS` instead of the live dict —
still a **pure read**, never mutates anything
- new `set_runtime_vars(rtvars: dict | RuntimeVars)` —
atomic replacement of the live dict's contents via
`.clear()` + `.update()`, so existing references to the
same dict object remain valid. Accepts either the
historical dict form or the `RuntimeVars` struct
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 7804a9fe57693dd5e15bee6a08e7d2fa14b6a98a)
(factored: kept only the tractor/runtime/_state.py part; dropped
tractor/spawn/_subint_forkserver.py call-site rewire)
Pull the `_child.py` `__main__` block body out into
a callable `_actor_child_main()` so alternate spawn
backends can bootstrap a subactor without going
through the CLI entrypoint.
Deats,
- new `_actor_child_main(uid, loglevel, parent_addr,
infect_asyncio, spawn_method='trio')` holds the
full child-side runtime startup previously inlined
under `if __name__ == '__main__':`
- `__main__` block reduces to arg-parsing + a call
into the new func
- add `"subint"` to the `_runtime.py` spawn-method
check so a child accepts `SpawnSpec` from that
(future) backend; inert str-compare w/o it
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit b8f243e98d)
(factored: kept only the `_child.py`/`_runtime.py` entry-extraction parts of
"Impl min-viable `subint` spawn backend (B.2)"; dropped
tractor/spawn/_subint.py + subint prompt-io logs)
Two version-compat fixes for the `devx` debugger
test-suite, both about matching upstream output that
got more verbose w/ recent lib releases.
ANSI stripping (`tests/devx/conftest.py`),
- Add `ansi_strip(text)` helper + `_ansi_re` pattern
(regex per https://stackoverflow.com/a/14693789).
- Apply inside `in_prompt_msg()` + `assert_before()` so
substring matches against REPL/traceback output stay
robust to color leakage.
- Motivated by py3.13's colored tracebacks +
`pdbp`/pygments highlighting leaking ANSI even when
`PYTHON_COLORS=0` is set in the `spawn` fixture (not
every renderer in the spawned subproc honors it).
- Replaces the longstanding inline TODO that linked
the SO answer w/o impl'ing.
trio 0.30+ `Cancelled._create(` match (`test_debugger`),
- In `test_shield_pause` swap the two
`"raise Cancelled._create()"` assertion patterns →
`"raise Cancelled._create("` (open-paren form, no
closing).
- trio >=0.30 raises a multi-line
`raise Cancelled._create(source=.., reason=..,
source_task=..)` w/ cancel-reason metadata, so the
legacy bare-`()` form no longer matches. Inline
comment documents the trio-version pivot.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 3854cf5ecb)
(cherry picked from commit a6cbac954d)
(factored: lock regenerated for the dep-group split only; the `pytest-timeout` locking lands with its dep-add in the testing-harness segment)
Reshuffle `pyproject.toml` deps into per-python-version
`[tool.uv.dependency-groups]`:
- `subints` group: `msgspec>=0.21.0`, py>=3.14
- `eventfd` group: `cffi>=1.17.1`, py>=3.13,<3.14
- `sync_pause` group: `greenback`, py>=3.13,<3.14
(was in `devx`; moved out bc no 3.14 yet)
Bump top-level `msgspec>=0.20.0` too.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 34d9d482e4)
(factored: kept only the pyproject dep-group parts of
"Raise `subint` floor to py3.14 and split dep-groups"; dropped
tractor/spawn/_spawn.py + tractor/spawn/_subint.py)
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
wheeled yet
* on nixos the sdist build was failing due to lack of `g++` which
i don't care to figure out rn since we don't need `.devx` stuff
immediately for this subints prototype.
* [ ] we still need to adjust any dependent suites to skip.
Adjust `test_ringbuf` to skip on import failure.
Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.
(cherry picked from commit d2ea8aa2de)
Prep for a future sub-interpreter (PEP 734
`concurrent.interpreters`) spawn backend per issue
#379 — land just the py-version range bump and the
test-harness error-gating; the backend itself comes
later.
Deats,
- bump `pyproject.toml` `requires-python` to
`>=3.12, <3.15` and list the `3.14` classifier —
the new stdlib `concurrent.interpreters` module
only ships on 3.14
- `_testing.pytest.pytest_configure` wraps
`try_set_start_method()` in a `pytest.UsageError`
handler so an unsupported `--spawn-backend` on the
running py-version prints a clean banner instead
of a traceback
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit d318f1f8f4)
(factored: kept only the pyproject + `_testing/pytest.py` parts of
"Add `'subint'` spawn backend scaffold (#379)"; dropped
tractor/spawn/_spawn.py + tractor/spawn/_subint.py)
(cherry picked from commit c4cad921b9)
(factored: also disable the local-editable `xonsh` uv-source so the release pin actually applies; regenerate `uv.lock` accordingly — on the dev branch these rode in an unrelated later commit)
Draft plan for consolidating pytest CLI flags,
ad-hoc env vars, and hardcoded fixture defaults
into the existing (but unused) `RuntimeVars`
struct as the single source of truth.
Deats,
- `_rtvars.py` leaf mod w/ `dump`/`load`/`get`/
`update` helpers using `str(dict)` +
`ast.literal_eval` encoding
- phased migration: test infra first, then
runtime callers, then per-session bindspace
- addresses concurrent pytest session collisions
and subproc env propagation for `devx/` scripts
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 7882c37ce0)
Encode the hard-won lesson from the forkserver
cancel-cascade investigation into two skill docs
so future sessions grep-find it before spelunking
into trio internals.
Deats,
- `.claude/skills/conc-anal/SKILL.md`:
- new "Unbounded waits in cleanup paths"
section — rule: bound every `await X.wait()`
in cleanup paths with `trio.move_on_after()`
unless the setter is unconditionally
reachable. Recent example:
`ipc_server.wait_for_no_more_peers()` in
`async_main`'s finally (was unbounded,
deadlocked when any peer handler stuck)
- new "The capture-pipe-fill hang pattern"
section — mechanism, grep-pointers to the
existing `conftest.py` guards (`tests/conftest
.py:258`, `:316`), cross-ref to the full
post-mortem doc, and the grep-note: "if a
multi-subproc tractor test hangs, `pytest -s`
first, conc-anal second"
- `.claude/skills/run-tests/SKILL.md`: new
"Section 9: The pytest-capture hang pattern
(CHECK THIS FIRST)" with symptom / cause /
pre-existing guards to grep / three-step debug
recipe (try `-s`, lower loglevel, redirect
stdout/stderr) / signature of this bug vs. a
real code hang / historical reference
Cost several investigation sessions before the
capture-pipe issue surfaced — it was masked by
deeper cascade deadlocks. Once the cascades were
fixed, the tree tore down enough to generate
pipe-filling log volume. Lesson: **grep this
pattern first when any multi-subproc tractor test
hangs under default pytest but passes with `-s`.**
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 4106ba73ea)
The previous cleanup recipe went straight to
SIGTERM+SIGKILL, which hides bugs: tractor is
structured concurrent — `_trio_main` catches SIGINT
as an OS-cancel and cascades `Portal.cancel_actor`
over IPC to every descendant. So a graceful SIGINT
exercises the actual SC teardown path; if it hangs,
that's a real bug to file (the forkserver `:1616`
zombie was originally suspected to be one of these
but turned out to be a teardown gap in
`_ForkedProc.kill()` instead).
Deats,
- step 1: `pkill -INT` scoped to `$(pwd)/py*` — no
sleep yet, just send the signal
- step 2: bounded wait loop (10 × 0.3s = ~3s) using
`pgrep` to poll for exit. Loop breaks early on
clean exit
- step 3: `pkill -9` only if graceful timed out, w/
a logged escalation msg so it's obvious when SC
teardown didn't complete
- step 4: same SIGINT-first ladder for the rare
`:1616`-holding zombie that doesn't match the
cmdline pattern (find PID via `ss -tlnp`, then
`kill -INT NNNN; sleep 1; kill -9 NNNN`)
- steps 5-6: UDS-socket `rm -f` + re-verify
unchanged
Goal: surface real teardown bugs through the test-
cleanup workflow instead of papering over them with
`-9`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit 70d58c4bd2)
Fork-based backends (esp. `subint_forkserver`) can
leak child actor processes on cancelled / SIGINT'd
test runs; the zombies keep the tractor default
registry (`127.0.0.1:1616` / `/tmp/registry@1616.sock`)
bound, so every subsequent session can't bind and
50+ unrelated tests fail with the same
`TooSlowError` / "address in use" signature. Document
the pre-flight + post-cancel check as a mandatory
step 4.
Deats,
- **primary signal**: `ss -tlnp | grep ':1616'` for a
bound TCP registry listener — the authoritative
check since :1616 is unique to our runtime
- `pgrep -af` scoped to `$(pwd)/py[0-9]*/bin/python.*
_actor_child_main|subint-forkserv` for leftover
actor/forkserver procs — scoped deliberately so we
don't false-flag legit long-running tractor-
embedding apps like `piker`
- `ls /tmp/registry@*.sock` for stale UDS sockets
- scoped cleanup recipe (SIGTERM + SIGKILL sweep
using the same `$(pwd)/py*` pattern, UDS `rm -f`,
re-verify) plus a fallback for when a zombie holds
:1616 but doesn't match the pattern: `ss -tlnp` →
kill by PID
- explicit false-positive warning calling out the
`piker` case (`~/repos/piker/py*/bin/python3 -m
tractor._child ...`) so a bare `pgrep` doesn't lead
to nuking unrelated apps
Goal: short-circuit the "spelunking into test code"
rabbit-hole when the real cause is just a leaked PID
from a prior session, without collateral damage to
other tractor-embedding projects on the same box.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit d093c31979)
Rework section 3 from a worktree-only check into a
structured 3-step flow: detect active venv, interpret
results (Case A: active, B: none, C: worktree), then
run import + collection checks.
Deats,
- Case B prompts via `AskUserQuestion` when no venv
is detected, offering `uv sync` or manual activate
- add `uv run` fallback section for envs where venv
activation isn't practical
- new allowed-tools: `uv run python`, `uv run pytest`,
`uv pip show`, `AskUserQuestion`
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit b1a0753a3f)
New "Inspect last failures" section reads the pytest
`lastfailed` cache JSON directly — instant, no
collection overhead, and filters to `tests/`-prefixed
entries to avoid stale junk paths.
Also,
- add `jq` tool permission for `.pytest_cache/` files
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit ba86d482e3)
Group `.claude/` ignores per-skill instead of a
flat list: `ai.skillz` symlinks, `/open-wkt`,
`/code-review-changes`, `/pr-msg`, `/commit-msg`.
Add missing symlink entries (`yt-url-lookup` ->
`resolve-conflicts`, `inter-skill-review`). Drop
stale `Claude worktrees` section (already covered
by `.claude/wkts/`).
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
(cherry picked from commit d3d6f646f9)
During the Phase A extraction of `trio_proc()` out of
`spawn._spawn` into its own submod, the
`debug.maybe_wait_for_debugger(child_in_debug=...)` call site in
the hard-reap `finally` got refactored from the original
`_runtime_vars.get('_debug_mode', ...)` (the fn parameter — the
dict that was constructed by the *parent* for the *child*'s
`SpawnSpec`) to `get_runtime_vars().get(...)` (a global getter that
returns the *parent's* live `_state`). Those are semantically
different — the first asks "is the child we just spawned in debug
mode?", the second asks "are *we* in debug mode?". Under
mixed-debug-mode trees the swap can incorrectly skip (or
unnecessarily delay) the debugger-lock wait during teardown.
Revert to the fn-parameter lookup and add an inline `NOTE` comment
calling out the distinction so it's harder to regress again.
Deats,
- `spawn/_trio.py`: `child_in_debug=get_runtime_vars().get(...)` →
`child_in_debug=_runtime_vars.get(...)` at the
`debug.maybe_wait_for_debugger(...)` call in the hard-reap block;
add 4-line `NOTE` explaining the parent-vs-child distinction.
- `spawn/__init__.py`: drop trailing whitespace after the
`'mp_forkserver'` docstring bullet.
- `ai/prompt-io/prompts/subints_spawner.md`: drop duplicated `with`
in `"as with with subprocs"` prose (copilot grammar catch).
Review: PR #444 (Copilot)
https://github.com/goodboy/tractor/pull/444#pullrequestreview-4165928469
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Drop unused `TYPE_CHECKING` imports (`Channel`,
`_server`), remove commented-out `import os` in
`_entry.py`, and use `get_runtime_vars()` accessor
instead of bare `_runtime_vars` in `_trio.py`.
Also,
- freshen `__init__.py` layout docstring for the
new per-backend submod structure
- update `_spawn.py` + `_trio.py` module docstrings
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Split the monolithic `spawn._spawn` into a slim
"core" + per-backend submodules so a future
`._subint` backend (per issue #379) can drop in
without piling more onto `_spawn.py`.
`._spawn` retains the cross-backend supervisor
machinery: `SpawnMethodKey`, `_methods` registry,
`_spawn_method`/`_ctx` state, `try_set_start_method()`,
the `new_proc()` dispatcher, and the shared helpers
`exhaust_portal()`, `cancel_on_completion()`,
`hard_kill()`, `soft_kill()`, `proc_waiter()`.
Deats,
- mv `trio_proc()` → new `spawn._trio`
- mv `mp_proc()` → new `spawn._mp`, reads `_ctx` and
`_spawn_method` via `from . import _spawn` for
late binding bc both get mutated by
`try_set_start_method()`
- `_methods` wires up the new submods via late
bottom-of-module imports to side-step circular
dep (both backend mods pull shared helpers from
`._spawn`)
- prune now-unused imports from `_spawn.py` — `sys`,
`is_root_process`, `current_actor`,
`is_main_process`, `_mp_main`, `ActorFailure`,
`pretty_struct`, `_pformat`
Also,
- `_testing.pytest.pytest_generate_tests()` now
drives the valid-backend set from
`typing.get_args(SpawnMethodKey)` so adding a
new backend (e.g. `'subint'`) doesn't require
touching the harness
- refresh `spawn/__init__.py` docstring for the
new layout
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Log the `claude-opus-4-7` design session that produced the phased plan
(A: modularize `_spawn`, B: `_subint` backend, C: harness) and concrete
Phase A file-split for #379. Substantive bc the plan directly drives
upcoming impl.
Prompt-IO: ai/prompt-io/claude/20260417T034918Z_9703210_prompt_io.md
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Import and apply `cpu_scaling_factor()` from
`conftest`; bump base from 3.6 -> 4 and multiply
through so CI boxes with slow CPUs don't flake.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Since we no longer need the example after integrating `multiaddr` into
the `.discovery` subsys.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Replace verbose inline code dumps in `.raw.md`
entries with terse summaries and `git diff`
cmd references. Add `diff_cmd` metadata to each
entry's YAML frontmatter so readers can reproduce
the actual output diff.
Also,
- rename `multiaddr_declare_eps.md_` -> `.md`
(drop trailing `_` suffix)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Deats,
- use `proc.poll() is None` in `sig_prog()` to
distinguish "still running" from exit code 0;
drop stale `breakpoint()` from fallback kill
path (would hang CI).
- add missing `raise` on the `RuntimeError` in
`async_main()` when no tpt bind addrs given.
- clean up stale uid entries from the registrar
`_registry` when addr eviction empties the
addr list.
- update `discovery.__init__` docstring to match
the new eager `._multiaddr` import.
- fix `registar` -> `registrar` typo in teardown
report log msg.
Review: PR #429 (Copilot)
https://github.com/goodboy/tractor/pull/429
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
The prior approach eagerly reused `_parent_chan` when
parent IS the registrar, but that channel may still
carry ctx/stream teardown protocol traffic —
concurrent `unregister_actor` RPC causes protocol
conflicts. Now try a fresh `get_registry()` conn
first; only fall back to the parent channel on
`OSError` (listener already closed/unlinked).
Deats,
- fresh `get_registry()` is the primary path for
all addrs regardless of `parent_is_reg`
- `OSError` handler checks `parent_is_reg` +
`rent_chan.connected()` before fallback
- fallback catches `OSError` and
`trio.ClosedResourceError` separately
- drop unused `reg_addr: Address` annotation
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
A backlog of 1 caused `ECONNREFUSED` when multiple
sub-actors simultaneously connect to deregister from
a remote-daemon registrar. Now matches the TCP
transport's default backlog (~128).
Also,
- add cross-ref comments between
`_uds.close_listener()` and `async_main()`'s
`parent_is_reg` deregistration path explaining
the UDS socket-file lifecycle
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`get_random()` can produce the same UDS filename for a given
pid+actor-state, so the "disjoint addrs" premise doesn't always hold.
Gate the `len(bound) >= 2` assertion on whether the registry and bind
addrs actually differ via `expect_disjoint`.
Also,
- drop unused `partial` import
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
When the parent actor IS the registrar, reuse the existing parent
channel for `unregister_actor` RPC instead of opening a new connection
via `get_registry()`. This avoids failures when the registrar's listener
socket is already closed during teardown (e.g. UDS transport unlinks the
socket file rapidly).
Deats,
- detect `parent_is_reg` by comparing `_parent_chan.raddr` against
`reg_addrs` and if matched, create a `Portal(rent_chan)` directly
instead of `async with get_registry()`.
- rename `failed` -> `failed_unreg` for clarity.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Retry signal delivery in `sig_prog()` up to `tries`
times (default 3) w/ `canc_timeout` sleep between
attempts; only fall back to `_KILL_SIGNAL` after all
retries exhaust. Bump default timeout 0.1 -> 0.2.
Also,
- `test_multi_nested_subactors_error_through_nurseries`
gives the first prompt iteration a 5s timeout even
on linux bc the initial crash sequence can be slow
to arrive at a `pdb` prompt
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
New locality-aware addr preference for multihomed
actors: UDS > local TCP > remote TCP. Uses
`ipaddress` + `socket.getaddrinfo()` to detect
whether a `TCPAddress` is on the local host.
Deats,
- `_is_local_addr()` checks loopback or
same-host IPs via interface enumeration
- `prefer_addr()` classifies an addr list into
three tiers and picks the latest entry from
the highest-priority non-empty tier
- `query_actor()` and `wait_for_actor()` now
call `prefer_addr()` instead of grabbing
`addrs[-1]` or a single pre-selected addr
Also,
- `Registrar.find_actor()` returns full
`list[UnwrappedAddress]|None` so callers can
apply transport preference
Prompt-IO: ai/prompt-io/claude/20260414T163300Z_befedc49_prompt_io.md
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust all imports to match.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Replace `Registrar._registry: bidict[uid, addr]`
with `dict[uid, list[UnwrappedAddress]]` to
support actors binding on multiple transports
simultaneously (multi-homed).
Deats,
- `find_actor_addr()` returns first addr from
the uid's list
- `get_registry()` now returns per-uid addr
lists
- `find_actor_addrs()` uses `.extend()` to
collect all addrs for a given actor name
- `register_actor_addr()` appends to the uid's
list (dedup'd) and evicts stale entries where
a different uid claims the same addr
- `delete_actor_addr()` does a linear scan +
`.remove()` instead of `bidict.inverse.pop()`;
deletes the uid entry entirely when no addrs
remain
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code