- Remove leftover `await an.cancel()` in
`test_registered_custom_err_relayed`; the
nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
`test_unregistered_err_still_relayed` docstring.
Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a teensie unit test to match.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.
Deats,
- add `test_unregistered_err_still_relayed`: verify the
`RemoteActorError` IS raised with `.boxed_type`
as `None` but `.src_type_str`, `.boxed_type_str`,
and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
since the new above case covers it and we don't need to have
an effectively entirely repeated test just with an inverse assert
as it's last line..
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Make `RemoteActorError` resilient to unresolved
custom error types so that errors from remote actors
always relay back to the caller - even when the user
hasn't called `reg_err_types()` to register the exc type.
Deats,
- `.src_type`: log warning + return `None` instead
of raising `TypeError` which was crashing the
entire `_deliver_msg()` -> `pformat()` chain
before the error could be relayed.
- `.boxed_type_str`: fallback to `_ipc_msg.boxed_type_str`
when the type obj can't be resolved so the type *name* is always
available.
- `unwrap_src_err()`: fallback to `RuntimeError` preserving
original type name + traceback.
- `unpack_error()`: log warning when `get_err_type()` returns
`None` telling the user to call `reg_err_types()`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.
Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
(register, lookup, builtins, tractor-native types, unregistered
returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
`CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
(multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
`.boxed_type` can't resolve without `reg_err_types()` registration
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
The `open_actor_cluster()` teardown hangs
intermittently on UDS when `gather_contexts(mngrs=())`
raises `ValueError` mid-setup; likely a race in the
actor-nursery cleanup vs UDS socket shutdown. TCP
passes reliably (5/5 runs).
- Add `tpt_proto` fixture param to the test
- `pytest.skip()` on UDS with a TODO for deeper
investigation of `._clustering`/`._supervise`
teardown paths
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).
Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled
Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
`conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
`FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
clarity on intent.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `get_cpu_state()` helper to read CPU freq settings
from `/sys/devices/system/cpu/` and use it to compensate
the perf time-limit when `auto-cpufreq` (or similar)
scales down the max frequency.
Deats,
- read `*_pstate_max_freq` and `scaling_max_freq`
to compute a `cpu_scaled` ratio.
- when `cpu_scaled != 1.`, increase `this_fast` limit
proportionally (factoring dual-threaded cores).
- log a warning via `test_log` when compensating.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Move the `Arbiter` class out of `runtime._runtime` into its
logical home at `discovery._registry` as `Registrar(Actor)`.
This completes the long-standing terminology migration from
"arbiter" to "registrar/registry" throughout the codebase.
Deats,
- add new `discovery/_registry.py` mod with `Registrar`
class + backward-compat `Arbiter = Registrar` alias.
- rename `Actor.is_arbiter` attr -> `.is_registrar`;
old attr now a `@property` with `DeprecationWarning`.
- `_root.py` imports `Registrar` directly for
root-actor instantiation.
- export `Registrar` + `Arbiter` from `tractor.__init__`.
- `_runtime.py` re-imports from `discovery._registry`
for backward compat.
Also,
- update all test files to use `.is_registrar`
(`test_local`, `test_rpc`, `test_spawning`,
`test_discovery`, `test_multi_program`).
- update "arbiter" -> "registrar" in comments/docstrings
across `_discovery.py`, `_server.py`, `_transport.py`,
`_testing/pytest.py`, and examples.
- drop resolved TODOs from `_runtime.py` and `_root.py`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust all `tractor._state`, `tractor._addr`,
`tractor._supervise`, etc. refs in tests and examples
to use the new `runtime/`, `discovery/`, `spawn/` paths.
Also,
- use `tractor.debug_mode()` pub API instead of
`tractor._state.debug_mode()` in a few test mods
- add explicit `timeout=20` to `test_respawn_consumer_task`
`@tractor_test` deco call
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Restructure the flat `tractor/` top-level private mods
into (more nested) subpackages:
- `runtime/`: `_runtime`, `_portal`, `_rpc`, `_state`,
`_supervise`
- `spawn/`: `_spawn`, `_entry`, `_forkserver_override`,
`_mp_fixup_main`
- `discovery/`: `_addr`, `_discovery`, `_multiaddr`
Each subpkg `__init__.py` is kept lazy (no eager
imports) to avoid circular import issues.
Also,
- update all intra-pkg imports across ~35 mods to use
the new subpkg paths (e.g. `from .runtime._state`
instead of `from ._state`)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Refactor the test-fn deco to use `wrapt.decorator`
instead of `functools.wraps` for better fn-sig
preservation and optional-args support via
`PartialCallableObjectProxy`.
Deats,
- add `timeout` and `hide_tb` deco params
- wrap test-fn body with `trio.fail_after(timeout)`
- consolidate per-fixture `if` checks into a loop
- add `iscoroutinefunction()` type-check on wrapped fn
- set `__tracebackhide__` at each wrapper level
Also,
- update imports for new subpkg paths:
`tractor.spawn._spawn`, `tractor.discovery._addr`,
`tractor.runtime._state`
(see upcoming, likely large patch commit ;)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Export the new `RuntimeVars` struct and `get_runtime_vars()`
from `tractor.__init__` and improve the accessor to
optionally return the struct form.
Deats,
- add `RuntimeVars` and `get_runtime_vars` to
`__init__.py` exports; alphabetize `_state` imports.
- move `get_runtime_vars()` up in `_state.py` to sit
right below `_runtime_vars` dict definition.
- add `as_dict: bool = True` param so callers can get
either the legacy `dict` or the new `RuntimeVars`
struct.
- drop the old stub fn at bottom of `_state.py`.
- rm stale `from .msg.pretty_struct import Struct` comment.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
So we can start transition from runtime-vars `dict` to a typed struct
for better clarity and wire-ready monitoring potential, as well as
better traceability when .
Deats,
- add a new `RuntimeVars(Struct)` with all fields from `_runtime_vars`
dict typed out
- include `__setattr__()` with `breakpoint()` for debugging
any unexpected mutations.
- add `.update()` method for batch-updating compat with `dict`.
- keep old `_runtime_vars: dict` in place (we need to port a ton of
stuff to adjust..).
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Allow external app code to register custom exception types
on `._exceptions` so they can be re-raised on the receiver
side of an IPC dialog via `get_err_type()`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Expose a copy of the current actor's `_runtime_vars` dict
via a public fn; TODO to convert to `RuntimeVars` struct.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- Add `LocalPortal` union to `query_actor()` return
type and `reg_portal` var annotation since the
registrar yields a `LocalPortal` instance.
- Update docstring to note the `LocalPortal` case.
- Widen `.delete_addr()` `addr` param to accept
`list[str|int]` bc msgpack deserializes tuples as
lists over IPC.
- Tighten `uid` annotation to `tuple[str, str]|None`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`msgpack` deserializes tuples as lists over IPC so
the `bidict.inverse.pop()` needs a `tuple`-cast to
match registry keys.
Regressed-by: 85457cb (`registry_addrs` change)
Found-via: `/run-tests` test_stale_entry_is_deleted
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- Use `bidict.forceput()` in `register_actor()` to handle
duplicate addr values from stale entries or actor restarts.
- Fix `uid` annotation to `tuple[str, str]|None` in
`maybe_open_portal()` and handle the `None` return from
`delete_addr()` in log output.
- Pass explicit `registry_addrs=[reg_addr]` to `open_nursery()`
and `find_actor()` in `test_stale_entry_is_deleted` to ensure
the test uses the remote registrar.
- Update `query_actor()` docstring to document the new
`(addr, reg_portal)` yield shape.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix potential `AttributeError` when `query_actor()` yields
a `None` portal (peer-found-locally path) and an `OSError`
is raised during transport connect.
Also,
- fix `Arbiter.delete_addr()` return type to
`tuple[str, str]|None` bc it can return `None`.
- fix "registar" typo -> "registrar" in comment.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
By spawning an actor task that immediately shuts down the transport
server and then sleeps, verify that attempting to connect via the
`._discovery.find_actor()` helper delivers `None` for the `Portal`
value.
Relates to #184 and #216
In cases where an actor's transport server task (by default handling new
TCP connections) terminates early but does not de-register from the
pertaining registry (aka the registrar) actor's address table, the
trying-to-connect client actor will get a connection error on that
address. In the case where client handles a (local) `OSError` (meaning
the target actor address is likely being contacted over `localhost`)
exception, make a further call to the registrar to delete the stale
entry and `yield None` gracefully indicating to calling code that no
`Portal` can be delivered to the target address.
This issue was originally discovered in `piker` where the `emsd`
(clearing engine) actor would sometimes crash on rapid client
re-connects and then leave a `pikerd` stale entry. With this fix new
clients will attempt connect via an endpoint which will re-spawn the
`emsd` when a `None` portal is delivered (via `maybe_spawn_em()`).
Since stale addrs can be leaked where the actor transport server task
crashes but doesn't (successfully) unregister from the registrar, we
need a remote way to remove such entries; hence this new (registrar)
method.
To implement this make use of the `bidict` lib for the `._registry`
table thus making it super simple to do reverse uuid lookups from an
input socket-address.
- `test_inter_peer_cancellation`: swap all `.uid` refs
on `Actor`, `Channel`, and `Portal` to `.aid.uid`
- `test_legacy_one_way_streaming`: same + fix `print()`
to multiline style
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Use 6s timeout on non-linux (vs 4s) in
`test_cancel_while_childs_child_in_sync_sleep()` to avoid
flaky `TooSlowError` on slower CI runners.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- add `tpt_proto: ['tcp', 'uds']` matrix dimension
to the `testing` job.
- exclude `uds` on `macos-latest` for now.
- pass `--tpt-proto=${{ matrix.tpt_proto }}` to the
`pytest` invocation.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Removes the `pytest` deprecation warns and attempts to avoid
some de-registration raciness, though i'm starting to think the
real issue is due to not having the fixes from #366 (which handle
the new dereg on `OSError` case from UDS)?
- use `.channel.aid.uid` over deprecated `.channel.uid`
throughout `test_discovery.py`.
- add polling loop (up to 5s) for subactor de-reg check
in `spawn_and_check_registry()` to handle slower
transports like UDS where teardown takes longer.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- add `is_valid` and `sockpath.resolve()` asserts in
`get_rando_addr()` for the `'uds'` case plus an
explicit `UDSAddress` type annotation.
- rename no-runtime sockname prefixes from
`'<unknown-actor>'`/`'root'` to
`'no_runtime_root'`/`'no_runtime_actor'` with a proper
if/else branch in `UDSAddress.get_random()`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add UDS skip-guard to `test_streaming_to_actor_cluster()`
and plumb `tpt_proto` through the `@tractor_test` wrapper
so transport-parametrized tests can receive it.
Deats,
- skip cluster test when `tpt_proto == 'uds'` with
descriptive msg, add TODO about `@pytest.mark.no_tpt`.
- add `tpt_proto: str|None` param to inner wrapper in
`tractor_test()`, forward to decorated fn when its sig
accepts it.
- register custom `no_tpt` marker via `pytest_configure()`
to avoid unknown-marker warnings.
- add masked todo for `no_tpt` marker-check code in `tpt_proto` fixture
(needs fn-scope to work, left as TODO).
- add `request` param to `tpt_proto` fixture for future
marker inspection.
Also,
- add doc-string to `test_streaming_to_actor_cluster()`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Namely the workaround expected exc branches added in ef7ed7a for the UDS
parametrization. With the new boxing of the underlying CREs as
tpt-closed, we can expect the same exc outcomes as in the TCP cases.
Also this tweaks some error report logging content used while debugging
this,
- properly `repr()` the `TransportClosed.src_exc`-type from
the maybe emit in `.report_n_maybe_raise()`.
- remove the redudant `chan.raddr` from the "closed abruptly"
header in the tpt-closed handler of `._rpc.process_messages()`,
the `Channel.__repr__()` now contains it by default.
Forward the `tpt_proto` fixture val into spawned daemon
subprocesses via `run_daemon(enable_transports=..)` and
sync `_runtime_vars['_enable_tpts']` in the `tpt_proto`
fixture so sub-actors inherit the transport setting.
Deats,
- add `enable_transports={enable_tpts}` to the daemon
spawn-cmd template in `tests/conftest.py`.
- set `_state._runtime_vars['_enable_tpts']` in the
`tpt_proto` fixture in `_testing/pytest.py`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
For more reliability with the oob registrar using tests
via the `daemon` fixture,
- increase spawn-wait to `2` in CI, `1` OW; drop
the old py<3.7 branch.
- move `_ci_env` to module-level (above `_non_linux`)
so `_PROC_SPAWN_WAIT` can reference it at parse time.
- add `test_log` fixture param to `daemon()`, use it
for the error-on-exit log line instead of bare `log`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `expect_timeout: float` param to `_spawn()`
so individual tests can tune `pexpect` timeouts
instead of relying on the hard-coded 3/10 split.
Deats,
- default to 4s, bump by +6 on non-linux CI.
- use walrus `:=` to capture resolved timeout and assert
`spawned.timeout == timeout` for sanity.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
I started getting annoyed by all the warnings from `pytest` during work
on macos suport in CI, so this replaces all `Actor.uid`/`Channel.uid`
accesses with `.aid.uid` (or `.aid.reprol()` for log msgs) across the
core runtime and IPC subsystems to avoid the noise.
This also provides incentive to start the adjustment to all
`.uid`-holding/tracking internal `dict`-tables/data-structures to
instead use `.msg.types.Aid`. Hopefully that will come a (vibed?) follow
up shortly B)
Deats,
- `._context`: swap all `self._actor.uid`, `self.chan.uid`,
and `portal.actor.uid` refs to `.aid.uid`; use
`.aid.reprol()` for log/error formatting.
- `._rpc`: same treatment for `actor.uid`, `chan.uid` in
log msgs and cancel-scope handling; fix `str(err)` typo
in `ContextCancelled` log.
- `._runtime`: update `chan.uid` -> `chan.aid.uid` in ctx
cache lookups, RPC `Start` msg, registration and
cancel-request handling; improve ctxc log formatting.
- `._spawn`: replace all `subactor.uid` with
`.aid.uid` for child-proc tracking, IPC peer waiting,
debug-lock acquisition, and nursery child dict ops.
- `._supervise`: same for `subactor.uid` in cancel and
portal-wait paths; use `actor.aid.uid` for error dict.
- `._state`: fix `last.uid` -> `last.aid.uid` in
`current_actor()` error msg.
Also,
- `._chan`: make `Channel.aid` a proper `@property` backed
by `._aid` so we can add validation/typing later.
- `.log`: use `current_actor().aid.uuid` instead of
`.uid[1]` for actor-uid log field.
- `.msg.types`: add TODO comment for `Start.aid` field
conversion.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Event on linux i was noticing lotsa false negatives based on sub
teardown race conditions, so this tries to both make way for
(eventually?) expanding the set of suite cases and ensure the current
ones are more reliable on every run.
The main change is to hange the `error_in_child=False` case to use
parent-side-cancellation via a new `trio.move_on_after(timeout)` instead
of `actor.cancel_soon()` (which is now toggled by a new `self_cancel:
bool` but unused rn), and add better teardown assertions.
Low level deats,
- add `rent_cancel`/`self_cancel` params to
`crash_and_clean_tmpdir()` for different cancel paths;
default to `rent_cancel=True` which just sleeps forever
letting the parent's timeout do the work.
- use `trio.move_on_after()` with longer timeouts per
case: 1.6s for error, 1s for cancel.
- use the `.move_on_after()` cancel-scope to assert `.cancel_called`
pnly when `error_in_child=False`, indicating we
parent-graceful-cancelled the sub.
- add `loglevel` fixture, pass to `open_nursery()`.
- log caught `RemoteActorError` via console logger.
- add `ids=` to parametrize for readable test names.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a 6s timeout guard around `test_streaming_to_actor_cluster()`
to catch hangs, and nest the `async with` block inside it.
Found this when running `pytest tests/ --tpt-proto uds`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code