Nowhere near ready yet: this generates many true-positive cases which
the test-body logic doesn't yet handle (and which thus currently fail),
but it's a first draft of the general case set. To start, it includes
a greater-than-one-`Msg` (a strict top level tagged-union-of-structs)
pld spec alongside an `AnyFieldMsg`-workaround struct-msg for packing
all other builtin non-`Struct`/`Any` python types next to the other
explicit msgs.
Removing the now masked-for-a-while unit test remnants for
`test_limit_msgspec()` (and its helper `chk_pld_type()`) since these cases
are now covered in the `test_pldrx_limiting` suite at an e2e
IPC-system-spanning level.
Note that the contents of `chk_pld_type()` might be useful in the
future once we start setting/allowing semantics for various "phases of
IPC with matching msgspecs", but that's a little ways off rn and this
commit can always be looked up. Also iirc most of the details were
already somewhat out of date and causing suite failures.
That is, in `.msg._codec.mk_dec()` to ensure we actually still respect
the provided `spec: Union[Type[Struct]]|Type|None` alongside any
"custom" extension-types expected to be `dec_hook()` pre-processed.
Notes,
- previously when `dec_hook()` was provided we were merging with
a `msgspec.Raw` instead of `spec` which **is entirely wrong**; it was
likely leftover code from the sloppy/naive first draft of extension
types support.
- notice the `spec: Union[Type[Struct]]|Type|None` type annotation (it
appears as though a `test_ext_types_msgspec` suite actually passes
`spec=None` fyi): a value of `None` implies merging as
`Union[ext_types]|None` (or equivalently an `Optional[Union]`), but due
to the incorrect `Raw`-default usage this was actually being ignored..
-> this case has now been clarified via comment in the fn-signature.
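A minimal sketch of the merging rule described in the notes above, using
only the stdlib `typing` module. The helper name `merge_dec_type()` is
hypothetical and this is not the actual `mk_dec()` body; it only shows
how the provided `spec` (or `None`) could be unioned with the extension
types instead of being swapped for `Raw`.

```python
from typing import Optional, Type, Union


def merge_dec_type(
    spec: Optional[Type],  # `None` means "no explicit payload spec"
    ext_types: list[Type],
) -> Type:
    '''
    Build the type a decoder would be created with when a
    `dec_hook()` is also provided (hypothetical helper).

    '''
    members: tuple = tuple(ext_types)
    if spec is None:
        # `spec=None` -> `Union[ext_types]|None`
        members += (type(None),)
    else:
        # always keep the caller's spec alongside the ext types
        members += (spec,)

    # a single-member union collapses to that member
    return Union[members]
```

The resulting type is what one would presumably hand to the decoder
constructor together with the `dec_hook()` callable.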
Such that we can parametrize the `@context(pld_spec)` endpoint setting
using `pytest` and of course enable testing more than just the lone
`maybe_msg_spec` case. The implementation was a bit tricky because
subactors import any `enable_modules` just after subproc spawn, so
there's no easy way to indicate from the parent what should be passed
to the `@context()` decorator since it's already resolved by the time an
IPC is established. Thus the bulk of this patch is implementing
a pre-ctx which monkey-patches the (test) `child()`-ep-defining-module
before running test logic.
Impl deats,
- drop the `maybe_msg_spec` global, instead providing the same value via
a new `pld_spec: Union[Type]` parametrized input to the test suite.
- add a `decorate_child_ep()` helper which (re-)decorates the
mod-defined `child()` IPC-context endpoint with the provided `pld_spec`.
- add a new "pre IPC context" endpoint: `set_chld_pldspec()` which can
be opened (from another actor) just prior to opening the `child()` ep
and it will decorate the latter (using `decorate_child_ep()`)
presuming a `.msg._exts.enc_type_union()` generated `pld_spec_strs`
is provided.
- actually open the `set_chld_pldspec()` as a `deco_ctx` from the
root-actor and ensure we cancel it on block teardown in non-raising
cases.
Such that decoded output equivalent to `str|None` can actually be
unpacked from a `type_names = ['str', 'NoneType']` without just
ignoring the null-type entry.. Previously, the loop would fall through
silently ignoring the `None` -> `NoneType` string representation mapped
by `.enc_type_union()` and the output union would be incorrect.
Deats,
- include the stdlib's `types` in the lookup loop, obvi changing the
output var's name to `_types` to not collide.
- add output checking versus input `type_names` such that we raise
a value-error with a case specific `report: str` when either,
* the output `_types: list[Type]` is empty,
* the `len(_types) != len(type_names)`.
Since we're planning to use it for (discovery) addressing, allowing
replacement of the hacky (pretend) attempt in `tractor._multiaddr`.
Bump the lock file obvi!
- Use `Type[BaseException]` (not bare `BaseException`)
for all err-type references: `get_err_type()` return,
`._src_type`, `boxed_type` in `unpack_error()`.
- Add `|None` where types can be unresolvable
(`get_err_type()`, `.boxed_type` property).
- Add `._src_type_resolved` flag to prevent repeated
lookups and guard against `._ipc_msg is None`.
- Fix `recevier` and `exeptions` typos.
Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- Remove leftover `await an.cancel()` in
`test_registered_custom_err_relayed`; the
nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
`test_unregistered_err_still_relayed` docstring.
Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a teensie unit test to match.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.
Deats,
- add `test_unregistered_err_still_relayed`: verify the
`RemoteActorError` IS raised with `.boxed_type`
as `None` but `.src_type_str`, `.boxed_type_str`,
and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
since the new case above covers it and we don't need an
effectively entirely repeated test with just an inverse assert
as its last line..
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Make `RemoteActorError` resilient to unresolved
custom error types so that errors from remote actors
always relay back to the caller - even when the user
hasn't called `reg_err_types()` to register the exc type.
Deats,
- `.src_type`: log warning + return `None` instead
of raising `TypeError` which was crashing the
entire `_deliver_msg()` -> `pformat()` chain
before the error could be relayed.
- `.boxed_type_str`: fallback to `_ipc_msg.boxed_type_str`
when the type obj can't be resolved so the type *name* is always
available.
- `unwrap_src_err()`: fallback to `RuntimeError` preserving
original type name + traceback.
- `unpack_error()`: log warning when `get_err_type()` returns
`None` telling the user to call `reg_err_types()`.
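The fallback behaviour listed above can be sketched roughly as follows.
`BoxedError` and `_lookup_err_type()` are hypothetical stand-ins (for
`RemoteActorError` and `get_err_type()` respectively) and the lookup
here only knows builtins; the point is just the "warn and return `None`"
plus "fall back to `RuntimeError`" shape, including a resolved-flag so
the lookup runs once.

```python
import builtins
import logging
from typing import Optional, Type

log = logging.getLogger(__name__)


def _lookup_err_type(name: str) -> Optional[Type[BaseException]]:
    # stand-in resolver: builtin exception types only
    maybe = getattr(builtins, name, None)
    if isinstance(maybe, type) and issubclass(maybe, BaseException):
        return maybe
    return None


class BoxedError(Exception):
    def __init__(self, src_type_str: str, tb_str: str) -> None:
        super().__init__(f'{src_type_str}\n{tb_str}')
        self.src_type_str = src_type_str
        self.tb_str = tb_str
        self._src_type: Optional[Type[BaseException]] = None
        self._src_type_resolved: bool = False

    @property
    def src_type(self) -> Optional[Type[BaseException]]:
        if not self._src_type_resolved:
            self._src_type = _lookup_err_type(self.src_type_str)
            self._src_type_resolved = True  # never look up twice
            if self._src_type is None:
                # warn instead of raising so relay can continue
                log.warning(
                    f'Unresolved error type {self.src_type_str!r}, '
                    f'register it to enable re-raising'
                )
        return self._src_type

    def unwrap_src_err(self) -> BaseException:
        src_type = self.src_type
        if src_type is None:
            # preserve the original type *name* + traceback text
            return RuntimeError(f'{self.src_type_str}\n{self.tb_str}')
        return src_type(self.tb_str)
```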
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.
Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
(register, lookup, builtins, tractor-native types, unregistered
returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
`CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
(multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
`.boxed_type` can't resolve without `reg_err_types()` registration
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
The `open_actor_cluster()` teardown hangs
intermittently on UDS when `gather_contexts(mngrs=())`
raises `ValueError` mid-setup; likely a race in the
actor-nursery cleanup vs UDS socket shutdown. TCP
passes reliably (5/5 runs).
- Add `tpt_proto` fixture param to the test
- `pytest.skip()` on UDS with a TODO for deeper
investigation of `._clustering`/`._supervise`
teardown paths
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).
Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled
Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
`conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
`FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
clarity on intent.
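For illustration, a minimal sketch of what such helpers could look
like. The default directory, the parameter `cpu_dir` and the exact
ratio (`max / scaled`, clamped to `>= 1.0`) are assumptions made here
so the example is testable; the real `conftest.py` helpers may differ.

```python
from pathlib import Path
from typing import Optional

# assumed default location of the per-cpu freq files
_CPU_DIR = Path('/sys/devices/system/cpu/cpu0/cpufreq')


def get_cpu_state(
    cpu_dir: Path = _CPU_DIR,
) -> Optional[tuple[int, int]]:
    try:
        # `[0]` raises `IndexError` when the glob matches nothing
        pstate_file = list(cpu_dir.glob('*_pstate_max_freq'))[0]
        max_freq = int(pstate_file.read_text())
        scaled_freq = int((cpu_dir / 'scaling_max_freq').read_text())
    except (IndexError, FileNotFoundError):
        return None
    return max_freq, scaled_freq


def cpu_scaling_factor(cpu_dir: Path = _CPU_DIR) -> float:
    state = get_cpu_state(cpu_dir)
    if state is None:
        return 1.0  # unknown platform: no extra headroom
    max_freq, scaled_freq = state
    if scaled_freq <= 0 or scaled_freq >= max_freq:
        return 1.0
    return max_freq / scaled_freq
```

A call site would then scale its limit along the lines of
`headroom = cpu_scaling_factor(); timeout = 2 * headroom`.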
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `get_cpu_state()` helper to read CPU freq settings
from `/sys/devices/system/cpu/` and use it to compensate
the perf time-limit when `auto-cpufreq` (or similar)
scales down the max frequency.
Deats,
- read `*_pstate_max_freq` and `scaling_max_freq`
to compute a `cpu_scaled` ratio.
- when `cpu_scaled != 1.`, increase `this_fast` limit
proportionally (factoring dual-threaded cores).
- log a warning via `test_log` when compensating.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Move the `Arbiter` class out of `runtime._runtime` into its
logical home at `discovery._registry` as `Registrar(Actor)`.
This completes the long-standing terminology migration from
"arbiter" to "registrar/registry" throughout the codebase.
Deats,
- add new `discovery/_registry.py` mod with `Registrar`
class + backward-compat `Arbiter = Registrar` alias.
- rename `Actor.is_arbiter` attr -> `.is_registrar`;
old attr now a `@property` with `DeprecationWarning`.
- `_root.py` imports `Registrar` directly for
root-actor instantiation.
- export `Registrar` + `Arbiter` from `tractor.__init__`.
- `_runtime.py` re-imports from `discovery._registry`
for backward compat.
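The backward-compat pieces above follow a common deprecation pattern,
sketched here with toy classes (these are stripped-down stand-ins, not
the real `Actor`/`Registrar` definitions, and the warning text is made
up):

```python
import warnings


class Actor:
    is_registrar: bool = False

    @property
    def is_arbiter(self) -> bool:
        # old attr name kept as a read-only, warning alias
        warnings.warn(
            '`Actor.is_arbiter` is deprecated, use `.is_registrar`',
            DeprecationWarning,
            stacklevel=2,
        )
        return self.is_registrar


class Registrar(Actor):
    is_registrar: bool = True


# old class name keeps working for existing imports
Arbiter = Registrar
```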
Also,
- update all test files to use `.is_registrar`
(`test_local`, `test_rpc`, `test_spawning`,
`test_discovery`, `test_multi_program`).
- update "arbiter" -> "registrar" in comments/docstrings
across `_discovery.py`, `_server.py`, `_transport.py`,
`_testing/pytest.py`, and examples.
- drop resolved TODOs from `_runtime.py` and `_root.py`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust all `tractor._state`, `tractor._addr`,
`tractor._supervise`, etc. refs in tests and examples
to use the new `runtime/`, `discovery/`, `spawn/` paths.
Also,
- use `tractor.debug_mode()` pub API instead of
`tractor._state.debug_mode()` in a few test mods
- add explicit `timeout=20` to `test_respawn_consumer_task`
`@tractor_test` deco call
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Restructure the flat `tractor/` top-level private mods
into (more nested) subpackages:
- `runtime/`: `_runtime`, `_portal`, `_rpc`, `_state`,
`_supervise`
- `spawn/`: `_spawn`, `_entry`, `_forkserver_override`,
`_mp_fixup_main`
- `discovery/`: `_addr`, `_discovery`, `_multiaddr`
Each subpkg `__init__.py` is kept lazy (no eager
imports) to avoid circular import issues.
Also,
- update all intra-pkg imports across ~35 mods to use
the new subpkg paths (e.g. `from .runtime._state`
instead of `from ._state`)
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Refactor the test-fn deco to use `wrapt.decorator`
instead of `functools.wraps` for better fn-sig
preservation and optional-args support via
`PartialCallableObjectProxy`.
Deats,
- add `timeout` and `hide_tb` deco params
- wrap test-fn body with `trio.fail_after(timeout)`
- consolidate per-fixture `if` checks into a loop
- add `iscoroutinefunction()` type-check on wrapped fn
- set `__tracebackhide__` at each wrapper level
Also,
- update imports for new subpkg paths:
`tractor.spawn._spawn`, `tractor.discovery._addr`,
`tractor.runtime._state`
(see upcoming, likely large patch commit ;)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Export the new `RuntimeVars` struct and `get_runtime_vars()`
from `tractor.__init__` and improve the accessor to
optionally return the struct form.
Deats,
- add `RuntimeVars` and `get_runtime_vars` to
`__init__.py` exports; alphabetize `_state` imports.
- move `get_runtime_vars()` up in `_state.py` to sit
right below `_runtime_vars` dict definition.
- add `as_dict: bool = True` param so callers can get
either the legacy `dict` or the new `RuntimeVars`
struct.
- drop the old stub fn at bottom of `_state.py`.
- rm stale `from .msg.pretty_struct import Struct` comment.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
So we can start the transition from runtime-vars `dict` to a typed struct
for better clarity and wire-ready monitoring potential, as well as
better traceability when .
Deats,
- add a new `RuntimeVars(Struct)` with all fields from `_runtime_vars`
dict typed out
- include `__setattr__()` with `breakpoint()` for debugging
any unexpected mutations.
- add `.update()` method for batch-updating compat with `dict`.
- keep old `_runtime_vars: dict` in place (we need to port a ton of
stuff to adjust..).
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Allow external app code to register custom exception types
on `._exceptions` so they can be re-raised on the receiver
side of an IPC dialog via `get_err_type()`.
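Roughly, the register/lookup pair could work like the sketch below. The
signatures and the private `_err_types` table are assumptions for
illustration (the real code registers the types on the `._exceptions`
module itself); only the names `reg_err_types()` and `get_err_type()`
come from this changeset.

```python
import builtins
from typing import Optional, Type

# stand-in for the `._exceptions` module namespace
_err_types: dict[str, Type[BaseException]] = {}


def reg_err_types(exc_types: list[Type[BaseException]]) -> None:
    for exc_type in exc_types:
        if not (
            isinstance(exc_type, type)
            and issubclass(exc_type, BaseException)
        ):
            raise TypeError(f'Not an exception type: {exc_type!r}')
        _err_types[exc_type.__name__] = exc_type


def get_err_type(type_name: str) -> Optional[Type[BaseException]]:
    # app-registered types first, then builtin exceptions
    exc_type = _err_types.get(type_name)
    if exc_type is not None:
        return exc_type
    maybe = getattr(builtins, type_name, None)
    if isinstance(maybe, type) and issubclass(maybe, BaseException):
        return maybe
    return None  # unresolvable on this side of the IPC dialog
```

App code would call `reg_err_types([CustomAppError])` in every actor
that may need to re-raise that type.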
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Expose a copy of the current actor's `_runtime_vars` dict
via a public fn; TODO to convert to `RuntimeVars` struct.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- Add `LocalPortal` union to `query_actor()` return
type and `reg_portal` var annotation since the
registrar yields a `LocalPortal` instance.
- Update docstring to note the `LocalPortal` case.
- Widen `.delete_addr()` `addr` param to accept
`list[str|int]` bc msgpack deserializes tuples as
lists over IPC.
- Tighten `uid` annotation to `tuple[str, str]|None`.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
`msgpack` deserializes tuples as lists over IPC so
the `bidict.inverse.pop()` needs a `tuple`-cast to
match registry keys.
Regressed-by: 85457cb (`registry_addrs` change)
Found-via: `/run-tests` test_stale_entry_is_deleted
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
- Use `bidict.forceput()` in `register_actor()` to handle
duplicate addr values from stale entries or actor restarts.
- Fix `uid` annotation to `tuple[str, str]|None` in
`maybe_open_portal()` and handle the `None` return from
`delete_addr()` in log output.
- Pass explicit `registry_addrs=[reg_addr]` to `open_nursery()`
and `find_actor()` in `test_stale_entry_is_deleted` to ensure
the test uses the remote registrar.
- Update `query_actor()` docstring to document the new
`(addr, reg_portal)` yield shape.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix potential `AttributeError` when `query_actor()` yields
a `None` portal (peer-found-locally path) and an `OSError`
is raised during transport connect.
Also,
- fix `Arbiter.delete_addr()` return type to
`tuple[str, str]|None` bc it can return `None`.
- fix "registar" typo -> "registrar" in comment.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
By spawning an actor task that immediately shuts down the transport
server and then sleeps, verify that attempting to connect via the
`._discovery.find_actor()` helper delivers `None` for the `Portal`
value.
Relates to #184 and #216
In cases where an actor's transport server task (by default handling new
TCP connections) terminates early but does not de-register from the
pertaining registry (aka the registrar) actor's address table, the
trying-to-connect client actor will get a connection error on that
address. In the case where the client handles a (local) `OSError` (meaning
the target actor address is likely being contacted over `localhost`)
exception, make a further call to the registrar to delete the stale
entry and `yield None` gracefully indicating to calling code that no
`Portal` can be delivered to the target address.
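The control flow can be sketched as below. This is a synchronous
simplification (the real code path is async) and the `connect` /
`delete_addr` callables are hypothetical injection points standing in
for the transport connect and the registrar call respectively.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator, Optional


@contextmanager
def maybe_open_portal(
    addr: tuple[str, int],
    connect: Callable[[tuple[str, int]], Any],
    delete_addr: Callable[[tuple[str, int]], Any],
) -> Iterator[Optional[Any]]:
    try:
        portal = connect(addr)
    except OSError:
        # the registry entry is stale: ask for its removal and
        # signal "no portal" to the caller instead of raising
        delete_addr(addr)
        yield None
        return

    # only the connect step is guarded; errors raised by the
    # caller's block propagate as usual
    yield portal
```

Calling code then branches on `None`, for example to re-spawn the
target actor.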
This issue was originally discovered in `piker` where the `emsd`
(clearing engine) actor would sometimes crash on rapid client
re-connects and then leave a `pikerd` stale entry. With this fix new
clients will attempt to connect via an endpoint which will re-spawn the
`emsd` when a `None` portal is delivered (via `maybe_spawn_em()`).
Since stale addrs can be leaked where the actor transport server task
crashes but doesn't (successfully) unregister from the registrar, we
need a remote way to remove such entries; hence this new (registrar)
method.
To implement this make use of the `bidict` lib for the `._registry`
table thus making it super simple to do reverse uuid lookups from an
input socket-address.
- `test_inter_peer_cancellation`: swap all `.uid` refs
on `Actor`, `Channel`, and `Portal` to `.aid.uid`
- `test_legacy_one_way_streaming`: same + fix `print()`
to multiline style
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code