Commit Graph

2475 Commits (maybe_open_ctx_locking)

Author SHA1 Message Date
Gud Boi 007b45857b Revert `resources.pop()` back inside `run_ctx` inner finally
Reverts the `_Cache.run_ctx` change from 93aa39db which
moved `resources.pop(ctx_key)` to an outer `finally`
*after* the acm's `__aexit__()`. That introduced an
atomicity gap: `values` was already popped in the inner
finally but `resources` survived through the acm teardown
checkpoints. A re-entering task that creates a fresh lock
(the old one having been popped by the exiting caller)
could then acquire immediately and find stale `resources`
(for which now we raise a `RuntimeError('Caching resources ALREADY
exist?!')`).

Deats,
- the orig 93aa39db rationale was a preemptive guard
  against acm `__aexit__()` code accessing `_Cache`
  mid-teardown, but no `@acm` in `tractor` (or `piker`) ever
  does that; the scenario never materialized.
- by popping both `values` AND `resources` atomically
  (no checkpoint between them) in the inner finally,
  the re-entry race window is closed: either the new
  task sees both entries (cache hit) or neither
  (clean cache miss).
- `test_moc_reentry_during_teardown` now passes
  without `xfail`! (:party:)

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-07 00:47:53 -04:00
Gud Boi 06cfe8999a claude: add `conc-anal` skill
That is a repo-specific "concurrency-analysis" skill

Bo

Structured analysis framework for `trio`-based async races: inventories
shared mutable state, maps checkpoint boundaries, traces interleaved
task schedules, and proposes fixes. Includes tractor-specific patterns
for `_Cache` lock vs `run_ctx` lifetime, the `values`/`resources`
atomicity gap, and `Event.set()` scheduling semantics.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-07 00:38:14 -04:00
Gud Boi da7e749b09 Add prompt-io log for `run_ctx` teardown analysis
Documents the diagnostic session tracing why
per-`ctx_key` locking alone doesn't close the
`_Cache.run_ctx` teardown race — the lock pops
in the exiting caller's task but resource cleanup
runs in the `run_ctx` task inside `service_tn`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-06 18:37:14 -04:00
Gud Boi 03c42e1333 Drop `xfail` from `test_moc_reentry_during_teardown`
The per-`ctx_key` locking fix in f086222d intended to resolve the
teardown race reproduced by the new test suite, so the test SHOULD now
pass. TLDR, it doesn't Bp

Also add `collapse_eg()` to the test's ctx-manager stack so that when
run with `pytest <...> --tpdb` we'll actually `pdb`-REPL the RTE when it
hits (previously an assert-error).

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-06 18:29:07 -04:00
Gud Boi f086222d74 Use per-key locking+user tracking in `maybe_open_context()`
(Hopefully!) solving a long-run bug with the `brokerd.kraken` backend in
`piker`..

- Track `_Cache.users` per `ctx_key` via a `defaultdict[..., int]`
  instead of a single global counter; fix premature teardown when
  multiple ctx keys are active simultaneously.
- Key `_Cache.locks` on `ctx_key` (not bare `fid`) so different kwarg
  sets for the same `acm_func` get independent `StrictFIFOLock`s.
- Add `_UnresolvedCtx` sentinel class to replace bare `None` check;
  avoid false-positive teardown when a wrapped acm legitimately yields
  `None`.
- Swap resource-exists `assert` for detailed `RuntimeError`.

Also,
- fix "whih" typo.
- add debug logging for lock acquire/release lifecycle.

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-06 18:19:55 -04:00
Gud Boi cab366cd65 Add xfail test for `_Cache.run_ctx` teardown race
Reproduce the piker `open_cached_client('kraken')` scenario: identical
`ctx_key` callers share one cached resource, and a new task re-enters
during `__aexit__` — hitting `assert not resources.get()` bc `values`
was popped but `resources` wasn't yet.

Deats,
- `test_moc_reentry_during_teardown` uses an `in_aexit` event to
  deterministically land in the teardown window.
- marked `xfail(raises=AssertionError)` against unpatched code (fix in
  `9e49eddd` or wtv lands on the `maybe_open_ctx_locking` or thereafter
  patch branch).

Also, add prompt-io log for the session.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T193125Z_85f9c5d_prompt_io.md
2026-04-06 18:17:04 -04:00
Gud Boi 85f9c5df6f Add per-`ctx_key` isolation tests for `maybe_open_context()`
Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user
tracking correctly tears down resources independently - exercises the
fix from 02b2ef18 where a global `_Cache.users` counter caused stale
cache hits when the same `acm_func` was called with different kwargs.

Also, add a paired `acm_with_resource()` helper `@acm` that yields its
`resource_id` for per-key testing in the above suite.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
2026-04-06 14:37:47 -04:00
Gud Boi ebe9d5e4b5 Parametrize `test_resource_cache.test_open_local_sub_to_stream`
Namely with multiple pre-sleep `delay`-parametrizations before either,

- parent-scope cancel-calling (as originally) or,
- depending on the  new `cancel_by_cs: bool` suite parameter, optionally
  just immediately exiting from (the newly named)
  `maybe_cancel_outer_cs()` a checkpoint.

In the latter case we ensure we **don't** inf sleep to avoid leaking
those tasks into the `Actor._service_tn` (though we should really have
a better soln for this)..

Deats,
- make `cs` args optional and adjust internal logic to match.
- add some notes around various edge cases and issues with using the
  actor-service-tn as the scope by default.
2026-04-06 14:37:47 -04:00
Bd bbf01d5161
Merge pull request #430 from goodboy/dependabot/uv/pygments-2.20.0
Bump pygments from 2.19.2 to 2.20.0
2026-04-05 13:33:33 -04:00
Bd ec8e8a2786
Merge pull request #426 from goodboy/remote_exc_type_registry
Fix remote exc relay + add `reg_err_types()` tests
2026-04-02 22:44:36 -04:00
Gud Boi c3d1ec22eb Fix `Type[BaseException]` annots, guard `.src_type` resolve
- Use `Type[BaseException]` (not bare `BaseException`)
  for all err-type references: `get_err_type()` return,
  `._src_type`, `boxed_type` in `unpack_error()`.
- Add `|None` where types can be unresolvable
  (`get_err_type()`, `.boxed_type` property).
- Add `._src_type_resolved` flag to prevent repeated
  lookups and guard against `._ipc_msg is None`.
- Fix `recevier` and `exeptions` typos.

Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 8f44efa327 Drop stale `.cancel()`, fix docstring typo in tests
- Remove leftover `await an.cancel()` in
  `test_registered_custom_err_relayed`; the
  nursery already cancels on scope exit.
- Fix `This document` -> `This documents` typo in
  `test_unregistered_err_still_relayed` docstring.

Review: PR #426 (Copilot)
https://github.com/goodboy/tractor/pull/426

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 5968a3c773 Use `'<unknown>'` for unresolvable `.boxed_type_str`
Add a teensie unit test to match.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 80597b80bf Add passing test for unregistered err relay
Drop the `xfail` test and instead add a new one that ensures the
`tractor._exceptions` fixes enable graceful relay of
remote-but-unregistered error types via the unboxing of just the
`rae.src_type_str/boxed_type_str` content. The test also ensures
a warning is included with remote error content indicating the user
should register their error type for effective cross-actor re-raising.

Deats,
- add `test_unregistered_err_still_relayed`: verify the
  `RemoteActorError` IS raised with `.boxed_type`
  as `None` but `.src_type_str`, `.boxed_type_str`,
  and `.tb_str` all preserved from the IPC msg.
- drop `test_unregistered_boxed_type_resolution_xfail`
  since the new above case covers it and we don't need to have
  an effectively entirely repeated test just with an inverse assert
  as it's last line..

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi a41c6d5c70 Fix unregistered-remote-error-type relay crash
Make `RemoteActorError` resilient to unresolved
custom error types so that errors from remote actors
always relay back to the caller - even when the user
hasn't called `reg_err_types()` to register the exc type.

Deats,
- `.src_type`: log warning + return `None` instead
  of raising `TypeError` which was crashing the
  entire `_deliver_msg()` -> `pformat()` chain
  before the error could be relayed.
- `.boxed_type_str`: fallback to `_ipc_msg.boxed_type_str`
  when the type obj can't be resolved so the type *name* is always
  available.
- `unwrap_src_err()`: fallback to `RuntimeError` preserving
  original type name + traceback.
- `unpack_error()`: log warning when `get_err_type()` returns
  `None` telling the user to call `reg_err_types()`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Gud Boi 9c37b3f956 Add `reg_err_types()` test suite for remote exc relay
Verify registered custom error types round-trip correctly over IPC via
`reg_err_types()` + `get_err_type()`.

Deats,
- `TestRegErrTypesPlumbing`: 5 unit tests for the type-registry plumbing
  (register, lookup, builtins, tractor-native types, unregistered
  returns `None`)
- `test_registered_custom_err_relayed`: IPC end-to-end for a registered
  `CustomAppError` checking `.boxed_type`, `.src_type`, and `.tb_str`
- `test_registered_another_err_relayed`: same for `AnotherAppError`
  (multi-type coverage)
- `test_unregistered_custom_err_fails_lookup`: `xfail` documenting that
  `.boxed_type` can't resolve without `reg_err_types()` registration

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 18:21:19 -04:00
Bd 8f6bc56174
Merge pull request #427 from goodboy/subsys_reorg
Mv core mods to `runtime/`, `spawn/`, `discovery/` subpkgs
2026-04-02 18:21:00 -04:00
Gud Boi b14dbde77b Skip `test_empty_mngrs_input_raises` on UDS tpt
The `open_actor_cluster()` teardown hangs
intermittently on UDS when `gather_contexts(mngrs=())`
raises `ValueError` mid-setup; likely a race in the
actor-nursery cleanup vs UDS socket shutdown. TCP
passes reliably (5/5 runs).

- Add `tpt_proto` fixture param to the test
- `pytest.skip()` on UDS with a TODO for deeper
  investigation of `._clustering`/`._supervise`
  teardown paths

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi cd6509b724 Fix `tractor_test` kwarg check and Windows `start_method` default
- Use `kw in kwargs` membership test instead of
  `kwargs[kw]` to avoid `KeyError` on missing params.
- Restructure Windows `start_method` logic to properly
  default to `'trio'` when unset; only raise on an
  explicit non-trio value.

Review: PR #427 (Copilot)
https://github.com/goodboy/tractor/pull/427#pullrequestreview-4009934142

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 93d99ed2eb Move `get_cpu_state()` to `conftest` as shared latency headroom
Factor the CPU-freq-scaling helper out of
`test_legacy_one_way_streaming` into `conftest.py`
alongside a new `cpu_scaling_factor()` convenience fn
that returns a latency-headroom multiplier (>= 1.0).

Apply it to the two other flaky-timeout tests,
- `test_cancel_via_SIGINT_other_task`: 2s -> scaled
- `test_example[we_are_processes.py]`: 16s -> scaled

Deats,
- add `get_cpu_state()` + `cpu_scaling_factor()` to
  `conftest.py` so all test mods can share the logic.
- catch `IndexError` (empty glob) in addition to
  `FileNotFoundError`.
- rename `factor` var -> `headroom` at call sites for
  clarity on intent.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 6215e3b2dd Adjust `test_a_quadruple_example` time-limit for CPU scaling
Add `get_cpu_state()` helper to read CPU freq settings
from `/sys/devices/system/cpu/` and use it to compensate
the perf time-limit when `auto-cpufreq` (or similar)
scales down the max frequency.

Deats,
- read `*_pstate_max_freq` and `scaling_max_freq`
  to compute a `cpu_scaled` ratio.
- when `cpu_scaled != 1.`, increase `this_fast` limit
  proportionally (factoring dual-threaded cores).
- log a warning via `test_log` when compensating.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi be5d8da8c0 Just alias `Arbiter` via assignment 2026-04-02 17:59:13 -04:00
Gud Boi 21ed181835 Filter `__pycache__` from example discovery in tests
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 9ec2749ab7 Rename `Arbiter` -> `Registrar`, mv to `discovery._registry`
Move the `Arbiter` class out of `runtime._runtime` into its
logical home at `discovery._registry` as `Registrar(Actor)`.
This completes the long-standing terminology migration from
"arbiter" to "registrar/registry" throughout the codebase.

Deats,
- add new `discovery/_registry.py` mod with `Registrar`
  class + backward-compat `Arbiter = Registrar` alias.
- rename `Actor.is_arbiter` attr -> `.is_registrar`;
  old attr now a `@property` with `DeprecationWarning`.
- `_root.py` imports `Registrar` directly for
  root-actor instantiation.
- export `Registrar` + `Arbiter` from `tractor.__init__`.
- `_runtime.py` re-imports from `discovery._registry`
  for backward compat.

Also,
- update all test files to use `.is_registrar`
  (`test_local`, `test_rpc`, `test_spawning`,
  `test_discovery`, `test_multi_program`).
- update "arbiter" -> "registrar" in comments/docstrings
  across `_discovery.py`, `_server.py`, `_transport.py`,
  `_testing/pytest.py`, and examples.
- drop resolved TODOs from `_runtime.py` and `_root.py`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi f3441a6790 Update tests+examples imports for new subpkgs
Adjust all `tractor._state`, `tractor._addr`,
`tractor._supervise`, etc. refs in tests and examples
to use the new `runtime/`, `discovery/`, `spawn/` paths.

Also,
- use `tractor.debug_mode()` pub API instead of
  `tractor._state.debug_mode()` in a few test mods
- add explicit `timeout=20` to `test_respawn_consumer_task`
  `@tractor_test` deco call

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi cc42d38284 Mv core mods to `runtime/`, `spawn/`, `discovery/` subpkgs
Restructure the flat `tractor/` top-level private mods
into (more nested) subpackages:

- `runtime/`: `_runtime`, `_portal`, `_rpc`, `_state`,
  `_supervise`
- `spawn/`: `_spawn`, `_entry`, `_forkserver_override`,
  `_mp_fixup_main`
- `discovery/`: `_addr`, `_discovery`, `_multiaddr`

Each subpkg `__init__.py` is kept lazy (no eager
imports) to avoid circular import issues.

Also,
- update all intra-pkg imports across ~35 mods to use
  the new subpkg paths (e.g. `from .runtime._state`
  instead of `from ._state`)

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 6827ceba12 Use `wrapt` for `tractor_test()` decorator
Refactor the test-fn deco to use `wrapt.decorator`
instead of `functools.wraps` for better fn-sig
preservation and optional-args support via
`PartialCallableObjectProxy`.

Deats,
- add `timeout` and `hide_tb` deco params
- wrap test-fn body with `trio.fail_after(timeout)`
- consolidate per-fixture `if` checks into a loop
- add `iscoroutinefunction()` type-check on wrapped fn
- set `__tracebackhide__` at each wrapper level

Also,
- update imports for new subpkg paths:
  `tractor.spawn._spawn`, `tractor.discovery._addr`,
  `tractor.runtime._state`
  (see upcoming, likely large patch commit ;)

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 94458807ce Expose `RuntimeVars` + `get_runtime_vars()` from pkg
Export the new `RuntimeVars` struct and `get_runtime_vars()`
from `tractor.__init__` and improve the accessor to
optionally return the struct form.

Deats,
- add `RuntimeVars` and `get_runtime_vars` to
  `__init__.py` exports; alphabetize `_state` imports.
- move `get_runtime_vars()` up in `_state.py` to sit
  right below `_runtime_vars` dict definition.
- add `as_dict: bool = True` param so callers can get
  either the legacy `dict` or the new `RuntimeVars`
  struct.
- drop the old stub fn at bottom of `_state.py`.
- rm stale `from .msg.pretty_struct import Struct` comment.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi be5e7e446b Proto a `._state.RuntimeVars` struct
So we can start transition from runtime-vars `dict` to a typed struct
for better clarity and wire-ready monitoring potential, as well as
better traceability when .

Deats,
- add a new `RuntimeVars(Struct)` with all fields from `_runtime_vars`
  dict typed out
- include `__setattr__()` with `breakpoint()` for debugging
  any unexpected mutations.
- add `.update()` method for batch-updating compat with `dict`.
- keep old `_runtime_vars: dict` in place (we need to port a ton of
  stuff to adjust..).

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi 571b2b320e Add `reg_err_types()` for custom remote exc lookup
Allow external app code to register custom exception types
on `._exceptions` so they can be re-raised on the receiver
side of an IPC dialog via `get_err_type()`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
Gud Boi c7b5d00f19 Add `get_runtime_vars()` accessor to `._state`
Expose a copy of the current actor's `_runtime_vars` dict
via a public fn; TODO to convert to `RuntimeVars` struct.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-02 17:59:13 -04:00
dependabot[bot] 1049f7bf38
Bump pygments from 2.19.2 to 2.20.0
Bumps [pygments](https://github.com/pygments/pygments) from 2.19.2 to 2.20.0.
- [Release notes](https://github.com/pygments/pygments/releases)
- [Changelog](https://github.com/pygments/pygments/blob/master/CHANGES)
- [Commits](https://github.com/pygments/pygments/compare/2.19.2...2.20.0)

---
updated-dependencies:
- dependency-name: pygments
  dependency-version: 2.20.0
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-30 20:28:44 +00:00
Bd cc3bfac741
Merge pull request #366 from goodboy/dereg_on_oserror
Make `find_actor()` delete stale sockaddr entries from registrar on `OSError`
2026-03-25 03:27:27 -04:00
Gud Boi e71eec07de Refine type annots in `_discovery` and `_runtime`
- Add `LocalPortal` union to `query_actor()` return
  type and `reg_portal` var annotation since the
  registrar yields a `LocalPortal` instance.
- Update docstring to note the `LocalPortal` case.
- Widen `.delete_addr()` `addr` param to accept
  `list[str|int]` bc msgpack deserializes tuples as
  lists over IPC.
- Tighten `uid` annotation to `tuple[str, str]|None`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-25 02:16:48 -04:00
Gud Boi b557ec20a7 Coerce IPC `addr` to `tuple` in `.delete_addr()`
`msgpack` deserializes tuples as lists over IPC so
the `bidict.inverse.pop()` needs a `tuple`-cast to
match registry keys.

Regressed-by: 85457cb (`registry_addrs` change)
Found-via: `/run-tests` test_stale_entry_is_deleted

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-25 01:36:58 -04:00
Gud Boi 85457cb839 Address Copilot review suggestions on PR #366
- Use `bidict.forceput()` in `register_actor()` to handle
  duplicate addr values from stale entries or actor restarts.
- Fix `uid` annotation to `tuple[str, str]|None` in
  `maybe_open_portal()` and handle the `None` return from
  `delete_addr()` in log output.
- Pass explicit `registry_addrs=[reg_addr]` to `open_nursery()`
  and `find_actor()` in `test_stale_entry_is_deleted` to ensure
  the test uses the remote registrar.
- Update `query_actor()` docstring to document the new
  `(addr, reg_portal)` yield shape.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 00:21:09 -04:00
Gud Boi 850219f60c Guard `reg_portal` for `None` in `maybe_open_portal()`
Fix potential `AttributeError` when `query_actor()` yields
a `None` portal (peer-found-locally path) and an `OSError`
is raised during transport connect.

Also,
- fix `Arbiter.delete_addr()` return type to
  `tuple[str, str]|None` bc it can return `None`.
- fix "registar" typo -> "registrar" in comment.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-17 17:26:57 -04:00
Tyler Goodlet d929fb75b5 Rename `.delete_sockaddr()` -> `.delete_addr()` 2026-03-13 21:51:15 -04:00
Tyler Goodlet 403c2174a1 Always no-raise try-to-pop registry addrs 2026-03-13 21:51:15 -04:00
Tyler Goodlet 528012f35f Add stale entry deleted from registrar test
By spawning an actor task that immediately shuts down the transport
server and then sleeps, verify that attempting to connect via the
`._discovery.find_actor()` helper delivers `None` for the `Portal`
value.

Relates to #184 and #216
2026-03-13 21:51:15 -04:00
Tyler Goodlet 0dfa6f4a8a Don't unwrap and unwrapped addr, just warn on delete XD 2026-03-13 21:51:15 -04:00
Tyler Goodlet a0d3741fac Ensure `._registry` values are hashable, since `bidict`! 2026-03-13 21:51:15 -04:00
Tyler Goodlet 149b800c9f Handle stale registrar entries; detect and delete
In cases where an actor's transport server task (by default handling new
TCP connections) terminates early but does not de-register from the
pertaining registry (aka the registrar) actor's address table, the
trying-to-connect client actor will get a connection error on that
address. In the case where client handles a (local) `OSError` (meaning
the target actor address is likely being contacted over `localhost`)
exception, make a further call to the registrar to delete the stale
entry and `yield None` gracefully indicating to calling code that no
`Portal` can be delivered to the target address.

This issue was originally discovered in `piker` where the `emsd`
(clearing engine) actor would sometimes crash on rapid client
re-connects and then leave a `pikerd` stale entry. With this fix new
clients will attempt connect via an endpoint which will re-spawn the
`emsd` when a `None` portal is delivered (via `maybe_spawn_em()`).
2026-03-13 21:51:15 -04:00
Tyler Goodlet 03f458a45c Add `Arbiter.delete_sockaddr()` to remove addrs
Since stale addrs can be leaked where the actor transport server task
crashes but doesn't (successfully) unregister from the registrar, we
need a remote way to remove such entries; hence this new (registrar)
method.

To implement this make use of the `bidict` lib for the `._registry`
table thus making it super simple to do reverse uuid lookups from an
input socket-address.
2026-03-13 21:51:15 -04:00
Bd e77198bb64
Merge pull request #422 from goodboy/global_uds_in_test_harness
Run (some) test suites in CI with `--tpt-proto uds`
2026-03-13 21:50:45 -04:00
Gud Boi 5b8f6cf4c7 Use `.aid.uid` to avoid deprecation warns in tests
- `test_inter_peer_cancellation`: swap all `.uid` refs
  on `Actor`, `Channel`, and `Portal` to `.aid.uid`
- `test_legacy_one_way_streaming`: same + fix `print()`
  to multiline style

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 8868ff19f3 Flip to `ActorNursery.cancel_called` API
Avoid deprecation warnings, prepping for property removal.
2026-03-13 21:10:52 -04:00
Gud Boi 066011b83d Bump `fail_after` delay on non-linux for sync-sleep test
Use 6s timeout on non-linux (vs 4s) in
`test_cancel_while_childs_child_in_sync_sleep()` to avoid
flaky `TooSlowError` on slower CI runners.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi b1d003d850 Add `--tpt-proto` CI matrix and wire to `pytest`
- add `tpt_proto: ['tcp', 'uds']` matrix dimension
  to the `testing` job.
- exclude `uds` on `macos-latest` for now.
- pass `--tpt-proto=${{ matrix.tpt_proto }}` to the
  `pytest` invocation.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 8991ec2bf5 Fix warns and de-reg race in `test_discovery`
Removes the `pytest` deprecation warns and attempts to avoid
some de-registration raciness, though i'm starting to think the
real issue is due to not having the fixes from #366 (which handle
the new dereg on `OSError` case from UDS)?

- use `.channel.aid.uid` over deprecated `.channel.uid`
  throughout `test_discovery.py`.
- add polling loop (up to 5s) for subactor de-reg check
  in `spawn_and_check_registry()` to handle slower
  transports like UDS where teardown takes longer.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00