Commit Graph

1482 Commits (2917b74ba4dbdee2bfda5d3a28bb5b76a10ff0bc)

Author SHA1 Message Date
Tyler Goodlet 0dfa6f4a8a Don't unwrap and unwrapped addr, just warn on delete XD 2026-03-13 21:51:15 -04:00
Tyler Goodlet a0d3741fac Ensure `._registry` values are hashable, since `bidict`! 2026-03-13 21:51:15 -04:00
Tyler Goodlet 149b800c9f Handle stale registrar entries; detect and delete
In cases where an actor's transport server task (by default handling new
TCP connections) terminates early but does not de-register from the
pertaining registry (aka the registrar) actor's address table, the
trying-to-connect client actor will get a connection error on that
address. In the case where client handles a (local) `OSError` (meaning
the target actor address is likely being contacted over `localhost`)
exception, make a further call to the registrar to delete the stale
entry and `yield None` gracefully indicating to calling code that no
`Portal` can be delivered to the target address.

This issue was originally discovered in `piker` where the `emsd`
(clearing engine) actor would sometimes crash on rapid client
re-connects and then leave a `pikerd` stale entry. With this fix new
clients will attempt connect via an endpoint which will re-spawn the
`emsd` when a `None` portal is delivered (via `maybe_spawn_em()`).
2026-03-13 21:51:15 -04:00
Tyler Goodlet 03f458a45c Add `Arbiter.delete_sockaddr()` to remove addrs
Since stale addrs can be leaked where the actor transport server task
crashes but doesn't (successfully) unregister from the registrar, we
need a remote way to remove such entries; hence this new (registrar)
method.

To implement this make use of the `bidict` lib for the `._registry`
table thus making it super simple to do reverse uuid lookups from an
input socket-address.
2026-03-13 21:51:15 -04:00
Gud Boi 8868ff19f3 Flip to `ActorNursery.cancel_called` API
Avoid deprecation warnings, prepping for property removal.
2026-03-13 21:10:52 -04:00
Gud Boi 52e8fb43ee Tighten UDS addr validation and sockname prefixes
- add `is_valid` and `sockpath.resolve()` asserts in
  `get_rando_addr()` for the `'uds'` case plus an
  explicit `UDSAddress` type annotation.
- rename no-runtime sockname prefixes from
  `'<unknown-actor>'`/`'root'` to
  `'no_runtime_root'`/`'no_runtime_actor'` with a proper
  if/else branch in `UDSAddress.get_random()`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 99577b719a Skip cluster test on UDS, wire `tpt_proto` fixture
Add UDS skip-guard to `test_streaming_to_actor_cluster()`
and plumb `tpt_proto` through the `@tractor_test` wrapper
so transport-parametrized tests can receive it.

Deats,
- skip cluster test when `tpt_proto == 'uds'` with
  descriptive msg, add TODO about `@pytest.mark.no_tpt`.
- add `tpt_proto: str|None` param to inner wrapper in
  `tractor_test()`, forward to decorated fn when its sig
  accepts it.
- register custom `no_tpt` marker via `pytest_configure()`
  to avoid unknown-marker warnings.
- add masked todo for `no_tpt` marker-check code in `tpt_proto` fixture
  (needs fn-scope to work, left as TODO).
- add `request` param to `tpt_proto` fixture for future
  marker inspection.

Also,
- add doc-string to `test_streaming_to_actor_cluster()`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 4092db60b2 Revert advanced-fault UDS edge case handling
Namely the workaround expected exc branches added in ef7ed7a for the UDS
parametrization. With the new boxing of the underlying CREs as
tpt-closed, we can expect the same exc outcomes as in the TCP cases.

Also this tweaks some error report logging content used while debugging
this,
- properly `repr()` the `TransportClosed.src_exc`-type from
  the maybe emit in `.report_n_maybe_raise()`.
- remove the redudant `chan.raddr` from the "closed abruptly"
  header in the tpt-closed handler of `._rpc.process_messages()`,
  the `Channel.__repr__()` now contains it by default.
2026-03-13 21:10:52 -04:00
Gud Boi 4f333dee05 Pass `enable_transports` in `daemon` fixture
Forward the `tpt_proto` fixture val into spawned daemon
subprocesses via `run_daemon(enable_transports=..)` and
sync `_runtime_vars['_enable_tpts']` in the `tpt_proto`
fixture so sub-actors inherit the transport setting.

Deats,
- add `enable_transports={enable_tpts}` to the daemon
  spawn-cmd template in `tests/conftest.py`.
- set `_state._runtime_vars['_enable_tpts']` in the
  `tpt_proto` fixture in `_testing/pytest.py`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi 5ea721683b Use `.aid.uid` to avoid deprecation warns
I started getting annoyed by all the warnings from `pytest` during work
on macos suport in CI, so this replaces all `Actor.uid`/`Channel.uid`
accesses with `.aid.uid` (or `.aid.reprol()` for log msgs) across the
core runtime and IPC subsystems to avoid the noise.

This also provides incentive to start the adjustment to all
`.uid`-holding/tracking internal `dict`-tables/data-structures to
instead use `.msg.types.Aid`. Hopefully that will come a (vibed?) follow
up shortly B)

Deats,
- `._context`: swap all `self._actor.uid`, `self.chan.uid`,
  and `portal.actor.uid` refs to `.aid.uid`; use
  `.aid.reprol()` for log/error formatting.
- `._rpc`: same treatment for `actor.uid`, `chan.uid` in
  log msgs and cancel-scope handling; fix `str(err)` typo
  in `ContextCancelled` log.
- `._runtime`: update `chan.uid` -> `chan.aid.uid` in ctx
  cache lookups, RPC `Start` msg, registration and
  cancel-request handling; improve ctxc log formatting.
- `._spawn`: replace all `subactor.uid` with
  `.aid.uid` for child-proc tracking, IPC peer waiting,
  debug-lock acquisition, and nursery child dict ops.
- `._supervise`: same for `subactor.uid` in cancel and
  portal-wait paths; use `actor.aid.uid` for error dict.
- `._state`: fix `last.uid` -> `last.aid.uid` in
  `current_actor()` error msg.

Also,
- `._chan`: make `Channel.aid` a proper `@property` backed
  by `._aid` so we can add validation/typing later.
- `.log`: use `current_actor().aid.uuid` instead of
  `.uid[1]` for actor-uid log field.
- `.msg.types`: add TODO comment for `Start.aid` field
  conversion.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 21:10:52 -04:00
Gud Boi b3ce5ab4f6 Swap `open_channel_from()` to yield `(chan, first)`
Deliver `(LinkedTaskChannel, Any)` instead of the prior `(first, chan)`
order from `open_channel_from()` to match the type annotation and be
consistent with `trio.open_*_channel()` style where the channel obj
comes first.

- flip `yield first, chan` -> `yield chan, first`
- update type annotation + docstring to match
- swap all unpack sites in tests and examples

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 19:28:57 -04:00
Gud Boi e89fe03da7 Fix `LinkedTaskChannel` docstrings from GH bot review
Address valid findings from copilot's PR #413 review
(https://github.com/goodboy/tractor/pull/413
 #pullrequestreview-3925876037):

- `.get()` docstring referenced non-existent
  `._from_trio` attr, correct to `._to_aio`.
- `.send()` docstring falsely claimed error-raising
  on missing `from_trio` arg; reword to describe the
  actual `.put_nowait()` enqueue behaviour.
- `.open_channel_from()` return type annotation had
  `tuple[LinkedTaskChannel, Any]` but `yield` order
  is `(first, chan)`; fix annotation + docstring to
  match actual `tuple[Any, LinkedTaskChannel]`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 19:28:57 -04:00
Gud Boi 36cbc07602 Tried out an alt approach for `.to_asyncio` crashes
This change is masked out now BUT i'm leaving it in for reference.

I was debugging a multi-actor fault where the primary source actor was
an infected-aio-subactor (`brokerd.ib`) and it seemed like the REPL was only
entering on the `trio` side (at a `.open_channel_from()`) and not
eventually breaking in the `asyncio.Task`. But, since (changing
something?) it seems to be working now, it's just that the `trio` side
seems to sometimes handle before the (source/causing and more
child-ish) `asyncio`-task, which is a bit odd and not expected..
We could likely refine (maybe with an inter-loop-task REPL lock?) this
at some point and ensure a child-`asyncio` task which errors always
grabs the REPL **first**?

Lowlevel deats/further-todos,
- add (masked) `maybe_open_crash_handler()` block around
  `asyncio.Task` execution with notes about weird parent-addr
  delivery bug in `test_sync_pause_from_aio_task`
  * yeah dunno what that's about but made a bug; seems to be IPC
    serialization of the `TCPAddress` struct somewhere??
- add inter-loop lock TODO for avoiding aio-task clobbering
  trio-tasks when both crash in debug-mode

Also,
- change import from `tractor.devx.debug` to `tractor.devx`
- adjust `get_logger()` call to use new implicit mod-name detection
  added to `.log.get_logger()`, i.e. sin `name=__name__`.
- some teensie refinements to `open_channel_from()`:
  * swap return type annotation for  to `tuple[LinkedTaskChannel, Any]`
    (was `Any`).
  * update doc-string to clarify started-value delivery
  * add err-log before `.pause()` in what should be an unreachable path.
  * add todo to swap the `(first, chan)` pair to match that of ctx..

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-13 19:28:57 -04:00
Tyler Goodlet 1f2fad22ee Extend `.to_asyncio.LinkedTaskChannel` for aio side
With methods to comms similar to those that exist for the `trio` side,
- `.get()` which proxies verbatim to the `._to_aio: asyncio.Queue`,
- `.send_nowait()` which thin-wraps to `._to_trio: trio.MemorySendChannel`.

Obviously the more correct design is to break up the channel type into
a pair of handle types, one for each "side's" task in each event-loop,
that's hopefully coming shortly in a follow up patch B)

Also,
- fill in some missing doc strings, tweak some explanation comments and
  update todos.
- adjust the `test_aio_errors_and_channel_propagates_and_closes()` suite
  to use the new `chan` fn-sig-API with `.open_channel_from()` including
  the new methods for msg comms; ensures everything added here works e2e.
2026-03-13 19:28:57 -04:00
Tyler Goodlet ca5f6f50a8 Explain the `infect_asyncio: bool` param to pass in RTE msg 2026-03-13 19:28:57 -04:00
Gud Boi 8215a7ba34 Hide private fields in `Struct.pformat()` output
Skip fields starting with `_` in pretty-printed struct output
to avoid cluttering displays with internal/private state (and/or accessing
private properties which have errors Bp).

Deats,
- add `if k[0] == '_': continue` check to skip private fields
- change nested `if isinstance(v, Struct)` to `elif` since we
  now have early-continue for private fields
- mv `else:` comment to clarify it handles top-level fields
- fix indentation of `yield` statement to only output
  non-private, non-nested fields

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-10 00:33:32 -04:00
Gud Boi daae196048 Warn if `.ipc._uds.get_peer_pid()` returns null 2026-03-08 19:17:16 -04:00
Gud Boi 70efcb09a0 Slight refinements to `._state.get_rt_dir()`
Per the `copilot` review,
https://github.com/goodboy/tractor/pull/406#pullrequestreview-3893270953

now we also,
- pass `exists_ok=True` to `.mkdir()` to avoid conc races.
- expose `appname: str` param for caller override.
- normalize `subdir` to avoid escaping the base rt-dir location.
2026-03-08 19:17:16 -04:00
Gud Boi a7e74acdff Doc `getsockopt()` args (for macOS)
Per the questionable `copilot` review which is detailed for follow up in
https://github.com/goodboy/tractor/issues/418. These constants are
directly linked from the kernel sources fwiw.
2026-03-08 19:17:16 -04:00
Gud Boi 9c3d3bcec1 Add prompt flush hack for `bash` on macos as well.. 2026-03-08 19:17:16 -04:00
Gud Boi 521fb97fe9 Support UDS on macos (for realz)
Though it was a good (vibed) try by @dnks, the previous "fix" was not
actually adding unix socket support but merely sidestepping a crash due
to `get_peer_info()`'s impl never going to work on MacOS (and it was
never intended to).

This patch instead solves the underlying issue by implementing a new
`get_peer_pid()` helper which does in fact retrieve the peer's PID in
a more generic/cross-platform way (:fingers_crossed:); much thanks to
the linked SO answer for this solution!

Impl deats,
- add `get_peer_pid()` and call it from
  `MsgpackUDSStream.get_stream_addrs()` when we detect a non-'linux'
  platform, OW use the original soln: `get_stream_addrs()`.
- add a new case for the `match (peername, sockname)` with a
  `case (str(), str()):` which seems to at least work on macos.
- drop all the `LOCAL_PEERCRED` dynamic import branching since it was
  never needed and was never going to work.
2026-03-08 19:17:16 -04:00
Gud Boi d8a3969048 Also shorten shm-key for `ShmList` on macos
Same problem as for the `ShmArray` tokens, so tweak and reuse
the `_shorten_key_for_macos()` helper and call it from
`open_shm_list()` similarly.

Some tweaks/updates to the various helpers,
- support `prefix/suffix` inputs and if provided take their lengths and
  subtract them from the known *macOS shm_open() has a 31 char limit
  (PSHMNAMLEN)* when generating and using the `hashlib.sha256()` value
  which overrides (for now..) wtv `key` is passed by the caller.
- pass the appropriate `suffix='_first/_last'` values for the `ShmArray`
  token generators case.
- add a `prefix: str = 'shml_'` param to `open_shm_list()`.
- better log formatting with `!r` to report any key shortening.
2026-03-08 19:17:16 -04:00
Gud Boi 01c0db651a Port macOS shm 31-char name limit hack from `piker`
Adapt the `PSHMNAMLEN` fix from `piker.data._sharedmem` (orig commit
96fb79ec thx @dnks!) to `tractor.ipc._shm` accounting for the
module-local differences:

- Add `hashlib` import for sha256 key hashing
- Add `key: str|None` field to `NDToken` for storing
  the original descriptive key separate from the
  (possibly shortened) OS-level `shm_name`
- Add `__eq__()`/`__hash__()` to `NDToken` excluding
  the `key` field from identity comparison
- Add `_shorten_key_for_macos()` using `t_` prefix
  (vs piker's `p_`) with 16 hex chars of sha256
- Use `platform.system() == 'Darwin'` in `_make_token()`
  (tractor already imports the `platform` module vs
  piker's `sys.platform`)
- Wrap `shm_unlink()` in `ShmArray.destroy()` with
  `try/except FileNotFoundError` for teardown races
  (was already done in `SharedInt.destroy()`)
- Move token creation before `SharedMemory()` alloc in
  `open_shm_ndarray()` so `token.shm_name` is used
  as the OS-level name
- Use `lookup_key` pattern in `attach_shm_ndarray()`
  to decouple `_known_tokens` dict key from OS name

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-03-08 19:17:16 -04:00
Gud Boi 7bcd7aca2b Reorg `socket` conditional imports a bit
Move the multi-platorm-supporting conditional/dynamic `socket` constant
imports to *after* the main cross-platform ones.
Also add constant typing and reformat comments a bit for the macOS case.
2026-03-08 19:17:16 -04:00
Gud Boi 920d0043b4 Force parent subdirs for macos 2026-03-08 19:17:16 -04:00
wygud 93b9a6cd97 Add macOS compatibility for Unix socket credential passing
Make socket credential imports platform-conditional in `.ipc._uds`.
- Linux: use `SO_PASSCRED`/`SO_PEERCRED` from socket module
- macOS: use `LOCAL_PEERCRED` (0x0001) instead, no need for `SO_PASSCRED`
- Conditionally call `setsockopt(SO_PASSCRED)` only on Linux

Fixes AttributeError on macOS where SO_PASSCRED doesn't exist.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-08 19:17:16 -04:00
Tyler Goodlet e7cefba67f Use `platformdirs` for `.config.get_rt_dir()`
Thanks to the `tox`-dev community for such a lovely pkg which seems to
solves all the current cross-platform user-dir problems B)

Also this,
- now passes `platformdirs.user_runtime_dir(appname='tractor')`
  and allows caller to pass an optional `subdir` under `tractor/`
  if desired.
- drops the `.config._rtdir: Path` mod var.
- bumps the lock file with the new dep.
2026-03-08 19:16:49 -04:00
Gud Boi 916f88a070 Less newlines in `._rpc` log msg 2026-02-19 16:31:54 -05:00
Tyler Goodlet 3e5124e184 Hide `._rpc._invoke()` frame, again.. 2026-02-19 16:28:22 -05:00
Gud Boi fa86269e30 Stuff from auto-review in https://github.com/goodboy/tractor/pull/412 .. 2026-02-19 16:20:21 -05:00
Gud Boi d0b92bbeba Clean up `._transport` error-case comment
Expand and clarify the comment for the default `case _`
block in the `.send()` error matcher, noting that we
console-error and raise-thru for unexpected disconnect
conditions.

(this patch was suggested by copilot in,
 https://github.com/goodboy/tractor/pull/411)

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 16:18:39 -05:00
Gud Boi 9470815f5a Fix `spawn` fixture cleanup + test assertions
Improve the `spawn` fixture teardown logic in
`tests/devx/conftest.py` fixing the while-else bug, and fix
`test_advanced_faults` genexp for `TransportClosed` exc type
checking.

Deats,
- replace broken `while-else` pattern with direct
  `if ptyproc.isalive()` check after the SIGINT loop.
- fix undefined `spawned` ref -> `ptyproc.isalive()` in
  while condition.
- improve walrus expr formatting in timeout check (multiline
  style).

Also fix `test_ipc_channel_break_during_stream()` assertion,
- wrap genexp in `all()` call so it actually checks all excs
  are `TransportClosed` instead of just creating an unused
  generator.

(this patch was suggested by copilot in,
 https://github.com/goodboy/tractor/pull/411)

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 16:14:11 -05:00
Gud Boi 0cddc67bdb Add doc-strs to `get_root()` + `maybe_open_portal()`
Brief descriptions for both fns in `._discovery` clarifying
what each delivers and under what conditions.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 13:55:02 -05:00
Gud Boi 052fe2435f Improve `Channel` doc-strs + minor cleanups
Flesh out missing method doc-strings, improve log msg formatting and
assert -> `RuntimeError` for un-inited tpt layer.

Deats,
- add doc-string to `.send()` noting `TransportClosed` raise
  on comms failures.
- add doc-string to `.recv()`.
- expand `._aiter_msgs()` doc-string, line-len reflow.
- add doc-string to `.connected()`.
- convert `assert self._transport` -> `RuntimeError` raise
  in `._aiter_msgs()` for more explicit crashing.
- expand `_connect_chan()` doc-string, note it's lowlevel
  and suggest `.open_portal()` to user instead.
- factor out `src_exc_str` in `TransportClosed` log handler
  to avoid double-call
- use multiline style for `.connected()` return expr.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 13:55:02 -05:00
Gud Boi 28819bf5d3 Add `Actor.is_root()` convenience predicate meth 2026-02-19 13:55:02 -05:00
Gud Boi 07c2ba5c0d Drop `trio`-exc-catching if tpt-closed covers them
Remove the `trio.ClosedResourceError` and `trio.BrokenResourceError`
handling that should now be subsumed by `TransportClosed` re-raising out
of the `.ipc` stack.

Deats,
- drop CRE and BRE from `._streaming.MsgStream.aclose()/.send()` blocks.
- similarly rm from `._context.open_context_from_portal()`.
- also from `._portal.Portal.cancel_actor()` and drop the
  (now-completed-todo) comment about this exact thing.

Also add comment in `._rpc.try_ship_error_to_remote()` noting the
remaining `trio` catches there are bc the `.ipc` layers *should* be
wrapping them; thus `log.critical()` use is warranted.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 13:55:02 -05:00
Gud Boi 50f40f427b Include `TransportClosed` in tpt-layer err handling
Add `TransportClosed` to except clauses where `trio`'s own
resource-closed errors are already caught, ensuring our
higher-level tpt exc is also tolerated in those same spots.
Likely i will follow up with a removal of the `trio` variants since most
*should be* caught and re-raised as tpt-closed out of the `.ipc` stack
now?

Add `TransportClosed` to various handler blocks,
- `._streaming.MsgStream.aclose()/.send()` except blocks.
- the broken-channel except in `._context.open_context_from_portal()`.
- obvi import it where necessary in those ^ mods.

Adjust `test_advanced_faults` suite + exs-script to match,
- update `ipc_failure_during_stream.py` example to catch
  `TransportClosed` alongside `trio.ClosedResourceError`
  in both the break and send-check paths.
- shield the `trio.sleep(0.01)` after tpt close in example to avoid
  taskc-raise/masking on that checkpoint since we want to simulate
  waiting for a user to send a KBI.
- loosen `ExceptionGroup` assertion to `len(excs) <= 2` and ensure all
  excs are `TransportClosed`.
- improve multi-line formatting, minor style/formatting fixes in
  condition expressions.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 13:55:02 -05:00
Gud Boi bf6de55865 Improve tpt-closed msg-fmt/content and CRE case matching
Refine tpt-error reporting to include closure attribution (`'locally'`
vs `'by peer'`), tighten match conditions and reduce needless newlines
in exc reprs.

Deats,
- factor out `trans_err_msg: str` and `by_whom: str` into a `dict`
  lookup before the `match:` block to pair specific err msgs to closure
  attribution strings.
- use `by_whom` directly as `CRE` case guard condition
  (truthy when msg matches known underlying CRE msg content).
- conveniently include `by_whom!r` in `TransportClosed` message.
- fix `'locally ?'` -> `'locally?'` in send-side `CRE`
  handler (drop errant space).
- add masked `maybe_pause_bp()` calls at both `CRE` sites (from when
  i was tracing a test harness issue where the UDS socket path wasn't
  being cleaned up on teardown).
- drop trailing `\n` from `body=` args to `TransportClosed`.
- reuse `trans_err_msg` for the `BRE`/broken-pipe guard.

Also adjust testing, namely `test_ctxep_pauses_n_maybe_ipc_breaks`'s
expected patts-set for new msg formats to be raised out of
`.ipc._transport`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 13:55:02 -05:00
Gud Boi 5ded99a886 Add a `._trace.maybe_pause_bp()` for tpt-broken cases
Internal helper which falls back to sync `pdb` when the
child actor can't reach root to acquire the TTY lock.

Useful when debugging tpt layer failures (intentional or
otherwise) where a sub-actor can no longer IPC-contact the
root to coordinate REPL access; root uses `.pause()` as
normal while non-root falls back to `mk_pdb().set_trace()`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-19 13:55:02 -05:00
Gud Boi f8e25688c7 Unmask `ClosedResourceError` handling in `._transport`
Unmask the CRE case block for peer-closed socket errors which already
had a TODO about reproducing the condition. It appears this case can
happen during inter-actor comms teardowns in `piker`, but i haven't been
able to figure out exactly what reproduces it yet..

So activate the block again for that 'socket already closed'-msg case,
and add a TODO questioning how to reproduce it.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-12 00:51:50 -05:00
Tyler Goodlet c3f455a8ec Mask tpt-closed handling of `chan.send(return_msg)`
A partial revert of commit c05d08e426 since it seem we already
suppress tpt-closed errors lower down in `.ipc.Channel.send()`; given
that i'm pretty sure this new handler code should basically never run?

Left in a todo to remove the masked content once i'm done more
thoroughly testing under `piker`.
2026-02-12 00:51:50 -05:00
Tyler Goodlet f78e842fba More `TransportClosed`-handling around IPC-IO
For IPC-disconnects-during-teardown edge cases, augment some `._rpc`
machinery,
- in `._invoke()` around the `await chan.send(return_msg)` where we
  suppress if the underlying `Channel` already disconnected.
- add a disjoint handler in `_errors_relayed_via_ipc()` which just
  reports-n-reraises the exc (same as prior behaviour).
  * originally i thought it needed to be handled specially (to avoid
    being crash handled) but turns out that isn't necessary?
  * hence the also-added-bu-masked-out `debug_filter` / guard expression
    around the `await debug._maybe_enter_pm()` line.
- show the `._invoke()` frame for the moment.
2026-02-12 00:51:50 -05:00
Gud Boi 2ed9e65530 Clear rtvs state on root shutdown..
Fixes the bug discovered in last test update, not sure how this wasn't
caught already XD
2026-02-11 22:17:26 -05:00
Gud Boi 8aee24e83f Fix when root-actor addrs is set as rtvs
Move `_root_addrs` assignment to after `async_main()` unblocks (via
`.started()`) which now delivers the bind addrs , ensuring correct
`UnwrappedAddress` propagation into `._state._runtime_vars` for
non-registar root actors..

Previously for non-registrar root actors the `._state._runtime_vars`
entries were being set as `Address` values which ofc IPC serialize
incorrectly rn vs. the unwrapped versions, (well until we add a msgspec
for their structs anyway) and thus are passed in incorrect form to
children/subactors during spawning..

This fixes the issue by waiting for the `.ipc.*` stack to
bind-and-resolve any randomly allocated addrs (by the OS) until after
the initial `Actor` startup is complete.

Deats,
- primarily, mv `_root_addrs` assignment from before `root_tn.start()`
  to after, using started(-ed) `accept_addrs` now delivered from
  `._runtime.async_main()`..
- update `task_status` type hints to match.
- unpack and set the `(accept_addrs, reg_addrs)` tuple from
  `root_tn.start()` call into `._state._runtime_vars` entries.
- improve and embolden comments distinguishing registrar vs non-registrar
  init paths, ensure typing reflects wrapped vs. unwrapped addrs.

Also,
- add a masked `mk_pdb().set_trace()` for debugging `raddrs` values
  being "off".
- add TODO about using UDS on linux for root mailbox
- rename `trans_bind_addrs` -> `tpt_bind_addrs` for clarity.
- expand comment about random port allocation for
  non-registrar case

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-11 22:17:26 -05:00
Gud Boi 3f0bde1bf8 Use bare `get_logger()` in `.to_asyncio` 2026-02-11 22:02:41 -05:00
Gud Boi fa1a15dce8 Cleaups per copilot PR review 2026-02-11 21:51:40 -05:00
Gud Boi ff02939213 Toss in some `colorlog` alts to try 2026-02-11 21:05:16 -05:00
Gud Boi 0b0c83e9da Drop `name=__name__` from all `get_logger()` calls
Use new implicit module-name detection throughout codebase to simplify
logger creation and leverage auto-naming from caller mod .

Main changes,
- drop `name=__name__` arg from all `get_logger()` calls
  (across 29 modules).
- update `get_console_log()` calls to include `name='tractor'` for
  enabling root logger in test harness and entry points; this ensures
  logic in `get_logger()` triggers so that **all** `tractor`-internal
  logging emits to console.
- add info log msg in test `conftest.py` showing test-harness
  log level

Also,
- fix `.actor.uid` ref to `.actor.aid.uid` in `._trace`.
- adjust a `._context` log msg formatting for clarity.
- add TODO comments in `._addr`, `._uds` for when we mv to
  using `multiaddr`.
- add todo for `RuntimeVars` type hint TODO in `.msg.types` (once we
  eventually get that all going obvi!)

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-11 21:04:49 -05:00
Gud Boi 5e7c0f264d Rework `.get_logger()`, better autonaming, deduping
Overhaul of the automatic-calling-module-name detection and sub-log
creation logic to avoid (at least warn) on duplication(s) and still
handle the common usage of a call with `name=__name__` from a mod's top
level scope.

Main `get_logger()` changes,
- refactor auto-naming logic for implicit `name=None` case such that we
  handle at least `tractor` internal "bare" calls from internal submods.
- factor out the `get_caller_mod()` closure (still inside
  `get_logger()`)for introspecting caller's module with configurable
  frame depth.
- use `.removeprefix()` instead of `.lstrip()` for stripping pkg-name
  from mod paths
- mv root-logger creation before sub-logger name processing
- improve duplicate detection for `pkg_name` in `name`
- add `_strict_debug=True`-only-emitted warnings for duplicate
  pkg/leaf-mod names.
- use `print()` fallback for warnings when no actor runtime is up at
  call time.

Surrounding tweaks,
- add `.level` property to `StackLevelAdapter` for getting
  current emit level as lowercase `str`.
- mv `_proj_name` def to just above `get_logger()`
- use `_curr_actor_no_exc` partial in `_conc_name_getters`
  to avoid runtime errors
- improve comments/doc-strings throughout
- keep some masked `breakpoint()` calls for future debugging

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-11 21:04:29 -05:00
Tyler Goodlet de24bfe052 Mv `load_module_from_path()` to a new `._code_load` submod 2026-02-11 21:03:29 -05:00
Tyler Goodlet dea4b9fd93 Implicitly name sub-logs by caller's mod
That is when no `name` is passed to `get_logger()`, try to introspect
the caller's `module.__name__` and use it to infer/get the "namespace
path" to that module the same as if using `name=__name__` as in the most
common usage.

Further, change the `_root_name` to be `pkg_name: str`, a public and
more obvious param name, and deprecate the former. This obviously adds
the necessary impl to make the new
`test_sys_log::test_implicit_mod_name_applied_for_child` test pass.

Impl detalles for `get_logger()`,
- add `pkg_name` and deprecate `_root_name`, include failover logic
  and a warning.
- implement calling module introspection using
  `inspect.stack()/getmodule()` to get both the `.__name__` and
  `.__package__` info alongside adjusted logic to set the `name`
  when not provided but only when a new `mk_sublog: bool` is set.
- tweak the `name` processing for implicitly set case,
  - rename `sub_name` -> `pkg_path: str` which is the path
    to the calling module minus that module's name component.
  - only partition `name` if `pkg_name` is `in` it.
  - use the `_root_log` for `pkg_name` duplication warnings.

Other/related,
- add types to various public mod vars missing them.
- rename `.log.log` -> `.log._root_log`.
2026-02-11 21:03:07 -05:00
Tyler Goodlet 0e3229f16d Start a logging-sys unit-test module
To start ensuring that when `name=__name__` is passed we try to
de-duplicate the `_root_name` and any `leaf_mod: str` since it's already
included in the headers as `{filename}`.

Deats,
- heavily document the de-duplication `str.partition()`s in
  `.log.get_logger()` and provide the end fix by changing the predicate,
  `if rname == 'tractor':` -> `if rname == _root_name`.
  * also toss in some warnings for when we still detect duplicates.
- add todo comments around logging "filters" (vs. our "adapter").
- create the new `test_log_sys.test_root_pkg_not_duplicated()` which
  runs green with the fixes from ^.
- add a ton of test-suite todos both for existing and anticipated
  logging sys feats in the new mod.
2026-02-11 21:03:07 -05:00
Tyler Goodlet 0c6d512ba4 Solve another OoB cancellation case, the bg task one
Such that we are able to (finally) detect when we should
`Context._scope.cancel()` specifically when the `.parent_task` is
**not** blocking on receiving from the underlying `._rx_chan`, since if
the task is blocking on `.receive()` it will call `.cancel()`
implicitly.

This is a lot to explain with very little code actually needed for the
implementation (are we like `trio` yet anyone?? XD) but the main jist is
that `Context._maybe_cancel_and_set_remote_error()` needed the
additional case of calling `._scope.cancel()` whenever we know that
a remote-error/ctxc won't be immediately handled, bc user code is doing
non `Context`-API things, and result in a similar outcome as if that
task was waiting on `Context.wait_for_result()` or `.__aexite__()`.

Impl details,
- add a new `._is_blocked_on_rx_chan()` method which predicates whether
  the (new) `.parent_task` is blocking on `._rx_chan.receive()`.
  * see various stipulations about the current impl and how we might
    need to adjust for the future given `trio`'s commitment to the
    `Task.custom_sleep_data` attr..
- add `.parent_task`, a pub wrapper for `._task`.
- check for `not ._is_blocked_on_rx_chan()` before manually cancelling
  the local `.parent_task`
- minimize the surrounding branch case expressions.

Other,
- tweak a couple logs.
- add a new `.cancel()` pre-started msg.
- mask the `.cancel_called` setter, it's only (been) used for tracing.
- todos around maybe moving the `._nursery` allocation "around" the
  `.start_remote_task()` call and various subsequent tweaks therein.
2025-09-11 13:12:52 -04:00
Tyler Goodlet fc130d06b8 Check off REPL-ing todo add masked usage in `drain_to_final_msg()` 2025-09-11 10:13:04 -04:00
Tyler Goodlet b1f2a6b394 Rename var for and hide the `_open_and_supervise_one_cancels_all_nursery` frame 2025-09-11 10:13:04 -04:00
Tyler Goodlet 62a364a1d3 Tweaks from copilot, type fix, typos, language. 2025-09-11 10:01:25 -04:00
Tyler Goodlet 9aebe7d8f9 Only read `_mask_cases` if truthy, allow disabling for xfails 2025-09-05 22:23:51 -04:00
Tyler Goodlet e9f3689191 Add "ignore-case-handling" to exc unmasker
Since it turns out there's even case(s) in `trio` core that are guilty
(of implementing things like checkpoints in exc handlers), this adds
facility for ignoring explicit cases via `inspect.FrameInfo` field
matching from the unmasked `exc_ctx` within
`maybe_raise_from_masking_exc()`.

Impl deats,
- use `inspect.getinnerframes()/getmodule()` to extract the equivalent
  "guilty place in code" which raised the masked error which we'd like
  to ignore and **not unmask**.
- start a `_mask_cases: dict` which describes the entries to ignore
  by matching against a specific `FrameInfo`'s fields from indexed
  from `getinnerframes()`.
- describe in that table the case i hit with `trio.WouldBlock` being
  always masked by a `Cancelled` due to way `trio.Lock.acquire()`
  implements the blocking case in the would-block handler..
- always call into a new `is_expected_masking_case()` predicate (from
  `maybe_raise_from_masking_exc()`) on matching `exc_ctx` types.
2025-09-05 14:54:54 -04:00
Tyler Goodlet 93aa39db07 Always pop `._Cache.resources` AFTER `mng.__aexit__()`
The correct ordering is to de-alloc the surrounding `service_n`
+ `trio.Event` **after** the `mng` teardown ensuring the
`mng.__aexit__()` never can hit a ref-error if it touches either (like
if a `tn` is passed to `maybe_open_context()`!
2025-09-05 14:54:41 -04:00
Tyler Goodlet 5ab642bdf0 Drop more `typing.Optional` usage 2025-08-20 12:45:49 -04:00
Tyler Goodlet ed18ecd064 Drop `tn` arg to `maybe_raise_from_masking_exc()` in `._rpc` 2025-08-20 12:45:49 -04:00
Tyler Goodlet cec0282953 Add `never_warn_on: dict` support to unmasker
Such that key->value pairs can be defined which should *never be*
unmasked where values of
- the keys are exc-types which might be masked, and
- the values are exc-types which masked the equivalent key.

For example, the default includes:
- KBI->taskc: a kbi should never be unmasked from its masking
  `trio.Cancelled`.

For the impl, a new `do_warn: bool` in the fn-body determines the
primary guard for whether a warning or re-raising is necessary.
2025-08-20 12:45:49 -04:00
Tyler Goodlet 25c5847f2e Drop `tn` input from `maybe_raise_from_masking_exc()`
Including all caller usage throughout. Moving to a non-`except*` impl
means it's never needed as a signal from the caller - we can just catch
the beg outright (like we should have always been doing)..
2025-08-20 12:45:49 -04:00
Tyler Goodlet ba793fadd9 Pass `tuple` from `._invoke()` unmasker usage
To match the `maybe_raise_from_masking_exc()` sig change.
2025-08-20 12:45:49 -04:00
Tyler Goodlet 6c361a9564 Drop `except*` usage from `._taskc` unmasker
That is from `maybe_raise_from_masking_exc()` thus minimizing us to
a single `except BaseException` block with logic branching for the beg
vs. `unmask_from` exc cases.

Also,
- raise val-err when `unmask_from` is not a `tuple`.
- tweak the exc-note warning format.
- drop all pausing from dev work.
2025-08-20 12:45:49 -04:00
Tyler Goodlet 548855b4f5 Comment/docs tweaks per copilot reivew
Add a micro glossary to clarify questioned terms and refine out some
patch specific comment regions.
2025-08-20 12:36:08 -04:00
Tyler Goodlet 5322861d6d Clean out old-commented tn-opens and ipc-server settings checks 2025-08-20 11:35:31 -04:00
Tyler Goodlet 46a2fa7074 Always pass a `tn` to `._server._serve_ipc_eps()`
Turns out we weren't despite the optional `stream_handler_nursery` input
to `Server.listen_on()`; fail over to the `Server._stream_handler_tn`
allocated during server setup in those cases.
2025-08-20 11:30:58 -04:00
Tyler Goodlet bfe5b2dde6 Hide `collapse_eg()` frame as used from `open_root_actor()` 2025-08-20 10:44:42 -04:00
Tyler Goodlet a9f06df3fb Heh, add back `Actor._root_tn`, it has purpose..
Turns out I didn't read my own internals docs/comments and despite it
not being used previously, this adds the real use case: a root,
per-actor, scope which ensures parent comms are the last conc-thing to
be cancelled.

Also, the impl changes here make the test from 6410e45 (or wtv
it's rebased to) pass, i.e. we can support crash handling in the root
actor despite the root-tn having been (self) cancelled.

Superficial adjustments,
- rename `Actor._service_n` -> `._service_tn` everywhere.
- add asserts to `._runtime.async_main()` which ensure that the any
  `.trionics.maybe_open_nursery()` calls against optionally passed
  `._[root/service]_tn` are allocated-if-not-provided (the
  `._service_tn`-case being an i-guess-prep-for-the-future-anti-pattern
  Bp).
- obvi adjust all internal usage to match new naming.

Serious/real-use-case changes,
- add (back) a `Actor._root_tn` which sits a scope "above" the
  service-tn and is either,
  + assigned in `._runtime.async_main()` for sub-actors OR,
  + assigned in `._root.open_root_actor()` for the root actor.
  **THE primary reason** to keep this "upper" tn is that during
  a full-`Actor`-cancellation condition (more details below) we want to
  ensure that the IPC connection with a sub-actor's parent is **the last
  thing to be cancelled**; this is most simply implemented by ensuring
  that the `Actor._parent_chan: .ipc.Channel` is handled in an upper
  scope in `_rpc.process_messages()`-subtask-terms.
- for the root actor this `root_tn` is allocated in `.open_root_actor()`
  body and assigned as such.
- extend `Actor.cancel_soon()` to be cohesive with this entire teardown
  "policy" by scheduling a task in the `._root_tn` which,
  * waits for the `._service_tn` to complete and then,
  * cancels the `._root_tn.cancel_scope`,
  * includes "sclangy" console logging throughout.
2025-08-20 10:18:52 -04:00
Tyler Goodlet 561954594e Add attempt at non-root-parent REPL guarding
I masked it bc it doesn't seem to actually work for the case I was
testing (`emsd` clobbering a `paperboi` in `piker`..) but figured I'd
leave it as a reminder for solving this problem more generally (#320)
since this is likely the place in the code for a soln.

When i tested it in my case it just resulted in a hang around the `with
debug.acquire_debug_lock()` for some reason? Can't remember if the child
ended up being able to REPL without issue though..
2025-08-19 14:15:14 -04:00
Tyler Goodlet 28a6354e81 Set `shield` when `.cancel_called` for root crashes
Such that we handle them despite a cancellation condition. This is
almost always the case, that `root_tn.cancel_scope.cancel_called` is
set, by the time the `debug._maybe_enter_pm()` hits. Previous I guess we
just weren't actually ever REPL-debugging such cases?

TODO, still needs a test obvi!
2025-08-19 14:14:38 -04:00
Tyler Goodlet d1599449e7 Mk `pause_from_sync()` raise `InternalError` on no `greenback` init 2025-08-19 14:14:27 -04:00
Tyler Goodlet 2d27c94dec Hide `_maybe_enter_pm()` frame (again?) 2025-08-19 14:14:27 -04:00
Tyler Goodlet 0fafd25f0d Comment tweaks per copilot review 2025-08-19 12:33:47 -04:00
Tyler Goodlet 961504b657 Support `chan.started_nowait()` in `.open_channel_from()` target
That is the `target` can declare a `chan: LinkedTaskChannel` instead of
`to_trio`/`from_aio`.

To support it,
- change `.started()` -> the more appropriate `.started_nowait()` which
  can be called sync from the aio child task.
- adjust the `provide_channels` assert to accept either fn sig
  declaration (for now).

Still needs test(s) obvi..
2025-08-18 22:32:51 -04:00
Tyler Goodlet bd148300c5 Relay `asyncio` errors via EoC and raise from rent
Makes the newly added `test_aio_side_raises_before_started` test pass by
ensuring errors raised by any `.to_asyncio.open_channel_from()` spawned
child-`asyncio.Task` are relayed by any caught `trio.EndOfChannel` by
checking for a new `LinkedTaskChannel._closed_by_aio_task: bool`.

Impl deats,
- obvi add `LinkedTaskChannel._closed_by_aio_task: bool = False`
- in `translate_aio_errors()` always check for the new flag on EOC
  conditions and in such cases set `chan._trio_to_raise = aio_err` such
  that the `trio`-parent-task always raises the child's exception
  directly, OW keep original EoC passthrough in place.
- include *very* detailed per-case comments around the extended handler.
- adjust re-raising logic with a new `raise_from` where we only give the
  `aio_err` priority if it's not already set as to `trio_to_raise`.

Also,
- hide the `_run_asyncio_task()` frame by def.
2025-08-18 22:32:51 -04:00
Tyler Goodlet 5c7d930a9a Drop unused `Actor._root_n`.. 2025-08-18 22:16:03 -04:00
Tyler Goodlet c46986504d Switch nursery to `CancelScope`-status properties
Been meaning to do this forever and a recent test hang finally drove me
to it Bp

Like it sounds, adopt the "cancel-status" properties on `ActorNursery`
use already on our `Context` and derived from `trio.CancelScope`:

- add new private `._cancel_called` (set in the head of `.cancel()`)
  & `._cancelled_caught` (set in the tail) instance vars with matching
  read-only `@properties`.

- drop the instance-var and instead delegate a `.cancelled: bool`
  property to `._cancel_called` and add a usage deprecation warning
  (since removing it breaks a buncha tests).
2025-08-18 22:16:03 -04:00
Tyler Goodlet e05a4d3cac Enforce named-args only to `.open_nursery()` 2025-08-18 22:16:03 -04:00
Tyler Goodlet 5021514a6a Disable shm resource tracker via flag on 3.13+
As per the newly added support,
https://docs.python.org/3/library/multiprocessing.shared_memory.html
2025-08-18 22:04:40 -04:00
Tyler Goodlet 331921f612 Hmm disable CRE case for now, causes test fails
So i need to either adjust the tests or figure out if/why this is needed
to avoid the crashing in `pikerd` i found when killin the chart during
a long backfill with `binance` backend..
2025-08-18 21:30:48 -04:00
Tyler Goodlet df0d00abf4 Translate CRE's due to socket-close to tpt-closed
Just like in the BRE case (for UDS) it seems when a peer closes the
(UDS?) socket `trio` instead raises a `ClosedResourceError` which we now
catch and re-raise as a `TransportClosed`. This again results in
`tpt.send()` calls from the rpc-runtime **not** raising when it's known
that the IPC channel is disconnected.
2025-08-18 21:30:48 -04:00
Tyler Goodlet a72d1e6c48 Multi-line-style up the UDS fast-connect handler
Shift around comments and expressions for better reading, assign
`tpt_closed` for easier introspection from REPL during debug oh and fix
the `MsgpackTransport.pformat()` to render '|_peers: 1' .. XD
2025-08-18 21:30:48 -04:00
Tyler Goodlet 5931c59aef Log "out-of-layer" cancellation in `._rpc._invoke()`
Similar to what was just changed for `Context.repr_state`, when the
child task is cancelled but by a different "layer" of the runtime (i.e.
a `Portal.cancel_actor()` / `SIGINT`-to-process canceller) we don't
dump a traceback instead just `log.cancel()` emit.
2025-08-18 21:30:48 -04:00
Tyler Goodlet ba08052ddf Handle "out-of-layer" remote `Context` cancellation
Such that if the local task hasn't resolved but is `trio.Cancelled` and
a `.canceller` was set, we report a `'actor-cancelled'` from
`.repr_state: str`. Bit of formatting to avoid needless newlines too!
2025-08-18 21:30:48 -04:00
Tyler Goodlet 00112edd58 UDS: implicitly create `Address.bindspace: Path`
Since it's merely a local-file-sys subdirectory and there should be no
reason file creation conflicts with other bind spaces.

Also add 2 test suites to match,
- `tests/ipc/test_each_tpt::test_uds_bindspace_created_implicitly` to
  verify the dir creation when DNE.
- `..test_uds_double_listen_raises_connerr` to ensure a double bind
  raises a `ConnectionError` from the src `OSError`.
2025-08-18 21:30:48 -04:00
Tyler Goodlet 1d706bddda Rm `assert` from `Channel.from_addr()`, for UDS we re-created to extract the peer PID 2025-08-18 21:30:48 -04:00
Tyler Goodlet 3c30c559d5 `ipc._uds`: assign `.l/raddr` in `.connect_to()`
Using `.get_stream_addrs()` such that we always (*can*) assign the peer
end's PID in the `._raddr`.

Also factor common `ConnectionError` re-raising into
a `_reraise_as_connerr()`-@cm.
2025-08-18 21:30:48 -04:00
Tyler Goodlet 599020c2c5 Rename all lingering ctx-side bits
As before but more thoroughly in comments and var names finally changing
all,
- caller -> parent
- callee -> child
2025-08-18 21:30:48 -04:00
Tyler Goodlet 50f6543ee7 Add `Channel.closed/.cancel_called`
I.e. the public properties for the private instance var equivs; improves
expected introspection usage.
2025-08-18 21:30:48 -04:00
Tyler Goodlet c0854fd221 Set `Channel._cancel_called` via `chan` var
In `Portal.cancel_actor()` that is, at the least to make it easier to
ref search from an editor Bp
2025-08-18 21:30:48 -04:00
Tyler Goodlet e875b62869 Add `.ipc._shm` todo-idea for `@actor_fixture` API 2025-08-18 21:30:48 -04:00
Tyler Goodlet 3ab7498893 Add todo for py3.13+ `.shared_memory`'s new `track=False` support.. finally they added it XD 2025-08-18 21:30:48 -04:00
Tyler Goodlet 4e252526b5 Accept `tn` to `gather_contexts()/maybe_open_context()`
Such that the caller can be responsible for their own (nursery) scoping
as needed and, for the latter fn's case with
a `trio.Nursery.CancelStatus.encloses()` check to ensure the `tn` is
a valid parent-ish.

Some deats,
- in `gather_contexts()`, mv the `try/finally` outside the nursery block
  to ensure we always do the `parent_exit`.
- for `maybe_open_context()` we do a naive task-tree hierarchy audit to
  ensure the provided scope is not *too* child-ish (with what APIs `trio`
  gives us, see above), OW go with the old approach of using the actor's
  private service nursery.
  Also,
  * better report `trio.Cancelled` around the cache-miss `yield`
    cases and ensure we **never** unmask triggering key-errors.
  * report on any stale-state with the mutex in the `finally` block.
2025-08-18 21:07:12 -04:00
Tyler Goodlet f1ff79a4e6 Always `finally` invoke cache-miss `lock.release()`s
Since the `await service_n.start()` on key-err can be cancel-masked
(checkpoint interrupted before `_Cache.run_ctx` completes), we need to
always `lock.release()` in to avoid lock-owner-state corruption and/or
inf-hangs in peer cache-hitting tasks.

Deats,
- add a `try/except/finally` around the key-err triggered cache-miss
  `service_n.start(_Cache.run_ctx, ..)` call, reporting on any taskc
  and always `finally` unlocking.
- fill out some log msg content and use `.debug()` level.
2025-08-18 21:07:12 -04:00
Tyler Goodlet edc2211444 Go multi-line-style tuples in `maybe_enter_context()`
Allows for an inline comment of the first "cache hit" bool element.
2025-08-18 20:55:18 -04:00
Tyler Goodlet e3a542f2b5 Never shield-wait `ipc_server.wait_for_no_more_peers()`
As mentioned in prior testing commit, it can cause the worst kind of
hangs, the SIGINT ignoring kind.. Pretty sure there was never any reason
outside some esoteric multi-actor debugging case, and pretty sure that
already was solved?
2025-08-18 10:46:37 -04:00
Tyler Goodlet dcb1062bb8 Fix cluster suite, chng to new `gather_contexts()`
Namely `test_empty_mngrs_input_raises()` was failing due to
lazy-iterator use as input to `mngrs` which i guess i added support for
a while back (by it doing a `list(mngrs)` internally)? So just change it
to `gather_contexts(mngrs=())` and also tweak the `trio.fail_after(3)`
since it appears that the prior 1sec was causing
too-fast-of-a-cancellation (before the cluster fully spawned) and thus
the expected `ValueError` never to show..

Also, mask the `tractor.trionics.collapse_eg()` usage (again?) in
`open_actor_cluster()` since it seems unnecessary.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 05d865c0f1 WIP tinkering with strict-eg-tns and cluster API
Seems that the way the actor-nursery interacts with the
`.trionics.gather_contexts()` API on cancellation makes our
`.trionics.collapse_eg()` not work as intended?

I need to dig into how `ActorNursery.cancel()` and `.__aexit__()` might
be causing this discrepancy..

Consider this a commit-of-my-index type save for rn.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 8f19f5d3a8 Mk temp collapser bp work outside runtime as well.. 2025-08-18 10:46:37 -04:00
Tyler Goodlet 64c27a914b Add temp breakpoint support to `collapse_eg()` 2025-08-18 10:46:37 -04:00
Tyler Goodlet d9c8d543b3 Suppress beg tbs from `collapse_eg()`
It was originally this way; I forgot to flip it back when discarding the
`except*` handler impl..

Specially handle the `exc.__cause__` case where we raise from any
detected underlying cause and OW `from None` to suppress the eg's tb.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 048b154f00 Rework `collapse_eg()` to NOT use `except*`..
Since it turns out the semantics are basically inverse of normal
`except` (particularly for re-raising) which is hard to get right, and
bc it's a lot easier to just delegate to what `trio` already has behind
the `strict_exception_groups=False` setting, Bp

I added a rant here which will get removed shortly likely, but i think
going forward recommending against use of `except*` is prudent for
anything low level enough in the runtime (like trying to filter begs).

Dirty deats,
- copy `trio._core._run.collapse_exception_group()` to here with only
  a slight mod to remove the notes check and tb concatting for the
  collapse case.
- rename `maybe_collapse_eg()` - > `get_collapsed_eg()` and delegate it
  directly to the former `trio` fn; return `None` when it returns the
  same beg without collapse.
- simplify our own `collapse_eg()` to either raise the collapsed `exc`
  or original `beg`.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 88828e9f99 Couple more `._root` logging tweaks.. 2025-08-18 10:46:37 -04:00
Tyler Goodlet 25ff195c17 Use collapser around `root_tn` in `async_main()`
Replacing yet another loose-eg-flag. Also toss in a todo to maybe use
the unmasker around the `open_root_actor()` body.
2025-08-18 10:46:37 -04:00
Tyler Goodlet f60cc646ff Facepalm, fix `raise from` in `collapse_eg()`
I dunno what exactly I was thinking but we definitely don't want to
**ever** raise from the original exc-group, instead always raise from
any original `.__cause__` to be consistent with the embedded src-error's
context.

Also, adjust `maybe_collapse_eg()` to return `False` in the non-single
`.exceptions` case, again don't know what I was trying to do but this
simplifies caller logic and the prior return-semantic had no real
value..

This fixes some final usage in the runtime (namely top level nursery
usage in `._root`/`._runtime`) which was previously causing test suite
failures prior to this fix.
2025-08-18 10:46:37 -04:00
Tyler Goodlet a2b754b5f5 Just import `._runtime` ns in `._root`; be a bit more explicit 2025-08-18 10:46:37 -04:00
Tyler Goodlet 5e13588aed Use collapse in `._root.open_root_actor()` too
Seems to add one more cancellation suite failure as well as now cause
the discovery test to error instead of fail?
2025-08-18 10:46:37 -04:00
Tyler Goodlet 0a56f40bab Use collapser around root tn in `.async_main()`
Seems to cause the following test suites to fail however..

- 'test_advanced_faults.py::test_ipc_channel_break_during_stream'
- 'test_advanced_faults.py::test_ipc_channel_break_during_stream'
- 'test_clustering.py::test_empty_mngrs_input_raises'

Also tweak some ctxc request logging content.
2025-08-18 10:46:37 -04:00
Tyler Goodlet 7f584d4f54 Switch to strict-eg nurseries almost everywhere
That is just throughout the core library, not the tests yet. Again, we
simply change over to using our (nearly equivalent?)
`.trionics.collapse_eg()` in place of the already deprecated
`strict_exception_groups=False` flag in the following internals,
- the conc-fan-out tn use in `._discovery.find_actor()`.
- `._portal.open_portal()`'s internal tn used to spawn a bg rpc-msg-loop
  task.
- the daemon and "run-in-actor" layered tn pair allocated in
  `._supervise._open_and_supervise_one_cancels_all_nursery()`.

The remaining loose-eg usage in `._root` and `._runtime` seem to be
necessary to keep the test suite green?? For the moment these are left
out.
2025-08-18 10:46:37 -04:00
Tyler Goodlet d650dda0fa Use collapser in rent side of `Context` 2025-08-18 10:46:37 -04:00
Tyler Goodlet f6598e8400 Add some tooling params to `collapse_eg()` 2025-08-18 10:46:37 -04:00
Tyler Goodlet ca427aec7e More prep-to-reduce the `Actor` method-iface
- drop the (never/un)used `.get_chans()`.
- add #TODO for factoring many methods into a new `.rpc`-subsys/pkg
  primitive, like an `RPCMngr/Server` type eventually.
- add todo to maybe mv `.get_parent()` elsewhere?
- move masked `._hard_mofo_kill()` to bottom.
2025-08-16 17:06:23 -04:00
Tyler Goodlet f53aa992af .log: expose `at_least_level()` as `StackLevelAdapter` meth 2025-08-15 17:29:22 -04:00
Tyler Goodlet 69e0afccf0 Use `Address` where possible in (root) actor boot
Namely inside various bootup-sequences in `._root` and `._runtime`
particularly in the root actor to support both better tpt-address
denoting in our logging and as part of clarifying logic around setting
the root's registry addresses which is soon to be much better factored
out of the core and into an explicit subsystem + API.

Some `_root.open_root_actor()` deats,
- set `registry_addrs` to a new `uw_reg_addrs` (uw: unwrapped) to be
  more explicit about wrapped addr types thoughout.
- instead ensure `registry_addrs` are the wrapped types and pass down
  into the root `Actor` singleton-instance.
- factor the root-actor check + rt-vars update (updating the `'_root_addrs'`)
  out of `._runtime.async_main()` into this fn.
- as previous, set `trans_bind_addrs = uw_reg_addrs` in unwrapped form since it will
  be passed down both through rt-vars as `'_root_addrs'` and to
  `._runtim.async_main()` as `accept_addrs` (which is then passed to the
  IPC server).
- adjust/simplify much logging.
- shield the `await actor.cancel(None)  # self cancel` to avoid any
  finally-footguns.
- as mentioned convert the

For `_runtime.async_main()` tweaks,
- expect `registry_addrs: list[Address]|None = None` with appropriate
  unwrapping prior to setting both `.reg_addrs` and the equiv rt-var.
- add a new `.registry_addrs` prop for the wrapped form.
- convert a final loose-eg for the `service_nursery` to use
  `collapse_eg()`.
- simplify teardown report logging.
2025-08-15 17:29:10 -04:00
Tyler Goodlet e275c49b23 Stackscope import fail msg dun need braces.. 2025-08-15 16:34:03 -04:00
Tyler Goodlet 48fbf38c1d Drop duplicated (masked) debugging-`terminate_after`, prolly a rebase slip.. 2025-08-15 16:33:31 -04:00
Tyler Goodlet defd6e28d2 Facepalm, actually use `.log.cancel()`-level to report parent-side taskc.. 2025-08-15 16:31:52 -04:00
Tyler Goodlet 414b0e2bae Update buncha log msg fmting in `.msg._ops`
Mostly just multi-line code styling again: always putting standalone
`'f\n'` on separate LOC so it reads like it renders to console. Oh and
and a level drop to `.runtime()` for rx-msg reports.
2025-08-15 16:30:10 -04:00
Tyler Goodlet d34fb54f7c Update buncha log msg fmting in `._spawn`
Again using `Channel.aid.reprol()`, `.devx.pformat.nest_from_op()` and
 converting to multi-line code style an ' for str-report-contents. Tweak
 some imports to sub-mod level as well.
2025-08-15 16:29:17 -04:00
Tyler Goodlet 5d87f63377 Update buncha log msg fmting in `._portal`
Namely to use `Channel.aid.reprol()` and converting to our newer style
multi-line code style for str-reports.
2025-08-15 16:29:11 -04:00
Tyler Goodlet 8880a80e3e Use `nest_from_op()`/`pretty_struct` in `._rpc`
Again for nicer console logging. Also fix a double `req_chan` arg bug
when passed to `_invoke` in the `self.cancel()` rt-ep; don't update the
`kwargs: dict` just merge in `req_chan` input at call time.
2025-08-15 16:28:46 -04:00
Tyler Goodlet 7be713ee1e Use `nest_from_op()` in actor-nursery shutdown
Including a new one-line `_shutdown_msg: str` which we mod-var-set for
testing usage and some denoising at `.info()` level. Adjust `Actor()`
instantiating input to the new `.registry_addrs` wrapped addrs property.
2025-08-15 16:28:30 -04:00
Tyler Goodlet 4bd8211abb Add #TODO for `._context` to use `.msg.Aid` 2025-08-15 16:24:35 -04:00
Tyler Goodlet a23a98886c Even more `.ipc.*` repr refinements
Mostly adjusting indentation, noise level, and clarity via `.pformat()`
tweaks more general use of `.devx.pformat.nest_from_op()`.

Specific impl deats,
- use `pformat.ppfmt()/`nest_from_op()` more seriously throughout
  `._server`.
- add a `._server.Endpoint.pformat()`.
- add `._server.Server.len_peers()` and `.repr_state()`.
- polish `Server.pformat()`.
- drop some redundant `log.runtime()`s from `._serve_ipc_eps()` instead
  leaving-them-only/putting-them in the caller pub meth.
- `._tcp.start_listener()` log the bound addr, not the input (which may
  be the 0-port.
2025-08-15 16:24:27 -04:00
Tyler Goodlet 31544c862c More `.ipc.Channel`-repr related tweaks
- only generate a repr in `.from_addr()` when log level is >= 'runtime'.
 |_ add a todo about supporting this optimization more generally on our
   adapter.
- fix `Channel.pformat()` to show unknown peer field line fmt correctly.
- add a `Channel.maddr: str` which just delegates directly to the
  `._transport` like other pass-thru property fields.
2025-08-15 16:24:22 -04:00
Tyler Goodlet 7d320c4e1e Mk `Aid` hashable, use pretty-`.__repr__()`
Hash on the `.uuid: str` and delegate verbatim to
`msg.pretty_struct.Struct`'s equiv method.
2025-08-15 16:24:15 -04:00
Tyler Goodlet 38944ad1d2 Drop `actor_info: str` from `._entry` logs 2025-08-15 16:24:06 -04:00
Tyler Goodlet 9260909fe1 Try `nest_from_op()` in some `._rpc` spots
To start trying out,
- using in the `Start`-msg handler-block to repr the msg coming
  *from* a `repr(Channel)` using '<=)` sclang op.
- for a completed RPC task in `_invoke_non_context()`.
- for the msg loop task's termination report.
2025-08-15 16:23:59 -04:00
Tyler Goodlet c00b3c86ea Hide more `Channel._transport` privates for repr
Such as the `MsgTransport.stream` and `.drain` attrs since they're
rarely that important at the chan level. Also start adopting
a `.<attr>=` style for actual attrs of the type versus a `<name>:
` style for meta-field info lines.
2025-08-15 16:23:54 -04:00
Tyler Goodlet 808a336508 Refine `Actor` status iface, use `Aid` throughout
To simplify `.pformat()` output when the new `privates: bool` is unset
(the default) this adds new public attrs to wrap an actor's
cancellation status as well as provide a `.repr_state: str` (similar to
our equiv on `Context`). Rework `.pformat()` to render a much simplified
repr using all these new refinements.

Further, port the `.cancel()` method to use `.msg.types.Aid` for all
internal `requesting_uid` refs (now renamed with `_aid`) and in all
called downstream methods.

New cancel-state iface deats,
- rename `._cancel_called_by_remote` -> `._cancel_called_by` and expect
  it to be set as an `Aid`.
- add `.cancel_complete: bool` which flags whether `.cancel()` ran to
  completion.
- add `.cancel_called: bool` which just wraps `._cancel_called` (and
  which likely will just be dropped since we already have
  `._cancel_called_by`).
- add `.cancel_caller: Aid|None` which wraps `._cancel_called_by`.

In terms of using `Aid` in cancel methods,
- rename vars with `_aid` suffix in `.cancel()` (and wherever else).
- change `.cancel_rpc_tasks()` input param to `req_aid: msgtypes.Aid`.
- do the same for `._cancel_task()` and (for now until we adjust its
  internals as well) use the `Aid.uid` remap property when assigning
  `Context._canceller`.
- adjust all log msg refs to match obvi.
2025-08-15 16:08:53 -04:00
Tyler Goodlet 679d999185 Add flag to toggle private vars in `Channel.pformat()`
Call it `privates: bool` and only show certain internal instance vars
when set in the `repr()` output.
2025-08-15 16:07:39 -04:00
Tyler Goodlet a8428d7de3 Extend `.msg.types.Aid` method interface
Providing the legacy `.uid -> tuple` style id (since still used for the
`Actor._contexts` table) and a `repr-one-line` method `.reprol() -> str`
for rendering a compact unique actor ID summary (useful in
logging/.pformat()s at the least).
2025-08-15 16:07:39 -04:00
Tyler Goodlet e9f2fecd66 Fix `nest_from_op()` call sigs, already changed upstream
In `._runtime/_root` and since the latest fn-signature changes were
already landed onto main branch via the 65b7956: #384-patch.
2025-07-18 00:35:35 -04:00
Tyler Goodlet b5e3fa7370 Use `nest_from_op()` in some runtime logs for actor-state-repring 2025-07-18 00:35:35 -04:00
Tyler Goodlet 1af35f8170 Add back loose-tn in `gather_contexts()`, mk tests green 2025-07-16 18:18:34 -04:00
Tyler Goodlet 4569d11052 Move `.is_multi_cancelled()` to `.trioniics._beg`
Since it's for beg filtering, the current impl should be renamed anyway;
it's not just for filtering cancelled excs.

Deats,
- added a real doc string, links to official eg docs and fixed the
  return typing.
- adjust all internal imports to match.
2025-07-16 15:49:18 -04:00
Tyler Goodlet 6ba76ab700 .trionics: link in `finally`-footgun `trio` GH ish 2025-07-15 07:23:21 -04:00
Tyler Goodlet 734dda35e9 Hide `._rpc._errors_relayed_via_ipc()` frame by def 2025-07-15 07:23:21 -04:00
Tyler Goodlet b7e04525cc Always `Cancelled`-unmask ctx endpoint excs
To resolve the recently added and failing
`test_remote_exc_relay::test_unmasked_remote_exc`: never allow
`trio.Cancelled` to mask an underlying user-code exception, ever.

Our first real-world (runtime internal) use case for the new
`.trionics.maybe_raise_from_masking_exc()` such that the failing
test now passes with an properly relayed remote RTE unmasking B)

Details,
- flip the `Context._scope_nursery` to the default strict-eg behaviour
  and instead stack its outer scope with a `.trionics.collapse_eg()`.
- wrap the inner-most scope (after `msgops.maybe_limit_plds()`) with
  a `maybe_raise_from_masking_exc()` to ensure user-code errors are
  never masked by `trio.Cancelled`s.

Some err-reporting refinement,
- always capture any `scope_err` from the entire block for debug
  purposes; report it in the `finally` block's log.
- always capture any suppressed `maybe_re`, output from
  `ctx.maybe_raise()`, and `log.cancel()` report it.
2025-07-15 07:23:21 -04:00
Tyler Goodlet e1f26f9611 Extend `._taskc.maybe_raise_from_masking_exc()`
To handle captured non-egs (when the now optional `tn` isn't provided)
as well as yield up a `BoxedMaybeException` which contains any detected
and un-masked `exc_ctx` as its `.value`.

Also add some additional tooling,
- a `raise_unmasked: bool` toggle for when the caller just wants to
  report the masked exc and not raise-it-in-place of the masker.
- `extra_note: str` which by default is tuned to the default
  `unmask_from = (trio.Cancelled,)` but which can be used to deliver
  custom exception msg content.
- `always_warn_on: tuple[BaseException]` which will always emit
  a warning log of what would have been the raised-in-place-of
  `ctx_exc`'s msg for special cases where you want to report
  a masking case that might not be otherwise noticed by the runtime
  (cough like a `Cancelled` masking another `Cancelled) but which
  you'd still like to warn the caller about.
- factor out the masked-`ext_ctx` predicate logic into
  a `find_masked_excs()` and also use it for non-eg cases.

Still maybe todo?
- rewrapping multiple masked sub-excs in an eg back into an eg? left in
  #TODOs and a pause-point where applicable.
2025-07-15 07:23:21 -04:00
Tyler Goodlet 63c5b7696a Mv `maybe_raise_from_masking_exc()` to `.trionics`
Factor the `@acm`-closure it out of the
`test_trioisms::test_acm_embedded_nursery_propagates_enter_err` suite
for real use internally.
2025-07-15 07:23:21 -04:00
Tyler Goodlet f5056cdd02 Mk `test_crash_handler_cms` suite go green
Turns out there were some subtle internal bugs discovered by the just
added `tests/devx/test_tooling::test_crash_handler_cms` suite.

So this fixes,

- a mis-ordering around `rt_repl_fixture :=` in the logic of
  `DebugStatus.maybe_enter_repl_fixture()`.

- `.devx.debug._post_mortem._post_mortem()` ensuring we now **always**
  call `DebugStatus.release()`, and thus unwind the exist-stack managing
  the `repl_fixture` exit/teardown, **even in the case** where
  `yield False` is delivered from the user-fixture-fn (meaning
  `dnter_repl=False`) thus triggering an early `return` (as is done in
  the new test suite).
2025-07-14 18:07:50 -04:00
Tyler Goodlet d000642462 Report `enable_stack_on_sig` on `stackscope` import failure 2025-07-14 13:15:07 -04:00
Tyler Goodlet dd69948744 Reapply `.devx.debug` mod-name change to ipc-server lost during rebase.. 2025-07-14 00:00:13 -04:00
Tyler Goodlet 5b69975f81 Drop "
" from tail of `BoxedMaybeException.pformat()`
2025-07-14 00:00:13 -04:00
Tyler Goodlet 6b474743f9 Drop `.to_asyncio`s usage-of-`greenback`-reporting to `log.devx()` 2025-07-14 00:00:13 -04:00
Tyler Goodlet 5ac229244a Disable `greenback` sync fn breakpointing by def
Opting for performance over broad multi-actor "debug-ability" from
sync-function-contexts when `debug_mode=True` is set;
IOW prefer no behind-the-scenes `greenlet` perf impact over being
able to use an actor-safe `breakpoint()` wherever as per,
https://greenback.readthedocs.io/en/latest/principle.html#performance

Adjust the breakpoint restore ex script to match.
2025-07-14 00:00:13 -04:00
Tyler Goodlet 8d506796ec Re-impl as `DebugStatus.maybe_enter_repl_fixture()`
Dropping the `_maybe_open_repl_fixture()` approach and instead using
a `DebugStatus._fixture_stack = ExitStack()` which provides for much
simpler support around both sync and async pausing APIs thanks to only
invoking `repl_fixture.__exit__()` on actual `PdbREPL` interaction being
complete!

Deats,
- all `repl_fixture` detection logic still happens in one place (the new
  method) but we aren't limited to closing it via an immediate post REPL
  `.__exit__()` call which instead is triggered by,
- `DebugStatus.release()` which now calls `._fixture_stack.close()` and
  thus only invokes `repl_fixture.__exit__()` when user REPL-ing is
  **actually complete** an arbitrary amount of debugging time later.
- include the notes for `@acm` support above the new method, though not
  sure if they're as relevant any more?

Benefits,
- we can drop the previously added indent levels from
  `_enter_repl_sync()` and `_post_mortem()`.
- now we automatically have support for the `.pause_from_sync()` API
  since `_enter_repl_sync()` doesn't close the prior
  `_maybe_open_repl_fixture()` immediately when `debug_func=None`; the
  user's `__exit__()` is only ever called once `.release()` is.

Other,
- add big 'CASE' comments around the various blocks in
  `.pause_from_sync()`, i was having trouble figuring out which i was
  using from a `breakpoint()` in a dependent app..
2025-07-14 00:00:12 -04:00