Compare commits

...

52 Commits

Author SHA1 Message Date
Tyler Goodlet 7e49ac678b WIP, "revertible" or "dynamic" multicast streams
TODO, write up the deats, prolly by distilling (todo) notes from
`tests/test_resource_cache.py::test_open_local_sub_to_stream` comments!
2025-07-15 22:15:19 -04:00
Tyler Goodlet 7a075494f1 Well then, I guess it just needed, a checkpoint XD
Here I was thinking the bcaster (usage) maybe required a rework but,
NOPE it's just bc a checkpoint was needed in the parent task owning the
`tn` which spawns `get_sub_and_pull()` tasks to ensure the bg allocated
`an`/portal is eventually cancel-called..

Ah well, at least i started a patch for `MsgStream.subscribe()` to make
it multicast revertible.. XD

Anyway, I tossed in some checks & notes related to all that unnecessary
effort since I do think i'll move forward implementing it:
- for the `cache_hit` case always verify that the `bcast` clone is
  unregistered from the common state subs after
  `.subscribe().__aexit__()`.
- do a light check that the implicit `MsgStream._broadcaster` is always
  the only bcrx instance left-leaked into that state.. that is until
  i get the proper de-allocation/reversion from multicast -> unicast
  working.
- put in mega detailed note about the required parent-task checkpoint.
2025-07-15 21:59:42 -04:00
Tyler Goodlet c3aa29e7fa TOSQASH 285ebba: woops still use `bcrx._state` for now.. 2025-07-15 19:59:03 -04:00
Tyler Goodlet 9f6acf9ac3 Switch nursery to `CancelScope`-status properties
Been meaning to do this forever and a recent test hang finally drove me
to it Bp

Like it sounds, adopt the "cancel-status" properties on `ActorNursery`
as already in use on our `Context` and derived from `trio.CancelScope`:

- add new private `._cancel_called` (set in the head of `.cancel()`)
  & `._cancelled_caught` (set in the tail) instance vars with matching
  read-only `@properties`.

- drop the instance-var and instead delegate a `.cancelled: bool`
  property to `._cancel_called` and add a usage deprecation warning
  (since removing it breaks a buncha tests).
2025-07-15 19:29:38 -04:00
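
For illustration, a stripped-down sketch of the adopted property pattern (mirroring `trio.CancelScope`); the class body here is a stand-in, not the actual `._supervise.ActorNursery` code:

```python
import warnings


class ActorNursery:
    # illustrative stand-in only; the real type lives in
    # `tractor._supervise` and does much more.
    def __init__(self) -> None:
        self._cancel_called: bool = False
        self._cancelled_caught: bool = False

    async def cancel(self) -> None:
        self._cancel_called = True      # set in the head of `.cancel()`
        # ... request + await child-actor teardown ...
        self._cancelled_caught = True   # set in the tail

    @property
    def cancel_called(self) -> bool:
        return self._cancel_called

    @property
    def cancelled_caught(self) -> bool:
        return self._cancelled_caught

    @property
    def cancelled(self) -> bool:
        # legacy instance-var now delegated with a deprecation warning
        warnings.warn(
            '`.cancelled` is deprecated, use `.cancel_called`!',
            DeprecationWarning,
            stacklevel=2,
        )
        return self._cancel_called
```
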
Tyler Goodlet 2a69d179e6 Add `Channel.closed/.cancel_called`
I.e. the public properties for the private instance var equivs; improves
expected introspection usage.
2025-07-15 17:32:42 -04:00
Tyler Goodlet c51a49b045 Set `Channel._cancel_called` via `chan` var
In `Portal.cancel_actor()` that is, at the least to make it easier to
ref search from an editor Bp
2025-07-15 17:31:08 -04:00
Tyler Goodlet 6627a3bfda Never shield-wait `ipc_server.wait_for_no_more_peers()`
As mentioned in the prior testing commit, it can cause the worst kind of
hangs: the SIGINT-ignoring kind.. Pretty sure there was never any reason
for it outside some esoteric multi-actor debugging case, and pretty sure
that was already solved?
2025-07-15 17:28:48 -04:00
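
A minimal pure-`trio` illustration of why an unconditionally shielded wait is so dangerous during teardown; the fn name is a stand-in for the real `ipc._server` method, and the timeout stands in for whatever cancellation (SIGINT-driven or otherwise) is supposed to tear things down:

```python
import trio


async def wait_for_no_more_peers():
    # stand-in for a server-side wait that may never complete
    await trio.sleep_forever()


async def main():
    # pretend this timeout is the teardown-driven cancellation
    with trio.move_on_after(1):
        with trio.CancelScope(shield=True):
            # the shield blocks delivery of the timeout's `Cancelled`..
            await wait_for_no_more_peers()


# trio.run(main)  # <- if run, this hangs forever (which is the point)
```
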
Tyler Goodlet 285ebba4b1 Tool-up `test_resource_cache.test_open_local_sub_to_stream`
I recently discovered a very subtle race-case that can sometimes cause
the suite to hang, seemingly due to the `an: ActorNursery` being
allocated *behind* the `.trionics.maybe_open_context()` usage; this can
result in never cancelling the 'streamer' subactor despite the `main()`
timeout-guard.

This led me to dig in and find that the underlying issue was 2-fold,

- our `BroadcastReceiver` termination-mgmt semantics in
  `MsgStream.subscribe()` can result in the first subscribing task
  always keeping the `MsgStream._broadcaster` instance allocated; it's
  never `.aclose()`ed, which makes it tough to determine (and thus
  trace) when all subscriber-tasks are actually complete and
  exited-from-`.subscribe()`..

- I was shield-waiting on `.ipc._server.Server.wait_for_no_more_peers()`
  in `._runtime.async_main()`'s shutdown sequence which would then
  compound the issue, resulting in a SIGINT-shielded hang.. the worst
  kind XD

Actual changes here are just styling, printing, and some mucking with
passing the `an`-ref up to the parent task in the root-actor where I was
doing a conditional `ActorNursery.cancel()` to make sure that was
actually the problem. Presuming this is fixed, the `.pause()` I left
unmasked should never hit.
2025-07-15 16:48:46 -04:00
Tyler Goodlet 20628cc0b8 Go multi-line-style tuples in `maybe_enter_context()`
Allows for an inline comment of the first "cache hit" bool element.
2025-07-15 16:12:08 -04:00
Tyler Goodlet 2536c5b3d2 More prep-to-reduce the `Actor` method-iface
- drop the (never/un)used `.get_chans()`.
- add #TODO for factoring many methods into a new `.rpc`-subsys/pkg
  primitive, like an `RPCMngr/Server` type eventually.
- add todo to maybe mv `.get_parent()` elsewhere?
- move masked `._hard_mofo_kill()` to bottom.
2025-07-15 07:21:11 -04:00
Tyler Goodlet d4ca1a15a5 Add `.ipc._shm` todo-idea for `@actor_fixture` API 2025-07-15 07:21:11 -04:00
Tyler Goodlet 30f5dd1db3 Update buncha log msg fmting in `.msg._ops`
Mostly just multi-line code styling again: always putting a standalone
`'f\n'` on a separate LOC so it reads like it renders to console. Oh,
and a level drop to `.runtime()` for rx-msg reports.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 7a7f8aff7f Couple more `._root` logging tweaks.. 2025-07-15 07:21:11 -04:00
Tyler Goodlet 90ff9fa7a1 Update buncha log msg fmting in `._spawn`
Again using `Channel.aid.reprol()`, `.devx.pformat.nest_from_op()` and
converting to multi-line code style for str-report-contents. Tweak
some imports to sub-mod level as well.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 5b62f0de40 Update buncha log msg fmting in `._portal`
Namely to use `Channel.aid.reprol()` and to convert to our newer
multi-line code style for str-reports.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 0c62a107a8 Use `._supervise._shutdown_msg` in tooling test 2025-07-15 07:21:11 -04:00
Tyler Goodlet 76a00ed2de Use `nest_from_op()`/`pretty_struct` in `._rpc`
Again for nicer console logging. Also fix a double `req_chan` arg bug
when passed to `_invoke` in the `self.cancel()` rt-ep; don't update the
`kwargs: dict`, just merge in the `req_chan` input at call time.
2025-07-15 07:21:11 -04:00
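
A generic sketch of the merge-at-call-time style mentioned above; the `_invoke()` signature and values here are stand-ins, only the mutate-vs-merge distinction is the point:

```python
def _invoke(req_chan, **kwargs) -> None:
    # stand-in for the real rpc entrypoint
    print(req_chan, kwargs)


kwargs: dict = {'some_param': 1}
req_chan: str = 'chan-0'  # stand-in for the requesting `Channel`

# before: mutating the shared dict means any other call path that also
# passes `req_chan` can deliver the arg twice,
# -> `TypeError: got multiple values for argument 'req_chan'`
# kwargs['req_chan'] = req_chan

# after: leave `kwargs` untouched and merge only at the call itself
_invoke(**(kwargs | {'req_chan': req_chan}))
```
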
Tyler Goodlet 88dc62b5a7 Use `nest_from_op()` in actor-nursery shutdown
Including a new one-line `_shutdown_msg: str` which we mod-var-set for
testing usage and some denoising at `.info()` level. Adjust `Actor()`
instantiating input to the new `.registry_addrs` wrapped addrs property.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 07a2015915 Use `Address` where possible in (root) actor boot
Namely inside various bootup-sequences in `._root` and `._runtime`,
particularly in the root actor, to support both better tpt-address
denoting in our logging and clarifying the logic around setting the
root's registry addresses, which is soon to be much better factored out
of the core and into an explicit subsystem + API.

Some `_root.open_root_actor()` deats,
- set `registry_addrs` to a new `uw_reg_addrs` (uw: unwrapped) to be
  more explicit about wrapped addr types throughout.
- instead ensure `registry_addrs` are the wrapped types and pass down
  into the root `Actor` singleton-instance.
- factor the root-actor check + rt-vars update (updating the `'_root_addrs'`)
  out of `._runtime.async_main()` into this fn.
- as previously, set `trans_bind_addrs = uw_reg_addrs` in unwrapped form since it will
  be passed down both through rt-vars as `'_root_addrs'` and to
  `._runtime.async_main()` as `accept_addrs` (which is then passed to the
  IPC server).
- adjust/simplify much logging.
- shield the `await actor.cancel(None)  # self cancel` to avoid any
  finally-footguns.
- as mentioned convert the

For `_runtime.async_main()` tweaks,
- expect `registry_addrs: list[Address]|None = None` with appropriate
  unwrapping prior to setting both `.reg_addrs` and the equiv rt-var.
- add a new `.registry_addrs` prop for the wrapped form.
- convert a final loose-eg for the `service_nursery` to use
  `collapse_eg()`.
- simplify teardown report logging.
2025-07-15 07:21:11 -04:00
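
To make the `uw_` (unwrapped) vs wrapped naming concrete, a purely hypothetical mock-up of the convention; the real `Address`/`wrap_address()` live in `tractor._addr` and differ in detail:

```python
from dataclasses import dataclass

# "unwrapped" addrs are plain, msg-able primitives (here a host/port pair)
UnwrappedAddress = tuple[str, int]


@dataclass(frozen=True)
class Address:
    # hypothetical stand-in for the real tpt-address types
    host: str
    port: int

    def unwrap(self) -> UnwrappedAddress:
        return (self.host, self.port)


def wrap_address(uw: UnwrappedAddress) -> Address:
    return Address(*uw)


# `open_root_actor()` now keeps both forms around:
uw_reg_addrs: list[UnwrappedAddress] = [('127.0.0.1', 1616)]
registry_addrs: list[Address] = [wrap_address(a) for a in uw_reg_addrs]

# wrapped addrs feed the root `Actor` and the logging; unwrapped tuples
# are what gets passed down via rt-vars and as `accept_addrs`.
trans_bind_addrs: list[UnwrappedAddress] = uw_reg_addrs
```
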
Tyler Goodlet aa85d0daa0 Add #TODO for `._context` to use `.msg.Aid` 2025-07-15 07:21:11 -04:00
Tyler Goodlet cfe0ae5fc1 Add todo for py3.13+ `.shared_memory`'s new `track=False` support.. finally they added it XD 2025-07-15 07:21:11 -04:00
Tyler Goodlet 12353a4f21 Even more `.ipc.*` repr refinements
Mostly adjusting indentation, noise level, and clarity via `.pformat()`
tweaks and more general use of `.devx.pformat.nest_from_op()`.

Specific impl deats,
- use `pformat.ppfmt()`/`nest_from_op()` more seriously throughout
  `._server`.
- add a `._server.Endpoint.pformat()`.
- add `._server.Server.len_peers()` and `.repr_state()`.
- polish `Server.pformat()`.
- drop some redundant `log.runtime()`s from `._serve_ipc_eps()`, instead
  leaving/putting them only in the caller pub meth.
- have `._tcp.start_listener()` log the bound addr, not the input (which
  may be the 0-port).
2025-07-15 07:21:11 -04:00
Tyler Goodlet 86e776712d More `.ipc.Channel`-repr related tweaks
- only generate a repr in `.from_addr()` when log level is >= 'runtime'.
 |_ add a todo about supporting this optimization more generally on our
   adapter.
- fix `Channel.pformat()` to show unknown peer field line fmt correctly.
- add a `Channel.maddr: str` which just delegates directly to the
  `._transport` like other pass-thru property fields.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 8472a878fb Mk `Aid` hashable, use pretty-`.__repr__()`
Hash on the `.uuid: str` and delegate verbatim to
`msg.pretty_struct.Struct`'s equiv method.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 70ece35fed .trionics: link in `finally`-footgun `trio` GH ish 2025-07-15 07:21:11 -04:00
Tyler Goodlet 0c7dacfc8c .log: expose `at_least_level()` as `StackLevelAdapter` meth 2025-07-15 07:21:11 -04:00
Tyler Goodlet 90a891f095 Drop `actor_info: str` from `._entry` logs 2025-07-15 07:21:11 -04:00
Tyler Goodlet da02925111 Try `nest_from_op()` in some `._rpc` spots
To start trying out,
- using it in the `Start`-msg handler-block to repr the msg coming
  *from* a `repr(Channel)` using the `<=)` sclang op.
- for a completed RPC task in `_invoke_non_context()`.
- for the msg loop task's termination report.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 812ec3ed23 Hide more `Channel._transport` privates for repr
Such as the `MsgTransport.stream` and `.drain` attrs since they're
rarely that important at the chan level. Also start adopting
a `.<attr>=` style for actual attrs of the type versus a `<name>: `
style for meta-field info lines.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 7d79b78449 Refine `Actor` status iface, use `Aid` throughout
To simplify `.pformat()` output when the new `privates: bool` is unset
(the default) this adds new public attrs to wrap an actor's
cancellation status as well as provide a `.repr_state: str` (similar to
our equiv on `Context`). Rework `.pformat()` to render a much simplified
repr using all these new refinements.

Further, port the `.cancel()` method to use `.msg.types.Aid` for all
internal `requesting_uid` refs (now renamed with `_aid`) and in all
called downstream methods.

New cancel-state iface deats,
- rename `._cancel_called_by_remote` -> `._cancel_called_by` and expect
  it to be set as an `Aid`.
- add `.cancel_complete: bool` which flags whether `.cancel()` ran to
  completion.
- add `.cancel_called: bool` which just wraps `._cancel_called` (and
  which likely will just be dropped since we already have
  `._cancel_called_by`).
- add `.cancel_caller: Aid|None` which wraps `._cancel_called_by`.

In terms of using `Aid` in cancel methods,
- rename vars with `_aid` suffix in `.cancel()` (and wherever else).
- change `.cancel_rpc_tasks()` input param to `req_aid: msgtypes.Aid`.
- do the same for `._cancel_task()` and (for now until we adjust its
  internals as well) use the `Aid.uid` remap property when assigning
  `Context._canceller`.
- adjust all log msg refs to match obvi.
2025-07-15 07:21:11 -04:00
Tyler Goodlet a2b0a04b39 Add flag to toggle private vars in `Channel.pformat()`
Call it `privates: bool` and only show certain internal instance vars
when set in the `repr()` output.
2025-07-15 07:21:11 -04:00
Tyler Goodlet 8ec13e0370 Extend `.msg.types.Aid` method interface
Providing the legacy `.uid -> tuple` style id (since still used for the
`Actor._contexts` table) and a `repr-one-line` method `.reprol() -> str`
for rendering a compact unique actor ID summary (useful in
logging/.pformat()s at the least).
2025-07-15 07:21:11 -04:00
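
A hypothetical mock-up tying the two `Aid` commits above together (hash-on-`.uuid`, the legacy `.uid` tuple, `.reprol()`); the real type is a `msg.pretty_struct.Struct` and the exact `reprol()` format here is invented:

```python
class Aid:
    # stand-in only; the real `Aid` is a msgspec-based pretty-struct
    def __init__(
        self,
        name: str,
        uuid: str,
        pid: int|None = None,
    ) -> None:
        self.name = name
        self.uuid = uuid
        self.pid = pid

    def __hash__(self) -> int:
        # hash only on the (globally unique) `.uuid: str`
        return hash(self.uuid)

    @property
    def uid(self) -> tuple[str, str]:
        # legacy `(name, uuid)` pair, still the `Actor._contexts` key
        return (self.name, self.uuid)

    def reprol(self) -> str:
        # "repr-one-line": compact id summary for logging/`.pformat()`s
        return f'{self.name}[{self.uuid[:6]}]@{self.pid!r}'
```
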
Tyler Goodlet 6cfc8b8f4a Enforce named-args only to `.open_nursery()` 2025-07-15 07:21:11 -04:00
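
The pattern implied by the one-liner above, shown with an illustrative (not exact) signature: a bare `*` makes every parameter keyword-only.

```python
from contextlib import asynccontextmanager as acm


@acm
async def open_nursery(
    *,  # reject all positional args
    debug_mode: bool = False,
    enable_modules: list[str]|None = None,
):
    yield ...

# open_nursery(True)             -> TypeError at call time
# open_nursery(debug_mode=True)  -> ok
```
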
Tyler Goodlet d67e04d39c Hide `._rpc._errors_relayed_via_ipc()` frame by def 2025-07-15 07:21:11 -04:00
Tyler Goodlet a78eef65fd Facepalm, fix `raise from` in `collapse_eg()`
I dunno what exactly I was thinking but we definitely don't want to
**ever** raise from the original exc-group; instead always raise from
any original `.__cause__` to be consistent with the embedded src-error's
context.

Also, adjust `maybe_collapse_eg()` to return `False` in the non-single
`.exceptions` case; again, I don't know what I was trying to do, but
this simplifies caller logic and the prior return-semantic had no real
value..

This fixes some final usage in the runtime (namely top level nursery
usage in `._root`/`._runtime`) which was previously causing test suite
failures prior to this fix.
2025-07-15 07:20:59 -04:00
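
A condensed, behavior-only sketch (py3.11+) of the semantics described above; the real `maybe_collapse_eg()`/`collapse_eg()` live in `tractor.trionics` and carry extra tooling params:

```python
from contextlib import asynccontextmanager as acm


def maybe_collapse_eg(
    beg: BaseExceptionGroup,
) -> BaseException|bool:
    # only "collapse" when the group wraps exactly one exception,
    # otherwise signal a no-op with `False` (the simplified
    # return-semantic mentioned above).
    if len(excs := beg.exceptions) == 1:
        return excs[0]
    return False


@acm
async def collapse_eg():
    try:
        yield
    except BaseExceptionGroup as beg:
        if exc := maybe_collapse_eg(beg):
            # the fix: chain from the embedded error's own
            # `.__cause__`, never from the group itself.
            raise exc from exc.__cause__
        raise
```
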
Tyler Goodlet 21fde2ee2d Just import `._runtime` ns in `._root`; be a bit more explicit 2025-07-15 07:20:59 -04:00
Tyler Goodlet 34ffb0a78f Use collapse in `._root.open_root_actor()` too
Seems to add one more cancellation suite failure as well as now cause
the discovery test to error instead of fail?
2025-07-15 07:20:59 -04:00
Tyler Goodlet 198725d832 Use collapser around root tn in `.async_main()`
Seems to cause the following test suites to fail however..

- 'test_advanced_faults.py::test_ipc_channel_break_during_stream'
- 'test_advanced_faults.py::test_ipc_channel_break_during_stream'
- 'test_clustering.py::test_empty_mngrs_input_raises'

Also tweak some ctxc request logging content.
2025-07-15 07:20:59 -04:00
Tyler Goodlet 13233e61ab Drop msging-err patt from `subactor_breakpoint` ex
Since the `bdb` module was added to the namespace lookup set in
`._exceptions.get_err_type()` we can now relay a RAE-boxed
`bdb.BdbQuit`.
2025-07-15 07:20:59 -04:00
Tyler Goodlet bd3c8e701b Switch to strict-eg nurseries almost everywhere
That is just throughout the core library, not the tests yet. Again, we
simply change over to using our (nearly equivalent?)
`.trionics.collapse_eg()` in place of the already deprecated
`strict_exception_groups=False` flag in the following internals,
- the conc-fan-out tn use in `._discovery.find_actor()`.
- `._portal.open_portal()`'s internal tn used to spawn a bg rpc-msg-loop
  task.
- the daemon and "run-in-actor" layered tn pair allocated in
  `._supervise._open_and_supervise_one_cancels_all_nursery()`.

The remaining loose-eg usage in `._root` and `._runtime` seems to be
necessary to keep the test suite green?? For the moment these are left
out.
2025-07-15 07:20:59 -04:00
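
How the swap reads at a typical call-site (cf. the `._rpc`/`._portal` hunks further below); assumes this branch's `tractor.trionics.collapse_eg()`:

```python
import trio
from tractor import trionics  # this branch's helpers


async def fan_out():
    # before:
    #   async with trio.open_nursery(
    #       strict_exception_groups=False,  # deprecated by `trio`
    #   ) as tn:
    #
    # after:
    async with (
        trionics.collapse_eg(),
        trio.open_nursery() as tn,
    ):
        tn.start_soon(trio.sleep, 0)


trio.run(fan_out)
```
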
Tyler Goodlet 19a3daa385 Use collapser in rent side of `Context` 2025-07-15 07:20:59 -04:00
Tyler Goodlet fe4f8900e3 Flip to `collapse_eg()` use in `.trionics.gather_contexts()` 2025-07-15 07:20:59 -04:00
Tyler Goodlet 505f0af1bf Always `Cancelled`-unmask ctx endpoint excs
To resolve the recently added and failing
`test_remote_exc_relay::test_unmasked_remote_exc`: never allow
`trio.Cancelled` to mask an underlying user-code exception, ever.

Our first real-world (runtime internal) use case for the new
`.trionics.maybe_raise_from_masking_exc()` such that the failing
test now passes with a properly relayed remote RTE unmasking B)

Details,
- flip the `Context._scope_nursery` to the default strict-eg behaviour
  and instead stack its outer scope with a `.trionics.collapse_eg()`.
- wrap the inner-most scope (after `msgops.maybe_limit_plds()`) with
  a `maybe_raise_from_masking_exc()` to ensure user-code errors are
  never masked by `trio.Cancelled`s.

Some err-reporting refinement,
- always capture any `scope_err` from the entire block for debug
  purposes; report it in the `finally` block's log.
- always capture any suppressed `maybe_re`, output from
  `ctx.maybe_raise()`, and `log.cancel()` report it.
2025-07-15 07:20:59 -04:00
Tyler Goodlet 6e0b1dd17e Adjust ep-masking-suite for the real-use-case
Namely that the more common-and-pertinent case is when
a `@context`-ep-fn contains the `finally`-footgun but without
a surrounding embedded `tn` (which currently still requires its own
scope embedded `trionics.maybe_raise_from_masking_exc()`) which can't
be compensated-for by `._rpc._invoke()` easily. Instead the test is
composed where the `._invoke()`-internal `tn` is the machinery being
addressed in terms of masking user-code excs with `trio.Cancelled`.

Deats,
- rename the test -> `test_unmasked_remote_exc` to reflect what the
  runtime should actually be addressing/solving.
- drop the embedded `tn` from `sleep_n_chkpt_in_finally()` (for now)
  since that case can't currently easily be addressed without the user
  code using its own `trionics.maybe_raise_from_masking_exc()` inside
  the nursery scope.
- as such drop all `tn` related params/logic/usage from the ep.
- add in a `Cancelled` handler block which checks for RTE masking and
  always prints the occurrence loudly.

Follow up,
- obvi this suite will currently fail until the appropriate adjustment
  is made to `._rpc._invoke()` to do the unmasking; coming next.
- we probably still need a case with an embedded user `tn` where if
  the default strict-eg mode is used then a ctxc from the parent might
  cause a non-graceful `Context.cancel()` outcome?
 |_since the embedded user-`tn` will raise
   `ExceptionGroup[trio.Cancelled]` upward despite the parent nursery's
   scope being the canceller, or will a `collapse_eg()` inside the
   `._invoke()` scope handle this as well?
2025-07-15 07:20:59 -04:00
Tyler Goodlet 600249a22a Extend `._taskc.maybe_raise_from_masking_exc()`
To handle captured non-egs (when the now optional `tn` isn't provided)
as well as yield up a `BoxedMaybeException` which contains any detected
and un-masked `exc_ctx` as its `.value`.

Also add some additional tooling,
- a `raise_unmasked: bool` toggle for when the caller just wants to
  report the masked exc and not raise-it-in-place of the masker.
- `extra_note: str` which by default is tuned to the default
  `unmask_from = (trio.Cancelled,)` but which can be used to deliver
  custom exception msg content.
- `always_warn_on: tuple[BaseException]` which will always emit
  a warning log of what would have been the raised-in-place-of
  `ctx_exc`'s msg for special cases where you want to report
  a masking case that might not be otherwise noticed by the runtime
  (cough, like a `Cancelled` masking another `Cancelled`) but which
  you'd still like to warn the caller about.
- factor out the masked-`exc_ctx` predicate logic into
  a `find_masked_excs()` and also use it for non-eg cases.

Still maybe todo?
- rewrapping multiple masked sub-excs in an eg back into an eg? left in
  #TODOs and a pause-point where applicable.
2025-07-15 07:20:59 -04:00
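
A usage sketch matching the call-sites visible in the hunks below (`tn=` plus the default-ish `unmask_from`); assumes this branch's `tractor.trionics.maybe_raise_from_masking_exc()`:

```python
import trio
from tractor import trionics  # this branch's helpers


async def main():
    async with (
        trio.open_nursery() as tn,
        # re-raise any user-code error that a `trio.Cancelled` (taskc)
        # would otherwise hide behind its `.__context__`
        trionics.maybe_raise_from_masking_exc(
            tn=tn,
            unmask_from=(trio.Cancelled,),
        ),
    ):
        ...  # user code / spawned tasks


trio.run(main)
```
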
Tyler Goodlet 20a17066ae Mv `maybe_raise_from_masking_exc()` to `.trionics`
Factor the `@acm`-closure out of the
`test_trioisms::test_acm_embedded_nursery_propagates_enter_err` suite
for real use internally.
2025-07-15 07:20:59 -04:00
Tyler Goodlet 94580a6ee7 Add ctx-ep suite for `trio`'s *finally-footgun*
Deats are documented within, but basically a subtlety we already track,
`trio`'s masking of excs by a checkpoint-in-`finally:`, can cause
compounded issues with our `@context` endpoints, mostly in terms of
remote error and cancel-ack relay semantics.
2025-07-15 07:20:59 -04:00
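
For reference, a minimal pure-`trio` repro of the footgun itself (no `tractor` involved): the checkpoint inside `finally:` raises `trio.Cancelled` which supersedes the in-flight error, leaving it only on `.__context__`:

```python
import trio


async def main():
    with trio.CancelScope() as cs:
        try:
            cs.cancel()  # pretend a (ctxc) cancel request just arrived
            raise RuntimeError('the error you actually care about')
        finally:
            try:
                # checkpoint-in-`finally:` -> raises `Cancelled` here
                await trio.lowlevel.checkpoint()
            except trio.Cancelled as taskc:
                print(f'raised : {taskc!r}')
                print(f'masked : {taskc.__context__!r}')  # the RTE!
                raise
    # the scope absorbs its own `Cancelled`; the RTE never surfaces..


trio.run(main)
```
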
Tyler Goodlet 1d1ef46d9c Add some tooling params to `collapse_eg()` 2025-07-15 07:20:59 -04:00
Tyler Goodlet 92cc0b7d46 Move `.is_multi_cancelled()` to `.trionics._beg`
Since it's for beg filtering, the current impl should be renamed anyway;
it's not just for filtering cancelled excs.

Deats,
- added a real doc string, links to official eg docs and fixed the
  return typing.
- adjust all internal imports to match.
2025-07-15 07:20:59 -04:00
Tyler Goodlet bd7ef9ca80 Fix `nest_from_op()` call sigs, already changed upstream
In `._runtime/_root`, since the latest fn-signature changes were
already landed onto the main branch via the 65b7956: #384-patch.
2025-07-15 07:20:43 -04:00
Tyler Goodlet d49220b785 Drop stale comment from inter-peer suite 2025-07-15 07:20:43 -04:00
Tyler Goodlet 2959c3d0d9 Use `nest_from_op()` in some runtime logs for actor-state-repring 2025-07-15 07:20:43 -04:00
34 changed files with 2005 additions and 758 deletions

View File

@ -317,7 +317,6 @@ def test_subactor_breakpoint(
assert in_prompt_msg(
child, [
'MessagingError:',
'RemoteActorError:',
"('breakpoint_forever'",
'bdb.BdbQuit',

View File

@ -121,9 +121,11 @@ def test_shield_pause(
child.pid,
signal.SIGINT,
)
from tractor._supervise import _shutdown_msg
expect(
child,
'Shutting down actor runtime',
# 'Shutting down actor runtime',
_shutdown_msg,
timeout=6,
)
assert_before(

View File

@ -252,7 +252,7 @@ def test_simple_context(
pass
except BaseExceptionGroup as beg:
# XXX: on windows it seems we may have to expect the group error
from tractor._exceptions import is_multi_cancelled
from tractor.trionics import is_multi_cancelled
assert is_multi_cancelled(beg)
else:
trio.run(main)

View File

@ -410,7 +410,6 @@ def test_peer_canceller(
'''
async def main():
async with tractor.open_nursery(
# NOTE: to halt the peer tasks on ctxc, uncomment this.
debug_mode=debug_mode,
) as an:
canceller: Portal = await an.start_actor(

View File

@ -0,0 +1,237 @@
'''
Special case testing for issues not (dis)covered in the primary
`Context` related functional/scenario suites.
**NOTE: this mod is a WIP** space for handling
odd/rare/undiscovered/not-yet-revealed faults which either
loudly (ideal case) break our supervision protocol
or (worst case) result in distributed sys hangs.
Suites here further try to clarify (if [partially] ill-defined) and
verify our edge case semantics for inter-actor-relayed-exceptions
including,
- lowlevel: what remote obj-data is interchanged for IPC and what
native-obj form is expected from unpacking in the new
mem-domain.
- which kinds of `RemoteActorError` (and its derivs) are expected by which
(types of) peers (parent, child, sibling, etc) with what
particular meta-data set such as,
- `.src_uid`: the original (maybe) peer who raised.
- `.relay_uid`: the next-hop-peer who sent it.
- `.relay_path`: the sequence of peer actor hops.
- `.is_inception`: a predicate that denotes multi-hop remote errors.
- when should `ExceptionGroup`s be relayed from a particular
remote endpoint, they should never be caused by implicit `._rpc`
nursery machinery!
- various special `trio` edge cases around its cancellation semantics
and how we (currently) leverage `trio.Cancelled` as a signal for
whether a `Context` task should raise `ContextCancelled` (ctx).
'''
import pytest
import trio
import tractor
from tractor import ( # typing
ActorNursery,
Portal,
Context,
ContextCancelled,
)
@tractor.context
async def sleep_n_chkpt_in_finally(
ctx: Context,
sleep_n_raise: bool,
chld_raise_delay: float,
chld_finally_delay: float,
rent_cancels: bool,
rent_ctxc_delay: float,
expect_exc: str|None = None,
) -> None:
'''
Sync, open a tn, then wait for cancel, run a chkpt inside
the user's `finally:` teardown.
This covers a footgun case that `trio` core doesn't seem to care about
wherein an exc can be masked by a `trio.Cancelled` raised inside a tn-embedded
`finally:`.
Also see `test_trioisms::test_acm_embedded_nursery_propagates_enter_err`
for the down and gritty details.
Since a `@context` endpoint fn can also contain code like this,
**and** bc we currently have no easy way other than
`trio.Cancelled` to signal cancellation on each side of an IPC `Context`,
the footgun issue can compound itself as demonstrated in this suite..
Here are some edge cases codified with our WIP "sclang" syntax
(note the parent(rent)/child(chld) naming here is just
pragmatism, generally most of these cases can occur
regardless of the distributed-task's supervision hierarchy),
- rent c)=> chld.raises-then-taskc-in-finally
|_ chld's body raises an `exc: BaseException`.
_ in its `finally:` block it runs a chkpoint
which raises a taskc (`trio.Cancelled`) which
masks `exc` instead raising taskc up to the first tn.
_ the embedded/chld tn captures the masking taskc and then
raises it up to the ._rpc-ep-tn instead of `exc`.
_ the rent thinks the child ctxc-ed instead of errored..
'''
await ctx.started()
if expect_exc:
expect_exc: BaseException = tractor._exceptions.get_err_type(
type_name=expect_exc,
)
berr: BaseException|None = None
try:
if not sleep_n_raise:
await trio.sleep_forever()
elif sleep_n_raise:
# XXX this sleep is less than the sleep the parent
# does before calling `ctx.cancel()`
await trio.sleep(chld_raise_delay)
# XXX this will be masked by a taskc raised in
# the `finally:` if this fn doesn't terminate
# before any ctxc-req arrives AND a checkpoint is hit
# in that `finally:`.
raise RuntimeError('my app krurshed..')
except BaseException as _berr:
berr = _berr
# TODO: it'd sure be nice to be able to inject our own
# `ContextCancelled` here instead of `trio.Cancelled`
# so that our runtime can expect it and this "user code"
# would be able to tell the diff between a generic trio
# cancel and a tractor runtime-IPC cancel.
if expect_exc:
if not isinstance(
berr,
expect_exc,
):
raise ValueError(
f'Unexpected exc type ??\n'
f'{berr!r}\n'
f'\n'
f'Expected a {expect_exc!r}\n'
)
raise berr
# simulate what user code might try even though
# it's a known boo-boo..
finally:
# maybe wait for rent ctxc to arrive
with trio.CancelScope(shield=True):
await trio.sleep(chld_finally_delay)
# !!XXX this will raise `trio.Cancelled` which
# will mask the RTE from above!!!
#
# YES, it's the same case as our extant
# `test_trioisms::test_acm_embedded_nursery_propagates_enter_err`
try:
await trio.lowlevel.checkpoint()
except trio.Cancelled as taskc:
if (scope_err := taskc.__context__):
print(
f'XXX MASKED REMOTE ERROR XXX\n'
f'ENDPOINT exception -> {scope_err!r}\n'
f'will be masked by -> {taskc!r}\n'
)
# await tractor.pause(shield=True)
raise taskc
@pytest.mark.parametrize(
'chld_callspec',
[
dict(
sleep_n_raise=None,
chld_raise_delay=0.1,
chld_finally_delay=0.1,
expect_exc='Cancelled',
rent_cancels=True,
rent_ctxc_delay=0.1,
),
dict(
sleep_n_raise='RuntimeError',
chld_raise_delay=0.1,
chld_finally_delay=1,
expect_exc='RuntimeError',
rent_cancels=False,
rent_ctxc_delay=0.1,
),
],
ids=lambda item: f'chld_callspec={item!r}'
)
def test_unmasked_remote_exc(
debug_mode: bool,
chld_callspec: dict,
tpt_proto: str,
):
expect_exc_str: str|None = chld_callspec['sleep_n_raise']
rent_ctxc_delay: float|None = chld_callspec['rent_ctxc_delay']
async def main():
an: ActorNursery
async with tractor.open_nursery(
debug_mode=debug_mode,
enable_transports=[tpt_proto],
) as an:
ptl: Portal = await an.start_actor(
'cancellee',
enable_modules=[__name__],
)
ctx: Context
async with (
ptl.open_context(
sleep_n_chkpt_in_finally,
**chld_callspec,
) as (ctx, sent),
):
assert not sent
await trio.sleep(rent_ctxc_delay)
await ctx.cancel()
# recv error or result from chld
ctxc: ContextCancelled = await ctx.wait_for_result()
assert (
ctxc is ctx.outcome
and
isinstance(ctxc, ContextCancelled)
)
# always graceful terminate the sub in non-error cases
await an.cancel()
if expect_exc_str:
expect_exc: BaseException = tractor._exceptions.get_err_type(
type_name=expect_exc_str,
)
with pytest.raises(
expected_exception=tractor.RemoteActorError,
) as excinfo:
trio.run(main)
rae = excinfo.value
assert expect_exc == rae.boxed_type
else:
trio.run(main)

View File

@ -72,11 +72,13 @@ def test_resource_only_entered_once(key_on):
with trio.move_on_after(0.5):
async with (
tractor.open_root_actor(),
trio.open_nursery() as n,
trio.open_nursery() as tn,
):
for i in range(10):
n.start_soon(enter_cached_mngr, f'task_{i}')
tn.start_soon(
enter_cached_mngr,
f'task_{i}',
)
await trio.sleep(0.001)
trio.run(main)
@ -98,23 +100,34 @@ async def streamer(
@acm
async def open_stream() -> Awaitable[tractor.MsgStream]:
async def open_stream() -> Awaitable[
tuple[
tractor.ActorNursery,
tractor.MsgStream,
]
]:
try:
async with tractor.open_nursery() as an:
portal = await an.start_actor(
'streamer',
enable_modules=[__name__],
)
async with (
portal.open_context(streamer) as (ctx, first),
ctx.open_stream() as stream,
):
yield stream
try:
async with (
portal.open_context(streamer) as (ctx, first),
ctx.open_stream() as stream,
):
print('Entered open_stream() caller')
yield an, stream
print('Exited open_stream() caller')
print('Cancelling streamer')
await portal.cancel_actor()
print('Cancelled streamer')
finally:
print(
'Cancelling streamer with,\n'
'=> `Portal.cancel_actor()`'
)
await portal.cancel_actor()
print('Cancelled streamer')
except Exception as err:
print(
@ -130,8 +143,12 @@ async def maybe_open_stream(taskname: str):
async with tractor.trionics.maybe_open_context(
# NOTE: all secondary tasks should cache hit on the same key
acm_func=open_stream,
) as (cache_hit, stream):
) as (
cache_hit,
(an, stream)
):
# when the actor + portal + ctx + stream has already been
# allocated we want to just bcast to this task.
if cache_hit:
print(f'{taskname} loaded from cache')
@ -139,10 +156,43 @@ async def maybe_open_stream(taskname: str):
# if this feed is already allocated by the first
# task that entered
async with stream.subscribe() as bstream:
yield bstream
yield an, bstream
print(
f'cached task exited\n'
f')>\n'
f' |_{taskname}\n'
)
# we should always unreg the "cloned" bcrc for this
# consumer-task
assert id(bstream) not in bstream._state.subs
else:
# yield the actual stream
yield stream
try:
yield an, stream
finally:
print(
f'NON-cached task exited\n'
f')>\n'
f' |_{taskname}\n'
)
first_bstream = stream._broadcaster
bcrx_state = first_bstream._state
subs: dict[int, int] = bcrx_state.subs
if len(subs) == 1:
assert id(first_bstream) in subs
# ^^TODO! the bcrx should always de-allocate all subs,
# including the implicit first one allocated on entry
# by the first subscribing peer task, no?
#
# -[ ] adjust `MsgStream.subscribe()` to do this mgmt!
# |_ allows reverting `MsgStream.receive()` to the
# non-bcaster method.
# |_ we can decide whether to reset `._broadcaster`?
#
# await tractor.pause(shield=True)
def test_open_local_sub_to_stream(
@ -159,16 +209,24 @@ def test_open_local_sub_to_stream(
if debug_mode:
timeout = 999
print(f'IN debug_mode, setting large timeout={timeout!r}..')
async def main():
full = list(range(1000))
an: tractor.ActorNursery|None = None
num_tasks: int = 10
async def get_sub_and_pull(taskname: str):
nonlocal an
stream: tractor.MsgStream
async with (
maybe_open_stream(taskname) as stream,
maybe_open_stream(taskname) as (
an,
stream,
),
):
if '0' in taskname:
assert isinstance(stream, tractor.MsgStream)
@ -180,34 +238,70 @@ def test_open_local_sub_to_stream(
first = await stream.receive()
print(f'{taskname} started with value {first}')
seq = []
seq: list[int] = []
async for msg in stream:
seq.append(msg)
assert set(seq).issubset(set(full))
# end of @acm block
print(f'{taskname} finished')
root: tractor.Actor
with trio.fail_after(timeout) as cs:
# TODO: turns out this isn't multi-task entrant XD
# We probably need an idempotent entry semantic?
async with tractor.open_root_actor(
debug_mode=debug_mode,
):
# maybe_enable_greenback=True,
#
# ^TODO? doesn't seem to mk breakpoint() usage work
# bc each bg task needs to open a portal??
# - [ ] we should consider making this part of
# our taskman defaults?
# |_see https://github.com/goodboy/tractor/pull/363
#
) as root:
assert root.is_registrar
async with (
trio.open_nursery() as tn,
):
for i in range(10):
for i in range(num_tasks):
tn.start_soon(
get_sub_and_pull,
f'task_{i}',
)
await trio.sleep(0.001)
print('all consumer tasks finished')
print('all consumer tasks finished!')
# ?XXX, ensure actor-nursery is shutdown or we might
# hang here due to a minor task deadlock/race-condition?
#
# - seems that all we need is a checkpoint to ensure
# the last suspended task, which is inside
# `.maybe_open_context()`, can do the
# `Portal.cancel_actor()` call?
#
# - if that bg task isn't resumed, then this blocks
# timeout might hit before that?
#
if root.ipc_server.has_peers():
await trio.lowlevel.checkpoint()
# alt approach, cancel the entire `an`
# await tractor.pause()
# await an.cancel()
# end of runtime scope
print('root actor terminated.')
if cs.cancelled_caught:
pytest.fail(
'Should NOT time out in `open_root_actor()` ?'
)
print('exiting main.')
trio.run(main)

View File

@ -67,7 +67,6 @@ async def ensure_sequence(
@acm
async def open_sequence_streamer(
sequence: list[int],
reg_addr: tuple[str, int],
start_method: str,
@ -96,39 +95,43 @@ async def open_sequence_streamer(
def test_stream_fan_out_to_local_subscriptions(
reg_addr,
debug_mode: bool,
reg_addr: tuple,
start_method,
):
sequence = list(range(1000))
async def main():
with trio.fail_after(9):
async with open_sequence_streamer(
sequence,
reg_addr,
start_method,
) as stream:
async with open_sequence_streamer(
sequence,
reg_addr,
start_method,
) as stream:
async with (
collapse_eg(),
trio.open_nursery() as tn,
):
for i in range(10):
tn.start_soon(
ensure_sequence,
stream,
sequence.copy(),
name=f'consumer_{i}',
)
async with trio.open_nursery() as n:
for i in range(10):
n.start_soon(
ensure_sequence,
stream,
sequence.copy(),
name=f'consumer_{i}',
)
await stream.send(tuple(sequence))
await stream.send(tuple(sequence))
async for value in stream:
print(f'source stream rx: {value}')
assert value == sequence[0]
sequence.remove(value)
async for value in stream:
print(f'source stream rx: {value}')
assert value == sequence[0]
sequence.remove(value)
if not sequence:
# fully consumed
break
if not sequence:
# fully consumed
break
trio.run(main)
@ -151,67 +154,69 @@ def test_consumer_and_parent_maybe_lag(
sequence = list(range(300))
parent_delay, sub_delay = task_delays
async with open_sequence_streamer(
sequence,
reg_addr,
start_method,
) as stream:
# TODO, maybe make a cm-deco for main()s?
with trio.fail_after(3):
async with open_sequence_streamer(
sequence,
reg_addr,
start_method,
) as stream:
try:
async with (
collapse_eg(),
trio.open_nursery() as tn,
):
try:
async with (
collapse_eg(),
trio.open_nursery() as tn,
):
tn.start_soon(
ensure_sequence,
stream,
sequence.copy(),
sub_delay,
name='consumer_task',
)
tn.start_soon(
ensure_sequence,
stream,
sequence.copy(),
sub_delay,
name='consumer_task',
)
await stream.send(tuple(sequence))
await stream.send(tuple(sequence))
# async for value in stream:
lagged = False
lag_count = 0
# async for value in stream:
lagged = False
lag_count = 0
while True:
try:
value = await stream.receive()
print(f'source stream rx: {value}')
while True:
try:
value = await stream.receive()
print(f'source stream rx: {value}')
if lagged:
# re set the sequence starting at our last
# value
sequence = sequence[sequence.index(value) + 1:]
else:
assert value == sequence[0]
sequence.remove(value)
if lagged:
# re set the sequence starting at our last
# value
sequence = sequence[sequence.index(value) + 1:]
else:
assert value == sequence[0]
sequence.remove(value)
lagged = False
lagged = False
except Lagged:
lagged = True
print(f'source stream lagged after {value}')
lag_count += 1
continue
except Lagged:
lagged = True
print(f'source stream lagged after {value}')
lag_count += 1
continue
# lag the parent
await trio.sleep(parent_delay)
# lag the parent
await trio.sleep(parent_delay)
if not sequence:
# fully consumed
break
print(f'parent + source stream lagged: {lag_count}')
if not sequence:
# fully consumed
break
print(f'parent + source stream lagged: {lag_count}')
if parent_delay > sub_delay:
assert lag_count > 0
if parent_delay > sub_delay:
assert lag_count > 0
except Lagged:
# child was lagged
assert parent_delay < sub_delay
except Lagged:
# child was lagged
assert parent_delay < sub_delay
trio.run(main)
@ -285,7 +290,11 @@ def test_faster_task_to_recv_is_cancelled_by_slower(
def test_subscribe_errors_after_close():
'''
Verify after calling `BroadcastReceiver.aclose()` you can't
"re-open" it via `.subscribe()`.
'''
async def main():
size = 1
@ -293,6 +302,8 @@ def test_subscribe_errors_after_close():
async with broadcast_receiver(rx, size) as brx:
pass
assert brx.key not in brx._state.subs
try:
# open and close
async with brx.subscribe():
@ -302,7 +313,7 @@ def test_subscribe_errors_after_close():
assert brx.key not in brx._state.subs
else:
assert 0
pytest.fail('brx.subscribe() never raised!?')
trio.run(main)

View File

@ -112,55 +112,11 @@ def test_acm_embedded_nursery_propagates_enter_err(
'''
import tractor
@acm
async def maybe_raise_from_masking_exc(
tn: trio.Nursery,
unmask_from: BaseException|None = trio.Cancelled
# TODO, maybe offer a collection?
# unmask_from: set[BaseException] = {
# trio.Cancelled,
# },
):
if not unmask_from:
yield
return
try:
yield
except* unmask_from as be_eg:
# TODO, if we offer `unmask_from: set`
# for masker_exc_type in unmask_from:
matches, rest = be_eg.split(unmask_from)
if not matches:
raise
for exc_match in be_eg.exceptions:
if (
(exc_ctx := exc_match.__context__)
and
type(exc_ctx) not in {
# trio.Cancelled, # always by default?
unmask_from,
}
):
exc_ctx.add_note(
f'\n'
f'WARNING: the above error was masked by a {unmask_from!r} !?!\n'
f'Are you always cancelling? Say from a `finally:` ?\n\n'
f'{tn!r}'
)
raise exc_ctx from exc_match
@acm
async def wraps_tn_that_always_cancels():
async with (
trio.open_nursery() as tn,
maybe_raise_from_masking_exc(
tractor.trionics.maybe_raise_from_masking_exc(
tn=tn,
unmask_from=(
trio.Cancelled
@ -202,3 +158,60 @@ def test_acm_embedded_nursery_propagates_enter_err(
assert_eg, rest_eg = eg.split(AssertionError)
assert len(assert_eg.exceptions) == 1
def test_gatherctxs_with_memchan_breaks_multicancelled(
debug_mode: bool,
):
'''
Demo how using an `async with sndchan` inside a `.trionics.gather_contexts()` task
will break a strict-eg-tn's multi-cancelled absorption..
'''
from tractor import (
trionics,
)
@acm
async def open_memchan() -> trio.abc.ReceiveChannel:
task: trio.Task = trio.lowlevel.current_task()
print(
f'Opening {task!r}\n'
)
# 1 to force eager sending
send, recv = trio.open_memory_channel(16)
try:
async with send:
yield recv
finally:
print(
f'Closed {task!r}\n'
)
async def main():
async with (
# XXX should ensure ONLY the KBI
# is relayed upward
trionics.collapse_eg(),
trio.open_nursery(
# strict_exception_groups=False,
), # as tn,
trionics.gather_contexts([
open_memchan(),
open_memchan(),
]) as recv_chans,
):
assert len(recv_chans) == 2
await trio.sleep(1)
raise KeyboardInterrupt
# tn.cancel_scope.cancel()
with pytest.raises(KeyboardInterrupt):
trio.run(main)

View File

@ -101,6 +101,9 @@ from ._state import (
debug_mode,
_ctxvar_Context,
)
from .trionics import (
collapse_eg,
)
# ------ - ------
if TYPE_CHECKING:
from ._portal import Portal
@ -740,6 +743,8 @@ class Context:
# cancelled, NOT their reported canceller. IOW in the
# latter case we're cancelled by someone else getting
# cancelled.
#
# !TODO, switching to `Actor.aid` here!
if (canc := error.canceller) == self._actor.uid:
whom: str = 'us'
self._canceller = canc
@ -940,7 +945,7 @@ class Context:
self.cancel_called = True
header: str = (
f'Cancelling ctx from {side.upper()}-side\n'
f'Cancelling ctx from {side!r}-side\n'
)
reminfo: str = (
# ' =>\n'
@ -948,7 +953,7 @@ class Context:
f'\n'
f'c)=> {self.chan.uid}\n'
f' |_[{self.dst_maddr}\n'
f' >>{self.repr_rpc}\n'
f' >> {self.repr_rpc}\n'
# f' >> {self._nsf}() -> {codec}[dict]:\n\n'
# TODO: pull msg-type from spec re #320
)
@ -2023,10 +2028,8 @@ async def open_context_from_portal(
ctxc_from_callee: ContextCancelled|None = None
try:
async with (
trio.open_nursery(
strict_exception_groups=False,
) as tn,
collapse_eg(),
trio.open_nursery() as tn,
msgops.maybe_limit_plds(
ctx=ctx,
spec=ctx_meta.get('pld_spec'),

View File

@ -28,7 +28,10 @@ from typing import (
from contextlib import asynccontextmanager as acm
from tractor.log import get_logger
from .trionics import gather_contexts
from .trionics import (
gather_contexts,
collapse_eg,
)
from .ipc import _connect_chan, Channel
from ._addr import (
UnwrappedAddress,
@ -87,7 +90,6 @@ async def get_registry(
yield regstr_ptl
@acm
async def get_root(
**kwargs,
@ -253,9 +255,12 @@ async def find_actor(
for addr in registry_addrs
)
portals: list[Portal]
async with gather_contexts(
mngrs=maybe_portals,
) as portals:
async with (
collapse_eg(),
gather_contexts(
mngrs=maybe_portals,
) as portals,
):
# log.runtime(
# 'Gathered portals:\n'
# f'{portals}'

View File

@ -21,7 +21,7 @@ Sub-process entry points.
from __future__ import annotations
from functools import partial
import multiprocessing as mp
import os
# import os
from typing import (
Any,
TYPE_CHECKING,
@ -38,6 +38,7 @@ from .devx import (
_frame_stack,
pformat,
)
# from .msg import pretty_struct
from .to_asyncio import run_as_asyncio_guest
from ._addr import UnwrappedAddress
from ._runtime import (
@ -127,20 +128,13 @@ def _trio_main(
if actor.loglevel is not None:
get_console_log(actor.loglevel)
actor_info: str = (
f'|_{actor}\n'
f' uid: {actor.uid}\n'
f' pid: {os.getpid()}\n'
f' parent_addr: {parent_addr}\n'
f' loglevel: {actor.loglevel}\n'
)
log.info(
'Starting new `trio` subactor\n'
f'Starting `trio` subactor from parent @ '
f'{parent_addr}\n'
+
pformat.nest_from_op(
input_op='>(', # see syntax ideas above
text=actor_info,
nest_indent=2, # since "complete"
text=f'{actor}',
)
)
logmeth = log.info
@ -149,7 +143,7 @@ def _trio_main(
+
pformat.nest_from_op(
input_op=')>', # like a "closed-to-play"-icon from super perspective
text=actor_info,
text=f'{actor}',
nest_indent=1,
)
)
@ -167,7 +161,7 @@ def _trio_main(
+
pformat.nest_from_op(
input_op='c)>', # closed due to cancel (see above)
text=actor_info,
text=f'{actor}',
)
)
except BaseException as err:
@ -177,7 +171,7 @@ def _trio_main(
+
pformat.nest_from_op(
input_op='x)>', # closed by error
text=actor_info,
text=f'{actor}',
)
)
# NOTE since we raise a tb will already be shown on the

View File

@ -1246,55 +1246,6 @@ def unpack_error(
return exc
def is_multi_cancelled(
exc: BaseException|BaseExceptionGroup,
ignore_nested: set[BaseException] = set(),
) -> bool|BaseExceptionGroup:
'''
Predicate to determine if an `BaseExceptionGroup` only contains
some (maybe nested) set of sub-grouped exceptions (like only
`trio.Cancelled`s which get swallowed silently by default) and is
thus the result of "gracefully cancelling" a collection of
sub-tasks (or other conc primitives) and receiving a "cancelled
ACK" from each after termination.
Docs:
----
- https://docs.python.org/3/library/exceptions.html#exception-groups
- https://docs.python.org/3/library/exceptions.html#BaseExceptionGroup.subgroup
'''
if (
not ignore_nested
or
trio.Cancelled in ignore_nested
# XXX always count-in `trio`'s native signal
):
ignore_nested.update({trio.Cancelled})
if isinstance(exc, BaseExceptionGroup):
matched_exc: BaseExceptionGroup|None = exc.subgroup(
tuple(ignore_nested),
# TODO, complain about why not allowed XD
# condition=tuple(ignore_nested),
)
if matched_exc is not None:
return matched_exc
# NOTE, IFF no excs types match (throughout the error-tree)
# -> return `False`, OW return the matched sub-eg.
#
# IOW, for the inverse of ^ for the purpose of
# maybe-enter-REPL--logic: "only debug when the err-tree contains
# at least one exc-type NOT in `ignore_nested`" ; i.e. the case where
# we fallthrough and return `False` here.
return False
def _raise_from_unexpected_msg(
ctx: Context,
msg: MsgType,

View File

@ -39,7 +39,10 @@ import warnings
import trio
from .trionics import maybe_open_nursery
from .trionics import (
maybe_open_nursery,
collapse_eg,
)
from ._state import (
current_actor,
)
@ -115,6 +118,10 @@ class Portal:
@property
def chan(self) -> Channel:
'''
Ref to this ctx's underlying `tractor.ipc.Channel`.
'''
return self._chan
@property
@ -174,10 +181,17 @@ class Portal:
# not expecting a "main" result
if self._expect_result_ctx is None:
peer_id: str = f'{self.channel.aid.reprol()!r}'
log.warning(
f"Portal for {self.channel.aid} not expecting a final"
" result?\nresult() should only be called if subactor"
" was spawned with `ActorNursery.run_in_actor()`")
f'Portal to peer {peer_id} will not deliver a final result?\n'
f'\n'
f'Context.result() can only be called by the parent of '
f'a sub-actor when it was spawned with '
f'`ActorNursery.run_in_actor()`'
f'\n'
f'Further this `ActorNursery`-method-API will be deprecated in the '
f'near future!\n'
)
return NoResult
# expecting a "main" result
@ -210,6 +224,7 @@ class Portal:
typname: str = type(self).__name__
log.warning(
f'`{typname}.result()` is DEPRECATED!\n'
f'\n'
f'Use `{typname}.wait_for_result()` instead!\n'
)
return await self.wait_for_result(
@ -221,8 +236,10 @@ class Portal:
# terminate all locally running async generator
# IPC calls
if self._streams:
log.cancel(
f"Cancelling all streams with {self.channel.aid}")
peer_id: str = f'{self.channel.aid.reprol()!r}'
report: str = (
f'Cancelling all msg-streams with {peer_id}\n'
)
for stream in self._streams.copy():
try:
await stream.aclose()
@ -231,10 +248,18 @@ class Portal:
# (unless of course at some point down the road we
# won't expect this to always be the case or need to
# detect it for respawning purposes?)
log.debug(f"{stream} was already closed.")
report += (
f'->) {stream!r} already closed\n'
)
log.cancel(report)
async def aclose(self):
log.debug(f"Closing {self}")
log.debug(
f'Closing portal\n'
f'>}}\n'
f'|_{self}\n'
)
# TODO: once we move to implementing our own `ReceiveChannel`
# (including remote task cancellation inside its `.aclose()`)
# we'll need to .aclose all those channels here
@ -260,23 +285,22 @@ class Portal:
__runtimeframe__: int = 1 # noqa
chan: Channel = self.channel
peer_id: str = f'{self.channel.aid.reprol()!r}'
if not chan.connected():
log.runtime(
'This channel is already closed, skipping cancel request..'
f'Peer {peer_id} is already disconnected\n'
'-> skipping cancel request..\n'
)
return False
reminfo: str = (
f'c)=> {self.channel.aid}\n'
f' |_{chan}\n'
)
log.cancel(
f'Requesting actor-runtime cancel for peer\n\n'
f'{reminfo}'
f'Sending actor-runtime-cancel-req to peer\n'
f'\n'
f'c)=> {peer_id}\n'
)
# XXX the one spot we set it?
self.channel._cancel_called: bool = True
chan._cancel_called: bool = True
try:
# send cancel cmd - might not get response
# XXX: sure would be nice to make this work with
@ -297,8 +321,9 @@ class Portal:
# may timeout and we never get an ack (obvi racy)
# but that doesn't mean it wasn't cancelled.
log.debug(
'May have failed to cancel peer?\n'
f'{reminfo}'
f'May have failed to cancel peer?\n'
f'\n'
f'c)=?> {peer_id}\n'
)
# if we get here some weird cancellation case happened
@ -316,22 +341,22 @@ class Portal:
TransportClosed,
) as tpt_err:
report: str = (
f'IPC chan for actor already closed or broken?\n\n'
f'{self.channel.aid}\n'
f' |_{self.channel}\n'
ipc_borked_report: str = (
f'IPC for actor already closed/broken?\n\n'
f'\n'
f'c)=x> {peer_id}\n'
)
match tpt_err:
case TransportClosed():
log.debug(report)
log.debug(ipc_borked_report)
case _:
report += (
ipc_borked_report += (
f'\n'
f'Unhandled low-level transport-closed/error during\n'
f'`Portal.cancel_actor()` request?\n'
f'<{type(tpt_err).__name__}( {tpt_err} )>\n'
)
log.warning(report)
log.warning(ipc_borked_report)
return False
@ -488,10 +513,13 @@ class Portal:
with trio.CancelScope(shield=True):
await ctx.cancel()
except trio.ClosedResourceError:
except trio.ClosedResourceError as cre:
# if the far end terminates before we send a cancel the
# underlying transport-channel may already be closed.
log.cancel(f'Context {ctx} was already closed?')
log.cancel(
f'Context.cancel() -> {cre!r}\n'
f'cid: {ctx.cid!r} already closed?\n'
)
# XXX: should this always be done?
# await recv_chan.aclose()
@ -558,14 +586,13 @@ async def open_portal(
assert actor
was_connected: bool = False
async with maybe_open_nursery(
tn,
shield=shield,
strict_exception_groups=False,
# ^XXX^ TODO? soo roll our own then ??
# -> since we kinda want the "if only one `.exception` then
# just raise that" interface?
) as tn:
async with (
collapse_eg(),
maybe_open_nursery(
tn,
shield=shield,
) as tn,
):
if not channel.connected():
await channel.connect()

View File

@ -37,16 +37,11 @@ import warnings
import trio
from ._runtime import (
Actor,
Arbiter,
# TODO: rename and make a non-actor subtype?
# Arbiter as Registry,
async_main,
)
from . import _runtime
from .devx import (
debug,
_frame_stack,
pformat as _pformat,
)
from . import _spawn
from . import _state
@ -61,9 +56,12 @@ from ._addr import (
mk_uuid,
wrap_address,
)
from .trionics import (
is_multi_cancelled,
collapse_eg,
)
from ._exceptions import (
RuntimeFailure,
is_multi_cancelled,
)
@ -99,7 +97,7 @@ async def maybe_block_bp(
):
logger.info(
f'Found `greenback` installed @ {maybe_mod}\n'
'Enabling `tractor.pause_from_sync()` support!\n'
f'Enabling `tractor.pause_from_sync()` support!\n'
)
os.environ['PYTHONBREAKPOINT'] = (
'tractor.devx.debug._sync_pause_from_builtin'
@ -194,13 +192,19 @@ async def open_root_actor(
# read-only state to sublayers?
# extra_rt_vars: dict|None = None,
) -> Actor:
) -> _runtime.Actor:
'''
Runtime init entry point for ``tractor``.
Initialize the `tractor` runtime by starting a "root actor" in
a parent-most Python process.
All (disjoint) actor-process-trees-as-programs are created via
this entrypoint.
'''
# XXX NEVER allow nested actor-trees!
if already_actor := _state.current_actor(err_on_no_runtime=False):
if already_actor := _state.current_actor(
err_on_no_runtime=False,
):
rtvs: dict[str, Any] = _state._runtime_vars
root_mailbox: list[str, int] = rtvs['_root_mailbox']
registry_addrs: list[list[str, int]] = rtvs['_registry_addrs']
@ -270,14 +274,20 @@ async def open_root_actor(
DeprecationWarning,
stacklevel=2,
)
registry_addrs = [arbiter_addr]
uw_reg_addrs = [arbiter_addr]
if not registry_addrs:
registry_addrs: list[UnwrappedAddress] = default_lo_addrs(
uw_reg_addrs = registry_addrs
if not uw_reg_addrs:
uw_reg_addrs: list[UnwrappedAddress] = default_lo_addrs(
enable_transports
)
assert registry_addrs
# must exist by now since all below code is dependent
assert uw_reg_addrs
registry_addrs: list[Address] = [
wrap_address(uw_addr)
for uw_addr in uw_reg_addrs
]
loglevel = (
loglevel
@ -326,10 +336,10 @@ async def open_root_actor(
enable_stack_on_sig()
# closed into below ping task-func
ponged_addrs: list[UnwrappedAddress] = []
ponged_addrs: list[Address] = []
async def ping_tpt_socket(
addr: UnwrappedAddress,
addr: Address,
timeout: float = 1,
) -> None:
'''
@ -349,17 +359,22 @@ async def open_root_actor(
# be better to eventually have a "discovery" protocol
# with basic handshake instead?
with trio.move_on_after(timeout):
async with _connect_chan(addr):
async with _connect_chan(addr.unwrap()):
ponged_addrs.append(addr)
except OSError:
# TODO: make this a "discovery" log level?
# ?TODO, make this a "discovery" log level?
logger.info(
f'No actor registry found @ {addr}\n'
f'No root-actor registry found @ {addr!r}\n'
)
# !TODO, this is basically just another (abstract)
# happy-eyeballs, so we should try to formalize it somewhere
# in a `.[_]discovery` ya?
#
async with trio.open_nursery() as tn:
for addr in registry_addrs:
for uw_addr in uw_reg_addrs:
addr: Address = wrap_address(uw_addr)
tn.start_soon(
ping_tpt_socket,
addr,
@ -381,31 +396,35 @@ async def open_root_actor(
f'Registry(s) seem(s) to exist @ {ponged_addrs}'
)
actor = Actor(
actor = _runtime.Actor(
name=name or 'anonymous',
uuid=mk_uuid(),
registry_addrs=ponged_addrs,
loglevel=loglevel,
enable_modules=enable_modules,
)
# DO NOT use the registry_addrs as the transport server
# addrs for this new non-registar, root-actor.
# **DO NOT** use the registry_addrs as the
# ipc-transport-server's bind-addrs as this is
# a new NON-registrar, ROOT-actor.
#
# XXX INSTEAD, bind random addrs using the same tpt
# proto.
for addr in ponged_addrs:
waddr: Address = wrap_address(addr)
trans_bind_addrs.append(
waddr.get_random(bindspace=waddr.bindspace)
addr.get_random(
bindspace=addr.bindspace,
)
)
# Start this local actor as the "registrar", aka a regular
# actor who manages the local registry of "mailboxes" of
# other process-tree-local sub-actors.
else:
# NOTE that if the current actor IS THE REGISTRAR, the
# following init steps are taken:
# - the transport layer server is bound to each addr
# pair defined in provided registry_addrs, or the default.
trans_bind_addrs = registry_addrs
trans_bind_addrs = uw_reg_addrs
# - it is normally desirable for any registrar to stay up
# indefinitely until either all registered (child/sub)
@ -416,7 +435,8 @@ async def open_root_actor(
# https://github.com/goodboy/tractor/pull/348
# https://github.com/goodboy/tractor/issues/296
actor = Arbiter(
# TODO: rename as `RootActor` or is that even necessary?
actor = _runtime.Arbiter(
name=name or 'registrar',
uuid=mk_uuid(),
registry_addrs=registry_addrs,
@ -428,6 +448,16 @@ async def open_root_actor(
# `.trio.run()`.
actor._infected_aio = _state._runtime_vars['_is_infected_aio']
# NOTE, only set the loopback addr for the
# process-tree-global "root" mailbox since all sub-actors
# should be able to speak to their root actor over that
# channel.
raddrs: list[Address] = _state._runtime_vars['_root_addrs']
raddrs.extend(trans_bind_addrs)
# TODO, remove once we have also removed all usage;
# eventually all (root-)registry apis should expect > 1 addr.
_state._runtime_vars['_root_mailbox'] = raddrs[0]
# Start up main task set via core actor-runtime nurseries.
try:
# assign process-local actor
@ -435,21 +465,27 @@ async def open_root_actor(
# start local channel-server and fake the portal API
# NOTE: this won't block since we provide the nursery
ml_addrs_str: str = '\n'.join(
f'@{addr}' for addr in trans_bind_addrs
)
logger.info(
f'Starting local {actor.uid} on the following transport addrs:\n'
f'{ml_addrs_str}'
)
report: str = f'Starting actor-runtime for {actor.aid.reprol()!r}\n'
if reg_addrs := actor.registry_addrs:
report += (
'-> Opening new registry @ '
+
'\n'.join(
f'{addr}' for addr in reg_addrs
)
)
logger.info(f'{report}\n')
# start the actor runtime in a new task
async with trio.open_nursery(
strict_exception_groups=False,
# ^XXX^ TODO? instead unpack any RAE as per "loose" style?
) as nursery:
# start runtime in a bg sub-task, yield to caller.
async with (
collapse_eg(),
trio.open_nursery() as root_tn,
# ``_runtime.async_main()`` creates an internal nursery
# XXX, finally-footgun below?
# -> see note on why shielding.
# maybe_raise_from_masking_exc(),
):
# `_runtime.async_main()` creates an internal nursery
# and blocks here until any underlying actor(-process)
# tree has terminated thereby conducting so called
# "end-to-end" structured concurrency throughout an
@ -457,9 +493,9 @@ async def open_root_actor(
# "actor runtime" primitives are SC-compat and thus all
# transitively spawned actors/processes must be as
# well.
await nursery.start(
await root_tn.start(
partial(
async_main,
_runtime.async_main,
actor,
accept_addrs=trans_bind_addrs,
parent_addr=None
@ -507,7 +543,7 @@ async def open_root_actor(
raise
finally:
# NOTE: not sure if we'll ever need this but it's
# NOTE/TODO?, not sure if we'll ever need this but it's
# possibly better for even more determinism?
# logger.cancel(
# f'Waiting on {len(nurseries)} nurseries in root..')
@ -516,12 +552,21 @@ async def open_root_actor(
# for an in nurseries:
# tempn.start_soon(an.exited.wait)
op_nested_actor_repr: str = _pformat.nest_from_op(
input_op='>) ',
text=actor.pformat(),
nest_prefix='|_',
)
logger.info(
f'Closing down root actor\n'
f'>)\n'
f'|_{actor}\n'
f'{op_nested_actor_repr}'
)
await actor.cancel(None) # self cancel
# XXX, THIS IS A *finally-footgun*!
# -> though already shielded internally, it can
# taskc here and mask underlying errors raised in
# the try-block above?
with trio.CancelScope(shield=True):
await actor.cancel(None) # self cancel
finally:
# revert all process-global runtime state
if (
@ -534,10 +579,16 @@ async def open_root_actor(
_state._current_actor = None
_state._last_actor_terminated = actor
logger.runtime(
sclang_repr: str = _pformat.nest_from_op(
input_op=')>',
text=actor.pformat(),
nest_prefix='|_',
nest_indent=1,
)
logger.info(
f'Root actor terminated\n'
f')>\n'
f' |_{actor}\n'
f'{sclang_repr}'
)

View File

@ -37,6 +37,7 @@ import warnings
import trio
from trio import (
Cancelled,
CancelScope,
Nursery,
TaskStatus,
@ -52,13 +53,18 @@ from ._exceptions import (
ModuleNotExposed,
MsgTypeError,
TransportClosed,
is_multi_cancelled,
pack_error,
unpack_error,
)
from .trionics import (
collapse_eg,
is_multi_cancelled,
maybe_raise_from_masking_exc,
)
from .devx import (
debug,
add_div,
pformat as _pformat,
)
from . import _state
from .log import get_logger
@ -67,7 +73,7 @@ from .msg import (
MsgCodec,
PayloadT,
NamespacePath,
# pretty_struct,
pretty_struct,
_ops as msgops,
)
from tractor.msg.types import (
@ -215,11 +221,18 @@ async def _invoke_non_context(
task_status.started(ctx)
result = await coro
fname: str = func.__name__
op_nested_task: str = _pformat.nest_from_op(
input_op=f')> cid: {ctx.cid!r}',
text=f'{ctx._task}',
nest_indent=1, # under >
)
log.runtime(
'RPC complete:\n'
f'task: {ctx._task}\n'
f'|_cid={ctx.cid}\n'
f'|_{fname}() -> {pformat(result)}\n'
f'RPC task complete\n'
f'\n'
f'{op_nested_task}\n'
f'\n'
f')> {fname}() -> {pformat(result)}\n'
)
# NOTE: only send result if we know IPC isn't down
@ -250,7 +263,7 @@ async def _errors_relayed_via_ipc(
ctx: Context,
is_rpc: bool,
hide_tb: bool = False,
hide_tb: bool = True,
debug_kbis: bool = False,
task_status: TaskStatus[
Context | BaseException
@ -375,9 +388,9 @@ async def _errors_relayed_via_ipc(
# they can be individually cancelled.
finally:
# if the error is not from user code and instead a failure
# of a runtime RPC or transport failure we do prolly want to
# show this frame
# if the error is not from user code and instead a failure of
# an internal-runtime-RPC or IPC-connection, we do (prolly) want
# to show this frame!
if (
rpc_err
and (
@ -616,32 +629,40 @@ async def _invoke(
# -> the below scope is never exposed to the
# `@context` marked RPC function.
# - `._portal` is never set.
scope_err: BaseException|None = None
try:
tn: trio.Nursery
# TODO: better `trionics` primitive/tooling usage here!
# -[ ] would be nice to have our `TaskMngr`
# nursery here!
# -[ ] payload value checking like we do with
# `.started()` such that the debugger can engage
# here in the child task instead of waiting for the
# parent to crash with its own MTE..
#
tn: Nursery
rpc_ctx_cs: CancelScope
async with (
trio.open_nursery(
strict_exception_groups=False,
# ^XXX^ TODO? instead unpack any RAE as per "loose" style?
) as tn,
collapse_eg(),
trio.open_nursery() as tn,
msgops.maybe_limit_plds(
ctx=ctx,
spec=ctx_meta.get('pld_spec'),
dec_hook=ctx_meta.get('dec_hook'),
),
# XXX NOTE, this being the "most embedded"
# scope ensures unmasking of the `await coro` below
# *should* never be interfered with!!
maybe_raise_from_masking_exc(
tn=tn,
unmask_from=Cancelled,
) as _mbme, # maybe boxed masked exc
):
ctx._scope_nursery = tn
rpc_ctx_cs = ctx._scope = tn.cancel_scope
task_status.started(ctx)
# TODO: better `trionics` tooling:
# -[ ] would be nice to have our `TaskMngr`
# nursery here!
# -[ ] payload value checking like we do with
# `.started()` such that the debugger can engage
# here in the child task instead of waiting for the
# parent to crash with its own MTE..
# invoke user endpoint fn.
res: Any|PayloadT = await coro
return_msg: Return|CancelAck = return_msg_type(
cid=cid,
@ -651,7 +672,8 @@ async def _invoke(
ctx._result = res
log.runtime(
f'Sending result msg and exiting {ctx.side!r}\n'
f'{return_msg}\n'
f'\n'
f'{pretty_struct.pformat(return_msg)}\n'
)
await chan.send(return_msg)
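`maybe_raise_from_masking_exc(tn=tn, unmask_from=Cancelled)` is stacked as the inner-most scope above so that a `trio.Cancelled` delivered while `await coro` is unwinding cannot silently bury the "real" exception. The snippet below is not that helper, only a sketch of the underlying `__context__`-based unmasking idea; a real implementation must be far more careful not to swallow a `Cancelled` that an enclosing cancel-scope still needs to see:

    import trio
    from contextlib import asynccontextmanager

    @asynccontextmanager
    async def unmask_from_cancelled():
        try:
            yield
        except trio.Cancelled as taskc:
            masked: BaseException|None = taskc.__context__
            if (
                masked is not None
                and not isinstance(masked, trio.Cancelled)
            ):
                # surface the exception the `Cancelled` was masking
                raise masked from taskc
            raise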
@ -743,39 +765,48 @@ async def _invoke(
BaseExceptionGroup,
BaseException,
trio.Cancelled,
) as scope_error:
) as _scope_err:
scope_err = _scope_err
if (
isinstance(scope_error, RuntimeError)
and scope_error.args
and 'Cancel scope stack corrupted' in scope_error.args[0]
isinstance(scope_err, RuntimeError)
and
scope_err.args
and
'Cancel scope stack corrupted' in scope_err.args[0]
):
log.exception('Cancel scope stack corrupted!?\n')
# debug.mk_pdb().set_trace()
# always set this (child) side's exception as the
# local error on the context
ctx._local_error: BaseException = scope_error
ctx._local_error: BaseException = scope_err
# ^-TODO-^ question,
# does this matter other than for
# consistency/testing?
# |_ no user code should be in this scope at this point
# AND we already set this in the block below?
# if a remote error was set then likely the
# exception group was raised due to that, so
# XXX if a remote error was set then likely the
# exc group was raised due to that, so
# we instead raise that error immediately!
ctx.maybe_raise()
maybe_re: (
ContextCancelled|RemoteActorError
) = ctx.maybe_raise()
if maybe_re:
log.cancel(
f'Suppressing remote-exc from peer,\n'
f'{maybe_re!r}\n'
)
# maybe TODO: pack in some kinda
# `trio.Cancelled.__traceback__` here so they can be
# unwrapped and displayed on the caller side? dunno..
raise
raise scope_err
# `@context` entrypoint task bookkeeping.
# i.e. only pop the context tracking if used ;)
finally:
assert chan.uid
assert chan.aid
# don't pop the local context until we know the
# associated child isn't in debug any more
@ -802,16 +833,19 @@ async def _invoke(
descr_str += (
f'\n{merr!r}\n' # needed?
f'{tb_str}\n'
f'\n'
f'scope_error:\n'
f'{scope_err!r}\n'
)
else:
descr_str += f'\n{merr!r}\n'
else:
descr_str += f'\nand final result {ctx.outcome!r}\n'
descr_str += f'\nwith final result {ctx.outcome!r}\n'
logmeth(
message
+
descr_str
f'{message}\n'
f'\n'
f'{descr_str}\n'
)
@ -978,8 +1012,6 @@ async def process_messages(
cid=cid,
kwargs=kwargs,
):
kwargs |= {'req_chan': chan}
# XXX NOTE XXX don't start entire actor
# runtime cancellation if this actor is
# currently in debug mode!
@ -998,14 +1030,14 @@ async def process_messages(
cid,
chan,
actor.cancel,
kwargs,
kwargs | {'req_chan': chan},
is_rpc=False,
return_msg_type=CancelAck,
)
log.runtime(
'Cancelling IPC transport msg-loop with peer:\n'
f'|_{chan}\n'
'Cancelling RPC-msg-loop with peer\n'
f'->c}} {chan.aid.reprol()}@[{chan.maddr}]\n'
)
loop_cs.cancel()
break
@ -1018,7 +1050,7 @@ async def process_messages(
):
target_cid: str = kwargs['cid']
kwargs |= {
'requesting_uid': chan.uid,
'requesting_aid': chan.aid,
'ipc_msg': msg,
# XXX NOTE! ONLY the rpc-task-owning
@ -1054,21 +1086,34 @@ async def process_messages(
ns=ns,
func=funcname,
kwargs=kwargs, # type-spec this? see `msg.types`
uid=actorid,
uid=actor_uuid,
):
if actor_uuid != chan.aid.uid:
raise RuntimeError(
f'IPC <Start> msg <-> chan.aid mismatch!?\n'
f'Channel.aid = {chan.aid!r}\n'
f'Start.uid = {actor_uuid!r}\n'
)
# await debug.pause()
op_repr: str = 'Start <=) '
req_repr: str = _pformat.nest_from_op(
input_op=op_repr,
op_suffix='',
nest_prefix='',
text=f'{chan}',
nest_indent=len(op_repr)-1,
rm_from_first_ln='<',
# ^XXX, subtract -1 to account for
# <Channel
# ^_chevron to be stripped
)
start_status: str = (
'Handling RPC `Start` request\n'
f'<= peer: {actorid}\n\n'
f' |_{chan}\n'
f' |_cid: {cid}\n\n'
# f' |_{ns}.{funcname}({kwargs})\n'
f'>> {actor.uid}\n'
f' |_{actor}\n'
f' -> nsp: `{ns}.{funcname}({kwargs})`\n'
# f' |_{ns}.{funcname}({kwargs})\n\n'
# f'{pretty_struct.pformat(msg)}\n'
'Handling RPC request\n'
f'{req_repr}\n'
f'\n'
f'->{{ ipc-context-id: {cid!r}\n'
f'->{{ nsp for fn: `{ns}.{funcname}({kwargs})`\n'
)
# runtime-internal endpoint: `Actor.<funcname>`
@ -1097,10 +1142,6 @@ async def process_messages(
await chan.send(err_msg)
continue
start_status += (
f' -> func: {func}\n'
)
# schedule a task for the requested RPC function
# in the actor's main "service nursery".
#
@ -1108,7 +1149,7 @@ async def process_messages(
# supervision isolation? would avoid having to
# manage RPC tasks individually in `._rpc_tasks`
# table?
start_status += ' -> scheduling new task..\n'
start_status += '->( scheduling new task..\n'
log.runtime(start_status)
try:
ctx: Context = await actor._service_n.start(
@ -1192,12 +1233,24 @@ async def process_messages(
# END-OF `async for`:
# IPC disconnected via `trio.EndOfChannel`, likely
# due to a (graceful) `Channel.aclose()`.
chan_op_repr: str = '<=x] '
chan_repr: str = _pformat.nest_from_op(
input_op=chan_op_repr,
op_suffix='',
nest_prefix='',
text=chan.pformat(),
nest_indent=len(chan_op_repr)-1,
rm_from_first_ln='<',
)
log.runtime(
f'channel for {chan.uid} disconnected, cancelling RPC tasks\n'
f'|_{chan}\n'
f'IPC channel disconnected\n'
f'{chan_repr}\n'
f'\n'
f'->c) cancelling RPC tasks.\n'
)
await actor.cancel_rpc_tasks(
req_uid=actor.uid,
req_aid=actor.aid,
# a "self cancel" in terms of the lifetime of the
# IPC connection which is presumed to be the
# source of any requests for spawned tasks.
@ -1269,13 +1322,37 @@ async def process_messages(
finally:
# msg debugging for when the machinery is brokey
if msg is None:
message: str = 'Exiting IPC msg loop without receiving a msg?'
message: str = 'Exiting RPC-loop without receiving a msg?'
else:
task_op_repr: str = ')>'
task: trio.Task = trio.lowlevel.current_task()
# maybe add cancelled opt prefix
if task._cancel_status.effectively_cancelled:
task_op_repr = 'c' + task_op_repr
task_repr: str = _pformat.nest_from_op(
input_op=task_op_repr,
text=f'{task!r}',
nest_indent=1,
)
# chan_op_repr: str = '<=} '
# chan_repr: str = _pformat.nest_from_op(
# input_op=chan_op_repr,
# op_suffix='',
# nest_prefix='',
# text=chan.pformat(),
# nest_indent=len(chan_op_repr)-1,
# rm_from_first_ln='<',
# )
message: str = (
'Exiting IPC msg loop with final msg\n\n'
f'<= peer: {chan.uid}\n'
f' |_{chan}\n\n'
# f'{pretty_struct.pformat(msg)}'
f'Exiting RPC-loop with final msg\n'
f'\n'
# f'{chan_repr}\n'
f'{task_repr}\n'
f'\n'
f'{pretty_struct.pformat(msg)}'
f'\n'
)
log.runtime(message)

View File

@ -55,6 +55,7 @@ from typing import (
TYPE_CHECKING,
)
import uuid
import textwrap
from types import ModuleType
import warnings
@ -73,6 +74,9 @@ from tractor.msg import (
pretty_struct,
types as msgtypes,
)
from .trionics import (
collapse_eg,
)
from .ipc import (
Channel,
# IPCServer, # causes cycles atm..
@ -97,7 +101,10 @@ from ._exceptions import (
MsgTypeError,
unpack_error,
)
from .devx import debug
from .devx import (
debug,
pformat as _pformat
)
from ._discovery import get_registry
from ._portal import Portal
from . import _state
@ -206,7 +213,7 @@ class Actor:
*,
enable_modules: list[str] = [],
loglevel: str|None = None,
registry_addrs: list[UnwrappedAddress]|None = None,
registry_addrs: list[Address]|None = None,
spawn_method: str|None = None,
# TODO: remove!
@ -227,7 +234,7 @@ class Actor:
# state
self._cancel_complete = trio.Event()
self._cancel_called_by_remote: tuple[str, tuple]|None = None
self._cancel_called_by: tuple[str, tuple]|None = None
self._cancel_called: bool = False
# retrieve and store parent `__main__` data which
@ -249,11 +256,12 @@ class Actor:
if arbiter_addr is not None:
warnings.warn(
'`Actor(arbiter_addr=<blah>)` is now deprecated.\n'
'Use `registry_addrs: list[tuple]` instead.',
'Use `registry_addrs: list[Address]` instead.',
DeprecationWarning,
stacklevel=2,
)
registry_addrs: list[UnwrappedAddress] = [arbiter_addr]
registry_addrs: list[Address] = [wrap_address(arbiter_addr)]
# marked by the process spawning backend at startup
# will be None for the parent most process started manually
@ -292,8 +300,10 @@ class Actor:
# input via the validator.
self._reg_addrs: list[UnwrappedAddress] = []
if registry_addrs:
self.reg_addrs: list[UnwrappedAddress] = registry_addrs
_state._runtime_vars['_registry_addrs'] = registry_addrs
_state._runtime_vars['_registry_addrs'] = self.reg_addrs = [
addr.unwrap()
for addr in registry_addrs
]
@property
def aid(self) -> msgtypes.Aid:
@ -339,46 +349,125 @@ class Actor:
def pid(self) -> int:
return self._aid.pid
def pformat(self) -> str:
ds: str = '='
parent_uid: tuple|None = None
if rent_chan := self._parent_chan:
parent_uid = rent_chan.uid
@property
def repr_state(self) -> str:
if self.cancel_complete:
return 'cancelled'
elif canceller := self.cancel_caller:
return f'cancel-called by {canceller}'
else:
return 'running'
def pformat(
self,
ds: str = ': ',
indent: int = 0,
privates: bool = False,
) -> str:
fmtstr: str = f'|_id: {self.aid.reprol()!r}\n'
if privates:
aid_nest_prefix: str = '|_aid='
aid_field_repr: str = _pformat.nest_from_op(
input_op='',
text=pretty_struct.pformat(
struct=self.aid,
field_indent=2,
),
op_suffix='',
nest_prefix=aid_nest_prefix,
nest_indent=0,
)
fmtstr: str = f'{aid_field_repr}'
if rent_chan := self._parent_chan:
fmtstr += (
f"|_parent{ds}{rent_chan.aid.reprol()}\n"
)
peers: list = []
server: _server.IPCServer = self.ipc_server
if server:
peers: list[tuple] = list(server._peer_connected)
if privates:
server_repr: str = self._ipc_server.pformat(
privates=privates,
)
# create field ln as a key-header indented under
# and up to the section's key prefix.
# ^XXX if we were to indent `repr(Server)` to
# '<key>: '
# _here_^
server_repr: str = _pformat.nest_from_op(
input_op='', # nest as sub-obj
op_suffix='',
text=server_repr,
)
fmtstr += (
f"{server_repr}"
)
else:
fmtstr += (
f'|_ipc: {server.repr_state!r}\n'
)
fmtstr: str = (
f' |_id: {self.aid!r}\n'
# f" aid{ds}{self.aid!r}\n"
f" parent{ds}{parent_uid}\n"
f'\n'
f' |_ipc: {len(peers)!r} connected peers\n'
f" peers{ds}{peers!r}\n"
f" ipc_server{ds}{self._ipc_server}\n"
f'\n'
f' |_rpc: {len(self._rpc_tasks)} tasks\n'
f" ctxs{ds}{len(self._contexts)}\n"
f'\n'
f' |_runtime: ._task{ds}{self._task!r}\n'
f' _spawn_method{ds}{self._spawn_method}\n'
f' _actoruid2nursery{ds}{self._actoruid2nursery}\n'
f' _forkserver_info{ds}{self._forkserver_info}\n'
f'\n'
f' |_state: "TODO: .repr_state()"\n'
f' _cancel_complete{ds}{self._cancel_complete}\n'
f' _cancel_called_by_remote{ds}{self._cancel_called_by_remote}\n'
f' _cancel_called{ds}{self._cancel_called}\n'
fmtstr += (
f'|_rpc: {len(self._rpc_tasks)} active tasks\n'
)
return (
'<Actor(\n'
+
fmtstr
+
')>\n'
# TODO, actually fix the .repr_state impl/output?
# append ipc-ctx state summary
# ctxs: dict = self._contexts
# if ctxs:
# ctx_states: dict[str, int] = {}
# for ctx in self._contexts.values():
# ctx_state: str = ctx.repr_state
# cnt = ctx_states.setdefault(ctx_state, 0)
# ctx_states[ctx_state] = cnt + 1
# fmtstr += (
# f" ctxs{ds}{ctx_states}\n"
# )
# runtime-state
task_name: str = '<dne>'
if task := self._task:
task_name: str = task.name
fmtstr += (
# TODO, this just like ctx?
f'|_state: {self.repr_state!r}\n'
f' task: {task_name}\n'
f' loglevel: {self.loglevel!r}\n'
f' subactors_spawned: {len(self._actoruid2nursery)}\n'
)
if not _state.is_root_process():
fmtstr += f' spawn_method: {self._spawn_method!r}\n'
if privates:
fmtstr += (
# f' actoruid2nursery{ds}{self._actoruid2nursery}\n'
f' cancel_complete{ds}{self._cancel_complete}\n'
f' cancel_called_by_remote{ds}{self._cancel_called_by}\n'
f' cancel_called{ds}{self._cancel_called}\n'
)
if fmtstr:
fmtstr: str = textwrap.indent(
text=fmtstr,
prefix=' '*(1 + indent),
)
_repr: str = (
f'<{type(self).__name__}(\n'
f'{fmtstr}'
f')>\n'
)
if indent:
_repr: str = textwrap.indent(
text=_repr,
prefix=' '*indent,
)
return _repr
__repr__ = pformat
@ -386,7 +475,11 @@ class Actor:
def reg_addrs(self) -> list[UnwrappedAddress]:
'''
List of (socket) addresses for all known (and contactable)
registry actors.
registry-service actors in "unwrapped" (i.e. IPC interchange
wire-compat) form.
If you are looking for the "wrapped" address form, use
`.registry_addrs` instead.
'''
return self._reg_addrs
@ -405,8 +498,14 @@ class Actor:
self._reg_addrs = addrs
@property
def registry_addrs(self) -> list[Address]:
return [wrap_address(uw_addr)
for uw_addr in self.reg_addrs]
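The split maintained here, `.reg_addrs` holding "unwrapped" wire-format values while `.registry_addrs` re-wraps them as `Address` objects, boils down to a simple round-trip contract; a self-contained toy (not tractor's actual address types) showing that duality:

    from dataclasses import dataclass

    UnwrappedAddress = tuple[str, int]   # wire/interchange form, as kept in `.reg_addrs`

    @dataclass(frozen=True)
    class ToyTCPAddress:                 # "wrapped" form, as yielded by `.registry_addrs`
        host: str
        port: int

        def unwrap(self) -> UnwrappedAddress:
            return (self.host, self.port)

        @classmethod
        def from_addr(cls, addr: UnwrappedAddress) -> 'ToyTCPAddress':
            return cls(*addr)

    raw: UnwrappedAddress = ('127.0.0.1', 1616)   # made-up values
    wrapped = ToyTCPAddress.from_addr(raw)
    assert wrapped.unwrap() == raw                # the assumed round-trip property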
def load_modules(
self,
) -> None:
'''
Load explicitly enabled python modules from local fs after
@ -453,6 +552,14 @@ class Actor:
)
raise
# ?TODO, factor this meth-iface into a new `.rpc` subsys primitive?
# - _get_rpc_func(),
# - _deliver_ctx_payload(),
# - get_context(),
# - start_remote_task(),
# - cancel_rpc_tasks(),
# - _cancel_task(),
#
def _get_rpc_func(self, ns, funcname):
'''
Try to lookup and return a target RPC func from the
@ -496,11 +603,11 @@ class Actor:
queue.
'''
uid: tuple[str, str] = chan.uid
assert uid, f"`chan.uid` can't be {uid}"
aid: msgtypes.Aid = chan.aid
assert aid, f"`chan.aid` can't be {aid}"
try:
ctx: Context = self._contexts[(
uid,
aid.uid,
cid,
# TODO: how to determine this tho?
@ -511,7 +618,7 @@ class Actor:
'Ignoring invalid IPC msg!?\n'
f'Ctx seems to not/no-longer exist??\n'
f'\n'
f'<=? {uid}\n'
f'<=? {aid.reprol()!r}\n'
f' |_{pretty_struct.pformat(msg)}\n'
)
match msg:
@ -560,6 +667,7 @@ class Actor:
msging session's lifetime.
'''
# ?TODO, use Aid here as well?
actor_uid = chan.uid
assert actor_uid
try:
@ -908,6 +1016,22 @@ class Actor:
None, # self cancel all rpc tasks
)
@property
def cancel_complete(self) -> bool:
return self._cancel_complete.is_set()
@property
def cancel_called(self) -> bool:
'''
Was this actor requested to cancel by a remote peer actor?
'''
return self._cancel_called_by is not None
@property
def cancel_caller(self) -> msgtypes.Aid|None:
return self._cancel_called_by
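A hedged usage sketch of the new read-only status properties (the naming mirrors the `trio.CancelScope` style); it assumes only `tractor.current_actor()` plus the properties added above:

    import tractor

    async def report_cancel_status() -> None:
        actor = tractor.current_actor()
        if actor.cancel_complete:
            print('actor fully cancelled')
        elif actor.cancel_called:
            print(f'cancel requested by {actor.cancel_caller!r}, teardown in progress')
        else:
            print('actor running')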
async def cancel(
self,
@ -932,20 +1056,18 @@ class Actor:
'''
(
requesting_uid,
requester_type,
requesting_aid, # Aid
requester_type, # str
req_chan,
log_meth,
) = (
req_chan.uid,
req_chan.aid,
'peer',
req_chan,
log.cancel,
) if req_chan else (
# a self cancel of ALL rpc tasks
self.uid,
self.aid,
'self',
self,
log.runtime,
@ -953,14 +1075,14 @@ class Actor:
# TODO: just use the new `Context.repr_rpc: str` (and
# other) repr fields instead of doing this all manual..
msg: str = (
f'Actor-runtime cancel request from {requester_type}\n\n'
f'<=c) {requesting_uid}\n'
f' |_{self}\n'
f'Actor-runtime cancel request from {requester_type!r}\n'
f'\n'
f'<=c)\n'
f'{self}'
)
# TODO: what happens here when we self-cancel tho?
self._cancel_called_by_remote: tuple = requesting_uid
self._cancel_called_by: tuple = requesting_aid
self._cancel_called = True
# cancel all ongoing rpc tasks
@ -988,7 +1110,7 @@ class Actor:
# self-cancel **all** ongoing RPC tasks
await self.cancel_rpc_tasks(
req_uid=requesting_uid,
req_aid=requesting_aid,
parent_chan=None,
)
@ -1005,19 +1127,11 @@ class Actor:
self._cancel_complete.set()
return True
# XXX: hard kill logic if needed?
# def _hard_mofo_kill(self):
# # If we're the root actor or zombied kill everything
# if self._parent_chan is None: # TODO: more robust check
# root = trio.lowlevel.current_root_task()
# for n in root.child_nurseries:
# n.cancel_scope.cancel()
async def _cancel_task(
self,
cid: str,
parent_chan: Channel,
requesting_uid: tuple[str, str]|None,
requesting_aid: msgtypes.Aid|None,
ipc_msg: dict|None|bool = False,
@ -1055,7 +1169,7 @@ class Actor:
log.runtime(
'Cancel request for invalid RPC task.\n'
'The task likely already completed or was never started!\n\n'
f'<= canceller: {requesting_uid}\n'
f'<= canceller: {requesting_aid}\n'
f'=> {cid}@{parent_chan.uid}\n'
f' |_{parent_chan}\n'
)
@ -1063,9 +1177,12 @@ class Actor:
log.cancel(
'Rxed cancel request for RPC task\n'
f'<=c) {requesting_uid}\n'
f' |_{ctx._task}\n'
f' >> {ctx.repr_rpc}\n'
f'{ctx._task!r} <=c) {requesting_aid}\n'
f'|_>> {ctx.repr_rpc}\n'
# f'|_{ctx._task}\n'
# f' >> {ctx.repr_rpc}\n'
# f'=> {ctx._task}\n'
# f' >> Actor._cancel_task() => {ctx._task}\n'
# f' |_ {ctx._task}\n\n'
@ -1086,9 +1203,9 @@ class Actor:
)
if (
ctx._canceller is None
and requesting_uid
and requesting_aid
):
ctx._canceller: tuple = requesting_uid
ctx._canceller: tuple = requesting_aid.uid
# TODO: pack the RPC `{'cmd': <blah>}` msg into a ctxc and
# then raise and pack it here?
@ -1114,7 +1231,7 @@ class Actor:
# wait for _invoke to mark the task complete
flow_info: str = (
f'<= canceller: {requesting_uid}\n'
f'<= canceller: {requesting_aid}\n'
f'=> ipc-parent: {parent_chan}\n'
f'|_{ctx}\n'
)
@ -1131,7 +1248,7 @@ class Actor:
async def cancel_rpc_tasks(
self,
req_uid: tuple[str, str],
req_aid: msgtypes.Aid,
# NOTE: when None is passed we cancel **all** rpc
# tasks running in this actor!
@ -1148,7 +1265,7 @@ class Actor:
if not tasks:
log.runtime(
'Actor has no cancellable RPC tasks?\n'
f'<= canceller: {req_uid}\n'
f'<= canceller: {req_aid.reprol()}\n'
)
return
@ -1188,7 +1305,7 @@ class Actor:
)
log.cancel(
f'Cancelling {descr} RPC tasks\n\n'
f'<=c) {req_uid} [canceller]\n'
f'<=c) {req_aid} [canceller]\n'
f'{rent_chan_repr}'
f'c)=> {self.uid} [cancellee]\n'
f' |_{self} [with {len(tasks)} tasks]\n'
@ -1216,7 +1333,7 @@ class Actor:
await self._cancel_task(
cid,
task_caller_chan,
requesting_uid=req_uid,
requesting_aid=req_aid,
)
if tasks:
@ -1244,25 +1361,13 @@ class Actor:
'''
return self.accept_addrs[0]
def get_parent(self) -> Portal:
'''
Return a `Portal` to our parent.
'''
assert self._parent_chan, "No parent channel for this actor?"
return Portal(self._parent_chan)
def get_chans(
self,
uid: tuple[str, str],
) -> list[Channel]:
'''
Return all IPC channels to the actor with provided `uid`.
'''
return self._peers[uid]
# TODO, this should delegate ONLY to the
# `._spawn_spec._runtime_vars: dict` / `._state` APIs?
#
# XXX, AH RIGHT that's why..
# it's bc we pass this as a CLI flag to the child.py precisely
# bc we need the bootstrapping pre `async_main()`.. but maybe
# keep this as an impl detail and not part of the pub iface impl?
def is_infected_aio(self) -> bool:
'''
If `True`, this actor is running `trio` in guest mode on
@ -1273,6 +1378,23 @@ class Actor:
'''
return self._infected_aio
# ?TODO, is this the right type for this method?
def get_parent(self) -> Portal:
'''
Return a `Portal` to our parent.
'''
assert self._parent_chan, "No parent channel for this actor?"
return Portal(self._parent_chan)
# XXX: hard kill logic if needed?
# def _hard_mofo_kill(self):
# # If we're the root actor or zombied kill everything
# if self._parent_chan is None: # TODO: more robust check
# root = trio.lowlevel.current_root_task()
# for n in root.child_nurseries:
# n.cancel_scope.cancel()
async def async_main(
actor: Actor,
@ -1316,6 +1438,8 @@ async def async_main(
# establish primary connection with immediate parent
actor._parent_chan: Channel|None = None
# is this a sub-actor?
# get runtime info from parent.
if parent_addr is not None:
(
actor._parent_chan,
@ -1350,18 +1474,18 @@ async def async_main(
# parent is kept alive as a resilient service until
# cancellation steps have (mostly) occurred in
# a deterministic way.
async with trio.open_nursery(
strict_exception_groups=False,
) as root_nursery:
actor._root_n = root_nursery
root_tn: trio.Nursery
async with (
collapse_eg(),
trio.open_nursery() as root_tn,
):
actor._root_n = root_tn
assert actor._root_n
ipc_server: _server.IPCServer
async with (
trio.open_nursery(
strict_exception_groups=False,
) as service_nursery,
collapse_eg(),
trio.open_nursery() as service_nursery,
_server.open_ipc_server(
parent_tn=service_nursery,
stream_handler_tn=service_nursery,
@ -1412,9 +1536,6 @@ async def async_main(
# TODO: why is this not with the root nursery?
try:
log.runtime(
'Booting IPC server'
)
eps: list = await ipc_server.listen_on(
accept_addrs=accept_addrs,
stream_handler_nursery=service_nursery,
@ -1446,18 +1567,6 @@ async def async_main(
# TODO, just read direct from ipc_server?
accept_addrs: list[UnwrappedAddress] = actor.accept_addrs
# NOTE: only set the loopback addr for the
# process-tree-global "root" mailbox since
# all sub-actors should be able to speak to
# their root actor over that channel.
if _state._runtime_vars['_is_root']:
raddrs: list[Address] = _state._runtime_vars['_root_addrs']
for addr in accept_addrs:
waddr: Address = wrap_address(addr)
raddrs.append(addr)
else:
_state._runtime_vars['_root_mailbox'] = raddrs[0]
# Register with the arbiter if we're told its addr
log.runtime(
f'Registering `{actor.name}` => {pformat(accept_addrs)}\n'
@ -1475,6 +1584,7 @@ async def async_main(
except AssertionError:
await debug.pause()
# !TODO, get rid of the local-portal crap XD
async with get_registry(addr) as reg_portal:
for accept_addr in accept_addrs:
accept_addr = wrap_address(accept_addr)
@ -1499,7 +1609,7 @@ async def async_main(
# start processing parent requests until our channel
# server is 100% up and running.
if actor._parent_chan:
await root_nursery.start(
await root_tn.start(
partial(
_rpc.process_messages,
chan=actor._parent_chan,
@ -1511,8 +1621,9 @@ async def async_main(
# 'Blocking on service nursery to exit..\n'
)
log.runtime(
"Service nursery complete\n"
"Waiting on root nursery to complete"
'Service nursery complete\n'
'\n'
'->} waiting on root nursery to complete..\n'
)
# Blocks here as expected until the root nursery is
@ -1567,6 +1678,7 @@ async def async_main(
finally:
teardown_report: str = (
'Main actor-runtime task completed\n'
'\n'
)
# ?TODO? should this be in `._entry`/`._root` mods instead?
@ -1608,7 +1720,8 @@ async def async_main(
# Unregister actor from the registry-sys / registrar.
if (
is_registered
and not actor.is_registrar
and
not actor.is_registrar
):
failed: bool = False
for addr in actor.reg_addrs:
@ -1643,23 +1756,30 @@ async def async_main(
ipc_server.has_peers(check_chans=True)
):
teardown_report += (
f'-> Waiting for remaining peers {ipc_server._peers} to clear..\n'
f'-> Waiting for remaining peers to clear..\n'
f' {pformat(ipc_server._peers)}'
)
log.runtime(teardown_report)
await ipc_server.wait_for_no_more_peers(
shield=True,
)
await ipc_server.wait_for_no_more_peers()
teardown_report += (
'-> All peer channels are complete\n'
'-]> all peer channels are complete.\n'
)
# op_nested_actor_repr: str = _pformat.nest_from_op(
# input_op=')>',
# text=actor.pformat(),
# nest_prefix='|_',
# nest_indent=1, # under >
# )
teardown_report += (
'Actor runtime exiting\n'
f'>)\n'
f'|_{actor}\n'
'-)> actor runtime main task exit.\n'
# f'{op_nested_actor_repr}'
)
log.info(teardown_report)
# if _state._runtime_vars['_is_root']:
# log.info(teardown_report)
# else:
log.runtime(teardown_report)
# TODO: rename to `Registry` and move to `.discovery._registry`!

View File

@ -34,9 +34,9 @@ from typing import (
import trio
from trio import TaskStatus
from .devx.debug import (
maybe_wait_for_debugger,
acquire_debug_lock,
from .devx import (
debug,
pformat as _pformat
)
from tractor._state import (
current_actor,
@ -51,14 +51,17 @@ from tractor._portal import Portal
from tractor._runtime import Actor
from tractor._entry import _mp_main
from tractor._exceptions import ActorFailure
from tractor.msg.types import (
Aid,
SpawnSpec,
from tractor.msg import (
types as msgtypes,
pretty_struct,
)
if TYPE_CHECKING:
from ipc import IPCServer
from ipc import (
_server,
Channel,
)
from ._supervise import ActorNursery
ProcessType = TypeVar('ProcessType', mp.Process, trio.Process)
@ -328,20 +331,21 @@ async def soft_kill(
see `.hard_kill()`).
'''
peer_aid: Aid = portal.channel.aid
chan: Channel = portal.channel
peer_aid: msgtypes.Aid = chan.aid
try:
log.cancel(
f'Soft killing sub-actor via portal request\n'
f'\n'
f'(c=> {peer_aid}\n'
f' |_{proc}\n'
f'c)=> {peer_aid.reprol()}@[{chan.maddr}]\n'
f' |_{proc}\n'
)
# wait on sub-proc to signal termination
await wait_func(proc)
except trio.Cancelled:
with trio.CancelScope(shield=True):
await maybe_wait_for_debugger(
await debug.maybe_wait_for_debugger(
child_in_debug=_runtime_vars.get(
'_debug_mode', False
),
@ -465,7 +469,7 @@ async def trio_proc(
"--uid",
# TODO, how to pass this over "wire" encodings like
# cmdline args?
# -[ ] maybe we can add an `Aid.min_tuple()` ?
# -[ ] maybe we can add an `msgtypes.Aid.min_tuple()` ?
str(subactor.uid),
# Address the child must connect to on startup
"--parent_addr",
@ -483,13 +487,14 @@ async def trio_proc(
cancelled_during_spawn: bool = False
proc: trio.Process|None = None
ipc_server: IPCServer = actor_nursery._actor.ipc_server
ipc_server: _server.Server = actor_nursery._actor.ipc_server
try:
try:
proc: trio.Process = await trio.lowlevel.open_process(spawn_cmd, **proc_kwargs)
log.runtime(
'Started new child\n'
f'|_{proc}\n'
f'Started new child subproc\n'
f'(>\n'
f' |_{proc}\n'
)
# wait for actor to spawn and connect back to us
@ -507,10 +512,10 @@ async def trio_proc(
with trio.CancelScope(shield=True):
# don't clobber an ongoing pdb
if is_root_process():
await maybe_wait_for_debugger()
await debug.maybe_wait_for_debugger()
elif proc is not None:
async with acquire_debug_lock(subactor.uid):
async with debug.acquire_debug_lock(subactor.uid):
# soft wait on the proc to terminate
with trio.move_on_after(0.5):
await proc.wait()
@ -528,14 +533,19 @@ async def trio_proc(
# send a "spawning specification" which configures the
# initial runtime state of the child.
sspec = SpawnSpec(
sspec = msgtypes.SpawnSpec(
_parent_main_data=subactor._parent_main_data,
enable_modules=subactor.enable_modules,
reg_addrs=subactor.reg_addrs,
bind_addrs=bind_addrs,
_runtime_vars=_runtime_vars,
)
log.runtime(f'Sending spawn spec: {str(sspec)}')
log.runtime(
f'Sending spawn spec to child\n'
f'{{}}=> {chan.aid.reprol()!r}\n'
f'\n'
f'{pretty_struct.pformat(sspec)}\n'
)
await chan.send(sspec)
# track subactor in current nursery
@ -563,7 +573,7 @@ async def trio_proc(
# condition.
await soft_kill(
proc,
trio.Process.wait,
trio.Process.wait, # XXX, uses `pidfd_open()` below.
portal
)
@ -571,8 +581,7 @@ async def trio_proc(
# tandem if not done already
log.cancel(
'Cancelling portal result reaper task\n'
f'>c)\n'
f' |_{subactor.uid}\n'
f'c)> {subactor.aid.reprol()!r}\n'
)
nursery.cancel_scope.cancel()
@ -581,21 +590,24 @@ async def trio_proc(
# allowed! Do this **after** cancellation/teardown to avoid
# killing the process too early.
if proc:
reap_repr: str = _pformat.nest_from_op(
input_op='>x)',
text=subactor.pformat(),
)
log.cancel(
f'Hard reap sequence starting for subactor\n'
f'>x)\n'
f' |_{subactor}@{subactor.uid}\n'
f'{reap_repr}'
)
with trio.CancelScope(shield=True):
# don't clobber an ongoing pdb
if cancelled_during_spawn:
# Try again to avoid TTY clobbering.
async with acquire_debug_lock(subactor.uid):
async with debug.acquire_debug_lock(subactor.uid):
with trio.move_on_after(0.5):
await proc.wait()
await maybe_wait_for_debugger(
await debug.maybe_wait_for_debugger(
child_in_debug=_runtime_vars.get(
'_debug_mode', False
),
@ -624,7 +636,7 @@ async def trio_proc(
# acquire the lock and get notified of who has it,
# check that uid against our known children?
# this_uid: tuple[str, str] = current_actor().uid
# await acquire_debug_lock(this_uid)
# await debug.acquire_debug_lock(this_uid)
if proc.poll() is None:
log.cancel(f"Attempting to hard kill {proc}")
@ -727,7 +739,7 @@ async def mp_proc(
log.runtime(f"Started {proc}")
ipc_server: IPCServer = actor_nursery._actor.ipc_server
ipc_server: _server.Server = actor_nursery._actor.ipc_server
try:
# wait for actor to spawn and connect back to us
# channel should have handshake completed by the

View File

@ -102,6 +102,9 @@ class MsgStream(trio.abc.Channel):
self._eoc: bool|trio.EndOfChannel = False
self._closed: bool|trio.ClosedResourceError = False
def is_eoc(self) -> bool|trio.EndOfChannel:
return self._eoc
@property
def ctx(self) -> Context:
'''
@ -188,7 +191,14 @@ class MsgStream(trio.abc.Channel):
return pld
async def receive(
# XXX NOTE, this is left private because in `.subscribe()` usage
# we rebind the public `.receive()` to a `BroadcastReceiver` but
# on `.subscribe().__aexit__()`, for the first task which enters,
# we want to revert to this msg-stream-instance's method since
# multi-task-tracking provided by the b-caster is then no longer
# necessary.
#
async def _receive(
self,
hide_tb: bool = False,
):
@ -313,6 +323,8 @@ class MsgStream(trio.abc.Channel):
raise src_err
receive = _receive
async def aclose(self) -> list[Exception|dict]:
'''
Cancel associated remote actor task and local memory channel on
@ -528,10 +540,15 @@ class MsgStream(trio.abc.Channel):
receiver wrapper.
'''
# NOTE: This operation is idempotent and non-reversible, so be
# sure you can deal with any (theoretical) overhead of the
# allocated ``BroadcastReceiver`` before calling this method for
# the first time.
# XXX NOTE, This operation was originally implemented as
# idempotent and non-reversible, so you had to be **VERY**
# aware of any (theoretical) overhead from the allocated
# `BroadcastReceiver.receive()`.
#
# HOWEVER, NOW we do revert and de-alloc the ._broadcaster
# when the final caller (task) exits.
#
bcast: BroadcastReceiver|None = None
if self._broadcaster is None:
bcast = self._broadcaster = broadcast_receiver(
@ -541,29 +558,60 @@ class MsgStream(trio.abc.Channel):
# TODO: can remove this kwarg right since
# by default behaviour is to do this anyway?
receive_afunc=self.receive,
receive_afunc=self._receive,
)
# NOTE: we override the original stream instance's receive
# method to now delegate to the broadcaster's ``.receive()``
# such that new subscribers will be copied received values
# and this stream doesn't have to expect it's original
# consumer(s) to get a new broadcast rx handle.
# XXX NOTE, we override the original stream instance's
# receive method to instead delegate to the broadcaster's
# `.receive()` such that new subscribers (multiple
# `trio.Task`s) will be copied received values and the
# *first* task to enter here doesn't have to expect its original consumer(s)
# to get a new broadcast rx handle; everything happens
# underneath this iface seamlessly.
#
self.receive = bcast.receive # type: ignore
# seems there's no graceful way to type this with ``mypy``?
# seems there's no graceful way to type this with `mypy`?
# https://github.com/python/mypy/issues/708
async with self._broadcaster.subscribe() as bstream:
assert bstream.key != self._broadcaster.key
assert bstream._recv == self._broadcaster._recv
# TODO, prevent re-entrant sub scope?
# if self._broadcaster._closed:
# raise RuntimeError(
# 'This stream
# NOTE: we patch on a `.send()` to the bcaster so that the
# caller can still conduct 2-way streaming using this
# ``bstream`` handle transparently as though it was the msg
# stream instance.
bstream.send = self.send # type: ignore
try:
aenter = self._broadcaster.subscribe()
async with aenter as bstream:
# ?TODO, move into test suite?
assert bstream.key != self._broadcaster.key
assert bstream._recv == self._broadcaster._recv
yield bstream
# NOTE: we patch on a `.send()` to the bcaster so that the
# caller can still conduct 2-way streaming using this
# ``bstream`` handle transparently as though it was the msg
# stream instance.
bstream.send = self.send # type: ignore
# newly-allocated instance
yield bstream
finally:
# XXX, the first-enterer task should, like all other
# subs, close the first allocated bcrx, which adjusts the
# common `bcrx.state`
with trio.CancelScope(shield=True):
if bcast is not None:
await bcast.aclose()
# XXX, when the bcrx.state reports there are no more subs
# we can revert to this obj's method, removing any
# delegation overhead!
if (
(orig_bcast := self._broadcaster)
and
not orig_bcast.state.subs
):
self.receive = self._receive
# self._broadcaster = None
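The first-subscriber bookkeeping above (delegate `.receive` to the broadcaster while any sub exists, revert to `._receive` once `state.subs` empties) is the heart of the "revertible multicast" behaviour. A tractor-free toy of just that pattern:

    from contextlib import asynccontextmanager

    class ToyStream:
        def __init__(self) -> None:
            self._subs: set[object] = set()
            self.receive = self._receive                # unicast by default

        async def _receive(self) -> str:
            return 'direct msg'

        async def _broadcast_receive(self) -> str:      # stand-in for BroadcastReceiver.receive()
            return 'broadcast msg'

        @asynccontextmanager
        async def subscribe(self):
            token = object()
            self._subs.add(token)
            self.receive = self._broadcast_receive      # all tasks now share the fan-out recv
            try:
                yield self
            finally:
                self._subs.discard(token)
                if not self._subs:                      # last task out..
                    self.receive = self._receive        # ..reverts to plain unicast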
async def send(
self,

View File

@ -21,7 +21,6 @@
from contextlib import asynccontextmanager as acm
from functools import partial
import inspect
from pprint import pformat
from typing import (
TYPE_CHECKING,
)
@ -31,7 +30,10 @@ import warnings
import trio
from .devx.debug import maybe_wait_for_debugger
from .devx import (
debug,
pformat as _pformat,
)
from ._addr import (
UnwrappedAddress,
mk_uuid,
@ -40,8 +42,11 @@ from ._state import current_actor, is_main_process
from .log import get_logger, get_loglevel
from ._runtime import Actor
from ._portal import Portal
from ._exceptions import (
from .trionics import (
is_multi_cancelled,
collapse_eg,
)
from ._exceptions import (
ContextCancelled,
)
from ._root import (
@ -112,7 +117,6 @@ class ActorNursery:
]
] = {}
self.cancelled: bool = False
self._join_procs = trio.Event()
self._at_least_one_child_in_debug: bool = False
self.errors = errors
@ -130,10 +134,53 @@ class ActorNursery:
# TODO: remove the `.run_in_actor()` API and thus this 2ndary
# nursery when that API get's moved outside this primitive!
self._ria_nursery = ria_nursery
# TODO, factor this into a .hilevel api!
#
# portals spawned with ``run_in_actor()`` are
# cancelled when their "main" result arrives
self._cancel_after_result_on_exit: set = set()
# trio.Nursery-like cancel (request) statuses
self._cancelled_caught: bool = False
self._cancel_called: bool = False
@property
def cancel_called(self) -> bool:
'''
Records whether cancellation has been requested for this
actor-nursery by a call to `.cancel()` either due to,
- an explicit call by some actor-local-task,
- an implicit call due to an error/cancel emitted inside
the `tractor.open_nursery()` block.
'''
return self._cancel_called
@property
def cancelled_caught(self) -> bool:
'''
Set when this nursery was able to cancel all spawned subactors
gracefully via an (implicit) call to `.cancel()`.
'''
return self._cancelled_caught
# TODO! remove internal/test-suite usage!
@property
def cancelled(self) -> bool:
warnings.warn(
"`ActorNursery.cancelled` is now deprecated, use "
" `.cancel_called` instead.",
DeprecationWarning,
stacklevel=2,
)
return (
self._cancel_called
# and
# self._cancelled_caught
)
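For user/test code the migration off the deprecated attribute is mechanical; a hedged sketch (the child name and the clean-teardown assumption are illustrative only):

    import tractor

    async def main() -> None:
        async with tractor.open_nursery() as an:
            await an.start_actor('worker')   # hypothetical child name
            await an.cancel()
            # old (now emits DeprecationWarning): `an.cancelled`
            assert an.cancel_called
            # only set if the request reaped all children gracefully
            print(f'cancelled_caught: {an.cancelled_caught}')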
async def start_actor(
self,
name: str,
@ -197,7 +244,7 @@ class ActorNursery:
loglevel=loglevel,
# verbatim relay this actor's registrar addresses
registry_addrs=current_actor().reg_addrs,
registry_addrs=current_actor().registry_addrs,
)
parent_addr: UnwrappedAddress = self._actor.accept_addr
assert parent_addr
@ -311,7 +358,7 @@ class ActorNursery:
'''
__runtimeframe__: int = 1 # noqa
self.cancelled = True
self._cancel_called = True
# TODO: impl a repr for spawn more compact
# than `._children`..
@ -322,9 +369,10 @@ class ActorNursery:
server: IPCServer = self._actor.ipc_server
with trio.move_on_after(3) as cs:
async with trio.open_nursery(
strict_exception_groups=False,
) as tn:
async with (
collapse_eg(),
trio.open_nursery() as tn,
):
subactor: Actor
proc: trio.Process
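Across this diff `trio.open_nursery(strict_exception_groups=False)` is replaced by stacking `collapse_eg()` over a plain nursery; presumably the helper collapses an `ExceptionGroup` wrapping exactly one exception back down to that bare exception. The sketch below is an independent illustration of that idea, not tractor's implementation:

    from contextlib import asynccontextmanager

    @asynccontextmanager
    async def collapse_single_exc_group():
        try:
            yield
        except BaseExceptionGroup as beg:
            if (
                len(beg.exceptions) == 1
                and not isinstance(beg.exceptions[0], BaseExceptionGroup)
            ):
                raise beg.exceptions[0] from beg
            raise

    # usage shape, mirroring the hunks above:
    #
    #   async with (
    #       collapse_single_exc_group(),
    #       trio.open_nursery() as tn,
    #   ):
    #       ...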
@ -388,6 +436,8 @@ class ActorNursery:
) in children.values():
log.warning(f"Hard killing process {proc}")
proc.terminate()
else:
self._cancelled_caught = True
# mark ourselves as having (tried to have) cancelled all subactors
self._join_procs.set()
@ -417,10 +467,10 @@ async def _open_and_supervise_one_cancels_all_nursery(
# `ActorNursery.start_actor()`).
# errors from this daemon actor nursery bubble up to caller
async with trio.open_nursery(
strict_exception_groups=False,
# ^XXX^ TODO? instead unpack any RAE as per "loose" style?
) as da_nursery:
async with (
collapse_eg(),
trio.open_nursery() as da_nursery,
):
try:
# This is the inner level "run in actor" nursery. It is
# awaited first since actors spawned in this way (using
@ -430,11 +480,10 @@ async def _open_and_supervise_one_cancels_all_nursery(
# immediately raised for handling by a supervisor strategy.
# As such if the strategy propagates any error(s) upwards
# the above "daemon actor" nursery will be notified.
async with trio.open_nursery(
strict_exception_groups=False,
# ^XXX^ TODO? instead unpack any RAE as per "loose" style?
) as ria_nursery:
async with (
collapse_eg(),
trio.open_nursery() as ria_nursery,
):
an = ActorNursery(
actor,
ria_nursery,
@ -451,7 +500,7 @@ async def _open_and_supervise_one_cancels_all_nursery(
# the "hard join phase".
log.runtime(
'Waiting on subactors to complete:\n'
f'{pformat(an._children)}\n'
f'>}} {len(an._children)}\n'
)
an._join_procs.set()
@ -465,7 +514,7 @@ async def _open_and_supervise_one_cancels_all_nursery(
# will make the pdb repl unusable.
# Instead try to wait for pdb to be released before
# tearing down.
await maybe_wait_for_debugger(
await debug.maybe_wait_for_debugger(
child_in_debug=an._at_least_one_child_in_debug
)
@ -541,7 +590,7 @@ async def _open_and_supervise_one_cancels_all_nursery(
# XXX: yet another guard before allowing the cancel
# sequence in case a (single) child is in debug.
await maybe_wait_for_debugger(
await debug.maybe_wait_for_debugger(
child_in_debug=an._at_least_one_child_in_debug
)
@ -590,9 +639,14 @@ async def _open_and_supervise_one_cancels_all_nursery(
# final exit
@acm
_shutdown_msg: str = (
'Actor-runtime-shutdown'
)
# @api_frame
@acm
async def open_nursery(
*, # named params only!
hide_tb: bool = True,
**kwargs,
# ^TODO, paramspec for `open_root_actor()`
@ -677,17 +731,26 @@ async def open_nursery(
):
__tracebackhide__: bool = False
msg: str = (
'Actor-nursery exited\n'
f'|_{an}\n'
op_nested_an_repr: str = _pformat.nest_from_op(
input_op=')>',
text=f'{an}',
# nest_prefix='|_',
nest_indent=1, # under >
)
an_msg: str = (
f'Actor-nursery exited\n'
f'{op_nested_an_repr}\n'
)
# keep noise low during std operation.
log.runtime(an_msg)
if implicit_runtime:
# shutdown runtime if it was started and report noisily
# that we did so.
msg += '=> Shutting down actor runtime <=\n'
msg: str = (
'\n'
'\n'
f'{_shutdown_msg} )>\n'
)
log.info(msg)
else:
# keep noise low during std operation.
log.runtime(msg)

View File

@ -59,7 +59,7 @@ from tractor._state import (
debug_mode,
)
from tractor.log import get_logger
from tractor._exceptions import (
from tractor.trionics import (
is_multi_cancelled,
)
from ._trace import (

View File

@ -101,11 +101,27 @@ class Channel:
# ^XXX! ONLY set if a remote actor sends an `Error`-msg
self._closed: bool = False
# flag set by ``Portal.cancel_actor()`` indicating remote
# (possibly peer) cancellation of the far end actor
# runtime.
# flag set by `Portal.cancel_actor()` indicating remote
# (possibly peer) cancellation of the far end actor runtime.
self._cancel_called: bool = False
@property
def closed(self) -> bool:
'''
Was `.aclose()` successfully called?
'''
return self._closed
@property
def cancel_called(self) -> bool:
'''
Set when `Portal.cancel_actor()` is called on a portal which
wraps this IPC channel.
'''
return self._cancel_called
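A hedged sketch of the introspection flow these properties enable after a portal-driven remote cancel (only APIs referenced elsewhere in this diff are used):

    async def cancel_and_inspect(portal) -> None:
        await portal.cancel_actor()
        chan = portal.channel
        assert chan.cancel_called        # flipped by `Portal.cancel_actor()`
        # `.closed` only reports True once `Channel.aclose()` has completed
        print(f'closed: {chan.closed}')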
@property
def uid(self) -> tuple[str, str]:
'''
@ -171,11 +187,23 @@ class Channel:
)
assert transport.raddr == addr
chan = Channel(transport=transport)
log.runtime(
f'Connected channel IPC transport\n'
f'[>\n'
f' |_{chan}\n'
)
# ?TODO, compact this into adapter level-methods?
# -[ ] would avoid extra repr-calcs if level not active?
# |_ how would the `calc_if_level` look though? func?
if log.at_least_level('runtime'):
from tractor.devx import (
pformat as _pformat,
)
chan_repr: str = _pformat.nest_from_op(
input_op='[>',
text=chan.pformat(),
nest_indent=1,
)
log.runtime(
f'Connected channel IPC transport\n'
f'{chan_repr}'
)
return chan
@cm
@ -196,9 +224,12 @@ class Channel:
self._transport.codec = orig
# TODO: do a .src/.dst: str for maddrs?
def pformat(self) -> str:
def pformat(
self,
privates: bool = False,
) -> str:
if not self._transport:
return '<Channel with inactive transport?>'
return '<Channel( with inactive transport? )>'
tpt: MsgTransport = self._transport
tpt_name: str = type(tpt).__name__
@ -206,26 +237,35 @@ class Channel:
'connected' if self.connected()
else 'closed'
)
return (
repr_str: str = (
f'<Channel(\n'
f' |_status: {tpt_status!r}\n'
) + (
f' _closed={self._closed}\n'
f' _cancel_called={self._cancel_called}\n'
f'\n'
f' |_peer: {self.aid}\n'
f'\n'
if privates else ''
) + ( # peer-actor (processs) section
f' |_peer: {self.aid.reprol()!r}\n'
if self.aid else ' |_peer: <unknown>\n'
) + (
f' |_msgstream: {tpt_name}\n'
f' proto={tpt.laddr.proto_key!r}\n'
f' layer={tpt.layer_key!r}\n'
f' laddr={tpt.laddr}\n'
f' raddr={tpt.raddr}\n'
f' codec={tpt.codec_key!r}\n'
f' stream={tpt.stream}\n'
f' maddr={tpt.maddr!r}\n'
f' drained={tpt.drained}\n'
f' maddr: {tpt.maddr!r}\n'
f' proto: {tpt.laddr.proto_key!r}\n'
f' layer: {tpt.layer_key!r}\n'
f' codec: {tpt.codec_key!r}\n'
f' .laddr={tpt.laddr}\n'
f' .raddr={tpt.raddr}\n'
) + (
f' ._transport.stream={tpt.stream}\n'
f' ._transport.drained={tpt.drained}\n'
if privates else ''
) + (
f' _send_lock={tpt._send_lock.statistics()}\n'
f')>\n'
if privates else ''
) + (
')>\n'
)
return repr_str
# NOTE: making this return a value that can be passed to
# `eval()` is entirely **optional** FYI!
@ -247,6 +287,10 @@ class Channel:
def raddr(self) -> Address|None:
return self._transport.raddr if self._transport else None
@property
def maddr(self) -> str:
return self._transport.maddr if self._transport else '<no-tpt>'
# TODO: something like,
# `pdbp.hideframe_on(errors=[MsgTypeError])`
# instead of the `try/except` hack we have rn..
@ -434,8 +478,8 @@ class Channel:
await self.send(aid)
peer_aid: Aid = await self.recv()
log.runtime(
f'Received handshake with peer actor,\n'
f'{peer_aid}\n'
f'Received handshake with peer\n'
f'<= {peer_aid.reprol(sin_uuid=False)}\n'
)
# NOTE, we always are referencing the remote peer!
self.aid = peer_aid

View File

@ -17,9 +17,16 @@
Utils to tame mp non-SC madness
'''
# !TODO! in 3.13 this can be disabled (the-same/similarly) using
# a flag,
# - [ ] soo if it works like this, drop this module entirely for
# 3.13+ B)
# |_https://docs.python.org/3/library/multiprocessing.shared_memory.html
#
def disable_mantracker():
'''
Disable all ``multiprocessing``` "resource tracking" machinery since
Disable all `multiprocessing` "resource tracking" machinery since
it's an absolute multi-threaded mess of non-SC madness.
'''

View File

@ -26,7 +26,7 @@ from contextlib import (
from functools import partial
from itertools import chain
import inspect
from pprint import pformat
import textwrap
from types import (
ModuleType,
)
@ -43,7 +43,10 @@ from trio import (
SocketListener,
)
# from ..devx import debug
from ..devx.pformat import (
ppfmt,
nest_from_op,
)
from .._exceptions import (
TransportClosed,
)
@ -141,9 +144,8 @@ async def maybe_wait_on_canced_subs(
):
log.cancel(
'Waiting on cancel request to peer..\n'
f'c)=>\n'
f' |_{chan.aid}\n'
'Waiting on cancel request to peer\n'
f'c)=> {chan.aid.reprol()}@[{chan.maddr}]\n'
)
# XXX: this is a soft wait on the channel (and its
@ -179,7 +181,7 @@ async def maybe_wait_on_canced_subs(
log.warning(
'Draining msg from disconnected peer\n'
f'{chan_info}'
f'{pformat(msg)}\n'
f'{ppfmt(msg)}\n'
)
# cid: str|None = msg.get('cid')
cid: str|None = msg.cid
@ -248,7 +250,7 @@ async def maybe_wait_on_canced_subs(
if children := local_nursery._children:
# indent from above local-nurse repr
report += (
f' |_{pformat(children)}\n'
f' |_{ppfmt(children)}\n'
)
log.warning(report)
@ -279,8 +281,9 @@ async def maybe_wait_on_canced_subs(
log.runtime(
f'Peer IPC broke but subproc is alive?\n\n'
f'<=x {chan.aid}@{chan.raddr}\n'
f' |_{proc}\n'
f'<=x {chan.aid.reprol()}@[{chan.maddr}]\n'
f'\n'
f'{proc}\n'
)
return local_nursery
@ -324,9 +327,10 @@ async def handle_stream_from_peer(
chan = Channel.from_stream(stream)
con_status: str = (
'New inbound IPC connection <=\n'
f'|_{chan}\n'
f'New inbound IPC transport connection\n'
f'<=( {stream!r}\n'
)
con_status_steps: str = ''
# initial handshake with peer phase
try:
@ -372,7 +376,7 @@ async def handle_stream_from_peer(
if _pre_chan := server._peers.get(uid):
familiar: str = 'pre-existing-peer'
uid_short: str = f'{uid[0]}[{uid[1][-6:]}]'
con_status += (
con_status_steps += (
f' -> Handshake with {familiar} `{uid_short}` complete\n'
)
@ -397,7 +401,7 @@ async def handle_stream_from_peer(
None,
)
if event:
con_status += (
con_status_steps += (
' -> Waking subactor spawn waiters: '
f'{event.statistics().tasks_waiting}\n'
f' -> Registered IPC chan for child actor {uid}@{chan.raddr}\n'
@ -408,7 +412,7 @@ async def handle_stream_from_peer(
event.set()
else:
con_status += (
con_status_steps += (
f' -> Registered IPC chan for peer actor {uid}@{chan.raddr}\n'
) # type: ignore
@ -422,8 +426,15 @@ async def handle_stream_from_peer(
# TODO: can we just use list-ref directly?
chans.append(chan)
con_status += ' -> Entering RPC msg loop..\n'
log.runtime(con_status)
con_status_steps += ' -> Entering RPC msg loop..\n'
log.runtime(
con_status
+
textwrap.indent(
con_status_steps,
prefix=' '*3, # align to first-ln
)
)
# Begin channel management - respond to remote requests and
# process received responses.
@ -456,41 +467,67 @@ async def handle_stream_from_peer(
disconnected=disconnected,
)
# ``Channel`` teardown and closure sequence
# `Channel` teardown and closure sequence
# drop ref to channel so it can be gc-ed and disconnected
con_teardown_status: str = (
f'IPC channel disconnected:\n'
f'<=x uid: {chan.aid}\n'
f' |_{pformat(chan)}\n\n'
#
# -[x]TODO mk this be like
# <=x Channel(
# |_field: blah
# )>
op_repr: str = '<=x '
chan_repr: str = nest_from_op(
input_op=op_repr,
op_suffix='',
nest_prefix='',
text=chan.pformat(),
nest_indent=len(op_repr)-1,
rm_from_first_ln='<',
)
con_teardown_status: str = (
f'IPC channel disconnect\n'
f'\n'
f'{chan_repr}\n'
f'\n'
)
chans.remove(chan)
# TODO: do we need to be this pedantic?
if not chans:
con_teardown_status += (
f'-> No more channels with {chan.aid}'
f'-> No more channels with {chan.aid.reprol()!r}\n'
)
server._peers.pop(uid, None)
peers_str: str = ''
for uid, chans in server._peers.items():
peers_str += (
f'uid: {uid}\n'
)
for i, chan in enumerate(chans):
peers_str += (
f' |_[{i}] {pformat(chan)}\n'
if peers := list(server._peers.values()):
peer_cnt: int = len(peers)
if (
(first := peers[0][0]) is not chan
and
not disconnected
and
peer_cnt > 1
):
con_teardown_status += (
f'-> Remaining IPC {peer_cnt-1!r} peers:\n'
)
con_teardown_status += (
f'-> Remaining IPC {len(server._peers)} peers: {peers_str}\n'
)
for chans in server._peers.values():
first: Channel = chans[0]
if not (
first is chan
and
disconnected
):
con_teardown_status += (
f' |_{first.aid.reprol()!r} -> {len(chans)!r} chans\n'
)
# No more channels to other actors (at all) registered
# as connected.
if not server._peers:
con_teardown_status += (
'Signalling no more peer channel connections'
'-> Signalling no more peer connections!\n'
)
server._no_more_peers.set()
@ -579,10 +616,10 @@ async def handle_stream_from_peer(
class Endpoint(Struct):
'''
An instance of an IPC "bound" address where the lifetime of the
"ability to accept connections" (from clients) and then handle
those inbound sessions or sequences-of-packets is determined by
a (maybe pair of) nurser(y/ies).
An instance of an IPC "bound" address where the lifetime of an
"ability to accept connections" and handle the subsequent
sequence-of-packets (maybe oriented as sessions) is determined by
the underlying nursery scope(s).
'''
addr: Address
@ -600,6 +637,24 @@ class Endpoint(Struct):
MsgTransport, # handle to encoded-msg transport stream
] = {}
def pformat(
self,
indent: int = 0,
privates: bool = False,
) -> str:
type_repr: str = type(self).__name__
fmtstr: str = (
# !TODO, always be ns aware!
# f'|_netns: {netns}\n'
f' |.addr: {self.addr!r}\n'
f' |_peers: {len(self.peer_tpts)}\n'
)
return (
f'<{type_repr}(\n'
f'{fmtstr}'
f')>'
)
async def start_listener(self) -> SocketListener:
tpt_mod: ModuleType = inspect.getmodule(self.addr)
lstnr: SocketListener = await tpt_mod.start_listener(
@ -639,11 +694,13 @@ class Endpoint(Struct):
class Server(Struct):
_parent_tn: Nursery
_stream_handler_tn: Nursery
# level-triggered sig for whether "no peers are currently
# connected"; field is **always** set to an instance but
# initialized with `.is_set() == True`.
_no_more_peers: trio.Event
# active eps as allocated by `.listen_on()`
_endpoints: list[Endpoint] = []
# connection tracking & mgmt
@ -651,12 +708,19 @@ class Server(Struct):
str, # uaid
list[Channel], # IPC conns from peer
] = defaultdict(list)
# events-table with entries registered unset while the local
# actor is waiting on a new actor to inbound connect, often
# a parent waiting on its child just after spawn.
_peer_connected: dict[
tuple[str, str],
trio.Event,
] = {}
# syncs for setup/teardown sequences
# - null when not yet booted,
# - unset when active,
# - set when fully shutdown with 0 eps active.
_shutdown: trio.Event|None = None
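The comments above describe `_no_more_peers` as a level-triggered signal that starts out set and is re-armed while any peer chan is live; a tractor-free sketch of that lifecycle:

    import trio

    class PeerTracker:
        def __init__(self) -> None:
            self._peers: set[object] = set()
            self._no_more_peers = trio.Event()
            self._no_more_peers.set()             # initialized with .is_set() == True

        def add(self, chan: object) -> None:
            self._peers.add(chan)
            self._no_more_peers = trio.Event()    # re-arm: now unset

        def remove(self, chan: object) -> None:
            self._peers.discard(chan)
            if not self._peers:
                self._no_more_peers.set()

        async def wait_for_no_more_peers(self) -> None:
            await self._no_more_peers.wait()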
# TODO, maybe just make `._endpoints: list[Endpoint]` and
@ -664,7 +728,6 @@ class Server(Struct):
# @property
# def addrs2eps(self) -> dict[Address, Endpoint]:
# ...
@property
def proto_keys(self) -> list[str]:
return [
@ -690,7 +753,7 @@ class Server(Struct):
# TODO: obvi a different server type when we eventually
# support some others XD
log.runtime(
f'Cancelling server(s) for\n'
f'Cancelling server(s) for tpt-protos\n'
f'{self.proto_keys!r}\n'
)
self._parent_tn.cancel_scope.cancel()
@ -717,6 +780,14 @@ class Server(Struct):
f'protos: {tpt_protos!r}\n'
)
def len_peers(
self,
) -> int:
return len([
chan
for chan in chain(*self._peers.values())
if chan.connected()
])
def has_peers(
self,
check_chans: bool = False,
@ -730,13 +801,11 @@ class Server(Struct):
has_peers
and
check_chans
and
(peer_cnt := self.len_peers())
):
has_peers: bool = (
any(chan.connected()
for chan in chain(
*self._peers.values()
)
)
peer_cnt > 0
and
has_peers
)
@ -745,10 +814,14 @@ class Server(Struct):
async def wait_for_no_more_peers(
self,
shield: bool = False,
# XXX, should this even be allowed?
# -> i've seen it cause hangs on teardown
# in `test_resource_cache.py`
# _shield: bool = False,
) -> None:
with trio.CancelScope(shield=shield):
await self._no_more_peers.wait()
await self._no_more_peers.wait()
# with trio.CancelScope(shield=_shield):
# await self._no_more_peers.wait()
async def wait_for_peer(
self,
@ -803,30 +876,66 @@ class Server(Struct):
return ev.is_set()
def pformat(self) -> str:
@property
def repr_state(self) -> str:
'''
A `str`-status describing the current state of this
IPC server in terms of the current operating "phase".
'''
status = 'server is active'
if self.has_peers():
peer_cnt: int = self.len_peers()
status: str = (
f'{peer_cnt!r} peer chans'
)
else:
status: str = 'No peer chans'
if self.is_shutdown():
status: str = 'server-shutdown'
return status
def pformat(
self,
privates: bool = False,
) -> str:
eps: list[Endpoint] = self._endpoints
state_repr: str = (
f'{len(eps)!r} IPC-endpoints active'
)
# state_repr: str = (
# f'{len(eps)!r} endpoints active'
# )
fmtstr = (
f' |_state: {state_repr}\n'
f' no_more_peers: {self.has_peers()}\n'
f' |_state: {self.repr_state!r}\n'
)
if self._shutdown is not None:
shutdown_stats: EventStatistics = self._shutdown.statistics()
if privates:
fmtstr += f' no_more_peers: {self.has_peers()}\n'
if self._shutdown is not None:
shutdown_stats: EventStatistics = self._shutdown.statistics()
fmtstr += (
f' task_waiting_on_shutdown: {shutdown_stats}\n'
)
if eps := self._endpoints:
addrs: list[tuple] = [
ep.addr for ep in eps
]
repr_eps: str = ppfmt(addrs)
fmtstr += (
f' task_waiting_on_shutdown: {shutdown_stats}\n'
f' |_endpoints: {repr_eps}\n'
# ^TODO? how to indent closing ']'..
)
fmtstr += (
# TODO, use the `ppfmt()` helper from `modden`!
f' |_endpoints: {pformat(self._endpoints)}\n'
f' |_peers: {len(self._peers)} connected\n'
)
if peers := self._peers:
fmtstr += (
f' |_peers: {len(peers)} connected\n'
)
return (
f'<IPCServer(\n'
f'<Server(\n'
f'{fmtstr}'
f')>\n'
)
@ -885,8 +994,8 @@ class Server(Struct):
)
log.runtime(
f'Binding to endpoints for,\n'
f'{accept_addrs}\n'
f'Binding endpoints\n'
f'{ppfmt(accept_addrs)}\n'
)
eps: list[Endpoint] = await self._parent_tn.start(
partial(
@ -896,13 +1005,19 @@ class Server(Struct):
listen_addrs=accept_addrs,
)
)
self._endpoints.extend(eps)
serv_repr: str = nest_from_op(
input_op='(>',
text=self.pformat(),
nest_indent=1,
)
log.runtime(
f'Started IPC endpoints\n'
f'{eps}\n'
f'Started IPC server\n'
f'{serv_repr}'
)
self._endpoints.extend(eps)
# XXX, just a little bit of sanity
# XXX, a little sanity on new ep allocations
group_tn: Nursery|None = None
ep: Endpoint
for ep in eps:
@ -956,9 +1071,13 @@ async def _serve_ipc_eps(
stream_handler_tn=stream_handler_tn,
)
try:
ep_sclang: str = nest_from_op(
input_op='>[',
text=f'{ep.pformat()}',
)
log.runtime(
f'Starting new endpoint listener\n'
f'{ep}\n'
f'{ep_sclang}\n'
)
listener: trio.abc.Listener = await ep.start_listener()
assert listener is ep._listener
@ -996,17 +1115,6 @@ async def _serve_ipc_eps(
handler_nursery=stream_handler_tn
)
)
# TODO, wow make this message better! XD
log.runtime(
'Started server(s)\n'
+
'\n'.join([f'|_{addr}' for addr in listen_addrs])
)
log.runtime(
f'Started IPC endpoints\n'
f'{eps}\n'
)
task_status.started(
eps,
)
@ -1049,8 +1157,7 @@ async def open_ipc_server(
try:
yield ipc_server
log.runtime(
f'Waiting on server to shutdown or be cancelled..\n'
f'{ipc_server}'
'Server-tn running until terminated\n'
)
# TODO? when if ever would we want/need this?
# with trio.CancelScope(shield=True):

View File

@ -789,6 +789,11 @@ def open_shm_list(
readonly=readonly,
)
# TODO, factor into a @actor_fixture acm-API?
# -[ ] also `@maybe_actor_fixture()` which inludes
# the .current_actor() convenience check?
# |_ orr can that just be in the sin-maybe-version?
#
# "close" attached shm on actor teardown
try:
actor = tractor.current_actor()

View File

@ -160,10 +160,9 @@ async def start_listener(
Start a TCP socket listener on the given `TCPAddress`.
'''
log.info(
f'Attempting to bind TCP socket\n'
f'>[\n'
f'|_{addr}\n'
log.runtime(
f'Trying socket bind\n'
f'>[ {addr}\n'
)
# ?TODO, maybe we should just change the lower-level call this is
# using internally per-listener?
@ -178,11 +177,10 @@ async def start_listener(
assert len(listeners) == 1
listener = listeners[0]
host, port = listener.socket.getsockname()[:2]
bound_addr: TCPAddress = type(addr).from_addr((host, port))
log.info(
f'Listening on TCP socket\n'
f'[>\n'
f' |_{addr}\n'
f'[> {bound_addr}\n'
)
return listener

View File

@ -81,10 +81,35 @@ BOLD_PALETTE = {
}
def at_least_level(
log: Logger|LoggerAdapter,
level: int|str,
) -> bool:
'''
Predicate to test if a given level is active.
'''
if isinstance(level, str):
level: int = CUSTOM_LEVELS[level.upper()]
if log.getEffectiveLevel() <= level:
return True
return False
# TODO: this isn't showing the correct '{filename}'
# as it did before..
class StackLevelAdapter(LoggerAdapter):
def at_least_level(
self,
level: str,
) -> bool:
return at_least_level(
log=self,
level=level,
)
def transport(
self,
msg: str,
@@ -401,19 +426,3 @@ def get_loglevel() -> str:
# global module logger for tractor itself
log: StackLevelAdapter = get_logger('tractor')
-def at_least_level(
-    log: Logger|LoggerAdapter,
-    level: int|str,
-) -> bool:
-    '''
-    Predicate to test if a given level is active.
-    '''
-    if isinstance(level, str):
-        level: int = CUSTOM_LEVELS[level.upper()]
-    if log.getEffectiveLevel() <= level:
-        return True
-    return False
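
A usage sketch of the adapter method added above: gate an expensive repr build behind the predicate so it only runs when the custom 'runtime' level is actually enabled (`server` here is a hypothetical object with a `.pformat()`).

from tractor.log import get_logger

log = get_logger(__name__)

def maybe_report(server) -> None:
    # only render the (potentially large) report when it would be emitted
    if log.at_least_level('runtime'):
        log.runtime(
            f'Server status\n'
            f'{server.pformat()}\n'
        )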

View File

@@ -210,12 +210,14 @@ class PldRx(Struct):
        match msg:
            case Return()|Error():
                log.runtime(
-                    f'Rxed final outcome msg\n'
+                    f'Rxed final-outcome msg\n'
+                    f'\n'
                    f'{msg}\n'
                )
            case Stop():
                log.runtime(
                    f'Rxed stream stopped msg\n'
+                    f'\n'
                    f'{msg}\n'
                )
if passthrough_non_pld_msgs:
@@ -261,8 +263,9 @@ class PldRx(Struct):
        if (
            type(msg) is Return
        ):
-            log.info(
+            log.runtime(
                f'Rxed final result msg\n'
+                f'\n'
                f'{msg}\n'
            )
return self.decode_pld(
@@ -304,10 +307,13 @@ class PldRx(Struct):
        try:
            pld: PayloadT = self._pld_dec.decode(pld)
            log.runtime(
-                'Decoded msg payload\n\n'
+                f'Decoded payload for\n'
+                # f'\n'
                f'{msg}\n'
-                f'where payload decoded as\n'
-                f'|_pld={pld!r}\n'
+                # ^TODO?, ideally just render with `,
+                # pld={decode}` in the `msg.pformat()`??
+                f'where, '
+                f'{type(msg).__name__}.pld={pld!r}\n'
            )
return pld
except TypeError as typerr:
@@ -494,7 +500,8 @@ def limit_plds(
    finally:
        log.runtime(
-            'Reverted to previous payload-decoder\n\n'
+            f'Reverted to previous payload-decoder\n'
+            f'\n'
            f'{orig_pldec}\n'
        )
# sanity on orig settings
@@ -629,7 +636,8 @@ async def drain_to_final_msg(
            (local_cs := rent_n.cancel_scope).cancel_called
        ):
            log.cancel(
-                'RPC-ctx cancelled by local-parent scope during drain!\n\n'
+                f'RPC-ctx cancelled by local-parent scope during drain!\n'
+                f'\n'
                f'c}}>\n'
                f' |_{rent_n}\n'
                f' |_.cancel_scope = {local_cs}\n'
@@ -663,7 +671,8 @@ async def drain_to_final_msg(
            # final result arrived!
            case Return():
                log.runtime(
-                    'Context delivered final draining msg:\n'
+                    f'Context delivered final draining msg\n'
+                    f'\n'
                    f'{pretty_struct.pformat(msg)}'
                )
ctx._result: Any = pld
@@ -697,12 +706,14 @@ async def drain_to_final_msg(
                ):
                    log.cancel(
                        'Cancelling `MsgStream` drain since '
-                        f'{reason}\n\n'
+                        f'{reason}\n'
+                        f'\n'
                        f'<= {ctx.chan.uid}\n'
-                        f' |_{ctx._nsf}()\n\n'
+                        f' |_{ctx._nsf}()\n'
+                        f'\n'
                        f'=> {ctx._task}\n'
-                        f' |_{ctx._stream}\n\n'
+                        f' |_{ctx._stream}\n'
+                        f'\n'
                        f'{pretty_struct.pformat(msg)}\n'
                    )
break
@@ -739,7 +750,8 @@ async def drain_to_final_msg(
            case Stop():
                pre_result_drained.append(msg)
                log.runtime( # normal/expected shutdown transaction
-                    'Remote stream terminated due to "stop" msg:\n\n'
+                    f'Remote stream terminated due to "stop" msg\n'
+                    f'\n'
                    f'{pretty_struct.pformat(msg)}\n'
)
continue
@@ -814,7 +826,8 @@ async def drain_to_final_msg(
    else:
        log.cancel(
-            'Skipping `MsgStream` drain since final outcome is set\n\n'
+            f'Skipping `MsgStream` drain since final outcome is set\n'
+            f'\n'
            f'{ctx.outcome}\n'
)
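
To make the `case Return() / Stop()` dispatch above concrete, here is a self-contained sketch using stand-in dataclass message types (not tractor's real msg-spec structs): streamed/stop msgs are accumulated while a `Return` ends the drain and delivers the final payload.

from dataclasses import dataclass

@dataclass
class Yield:
    pld: object

@dataclass
class Stop:
    cid: str

@dataclass
class Return:
    pld: object

def drain(msgs: list) -> tuple[list, object]:
    pre_result_drained: list = []
    outcome: object = None
    for msg in msgs:
        match msg:
            case Return(pld=pld):
                outcome = pld  # final outcome ends the drain
                break
            case Stop():
                pre_result_drained.append(msg)  # expected stream-termination msg
            case Yield():
                pre_result_drained.append(msg)  # leftover streamed values
    return pre_result_drained, outcome

assert drain([Yield(1), Stop('cid0'), Return(42)]) == ([Yield(1), Stop('cid0')], 42)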

View File

@@ -154,6 +154,39 @@ class Aid(
# should also include at least `.pid` (equiv to port for tcp)
# and/or host-part always?
@property
def uid(self) -> tuple[str, str]:
'''
Legacy actor "unique-id" pair format.
'''
return (
self.name,
self.uuid,
)
def reprol(
self,
sin_uuid: bool = True,
) -> str:
if not sin_uuid:
return (
f'{self.name}[{self.uuid[:6]}]@{self.pid!r}'
)
return (
f'{self.name}@{self.pid!r}'
)
# mk hashable via `.uuid`
def __hash__(self) -> int:
return hash(self.uuid)
def __eq__(self, other: Aid) -> bool:
return self.uuid == other.uuid
# use pretty fmt since often repr-ed for console/log
__repr__ = pretty_struct.Struct.__repr__
class SpawnSpec(
pretty_struct.Struct,
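
An illustrative stand-in (plain class, not tractor's msgspec-based `Aid`) for the uuid-keyed hash/eq and the short repr formats added above: two records with the same `uuid` compare (and hash) equal regardless of name.

class AidLike:
    def __init__(self, name: str, uuid: str, pid: int|None = None):
        self.name, self.uuid, self.pid = name, uuid, pid

    @property
    def uid(self) -> tuple[str, str]:
        return (self.name, self.uuid)

    def reprol(self, sin_uuid: bool = True) -> str:
        if not sin_uuid:
            return f'{self.name}[{self.uuid[:6]}]@{self.pid!r}'
        return f'{self.name}@{self.pid!r}'

    # mk hashable (and comparable) via `.uuid`
    def __hash__(self) -> int:
        return hash(self.uuid)

    def __eq__(self, other) -> bool:
        return self.uuid == other.uuid

a = AidLike('streamer', uuid='deadbeef-01', pid=4321)
b = AidLike('streamer-renamed', uuid='deadbeef-01', pid=4321)
assert a == b and hash(a) == hash(b)  # identity is the uuid, not the name
assert a.reprol() == 'streamer@4321'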

View File

@@ -38,7 +38,6 @@ from typing import (
import tractor
from tractor._exceptions import (
    InternalError,
-    is_multi_cancelled,
    TrioTaskExited,
    TrioCancelled,
    AsyncioTaskExited,
@@ -59,6 +58,9 @@ from tractor.log import (
# from tractor.msg import (
#     pretty_struct,
# )
+from tractor.trionics import (
+    is_multi_cancelled,
+)
from tractor.trionics._broadcast import (
    broadcast_receiver,
    BroadcastReceiver,

View File

@@ -32,4 +32,8 @@ from ._broadcast import (
from ._beg import (
    collapse_eg as collapse_eg,
    maybe_collapse_eg as maybe_collapse_eg,
+    is_multi_cancelled as is_multi_cancelled,
)
+from ._taskc import (
+    maybe_raise_from_masking_exc as maybe_raise_from_masking_exc,
+)

View File

@@ -22,11 +22,16 @@ first-class-`trio` from a historical perspective B)
from contextlib import (
    asynccontextmanager as acm,
)
+from typing import (
+    Literal,
+)
+import trio
def maybe_collapse_eg(
    beg: BaseExceptionGroup,
-) -> BaseException:
+) -> BaseException|bool:
    '''
    If the input beg can collapse to a single non-eg sub-exception,
    return it instead.
@@ -35,11 +40,13 @@ def maybe_collapse_eg(
    if len(excs := beg.exceptions) == 1:
        return excs[0]
-    return beg
+    return False
@acm
-async def collapse_eg():
+async def collapse_eg(
+    hide_tb: bool = True,
+):
    '''
    If `BaseExceptionGroup` raised in the body scope is
    "collapse-able" (in the same way that
@@ -47,12 +54,75 @@ async def collapse_eg():
    only raise the lone embedded non-eg in its place.
    '''
+    __tracebackhide__: bool = hide_tb
    try:
        yield
    except* BaseException as beg:
        if (
            exc := maybe_collapse_eg(beg)
-        ) is not beg:
+        ):
+            if cause := exc.__cause__:
+                raise exc from cause
            raise exc
        raise beg
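
A usage sketch of the collapse behaviour (assuming the `tractor.trionics` re-export added in the `__init__.py` hunk above): a single-error group raised out of the wrapped nursery scope re-raises as the bare exception.

import trio
from tractor.trionics import collapse_eg

async def main():
    try:
        async with collapse_eg(), trio.open_nursery() as tn:
            tn.start_soon(trio.sleep, 1)
            raise RuntimeError('whoops')
    except RuntimeError as err:
        # without `collapse_eg()` this would surface as a 1-exc eg
        print(f'caught the bare exc: {err!r}')

trio.run(main)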
def is_multi_cancelled(
beg: BaseException|BaseExceptionGroup,
ignore_nested: set[BaseException] = set(),
) -> Literal[False]|BaseExceptionGroup:
'''
Predicate to determine if a `BaseExceptionGroup` only contains
some (maybe nested) set of sub-grouped exceptions (like only
`trio.Cancelled`s which get swallowed silently by default) and is
thus the result of "gracefully cancelling" a collection of
sub-tasks (or other conc primitives) and receiving a "cancelled
ACK" from each after termination.
Docs:
----
- https://docs.python.org/3/library/exceptions.html#exception-groups
- https://docs.python.org/3/library/exceptions.html#BaseExceptionGroup.subgroup
'''
if (
not ignore_nested
or
trio.Cancelled not in ignore_nested
# XXX always count-in `trio`'s native signal
):
ignore_nested.update({trio.Cancelled})
if isinstance(beg, BaseExceptionGroup):
# https://docs.python.org/3/library/exceptions.html#BaseExceptionGroup.subgroup
# |_ "The condition can be an exception type or tuple of
# exception types, in which case each exception is checked
# for a match using the same check that is used in an
# except clause. The condition can also be a callable
# (other than a type object) that accepts an exception as
# its single argument and returns true for the exceptions
# that should be in the subgroup."
matched_exc: BaseExceptionGroup|None = beg.subgroup(
tuple(ignore_nested),
# ??TODO, complain about why not allowed to use
# named arg style calling???
# XD .. wtf?
# condition=tuple(ignore_nested),
)
if matched_exc is not None:
return matched_exc
# NOTE, IFF no excs types match (throughout the error-tree)
# -> return `False`, OW return the matched sub-eg.
#
# IOW, for the inverse of ^ for the purpose of
# maybe-enter-REPL--logic: "only debug when the err-tree contains
# at least one exc-type NOT in `ignore_nested`" ; i.e. the case where
# we fallthrough and return `False` here.
return False
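
A quick sketch of the return semantics spelled out above: a matched sub-group (truthy) when ignored exc-types are present in the error tree, `False` when the group only holds "real" errors. `KeyboardInterrupt` is used as an arbitrary extra ignore-type purely for illustration.

from tractor.trionics import is_multi_cancelled

only_real = BaseExceptionGroup('boom', [ValueError('real problem')])
assert is_multi_cancelled(only_real) is False

mixed = BaseExceptionGroup(
    'teardown',
    [KeyboardInterrupt(), ValueError('real problem')],
)
matched = is_multi_cancelled(mixed, ignore_nested={KeyboardInterrupt})
assert matched is not False
assert all(isinstance(exc, KeyboardInterrupt) for exc in matched.exceptions)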

View File

@@ -100,6 +100,32 @@ class Lagged(trio.TooSlowError):
'''
def wrap_rx_for_eoc(
rx: AsyncReceiver,
) -> AsyncReceiver:
match rx:
case trio.MemoryReceiveChannel():
# XXX, taken verbatim from .receive_nowait()
def is_eoc() -> bool:
if not rx._state.open_send_channels:
return trio.EndOfChannel
return False
rx.is_eoc = is_eoc
case _:
# XXX, ensure we define a private field!
# case tractor.MsgStream:
assert (
getattr(rx, '_eoc', False) is not None
)
return rx
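
A sketch of the underlying `trio` behaviour the memory-channel branch above pre-checks: once every send side is closed (and the buffer is empty), `receive_nowait()` raises `EndOfChannel`.

import trio

async def main():
    tx, rx = trio.open_memory_channel(0)
    await tx.aclose()
    try:
        rx.receive_nowait()
    except trio.EndOfChannel:
        print('all senders closed -> EOC')

trio.run(main)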
class BroadcastState(Struct):
'''
Common state to all receivers of a broadcast.
@@ -186,11 +212,23 @@ class BroadcastReceiver(ReceiveChannel):
            state.subs[self.key] = -1
        # underlying for this receiver
-        self._rx = rx_chan
+        self._rx = wrap_rx_for_eoc(rx_chan)
self._recv = receive_afunc or rx_chan.receive
self._closed: bool = False
self._raise_on_lag = raise_on_lag
@property
def state(self) -> BroadcastState:
'''
Read-only access to this receiver's internal `._state`
instance ref.
If you just want to read the high-level state metrics,
use `.state.statistics()`.
'''
return self._state
def receive_nowait(
self,
_key: int | None = None,
@@ -215,7 +253,23 @@ class BroadcastReceiver(ReceiveChannel):
try:
seq = state.subs[key]
except KeyError:
# from tractor import pause_from_sync
# pause_from_sync(shield=True)
if (
(rx_eoc := self._rx.is_eoc())
or
self.state.eoc
):
raise trio.EndOfChannel(
'Broadcast-Rx underlying already ended!'
) from rx_eoc
if self._closed:
# if (rx_eoc := self._rx._eoc):
# raise trio.EndOfChannel(
# 'Broadcast-Rx underlying already ended!'
# ) from rx_eoc
raise trio.ClosedResourceError
raise RuntimeError(
@@ -453,8 +507,9 @@ class BroadcastReceiver(ReceiveChannel):
        self._closed = True
# NOTE, we can use this as an `@acm` since `BroadcastReceiver`
# derives from `ReceiveChannel`.
def broadcast_receiver(
recv_chan: AsyncReceiver,
max_buffer_size: int,
receive_afunc: Callable[[], Awaitable[Any]]|None = None,

View File

@@ -40,6 +40,8 @@ from typing import (
import trio
from tractor._state import current_actor
from tractor.log import get_logger
from ._beg import collapse_eg
if TYPE_CHECKING:
from tractor import ActorNursery
@@ -112,17 +114,19 @@ async def gather_contexts(
    None,
]:
    '''
-    Concurrently enter a sequence of async context managers (acms),
-    each from a separate `trio` task and deliver the unwrapped
-    `yield`-ed values in the same order once all managers have entered.
+    Concurrently enter a sequence of async context managers (`acm`s),
+    each scheduled in a separate `trio.Task` and deliver their
+    unwrapped `yield`-ed values in the same order once all `@acm`s
+    in every task have entered.
-    On exit, all acms are subsequently and concurrently exited.
+    On exit, all `acm`s are subsequently and concurrently exited with
+    **no order guarantees**.
    This function is somewhat similar to a batch of non-blocking
    calls to `contextlib.AsyncExitStack.enter_async_context()`
    (inside a loop) *in combo with* a `asyncio.gather()` to get the
    `.__aenter__()`-ed values, except the managers are both
-    concurrently entered and exited and *cancellation just works*(R).
+    concurrently entered and exited and *cancellation-just-works*.
    '''
seed: int = id(mngrs)
@@ -142,16 +146,15 @@ async def gather_contexts(
    if not mngrs:
        raise ValueError(
            '`.trionics.gather_contexts()` input mngrs is empty?\n'
+            '\n'
            'Did you try to use inline generator syntax?\n'
-            'Use a non-lazy iterator or sequence type instead!'
+            'Use a non-lazy iterator or sequence-type instead!\n'
        )
-    async with trio.open_nursery(
-        strict_exception_groups=False,
-        # ^XXX^ TODO? soo roll our own then ??
-        # -> since we kinda want the "if only one `.exception` then
-        # just raise that" interface?
-    ) as tn:
+    async with (
+        collapse_eg(),
+        trio.open_nursery() as tn,
+    ):
for mngr in mngrs:
tn.start_soon(
_enter_and_wait,
@@ -168,7 +171,7 @@ async def gather_contexts(
    try:
        yield tuple(unwrapped.values())
    finally:
-        # NOTE: this is ABSOLUTELY REQUIRED to avoid
+        # XXX NOTE: this is ABSOLUTELY REQUIRED to avoid
        # the following wacky bug:
        # <tractorbugurlhere>
        parent_exit.set()
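
A usage sketch of the ordering semantics described in the docstring above; `tagged()` is a hypothetical acm and this assumes `gather_contexts()` can run from a plain `trio.run()` without an actor runtime opened.

from contextlib import asynccontextmanager as acm
import trio
from tractor.trionics import gather_contexts

@acm
async def tagged(tag: str):
    await trio.sleep(0.1)  # simulate slow setup
    yield tag

async def main():
    # NB: a non-lazy sequence, per the ValueError guard above
    mngrs = tuple(tagged(t) for t in ('a', 'b', 'c'))
    async with gather_contexts(mngrs) as vals:
        assert vals == ('a', 'b', 'c')  # input order preserved

trio.run(main)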
@@ -286,7 +289,10 @@ async def maybe_open_context(
            )
            _Cache.users += 1
            lock.release()
-            yield False, yielded
+            yield (
+                False,  # cache_hit = "no"
+                yielded,
+            )
else:
_Cache.users += 1
@@ -300,7 +306,10 @@ async def maybe_open_context(
                # f'{ctx_key!r} -> {yielded!r}\n'
            )
            lock.release()
-            yield True, yielded
+            yield (
+                True,  # cache_hit = "yes"
+                yielded,
+            )
finally:
_Cache.users -= 1
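
A usage sketch of the `(cache_hit, value)` pair yielded above; `open_db()` and its kwargs are hypothetical, and the `acm_func`/`kwargs` parameter names are assumed from the current `maybe_open_context()` signature.

from contextlib import asynccontextmanager as acm
import trio
import tractor
from tractor.trionics import maybe_open_context

@acm
async def open_db(url: str):
    yield {'url': url}  # stand-in for a real connection

async def task(name: str):
    async with maybe_open_context(
        acm_func=open_db,
        kwargs={'url': 'memory://'},
    ) as (cache_hit, db):
        if cache_hit:
            print(f'{name}: reusing already-allocated {db}')
        else:
            print(f'{name}: first user, allocated {db}')

async def main():
    async with tractor.open_root_actor():
        async with trio.open_nursery() as tn:
            for name in ('t1', 't2'):
                tn.start_soon(task, name)

trio.run(main)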

View File

@@ -0,0 +1,185 @@
# tractor: structured concurrent "actors".
# Copyright 2018-eternity Tyler Goodlet.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
'''
`trio.Task` cancellation helpers, extensions and "holsters".
'''
from __future__ import annotations
from contextlib import (
asynccontextmanager as acm,
)
from typing import TYPE_CHECKING
import trio
from tractor.log import get_logger
log = get_logger(__name__)
if TYPE_CHECKING:
from tractor.devx.debug import BoxedMaybeException
def find_masked_excs(
maybe_masker: BaseException,
unmask_from: set[BaseException],
) -> BaseException|None:
'''
Deliver any `maybe_masker.__context__` provided
it has a declared masking exc-type entry in `unmask_from`.
'''
if (
type(maybe_masker) in unmask_from
and
(exc_ctx := maybe_masker.__context__)
# TODO? what about any cases where
# they could be the same type but not same instance?
# |_i.e. a cancel masking a cancel ??
# or (
# exc_ctx is not maybe_masker
# )
):
return exc_ctx
return None
# XXX, relevant ish discussion @ `trio`-core,
# https://github.com/python-trio/trio/issues/455#issuecomment-2785122216
#
@acm
async def maybe_raise_from_masking_exc(
tn: trio.Nursery|None = None,
unmask_from: (
BaseException|
tuple[BaseException]
) = (trio.Cancelled,),
raise_unmasked: bool = True,
extra_note: str = (
'This can occur when,\n'
' - a `trio.Nursery` scope embeds a `finally:`-block '
'which executes a checkpoint!'
#
# ^TODO? other cases?
),
always_warn_on: tuple[BaseException] = (
trio.Cancelled,
),
# ^XXX, special case(s) where we warn-log bc likely
# there will be no operational diff since the exc
# is always expected to be consumed.
) -> BoxedMaybeException:
'''
Maybe un-mask and re-raise exception(s) suppressed by a known
error-used-as-signal type (cough namely `trio.Cancelled`).
Though this unmasker targets cancelleds, it can be used more
generally to capture and unwrap masked excs detected as
`.__context__` values which were suppressed by any error type
passed in `unmask_from`.
-------------
STILL-TODO ??
-------------
-[ ] support for egs which have multiple masked entries in
`maybe_eg.exceptions`, in which case we should unmask the
individual sub-excs but maintain the eg-parent's form right?
'''
from tractor.devx.debug import (
BoxedMaybeException,
pause,
)
boxed_maybe_exc = BoxedMaybeException(
raise_on_exit=raise_unmasked,
)
matching: list[BaseException]|None = None
maybe_eg: ExceptionGroup|None
maybe_eg: ExceptionGroup|None
if tn:
try: # handle egs
yield boxed_maybe_exc
return
except* unmask_from as _maybe_eg:
maybe_eg = _maybe_eg
matches: ExceptionGroup
matches, _ = maybe_eg.split(
unmask_from
)
if not matches:
raise
matching: list[BaseException] = matches.exceptions
else:
try: # handle non-egs
yield boxed_maybe_exc
return
except unmask_from as _maybe_exc:
maybe_exc = _maybe_exc
matching: list[BaseException] = [
maybe_exc
]
# XXX, only unmask-ed for debuggin!
# TODO, remove eventually..
except BaseException as _berr:
berr = _berr
await pause(shield=True)
raise berr
if matching is None:
raise
masked: list[tuple[BaseException, BaseException]] = []
for exc_match in matching:
if exc_ctx := find_masked_excs(
maybe_masker=exc_match,
unmask_from={unmask_from},
):
masked.append((exc_ctx, exc_match))
boxed_maybe_exc.value = exc_match
note: str = (
f'\n'
f'^^WARNING^^ the above {exc_ctx!r} was masked by a {unmask_from!r}\n'
)
if extra_note:
note += (
f'\n'
f'{extra_note}\n'
)
exc_ctx.add_note(note)
if type(exc_match) in always_warn_on:
log.warning(note)
# await tractor.pause(shield=True)
if raise_unmasked:
if len(masked) < 2:
raise exc_ctx from exc_match
else:
# ?TODO, see above but, possibly unmasking sub-exc
# entries if there are > 1
await pause(shield=True)
else:
raise
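
A small sketch of the "masking" shape `find_masked_excs()` detects: an exception raised while handling another ends up with the original on `.__context__`. `KeyboardInterrupt` stands in for the usual `trio.Cancelled` masker here since the latter can't be constructed directly in example code.

from tractor.trionics._taskc import find_masked_excs

def make_masked() -> BaseException:
    try:
        try:
            raise ValueError('the real error')
        except ValueError:
            # raising here "masks" the ValueError via implicit chaining
            raise KeyboardInterrupt()
    except KeyboardInterrupt as ki:
        return ki

masker = make_masked()
unmasked = find_masked_excs(masker, unmask_from={KeyboardInterrupt})
assert unmasked is masker.__context__
assert isinstance(unmasked, ValueError)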