As per much tinkering, re-designs and preceding rubber-ducking via many
"commit msg novelas", **finally** this adds the (hopefully) final
missing layer for typed msg safety: `tractor.msg._ops.PldRx`
(or `PayloadReceiver`? haven't decided how verbose to go..)
Design justification summary:
------ - ------
- need a way to be as-close-as-possible to the `tractor`-application
such that when `MsgType.pld: PayloadT` validation takes place, it is
straightforward and obvious how user code can decide to handle any
resulting `MsgTypeError`.
- there should be a common and optional-yet-modular way to modify
**how** data delivered via IPC (possibly embedded as user defined,
type-constrained `.pld: msgspec.Struct`s) can be handled and processed
during fault conditions and/or IPC "msg attacks".
- support for nested type constraints within a `MsgType.pld` field
should be simple to define, implement and understand at runtime.
- a layer between the app-level IPC primitive APIs
(`Context`/`MsgStream`) and application-task code (consumer code of
those APIs) should be easily customized and prove-to-be-as-such
through demonstrably rigorous internal (sub-sys) use!
-> eg. via seemless runtime RPC eps support like `Actor.cancel()`
-> by correctly implementing our `.devx._debug.Lock` REPL TTY mgmt
dialog prot, via a dead simple payload-as-ctl-msg-spec.
There are some fairly detailed doc strings included so I won't duplicate
that content, the majority of the work here is actually somewhat of
a factoring of many similar blocks that are doing more or less the same
`msg = await Context._rx_chan.receive()` with boilerplate for
`Error`/`Stop` handling via `_raise_from_no_key_in_msg()`. The new
`PldRx` basically provides a shim layer for this common "receive msg,
decode its payload, yield it up to the consuming app task" by pairing
the RPC feeder mem-chan with a msg-payload decoder and expecting IPC API
internals to use **one** API instead of re-implementing the same pattern
all over the place XD
`PldRx` breakdown
------ - ------
- for now only expects a `._msgdec: MsgDec` which allows for
override-able `MsgType.pld` validation and most obviously used in
the impl of `.dec_msg()`, the decode message method.
- provides multiple mem-chan receive options including:
|_ `.recv_pld()` which does the e2e operation of receiving a payload
item.
|_ a sync `.recv_pld_nowait()` version.
|_ a `.recv_msg_w_pld()` which optionally allows retreiving both the
shuttling `MsgType` as well as it's `.pld` body for use cases where
info on both is important (eg. draining a `MsgStream`).
Dirty internal changeover/implementation deatz:
------ - ------
- obvi move over all the IPC "primitives" that previously had the duplicate recv-n-yield
logic:
- `MsgStream.receive[_nowait]()` delegating instead to the equivalent
`PldRx.recv_pld[_nowait]()`.
- add `Context._pld_rx: PldRx`, created and passed in by
`mk_context()`; use it for the `.started()` -> `first: Started`
retrieval inside `open_context_from_portal()`.
- all the relevant `Portal` invocation methods: `.result()`,
`.run_from_ns()`, `.run()`; also allows for dropping `_unwrap_msg()`
and `.Portal_return_once()` outright Bo
- rename `Context.ctx._recv_chan` -> `._rx_chan`.
- add detailed `Context._scope` info for logging whether or not it's
cancelled inside `_maybe_cancel_and_set_remote_error()`.
- move `._context._drain_to_final_msg()` -> `._ops.drain_to_final_msg()`
since it's really not necessarily ctx specific per say, and it does
kinda fit with "msg operations" more abstractly ;)
Breaks out all the (sub)actor local conc primitives from `Lock` (which
is now only used in and by the root actor) such that there's an explicit
distinction between a task that's "consuming" the `Lock` (remotely) vs.
the root-side service tasks which do the actual acquire on behalf of the
requesters.
`DebugStatus` changeover deats:
------ - ------
- move all the actor-local vars over `DebugStatus` including:
- move `_trio_handler` and `_orig_sigint_handler`
- `local_task_in_debug` now `repl_task`
- `_debugger_request_cs` now `req_cs`
- `local_pdb_complete` now `repl_release`
- drop all ^ fields from `Lock.repr()` obvi..
- move over the `.[un]shield_sigint()` and
`.is_main_trio_thread()` methods.
- add some new attrs/meths:
- `DebugStatus.repl` for the currently running `Pdb` in-actor
singleton.
- `.repr()` for pprint of state (like `Lock`).
- Note: that even when a root-actor task is in REPL, the `DebugStatus`
is still used for certain actor-local state mgmt, such as SIGINT
handler shielding.
- obvi change all lock-requester code bits to now use a `DebugStatus` in
their local actor-state instead of `Lock`, i.e. change usage from
`Lock` in `._runtime` and `._root`.
- use new `Lock.get_locking_task_cs()` API in when checking for
sub-in-debug from `._runtime.Actor._stream_handler()`.
Unrelated to topic-at-hand tweaks:
------ - ------
- drop the commented bits about hiding `@[a]cm` stack frames from
`_debug.pause()` and simplify to only one block with the `shield`
passthrough since we already solved the issue with cancel-scopes using
`@pdbp.hideframe` B)
- this includes all the extra logging about the extra frame for the
user (good thing i put in that wasted effort back then eh..)
- put the `try/except BaseException` with `log.exception()` around the
whole of `._pause()` to ensure we don't miss in-func errors which can
cause hangs..
- allow passing in `portal: Portal` to
`Actor.start_remote_task()` such that `Portal` task spawning methods
are always denoted correctly in terms of `Context.side`.
- lotsa logging tweaks, decreasing a bit of noise from `.runtime()`s.
Mostly removing commented (and replaced) code blocks lingering from the
ctxc semantics work and new typed-msg-spec `MsgType`s handling AND use
the new `._exceptions.pack_from_raise()` helper to construct
`StreamOverrun` msgs.
Deaterz:
- clean out the drain loop now that it's implemented to handle our
struct msg types including the `dict`-msg bits left in as
fallback-reminders, any notes/todos better summarized at the top of
their blocks, remove any `_final_result_is_set()` related duplicate/legacy
tidbits.
- use a `case Error()` block in drain loop with fallthrough to `_:`
always resulting in an rte raise.
- move "XXX" notes into the doc-string for `._deliver_msg()` as
a "rules" section.
- use `match:` syntax for logging the `result_or_err: MsgType` outcome
from the final `.result()` call inside `open_context_from_portal()`.
- generally speaking use `MsgType` type annotations throughout!
Better describes the internal RPC impl/latest-architecture with the msgs
delivered being those which either define a `.pld: PayloadT` that gets
passed up to user code, or the error-msg subset that similarly is raised
in a ctx-linked task.
Pretty sure we haven't *needed it* for a while, it was always generally
hazardous in terms of IPC msg types, AND it's definitely incompatible
with a dynamically applied typed msg spec: you can't just expect
a `None` to be willy nilly handled all the time XD
For now I'm masking out all the code and leaving very detailed
surrounding notes but am not removing it quite yet in case for strange
reason it is needed by some edge case (though I haven't found according
to the test suite).
Backstory:
------ - ------
Originally (i'm pretty sure anyway) it was added as a super naive
"remote cancellation" mechanism (back before there were specific `Actor`
methods for such things) that was mostly (only?) used before IPC
`Channel` closures to "more gracefully cancel" the connection's parented
RPC tasks. Since we now have explicit runtime-RPC endpoints for
conducting remote cancellation of both tasks and full actors, it should
really be removed anyway, because:
- a `None`-msg setinel is inconsistent with other RPC endpoint handling
input patterns which (even prior to typed msging) had specific
msg-value triggers.
- the IPC endpoint's (block) implementation should use
`Actor.cancel_rpc_tasks(parent_chan=chan)` instead of a manual loop
through a `Actor._rpc_tasks.copy()`..
Deats:
- mask the `Channel.send(None)` calls from both the `Actor._stream_handler()` tail
as well as from the `._portal.open_portal()` was connected block.
- mask the msg loop endpoint block and toss in lotsa notes.
Unrelated tweaks:
- drop `Actor._debug_mode`; unused.
- make `Actor.cancel_server()` return a `bool`.
- use `.msg.pretty_struct.Struct.pformat()` to show any msg that is
ignored (bc invalid) in `._push_result()`.
Such that it's set to whatever `Actor.reg_addrs: list[tuple]` is during
the actor's init-after-spawn guaranteeing each actor has at least the
registry infos from its parent. Ensure we read this if defined over
`_root._default_lo_addrs` in `._discovery` routines, namely
`.find_actor()` since it's the one API normally used without expecting
the runtime's `current_actor()` to be up.
Update the latest inter-peer cancellation test to use the `reg_addr`
fixture (and thus test this new runtime-vars value via `find_actor()`
usage) since it was failing if run *after* the infected `asyncio` suite
due to registry contact failure.
Since it's handy to be able to debug the *writing* of this instance var
(particularly when checking state passed down to a child in
`Actor._from_parent()`), rename and wrap the underlying
`Actor._reg_addrs` as a settable `@property` and add validation to
the `.setter` for sanity - actor discovery is a critical functionality.
Other tweaks:
- fix `.cancel_soon()` to pass expected argument..
- update internal runtime error message to be simpler and link to GH issues.
- use new `Actor.reg_addrs` throughout core.
Since we'd like to eventually allow a diverse set of transport
(protocol) methods and stacks, and a multi-peer discovery system for
distributed actor-tree applications, this reworks all runtime internals
to support multi-homing for any given tree on a logical host. In other
words any actor can now bind its transport server (currently only
unsecured TCP + `msgspec`) to more then one address available in its
(linux) network namespace. Further, registry actors (now dubbed
"registars" instead of "arbiters") can also similarly bind to multiple
network addresses and provide discovery services to remote actors via
multiple addresses which can now be provided at runtime startup.
Deats:
- adjust `._runtime` internals to use a `list[tuple[str, int]]` (and
thus pluralized) socket address sequence where applicable for transport
server socket binds, now exposed via `Actor.accept_addrs`:
- `Actor.__init__()` now takes a `registry_addrs: list`.
- `Actor.is_arbiter` -> `.is_registrar`.
- `._arb_addr` -> `._reg_addrs: list[tuple]`.
- always reg and de-reg from all registrars in `async_main()`.
- only set the global runtime var `'_root_mailbox'` to the loopback
address since normally all in-tree processes should have access to
it, right?
- `._serve_forever()` task now takes `listen_sockaddrs: list[tuple]`
- make `open_root_actor()` take a `registry_addrs: list[tuple[str, int]]`
and defaults when not passed.
- change `ActorNursery.start_..()` methods take `bind_addrs: list` and
pass down through the spawning layer(s) via the parent-seed-msg.
- generalize all `._discovery()` APIs to accept `registry_addrs`-like
inputs and move all relevant subsystems to adopt the "registry" style
naming instead of "arbiter":
- make `find_actor()` support batched concurrent portal queries over
all provided input addresses using `.trionics.gather_contexts()` Bo
- syntax: move to using `async with <tuples>` 3.9+ style chained
@acms.
- a general modernization of the code to a python 3.9+ style.
- start deprecation and change to "registry" naming / semantics:
- `._discovery.get_arbiter()` -> `.get_registry()`
Where `.devx` is "developer experience", a hopefully broad enough subpkg
name for all the slick stuff planned to augment working on the actor
runtime 💥
Move the `._debug` module into the new subpkg and adjust rest of core
code base to reflect import path change. Also add a new
`.devx._debug.open_crash_handler()` manager for wrapping any sync code
outside a `trio.run()` which is handy for eventual CLI addons for
popular frameworks like `click`/`typer`.
This works now for supporting a new `tractor.pause_from_sync()`
`tractor`-aware-replacement for `Pdb.set_trace()` from sync functions
which are also scheduled from our runtime. Uses `greenback` to do all
the magic of scheduling the bg `tractor._debug._pause()` task and
engaging the normal TTY locking machinery triggered by `await
tractor.breakpoint()`
Further this starts some public API renaming, making a switch to
`tractor.pause()` from `.breakpoint()` which IMO much better expresses
the semantics of the runtime intervention required to suffice
multi-process "breakpointing"; it also is an alternate name for the same
in computer science more generally: https://en.wikipedia.org/wiki/Breakpoint
It also avoids using the same name as the `breakpoint()` built-in which
is important since there **is alot more going on** when you call our
equivalent API.
Deats of that:
- add deprecation warning for `tractor.breakpoint()`
- add `tractor.pause()` and a shorthand, easier-to-type, alias `.pp()`
for "pause-point" B)
- add `pause_from_sync()` as the new `breakpoint()`-from-sync-function
hack which does all the `greenback` stuff for the user.
Still TODO:
- figure out where in the runtime and when to call
`greenback.ensure_portal()`.
- fix the frame selection issue where
`trio._core._ki._ki_protection_decorator:wrapper` seems to be always
shown on REPL start as the selected frame..
- Remove `exceptiongroup` import,
- pin to py 3.11 in `setup.py`
- revert any lingering `tractor.devx` imports; sub-pkg is coming in
a downstream PR!
- remove weird double `@property` lingering from conflict reso..
- modern `pytest` requires conftest mod mods to be relative imported.
- `trio_typing` is nearly obsolete since `trio >= 0.23`
- `exceptiongroup` is built-in to python 3.11
- `async_generator` primitives have lived in `contextlib` for quite
a while!
Since `._runtime` was getting pretty long (> 2k LOC) and much of the RPC
low-level machinery is fairly isolated to a handful of task-funcs, it
makes sense to re-org the RPC task scheduling and driving msg loop to
its own code space.
The move includes:
- `process_messages()` which is the main IPC business logic.
- `try_ship_error_to_remote()` helper, to box local errors for the wire.
- `_invoke()`, the core task scheduler entrypoing used in the msg loop.
- `_invoke_non_context()`, holds impls for non-`@context` task starts.
- `_errors_relayed_via_ipc()` which does all error catch-n-boxing for
wire-msg shipment using `try_ship_error_to_remote()` internally.
Also inside `._runtime` improve some `Actor` methods docs.
Previously i was trying to approach this using lots of
`__tracebackhide__`'s in various internal funcs but since it's not
exactly straight forward to do this inside core deps like `trio` and the
stdlib, it makes a bit more sense to optionally catch and re-raise
certain classes of errors from their originals using `raise from` syntax
as per:
https://docs.python.org/3/library/exceptions.html#exception-context
Deats:
- litter `._context` methods with `__tracebackhide__`/`hide_tb` which
were previously being shown but that don't need to be to application
code now that cancel semantics testing is finished up.
- i originally did the same but later commented it all out in `._ipc`
since error catch and re-raise instead in higher level layers
(above the transport) seems to be a much saner approach.
- add catch-n-reraise-from in `MsgStream.send()`/.`receive()` to avoid
seeing the depths of `trio` and/or our `._ipc` layers on comms errors.
Further this patch adds some refactoring to use the
same remote-error shipper routine from both the actor-core in the RPC
invoker:
- rename it as `try_ship_error_to_remote()` and call it from
`._invoke()` as well as it's prior usage.
- make it optionally accept `cid: str` a `remote_descr: str` and of
course a `hide_tb: bool`.
Other misc tweaks:
- add some todo notes around `Actor.load_modules()` debug hooking.
- tweak the zombie reaper log msg and timeout value ;)
Found exactly why trying this won't work when playing around with
opening workspaces in `modden` using a `Portal.open_context()` back to
the 'bigd' root actor: the RPC machinery only registers one entry in
`Actor._contexts` which will get overwritten by each task's side and
then experience race-based IPC msging errors (eg. rxing `{'started': _}`
on the callee side..). Instead make opening a ctx back to the self-actor
a runtime error describing it as an invalid op.
To match:
- add a new test `test_ctx_with_self_actor()` to the context semantics
suite.
- tried out adding a new `side: str` to the `Actor.get_context()` (and
callers) but ran into not being able to determine the value from in
`._push_result()` where it's needed to figure out which side to push
to.. So, just leaving the commented arg (passing) in the runtime core
for now in case we can come back to trying to make it work, tho i'm
thinking it's not the right hack anyway XD
Like how we set `Context._cancel_msg` in `._deliver_msg()` (in
which case normally it's an `{'error': ..}` msg), do the same when any
RPC task is remotely cancelled via `Actor._cancel_task` where that task
doesn't yet have a cancel msg set yet.
This makes is much easier to distinguish between ctx cancellations due
to some remote error vs. Explicit remote requests via any of
`Actor.cancel()`, `Portal.cancel_actor()` or `Context.cancel()`.
Since eventually we want to implement all other RPC "func types" as
contexts underneath this starts the rework to move all the other cases
into a separate func not only to simplify the main `._invoke()` body but
also as a reminder of the intention to do it XD
Details of re-factor:
- add a new `._invoke_non_context()` which just moves all the old blocks
for non-context handling to a single def.
- factor what was basically just the `finally:` block handler (doing all
the task bookkeeping) into a new `@acm`: `_errors_relayed_via_ipc()`
with that content packed into the post-`yield` (also with a `hide_tb:
bool` flag added of course).
* include a `debug_kbis: bool` for when needed.
- since the `@context` block is the only type left in the main
`_invoke()` body, de-dent it so it's more grok-able B)
Obviously this patch also includes a few improvements regarding
context-cancellation-semantics (for the `context` RPC case) on the
callee side in order to match previous changes to the `Context` api:
- always setting any ctxc as the `Context._local_error`.
- using the new convenience `.maybe_raise()` topically (for now).
- avoiding any previous reliance on `Context.cancelled_caught` for
anything public of meaning.
Further included is more logging content updates:
- being pedantic in `.cancel()` msgs about whether termination is caused
by error or ctxc.
- optional `._invoke()` traceback hiding via a `hide_tb: bool`.
- simpler log headers throughout instead leveraging new `.__repr__()` on
primitives.
- buncha `<= <actor-uid>` sent some message emissions.
- simplified handshake statuses reporting.
Other subsys api changes we need to match:
- change to `Channel.transport`.
- avoiding any `local_nursery: ActorNursery` waiting when the
`._implicit_runtime_started` is set.
And yes, lotsa more comments for #TODOs dawg.. since there's always
somethin!
Besides improving a bunch more log msg contents similarly as before this
changes the cancel method signatures slightly with different arg names:
for `.cancel()`:
- instead of `requesting_uid: str` take in a `req_chan: Channel`
since we can always just read its `.uid: tuple` for logging and
further we can then offer the `chan=None` case indicating a
"self cancel" (since there's no "requesting channel").
- the semantics of "requesting" here better indicate that the IPC connection
is an IPC peer and further (eventually) will allow permission checking
against given peers for cancellation requests.
- when `chan==None` we also define a meth-internal `requester_type: str`
differently for logging content :)
- add much more detailed `.cancel()` content around the requester, its
type, and any debugger related locking steps.
for `._cancel_task()`:
- change the `chan` arg to `parent_chan: Channel` since "parent"
correctly indicates that the channel is the parent of the locally
spawned rpc task to cancel; in fact no other chan should be able to
cancel tasks parented/spawned by other channels obvi!
- also add more extensive meth-internal `.cancel()` logging with a #TODO
around showing only the "relevant/lasest" `Context` state vars in such
logging content.
for `.cancel_rpc_tasks()`:
- shorten `requesting_uid` -> `req_uid`.
- add `parent_chan: Channel` to be similar as above in `._cancel_task()`
(since it's internally delegated to anyway) which replaces the prior
`only_chan` and use it to filter to only tasks spawned by this channel
(thus as their "parent") as before.
- instead of `if tasks:` to enter, invert and `return` early on
`if not tasks`, for less indentation B)
- add WIP str-repr format (for `.cancel()` emissions) to show
a multi-address (maddr) + task func (via the new `Context._nsf`) and
report all cancel task targets with it a "tree"; include #TODO to
finalize and implement some utils for all this!
To match ensure we adjust `process_messages()` self/`Actor` cancel
handling blocks to provide the new `kwargs` (now with `dict`-merge
syntax) to `._invoke()`.
As similarly improved in other parts of the runtime, adds much more
pedantic (`.cancel()`) logging content to indicate the src of remote
cancellation request particularly for `Actor.cancel()` and
`._cancel_task()` cases prior to `._invoke()` task scheduling. Also add
detailed case comments and much more info to the
"request-to-cancel-already-terminated-RPC-task" log emission to include
the `Channel` and `Context.cid` deats.
This helped me find the src of a race condition causing a test to fail
where a callee ctx task was returning a result *before* an expected
`ctx.cancel()` request arrived B). Adding much more pedantic
`.cancel()` msg contents around the requester's deats should ensure
these cases are much easier to detect going forward!
Also, simplify the `._invoke()` final result/error log msg to only put
*one of either* the final error or returned result above the `Context`
pprint.
The only case where we can't is in `Portal.run_from_ns()` usage (since we
pass a path with `self:<Actor.meth>`) and because `.to_tuple()`
internally uses `.load_ref()` which will of course fail on such a path..
So or now impl as,
- mk `Actor.start_remote_task()` take a `nsf: NamespacePath` but also
offer a `load_nsf: bool = False` such that by default we bypass ref
loading (maybe this is fine for perf long run as well?) for the
`Actor`/'self:'` case mentioned above.
- mk `.get_context()` take an instance `nsf` obvi.
More logging msg format tweaks:
- change msg-flow related content to show the `Context._nsf`, which,
right, is coming follow up commit..
- bunch more `.runtime()` format updates to show `msg: dict` contents
and internal primitives with trailing `'\n'` for easier reading.
- report import loading `stackscope` in subactors.
As part of solving some final edge cases todo with inter-peer remote
cancellation (particularly a remote cancel from a separate actor
tree-client hanging on the request side in `modden`..) I needed less
dense, more line-delimited log msg formats when understanding ipc
channel and context cancels from console logging; this adds a ton of
that to:
- `._invoke()` which now does,
- better formatting of `Context`-task info as multi-line
`'<field>: <value>\n'` messages,
- use of `trio.Task` (from `.lowlevel.current_task()` for full
rpc-func namespace-path info,
- better "msg flow annotations" with `<=` for understanding
`ContextCancelled` flow.
- `Actor._stream_handler()` where in we break down IPC peers reporting
better as multi-line `|_<Channel>` log msgs instead of all jammed on
one line..
- `._ipc.Channel.send()` use `pformat()` for repr of packet.
Also tweak some optional deps imports for debug mode:
- add `maybe_import_gb()` for attempting to import `greenback`.
- maybe enable `stackscope` tree pprinter on `SIGUSR1` if installed.
Add a further stale-debugger-lock guard before removal:
- read the `._debug.Lock.global_actor_in_debug: tuple` uid and possibly
`maybe_wait_for_debugger()` when the child-user is known to have
a live process in our tree.
- only cancel `Lock._root_local_task_cs_in_debug: CancelScope` when
the disconnected channel maps to the `Lock.global_actor_in_debug`,
though not sure this is correct yet?
Started adding missing type annots in sections that were modified.
Took me longer then i wanted to figure out the source of
a failed-response to a remote-cancellation (in this case in `modden`
where a client was cancelling a workspace layer.. but disconnects before
receiving the ack msg) that was triggering an IPC error when sending the
error msg for the cancellation of a `Actor._cancel_task()`, but since
this (non-rpc) `._invoke()` task was trying to send to a now
disconnected canceller it was resulting in a `BrokenPipeError` (or similar)
error.
Now, we except for such IPC errors and only raise them when,
1. the transport `Channel` is for sure up (bc ow what's the point of
trying to send an error on the thing that caused it..)
2. it's definitely for handling an RPC task
Similarly if the entire main invoke `try:` excepts,
- we only hide the call-stack frame from the debugger (with
`__tracebackhide__: bool`) if it's an RPC task that has a connected
channel since we always want to see the frame when debugging internal
task or IPC failures.
- we don't bother trying to send errors to the context caller (actor)
when it's a non-RPC request since failures on actor-runtime-internal
tasks shouldn't really ever be reported remotely, only maybe raised
locally.
Also some other tidying,
- this properly corrects for the self-cancel case where an RPC context
is cancelled due to a local (runtime) task calling a method like
`Actor.cancel_soon()`. We now set our own `.uid` as the
`ContextCancelled.canceller` value so that other-end tasks know that
the cancellation was due to a self-cancellation by the actor itself.
We still need to properly test for this though!
- add a more detailed module doc-str.
- more explicit imports for `trio` core types throughout.
As part of extremely detailed inter-peer-actor testing, add much more
granular `Context` cancellation state tracking via the following (new)
fields:
- `.canceller: tuple[str, str]` the uuid of the actor responsible for
the cancellation condition - always set by
`Context._maybe_cancel_and_set_remote_error()` and replaces
`._cancelled_remote` and `.cancel_called_remote`. If set, this value
should normally always match a value from some `ContextCancelled`
raised or caught by one side of the context.
- `._local_error` which is always set to the locally raised (and caller
or callee task's scope-internal) error which caused any
eventual cancellation/error condition and thus any closure of the
context's per-task-side-`trio.Nursery`.
- `.cancelled_caught: bool` is now always `True` whenever the local task
catches (or "silently absorbs") a `ContextCancelled` (a `ctxc`) that
indeed originated from one of the context's linked tasks or any other
context which raised its own `ctxc` in the current `.open_context()` scope.
=> whenever there is a case that no `ContextCancelled` was raised
**in** the `.open_context().__aexit__()` (eg. `ctx.result()` called
after a call `ctx.cancel()`), we still consider the context's as
having "caught a cancellation" since the `ctxc` was indeed silently
handled by the cancel requester; all other error cases are already
represented by mirroring the state of the `._scope: trio.CancelScope`
=> IOW there should be **no case** where an error is **not raised** in
the context's scope and `.cancelled_caught: bool == False`, i.e. no
case where `._scope.cancelled_caught == False and ._local_error is not
None`!
- always raise any `ctxc` from `.open_stream()` if `._cancel_called ==
True` - if the cancellation request has not already resulted in
a `._remote_error: ContextCancelled` we raise a `RuntimeError` to
indicate improper usage to the guilty side's task code.
- make `._maybe_raise_remote_err()` a sync func and don't raise
any `ctxc` which is matched against a `.canceller` determined to
be the current actor, aka a "self cancel", and always set the
`._local_error` to any such `ctxc`.
- `.side: str` taken from inside `.cancel()` and unused as of now since
it might be better re-written as a similar `.is_opener() -> bool`?
- drop unused `._started_received: bool`..
- TONS and TONS of detailed comments/docs to attempt to explain all the
possible cancellation/exit cases and how they should exhibit as either
silent closes or raises from the `Context` API!
Adjust the `._runtime._invoke()` code to match:
- use `ctx._maybe_raise_remote_err()` in `._invoke()`.
- adjust to new `.canceller` property.
- more type hints.
- better `log.cancel()` msging around self-cancels vs. peer-cancels.
- always set the `._local_error: BaseException` for the "callee" task
just like `Portal.open_context()` now will do B)
Prior we were raising any `Context._remote_error` directly and doing
(more or less) the same `ContextCancelled` "absorbing" logic (well
kinda) in block; instead delegate to the method
- `Context._cancel_called_remote` -> `._cancelled_remote` since "called"
implies the cancellation was "requested" when it could be due to
another error and the actor uid is the value - only set once the far
end task scope is terminated due to either error or cancel, which has
nothing to do with *what* caused the cancellation.
- `Actor._cancel_called_remote` -> `._cancel_called_by_remote` which
emphasizes that this variable is **only set** IFF some remote actor
**requested that** this actor's runtime be cancelled via
`Actor.cancel()`.
Because obviously we probably want to support `allow_overruns` on the
remote callee side as well XD
Only found the bugs fixed in this patch this thanks to writing a much
more exhaustive test set for overrun cases B)
This adds remote cancellation semantics to our `tractor.Context`
machinery to more closely match that of `trio.CancelScope` but
with operational differences to handle the nature of parallel tasks interoperating
across multiple memory boundaries:
- if an actor task cancels some context it has opened via
`Context.cancel()`, the remote (scope linked) task will be cancelled
using the normal `CancelScope` semantics of `trio` meaning the remote
cancel scope surrounding the far side task is cancelled and
`trio.Cancelled`s are expected to be raised in that scope as per
normal `trio` operation, and in the case where no error is raised
in that remote scope, a `ContextCancelled` error is raised inside the
runtime machinery and relayed back to the opener/caller side of the
context.
- if any actor task cancels a full remote actor runtime using
`Portal.cancel_actor()` the same semantics as above apply except every
other remote actor task which also has an open context with the actor
which was cancelled will also be sent a `ContextCancelled` **but**
with the `.canceller` field set to the uid of the original cancel
requesting actor.
This changeset also includes a more "proper" solution to the issue of
"allowing overruns" during streaming without attempting to implement any
form of IPC streaming backpressure. Implementing task-granularity
backpressure cross-process turns out to be more or less impossible
without augmenting out streaming protocol (likely at the cost of
performance). Further allowing overruns requires special care since
any blocking of the runtime RPC msg loop task effectively can block
control msgs such as cancels and stream terminations.
The implementation details per abstraction layer are as follows.
._streaming.Context:
- add a new contructor factor func `mk_context()` which provides
a strictly private init-er whilst allowing us to not have to define
an `.__init__()` on the type def.
- add public `.cancel_called` and `.cancel_called_remote` properties.
- general rename of what was the internal `._backpressure` var to
`._allow_overruns: bool`.
- move the old contents of `Actor._push_result()` into a new
`._deliver_msg()` allowing for better encapsulation of per-ctx
msg handling.
- always check for received 'error' msgs and process them with the new
`_maybe_cancel_and_set_remote_error()` **before** any msg delivery to
the local task, thus guaranteeing error and cancellation handling
despite any overflow handling.
- add a new `._drain_overflows()` task-method for use with new
`._allow_overruns: bool = True` mode.
- add back a `._scope_nursery: trio.Nursery` (allocated in
`Portal.open_context()`) who's sole purpose is to spawn a single task
which runs the above method; anything else is an error.
- augment `._deliver_msg()` to start a task and run the above method
when operating in no overrun mode; the task queues overflow msgs and
attempts to send them to the underlying mem chan using a blocking
`.send()` call.
- on context exit, any existing "drainer task" will be cancelled and
remaining overflow queued msgs are discarded with a warning.
- rename `._error` -> `_remote_error` and set it in a new method
`_maybe_cancel_and_set_remote_error()` which is called before
processing
- adjust `.result()` to always call `._maybe_raise_remote_err()` at its
start such that whenever a `ContextCancelled` arrives we do logic for
whether or not to immediately raise that error or ignore it due to the
current actor being the one who requested the cancel, by checking the
error's `.canceller` field.
- set the default value of `._result` to be `id(Context()` thus avoiding
conflict with any `.result()` actually being `False`..
._runtime.Actor:
- augment `.cancel()` and `._cancel_task()` and `.cancel_rpc_tasks()` to
take a `requesting_uid: tuple` indicating the source actor of every
cancellation request.
- pass through the new `Context._allow_overruns` through `.get_context()`
- call the new `Context._deliver_msg()` from `._push_result()` (since
the factoring out that method's contents).
._runtime._invoke:
- `TastStatus.started()` back a `Context` (unless an error is raised)
instead of the cancel scope to make it easy to set/get state on that
context for the purposes of cancellation and remote error relay.
- always raise any remote error via `Context._maybe_raise_remote_err()`
before doing any `ContextCancelled` logic.
- assign any `Context._cancel_called_remote` set by the `requesting_uid`
cancel methods (mentioned above) to the `ContextCancelled.canceller`.
._runtime.process_messages:
- always pass a `requesting_uid: tuple` to `Actor.cancel()` and
`._cancel_task` to that any corresponding `ContextCancelled.canceller`
can be set inside `._invoke()`.
When backpressure is used and a feeder mem chan breaks during msg
delivery (usually because the IPC allocating task already terminated)
instead of raising we simply warn as we do for the non-backpressure
case.
Also, add a proper `Actor.is_arbiter` test inside `._invoke()` to avoid
doing an arbiter-registry lookup if the current actor **is** the
registrar.