Such that we don't require every single `src/relay_uid` in the final
output, but instead (only) at some point in the pre-output of some
prompt.
Added some comments to match each actor sub-layer.
Attempting a rework of the post-cancellation "raising semantics" such
that subactors which are `ActorCancelled` as a result of a non-graceful
in-scope error, are acked via a re-raised
`ExceptionGroup[ActorCancelled*N, Exception]`
*outside the actor-nursery (`an`) block*. Eventually, the idea is to
have `ActorCancelled`
be relayed from each subactor in response to any
`Actor.cancel()/Portal.cancel_actor()` request much like
`Context.cancel()/ContextCancelled`.
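A hedged sketch of the intended outcome (the EG contents are still WIP
per above; `sleep_forever` is just a stand-in subactor target):

```python
import trio
import tractor

async def sleep_forever():
    await trio.sleep_forever()

async def main():
    try:
        async with tractor.open_nursery() as an:
            await an.run_in_actor(sleep_forever)
            raise RuntimeError('non-graceful in-scope error')
    except ExceptionGroup as eg:
        # expected: one `ActorCancelled` per cancelled subactor
        # grouped alongside the original `RuntimeError`
        print(eg.exceptions)
```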
This is a WIP bc it does break a few tests and requires related
`_spawn`-mod-machinery changes to match, some of which I'm not yet sure
are required; need to dig into the details of the currently failing
suites first.
`._supervise` patch deats,
- add `ActorNursery.maybe_error` which delivers the maybe-EG or
`._scope_error` depending on whether `.errors` (now `._errors`,
a mapping from `Aid`-keys) has entries set for subs.
- raise ^ if non-null in a new outer-`finally` in
`_open_and_supervise_one_cancels_all_nursery()`; an "outer" block is
added to ensure all sub-actor-excs are emitted/captured as part of
`ActorNursery.cancel()` being called (as prior) as well as the
`da_nursery` being explicitly cancelled alongside it (to unblock the
tn-block, but still not sure why this is necessary yet?..).
- (now masked) tried injecting actorcs from the `.cancel()` loop, but (again
per more explanation in section below) seems to be suffering a race
issue with RAE relay?
- left in buncha notes obvi for all this..
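A rough sketch of the new `.maybe_error` property (not the exact impl):

```python
# `._errors` maps `Aid` -> captured sub-actor exc per this patch
@property
def maybe_error(self) -> BaseException|None:
    if self._errors:
        return BaseExceptionGroup(
            'actor-nursery sub-errors',
            list(self._errors.values()),
        )
    return self._scope_error
```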
`._spawn` patch deats,
- as above, expect `errors: dict` to map from `Aid`-keys.
- pass `errors: dict` into `soft_kill()` since it seemed like we'd want
to (for now) inject `ActorCancelled` in some cases (but now i'm not
sure XD).
- tried out a couple spots (which are now masked) to inject
`ActorCancelled` after calling `Portal.cancel()` in various
subactor-supervision routines whenever an RAE is not set..
- oddly seems to be overwriting actual errors (likely due to racing
with RAE receive and/or actorc-request timeout?) despite the guard
logic..which clearly doesn't resolve the issue..
- buncha `tn`-style renaming.
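The (now masked) injection guard, roughly (names per the bullets above,
`aid` being the subactor's `Aid`):

```python
# only inject an actorc when no RAE was relayed for this sub;
# seemingly still racey per the notes above..
if aid not in errors:
    errors[aid] = ActorCancelled(
        f'{aid!r} cancelled by its parent',
    )
```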
Defining how an actor-nursery should emit an EG based on non-graceful
cancellation in a new `test_actor_nursery` module. Obviously fails atm
until the implementation is completed.
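Rough shape of such a test (all names hypothetical until the impl
lands):

```python
import pytest
import trio
import tractor

def test_an_emits_eg_on_nongraceful_cancel():

    async def main():
        async with tractor.open_nursery() as an:
            await an.start_actor('sub')
            raise RuntimeError('non-graceful')

    with pytest.raises(ExceptionGroup) as excinfo:
        trio.run(main)

    # at the least the original in-scope error should be grouped in
    assert excinfo.value.subgroup(RuntimeError)
```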
As in a layer "above" a KBI/SIGINT but "below" a `ContextCancelled` and
generally signalling an interrupt which requests cancellation of the
actor's `trio.run()`.
Impl deats,
- mk the new exc type inherit from our ctxc (for now) but overriding the
`.canceller` impl to,
* pull from the `RemoteActorError._extra_msgdata: dict` when no
`._ipc_msg` is set (which is always the case to start, until we
incorporate a new `CancelActor` msg type).
* not allow a `None` value since we should key-error if not set per
prev bullet.
- Mk adjustments (related) to parent `RemoteActorError.pformat()` to
accommodate showing the `.canceller` field in repr output,
* change `.relay_uid` to not crash when `._ipc_msg` is unset.
* support `.msg.types.Aid` and use its `.reprol()` from `._mk_fields_str()`.
* always call `._mk_fields_str()`, not just when `tb_str` is provided,
and for now use any `._message` in-place of a `tb_str` when
undefined.
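A sketch of the `.canceller` override from the first set of bullets
(field/key names assumed):

```python
@property
def canceller(self) -> Aid:
    # prefer the IPC msg's field once a `CancelActor` msg type
    # exists; until then `._ipc_msg` is never set for actorcs
    if self._ipc_msg is not None:
        return self._ipc_msg.canceller

    # no `.get()`-with-default: an unset canceller should
    # `KeyError` per the bullets above
    return self._extra_msgdata['canceller']
```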
Such that we are able to (finally) detect when we should
`Context._scope.cancel()` specifically when the `.parent_task` is
**not** blocking on receiving from the underlying `._rx_chan`, since if
the task is blocking on `.receive()` it will call `.cancel()`
implicitly.
This is a lot to explain with very little code actually needed for the
implementation (are we like `trio` yet anyone?? XD) but the main gist is
that `Context._maybe_cancel_and_set_remote_error()` needed the
additional case of calling `._scope.cancel()` whenever we know that
a remote-error/ctxc won't be immediately handled, bc user code is doing
non-`Context`-API things, and to result in a similar outcome as if that
task was waiting on `Context.wait_for_result()` or `.__aexit__()`.
Impl details,
- add a new `._is_blocked_on_rx_chan()` method which predicates whether
the (new) `.parent_task` is blocking on `._rx_chan.receive()`.
* see various stipulations about the current impl and how we might
need to adjust for the future given `trio`'s commitment to the
`Task.custom_sleep_data` attr..
- add `.parent_task`, a pub wrapper for `._task`.
- check for `not ._is_blocked_on_rx_chan()` before manually cancelling
the local `.parent_task`.
- minimize the surrounding branch case expressions.
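One plausible shape for the predicate (the real impl may differ, see
the stipulations above re `Task.custom_sleep_data`):

```python
def _is_blocked_on_rx_chan(self) -> bool:
    # walk the parent task's suspended coro chain looking for
    # a frame parked in `._rx_chan.receive()`
    coro = self.parent_task.coro  # `trio.lowlevel.Task.coro`
    while coro is not None:
        frame = getattr(coro, 'cr_frame', None)
        if frame is None:
            break
        if (
            frame.f_code.co_name == 'receive'
            and frame.f_locals.get('self') is self._rx_chan
        ):
            return True
        coro = getattr(coro, 'cr_await', None)
    return False
```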
Other,
- tweak a couple logs.
- add a new `.cancel()` pre-started msg.
- mask the `.cancel_called` setter, it's only (been) used for tracing.
- todos around maybe moving the `._nursery` allocation "around" the
`.start_remote_task()` call and various subsequent tweaks therein.
While working on a fix to the hang case found from
`test_cancel_ctx_with_parent_side_entered_in_bg_task` an initial
solution caused this test to hang indefinitely; solved it with a small
wrapping `_main()` + `trio.fail_after()` entrypoint.
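The wrapper, roughly (timeout value made up here):

```python
import trio

async def _main():
    # a hang now surfaces as `trio.TooSlowError` instead of
    # stalling the whole suite
    with trio.fail_after(5):
        await test_body()  # hypothetical: the original async body

def test_cancel_ctx_with_parent_side_entered_in_bg_task():
    trio.run(_main)
```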
Further suite refinements,
- move the top-most `try:`->`else:` block
- toss in a masked base-exc block for tracing unexpected
`ctx.wait_for_result()` outcomes.
- tweak the `raise_sub_spawn_error_after` to be an optional `float`
which scales the `rng_seed: int = 50` msg counter to
`tell_little_bro()` so that the abs value passed to the `range()` can
be changed.
Such that when `maybe_context.cancel()` is not called (explicitly) and
only the subactor is cancelled by its parent we expect to see a ctxc
raised both from any call to `Context.wait_for_result()` and out of
the `Portal.open_context()` scope, up to the `trio.run()`.
Deats,
- obvi call-n-catch the ctxc (in scope) for the oob-only
subactor-cancelled case.
- add branches around `trio.run()` entry to match.
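Roughly the expected propagation (the suite's `main()` assumed):

```python
import pytest
import trio
import tractor

# when only the subactor is OoB-cancelled (no explicit
# `ctx.cancel()`) the ctxc should escape `Portal.open_context()`
# and propagate all the way out of `trio.run()`
with pytest.raises(tractor.ContextCancelled):
    trio.run(main)
```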
Discovered while writing a `@context` sanity test to verify unmasker
ignore-cases support. Masked code is due to the process of finding the
minimal example causing the original hang discovered in the original
examples script. Details are in the test-fn doc strings and surrounding
comments; more refinement and cleanup coming obviously.
Also moved over the self-cancel todos from the inter-peer tests module.
Remove all the `tractor` usage (with IPC ctxs) and just get us
a min-reproducing-example with a multi-task-single `trio.Lock`.
The wrapping test suite runs the exact same with an ignore case and
an `.xfail()` for when we let the `trio.WouldBlock` be unmasked.
Demonstrating the guilty `trio.Lock.acquire()` impl which puts
a checkpoint inside its `trio.WouldBlock` handler and which will always
appear to mask the "sync path" case on (graceful) cancellation.
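A trimmed sketch of that repro against current `trio` internals:

```python
import trio

async def main():
    lock = trio.Lock()
    await lock.acquire()  # parent holds -> child takes the slow path

    async with trio.open_nursery() as tn:
        # child's `acquire()` raises `WouldBlock` internally then
        # parks on a checkpoint *inside* the handler..
        tn.start_soon(lock.acquire)
        await trio.sleep(0.1)

        # ..so this (graceful) cancel delivers a `Cancelled` whose
        # `.__context__` is the masked `WouldBlock`
        tn.cancel_scope.cancel()

trio.run(main)
```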
This first script draft demos the issue from within a `tractor.context`
ep bc that's where it was orig discovered, however i'm going to factor
out the `tractor` code and instead just use
a `.trionics.maybe_raise_from_masking_exc()` to demo its low-level
ignore-case feature.
Further, this script exposed a previously unhandled remote graceful
cancellation case which hangs:
- parent actor spawns a child and opens >1 ctxs with it,
- the parent then OoB (out-of-band) cancels the child actor (with
`Portal.cancel_actor()`),
- since the open ctxs raise a ctxc with a `.canceller == parent.uid`, the
`Context._is_self_cancelled()` will eval `True`,
- the `Context._scope` will NOT be cancelled in
`._maybe_cancel_and_set_remote_error()` resulting in any bg-task which
is waiting on a `Portal.open_context()` to not be cancelled/unblocked.
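Roughly that scenario in code (the ctx eps are stand-ins):

```python
import trio
import tractor

@tractor.context
async def sleepy_ep(ctx: tractor.Context):
    await ctx.started()
    await trio.sleep_forever()

async def main():
    async with tractor.open_nursery() as an:
        portal = await an.start_actor(
            'child',
            enable_modules=[__name__],
        )
        async with (
            portal.open_context(sleepy_ep) as (ctx1, _),
            portal.open_context(sleepy_ep) as (ctx2, _),
        ):
            # OoB cancel: resulting ctxcs have `.canceller ==
            # parent.uid`, `._is_self_cancelled()` evals `True`,
            # `._scope` never gets cancelled -> hang on ctx exit..
            await portal.cancel_actor()
```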
So my plan is to factor this ^^ scenario into a standalone unit test
as well as another test which consumes from a low-level `trio`-only
version of **this** script-scenario to sanity check the interaction
of the unmasker-with-ignore-cases usage implicitly around a ctx ep.
Call it `test_trioisms::test_unmask_aclose_as_checkpoint_on_aexit` and
parametrize all script-mod `.main()` toggles including `.xfails()` for
the `raise_unmasked=False` cases.
So we can parametrize in various toggles to `main()` including,
- `child_errors_mid_stream: bool` which now also drives whether an
additional, and otherwise non-affecting, `_tn` is allocated in
the `finite_stream_to_rent()` subtask; only in the early
stream-termination case does it seem to produce a masked outcome?
* see surrounding notes within.
- `raise_unmasked: bool` to toggle whether the embedded unmasker fn
will actually raise the masked user RTE; this enables demoing the
masked outcomes via simple switch and makes it easy to wrap them
as `pytest.xfail()` outcomes.
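Plausible pytest wiring for the toggles (exact plumbing may differ):

```python
import pytest

@pytest.mark.parametrize('child_errors_mid_stream', [True, False])
@pytest.mark.parametrize('raise_unmasked', [True, False])
def test_unmask_aclose_as_checkpoint_on_aexit(
    child_errors_mid_stream: bool,
    raise_unmasked: bool,
):
    if not raise_unmasked:
        pytest.xfail('masked outcome expected when not unmasking')
    ...  # drive the script-mod's `main()` with the toggles
```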
Also in support,
- use `.trionics.collapse_eg()` around the root tn to ensure when
unmasking we can catch the EG-unwrapped RTE easily from a test.
- flip stream `msg` logs to `.debug()` to reduce console noise.
- tweak mod's script iface to report/trace unexpected non-RTEs.
Since it turns out there's even case(s) in `trio` core that are guilty
(of implementing things like checkpoints in exc handlers), this adds
facility for ignoring explicit cases via `inspect.FrameInfo` field
matching from the unmasked `exc_ctx` within
`maybe_raise_from_masking_exc()`.
Impl deats,
- use `inspect.getinnerframes()/getmodule()` to extract the equivalent
"guilty place in code" which raised the masked error which we'd like
to ignore and **not unmask**.
- start a `_mask_cases: dict` which describes the entries to ignore
by matching against a specific `FrameInfo`'s fields indexed from
`getinnerframes()`.
- describe in that table the case i hit with `trio.WouldBlock` being
always masked by a `Cancelled` due to the way `trio.Lock.acquire()`
implements the blocking case in the would-block handler..
- always call into a new `is_expected_masking_case()` predicate (from
`maybe_raise_from_masking_exc()`) on matching `exc_ctx` types.
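A rough shape for the table + predicate (exact matched fields may
differ):

```python
import inspect
import trio

_mask_cases: dict[
    type[BaseException],  # masked exc type
    dict[str, str],  # `FrameInfo` fields to match
] = {
    trio.WouldBlock: {
        # `trio.Lock.acquire()`'s would-block handler
        'function': 'acquire',
        'module': 'trio._sync',
    },
}

def is_expected_masking_case(exc_ctx: BaseException) -> bool:
    case = _mask_cases.get(type(exc_ctx))
    if case is None:
        return False

    for fi in inspect.getinnerframes(exc_ctx.__traceback__):
        mod = inspect.getmodule(fi.frame)
        if (
            fi.function == case['function']
            and mod is not None
            and mod.__name__ == case['module']
        ):
            return True

    return False
```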
The correct ordering is to de-alloc the surrounding `service_n`
+ `trio.Event` **after** the `mng` teardown, ensuring
`mng.__aexit__()` can never hit a ref-error if it touches either (like
if a `tn` is passed to `maybe_open_context()`!).
Such that key->value pairs can be defined which should *never be*
unmasked, where,
- the keys are exc-types which might be masked, and
- the values are exc-types which masked the equivalent key.
For example, the default includes:
- KBI->taskc: a kbi should never be unmasked from its masking
`trio.Cancelled`.
For the impl, a new `do_warn: bool` in the fn-body determines the
primary guard for whether a warning or re-raising is necessary.
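The pairing table, sketched (the name here is made up):

```python
import trio

_never_unmask: dict[
    type[BaseException],  # exc-type which might be masked
    type[BaseException],  # exc-type doing the masking
] = {
    KeyboardInterrupt: trio.Cancelled,
}
```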
Including all caller usage throughout. Moving to a non-`except*` impl
means it's never needed as a signal from the caller - we can just catch
the beg outright (like we should have always been doing)..
That is from `maybe_raise_from_masking_exc()` thus minimizing us to
a single `except BaseException` block with logic branching for the beg
vs. `unmask_from` exc cases.
Also,
- raise val-err when `unmask_from` is not a `tuple`.
- tweak the exc-note warning format.
- drop all pausing from dev work.
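Approximate post-refactor shape (heavily trimmed):

```python
import trio
from contextlib import asynccontextmanager as acm

@acm
async def maybe_raise_from_masking_exc(
    unmask_from: tuple[type[BaseException], ...] = (trio.Cancelled,),
):
    if not isinstance(unmask_from, tuple):
        raise ValueError(
            '`unmask_from` must be a `tuple` of exc types'
        )
    try:
        yield
    except BaseException as _beg:
        if isinstance(_beg, BaseExceptionGroup):
            ...  # beg branch: catch/inspect the group outright
        elif isinstance(_beg, unmask_from):
            ...  # maybe-unmask branch using `_beg.__context__`
        raise
```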
Turns out we weren't despite the optional `stream_handler_nursery` input
to `Server.listen_on()`; fail over to the `Server._stream_handler_tn`
allocated during server setup in those cases.
Turns out I didn't read my own internals docs/comments and despite it
not being used previously, this adds the real use case: a root,
per-actor, scope which ensures parent comms are the last conc-thing to
be cancelled.
Also, the impl changes here make the test from 6410e45 (or wtv
it's rebased to) pass, i.e. we can support crash handling in the root
actor despite the root-tn having been (self) cancelled.
Superficial adjustments,
- rename `Actor._service_n` -> `._service_tn` everywhere.
- add asserts to `._runtime.async_main()` which ensure that any
`.trionics.maybe_open_nursery()` calls against optionally passed
`._[root/service]_tn` are allocated-if-not-provided (the
`._service_tn`-case being an i-guess-prep-for-the-future-anti-pattern
Bp).
- obvi adjust all internal usage to match new naming.
Serious/real-use-case changes,
- add (back) an `Actor._root_tn` which sits a scope "above" the
service-tn and is either,
+ assigned in `._runtime.async_main()` for sub-actors OR,
+ assigned in `._root.open_root_actor()` for the root actor.
**THE primary reason** to keep this "upper" tn is that during
a full-`Actor`-cancellation condition (more details below) we want to
ensure that the IPC connection with a sub-actor's parent is **the last
thing to be cancelled**; this is most simply implemented by ensuring
that the `Actor._parent_chan: .ipc.Channel` is handled in an upper
scope in `_rpc.process_messages()`-subtask-terms.
- for the root actor this `root_tn` is allocated in `.open_root_actor()`
body and assigned as such.
- extend `Actor.cancel_soon()` to be cohesive with this entire teardown
"policy" by scheduling a task in the `._root_tn` which,
* waits for the `._service_tn` to complete and then,
* cancels the `._root_tn.cancel_scope`,
* includes "sclangy" console logging throughout.
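The gist of that extension (simplified; the "service-tn done" signal is
a made-up event for the sketch):

```python
async def cancel_soon(self) -> None:
    async def root_cancels_after_services():
        # parent IPC lives in the root scope, so it's the last
        # conc-thing to be torn down
        await self._service_tn_done.wait()  # hypothetical event
        self._root_tn.cancel_scope.cancel()

    self._root_tn.start_soon(root_cancels_after_services)
```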
Such that we audit the `shield=root_tn.cancel_scope.cancel_called,`
passed to `await debug._maybe_enter_pm()` in the `open_root_actor()`
exit handler block.
I masked it bc it doesn't seem to actually work for the case I was
testing (`emsd` clobbering a `paperboi` in `piker`..) but figured I'd
leave it as a reminder for solving this problem more generally (#320)
since this is likely the place in the code for a soln.
When i tested it in my case it just resulted in a hang around the `with
debug.acquire_debug_lock()` for some reason? Can't remember if the child
ended up being able to REPL without issue though..
Such that we handle them despite a cancellation condition. This is
almost always the case, that `root_tn.cancel_scope.cancel_called` is
set, by the time the `debug._maybe_enter_pm()` hits. Previously I guess we
just weren't actually ever REPL-debugging such cases?
TODO, still needs a test obvi!
That is the `target` can declare a `chan: LinkedTaskChannel` instead of
`to_trio`/`from_aio`.
To support it,
- change `.started()` -> the more appropriate `.started_nowait()` which
can be called sync from the aio child task.
- adjust the `provide_channels` assert to accept either fn sig
declaration (for now).
Still needs test(s) obvi..
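The two accepted `target` sig styles, roughly:

```python
import tractor

async def old_style(to_trio, from_aio):
    # prior sig: explicit mem-chan endpoints
    to_trio.send_nowait('started')

async def new_style(chan: tractor.to_asyncio.LinkedTaskChannel):
    # new sig: the channel itself; `.started_nowait()` is sync
    # callable from the aio child task
    chan.started_nowait('started')
```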