Since we use basically the exact same set of logic in
`Portal.open_context()` when expecting the first `'started'` msg, factor
and generalize `._streaming._raise_from_no_yield_msg()` into a new
`._exceptions._raise_from_no_key_in_msg()` (as per the lingering todo),
which obvi requires a more generalized / optional signature including
a caller-specific `log` obj. Obvi call the new func from all the other
modules X)
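For reference, a rough sketch of the shape such a generalized helper might take (the signature and the error handling details here are assumptions for illustration; only the func name comes from the note above):

```python
import logging

class MessagingError(Exception):
    "stand-in for the lib's IPC-msg error type"

def _raise_from_no_key_in_msg(
    msg: dict,
    key: str,                 # e.g. 'started' or 'yield'
    src_err: KeyError,
    log: logging.Logger = logging.getLogger(__name__),
) -> None:
    # if the far end sent an 'error' msg, surface that instead of
    # a (confusing) local `KeyError` about the missing key.
    if 'error' in msg:
        log.warning(f'Got a remote error instead of {key!r}: {msg}')
        raise RuntimeError(msg['error']) from src_err  # stand-in for real unpacking

    raise MessagingError(
        f'Expected a {key!r} msg but got: {msg}'
    ) from src_err
```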
Apparently (and i don't know if this was always broken [i feel like no?]
or is a recent change to stdlib's `logging` stuff) we need to increment the
`stacklevel` input by one for our custom level methods now? Without this
you're going to see the path to the method's-callstack-frame on every
emission instead of to the caller's. I first noticed this when debugging
the workspace layer spawning in `modden.bigd` and then verified it in
other dependent projects..
I guess we should add some tests for this as well XD
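For reference, a minimal sketch of the bump, assuming a `logging.LoggerAdapter`-style wrapper with a custom level method (the adapter/method names are illustrative; the stdlib `stacklevel` kwarg itself is real, py3.8+):

```python
import logging

class StackLevelAdapter(logging.LoggerAdapter):

    def cancel(self, msg: str) -> None:
        # bump `stacklevel` so the emitted record points at *our
        # caller's* frame, not this wrapper method's; the exact value
        # depends on how many wrapper layers sit in between (which is
        # where the off-by-one showed up).
        self.logger.log(logging.INFO, msg, stacklevel=2)

logging.basicConfig(
    format='%(filename)s:%(lineno)d %(message)s',
    level=logging.INFO,
)
log = StackLevelAdapter(logging.getLogger('demo'), {})
log.cancel('task cancelled')  # now reports this line, not the adapter's
```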
Took me longer than I wanted to figure out the source of
a failed-response to a remote-cancellation (in this case in `modden`
where a client was cancelling a workspace layer.. but disconnects before
receiving the ack msg) that was triggering an IPC error when sending the
error msg for the cancellation of an `Actor._cancel_task()`; since this
(non-rpc) `._invoke()` task was trying to send to a now disconnected
canceller it was resulting in a `BrokenPipeError` (or similar) error.
Now, we except for such IPC errors (sketched further below) and only
raise them when,
1. the transport `Channel` is for sure up (because otherwise what's the
point of trying to send an error on the thing that caused it..)
2. it's definitely for handling an RPC task
Similarly if the entire main invoke `try:` excepts,
- we only hide the call-stack frame from the debugger (with
`__tracebackhide__: bool`) if it's an RPC task that has a connected
channel since we always want to see the frame when debugging internal
task or IPC failures.
- we don't bother trying to send errors to the context caller (actor)
when it's a non-RPC request since failures on actor-runtime-internal
tasks shouldn't really ever be reported remotely, only maybe raised
locally.
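A standalone sketch of that relay-or-swallow shape (the param names like `is_rpc`/`chan` are illustrative stand-ins, not the actual `._invoke()` internals):

```python
import trio

async def maybe_relay_error(chan, err_msg: dict, is_rpc: bool, log) -> None:
    try:
        await chan.send(err_msg)

    except (trio.BrokenResourceError, trio.ClosedResourceError):
        # only bubble the send failure when it can actually matter:
        # the channel is (still) up AND this is a real RPC task.
        if is_rpc and chan.connected():
            raise
        log.warning(f'Failed relaying error to disconnected peer {chan}')

# and in the outer handler, only hide the frame on the "expected" path:
#   __tracebackhide__: bool = is_rpc and chan.connected()
```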
Also some other tidying,
- this properly corrects for the self-cancel case where an RPC context
is cancelled due to a local (runtime) task calling a method like
`Actor.cancel_soon()`. We now set our own `.uid` as the
`ContextCancelled.canceller` value so that other-end tasks know that
the cancellation was due to a self-cancellation by the actor itself.
We still need to properly test for this though!
- add a more detailed module doc-str.
- more explicit imports for `trio` core types throughout.
This took way too long to get right but hopefully will give us grok-able
and correct context exit semantics going forward B)
The main fixes were:
- always shielding the `MsgStream.aclose()` call on teardown to avoid
bubbling a `Cancelled` (see the sketch after this list).
- properly absorbing any `ContextCancelled` in cases due to "self
cancellation" using the new `Context.canceller` in the logic.
- capturing any error raised by the `Context.result()` call in the
"normal exit, result received" case and setting it as the
`Context._local_error` so that self-cancels can be easily measured via
`Context.cancelled_caught` in the same way as remote-error caused
cancellations.
- extremely detailed comments around all of the cancellation-error cases
to avoid ever getting confused about the control flow in the future XD
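E.g. the shield fix mentioned in the first bullet is just the standard `trio` pattern (a minimal sketch, not the verbatim teardown code):

```python
import trio

async def teardown_stream(stream) -> None:
    # shield the close so a pending cancellation can't interrupt it
    # (and then bubble a `trio.Cancelled` out of the exit block).
    with trio.CancelScope(shield=True):
        await stream.aclose()
```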
As part of extremely detailed inter-peer-actor testing, add much more
granular `Context` cancellation state tracking via the following (new)
fields:
- `.canceller: tuple[str, str]` the uid of the actor responsible for
the cancellation condition - always set by
`Context._maybe_cancel_and_set_remote_error()` and replaces
`._cancelled_remote` and `.cancel_called_remote`. If set, this value
should normally always match a value from some `ContextCancelled`
raised or caught by one side of the context.
- `._local_error` which is always set to the locally raised (and caller
or callee task's scope-internal) error which caused any
eventual cancellation/error condition and thus any closure of the
context's per-task-side-`trio.Nursery`.
- `.cancelled_caught: bool` is now always `True` whenever the local task
catches (or "silently absorbs") a `ContextCancelled` (a `ctxc`) that
indeed originated from one of the context's linked tasks or any other
context which raised its own `ctxc` in the current `.open_context()` scope.
=> whenever there is a case where no `ContextCancelled` was raised
**in** the `.open_context().__aexit__()` (eg. `ctx.result()` called
after a call to `ctx.cancel()`), we still consider the context as
having "caught a cancellation" since the `ctxc` was indeed silently
handled by the cancel requester (see the usage sketch after this
list); all other error cases are already represented by mirroring the
state of the `._scope: trio.CancelScope`
=> IOW there should be **no case** where an error is **not raised** in
the context's scope and `.cancelled_caught: bool == False`, i.e. no
case where `._scope.cancelled_caught == False and ._local_error is not
None`!
- always raise any `ctxc` from `.open_stream()` if `._cancel_called ==
True` - if the cancellation request has not already resulted in
a `._remote_error: ContextCancelled` we raise a `RuntimeError` to
indicate improper usage to the guilty side's task code.
- make `._maybe_raise_remote_err()` a sync func and don't raise
any `ctxc` which is matched against a `.canceller` determined to
be the current actor, aka a "self cancel", and always set the
`._local_error` to any such `ctxc`.
- `.side: str` taken from inside `.cancel()` and unused as of now since
it might be better re-written as a similar `.is_opener() -> bool`?
- drop unused `._started_received: bool`..
- TONS and TONS of detailed comments/docs to attempt to explain all the
possible cancellation/exit cases and how they should exhibit as either
silent closes or raises from the `Context` API!
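To make the new state tracking concrete, an illustrative root-side usage sketch assuming a trivial `@tractor.context` child endpoint; the endpoint, actor name and asserts are illustrative and just restate the semantics described above, not verbatim test code:

```python
import trio
import tractor

@tractor.context
async def sleeper(ctx: tractor.Context) -> None:
    # child-side task: ack the ctx open then wait to be cancelled.
    await ctx.started()
    await trio.sleep_forever()

async def main() -> None:
    async with tractor.open_nursery() as an:
        portal = await an.start_actor('child', enable_modules=[__name__])

        async with portal.open_context(sleeper) as (ctx, first):
            await ctx.cancel()  # we are the cancel requester

        # the resulting ctxc is silently absorbed on exit since we
        # requested it, but the state still records what happened:
        assert ctx.cancel_called
        assert ctx.cancelled_caught
        assert ctx.canceller == tractor.current_actor().uid

        await portal.cancel_actor()

if __name__ == '__main__':
    trio.run(main)
```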
Adjust the `._runtime._invoke()` code to match:
- use `ctx._maybe_raise_remote_err()` in `._invoke()`.
- adjust to new `.canceller` property.
- more type hints.
- better `log.cancel()` msging around self-cancels vs. peer-cancels.
- always set the `._local_error: BaseException` for the "callee" task
just like `Portal.open_context()` now will do B)
Prior we were raising any `Context._remote_error` directly and doing
(more or less) the same `ContextCancelled` "absorbing" logic (well
kinda) inline in the exit block; instead delegate to the method,
calling it specifically in the `.__aexit__()` phase, to ensure remote,
runtime-internal, and locally raised error-during-cancelled-handling
exceptions are NEVER masked by a local `ContextCancelled` or any
exception group of `trio.Cancelled`s.
Also adds a ton of details to doc strings including extreme detail
surrounding the `ContextCancelled` raising cases and their processing
inside `.open_context()`'s exception handler blocks.
Details, details:
- internal rename `err`/`_err` stuff to just be `scope_err` since it's
effectively the error bubbled up from the context's surrounding (and
cross-actor) "scope".
- always shield `._recv_chan.aclose()` to avoid any `Cancelled` from
masking the `scope_err` with a runtime related `trio.Cancelled`.
- explicitly catch the specific set of `scope_err: BaseException` that
we can reasonably expect to handle instead of the catch-all parent
type including exception groups, cancels and KBIs.
Well first off, turns out it's never used and generally speaking
doesn't seem to help much with "runtime hacking/debugging"; why would
we need to "fabricate" a msg when `.cancel()` is called to self-cancel?
Also (and since `._maybe_cancel_and_set_remote_error()` now takes an
`error: BaseException` as input and thus expects error-msg unpacking
prior to being called), we now manually set `Context._cancel_msg: dict`
just prior to any remote error assignment - so in any case where we
would have fabbed a "cancel msg" near calling `.cancel()`, we now just
do the manual assign.
In this vein some other subtle changes:
- obviously don't set `._cancel_msg` in `.cancel()` since it's no longer
an input.
- generally do walrus-style `error := unpack_error()` before applying
and setting remote error-msg state.
- always raise any `._remote_error` in `.result()` instead of returning
the exception instance and check before AND after the underlying mem
chan read.
- add notes/todos around `raise self._remote_error from None` masking of
(runtime) errors in `._maybe_raise_remote_err()` and use it inside
`.result()` since we had the inverse duplicate logic there anyway..
Further, this adds and extends a ton of (internal) interface docs and
details comments around the `Context` API including many subtleties
pertaining to calling `._maybe_cancel_and_set_remote_error()`.
Bump type annotations to 3.10+ style throughout module as well as fill
out doc strings a bit. Inside `unpack_error()` pop any `error_dict: dict`
and,
- return `None` early if not found,
- otherwise pass it directly as `**error_dict` to the error constructor
instead of doing a double field read.
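A rough sketch of that shape (the names and boxed-error type are placeholders, not the exact internals):

```python
class BoxedRemoteError(Exception):
    'placeholder for the real boxed/remote error type'
    def __init__(self, **msgdata) -> None:
        super().__init__(msgdata.get('tb_str', ''))
        self.msgdata = msgdata

def unpack_error(msg: dict) -> BoxedRemoteError | None:
    error_dict: dict | None = msg.get('error')
    if error_dict is None:
        # not an error msg -> nothing to unpack
        return None

    # pass the fields straight to the constructor instead of
    # re-reading each one from the msg.
    return BoxedRemoteError(**error_dict)
```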
Since both `MsgStream.receive()` and `.receive_nowait()` need the same
raising logic when a non-stream msg arrives (so that maybe an
appropriate IPC translated error can be raised) move the `KeyError`
handler code into a new `._streaming._raise_from_no_yield_msg()` func
and call it from both methods to make the error-interface-raising
symmetrical across both methods.
Previously we weren't raising a remote error if the local scope was
cancelled during a call to `Context.result()` which is problematic if
the caller WAS NOT the requester for said remote cancellation; in that
case we still want a `ContextCancelled` raised with the `.canceller:
str` set to the cancelling actor uid.
Further fix a naming bug where the (seemingly older) `._remote_err` was
being set to such an error instead of `._remote_error` XD
We were using an `all(<yielded values>)` condition which obviously won't
work if the batched managers yield any non-truthy value. So instead seed
the `unwrapped: dict` with the `id(mngrs)` and only unblock once all
values have been filled in to be something that is not that value.
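A minimal sketch of the seeding fix, assuming a gather-style helper that enters a batch of async context managers concurrently (the helper name and structure are illustrative, and a fresh `object()` is used as the sentinel here instead of `id(mngrs)`):

```python
import trio
from contextlib import asynccontextmanager

@asynccontextmanager
async def gather_contexts(mngrs):
    # seed every slot with a unique sentinel so falsy yields
    # (None/False/0) can't keep us blocked forever.
    seed = object()
    unwrapped: dict = {id(m): seed for m in mngrs}
    all_entered = trio.Event()

    async def _enter(mngr) -> None:
        async with mngr as value:
            unwrapped[id(mngr)] = value
            # unblock only once every sentinel has been overwritten.
            if all(val is not seed for val in unwrapped.values()):
                all_entered.set()
            await trio.sleep_forever()

    async with trio.open_nursery() as tn:
        for mngr in mngrs:
            tn.start_soon(_enter, mngr)

        await all_entered.wait()
        yield tuple(unwrapped[id(m)] for m in mngrs)
        tn.cancel_scope.cancel()
```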
Detect if the input ref is a non-func (like an `object` instance) in
which case grab its type name using `type()`. Wrap all the name-getting
into a new `_mk_fqpn()` static meth: gets the "fully qualified path
name" and returns path and name in tuple; port other methds to use it.
Refine and update the docs B)
- `Context._cancel_called_remote` -> `._cancelled_remote` since "called"
implies the cancellation was "requested" when it could be due to
another error and the actor uid is the value - only set once the far
end task scope is terminated due to either error or cancel, which has
nothing to do with *what* caused the cancellation.
- `Actor._cancel_called_remote` -> `._cancel_called_by_remote` which
emphasizes that this variable is **only set** IFF some remote actor
**requested that** this actor's runtime be cancelled via
`Actor.cancel()`.
Turns out you can get a case where you might be opening multiple
ctx-streams concurrently and during the context opening phase you block
for all contexts to open, but then when you eventually start opening
streams some slow-to-start context has caused the others to become
overrun.. so we need to let the caller control whether that's an
error ;)
This also needs a test!
Because obviously we probably want to support `allow_overruns` on the
remote callee side as well XD
Only found the bugs fixed in this patch thanks to writing a much
more exhaustive test set for overrun cases B)
This actually caught further runtime bugs so it's gud i tried..
Add overrun-ignore enabled / disabled cases and error catching for all
of them. More or less this should cover every possible outcome when
it comes to setting `allow_overruns: bool` i hope XD
This adds remote cancellation semantics to our `tractor.Context`
machinery to more closely match that of `trio.CancelScope` but
with operational differences to handle the nature of parallel tasks interoperating
across multiple memory boundaries:
- if an actor task cancels some context it has opened via
`Context.cancel()`, the remote (scope linked) task will be cancelled
using the normal `CancelScope` semantics of `trio` meaning the remote
cancel scope surrounding the far side task is cancelled and
`trio.Cancelled`s are expected to be raised in that scope as per
normal `trio` operation, and in the case where no error is raised
in that remote scope, a `ContextCancelled` error is raised inside the
runtime machinery and relayed back to the opener/caller side of the
context.
- if any actor task cancels a full remote actor runtime using
`Portal.cancel_actor()` the same semantics as above apply except every
other remote actor task which also has an open context with the actor
which was cancelled will also be sent a `ContextCancelled` **but**
with the `.canceller` field set to the uid of the original cancel
requesting actor.
This changeset also includes a more "proper" solution to the issue of
"allowing overruns" during streaming without attempting to implement any
form of IPC streaming backpressure. Implementing task-granularity
backpressure cross-process turns out to be more or less impossible
without augmenting our streaming protocol (likely at the cost of
performance). Further allowing overruns requires special care since
any blocking of the runtime RPC msg loop task effectively can block
control msgs such as cancels and stream terminations.
The implementation details per abstraction layer are as follows.
._streaming.Context:
- add a new constructor factory func `mk_context()` which provides
a strictly private init-er whilst allowing us to not have to define
an `.__init__()` on the type def (see the sketch after this sub-list).
- add public `.cancel_called` and `.cancel_called_remote` properties.
- general rename of what was the internal `._backpressure` var to
`._allow_overruns: bool`.
- move the old contents of `Actor._push_result()` into a new
`._deliver_msg()` allowing for better encapsulation of per-ctx
msg handling.
- always check for received 'error' msgs and process them with the new
`_maybe_cancel_and_set_remote_error()` **before** any msg delivery to
the local task, thus guaranteeing error and cancellation handling
despite any overflow handling.
- add a new `._drain_overflows()` task-method for use with new
`._allow_overruns: bool = True` mode.
- add back a `._scope_nursery: trio.Nursery` (allocated in
`Portal.open_context()`) whose sole purpose is to spawn a single task
which runs the above method; anything else is an error.
- augment `._deliver_msg()` to start a task and run the above method
when operating in no overrun mode; the task queues overflow msgs and
attempts to send them to the underlying mem chan using a blocking
`.send()` call.
- on context exit, any existing "drainer task" will be cancelled and
remaining overflow queued msgs are discarded with a warning.
- rename `._error` -> `._remote_error` and set it in a new method
`_maybe_cancel_and_set_remote_error()` which is called before
processing any received 'error' msg (as per above).
- adjust `.result()` to always call `._maybe_raise_remote_err()` at its
start such that whenever a `ContextCancelled` arrives we do logic for
whether or not to immediately raise that error or ignore it due to the
current actor being the one who requested the cancel, by checking the
error's `.canceller` field.
- set the default value of `._result` to be `id(Context())` thus avoiding
conflict with any `.result()` actually being `False`..
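A minimal sketch of the private-init + factory idea referenced above (the fields are illustrative, not the full `Context` definition; a unique-object sentinel is used for the "no result yet" default):

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Context:
    chan: Any
    cid: str
    _allow_overruns: bool = False
    _remote_error: BaseException | None = None
    # a unique sentinel so a legit `False`/`None` result can't be
    # mistaken for "no result yet":
    _result: Any = field(default_factory=object)

def mk_context(chan: Any, cid: str, **kwargs) -> Context:
    # the only place a `Context` should ever be constructed; keeps the
    # type def free of a hand-written `__init__()`.
    return Context(chan=chan, cid=cid, **kwargs)
```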
._runtime.Actor:
- augment `.cancel()` and `._cancel_task()` and `.cancel_rpc_tasks()` to
take a `requesting_uid: tuple` indicating the source actor of every
cancellation request.
- pass the new `Context._allow_overruns` through `.get_context()`
- call the new `Context._deliver_msg()` from `._push_result()` (since
factoring out that method's contents).
._runtime._invoke:
- `TaskStatus.started()` now hands back a `Context` (unless an error is raised)
instead of the cancel scope to make it easy to set/get state on that
context for the purposes of cancellation and remote error relay.
- always raise any remote error via `Context._maybe_raise_remote_err()`
before doing any `ContextCancelled` logic.
- assign any `Context._cancel_called_remote` set by the `requesting_uid`
cancel methods (mentioned above) to the `ContextCancelled.canceller`.
._runtime.process_messages:
- always pass a `requesting_uid: tuple` to `Actor.cancel()` and
`._cancel_task()` so that any corresponding `ContextCancelled.canceller`
can be set inside `._invoke()`.
To handle both remote cancellation cases this adds
`ContextCancelled.canceller: tuple`, the uid of the cancel requesting
actor, which is expected to be set by the runtime when servicing any
remote cancel request. This makes it possible for `ContextCancelled`
receivers to know whether "their actor runtime" is the source of the
cancellation.
Also add an explicit `RemoteActorError.src_actor_uid` which better formalizes
the notion of "which remote actor" the error originated from.
Both of these new attrs are expected to be packed in the `.msgdata` when
the errors are loaded locally.
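E.g. an illustrative receiver-side check using the new field (assuming the semantics described above hold; `current_actor()` is the existing runtime accessor):

```python
import tractor
from tractor import ContextCancelled

async def absorb_self_cancel(awaitable) -> None:
    try:
        await awaitable
    except ContextCancelled as ctxc:
        if ctxc.canceller != tractor.current_actor().uid:
            # some *other* actor requested the cancel -> bubble it up.
            raise
        # else: "our actor runtime" was the requester -> safe to absorb.
```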
Previously we were leaking our (pdb++) override into the Python runtime
which would always result in a runtime error whenever `breakpoint()` is
called outside our runtime, i.e. after exit of the root actor. This
explicitly restores any previous hook override (detected during startup)
or deletes the hook and restores the environment if none existed prior.
Also adds a new WIP debugging example script to ensure breakpointing
works as normal after runtime close; this will be added to the test
suite.
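A minimal sketch of the save/restore idea using the stdlib hooks (the real internals differ; `sys.breakpointhook` and `PYTHONBREAKPOINT` are the actual stdlib/env knobs, the module path below is hypothetical):

```python
import os
import sys

# captured at startup: whatever hook/env the user already had in place
_orig_hook = sys.breakpointhook
_orig_env: str | None = os.environ.get('PYTHONBREAKPOINT')

def install_debug_hook(our_hook) -> None:
    os.environ['PYTHONBREAKPOINT'] = 'mylib._debug._set_trace'  # hypothetical
    sys.breakpointhook = our_hook

def restore_builtin_behaviour() -> None:
    # on root actor exit: put back the prior hook, or wipe ours
    # entirely if nothing was overridden before startup.
    sys.breakpointhook = _orig_hook
    if _orig_env is None:
        os.environ.pop('PYTHONBREAKPOINT', None)
    else:
        os.environ['PYTHONBREAKPOINT'] = _orig_env
```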
Makes the broadcast test suite not hang xD, and is our expected default
behaviour. Also removes a ton of commented legacy cruft from before the
refactor to remove the `.receive()` recursion and fixes some typing.
Oh right, and in the case where there's only one subscriber left we warn
log about it since in theory we could actually entirely unwind the
bcaster back to the original underlying, though not sure if that's sane
or works for some use cases (like wanting to have some other subscriber
get added dynamically later).
Since one-way streaming can be accomplished by just *not* sending on one
side (and/or thus wrapping such usage in a more restrictive API), we
just drop the recv-only parent type. The only differing method was
`MsgStream.send()`, now merged in. Further in usage of `.subscribe()`
we monkey patch the underlying stream's `.send()` onto the delivered
broadcast receiver so that subscriber tasks can two-way stream as though
using the stream directly.
This allows us to more definitively drop `tractor.open_stream_from()` in
the longer run if we so choose as well; note currently this will
potentially create an issue if a caller tries to `.send()` on such a one
way stream.
Driven by a bug found in `piker` where we'd get an inf recursion error
due to `BroadcastReceiver.receive()` being called when consumer tasks
are awoken but no value is ready to `.receive_nowait()`.
This new rework takes an approach closer to the interface and internals
of `trio.MemoryReceiveChannel` particularly in terms of,
- implementing a `BroadcastReceiver.receive_nowait()` and using it
within the async `.receive()`.
- failing over to an internal `._receive_from_underlying()` when the
`_nowait()` call raises `trio.WouldBlock`.
- adding `BroadcastState.statistics()` for debugging and testing the
removal of recursion from `.receive()`.
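The failover shape, roughly (method names from this note; the bodies are stubbed since only the control flow matters here):

```python
import trio

class BroadcastReceiver:

    def receive_nowait(self):
        # return a queued value or raise `trio.WouldBlock` when the
        # local subscriber queue is empty (stubbed here).
        raise trio.WouldBlock

    async def _receive_from_underlying(self):
        # wait on the wrapped underlying channel and fan the value out
        # to all subscriber queues (stubbed here).
        ...

    async def receive(self):
        try:
            return self.receive_nowait()
        except trio.WouldBlock:
            # wait on the underlying channel instead of recursing
            # back into `.receive()`.
            return await self._receive_from_underlying()
```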
We weren't doing this originally I *think* just because of the path
dependent nature of the way the code was developed (originally being
mega pedantic about one-way vs. bidirectional streams) but, it doesn't
seem like there's any issue just calling the stream's `.aclose()`; it
also has the benefit of just being less code and logic checks B)
When backpressure is used and a feeder mem chan breaks during msg
delivery (usually because the IPC allocating task already terminated)
instead of raising we simply warn as we do for the non-backpressure
case.
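Roughly, the delivery path now does something like (illustrative only, not the runtime's actual msg loop code):

```python
import trio

async def deliver(send_chan: trio.MemorySendChannel, msg: dict, log) -> None:
    try:
        await send_chan.send(msg)  # backpressure-style blocking send
    except (trio.BrokenResourceError, trio.ClosedResourceError):
        # the receiving task (and its feeder mem chan) already tore
        # down; warn instead of crashing the msg loop.
        log.warning(f'Failed to deliver {msg}: receiver already closed')
```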
Also, add a proper `Actor.is_arbiter` test inside `._invoke()` to avoid
doing an arbiter-registry lookup if the current actor **is** the
registrar.
The stdlib has all sorts of muckery with ignoring SIGINT in the
`Pdb._cmdloop()` but here we just override all that since we don't trust
their decisions about cancellation handling whatsoever. Adds
a `Lock.repl: MultiActorPdb` attr which is set by any task which
acquires the root TTY lock, indicating (via actor global state) that the
current actor is using the debugger REPL and can be expected to re-draw
the prompt on SIGINT. Further we mask out log messages from any actor
who also has the `shield_sigint_handler()` enabled to avoid logging
noise when debugging.
If we pack the nursery parent task's error into the `errors` table
directly in the handler, we don't need to specially handle packing that
same error into any exception group raised while handling sub-actor
cancellation; drops some ugly indentation ;)
Pretty sure this is the final touch to alleviate all our debug lock
headaches! Instead of trying to revert to the "last" handler (as `pdb`
does internally in the stdlib) we always just revert to the handler
`trio` registers during startup. Further this seems to allow cancelling
the root-side locking task if it's detected as stale IFF we only do this
when the root actor is in a "no more IPC peers" state.
Deatz:
- (always) set `._debug.Lock._trio_handler` as the `trio` version, not
some last used handler to make sure we're getting the ctrl-c handling
we want when not in debug mode.
- assign the trio handler in `open_root_actor()`
`._runtime._async_main()` to be sure it's applied in subactors as well
as the root.
- only do debug lock blocking and root-side-locking-task cancels when
a "no peers" condition is detected in the root actor: i.e. no IPC
channels are detected by the root meaning it's impossible any actor
has a sane lock-state ongoing for debug mode.
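A minimal sketch of the capture/restore idea (the `Lock._trio_handler` attr name comes from the bullets above; the rest is illustrative):

```python
import signal

class Lock:
    # captured once, right after startup has let `trio` install its
    # own SIGINT handler:
    _trio_handler = None

def capture_trio_sigint_handler() -> None:
    Lock._trio_handler = signal.getsignal(signal.SIGINT)

def restore_trio_sigint_handler() -> None:
    # always revert to exactly what trio installed, never to whatever
    # handler happened to be active "last" (eg. pdb's).
    if Lock._trio_handler is not None:
        signal.signal(signal.SIGINT, Lock._trio_handler)
```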