As part of solving some final edge cases todo with inter-peer remote
cancellation (particularly a remote cancel from a separate actor
tree-client hanging on the request side in `modden`..) I needed less
dense, more line-delimited log msg formats when understanding ipc
channel and context cancels from console logging; this adds a ton of
that to:
- `._invoke()` which now does,
- better formatting of `Context`-task info as multi-line
`'<field>: <value>\n'` messages,
- use of `trio.Task` (from `.lowlevel.current_task()` for full
rpc-func namespace-path info,
- better "msg flow annotations" with `<=` for understanding
`ContextCancelled` flow.
- `Actor._stream_handler()` where in we break down IPC peers reporting
better as multi-line `|_<Channel>` log msgs instead of all jammed on
one line..
- `._ipc.Channel.send()` use `pformat()` for repr of packet.
Also tweak some optional deps imports for debug mode:
- add `maybe_import_gb()` for attempting to import `greenback`.
- maybe enable `stackscope` tree pprinter on `SIGUSR1` if installed.
Add a further stale-debugger-lock guard before removal:
- read the `._debug.Lock.global_actor_in_debug: tuple` uid and possibly
`maybe_wait_for_debugger()` when the child-user is known to have
a live process in our tree.
- only cancel `Lock._root_local_task_cs_in_debug: CancelScope` when
the disconnected channel maps to the `Lock.global_actor_in_debug`,
though not sure this is correct yet?
Started adding missing type annots in sections that were modified.
Since it's generally useful to know who is the cause of an overrun (say
bc you want your system to then adjust the writer side to slow tf down)
might as well pack an extra `.sender: tuple[str, str]` actor uid field
which can be relayed through `RemoteActorError` boxing. Add an extra
case for the exc-type to `unpack_error()` to match B)
Originally designed and used throughout `piker`, the subtype adds some
handy pprinting and field diffing extras often handy when viewing struct
types in logging or REPL console interfaces B)
Obvi this rejigs the `tractor.msg` mod into a sub-pkg and moves the
existing namespace obj-pointer stuff into a new `.msg.ptr` sub mod.
Since we use basically the exact same set of logic in
`Portal.open_context()` when expecting the first `'started'` msg factor
and generalize `._streaming._raise_from_no_yield_msg()` into a new
`._exceptions._raise_from_no_key_in_msg()` (as per the lingering todo)
which obvi requires a more generalized / optional signature including
a caller specific `log` obj. Obvi call the new func from all the other
modules X)
Apparently (and i don't know if this was always broken [i feel like no?]
or is a recent change to stdlib's `logging` stuff) we need increment the
`stacklevel` input by one for our custom level methods now? Without this
you're going to see the path to the method's-callstack-frame on every
emission instead of to the caller's. I first noticed this when debugging
the workspace layer spawning in `modden.bigd` and then verified it in
other depended projects..
I guess we should add some tests for this as well XD
Took me longer then i wanted to figure out the source of
a failed-response to a remote-cancellation (in this case in `modden`
where a client was cancelling a workspace layer.. but disconnects before
receiving the ack msg) that was triggering an IPC error when sending the
error msg for the cancellation of a `Actor._cancel_task()`, but since
this (non-rpc) `._invoke()` task was trying to send to a now
disconnected canceller it was resulting in a `BrokenPipeError` (or similar)
error.
Now, we except for such IPC errors and only raise them when,
1. the transport `Channel` is for sure up (bc ow what's the point of
trying to send an error on the thing that caused it..)
2. it's definitely for handling an RPC task
Similarly if the entire main invoke `try:` excepts,
- we only hide the call-stack frame from the debugger (with
`__tracebackhide__: bool`) if it's an RPC task that has a connected
channel since we always want to see the frame when debugging internal
task or IPC failures.
- we don't bother trying to send errors to the context caller (actor)
when it's a non-RPC request since failures on actor-runtime-internal
tasks shouldn't really ever be reported remotely, only maybe raised
locally.
Also some other tidying,
- this properly corrects for the self-cancel case where an RPC context
is cancelled due to a local (runtime) task calling a method like
`Actor.cancel_soon()`. We now set our own `.uid` as the
`ContextCancelled.canceller` value so that other-end tasks know that
the cancellation was due to a self-cancellation by the actor itself.
We still need to properly test for this though!
- add a more detailed module doc-str.
- more explicit imports for `trio` core types throughout.
Drop all the nested `@acm` blocks and defunct comments from initial
validations. Add some todos for cases that are still unclear such as
whether the caller / streamer should have `.cancelled_caught == True` in
it's teardown.
We can now make asserts on `.cancelled_caught` and `_remote_error` vs.
`_local_error`. Expect a runtime error when `Context.open_stream()` is
called AFTER `.cancel()` and the remote `ContextCancelled` hasn't
arrived (yet). Adjust to `'itself'` string in self-cancel case.
This took way too long to get right but hopefully will give us grok-able
and correct context exit semantics going forward B)
The main fixes were:
- always shielding the `MsgStream.aclose()` call on teardown to avoid
bubbling a `Cancelled`.
- properly absorbing any `ContextCancelled` in cases due to "self
cancellation" using the new `Context.canceller` in the logic.
- capturing any error raised by the `Context.result()` call in the
"normal exit, result received" case and setting it as the
`Context._local_error` so that self-cancels can be easily measured via
`Context.cancelled_caught` in same way as remote-error caused
cancellations.
- extremely detailed comments around all of the cancellation-error cases
to avoid ever getting confused about the control flow in the future XD
As part of extremely detailed inter-peer-actor testing, add much more
granular `Context` cancellation state tracking via the following (new)
fields:
- `.canceller: tuple[str, str]` the uuid of the actor responsible for
the cancellation condition - always set by
`Context._maybe_cancel_and_set_remote_error()` and replaces
`._cancelled_remote` and `.cancel_called_remote`. If set, this value
should normally always match a value from some `ContextCancelled`
raised or caught by one side of the context.
- `._local_error` which is always set to the locally raised (and caller
or callee task's scope-internal) error which caused any
eventual cancellation/error condition and thus any closure of the
context's per-task-side-`trio.Nursery`.
- `.cancelled_caught: bool` is now always `True` whenever the local task
catches (or "silently absorbs") a `ContextCancelled` (a `ctxc`) that
indeed originated from one of the context's linked tasks or any other
context which raised its own `ctxc` in the current `.open_context()` scope.
=> whenever there is a case that no `ContextCancelled` was raised
**in** the `.open_context().__aexit__()` (eg. `ctx.result()` called
after a call `ctx.cancel()`), we still consider the context's as
having "caught a cancellation" since the `ctxc` was indeed silently
handled by the cancel requester; all other error cases are already
represented by mirroring the state of the `._scope: trio.CancelScope`
=> IOW there should be **no case** where an error is **not raised** in
the context's scope and `.cancelled_caught: bool == False`, i.e. no
case where `._scope.cancelled_caught == False and ._local_error is not
None`!
- always raise any `ctxc` from `.open_stream()` if `._cancel_called ==
True` - if the cancellation request has not already resulted in
a `._remote_error: ContextCancelled` we raise a `RuntimeError` to
indicate improper usage to the guilty side's task code.
- make `._maybe_raise_remote_err()` a sync func and don't raise
any `ctxc` which is matched against a `.canceller` determined to
be the current actor, aka a "self cancel", and always set the
`._local_error` to any such `ctxc`.
- `.side: str` taken from inside `.cancel()` and unused as of now since
it might be better re-written as a similar `.is_opener() -> bool`?
- drop unused `._started_received: bool`..
- TONS and TONS of detailed comments/docs to attempt to explain all the
possible cancellation/exit cases and how they should exhibit as either
silent closes or raises from the `Context` API!
Adjust the `._runtime._invoke()` code to match:
- use `ctx._maybe_raise_remote_err()` in `._invoke()`.
- adjust to new `.canceller` property.
- more type hints.
- better `log.cancel()` msging around self-cancels vs. peer-cancels.
- always set the `._local_error: BaseException` for the "callee" task
just like `Portal.open_context()` now will do B)
Prior we were raising any `Context._remote_error` directly and doing
(more or less) the same `ContextCancelled` "absorbing" logic (well
kinda) in block; instead delegate to the method
Since it's handy to be able to debug the *writing* of this instance var
(particularly when checking state passed down to a child in
`Actor._from_parent()`), rename and wrap the underlying
`Actor._reg_addrs` as a settable `@property` and add validation to
the `.setter` for sanity - actor discovery is a critical functionality.
Other tweaks:
- fix `.cancel_soon()` to pass expected argument..
- update internal runtime error message to be simpler and link to GH issues.
- use new `Actor.reg_addrs` throughout core.
Tests that appropriate `Context` exit state, the relay of
a `ContextCancelled` error and its `.canceller: tuple[str, str]` value
are set when an inter-peer cancellation happens via an "out of band"
request method (in this case using `Portal.cancel_actor()` and that
cancellation is propagated "horizontally" to other peers. Verify that
any such cancellation scenario which also experiences an "error during
`ContextCancelled` handling" DOES NOT result in that further error being
suppressed and that the user's exception bubbles out of the
`Context.open_context()` block(s) appropriately!
Likely more tests to come as well as some factoring of the teardown
state checks where possible.
Pertains to serious testing the major work landing in #357