Exactly like how it's organized for `Portal.open_context()`, put the
main streaming API `@acm` with the `MsgStream` code and bind the method
to the new module func.
Other,
- rename `Context.result()` -> `.wait_for_result()` to better match the
blocking semantics and rebind `.result()` as deprecated.
- add doc-str for `Context.maybe_raise()`.
It ended up getting necessarily implemented as the `PldRx` though at
a different layer and won't be needed as part of `MsgCodec` most likely,
though this original idea did provide the source of inspiration for how
things work now!
Also Move the commented TODO proto for a codec hook factory from
`.types` to `._codec` where it prolly better fits and update some msg
related todo/questions.
It's been a long time prepped and now finally implemented!
Offer a `shield: bool` argument from our async `._debug` APIs:
- `await tractor.pause(shield=True)`,
- `await tractor.post_mortem(shield=True)`
^-These-^ can now be used inside cancelled `trio.CancelScope`s,
something very handy when introspecting complex (distributed) system
tear/shut-downs particularly under remote error or (inter-peer)
cancellation conditions B)
Thanks to previous prepping in a prior attempt and various patches from
the rigorous rework of `.devx._debug` internals around typed msg specs,
there ain't much that was needed!
Impl deats
- obvi passthrough `shield` from the public API endpoints (was already
done from a prior attempt).
- put ad-hoc internal `with trio.CancelScope(shield=shield):` around all
checkpoints inside `._pause()` for both the root-process and subactor
case branches.
Add a fairly rigorous example, `examples/debugging/shielded_pause.py`
with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and
ensure it covers as many cases as i can think of offhand:
- multiple `.pause()` entries in a loop despite parent scope
cancellation in a subactor RPC task which itself spawns a sub-task.
- a `trio.Nursery.parent_task` which raises, is handled and
tries to enter and unshielded `.post_mortem()`, which of course
internally raises `Cancelled` in a `._pause()` checkpoint, so we catch
the `Cancelled` again and then debug the debugger's internal
cancellation with specific checks for the particular raising
checkpoint-LOC.
- do ^- the latter -^ for both subactor and root cases to ensure we
can debug `._pause()` itself when it tries to REPL engage from
a cancelled task scope Bo
Keep the old alias, but i think it's better form to use longer names for
internal public APIs and this name better reflects the functionality:
decoding and returning a `PayloadMsg.pld` field.
Since turns out we didn't have a single example using that API Bo
The test granular-ly checks all use cases:
- `.post_mortem()` manual calls in both subactor and root.
- ensuring built-in RPC crash handling activates after each manual one
from ^.
- drafted some call-stack frame checking that i commented out for now
since we need to first do ANSI escape code removal due to the
colorization that `pdbp` does by default.
|_ added a TODO with SO link on `assert_before()`.
Also todo-staged a shielded-pause test to match with the already
existing-but-needs-refinement example B)
Such that we're boxing the interchanged lib's specific error
`msgspec.ValidationError` in this case) type much like how
a `ContextCancelled[trio.Cancelled]` is composed; allows for seemless
multi-backend-codec support later as well B)
Pass `ctx.maybe_raise(from_src_exc=src_err)` where needed in a couple
spots; as `None` in the send-side `Started` MTE case to avoid showing
the `._scope1.cancel_called` result in the traceback from the
`.open_context()` child-sync phase.
That is as a control to `Context._maybe_raise_remote_err()` such that
if set to anything other then the default (`False` value), we do
`raise remote_error from from_src_exc` such that caller can choose to
suppress or override the `.__cause__` tb.
Also tidy up and old masked TODO regarding calling `.maybe_raise()`
after the caller exits from the `yield` in `.open_context()`..
Send-side `MsgTypeError`s actually shouldn't have any "boxed" traceback
per say since they're raised in the transmitting actor's local task env
and we (normally) don't want the ascii decoration added around the
error's `._message: str`, that is not until the exc is `pack_error()`-ed
before transit. As such, the presentation of an embedded traceback (and
its ascii box) gets bypassed when only a `._message: str` is set (as we
now do for pld-spec failures in `_mk_msg_type_err()`).
Further this tweaks the `.pformat()` output to include the `._message`
part to look like `<RemoteActorError( <._message> ) ..` instead of
jamming it implicitly to the end of the embedded `.tb_str` (as was done
implicitly by `unpack_error()`) and also adds better handling for the
`with_type_header == False` case including forcing that case when we
detect that the currently handled exc is the RAE in `.pformat()`.
Toss in a lengthier doc-str explaining it all.
Surrounding/supporting changes,
- better `unpack_error()` message which just briefly reports the remote
task's error type.
- add public `.message: str` prop.
- always set a `._extra_msgdata: dict` since some MTE props rely on it.
- handle `.boxed_type == None` for `.boxed_type_str`.
- maybe pack any detected input or `exc.message` in `pack_error()`.
- comment cruft cleanup in `_mk_msg_type_err()`.
Allows passing a custom error msg other then the traceback-str over the
wire. Make `.tb_str` optional (in the blank `''` sense) since it's
treated that way thus far in `._exceptions.pack_error()`.
Namely checking that `Context._remote_error` is set to the raised MTE
in the invalid started and return value cases since prior to the recent
underlying changes to the `Context.result()` impl, it would not match.
Further,
- do asserts for non-MTE raising cases in both the parent and child.
- add todos for testing ctx-outcomes for per-side-validation policies
i anticipate supporting and implied msg-dialog race cases therein.
More specifically, if `.open_context()` is cancelled when awaiting the
first `Context.started()` during the child task sync phase, check to see
if it was due to `._scope.cancel_called` and raise any remote error via
`.maybe_raise()` instead the `trio.Cancelled` like in every other
remote-error handling case. Ensure we set `._scope[_nursery]` only after
the `Started` has arrived and audited.
Since in the case of the `Actor._cancel_task()` related runtime eps we
actually don't EVER register them in `Actor._rpc_tasks`.. logging about
them is just needless noise, though maybe we should track them in a diff
table; something like a `._runtime_rpc_tasks`?
Drop the cancel-request-for-stale-RPC-task (`KeyError` case in
`Actor._cancel_task()`) log-emit level in to `.runtime()`; it's
generally not useful info other then for granular race condition eval
when hacking the runtime.
So when `is_started_send_side is True` we raise the newly created
`MsgTypeError` (MTE) directly instead of doing all the `Error`-msg pack
and unpack to raise stuff via `_raise_from_unexpected_msg()` since the
raise should happen send side anyway and so doesn't emulate any remote
fault like in a bad `Return` or `Started` without send-side pld-spec
validation.
Oh, and proxy-through the `hide_tb: bool` input from `.drain_to_final_msg()`
to `.recv_msg_w_pld()`.
By calling `Context._maybe_cancel_and_set_remote_error(exc)` on any
unpacked `Error` msg; provides for `Context.maybe_error` consistency to
match all other error delivery cases.
Filling out the helper `validate_payload_msg()` staged in a prior commit
and adjusting all imports to match.
Also add a `raise_mte: bool` flag for potential usage where the caller
wants to handle the MTE instance themselves.
Expecting `Started` or `Return` with respective bad `.pld` values
depending on what type of failure is test parametrized.
This makes the suite run green it seems B)
The `TypeAlias` for the msg type-group is now `MsgType` and any user
touching shuttle messages can now be typed as `PayloadMsg`.
Relatedly, add MTE specific `Error._bad_msg[_as_dict]` fields which are
handy for introspection of remote decode failures.
Such that if caught by user code and/or the runtime we can introspect
the original msg which caused the type error. Previously this was kinda
half-baked with a `.msg_dict` which was delivered from an `Any`-decode
of the shuttle msg in `_mk_msg_type_err()` but now this more explicitly
refines the API and supports both `PayloadMsg`-instance or the msg-dict
style injection:
- allow passing either of `bad_msg: PayloadMsg|None` or
`bad_msg_as_dict: dict|None` to `MsgTypeError.from_decode()`.
- expose public props for both ^ whilst dropping prior `.msgdict`.
- rework `.from_decode()` to explicitly accept `**extra_msgdata: dict`
|_ only overriding it from any `bad_msg_as_dict` if the keys are found in
`_ipcmsg_keys`, **except** for `_bad_msg` when `bad_msg` is passed.
|_ drop `.ipc_msg` passthrough.
|_ drop `msgdict` input.
- adjust `.cid` to only pull from the `.bad_msg` if set.
Related fixes/adjustments:
- `pack_from_raise()` should pull `boxed_type_str` from
`boxed_type.__name__`, not the `type()` of it.. also add a
`hide_tb: bool` flag.
- don't include `_msg_dict` and `_bad_msg` in the `_body_fields` set.
- allow more granular boxed traceback-str controls:
|_ allow passing a `tb_str: str` explicitly in which case we use it
verbatim and presume caller knows what they're doing.
|_ when not provided, use the more explicit
`traceback.format_exception(exc)` since the error instance is
a required input (we still fail back to the old `.format_exc()` call
if for some reason the caller passes `None`; but that should be
a bug right?).
|_ if a `tb: TracebackType` and a `tb_str` is passed, concat them.
- in `RemoteActorError.pformat()` don't indent the `._message` part used
for the `body` when `with_type_header == False`.
- update `_mk_msg_type_err()` to use `bad_msg`/`bad_msg_as_dict`
appropriately and drop passing `ipc_msg`.
In the sense that we handle it as a special case that exposed
through to `RxPld.dec_msg()` with a new `is_started_send_side: bool`.
(Non-ideal) `Context.started()` impl deats:
- only do send-side pld-spec validation when a new `validate_pld_spec`
is set (by default it's not).
- call `self.pld_rx.dec_msg(is_started_send_side=True)` to validate the
payload field from the just codec-ed `Started` msg's `msg_bytes` by
passing the `roundtripped` msg (with it's `.pld: Raw`) directly.
- add a `hide_tb: bool` param and proxy it to the `.dec_msg()` call.
(Non-ideal) `PldRx.dec_msg()` impl deats:
- for now we're packing the MTE inside an `Error` via a manual call to
`pack_error()` and then setting that as the `msg` passed to
`_raise_from_unexpected_msg()` (though really we should just raise
inline?).
- manually set the `MsgTypeError._ipc_msg` to the above..
Other,
- more comprehensive `Context` type doc string.
- various `hide_tb: bool` kwarg additions through `._ops.PldRx` meths.
- proto a `.msg._ops.validate_payload_msg()` helper planned to get the
logic from this version of `.started()`'s send-side validation so as
to be useful more generally elsewhere.. (like for raising back
`Return` values on the child side?).
Warning: this commit may have been made out of order from required
changes to `._exceptions` which will come in a follow up!
Starts with some very basic cases:
- verify both subactor-as-child-ctx-task send side validation (failures)
as well as relay and raise on root-parent-side-task.
- wrap failure expectation cases that bubble out of `@acm`s with
a `maybe_expect_raises()` equiv wrapper with an embedded timeout.
- add `Return` cases including invalid by `str` and valid by a `None`.
Still ToDo:
- commit impl changes to make the bulk of this suite pass.
- adjust how `MsgTypeError`s format the local (`.started()`) send side
`.tb_str` such that we don't do a "boxed" error prior to
`pack_error()` being called normally prior to `Error` transit.
Related to the prior patch, re the new `with_type_header: bool`:
- in the `with_type_header == True` use case make sure we keep the first
`._message: str` line non-indented since it'll show just after the
header-line's type path with ':'.
- when `False` drop the `)>` `repr()`-instance style as well so that we
just get the ascii boxed traceback as though it's the error
message-`str` not the `repr()` of the error obj.
Other,
- hide `pack_from_raise()` call frame since it'll show in debug mode
crash handling..
- mk `MsgTypeError.from_decode()` explicitly accept and proxy an
optional `ipc_msg` and change `msgdict` to also be optional, only
reading out the `**extra_msgdata` when provided.
- expose a `_mk_msg_type_err(src_err_msg: Error|None = None,)` for
callers who which to inject a `._ipc_msg: Msgtype` to the MTE.
|_ add a note how we can't use it due to a causality-dilemma when pld
validating `Started` on the send side..
And IFF the `await wait_func(proc)` is cancelled such that we avoid
clobbering some subactor that might be REPL-ing even though its parent
actor is in the midst of (gracefully) cancelling it.
From both `open_root_actor()` and `._entry._trio_main()`.
Other `breakpoint()`-from-sync-func fixes:
- properly disable the default hook using `"0"` XD
- offer a `hide_tb: bool` from `open_root_actor()`.
- disable hiding the `._trio_main()` frame, bc pretty sure it doesn't
help anyone (either way) when REPL-ing/tb-ing from a subactor..?
Mostly the result of the `RemoteActorError.pformat()` and our
new `_pause/crash_msg: str`s which include the `trio.Task.__repr__()`
in the `log.pdb()` message.
Obvi use the `in_prompt_msg()` to accomplish where not used prior.
ToDo later:
-[ ] still some outstanding questions on how detailed inceptions
should look, eg. in `test_multi_nested_subactors_error_through_nurseries()`
|_maybe we should be more pedantic at checking `.src_uid` vs.
`.relay_uid` fields?
-[ ] staged a placeholder test for verifying correct call-stack frame on
crash handler REPL entry.
-[ ] also need a test to verify that you can't pause from an already paused actor task
such as can happen if you try to step through runtime code that has
a recurrent entry to `._debug.pause()`.
Call it `hide_runtime_frames()` and stick all the lines from the top of
the `._debug` mod in there along with a little `log.devx()` emission on
what gets hidden by default ;)
Other,
- fix ref-error where internal-error handler might trigger despite the
debug `req_ctx` not yet having init-ed, such that we don't try to
cancel or log about it when it never was fully created/initialize..
- fix assignment typo iniside `_set_trace()` for `task`.. lel
Such that when displaying with `.__str__()` we do not show the type
header (style) since normally python's raising machinery already prints
the type path like `'tractor._exceptions.RemoteActorError:'`, so doing
it 2x is a bit ugly ;p
In support,
- include `.relay_uid` in `RemoteActorError.extra_body_fields`.
- offer a `with_type_header: bool` to `.pformat()` and only put the
opening type path and closing `')>'` tail line when `True`.
- add `.is_inception() -> bool:` for an easy way to determine if the
error is multi-hop relayed.
- only repr the `'|_relay_uid=<uid>'` field when an error is an inception.
- tweak the invalid-payload case in `_mk_msg_type_err()` to explicitly
state in the `message` how the `any_pld` value does not match the `MsgDec.pld_spec`
by decoding the invalid `.pld` with an any-dec.
- allow `_mk_msg_type_err(**mte_kwargs)` passthrough.
- pass `boxed_type=cls` inside `MsgTypeError.from_decode()`.
More or less by pedantically separating and managing root and subactor
request syncing events to always be managed by the locking IPC context
task-funcs:
- for the root's "child"-side, `lock_tty_for_child()` directly creates
and sets a new `Lock.req_handler_finished` inside a `finally:`
- for the sub's "parent"-side, `request_root_stdio_lock()` does the same
with a new `DebugStatus.req_finished` event and separates it from
the `.repl_release` event (which indicates a "c" or "q" from user and
thus exit of the REPL session) as well as sets a new `.req_task:
trio.Task` to explicitly distinguish from the app-user-task that
enters the REPL vs. the paired bg task used to request the global
root's stdio mutex alongside it.
- apply the `__pld_spec__` on "child"-side of the ctx using the new
`Portal.open_context(pld_spec)` parameter support; drops use of any
`ContextVar` malarky used prior for `PldRx` mgmt.
- removing `Lock.no_remote_has_tty` since it was a nebulous name and
from the prior "everything is in a `Lock`" design..
------ - ------
More rigorous impl to handle various edge cases in `._pause()`:
- rejig `_enter_repl_sync()` to wrap the `debug_func == None` case
inside maybe-internal-error handler blocks.
- better logic for recurrent vs. multi-task contention for REPL entry in
subactors, by guarding using `DebugStatus.req_task` and by now waiting
on the new `DebugStatus.req_finished` for the multi-task contention
case.
- even better internal error handling and reporting for when this code
is hacked on and possibly broken ;p
------ - ------
Updates to `.pause_from_sync()` support:
- add optional `actor`, `task` kwargs to `_set_trace()` to allow
compat with the new explicit `debug_func` calling in `._pause()` and
pass a `threading.Thread` for `task` in the `.to_thread()` usage case.
- add an `except` block that tries to show the frame on any internal
error.
------ - ------
Relatedly includes a buncha cleanups/simplifications somewhat in
prep for some coming refinements (around `DebugStatus`):
- use all the new attrs mentioned above as needed in the SIGINT shielder.
- wait on `Lock.req_handler_finished` in `maybe_wait_for_debugger()`.
- dropping a ton of masked legacy code left in during the recent reworks.
- better comments, like on the use of `Context._scope` for shielding on
the "child"-side to avoid the need to manage yet another cs.
- add/change-to lotsa `log.devx()` level emissions for those infos which
are handy while hacking on the debugger but not ideal/necessary to be
user visible.
- obvi add lotsa follow up todo notes!
Much like other recent changes attempt to detect runtime-bug-causing
crashes and only show the runtime-endpoint frame when present.
Adds a `ActorNursery._scope_error: BaseException|None` attr to aid with
detection. Also toss in some todo notes for removing and replacing the
`.run_in_actor()` method API.
Just inside `._invoke()` after the `ctx: Context` is retrieved.
Also try our best to *not hide* internal frames when a non-user-code
crash happens, normally either due to a runtime RPC EP bug or
a transport failure.
Kinda like a "runtime"-y level for `.pdb()` (which is more or less like
an `.info()` for our debugger subsys) which can be used to report
internals info for those hacking on `.devx` tools.
Also, inject only the *last* 6 digits of the `id(Task)` in
`pformat_task_uid()` output by default.
Since the state mgmt becomes quite messy with multiple sub-tasks inside
an IPC ctx, AND bc generally speaking the payload-type-spec should map
1-to-1 with the `Context`, it doesn't make a lot of sense to be using
`ContextVar`s to modify the `Context.pld_rx: PldRx` instance.
Instead, always allocate a full instance inside `mk_context()` with the
default `.pld_rx: PldRx` set to use the `msg._ops._def_any_pldec: MsgDec`
In support, simplify the `.msg._ops` impl and APIs:
- drop `_ctxvar_PldRx`, `_def_pld_rx` and `current_pldrx()`.
- rename `PldRx._pldec` -> `._pld_dec`.
- rename the unused `PldRx.apply_to_ipc()` -> `.wraps_ipc()`.
- add a required `PldRx._ctx: Context` attr since it is needed
internally in some meths and each pld-rx now maps to a specific ctx.
- modify all recv methods to accept a `ipc: Context|MsgStream` (instead
of a `ctx` arg) since both have a ref to the same `._rx_chan` and there
are only a couple spots (in `.dec_msg()`) where we need the `ctx`
explicitly (which can now be easily accessed via a new `MsgStream.ctx`
property, see below).
- always show the `.dec_msg()` frame in tbs if there's a reference error
when calling `_raise_from_unexpected_msg()` in the fallthrough case.
- implement `limit_plds()` as light wrapper around getting the
`current_ipc_ctx()` and mutating its `MsgDec` via
`Context.pld_rx.limit_plds()`.
- add a `maybe_limit_plds()` which just provides an `@acm` equivalent of
`limit_plds()` handy for composing in a `async with ():` style block
(avoiding additional indent levels in the body of async funcs).
Obvi extend the `Context` and `MsgStream` interfaces as needed
to match the above:
- add a `Context.pld_rx` pub prop.
- new private refs to `Context._started_msg: Started` and
a `._started_pld` (mostly for internal debugging / testing / logging)
and set inside `.open_context()` immediately after the syncing phase.
- a `Context.has_outcome() -> bool:` predicate which can be used to more
easily determine if the ctx errored or has a final result.
- pub props for `MsgStream.ctx: Context` and `.chan: Channel` providing
full `ipc`-arg compat with the `PldRx` method signatures.
Finally got this working so that if/when an internal bug is introduced
to this request task-func, we can actually REPL-debug the lock request
task itself B)
As in, if the subactor's lock request task internally errors we,
- ensure the task always terminates (by calling `DebugStatus.release()`)
and explicitly reports (via a `log.exception()`) the internal error.
- capture the error instance and set as a new `DebugStatus.req_err` and
always check for it on final teardown - in which case we also,
- ensure it's reraised from a new `DebugRequestError`.
- unhide the stack frames for `_pause()`, `_enter_repl_sync()` so that
the dev can upward inspect the `_pause()` call stack sanely.
Supporting internal impl changes,
- add `DebugStatus.cancel()` and `.req_err`.
- don't ever cancel the request task from
`PdbREPL.set_[continue/quit]()` only when there's some internal error
that would likely result in a hang and stale lock state with the root.
- only release the root's lock when the current ask is also the owner
(avoids bad release errors).
- also show internal `._pause()`-related frames on any `repl_err`.
Other temp-dev-tweaks,
- make pld-dec change log msgs info level again while solving this
final context-vars race stuff..
- drop the debug pld-dec instance match asserts for now since
the problem is already caught (and now debug-able B) by an attr-error
on the decoded-as-`dict` started msg, and instead add in
a `log.exception()` trace to see which task is triggering the case
where the debug `MsgDec` isn't set correctly vs. when we think it's
being applied.