Finally got this working so that if/when an internal bug is introduced
to this request task-func, we can actually REPL-debug the lock request
task itself B)
As in, if the subactor's lock request task internally errors we,
- ensure the task always terminates (by calling `DebugStatus.release()`)
and explicitly reports (via a `log.exception()`) the internal error.
- capture the error instance and set as a new `DebugStatus.req_err` and
always check for it on final teardown - in which case we also,
- ensure it's reraised from a new `DebugRequestError`.
- unhide the stack frames for `_pause()`, `_enter_repl_sync()` so that
the dev can upward inspect the `_pause()` call stack sanely.
Supporting internal impl changes,
- add `DebugStatus.cancel()` and `.req_err`.
- don't ever cancel the request task from
`PdbREPL.set_[continue/quit]()` only when there's some internal error
that would likely result in a hang and stale lock state with the root.
- only release the root's lock when the current ask is also the owner
(avoids bad release errors).
- also show internal `._pause()`-related frames on any `repl_err`.
Other temp-dev-tweaks,
- make pld-dec change log msgs info level again while solving this
final context-vars race stuff..
- drop the debug pld-dec instance match asserts for now since
the problem is already caught (and now debug-able B) by an attr-error
on the decoded-as-`dict` started msg, and instead add in
a `log.exception()` trace to see which task is triggering the case
where the debug `MsgDec` isn't set correctly vs. when we think it's
being applied.
Since obviously the thread is likely expected to halt and raise after
the REPL session exits; this was a regression from the prior impl. The
main reason for this is that otherwise the request task will never
unblock if the user steps through the crashed task using 'next' since
the `.do_next()` handler doesn't by default release the request since in
the `.pause()` case this would end the session too early.
Other,
- toss in draft `Pdb.user_exception()`, though doesn't seem to ever
trigger?
- only release `Lock._debug_lock` when already locked.
Mostly adjustments for the new pld-receiver semantics/shim-layer which
results more often in the direct delivery of `RemoteActorError`s from
IPC API primitives (like `Portal.result()`) instead of being embedded in
an `ExceptionGroup` bundled from an embedded nursery.
Tossed usage of the `debug_mode: bool` fixture to a couple problematic
tests while i was working on them.
Also includes detailed assertion updates to the inter-peer cancellation
suite in terms of,
- `Context.canceller` state correctly matching the true src actor when
expecting a ctxc.
- any rxed `ContextCancelled` should instance match the `Context._local/remote_error`
as should the `.msgdata` and `._ipc_msg`.
- start tossing in `__tracebackhide__`s to various eps which don't need
to show in tbs or in the pdb REPL.
- port final `._maybe_enter_pm()` to pass a `api_frame`.
- start comment-marking up some API eps with `@api_frame`
in prep for actually using the new frame-stack tracing.
Woops, due to a `None` test against the `._final_result`, any actual
final `None` result would be received but not acked as such causing
a spawning test to hang. Fix it by instead receiving and assigning both
a `._final_result_msg: PayloadMsg` and `._final_result_pld`.
NB: as mentioned in many recent comments surrounding this API layer,
really this whole `Portal`-has-final-result interface/semantics should
be entirely removed as should the `ActorNursery.run_in_actor()` API(s).
Instead it should all be replaced by a wrapping "high level" API
(`tractor.hilevel` ?) which combines a task nursery, `Portal.open_context()`
and underlying `Context` APIs + an `outcome.Outcome` to accomplish the
same "run a single task in a spawned actor and return it's result"; aka
a "one-shot-task-actor".
- inside the `Actor.cancel()`'s maybe-wait-on-debugger delay,
report the full debug request status and it's affiliated lock request
IPC ctx.
- use the new `.req_ctx.chan.uid` to do the local nursery lookup during
channel teardown handling.
- another couple log fmt tweaks.
Proto-ing a little suite of call-stack-frame annotation-for-scanning
sub-systems for the purposes of both,
- the `.devx._debug`er and its
traceback and frame introspection needs when entering the REPL,
- detailed trace-style logging such that we can explicitly report
on "which and where" `tractor`'s APIs are used in the "app" code.
Deats:
- change mod name obvi from `._code` and adjust client mod imports.
- using `wrapt` (for perf) implement a `@api_frame` annot decorator
which both stashes per-call-stack-frame instances of `CallerInfo` in
a table and marks the function such that API endpoints can be easily
found via runtime stack scanning despite any internal impl changes.
- add a global `_frame2callerinfo_cache: dict[FrameType, CallerInfo]`
table for providing the per func-frame info caching.
- Re-implement `CallerInfo` to require less (types of) inputs:
|_ `_api_func: Callable`, a ref to the (singleton) func def.
|_ `_api_frame: FrameType` taken from the `@api_frame` marked `tractor`-API
func's runtime call-stack, from which we can determine the
app code's `.caller_frame`.
|_`_caller_frames_up: int|None` allowing the specific `@api_frame` to
determine "how many frames up" the application / calling code is.
And, a better set of derived attrs:
|_`caller_frame: FrameType` which finds and caches the API-eps calling
frame.
|_`caller_frame: FrameType` which finds and caches the API-eps calling
- add a new attempt at "getting a method ref from its runtime frame"
with `get_ns_and_func_from_frame()` using a heuristic that the
`CodeType.co_qualname: str` should have a "." in it for methods.
- main issue is still that the func-ref lookup will require searching
for the method's instance type by name, and that name isn't
guaranteed to be defined in any particular ns..
|_rn we try to read it from the `FrameType.f_locals` but that is
going to obvi fail any time the method is called in a module where
it's type is not also defined/imported.
- returns both the ns and the func ref FYI.
- set `._state._ctxvar_Context` just after `StartAck` inside
`open_context_from_portal()` so that `current_ipc_ctx()` always
works on the 'parent' side.
- always set `.canceller` to any `MsgTypeError.src_uid` and otherwise to
any maybe-detected `.src_uid` (i.e. for RAEs).
- always set `.canceller` to us when we rx a ctxc which reports us as
its canceller; this is a sanity check on definite "self cancellation".
- adjust `._is_self_cancelled()` logic to only be `True` when
`._remote_error` is both a ctxc with a `.canceller` set to us AND
when `Context.canceller` is also set to us (since the change above)
as a little bit of extra rigor.
- fill-in/fix some `.repr_state` edge cases:
- merge self-vs.-peer ctxc cases to one block and distinguish via
nested `._is_self_cancelled()` check.
- set 'errored' for all exception matched cases despite `.canceller`.
- add pre-`Return` phase statuses:
|_'pre-started' and 'syncing-to-child' depending on side and when
`._stream` has not (yet) been set.
|_'streaming' and 'streaming-finished' depending on side when
`._stream` is set and whether it was stopped/closed.
- tweak drainage log-message to use "outcome" instead of "result".
- use new `.devx.pformat.pformat_cs()` inside `_maybe_cancel_and_set_remote_error()`
but, IFF the log level is at least 'cancel'.
Since i was running into them (internal errors) during lock request
machinery dev and was getting all sorts of difficult to understand hangs
whenever i intro-ed a bug to either side of the ipc ctx; this all while
trying to get the msg-spec working for `Lock` requesting subactors..
Deats:
- hideframes for `@acm`s and `trio.Event.wait()`, `Lock.release()`.
- better detail out the `Lock.acquire/release()` impls
- drop `Lock.remote_task_in_debug`, use new `.ctx_in_debug`.
- add a `Lock.release(force: bool)`.
- move most of what was `_acquire_debug_lock_from_root_task()` and some
of the `lock_tty_for_child().__a[enter/exit]()` logic into
`Lock.[acquire/release]()` including bunch more logging.
- move `lock_tty_for_child()` up in the module to below `Lock`, with
some rework:
- drop `subactor_uid: tuple` arg since we can just use the `ctx`..
- add exception handler blocks for reporting internal (impl) errors
and always force release the lock in such cases.
- extend `DebugStatus` (prolly will rename to `DebugRequest` btw):
- add `.req_ctx: Context` for subactor side.
- add `.req_finished: trio.Event` to sub to signal request task exit.
- extend `.shield_sigint()` doc-str.
- add `.release()` to encaps all the state mgmt previously strewn
about inside `._pause()`..
- use new `DebugStatus.release()` to replace all the duplication:
- inside `PdbREPL.set_[continue/quit]()`.
- inside `._pause()` for the subactor branch on internal
repl-invocation error cases,
- in the `_enter_repl_sync()` closure on error,
- replace `apply_debug_codec()` -> `apply_debug_pldec()` in tandem with
the new `PldRx` sub-sys which handles the new `__pld_spec__`.
- add a new `pformat_cs()` helper orig to help debug cs stack
a corruption; going to move to `.devx.pformat` obvi.
- rename `wait_for_parent_stdin_hijack()` -> `request_root_stdio_lock()`
with improvements:
- better doc-str and add todos,
- use `DebugStatus` more stringently to encaps all subactor req state.
- error handling blocks for cancellation and straight up impl errors
directly around the `.open_context()` block with the latter doing
a `ctx.cancel()` to avoid hanging in the shielded `.req_cs` scope.
- similar exc blocks for the func's overall body with explicit
`log.exception()` reporting.
- only set the new `DebugStatus.req_finished: trio.Event` in `finally`.
- rename `mk_mpdb()` -> `mk_pdb()` and don't cal `.shield_sigint()`
implicitly since the caller usage does matter for this.
- factor out `any_connected_locker_child()` from the SIGINT handler.
- rework SIGINT handler to better handle any stale-lock/hang cases:
- use new `Lock.ctx_in_debug: Context` to detect subactor-in-debug.
and use it to cancel any lock request instead of the lower level
- use `problem: str` summary approach to log emissions.
- rework `_pause()` given all of the above, stuff not yet mentioned:
- don't take `shield: bool` input and proxy to `debug_func()` (for now).
- drop `extra_frames_up_when_async: int` usage, expect
`**debug_func_kwargs` to passthrough an `api_frame: Frametype` (more
on this later).
- lotsa asserts around the request ctx vs. task-in-debug ctx using new
`current_ipc_ctx()`.
- asserts around `DebugStatus` state.
- rework and simplify the `debug_func` hooks,
`_set_trace()`/`_post_mortem()`:
- make them accept a non-optional `repl: PdbRepl` and `api_frame:
FrameType` which should be used to set the current frame when the
REPL engages.
- always hide the hook frames.
- always accept a `tb: TracebackType` to `_post_mortem()`.
|_ copy and re-impl what was the delegation to
`pdbp.xpm()`/`pdbp.post_mortem()` and instead call the
underlying `Pdb.interaction()` ourselves with a `caller_frame`
and tb instance.
- adjust the public `.pause()` impl:
- accept optional `hide_tb` and `api_frame` inputs.
- mask opening a cancel-scope for now (can cause `trio` stack
corruption, see notes) and thus don't use the `shield` input other
then to eventually passthrough to `_post_mortem()`?
|_ thus drop `task_status` support for now as well.
|_ pretty sure correct soln is a debug-nursery around `._invoke()`.
- since no longer using `extra_frames_up_when_async` inside
`debug_func()`s ensure all public apis pass a `api_frame`.
- re-impl our `tractor.post_mortem()` to directly call into `._pause()`
instead of binding in via `partial` and mk it take similar input as
`.pause()`.
- drop `Lock.release()` from `_maybe_enter_pm()`, expose and pass
expected frame and tb.
- use necessary changes from all the above within
`maybe_wait_for_debugger()` and `acquire_debug_lock()`.
Lel, sorry thought that would be shorter..
There's still a lot more re-org to do particularly with `DebugStatus`
encapsulation but it's coming in follow up.
Since we need to allow it (at the least) inside
`drain_until_final_msg()` for handling stream-phase termination races
where we don't want to have to handle a raised error from something like
`Context.result()`. Expose the passthrough option via
a `passthrough_non_pld_msgs: bool` kwarg.
Add comprehensive comment to `current_pldrx()`.
Expose it from `._state.current_ipc_ctx()` and set it inside
`._rpc._invoke()` for child and inside `Portal.open_context()` for
parent.
Still need to write a few more tests (particularly demonstrating usage
throughout multiple nested nurseries on each side) but this suffices as
a proto for testing with some debugger request-from-subactor stuff.
Other,
- use new `.devx.pformat.add_div()` for ctxc messages.
- add a block to always traceback dump on corrupted cs stacks.
- better handle non-RAEs exception output-formatting in context
termination summary log message.
- use a summary for `start_status` for msg logging in RPC loop.
Since we usually want them raised from some (internal) call to
`Context.maybe_raise()` and NOT directly from the drainage call, make it
possible via a new `raise_error: bool` to both `PldRx.recv_msg_w_pld()`
and `.dec_msg()`.
In support,
- rename `return_msg` -> `result_msg` since we expect to return
`Error`s.
- do a `result_msg` assign and `break` in the `case Error()`.
- add `**dec_msg_kwargs` passthrough for other `.dec_msg()` calling
methods.
Other,
- drop/aggregate todo-notes around the main loop's
`ctx._pld_rx.recv_msg_w_pld()` call.
- add (configurable) frame hiding to most payload receive meths.
Since `._code` is prolly gonna get renamed (to something "frame & stack
tools" related) and to give a bit better organization.
Also adds a new `add_div()` helper, factored out of ctxc message
creation in `._rpc._invoke()`, for adding a little "header line" divider
under a given `message: str` with a little math to center it.
For more sane manual calls as needed in logging purposes. Obvi remap
the dunder methods to it.
Other:
- drop `hide_tb: bool` from `unpack_error()`, shouldn't need it since
frame won't ever be part of any tb raised from returned error.
- add a `is_invalid_payload: bool` to `_raise_from_unexpected_msg()` to
be used from `PldRx` where we don't need to decode the IPC
msg, just the payload; make the error message reflect this case.
- drop commented `._portal._unwrap_msg()` since we've replaced it with
`PldRx`'s delegation to newer `._raise_from_unexpected_msg()`.
- hide the `Portal.result()` frame by default, again.
A better spot for the pretty-formatting of frame text (and thus tracebacks)
is in the new `.devx._code` module:
- move from `._exceptions` -> `.devx._code.pformat_boxed_tb()`.
- add new `pformat_caller_frame()` factored out the use case in
`._exceptions._mk_msg_type_err()` where we dump a stack trace
for bad `.send()` side IPC msgs.
Add some new pretty-format methods to `Context`:
- explicitly implement `.pformat()` and allow an `extra_fields: dict`
which can be used to inject additional fields (maybe eventually by
default) such as is now used inside
`._maybe_cancel_and_set_remote_error()` when reporting the internal
`._scope` state in cancel logging.
- add a new `.repr_state -> str` which provides a single string status
depending on the internal state of the IPC ctx in terms of the shuttle
protocol's "phase"; use it from `.pformat()` for the `|_state:`.
- set `.started(complain_no_parity=False)` now since we presume decoding
with `.pld: Raw` now with the new `PldRx` design.
- use new `msgops.current_pldrx()` in `mk_context()`.
Not sure it's **that** useful (yet) but in theory would allow avoiding
certain log level usage around transient RPC requests for discovery methods
(like `.register_actor()` and friends); can't hurt to be able to
introspect that last message for other future cases I'd imagine as well.
Adjust the calling code in `._runtime` to match; other spots are using
the `trio.Nursery.start()` schedule style and are fine as is.
Improve a bunch more log messages throughout a few mods mostly by going
to a "summary" single-emission style where possible/appropriate:
- in `._runtime` more "single summary" status style log emissions:
|_mk `Actor.load_modules()` render a single mod loaded summary.
|_use a summary `con_status: str` for `Actor._stream_handler()` conn
setup and an equiv (`con_teardown_status`) for connection teardowns.
|_similar thing in `Actor.wait_for_actor()`.
- generally more usage of `.msg.pretty_struct` apis throughout `._runtime`.
Add new task-scope oriented `PldRx.pld_spec` management API similar to
`.msg._codec.limit_msg_spec()`, but obvi built to process and filter
`MsgType.pld` values.
New API related changes include:
- new per-task singleton getter `msg._ops.current_pldrx()` which
delivers the current (global) payload receiver via a new
`_ctxvar_PldRx: ContextVar` configured with a default
`_def_any_pldec: MsgDec[Any]` decoder.
- a `PldRx.limit_plds()` which sets the decoder (`.type` underneath)
for the specific payload rx instance.
- `.msg._ops.limit_plds()` which obtains the current task-scoped `PldRx`
and applies the pld spec via a new `PldRx.limit_plds()`.
- rename `PldRx._msgdec` -> `._pldec`.
- add `.pld_dec` as pub attr for -^
Unrelated adjustments:
- use `.msg.pretty_struct.pformat()` where handy.
- always pass `expect_msg: MsgType`.
- add a `case Stop()` to `PldRx.dec_msg()` which will `log.warning()`
when a stop is received by no stream was open on this receiving side
since we rarely want that to raise since it's prolly just a runtime
race or mistake in user code.
Other:
Particularly when logging around `MsgTypeError`s.
Other:
- make `_raise_from_unexpected_msg()`'s `expect_msg` a non-default value
arg, must always be passed by caller.
- drop `'canceller'` from `_body_fields` ow it shows up twice for ctxc.
- use `.msg.pretty_struct.pformat()`.
- parameterize `RemoteActorError.reprol()` (repr-one-line method) to
show `RemoteActorError[<self.boxed_type_str>]( ..` to make obvi
the boxed remote error type.
- re-impl `.boxed_type_str` as `str`-casting the `.boxed_type` value
which is guaranteed to render non-`None`.
Basically exact same as that for `MsgCodec` with the `.spec` displayed
via a better (maybe multi-line) `.spec_str: str` generated from a common
new set of helper mod funcs factored out msg-codec meths:
- `mk_msgspec_table()` to gen a `MsgType` name -> msg table.
- `pformat_msgspec()` to `str`-ify said table values nicely.q
Also add a new `MsgCodec.msg_spec_str: str` prop which delegates to the
above for the same.
- tweaking logging to include more `MsgType` dumps on IPC faults.
- removing some commented cruft.
- comment formatting / cleanups / add-ons.
- more type annots.
- fill out some TODO content.
As per the reco:
https://jcristharif.com/msgspec/perf-tips.html#reusing-an-output-buffe
BUT, seems to cause this error in `pikerd`..
`BufferError: Existing exports of data: object cannot be re-sized`
Soo no idea? Maybe there's a tweak needed that we can glean from
tests/examples in the `msgspec` repo?
Disabling for now.
Instead of expecting it to be passed in (as it was prior), when
determining if a `Stop` msg is a valid end-of-channel signal use the
`ctx._stream: MsgStream|None` attr which **must** be set by any stream
opening API; either of:
- `Context.open_stream()`
- `Portal.open_stream_from()`
Adjust the case block logic to match with fallthrough from any EoC to
a closed error if necessary. Change the `_type: str` to match the
failing IPC-prim name in the tail case we raise a `MessagingError`.
Other:
- move `.sender: tuple` uid attr up to `RemoteActorError` since `Error`
optionally defines it as a field and for boxed `StreamOverrun`s (an
ignore case we check for in the runtime during cancellation) we want
it readable from the boxing rae.
- drop still unused `InternalActorError`.
As per much tinkering, re-designs and preceding rubber-ducking via many
"commit msg novelas", **finally** this adds the (hopefully) final
missing layer for typed msg safety: `tractor.msg._ops.PldRx`
(or `PayloadReceiver`? haven't decided how verbose to go..)
Design justification summary:
------ - ------
- need a way to be as-close-as-possible to the `tractor`-application
such that when `MsgType.pld: PayloadT` validation takes place, it is
straightforward and obvious how user code can decide to handle any
resulting `MsgTypeError`.
- there should be a common and optional-yet-modular way to modify
**how** data delivered via IPC (possibly embedded as user defined,
type-constrained `.pld: msgspec.Struct`s) can be handled and processed
during fault conditions and/or IPC "msg attacks".
- support for nested type constraints within a `MsgType.pld` field
should be simple to define, implement and understand at runtime.
- a layer between the app-level IPC primitive APIs
(`Context`/`MsgStream`) and application-task code (consumer code of
those APIs) should be easily customized and prove-to-be-as-such
through demonstrably rigorous internal (sub-sys) use!
-> eg. via seemless runtime RPC eps support like `Actor.cancel()`
-> by correctly implementing our `.devx._debug.Lock` REPL TTY mgmt
dialog prot, via a dead simple payload-as-ctl-msg-spec.
There are some fairly detailed doc strings included so I won't duplicate
that content, the majority of the work here is actually somewhat of
a factoring of many similar blocks that are doing more or less the same
`msg = await Context._rx_chan.receive()` with boilerplate for
`Error`/`Stop` handling via `_raise_from_no_key_in_msg()`. The new
`PldRx` basically provides a shim layer for this common "receive msg,
decode its payload, yield it up to the consuming app task" by pairing
the RPC feeder mem-chan with a msg-payload decoder and expecting IPC API
internals to use **one** API instead of re-implementing the same pattern
all over the place XD
`PldRx` breakdown
------ - ------
- for now only expects a `._msgdec: MsgDec` which allows for
override-able `MsgType.pld` validation and most obviously used in
the impl of `.dec_msg()`, the decode message method.
- provides multiple mem-chan receive options including:
|_ `.recv_pld()` which does the e2e operation of receiving a payload
item.
|_ a sync `.recv_pld_nowait()` version.
|_ a `.recv_msg_w_pld()` which optionally allows retreiving both the
shuttling `MsgType` as well as it's `.pld` body for use cases where
info on both is important (eg. draining a `MsgStream`).
Dirty internal changeover/implementation deatz:
------ - ------
- obvi move over all the IPC "primitives" that previously had the duplicate recv-n-yield
logic:
- `MsgStream.receive[_nowait]()` delegating instead to the equivalent
`PldRx.recv_pld[_nowait]()`.
- add `Context._pld_rx: PldRx`, created and passed in by
`mk_context()`; use it for the `.started()` -> `first: Started`
retrieval inside `open_context_from_portal()`.
- all the relevant `Portal` invocation methods: `.result()`,
`.run_from_ns()`, `.run()`; also allows for dropping `_unwrap_msg()`
and `.Portal_return_once()` outright Bo
- rename `Context.ctx._recv_chan` -> `._rx_chan`.
- add detailed `Context._scope` info for logging whether or not it's
cancelled inside `_maybe_cancel_and_set_remote_error()`.
- move `._context._drain_to_final_msg()` -> `._ops.drain_to_final_msg()`
since it's really not necessarily ctx specific per say, and it does
kinda fit with "msg operations" more abstractly ;)
In prep for a "payload receiver" abstraction that will wrap
`MsgType.pld`-IO delivery from `Context` and `MsgStream`, adds a small
`msgspec.msgpack.Decoder` shim which delegates an API similar to
`MsgCodec` and is offered via a `.msg._codec.mk_dec()` factory.
Detalles:
- move over the TODOs/comments from `.msg.types.Start` to to
`MsgDec.spec` since it's probably the ideal spot to start thinking
about it from a consumer code PoV.
- move codec reversion assert and log emit into `finally:` block.
- flip default `.types._tractor_codec = mk_codec_ipc_pld(ipc_pld_spec=Raw)`
in prep for always doing payload-delayed decodes.
- make `MsgCodec._dec` private with public property getter.
- change `CancelAck` to NOT derive from `Return` so it's mutex in
`match/case:` handling.
Since it's going to be used from the IPC primitive APIs
(`Context`/`MsgStream`) for similarly handling payload type spec
validation errors and bc it's really not well situation in the IPC
module XD
Summary of (impl) tweaks:
- obvi move `_mk_msg_type_err()` and import and use it in `._ipc`; ends
up avoiding a lot of ad-hoc imports we had from `._exceptions` anyway!
- mask out "new codec" runtime log emission from `MsgpackTCPStream`.
- allow passing a (coming in next commit) `codec: MsgDec` (message
decoder) which supports the same required `.pld_spec_str: str` attr.
- for send side logging use existing `MsgCodec..pformat_msg_spec()`.
- rename `_raise_from_no_key_in_msg()` to the now more appropriate
`_raise_from_unexpected_msg()`, but leaving alias for now.