Since it isn't required until the landing of the new service-manager
stuff in #12; was an oversight
from commit `0607a31dddeba032a2cf7d9fe605edd9d7bb4846`.
In particular ensuring we use `ctx._pld_rx.recv_msg_nowait()` from
`.receive_nowait()` (which is called from `.aclose()`) such that we
ALWAYS (can) set the surrounding `Context._result/._outcome_msg` attrs
on reception of a final `Return`!!
This fixes a final stream-teardown-race-condition-bug where prior we
normally didn't set the `Context._result/._outcome_msg` in such cases.
This is **precisely because** `.receive_nowait()` only returns the
`pld` and when called from `.aclose()` this value is discarded, meaning
so is its boxing `Return` despite consuming it from the underlying
`._rx_chan`..
Longer term this should be solved differently by ensuring such races
cases are handled at a higher scope like inside `Context._deliver_msg()`
or the `Portal.open_context()` enter/exit blocks? Add a detailed warning
note and todos for all this around the special case block!
Such that any `Return` is always capture for each ctx instance and set
in `._deliver_msg()` normally; ensures we can at least introspect for it
when missing (like in a recently discovered stream teardown race bug).
Yes this augments the already existing `._result` which is dedicated for
the `._outcome_msg.pld` in the non-error case; we might want to see if
there's a nicer way to directly proxy ref to that without getting the
pre-pld-decoded `Raw` form with `msgspec`?
Also use the new `ctx._pld_rx.recv_msg()` and drop assigning
`pld_rx._ctx`.
Namely renaming and tweaking the `MsgType` receiving methods,
- `.recv_msg()` from what was `.recv_msg_w_pld()` which both receives
the IPC msg from the underlying `._rx_chan` and then decodes its
payload with `.decode_pld()`; it now also log reports on the different
"stage of SC dialog protocol" msg types via a `match/case`.
- a new `.recv_msg_nowait()` sync equivalent of ^ (*was*
`.recv_pld_nowait()`) who's use was the source of a recently
discovered bug where any final `Return.pld` is being
consumed-n-discarded by by `MsgStream.aclose()` depending on
ctx/stream teardown race conditions..
Also,
- remove all the "instance persistent" ipc-ctx attrs, specifically the
optional `_ipc`, `_ctx` and the `.wraps_ipc()` cm, since none of them
were ever really needed/used; all methods which require
a `Context/MsgStream` are explicitly always passed.
- update a buncha typing namely to use the more generic-styled
`PayloadT` over `Any` and obviously `MsgType[PayloadT]`.
I must have had a local touched file but never committed or something?
Seems that new `pytest` requires a top level `tests` pkg in order for
relative `.conftest` imports to work.
Repairs a bug in `drain_to_final_msg()` where in the `Yield()` case
block we weren't guarding against the `ctx._stream is None` edge case
which should be treated a `continue`-draining (not a `break` or
attr-error!!) situation since the peer task maybe be continuing to send
`Yield` but has not yet sent an outcome msg (one of
`Return/Error/ContextCancelled`) to terminate the loop. Ensure we
explicitly warn about this case as well as `.cancel()` emit on a taskc.
Thanks again to @guille for discovering this!
Also add temporary `.info()`s around rxed `Return` msgs as part of
trying to debug a different bug discovered while updating the
context-semantics test suite (in a prior commit).
Muchas grax to @guilledk for finding the first issue which kicked of
this further scrutiny of the `tractor.Context` and `MsgStream` semantics
test suite with a strange edge case where,
- if the parent opened and immediately closed a stream while the remote
child task started and continued (without terminating) to send msgs
the parent's `open_context().__aexit__()` would **not block** on the
child to complete!
=> this was seemingly due to a bug discovered inside the
`.msg._ops.drain_to_final_msg()` stream handling case logic where we
are NOT checking if `Context._stream` is non-`None`!
As such this,
- extends the `test_caller_closes_ctx_after_callee_opens_stream` (now
renamed, see below) to include cases for all combinations of the child
and parent sending before receiving on the stream as well as all
placements of `Context.cancel()` in the parent before, around and after
the stream open.
- uses the new `expect_ctxc()` for expecting the taskc (`trio.Task`
cancelled)` cases.
- also extends the `test_callee_closes_ctx_after_stream_open` (also
renamed) to include the case where the parent sends a msg before it
receives.
=> this case has unveiled yet-another-bug where somehow the underlying
`MsgStream._rx_chan: trio.ReceiveMemoryChannel` is allowing the
child's `Return[None]` msg be consumed and NOT in a place where it is
correctly set as `Context._result` resulting in the parent hanging
forever inside `._ops.drain_to_final_msg()`..
Alongside,
- start renaming using the new "remote-task-peer-side" semantics
throughout the test module: "caller" -> "parent", "callee" -> "child".
Just like we do from the `.devx._debug.open_crash_handler()`, this
allows checking various attrs on the raised `ContextCancelled` much like
`with pytest.raises() as excinfo:`.
Seems this can happen in particular when we raise a `MessageTypeError`
on the sender side of a `Context`, since there isn't any msg relayed
from the other side (though i'm wondering if MTE should derive from RAE
then considering this case?).
Means `RemoteActorError.boxed_type = None` in such cases instead of
raising an attr-error for the `None.boxed_type_str`.
Namely that `add_hooks: bool` should be the same as on the rent side..
Also, just drop the now unused `iter_maybe_sends`.
This makes the suite entire greeeeen btw, including the new sub-suite
which i hadn't runt before Bo
Namely renaming and massively simplifying it to a new
`test_ext_types_over_ipc` which avoids all the wacky "parent dictates
what sender should be able to send beforehand"..
Instead keep it simple and just always try to send the same small set of
types over the wire with expect-logic to handle each case,
- use the new `dec_hook`/`ext_types` args to `mk_[co]dec()` routines for
pld-spec ipc transport.
- always try to stream a small set of types from the child with logic to
handle the cases expected to error.
Other,
- draft a `test_pld_limiting_usage` to check runtime raising of bad API
usage; haven't run it yet tho.
- move `test_custom_extension_types` to top of mod so that the
`enc/dec_nsp()` hooks can be reffed from test parametrizations.
- comment out (and maybe remove) the old routines for
`iter_maybe_sends`, `test_limit_msgspec`, `chk_pld_type`.
XXX TODO, turns out the 2 failing cases from this suite have exposed an
an actual bug with `MsgTypeError` unpacking where the `ipc_msg=` input
is being set to `None` ?? -> see the comment at the bottom of
`._exceptions._mk_recv_mte()` which seems to describe the likely
culprit?
Since it should only be used from within a `Portal.open_context()`
scope, make sure the caller knows that!
Also don't hide the frame in tb if the immediate function errors..
By using our new `PldRx` design we can,
- pass through the pld-spec & a `dec_hook()` to our `MsgDec` which is
used to configure the underlying `.dec: msgspec.msgpack.Decoder`
- pass through a `enc_hook()` to `mk_codec()` and use it to conf the
equiv `MsgCodec.enc` such that sent msg-plds are converted prior
to transport.
The trick ended up being just to always union the `mk_dec()`
extension-types spec with the normaly with the `msgspec.Raw` pld-spec
such that the `dec_hook()` is only invoked for payload types tagged
by the encoder/sender side B)
A variety of impl tweaks to make it all happen as well as various
cleanups in the `.msg._codec` mod include,
- `mk_dec()` no defaul `spec` arg, better doc string, accept the new
`ext_types` arg, doing the union of that with `msgspec.Raw`.
- proto-ed a now unused `mk_boxed_ext_struct()` which will likely get
removed since it ended up that our `PayloadMsg` structs already cover
the ext-type-hook requirement that the decoder is passed
a `.type=msgspec.Struct` of some sort in order for `.dec_hook` to be
used.
- add a `unpack_spec_types()` util fn for getting the `set[Type]` from
from a `Union[Type]` annotation instance.
- mk the default `mk_codec(pc_pld_spec = Raw,)` since the `PldRx` design
was already passing/overriding it and it doesn't make much sense to
use `Any` anymore for the same reason; it will cause various `Context`
apis to now break.
|_ also accept a `enc_hook()` and `ext_types` which are used to maybe
config the `.msgpack.Encoder`
- generally tweak a bunch of comments-as-docs and todos namely the ones
that are completed after the pld-rx design was implemented.
Also,
- mask the non-functioning `'defstruct'` approach `inside
`.msg.types.mk_msg_spec()` to prep for its removal.
Adjust the test suite (rn called `test_caps_based_msging`),
- add a new suite `test_custom_extension_types` and move and
use the `enc/dec_nsp()` hooks to the mod level for its use.
- prolly planning to drop the `test_limit_msgspec` suite since it's
mostly replaced by the `test_pldrx_limiting` mod's version?
- originally was tweaking a bunch in `test_codec_hooks_mod` but likely
it will get mostly rewritten to be simpler and simply verify that
ext-typed fields can be used over IPC `Context`s between actors (as
originally intended for this sub-suite).
Namely the `tractor.pause_from_sync()` examples using both bg threads
and `asyncio` which seem to go into bad states where SIGINT is ignored..
Deats,
- add `maybe_expect_timeout()` cm to ensure the EOF hangs get
`.xfail()`ed instead.
- @pytest.mark.ctlcs_bish` `test_pause_from_sync` and don't expect the
greenback prompt msg.
- also mark `test_sync_pause_from_aio_task`.
Seems that on 3.13 it's not showing our script code in the output now?
Gotta get an example for @oremanj to see what's up but really it'd be
nice to just custom format stuff above `trio`'s runtime by def..
Anyway, update the `.devx._stackscope`,
- log formatting to be a little more "sclangy" lookin.
- change the per-actor "delimiter" lines style.
- report the `signal.getsignal(SIGINT)` which i needed in the
`sync_bp.py` with ctl-c causing a hang..
- mask the `_tree_dumped` duplicator log report as well as the "dumped
fine" one.
- add an example `pkill --signal SIGUSR1` cmdline.
Tweak the test to cope with,
- not showing our script lines now.. which i've commented in the
`assert_before()` patts..
- to expect the newly formatted delimiter (ascii) lines to separate the
root vs. hanger sub-actor sections.
Like any `bdb.BdbQuit` that might be relayed from a remote context after
a REPl exit with the `quit` cmd. This fixes various issues while
debugging where it may not be clear to the parent task that the child
was terminated with a purposefully unrecoverable error.
That is whenever `trio.EndOfChannel` is raised (presumably from the
`._to_trio.receive()` call inside `LinkedTaskChannel.receive()`) we need
to be extra certain that we let it bubble upward transparently DESPITE
special exc-as-signal handling that is normally suppressed from the aio
side; REPEAT we want to ALWAYS bubble any `trio_err ==
trio.EndOfChannel` in the `finally:` handler of `translate_aio_errors()`
despite `chan._trio_to_raise == AsyncioTaskExited` such that the
caller's iterable machinery will operate as normal when the inter-task
stream is stopped (again, presumably by the aio side task terminating
the inter-task stream).
Main impl deats for this,
- in the EoC handler block ensure we assign both `chan._trio_err` and
the local `trio_err` as well as continue to re-raise.
- add a case to the match block in the `finally:` handler which FOR SURE
re-raises any `type(trio_err) is EndOfChannel`!
Additionally fix a bad bug,
- a ref bug where we were NOT using the
`except BaseException as _trio_err` to assign to `chan._trio_err` (by
accident was missing the leading `_`..)
Unrelated impl tweak,
- move all `maybe_raise_aio_side_err()` content back to inline with its
parent func - makes it easier to use `tractor.pause()` mostly Bp
- go back to trying to use `aio_task.set_exception(aio_taskc)` for now
even though i'm pretty sure we're going to move to a try-fute-first
style helper for this in the future.
Adjust some tests to match/mk-them-green,
- break from `aio_echo_server()` recv loop on
`to_asyncio.TrioTaskExited` much like how you'd expect to (implicitly
with a `for`) with a `trio.EndOfChannel`.
- toss in a masked `value is None` pause point i needed for debugging
inf looping caused by not re-raising EoCs per the main patch
description.
- add a debug-mode sized delay to root-infected test.
Inside a new `.trionics._beg` and exposed from the subpkg ns in
anticipation of the `strict_exception_groups=False` being removed by
`trio` in py 3.15.
Notes,
- mk an embedded single-exc "extractor" using a `BaseExceptionGroup.exceptions` length
check, when 1 return the lone child.
- use the above in a new `@acm`, async bc it's most likely to be composed in an
`async with` tuple-style sequence block, called `collapse_eg()` which
acts a one line "absorber" for when the above mentioned flag is no
logner supported by `trio.open_nursery()`.
All untested atm fwiw.. but soon to be used in our test suite(s) likely!
Including changes like,
- loose eg flagging in various test emedded `trio.open_nursery()`s.
- changes to eg handling (like using `except*`).
- added `debug_mode` integration to tests that needed some REPLin
in order to figure out appropriate updates.
Such that a `TrioCancelled` is raised in the aio task via
`.set_exception()` to explicitly indicate and allow that task to handle
a taskc request from the parent `trio.Task`.
Such that any combination of task terminations/exits can be explicitly
handled and "dual side independent" crash cases re-raised in egs.
The main error-or-exit impl changes include,
- use of new per-side "signaling exceptions":
- TrioTaskExited|TrioCancelled for signalling aio.
- AsyncioTaskExited|AsyncioCancelled for signalling trio.
- NOT overloading the `LinkedTaskChannel._trio/aio_err` fields for
err-as-signal relay and instead add a new pair of
`._trio/aio_to_raise` maybe-exc-attrs which allow each side's
task to specify what it would want the other side to raise to signal
its/a termination outcome:
- `._trio_to_raise: AsyncioTaskExited|AsyncioCancelled` to signal,
|_ the aio task having returned while the trio side was still reading
from the `asyncio.Queue` or is just not `.done()`.
|_ the aio task being self or trio-request cancelled where
a `asyncio.CancelledError` is raised and caught but NOT relayed
as is back to trio; instead signal a "more explicit" exc type.
- `._aio_to_raise: TrioTaskExited|TrioCancelled` to signal,
|_ the trio task having returned while the aio side was still reading
from the mem chan and indicating that the trio side might not
care any more about future streamed values (like the
`Stop/EndOfChannel` equivs for ipc `Context`s).
|_ when the trio task canceld we do
a `asyncio.Future.set_exception(TrioTaskExited())` to indicate
to the aio side verbosely that it should cancel due to the trio
parent.
- `_aio/trio_err` are now left to only capturing the **actual**
per-side task excs for introspection / other side's handling logic.
- supporting "graceful exits" depending on API in use from
`translate_aio_errors()` such that if either side exits but the other
side isn't expect to consume the final `return`ed value, we just exit
silently, which required:
- adding a `suppress_graceful_exits: bool` flag.
- adjusting the `maybe_raise_aio_side_err()` logic to use that flag
and suppress only on certain combos of `._trio_to_raise/._trio_err`.
- prefer to raise `._trio_to_raise` when the aio-side is the src and
vice versa.
- filling out pedantic logging for cancellation cases indicating which
side is the cause.
- add a `LinkedTaskChannel._aio_result` modelled after our
`Context._result` a a similar `.wait_for_result()` interface which
allows maybe accessing the aio task's final return value if desired
when using the `open_channel_from()` API.
- rename `cancel_trio()` done handler -> `signal_trio_when_done()`
Also some fairly major test suite updates,
- add a `delay: int` producing fixture which delivers a much larger
timeout whenever `debug_mode` is set so that the REPL can be used
without a surrounding cancel firing.
- add a new `test_aio_exits_early_relays_AsyncioTaskExited` including
a paired `exit_early: bool` flag to `push_from_aio_task()`.
- adjust `test_trio_closes_early_causes_aio_checkpoint_raise` to expect
a `to_asyncio.TrioTaskExited`.
Namely when the subactor fails to lock the root, in which case we
try to be very verbose about how/what failed in logging as well
as ensure we cancel the employed IPC ctx.
Implement the outer `BaseException` handler to handle both styles,
- match on an eg (or the prior std cancel excs) only raising a lone
sub-exc from for former.
- always `as _req_err:` and assign to a new func-global `req_err`
to enable the above matching.
Other,
- raise `DebugStateError` on `status.subactor_uid != actor_uid`.
- fix a `_repl_fail_report` ref error due to making silly assumptions
about the `_repl_fail_msg` global; now copy from global as default.
- various log-fmt and logic expression styling tweaks.
- ignore `trio.Cancelled` by default in `open_crash_handler()`.
Namely inside,
- `ActorNursery.open_portal()` which uses
`.trionics.maybe_open_nursery()` and is now adjusted to
pass-through `**kwargs` for at least this flag.
- inside the `.trionics.gather_contexts()`.
Such that it gets passed through to `.open_root_actor()` in the
`implicit_runtime==True` case - useful for debugging cases where
`.devx._debug` APIs might be used to avoid REPL clobbering in subactors.
Since it'll likely need a bit of detailing to get the test suite running
identically with strict egs (exception groups), i've opted to just flip
the switch on a few core nursery scopes for now until as such a time
i can focus enough to port the matching internals.. Xp
Since it turns out there's a few gotchas moving to python 3.13,
- we need to pin to new(er) `trio` which now flips to strict exception
groups (something to be handled in a follow up patch).
- since we're now using `uv` we should (at least for now) prefer the
system `python` (over astral's distis) since they compile for
`libedit` in terms of what the (new) `readline.backend: str` will read
as; this will break our tab-completion and vi-mode settings in
the `pdbp` REPL without a user configuring a `~/.editrc`
appropriately.
- go back to using latest `pdbp` (not a local dev version) since it
should work fine presuming the previous bullet is addressed.
Lock bumps,
- for now use latest `trio==0.29.0` (which i gotta feeling might have
broken some existing attempts at strict-eg handling i've tried..)
- update to latest `xonsh`, `pdbp` and its dep `tabcompleter`
Other cleaning,
- put back in various deps "comments" from `poetry` content.
- drop the `xonsh-vox` and `xontrib-vox` dev deps; no `vox` support with
`uv` rn anyway..
Since there's no way to activate `greenback`'s portal in such cases, we
should at least have a test verifying our very loud error about the
inability to support this usage..
The (rare) condition is heavily detailed in new comments in
the `cancel_trio()` callback but, more or less the idea here is to be
extra pedantic in raising an `Exceptiongroup` of errors from each task
(both `asyncio` and `trio`) whenever the 2 tasks raise "independently"
- in the sense that it's not obviously one side's task causing an error
(or cancellation) in the other. In this case we set the error for each
side on the `LinkedTaskChannel` (via new attrs described later).
As a synopsis, most of this work was refined out of supporting
`infected_aio=True` mode in the **root actor** and in particular as part
of getting that to work inside the `modden` daemon which at the time of
writing was still using the `i3ipc` lib and thus `asyncio`.
Impl deats,
- extend the `LinkedTaskChannel` field/API set (and type it),
- `._trio_task: trio.Task` for test/user introspection.
- also "stage" some ideas for a more refined interface,
- `.started()` to deliver the value yielded to the `trio.Task` parent.
|_ also includes some todos for how to implement this design
underneath.
- `._aio_first: Any|None = None` to hold that value ^.
- `.wait_aio_complete()` for syncing to the asyncio task.
- some detailed logging around "asyncio cancelled trio" case.
- Move `AsyncioCancelled` in this module.
Styling changes,
- generally more explicit var naming.
- some todos for getting modern and fancy with typing..
NB, Let it be known this commit msg was written on a friday with the
help of various "mr. white" solns.
Might as well break apart the specific test set since there are some
(minor) subtleties and the orig test mod is already getting pretty big
XD
Includes both the new "independent"-event-loops test as well as the std
usage base case suite.
Such that the suite verifies the wip `maybe_raise_from_masking_exc()`
will raise from a `trio.Cancelled.__context__` since I can't think of
any reason a `Cancelled` should ever be raised in-place of
a non-`Cancelled` XD
Not sure what should be raised instead (or maybe just a `log.warning()`
emitted?) but this starts a draft for refinement at the least. Use the
new `@pytest.mark.parametrize` explicit tuple-of-params form with an
`pytest.param + `.mark.xfail()` for the default behaviour case.
Since i wasted 2 days just to find an example of this inside an `@acm`,
figured I better reproduce for the purposes of maybe implementing
a warning sys (inside our wip proto `open_taskman()`) when a nursery
detects a single `Cancelled` in an eg where the `.__context__` is set to
some non-cancel error (which likely means a cancel-causing source
exception was suppressed by accident).
Left in a buncha commented code using `maybe_open_nursery()` which
i thought might be part of the issue but didn't end up being required;
will likely remove on a follow up refinement.
Along the lines of something like `pytest.raises()` where the handled
exception can be inspected from the `pdbp` REPL using its `.value` field
B)
This is super handy in particular for understanding
`BaseException[Group]`s without manually adding surrounding handler code
to assign the `except[*] Exception as exc_var:` particularly when trying
to understand multi-cancelled eg trees.
Trying to replicate cases where errors are raised in both `trio` and
`asyncio` tasks independently (at least in `.to_asyncio` API terms) with
a new `test_trio_prestarted_task_bubbles` that generates 3 cases inside
a `@acm` calls stack composing a `trio.Nursery` with
a `to_asyncio.open_channel_from()` call where a set of `trio` tasks are
started in a loop using `.start()` with various exc raising sequences,
- the aio task raising *before* the last `trio` task spawns.
- the aio task raising just after the last trio task spawns, but before
it starts.
- after the last trio task `.start()` call returns control to the
parent - but (for now) did not error.
TODO, still more cases to discover as i'm still fighting a `modden` bug
of this sort atm..
Other,
- tweak some other tests to have timeouts since some recent hangs were
found..
- started mucking with py3.13 and thus adjustments for strict egs in
some tests; full patchset to test suite likely coming soon!
Since we can't use it to `Task.set_exception()` (since that task method never
seems to work.. XD) and setting the private/internal always seems to do
the desired raising in the task? I realize it's an internal `asyncio`
runtime field but i'd rather take the risk of it breaking then having to
rely on our own equivalent hack..
Also, it seems like the case where the task's associated (and internal)
future-waiter field is null, we won't run into the (same?) prior hanging
issues (maybe since there's nothing for `asyncio` internals to use to
wait XD ??) when `Task.cancel()` is used..??
Main deats,
- add and `Future.set_exception()` a new signal-exception
`class TrioTaskExited(AsyncioCancelled):` whenever the trio-task exits
gracefully and the asyncio-side task is still doing blocking work (of
some sort) which *seem to* be predicated by a check that
`._fut_waiter is not None`.
- always call `asyncio.Queue.shutdown()` for the same^ as well as
whenever we decide to call `Task.cancel()`; in that case the shutdown
relays correctly?
Some further refinements,
- only warn about `Task.cancel()` usage when actually used ;)
- more local scope vars setting in the exit phase of
`translate_aio_errors()`.
- also in ^ use explicit caught-exc var names for each error-type.
Since it can not only cause the guest-mode run to abandon but also in
some edge cases prevent `trio`-errors from propagating (at least on
py3.12-13?) as discovered as part of supporting this mode officially
in the *root actor*.
As such try to avoid that method as much as possible instead opting to
pass the `trio`-side error via the iter-task channel ref.
Deats,
- add a `LinkedTaskChannel._trio_err: BaseException|None` which gets set
whenver the `trio.Task` error is caught; ONLY set `AsyncioCancelled`
when the `trio` task was for sure the cause, whether itself cancelled
or errored.
- always check for this error when exiting the `asyncio` side (even when
terminated via a call to `asyncio.Task.cancel()` or during any other
`CancelledError` handling such that the `asyncio`-task can expect to
handle `AsyncioCancelled` due to the above^^ cases.
- never `cs.cancel()` the `trio` side unless that cancel scope has not
yet been `.cancel_called` whatsoever; it's a noop anyway.
- only raise any exc from `asyncio.Task.result()` when `chan._aio_err`
does not already match it since the existence of the pre-existing
`task_err` means `asyncio` prolly intends (or has already) raised and
interrupted the task elsewhere.
Various supporting tweaks,
- don't bother maybe-init-ing `greenback` from the actor entrypoint
since we already need to (and do) bestow the portals to each `asyncio`
task spawned using the `run_task()`/`open_channel_from()` API; further
the init-ing should be done already by client code that enables
infected mode (even in the root actor).
|_we should prolly also codify it from any
`run_daemon(infected_aio=True, debug_mode=True)` usage we offer.
- pass all the `_<field>`s to `Linked TaskChannel` explicitly in named
kwarg style.
- better sclang-style log reports throughout, particularly on teardowns.
- generally more/better comments and docs around (not well understood)
edge cases.
- prep to just inline `maybe_raise_aio_side_err()` closure..
When `.pause_from_sync()` is called from an `asyncio.Task` which was
never bestowed a portal we want to be mega pedantic about it; indicate
that the task was NOT spawned from our `.to_asyncio` API and likely by
some out-of-our-control code (normally using
`asyncio.ensure_future()/.create_task()`). Though `greenback` already
errors on such usage, it's not always clear why no portal exists;
explaining the situation of a 3rd-party-bg-spawned-task should avoid
dev confusion for most cases.
Impl deats,
- distinguish between an actor in infected mode versus the actual caller
of `.pause_from_sync()` being an `asyncio.Task` with more explicit
`asyncio_task` and `is_infected_aio` vars.
- ONLY in the case of being both an infected-mode-actor AND detecting
that the caller is an `asyncio.Task`, check `greenback.has_portal()`
such that when not bestowed we presume the aforementioned
3rd-party-bg-task case above and raise a new explicit RTE with
a detailed explanatory message.
- add some masked draft code for handling the speical case of a root
actor `asyncio.Task` caller which could (in theory) not actually
require gb portal use since the `Lock` can be acquired directly
without IPC.
|_this will likely require factoring of various pause machinery funcs
into a `_pause_from_root_task()` to mk the impl sane XD
Other,
- expose a new `debug_filter: Callable` which can be provided by the
caller of `_maybe_enter_pm()` to predicate whether to enter the
debugger REPL based on the caught `BaseException|BaseExceptionGroup`;
this is handy for customizing the meaning of "graceful cancellations"
so as to avoid crash handling on expected egs of more then
`trioCancelled`.
|_ make the default as it was implemented: `not is_multi_cancelled(err)`
- pass-through a new `ignore: set[BaseException]` as
`open_crash_handler(ignore_nested=ignore)` to allow for the same
silent-cancellation-egs-swallowing as desired from outside the actor
runtime.
Such that equivalents of `trio.Cancelled` from other runtimes such as
`asyncio.CancelledError` and `subprocess.CalledProcessError` (with
a `.returncode == -2`) can be gracefully ignored as needed by the
caller.
For example this is handy if you want to avoid debug-mode REPL entry on
an exception-group full of only some subset of exception types since you
expect certain tasks to raise such errors after having been cancelled by
a request from some parent supervision sys (some "higher up"
`trio.CancelScope`, a remote triggered `ContextCancelled` or just from
and OS SIGINT).
Impl deats,
- offer a new `ignore_nested: set[BaseException]` param which by
default we add `trio.Cancelled` to when no other types are provided.
- use `ExceptionGroup.subgroup(tuple(ignore_nested)` to filter to egs of
the "ignored sub-errors set" and return any such match (instead of
`True`).
- detail a comment on exclusion case.
Such that we can hook into 3rd-party-libs more easily to monkey them and
use our (prettier/hipper) console logging with something like (an
example from the client project `modden`),
```python
connection_mod = i3ipc.connection
tractor_style_i3ipc_logger: logging.LoggingAdapter = tractor.log.get_console_log(
_root_name=connection_mod.__name__,
logger=i3ipc.connection_mod.logger,
level='info',
)
# monkey the instance-ref in 3rd-party module
connection_mod.logger = our_logger
```
Impl deats,
- expose as `get_console_log(logger: logging.Logger)` and add default
failover logic.
- toss in more typing, also for mod-global instance.
Such that you can use,
```python
tractor.to_asyncio.run_as_asyncio_guest(
trio_main=_trio_main,
)
```
to boostrap the root actor (and thus main parent process) to embed
the actor-rumtime into an `asyncio` loop. Prove it all works with an
subactor-free version of the aio echo-server test suite B)
For comparing a `msgspec.Struct` against an input `dict` presumably to
be used as input for struct instantiation. The main diff with
`.__sub__()` is that non-existing fields on either are reported
(loudly).
Such that maybe we can eventually offer a nicer higher-level API which
implements much of the boilerplate required by `msgspec` (like
type-matched branching to serialization logic) via a type-table
interface or something?
Not sure if the idea is that useful so leaving it all as TODOs for now
obviously.
Ensuring we can at least use `breakpoint()` from an infected actor's
`asyncio.Task` spawned via a `.to_asyncio` API.
Also includes a little `tests/devx/` reorging,
- start splitting out non-`tractor.pause()` tests into a new
`test_pause_from_non_trio.py` for all the `.pause_from_sync()`
use in bg-threaded or `asyncio` applications.
- factor harness commonalities to the `devx/conftest` (namely
the `do_ctlc()` masher).
- mv `test_pause_from_sync` to the new non`-trio` mod.
NOTE, the `ctlc=True` is still failing for
`test_pause_from_asyncio_task` which is a user-happiness bug but not
anything fundamentally broken - just need to handle the `asyncio` case
in `.devx._debug.sigint_shield()`!
Mostly fixing edge cases with `asyncio` and/or bg threads where the
`.repl_release: trio.Event` needs to be used from the main `trio`
thread OW confusing-but-valid teardown tracebacks can show under various
races.
Also improve,
- log reporting for such internal bugs to make them more obvious on
console via `log.exception()`.
- only restore the SIGINT handler when runtime is (still) active.
- reporting when `tractor.pause(shield=True)` should be used and
unhiding the internal frames from the tb in that case.
- for `pause_from_sync()` some deep fixes..
|_add a `allow_no_runtime: bool = False` flag to allow
**not** requiring the actor runtime to be active.
|_fix the `greenback` case-branch to only trigger on `not
is_trio_thread`.
|_add a scope-global `repl_owner: Task|Thread|None = None` to
avoid ref errors..
Various `try`/`except` blocks around external APIs that raise when not
running inside an `tractor` and/or some async framework (mostly to avoid
too-late/benign error tbs on certain classes of actor tree teardown):
- for the `log.pdb()` prompts emitted before REPL console entry.
- inside `DebugStatus.is_main_trio_thread()`'s call to `sniffio`.
- in `_post_mortem()` by catching `NoRuntime` when called from a thread
still active after the `.open_root_actor()` has already exited.
Also,
- create a dedicated `DebugStateError` for raising instead of `assert`s
when we have actual debug-request inconsistencies (as seem to be most
likely with bg thread usage of `breakpoint()`).
- show the `open_crash_handler()` frame on `bdb.BdbQuit` (for now?)
Since it seems that `pdbp.xpm()` can sometimes lose the up-stack
traceback info/frames? Not sure why but ours seems to work just fine
from a `asyncio`-handler in `modden`'s use of `i3ipc` B)
Also call `DebugStatus.shield_sigint()` from `pause_from_sync()` in the
infected-`asyncio` case to get the same shielding behaviour as in all
other usage!
Mostly due to magic from @oremanj where we slap in a little bit of
`.from_asyncio`-type stuff to run a `trio`-task from `asyncio.Task`
code!
I'm not gonna go into tooo too much detail but basically the primary
thing needed was a way to (blocking-ly) invoke a `trio.lowlevel.Task`
from an `asyncio` one (which we now have with a new
`run_trio_task_in_future()` thanks to draft code from the aforementioned
jefe) which we now invoke from a dedicated aio case-branch inside
`.devx._debug.pause_from_sync()`. Further include a case inside
`DebugStatus.release()` to handle using the same func to set the
`repl_release: trio.Event` from the aio side when releasing the REPL on
exit cmds.
Prolly more refinements to come ;{o
It was expecting `AssertionError` as a proceed-in-test signal (by
breaking from a continue loop), but `in_prompt_msg(raise_on_err=True)`
was changed to raise `ValueError`; so instead just use as a predicate
for the `break`.
Also rework `in_prompt_msg()` to accept the `child: BaseSpawn` as input
instead of `before: str` remove the casting boilerplate, and adjust all
usage to match.
Just like we've started doing throughout the rest of the actor runtime
for reporting (and where "sclang" = "structured conc (s)lang", our
little supervision-focused operations syntax i've been playing with in
log msg content).
Further tweaks:
- report the `trio_done_fute` alongside the `main_outcome` value.
- add a todo list for supporting `greenback` for pause points.
The reason for this "duplication" with the `--asyncio` CLI flag (passed
to the child during spawn) is 2-fold:
- allows verifying inside `Actor._from_parent()` that the `trio` runtime was
started via `.start_guest_run()` as well as if the
`Actor._infected_aio` spawn-entrypoint value has been set (by the
`._entry.<spawn-backend>_main()` whenever `--asyncio` is passed)
such that any mismatch can be signaled via an `InternalError`.
- enables checking the `._state._runtime_vars['_is_infected_aio']` value
directly (say from a non-actor/`trio`-thread) instead of calling
`._state.current_actor(err_on_no_runtime=False)` in certain edge
cases.
Impl/testing deats:
- add `._state._runtime_vars['_is_infected_aio'] = False` default.
- raise `InternalError` on any `--asyncio`-flag-passed vs.
`_runtime_vars`-value-relayed-from-parent inside
`Actor._from_parent()` and include a `Runner.is_guest` assert for good
measure B)
- set and relay `infect_asyncio: bool` via runtime-vars to child in
`ActorNursery.start_actor()`.
- verify `actor.is_infected_aio()`, `actor._infected_aio` and
`_state._runtime_vars['_is_infected_aio']` are all set in test suite's
`asyncio_actor()` endpoint.
By re-purposing our `pexpect`-based console matching with a new
`debugging/shield_hang_in_sub.py` example, this tests a few "hanging
actor" conditions more formally:
- that despite a hanging actor's task we can dump
a `stackscope.extract()` tree on relay of `SIGUSR1`.
- the actor tree will terminate despite a shielded forever-sleep by our
"T-800" zombie reaper machinery activating and hard killing the
underlying subprocess.
Some test deats:
- simulates the expect actions of a real user by manually using
`os.kill()` to send both signals to the actor-tree program.
- `pexpect`-matches against `log.devx()` emissions under normal
`debug_mode == True` usage.
- ensure we get the actual "T-800 deployed" `log.error()` msg and
that the actor tree eventually terminates!
Surrounding (re-org/impl/test-suite) changes:
- allow disabling usage via a `maybe_enable_greenback: bool` to
`open_root_actor()` but enable by def.
- pretty up the actual `.devx()` content from `.devx._stackscope`
including be extra pedantic about the conc-primitives for each signal
event.
- try to avoid double handles of `SIGUSR1` even though it seems the
original (what i thought was a) problem was actually just double
logging in the handler..
|_ avoid double applying the handler func via `signal.signal()`,
|_ use a global to avoid double handle func calls and,
|_ a `threading.RLock` around handling.
- move common fixtures and helper routines from `test_debugger` to
`tests/devx/conftest.py` and import them for use in both test mods.
The final issue was making sure we do the same thing on ctl-c/SIGINT
from the user. That is, if there's already a bg-thread in REPL, we
`log.pdb()` about SIGINT shielding and re-draw the prompt; the same UX
as normal actor-runtime-task behaviour.
Reasons this wasn't workin.. and the fix:
- `.pause_from_sync()` was overriding the local `repl` var with `None`
delivered by (transitive) calls to `_pause(debug_func=None)`.. so
remove all that and only assign it OAOO prior to thread-type case
branching.
- always call `DebugStatus.shield_sigint()` as needed from all requesting
threads/tasks:
- in `_pause_from_bg_root_thread()` BEFORE calling `._pause()` AND BEFORE
yielding back to the bg-thread via `.started(out)` to ensure we're
definitely overriding the handler in the `trio`-main-thread task
before unblocking the requesting bg-thread.
- from any requesting bg-thread in the root actor such that both its
main-`trio`-thread scheduled task (as per above bullet) AND it are
SIGINT shielded.
- always call `.shield_sigint()` BEFORE any `greenback._await()` case
don't entirely grok why yet, but it works)?
- for `greenback._await()` case always set `bg_task` to the current one..
- tweaks to the `SIGINT` handler, now renamed `sigint_shield()` so as
not to name-collide with the methods when editor-searching:
- always try to `repr()` the REPL thread/task "owner" as well as the
active `PdbREPL` instance.
- add `.devx()` notes around the prompt flushing deats and comments
for any root-actor-bg-thread edge cases.
Related/supporting refinements:
- add `get_lock()`/`get_debug_req()` factory funcs since the plan is to
eventually implement both as `@singleton` instances per actor.
- fix `acquire_debug_lock()`'s call-sig-bug for scheduling
`request_root_stdio_lock()`..
- in `._pause()` only call `mk_pdb()` when `debug_func != None`.
- add some todo/warning notes around the `cls.repl = None` in
`DebugStatus.release()`
`test_pause_from_sync()` tweaks:
- don't use a `attach_patts.copy()`, since we always `break` on match.
- do `pytest.fail()` on that ^ loop's fallthrough..
- pass `do_ctlc(child, patt=attach_key)` such that we always match the
the current thread's name with the ctl-c triggered `.pdb()` emission.
- oh yeah, return the last `before: str` from `do_ctlc()`.
- in the script, flip `abandon_on_cancel=True` since when `False` it
seems to cause `trio.run()` to hang on exit from the last bg-thread
case?!?
In `lock_stdio_for_peer()` better internal-error handling/reporting:
- only `Lock._blocked.remove(ctx.cid)` if that same cid was added on
entry to avoid needless key-errors.
- drop all `Lock.release(force: bool)` usage remnants.
- if `req_ctx.cancel()` fails mention it with `ctx_err.add_note()`.
- add more explicit internal-failed-request log messaging via a new
`fail_reason: str`.
- use and use new `x)<=\n|_` annots in any failure logging.
Other cleanups/niceties:
- drop `force: bool` flag entirely from the `Lock.release()`.
- use more supervisor-op-annots in `.pdb()` logging
with both `_pause/crash_msg: str` instead of double '|' lines when
`.pdb()`-reported from `._set_trace()`/`._post_mortem()`.
Turns out it does work XD
Prior presumption was from before I had the fute poll-loop so makes
sense we needed more then one sched-tick's worth of context switch vs.
now we can just keep looping-n-pumping as fast possible until the
guest-run's main task completes.
Also,
- minimize the preface commentary (as per todo) now that we have tests
codifying all the edge cases :finger_crossed:
- parameter-ize the pump-loop-cycle delay and default it to 0.
To make the recent set of tests pass this (hopefully) finally solves all
`asyncio` embedded `trio` guest-run abandonment by ensuring we "pump the
event loop" until the guest-run future is fully complete.
Accomplished via simple poll loop of the form `while not
trio_done_fut.done(): await asyncio.sleep(.1)` in the `aio_main()`
task's exception teardown sequence. The loop does a naive 10ms
"pump-via-sleep & poll" for the `trio` side to complete before finally
exiting (and presumably raising) from the SIGINT cancellation.
Other related cleanups and refinements:
- use `asyncio.Task.result()` inside `cancel_trio()` since it also
inline-raises any exception outcome and we can also log-report the
result in non-error cases.
- comment out buncha not-sure-we-need-it stuff in `cancel_trio()`.
- remove the botched `AsyncioCancelled(CancelledError):` idea obvi XD
- comment `greenback` init for now in `aio_main()` since (pretty sure)
we don't ever want to actually REPL in that specific func-as-task?
- always capture any `fute_err: BaseException` from the `main_outcome:
Outcome` delivered by the `trio` side guest-run task.
- add and raise a new super noisy `AsyncioRuntimeTranslationError`
whenever we detect that the guest-run `trio_done_fut` has not
completed before task exit; should avoid abandonment issues ever
happening again without knowing!
Finally this reproduces the issue as it (originally?) exhibited inside
`piker` where the `Actor.lifetime_stack` wasn't closed in cases where
during `infected_aio`-actor cancellation/shutdown `trio` side tasks
which are doing shielded (teardown) work are NOT being watched/waited on
from the `aio_main()` task-closure inside `run_as_asyncio_guest()`!
This is then the root cause of the guest-run being abandoned since if
our `aio_main()` task-closure doesn't know it should allow the run to
finish, it's going to call `loop.close()` eventually resulting in the
`GeneratorExit` thrown into `trio._core._run.unrolled_run()`..
So, this extends the `test_sigint_closes_lifetime_stack()` suite to
include cases for such shielded `trio`-task ops:
- add a new `trio_side_is_shielded: bool` which will toggle whether to
add a shielded 0.5s `trio.sleep()` loop to `manage_file()` which
should outlive the `asyncio` event-loop shutdown sequence and result
in an abandoned guest-run and thus a leaked file.
- parametrize the existing suite with this case resulting in a total 16
test set B)
This patch demonstrates the problem with our `aio_main()` task-closure
impl via the now 4 failing tests, a fix is coming in a follow up commit!
Turns out it somehow breaks our `to_asyncio` error relay since obvi
`asyncio`'s runtime seems to specially handle it (prolly via
`isinstance()` ?) and it caused our
`test_aio_cancelled_from_aio_causes_trio_cancelled()` to hang..
Further, obvi `unpack_error()` won't be able to find the type def if not
kept inside `._exceptions`..
So given all that, revert the change/move as well as:
- tweak the aio-from-aio cancel test to timeout.
- do `trio.sleep()` conc with any bg aio task by moving out nursery
block.
- add a `send_sigint_to: str` parameter to
`test_sigint_closes_lifetime_stack()` such that we test the SIGINT
being relayed to just the parent or the child.
Took me a while to figure out what the heck was going on but, turns out
`asyncio` changed their SIGINT handling in 3.11 as per:
https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption
I'm not entirely sure if it's the 3.11 changes or possibly wtv further
updates were made in 3.12 but more or less due to the way
our current main task was written the `trio` guest-run was getting
abandoned on SIGINTs sent from the OS to the infected child proc..
Note that much of the bug and soln cases are layed out in very detailed
comment-notes both in the new test and `run_as_asyncio_guest()`, right
above the final "fix" lines.
Add new `test_infected_aio.test_sigint_closes_lifetime_stack()` test suite
which reliably triggers all abandonment issues with multiple cases
of different parent behaviour post-sending-SIGINT-to-child:
1. briefly sleep then raise a KBI in the parent which was originally
demonstrating the file leak not being cleaned up by `Actor.lifetime_stack.close()`
and simulates a ctl-c from the console (relayed in tandem by
the OS to the parent and child processes).
2. do `Context.wait_for_result()` on the child context which would
hang and timeout since the actor runtime would never complete and
thus never relay a `ContextCancelled`.
3. both with and without running a `asyncio` task in the `manage_file`
child actor; originally it seemed that with an aio task scheduled in
the child actor the guest-run abandonment always was the "loud" case
where there seemed to be some actor teardown but with tbs from
python failing to gracefully exit the `trio` runtime..
The (seemingly working) "fix" required 2 lines of code to be run inside
a `asyncio.CancelledError` handler around the call to `await trio_done_fut`:
- `Actor.cancel_soon()` which schedules the actor runtime to cancel on
the next `trio` runner cycle and results in a "self cancellation" of
the actor.
- "pumping the `asyncio` event loop" with a non-0 `.sleep(0.1)` XD
|_ seems that a "shielded" pump with some actual `delay: float >= 0`
did the trick to get `asyncio` to allow the `trio` runner/loop to
fully complete its guest-run without abandonment.
Other supporting changes:
- move `._exceptions.AsyncioCancelled`, our renamed
`asyncio.CancelledError` error-sub-type-wrapper, to `.to_asyncio` and make
it derive from `CancelledError` so as to be sure when raised by our
`asyncio` x-> `trio` exception relay machinery that `asyncio` is
getting the specific type it expects during cancellation.
- do "summary status" style logging in `run_as_asyncio_guest()` wherein
we compile the eventual `startup_msg: str` emitted just before waiting
on the `trio_done_fut`.
- shield-wait with `out: Outcome = await asyncio.shield(trio_done_fut)`
even though it seems to do nothing in the SIGINT handling case..(I
presume it might help avoid abandonment in a `asyncio.Task.cancel()`
case maybe?)
As in whenever `Context.cancel()` is not (runtime internally) called
(i.e. `._cancel_called` is not set), we can attempt to detect the parent
`trio` nursery/cancel-scope that is the source. Emit the report with
a `.cancel()` level and attempt to repr in "sclang" form as well as
unhide the stack frame for debug/traceback-in.
There's a been a todo for soo long for this XD
Since all `Actor`'s store a set of `._peers` we can try a lookup on that
table as a shortcut before pinging the registry Bo
Impl deats:
- add a new `._discovery.get_peer_by_name()` routine which attempts the
`._peers` lookup by combining a copy of that `dict` + an entry added
for `Actor._parent_chan` (since all subs have a parent and often the
desired contact is just that connection).
- change `.find_actor()` (for the `only_first == True` case),
`.query_actor()` and `.wait_for_actor()` to call the new helper and
deliver appropriate outputs if possible.
Other,
- deprecate `get_arbiter()` def and all usage in tests and examples.
- drop lingering use of `arbiter_sockaddr` arg to various routines.
- tweak the `Actor` doc str as well as some code fmting and a tweak to
the `._stream_handler()`'s initial `con_status: str` logging value
since the way it was could never be reached.. oh and `.warning()` on
any new connections which already have a `_pre_chan: Channel` entry in
`._peers` so we can start minimizing IPC duplications.
In the `drain_to_final_msg()` impl, since a stream terminating
gracefully requires this msg, there's really no reason to `log.cancel()`
about it; go `.runtime()` level instead since we're trying de-noise
under "normal operation".
Also,
- passthrough `hide_tb` to taskc-handler's `ctx.maybe_raise()` call.
- raise `MessagingError` for the `MsgType` unmatched `case _:`.
- detail the doc string motivation a little more.
As per a WIP scribbled out TODO in `._entry.nest_from_op()`, change
a bunch of "supervisor/lifetime mgmt ops" related log messages to
contain some supervisor-annotation "headers" in an effort to give
a terser "visual indication" of how some execution/scope/storage
primitive entity (like an actor/task/ctx/connection) is being operated
on (like, opening/started/closed/cancelled/erroring) from a "supervisor
action" POV.
Also tweak a bunch more emissions to lower levels to reduce noise around
normal inter-actor operations like process and IPC ctx supervision.
To avoid showing lowlevel details of exception handling around the
underlying call to `return await self._ctx._pld_rx.recv_pld(ipc=self)`,
any time a `RemoteActorError` is unpacked (an raised locally) we re-raise
it directly from the captured `src_err` captured so as to present to
the user/app caller-code an exception raised directly from the `.receive()`
frame. This simplifies traceback call-stacks for any `log.exception()`
or `pdb`-REPL output filtering out the lower `PldRx` frames by default.
Since it was all ad-hoc defined inside
`._ipc.MsgpackTCPStream._iter_pkts()` more or less, this starts
formalizing a way for particular transport backends to indicate whether
a disconnect condition should be re-raised in the RPC msg loop and if
not what log level to report it at (if any).
Based on our lone transport currently we try to suppress any logging
noise from ephemeral connections expected during normal actor
interaction and discovery subsys ops:
- any short lived discovery related TCP connects are only logged as
`.transport()` level.
- both `.error()` and raise on any underlying `trio.ClosedResource`
cause since that normally means some task touched transport layer
internals that it shouldn't have.
- do a `.warning()` on anything else unexpected.
Impl deats:
- extend the `._exceptions.TransportClosed` to accept an input log
level, raise-on-report toggle and custom reporting & raising via a new
`.report_n_maybe_raise()` method.
- construct the TCs with inputs per case in (the newly named) `._iter_pkts().
- call ^ this method from the `TransportClosed` handler block inside the
RPC msg loop thus delegating reporting levels and/or raising to the
backend's per-case TC instantiating.
Related `._ipc` changes:
- mask out all the `MsgpackTCPStream._codec` debug helper stuff and drop
any lingering cruft from the initial proto-ing of msg-codecs.
- rename some attrs/methods:
|_`MsgpackTCPStream._iter_packets()` -> `._iter_pkts()` and
`._agen` -> `_aiter_pkts`.
|_`Channel._aiter_recv()` -> `._aiter_msgs()` and
`._agen` -> `_aiter_msgs`.
- add `hide_tb: bool` support to `Channel.send()` and only show the
frame on non-MTEs.
- allow passing and report the lib name (`trio` or `tractor`) from
`maybe_open_nursery()`.
- use `.runtime()` level when reporting `_Cache`-hits in
`maybe_open_context()`.
- tidy up some doc strings.
In case the struct doesn't import a field type (which will cause the
`.pformat()` to raise) just report the issue and try to fall back to the
original `repr()` version.
It's too fragile to put in side core RPC machinery since
`msgspec.Struct` defs can fail if a field type can't be
looked up at creation time (like can easily happen if you
conditionally import using `if TYPE_CHECKING:`)
Also,
- rename `cs` to `rpc_ctx_cs: CancelScope` since it's literally
the wrapping RPC `Context._scope`.
- report self cancellation via `explain: str` and add tail case for
"unknown cause".
- put a ?TODO? around what to do about KBIs if a context is opened
from an `infected_aio`-actor task.
- similar to our nursery and portal add TODO list for moving all
`_invoke_non_context()` content out the RPC core and instead implement
them as `.hilevel` endpoint helpers (maybe as decorators?)which under
neath define `@context`-funcs.
Log-report the different types of actor exit conditions including cancel
via KBI, error or normal return with varying levels depending on case.
Also, start proto-ing out this weird ascii-syntax idea for describing
conc system states and implement the first bit in a `nest_from_op()`
log-message fmter that joins and indents an obj `repr()` with
a tree-like `'>)\n|_'` header.
Since we more or less require it for `tractor.pause_from_sync()` this
refines enable toggles and their relay down the actor tree as well as
more explicit logging around init and activation.
Tweaks summary:
- `.info()` report the module if discovered during root boot.
- use a `._state._runtime_vars['use_greenback']: bool` activation flag
inside `Actor._from_parent()` to determine if the sub should try to
use it and set to `False` if mod-loading fails / not installed.
- expose `maybe_init_greenback()` from `.devx` sugpkg.
- comment out RTE in `._pause()` for now since we already have it in
`.pause_from_sync()`.
- always `.exception()` on `maybe_init_greenback()` import errors to
clarify the underlying failure deats.
- always explicitly report if `._state._runtime_vars['use_greenback']`
was NOT set when `.pause_from_sync()` is called.
Other `._runtime.async_main()` adjustments:
- combine the "internal error call ur parents" message and the failed
registry contact status into one new `err_report: str`.
- drop the final exception handler's call to
`Actor.lifetime_stack.close()` since we're already doing it in the
`finally:` block and the earlier call has no currently known benefit.
- only report on the `.lifetime_stack()` callbacks if any are detected
as registered.
Not sure how I forgot this but, obviously it's correct context-var
semantics to revert the current IPC `Context` (set in the latest
`.open_context()` block) such that any prior instance is reset..
This ensures the sanity `assert`s pass inside
`.msg._ops.maybe_limit_plds()` and just in general ensures for any task
that the last opened `Context` is the one returned from
`current_ipc_ctx()`.
This change is adding commentary about the upcoming API removal and
simplification of nursery + portal internals; no actual code changes are
included.
The plan to (re)move the old RPC methods:
- `ActorNursery.run_in_actor()`
- `Portal.run()`
- `Portal.run_from_ns()`
and any related impl internals out of each conc-primitive and instead
into something like a `.hilevel.rpc` set of APIs which then are all
implemented using the newer and more lowlevel `Context`/`MsgStream`
primitives instead Bo
Further,
- formally deprecate the `Portal.result()` meth for
`.wait_for_result()`.
- only `log.info()` about runtime shutdown in the implicit root case.
Might as well add a public maybe-getter for use on the "parent" side
since it can be handy to check out-of-band cancellation conditions (like
from `Portal.cancel_actor()`).
Buncha bitty tweaks for more easily debugging cancel conditions:
- add a `@.cancel_called.setter` for hooking into `.cancel_called = True`
being set in hard to decipher "who cancelled us" scenarios.
- use a new `self_ctxc: bool` var in `.cancel()` to capture the output
state from `._is_self_cancelled(remote_error)` at call time so it can
be compared against the measured value at crash-time (when REPL-ing it
can often have already changed due to runtime teardown sequencing vs.
the crash handler hook entry).
- proxy `hide_tb` to `.drain_to_final_msg()` from `.wait_for_result()`.
- use `remote_error.sender` attr directly instead of through
`RAE.msgdata: dict` lookup.
- change var name `our_uid` -> `peer_uid`; it's not "ours"..
Other various docs/comment updates:
- extend the main class doc to include some other name ideas.
- change over all remaining `.result()` refs to `.wait_for_result()`.
- doc more details on how we want `.outcome` to eventually signature.
Since a local-actor-nursery-parented subactor might also use the root as
its registry, we need to avoid warning when short lived IPC `Channel`
connections establish and then disconnect (quickly, bc the apparently
the subactor isn't re-using an already cached parente-peer<->child conn
as you'd expect efficiency..) since such cases currently considered
normal operation of our super shoddy/naive "discovery sys" XD
As such, (un)guard the whole local-actor-nursery OR channel-draining
waiting blocks with the additional `or Actor._cancel_called` branch
since really we should also be waiting on the parent nurse to exit (at
least, for sure and always) when the local `Actor` indeed has been
"globally" cancelled-called. Further add separate timeout warnings for
channel-draining vs. local-actor-nursery-exit waiting since they are
technically orthogonal cases (at least, afaik).
Also,
- adjust the `Actor._stream_handler()` connection status log-emit to
`.runtime()`, especially to reduce noise around the aforementioned
ephemeral registree connection-requests.
- if we do wait on a local actor-nurse to exit, report its `._children`
table (which should help figure out going forward how useful the
warning is, if at all).
Name them `_mk_send_mte()`/`_mk_recv_mte()` and change the runtime to
call each appropriately depending on location/usage.
Also add some dynamic call-frame "unhide" blocks such that when we
expect raised MTE from the aboves calls but we get a different
unexpected error from the runtime, we ensure the call stack downward is
shown in tbs/pdb.
|_ ideally in the longer run we come up with a fancier dynamic sys for
this, prolly something in `.devx._frame_stack`?
Just pass `_bad_msg` such that it get's injected to `.msgdata` since
with a send-side `MsgTypeError` we don't have a remote `._ipc_msg:
Error` per say to include.
The tests only use one input spec (conveniently) so there's not much to
change in the logic,
- only pass the `maybe_msg_spec` to the child-side decorator and obvi
drop the surrounding `msgops.limit_plds()` block in the child.
- tweak a few `MsgDec` asserts, mostly dropping the
`msg._ops._def_any_spec` state checks since the child-side won't have
any pre pld-spec state given the runtime now applies the `pld_spec`
before running the task's func body.
- also allowed dropping the `finally:` which did a similar check
outside the `.limit_plds()` block.
Namely passing the `.__pld_spec__` directly to the
`lock_stdio_for_peer()` decorator B)
Also, allows dropping `apply_debug_pldec()` (which was a todo) and
removing a `lock_stdio_for_peer()` indent level.
Instead of the WIP/prototyped `Portal.open_context()` offering
a `pld_spec` input arg, this changes to a proper decorator API for
specifying the "payload spec" on `@context` endpoints.
The impl change details actually cover 2-birds:
- monkey patch decorated functions with a new
`._tractor_context_meta: dict[str, Any]` and insert any provided input
`@context` kwargs: `_pld_spec`, `enc_hook`, `enc_hook`.
- use `inspect.get_annotations()` to scan for a `func` arg
type-annotated with `tractor.Context` and use the name of that arg as
the RPC task-side injected `Context`, thus injecting the needed arg
by type instead of by name (a longstanding TODO); raise a type-error
when not found.
- pull the `pld_spec` from the `._tractor_context_meta` attr both in the
`.open_context()` parent-side and child-side `._invoke()`-cation of
the RPC task and use the `msg._ops.maybe_limit_plds()` API to apply it
internally in the runtime for each case.
`RemoteActorError`s show this by default in their `.__repr__()`, and we
obvi capture and embed the src traceback in an `Error` msg prior to
transit, but for logging it's also handy to see the tb of any set
`Context._remote_error` on console especially when trying to decipher
remote error details at their origin actor. Also improve the log message
description using `ctx.repr_state` and show any `ctx.outcome`.
Longer run we don't want `tractor` app devs having to call
`msg._ops.limit_plds()` from every child endpoint.. so this starts
a list of decorator API ideas and obviously ties in with an ideal final
API design that will come with py3.13 and typed funcs. Obviously this
is directly fueled by,
- https://github.com/goodboy/tractor/issues/365
Other,
- type with direct `trio.lowlevel.Task` import.
- use `log.exception()` to show tbs for all error-terminations in
`.open_context()` (for now) and always explicitly mention the `.side`.
Hopefully would make grok-ing this fairly sophisticated sub-sys possible
for any up-and-coming `tractor` hacker
XD
A lot of internal API and re-org ideas I discovered/realized as part of
finishing the `__pld_spec__` and multi-threaded support. Particularly
better isolation between root-actor vs subactor task APIs and generally
less globally-state-ful stuff like `DebugStatus` and `Lock` method APIs
would likely make a lot of the hard to follow edge cases more clear?
Functionally working for multi-threaded (via cpython threads spawned
from `to_trio.to_thread.run_sync()`) alongside subactors, tested (for
now) only with threads started inside the root actor (which seemed to
have the most issues in terms of the impl and special cases..) using the
new `tractor.pause_from_sync()` API!
Main implementation changes to `.pause_from_sync()`
------ - ------
- from the root actor, we need to ensure bg thread case is handled
*specially* since no IPC is used to request the TTY stdio mutex and
`Lock` (API) usage is conducted entirely from a local task or thread;
dedicated `Lock` usage for the root-actor already is branched inside
`._pause()` and needs similar handling from a root bg-thread:
|_for the special case of a root bg thread we need to
`trio`-main-thread schedule a bg task inside a new
`_pause_from_bg_root_thread()`. The new task needs to implement most
of what was is handled inside `._pause()` manually, mostly because in
this root-actor-bg-thread case we have 2 constraints:
1. to enter `PdbREPL.interaction()` **from the bg thread** directly,
2. the task that `Lock._debug_lock.acquire()`s has to be the same
that calls `.release() (a `trio.FIFOLock` constraint)
|_impl deats of this `_pause_from_bg_root_thread()` include:
- (for now) calling `._pause()` to acquire the `Lock._debug_lock`.
- setting its own `DebugStatus.repl_release`.
- calling `.DebugStatus.shield_sigint()` to ensure the root's
main thread uses the right handler when the bg one is REPL-ing.
- wait manually on the `.repl_release()` to be set by the thread's
dedicated `PdbREPL` exit.
- manually calling `Lock.release()` from the **same task** that
acquired it.
- expect calls to `._pause()` to deliver a `tuple[Task, PdbREPL]` such
that we always get the handle both to any newly created REPl instance
and the (maybe) the scheduled bg task within which is runs.
- add a single `message: str` style to `log.devx()` based on branching
style for logging.
- ensure both `DebugStatus.repl` and `.repl_task` are set **just
before** calling `._set_trace()` to ensure the correct `Task|Thread`
is set when the REPL is finally entered from sync code.
- add a wrapping caller `_sync_pause_from_builtin()` which passes in the
new `called_from_builtin=True` to indicate `breakpoint()` caller
usage, obvi pass in `api_frame`.
Changes to `._pause()` in support of ^
------ - ------
- `TaskStatus.started()` and return the `tuple[Task, PdbREPL]` to
callers / starters.
- only call `DebugStatus.shield_sigint()` when no `repl` passed bc some
callers (like bg threads) may need to apply it at some specific point
themselves.
- tweak some asserts for the `debug_func == None` / non-`trio`-thread
case.
- add a mod-level `_repl_fail_msg: str` to be used when there's an
internal `._pause()` failure for testing, easier to pexpect match.
- more comprehensive logging for the root-actor branched case to
(attempt to) indicate any of the 3 cases:
- remote ctx from subactor has the `Lock`,
- already existing root task or thread has it or,
- some kinda stale `.locked()` situation where the root has the lock
but we don't know why.
- for root usage, revert to always `await Lock._debug_lock.acquire()`-ing
despite `called_from_sync` since `.pause_from_sync()` was reworked to
instead handle the special bg thread case in the new
`_pause_from_bg_root_thread()` task.
- always do `return _enter_repl_sync(debug_func)`.
- try to report any `repl_task: Task|Thread` set by the caller
(particularly for the bg thread cases) as being the thread or task
`._pause()` was called "on behalf of"
Changes to `DebugStatus`/`Lock` in support of ^
------ - ------
- only call `Lock.release()` from `DebugStatus.set_[quit/continue]()`
when called from the main `trio` thread and always call
`DebugStatus.release()` **after** to ensure `.repl_released()` is set
**after** `._debug_lock.release()`.
- only call `.repl_release.set()` from `trio` thread otherwise use
`.from_thread.run()`.
- much more refinements in `Lock.release()` for threading cases:
- return `bool` to indicate whether lock was released by caller.
- mask (in prep to drop) `_pause()` usage of
`Lock.release.force=True)` since forcing a release can't ever avoid
the RTE from `trio`.. same task **must** acquire/release.
- don't allow usage from non-`trio`-main-threads, ever; there's no
point since the same-task-needs-to-manage-`FIFOLock` constraint.
- much more detailed logging using `message`-building-style for all
caller (edge) cases.
|_ use a `we_released: bool` to determine failed-to-release edge
cases which can happen if called from bg threads, ensure we
`log.exception()` on any incorrect usage resulting in release
failure.
|_ complain loudly if the release fails and some other task/thread
still holds the lock.
|_ be explicit about "who" (which task or thread) the release is "on
behalf of" by reading `DebugStatus.repl_task` since the caller
isn't the REPL operator in many sync cases.
- more or less drop `force` support, as mentioned above.
- ensure we unset `._owned_by_root` if the caller is a root task.
Other misc
------ - ------
- rename `lock_tty_for_child()` -> `lock_stdio_for_peer()`.
- rejig `Lock.repr()` to show lock and event stats.
- stage `Lock.stats` and `.owner` methods in prep for doing a singleton
instance and `@property`s.
Originally discovered as while using `tractor.pause_from_sync()`
from the `i3ipc` client running in a bg-thread that uses `asyncio`
inside `modden`.
Turns out we definitely aren't correctly handling `.pause_from_sync()`
from the root actor when called from a `trio.to_thread.run_sync()`
bg thread:
- root-actor bg threads which can't `Lock._debug_lock.acquire()` since
they aren't in `trio.Task`s.
- even if scheduled via `.to_thread.run_sync(_debug._pause)` the
acquirer won't be the task/thread which calls `Lock.release()` from
`PdbREPL` hooks; this results in a RTE raised by `trio`..
- multiple threads will step on each other's stdio since cpython's GIL
seems to ctx switch threads on every input from the user to the REPL
loop..
Reproduce via reworking our example and test so that they catch and fail
for all edge cases:
- rework the `/examples/debugging/sync_bp.py` example to demonstrate the
above issues, namely the stdio clobbering in the REPL when multiple
threads and/or a subactor try to debug simultaneously.
|_ run one thread using a task nursery to ensure it runs conc with the
nursery's parent task.
|_ ensure the bg threads run conc a subactor usage of
`.pause_from_sync()`.
|_ gravely detail all the special cases inside a TODO comment.
|_ add some control flags to `sync_pause()` helper and don't use
`breakpoint()` by default.
- extend and adjust `test_debugger.test_pause_from_sync` to match (and
thus currently fail) by ensuring exclusive `PdbREPL` attachment when
the 2 bg root-actor threads are concurrently interacting alongside the
subactor:
|_ should only see one of the `_pause_msg` logs at a time for either
one of the threads or the subactor.
|_ ensure each attaches (in no particular order) before expecting the
script to exit.
Impl adjustments to `.devx._debug`:
- drop `Lock.repl`, no longer used.
- add `Lock._owned_by_root: bool` for the `.ctx_in_debug == None`
root-actor-task active case.
- always `log.exception()` for any `._debug_lock.release()` ownership
RTE emitted by `trio`, like we used to..
- add special `Lock.release()` log message for the stale lock but
`._owned_by_root == True` case; oh yeah and actually
`log.devx(message)`..
- rename `Lock.acquire()` -> `.acquire_for_ctx()` since it's only ever
used from subactor IPC usage; well that and for local root-task
usage we should prolly add a `.acquire_from_root_task()`?
- buncha `._pause()` impl improvements:
|_ type `._pause()`'s `debug_func` as a `partial` as well.
|_ offer `called_from_sync: bool` and `called_from_bg_thread: bool`
for the special case handling when called from `.pause_from_sync()`
|_ only set `DebugStatus.repl/repl_task` when `debug_func != None`
(OW ensure the `.repl_task` is not the current one).
|_ handle error logging even when `debug_func is None`..
|_ lotsa detailed commentary around root-actor-bg-thread special cases.
- when `._set_trace(hide_tb=False)` do `pdbp.set_trace(frame=currentframe())`
so the `._debug` internal frames are always included.
- by default always hide tracebacks for `.pause[_from_sync]()` internals.
- improve `.pause_from_sync()` to avoid root-bg-thread crashes:
|_ pass new `called_from_xxx_` flags and ensure `DebugStatus.repl_task`
is actually set to the `threading.current_thread()` when needed.
|_ manually call `Lock._debug_lock.acquire_nowait()` for the non-bg
thread case.
|_ TODO: still need to implement the bg-thread case using a bg
`trio.Task`-in-thread with an `trio.Event` set by thread REPL exit.
Exactly like how it's organized for `Portal.open_context()`, put the
main streaming API `@acm` with the `MsgStream` code and bind the method
to the new module func.
Other,
- rename `Context.result()` -> `.wait_for_result()` to better match the
blocking semantics and rebind `.result()` as deprecated.
- add doc-str for `Context.maybe_raise()`.
It ended up getting necessarily implemented as the `PldRx` though at
a different layer and won't be needed as part of `MsgCodec` most likely,
though this original idea did provide the source of inspiration for how
things work now!
Also Move the commented TODO proto for a codec hook factory from
`.types` to `._codec` where it prolly better fits and update some msg
related todo/questions.
It's been a long time prepped and now finally implemented!
Offer a `shield: bool` argument from our async `._debug` APIs:
- `await tractor.pause(shield=True)`,
- `await tractor.post_mortem(shield=True)`
^-These-^ can now be used inside cancelled `trio.CancelScope`s,
something very handy when introspecting complex (distributed) system
tear/shut-downs particularly under remote error or (inter-peer)
cancellation conditions B)
Thanks to previous prepping in a prior attempt and various patches from
the rigorous rework of `.devx._debug` internals around typed msg specs,
there ain't much that was needed!
Impl deats
- obvi passthrough `shield` from the public API endpoints (was already
done from a prior attempt).
- put ad-hoc internal `with trio.CancelScope(shield=shield):` around all
checkpoints inside `._pause()` for both the root-process and subactor
case branches.
Add a fairly rigorous example, `examples/debugging/shielded_pause.py`
with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and
ensure it covers as many cases as i can think of offhand:
- multiple `.pause()` entries in a loop despite parent scope
cancellation in a subactor RPC task which itself spawns a sub-task.
- a `trio.Nursery.parent_task` which raises, is handled and
tries to enter and unshielded `.post_mortem()`, which of course
internally raises `Cancelled` in a `._pause()` checkpoint, so we catch
the `Cancelled` again and then debug the debugger's internal
cancellation with specific checks for the particular raising
checkpoint-LOC.
- do ^- the latter -^ for both subactor and root cases to ensure we
can debug `._pause()` itself when it tries to REPL engage from
a cancelled task scope Bo
Keep the old alias, but i think it's better form to use longer names for
internal public APIs and this name better reflects the functionality:
decoding and returning a `PayloadMsg.pld` field.
Since turns out we didn't have a single example using that API Bo
The test granular-ly checks all use cases:
- `.post_mortem()` manual calls in both subactor and root.
- ensuring built-in RPC crash handling activates after each manual one
from ^.
- drafted some call-stack frame checking that i commented out for now
since we need to first do ANSI escape code removal due to the
colorization that `pdbp` does by default.
|_ added a TODO with SO link on `assert_before()`.
Also todo-staged a shielded-pause test to match with the already
existing-but-needs-refinement example B)
Such that we're boxing the interchanged lib's specific error
`msgspec.ValidationError` in this case) type much like how
a `ContextCancelled[trio.Cancelled]` is composed; allows for seemless
multi-backend-codec support later as well B)
Pass `ctx.maybe_raise(from_src_exc=src_err)` where needed in a couple
spots; as `None` in the send-side `Started` MTE case to avoid showing
the `._scope1.cancel_called` result in the traceback from the
`.open_context()` child-sync phase.
That is as a control to `Context._maybe_raise_remote_err()` such that
if set to anything other then the default (`False` value), we do
`raise remote_error from from_src_exc` such that caller can choose to
suppress or override the `.__cause__` tb.
Also tidy up and old masked TODO regarding calling `.maybe_raise()`
after the caller exits from the `yield` in `.open_context()`..
Send-side `MsgTypeError`s actually shouldn't have any "boxed" traceback
per say since they're raised in the transmitting actor's local task env
and we (normally) don't want the ascii decoration added around the
error's `._message: str`, that is not until the exc is `pack_error()`-ed
before transit. As such, the presentation of an embedded traceback (and
its ascii box) gets bypassed when only a `._message: str` is set (as we
now do for pld-spec failures in `_mk_msg_type_err()`).
Further this tweaks the `.pformat()` output to include the `._message`
part to look like `<RemoteActorError( <._message> ) ..` instead of
jamming it implicitly to the end of the embedded `.tb_str` (as was done
implicitly by `unpack_error()`) and also adds better handling for the
`with_type_header == False` case including forcing that case when we
detect that the currently handled exc is the RAE in `.pformat()`.
Toss in a lengthier doc-str explaining it all.
Surrounding/supporting changes,
- better `unpack_error()` message which just briefly reports the remote
task's error type.
- add public `.message: str` prop.
- always set a `._extra_msgdata: dict` since some MTE props rely on it.
- handle `.boxed_type == None` for `.boxed_type_str`.
- maybe pack any detected input or `exc.message` in `pack_error()`.
- comment cruft cleanup in `_mk_msg_type_err()`.
Allows passing a custom error msg other then the traceback-str over the
wire. Make `.tb_str` optional (in the blank `''` sense) since it's
treated that way thus far in `._exceptions.pack_error()`.
Namely checking that `Context._remote_error` is set to the raised MTE
in the invalid started and return value cases since prior to the recent
underlying changes to the `Context.result()` impl, it would not match.
Further,
- do asserts for non-MTE raising cases in both the parent and child.
- add todos for testing ctx-outcomes for per-side-validation policies
i anticipate supporting and implied msg-dialog race cases therein.
More specifically, if `.open_context()` is cancelled when awaiting the
first `Context.started()` during the child task sync phase, check to see
if it was due to `._scope.cancel_called` and raise any remote error via
`.maybe_raise()` instead the `trio.Cancelled` like in every other
remote-error handling case. Ensure we set `._scope[_nursery]` only after
the `Started` has arrived and audited.
Since in the case of the `Actor._cancel_task()` related runtime eps we
actually don't EVER register them in `Actor._rpc_tasks`.. logging about
them is just needless noise, though maybe we should track them in a diff
table; something like a `._runtime_rpc_tasks`?
Drop the cancel-request-for-stale-RPC-task (`KeyError` case in
`Actor._cancel_task()`) log-emit level in to `.runtime()`; it's
generally not useful info other then for granular race condition eval
when hacking the runtime.
So when `is_started_send_side is True` we raise the newly created
`MsgTypeError` (MTE) directly instead of doing all the `Error`-msg pack
and unpack to raise stuff via `_raise_from_unexpected_msg()` since the
raise should happen send side anyway and so doesn't emulate any remote
fault like in a bad `Return` or `Started` without send-side pld-spec
validation.
Oh, and proxy-through the `hide_tb: bool` input from `.drain_to_final_msg()`
to `.recv_msg_w_pld()`.
By calling `Context._maybe_cancel_and_set_remote_error(exc)` on any
unpacked `Error` msg; provides for `Context.maybe_error` consistency to
match all other error delivery cases.
Filling out the helper `validate_payload_msg()` staged in a prior commit
and adjusting all imports to match.
Also add a `raise_mte: bool` flag for potential usage where the caller
wants to handle the MTE instance themselves.
Expecting `Started` or `Return` with respective bad `.pld` values
depending on what type of failure is test parametrized.
This makes the suite run green it seems B)
The `TypeAlias` for the msg type-group is now `MsgType` and any user
touching shuttle messages can now be typed as `PayloadMsg`.
Relatedly, add MTE specific `Error._bad_msg[_as_dict]` fields which are
handy for introspection of remote decode failures.
Such that if caught by user code and/or the runtime we can introspect
the original msg which caused the type error. Previously this was kinda
half-baked with a `.msg_dict` which was delivered from an `Any`-decode
of the shuttle msg in `_mk_msg_type_err()` but now this more explicitly
refines the API and supports both `PayloadMsg`-instance or the msg-dict
style injection:
- allow passing either of `bad_msg: PayloadMsg|None` or
`bad_msg_as_dict: dict|None` to `MsgTypeError.from_decode()`.
- expose public props for both ^ whilst dropping prior `.msgdict`.
- rework `.from_decode()` to explicitly accept `**extra_msgdata: dict`
|_ only overriding it from any `bad_msg_as_dict` if the keys are found in
`_ipcmsg_keys`, **except** for `_bad_msg` when `bad_msg` is passed.
|_ drop `.ipc_msg` passthrough.
|_ drop `msgdict` input.
- adjust `.cid` to only pull from the `.bad_msg` if set.
Related fixes/adjustments:
- `pack_from_raise()` should pull `boxed_type_str` from
`boxed_type.__name__`, not the `type()` of it.. also add a
`hide_tb: bool` flag.
- don't include `_msg_dict` and `_bad_msg` in the `_body_fields` set.
- allow more granular boxed traceback-str controls:
|_ allow passing a `tb_str: str` explicitly in which case we use it
verbatim and presume caller knows what they're doing.
|_ when not provided, use the more explicit
`traceback.format_exception(exc)` since the error instance is
a required input (we still fail back to the old `.format_exc()` call
if for some reason the caller passes `None`; but that should be
a bug right?).
|_ if a `tb: TracebackType` and a `tb_str` is passed, concat them.
- in `RemoteActorError.pformat()` don't indent the `._message` part used
for the `body` when `with_type_header == False`.
- update `_mk_msg_type_err()` to use `bad_msg`/`bad_msg_as_dict`
appropriately and drop passing `ipc_msg`.
In the sense that we handle it as a special case that exposed
through to `RxPld.dec_msg()` with a new `is_started_send_side: bool`.
(Non-ideal) `Context.started()` impl deats:
- only do send-side pld-spec validation when a new `validate_pld_spec`
is set (by default it's not).
- call `self.pld_rx.dec_msg(is_started_send_side=True)` to validate the
payload field from the just codec-ed `Started` msg's `msg_bytes` by
passing the `roundtripped` msg (with it's `.pld: Raw`) directly.
- add a `hide_tb: bool` param and proxy it to the `.dec_msg()` call.
(Non-ideal) `PldRx.dec_msg()` impl deats:
- for now we're packing the MTE inside an `Error` via a manual call to
`pack_error()` and then setting that as the `msg` passed to
`_raise_from_unexpected_msg()` (though really we should just raise
inline?).
- manually set the `MsgTypeError._ipc_msg` to the above..
Other,
- more comprehensive `Context` type doc string.
- various `hide_tb: bool` kwarg additions through `._ops.PldRx` meths.
- proto a `.msg._ops.validate_payload_msg()` helper planned to get the
logic from this version of `.started()`'s send-side validation so as
to be useful more generally elsewhere.. (like for raising back
`Return` values on the child side?).
Warning: this commit may have been made out of order from required
changes to `._exceptions` which will come in a follow up!
Starts with some very basic cases:
- verify both subactor-as-child-ctx-task send side validation (failures)
as well as relay and raise on root-parent-side-task.
- wrap failure expectation cases that bubble out of `@acm`s with
a `maybe_expect_raises()` equiv wrapper with an embedded timeout.
- add `Return` cases including invalid by `str` and valid by a `None`.
Still ToDo:
- commit impl changes to make the bulk of this suite pass.
- adjust how `MsgTypeError`s format the local (`.started()`) send side
`.tb_str` such that we don't do a "boxed" error prior to
`pack_error()` being called normally prior to `Error` transit.
Related to the prior patch, re the new `with_type_header: bool`:
- in the `with_type_header == True` use case make sure we keep the first
`._message: str` line non-indented since it'll show just after the
header-line's type path with ':'.
- when `False` drop the `)>` `repr()`-instance style as well so that we
just get the ascii boxed traceback as though it's the error
message-`str` not the `repr()` of the error obj.
Other,
- hide `pack_from_raise()` call frame since it'll show in debug mode
crash handling..
- mk `MsgTypeError.from_decode()` explicitly accept and proxy an
optional `ipc_msg` and change `msgdict` to also be optional, only
reading out the `**extra_msgdata` when provided.
- expose a `_mk_msg_type_err(src_err_msg: Error|None = None,)` for
callers who which to inject a `._ipc_msg: Msgtype` to the MTE.
|_ add a note how we can't use it due to a causality-dilemma when pld
validating `Started` on the send side..
And IFF the `await wait_func(proc)` is cancelled such that we avoid
clobbering some subactor that might be REPL-ing even though its parent
actor is in the midst of (gracefully) cancelling it.
From both `open_root_actor()` and `._entry._trio_main()`.
Other `breakpoint()`-from-sync-func fixes:
- properly disable the default hook using `"0"` XD
- offer a `hide_tb: bool` from `open_root_actor()`.
- disable hiding the `._trio_main()` frame, bc pretty sure it doesn't
help anyone (either way) when REPL-ing/tb-ing from a subactor..?
Mostly the result of the `RemoteActorError.pformat()` and our
new `_pause/crash_msg: str`s which include the `trio.Task.__repr__()`
in the `log.pdb()` message.
Obvi use the `in_prompt_msg()` to accomplish where not used prior.
ToDo later:
-[ ] still some outstanding questions on how detailed inceptions
should look, eg. in `test_multi_nested_subactors_error_through_nurseries()`
|_maybe we should be more pedantic at checking `.src_uid` vs.
`.relay_uid` fields?
-[ ] staged a placeholder test for verifying correct call-stack frame on
crash handler REPL entry.
-[ ] also need a test to verify that you can't pause from an already paused actor task
such as can happen if you try to step through runtime code that has
a recurrent entry to `._debug.pause()`.
Call it `hide_runtime_frames()` and stick all the lines from the top of
the `._debug` mod in there along with a little `log.devx()` emission on
what gets hidden by default ;)
Other,
- fix ref-error where internal-error handler might trigger despite the
debug `req_ctx` not yet having init-ed, such that we don't try to
cancel or log about it when it never was fully created/initialize..
- fix assignment typo iniside `_set_trace()` for `task`.. lel
Such that when displaying with `.__str__()` we do not show the type
header (style) since normally python's raising machinery already prints
the type path like `'tractor._exceptions.RemoteActorError:'`, so doing
it 2x is a bit ugly ;p
In support,
- include `.relay_uid` in `RemoteActorError.extra_body_fields`.
- offer a `with_type_header: bool` to `.pformat()` and only put the
opening type path and closing `')>'` tail line when `True`.
- add `.is_inception() -> bool:` for an easy way to determine if the
error is multi-hop relayed.
- only repr the `'|_relay_uid=<uid>'` field when an error is an inception.
- tweak the invalid-payload case in `_mk_msg_type_err()` to explicitly
state in the `message` how the `any_pld` value does not match the `MsgDec.pld_spec`
by decoding the invalid `.pld` with an any-dec.
- allow `_mk_msg_type_err(**mte_kwargs)` passthrough.
- pass `boxed_type=cls` inside `MsgTypeError.from_decode()`.
More or less by pedantically separating and managing root and subactor
request syncing events to always be managed by the locking IPC context
task-funcs:
- for the root's "child"-side, `lock_tty_for_child()` directly creates
and sets a new `Lock.req_handler_finished` inside a `finally:`
- for the sub's "parent"-side, `request_root_stdio_lock()` does the same
with a new `DebugStatus.req_finished` event and separates it from
the `.repl_release` event (which indicates a "c" or "q" from user and
thus exit of the REPL session) as well as sets a new `.req_task:
trio.Task` to explicitly distinguish from the app-user-task that
enters the REPL vs. the paired bg task used to request the global
root's stdio mutex alongside it.
- apply the `__pld_spec__` on "child"-side of the ctx using the new
`Portal.open_context(pld_spec)` parameter support; drops use of any
`ContextVar` malarky used prior for `PldRx` mgmt.
- removing `Lock.no_remote_has_tty` since it was a nebulous name and
from the prior "everything is in a `Lock`" design..
------ - ------
More rigorous impl to handle various edge cases in `._pause()`:
- rejig `_enter_repl_sync()` to wrap the `debug_func == None` case
inside maybe-internal-error handler blocks.
- better logic for recurrent vs. multi-task contention for REPL entry in
subactors, by guarding using `DebugStatus.req_task` and by now waiting
on the new `DebugStatus.req_finished` for the multi-task contention
case.
- even better internal error handling and reporting for when this code
is hacked on and possibly broken ;p
------ - ------
Updates to `.pause_from_sync()` support:
- add optional `actor`, `task` kwargs to `_set_trace()` to allow
compat with the new explicit `debug_func` calling in `._pause()` and
pass a `threading.Thread` for `task` in the `.to_thread()` usage case.
- add an `except` block that tries to show the frame on any internal
error.
------ - ------
Relatedly includes a buncha cleanups/simplifications somewhat in
prep for some coming refinements (around `DebugStatus`):
- use all the new attrs mentioned above as needed in the SIGINT shielder.
- wait on `Lock.req_handler_finished` in `maybe_wait_for_debugger()`.
- dropping a ton of masked legacy code left in during the recent reworks.
- better comments, like on the use of `Context._scope` for shielding on
the "child"-side to avoid the need to manage yet another cs.
- add/change-to lotsa `log.devx()` level emissions for those infos which
are handy while hacking on the debugger but not ideal/necessary to be
user visible.
- obvi add lotsa follow up todo notes!
Much like other recent changes attempt to detect runtime-bug-causing
crashes and only show the runtime-endpoint frame when present.
Adds a `ActorNursery._scope_error: BaseException|None` attr to aid with
detection. Also toss in some todo notes for removing and replacing the
`.run_in_actor()` method API.
Just inside `._invoke()` after the `ctx: Context` is retrieved.
Also try our best to *not hide* internal frames when a non-user-code
crash happens, normally either due to a runtime RPC EP bug or
a transport failure.
Since the state mgmt becomes quite messy with multiple sub-tasks inside
an IPC ctx, AND bc generally speaking the payload-type-spec should map
1-to-1 with the `Context`, it doesn't make a lot of sense to be using
`ContextVar`s to modify the `Context.pld_rx: PldRx` instance.
Instead, always allocate a full instance inside `mk_context()` with the
default `.pld_rx: PldRx` set to use the `msg._ops._def_any_pldec: MsgDec`
In support, simplify the `.msg._ops` impl and APIs:
- drop `_ctxvar_PldRx`, `_def_pld_rx` and `current_pldrx()`.
- rename `PldRx._pldec` -> `._pld_dec`.
- rename the unused `PldRx.apply_to_ipc()` -> `.wraps_ipc()`.
- add a required `PldRx._ctx: Context` attr since it is needed
internally in some meths and each pld-rx now maps to a specific ctx.
- modify all recv methods to accept a `ipc: Context|MsgStream` (instead
of a `ctx` arg) since both have a ref to the same `._rx_chan` and there
are only a couple spots (in `.dec_msg()`) where we need the `ctx`
explicitly (which can now be easily accessed via a new `MsgStream.ctx`
property, see below).
- always show the `.dec_msg()` frame in tbs if there's a reference error
when calling `_raise_from_unexpected_msg()` in the fallthrough case.
- implement `limit_plds()` as light wrapper around getting the
`current_ipc_ctx()` and mutating its `MsgDec` via
`Context.pld_rx.limit_plds()`.
- add a `maybe_limit_plds()` which just provides an `@acm` equivalent of
`limit_plds()` handy for composing in a `async with ():` style block
(avoiding additional indent levels in the body of async funcs).
Obvi extend the `Context` and `MsgStream` interfaces as needed
to match the above:
- add a `Context.pld_rx` pub prop.
- new private refs to `Context._started_msg: Started` and
a `._started_pld` (mostly for internal debugging / testing / logging)
and set inside `.open_context()` immediately after the syncing phase.
- a `Context.has_outcome() -> bool:` predicate which can be used to more
easily determine if the ctx errored or has a final result.
- pub props for `MsgStream.ctx: Context` and `.chan: Channel` providing
full `ipc`-arg compat with the `PldRx` method signatures.
Finally got this working so that if/when an internal bug is introduced
to this request task-func, we can actually REPL-debug the lock request
task itself B)
As in, if the subactor's lock request task internally errors we,
- ensure the task always terminates (by calling `DebugStatus.release()`)
and explicitly reports (via a `log.exception()`) the internal error.
- capture the error instance and set as a new `DebugStatus.req_err` and
always check for it on final teardown - in which case we also,
- ensure it's reraised from a new `DebugRequestError`.
- unhide the stack frames for `_pause()`, `_enter_repl_sync()` so that
the dev can upward inspect the `_pause()` call stack sanely.
Supporting internal impl changes,
- add `DebugStatus.cancel()` and `.req_err`.
- don't ever cancel the request task from
`PdbREPL.set_[continue/quit]()` only when there's some internal error
that would likely result in a hang and stale lock state with the root.
- only release the root's lock when the current ask is also the owner
(avoids bad release errors).
- also show internal `._pause()`-related frames on any `repl_err`.
Other temp-dev-tweaks,
- make pld-dec change log msgs info level again while solving this
final context-vars race stuff..
- drop the debug pld-dec instance match asserts for now since
the problem is already caught (and now debug-able B) by an attr-error
on the decoded-as-`dict` started msg, and instead add in
a `log.exception()` trace to see which task is triggering the case
where the debug `MsgDec` isn't set correctly vs. when we think it's
being applied.
Since obviously the thread is likely expected to halt and raise after
the REPL session exits; this was a regression from the prior impl. The
main reason for this is that otherwise the request task will never
unblock if the user steps through the crashed task using 'next' since
the `.do_next()` handler doesn't by default release the request since in
the `.pause()` case this would end the session too early.
Other,
- toss in draft `Pdb.user_exception()`, though doesn't seem to ever
trigger?
- only release `Lock._debug_lock` when already locked.
Mostly adjustments for the new pld-receiver semantics/shim-layer which
results more often in the direct delivery of `RemoteActorError`s from
IPC API primitives (like `Portal.result()`) instead of being embedded in
an `ExceptionGroup` bundled from an embedded nursery.
Tossed usage of the `debug_mode: bool` fixture to a couple problematic
tests while i was working on them.
Also includes detailed assertion updates to the inter-peer cancellation
suite in terms of,
- `Context.canceller` state correctly matching the true src actor when
expecting a ctxc.
- any rxed `ContextCancelled` should instance match the `Context._local/remote_error`
as should the `.msgdata` and `._ipc_msg`.
- start tossing in `__tracebackhide__`s to various eps which don't need
to show in tbs or in the pdb REPL.
- port final `._maybe_enter_pm()` to pass a `api_frame`.
- start comment-marking up some API eps with `@api_frame`
in prep for actually using the new frame-stack tracing.
Woops, due to a `None` test against the `._final_result`, any actual
final `None` result would be received but not acked as such causing
a spawning test to hang. Fix it by instead receiving and assigning both
a `._final_result_msg: PayloadMsg` and `._final_result_pld`.
NB: as mentioned in many recent comments surrounding this API layer,
really this whole `Portal`-has-final-result interface/semantics should
be entirely removed as should the `ActorNursery.run_in_actor()` API(s).
Instead it should all be replaced by a wrapping "high level" API
(`tractor.hilevel` ?) which combines a task nursery, `Portal.open_context()`
and underlying `Context` APIs + an `outcome.Outcome` to accomplish the
same "run a single task in a spawned actor and return it's result"; aka
a "one-shot-task-actor".
- inside the `Actor.cancel()`'s maybe-wait-on-debugger delay,
report the full debug request status and it's affiliated lock request
IPC ctx.
- use the new `.req_ctx.chan.uid` to do the local nursery lookup during
channel teardown handling.
- another couple log fmt tweaks.
Proto-ing a little suite of call-stack-frame annotation-for-scanning
sub-systems for the purposes of both,
- the `.devx._debug`er and its
traceback and frame introspection needs when entering the REPL,
- detailed trace-style logging such that we can explicitly report
on "which and where" `tractor`'s APIs are used in the "app" code.
Deats:
- change mod name obvi from `._code` and adjust client mod imports.
- using `wrapt` (for perf) implement a `@api_frame` annot decorator
which both stashes per-call-stack-frame instances of `CallerInfo` in
a table and marks the function such that API endpoints can be easily
found via runtime stack scanning despite any internal impl changes.
- add a global `_frame2callerinfo_cache: dict[FrameType, CallerInfo]`
table for providing the per func-frame info caching.
- Re-implement `CallerInfo` to require less (types of) inputs:
|_ `_api_func: Callable`, a ref to the (singleton) func def.
|_ `_api_frame: FrameType` taken from the `@api_frame` marked `tractor`-API
func's runtime call-stack, from which we can determine the
app code's `.caller_frame`.
|_`_caller_frames_up: int|None` allowing the specific `@api_frame` to
determine "how many frames up" the application / calling code is.
And, a better set of derived attrs:
|_`caller_frame: FrameType` which finds and caches the API-eps calling
frame.
|_`caller_frame: FrameType` which finds and caches the API-eps calling
- add a new attempt at "getting a method ref from its runtime frame"
with `get_ns_and_func_from_frame()` using a heuristic that the
`CodeType.co_qualname: str` should have a "." in it for methods.
- main issue is still that the func-ref lookup will require searching
for the method's instance type by name, and that name isn't
guaranteed to be defined in any particular ns..
|_rn we try to read it from the `FrameType.f_locals` but that is
going to obvi fail any time the method is called in a module where
it's type is not also defined/imported.
- returns both the ns and the func ref FYI.
- set `._state._ctxvar_Context` just after `StartAck` inside
`open_context_from_portal()` so that `current_ipc_ctx()` always
works on the 'parent' side.
- always set `.canceller` to any `MsgTypeError.src_uid` and otherwise to
any maybe-detected `.src_uid` (i.e. for RAEs).
- always set `.canceller` to us when we rx a ctxc which reports us as
its canceller; this is a sanity check on definite "self cancellation".
- adjust `._is_self_cancelled()` logic to only be `True` when
`._remote_error` is both a ctxc with a `.canceller` set to us AND
when `Context.canceller` is also set to us (since the change above)
as a little bit of extra rigor.
- fill-in/fix some `.repr_state` edge cases:
- merge self-vs.-peer ctxc cases to one block and distinguish via
nested `._is_self_cancelled()` check.
- set 'errored' for all exception matched cases despite `.canceller`.
- add pre-`Return` phase statuses:
|_'pre-started' and 'syncing-to-child' depending on side and when
`._stream` has not (yet) been set.
|_'streaming' and 'streaming-finished' depending on side when
`._stream` is set and whether it was stopped/closed.
- tweak drainage log-message to use "outcome" instead of "result".
- use new `.devx.pformat.pformat_cs()` inside `_maybe_cancel_and_set_remote_error()`
but, IFF the log level is at least 'cancel'.
Since i was running into them (internal errors) during lock request
machinery dev and was getting all sorts of difficult to understand hangs
whenever i intro-ed a bug to either side of the ipc ctx; this all while
trying to get the msg-spec working for `Lock` requesting subactors..
Deats:
- hideframes for `@acm`s and `trio.Event.wait()`, `Lock.release()`.
- better detail out the `Lock.acquire/release()` impls
- drop `Lock.remote_task_in_debug`, use new `.ctx_in_debug`.
- add a `Lock.release(force: bool)`.
- move most of what was `_acquire_debug_lock_from_root_task()` and some
of the `lock_tty_for_child().__a[enter/exit]()` logic into
`Lock.[acquire/release]()` including bunch more logging.
- move `lock_tty_for_child()` up in the module to below `Lock`, with
some rework:
- drop `subactor_uid: tuple` arg since we can just use the `ctx`..
- add exception handler blocks for reporting internal (impl) errors
and always force release the lock in such cases.
- extend `DebugStatus` (prolly will rename to `DebugRequest` btw):
- add `.req_ctx: Context` for subactor side.
- add `.req_finished: trio.Event` to sub to signal request task exit.
- extend `.shield_sigint()` doc-str.
- add `.release()` to encaps all the state mgmt previously strewn
about inside `._pause()`..
- use new `DebugStatus.release()` to replace all the duplication:
- inside `PdbREPL.set_[continue/quit]()`.
- inside `._pause()` for the subactor branch on internal
repl-invocation error cases,
- in the `_enter_repl_sync()` closure on error,
- replace `apply_debug_codec()` -> `apply_debug_pldec()` in tandem with
the new `PldRx` sub-sys which handles the new `__pld_spec__`.
- add a new `pformat_cs()` helper orig to help debug cs stack
a corruption; going to move to `.devx.pformat` obvi.
- rename `wait_for_parent_stdin_hijack()` -> `request_root_stdio_lock()`
with improvements:
- better doc-str and add todos,
- use `DebugStatus` more stringently to encaps all subactor req state.
- error handling blocks for cancellation and straight up impl errors
directly around the `.open_context()` block with the latter doing
a `ctx.cancel()` to avoid hanging in the shielded `.req_cs` scope.
- similar exc blocks for the func's overall body with explicit
`log.exception()` reporting.
- only set the new `DebugStatus.req_finished: trio.Event` in `finally`.
- rename `mk_mpdb()` -> `mk_pdb()` and don't cal `.shield_sigint()`
implicitly since the caller usage does matter for this.
- factor out `any_connected_locker_child()` from the SIGINT handler.
- rework SIGINT handler to better handle any stale-lock/hang cases:
- use new `Lock.ctx_in_debug: Context` to detect subactor-in-debug.
and use it to cancel any lock request instead of the lower level
- use `problem: str` summary approach to log emissions.
- rework `_pause()` given all of the above, stuff not yet mentioned:
- don't take `shield: bool` input and proxy to `debug_func()` (for now).
- drop `extra_frames_up_when_async: int` usage, expect
`**debug_func_kwargs` to passthrough an `api_frame: Frametype` (more
on this later).
- lotsa asserts around the request ctx vs. task-in-debug ctx using new
`current_ipc_ctx()`.
- asserts around `DebugStatus` state.
- rework and simplify the `debug_func` hooks,
`_set_trace()`/`_post_mortem()`:
- make them accept a non-optional `repl: PdbRepl` and `api_frame:
FrameType` which should be used to set the current frame when the
REPL engages.
- always hide the hook frames.
- always accept a `tb: TracebackType` to `_post_mortem()`.
|_ copy and re-impl what was the delegation to
`pdbp.xpm()`/`pdbp.post_mortem()` and instead call the
underlying `Pdb.interaction()` ourselves with a `caller_frame`
and tb instance.
- adjust the public `.pause()` impl:
- accept optional `hide_tb` and `api_frame` inputs.
- mask opening a cancel-scope for now (can cause `trio` stack
corruption, see notes) and thus don't use the `shield` input other
then to eventually passthrough to `_post_mortem()`?
|_ thus drop `task_status` support for now as well.
|_ pretty sure correct soln is a debug-nursery around `._invoke()`.
- since no longer using `extra_frames_up_when_async` inside
`debug_func()`s ensure all public apis pass a `api_frame`.
- re-impl our `tractor.post_mortem()` to directly call into `._pause()`
instead of binding in via `partial` and mk it take similar input as
`.pause()`.
- drop `Lock.release()` from `_maybe_enter_pm()`, expose and pass
expected frame and tb.
- use necessary changes from all the above within
`maybe_wait_for_debugger()` and `acquire_debug_lock()`.
Lel, sorry thought that would be shorter..
There's still a lot more re-org to do particularly with `DebugStatus`
encapsulation but it's coming in follow up.
Since we need to allow it (at the least) inside
`drain_until_final_msg()` for handling stream-phase termination races
where we don't want to have to handle a raised error from something like
`Context.result()`. Expose the passthrough option via
a `passthrough_non_pld_msgs: bool` kwarg.
Add comprehensive comment to `current_pldrx()`.
Expose it from `._state.current_ipc_ctx()` and set it inside
`._rpc._invoke()` for child and inside `Portal.open_context()` for
parent.
Still need to write a few more tests (particularly demonstrating usage
throughout multiple nested nurseries on each side) but this suffices as
a proto for testing with some debugger request-from-subactor stuff.
Other,
- use new `.devx.pformat.add_div()` for ctxc messages.
- add a block to always traceback dump on corrupted cs stacks.
- better handle non-RAEs exception output-formatting in context
termination summary log message.
- use a summary for `start_status` for msg logging in RPC loop.
Since we usually want them raised from some (internal) call to
`Context.maybe_raise()` and NOT directly from the drainage call, make it
possible via a new `raise_error: bool` to both `PldRx.recv_msg_w_pld()`
and `.dec_msg()`.
In support,
- rename `return_msg` -> `result_msg` since we expect to return
`Error`s.
- do a `result_msg` assign and `break` in the `case Error()`.
- add `**dec_msg_kwargs` passthrough for other `.dec_msg()` calling
methods.
Other,
- drop/aggregate todo-notes around the main loop's
`ctx._pld_rx.recv_msg_w_pld()` call.
- add (configurable) frame hiding to most payload receive meths.
Since `._code` is prolly gonna get renamed (to something "frame & stack
tools" related) and to give a bit better organization.
Also adds a new `add_div()` helper, factored out of ctxc message
creation in `._rpc._invoke()`, for adding a little "header line" divider
under a given `message: str` with a little math to center it.
For more sane manual calls as needed in logging purposes. Obvi remap
the dunder methods to it.
Other:
- drop `hide_tb: bool` from `unpack_error()`, shouldn't need it since
frame won't ever be part of any tb raised from returned error.
- add a `is_invalid_payload: bool` to `_raise_from_unexpected_msg()` to
be used from `PldRx` where we don't need to decode the IPC
msg, just the payload; make the error message reflect this case.
- drop commented `._portal._unwrap_msg()` since we've replaced it with
`PldRx`'s delegation to newer `._raise_from_unexpected_msg()`.
- hide the `Portal.result()` frame by default, again.
A better spot for the pretty-formatting of frame text (and thus tracebacks)
is in the new `.devx._code` module:
- move from `._exceptions` -> `.devx._code.pformat_boxed_tb()`.
- add new `pformat_caller_frame()` factored out the use case in
`._exceptions._mk_msg_type_err()` where we dump a stack trace
for bad `.send()` side IPC msgs.
Add some new pretty-format methods to `Context`:
- explicitly implement `.pformat()` and allow an `extra_fields: dict`
which can be used to inject additional fields (maybe eventually by
default) such as is now used inside
`._maybe_cancel_and_set_remote_error()` when reporting the internal
`._scope` state in cancel logging.
- add a new `.repr_state -> str` which provides a single string status
depending on the internal state of the IPC ctx in terms of the shuttle
protocol's "phase"; use it from `.pformat()` for the `|_state:`.
- set `.started(complain_no_parity=False)` now since we presume decoding
with `.pld: Raw` now with the new `PldRx` design.
- use new `msgops.current_pldrx()` in `mk_context()`.
Not sure it's **that** useful (yet) but in theory would allow avoiding
certain log level usage around transient RPC requests for discovery methods
(like `.register_actor()` and friends); can't hurt to be able to
introspect that last message for other future cases I'd imagine as well.
Adjust the calling code in `._runtime` to match; other spots are using
the `trio.Nursery.start()` schedule style and are fine as is.
Improve a bunch more log messages throughout a few mods mostly by going
to a "summary" single-emission style where possible/appropriate:
- in `._runtime` more "single summary" status style log emissions:
|_mk `Actor.load_modules()` render a single mod loaded summary.
|_use a summary `con_status: str` for `Actor._stream_handler()` conn
setup and an equiv (`con_teardown_status`) for connection teardowns.
|_similar thing in `Actor.wait_for_actor()`.
- generally more usage of `.msg.pretty_struct` apis throughout `._runtime`.
Add new task-scope oriented `PldRx.pld_spec` management API similar to
`.msg._codec.limit_msg_spec()`, but obvi built to process and filter
`MsgType.pld` values.
New API related changes include:
- new per-task singleton getter `msg._ops.current_pldrx()` which
delivers the current (global) payload receiver via a new
`_ctxvar_PldRx: ContextVar` configured with a default
`_def_any_pldec: MsgDec[Any]` decoder.
- a `PldRx.limit_plds()` which sets the decoder (`.type` underneath)
for the specific payload rx instance.
- `.msg._ops.limit_plds()` which obtains the current task-scoped `PldRx`
and applies the pld spec via a new `PldRx.limit_plds()`.
- rename `PldRx._msgdec` -> `._pldec`.
- add `.pld_dec` as pub attr for -^
Unrelated adjustments:
- use `.msg.pretty_struct.pformat()` where handy.
- always pass `expect_msg: MsgType`.
- add a `case Stop()` to `PldRx.dec_msg()` which will `log.warning()`
when a stop is received by no stream was open on this receiving side
since we rarely want that to raise since it's prolly just a runtime
race or mistake in user code.
Other:
Particularly when logging around `MsgTypeError`s.
Other:
- make `_raise_from_unexpected_msg()`'s `expect_msg` a non-default value
arg, must always be passed by caller.
- drop `'canceller'` from `_body_fields` ow it shows up twice for ctxc.
- use `.msg.pretty_struct.pformat()`.
- parameterize `RemoteActorError.reprol()` (repr-one-line method) to
show `RemoteActorError[<self.boxed_type_str>]( ..` to make obvi
the boxed remote error type.
- re-impl `.boxed_type_str` as `str`-casting the `.boxed_type` value
which is guaranteed to render non-`None`.
Basically exact same as that for `MsgCodec` with the `.spec` displayed
via a better (maybe multi-line) `.spec_str: str` generated from a common
new set of helper mod funcs factored out msg-codec meths:
- `mk_msgspec_table()` to gen a `MsgType` name -> msg table.
- `pformat_msgspec()` to `str`-ify said table values nicely.q
Also add a new `MsgCodec.msg_spec_str: str` prop which delegates to the
above for the same.
- tweaking logging to include more `MsgType` dumps on IPC faults.
- removing some commented cruft.
- comment formatting / cleanups / add-ons.
- more type annots.
- fill out some TODO content.
As per the reco:
https://jcristharif.com/msgspec/perf-tips.html#reusing-an-output-buffe
BUT, seems to cause this error in `pikerd`..
`BufferError: Existing exports of data: object cannot be re-sized`
Soo no idea? Maybe there's a tweak needed that we can glean from
tests/examples in the `msgspec` repo?
Disabling for now.
Instead of expecting it to be passed in (as it was prior), when
determining if a `Stop` msg is a valid end-of-channel signal use the
`ctx._stream: MsgStream|None` attr which **must** be set by any stream
opening API; either of:
- `Context.open_stream()`
- `Portal.open_stream_from()`
Adjust the case block logic to match with fallthrough from any EoC to
a closed error if necessary. Change the `_type: str` to match the
failing IPC-prim name in the tail case we raise a `MessagingError`.
Other:
- move `.sender: tuple` uid attr up to `RemoteActorError` since `Error`
optionally defines it as a field and for boxed `StreamOverrun`s (an
ignore case we check for in the runtime during cancellation) we want
it readable from the boxing rae.
- drop still unused `InternalActorError`.
As per much tinkering, re-designs and preceding rubber-ducking via many
"commit msg novelas", **finally** this adds the (hopefully) final
missing layer for typed msg safety: `tractor.msg._ops.PldRx`
(or `PayloadReceiver`? haven't decided how verbose to go..)
Design justification summary:
------ - ------
- need a way to be as-close-as-possible to the `tractor`-application
such that when `MsgType.pld: PayloadT` validation takes place, it is
straightforward and obvious how user code can decide to handle any
resulting `MsgTypeError`.
- there should be a common and optional-yet-modular way to modify
**how** data delivered via IPC (possibly embedded as user defined,
type-constrained `.pld: msgspec.Struct`s) can be handled and processed
during fault conditions and/or IPC "msg attacks".
- support for nested type constraints within a `MsgType.pld` field
should be simple to define, implement and understand at runtime.
- a layer between the app-level IPC primitive APIs
(`Context`/`MsgStream`) and application-task code (consumer code of
those APIs) should be easily customized and prove-to-be-as-such
through demonstrably rigorous internal (sub-sys) use!
-> eg. via seemless runtime RPC eps support like `Actor.cancel()`
-> by correctly implementing our `.devx._debug.Lock` REPL TTY mgmt
dialog prot, via a dead simple payload-as-ctl-msg-spec.
There are some fairly detailed doc strings included so I won't duplicate
that content, the majority of the work here is actually somewhat of
a factoring of many similar blocks that are doing more or less the same
`msg = await Context._rx_chan.receive()` with boilerplate for
`Error`/`Stop` handling via `_raise_from_no_key_in_msg()`. The new
`PldRx` basically provides a shim layer for this common "receive msg,
decode its payload, yield it up to the consuming app task" by pairing
the RPC feeder mem-chan with a msg-payload decoder and expecting IPC API
internals to use **one** API instead of re-implementing the same pattern
all over the place XD
`PldRx` breakdown
------ - ------
- for now only expects a `._msgdec: MsgDec` which allows for
override-able `MsgType.pld` validation and most obviously used in
the impl of `.dec_msg()`, the decode message method.
- provides multiple mem-chan receive options including:
|_ `.recv_pld()` which does the e2e operation of receiving a payload
item.
|_ a sync `.recv_pld_nowait()` version.
|_ a `.recv_msg_w_pld()` which optionally allows retreiving both the
shuttling `MsgType` as well as it's `.pld` body for use cases where
info on both is important (eg. draining a `MsgStream`).
Dirty internal changeover/implementation deatz:
------ - ------
- obvi move over all the IPC "primitives" that previously had the duplicate recv-n-yield
logic:
- `MsgStream.receive[_nowait]()` delegating instead to the equivalent
`PldRx.recv_pld[_nowait]()`.
- add `Context._pld_rx: PldRx`, created and passed in by
`mk_context()`; use it for the `.started()` -> `first: Started`
retrieval inside `open_context_from_portal()`.
- all the relevant `Portal` invocation methods: `.result()`,
`.run_from_ns()`, `.run()`; also allows for dropping `_unwrap_msg()`
and `.Portal_return_once()` outright Bo
- rename `Context.ctx._recv_chan` -> `._rx_chan`.
- add detailed `Context._scope` info for logging whether or not it's
cancelled inside `_maybe_cancel_and_set_remote_error()`.
- move `._context._drain_to_final_msg()` -> `._ops.drain_to_final_msg()`
since it's really not necessarily ctx specific per say, and it does
kinda fit with "msg operations" more abstractly ;)
In prep for a "payload receiver" abstraction that will wrap
`MsgType.pld`-IO delivery from `Context` and `MsgStream`, adds a small
`msgspec.msgpack.Decoder` shim which delegates an API similar to
`MsgCodec` and is offered via a `.msg._codec.mk_dec()` factory.
Detalles:
- move over the TODOs/comments from `.msg.types.Start` to to
`MsgDec.spec` since it's probably the ideal spot to start thinking
about it from a consumer code PoV.
- move codec reversion assert and log emit into `finally:` block.
- flip default `.types._tractor_codec = mk_codec_ipc_pld(ipc_pld_spec=Raw)`
in prep for always doing payload-delayed decodes.
- make `MsgCodec._dec` private with public property getter.
- change `CancelAck` to NOT derive from `Return` so it's mutex in
`match/case:` handling.
Since it's going to be used from the IPC primitive APIs
(`Context`/`MsgStream`) for similarly handling payload type spec
validation errors and bc it's really not well situation in the IPC
module XD
Summary of (impl) tweaks:
- obvi move `_mk_msg_type_err()` and import and use it in `._ipc`; ends
up avoiding a lot of ad-hoc imports we had from `._exceptions` anyway!
- mask out "new codec" runtime log emission from `MsgpackTCPStream`.
- allow passing a (coming in next commit) `codec: MsgDec` (message
decoder) which supports the same required `.pld_spec_str: str` attr.
- for send side logging use existing `MsgCodec..pformat_msg_spec()`.
- rename `_raise_from_no_key_in_msg()` to the now more appropriate
`_raise_from_unexpected_msg()`, but leaving alias for now.
Turns out we do want per-task inheritance particularly if there's to be
per `Context` dynamic mutation of the spec; we don't want mutation in
some task to affect any parent/global setting.
Turns out since we use a common "feeder task" in the rpc loop, we need to
offer a per `Context` payload decoder sys anyway in order to enable
per-task controls for inter-actor multi-task-ctx scenarios.
As per some newly added features and APIs:
- pass `portal: Portal` to `Actor.start_remote_task()` from
`open_context_from_portal()` marking `Portal.open_context()` as
always being the "parent" task side.
- add caller tracing via `.devx._code.CallerInfo/.find_caller_info()`
called in `mk_context()` and (for now) a `__runtimeframe__: int = 2`
inside `open_context_from_portal()` such that any enter-er of
`Portal.open_context()` will be reported.
- pass in a new `._caller_info` attr which is used in 2 new meths:
- `.repr_caller: str` for showing the name of the app-code-func.
- `.repr_api: str` for showing the API ep, which for now we just
hardcode to `Portal.open_context()` since ow its gonna show the mod
func name `open_context_from_portal()`.
- use those new props ^ in the `._deliver_msg()` flow body log msg
content for much clearer msg-flow tracing Bo
- add `Context._cancel_on_msgerr: bool` to toggle whether
a delivered `MsgTypeError` should trigger a `._scope.cancel()` call.
- also (temporarily) add separate `.cancel()` emissions for both cases
as i work through hacking out the maybe `MsgType.pld: Raw` support.
Breaks out all the (sub)actor local conc primitives from `Lock` (which
is now only used in and by the root actor) such that there's an explicit
distinction between a task that's "consuming" the `Lock` (remotely) vs.
the root-side service tasks which do the actual acquire on behalf of the
requesters.
`DebugStatus` changeover deats:
------ - ------
- move all the actor-local vars over `DebugStatus` including:
- move `_trio_handler` and `_orig_sigint_handler`
- `local_task_in_debug` now `repl_task`
- `_debugger_request_cs` now `req_cs`
- `local_pdb_complete` now `repl_release`
- drop all ^ fields from `Lock.repr()` obvi..
- move over the `.[un]shield_sigint()` and
`.is_main_trio_thread()` methods.
- add some new attrs/meths:
- `DebugStatus.repl` for the currently running `Pdb` in-actor
singleton.
- `.repr()` for pprint of state (like `Lock`).
- Note: that even when a root-actor task is in REPL, the `DebugStatus`
is still used for certain actor-local state mgmt, such as SIGINT
handler shielding.
- obvi change all lock-requester code bits to now use a `DebugStatus` in
their local actor-state instead of `Lock`, i.e. change usage from
`Lock` in `._runtime` and `._root`.
- use new `Lock.get_locking_task_cs()` API in when checking for
sub-in-debug from `._runtime.Actor._stream_handler()`.
Unrelated to topic-at-hand tweaks:
------ - ------
- drop the commented bits about hiding `@[a]cm` stack frames from
`_debug.pause()` and simplify to only one block with the `shield`
passthrough since we already solved the issue with cancel-scopes using
`@pdbp.hideframe` B)
- this includes all the extra logging about the extra frame for the
user (good thing i put in that wasted effort back then eh..)
- put the `try/except BaseException` with `log.exception()` around the
whole of `._pause()` to ensure we don't miss in-func errors which can
cause hangs..
- allow passing in `portal: Portal` to
`Actor.start_remote_task()` such that `Portal` task spawning methods
are always denoted correctly in terms of `Context.side`.
- lotsa logging tweaks, decreasing a bit of noise from `.runtime()`s.
Since it's totes possible to have a spec applied that won't permit
`str`s, might as well formalize a small msg set for subactors to request
the tree-wide TTY `Lock`.
BTW, I'm prolly not going into every single change here in this first
WIP since there's still a variety of broken stuff mostly to do with
races on the codec apply being done in a `trio.lowleve.RunVar`; it
should be re-done with a `ContextVar` such that each task does NOT
mutate the global setting..
New msg set and usage is simply:
- `LockStatus` which is the reponse msg delivered from `lock_tty_for_child()`
- `LockRelease` a one-off request msg from the subactor to drop the
`Lock` from a `MsgStream.send()`.
- use these msgs throughout the root and sub sides of the locking
ctx funcs: `lock_tty_for_child()` & `wait_for_parent_stdin_hijack()`
The codec is now applied in both the root and sub `Lock` request tasks:
- for root inside `lock_tty_for_child()` before the `.started()`.
- for subs, inside `wait_for_parent_stdin_hijack()` since we only want
to affect the codec *for the locking task*.
- (hence the need for ctx-var as mentioned above but currently this
can cause races which will break against other app tasks competing
for the codec setting).
- add a `apply_debug_codec()` helper for use in both cases.
- add more detailed logging to both the root and sub side of `Lock`
requesting funcs including requiring that the sub-side task "uid" (a
`tuple[str, int]` = (trio.Task.name, id(trio.Task)` be provided (more
on this later).
A main issue discovered while proto-testing all this was the ability of
a sub to "double lock" (leading to self-deadlock) via an error in
`wait_for_parent_stdin_hijack()` which, for ex., can happen in debug
mode via crash handling of a `MsgTypeError` received from the root
during a codec applied msg-spec race! Originally I was attempting to
solve this by making the SIGINT override handler more resilient but this
case is somewhat impossible to detect by an external root task other
then checking for duplicate ownership via the new `subactor_task_uid`.
=> SO NOW, we always stick the current task uid in the
`Lock._blocked: set` and raise an rte on a double request by the same
remote task.
Included is a variety of small refinements:
- finally figured out how to mark a variety of `.__exit__()` frames with
`pdbp.hideframe()` to actually hide them B)
- add cls methods around managing `Lock._locking_task_cs` from root only.
- re-org all the `Lock` attrs into those only used in root vs. subactors
and proto-prep a new `DebugStatus` actor-singleton to be used in subs.
- add a `Lock.repr()` to contextually print the current conc primitives.
- rename our `Pdb`-subtype to `PdbREPL`.
- rigor out the SIGINT handler a bit, originally to try and hack-solve
the double-lock issue mentioned above, but now just with better
logging and logic for most (all?) possible hang cases that should be
hang-recoverable after enough ctrl-c mashing by the user.. well
hopefully:
- using `Lock.repr()` for both root and sub cases.
- lots more `log.warn()`s and handler reversions on stale lock or cs
detection.
- factor `._pause()` impl a little better moving the actual repl entry
to a new `_enter_repl_sync()` (originally for easier wrapping in the
sub case with `apply_codec()`).
- Drop `test_msg_spec_xor_pld_spec()` since we no longer support
`ipc_msg_spec` arg to `mk_codec()`.
- Expect `MsgTypeError`s around `.open_context()` calls when
`add_codec_hooks == False`.
- toss in some `.pause()` points in the subactor ctx body whilst hacking
out a `.pld` protocol for debug mode TTY locking.
Such that the top level `maybe_enable_greenback` from
`open_root_actor()` can toggle the entire actor tree's usage.
Read the rtv in `._rpc` tasks and only enable if set.
Also, rigor up the `._rpc.process_messages()` loop to handle `Error()`
and `case _:` separately such that we now raise an explicit rte for
unknown / invalid msgs. Use "parent" / "child" for side descriptions in
loop comments and put a fat comment before the `StartAck` in `_invoke()`.
Instead of `allow_msg_keys` since we've fully flipped over to
struct-types for msgs in the runtime.
- drop the loop from `MsgStream.receive_nowait()` since
`Yield/Return.pld` getting will handle both (instead of a loop of
`dict`-key reads).
Which matches with renaming `.payload_msg` -> `.expected_msg` which is
the value we attempt to construct from a vanilla-msgppack
decode-to-`dict` and then construct manually into a `MsgType` using
`.msg.types.from_dict_msg()`. Add a todo to use new `use_pretty` flag
which currently conflicts with `._exceptions.pformat_boxed_type()`
prefix formatting..
Add a bit of special handling for msg-type-errors with a dedicated
log-msg detailing which `.side: str` is the sender/causer and avoiding
a `._scope.cancel()` call in such cases since the local task might be
written to handle and tolerate the badly (typed) IPC msg.
As part of ^, change the ctx task-pair "side" semantics from "caller" ->
"callee" to be "parent" -> "child" which better matches the
cross-process SC-linked-task supervision hierarchy, and
`trio.Nursery.parent_task`; in `trio` the task that opens a nursery is
also named the "parent".
Impl deats / fixes around the `.side` semantics:
- ensure that `._portal: Portal` is set ASAP after
`Actor.start_remote_task()` such that if the `Started` transaction
fails, the parent-vs.-child sides are still denoted correctly (since
`._portal` being set is the predicate for that).
- add a helper func `Context.peer_side(side: str) -> str:` which inverts
from "child" to "parent" and vice versa, useful for logging info.
Other tweaks:
- make `_drain_to_final_msg()` return a tuple of a maybe-`Return` and
the list of other `pre_result_drained: list[MsgType]` such that we
don't ever have to warn about the return msg getting captured as
a pre-"result" msg.
- Add some strictness flags to `.started()` which allow for toggling
whether to error or warn log about mismatching roundtripped `Started`
msgs prior to IPC transit.
Display the new `MsgCodec.pld_spec_str` and format the incorrect field
value to be placed entirely (txt block wise) right of the "type annot"
part of the line:
Iow if you had a bad `dict` value where something else should be it'd
look something like this:
<Started(
|_pld: NamespacePath = {'cid': '3e0ca00c-7d32-4d2a-a0c2-ac2e12453871',
'locked': True,
'msg_type': 'LockStatus',
'subactor_uid': ['sub', 'af7ccb69-1dab-491f-84f7-2ec42c32d137']}
- add some `tb_str: str` indent-prefix args for diff indent levels for the
body vs. the surrounding "ascii box".
- ^-use it-^ from `RemoteActorError.__repr()__` obvi.
- use new `msg.types.from_dict_msg()` in impl of
`MsgTypeError.payload_msg`, handy for showing what the message "would
have looked like in `Struct` form" had it not failed it's type
constraints.
Sure makes console grokability a lot better by showing only the
customizeable fields.
Further, clean up `mk_codec()` a bunch by removing the `ipc_msg_spec`
param since we don't plan to support another msg-set (for now) which
allows cleaning out a buncha logic that was mostly just a source of
bugs..
Also,
- add temporary `log.info()` around codec application.
- throw in some sanity `assert`s to `limit_msg_spec()`.
- add but mask out the `extend_msg_spec()` idea since it seems `msgspec`
won't allow `Decoder.type` extensions when using a custom `dec_hook()`
for some extension type.. (not sure what approach to take here yet).
Handy for re-constructing a struct-`MsgType` from a `dict` decoded from
wire-bytes wherein the msg failed to decode normally due to a field type
error but you'd still like to show the "potential" msg in struct form,
say inside a `MsgTypeError`'s meta data.
Supporting deats:
- add a `.msg.types.from_dict_msg()` to implement it (the helper).
- also a `.msg.types._msg_table: dict[str, MsgType]` for supporting this
func ^ as well as providing just a general `MsgType`-by-`str`-name
lookup.
Unrelated:
- Drop commented idea for still supporting `dict`-msg set via
`enc/dec_hook()`s that would translate to/from `MsgType`s, but that
would require a duplicate impl in the runtime.. so eff that XD
Mostly removing commented (and replaced) code blocks lingering from the
ctxc semantics work and new typed-msg-spec `MsgType`s handling AND use
the new `._exceptions.pack_from_raise()` helper to construct
`StreamOverrun` msgs.
Deaterz:
- clean out the drain loop now that it's implemented to handle our
struct msg types including the `dict`-msg bits left in as
fallback-reminders, any notes/todos better summarized at the top of
their blocks, remove any `_final_result_is_set()` related duplicate/legacy
tidbits.
- use a `case Error()` block in drain loop with fallthrough to `_:`
always resulting in an rte raise.
- move "XXX" notes into the doc-string for `._deliver_msg()` as
a "rules" section.
- use `match:` syntax for logging the `result_or_err: MsgType` outcome
from the final `.result()` call inside `open_context_from_portal()`.
- generally speaking use `MsgType` type annotations throughout!
Such that `Channel.recv()` + `MsgpackTCPStream.recv()` originating
msg-type-errors are not raised at the IPC transport layer but instead
relayed up the runtime stack for eventual handling by user-app code via
the `Context`/`MsgStream` layer APIs.
This design choice leads to a substantial amount of flexibility and
modularity, and avoids `MsgTypeError` handling policies from being
coupled to a particular backend IPC transport layer:
- receive-side msg-type errors, as can be raised and handled in the
`.open_stream()` "nasty" phase of a ctx, whilst being packed at the
`MsgCodec`/transport layer (keeping the underlying src decode error
coupled to the specific transport + interchange lib) and then relayed
upward to app code for custom handling like a normal Error` msg.
- the policy options for handling such cases could be implemented as
`@acm` wrappers around `.open_context()`/`.open_stream()` blocks (and
their respective delivered primitives) OR just plain old async
generators around `MsgStream.receive()` such that both built-in policy
handling and custom user-app solutions can be swapped without touching
any `tractor` internals or providing specialized "registry APIs".
-> eg. the ignore and relay-invalid-msg-to-sender approach can be more
easily implemented as embedded `try: except MsgTypeError:` blocks
around `MsgStream.receive()` possibly applied as either of an
injected wrapper type around a stream or an async gen that `async
for`s from the stream.
- any performance based AOT-lang extensions used to implement a policy
for handling recv-side errors space can avoid knowledge of the lower
level IPC `Channel` (and-downward) primitives.
- `Context` consuming code can choose to let all msg-type-errs
bubble and handle them manually (like any other remote `Error`
shuttled exception).
- we can keep (as before) send-side msg type checks can be raised
locally and cause offending senders to error and adjust before the
streaming phase of an IPC ctx.
Impl (related) deats:
- obvi make `MsgpackTCPStream.recv()` yield up any `MsgTypeError`
constructed by `_mk_msg_type_err()` such that the exception will
eventually be relayed up to `._rpc.process_messages()` and from
their delivered to the corresponding ctx-task.
- in support of ^, make `Channel.recv()` detect said mtes and use the
new `pack_from_raise()` to inject the far end `Actor.uid` for the
`Error.src_uid`.
- keep raising the send side equivalent (when strict enabled) errors
inline immediately with no upward `Error` packing or relay.
- improve `_mk_msg_type_err()` cases handling with far more detailed
`MsgTypeError` "message" contents pertaining to `msgspec` specific
failure-fixing-tips and type-spec mismatch info:
* use `.from_decode()` constructor in recv-side case to inject the
non-spec decoded `msg_dict: dict` and use the new
`MsgCodec.pld_spec_str: str` when clarifying the type discrepancy
with the offending field.
* on send-side, if we detect that an unsupported field type was
described in the original `src_type_error`, AND there is no
`msgpack.Encoder.enc_hook()` set, that the real issue is likely
that the user needs to extend the codec to support the
non-std/custom type with a hook and link to `msgspec` docs.
* if one of a `src_type/validation_error` is provided, set that
error as the `.__cause__` in the new mte.
Make a new `MsgType: TypeAlias` for the union of all msg types such that
it can be used in annots throughout the code base; just make
`.msg.__msg_spec__` delegate to it.
Add some new codec methods:
- `pld_spec_str`: for the `str`-casted value of the payload spec,
generally useful in logging content.
- `msg_spec_items()`: to render a `dict` of msg types to their
`str()`-casted values with support for singling out a specific
`MsgType`, type by input `msg` instance.
- `pformat_msg_spec()`: for rendering the (partial) `.msg_spec` as
a formatted `str` useful in logging.
Oh right, add a `Error._msg_dict: dict` in support of the previous
commit (for `MsgTypeError` packing as RAEs) such that our error msg type
can house a non-type-spec decoded wire-bytes for error
reporting/analysis purposes.
Since in the receive-side error case the source of the exception is the
sender side (normally causing a local `TypeError` at decode time), might
as well bundle the error in remote-capture-style using boxing semantics
around the causing local type error raised from the
`msgspec.msgpack.Decoder.decode()` and with a traceback packed from
`msgspec`-specific knowledge of any field-type spec matching failure.
Deats on new `MsgTypeError` interface:
- includes a `.msg_dict` to get access to any `Decoder.type`-applied
load of the original (underlying and offending) IPC msg into
a `dict` form using a vanilla decoder which is normally packed into
the instance as a `._msg_dict`.
- a public getter to the "supposed offending msg" via `.payload_msg`
which attempts to take the above `.msg_dict` and load it manually into
the corresponding `.msg.types.MsgType` struct.
- a constructor `.from_decode()` to make it simple to build out error
instances from a failed decode scope where the aforementioned
`msgdict: dict` from the vanilla decode can be provided directly.
- ALSO, we now pack into `MsgTypeError` directly just like ctxc in
`unpack_error()`
This also completes the while-standing todo for `RemoteActorError` to
contain a ref to the underlying `Error` msg as `._ipc_msg` with public
`@property` access that `defstruct()`-creates a pretty struct version
via `.ipc_msg`.
Internal tweaks for this include:
- `._ipc_msg` is the internal literal `Error`-msg instance if provided
with `.ipc_msg` the dynamic wrapper as mentioned above.
- `.__init__()` now can still take variable `**extra_msgdata` (similar
to the `dict`-msgdata as before) to maintain support for subtypes
which are constructed manually (not only by `pack_error()`) and insert
their own attrs which get placed in a `._extra_msgdata: dict` if no
`ipc_msg: Error` is provided as input.
- the `.msgdata` is now a merge of any `._extra_msgdata` and
a `dict`-casted form of any `._ipc_msg`.
- adjust all previous `.msgdata` field lookups to try equivalent field
reads on `._ipc_msg: Error`.
- drop default single ws indent from `.tb_str` and do a failover lookup
to `.msgdata` when `._ipc_msg is None` for the manually constructed
subtype-instance case.
- add a new class attr `.extra_body_fields: list[str]` to allow subtypes
to declare attrs they want shown in the `.__repr__()` output, eg.
`ContextCancelled.canceller`, `StreamOverrun.sender` and
`MsgTypeError.payload_msg`.
- ^-rework defaults pertaining to-^ with rename from
`_msgdata_keys` -> `_ipcmsg_keys` with latter now just loading directly
from the `Error` fields def and `_body_fields: list[str]` just taking
that value and removing the not-so-useful-in-REPL or already shown
(i.e. `.tb_str: str`) field names.
- add a new mod level `.pack_from_raise()` helper for auto-boxing RAE
subtypes constructed manually into `Error`s which is normally how
`StreamOverrun` and `MsgTypeError` get created in the runtime.
- in support of the above expose a `src_uid: tuple` override to
`pack_error()` such that the runtime can provide any remote actor id
when packing a locally-created yet remotely-caused RAE subtype.
- adjust all typing to expect `Error`s over `dict`-msgs.
Adjust some tests to match these changes:
- context and inter-peer-cancel tests to make their `.msgdata` related
checks against the new `.ipc_msg` as well and `.tb_str` directly.
- toss in an extra sleep to `sleep_a_bit_then_cancel_peer()` to keep the
'canceller' ctx child task cancelled by it's parent in the 'root' for
the rte-raised-during-ctxc-handling case (apparently now it's
returning too fast, cool?).
Better describes the internal RPC impl/latest-architecture with the msgs
delivered being those which either define a `.pld: PayloadT` that gets
passed up to user code, or the error-msg subset that similarly is raised
in a ctx-linked task.
These are likely temporary changes but still needed to actually see the
desired/correct failures (of which 5 of 6 tests are supposed to fail rn)
mostly to do with `Start` and `Return` msgs which are invalid under each
test's applied msg-spec.
Tweak set here:
- bit more `print()`s in root and sub for grokin test flow.
- never use `pytes.fail()` in subactor.. should know this by now XD
- comment out some bits that can't ever pass rn and make the underlying
expected failues harder to grok:
- the sub's child-side-of-ctx task doing sends should only fail
for certain msg types like `Started` + `Return`, `Yield`s are
processed receiver/parent side.
- don't expect `sent` list to match predicate set for the same reason
as last bullet.
The outstanding msg-type-semantic validation questions are:
- how to handle `.open_context()` with an input `kwargs` set that
doesn't adhere to the currently applied msg-spec?
- should the initial `@acm` entry fail before sending to the child
side?
- where should received `MsgTypeError`s be raised, at the `MsgStream`
`.receive()` or lower in the stack?
- i'm thinking we should mk `MsgTypeError` derive from
`RemoteActorError` and then have it be delivered as an error to the
`Context`/`MsgStream` for per-ctx-task handling; would lead to more
flexible/modular policy overrides in user code outside any defaults
we provide.
Mainly expanding out the runtime endpoints for cancellation to separate
cases and flattening them with the main RPC-request-invoke block, moving
the non-cancel runtime case (where we call `getattr(actor, funcname)`)
inside the main `Start` case (for now) which branches on `ns=="self"`.
Also, add a new IPC msg `class CancelAck(Return):` which is always
included in the default msg-spec such that runtime cancellation (and
eventually all) endpoints return that msg (instead of a `Return`) and
thus sidestep any currently applied `MsgCodec` such that the results
(`bool`s for most cancel methods) are never violating the current type
limit(s) on `Msg.pld`. To support this expose a new variable
`return_msg: Return|CancelAck` param from
`_invoke()`/`_invoke_non_context)()` and set it to `CancelAck` in the
appropriate endpoint case-blocks of the msg loop.
Clean out all the lingering legacy `chan.send(<dict-msg>)` commented
codez from the invoker funcs, with more cleaning likely to come B)
Pretty sure we haven't *needed it* for a while, it was always generally
hazardous in terms of IPC msg types, AND it's definitely incompatible
with a dynamically applied typed msg spec: you can't just expect
a `None` to be willy nilly handled all the time XD
For now I'm masking out all the code and leaving very detailed
surrounding notes but am not removing it quite yet in case for strange
reason it is needed by some edge case (though I haven't found according
to the test suite).
Backstory:
------ - ------
Originally (i'm pretty sure anyway) it was added as a super naive
"remote cancellation" mechanism (back before there were specific `Actor`
methods for such things) that was mostly (only?) used before IPC
`Channel` closures to "more gracefully cancel" the connection's parented
RPC tasks. Since we now have explicit runtime-RPC endpoints for
conducting remote cancellation of both tasks and full actors, it should
really be removed anyway, because:
- a `None`-msg setinel is inconsistent with other RPC endpoint handling
input patterns which (even prior to typed msging) had specific
msg-value triggers.
- the IPC endpoint's (block) implementation should use
`Actor.cancel_rpc_tasks(parent_chan=chan)` instead of a manual loop
through a `Actor._rpc_tasks.copy()`..
Deats:
- mask the `Channel.send(None)` calls from both the `Actor._stream_handler()` tail
as well as from the `._portal.open_portal()` was connected block.
- mask the msg loop endpoint block and toss in lotsa notes.
Unrelated tweaks:
- drop `Actor._debug_mode`; unused.
- make `Actor.cancel_server()` return a `bool`.
- use `.msg.pretty_struct.Struct.pformat()` to show any msg that is
ignored (bc invalid) in `._push_result()`.
Add both the `.send()` and `.recv()` handling blocks to a common
`_raise_msg_type_err()` which includes detailed error msg formatting:
- the `.recv()` side case does introspection of the `Msg` fields and
attempting to report the exact (field type related) issue
- `.send()` side does some boxed-error style tb formatting like
`RemoteActorError`.
- add a `strict_types: bool` to `.send()` to allow for just
warning on bad inputs versus raising, but always raise from any
`Encoder` type error.
As detailed in the surrounding notes, it's pretty advantageous to always
have the child context task ensure the first msg it relays back is
msg-type checked against the current spec and thus `MsgCodec`. Implement
the check via a simple codec-roundtrip of the `Started` msg such that
the `.pld` payload is always validated before transit. This ensures the
child will fail early and notify the parent before any streaming takes
place (i.e. the "nasty" dialog protocol phase).
The main motivation here is to avoid inter-actor task syncing bugs that
are hard(er) to recover from and/or such as if an invalid typed msg is
sent to the parent, who then ignores it (depending on config), and then
the child thinks the parent is in some presumed state while the parent
is still thinking a first msg has yet to arrive. Doing the stringent
check on the sender side (i.e. the child is sending the "first"
application msg via `.started()`) avoids/sidesteps dealing with such
syncing/coordinated-state problems by keeping the entire IPC dialog in
a "cheap" or "control" style transaction up until a stream is opened.
Iow, the parent task's `.open_context()` block entry can't occur until
the child side is definitely (as much as is possible with IPC msg type
checking) in a correct state spec wise. During any streaming phase in
the dialog the msg-type-checking is NOT done for performance (the
"nasty" protocol phase) and instead any type errors are relayed back
from the receiving side. I'm still unsure whether to take the same
approach on the `Return` msg, since at that point erroring early doesn't
benefit the parent task if/when a msg-type error occurs? Definitely more
to ponder and tinker out here..
Impl notes:
- a gotcha with the roundtrip-codec-ed msg is that it often won't match
the input `value` bc in the `msgpack` case many native python
sequence/collection types will map to a common array type due to the
surjection that `msgpack`'s type-sys imposes.
- so we can't assert that `started == rt_started` but it may be useful
to at least report the diff of the type-reduced payload so that the
caller can at least be notified how the input `value` might be
better type-casted prior to call, for ex. pre-casting to `list`s.
- added a `._strict_started: bool` that could provide the stringent
checking if desired in the future.
- on any validation error raise our `MsgTypeError` from it.
- ALSO change over the lingering `.send_yield()` deprecated meth body
to use a `Yield()`.
Such that the current `kwargs: dict` field can eventually be strictly
msg-typed (eventually directly from a `@context` def) using modern typed
python's hippest syntactical approach B)
Also proto a new `CancelAck(Return)` subtype msg for supporting msg-spec
agnostic `Actor.cancel_xx()` method calls in the runtime such that
a user can't break cancellation (and thus SC) by dynamically setting
a codec that doesn't allow `bool` results (as an eg. in this case).
Note that the msg isn't used yet in `._rpc` but that's a comin!
Set a diff `Msg.pld` spec per test and then send multiple types to
a child actor making sure the child can only send certain types over
a stream and fails with validation or decode errors ow. The test is also
param-ed both with and without hooks demonstrating how a custom type,
`NamespacePath`, needs them for effective use. The subactor IPC context
child is passed a `expect_ipc_send: dict` which relays the values along
with their expected `.send()`-ability.
Deats on technical refinements:
------ - ------
- added a `iter_maybe_sends()` send-value-as-msg-auditor and predicate
generator (literally) so as to be able to pre-determine if given the
current codec and `send_values` which values are expected to be IPC
transmittable.
- as per ^, the diff value-msgs are first round-tripped inside
a `Started` msg using the configured codec in the parent/root actor
before bothering with using IPC primitives + a subactor; this is how
the `expect_ipc_send` table is generated initially.
- for serializing the specs (`Union[Type]`s as required by `msgspec`),
added a pair of codec hooks: `enc/dec_type_union()` (that ideally we
move into a `.msg` submod eventually) which code the type-values as
a `list[str]` of names.
- the `dec_` hook had to be modified to NOT raise an error when an
invalid/unhandled value arrives, this is because we do NOT want the
RPC msg handling loop to raise on the `async for msg in chan:` and
instead prefer to ignore and warn (for now, but eventually respond
with error msg - see notes in hook body) these msgs when sent during
a streaming phase; `Context.started()` will however error on a bad
input for the current msg-spec since it is part of the "cheap"
dialog (again see notes in `._context`) wherein the `Started` msg
is always roundtripped prior to `Channel.send()` to guarantee
the child adheres to its own spec.
- tossed in lotsa `print()`s for console groking of the run progress.
Further notes on typed-msging breaking cancellation:
------ - ------
- turns out since the runtime's cancellation implementation, being done
with `Actor.cancel()` methods and friends will actually break when
a stringent spec is applied (eg. a single type-spec) since the return
values from said methods are generally `bool`s..
- this means we do indeed need special handling of "runtime RPC method
invocations" since ideally a user's msg-spec choices do not break core
functionality on them XD
=> The obvi solution is to add a/some special sub-`Msg` types for such
cases, possibly just a `RuntimeReturn(Return)` type that will always
include a `.pld: bool` for these cancel methods such that their
results are always handled without msg type errors.
More to come on a (hopefully) elegant solution to that last bit!
Since I needed the `break_ipc()` helper from the
`examples/advanced_faults/ipc_failure_during_stream.py` used in the
`test_advanced_faults` suite, might as well move it into a pkg-wide
importable module. Also changed the default break method to be
`socket_close` which just calls `Stream.socket.close()` underneath in
`trio`.
Also tweak that example to not keep sending after the stream has been
broken since with new `trio` that will raise `ClosedResourceError` and
in the wrapping test we generally speaking want to see a hang and then
cancel via simulated user sent SIGINT/ctl-c.
Yes, this is "the switch" and will likely cause the test suite to bail
until a few more fixes some in.
Tweaked a couple `.msg` pkg exports:
- remove `__spec__` (used by modules) and change it to `__msg_types:
lists[Msg]` as well as add a new `__msg_spec__: TypeAlias`, being the
default `Any` paramed spec.
- tweak the naming of `msg.types` lists of runtime vs payload msgs to:
`._runtime_msgs` and `._payload_msgs`.
- just build `__msg_types__` out of the above 2 lists.
Since with my in-index runtime-port to our native msg-spec it seems
these ones are hanging B(
- `test_one_end_stream_not_opened()`
- `test_maybe_allow_overruns_stream()`
Tossing in some `trio.fail_after()`s seems to at least gnab them as
failures B)
Though the runtime hasn't been changed over in this patch (it was in the
local index at the time however), the test does now demonstrate that
using a `Started` the correctly typed `.pld` will codec correctly when
passed manually to `MsgCodec.encode/decode()`.
Despite not having the runtime ported to the new shuttle msg set
(meaning the mentioned test will fail without the runtime port patch),
I was able to get this first original test working that limits payload
packets as a `Msg.pld: NamespacePath`this as long as we spec
`enc/dec_hook()`s then the `Msg.pld` will be processed correctly as per:
https://jcristharif.com/msgspec/extending.html#mapping-to-from-native-types
in both the `Any` and `NamespacePath|None` spec cases.
^- turns out in this case -^ that the codec hooks only get invoked on
the unknown-fields NOT the entire `Struct`-msg.
A further gotcha was merging a `|None` into the `pld_spec` since this
test spawns a subactor and opens a context via `send_back_nsp()` and
that func has no explicit `return` - so of course it delivers
a `Return(pld=None)` which will fail if we only spec `NamespacePath`.
Since `contextvars.ContextVar` seems to reset to the default in every
new task, switching to using `trio.lowlevel.RunVar` kinda gets close to
what we'd like where a child scope can override what's in the rent but
ideally without modifying the rent's. I tried `tricycle.TreeVar` as well
but it also seems to reset across (embedded) nurseries in our runtime;
need to try it again bc apparently that's not how it's suppose to work?
NOTE that for now i'm keeping the `.msg.types._ctxvar_MsgCodec` set to
the `msgspec` default (`Any` types) so that the test suite will still
pass until the runtime is ported to the new msg-spec + codec.
Surrounding and in support of all this the `Msg`-set impl deats changed
a bit as well as various stuff in `.msg` sub-mods:
- drop the `.pld` struct types for `Error`, `Start`, `StartAck` since we
don't really need the `.pld` payload field in those cases since
they're runtime control msgs for starting RPC tasks and handling
remote errors; we can just put the fields directly on each msg since
the user will never want/need to override the `.pld` field type.
- add a couple new runtime msgs and include them in `msg.__spec__`
and make them NOT inherit from `Msg` since they are runtime-specific
and thus have no need for `.pld` type constraints:
- `Aid` the actor-id identity handshake msg.
- `SpawnSpec`: the spawn data passed from a parent actor down to a
a child in `Actor._from_parent()` for which we need a shuttle
protocol msg, so might as well make it a pendatic one ;)
- fix some `Actor.uid` field types that were type-borked on `Error`
- add notes about how we need built-in `debug_mode` msgs in order to
avoid msg-type errors when using the TTY lock machinery and
a different `.pld` spec then the default `Any` is in use..
-> since `devx._debug.lock_tty_for_child()` and it's client side
`wait_for_parent_stdin_hijack()` use `Context.started('Locked')`
and `MsgStream.send('pdb_unlock')` string values as their `.pld`
contents we'd need to either always do a `ipc_pld_spec | str` or
pre-define some dedicated `Msg` types which get `Union`-ed in
for this?
- break out `msg.pretty_struct.Struct._sin_props()` into a helper func
`iter_fields()` since the impl doesn't require a struct instance.
- as mentioned above since `ContextVar` didn't work as anticipated
I next tried `tricycle.TreeVar` but that too didn't seem to keep
the `apply_codec()` setting intact across
`Portal.open_context()`/`Context.open_stream()` (it kept reverting to
the default `.pld: Any` default setting) so I finalized on
a trio.lowlevel.RunVar` for now despite it basically being
a `global`..
-> will probably come back to test this with `TreeVar` and some hot
tips i picked up from @mikenerone in the `trio` gitter, which i put in
comments surrounding proto-code.
Turns out the generics based payload speccing API, as in
https://jcristharif.com/msgspec/supported-types.html#generic-types,
DOES WORK properly as long as we don't rely on inheritance from `Msg`
a parent `Generic`..
So let's get real pedantic in the `mk_msg_spec()` internals as well as
verification in the test suite!
Fixes in `.msg.types`:
- implement (as part of tinker testing) multiple spec union building
methods via a `spec_build_method: str` to `mk_msg_spec()` and leave a
buncha notes around what did and didn't work:
- 'indexed_generics' is the only method THAT WORKS and the one that
you'd expect being closest to the `msgspec` docs (link above).
- 'defstruct' using dynamically defined msgs => doesn't work!
- 'types_new_class' using dynamically defined msgs but with
`types.new_clas()` => ALSO doesn't work..
- explicitly separate the `.pld` type-constrainable by user code msg
set into `types._payload_spec_msgs` putting the others in
a `types._runtime_spec_msgs` and the full set defined as `.__spec__`
(moving it out of the pkg-mod and back to `.types` as well).
- for the `_payload_spec_msgs` msgs manually make them inherit `Generic[PayloadT]`
and (redunantly) define a `.pld: PayloadT` field.
- make `IpcCtxSpec.functype` an in line `Literal`.
- toss in some TODO notes about choosing a better `Msg.cid` type.
Fixes/tweaks around `.msg._codec`:
- rename `MsgCodec.ipc/pld_msg_spec` -> `.msg/pld_spec`
- make `._enc/._dec` non optional fields
- wow, ^facepalm^ , make sure `._ipc.MsgpackTCPStream.__init__()` uses
`mk_codec()` since `MsgCodec` can't be (easily) constructed directly.
Get more detailed in testing:
- inside the `chk_pld_type()` helper ensure `roundtrip` is always set to
some value, `None` by default but a bool depending on legit outcome.
- drop input `generic`; no longer used.
- drop the masked `typedef` loop from `Msg.__subclasses__()`.
- for add an `expect_roundtrip: bool` and use to jump into debugger
when any expectation doesn't match the outcome.
- use new `MsgCodec` field names (as per first section above).
- ensure the encoded msg matches the decoded one from both the ad-hoc
decoder and codec loaded values.
- ensure the pld checking is only applied to msgs in the
`types._payload_spec_msgs` set by `typef.__name__` filtering
since `mk_msg_spec()` now returns the full `.types.Msg` set.
Mostly adjusting input args/logic to various spec/codec signatures and
new runtime semantics:
- `test_msg_spec_xor_pld_spec()` to verify that a shuttle prot spec and
payload spec are necessarily mutex and that `mk_codec()` enforces it.
- switch to `ipc_msg_spec` input in `mk_custom_codec()` helper.
- drop buncha commented cruft from `test_limit_msgspec()` including no
longer needed type union instance checks in dunder attributes.
Instead just instantiate `msgpack.Encoder/Decoder` instances inside
`mk_codec()` and assign them directly as `._enc/._dec` fields.
Explicitly take in named-args to both and proxy to the coder/decoder
instantiation calls directly.
Shuffling some codec internals:
- rename `mk_codec()` inputs as `ipc_msg_spec` and `ipc_pld_spec`, make
them mutex such that a payload type spec can't be passed if the
built-in msg-spec isn't used.
=> expose `MsgCodec.ipc_pld_spec` directly from `._dec.type`
=> presume input `ipc_msg_spec` is `Any` by default when no
`ipc_pld_spec` is passed since we have no way atm to enable
a similar type-restricted-payload feature without a wrapping
"shuttle protocol" ;)
- move all the payload-sub-decoders stuff prototyped in GH#311
(inside `.types`) to `._codec` as commented-for-later-maybe `MsgCodec`
methods including:
- `.mk_pld_subdec()` for registering
- `.enc/dec_payload()` for sub-codec field loading.
- also comment out `._codec.mk_tagged_union_dec()` as the orig
tag-to-decoder table factory, now mostly superseded by
`.types.mk_msg_spec()` which takes the generic parameterizing approach
instead.
- change naming to `types.mk_msg_spec(payload_type_union)` input, making
it more explicit that it expects a `Union[Type]`.
Oh right, and start exposing all the `.types.Msg` subtypes in the `.msg`
subpkg in prep for usage throughout the runtime B)
Re-arranging such that element-orders are line-arranged to our new
IPC `.msg.types.Msg` fields spec in prep for replacing the current
`dict`-as-msg impls with the `msgspec.Struct` native versions!
As per the long outstanding GH issue this starts our rigorous journey
into an attempt at a type-safe, cross-actor SC, IPC protocol Bo
boop -> https://github.com/goodboy/tractor/issues/36
The idea is to "formally" define our SC "shuttle (dialog) protocol" by
specifying a new `.msg.types.Msg` subtype-set which can fully
encapsulate all IPC msg schemas needed in order to accomplish
cross-process SC!
The msg set deviated a little in terms of (type) names from the existing
`dict`-msgs currently used in the runtime impl but, I think the name
changes are much better in terms of explicitly representing the internal
semantics of the actor runtime machinery/subsystems and the
IPC-msg-dialog required for SC enforced RPC.
------ - ------
In cursory, the new formal msgs-spec includes the following msg-subtypes
of a new top-level `Msg` boxing type (that holds the base field schema
for all msgs):
- `Start` to request RPC task scheduling by passing a `FuncSpec` payload
(to replace the currently used `{'cmd': ... }` dict msg impl)
- `StartAck` to allow the RPC task callee-side to report a `IpcCtxSpec`
payload immediately back to the caller (currently responded naively via
a `{'functype': ... }` msg)
- `Started` to deliver the first value from `Context.started()`
(instead of the existing `{'started': ... }`)
- `Yield` to shuttle `MsgStream.send()`-ed values (instead of
our `{'yield': ... }`)
- `Stop` to terminate a `Context.open_stream()` session/block
(over `{'stop': True }`)
- `Return` to deliver the final value from the `Actor.start_remote_task()`
(which is a `{'return': ... }`)
- `Error` to box `RemoteActorError` exceptions via a `.pld: ErrorData`
payload, planned to replace/extend the current `RemoteActorError.msgdata`
mechanism internal to `._exceptions.pack/unpack_error()`
The new `tractor.msg.types` includes all the above msg defs as well an API
for rendering a "payload type specification" using a
`payload_type_spec: Union[Type]` that can be passed to
`msgspec.msgpack.Decoder(type=payload_type_spec)`. This ensures that
(for a subset of the above msg set) `Msg.pld: PayloadT` data is
type-parameterized using `msgspec`'s new `Generic[PayloadT]` field
support and thus enables providing for an API where IPC `Context`
dialogs can strictly define the allowed payload-datatype-set via type
union!
Iow, this is the foundation for supporting `Channel`/`Context`/`MsgStream`
IPC primitives which are type checked/safe as desired in GH issue:
- https://github.com/goodboy/tractor/issues/365
Misc notes on current impl(s) status:
------ - ------
- add a `.msg.types.mk_msg_spec()` which uses the new `msgspec` support
for `class MyStruct[Struct, Generic[T]]` parameterize-able fields and
delivers our boxing SC-msg-(sub)set with the desired `payload_types`
applied to `.pld`:
- https://jcristharif.com/msgspec/supported-types.html#generic-types
- as a note this impl seems to need to use `type.new_class()` dynamic
subtype generation, though i don't really get *why* still.. but
without that the `msgspec.msgpack.Decoder` doesn't seem to reject
`.pld` limited `Msg` subtypes as demonstrated in the new test.
- around this ^ add a `.msg._codec.limit_msg_spec()` cm which exposes
this payload type limiting API such that it can be applied per task
via a `MsgCodec` in app code.
- the orig approach in https://github.com/goodboy/tractor/pull/311 was
the idea of making payload fields `.pld: Raw` wherein we could have
per-field/sub-msg decoders dynamically loaded depending on the
particular application-layer schema in use. I don't want to lose the
idea of this since I think it might be useful for an idea I have about
capability-based-fields(-sharing, maybe using field-subset
encryption?), and as such i've kept the (ostensibly) working impls in
TODO-comments in `.msg._codec` wherein maybe we can add
a `MsgCodec._payload_decs: dict` table for this later on.
|_ also left in the `.msg.types.enc/decmsg()` impls but renamed as
`enc/dec_payload()` (but reworked to not rely on the lifo codec
stack tables; now removed) such that we can prolly move them to
`MsgCodec` methods in the future.
- add an unused `._codec.mk_tagged_union_dec()` helper which was
originally factored out the #311 proto-code but didn't end up working
as desired with the new parameterized generic fields approach (now
in `msg.types.mk_msg_spec()`)
Testing/deps work:
------ - ------
- new `test_limit_msgspec()` which ensures all the `.types` content is
correct but without using the wrapping APIs in `._codec`; i.e. using
a in-line `Decoder` instead of a `MsgCodec`.
- pin us to `msgspec>=0.18.5` which has the needed generic-types support
(which took me way too long yester to figure out when implementing all
this XD)!
Leave all the proto native struct-msg stuff in `.types` since i'm
thinking it's the right name for the mod that will hold all the built-in
SCIPP msgspecs longer run. Obvi the naive codec stack stuff needs to be
cleaned out/up and anything useful moved into `._codec` ;)
The greasy details are strewn throughout a `msgspec` issue:
https://github.com/jcrist/msgspec/issues/140
and specifically this code was mostly written as part of POC example in
this comment:
https://github.com/jcrist/msgspec/issues/140#issuecomment-1177850792
This work obviously pertains to our desire and prep for typed messaging
and capabilities aware msg-oriented-protocols in #196. I added a "wants
to have" method to `Context` showing how I think we could offer a pretty
neat msg-type-set-as-capability-for-protocol system.
XXX NOTE XXX: this commit was rewritten during a rebase from a very old
version as per the prior commit.
XXX NOTE XXX: this is a heavily modified commit from the original
(ec226463) which was super out of date when rebased onto the current
branch. I went through a manual conflict rework and removed all the
legacy segments as well as rename-moved this original mod
`tractor.msg.py` -> `tractor.msg/_old_msg.py`. Further the
`NamespacePath` type def was discarded from this mod since it was from
a super old version which was already moved to a `.msg.ptr` submod.
As per original questions and discussion with `msgspec` author:
- https://github.com/jcrist/msgspec/issues/25
- https://github.com/jcrist/msgspec/issues/140
this prototypes a new (but very naive) `msgspec.Struct` codec
implementation which will be more filled out in the next commit.
Fitting in line with the issues outstanding:
- #36: (msg)spec-ing out our SCIPP (structured-con-inter-proc-prot).
(https://github.com/goodboy/tractor/issues/36)
- #196: adding strictly typed IPC msg dialog schemas, more or less
better described as "dialog/transaction scoped message specs"
using `msgspec`'s tagged unions and custom codecs.
(https://github.com/goodboy/tractor/issues/196)
- #365: using modern static type-annots to drive capability based
messaging and RPC.
(statically https://github.com/goodboy/tractor/issues/365)
This is a first draft of a new API for dynamically overriding IPC msg
codecs for a given interchange lib from any task in the runtime. Right
now we obviously only support `msgspec` but ideally this API holds
general enough to be used for other backends eventually (like
`capnproto`, and apache arrow).
Impl is in a new `tractor.msg._codec` with:
- a new `MsgCodec` type for encapsing `msgspec.msgpack.Encoder/Decoder`
pairs and configuring any custom enc/dec_hooks or typed decoding.
- factory `mk_codec()` for creating new codecs ad-hoc from a task.
- `contextvars` support for a new `trio.Task` scoped
`_ctxvar_MsgCodec: ContextVar[MsgCodec]` named 'msgspec_codec'.
- `apply_codec()` for temporarily modifying the above per task
as needed around `.open_context()` / `.open_stream()` operation.
A new test (suite) in `test_caps_msging.py`:
- verify a parent and its child can enable the same custom codec (in
this case to transmit `NamespacePath`s) with tons of pedantic ctx-vars
checks.
- ToDo: still need to implement #36 msg types in order to be able to get
decodes working (as in `MsgStream.receive()` will deliver an already
created `NamespacePath` obj) since currently all msgs come packed in `dict`-msg
wrapper packets..
-> use the proto from PR #35 to get nested `msgspec.Raw` processing up
and running Bo
By simply allowing an input `codec: tuple` of funcs for now to the
`MsgpackTCPStream` transport but, ideally wrapping this in a `Codec`
type with an API for dynamic extension of the interchange lib's msg
processing settings. Right now we're tied to `msgspec.msgpack` for this
transport but with the right design this can likely extend to other libs
in the future.
Relates to starting feature work toward #36, #196, #365.