Commit Graph

1115 Commits (5902ad2ffb79c3d0daf5c79feb2309af6fe678ac)

Author SHA1 Message Date
Tyler Goodlet bef3dd9e97 Another tweak to REPL entry `.pdb()` headers 2024-07-05 13:32:03 -04:00
Tyler Goodlet 31207f92ee Finally implement peer-lookup optimization..
There's been a todo for soo long for this XD

Since all `Actor`s store a set of `._peers` we can try a lookup on that
table as a shortcut before pinging the registry Bo

Impl deats:
- add a new `._discovery.get_peer_by_name()` routine which attempts the
  `._peers` lookup by combining a copy of that `dict` + an entry added
  for `Actor._parent_chan` (since all subs have a parent and often the
  desired contact is just that connection).
- change `.find_actor()` (for the `only_first == True` case),
  `.query_actor()` and `.wait_for_actor()` to call the new helper and
  deliver appropriate outputs if possible.

Other,
- deprecate `get_arbiter()` def and all usage in tests and examples.
- drop lingering use of `arbiter_sockaddr` arg to various routines.
- tweak the `Actor` doc str as well as some code fmting and a tweak to
  the `._stream_handler()`'s initial `con_status: str` logging value
  since the way it was could never be reached.. oh and `.warning()` on
  any new connections which already have a `_pre_chan: Channel` entry in
  `._peers` so we can start minimizing IPC duplications.
2024-07-04 19:40:11 -04:00
Tyler Goodlet 5f8f8e98ba More-n-more scops annots in logging 2024-07-04 15:06:15 -04:00
Tyler Goodlet b56352b0e4 Quieter `Stop` handling on ctx result capture
In the `drain_to_final_msg()` impl, since a stream terminating
gracefully requires this msg, there's really no reason to `log.cancel()`
about it; go `.runtime()` level instead since we're trying to de-noise
under "normal operation".

Also,
- passthrough `hide_tb` to taskc-handler's `ctx.maybe_raise()` call.
- raise `MessagingError` for the `MsgType` unmatched `case _:`.
- detail the doc string motivation a little more.
2024-07-03 22:42:32 -04:00
Tyler Goodlet 9be821a5cf More failed REPL-lock-request refinements
In `lock_stdio_for_peer()` better internal-error handling/reporting:
- only `Lock._blocked.remove(ctx.cid)` if that same cid was added on
  entry to avoid needless key-errors.
- drop all `Lock.release(force: bool)` usage remnants.
- if `req_ctx.cancel()` fails mention it with `ctx_err.add_note()`.
- add more explicit internal-failed-request log messaging via a new
  `fail_reason: str`.
- use new `x)<=\n|_` annots in any failure logging.

Other cleanups/niceties:
- drop `force: bool` flag entirely from the `Lock.release()`.
- use more supervisor-op-annots in `.pdb()` logging
  with both `_pause/crash_msg: str` instead of double '|' lines when
  `.pdb()`-reported from `._set_trace()`/`._post_mortem()`.
2024-07-02 17:06:50 -04:00
Tyler Goodlet b46400a86f Use `._entry` proto-ed "lifetime ops" in logging
As per a WIP scribbled out TODO in `._entry.nest_from_op()`, change
a bunch of "supervisor/lifetime mgmt ops" related log messages to
contain some supervisor-annotation "headers" in an effort to give
a terser "visual indication" of how some execution/scope/storage
primitive entity (like an actor/task/ctx/connection) is being operated
on (like, opening/started/closed/cancelled/erroring) from a "supervisor
action" POV.

Also tweak a bunch more emissions to lower levels to reduce noise around
normal inter-actor operations like process and IPC ctx supervision.
2024-07-02 16:31:58 -04:00
Tyler Goodlet 02812b9f51 Reraise RAEs in `MsgStream.receive()`; truncate tbs
To avoid showing lowlevel details of exception handling around the
underlying call to `return await self._ctx._pld_rx.recv_pld(ipc=self)`,
any time a `RemoteActorError` is unpacked (and raised locally) we re-raise
it directly from the captured `src_err` so as to present to
the user/app caller-code an exception raised directly from the `.receive()`
frame. This simplifies traceback call-stacks for any `log.exception()`
or `pdb`-REPL output filtering out the lower `PldRx` frames by default.
2024-07-02 16:31:15 -04:00
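One way to get the traceback truncation described in commit 02812b9f51 is to clear the captured traceback before re-raising, so the new traceback starts at the re-raising frame. The sketch below is an assumption about the general technique, not a copy of the library code; `_recv_pld()` and the local `RemoteActorError` class are simplified stand-ins.

```python
import traceback


class RemoteActorError(Exception):
    """Stand-in for the real error type; carries no extra state here."""


def _recv_pld() -> None:
    # Hypothetical low-level frame which unpacks and raises the remote error.
    raise RemoteActorError("remote task failed")


def receive() -> None:
    try:
        return _recv_pld()
    except RemoteActorError as src_err:
        # Drop the frames captured below this one, then raise again so the
        # traceback shown to the caller begins at `receive()`.
        raise src_err.with_traceback(None)


def frame_names(exc: BaseException) -> list[str]:
    # Function names of the frames recorded in the exception's traceback.
    return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
```

Calling `frame_names()` on the caught error lists `receive` but not the lower `_recv_pld` frame.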
Tyler Goodlet 3c5816c977 Add `Portal.chan` property, to wrap `._chan` attr 2024-07-02 15:53:33 -04:00
Tyler Goodlet af3745684c More formal `TransportClosed` reporting/raising
Since it was all ad-hoc defined inside
`._ipc.MsgpackTCPStream._iter_pkts()` more or less, this starts
formalizing a way for particular transport backends to indicate whether
a disconnect condition should be re-raised in the RPC msg loop and if
not what log level to report it at (if any).

Based on our lone transport currently we try to suppress any logging
noise from ephemeral connections expected during normal actor
interaction and discovery subsys ops:
- any short lived discovery related TCP connects are only logged as
  `.transport()` level.
- both `.error()` and raise on any underlying `trio.ClosedResourceError`
  cause since that normally means some task touched transport layer
  internals that it shouldn't have.
- do a `.warning()` on anything else unexpected.

Impl deats:
- extend the `._exceptions.TransportClosed` to accept an input log
  level, raise-on-report toggle and custom reporting & raising via a new
  `.report_n_maybe_raise()` method.
- construct the TCs with inputs per case in (the newly named) `._iter_pkts()`.
- call ^ this method from the `TransportClosed` handler block inside the
  RPC msg loop thus delegating reporting levels and/or raising to the
  backend's per-case TC instantiating.

Related `._ipc` changes:
- mask out all the `MsgpackTCPStream._codec` debug helper stuff and drop
  any lingering cruft from the initial proto-ing of msg-codecs.
- rename some attrs/methods:
  |_`MsgpackTCPStream._iter_packets()` -> `._iter_pkts()` and
    `._agen` -> `_aiter_pkts`.
  |_`Channel._aiter_recv()` -> `._aiter_msgs()` and
    `._agen` -> `_aiter_msgs`.
- add `hide_tb: bool` support to `Channel.send()` and only show the
  frame on non-MTEs.
2024-07-02 12:21:26 -04:00
Tyler Goodlet 3907cba68e Refine some `.trionics` docs and logging
- allow passing and report the lib name (`trio` or `tractor`) from
  `maybe_open_nursery()`.
- use `.runtime()` level when reporting `_Cache`-hits in
  `maybe_open_context()`.
- tidy up some doc strings.
2024-06-28 19:28:12 -04:00
Tyler Goodlet e3d59964af Woops, set `.cancel()` level in custom levels table.. 2024-06-28 19:27:13 -04:00
Tyler Goodlet edac717613 Use `msgspec.Struct.__repr__()` failover impl
In case the struct doesn't import a field type (which will cause the
`.pformat()` to raise) just report the issue and try to fall back to the
original `repr()` version.
2024-06-28 19:17:05 -04:00
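The failover idea in commit edac717613 can be sketched without `msgspec` at all. In the sketch below `PrettyStruct`, `BrokenStruct` and `_plain_repr()` are hypothetical stand-ins; the real code would fall back to the `msgspec.Struct.__repr__()` implementation instead.

```python
import logging

log = logging.getLogger("struct_sketch")


class PrettyStruct:
    def __init__(self, **fields: object) -> None:
        self._fields = fields

    def pformat(self) -> str:
        # Multi-line "pretty" rendering, one field per line.
        lines = [f"  {key}={val!r}," for key, val in self._fields.items()]
        return f"{type(self).__name__}(\n" + "\n".join(lines) + "\n)"

    def _plain_repr(self) -> str:
        # Stand-in for the original, always-safe `repr()` version.
        body = ", ".join(f"{key}={val!r}" for key, val in self._fields.items())
        return f"{type(self).__name__}({body})"

    def __repr__(self) -> str:
        try:
            return self.pformat()
        except Exception:
            # Report the issue, then fall back instead of crashing the caller.
            log.exception("`.pformat()` failed, falling back to plain repr")
            return self._plain_repr()


class BrokenStruct(PrettyStruct):
    def pformat(self) -> str:
        # Simulates a field type that could not be looked up.
        raise NameError("field type not importable")
```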
Tyler Goodlet 7e93b81a83 Don't use pretty struct stuff in `._invoke`
It's too fragile to put inside core RPC machinery since
`msgspec.Struct` defs can fail if a field type can't be
looked up at creation time (like can easily happen if you
conditionally import using `if TYPE_CHECKING:`)

Also,
- rename `cs` to `rpc_ctx_cs: CancelScope` since it's literally
  the wrapping RPC `Context._scope`.
- report self cancellation via `explain: str` and add tail case for
  "unknown cause".
- put a ?TODO? around what to do about KBIs if a context is opened
  from an `infected_aio`-actor task.
- similar to our nursery and portal add TODO list for moving all
  `_invoke_non_context()` content out the RPC core and instead implement
  them as `.hilevel` endpoint helpers (maybe as decorators?) which
  underneath define `@context`-funcs.
2024-06-28 19:06:17 -04:00
Tyler Goodlet 4fbd469c33 Update `._entry` actor status log
Log-report the different types of actor exit conditions including cancel
via KBI, error or normal return with varying levels depending on case.

Also, start proto-ing out this weird ascii-syntax idea for describing
conc system states and implement the first bit in a `nest_from_op()`
log-message fmter that joins and indents an obj `repr()` with
a tree-like `'>)\n|_'` header.
2024-06-28 18:45:52 -04:00
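A formatter in the spirit of the `nest_from_op()` helper from commit 4fbd469c33 could be as small as the sketch below. The signature and the exact indentation rule are assumptions; the commit message only states that it joins and indents an object `repr()` under a tree-like `'>)\n|_'` header.

```python
def nest_from_op(input_op: str, tree_str: str) -> str:
    # Split the (possibly multi-line) repr so every line can be aligned
    # underneath the op header.
    first, *rest = tree_str.splitlines() or [""]

    # The op annotation goes on its own line; the object hangs off `|_`.
    lines = [input_op, f"|_{first}"]

    # Continuation lines are shifted right to line up after the `|_`.
    lines += [f"  {line}" for line in rest]
    return "\n".join(lines)
```

For a two-field repr such as `"Actor(\n  uid='a'\n)"` this yields the op line, then `|_Actor(`, then the remaining lines indented by two extra spaces.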
Tyler Goodlet 5e009a8229 Further formalize `greenback` integration
Since we more or less require it for `tractor.pause_from_sync()` this
refines enable toggles and their relay down the actor tree as well as
more explicit logging around init and activation.

Tweaks summary:
- `.info()` report the module if discovered during root boot.
- use a `._state._runtime_vars['use_greenback']: bool` activation flag
  inside `Actor._from_parent()` to determine if the sub should try to
  use it and set to `False` if mod-loading fails / not installed.
- expose `maybe_init_greenback()` from `.devx` subpkg.
- comment out RTE in `._pause()` for now since we already have it in
  `.pause_from_sync()`.
- always `.exception()` on `maybe_init_greenback()` import errors to
  clarify the underlying failure deats.
- always explicitly report if `._state._runtime_vars['use_greenback']`
  was NOT set when `.pause_from_sync()` is called.

Other `._runtime.async_main()` adjustments:
- combine the "internal error call ur parents" message and the failed
  registry contact status into one new `err_report: str`.
- drop the final exception handler's call to
  `Actor.lifetime_stack.close()` since we're already doing it in the
  `finally:` block and the earlier call has no currently known benefit.
- only report on the `.lifetime_stack()` callbacks if any are detected
  as registered.
2024-06-28 14:45:45 -04:00
Tyler Goodlet b72a025d0f Always reset `._state._ctxvar_Context` to prior
Not sure how I forgot this but, obviously it's correct context-var
semantics to revert the current IPC `Context` (set in the latest
`.open_context()` block) such that any prior instance is reset..

This ensures the sanity `assert`s pass inside
`.msg._ops.maybe_limit_plds()` and just in general ensures for any task
that the last opened `Context` is the one returned from
`current_ipc_ctx()`.
2024-06-28 12:59:31 -04:00
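The context-var semantics referred to in commit b72a025d0f are the standard token-based set and reset from the stdlib `contextvars` module. A minimal sketch, assuming a synchronous `open_context()` stand-in and plain strings in place of real `Context` objects:

```python
from contextlib import contextmanager
from contextvars import ContextVar

# Holds the most recently opened context for the current task.
_ctxvar_Context = ContextVar("ipc_context", default=None)


def current_ipc_ctx():
    return _ctxvar_Context.get()


@contextmanager
def open_context(name: str):
    # `.set()` returns a token which remembers the prior value.
    token = _ctxvar_Context.set(name)
    try:
        yield name
    finally:
        # Always revert, so any previously opened context is restored.
        _ctxvar_Context.reset(token)
```

Nesting two `open_context()` blocks and leaving the inner one makes `current_ipc_ctx()` return the outer value again.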
Tyler Goodlet 5739e79645 Use `delay=0` in pump loop..
Turns out it does work XD

Prior presumption was from before I had the fute poll-loop so makes
sense we needed more than one sched-tick's worth of context switch vs.
now we can just keep looping-n-pumping as fast as possible until the
guest-run's main task completes.

Also,
- minimize the preface commentary (as per todo) now that we have tests
  codifying all the edge cases :finger_crossed:
- parameter-ize the pump-loop-cycle delay and default it to 0.
2024-06-27 19:27:59 -04:00
Tyler Goodlet 2ac999cc3c Prep for legacy RPC API factor-n-remove
This change is adding commentary about the upcoming API removal and
simplification of nursery + portal internals; no actual code changes are
included.

The plan is to (re)move the old RPC methods:
- `ActorNursery.run_in_actor()`
- `Portal.run()`
- `Portal.run_from_ns()`

and any related impl internals out of each conc-primitive and instead
into something like a `.hilevel.rpc` set of APIs which then are all
implemented using the newer and more lowlevel `Context`/`MsgStream`
primitives instead Bo

Further,
- formally deprecate the `Portal.result()` meth for
  `.wait_for_result()`.
- only `log.info()` about runtime shutdown in the implicit root case.
2024-06-27 16:25:46 -04:00
Tyler Goodlet 9f9b0b17dc Add a `Context.portal`, more cancel tooling
Might as well add a public maybe-getter for use on the "parent" side
since it can be handy to check out-of-band cancellation conditions (like
from `Portal.cancel_actor()`).

Buncha bitty tweaks for more easily debugging cancel conditions:
- add a `@.cancel_called.setter` for hooking into `.cancel_called = True`
  being set in hard to decipher "who cancelled us" scenarios.
- use a new `self_ctxc: bool` var in `.cancel()` to capture the output
  state from `._is_self_cancelled(remote_error)` at call time so it can
  be compared against the measured value at crash-time (when REPL-ing it
  can often have already changed due to runtime teardown sequencing vs.
  the crash handler hook entry).
- proxy `hide_tb` to `.drain_to_final_msg()` from `.wait_for_result()`.
- use `remote_error.sender` attr directly instead of through
  `RAE.msgdata: dict` lookup.
- change var name `our_uid` -> `peer_uid`; it's not "ours"..

Other various docs/comment updates:
- extend the main class doc to include some other name ideas.
- change over all remaining `.result()` refs to `.wait_for_result()`.
- doc more details on the signature we eventually want for `.outcome`.
2024-06-26 16:26:18 -04:00
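The setter hook mentioned in commit 9f9b0b17dc relies on the ordinary `@property` setter pattern. Below is a small sketch under assumptions: the `Context` class is a stand-in and the `set_by` list is a hypothetical way of recording "who cancelled us"; in practice the setter body is simply a convenient place for a breakpoint or log call.

```python
import inspect


class Context:
    def __init__(self) -> None:
        self._cancel_called = False
        self.set_by: list[str] = []  # hypothetical audit trail

    @property
    def cancel_called(self) -> bool:
        return self._cancel_called

    @cancel_called.setter
    def cancel_called(self, val: bool) -> None:
        if val:
            # Hook point: note which function flipped the flag.
            caller = inspect.currentframe().f_back
            self.set_by.append(caller.f_code.co_name)
        self._cancel_called = val


def some_task(ctx: Context) -> None:
    ctx.cancel_called = True
```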
Tyler Goodlet 9133f42b07 Solve our abandonment issues..
To make the recent set of tests pass this (hopefully) finally solves all
`asyncio` embedded `trio` guest-run abandonment by ensuring we "pump the
event loop" until the guest-run future is fully complete.

Accomplished via simple poll loop of the form `while not
trio_done_fut.done(): await asyncio.sleep(.1)` in the `aio_main()`
task's exception teardown sequence. The loop does a naive 100ms
"pump-via-sleep & poll" for the `trio` side to complete before finally
exiting (and presumably raising) from the SIGINT cancellation.

Other related cleanups and refinements:
- use `asyncio.Task.result()` inside `cancel_trio()` since it also
  inline-raises any exception outcome and we can also log-report the
  result in non-error cases.
- comment out buncha not-sure-we-need-it stuff in `cancel_trio()`.
- remove the botched `AsyncioCancelled(CancelledError):` idea obvi XD
- comment `greenback` init for now in `aio_main()` since (pretty sure)
  we don't ever want to actually REPL in that specific func-as-task?
- always capture any `fute_err: BaseException` from the `main_outcome:
  Outcome` delivered by the `trio` side guest-run task.
- add and raise a new super noisy `AsyncioRuntimeTranslationError`
  whenever we detect that the guest-run `trio_done_fut` has not
  completed before task exit; should avoid abandonment issues ever
  happening again without knowing!
2024-06-26 13:48:36 -04:00
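The poll loop quoted in commit 9133f42b07 can be tried in isolation with a plain `asyncio` future standing in for the guest-run's `trio_done_fut`. The helper name `pump_until_done()`, the `delay` parameter and the cycle counter are assumptions added for illustration; no `trio` guest run is involved here.

```python
import asyncio


async def pump_until_done(
    trio_done_fut: asyncio.Future,
    delay: float = 0.1,
) -> int:
    # Keep yielding to the event loop until the future completes,
    # counting how many sleep cycles that took.
    cycles = 0
    while not trio_done_fut.done():
        await asyncio.sleep(delay)
        cycles += 1
    return cycles


async def main() -> tuple[int, str]:
    loop = asyncio.get_running_loop()
    fut = loop.create_future()

    # Simulate the guest run finishing a little later.
    loop.call_later(0.05, fut.set_result, "guest-run complete")

    cycles = await pump_until_done(fut, delay=0.01)
    return cycles, fut.result()
```

Running `asyncio.run(main())` returns once the future is set, after at least one pump cycle.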
Tyler Goodlet 4f1db1ff52 Lel, revert `AsyncioCancelled` inherit, module..
Turns out it somehow breaks our `to_asyncio` error relay since obvi
`asyncio`'s runtime seems to specially handle it (prolly via
`isinstance()` ?) and it caused our
`test_aio_cancelled_from_aio_causes_trio_cancelled()` to hang..
Further, obvi `unpack_error()` won't be able to find the type def if not
kept inside `._exceptions`..

So given all that, revert the change/move as well as:
- tweak the aio-from-aio cancel test to timeout.
- do `trio.sleep()` conc with any bg aio task by moving out nursery
  block.
- add a `send_sigint_to: str` parameter to
  `test_sigint_closes_lifetime_stack()` such that we test the SIGINT
  being relayed to just the parent or the child.
2024-06-25 23:47:14 -04:00
Tyler Goodlet a870df68c0 Hack `asyncio` to not abandon a guest-mode run?
Took me a while to figure out what the heck was going on but, turns out
`asyncio` changed their SIGINT handling in 3.11 as per:

https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption

I'm not entirely sure if it's the 3.11 changes or possibly wtv further
updates were made in 3.12 but more or less due to the way
our current main task was written the `trio` guest-run was getting
abandoned on SIGINTs sent from the OS to the infected child proc..

Note that much of the bug and soln cases are laid out in very detailed
comment-notes both in the new test and `run_as_asyncio_guest()`, right
above the final "fix" lines.

Add new `test_infected_aio.test_sigint_closes_lifetime_stack()` test suite
which reliably triggers all abandonment issues with multiple cases
of different parent behaviour post-sending-SIGINT-to-child:
 1. briefly sleep then raise a KBI in the parent which was originally
    demonstrating the file leak not being cleaned up by `Actor.lifetime_stack.close()`
    and simulates a ctl-c from the console (relayed in tandem by
    the OS to the parent and child processes).
 2. do `Context.wait_for_result()` on the child context which would
    hang and timeout since the actor runtime would never complete and
    thus never relay a `ContextCancelled`.
 3. both with and without running an `asyncio` task in the `manage_file`
    child actor; originally it seemed that with an aio task scheduled in
    the child actor the guest-run abandonment always was the "loud" case
    where there seemed to be some actor teardown but with tbs from
    python failing to gracefully exit the `trio` runtime..

The (seemingly working) "fix" required 2 lines of code to be run inside
an `asyncio.CancelledError` handler around the call to `await trio_done_fut`:
- `Actor.cancel_soon()` which schedules the actor runtime to cancel on
  the next `trio` runner cycle and results in a "self cancellation" of
  the actor.
- "pumping the `asyncio` event loop" with a non-0 `.sleep(0.1)` XD
 |_ seems that a "shielded" pump with some actual `delay: float >= 0`
   did the trick to get `asyncio` to allow the `trio` runner/loop to
   fully complete its guest-run without abandonment.

Other supporting changes:
- move `._exceptions.AsyncioCancelled`, our renamed
  `asyncio.CancelledError` error-sub-type-wrapper, to `.to_asyncio` and make
  it derive from `CancelledError` so as to be sure when raised by our
  `asyncio` x-> `trio` exception relay machinery that `asyncio` is
  getting the specific type it expects during cancellation.
- do "summary status" style logging in `run_as_asyncio_guest()` wherein
  we compile the eventual `startup_msg: str` emitted just before waiting
  on the `trio_done_fut`.
- shield-wait with `out: Outcome = await asyncio.shield(trio_done_fut)`
  even though it seems to do nothing in the SIGINT handling case..(I
  presume it might help avoid abandonment in an `asyncio.Task.cancel()`
  case maybe?)
2024-06-24 16:10:23 -04:00
Tyler Goodlet 3d12a7e005 Flip `infected_asyncio` status msg to `.runtime()` 2024-06-18 18:14:58 -04:00
Tyler Goodlet 9292d73b40 Avoid actor-nursery-exit warns on registrees
Since a local-actor-nursery-parented subactor might also use the root as
its registry, we need to avoid warning when short lived IPC `Channel`
connections establish and then disconnect (quickly, bc apparently the
subactor isn't re-using an already cached parent-peer<->child conn
as you'd expect for efficiency..) since such cases are currently considered
normal operation of our super shoddy/naive "discovery sys" XD

As such, (un)guard the whole local-actor-nursery OR channel-draining
waiting blocks with the additional `or Actor._cancel_called` branch
since really we should also be waiting on the parent nurse to exit (at
least, for sure and always) when the local `Actor` indeed has been
"globally" cancelled-called. Further add separate timeout warnings for
channel-draining vs. local-actor-nursery-exit waiting since they are
technically orthogonal cases (at least, afaik).

Also,
- adjust the `Actor._stream_handler()` connection status log-emit to
  `.runtime()`, especially to reduce noise around the aforementioned
  ephemeral registree connection-requests.
- if we do wait on a local actor-nurse to exit, report its `._children`
  table (which should help figure out going forward how useful the
  warning is, if at all).
2024-06-18 16:10:36 -04:00
Tyler Goodlet 83d69fe395 Change `_Cache` reuse emit to `.runtime()` 2024-06-18 14:40:26 -04:00
Tyler Goodlet 72df312e71 Expand `PayloadMsg` doc-str 2024-06-18 09:57:10 -04:00
Tyler Goodlet 711f639fc5 Break `_mk_msg_type_err()` into recv/send side funcs
Name them `_mk_send_mte()`/`_mk_recv_mte()` and change the runtime to
call each appropriately depending on location/usage.

Also add some dynamic call-frame "unhide" blocks such that when we
expect raised MTE from the above calls but we get a different
unexpected error from the runtime, we ensure the call stack downward is
shown in tbs/pdb.
|_ ideally in the longer run we come up with a fancier dynamic sys for
   this, prolly something in `.devx._frame_stack`?
2024-06-17 13:12:16 -04:00
Tyler Goodlet 8477919fc9 Don't pass `ipc_msg` for send side MTEs
Just pass `_bad_msg` such that it gets injected to `.msgdata` since
with a send-side `MsgTypeError` we don't have a remote `._ipc_msg:
Error` per se to include.
2024-06-17 10:32:50 -04:00
Tyler Goodlet 872feef24b Add note about using `@acm` as decorator in 3.10 2024-06-17 10:32:38 -04:00
Tyler Goodlet 04bd111037 Proxy through `dec_hook` in `.limit_plds()` APIs 2024-06-17 09:23:31 -04:00
Tyler Goodlet a0ee0cc713 Port debug request ep to use `@context(pld_spec)`
Namely passing the `.__pld_spec__` directly to the
`lock_stdio_for_peer()` decorator B)

Also, allows dropping `apply_debug_pldec()` (which was a todo) and
removing a `lock_stdio_for_peer()` indent level.
2024-06-17 09:01:13 -04:00
Tyler Goodlet 5449bd5673 Offer a `@context(pld_spec=<TypeAlias>)` API
Instead of the WIP/prototyped `Portal.open_context()` offering
a `pld_spec` input arg, this changes to a proper decorator API for
specifying the "payload spec" on `@context` endpoints.

The impl change details actually cover 2-birds:
- monkey patch decorated functions with a new
  `._tractor_context_meta: dict[str, Any]` and insert any provided input
  `@context` kwargs: `_pld_spec`, `enc_hook`, `dec_hook`.
- use `inspect.get_annotations()` to scan for a `func` arg
  type-annotated with `tractor.Context` and use the name of that arg as
  the RPC task-side injected `Context`, thus injecting the needed arg
  by type instead of by name (a longstanding TODO); raise a type-error
  when not found.
- pull the `pld_spec` from the `._tractor_context_meta` attr both in the
  `.open_context()` parent-side and child-side `._invoke()`-cation of
  the RPC task and use the `msg._ops.maybe_limit_plds()` API to apply it
  internally in the runtime for each case.
2024-06-16 23:12:53 -04:00
Tyler Goodlet e6d4ec43b9 Log tbs from non-RAE `._invoke()`-RPC-task errors
`RemoteActorError`s show this by default in their `.__repr__()`, and we
obvi capture and embed the src traceback in an `Error` msg prior to
transit, but for logging it's also handy to see the tb of any set
`Context._remote_error` on console especially when trying to decipher
remote error details at their origin actor. Also improve the log message
description using `ctx.repr_state` and show any `ctx.outcome`.
2024-06-14 15:49:30 -04:00
Tyler Goodlet 418c6907fd Add `enable_stack_on_sig: bool` for `stackscope` toggle 2024-06-14 15:37:57 -04:00
Tyler Goodlet d528e7ab4d Add `@context(pld_spec=<TypeAlias>)` TODO list
Longer run we don't want `tractor` app devs having to call
`msg._ops.limit_plds()` from every child endpoint.. so this starts
a list of decorator API ideas and obviously ties in with an ideal final
API design that will come with py3.13 and typed funcs. Obviously this
is directly fueled by,

- https://github.com/goodboy/tractor/issues/365

Other,
- type with direct `trio.lowlevel.Task` import.
- use `log.exception()` to show tbs for all error-terminations in
  `.open_context()` (for now) and always explicitly mention the `.side`.
2024-06-14 15:37:07 -04:00
Tyler Goodlet 7a89b59a3f Bleh, make `log.devx()` level less than cancel but > `.runtime()` 2024-06-11 20:45:41 -04:00
Tyler Goodlet 7d4cd8944c Use `_debug._sync_pause_from_builtin()` as `breakpoint()` override 2024-06-10 19:16:21 -04:00
Tyler Goodlet 43a8cf4be1 Make big TODO: for `devx._debug` refinements
Hopefully would make grok-ing this fairly sophisticated sub-sys possible
for any up-and-coming `tractor` hacker

XD

A lot of internal API and re-org ideas I discovered/realized as part of
finishing the `__pld_spec__` and multi-threaded support. Particularly
better isolation between root-actor vs subactor task APIs and generally
less globally-state-ful stuff like `DebugStatus` and `Lock` method APIs
would likely make a lot of the hard to follow edge cases more clear?
2024-06-10 17:46:10 -04:00
Tyler Goodlet 6534a363a5 First proto: multi-threaded synced `pdb`-REPLs
Functionally working for multi-threaded (via cpython threads spawned
from `trio.to_thread.run_sync()`) alongside subactors, tested (for
now) only with threads started inside the root actor (which seemed to
have the most issues in terms of the impl and special cases..) using the
new `tractor.pause_from_sync()` API!

Main implementation changes to `.pause_from_sync()`
------ - ------
- from the root actor, we need to ensure bg thread case is handled
  *specially* since no IPC is used to request the TTY stdio mutex and
  `Lock` (API) usage is conducted entirely from a local task or thread;
  dedicated `Lock` usage for the root-actor already is branched inside
  `._pause()` and needs similar handling from a root bg-thread:
 |_for the special case of a root bg thread we need to
   `trio`-main-thread schedule a bg task inside a new
   `_pause_from_bg_root_thread()`. The new task needs to implement most
   of what is handled inside `._pause()` manually, mostly because in
   this root-actor-bg-thread case we have 2 constraints:
   1. to enter `PdbREPL.interaction()` **from the bg thread** directly,
   2. the task that `Lock._debug_lock.acquire()`s has to be the same
      that calls `.release()` (a `trio.StrictFIFOLock` constraint)
 |_impl deats of this `_pause_from_bg_root_thread()` include:
   - (for now) calling `._pause()` to acquire the `Lock._debug_lock`.
   - setting its own `DebugStatus.repl_release`.
   - calling `DebugStatus.shield_sigint()` to ensure the root's
     main thread uses the right handler when the bg one is REPL-ing.
   - wait manually on the `.repl_release()` to be set by the thread's
     dedicated `PdbREPL` exit.
   - manually calling `Lock.release()` from the **same task** that
     acquired it.
- expect calls to `._pause()` to deliver a `tuple[Task, PdbREPL]` such
  that we always get the handle both to any newly created REPL instance
  and (maybe) the scheduled bg task within which it runs.
- add a single `message: str` style to `log.devx()` based on branching
  style for logging.
- ensure both `DebugStatus.repl` and `.repl_task` are set **just
  before** calling `._set_trace()` to ensure the correct `Task|Thread`
  is set when the REPL is finally entered from sync code.
- add a wrapping caller `_sync_pause_from_builtin()` which passes in the
  new `called_from_builtin=True` to indicate `breakpoint()` caller
  usage, obvi pass in `api_frame`.

Changes to `._pause()` in support of ^
------ - ------
- `TaskStatus.started()` and return the `tuple[Task, PdbREPL]` to
  callers / starters.
- only call `DebugStatus.shield_sigint()` when no `repl` passed bc some
  callers (like bg threads) may need to apply it at some specific point
  themselves.
- tweak some asserts for the `debug_func == None` / non-`trio`-thread
  case.
- add a mod-level `_repl_fail_msg: str` to be used when there's an
  internal `._pause()` failure for testing, easier to pexpect match.
- more comprehensive logging for the root-actor branched case to
  (attempt to) indicate any of the 3 cases:
  - remote ctx from subactor has the `Lock`,
  - already existing root task or thread has it or,
  - some kinda stale `.locked()` situation where the root has the lock
    but we don't know why.
- for root usage, revert to always `await Lock._debug_lock.acquire()`-ing
  despite `called_from_sync` since `.pause_from_sync()` was reworked to
  instead handle the special bg thread case in the new
  `_pause_from_bg_root_thread()` task.
- always do `return _enter_repl_sync(debug_func)`.
- try to report any `repl_task: Task|Thread` set by the caller
  (particularly for the bg thread cases) as being the thread or task
  `._pause()` was called "on behalf of"

Changes to `DebugStatus`/`Lock` in support of ^
------ - ------
- only call `Lock.release()` from `DebugStatus.set_[quit/continue]()`
  when called from the main `trio` thread and always call
  `DebugStatus.release()` **after** to ensure `.repl_released()` is set
  **after** `._debug_lock.release()`.
- only call `.repl_release.set()` from `trio` thread otherwise use
  `.from_thread.run()`.
- much more refinements in `Lock.release()` for threading cases:
  - return `bool` to indicate whether lock was released by caller.
  - mask (in prep to drop) `_pause()` usage of
    `Lock.release(force=True)` since forcing a release can't ever avoid
    the RTE from `trio`.. same task **must** acquire/release.
  - don't allow usage from non-`trio`-main-threads, ever; there's no
    point since the same-task-needs-to-manage-`FIFOLock` constraint.
  - much more detailed logging using `message`-building-style for all
    caller (edge) cases.
   |_ use a `we_released: bool` to determine failed-to-release edge
      cases which can happen if called from bg threads, ensure we
      `log.exception()` on any incorrect usage resulting in release
      failure.
   |_ complain loudly if the release fails and some other task/thread
      still holds the lock.
   |_ be explicit about "who" (which task or thread) the release is "on
      behalf of" by reading `DebugStatus.repl_task` since the caller
      isn't the REPL operator in many sync cases.
  - more or less drop `force` support, as mentioned above.
  - ensure we unset `._owned_by_root` if the caller is a root task.

Other misc
------ - ------
- rename `lock_tty_for_child()` -> `lock_stdio_for_peer()`.
- rejig `Lock.repr()` to show lock and event stats.
- stage `Lock.stats` and `.owner` methods in prep for doing a singleton
  instance and `@property`s.
2024-06-10 17:45:52 -04:00
Tyler Goodlet 408a74784e Catch `.pause_from_sync()` in root bg thread bugs!
Originally discovered while using `tractor.pause_from_sync()`
from the `i3ipc` client running in a bg-thread that uses `asyncio`
inside `modden`.

Turns out we definitely aren't correctly handling `.pause_from_sync()`
from the root actor when called from a `trio.to_thread.run_sync()`
bg thread:
- root-actor bg threads which can't `Lock._debug_lock.acquire()` since
  they aren't in `trio.Task`s.
- even if scheduled via `.to_thread.run_sync(_debug._pause)` the
  acquirer won't be the task/thread which calls `Lock.release()` from
  `PdbREPL` hooks; this results in a RTE raised by `trio`..
- multiple threads will step on each other's stdio since cpython's GIL
  seems to ctx switch threads on every input from the user to the REPL
  loop..

Reproduce via reworking our example and test so that they catch and fail
for all edge cases:
- rework the `/examples/debugging/sync_bp.py` example to demonstrate the
  above issues, namely the stdio clobbering in the REPL when multiple
  threads and/or a subactor try to debug simultaneously.
  |_ run one thread using a task nursery to ensure it runs conc with the
     nursery's parent task.
  |_ ensure the bg threads run conc a subactor usage of
     `.pause_from_sync()`.
  |_ gravely detail all the special cases inside a TODO comment.
  |_ add some control flags to `sync_pause()` helper and don't use
     `breakpoint()` by default.
- extend and adjust `test_debugger.test_pause_from_sync` to match (and
  thus currently fail) by ensuring exclusive `PdbREPL` attachment when
  the 2 bg root-actor threads are concurrently interacting alongside the
  subactor:
  |_ should only see one of the `_pause_msg` logs at a time for either
     one of the threads or the subactor.
  |_ ensure each attaches (in no particular order) before expecting the
     script to exit.

Impl adjustments to `.devx._debug`:
- drop `Lock.repl`, no longer used.
- add `Lock._owned_by_root: bool` for the `.ctx_in_debug == None`
  root-actor-task active case.
- always `log.exception()` for any `._debug_lock.release()` ownership
  RTE emitted by `trio`, like we used to..
- add special `Lock.release()` log message for the stale lock but
  `._owned_by_root == True` case; oh yeah and actually
  `log.devx(message)`..
- rename `Lock.acquire()` -> `.acquire_for_ctx()` since it's only ever
  used from subactor IPC usage; well that and for local root-task
  usage we should prolly add a `.acquire_from_root_task()`?
- buncha `._pause()` impl improvements:
 |_ type `._pause()`'s `debug_func` as a `partial` as well.
 |_ offer `called_from_sync: bool` and `called_from_bg_thread: bool`
    for the special case handling when called from `.pause_from_sync()`
 |_ only set `DebugStatus.repl/repl_task` when `debug_func != None`
   (OW ensure the `.repl_task` is not the current one).
 |_ handle error logging even when `debug_func is None`..
 |_ lotsa detailed commentary around root-actor-bg-thread special cases.
- when `._set_trace(hide_tb=False)` do `pdbp.set_trace(frame=currentframe())`
  so the `._debug` internal frames are always included.
- by default always hide tracebacks for `.pause[_from_sync]()` internals.
- improve `.pause_from_sync()` to avoid root-bg-thread crashes:
 |_ pass new `called_from_xxx_` flags and ensure `DebugStatus.repl_task`
    is actually set to the `threading.current_thread()` when needed.
 |_ manually call `Lock._debug_lock.acquire_nowait()` for the non-bg
    thread case.
 |_ TODO: still need to implement the bg-thread case using a bg
    `trio.Task`-in-thread with a `trio.Event` set by thread REPL exit.
2024-06-06 16:56:30 -04:00
Tyler Goodlet f0342d6ae3 Move `Context.open_stream()` impl to `._streaming`
Exactly like how it's organized for `Portal.open_context()`, put the
main streaming API `@acm` with the `MsgStream` code and bind the method
to the new module func.

Other,
- rename `Context.result()` -> `.wait_for_result()` to better match the
  blocking semantics and rebind `.result()` as deprecated.
- add doc-str for `Context.maybe_raise()`.
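
The rename-plus-deprecated-alias pattern above can be sketched roughly as
follows. This is a minimal sketch only: `Ctx` is a hypothetical stand-in
(not the real `Context`), and the use of a plain `DeprecationWarning` is an
assumption; the actual warning mechanism and text may differ.

```python
import asyncio
import warnings


class Ctx:
    '''
    Hypothetical stand-in for `Context`, for illustration only.

    '''
    def __init__(self, outcome):
        self._outcome = outcome

    async def wait_for_result(self):
        # new name: makes the (potentially) blocking semantics explicit
        await asyncio.sleep(0)
        return self._outcome

    async def result(self):
        # old name kept as a thin alias which warns on every use
        warnings.warn(
            '`.result()` is deprecated, use `.wait_for_result()`',
            DeprecationWarning,
            stacklevel=2,
        )
        return await self.wait_for_result()


async def main():
    ctx = Ctx(outcome=42)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        old = await ctx.result()

    new = await ctx.wait_for_result()
    return old, new, [w.category for w in caught]


old, new, categories = asyncio.run(main())
```

Both names deliver the same value; only the old one emits a warning, so
existing callers keep working while being nudged to the new name.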
2024-05-31 17:32:11 -04:00
Tyler Goodlet 21f633a900 Use `Context` repr APIs for RPC outcome logs
Delegate to the new `.repr_state: str` and adjust log level based on
error vs. cancel vs. result.
2024-05-31 14:40:55 -04:00
Tyler Goodlet 4a270f85ca Drop sub-decoder proto-cruft from `.msg._codec`
It ended up getting necessarily implemented as the `PldRx` though at
a different layer and won't be needed as part of `MsgCodec` most likely,
though this original idea did provide the source of inspiration for how
things work now!

Also move the commented TODO proto for a codec hook factory from
`.types` to `._codec` where it prolly better fits and update some msg
related todo/questions.
2024-05-31 12:03:18 -04:00
Tyler Goodlet 8ea0f08386 Finally, officially support shielded REPL-ing!
It's been a long time prepped and now finally implemented!

Offer a `shield: bool` argument from our async `._debug` APIs:
- `await tractor.pause(shield=True)`,
- `await tractor.post_mortem(shield=True)`

^-These-^ can now be used inside cancelled `trio.CancelScope`s,
something very handy when introspecting complex (distributed) system
tear/shut-downs particularly under remote error or (inter-peer)
cancellation conditions B)

Thanks to previous prepping in a prior attempt and various patches from
the rigorous rework of `.devx._debug` internals around typed msg specs,
there ain't much that was needed!

Impl deats
- obvi passthrough `shield` from the public API endpoints (was already
  done from a prior attempt).
- put ad-hoc internal `with trio.CancelScope(shield=shield):` around all
  checkpoints inside `._pause()` for both the root-process and subactor
  case branches.

Add a fairly rigorous example, `examples/debugging/shielded_pause.py`
with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and
ensure it covers as many cases as i can think of offhand:

- multiple `.pause()` entries in a loop despite parent scope
  cancellation in a subactor RPC task which itself spawns a sub-task.
- a `trio.Nursery.parent_task` which raises, is handled and
  tries to enter an unshielded `.post_mortem()`, which of course
  internally raises `Cancelled` in a `._pause()` checkpoint, so we catch
  the `Cancelled` again and then debug the debugger's internal
  cancellation with specific checks for the particular raising
  checkpoint-LOC.
- do ^- the latter -^ for both subactor and root cases to ensure we
  can debug `._pause()` itself when it tries to REPL engage from
  a cancelled task scope Bo
2024-05-30 17:52:24 -04:00
Tyler Goodlet 13ea500a44 Rename `PldRx.dec_msg()` -> `.decode_pld()`
Keep the old alias, but i think it's better form to use longer names for
internal public APIs and this name better reflects the functionality:
decoding and returning a `PayloadMsg.pld` field.
2024-05-30 16:09:59 -04:00
Tyler Goodlet fcd089c08f Always `.exception()` in `try_ship_error_to_remote()` on internal error 2024-05-30 16:02:25 -04:00
Tyler Goodlet 993281882b Pass `boxed_type` from `_mk_msg_type_err()`
Such that we're boxing the interchanged lib's specific error
(`msgspec.ValidationError` in this case) type much like how
a `ContextCancelled[trio.Cancelled]` is composed; allows for seamless
multi-backend-codec support later as well B)

Pass `ctx.maybe_raise(from_src_exc=src_err)` where needed in a couple
spots; as `None` in the send-side `Started` MTE case to avoid showing
the `._scope1.cancel_called` result in the traceback from the
`.open_context()` child-sync phase.
2024-05-30 15:55:34 -04:00
Tyler Goodlet bbb4d4e52c Add `from_src_exc: BaseException` to maybe raisers
That is as a control to `Context._maybe_raise_remote_err()` such that
if set to anything other than the default (`False` value), we do
`raise remote_error from from_src_exc` such that caller can choose to
suppress or override the `.__cause__` tb.

Also tidy up an old masked TODO regarding calling `.maybe_raise()`
after the caller exits from the `yield` in `.open_context()`..
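
A minimal sketch of the `from_src_exc` control described above, using only
plain python exception chaining. `RemoteError`, `maybe_raise()` and
`cause_of()` here are hypothetical stand-ins written for illustration, not
the real `Context` methods.

```python
from __future__ import annotations


class RemoteError(Exception):
    '''
    Hypothetical stand-in for a relayed remote error.

    '''


def maybe_raise(
    remote_error: Exception|None,
    from_src_exc: BaseException|None|bool = False,
) -> None:
    if remote_error is None:
        return

    # `False` (the default) means the caller expressed no preference:
    # raise normally and let python set any implicit `.__context__`.
    if from_src_exc is False:
        raise remote_error

    # any other value becomes the explicit `.__cause__`; passing
    # `None` suppresses the chained traceback display entirely.
    raise remote_error from from_src_exc


def cause_of(**kwargs) -> tuple:
    # raise while another error is being handled so that there is
    # always an implicit context which could be displayed.
    try:
        try:
            raise KeyError('src')
        except KeyError:
            maybe_raise(RemoteError('boom'), **kwargs)
    except RemoteError as err:
        return err.__cause__, err.__suppress_context__


default = cause_of()
suppressed = cause_of(from_src_exc=None)
override = ValueError('other')
overridden = cause_of(from_src_exc=override)
```

Using `False` rather than `None` as the default is what lets `None` keep
its usual `raise ... from None` meaning.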
2024-05-30 15:24:25 -04:00
Tyler Goodlet 0e8c60ee4a Better RAE `.pformat()`-ing for send-side MTEs
Send-side `MsgTypeError`s actually shouldn't have any "boxed" traceback
per se since they're raised in the transmitting actor's local task env
and we (normally) don't want the ascii decoration added around the
error's `._message: str`, that is not until the exc is `pack_error()`-ed
before transit. As such, the presentation of an embedded traceback (and
its ascii box) gets bypassed when only a `._message: str` is set (as we
now do for pld-spec failures in `_mk_msg_type_err()`).

Further this tweaks the `.pformat()` output to include the `._message`
part to look like `<RemoteActorError( <._message> ) ..` instead of
jamming it implicitly to the end of the embedded `.tb_str` (as was done
implicitly by `unpack_error()`) and also adds better handling for the
`with_type_header == False` case including forcing that case when we
detect that the currently handled exc is the RAE in `.pformat()`.
Toss in a lengthier doc-str explaining it all.

Surrounding/supporting changes,
- better `unpack_error()` message which just briefly reports the remote
  task's error type.
- add public `.message: str` prop.
- always set a `._extra_msgdata: dict` since some MTE props rely on it.
- handle `.boxed_type == None` for `.boxed_type_str`.
- maybe pack any detected input or `exc.message` in `pack_error()`.
- comment cruft cleanup in `_mk_msg_type_err()`.
2024-05-30 10:25:04 -04:00
Tyler Goodlet 1db5d4def2 Add `Error.message: str`
Allows passing a custom error msg other than the traceback-str over the
wire. Make `.tb_str` optional (in the blank `''` sense) since it's
treated that way thus far in `._exceptions.pack_error()`.
2024-05-30 09:14:04 -04:00
Tyler Goodlet 6e54abc56d Fix missing newline in task-cancel log-message 2024-05-30 09:06:10 -04:00
Tyler Goodlet 28af4749cc Don't need to pack an `Error` with send-side MTEs 2024-05-30 09:05:23 -04:00
Tyler Goodlet 6a4ee461f5 Raise remote errors rxed during `Context` child-sync
More specifically, if `.open_context()` is cancelled when awaiting the
first `Context.started()` during the child task sync phase, check to see
if it was due to `._scope.cancel_called` and raise any remote error via
`.maybe_raise()` instead of the `trio.Cancelled` like in every other
remote-error handling case. Ensure we set `._scope[_nursery]` only after
the `Started` has arrived and audited.
2024-05-28 16:11:01 -04:00
Tyler Goodlet 2db03444f7 Don't (noisily) log about runtime cancel RPC tasks
Since in the case of the `Actor._cancel_task()` related runtime eps we
actually don't EVER register them in `Actor._rpc_tasks`.. logging about
them is just needless noise, though maybe we should track them in a diff
table; something like a `._runtime_rpc_tasks`?

Drop the cancel-request-for-stale-RPC-task (`KeyError` case in
`Actor._cancel_task()`) log-emit level in to `.runtime()`; it's
generally not useful info other than for granular race condition eval
when hacking the runtime.
2024-05-28 16:03:36 -04:00
Tyler Goodlet a1b124b62b Raise send-side MTEs inline in `PldRx.dec_msg()`
So when `is_started_send_side is True` we raise the newly created
`MsgTypeError` (MTE) directly instead of doing all the `Error`-msg pack
and unpack to raise stuff via `_raise_from_unexpected_msg()` since the
raise should happen send side anyway and so doesn't emulate any remote
fault like in a bad `Return` or `Started` without send-side pld-spec
validation.

Oh, and proxy-through the `hide_tb: bool` input from `.drain_to_final_msg()`
to `.recv_msg_w_pld()`.
2024-05-28 16:02:51 -04:00
Tyler Goodlet 59ca256183 Set remote errors in `_raise_from_unexpected_msg()`
By calling `Context._maybe_cancel_and_set_remote_error(exc)` on any
unpacked `Error` msg; provides for `Context.maybe_error` consistency to
match all other error delivery cases.
2024-05-28 15:48:01 -04:00
Tyler Goodlet 6c2efc96dc Factor `.started()` validation into `.msg._ops`
Filling out the helper `validate_payload_msg()` staged in a prior commit
and adjusting all imports to match.

Also add a `raise_mte: bool` flag for potential usage where the caller
wants to handle the MTE instance themselves.
2024-05-28 11:08:27 -04:00
Tyler Goodlet 7ac730e326 Drop `msg.types.Msg` for new replacement types
The `TypeAlias` for the msg type-group is now `MsgType` and any user
touching shuttle messages can now be typed as `PayloadMsg`.

Relatedly, add MTE specific `Error._bad_msg[_as_dict]` fields which are
handy for introspection of remote decode failures.
2024-05-28 09:55:16 -04:00
Tyler Goodlet 582144830f Parameterize the `return_msg_type` in `._invoke()`
Since we also handle a runtime-specific `CancelAck`, allow the
caller-scheduler to pass in the expected return-type msg per the RPC msg
endpoint loop.
2024-05-28 09:36:26 -04:00
Tyler Goodlet eee4c61b51 Add `MsgTypeError` "bad msg" capture
Such that if caught by user code and/or the runtime we can introspect
the original msg which caused the type error. Previously this was kinda
half-baked with a `.msg_dict` which was delivered from an `Any`-decode
of the shuttle msg in `_mk_msg_type_err()` but now this more explicitly
refines the API and supports both `PayloadMsg`-instance or the msg-dict
style injection:
- allow passing either of `bad_msg: PayloadMsg|None` or
  `bad_msg_as_dict: dict|None` to `MsgTypeError.from_decode()`.
- expose public props for both ^ whilst dropping prior `.msgdict`.
- rework `.from_decode()` to explicitly accept `**extra_msgdata: dict`
  |_ only overriding it from any `bad_msg_as_dict` if the keys are found in
    `_ipcmsg_keys`, **except** for `_bad_msg` when `bad_msg` is passed.
  |_ drop `.ipc_msg` passthrough.
  |_ drop `msgdict` input.
- adjust `.cid` to only pull from the `.bad_msg` if set.

Related fixes/adjustments:
- `pack_from_raise()` should pull `boxed_type_str` from
  `boxed_type.__name__`, not the `type()` of it.. also add a
  `hide_tb: bool` flag.
- don't include `_msg_dict` and `_bad_msg` in the `_body_fields` set.
- allow more granular boxed traceback-str controls:
  |_ allow passing a `tb_str: str` explicitly in which case we use it
    verbatim and presume caller knows what they're doing.
  |_ when not provided, use the more explicit
    `traceback.format_exception(exc)` since the error instance is
    a required input (we still fail back to the old `.format_exc()` call
    if for some reason the caller passes `None`; but that should be
    a bug right?).
  |_ if a `tb: TracebackType` and a `tb_str` is passed, concat them.
- in `RemoteActorError.pformat()` don't indent the `._message` part used
  for the `body` when `with_type_header == False`.
- update `_mk_msg_type_err()` to use `bad_msg`/`bad_msg_as_dict`
  appropriately and drop passing `ipc_msg`.
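
Two of the fixes above (taking the boxed type's own `.__name__` and
formatting the traceback from the error instance unless a `tb_str` is
passed) can be sketched like so. `box_error()` is a hypothetical helper
made up for this example, not the real `pack_from_raise()` or
`pack_error()` signature.

```python
from __future__ import annotations
import traceback


def box_error(
    exc: BaseException,
    tb_str: str|None = None,
) -> dict:
    '''
    Hypothetical packer: returns the fields discussed above.

    '''
    boxed_type = type(exc)

    if tb_str is None:
        # format from the error instance itself; python 3.10+ also
        # accepts the single-arg `traceback.format_exception(exc)`.
        tb_str = ''.join(
            traceback.format_exception(
                type(exc),
                exc,
                exc.__traceback__,
            )
        )
    # else: use the caller's `tb_str` verbatim

    return {
        # the class' own name, NOT `type(boxed_type).__name__` which
        # is the metaclass name (normally just 'type').
        'boxed_type_str': boxed_type.__name__,
        'tb_str': tb_str,
    }


try:
    raise ValueError('bad pld')
except ValueError as err:
    packed = box_error(err)
    verbatim = box_error(err, tb_str='custom tb')
```

The `type()`-of-a-type mistake is easy to miss since it never raises; it
just reports `'type'` for every boxed error.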
2024-05-27 22:36:05 -04:00
Tyler Goodlet 42ba855d1b More correct/explicit `.started()` send-side validation
In the sense that we handle it as a special case that's exposed
through to `PldRx.dec_msg()` with a new `is_started_send_side: bool`.

(Non-ideal) `Context.started()` impl deats:
- only do send-side pld-spec validation when a new `validate_pld_spec`
  is set (by default it's not).
- call `self.pld_rx.dec_msg(is_started_send_side=True)` to validate the
  payload field from the just codec-ed `Started` msg's `msg_bytes` by
  passing the `roundtripped` msg (with its `.pld: Raw`) directly.
- add a `hide_tb: bool` param and proxy it to the `.dec_msg()` call.

(Non-ideal) `PldRx.dec_msg()` impl deats:
- for now we're packing the MTE inside an `Error` via a manual call to
  `pack_error()` and then setting that as the `msg` passed to
  `_raise_from_unexpected_msg()` (though really we should just raise
  inline?).
- manually set the `MsgTypeError._ipc_msg` to the above..

Other,
- more comprehensive `Context` type doc string.
- various `hide_tb: bool` kwarg additions through `._ops.PldRx` meths.
- proto a `.msg._ops.validate_payload_msg()` helper planned to get the
  logic from this version of `.started()`'s send-side validation so as
  to be useful more generally elsewhere.. (like for raising back
  `Return` values on the child side?).

Warning: this commit may have been made out of order from required
changes to `._exceptions` which will come in a follow up!
2024-05-27 14:59:40 -04:00
Tyler Goodlet e4ec6b7b0c Even smarter `RemoteActorError.pformat()`-ing
Related to the prior patch, re the new `with_type_header: bool`:
- in the `with_type_header == True` use case make sure we keep the first
  `._message: str` line non-indented since it'll show just after the
  header-line's type path with ':'.
- when `False` drop the `)>` `repr()`-instance style as well so that we
  just get the ascii boxed traceback as though it's the error
  message-`str` not the `repr()` of the error obj.

Other,
- hide `pack_from_raise()` call frame since it'll show in debug mode
  crash handling..
- mk `MsgTypeError.from_decode()` explicitly accept and proxy an
  optional `ipc_msg` and change `msgdict` to also be optional, only
  reading out the `**extra_msgdata` when provided.
- expose a `_mk_msg_type_err(src_err_msg: Error|None = None,)` for
  callers who wish to inject a `._ipc_msg: MsgType` into the MTE.
  |_ add a note how we can't use it due to a causality-dilemma when pld
     validating `Started` on the send side..
2024-05-22 15:26:48 -04:00
Tyler Goodlet 9ce958cb4a Add debug check-n-wait inside `._spawn.soft_kill()`
And IFF the `await wait_func(proc)` is cancelled such that we avoid
clobbering some subactor that might be REPL-ing even though its parent
actor is in the midst of (gracefully) cancelling it.
2024-05-22 15:21:01 -04:00
Tyler Goodlet ce4d64ed2f Mk `MsgDec.spec_str` have a more compact ` 2024-05-22 15:18:45 -04:00
Tyler Goodlet c6f599b1be Call `.devx._debug.hide_runtime_frames()` by default
From both `open_root_actor()` and `._entry._trio_main()`.

Other `breakpoint()`-from-sync-func fixes:
- properly disable the default hook using `"0"` XD
- offer a `hide_tb: bool` from `open_root_actor()`.
- disable hiding the `._trio_main()` frame, bc pretty sure it doesn't
  help anyone (either way) when REPL-ing/tb-ing from a subactor..?
2024-05-22 15:16:29 -04:00
Tyler Goodlet 9eb74560ad Port `Actor._stream_handler()` to use `.has_outcome`, fix indent bug.. 2024-05-22 15:10:39 -04:00
Tyler Goodlet d15e73557a Move runtime frame hiding into helper func
Call it `hide_runtime_frames()` and stick all the lines from the top of
the `._debug` mod in there along with a little `log.devx()` emission on
what gets hidden by default ;)

Other,
- fix ref-error where internal-error handler might trigger despite the
  debug `req_ctx` not yet having init-ed, such that we don't try to
  cancel or log about it when it never was fully created/initialize..
- fix assignment typo inside `_set_trace()` for `task`.. lel
2024-05-22 14:56:54 -04:00
Tyler Goodlet 74d4b5280a Woops, make `log.devx()` level less `.error()` 2024-05-22 14:56:18 -04:00
Tyler Goodlet 3538ccd799 Better context aware `RemoteActorError.pformat()`
Such that when displaying with `.__str__()` we do not show the type
header (style) since normally python's raising machinery already prints
the type path like `'tractor._exceptions.RemoteActorError:'`, so doing
it 2x is a bit ugly ;p

In support,
- include `.relay_uid` in `RemoteActorError.extra_body_fields`.
- offer a `with_type_header: bool` to `.pformat()` and only put the
  opening type path and closing `')>'` tail line when `True`.
- add `.is_inception() -> bool:` for an easy way to determine if the
  error is multi-hop relayed.
- only repr the `'|_relay_uid=<uid>'` field when an error is an inception.
- tweak the invalid-payload case in `_mk_msg_type_err()` to explicitly
  state in the `message` how the `any_pld` value does not match the `MsgDec.pld_spec`
  by decoding the invalid `.pld` with an any-dec.
- allow `_mk_msg_type_err(**mte_kwargs)` passthrough.
- pass `boxed_type=cls` inside `MsgTypeError.from_decode()`.
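
The `with_type_header` idea can be sketched with a toy exception type.
`BoxedError` below is hypothetical and far simpler than the real
`RemoteActorError`; the exact header and tail strings are assumptions
chosen only to show the `str()` vs. `repr()` split.

```python
class BoxedError(Exception):
    '''
    Hypothetical stand-in for `RemoteActorError`.

    '''
    def __init__(self, message: str):
        super().__init__(message)
        self._message = message

    def pformat(
        self,
        with_type_header: bool = True,
    ) -> str:
        body: str = f'  {self._message}'
        if not with_type_header:
            return body

        # only wrap with the opening type name and closing tail
        # when the header is requested.
        return (
            f'<{type(self).__name__}(\n'
            f'{body}\n'
            ')>'
        )

    def __str__(self) -> str:
        # python's raising machinery already prints the type path
        # before `str(exc)`, so don't show it twice.
        return self.pformat(with_type_header=False)

    def __repr__(self) -> str:
        return self.pformat(with_type_header=True)


err = BoxedError('remote task failed')
as_str = str(err)
as_repr = repr(err)
```

An uncaught `BoxedError` is then printed with a single type prefix followed
by the body, while logging via `repr()` still shows the full instance-style
form.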
2024-05-22 10:35:42 -04:00
Tyler Goodlet b22f7dcae0 Resolve remaining debug-request race causing hangs
More or less by pedantically separating and managing root and subactor
request syncing events to always be managed by the locking IPC context
task-funcs:
- for the root's "child"-side, `lock_tty_for_child()` directly creates
  and sets a new `Lock.req_handler_finished` inside a `finally:`
- for the sub's "parent"-side, `request_root_stdio_lock()` does the same
  with a new `DebugStatus.req_finished` event and separates it from
  the `.repl_release` event (which indicates a "c" or "q" from user and
  thus exit of the REPL session) as well as sets a new `.req_task:
  trio.Task` to explicitly distinguish from the app-user-task that
  enters the REPL vs. the paired bg task used to request the global
  root's stdio mutex alongside it.
- apply the `__pld_spec__` on "child"-side of the ctx using the new
  `Portal.open_context(pld_spec)` parameter support; drops use of any
  `ContextVar` malarky used prior for `PldRx` mgmt.
- removing `Lock.no_remote_has_tty` since it was a nebulous name and
  from the prior "everything is in a `Lock`" design..

------ - ------

More rigorous impl to handle various edge cases in `._pause()`:
- rejig `_enter_repl_sync()` to wrap the `debug_func == None` case
  inside maybe-internal-error handler blocks.
- better logic for recurrent vs. multi-task contention for REPL entry in
  subactors, by guarding using `DebugStatus.req_task` and by now waiting
  on the new `DebugStatus.req_finished` for the multi-task contention
  case.
- even better internal error handling and reporting for when this code
  is hacked on and possibly broken ;p

------ - ------

Updates to `.pause_from_sync()` support:
- add optional `actor`, `task` kwargs to `_set_trace()` to allow
  compat with the new explicit `debug_func` calling in `._pause()` and
  pass a `threading.Thread` for `task` in the `.to_thread()` usage case.
- add an `except` block that tries to show the frame on any internal
  error.

------ - ------

Relatedly includes a buncha cleanups/simplifications somewhat in
prep for some coming refinements (around `DebugStatus`):
- use all the new attrs mentioned above as needed in the SIGINT shielder.
- wait on `Lock.req_handler_finished` in `maybe_wait_for_debugger()`.
- dropping a ton of masked legacy code left in during the recent reworks.
- better comments, like on the use of `Context._scope` for shielding on
  the "child"-side to avoid the need to manage yet another cs.
- add/change-to lotsa `log.devx()` level emissions for those infos which
  are handy while hacking on the debugger but not ideal/necessary to be
  user visible.
- obvi add lotsa follow up todo notes!
2024-05-21 10:19:41 -04:00
Tyler Goodlet fde62c72be Show runtime nursery frames on internal errors
Much like other recent changes attempt to detect runtime-bug-causing
crashes and only show the runtime-endpoint frame when present.

Adds a `ActorNursery._scope_error: BaseException|None` attr to aid with
detection. Also toss in some todo notes for removing and replacing the
`.run_in_actor()` method API.
2024-05-20 17:04:30 -04:00
Tyler Goodlet 4ef77bb64f Set `_ctxvar_Context` for child-side RPC tasks
Just inside `._invoke()` after the `ctx: Context` is retrieved.

Also try our best to *not hide* internal frames when a non-user-code
crash happens, normally either due to a runtime RPC EP bug or
a transport failure.
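
Roughly, setting a per-task context variable just after the ctx is
retrieved looks like the stdlib-only sketch below. The names `_ctxvar_ctx`,
`current_ctx()`, `invoke()` and the `error_on_not_set` flag are all
hypothetical stand-ins for this example, and a plain `str` stands in for
the `Context` instance.

```python
from __future__ import annotations
from contextvars import (
    ContextVar,
    copy_context,
)

# hypothetical module-level var; intentionally has no default so
# that "never set" is detectable.
_ctxvar_ctx: ContextVar[str] = ContextVar('ipc_ctx')


def current_ctx(
    error_on_not_set: bool = False,
) -> str|None:
    try:
        return _ctxvar_ctx.get()
    except LookupError:
        if error_on_not_set:
            raise RuntimeError('No IPC ctx is set for this task!')
        return None


def invoke(ctx: str) -> str|None:
    # set just after the ctx is "retrieved" so that any code called
    # from this task can look it up without it being passed around.
    _ctxvar_ctx.set(ctx)
    return current_ctx()


# run in a copied context to mimic a per-task variable scope
inside = copy_context().run(invoke, 'ctx-1234')

# the surrounding (non-RPC) scope never sees the value
outside = current_ctx()
```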
2024-05-20 16:23:29 -04:00
Tyler Goodlet e78fdf2f69 Make `log.devx()` level below `.pdb()`
Kinda like a "runtime"-y level for `.pdb()` (which is more or less like
an `.info()` for our debugger subsys) which can be used to report
internals info for those hacking on `.devx` tools.

Also, inject only the *last* 6 digits of the `id(Task)` in
`pformat_task_uid()` output by default.
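
As a tiny sketch of the id-truncation mentioned above (assuming a made-up
`short_task_id()` helper and a dummy `FakeTask` class; the real
`pformat_task_uid()` output format is not reproduced here):

```python
def short_task_id(
    task: object,
    tail_len: int = 6,
) -> str:
    # hypothetical helper: keep only the trailing digits of the
    # object's `id()` so log headers stay compact.
    return str(id(task))[-tail_len:]


class FakeTask:
    '''
    Dummy stand-in for a `trio.Task`.

    '''


task = FakeTask()
tag = short_task_id(task)
```

The trailing digits tend to vary most between live objects, which keeps
tasks distinguishable in logs, though collisions are of course possible.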
2024-05-20 16:13:57 -04:00
Tyler Goodlet 13bc3c308d Add error suppress flag to `current_ipc_ctx()` 2024-05-20 16:12:51 -04:00
Tyler Goodlet 60fc43e530 Shield channel closing in `_connect_chan()` 2024-05-20 16:11:59 -04:00
Tyler Goodlet 30afcd2b6b Adjust `Portal` usage of `Context.pld_rx`
Pass the new `ipc` arg and try to show api frames when an unexpected
internal error is detected.
2024-05-20 16:07:57 -04:00
Tyler Goodlet c80f020ebc Expose `tractor.current_ipc_ctx()` at pkg level 2024-05-20 15:47:01 -04:00
Tyler Goodlet 262a0e36c6 Allocate a `PldRx` per `Context`, new pld-spec API
Since the state mgmt becomes quite messy with multiple sub-tasks inside
an IPC ctx, AND bc generally speaking the payload-type-spec should map
1-to-1 with the `Context`, it doesn't make a lot of sense to be using
`ContextVar`s to modify the `Context.pld_rx: PldRx` instance.

Instead, always allocate a full instance inside `mk_context()` with the
default `.pld_rx: PldRx` set to use the `msg._ops._def_any_pldec: MsgDec`

In support, simplify the `.msg._ops` impl and APIs:
- drop `_ctxvar_PldRx`, `_def_pld_rx` and `current_pldrx()`.
- rename `PldRx._pldec` -> `._pld_dec`.
- rename the unused `PldRx.apply_to_ipc()` -> `.wraps_ipc()`.
- add a required `PldRx._ctx: Context` attr since it is needed
  internally in some meths and each pld-rx now maps to a specific ctx.
- modify all recv methods to accept a `ipc: Context|MsgStream` (instead
  of a `ctx` arg) since both have a ref to the same `._rx_chan` and there
  are only a couple spots (in `.dec_msg()`) where we need the `ctx`
  explicitly (which can now be easily accessed via a new `MsgStream.ctx`
  property, see below).
- always show the `.dec_msg()` frame in tbs if there's a reference error
  when calling `_raise_from_unexpected_msg()` in the fallthrough case.
- implement `limit_plds()` as light wrapper around getting the
  `current_ipc_ctx()` and mutating its `MsgDec` via
  `Context.pld_rx.limit_plds()`.
- add a `maybe_limit_plds()` which just provides an `@acm` equivalent of
  `limit_plds()` handy for composing in a `async with ():` style block
  (avoiding additional indent levels in the body of async funcs).

Obvi extend the `Context` and `MsgStream` interfaces as needed
to match the above:
- add a `Context.pld_rx` pub prop.
- new private refs to `Context._started_msg: Started` and
  a `._started_pld` (mostly for internal debugging / testing / logging)
  and set inside `.open_context()` immediately after the syncing phase.
- a `Context.has_outcome() -> bool:` predicate which can be used to more
  easily determine if the ctx errored or has a final result.
- pub props for `MsgStream.ctx: Context` and `.chan: Channel` providing
  full `ipc`-arg compat with the `PldRx` method signatures.
2024-05-20 15:46:28 -04:00
Tyler Goodlet d93135acd8 Include truncated `id(trio.Task)` for task info in log header 2024-05-15 09:36:22 -04:00
Tyler Goodlet b23780c102 Make `request_root_stdio_lock()` post-mortem-able
Finally got this working so that if/when an internal bug is introduced
to this request task-func, we can actually REPL-debug the lock request
task itself B)

As in, if the subactor's lock request task internally errors we,
- ensure the task always terminates (by calling `DebugStatus.release()`)
  and explicitly reports (via a `log.exception()`) the internal error.
- capture the error instance and set as a new `DebugStatus.req_err` and
  always check for it on final teardown - in which case we also,
 - ensure it's reraised from a new `DebugRequestError`.
 - unhide the stack frames for `_pause()`, `_enter_repl_sync()` so that
   the dev can upward inspect the `_pause()` call stack sanely.

Supporting internal impl changes,
- add `DebugStatus.cancel()` and `.req_err`.
- never cancel the request task from
  `PdbREPL.set_[continue/quit]()` unless there's some internal error
  that would likely result in a hang and stale lock state with the root.
- only release the root's lock when the current task is also the owner
  (avoids bad release errors).
- also show internal `._pause()`-related frames on any `repl_err`.

Other temp-dev-tweaks,
- make pld-dec change log msgs info level again while solving this
  final context-vars race stuff..
- drop the debug pld-dec instance match asserts for now since
  the problem is already caught (and now debug-able B) by an attr-error
  on the decoded-as-`dict` started msg, and instead add in
  a `log.exception()` trace to see which task is triggering the case
  where the debug `MsgDec` isn't set correctly vs. when we think it's
  being applied.
2024-05-14 21:01:20 -04:00
Tyler Goodlet 31de5f6648 Always release debug request from `._post_mortem()`
Since obviously the thread is likely expected to halt and raise after
the REPL session exits; this was a regression from the prior impl. The
main reason for this is that otherwise the request task will never
unblock if the user steps through the crashed task using 'next' since
the `.do_next()` handler doesn't by default release the request since in
the `.pause()` case this would end the session too early.

Other,
- toss in draft `Pdb.user_exception()`, though doesn't seem to ever
  trigger?
- only release `Lock._debug_lock` when already locked.
2024-05-14 11:39:04 -04:00
Tyler Goodlet 236083b6e4 Rename `.msg.types.Msg` -> `PayloadMsg` 2024-05-10 13:15:45 -04:00
Tyler Goodlet fc075e96c6 Hide some API frames, port to new `._debug` apis
- start tossing in `__tracebackhide__`s to various eps which don't need
  to show in tbs or in the pdb REPL.
- port final `._maybe_enter_pm()` to pass a `api_frame`.
- start comment-marking up some API eps with `@api_frame`
  in prep for actually using the new frame-stack tracing.
2024-05-09 16:04:34 -04:00
Tyler Goodlet d6ca4771ce Use `.recv_msg_w_pld()` for final `Portal.result()`
Woops, due to a `None` test against the `._final_result`, any actual
final `None` result would be received but not acked as such causing
a spawning test to hang. Fix it by instead receiving and assigning both
a `._final_result_msg: PayloadMsg` and `._final_result_pld`.

NB: as mentioned in many recent comments surrounding this API layer,
really this whole `Portal`-has-final-result interface/semantics should
be entirely removed as should the `ActorNursery.run_in_actor()` API(s).
Instead it should all be replaced by a wrapping "high level" API
(`tractor.hilevel` ?) which combines a task nursery, `Portal.open_context()`
and underlying `Context` APIs + an `outcome.Outcome` to accomplish the
same "run a single task in a spawned actor and return its result"; aka
a "one-shot-task-actor".
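
The `None`-test bug described at the top of this message can be sketched as
below: track the received *msg* separately from its payload so that a
legit `None` payload is never mistaken for "nothing received yet".
`ResultBox` and its `dict`-shaped msg are hypothetical stand-ins, not the
real `Portal` or `PayloadMsg` types.

```python
from __future__ import annotations


class ResultBox:
    '''
    Hypothetical stand-in for the final-result caching logic.

    '''
    def __init__(self):
        self._final_result_msg: dict|None = None
        self._final_result_pld = None
        self.recv_count: int = 0

    def _recv(self, pld) -> dict:
        # pretend to receive the final msg from the remote side
        self.recv_count += 1
        return {'pld': pld}

    def result(self, remote_pld):
        # test the *msg*, never the payload: checking
        # `._final_result_pld is None` would re-receive forever
        # whenever the remote task legitimately returned `None`.
        if self._final_result_msg is None:
            msg = self._recv(remote_pld)
            self._final_result_msg = msg
            self._final_result_pld = msg['pld']

        return self._final_result_pld


box = ResultBox()
first = box.result(None)
second = box.result(None)
```

With both attrs stored, the second `.result()` call returns the cached
`None` without attempting another receive.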
2024-05-09 09:47:13 -04:00
Tyler Goodlet c5a0cfc639 Rename `.msg.types.Msg` -> `PayloadMsg` 2024-05-08 15:07:34 -04:00
Tyler Goodlet f85314ecab Adjust `._runtime` to report `DebugStatus.req_ctx`
- inside the `Actor.cancel()`'s maybe-wait-on-debugger delay,
  report the full debug request status and its affiliated lock request
  IPC ctx.
- use the new `.req_ctx.chan.uid` to do the local nursery lookup during
  channel teardown handling.
- another couple log fmt tweaks.
2024-05-08 15:06:50 -04:00
Tyler Goodlet 6690968236 Rework and first draft of `.devx._frame_stack.py`
Proto-ing a little suite of call-stack-frame annotation-for-scanning
sub-systems for the purposes of both,
- the `.devx._debug`er and its
  traceback and frame introspection needs when entering the REPL,
- detailed trace-style logging such that we can explicitly report
  on "which and where" `tractor`'s APIs are used in the "app" code.

Deats:
- change mod name obvi from `._code` and adjust client mod imports.
- using `wrapt` (for perf) implement a `@api_frame` annot decorator
  which both stashes per-call-stack-frame instances of `CallerInfo` in
  a table and marks the function such that API endpoints can be easily
  found via runtime stack scanning despite any internal impl changes.
- add a global `_frame2callerinfo_cache: dict[FrameType, CallerInfo]`
  table for providing the per func-frame info caching.
- Re-implement `CallerInfo` to require less (types of) inputs:
  |_ `_api_func: Callable`, a ref to the (singleton) func def.
  |_ `_api_frame: FrameType` taken from the `@api_frame` marked `tractor`-API
     func's runtime call-stack, from which we can determine the
     app code's `.caller_frame`.
  |_`_caller_frames_up: int|None` allowing the specific `@api_frame` to
    determine "how many frames up" the application / calling code is.
  And, a better set of derived attrs:
  |_`caller_frame: FrameType` which finds and caches the API-eps calling
    frame.
- add a new attempt at "getting a method ref from its runtime frame"
  with `get_ns_and_func_from_frame()` using a heuristic that the
  `CodeType.co_qualname: str` should have a "." in it for methods.
  - main issue is still that the func-ref lookup will require searching
    for the method's instance type by name, and that name isn't
    guaranteed to be defined in any particular ns..
   |_rn we try to read it from the `FrameType.f_locals` but that is
     going to obvi fail any time the method is called in a module where
     its type is not also defined/imported.
  - returns both the ns and the func ref FYI.
2024-05-08 14:51:56 -04:00
Tyler Goodlet 343b7c9712 Even moar bitty `Context` refinements
- set `._state._ctxvar_Context` just after `StartAck` inside
  `open_context_from_portal()` so that `current_ipc_ctx()` always
  works on the 'parent' side.
- always set `.canceller` to any `MsgTypeError.src_uid` and otherwise to
  any maybe-detected `.src_uid` (i.e. for RAEs).
- always set `.canceller` to us when we rx a ctxc which reports us as
  its canceller; this is a sanity check on definite "self cancellation".
- adjust `._is_self_cancelled()` logic to only be `True` when
  `._remote_error` is both a ctxc with a `.canceller` set to us AND
  when `Context.canceller` is also set to us (since the change above)
  as a little bit of extra rigor.
- fill-in/fix some `.repr_state` edge cases:
  - merge self-vs.-peer ctxc cases to one block and distinguish via
    nested `._is_self_cancelled()` check.
  - set 'errored' for all exception matched cases despite `.canceller`.
  - add pre-`Return` phase statuses:
   |_'pre-started' and 'syncing-to-child' depending on side and when
     `._stream` has not (yet) been set.
   |_'streaming' and 'streaming-finished' depending on side when
     `._stream` is set and whether it was stopped/closed.
- tweak drainage log-message to use "outcome" instead of "result".
- use new `.devx.pformat.pformat_cs()` inside `_maybe_cancel_and_set_remote_error()`
  but, IFF the log level is at least 'cancel'.
2024-05-08 14:02:56 -04:00
Tyler Goodlet 45f37870af Add a `.log.at_least_level()` predicate 2024-05-08 13:33:59 -04:00
Tyler Goodlet 4d528b76a0 Move `_debug.pformat_cs()` into `devx.pformat` 2024-05-08 13:30:15 -04:00
Tyler Goodlet 05b143d9ef Big debugger rework, more tolerance for internal err-hangs
Since i was running into them (internal errors) during lock request
machinery dev and was getting all sorts of difficult to understand hangs
whenever i intro-ed a bug to either side of the ipc ctx; this all while
trying to get the msg-spec working for `Lock` requesting subactors..

Deats:
- hideframes for `@acm`s and `trio.Event.wait()`, `Lock.release()`.
- better detail out the `Lock.acquire/release()` impls
- drop `Lock.remote_task_in_debug`, use new `.ctx_in_debug`.
- add a `Lock.release(force: bool)`.
- move most of what was `_acquire_debug_lock_from_root_task()` and some
  of the `lock_tty_for_child().__a[enter/exit]()` logic into
  `Lock.[acquire/release]()`, including a bunch more logging.
- move `lock_tty_for_child()` up in the module to below `Lock`, with
  some rework:
  - drop `subactor_uid: tuple` arg since we can just use the `ctx`..
  - add exception handler blocks for reporting internal (impl) errors
    and always force release the lock in such cases.
- extend `DebugStatus` (prolly will rename to `DebugRequest` btw):
  - add `.req_ctx: Context` for subactor side.
  - add `.req_finished: trio.Event` to sub to signal request task exit.
  - extend `.shield_sigint()` doc-str.
  - add `.release()` to encaps all the state mgmt previously strewn
    about inside `._pause()`..
- use new `DebugStatus.release()` to replace all the duplication:
  - inside `PdbREPL.set_[continue/quit]()`.
  - inside `._pause()` for the subactor branch on internal
    repl-invocation error cases,
  - in the `_enter_repl_sync()` closure on error,
- replace `apply_debug_codec()` -> `apply_debug_pldec()` in tandem with
  the new `PldRx` sub-sys which handles the new `__pld_spec__`.
- add a new `pformat_cs()` helper, orig to help debug a cs stack
  corruption; going to move to `.devx.pformat` obvi.
- rename `wait_for_parent_stdin_hijack()` -> `request_root_stdio_lock()`
  with improvements:
  - better doc-str and add todos,
  - use `DebugStatus` more stringently to encaps all subactor req state.
  - error handling blocks for cancellation and straight up impl errors
    directly around the `.open_context()` block with the latter doing
    a `ctx.cancel()` to avoid hanging in the shielded `.req_cs` scope.
  - similar exc blocks for the func's overall body with explicit
    `log.exception()` reporting.
  - only set the new `DebugStatus.req_finished: trio.Event` in `finally`.
- rename `mk_mpdb()` -> `mk_pdb()` and don't call `.shield_sigint()`
  implicitly since the caller usage does matter for this.
- factor out `any_connected_locker_child()` from the SIGINT handler.
- rework SIGINT handler to better handle any stale-lock/hang cases:
  - use new `Lock.ctx_in_debug: Context` to detect subactor-in-debug
    and use it to cancel any lock request instead of the lower level
  - use `problem: str` summary approach to log emissions.
- rework `_pause()` given all of the above, stuff not yet mentioned:
  - don't take `shield: bool` input and proxy to `debug_func()` (for now).
  - drop `extra_frames_up_when_async: int` usage, expect
    `**debug_func_kwargs` to passthrough an `api_frame: FrameType` (more
    on this later).
  - lotsa asserts around the request ctx vs. task-in-debug ctx using new
    `current_ipc_ctx()`.
  - asserts around `DebugStatus` state.
- rework and simplify the `debug_func` hooks,
  `_set_trace()`/`_post_mortem()`:
  - make them accept a non-optional `repl: PdbREPL` and `api_frame:
    FrameType` which should be used to set the current frame when the
    REPL engages.
  - always hide the hook frames.
  - always accept a `tb: TracebackType` to `_post_mortem()`.
   |_ copy and re-impl what was the delegation to
     `pdbp.xpm()`/`pdbp.post_mortem()` and instead call the
     underlying `Pdb.interaction()` ourselves with a `caller_frame`
     and tb instance.
- adjust the public `.pause()` impl:
  - accept optional `hide_tb` and `api_frame` inputs.
  - mask opening a cancel-scope for now (can cause `trio` stack
    corruption, see notes) and thus don't use the `shield` input other
    than to eventually passthrough to `_post_mortem()`?
   |_ thus drop `task_status` support for now as well.
   |_ pretty sure correct soln is a debug-nursery around `._invoke()`.
- since no longer using `extra_frames_up_when_async` inside
  `debug_func()`s ensure all public apis pass an `api_frame`.
- re-impl our `tractor.post_mortem()` to directly call into `._pause()`
  instead of binding in via `partial` and mk it take similar input as
  `.pause()`.
- drop `Lock.release()` from `_maybe_enter_pm()`, expose and pass
  expected frame and tb.
- use necessary changes from all the above within
  `maybe_wait_for_debugger()` and `acquire_debug_lock()`.

Lel, sorry thought that would be shorter..
There's still a lot more re-org to do particularly with `DebugStatus`
encapsulation but it's coming in follow up.
2024-05-08 11:44:55 -04:00
Tyler Goodlet a354732a9e Allow `Stop` passthrough from `PldRx.recv_msg_w_pld()`
Since we need to allow it (at the least) inside
`drain_to_final_msg()` for handling stream-phase termination races
where we don't want to have to handle a raised error from something like
`Context.result()`. Expose the passthrough option via
a `passthrough_non_pld_msgs: bool` kwarg.

Add comprehensive comment to `current_pldrx()`.
2024-05-08 08:50:16 -04:00
Tyler Goodlet fbc21a1dec Add a "current IPC `Context`" `ContextVar`
Expose it from `._state.current_ipc_ctx()` and set it inside
`._rpc._invoke()` for child and inside `Portal.open_context()` for
parent.

Still need to write a few more tests (particularly demonstrating usage
throughout multiple nested nurseries on each side) but this suffices as
a proto for testing with some debugger request-from-subactor stuff.

Other,
- use new `.devx.pformat.add_div()` for ctxc messages.
- add a block to always traceback dump on corrupted cs stacks.
- better handle non-RAEs exception output-formatting in context
  termination summary log message.
- use a summary for `start_status` for msg logging in RPC loop.
2024-05-07 15:35:45 -04:00
Tyler Goodlet b278164f83 Mk `drain_to_final_msg()` never raise from `Error`
Since we usually want them raised from some (internal) call to
`Context.maybe_raise()` and NOT directly from the drainage call, make it
possible via a new `raise_error: bool` to both `PldRx.recv_msg_w_pld()`
and `.dec_msg()`.

In support,
- rename `return_msg` -> `result_msg` since we expect to return
  `Error`s.
- do a `result_msg` assign and `break` in the `case Error()`.
- add `**dec_msg_kwargs` passthrough for other `.dec_msg()` calling
  methods.

Other,
- drop/aggregate todo-notes around the main loop's
  `ctx._pld_rx.recv_msg_w_pld()` call.
- add (configurable) frame hiding to most payload receive meths.
2024-05-06 13:43:51 -04:00
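
The idea of capturing an `Error` as the result instead of raising it from the drain loop can be sketched as follows. Everything here is a simplified, synchronous stand-in: the message dataclasses, `RemoteError`, and both function signatures are assumptions for illustration, not the real `PldRx` or msg-type APIs:

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterable


@dataclass
class Yield:
    pld: Any


@dataclass
class Return:
    pld: Any


@dataclass
class Error:
    message: str


class RemoteError(Exception):
    '''Hypothetical exception raised when an `Error` msg is decoded.'''


def dec_msg(msg: Any, raise_error: bool = True) -> Any:
    # Only raise on `Error` when the caller asks for it.
    if isinstance(msg, Error) and raise_error:
        raise RemoteError(msg.message)
    return msg


def drain_to_final_msg(msgs: Iterable[Any]) -> tuple[Any, list[Any]]:
    result_msg = None
    pre_result_drained: list[Any] = []
    for raw in msgs:
        # Never raise here; the caller decides later whether to raise.
        msg = dec_msg(raw, raise_error=False)
        if isinstance(msg, (Return, Error)):
            result_msg = msg
            break
        pre_result_drained.append(msg)
    return result_msg, pre_result_drained
```

With this shape the drain call always hands back the final message, and a separate step (analogous to `Context.maybe_raise()` in the text above) remains the single place where a remote error is turned into a raised exception.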
Tyler Goodlet 8ffa6a5e68 "Icons" in `._entry`'s subactor `.info()` messages
Add a little `>` or `X` supervision icon indicating the spawning or
termination of each sub-actor respectively.
2024-05-06 13:12:44 -04:00
Tyler Goodlet 7707e0e75a Woops, make `log.devx()` level 600 2024-05-06 13:07:53 -04:00
Tyler Goodlet 523c24eb72 Move pformatters into new `.devx.pformat`
Since `._code` is prolly gonna get renamed (to something "frame & stack
tools" related) and to give a bit better organization.

Also adds a new `add_div()` helper, factored out of ctxc message
creation in `._rpc._invoke()`, for adding a little "header line" divider
under a given `message: str` with a little math to center it.
2024-05-06 13:04:58 -04:00
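
A centered "header line" divider of the kind described above could look roughly like this. The signature, the default divider string, and the centering arithmetic are assumptions made for this sketch and may differ from the real `add_div()`:

```python
def add_div(message: str, div_str: str = '------ - ------') -> str:
    '''
    Build a divider line roughly centered under `message`.

    '''
    # Width of the longest line in the (possibly multi-line) message.
    max_width = max((len(line) for line in message.splitlines()), default=0)

    # Left-pad so the divider's midpoint sits under the message's
    # midpoint; never use a negative pad for short messages.
    offset = max((max_width - len(div_str)) // 2, 0)

    return '\n' + ' ' * offset + div_str + '\n'
```

The returned string is meant to be appended directly to the message, for example `message + add_div(message)`, before any further detail lines are added.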
Tyler Goodlet 544ff5ab4c Change to `RemoteActorError.pformat()`
For more sane manual calls as needed for logging purposes. Obvi remap
the dunder methods to it.

Other:
- drop `hide_tb: bool` from `unpack_error()`, shouldn't need it since
  frame won't ever be part of any tb raised from returned error.
- add an `is_invalid_payload: bool` to `_raise_from_unexpected_msg()` to
  be used from `PldRx` where we don't need to decode the IPC
  msg, just the payload; make the error message reflect this case.
- drop commented `._portal._unwrap_msg()` since we've replaced it with
  `PldRx`'s delegation to newer `._raise_from_unexpected_msg()`.
- hide the `Portal.result()` frame by default, again.
2024-05-06 13:01:56 -04:00
Tyler Goodlet 63c23d6b82 Add todo for rigorous struct-type spec of `SpawnSpec` fields 2024-04-30 13:01:07 -04:00
Tyler Goodlet cca3206fd6 Use `log.devx()` for `stackscope` messages 2024-04-30 13:00:03 -04:00