tractor

Commit Graph

Author	SHA1	Message	Date
Tyler Goodlet	bef3dd9e97	Another tweak to REPL entry `.pdb()` headers	2024-07-05 13:32:03 -04:00
Tyler Goodlet	31207f92ee	Finally implement peer-lookup optimization.. There's a been a todo for soo long for this XD Since all `Actor`'s store a set of `._peers` we can try a lookup on that table as a shortcut before pinging the registry Bo Impl deats: - add a new `._discovery.get_peer_by_name()` routine which attempts the `._peers` lookup by combining a copy of that `dict` + an entry added for `Actor._parent_chan` (since all subs have a parent and often the desired contact is just that connection). - change `.find_actor()` (for the `only_first == True` case), `.query_actor()` and `.wait_for_actor()` to call the new helper and deliver appropriate outputs if possible. Other, - deprecate `get_arbiter()` def and all usage in tests and examples. - drop lingering use of `arbiter_sockaddr` arg to various routines. - tweak the `Actor` doc str as well as some code fmting and a tweak to the `._stream_handler()`'s initial `con_status: str` logging value since the way it was could never be reached.. oh and `.warning()` on any new connections which already have a `_pre_chan: Channel` entry in `._peers` so we can start minimizing IPC duplications.	2024-07-04 19:40:11 -04:00
Tyler Goodlet	5f8f8e98ba	More-n-more scops annots in logging	2024-07-04 15:06:15 -04:00
Tyler Goodlet	b56352b0e4	Quieter `Stop` handling on ctx result capture In the `drain_to_final_msg()` impl, since a stream terminating gracefully requires this msg, there's really no reason to `log.cancel()` about it; go `.runtime()` level instead since we're trying de-noise under "normal operation". Also, - passthrough `hide_tb` to taskc-handler's `ctx.maybe_raise()` call. - raise `MessagingError` for the `MsgType` unmatched `case _:`. - detail the doc string motivation a little more.	2024-07-03 22:42:32 -04:00
Tyler Goodlet	9be821a5cf	More failed REPL-lock-request refinements In `lock_stdio_for_peer()` better internal-error handling/reporting: - only `Lock._blocked.remove(ctx.cid)` if that same cid was added on entry to avoid needless key-errors. - drop all `Lock.release(force: bool)` usage remnants. - if `req_ctx.cancel()` fails mention it with `ctx_err.add_note()`. - add more explicit internal-failed-request log messaging via a new `fail_reason: str`. - use and use new `x)<=\n\|_` annots in any failure logging. Other cleanups/niceties: - drop `force: bool` flag entirely from the `Lock.release()`. - use more supervisor-op-annots in `.pdb()` logging with both `_pause/crash_msg: str` instead of double '\|' lines when `.pdb()`-reported from `._set_trace()`/`._post_mortem()`.	2024-07-02 17:06:50 -04:00
Tyler Goodlet	b46400a86f	Use `._entry` proto-ed "lifetime ops" in logging As per a WIP scribbled out TODO in `._entry.nest_from_op()`, change a bunch of "supervisor/lifetime mgmt ops" related log messages to contain some supervisor-annotation "headers" in an effort to give a terser "visual indication" of how some execution/scope/storage primitive entity (like an actor/task/ctx/connection) is being operated on (like, opening/started/closed/cancelled/erroring) from a "supervisor action" POV. Also tweak a bunch more emissions to lower levels to reduce noise around normal inter-actor operations like process and IPC ctx supervision.	2024-07-02 16:31:58 -04:00
Tyler Goodlet	02812b9f51	Reraise RAEs in `MsgStream.receive()`; truncate tbs To avoid showing lowlevel details of exception handling around the underlying call to `return await self._ctx._pld_rx.recv_pld(ipc=self)`, any time a `RemoteActorError` is unpacked (an raised locally) we re-raise it directly from the captured `src_err` captured so as to present to the user/app caller-code an exception raised directly from the `.receive()` frame. This simplifies traceback call-stacks for any `log.exception()` or `pdb`-REPL output filtering out the lower `PldRx` frames by default.	2024-07-02 16:31:15 -04:00
Tyler Goodlet	3c5816c977	Add `Portal.chan` property, to wrap `._chan` attr	2024-07-02 15:53:33 -04:00
Tyler Goodlet	af3745684c	More formal `TransportClosed` reporting/raising Since it was all ad-hoc defined inside `._ipc.MsgpackTCPStream._iter_pkts()` more or less, this starts formalizing a way for particular transport backends to indicate whether a disconnect condition should be re-raised in the RPC msg loop and if not what log level to report it at (if any). Based on our lone transport currently we try to suppress any logging noise from ephemeral connections expected during normal actor interaction and discovery subsys ops: - any short lived discovery related TCP connects are only logged as `.transport()` level. - both `.error()` and raise on any underlying `trio.ClosedResource` cause since that normally means some task touched transport layer internals that it shouldn't have. - do a `.warning()` on anything else unexpected. Impl deats: - extend the `._exceptions.TransportClosed` to accept an input log level, raise-on-report toggle and custom reporting & raising via a new `.report_n_maybe_raise()` method. - construct the TCs with inputs per case in (the newly named) `._iter_pkts(). - call ^ this method from the `TransportClosed` handler block inside the RPC msg loop thus delegating reporting levels and/or raising to the backend's per-case TC instantiating. Related `._ipc` changes: - mask out all the `MsgpackTCPStream._codec` debug helper stuff and drop any lingering cruft from the initial proto-ing of msg-codecs. - rename some attrs/methods: \|_`MsgpackTCPStream._iter_packets()` -> `._iter_pkts()` and `._agen` -> `_aiter_pkts`. \|_`Channel._aiter_recv()` -> `._aiter_msgs()` and `._agen` -> `_aiter_msgs`. - add `hide_tb: bool` support to `Channel.send()` and only show the frame on non-MTEs.	2024-07-02 12:21:26 -04:00
Tyler Goodlet	3907cba68e	Refine some `.trionics` docs and logging - allow passing and report the lib name (`trio` or `tractor`) from `maybe_open_nursery()`. - use `.runtime()` level when reporting `_Cache`-hits in `maybe_open_context()`. - tidy up some doc strings.	2024-06-28 19:28:12 -04:00
Tyler Goodlet	e3d59964af	Woops, set `.cancel()` level in custom levels table..	2024-06-28 19:27:13 -04:00
Tyler Goodlet	edac717613	Use `msgspec.Struct.__repr__()` failover impl In case the struct doesn't import a field type (which will cause the `.pformat()` to raise) just report the issue and try to fall back to the original `repr()` version.	2024-06-28 19:17:05 -04:00
Tyler Goodlet	7e93b81a83	Don't use pretty struct stuff in `._invoke` It's too fragile to put in side core RPC machinery since `msgspec.Struct` defs can fail if a field type can't be looked up at creation time (like can easily happen if you conditionally import using `if TYPE_CHECKING:`) Also, - rename `cs` to `rpc_ctx_cs: CancelScope` since it's literally the wrapping RPC `Context._scope`. - report self cancellation via `explain: str` and add tail case for "unknown cause". - put a ?TODO? around what to do about KBIs if a context is opened from an `infected_aio`-actor task. - similar to our nursery and portal add TODO list for moving all `_invoke_non_context()` content out the RPC core and instead implement them as `.hilevel` endpoint helpers (maybe as decorators?)which under neath define `@context`-funcs.	2024-06-28 19:06:17 -04:00
Tyler Goodlet	4fbd469c33	Update `._entry` actor status log Log-report the different types of actor exit conditions including cancel via KBI, error or normal return with varying levels depending on case. Also, start proto-ing out this weird ascii-syntax idea for describing conc system states and implement the first bit in a `nest_from_op()` log-message fmter that joins and indents an obj `repr()` with a tree-like `'>)\n\|_'` header.	2024-06-28 18:45:52 -04:00
Tyler Goodlet	5e009a8229	Further formalize `greenback` integration Since we more or less require it for `tractor.pause_from_sync()` this refines enable toggles and their relay down the actor tree as well as more explicit logging around init and activation. Tweaks summary: - `.info()` report the module if discovered during root boot. - use a `._state._runtime_vars['use_greenback']: bool` activation flag inside `Actor._from_parent()` to determine if the sub should try to use it and set to `False` if mod-loading fails / not installed. - expose `maybe_init_greenback()` from `.devx` sugpkg. - comment out RTE in `._pause()` for now since we already have it in `.pause_from_sync()`. - always `.exception()` on `maybe_init_greenback()` import errors to clarify the underlying failure deats. - always explicitly report if `._state._runtime_vars['use_greenback']` was NOT set when `.pause_from_sync()` is called. Other `._runtime.async_main()` adjustments: - combine the "internal error call ur parents" message and the failed registry contact status into one new `err_report: str`. - drop the final exception handler's call to `Actor.lifetime_stack.close()` since we're already doing it in the `finally:` block and the earlier call has no currently known benefit. - only report on the `.lifetime_stack()` callbacks if any are detected as registered.	2024-06-28 14:45:45 -04:00
Tyler Goodlet	b72a025d0f	Always reset `._state._ctxvar_Context` to prior Not sure how I forgot this but, obviously it's correct context-var semantics to revert the current IPC `Context` (set in the latest `.open_context()` block) such that any prior instance is reset.. This ensures the sanity `assert`s pass inside `.msg._ops.maybe_limit_plds()` and just in general ensures for any task that the last opened `Context` is the one returned from `current_ipc_ctx()`.	2024-06-28 12:59:31 -04:00
Tyler Goodlet	5739e79645	Use `delay=0` in pump loop.. Turns out it does work XD Prior presumption was from before I had the fute poll-loop so makes sense we needed more then one sched-tick's worth of context switch vs. now we can just keep looping-n-pumping as fast possible until the guest-run's main task completes. Also, - minimize the preface commentary (as per todo) now that we have tests codifying all the edge cases :finger_crossed: - parameter-ize the pump-loop-cycle delay and default it to 0.	2024-06-27 19:27:59 -04:00
Tyler Goodlet	2ac999cc3c	Prep for legacy RPC API factor-n-remove This change is adding commentary about the upcoming API removal and simplification of nursery + portal internals; no actual code changes are included. The plan to (re)move the old RPC methods: - `ActorNursery.run_in_actor()` - `Portal.run()` - `Portal.run_from_ns()` and any related impl internals out of each conc-primitive and instead into something like a `.hilevel.rpc` set of APIs which then are all implemented using the newer and more lowlevel `Context`/`MsgStream` primitives instead Bo Further, - formally deprecate the `Portal.result()` meth for `.wait_for_result()`. - only `log.info()` about runtime shutdown in the implicit root case.	2024-06-27 16:25:46 -04:00
Tyler Goodlet	9f9b0b17dc	Add a `Context.portal`, more cancel tooing Might as well add a public maybe-getter for use on the "parent" side since it can be handy to check out-of-band cancellation conditions (like from `Portal.cancel_actor()`). Buncha bitty tweaks for more easily debugging cancel conditions: - add a `@.cancel_called.setter` for hooking into `.cancel_called = True` being set in hard to decipher "who cancelled us" scenarios. - use a new `self_ctxc: bool` var in `.cancel()` to capture the output state from `._is_self_cancelled(remote_error)` at call time so it can be compared against the measured value at crash-time (when REPL-ing it can often have already changed due to runtime teardown sequencing vs. the crash handler hook entry). - proxy `hide_tb` to `.drain_to_final_msg()` from `.wait_for_result()`. - use `remote_error.sender` attr directly instead of through `RAE.msgdata: dict` lookup. - change var name `our_uid` -> `peer_uid`; it's not "ours".. Other various docs/comment updates: - extend the main class doc to include some other name ideas. - change over all remaining `.result()` refs to `.wait_for_result()`. - doc more details on how we want `.outcome` to eventually signature.	2024-06-26 16:26:18 -04:00
Tyler Goodlet	9133f42b07	Solve our abandonment issues.. To make the recent set of tests pass this (hopefully) finally solves all `asyncio` embedded `trio` guest-run abandonment by ensuring we "pump the event loop" until the guest-run future is fully complete. Accomplished via simple poll loop of the form `while not trio_done_fut.done(): await asyncio.sleep(.1)` in the `aio_main()` task's exception teardown sequence. The loop does a naive 10ms "pump-via-sleep & poll" for the `trio` side to complete before finally exiting (and presumably raising) from the SIGINT cancellation. Other related cleanups and refinements: - use `asyncio.Task.result()` inside `cancel_trio()` since it also inline-raises any exception outcome and we can also log-report the result in non-error cases. - comment out buncha not-sure-we-need-it stuff in `cancel_trio()`. - remove the botched `AsyncioCancelled(CancelledError):` idea obvi XD - comment `greenback` init for now in `aio_main()` since (pretty sure) we don't ever want to actually REPL in that specific func-as-task? - always capture any `fute_err: BaseException` from the `main_outcome: Outcome` delivered by the `trio` side guest-run task. - add and raise a new super noisy `AsyncioRuntimeTranslationError` whenever we detect that the guest-run `trio_done_fut` has not completed before task exit; should avoid abandonment issues ever happening again without knowing!	2024-06-26 13:48:36 -04:00
Tyler Goodlet	4f1db1ff52	Lel, revert `AsyncioCancelled` inherit, module.. Turns out it somehow breaks our `to_asyncio` error relay since obvi `asyncio`'s runtime seems to specially handle it (prolly via `isinstance()` ?) and it caused our `test_aio_cancelled_from_aio_causes_trio_cancelled()` to hang.. Further, obvi `unpack_error()` won't be able to find the type def if not kept inside `._exceptions`.. So given all that, revert the change/move as well as: - tweak the aio-from-aio cancel test to timeout. - do `trio.sleep()` conc with any bg aio task by moving out nursery block. - add a `send_sigint_to: str` parameter to `test_sigint_closes_lifetime_stack()` such that we test the SIGINT being relayed to just the parent or the child.	2024-06-25 23:47:14 -04:00
Tyler Goodlet	a870df68c0	Hack `asyncio` to not abandon a guest-mode run? Took me a while to figure out what the heck was going on but, turns out `asyncio` changed their SIGINT handling in 3.11 as per: https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption I'm not entirely sure if it's the 3.11 changes or possibly wtv further updates were made in 3.12 but more or less due to the way our current main task was written the `trio` guest-run was getting abandoned on SIGINTs sent from the OS to the infected child proc.. Note that much of the bug and soln cases are layed out in very detailed comment-notes both in the new test and `run_as_asyncio_guest()`, right above the final "fix" lines. Add new `test_infected_aio.test_sigint_closes_lifetime_stack()` test suite which reliably triggers all abandonment issues with multiple cases of different parent behaviour post-sending-SIGINT-to-child: 1. briefly sleep then raise a KBI in the parent which was originally demonstrating the file leak not being cleaned up by `Actor.lifetime_stack.close()` and simulates a ctl-c from the console (relayed in tandem by the OS to the parent and child processes). 2. do `Context.wait_for_result()` on the child context which would hang and timeout since the actor runtime would never complete and thus never relay a `ContextCancelled`. 3. both with and without running a `asyncio` task in the `manage_file` child actor; originally it seemed that with an aio task scheduled in the child actor the guest-run abandonment always was the "loud" case where there seemed to be some actor teardown but with tbs from python failing to gracefully exit the `trio` runtime.. The (seemingly working) "fix" required 2 lines of code to be run inside a `asyncio.CancelledError` handler around the call to `await trio_done_fut`: - `Actor.cancel_soon()` which schedules the actor runtime to cancel on the next `trio` runner cycle and results in a "self cancellation" of the actor. - "pumping the `asyncio` event loop" with a non-0 `.sleep(0.1)` XD \|_ seems that a "shielded" pump with some actual `delay: float >= 0` did the trick to get `asyncio` to allow the `trio` runner/loop to fully complete its guest-run without abandonment. Other supporting changes: - move `._exceptions.AsyncioCancelled`, our renamed `asyncio.CancelledError` error-sub-type-wrapper, to `.to_asyncio` and make it derive from `CancelledError` so as to be sure when raised by our `asyncio` x-> `trio` exception relay machinery that `asyncio` is getting the specific type it expects during cancellation. - do "summary status" style logging in `run_as_asyncio_guest()` wherein we compile the eventual `startup_msg: str` emitted just before waiting on the `trio_done_fut`. - shield-wait with `out: Outcome = await asyncio.shield(trio_done_fut)` even though it seems to do nothing in the SIGINT handling case..(I presume it might help avoid abandonment in a `asyncio.Task.cancel()` case maybe?)	2024-06-24 16:10:23 -04:00
Tyler Goodlet	3d12a7e005	Flip `infected_asyncio` status msg to `.runtime()`	2024-06-18 18:14:58 -04:00
Tyler Goodlet	9292d73b40	Avoid actor-nursery-exit warns on registrees Since a local-actor-nursery-parented subactor might also use the root as its registry, we need to avoid warning when short lived IPC `Channel` connections establish and then disconnect (quickly, bc the apparently the subactor isn't re-using an already cached parente-peer<->child conn as you'd expect efficiency..) since such cases currently considered normal operation of our super shoddy/naive "discovery sys" XD As such, (un)guard the whole local-actor-nursery OR channel-draining waiting blocks with the additional `or Actor._cancel_called` branch since really we should also be waiting on the parent nurse to exit (at least, for sure and always) when the local `Actor` indeed has been "globally" cancelled-called. Further add separate timeout warnings for channel-draining vs. local-actor-nursery-exit waiting since they are technically orthogonal cases (at least, afaik). Also, - adjust the `Actor._stream_handler()` connection status log-emit to `.runtime()`, especially to reduce noise around the aforementioned ephemeral registree connection-requests. - if we do wait on a local actor-nurse to exit, report its `._children` table (which should help figure out going forward how useful the warning is, if at all).	2024-06-18 16:10:36 -04:00
Tyler Goodlet	83d69fe395	Change `_Cache` reuse emit to `.runtime()`	2024-06-18 14:40:26 -04:00
Tyler Goodlet	72df312e71	Expand `PayloadMsg` doc-str	2024-06-18 09:57:10 -04:00
Tyler Goodlet	711f639fc5	Break `_mk_msg_type_err()` into recv/send side funcs Name them `_mk_send_mte()`/`_mk_recv_mte()` and change the runtime to call each appropriately depending on location/usage. Also add some dynamic call-frame "unhide" blocks such that when we expect raised MTE from the aboves calls but we get a different unexpected error from the runtime, we ensure the call stack downward is shown in tbs/pdb. \|_ ideally in the longer run we come up with a fancier dynamic sys for this, prolly something in `.devx._frame_stack`?	2024-06-17 13:12:16 -04:00
Tyler Goodlet	8477919fc9	Don't pass `ipc_msg` for send side MTEs Just pass `_bad_msg` such that it get's injected to `.msgdata` since with a send-side `MsgTypeError` we don't have a remote `._ipc_msg: Error` per say to include.	2024-06-17 10:32:50 -04:00
Tyler Goodlet	872feef24b	Add note about using `@acm` as decorator in 3.10	2024-06-17 10:32:38 -04:00
Tyler Goodlet	04bd111037	Proxy through `dec_hook` in `.limit_plds()` APIs	2024-06-17 09:23:31 -04:00
Tyler Goodlet	a0ee0cc713	Port debug request ep to use `@context(pld_spec)` Namely passing the `.__pld_spec__` directly to the `lock_stdio_for_peer()` decorator B) Also, allows dropping `apply_debug_pldec()` (which was a todo) and removing a `lock_stdio_for_peer()` indent level.	2024-06-17 09:01:13 -04:00
Tyler Goodlet	5449bd5673	Offer a `@context(pld_spec=<TypeAlias>)` API Instead of the WIP/prototyped `Portal.open_context()` offering a `pld_spec` input arg, this changes to a proper decorator API for specifying the "payload spec" on `@context` endpoints. The impl change details actually cover 2-birds: - monkey patch decorated functions with a new `._tractor_context_meta: dict[str, Any]` and insert any provided input `@context` kwargs: `_pld_spec`, `enc_hook`, `enc_hook`. - use `inspect.get_annotations()` to scan for a `func` arg type-annotated with `tractor.Context` and use the name of that arg as the RPC task-side injected `Context`, thus injecting the needed arg by type instead of by name (a longstanding TODO); raise a type-error when not found. - pull the `pld_spec` from the `._tractor_context_meta` attr both in the `.open_context()` parent-side and child-side `._invoke()`-cation of the RPC task and use the `msg._ops.maybe_limit_plds()` API to apply it internally in the runtime for each case.	2024-06-16 23:12:53 -04:00
Tyler Goodlet	e6d4ec43b9	Log tbs from non-RAE `._invoke()`-RPC-task errors `RemoteActorError`s show this by default in their `.__repr__()`, and we obvi capture and embed the src traceback in an `Error` msg prior to transit, but for logging it's also handy to see the tb of any set `Context._remote_error` on console especially when trying to decipher remote error details at their origin actor. Also improve the log message description using `ctx.repr_state` and show any `ctx.outcome`.	2024-06-14 15:49:30 -04:00
Tyler Goodlet	418c6907fd	Add `enable_stack_on_sig: bool` for `stackscope` toggle	2024-06-14 15:37:57 -04:00
Tyler Goodlet	d528e7ab4d	Add `@context(pld_spec=<TypeAlias>)` TODO list Longer run we don't want `tractor` app devs having to call `msg._ops.limit_plds()` from every child endpoint.. so this starts a list of decorator API ideas and obviously ties in with an ideal final API design that will come with py3.13 and typed funcs. Obviously this is directly fueled by, - https://github.com/goodboy/tractor/issues/365 Other, - type with direct `trio.lowlevel.Task` import. - use `log.exception()` to show tbs for all error-terminations in `.open_context()` (for now) and always explicitly mention the `.side`.	2024-06-14 15:37:07 -04:00
Tyler Goodlet	7a89b59a3f	Bleh, make `log.devx()` level less then cancel but > `.runtime()`	2024-06-11 20:45:41 -04:00
Tyler Goodlet	7d4cd8944c	Use `_debug._sync_pause_from_builtin()` as `breakpoint()` override	2024-06-10 19:16:21 -04:00
Tyler Goodlet	43a8cf4be1	Make big TODO: for `devx._debug` refinements Hopefully would make grok-ing this fairly sophisticated sub-sys possible for any up-and-coming `tractor` hacker XD A lot of internal API and re-org ideas I discovered/realized as part of finishing the `__pld_spec__` and multi-threaded support. Particularly better isolation between root-actor vs subactor task APIs and generally less globally-state-ful stuff like `DebugStatus` and `Lock` method APIs would likely make a lot of the hard to follow edge cases more clear?	2024-06-10 17:46:10 -04:00
Tyler Goodlet	6534a363a5	First proto: multi-threaded synced `pdb`-REPLs Functionally working for multi-threaded (via cpython threads spawned from `to_trio.to_thread.run_sync()`) alongside subactors, tested (for now) only with threads started inside the root actor (which seemed to have the most issues in terms of the impl and special cases..) using the new `tractor.pause_from_sync()` API! Main implementation changes to `.pause_from_sync()` ------ - ------ - from the root actor, we need to ensure bg thread case is handled specially since no IPC is used to request the TTY stdio mutex and `Lock` (API) usage is conducted entirely from a local task or thread; dedicated `Lock` usage for the root-actor already is branched inside `._pause()` and needs similar handling from a root bg-thread: \|_for the special case of a root bg thread we need to `trio`-main-thread schedule a bg task inside a new `_pause_from_bg_root_thread()`. The new task needs to implement most of what was is handled inside `._pause()` manually, mostly because in this root-actor-bg-thread case we have 2 constraints: 1. to enter `PdbREPL.interaction()` from the bg thread directly, 2. the task that `Lock._debug_lock.acquire()`s has to be the same that calls `.release() (a `trio.FIFOLock` constraint) \|_impl deats of this `_pause_from_bg_root_thread()` include: - (for now) calling `._pause()` to acquire the `Lock._debug_lock`. - setting its own `DebugStatus.repl_release`. - calling `.DebugStatus.shield_sigint()` to ensure the root's main thread uses the right handler when the bg one is REPL-ing. - wait manually on the `.repl_release()` to be set by the thread's dedicated `PdbREPL` exit. - manually calling `Lock.release()` from the same task that acquired it. - expect calls to `._pause()` to deliver a `tuple[Task, PdbREPL]` such that we always get the handle both to any newly created REPl instance and the (maybe) the scheduled bg task within which is runs. - add a single `message: str` style to `log.devx()` based on branching style for logging. - ensure both `DebugStatus.repl` and `.repl_task` are set just before calling `._set_trace()` to ensure the correct `Task\|Thread` is set when the REPL is finally entered from sync code. - add a wrapping caller `_sync_pause_from_builtin()` which passes in the new `called_from_builtin=True` to indicate `breakpoint()` caller usage, obvi pass in `api_frame`. Changes to `._pause()` in support of ^ ------ - ------ - `TaskStatus.started()` and return the `tuple[Task, PdbREPL]` to callers / starters. - only call `DebugStatus.shield_sigint()` when no `repl` passed bc some callers (like bg threads) may need to apply it at some specific point themselves. - tweak some asserts for the `debug_func == None` / non-`trio`-thread case. - add a mod-level `_repl_fail_msg: str` to be used when there's an internal `._pause()` failure for testing, easier to pexpect match. - more comprehensive logging for the root-actor branched case to (attempt to) indicate any of the 3 cases: - remote ctx from subactor has the `Lock`, - already existing root task or thread has it or, - some kinda stale `.locked()` situation where the root has the lock but we don't know why. - for root usage, revert to always `await Lock._debug_lock.acquire()`-ing despite `called_from_sync` since `.pause_from_sync()` was reworked to instead handle the special bg thread case in the new `_pause_from_bg_root_thread()` task. - always do `return _enter_repl_sync(debug_func)`. - try to report any `repl_task: Task\|Thread` set by the caller (particularly for the bg thread cases) as being the thread or task `._pause()` was called "on behalf of" Changes to `DebugStatus`/`Lock` in support of ^ ------ - ------ - only call `Lock.release()` from `DebugStatus.set_[quit/continue]()` when called from the main `trio` thread and always call `DebugStatus.release()` after to ensure `.repl_released()` is set after `._debug_lock.release()`. - only call `.repl_release.set()` from `trio` thread otherwise use `.from_thread.run()`. - much more refinements in `Lock.release()` for threading cases: - return `bool` to indicate whether lock was released by caller. - mask (in prep to drop) `_pause()` usage of `Lock.release.force=True)` since forcing a release can't ever avoid the RTE from `trio`.. same task must acquire/release. - don't allow usage from non-`trio`-main-threads, ever; there's no point since the same-task-needs-to-manage-`FIFOLock` constraint. - much more detailed logging using `message`-building-style for all caller (edge) cases. \|_ use a `we_released: bool` to determine failed-to-release edge cases which can happen if called from bg threads, ensure we `log.exception()` on any incorrect usage resulting in release failure. \|_ complain loudly if the release fails and some other task/thread still holds the lock. \|_ be explicit about "who" (which task or thread) the release is "on behalf of" by reading `DebugStatus.repl_task` since the caller isn't the REPL operator in many sync cases. - more or less drop `force` support, as mentioned above. - ensure we unset `._owned_by_root` if the caller is a root task. Other misc ------ - ------ - rename `lock_tty_for_child()` -> `lock_stdio_for_peer()`. - rejig `Lock.repr()` to show lock and event stats. - stage `Lock.stats` and `.owner` methods in prep for doing a singleton instance and `@property`s.	2024-06-10 17:45:52 -04:00
Tyler Goodlet	408a74784e	Catch `.pause_from_sync()` in root bg thread bugs! Originally discovered as while using `tractor.pause_from_sync()` from the `i3ipc` client running in a bg-thread that uses `asyncio` inside `modden`. Turns out we definitely aren't correctly handling `.pause_from_sync()` from the root actor when called from a `trio.to_thread.run_sync()` bg thread: - root-actor bg threads which can't `Lock._debug_lock.acquire()` since they aren't in `trio.Task`s. - even if scheduled via `.to_thread.run_sync(_debug._pause)` the acquirer won't be the task/thread which calls `Lock.release()` from `PdbREPL` hooks; this results in a RTE raised by `trio`.. - multiple threads will step on each other's stdio since cpython's GIL seems to ctx switch threads on every input from the user to the REPL loop.. Reproduce via reworking our example and test so that they catch and fail for all edge cases: - rework the `/examples/debugging/sync_bp.py` example to demonstrate the above issues, namely the stdio clobbering in the REPL when multiple threads and/or a subactor try to debug simultaneously. \|_ run one thread using a task nursery to ensure it runs conc with the nursery's parent task. \|_ ensure the bg threads run conc a subactor usage of `.pause_from_sync()`. \|_ gravely detail all the special cases inside a TODO comment. \|_ add some control flags to `sync_pause()` helper and don't use `breakpoint()` by default. - extend and adjust `test_debugger.test_pause_from_sync` to match (and thus currently fail) by ensuring exclusive `PdbREPL` attachment when the 2 bg root-actor threads are concurrently interacting alongside the subactor: \|_ should only see one of the `_pause_msg` logs at a time for either one of the threads or the subactor. \|_ ensure each attaches (in no particular order) before expecting the script to exit. Impl adjustments to `.devx._debug`: - drop `Lock.repl`, no longer used. - add `Lock._owned_by_root: bool` for the `.ctx_in_debug == None` root-actor-task active case. - always `log.exception()` for any `._debug_lock.release()` ownership RTE emitted by `trio`, like we used to.. - add special `Lock.release()` log message for the stale lock but `._owned_by_root == True` case; oh yeah and actually `log.devx(message)`.. - rename `Lock.acquire()` -> `.acquire_for_ctx()` since it's only ever used from subactor IPC usage; well that and for local root-task usage we should prolly add a `.acquire_from_root_task()`? - buncha `._pause()` impl improvements: \|_ type `._pause()`'s `debug_func` as a `partial` as well. \|_ offer `called_from_sync: bool` and `called_from_bg_thread: bool` for the special case handling when called from `.pause_from_sync()` \|_ only set `DebugStatus.repl/repl_task` when `debug_func != None` (OW ensure the `.repl_task` is not the current one). \|_ handle error logging even when `debug_func is None`.. \|_ lotsa detailed commentary around root-actor-bg-thread special cases. - when `._set_trace(hide_tb=False)` do `pdbp.set_trace(frame=currentframe())` so the `._debug` internal frames are always included. - by default always hide tracebacks for `.pause[_from_sync]()` internals. - improve `.pause_from_sync()` to avoid root-bg-thread crashes: \|_ pass new `called_from_xxx_` flags and ensure `DebugStatus.repl_task` is actually set to the `threading.current_thread()` when needed. \|_ manually call `Lock._debug_lock.acquire_nowait()` for the non-bg thread case. \|_ TODO: still need to implement the bg-thread case using a bg `trio.Task`-in-thread with an `trio.Event` set by thread REPL exit.	2024-06-06 16:56:30 -04:00
Tyler Goodlet	f0342d6ae3	Move `Context.open_stream()` impl to `._streaming` Exactly like how it's organized for `Portal.open_context()`, put the main streaming API `@acm` with the `MsgStream` code and bind the method to the new module func. Other, - rename `Context.result()` -> `.wait_for_result()` to better match the blocking semantics and rebind `.result()` as deprecated. - add doc-str for `Context.maybe_raise()`.	2024-05-31 17:32:11 -04:00
Tyler Goodlet	21f633a900	Use `Context` repr APIs for RPC outcome logs Delegate to the new `.repr_state: str` and adjust log level based on error vs. cancel vs. result.	2024-05-31 14:40:55 -04:00
Tyler Goodlet	4a270f85ca	Drop sub-decoder proto-cruft from `.msg._codec` It ended up getting necessarily implemented as the `PldRx` though at a different layer and won't be needed as part of `MsgCodec` most likely, though this original idea did provide the source of inspiration for how things work now! Also Move the commented TODO proto for a codec hook factory from `.types` to `._codec` where it prolly better fits and update some msg related todo/questions.	2024-05-31 12:03:18 -04:00
Tyler Goodlet	8ea0f08386	Finally, officially support shielded REPL-ing! It's been a long time prepped and now finally implemented! Offer a `shield: bool` argument from our async `._debug` APIs: - `await tractor.pause(shield=True)`, - `await tractor.post_mortem(shield=True)` ^-These-^ can now be used inside cancelled `trio.CancelScope`s, something very handy when introspecting complex (distributed) system tear/shut-downs particularly under remote error or (inter-peer) cancellation conditions B) Thanks to previous prepping in a prior attempt and various patches from the rigorous rework of `.devx._debug` internals around typed msg specs, there ain't much that was needed! Impl deats - obvi passthrough `shield` from the public API endpoints (was already done from a prior attempt). - put ad-hoc internal `with trio.CancelScope(shield=shield):` around all checkpoints inside `._pause()` for both the root-process and subactor case branches. Add a fairly rigorous example, `examples/debugging/shielded_pause.py` with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and ensure it covers as many cases as i can think of offhand: - multiple `.pause()` entries in a loop despite parent scope cancellation in a subactor RPC task which itself spawns a sub-task. - a `trio.Nursery.parent_task` which raises, is handled and tries to enter and unshielded `.post_mortem()`, which of course internally raises `Cancelled` in a `._pause()` checkpoint, so we catch the `Cancelled` again and then debug the debugger's internal cancellation with specific checks for the particular raising checkpoint-LOC. - do ^- the latter -^ for both subactor and root cases to ensure we can debug `._pause()` itself when it tries to REPL engage from a cancelled task scope Bo	2024-05-30 17:52:24 -04:00
Tyler Goodlet	13ea500a44	Rename `PldRx.dec_msg()` -> `.decode_pld()` Keep the old alias, but i think it's better form to use longer names for internal public APIs and this name better reflects the functionality: decoding and returning a `PayloadMsg.pld` field.	2024-05-30 16:09:59 -04:00
Tyler Goodlet	fcd089c08f	Always `.exception()` in `try_ship_error_to_remote()` on internal error	2024-05-30 16:02:25 -04:00
Tyler Goodlet	993281882b	Pass `boxed_type` from `_mk_msg_type_err()` Such that we're boxing the interchanged lib's specific error `msgspec.ValidationError` in this case) type much like how a `ContextCancelled[trio.Cancelled]` is composed; allows for seemless multi-backend-codec support later as well B) Pass `ctx.maybe_raise(from_src_exc=src_err)` where needed in a couple spots; as `None` in the send-side `Started` MTE case to avoid showing the `._scope1.cancel_called` result in the traceback from the `.open_context()` child-sync phase.	2024-05-30 15:55:34 -04:00
Tyler Goodlet	bbb4d4e52c	Add `from_src_exc: BaseException` to maybe raisers That is as a control to `Context._maybe_raise_remote_err()` such that if set to anything other then the default (`False` value), we do `raise remote_error from from_src_exc` such that caller can choose to suppress or override the `.__cause__` tb. Also tidy up and old masked TODO regarding calling `.maybe_raise()` after the caller exits from the `yield` in `.open_context()`..	2024-05-30 15:24:25 -04:00
Tyler Goodlet	0e8c60ee4a	Better RAE `.pformat()`-ing for send-side MTEs Send-side `MsgTypeError`s actually shouldn't have any "boxed" traceback per say since they're raised in the transmitting actor's local task env and we (normally) don't want the ascii decoration added around the error's `._message: str`, that is not until the exc is `pack_error()`-ed before transit. As such, the presentation of an embedded traceback (and its ascii box) gets bypassed when only a `._message: str` is set (as we now do for pld-spec failures in `_mk_msg_type_err()`). Further this tweaks the `.pformat()` output to include the `._message` part to look like `<RemoteActorError( <._message> ) ..` instead of jamming it implicitly to the end of the embedded `.tb_str` (as was done implicitly by `unpack_error()`) and also adds better handling for the `with_type_header == False` case including forcing that case when we detect that the currently handled exc is the RAE in `.pformat()`. Toss in a lengthier doc-str explaining it all. Surrounding/supporting changes, - better `unpack_error()` message which just briefly reports the remote task's error type. - add public `.message: str` prop. - always set a `._extra_msgdata: dict` since some MTE props rely on it. - handle `.boxed_type == None` for `.boxed_type_str`. - maybe pack any detected input or `exc.message` in `pack_error()`. - comment cruft cleanup in `_mk_msg_type_err()`.	2024-05-30 10:25:04 -04:00
Tyler Goodlet	1db5d4def2	Add `Error.message: str` Allows passing a custom error msg other then the traceback-str over the wire. Make `.tb_str` optional (in the blank `''` sense) since it's treated that way thus far in `._exceptions.pack_error()`.	2024-05-30 09:14:04 -04:00
Tyler Goodlet	6e54abc56d	Fix missing newline in task-cancel log-message	2024-05-30 09:06:10 -04:00
Tyler Goodlet	28af4749cc	Don't need to pack an `Error` with send-side MTEs	2024-05-30 09:05:23 -04:00
Tyler Goodlet	6a4ee461f5	Raise remote errors rxed during `Context` child-sync More specifically, if `.open_context()` is cancelled when awaiting the first `Context.started()` during the child task sync phase, check to see if it was due to `._scope.cancel_called` and raise any remote error via `.maybe_raise()` instead the `trio.Cancelled` like in every other remote-error handling case. Ensure we set `._scope[_nursery]` only after the `Started` has arrived and audited.	2024-05-28 16:11:01 -04:00
Tyler Goodlet	2db03444f7	Don't (noisly) log about runtime cancel RPC tasks Since in the case of the `Actor._cancel_task()` related runtime eps we actually don't EVER register them in `Actor._rpc_tasks`.. logging about them is just needless noise, though maybe we should track them in a diff table; something like a `._runtime_rpc_tasks`? Drop the cancel-request-for-stale-RPC-task (`KeyError` case in `Actor._cancel_task()`) log-emit level in to `.runtime()`; it's generally not useful info other then for granular race condition eval when hacking the runtime.	2024-05-28 16:03:36 -04:00
Tyler Goodlet	a1b124b62b	Raise send-side MTEs inline in `PldRx.dec_msg()` So when `is_started_send_side is True` we raise the newly created `MsgTypeError` (MTE) directly instead of doing all the `Error`-msg pack and unpack to raise stuff via `_raise_from_unexpected_msg()` since the raise should happen send side anyway and so doesn't emulate any remote fault like in a bad `Return` or `Started` without send-side pld-spec validation. Oh, and proxy-through the `hide_tb: bool` input from `.drain_to_final_msg()` to `.recv_msg_w_pld()`.	2024-05-28 16:02:51 -04:00
Tyler Goodlet	59ca256183	Set remote errors in `_raise_from_unexpected_msg()` By calling `Context._maybe_cancel_and_set_remote_error(exc)` on any unpacked `Error` msg; provides for `Context.maybe_error` consistency to match all other error delivery cases.	2024-05-28 15:48:01 -04:00
Tyler Goodlet	6c2efc96dc	Factor `.started()` validation into `.msg._ops` Filling out the helper `validate_payload_msg()` staged in a prior commit and adjusting all imports to match. Also add a `raise_mte: bool` flag for potential usage where the caller wants to handle the MTE instance themselves.	2024-05-28 11:08:27 -04:00
Tyler Goodlet	7ac730e326	Drop `msg.types.Msg` for new replacement types The `TypeAlias` for the msg type-group is now `MsgType` and any user touching shuttle messages can now be typed as `PayloadMsg`. Relatedly, add MTE specific `Error._bad_msg[_as_dict]` fields which are handy for introspection of remote decode failures.	2024-05-28 09:55:16 -04:00
Tyler Goodlet	582144830f	Parameterize the `return_msg_type` in `._invoke()` Since we also handle a runtime-specific `CancelAck`, allow the caller-scheduler to pass in the expected return-type msg per the RPC msg endpoint loop.	2024-05-28 09:36:26 -04:00
Tyler Goodlet	eee4c61b51	Add `MsgTypeError` "bad msg" capture Such that if caught by user code and/or the runtime we can introspect the original msg which caused the type error. Previously this was kinda half-baked with a `.msg_dict` which was delivered from an `Any`-decode of the shuttle msg in `_mk_msg_type_err()` but now this more explicitly refines the API and supports both `PayloadMsg`-instance or the msg-dict style injection: - allow passing either of `bad_msg: PayloadMsg\|None` or `bad_msg_as_dict: dict\|None` to `MsgTypeError.from_decode()`. - expose public props for both ^ whilst dropping prior `.msgdict`. - rework `.from_decode()` to explicitly accept `extra_msgdata: dict` \|_ only overriding it from any `bad_msg_as_dict` if the keys are found in `_ipcmsg_keys`, except** for `_bad_msg` when `bad_msg` is passed. \|_ drop `.ipc_msg` passthrough. \|_ drop `msgdict` input. - adjust `.cid` to only pull from the `.bad_msg` if set. Related fixes/adjustments: - `pack_from_raise()` should pull `boxed_type_str` from `boxed_type.__name__`, not the `type()` of it.. also add a `hide_tb: bool` flag. - don't include `_msg_dict` and `_bad_msg` in the `_body_fields` set. - allow more granular boxed traceback-str controls: \|_ allow passing a `tb_str: str` explicitly in which case we use it verbatim and presume caller knows what they're doing. \|_ when not provided, use the more explicit `traceback.format_exception(exc)` since the error instance is a required input (we still fail back to the old `.format_exc()` call if for some reason the caller passes `None`; but that should be a bug right?). \|_ if a `tb: TracebackType` and a `tb_str` is passed, concat them. - in `RemoteActorError.pformat()` don't indent the `._message` part used for the `body` when `with_type_header == False`. - update `_mk_msg_type_err()` to use `bad_msg`/`bad_msg_as_dict` appropriately and drop passing `ipc_msg`.	2024-05-27 22:36:05 -04:00
Tyler Goodlet	42ba855d1b	More correct/explicit `.started()` send-side validation In the sense that we handle it as a special case that exposed through to `RxPld.dec_msg()` with a new `is_started_send_side: bool`. (Non-ideal) `Context.started()` impl deats: - only do send-side pld-spec validation when a new `validate_pld_spec` is set (by default it's not). - call `self.pld_rx.dec_msg(is_started_send_side=True)` to validate the payload field from the just codec-ed `Started` msg's `msg_bytes` by passing the `roundtripped` msg (with it's `.pld: Raw`) directly. - add a `hide_tb: bool` param and proxy it to the `.dec_msg()` call. (Non-ideal) `PldRx.dec_msg()` impl deats: - for now we're packing the MTE inside an `Error` via a manual call to `pack_error()` and then setting that as the `msg` passed to `_raise_from_unexpected_msg()` (though really we should just raise inline?). - manually set the `MsgTypeError._ipc_msg` to the above.. Other, - more comprehensive `Context` type doc string. - various `hide_tb: bool` kwarg additions through `._ops.PldRx` meths. - proto a `.msg._ops.validate_payload_msg()` helper planned to get the logic from this version of `.started()`'s send-side validation so as to be useful more generally elsewhere.. (like for raising back `Return` values on the child side?). Warning: this commit may have been made out of order from required changes to `._exceptions` which will come in a follow up!	2024-05-27 14:59:40 -04:00
Tyler Goodlet	e4ec6b7b0c	Even smarter `RemoteActorError.pformat()`-ing Related to the prior patch, re the new `with_type_header: bool`: - in the `with_type_header == True` use case make sure we keep the first `._message: str` line non-indented since it'll show just after the header-line's type path with ':'. - when `False` drop the `)>` `repr()`-instance style as well so that we just get the ascii boxed traceback as though it's the error message-`str` not the `repr()` of the error obj. Other, - hide `pack_from_raise()` call frame since it'll show in debug mode crash handling.. - mk `MsgTypeError.from_decode()` explicitly accept and proxy an optional `ipc_msg` and change `msgdict` to also be optional, only reading out the `**extra_msgdata` when provided. - expose a `_mk_msg_type_err(src_err_msg: Error\|None = None,)` for callers who which to inject a `._ipc_msg: Msgtype` to the MTE. \|_ add a note how we can't use it due to a causality-dilemma when pld validating `Started` on the send side..	2024-05-22 15:26:48 -04:00
Tyler Goodlet	9ce958cb4a	Add debug check-n-wait inside `._spawn.soft_kill()` And IFF the `await wait_func(proc)` is cancelled such that we avoid clobbering some subactor that might be REPL-ing even though its parent actor is in the midst of (gracefully) cancelling it.	2024-05-22 15:21:01 -04:00
Tyler Goodlet	ce4d64ed2f	Mk `MsgDec.spec_str` have a more compact `	2024-05-22 15:18:45 -04:00
Tyler Goodlet	c6f599b1be	Call `.devx._debug.hide_runtime_frames()` by default From both `open_root_actor()` and `._entry._trio_main()`. Other `breakpoint()`-from-sync-func fixes: - properly disable the default hook using `"0"` XD - offer a `hide_tb: bool` from `open_root_actor()`. - disable hiding the `._trio_main()` frame, bc pretty sure it doesn't help anyone (either way) when REPL-ing/tb-ing from a subactor..?	2024-05-22 15:16:29 -04:00
Tyler Goodlet	9eb74560ad	Port `Actor._stream_handler()` to use `.has_outcome`, fix indent bug..	2024-05-22 15:10:39 -04:00
Tyler Goodlet	d15e73557a	Move runtime frame hiding into helper func Call it `hide_runtime_frames()` and stick all the lines from the top of the `._debug` mod in there along with a little `log.devx()` emission on what gets hidden by default ;) Other, - fix ref-error where internal-error handler might trigger despite the debug `req_ctx` not yet having init-ed, such that we don't try to cancel or log about it when it never was fully created/initialize.. - fix assignment typo iniside `_set_trace()` for `task`.. lel	2024-05-22 14:56:54 -04:00
Tyler Goodlet	74d4b5280a	Woops, make `log.devx()` level less `.error()`	2024-05-22 14:56:18 -04:00
Tyler Goodlet	3538ccd799	Better context aware `RemoteActorError.pformat()` Such that when displaying with `.__str__()` we do not show the type header (style) since normally python's raising machinery already prints the type path like `'tractor._exceptions.RemoteActorError:'`, so doing it 2x is a bit ugly ;p In support, - include `.relay_uid` in `RemoteActorError.extra_body_fields`. - offer a `with_type_header: bool` to `.pformat()` and only put the opening type path and closing `')>'` tail line when `True`. - add `.is_inception() -> bool:` for an easy way to determine if the error is multi-hop relayed. - only repr the `'\|_relay_uid=<uid>'` field when an error is an inception. - tweak the invalid-payload case in `_mk_msg_type_err()` to explicitly state in the `message` how the `any_pld` value does not match the `MsgDec.pld_spec` by decoding the invalid `.pld` with an any-dec. - allow `_mk_msg_type_err(**mte_kwargs)` passthrough. - pass `boxed_type=cls` inside `MsgTypeError.from_decode()`.	2024-05-22 10:35:42 -04:00
Tyler Goodlet	b22f7dcae0	Resolve remaining debug-request race causing hangs More or less by pedantically separating and managing root and subactor request syncing events to always be managed by the locking IPC context task-funcs: - for the root's "child"-side, `lock_tty_for_child()` directly creates and sets a new `Lock.req_handler_finished` inside a `finally:` - for the sub's "parent"-side, `request_root_stdio_lock()` does the same with a new `DebugStatus.req_finished` event and separates it from the `.repl_release` event (which indicates a "c" or "q" from user and thus exit of the REPL session) as well as sets a new `.req_task: trio.Task` to explicitly distinguish from the app-user-task that enters the REPL vs. the paired bg task used to request the global root's stdio mutex alongside it. - apply the `__pld_spec__` on "child"-side of the ctx using the new `Portal.open_context(pld_spec)` parameter support; drops use of any `ContextVar` malarky used prior for `PldRx` mgmt. - removing `Lock.no_remote_has_tty` since it was a nebulous name and from the prior "everything is in a `Lock`" design.. ------ - ------ More rigorous impl to handle various edge cases in `._pause()`: - rejig `_enter_repl_sync()` to wrap the `debug_func == None` case inside maybe-internal-error handler blocks. - better logic for recurrent vs. multi-task contention for REPL entry in subactors, by guarding using `DebugStatus.req_task` and by now waiting on the new `DebugStatus.req_finished` for the multi-task contention case. - even better internal error handling and reporting for when this code is hacked on and possibly broken ;p ------ - ------ Updates to `.pause_from_sync()` support: - add optional `actor`, `task` kwargs to `_set_trace()` to allow compat with the new explicit `debug_func` calling in `._pause()` and pass a `threading.Thread` for `task` in the `.to_thread()` usage case. - add an `except` block that tries to show the frame on any internal error. ------ - ------ Relatedly includes a buncha cleanups/simplifications somewhat in prep for some coming refinements (around `DebugStatus`): - use all the new attrs mentioned above as needed in the SIGINT shielder. - wait on `Lock.req_handler_finished` in `maybe_wait_for_debugger()`. - dropping a ton of masked legacy code left in during the recent reworks. - better comments, like on the use of `Context._scope` for shielding on the "child"-side to avoid the need to manage yet another cs. - add/change-to lotsa `log.devx()` level emissions for those infos which are handy while hacking on the debugger but not ideal/necessary to be user visible. - obvi add lotsa follow up todo notes!	2024-05-21 10:19:41 -04:00
Tyler Goodlet	fde62c72be	Show runtime nursery frames on internal errors Much like other recent changes attempt to detect runtime-bug-causing crashes and only show the runtime-endpoint frame when present. Adds a `ActorNursery._scope_error: BaseException\|None` attr to aid with detection. Also toss in some todo notes for removing and replacing the `.run_in_actor()` method API.	2024-05-20 17:04:30 -04:00
Tyler Goodlet	4ef77bb64f	Set `_ctxvar_Context` for child-side RPC tasks Just inside `._invoke()` after the `ctx: Context` is retrieved. Also try our best to not hide internal frames when a non-user-code crash happens, normally either due to a runtime RPC EP bug or a transport failure.	2024-05-20 16:23:29 -04:00
Tyler Goodlet	e78fdf2f69	Make `log.devx()` level below `.pdb()` Kinda like a "runtime"-y level for `.pdb()` (which is more or less like an `.info()` for our debugger subsys) which can be used to report internals info for those hacking on `.devx` tools. Also, inject only the last 6 digits of the `id(Task)` in `pformat_task_uid()` output by default.	2024-05-20 16:13:57 -04:00
Tyler Goodlet	13bc3c308d	Add error suppress flag to `current_ipc_ctx()`	2024-05-20 16:12:51 -04:00
Tyler Goodlet	60fc43e530	Shield channel closing in `_connect_chan()`	2024-05-20 16:11:59 -04:00
Tyler Goodlet	30afcd2b6b	Adjust `Portal` usage of `Context.pld_rx` Pass the new `ipc` arg and try to show api frames when an unexpected internal error is detected.	2024-05-20 16:07:57 -04:00
Tyler Goodlet	c80f020ebc	Expose `tractor.current_ipc_ctx()` at pkg level	2024-05-20 15:47:01 -04:00
Tyler Goodlet	262a0e36c6	Allocate a `PldRx` per `Context`, new pld-spec API Since the state mgmt becomes quite messy with multiple sub-tasks inside an IPC ctx, AND bc generally speaking the payload-type-spec should map 1-to-1 with the `Context`, it doesn't make a lot of sense to be using `ContextVar`s to modify the `Context.pld_rx: PldRx` instance. Instead, always allocate a full instance inside `mk_context()` with the default `.pld_rx: PldRx` set to use the `msg._ops._def_any_pldec: MsgDec` In support, simplify the `.msg._ops` impl and APIs: - drop `_ctxvar_PldRx`, `_def_pld_rx` and `current_pldrx()`. - rename `PldRx._pldec` -> `._pld_dec`. - rename the unused `PldRx.apply_to_ipc()` -> `.wraps_ipc()`. - add a required `PldRx._ctx: Context` attr since it is needed internally in some meths and each pld-rx now maps to a specific ctx. - modify all recv methods to accept a `ipc: Context\|MsgStream` (instead of a `ctx` arg) since both have a ref to the same `._rx_chan` and there are only a couple spots (in `.dec_msg()`) where we need the `ctx` explicitly (which can now be easily accessed via a new `MsgStream.ctx` property, see below). - always show the `.dec_msg()` frame in tbs if there's a reference error when calling `_raise_from_unexpected_msg()` in the fallthrough case. - implement `limit_plds()` as light wrapper around getting the `current_ipc_ctx()` and mutating its `MsgDec` via `Context.pld_rx.limit_plds()`. - add a `maybe_limit_plds()` which just provides an `@acm` equivalent of `limit_plds()` handy for composing in a `async with ():` style block (avoiding additional indent levels in the body of async funcs). Obvi extend the `Context` and `MsgStream` interfaces as needed to match the above: - add a `Context.pld_rx` pub prop. - new private refs to `Context._started_msg: Started` and a `._started_pld` (mostly for internal debugging / testing / logging) and set inside `.open_context()` immediately after the syncing phase. - a `Context.has_outcome() -> bool:` predicate which can be used to more easily determine if the ctx errored or has a final result. - pub props for `MsgStream.ctx: Context` and `.chan: Channel` providing full `ipc`-arg compat with the `PldRx` method signatures.	2024-05-20 15:46:28 -04:00
Tyler Goodlet	d93135acd8	Include truncated `id(trio.Task)` for task info in log header	2024-05-15 09:36:22 -04:00
Tyler Goodlet	b23780c102	Make `request_root_stdio_lock()` post-mortem-able Finally got this working so that if/when an internal bug is introduced to this request task-func, we can actually REPL-debug the lock request task itself B) As in, if the subactor's lock request task internally errors we, - ensure the task always terminates (by calling `DebugStatus.release()`) and explicitly reports (via a `log.exception()`) the internal error. - capture the error instance and set as a new `DebugStatus.req_err` and always check for it on final teardown - in which case we also, - ensure it's reraised from a new `DebugRequestError`. - unhide the stack frames for `_pause()`, `_enter_repl_sync()` so that the dev can upward inspect the `_pause()` call stack sanely. Supporting internal impl changes, - add `DebugStatus.cancel()` and `.req_err`. - don't ever cancel the request task from `PdbREPL.set_[continue/quit]()` only when there's some internal error that would likely result in a hang and stale lock state with the root. - only release the root's lock when the current ask is also the owner (avoids bad release errors). - also show internal `._pause()`-related frames on any `repl_err`. Other temp-dev-tweaks, - make pld-dec change log msgs info level again while solving this final context-vars race stuff.. - drop the debug pld-dec instance match asserts for now since the problem is already caught (and now debug-able B) by an attr-error on the decoded-as-`dict` started msg, and instead add in a `log.exception()` trace to see which task is triggering the case where the debug `MsgDec` isn't set correctly vs. when we think it's being applied.	2024-05-14 21:01:20 -04:00
Tyler Goodlet	31de5f6648	Always release debug request from `._post_mortem()` Since obviously the thread is likely expected to halt and raise after the REPL session exits; this was a regression from the prior impl. The main reason for this is that otherwise the request task will never unblock if the user steps through the crashed task using 'next' since the `.do_next()` handler doesn't by default release the request since in the `.pause()` case this would end the session too early. Other, - toss in draft `Pdb.user_exception()`, though doesn't seem to ever trigger? - only release `Lock._debug_lock` when already locked.	2024-05-14 11:39:04 -04:00
Tyler Goodlet	236083b6e4	Rename `.msg.types.Msg` -> `PayloadMsg`	2024-05-10 13:15:45 -04:00
Tyler Goodlet	fc075e96c6	Hide some API frames, port to new `._debug` apis - start tossing in `__tracebackhide__`s to various eps which don't need to show in tbs or in the pdb REPL. - port final `._maybe_enter_pm()` to pass a `api_frame`. - start comment-marking up some API eps with `@api_frame` in prep for actually using the new frame-stack tracing.	2024-05-09 16:04:34 -04:00
Tyler Goodlet	d6ca4771ce	Use `.recv_msg_w_pld()` for final `Portal.result()` Woops, due to a `None` test against the `._final_result`, any actual final `None` result would be received but not acked as such causing a spawning test to hang. Fix it by instead receiving and assigning both a `._final_result_msg: PayloadMsg` and `._final_result_pld`. NB: as mentioned in many recent comments surrounding this API layer, really this whole `Portal`-has-final-result interface/semantics should be entirely removed as should the `ActorNursery.run_in_actor()` API(s). Instead it should all be replaced by a wrapping "high level" API (`tractor.hilevel` ?) which combines a task nursery, `Portal.open_context()` and underlying `Context` APIs + an `outcome.Outcome` to accomplish the same "run a single task in a spawned actor and return it's result"; aka a "one-shot-task-actor".	2024-05-09 09:47:13 -04:00
Tyler Goodlet	c5a0cfc639	Rename `.msg.types.Msg` -> `PayloadMsg`	2024-05-08 15:07:34 -04:00
Tyler Goodlet	f85314ecab	Adjust `._runtime` to report `DebugStatus.req_ctx` - inside the `Actor.cancel()`'s maybe-wait-on-debugger delay, report the full debug request status and it's affiliated lock request IPC ctx. - use the new `.req_ctx.chan.uid` to do the local nursery lookup during channel teardown handling. - another couple log fmt tweaks.	2024-05-08 15:06:50 -04:00
Tyler Goodlet	6690968236	Rework and first draft of `.devx._frame_stack.py` Proto-ing a little suite of call-stack-frame annotation-for-scanning sub-systems for the purposes of both, - the `.devx._debug`er and its traceback and frame introspection needs when entering the REPL, - detailed trace-style logging such that we can explicitly report on "which and where" `tractor`'s APIs are used in the "app" code. Deats: - change mod name obvi from `._code` and adjust client mod imports. - using `wrapt` (for perf) implement a `@api_frame` annot decorator which both stashes per-call-stack-frame instances of `CallerInfo` in a table and marks the function such that API endpoints can be easily found via runtime stack scanning despite any internal impl changes. - add a global `_frame2callerinfo_cache: dict[FrameType, CallerInfo]` table for providing the per func-frame info caching. - Re-implement `CallerInfo` to require less (types of) inputs: \|_ `_api_func: Callable`, a ref to the (singleton) func def. \|_ `_api_frame: FrameType` taken from the `@api_frame` marked `tractor`-API func's runtime call-stack, from which we can determine the app code's `.caller_frame`. \|_`_caller_frames_up: int\|None` allowing the specific `@api_frame` to determine "how many frames up" the application / calling code is. And, a better set of derived attrs: \|_`caller_frame: FrameType` which finds and caches the API-eps calling frame. \|_`caller_frame: FrameType` which finds and caches the API-eps calling - add a new attempt at "getting a method ref from its runtime frame" with `get_ns_and_func_from_frame()` using a heuristic that the `CodeType.co_qualname: str` should have a "." in it for methods. - main issue is still that the func-ref lookup will require searching for the method's instance type by name, and that name isn't guaranteed to be defined in any particular ns.. \|_rn we try to read it from the `FrameType.f_locals` but that is going to obvi fail any time the method is called in a module where it's type is not also defined/imported. - returns both the ns and the func ref FYI.	2024-05-08 14:51:56 -04:00
Tyler Goodlet	343b7c9712	Even moar bitty `Context` refinements - set `._state._ctxvar_Context` just after `StartAck` inside `open_context_from_portal()` so that `current_ipc_ctx()` always works on the 'parent' side. - always set `.canceller` to any `MsgTypeError.src_uid` and otherwise to any maybe-detected `.src_uid` (i.e. for RAEs). - always set `.canceller` to us when we rx a ctxc which reports us as its canceller; this is a sanity check on definite "self cancellation". - adjust `._is_self_cancelled()` logic to only be `True` when `._remote_error` is both a ctxc with a `.canceller` set to us AND when `Context.canceller` is also set to us (since the change above) as a little bit of extra rigor. - fill-in/fix some `.repr_state` edge cases: - merge self-vs.-peer ctxc cases to one block and distinguish via nested `._is_self_cancelled()` check. - set 'errored' for all exception matched cases despite `.canceller`. - add pre-`Return` phase statuses: \|_'pre-started' and 'syncing-to-child' depending on side and when `._stream` has not (yet) been set. \|_'streaming' and 'streaming-finished' depending on side when `._stream` is set and whether it was stopped/closed. - tweak drainage log-message to use "outcome" instead of "result". - use new `.devx.pformat.pformat_cs()` inside `_maybe_cancel_and_set_remote_error()` but, IFF the log level is at least 'cancel'.	2024-05-08 14:02:56 -04:00
Tyler Goodlet	45f37870af	Add a `.log.at_least_level()` predicate	2024-05-08 13:33:59 -04:00
Tyler Goodlet	4d528b76a0	Move `_debug.pformat_cs()` into `devx.pformat`	2024-05-08 13:30:15 -04:00
Tyler Goodlet	05b143d9ef	Big debugger rework, more tolerance for internal err-hangs Since i was running into them (internal errors) during lock request machinery dev and was getting all sorts of difficult to understand hangs whenever i intro-ed a bug to either side of the ipc ctx; this all while trying to get the msg-spec working for `Lock` requesting subactors.. Deats: - hideframes for `@acm`s and `trio.Event.wait()`, `Lock.release()`. - better detail out the `Lock.acquire/release()` impls - drop `Lock.remote_task_in_debug`, use new `.ctx_in_debug`. - add a `Lock.release(force: bool)`. - move most of what was `_acquire_debug_lock_from_root_task()` and some of the `lock_tty_for_child().__a[enter/exit]()` logic into `Lock.[acquire/release]()` including bunch more logging. - move `lock_tty_for_child()` up in the module to below `Lock`, with some rework: - drop `subactor_uid: tuple` arg since we can just use the `ctx`.. - add exception handler blocks for reporting internal (impl) errors and always force release the lock in such cases. - extend `DebugStatus` (prolly will rename to `DebugRequest` btw): - add `.req_ctx: Context` for subactor side. - add `.req_finished: trio.Event` to sub to signal request task exit. - extend `.shield_sigint()` doc-str. - add `.release()` to encaps all the state mgmt previously strewn about inside `._pause()`.. - use new `DebugStatus.release()` to replace all the duplication: - inside `PdbREPL.set_[continue/quit]()`. - inside `._pause()` for the subactor branch on internal repl-invocation error cases, - in the `_enter_repl_sync()` closure on error, - replace `apply_debug_codec()` -> `apply_debug_pldec()` in tandem with the new `PldRx` sub-sys which handles the new `__pld_spec__`. - add a new `pformat_cs()` helper orig to help debug cs stack a corruption; going to move to `.devx.pformat` obvi. - rename `wait_for_parent_stdin_hijack()` -> `request_root_stdio_lock()` with improvements: - better doc-str and add todos, - use `DebugStatus` more stringently to encaps all subactor req state. - error handling blocks for cancellation and straight up impl errors directly around the `.open_context()` block with the latter doing a `ctx.cancel()` to avoid hanging in the shielded `.req_cs` scope. - similar exc blocks for the func's overall body with explicit `log.exception()` reporting. - only set the new `DebugStatus.req_finished: trio.Event` in `finally`. - rename `mk_mpdb()` -> `mk_pdb()` and don't cal `.shield_sigint()` implicitly since the caller usage does matter for this. - factor out `any_connected_locker_child()` from the SIGINT handler. - rework SIGINT handler to better handle any stale-lock/hang cases: - use new `Lock.ctx_in_debug: Context` to detect subactor-in-debug. and use it to cancel any lock request instead of the lower level - use `problem: str` summary approach to log emissions. - rework `_pause()` given all of the above, stuff not yet mentioned: - don't take `shield: bool` input and proxy to `debug_func()` (for now). - drop `extra_frames_up_when_async: int` usage, expect `**debug_func_kwargs` to passthrough an `api_frame: Frametype` (more on this later). - lotsa asserts around the request ctx vs. task-in-debug ctx using new `current_ipc_ctx()`. - asserts around `DebugStatus` state. - rework and simplify the `debug_func` hooks, `_set_trace()`/`_post_mortem()`: - make them accept a non-optional `repl: PdbRepl` and `api_frame: FrameType` which should be used to set the current frame when the REPL engages. - always hide the hook frames. - always accept a `tb: TracebackType` to `_post_mortem()`. \|_ copy and re-impl what was the delegation to `pdbp.xpm()`/`pdbp.post_mortem()` and instead call the underlying `Pdb.interaction()` ourselves with a `caller_frame` and tb instance. - adjust the public `.pause()` impl: - accept optional `hide_tb` and `api_frame` inputs. - mask opening a cancel-scope for now (can cause `trio` stack corruption, see notes) and thus don't use the `shield` input other then to eventually passthrough to `_post_mortem()`? \|_ thus drop `task_status` support for now as well. \|_ pretty sure correct soln is a debug-nursery around `._invoke()`. - since no longer using `extra_frames_up_when_async` inside `debug_func()`s ensure all public apis pass a `api_frame`. - re-impl our `tractor.post_mortem()` to directly call into `._pause()` instead of binding in via `partial` and mk it take similar input as `.pause()`. - drop `Lock.release()` from `_maybe_enter_pm()`, expose and pass expected frame and tb. - use necessary changes from all the above within `maybe_wait_for_debugger()` and `acquire_debug_lock()`. Lel, sorry thought that would be shorter.. There's still a lot more re-org to do particularly with `DebugStatus` encapsulation but it's coming in follow up.	2024-05-08 11:44:55 -04:00
Tyler Goodlet	a354732a9e	Allow `Stop` passthrough from `PldRx.recv_msg_w_pld()` Since we need to allow it (at the least) inside `drain_until_final_msg()` for handling stream-phase termination races where we don't want to have to handle a raised error from something like `Context.result()`. Expose the passthrough option via a `passthrough_non_pld_msgs: bool` kwarg. Add comprehensive comment to `current_pldrx()`.	2024-05-08 08:50:16 -04:00
Tyler Goodlet	fbc21a1dec	Add a "current IPC `Context`" `ContextVar` Expose it from `._state.current_ipc_ctx()` and set it inside `._rpc._invoke()` for child and inside `Portal.open_context()` for parent. Still need to write a few more tests (particularly demonstrating usage throughout multiple nested nurseries on each side) but this suffices as a proto for testing with some debugger request-from-subactor stuff. Other, - use new `.devx.pformat.add_div()` for ctxc messages. - add a block to always traceback dump on corrupted cs stacks. - better handle non-RAEs exception output-formatting in context termination summary log message. - use a summary for `start_status` for msg logging in RPC loop.	2024-05-07 15:35:45 -04:00
Tyler Goodlet	b278164f83	Mk `drain_to_final_msg()` never raise from `Error` Since we usually want them raised from some (internal) call to `Context.maybe_raise()` and NOT directly from the drainage call, make it possible via a new `raise_error: bool` to both `PldRx.recv_msg_w_pld()` and `.dec_msg()`. In support, - rename `return_msg` -> `result_msg` since we expect to return `Error`s. - do a `result_msg` assign and `break` in the `case Error()`. - add `**dec_msg_kwargs` passthrough for other `.dec_msg()` calling methods. Other, - drop/aggregate todo-notes around the main loop's `ctx._pld_rx.recv_msg_w_pld()` call. - add (configurable) frame hiding to most payload receive meths.	2024-05-06 13:43:51 -04:00
Tyler Goodlet	8ffa6a5e68	"Icons" in `._entry`'s subactor `.info()` messages Add a little `>` or `X` supervision icon indicating the spawning or termination of each sub-actor respectively.	2024-05-06 13:12:44 -04:00
Tyler Goodlet	7707e0e75a	Woops, make `log.devx()` level 600	2024-05-06 13:07:53 -04:00
Tyler Goodlet	523c24eb72	Move pformatters into new `.devx.pformat` Since `._code` is prolly gonna get renamed (to something "frame & stack tools" related) and to give a bit better organization. Also adds a new `add_div()` helper, factored out of ctxc message creation in `._rpc._invoke()`, for adding a little "header line" divider under a given `message: str` with a little math to center it.	2024-05-06 13:04:58 -04:00
Tyler Goodlet	544ff5ab4c	Change to `RemoteActorError.pformat()` For more sane manual calls as needed in logging purposes. Obvi remap the dunder methods to it. Other: - drop `hide_tb: bool` from `unpack_error()`, shouldn't need it since frame won't ever be part of any tb raised from returned error. - add a `is_invalid_payload: bool` to `_raise_from_unexpected_msg()` to be used from `PldRx` where we don't need to decode the IPC msg, just the payload; make the error message reflect this case. - drop commented `._portal._unwrap_msg()` since we've replaced it with `PldRx`'s delegation to newer `._raise_from_unexpected_msg()`. - hide the `Portal.result()` frame by default, again.	2024-05-06 13:01:56 -04:00
Tyler Goodlet	63c23d6b82	Add todo for rigorous struct-type spec of `SpawnSpec` fields	2024-04-30 13:01:07 -04:00
Tyler Goodlet	cca3206fd6	Use `log.devx()` for `stackscope` messages	2024-04-30 13:00:03 -04:00

1 2 3 4 5 ...

1115 Commits (5902ad2ffb79c3d0daf5c79feb2309af6fe678ac)