Since it's for beg (`BaseExceptionGroup`) filtering, the current impl
should be renamed anyway; it's not just for filtering cancelled excs.
Deats,
- added a real docstring, links to the official eg docs and fixed the
return typing.
- adjust all internal imports to match.
Turns out there were some subtle internal bugs discovered by the
just-added `tests/devx/test_tooling::test_crash_handler_cms` suite.
So this fixes,
- a mis-ordering around `rt_repl_fixture :=` in the logic of
`DebugStatus.maybe_enter_repl_fixture()`.
- `.devx.debug._post_mortem._post_mortem()` ensuring we now **always**
call `DebugStatus.release()`, and thus unwind the exit-stack managing
the `repl_fixture` exit/teardown, **even in the case** where
`yield False` is delivered from the user-fixture-fn (meaning
`enter_repl=False`) thus triggering an early `return` (as is done in
the new test suite).
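Roughly the gist of the fix (a sketch; the arg names and body here are
illustrative only, the real code lives in `.devx.debug._post_mortem`):

```python
def _post_mortem(repl, tb, api_frame) -> None:
    try:
        # user fixture may `yield False` => hard-bypass the REPL,
        enter_repl: bool = DebugStatus.maybe_enter_repl_fixture(repl)
        if not enter_repl:
            return  # early exit which previously skipped `.release()`!

        repl.interaction(api_frame, tb)

    finally:
        # ALWAYS unwind the exit-stack managing the `repl_fixture`,
        # even on the early-`return` path above.
        DebugStatus.release()
```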
Dropping the `_maybe_open_repl_fixture()` approach and instead using
a `DebugStatus._fixture_stack = ExitStack()` which provides for much
simpler support around both sync and async pausing APIs thanks to only
invoking `repl_fixture.__exit__()` once actual `PdbREPL` interaction is
complete!
Deats,
- all `repl_fixture` detection logic still happens in one place (the new
method) but we aren't limited to closing it via an immediate post REPL
`.__exit__()` call which instead is triggered by,
- `DebugStatus.release()` which now calls `._fixture_stack.close()` and
thus only invokes `repl_fixture.__exit__()` when user REPL-ing is
**actually complete** an arbitrary amount of debugging time later.
- include the notes for `@acm` support above the new method, though not
sure if they're as relevant any more?
Benefits,
- we can drop the previously added indent levels from
`_enter_repl_sync()` and `_post_mortem()`.
- now we automatically have support for the `.pause_from_sync()` API
since `_enter_repl_sync()` doesn't close the prior
`_maybe_open_repl_fixture()` immediately when `debug_func=None`; the
user's `__exit__()` is only ever called once `.release()` is.
Other,
- add big 'CASE' comments around the various blocks in
`.pause_from_sync()`, i was having trouble figuring out which i was
using from a `breakpoint()` in a dependent app..
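The stack-based pattern in sketch form (the lookup helper and method
bodies are hypothetical, names otherwise from above):

```python
from contextlib import ExitStack

class DebugStatus:
    # holds any entered `repl_fixture` until REPL-ing is done,
    _fixture_stack = ExitStack()

    @classmethod
    def maybe_enter_repl_fixture(cls, repl) -> bool:
        # all `repl_fixture` detection in one place; delivers
        # a `bool` signalling whether the REPL should be engaged.
        fixture = get_repl_fixture()  # hypothetical lookup helper
        if fixture is None:
            return True
        return cls._fixture_stack.enter_context(fixture())

    @classmethod
    def release(cls) -> None:
        # only NOW do we `repl_fixture.__exit__()`, an arbitrary
        # amount of debugging time after entry B)
        cls._fixture_stack.close()
```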
Which cleans out the pkg-mod to just the expected exports (with its
longstanding todo comment list) and thus gives a separation-of-concerns
and smaller mod-file sizes via the following new sub-mods:
- `._trace` for the `.pause()`/`breakpoint()`/`pdb.set_trace()`-style
APIs including all sync-caller variants.
- `._post_mortem` to contain our async `.post_mortem()` and all other
public crash handling APIs for use from sync callers.
- `._sync` for the high-level syncing helper-routines used throughout the
runtime to avoid multi-proc TTY use collisions.
And also,
- remove `hide_runtime_frames()` since it moved to `.devx._frame_stack`.
From what was originally the `.devx._debug` monolith module, since that
file was way out of ctl in terms of LoC!
New modules so far include,
- `._repl`: our `pdb[p]` ext type/lowlevel-APIs and `mk_pdb()` factory.
- `._sigint`: just our REPL-interaction shield-handler.
- `._tty_lock`: containing all the root-actor TTY mutex machinery
including the `Lock`/`DebugStatus` primitives/APIs as well as the
inter-tree IPC context eps:
* the server-side `lock_stdio_for_peer()` which pairs with the,
* client-(subactor)-side `request_root_stdio_lock()` via the,
* pld-msg-spec of `LockStatus/LockRelease`.
AND the `any_connected_locker_child()` predicate.
Factoring the (basically duplicate) content from both use spots into
a common `@cm` which delivers a `bool` signalling whether the REPL
should be engaged. Fixes a lingering bug with the `nullcontext()` call
btw..
By supporting a new optional param to `open_crash_handler()`,
`raise_on_exit: bool|Sequence[Type[BaseException]] = True` which
determines whether, after the REPL interaction completes, the handled
exception is raised upward. This is **very** handy for writing bits of
"debug-able but resilient code" as is the case in (many) dependent
projects/apps.
Impl,
- `raise_on_exit` can be a `bool` or (set) sequence of types which will
always be raised.
- also add a `BoxedMaybeException.raise_on_exit` equiv which (for now)
we check matches (in case down the road we want to offer dynamic ctls).
- rename both crash-handler cm's `tb_hide` -> `hide_tb`.
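Usage-wise this looks something like (a sketch; assuming the `.devx`
export and with `do_flaky_thing()` as stand-in app code):

```python
from tractor.devx import open_crash_handler

# "debug-able but resilient": REPL on crash but don't re-raise
# afterward so the app can continue,
with open_crash_handler(raise_on_exit=False):
    do_flaky_thing()

# or only ever re-raise a specific set of types post-REPL,
with open_crash_handler(raise_on_exit=(KeyboardInterrupt,)):
    do_flaky_thing()
```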
It turns out to be fairly useful to allow hooking into a given actor's
entry-and-exit around `.devx._debug._pause/._post_mortem()` calls which
engage the `pdbp.Pdb` REPL (really our `._debug.PdbREPL` but yeah).
Some very handy use cases include,
- swapping out-of-band (config) state that may otherwise halt the
user's app since the actor normally handles kb&mouse input, in thread,
which means that the handler will be blocked while the REPL is in use.
- (remotely) reporting actor-runtime state for monitoring purposes
around crashes or pauses in normal operation.
- allowing for crash-handling to be hard-disabled via
`._state._runtime_vars`, say for when you never want a debugger to be
entered in a production instance where you're not sure if (or don't
want) per-actor `debug_mode: bool` settings to always be unset, say bc
you're still debugging some edge cases that ow you'd normally want to
REPL up.
Impl details,
- add a new optional `._state._runtime_vars['repl_fixture']` field which
for now can be manually set; i saw no reason for a formal API yet
since we want to convert the `dict` to a struct anyway (first).
- augment both `.devx._debug._pause()/._post_mortem()` with a new
optional `repl_fixture: AbstractContextManager[bool]` kwarg which
when provided is `with repl_fixture()` opened around the lowlevel
REPL interaction calls; if the enter-result, an expected `bool`, is
`False` then the interaction is hard-bypassed.
* for the `._pause()` case the `@cm` is opened around the entire body
of the embedded `_enter_repl_sync()` closure (for now) though
ideally longer term this entire routine is factored to be a lot less
"nested" Bp
* in `_post_mortem()` the entire previous body is wrapped similarly
and it also now accepts an optional `boxed_maybe_exc: BoxedMaybeException`
only passed in the `open_crash_handler()` caller case.
- when the new runtime-var is overridden, (only manually atm) it is used
instead but only whenever the above `repl_fixture` kwarg is left null.
- add a `BoxedMaybeException.pformat() = __repr__()` which when
a `.value: Exception` is set renders a more "objecty" repr of the exc.
Obviously tests for all this should be coming soon!
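A sketch of wiring such a fixture (the input-handler hooks are
hypothetical app code; kwarg passthrough from the public APIs is
presumed):

```python
from contextlib import contextmanager
import tractor

@contextmanager
def my_repl_fixture():
    # swap out-of-band state which would otherwise be blocked
    # (in thread) while the REPL is in use..
    suspend_input_handlers()  # hypothetical app hook
    try:
        # the enter-result `bool`; `False` hard-bypasses REPL entry
        yield True
    finally:
        resume_input_handlers()  # hypothetical app hook

# set actor-global via the (for now manually set) runtime-var,
tractor._state._runtime_vars['repl_fixture'] = my_repl_fixture
```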
Such that the default is `None` and in the case where the caller *does
not* set the `pdb` arg to an explicit `bool` we instead determine it via
the output from `._state.is_debug_mode()` allowing for more "nonchalant"
usage throughout a (test) code base which passes the `debug_mode: bool`
as runtime config; allows delegation to the per-actor proc-global state.
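The gist of the default resolution (a toy helper only; in the actual cm
the check is presumably inlined):

```python
from tractor._state import is_debug_mode

def resolve_pdb_flag(pdb: bool|None = None) -> bool:
    # when the caller does *not* pass an explicit `bool`, delegate
    # to the per-actor proc-global debug-mode state (eg. as set via
    # `debug_mode: bool` runtime config).
    return is_debug_mode() if pdb is None else pdb
```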
Since I'd like to use some `reprlib` formatting which `modden` already
implemented (and it's a main dependent project), figured I'd just bring
it all into `.devx.pformat` for now.
Moving it from where i (oddly) first wrote it up in `._entry` to a more
proper place with its pals in `.devx.pformat` ;p
Iface summary, what the caller provides:
- `input_op`: a "sclang" chars-symbol to represent the conc "operation",
- `text`: the "entity" being *operated on*,
- `nest_prefix/indent`: what the ^ will be prefixed with.
- `prefix_op: bool` so that, when unset, the `input_op` is instead
used as a suffix to the first line of `text`.
- `next_indent: int|None = None` such that when set (and not null) we
use that exact ws-indent instead of calculating it from the
`len(nest_prefix)` allowing for specifying a `0`-indent easily.
- includes logic where we either get a non-zero value and
apply it strictly to both the `nest_prefix` and `text`, OR we
auto-calc it from any `nest_prefix`, NOT a conflation of both..
- `op_suffix: str = '\n'` to use instead of assuming `f'{input_op}\n'`.
- `rm_from_first_ln: str` which allows removing chars from the
first line of `text` after a `str.strip()` (handy for removing
the '<Channel' first chevron from type-reprs).
There's also a huge comment-doc for "sclang" in the fn body which is
the terrible "primer" on this whole idea for the moment XD
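A toy version to convey the iface (the fn name and body here are purely
illustrative, not the actual `.devx.pformat` code):

```python
def nest_from_op(
    input_op: str,           # "sclang" operation symbol, eg. ')>'
    text: str,               # the "entity" being operated on
    nest_prefix: str = '|_',
    next_indent: int|None = None,
    prefix_op: bool = True,  # unset => suffix the first line instead
    op_suffix: str = '\n',
    rm_from_first_ln: str = '',
) -> str:
    first, *rest = text.strip().splitlines()
    if rm_from_first_ln:
        first = first.strip(rm_from_first_ln)

    # either apply a caller-set indent strictly (`0` allowed) OR
    # auto-calc it from the `nest_prefix`, never a conflation of both.
    indent: int = (
        next_indent
        if next_indent is not None
        else len(nest_prefix)
    )
    body: str = '\n'.join(f'{" " * indent}{ln}' for ln in rest)

    first_ln: str = (
        f'{input_op}{op_suffix}{nest_prefix}{first}'
        if prefix_op
        else f'{nest_prefix}{first} {input_op}'
    )
    return f'{first_ln}\n{body}' if body else first_ln
```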
Oddly my env was borked bc a missing sub-dep (`typing-extensions`
apparently not added by `uv` for `stackscope`?) and then `stackscope`
was silently failing import and caused the shield-pause test to also
fail (since it couldn't match the expected `log.devx()` on console). The
import failure is not very explanatory due to the `log.warning()`;
change it to `.error()` level.
Also, explicitly import `_sync_pause_from_builtin` in
`examples/debugging/restore_builtin_breakpoint.py` to ensure the ref is
exported properly from `.devx.debug` (which it wasn't during dev of the
prior commit Bp).
Namely transferring the `Actor` peer-`Channel` tracking attrs,
- `._peers` which maps the uids to client channels (with duplicates
apparently..)
- the `._peer_connected: dict[tuple[str, str], trio.Event]` child-peer
syncing table mostly used by parent actors to wait on sub's to connect
back during spawn.
- the `._no_more_peers = trio.Event()` level-triggered state signal.
Further, we move over with some minor reworks,
- `.wait_for_peer()` verbatim (adjusting all dependants).
- factor the no-more-peers shielded wait branch-block out of
the end of `async_main()` into 2 new server meths,
* `.has_peers()` with optional chan-connected checking flag.
* `.wait_for_no_more_peers()` which *just* does the
maybe-shielded `._no_more_peers.wait()`
By borrowing from the implementation of `RemoteActorError.pformat()`
which is now factored into a new `.devx.pformat_exc()` and re-used for
both error types while maintaining the same func-sig. Obviously delegate
`RemoteActorError.pformat()` to the new helper accordingly, keeping the
prior `body` generation from `.devx.pformat_boxed_tb()` as before.
The new helper allows for,
- passing any of a `header|message|body: str` which are all combined in
that order in the final output.
- getting the `exc.message` as the default `message` part.
- generating an objecty-looking "type-name" header to be rendered by
default when `header` is not overridden.
- "first-line-of `message`" processing which we split-off and then
re-inject as a `f'<{type(exc).__name__}( {first} )>'` top line header.
- an optional `tail: str = '>'` to "close the object"-look only added
when `with_type_header: bool = True`.
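In rough sketch form (a minimal approximation, not the actual helper
impl):

```python
def pformat_exc(
    exc: BaseException,
    header: str = '',
    message: str = '',
    body: str = '',
    with_type_header: bool = True,
    tail: str = '>',
) -> str:
    # default the `message` part from the exc instance,
    if not message:
        message = getattr(exc, 'message', '') or str(exc)

    # split off `message`'s first line and re-inject it as an
    # "objecty"-looking type-name top line when `header` is not
    # overridden,
    if not header and with_type_header:
        first, _, message = message.partition('\n')
        header = f'<{type(exc).__name__}( {first} )'

    # combine header|message|body in that order,
    parts: list[str] = [p for p in (header, message, body) if p]
    if with_type_header:
        parts.append(tail)  # "close the object"-look
    return '\n'.join(parts)
```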
Adjustments to `TransportClosed` around this include,
- replacing the init `cause` arg for a `src_exc` which is now always
assigned to a same named instance var.
- displaying that new `.src_exc` in the `body: str` arg to the
`.devx.pformat.pformat_exc()` call so you can always see the
underlying (normally `trio`) source error.
- just make it inherit from `Exception` not `trio.BrokenResourceError`
to avoid handlers catching `TransportClosed` as the latter, particularly
in testing when we sometimes want to distinguish them.
Namely when the subactor fails to lock the root, in which case we
try to be very verbose about how/what failed in logging as well
as ensure we cancel the employed IPC ctx.
Implement the outer `BaseException` handler to handle both styles,
- match on an eg (or the prior std cancel excs) only raising a lone
sub-exc for the former.
- always `as _req_err:` and assign to a new func-global `req_err`
to enable the above matching.
Other,
- raise `DebugStateError` on `status.subactor_uid != actor_uid`.
- fix a `_repl_fail_report` ref error due to making silly assumptions
about the `_repl_fail_msg` global; now copy from global as default.
- various log-fmt and logic expression styling tweaks.
- ignore `trio.Cancelled` by default in `open_crash_handler()`.
Seems that on 3.13 it's not showing our script code in the output now?
Gotta get an example for @oremanj to see what's up but really it'd be
nice to just custom format stuff above `trio`'s runtime by def..
Anyway, update the `.devx._stackscope`,
- log formatting to be a little more "sclangy" lookin.
- change the per-actor "delimiter" lines style.
- report the `signal.getsignal(SIGINT)` which i needed in the
`sync_bp.py` with ctl-c causing a hang..
- mask the `_tree_dumped` duplicator log report as well as the "dumped
fine" one.
- add an example `pkill --signal SIGUSR1` cmdline.
Tweak the test to cope with,
- not showing our script lines now.. which i've commented in the
`assert_before()` patts..
- to expect the newly formatted delimiter (ascii) lines to separate the
root vs. hanger sub-actor sections.
Along the lines of something like `pytest.raises()` where the handled
exception can be inspected from the `pdbp` REPL using its `.value` field
B)
This is super handy in particular for understanding
`BaseException[Group]`s without manually adding surrounding handler code
to assign the `except[*] Exception as exc_var:` particularly when trying
to understand multi-cancelled eg trees.
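Eg. (a sketch; `bxerr` is just a local name for the yielded box):

```python
from tractor.devx import open_crash_handler

with open_crash_handler() as bxerr:
    raise BaseExceptionGroup(
        'multi-error',
        [ValueError('woops'), KeyError('nope')],
    )

# then from the `pdbp` REPL prompt you can poke at the eg tree,
# (Pdb++) bxerr.value
# (Pdb++) bxerr.value.exceptions
```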
When `.pause_from_sync()` is called from an `asyncio.Task` which was
never bestowed a portal we want to be mega pedantic about it; indicate
that the task was NOT spawned from our `.to_asyncio` API and likely by
some out-of-our-control code (normally using
`asyncio.ensure_future()/.create_task()`). Though `greenback` already
errors on such usage, it's not always clear why no portal exists;
explaining the situation of a 3rd-party-bg-spawned-task should avoid
dev confusion for most cases.
Impl deats,
- distinguish between an actor in infected mode versus the actual caller
of `.pause_from_sync()` being an `asyncio.Task` with more explicit
`asyncio_task` and `is_infected_aio` vars.
- ONLY in the case of being both an infected-mode-actor AND detecting
that the caller is an `asyncio.Task`, check `greenback.has_portal()`
such that when not bestowed we presume the aforementioned
3rd-party-bg-task case above and raise a new explicit RTE with
a detailed explanatory message.
- add some masked draft code for handling the special case of a root
actor `asyncio.Task` caller which could (in theory) not actually
require gb portal use since the `Lock` can be acquired directly
without IPC.
|_this will likely require factoring of various pause machinery funcs
into a `_pause_from_root_task()` to mk the impl sane XD
Other,
- expose a new `debug_filter: Callable` which can be provided by the
caller of `_maybe_enter_pm()` to predicate whether to enter the
debugger REPL based on the caught `BaseException|BaseExceptionGroup`;
this is handy for customizing the meaning of "graceful cancellations"
so as to avoid crash handling on expected egs of more than
`trio.Cancelled`.
|_ make the default as it was implemented: `not is_multi_cancelled(err)`
- pass-through a new `ignore: set[BaseException]` as
`open_crash_handler(ignore_nested=ignore)` to allow for the same
silent-cancellation-egs-swallowing as desired from outside the actor
runtime.
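Eg. a filter treating cancellation-only egs as "graceful" (a sketch
using the stdlib `BaseExceptionGroup.split()`):

```python
import trio

def not_just_cancelled(exc: BaseException) -> bool:
    # enter the REPL only when something *other* than cancellation
    # is buried in the (maybe nested) eg.
    if isinstance(exc, BaseExceptionGroup):
        _cancelled, rest = exc.split(trio.Cancelled)
        return rest is not None
    return not isinstance(exc, trio.Cancelled)

# then passed as eg.
# `_maybe_enter_pm(err, debug_filter=not_just_cancelled)`
```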
Mostly fixing edge cases with `asyncio` and/or bg threads where the
`.repl_release: trio.Event` needs to be used from the main `trio`
thread OW confusing-but-valid teardown tracebacks can show under various
races.
Also improve,
- log reporting for such internal bugs to make them more obvious on
console via `log.exception()`.
- only restore the SIGINT handler when runtime is (still) active.
- reporting when `tractor.pause(shield=True)` should be used and
unhiding the internal frames from the tb in that case.
- for `pause_from_sync()` some deep fixes..
|_add an `allow_no_runtime: bool = False` flag to allow
**not** requiring the actor runtime to be active.
|_fix the `greenback` case-branch to only trigger on `not
is_trio_thread`.
|_add a scope-global `repl_owner: Task|Thread|None = None` to
avoid ref errors..
Various `try`/`except` blocks around external APIs that raise when not
running inside a `tractor` runtime and/or some async framework (mostly
to avoid too-late/benign error tbs on certain classes of actor tree
teardown):
- for the `log.pdb()` prompts emitted before REPL console entry.
- inside `DebugStatus.is_main_trio_thread()`'s call to `sniffio`.
- in `_post_mortem()` by catching `NoRuntime` when called from a thread
still active after the `.open_root_actor()` has already exited.
Also,
- create a dedicated `DebugStateError` for raising instead of `assert`s
when we have actual debug-request inconsistencies (as seem to be most
likely with bg thread usage of `breakpoint()`).
- show the `open_crash_handler()` frame on `bdb.BdbQuit` (for now?)
Since it seems that `pdbp.xpm()` can sometimes lose the up-stack
traceback info/frames? Not sure why but ours seems to work just fine
from an `asyncio`-handler in `modden`'s use of `i3ipc` B)
Also call `DebugStatus.shield_sigint()` from `pause_from_sync()` in the
infected-`asyncio` case to get the same shielding behaviour as in all
other usage!
Mostly due to magic from @oremanj where we slap in a little bit of
`.from_asyncio`-type stuff to run a `trio`-task from `asyncio.Task`
code!
I'm not gonna go into too much detail but basically the primary
thing needed was a way to (blocking-ly) invoke a `trio.lowlevel.Task`
from an `asyncio` one (which we now have with a new
`run_trio_task_in_future()` thanks to draft code from the aforementioned
jefe) which we now invoke from a dedicated aio case-branch inside
`.devx._debug.pause_from_sync()`. Further include a case inside
`DebugStatus.release()` to handle using the same func to set the
`repl_release: trio.Event` from the aio side when releasing the REPL on
exit cmds.
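The core bridging idea in (heavily simplified) sketch form; presumes
the `TrioToken` was captured `trio`-side earlier and that `trio` is
running (guest-mode style) alongside the `asyncio` loop:

```python
from concurrent.futures import Future
import trio

def run_trio_task_in_future(
    async_fn,
    trio_token: trio.lowlevel.TrioToken,
) -> Future:
    # deliver the `trio`-task's result to the `asyncio`-side caller,
    fut: Future = Future()

    async def wrapper():
        try:
            fut.set_result(await async_fn())
        except trio.Cancelled:
            fut.cancel()
            raise  # `trio` requires `Cancelled` to propagate
        except BaseException as exc:
            fut.set_exception(exc)

    # re-enter the `trio` run-loop and spawn the task,
    trio_token.run_sync_soon(
        trio.lowlevel.spawn_system_task,
        wrapper,
    )
    return fut

# from the `asyncio.Task` side,
# result = await asyncio.wrap_future(
#     run_trio_task_in_future(some_trio_fn, token),
# )
```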
Prolly more refinements to come ;{o
By re-purposing our `pexpect`-based console matching with a new
`debugging/shield_hang_in_sub.py` example, this tests a few "hanging
actor" conditions more formally:
- that despite a hanging actor's task we can dump
a `stackscope.extract()` tree on relay of `SIGUSR1`.
- the actor tree will terminate despite a shielded forever-sleep by our
"T-800" zombie reaper machinery activating and hard killing the
underlying subprocess.
Some test deats:
- simulates the expected actions of a real user by manually using
`os.kill()` to send both signals to the actor-tree program.
- `pexpect`-matches against `log.devx()` emissions under normal
`debug_mode == True` usage.
- ensure we get the actual "T-800 deployed" `log.error()` msg and
that the actor tree eventually terminates!
Surrounding (re-org/impl/test-suite) changes:
- allow disabling usage via a `maybe_enable_greenback: bool` to
`open_root_actor()` but enable by def.
- pretty up the actual `.devx()` content from `.devx._stackscope`
including being extra pedantic about the conc-primitives for each signal
event.
- try to avoid double handles of `SIGUSR1` even though it seems the
original (what i thought was a) problem was actually just double
logging in the handler..
|_ avoid double applying the handler func via `signal.signal()`,
|_ use a global to avoid double handle func calls and,
|_ a `threading.RLock` around handling.
- move common fixtures and helper routines from `test_debugger` to
`tests/devx/conftest.py` and import them for use in both test mods.
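The double-handle guard in a nutshell (a sketch; the real handler lives
in `.devx._stackscope`):

```python
import signal
import threading

_handler_lock = threading.RLock()
_tree_dumped: bool = False  # global double-call guard

def dump_tree_on_sig(signum, frame) -> None:
    global _tree_dumped
    with _handler_lock:
        if _tree_dumped:
            return  # avoid double handler-func calls
        _tree_dumped = True
        # ... `stackscope.extract()` + log emission here ...

# avoid double-applying the handler func,
if signal.getsignal(signal.SIGUSR1) is not dump_tree_on_sig:
    signal.signal(signal.SIGUSR1, dump_tree_on_sig)
```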
The final issue was making sure we do the same thing on ctl-c/SIGINT
from the user. That is, if there's already a bg-thread in REPL, we
`log.pdb()` about SIGINT shielding and re-draw the prompt; the same UX
as normal actor-runtime-task behaviour.
Reasons this wasn't workin.. and the fix:
- `.pause_from_sync()` was overriding the local `repl` var with `None`
delivered by (transitive) calls to `_pause(debug_func=None)`.. so
remove all that and only assign it OAOO prior to thread-type case
branching.
- always call `DebugStatus.shield_sigint()` as needed from all requesting
threads/tasks:
- in `_pause_from_bg_root_thread()` BEFORE calling `._pause()` AND BEFORE
yielding back to the bg-thread via `.started(out)` to ensure we're
definitely overriding the handler in the `trio`-main-thread task
before unblocking the requesting bg-thread.
- from any requesting bg-thread in the root actor such that both its
main-`trio`-thread scheduled task (as per above bullet) AND it are
SIGINT shielded.
- always call `.shield_sigint()` BEFORE any `greenback._await()` case
(don't entirely grok why yet, but it works?)
- for `greenback._await()` case always set `bg_task` to the current one..
- tweaks to the `SIGINT` handler, now renamed `sigint_shield()` so as
not to name-collide with the methods when editor-searching:
- always try to `repr()` the REPL thread/task "owner" as well as the
active `PdbREPL` instance.
- add `.devx()` notes around the prompt flushing deats and comments
for any root-actor-bg-thread edge cases.
Related/supporting refinements:
- add `get_lock()`/`get_debug_req()` factory funcs since the plan is to
eventually implement both as `@singleton` instances per actor.
- fix `acquire_debug_lock()`'s call-sig-bug for scheduling
`request_root_stdio_lock()`..
- in `._pause()` only call `mk_pdb()` when `debug_func != None`.
- add some todo/warning notes around the `cls.repl = None` in
`DebugStatus.release()`.
`test_pause_from_sync()` tweaks:
- don't use an `attach_patts.copy()`, since we always `break` on match.
- do `pytest.fail()` on that ^ loop's fallthrough..
- pass `do_ctlc(child, patt=attach_key)` such that we always match the
current thread's name with the ctl-c triggered `.pdb()` emission.
- oh yeah, return the last `before: str` from `do_ctlc()`.
- in the script, flip `abandon_on_cancel=True` since when `False` it
seems to cause `trio.run()` to hang on exit from the last bg-thread
case?!?
In `lock_stdio_for_peer()` better internal-error handling/reporting:
- only `Lock._blocked.remove(ctx.cid)` if that same cid was added on
entry to avoid needless key-errors.
- drop all `Lock.release(force: bool)` usage remnants.
- if `req_ctx.cancel()` fails mention it with `ctx_err.add_note()`.
- add more explicit internal-failed-request log messaging via a new
`fail_reason: str`.
- use new `x)<=\n|_` annots in any failure logging.
Other cleanups/niceties:
- drop `force: bool` flag entirely from the `Lock.release()`.
- use more supervisor-op-annots in `.pdb()` logging
with both `_pause/crash_msg: str` instead of double '|' lines when
`.pdb()`-reported from `._set_trace()`/`._post_mortem()`.
Since we more or less require it for `tractor.pause_from_sync()` this
refines enable toggles and their relay down the actor tree as well as
more explicit logging around init and activation.
Tweaks summary:
- `.info()` report the module if discovered during root boot.
- use a `._state._runtime_vars['use_greenback']: bool` activation flag
inside `Actor._from_parent()` to determine if the sub should try to
use it and set to `False` if mod-loading fails / not installed.
- expose `maybe_init_greenback()` from `.devx` sugpkg.
- comment out RTE in `._pause()` for now since we already have it in
`.pause_from_sync()`.
- always `.exception()` on `maybe_init_greenback()` import errors to
clarify the underlying failure deats.
- always explicitly report if `._state._runtime_vars['use_greenback']`
was NOT set when `.pause_from_sync()` is called.
Other `._runtime.async_main()` adjustments:
- combine the "internal error call ur parents" message and the failed
registry contact status into one new `err_report: str`.
- drop the final exception handler's call to
`Actor.lifetime_stack.close()` since we're already doing it in the
`finally:` block and the earlier call has no currently known benefit.
- only report on the `.lifetime_stack()` callbacks if any are detected
as registered.
Namely passing the `.__pld_spec__` directly to the
`lock_stdio_for_peer()` decorator B)
Also, allows dropping `apply_debug_pldec()` (which was a todo) and
removing a `lock_stdio_for_peer()` indent level.
Hopefully this makes grok-ing this fairly sophisticated sub-sys
possible for any up-and-coming `tractor` hacker
XD
A lot of internal API and re-org ideas I discovered/realized as part of
finishing the `__pld_spec__` and multi-threaded support. Particularly
better isolation between root-actor vs subactor task APIs and generally
less globally-state-ful stuff like `DebugStatus` and `Lock` method APIs
would likely make a lot of the hard to follow edge cases more clear?
Functionally working for multi-threaded (via cpython threads spawned
from `trio.to_thread.run_sync()`) alongside subactors, tested (for
now) only with threads started inside the root actor (which seemed to
have the most issues in terms of the impl and special cases..) using the
new `tractor.pause_from_sync()` API!
Main implementation changes to `.pause_from_sync()`
------ - ------
- from the root actor, we need to ensure bg thread case is handled
*specially* since no IPC is used to request the TTY stdio mutex and
`Lock` (API) usage is conducted entirely from a local task or thread;
dedicated `Lock` usage for the root-actor already is branched inside
`._pause()` and needs similar handling from a root bg-thread:
|_for the special case of a root bg thread we need to
`trio`-main-thread schedule a bg task inside a new
`_pause_from_bg_root_thread()`. The new task needs to implement most
of what is handled inside `._pause()` manually, mostly because in
this root-actor-bg-thread case we have 2 constraints:
1. to enter `PdbREPL.interaction()` **from the bg thread** directly,
2. the task that `Lock._debug_lock.acquire()`s has to be the same
that calls `.release()` (a `trio.FIFOLock` constraint).
|_impl deats of this `_pause_from_bg_root_thread()` include:
- (for now) calling `._pause()` to acquire the `Lock._debug_lock`.
- setting its own `DebugStatus.repl_release`.
- calling `.DebugStatus.shield_sigint()` to ensure the root's
main thread uses the right handler when the bg one is REPL-ing.
- wait manually on `.repl_release` to be set by the thread's
dedicated `PdbREPL` exit.
- manually calling `Lock.release()` from the **same task** that
acquired it.
- expect calls to `._pause()` to deliver a `tuple[Task, PdbREPL]` such
that we always get the handle both to any newly created REPL instance
and (maybe) the scheduled bg task within which it runs.
- compose a single `message: str`, built up per case-branch, for
`log.devx()` style logging.
- ensure both `DebugStatus.repl` and `.repl_task` are set **just
before** calling `._set_trace()` to ensure the correct `Task|Thread`
is set when the REPL is finally entered from sync code.
- add a wrapping caller `_sync_pause_from_builtin()` which passes in the
new `called_from_builtin=True` to indicate `breakpoint()` caller
usage, obvi pass in `api_frame`.
Changes to `._pause()` in support of ^
------ - ------
- `TaskStatus.started()` and return the `tuple[Task, PdbREPL]` to
callers / starters.
- only call `DebugStatus.shield_sigint()` when no `repl` passed bc some
callers (like bg threads) may need to apply it at some specific point
themselves.
- tweak some asserts for the `debug_func == None` / non-`trio`-thread
case.
- add a mod-level `_repl_fail_msg: str` to be used when there's an
internal `._pause()` failure for testing, easier to pexpect match.
- more comprehensive logging for the root-actor branched case to
(attempt to) indicate any of the 3 cases:
- remote ctx from subactor has the `Lock`,
- already existing root task or thread has it or,
- some kinda stale `.locked()` situation where the root has the lock
but we don't know why.
- for root usage, revert to always `await Lock._debug_lock.acquire()`-ing
despite `called_from_sync` since `.pause_from_sync()` was reworked to
instead handle the special bg thread case in the new
`_pause_from_bg_root_thread()` task.
- always do `return _enter_repl_sync(debug_func)`.
- try to report any `repl_task: Task|Thread` set by the caller
(particularly for the bg thread cases) as being the thread or task
`._pause()` was called "on behalf of"
Changes to `DebugStatus`/`Lock` in support of ^
------ - ------
- only call `Lock.release()` from `DebugStatus.set_[quit/continue]()`
when called from the main `trio` thread and always call
`DebugStatus.release()` **after** to ensure `.repl_release` is set
**after** `._debug_lock.release()`.
- only call `.repl_release.set()` from `trio` thread otherwise use
`.from_thread.run()`.
- much more refinements in `Lock.release()` for threading cases:
- return `bool` to indicate whether lock was released by caller.
- mask (in prep to drop) `_pause()` usage of
`Lock.release(force=True)` since forcing a release can't ever avoid
the RTE from `trio`.. same task **must** acquire/release.
- don't allow usage from non-`trio`-main-threads, ever; there's no
point given the same-task-must-manage-the-`FIFOLock` constraint.
- much more detailed logging using `message`-building-style for all
caller (edge) cases.
|_ use a `we_released: bool` to determine failed-to-release edge
cases which can happen if called from bg threads, ensure we
`log.exception()` on any incorrect usage resulting in release
failure.
|_ complain loudly if the release fails and some other task/thread
still holds the lock.
|_ be explicit about "who" (which task or thread) the release is "on
behalf of" by reading `DebugStatus.repl_task` since the caller
isn't the REPL operator in many sync cases.
- more or less drop `force` support, as mentioned above.
- ensure we unset `._owned_by_root` if the caller is a root task.
Other misc
------ - ------
- rename `lock_tty_for_child()` -> `lock_stdio_for_peer()`.
- rejig `Lock.repr()` to show lock and event stats.
- stage `Lock.stats` and `.owner` methods in prep for doing a singleton
instance and `@property`s.
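End-to-end this enables usage like (a sketch mirroring the test
scenario):

```python
import trio
import tractor

def in_bg_thread() -> None:
    # blocking REPL entry from a non-`trio` (cpython) thread,
    tractor.pause_from_sync()

async def main():
    async with tractor.open_nursery(debug_mode=True):
        # bg threads + subactors can now take exclusive REPL turns,
        await trio.to_thread.run_sync(in_bg_thread)

if __name__ == '__main__':
    trio.run(main)
```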
Originally discovered while using `tractor.pause_from_sync()`
from the `i3ipc` client running in a bg-thread that uses `asyncio`
inside `modden`.
Turns out we definitely aren't correctly handling `.pause_from_sync()`
from the root actor when called from a `trio.to_thread.run_sync()`
bg thread:
- root-actor bg threads which can't `Lock._debug_lock.acquire()` since
they aren't in `trio.Task`s.
- even if scheduled via `.to_thread.run_sync(_debug._pause)` the
acquirer won't be the task/thread which calls `Lock.release()` from
`PdbREPL` hooks; this results in a RTE raised by `trio`..
- multiple threads will step on each other's stdio since cpython's GIL
seems to ctx switch threads on every input from the user to the REPL
loop..
Reproduce via reworking our example and test so that they catch and fail
for all edge cases:
- rework the `/examples/debugging/sync_bp.py` example to demonstrate the
above issues, namely the stdio clobbering in the REPL when multiple
threads and/or a subactor try to debug simultaneously.
|_ run one thread using a task nursery to ensure it runs conc with the
nursery's parent task.
|_ ensure the bg threads run conc with a subactor's usage of
`.pause_from_sync()`.
|_ gravely detail all the special cases inside a TODO comment.
|_ add some control flags to `sync_pause()` helper and don't use
`breakpoint()` by default.
- extend and adjust `test_debugger.test_pause_from_sync` to match (and
thus currently fail) by ensuring exclusive `PdbREPL` attachment when
the 2 bg root-actor threads are concurrently interacting alongside the
subactor:
|_ should only see one of the `_pause_msg` logs at a time for either
one of the threads or the subactor.
|_ ensure each attaches (in no particular order) before expecting the
script to exit.
Impl adjustments to `.devx._debug`:
- drop `Lock.repl`, no longer used.
- add `Lock._owned_by_root: bool` for the `.ctx_in_debug == None`
root-actor-task active case.
- always `log.exception()` for any `._debug_lock.release()` ownership
RTE emitted by `trio`, like we used to..
- add special `Lock.release()` log message for the stale lock but
`._owned_by_root == True` case; oh yeah and actually
`log.devx(message)`..
- rename `Lock.acquire()` -> `.acquire_for_ctx()` since it's only ever
used from subactor IPC usage; well that and for local root-task
usage we should prolly add a `.acquire_from_root_task()`?
- buncha `._pause()` impl improvements:
|_ type `._pause()`'s `debug_func` as a `partial` as well.
|_ offer `called_from_sync: bool` and `called_from_bg_thread: bool`
for the special case handling when called from `.pause_from_sync()`
|_ only set `DebugStatus.repl/repl_task` when `debug_func != None`
(OW ensure the `.repl_task` is not the current one).
|_ handle error logging even when `debug_func is None`..
|_ lotsa detailed commentary around root-actor-bg-thread special cases.
- when `._set_trace(hide_tb=False)` do `pdbp.set_trace(frame=currentframe())`
so the `._debug` internal frames are always included.
- by default always hide tracebacks for `.pause[_from_sync]()` internals.
- improve `.pause_from_sync()` to avoid root-bg-thread crashes:
|_ pass new `called_from_xxx_` flags and ensure `DebugStatus.repl_task`
is actually set to the `threading.current_thread()` when needed.
|_ manually call `Lock._debug_lock.acquire_nowait()` for the non-bg
thread case.
|_ TODO: still need to implement the bg-thread case using a bg
`trio.Task`-in-thread with an `trio.Event` set by thread REPL exit.
It's been a long time prepped and now finally implemented!
Offer a `shield: bool` argument from our async `._debug` APIs:
- `await tractor.pause(shield=True)`,
- `await tractor.post_mortem(shield=True)`
^-These-^ can now be used inside cancelled `trio.CancelScope`s,
something very handy when introspecting complex (distributed) system
tear/shut-downs particularly under remote error or (inter-peer)
cancellation conditions B)
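Quick usage sketch:

```python
import trio
import tractor

async def inspect_cancelled_teardown():
    with trio.CancelScope() as cs:
        cs.cancel()
        try:
            await trio.sleep_forever()
        except trio.Cancelled:
            # REPL-able despite the already-cancelled scope B)
            await tractor.pause(shield=True)
            raise
```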
Thanks to previous prepping in a prior attempt and various patches from
the rigorous rework of `.devx._debug` internals around typed msg specs,
there ain't much that was needed!
Impl deats
- obvi passthrough `shield` from the public API endpoints (was already
done from a prior attempt).
- put ad-hoc internal `with trio.CancelScope(shield=shield):` around all
checkpoints inside `._pause()` for both the root-process and subactor
case branches.
Add a fairly rigorous example, `examples/debugging/shielded_pause.py`
with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and
ensure it covers as many cases as i can think of offhand:
- multiple `.pause()` entries in a loop despite parent scope
cancellation in a subactor RPC task which itself spawns a sub-task.
- a `trio.Nursery.parent_task` which raises, is handled and
tries to enter an unshielded `.post_mortem()`, which of course
internally raises `Cancelled` in a `._pause()` checkpoint, so we catch
the `Cancelled` again and then debug the debugger's internal
cancellation with specific checks for the particular raising
checkpoint-LOC.
- do ^- the latter -^ for both subactor and root cases to ensure we
can debug `._pause()` itself when it tries to REPL engage from
a cancelled task scope Bo
Call it `hide_runtime_frames()` and stick all the lines from the top of
the `._debug` mod in there along with a little `log.devx()` emission on
what gets hidden by default ;)
Other,
- fix ref-error where the internal-error handler might trigger despite
the debug `req_ctx` not yet having init-ed, such that we don't try to
cancel or log about it when it never was fully created/initialized..
- fix assignment typo inside `_set_trace()` for `task`.. lel
More or less by pedantically separating and managing root and subactor
request syncing events to always be managed by the locking IPC context
task-funcs:
- for the root's "child"-side, `lock_tty_for_child()` directly creates
and sets a new `Lock.req_handler_finished` inside a `finally:`
- for the sub's "parent"-side, `request_root_stdio_lock()` does the same
with a new `DebugStatus.req_finished` event and separates it from
the `.repl_release` event (which indicates a "c" or "q" from user and
thus exit of the REPL session) as well as sets a new `.req_task:
trio.Task` to explicitly distinguish from the app-user-task that
enters the REPL vs. the paired bg task used to request the global
root's stdio mutex alongside it.
- apply the `__pld_spec__` on "child"-side of the ctx using the new
`Portal.open_context(pld_spec)` parameter support; drops use of any
`ContextVar` malarky used prior for `PldRx` mgmt.
- removing `Lock.no_remote_has_tty` since it was a nebulous name and
from the prior "everything is in a `Lock`" design..
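The event-split in sketch form (attr names from above, sig args assumed
and everything else elided down to just the sync logic):

```python
import trio

class DebugStatus:
    # set on 'c'/'q' from the user, ie. REPL session exit,
    repl_release: trio.Event = trio.Event()
    # set only once the request task-func fully tears down,
    req_finished: trio.Event = trio.Event()
    req_task: trio.lowlevel.Task|None = None

async def request_root_stdio_lock(actor_uid, task_uid):
    DebugStatus.req_task = trio.lowlevel.current_task()
    try:
        # ... acquire the root's stdio mutex over IPC then wait
        # for the (paired) app-user-task to exit its REPL ...
        await DebugStatus.repl_release.wait()
    finally:
        # any multi-task contention in this subactor waits on this,
        DebugStatus.req_finished.set()
```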
------ - ------
More rigorous impl to handle various edge cases in `._pause()`:
- rejig `_enter_repl_sync()` to wrap the `debug_func == None` case
inside maybe-internal-error handler blocks.
- better logic for recurrent vs. multi-task contention for REPL entry in
subactors, by guarding using `DebugStatus.req_task` and by now waiting
on the new `DebugStatus.req_finished` for the multi-task contention
case.
- even better internal error handling and reporting for when this code
is hacked on and possibly broken ;p
------ - ------
Updates to `.pause_from_sync()` support:
- add optional `actor`, `task` kwargs to `_set_trace()` to allow
compat with the new explicit `debug_func` calling in `._pause()` and
pass a `threading.Thread` for `task` in the `.to_thread()` usage case.
- add an `except` block that tries to show the frame on any internal
error.
------ - ------
Relatedly includes a buncha cleanups/simplifications somewhat in
prep for some coming refinements (around `DebugStatus`):
- use all the new attrs mentioned above as needed in the SIGINT shielder.
- wait on `Lock.req_handler_finished` in `maybe_wait_for_debugger()`.
- dropping a ton of masked legacy code left in during the recent reworks.
- better comments, like on the use of `Context._scope` for shielding on
the "child"-side to avoid the need to manage yet another cs.
- add/change-to lotsa `log.devx()` level emissions for those infos which
are handy while hacking on the debugger but not ideal/necessary to be
user visible.
- obvi add lotsa follow up todo notes!
Finally got this working so that if/when an internal bug is introduced
to this request task-func, we can actually REPL-debug the lock request
task itself B)
As in, if the subactor's lock request task internally errors we,
- ensure the task always terminates (by calling `DebugStatus.release()`)
and explicitly reports (via a `log.exception()`) the internal error.
- capture the error instance and set as a new `DebugStatus.req_err` and
always check for it on final teardown - in which case we also,
- ensure it's reraised from a new `DebugRequestError`.
- unhide the stack frames for `_pause()`, `_enter_repl_sync()` so that
the dev can upward inspect the `_pause()` call stack sanely.
Supporting internal impl changes,
- add `DebugStatus.cancel()` and `.req_err`.
- don't ever cancel the request task from
`PdbREPL.set_[continue/quit]()`; only do so when there's some internal
error that would likely result in a hang and stale lock state with the
root.
- only release the root's lock when the current task is also the owner
(avoids bad release errors).
- also show internal `._pause()`-related frames on any `repl_err`.
Other temp-dev-tweaks,
- make pld-dec change log msgs info level again while solving this
final context-vars race stuff..
- drop the debug pld-dec instance match asserts for now since
the problem is already caught (and now debug-able B) by an attr-error
on the decoded-as-`dict` started msg, and instead add in
a `log.exception()` trace to see which task is triggering the case
where the debug `MsgDec` isn't set correctly vs. when we think it's
being applied.
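The capture-then-reraise flow in brief (a sketch; the wrapper fn name
is hypothetical, other names from above):

```python
async def _request_and_maybe_reraise(actor_uid, task_uid) -> None:
    try:
        await request_root_stdio_lock(actor_uid, task_uid)
    except BaseException as _req_err:
        DebugStatus.req_err = _req_err  # stash for final teardown
        DebugStatus.release()  # ensure the task always terminates
        log.exception('Internal error in debug-request task!')

    # on final teardown ensure any stashed error is re-raised
    # boxed in a `DebugRequestError`,
    if (req_err := DebugStatus.req_err) is not None:
        raise DebugRequestError(
            'Internal debug-request machinery failed!'
        ) from req_err
```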
Since obviously the thread is likely expected to halt and raise after
the REPL session exits; this was a regression from the prior impl. The
main reason is that otherwise the request task will never unblock if
the user steps through the crashed task using 'next', because the
`.do_next()` handler doesn't release the request by default; in the
`.pause()` case doing so would end the session too early.
Other,
- toss in draft `Pdb.user_exception()`, though it doesn't seem to ever
trigger?
- only release `Lock._debug_lock` when already locked.