Commit Graph

6 Commits (05b143d9ef874e420d154a56fce00044de1b43a2)

Author SHA1 Message Date
Tyler Goodlet 05b143d9ef Big debugger rework, more tolerance for internal err-hangs
Since i was running into them (internal errors) during lock request
machinery dev and was getting all sorts of difficult to understand hangs
whenever i intro-ed a bug to either side of the ipc ctx; this all while
trying to get the msg-spec working for `Lock` requesting subactors..

Deats:
- hideframes for `@acm`s and `trio.Event.wait()`, `Lock.release()`.
- better detail out the `Lock.acquire/release()` impls
- drop `Lock.remote_task_in_debug`, use new `.ctx_in_debug`.
- add a `Lock.release(force: bool)`.
- move most of what was `_acquire_debug_lock_from_root_task()` and some
  of the `lock_tty_for_child().__a[enter/exit]()` logic into
  `Lock.[acquire/release]()`  including bunch more logging.
- move `lock_tty_for_child()` up in the module to below `Lock`, with
  some rework:
  - drop `subactor_uid: tuple` arg since we can just use the `ctx`..
  - add exception handler blocks for reporting internal (impl) errors
    and always force release the lock in such cases.
- extend `DebugStatus` (prolly will rename to `DebugRequest` btw):
  - add `.req_ctx: Context` for subactor side.
  - add `.req_finished: trio.Event` to sub to signal request task exit.
  - extend `.shield_sigint()` doc-str.
  - add `.release()` to encaps all the state mgmt previously strewn
    about inside `._pause()`..
- use new `DebugStatus.release()` to replace all the duplication:
  - inside `PdbREPL.set_[continue/quit]()`.
  - inside `._pause()` for the subactor branch on internal
    repl-invocation error cases,
  - in the `_enter_repl_sync()` closure on error,
- replace `apply_debug_codec()` -> `apply_debug_pldec()` in tandem with
  the new `PldRx` sub-sys  which handles the new `__pld_spec__`.
- add a new `pformat_cs()` helper orig to help debug cs stack
  a corruption; going to move to `.devx.pformat` obvi.
- rename `wait_for_parent_stdin_hijack()` -> `request_root_stdio_lock()`
  with improvements:
  - better doc-str and add todos,
  - use `DebugStatus` more stringently to encaps all subactor req state.
  - error handling blocks for cancellation and straight up impl errors
    directly around the `.open_context()` block with the latter doing
    a `ctx.cancel()` to avoid hanging in the shielded `.req_cs` scope.
  - similar exc blocks for the func's overall body with explicit
    `log.exception()` reporting.
  - only set the new `DebugStatus.req_finished: trio.Event` in `finally`.
- rename `mk_mpdb()` -> `mk_pdb()` and don't cal `.shield_sigint()`
  implicitly since the caller usage does matter for this.
- factor out `any_connected_locker_child()` from the SIGINT handler.
- rework SIGINT handler to better handle any stale-lock/hang cases:
  - use new `Lock.ctx_in_debug: Context` to detect subactor-in-debug.
    and use it to cancel any lock request instead of the lower level
  - use `problem: str` summary approach to log emissions.
- rework `_pause()` given all of the above, stuff not yet mentioned:
  - don't take `shield: bool` input and proxy to `debug_func()` (for now).
  - drop `extra_frames_up_when_async: int` usage, expect
    `**debug_func_kwargs` to passthrough an `api_frame: Frametype` (more
    on this later).
  - lotsa asserts around the request ctx vs. task-in-debug ctx using new
    `current_ipc_ctx()`.
  - asserts around `DebugStatus` state.
- rework and simplify the `debug_func` hooks,
  `_set_trace()`/`_post_mortem()`:
  - make them accept a non-optional `repl: PdbRepl` and `api_frame:
    FrameType` which should be used to set the current frame when the
    REPL engages.
  - always hide the hook frames.
  - always accept a `tb: TracebackType` to `_post_mortem()`.
   |_ copy and re-impl what was the delegation to
     `pdbp.xpm()`/`pdbp.post_mortem()` and instead call the
     underlying `Pdb.interaction()` ourselves with a `caller_frame`
     and tb instance.
- adjust the public `.pause()` impl:
  - accept optional `hide_tb` and `api_frame` inputs.
  - mask opening a cancel-scope for now (can cause `trio` stack
    corruption, see notes) and thus don't use the `shield` input other
    then to eventually passthrough to `_post_mortem()`?
   |_ thus drop `task_status` support for now as well.
   |_ pretty sure correct soln is a debug-nursery around `._invoke()`.
- since no longer using `extra_frames_up_when_async` inside
  `debug_func()`s ensure all public apis pass a `api_frame`.
- re-impl our `tractor.post_mortem()` to directly call into `._pause()`
  instead of binding in via `partial` and mk it take similar input as
  `.pause()`.
- drop `Lock.release()` from `_maybe_enter_pm()`, expose and pass
  expected frame and tb.
- use necessary changes from all the above within
  `maybe_wait_for_debugger()` and `acquire_debug_lock()`.

Lel, sorry thought that would be shorter..
There's still a lot more re-org to do particularly with `DebugStatus`
encapsulation but it's coming in follow up.
2024-05-08 11:44:55 -04:00
Tyler Goodlet a73b24cf4a First draft, sub-msg-spec for debugger `Lock` sys
Since it's totes possible to have a spec applied that won't permit
`str`s, might as well formalize a small msg set for subactors to request
the tree-wide TTY `Lock`.

BTW, I'm prolly not going into every single change here in this first
WIP since there's still a variety of broken stuff mostly to do with
races on the codec apply being done in a `trio.lowleve.RunVar`; it
should be re-done with a `ContextVar` such that each task does NOT
mutate the global setting..

New msg set and usage is simply:
- `LockStatus` which is the reponse msg delivered from `lock_tty_for_child()`
- `LockRelease` a one-off request msg from the subactor to drop the
  `Lock` from a `MsgStream.send()`.
- use these msgs throughout the root and sub sides of the locking
  ctx funcs: `lock_tty_for_child()` & `wait_for_parent_stdin_hijack()`

The codec is now applied in both the root and sub `Lock` request tasks:
- for root inside `lock_tty_for_child()` before the `.started()`.
- for subs, inside `wait_for_parent_stdin_hijack()` since we only want
  to affect the codec *for the locking task*.
  - (hence the need for ctx-var as mentioned above but currently this
    can cause races which will break against other app tasks competing
    for the codec setting).
- add a `apply_debug_codec()` helper for use in both cases.
- add more detailed logging to both the root and sub side of `Lock`
  requesting funcs including requiring that the sub-side task "uid" (a
  `tuple[str, int]` = (trio.Task.name, id(trio.Task)` be provided (more
  on this later).

A main issue discovered while proto-testing all this was the ability of
a sub to "double lock" (leading to self-deadlock) via an error in
`wait_for_parent_stdin_hijack()` which, for ex., can happen in debug
mode via crash handling of a `MsgTypeError` received from the root
during a codec applied msg-spec race! Originally I was attempting to
solve this by making the SIGINT override handler more resilient but this
case is somewhat impossible to detect by an external root task other
then checking for duplicate ownership via the new `subactor_task_uid`.
=> SO NOW, we always stick the current task uid in the
   `Lock._blocked: set` and raise an rte on a double request by the same
   remote task.

Included is a variety of small refinements:
- finally figured out how to mark a variety of `.__exit__()` frames with
  `pdbp.hideframe()` to actually hide them B)
- add cls methods around managing `Lock._locking_task_cs` from root only.
- re-org all the `Lock` attrs into those only used in root vs. subactors
  and proto-prep a new `DebugStatus` actor-singleton to be used in subs.
- add a `Lock.repr()` to contextually print the current conc primitives.
- rename our `Pdb`-subtype to `PdbREPL`.
- rigor out the SIGINT handler a bit, originally to try and hack-solve
  the double-lock issue mentioned above, but now just with better
  logging and logic for most (all?) possible hang cases that should be
  hang-recoverable after enough ctrl-c mashing by the user.. well
  hopefully:
  - using `Lock.repr()` for both root and sub cases.
  - lots more `log.warn()`s and handler reversions on stale lock or cs
    detection.
- factor `._pause()` impl a little better moving the actual repl entry
  to a new `_enter_repl_sync()` (originally for easier wrapping in the
  sub case with `apply_codec()`).
2024-04-16 13:25:19 -04:00
Tyler Goodlet 1f7f84fdfa Mk debugger tests work for arbitrary pre-REPL format
Since this was changed as part of overall project wide logging format
updates, and i ended up changing the both the crash and pause `.pdb()`
msgs to include some multi-line-ascii-"stuff", might as well make the
pre-prompt checks in the test suite more flexible to match.

As such, this exposes 2 new constants inside the `.devx._debug` mod:
- `._pause_msg: str` for the pre `tractor.pause()` header emitted via
  `log.pdb()` and,
- `._crash_msg: str` for the pre `._post_mortem()` equiv when handling
  errors in debug mode.

Adjust the test suite to use these values and thus make us more capable
to absorb changes in the future as well:
- add a new `in_prompt_msg()` predicate, very similar to `assert_before()`
  but minus `assert`s which takes in a `parts: list[str]` to match
  in the pre-prompt stdout.
- delegate to `in_prompt_msg()` in `assert_before()` since it was mostly
  duplicate minus `assert`.
- adjust all previous `<patt> in before` asserts to instead use
  `in_prompt_msg()` with separated pre-prompt-header vs. actor-name
  `parts`.
- use new `._pause/crash_msg` values in all such calls including any
  `assert_before()` cases.
2024-03-05 12:22:04 -05:00
Tyler Goodlet bf0739c194 Add `stackscope` tree pprinter triggered by SIGUSR1
Can be optionally enabled via a new `enable_stack_on_sig()` which will
swap in the SIGUSR1 handler. Much thanks to @oremanj for writing this
amazing project, it's thus far helped me fix some very subtle hangs
inside our new IPC-context cancellation machinery that would have
otherwise taken much more manual pdb-ing and hair pulling XD

Full credit for `dump_task_tree()` goes to the original project author
with some minor tweaks as was handed to me via the trio-general matrix
room B)

Slight changes from orig version:
- use a `log.pdb()` emission to pprint to console
- toss in an ex sh CLI cmd to trigger the dump from another terminal
  using `kill` + `pgrep`.
2024-02-20 09:05:34 -05:00
Tyler Goodlet e94f1261b5 Move `maybe_open_crash_handler()` CLI `--pdb`-driven wrapper to debug mod 2023-10-02 18:10:34 -04:00
Tyler Goodlet fa9a9cfb1d Kick off `.devx` subpkg for our dev tools B)
Where `.devx` is "developer experience", a hopefully broad enough subpkg
name for all the slick stuff planned to augment working on the actor
runtime 💥

Move the `._debug` module into the new subpkg and adjust rest of core
code base to reflect import path change. Also add a new
`.devx._debug.open_crash_handler()` manager for wrapping any sync code
outside a `trio.run()` which is handy for eventual CLI addons for
popular frameworks like `click`/`typer`.
2023-09-28 14:14:50 -04:00