Commit Graph

90 Commits (a6058d14ae8be5d9de718be5dfcfac4fc399c837)

Author SHA1 Message Date
Tyler Goodlet 30d60379c1 Drop thread logging to make `log.pdb()` patts match in test 2024-06-07 22:35:59 -04:00
Tyler Goodlet 408a74784e Catch `.pause_from_sync()` in root bg thread bugs!
Originally discovered as while using `tractor.pause_from_sync()`
from the `i3ipc` client running in a bg-thread that uses `asyncio`
inside `modden`.

Turns out we definitely aren't correctly handling `.pause_from_sync()`
from the root actor when called from a `trio.to_thread.run_sync()`
bg thread:
- root-actor bg threads which can't `Lock._debug_lock.acquire()` since
  they aren't in `trio.Task`s.
- even if scheduled via `.to_thread.run_sync(_debug._pause)` the
  acquirer won't be the task/thread which calls `Lock.release()` from
  `PdbREPL` hooks; this results in a RTE raised by `trio`..
- multiple threads will step on each other's stdio since cpython's GIL
  seems to ctx switch threads on every input from the user to the REPL
  loop..

Reproduce via reworking our example and test so that they catch and fail
for all edge cases:
- rework the `/examples/debugging/sync_bp.py` example to demonstrate the
  above issues, namely the stdio clobbering in the REPL when multiple
  threads and/or a subactor try to debug simultaneously.
  |_ run one thread using a task nursery to ensure it runs conc with the
     nursery's parent task.
  |_ ensure the bg threads run conc a subactor usage of
     `.pause_from_sync()`.
  |_ gravely detail all the special cases inside a TODO comment.
  |_ add some control flags to `sync_pause()` helper and don't use
     `breakpoint()` by default.
- extend and adjust `test_debugger.test_pause_from_sync` to match (and
  thus currently fail) by ensuring exclusive `PdbREPL` attachment when
  the 2 bg root-actor threads are concurrently interacting alongside the
  subactor:
  |_ should only see one of the `_pause_msg` logs at a time for either
     one of the threads or the subactor.
  |_ ensure each attaches (in no particular order) before expecting the
     script to exit.

Impl adjustments to `.devx._debug`:
- drop `Lock.repl`, no longer used.
- add `Lock._owned_by_root: bool` for the `.ctx_in_debug == None`
  root-actor-task active case.
- always `log.exception()` for any `._debug_lock.release()` ownership
  RTE emitted by `trio`, like we used to..
- add special `Lock.release()` log message for the stale lock but
  `._owned_by_root == True` case; oh yeah and actually
  `log.devx(message)`..
- rename `Lock.acquire()` -> `.acquire_for_ctx()` since it's only ever
  used from subactor IPC usage; well that and for local root-task
  usage we should prolly add a `.acquire_from_root_task()`?
- buncha `._pause()` impl improvements:
 |_ type `._pause()`'s `debug_func` as a `partial` as well.
 |_ offer `called_from_sync: bool` and `called_from_bg_thread: bool`
    for the special case handling when called from `.pause_from_sync()`
 |_ only set `DebugStatus.repl/repl_task` when `debug_func != None`
   (OW ensure the `.repl_task` is not the current one).
 |_ handle error logging even when `debug_func is None`..
 |_ lotsa detailed commentary around root-actor-bg-thread special cases.
- when `._set_trace(hide_tb=False)` do `pdbp.set_trace(frame=currentframe())`
  so the `._debug` internal frames are always included.
- by default always hide tracebacks for `.pause[_from_sync]()` internals.
- improve `.pause_from_sync()` to avoid root-bg-thread crashes:
 |_ pass new `called_from_xxx_` flags and ensure `DebugStatus.repl_task`
    is actually set to the `threading.current_thread()` when needed.
 |_ manually call `Lock._debug_lock.acquire_nowait()` for the non-bg
    thread case.
 |_ TODO: still need to implement the bg-thread case using a bg
    `trio.Task`-in-thread with an `trio.Event` set by thread REPL exit.
2024-06-06 16:56:30 -04:00
Tyler Goodlet 8ea0f08386 Finally, officially support shielded REPL-ing!
It's been a long time prepped and now finally implemented!

Offer a `shield: bool` argument from our async `._debug` APIs:
- `await tractor.pause(shield=True)`,
- `await tractor.post_mortem(shield=True)`

^-These-^ can now be used inside cancelled `trio.CancelScope`s,
something very handy when introspecting complex (distributed) system
tear/shut-downs particularly under remote error or (inter-peer)
cancellation conditions B)

Thanks to previous prepping in a prior attempt and various patches from
the rigorous rework of `.devx._debug` internals around typed msg specs,
there ain't much that was needed!

Impl deats
- obvi passthrough `shield` from the public API endpoints (was already
  done from a prior attempt).
- put ad-hoc internal `with trio.CancelScope(shield=shield):` around all
  checkpoints inside `._pause()` for both the root-process and subactor
  case branches.

Add a fairly rigorous example, `examples/debugging/shielded_pause.py`
with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and
ensure it covers as many cases as i can think of offhand:

- multiple `.pause()` entries in a loop despite parent scope
  cancellation in a subactor RPC task which itself spawns a sub-task.
- a `trio.Nursery.parent_task` which raises, is handled and
  tries to enter and unshielded `.post_mortem()`, which of course
  internally raises `Cancelled` in a `._pause()` checkpoint, so we catch
  the `Cancelled` again and then debug the debugger's internal
  cancellation with specific checks for the particular raising
  checkpoint-LOC.
- do ^- the latter -^ for both subactor and root cases to ensure we
  can debug `._pause()` itself when it tries to REPL engage from
  a cancelled task scope Bo
2024-05-30 17:52:24 -04:00
Tyler Goodlet 2f854a3e86 Add a `tractor.post_mortem()` API test + example
Since turns out we didn't have a single example using that API Bo

The test granular-ly checks all use cases:
- `.post_mortem()` manual calls in both subactor and root.
- ensuring built-in RPC crash handling activates after each manual one
  from ^.
- drafted some call-stack frame checking that i commented out for now
  since we need to first do ANSI escape code removal due to the
  colorization that `pdbp` does by default.
  |_ added a TODO with SO link on `assert_before()`.

Also todo-staged a shielded-pause test to match with the already
existing-but-needs-refinement example B)
2024-05-30 16:03:28 -04:00
Tyler Goodlet 27fd96729a Tweaks to debugger examples
Light stuff like comments, typing, and a couple API usage updates.
2024-05-28 09:22:59 -04:00
Tyler Goodlet d2dee87b36 Modernize streaming example script
- add typing,
- apply multi-line call style,
- use 'cancel' log level,
- enable debug mode.
2024-05-09 16:51:51 -04:00
Tyler Goodlet eca2c02f8b Flip to `.pause()` in subactor bp example 2024-04-14 18:53:42 -04:00
Tyler Goodlet 0fcd424d57 Start a new `._testing.fault_simulation`
Since I needed the `break_ipc()` helper from the
`examples/advanced_faults/ipc_failure_during_stream.py` used in the
`test_advanced_faults` suite, might as well move it into a pkg-wide
importable module. Also changed the default break method to be
`socket_close` which just calls `Stream.socket.close()` underneath in
`trio`.

Also tweak that example to not keep sending after the stream has been
broken since with new `trio` that will raise `ClosedResourceError` and
in the wrapping test we generally speaking want to see a hang and then
cancel via simulated user sent SIGINT/ctl-c.
2024-04-03 10:19:50 -04:00
Tyler Goodlet 72b4dc1461 Provision for infected-`asyncio` debug mode support
It's **almost** there, we're just missing the final translation code to
get from an `asyncio` side task to be able to call
`.devx._debug..wait_for_parent_stdin_hijack()` to do root actor TTY
locking. Then we just need to ensure internals also do the right thing
with `greenback()` for equivalent sync `breakpoint()` style pause
points.

Since i'm deferring this until later, tossing in some xfail tests to
`test_infected_asyncio` with TODOs for the needed implementation as well
as eventual test org.

By "provision" it means we add:
- `greenback` init block to `_run_asyncio_task()` when debug mode is
  enabled (but which will currently rte when `asyncio` is detected)
  using `.bestow_portal()` around the `asyncio.Task`.
- a call to `_debug.maybe_init_greenback()` in the `run_as_asyncio_guest()`
  guest-mode entry point.
- as part of `._debug.Lock.is_main_trio_thread()` whenever the async-lib
  is not 'trio' error lock the backend name (which is obvi `'asyncio'`
  in this use case).
2024-03-25 16:09:32 -04:00
Tyler Goodlet 4f863a6989 Refine and test `tractor.pause_from_sync()`
Now supports use from any `trio` task, any sync thread started with
`trio.to_thread.run_sync()` AND also via `breakpoint()` builtin API!
The only bit missing now is support for `asyncio` tasks when in infected
mode.. Bo

`greenback` setup/API adjustments:
- move `._rpc.maybe_import_gb()` to -> `devx._debug` and factor out the cached
  import checking into a sync func whilst placing the async `.ensure_portal()`
  bootstrapping into a new async `maybe_init_greenback()`.
- use the new init-er func inside `open_root_actor()` with the output
  predicating whether we override the `breakpoint()` hook.

core `devx._debug` implementation deatz:
- make `mk_mpdb()` only return the `pdp.Pdb` subtype instance since
  the sigint unshielding func is now accessible from the `Lock`
  singleton from anywhere.

- add non-main thread support (at least for `trio.to_thread` use cases)
  to our `Lock` with a new `.is_trio_thread()` predicate that delegates
  directly to `trio`'s internal version.

- do `Lock.is_trio_thread()` checks inside any methods which require
  special provisions when invoked from a non-main `trio` thread:
  - `.[un]shield_sigint()` methods since `signal.signal` usage is only
    allowed from cpython's main thread.
  - `.release()` since `trio.StrictFIFOLock` can only be called from
    a `trio` task.

- rework `.pause_from_sync()` itself to directly call `._set_trace()`
  and don't bother with `greenback._await()` when we're already calling
  it from a `.to_thread.run_sync()` thread, oh and try to use the
  thread/task name when setting `Lock.local_task_in_debug`.

- make it an RTE for now if you try to use `.pause_from_sync()` from any
  infected-`asyncio` task, but support is (hopefully) coming soon!

For testing we add a new `test_debugger.py::test_pause_from_sync()`
which includes a ctrl-c parametrization around the
`examples/debugging/sync_bp.py` script which includes all currently
supported/working usages:
- `tractor.pause_from_sync()`.
- via `breakpoint()` overload.
- from a `trio.to_thread.run_sync()` spawn.
2024-03-22 19:58:25 -04:00
Tyler Goodlet c04d77a3c9 First draft workin minus non-main-thread usage! 2024-03-20 19:13:13 -04:00
Tyler Goodlet 8ab5e08830 Adjust advanced faults test(s) for absorbed EoCs
More or less just simplifies to not seeing the stream closure errors and
instead expecting KBIs from the simulated user who 'ctl-cs after hang'.

Toss in a little `stuff_hangin_ctlc()` to the script to wrap all that
and always check stream closure before sending the final KBI.
2024-03-19 19:33:06 -04:00
Tyler Goodlet 9221c57234 Adjust all `RemoteActorError.type` using tests
To instead use the new `.boxed_type` B)
2024-03-19 18:08:54 -04:00
Tyler Goodlet 71de56b09a Drop now-deprecated deps on modern `trio`/Python
- `trio_typing` is nearly obsolete since `trio >= 0.23`
- `exceptiongroup` is built-in to python 3.11
- `async_generator` primitives have lived in `contextlib` for quite
  a while!
2024-03-13 18:41:24 -04:00
Tyler Goodlet 96992bcbb9 Add (back) a `tractor._testing` sub-pkg
Since importing from our top level `conftest.py` is not scaleable
or as "future forward thinking" in terms of:
- LoC-wise (it's only one file),
- prevents "external" (aka non-test) example scripts from importing
  content easily,
- seemingly(?) can't be used via abs-import if using
  a `[tool.pytest.ini_options]` in a `pyproject.toml` vs.
  a `pytest.ini`, see:
  https://docs.pytest.org/en/8.0.x/reference/customize.html#pyproject-toml)

=> Go back to having an internal "testing" pkg like `trio` (kinda) does.

Deats:
- move generic top level helpers into pkg-mod including the new
  `expect_ctxc()` (which i needed in the advanced faults testing script.
- move `@tractor_test` into `._testing.pytest` sub-mod.
- adjust all the helper imports to be a `from tractor._testing import <..>`

Rework `test_ipc_channel_break_during_stream()` and backing script:
- make test(s) pull `debug_mode` from new fixture (which is now
  controlled manually from `--tpdb` flag) and drop the previous
  parametrized input.
- update logic in ^ test for "which-side-fails" cases to better match
  recently updated/stricter cancel/failure semantics in terms of
  `ClosedResouruceError` vs. `EndOfChannel` expectations.
- handle `ExceptionGroup`s with expected embedded errors in test.
- better pendantics around whether to expect a user simulated KBI.
- for `examples/advanced_faults/ipc_failure_during_stream.py` script:
  - generalize ipc breakage in new `break_ipc()` with support for diff
    internal `trio` methods and a #TODO for future disti frameworks
  - only make one sub-actor task break and the other just stream.
  - use new `._testing.expect_ctxc()` around ctx block.
  - add a bit of exception handling with `print()`s around ctxc (unused
    except if 'msg' break method is set) and eoc cases.
  - don't break parent side ipc in loop any more then once
    after first break, checked via flag var.
  - add a `pre_close: bool` flag to control whether
    `MsgStreama.aclose()` is called *before* any ipc breakage method.

Still TODO:
- drop `pytest.ini` and add the alt section to `pyproject.py`.
 -> currently can't get `--rootdir=` opt to work.. not showing in
   console header.
 -> ^ also breaks on 'tests' `enable_modules` imports in subactors
   during discovery tests?
2024-03-13 09:09:08 -04:00
Tyler Goodlet 6b1ceee19f Type out the full-fledged streaming ex. 2023-10-18 15:36:00 -04:00
Tyler Goodlet ee87cf0e29 Add a debug-mode-breakpoint-causes-hang case!
Only found this by luck more or less (while working on something in
a client project) and it turns out we can actually get to (yet another)
hang state where SIGINT will be ignored by the root actor on teardown..

I've added all the necessary logic flags to reproduce. We obviously need
a follow up bug issue and a test suite to replicate!

It appears as though the following are required based on very light
tinkering:
- infected asyncio mode active
- debug mode active
- the `trio` context must breakpoint *before* `.started()`-ing
- the `asyncio` must **not** error
2023-06-21 14:07:31 -04:00
Tyler Goodlet ebcb275cd8 Add (first-draft) infected-`asyncio` actor task uses debugger example 2023-06-21 14:07:31 -04:00
Tyler Goodlet 79622bbeea Restore `breakpoint()` hook after runtime exits
Previously we were leaking our (pdb++) override into the Python runtime
which would always result in a runtime error whenever `breakpoint()` is
called outside our runtime; after exit of the root actor . This
explicitly restores any previous hook override (detected during startup)
or deletes the hook and restores the environment if none existed prior.

Also adds a new WIP debugging example script to ensure breakpointing
works as normal after runtime close; this will be added to the test
suite.
2023-05-15 00:47:29 -04:00
Tyler Goodlet e34823aab4 Add parent vs. child cancels first cases 2023-01-29 14:55:02 -05:00
Tyler Goodlet 6c35ba2cb6 Add IPC breakage on both parent and child side
With the new fancy `_pytest.pathlib.import_path()` we can do real
parametrization of the example-script-module code and thus configure
whether the child, parent, or both silently break the IPC connection.

Parametrize the test for all the above mentioned cases as well as the
case where the IPC never breaks but we still simulate the user hammering
ctl-c / SIGINT to terminate the actor tree. Adjust expected errors based
on each case and heavily document each of these.
2023-01-29 14:55:02 -05:00
Tyler Goodlet 3a0817ff55 Skip `advanced_faults/` subset in docs examples tests 2023-01-29 14:55:02 -05:00
Tyler Goodlet 7fddb4416b Handle `mp` spawn method cases in test suite 2023-01-29 14:55:02 -05:00
Tyler Goodlet 4f8586a928 Wrap ex in new test, change dir helpers to use `pathlib.Path` 2023-01-29 14:55:02 -05:00
Tyler Goodlet fb9ff45745 Move example to a new `advanced_faults` egs subset dir 2023-01-29 14:55:02 -05:00
Tyler Goodlet 36a83cb306 Refine example to drop IPC mid-stream
Use a task nursery in the subactor to spawn tasks which cancel the IPC
channel mid stream to simulate the most concurrent case we're likely to
see. Make `main()` accept a `debug_mode: bool` for parametrization. Fill
out detailed comments/docs on this example.
2023-01-29 14:55:02 -05:00
Tyler Goodlet 158569adae Add WIP example of silent IPC breaks while streaming 2023-01-29 14:55:02 -05:00
Tyler Goodlet c47575997a Expand nested case to include error prop and breakpointing 2022-10-14 19:42:23 -04:00
Tyler Goodlet c0dd5d7ffc Adjust multi-daemon test to be more deterministic 2022-10-14 19:42:23 -04:00
Tyler Goodlet 10eeda2d2b Use built-ins for all data-structure-type annotations 2022-09-15 23:41:28 -04:00
Tyler Goodlet 56c19093bb Add basic module-not-found when opening a ctx eg. 2022-08-02 12:17:06 -04:00
Tyler Goodlet 2a61aa099b Move pydantic-click hang example to new dir, skip in test suite 2022-08-02 12:16:58 -04:00
Tyler Goodlet 4fd924cfd2 Make example a subpkg for `python -m <mod>` testing 2022-07-27 11:40:02 -04:00
Tyler Goodlet fe0fd1a1c1 Add example that triggers bug 2022-07-27 11:40:02 -04:00
Tyler Goodlet dd23e78de1 Add back in async gen loop 2022-07-27 11:40:02 -04:00
Tyler Goodlet bb732cefd0 Drop high log level in ctx example 2022-07-27 11:40:02 -04:00
Tyler Goodlet 21dccb2e79 A `.open_context()` example that causes a hang!
Finally! I think this may be the root issue we've been seeing in
production in a client project.

No idea yet why this is happening but the fault-causing sequence seems
to be:
- `.open_context()` in a child actor
- enter the debugger via `tractor.breakpoint()`
- continue from that entry via `c` command in REPL
- raise an error just after inside the context task's body

Looking at logging it appears as though the child thinks it has the tty
but no input is accepted on the REPL and a further `ctrl-c` results in
some teardown but also a further hang where both parent and child become
unresponsive..
2022-07-27 11:40:02 -04:00
Tyler Goodlet 99c4319940 Fix example name typo 2022-07-27 11:40:02 -04:00
Tyler Goodlet 42f9d10252 Add a pre-started breakpoint example 2022-07-27 11:40:02 -04:00
Tyler Goodlet 98de2fab31 Add context test that opens an inter-task-channel that errors 2022-07-14 16:13:12 -04:00
Tyler Goodlet 73d252e09e Emphasize `asyncio` only with sleeps 2021-12-17 09:38:54 -05:00
Tyler Goodlet b463841019 Add infected `asyncio` echo server example 2021-12-17 09:38:04 -05:00
Tyler Goodlet 949aa9c405 Lol. should probably push the example code... 2021-12-10 12:48:05 -05:00
Tyler Goodlet 7b9d410c4d Adjust remaining examples and tests for non-backpressure default 2021-12-05 19:52:09 -05:00
Tyler Goodlet 546e1b2fa3 Drop unecessary partial 2021-11-04 10:41:25 -04:00
Tyler Goodlet 4cbb8641de Add an `open_actor_cluster()` usage example 2021-11-02 15:37:36 -04:00
Tyler Goodlet 8ba10315c1 Fix type path to new `_supervise` mod 2021-10-23 15:54:40 -04:00
Tyler Goodlet b14699d40b Adjust debugger tests to expect depth > 1 crashes
With the new fixes to the trio spawner we can expect that both root
*and* depth > 1 nursery owning actors will now not clobber any children
that are in debug (either via breakpoint or through crashing). The tests
changed now include more checks which ensure the 2nd level parent-ish
actors also bubble up through into `pdb` and don't kill any of their
(crashed) children before they're done themselves debugging.
2021-10-14 13:39:46 -04:00
Tyler Goodlet ace1b1312c Terminate async gen example caller to avoid (benign) errors in console output 2021-08-02 21:49:15 -04:00
Tyler Goodlet 674fbbc6b3 Docs and comments tidying 2021-08-01 10:44:13 -04:00