Commit Graph

451 Commits (b048ad893b002ae672814c9bf6a56dab2c9dac0e)

Author SHA1 Message Date
Tyler Goodlet b048ad893b Various test tweaks related to 3.13 egs
Including changes like,
- loose eg flagging in various test emedded `trio.open_nursery()`s.
- changes to eg handling (like using `except*`).
- added `debug_mode` integration to tests that needed some REPLin
  in order to figure out appropriate updates.
2025-03-10 12:17:53 -04:00
Tyler Goodlet ac9bf30d09 More `debug_mode` test support, better nursery var names 2025-03-10 12:17:53 -04:00
Tyler Goodlet d6a0c515ec Add per-side graceful-exit/cancel excs-as-signals
Such that any combination of task terminations/exits can be explicitly
handled and "dual side independent" crash cases re-raised in egs.

The main error-or-exit impl changes include,

- use of new per-side "signaling exceptions":
  - TrioTaskExited|TrioCancelled for signalling aio.
  - AsyncioTaskExited|AsyncioCancelled for signalling trio.

- NOT overloading the `LinkedTaskChannel._trio/aio_err` fields for
  err-as-signal relay and instead add a new pair of
  `._trio/aio_to_raise` maybe-exc-attrs which allow each side's
  task to specify what it would want the other side to raise to signal
  its/a termination outcome:
  - `._trio_to_raise: AsyncioTaskExited|AsyncioCancelled` to signal,
    |_ the aio task having returned while the trio side was still reading
       from the `asyncio.Queue` or is just not `.done()`.
    |_ the aio task being self or trio-request cancelled where
       a `asyncio.CancelledError` is raised and caught but NOT relayed
       as is back to trio; instead signal a "more explicit" exc type.
  - `._aio_to_raise: TrioTaskExited|TrioCancelled` to signal,
    |_ the trio task having returned while the aio side was still reading
       from the mem chan and indicating that the trio side might not
       care any more about future streamed values (like the
       `Stop/EndOfChannel` equivs for ipc `Context`s).
    |_ when the trio task canceld we do
        a `asyncio.Future.set_exception(TrioTaskExited())` to indicate
        to the aio side verbosely that it should cancel due to the trio
        parent.
  - `_aio/trio_err` are now left to only capturing the **actual**
    per-side task excs for introspection / other side's handling logic.

- supporting "graceful exits" depending on API in use from
  `translate_aio_errors()` such that if either side exits but the other
  side isn't expect to consume the final `return`ed value, we just exit
  silently, which required:
  - adding a `suppress_graceful_exits: bool` flag.
  - adjusting the `maybe_raise_aio_side_err()` logic to use that flag
    and suppress only on certain combos of `._trio_to_raise/._trio_err`.
  - prefer to raise `._trio_to_raise` when the aio-side is the src and
    vice versa.

- filling out pedantic logging for cancellation cases indicating which
  side is the cause.

- add a `LinkedTaskChannel._aio_result` modelled after our
  `Context._result` a a similar `.wait_for_result()` interface which
  allows maybe accessing the aio task's final return value if desired
  when using the `open_channel_from()` API.

- rename `cancel_trio()` done handler -> `signal_trio_when_done()`

Also some fairly major test suite updates,
- add a `delay: int` producing fixture which delivers a much larger
  timeout whenever `debug_mode` is set so that the REPL can be used
  without a surrounding cancel firing.
- add a new `test_aio_exits_early_relays_AsyncioTaskExited` including
  a paired `exit_early: bool` flag to `push_from_aio_task()`.
- adjust `test_trio_closes_early_causes_aio_checkpoint_raise` to expect
  a `to_asyncio.TrioTaskExited`.
2025-03-10 12:17:53 -04:00
Tyler Goodlet 80e656b53f Another loose-egs flag in `test_child_manages_service_nursery` 2025-03-10 12:17:53 -04:00
Tyler Goodlet dd250fce46 Another `is` fix.. 2025-03-10 12:17:53 -04:00
Tyler Goodlet c53d6429c6 Unset `$PYTHON_COLORS` for test debugger suite..
Since obvi all our `pexpect` patterns aren't going to match with
a heck-ton of terminal color escape sequences in the output XD
2025-03-10 12:17:53 -04:00
Tyler Goodlet 6cf4b4671f Tweak some test asserts to better `is` style 2025-03-10 12:17:53 -04:00
Tyler Goodlet a1d75625e4 Draft test-doc for "out-of-band" `asyncio.Task`..
Since there's no way to activate `greenback`'s portal in such cases, we
should at least have a test verifying our very loud error about the
inability to support this usage..
2025-03-10 12:14:40 -04:00
Tyler Goodlet e2b9c3e769 Add a `tests/test_root_infect_asyncio`
Might as well break apart the specific test set since there are some
(minor) subtleties and the orig test mod is already getting pretty big
XD

Includes both the new "independent"-event-loops test as well as the std
usage base case suite.
2025-03-10 12:04:51 -04:00
Tyler Goodlet ae18ceb633 Impl a proto "unmasker" `@acm` alongside our test
Such that the suite verifies the wip `maybe_raise_from_masking_exc()`
will raise from a `trio.Cancelled.__context__` since I can't think of
any reason a `Cancelled` should ever be raised in-place of
a non-`Cancelled` XD

Not sure what should be raised instead (or maybe just a `log.warning()`
emitted?) but this starts a draft for refinement at the least. Use the
new `@pytest.mark.parametrize` explicit tuple-of-params form with an
`pytest.param + `.mark.xfail()` for the default behaviour case.
2025-03-10 12:04:51 -04:00
Tyler Goodlet 917699417f Add a "raise-from-`finally:`" example test
Since i wasted 2 days just to find an example of this inside an `@acm`,
figured I better reproduce for the purposes of maybe implementing
a warning sys (inside our wip proto `open_taskman()`) when a nursery
detects a single `Cancelled` in an eg where the `.__context__` is set to
some non-cancel error (which likely means a cancel-causing source
exception was suppressed by accident).

Left in a buncha commented code using `maybe_open_nursery()` which
i thought might be part of the issue but didn't end up being required;
will likely remove on a follow up refinement.
2025-03-10 12:04:51 -04:00
Tyler Goodlet 095bf28f5d Add an inter-leaved-task error test
Trying to replicate cases where errors are raised in both `trio` and
`asyncio` tasks independently (at least in `.to_asyncio` API terms) with
a new `test_trio_prestarted_task_bubbles` that generates 3 cases inside
a `@acm` calls stack composing a `trio.Nursery` with
a `to_asyncio.open_channel_from()` call where a set of `trio` tasks are
started in a loop using `.start()` with various exc raising sequences,
- the aio task raising *before* the last `trio` task spawns.
- the aio task raising just after the last trio task spawns, but before
  it starts.
- after the last trio task `.start()` call returns control to the
  parent - but (for now) did not error.

TODO, still more cases to discover as i'm still fighting a `modden` bug
of this sort atm..

Other,
- tweak some other tests to have timeouts since some recent hangs were
  found..
- started mucking with py3.13 and thus adjustments for strict egs in
  some tests; full patchset to test suite likely coming soon!
2025-03-10 12:04:51 -04:00
Tyler Goodlet 9c83f02568 Support and test infected-`asyncio`-mode for root
Such that you can use,

```python

    tractor.to_asyncio.run_as_asyncio_guest(
        trio_main=_trio_main,
    )
```

to boostrap the root actor (and thus main parent process) to embed
the actor-rumtime into an `asyncio` loop. Prove it all works with an
subactor-free version of the aio echo-server test suite B)
2025-03-10 12:04:51 -04:00
Tyler Goodlet b875b35b98 Change `tractor.breakpoint()` to new `.pause()` in test suite 2024-12-09 16:08:55 -05:00
Tyler Goodlet 46ddc214cd Wrap `asyncio_bp.py` ex into test suite
Ensuring we can at least use `breakpoint()` from an infected actor's
`asyncio.Task` spawned via a `.to_asyncio` API.

Also includes a little `tests/devx/` reorging,
- start splitting out non-`tractor.pause()` tests into a new
  `test_pause_from_non_trio.py` for all the `.pause_from_sync()`
  use in bg-threaded or `asyncio` applications.
- factor harness commonalities to the `devx/conftest` (namely
  the `do_ctlc()` masher).
- mv `test_pause_from_sync` to the new non`-trio` mod.

NOTE, the `ctlc=True` is still failing for
`test_pause_from_asyncio_task` which is a user-happiness bug but not
anything fundamentally broken - just need to handle the `asyncio` case
in `.devx._debug.sigint_shield()`!
2024-12-09 15:38:28 -05:00
Tyler Goodlet b3ee20d3b9 Add `breakpoint()` hook restoration example + test 2024-12-05 20:56:39 -05:00
Tyler Goodlet 1f1a3f19d5 Fix multi-daemon debug test `break` signal..
It was expecting `AssertionError` as a proceed-in-test signal (by
breaking from a continue loop), but `in_prompt_msg(raise_on_err=True)`
was changed to raise `ValueError`; so instead just use as a predicate
for the `break`.

Also rework `in_prompt_msg()` to accept the `child: BaseSpawn` as input
instead of `before: str` remove the casting boilerplate, and adjust all
usage to match.
2024-07-12 15:57:41 -04:00
Tyler Goodlet 8363317e11 Pass `infect_asyncio` setting via runtime-vars
The reason for this "duplication" with the `--asyncio` CLI flag (passed
to the child during spawn) is 2-fold:
- allows verifying inside `Actor._from_parent()` that the `trio` runtime was
  started via `.start_guest_run()` as well as if the
  `Actor._infected_aio` spawn-entrypoint value has been set (by the
  `._entry.<spawn-backend>_main()` whenever `--asyncio` is passed)
  such that any mismatch can be signaled via an `InternalError`.
- enables checking the `._state._runtime_vars['_is_infected_aio']` value
  directly (say from a non-actor/`trio`-thread) instead of calling
  `._state.current_actor(err_on_no_runtime=False)` in certain edge
  cases.

Impl/testing deats:
- add `._state._runtime_vars['_is_infected_aio'] = False` default.
- raise `InternalError` on any `--asyncio`-flag-passed vs.
  `_runtime_vars`-value-relayed-from-parent inside
  `Actor._from_parent()` and include a `Runner.is_guest` assert for good
  measure B)
- set and relay `infect_asyncio: bool` via runtime-vars to child in
  `ActorNursery.start_actor()`.
- verify `actor.is_infected_aio()`, `actor._infected_aio` and
  `_state._runtime_vars['_is_infected_aio']` are all set in test suite's
  `asyncio_actor()` endpoint.
2024-07-11 13:22:53 -04:00
Tyler Goodlet a628eabb30 Officially test proto-ed `stackscope` integration
By re-purposing our `pexpect`-based console matching with a new
`debugging/shield_hang_in_sub.py` example, this tests a few "hanging
actor" conditions more formally:

- that despite a hanging actor's task we can dump
  a `stackscope.extract()` tree on relay of `SIGUSR1`.
- the actor tree will terminate despite a shielded forever-sleep by our
  "T-800" zombie reaper machinery activating and hard killing the
  underlying subprocess.

Some test deats:
- simulates the expect actions of a real user by manually using
  `os.kill()` to send both signals to the actor-tree program.
- `pexpect`-matches against `log.devx()` emissions under normal
  `debug_mode == True` usage.
- ensure we get the actual "T-800 deployed" `log.error()` msg and
  that the actor tree eventually terminates!

Surrounding (re-org/impl/test-suite) changes:
- allow disabling usage via a `maybe_enable_greenback: bool` to
  `open_root_actor()` but enable by def.
- pretty up the actual `.devx()` content from `.devx._stackscope`
  including be extra pedantic about the conc-primitives for each signal
  event.
- try to avoid double handles of `SIGUSR1` even though it seems the
  original (what i thought was a) problem was actually just double
  logging in the handler..
  |_ avoid double applying the handler func via `signal.signal()`,
  |_ use a global to avoid double handle func calls and,
  |_ a `threading.RLock` around handling.
- move common fixtures and helper routines from `test_debugger` to
  `tests/devx/conftest.py` and import them for use in both test mods.
2024-07-10 19:58:27 -04:00
Tyler Goodlet d216068713 Start a new `tests/devx/` tooling-subsuite-pkg 2024-07-10 15:52:38 -04:00
Tyler Goodlet 131e3e8157 Move `mk_cmd()` to `._testing`
Since we're going to need it more generally for `.devx` sub-sys tooling
tests.

Also, up the sync-pause ctl-c delay another 10ms..
2024-07-10 15:40:44 -04:00
Tyler Goodlet fc95c6719f Get multi-threaded sync-pausing fully workin!
The final issue was making sure we do the same thing on ctl-c/SIGINT
from the user. That is, if there's already a bg-thread in REPL, we
`log.pdb()` about SIGINT shielding and re-draw the prompt; the same UX
as normal actor-runtime-task behaviour.

Reasons this wasn't workin.. and the fix:
- `.pause_from_sync()` was overriding the local `repl` var with `None`
  delivered by (transitive) calls to `_pause(debug_func=None)`.. so
  remove all that and only assign it OAOO prior to thread-type case
  branching.
- always call `DebugStatus.shield_sigint()` as needed from all requesting
  threads/tasks:
  - in `_pause_from_bg_root_thread()` BEFORE calling `._pause()` AND BEFORE
    yielding back to the bg-thread via `.started(out)` to ensure we're
    definitely overriding the handler in the `trio`-main-thread task
    before unblocking the requesting bg-thread.
  - from any requesting bg-thread in the root actor such that both its
    main-`trio`-thread scheduled task (as per above bullet) AND it are
    SIGINT shielded.
  - always call `.shield_sigint()` BEFORE any `greenback._await()` case
    don't entirely grok why yet, but it works)?
  - for `greenback._await()` case always set `bg_task` to the current one..
- tweaks to the `SIGINT` handler, now renamed `sigint_shield()` so as
  not to name-collide with the methods when editor-searching:
  - always try to `repr()` the REPL thread/task "owner" as well as the
    active `PdbREPL` instance.
  - add `.devx()` notes around the prompt flushing deats and comments
    for any root-actor-bg-thread edge cases.

Related/supporting refinements:
- add `get_lock()`/`get_debug_req()` factory funcs since the plan is to
  eventually implement both as `@singleton` instances per actor.
- fix `acquire_debug_lock()`'s call-sig-bug for scheduling
  `request_root_stdio_lock()`..
- in `._pause()` only call `mk_pdb()` when `debug_func != None`.
- add some todo/warning notes around the `cls.repl = None` in
  `DebugStatus.release()`

`test_pause_from_sync()` tweaks:
- don't use a `attach_patts.copy()`, since we always `break` on match.
- do `pytest.fail()` on that ^ loop's fallthrough..
- pass `do_ctlc(child, patt=attach_key)` such that we always match the
  the current thread's name with the ctl-c triggered `.pdb()` emission.
- oh yeah, return the last `before: str` from `do_ctlc()`.
- in the script, flip `abandon_on_cancel=True` since when `False` it
  seems to cause `trio.run()` to hang on exit from the last bg-thread
  case?!?
2024-07-10 12:29:05 -04:00
Tyler Goodlet e6ccfce751 Adjusts advanced fault tests to match new `TransportClosed` semantics 2024-07-05 13:31:29 -04:00
Tyler Goodlet 31207f92ee Finally implement peer-lookup optimization..
There's a been a todo for soo long for this XD

Since all `Actor`'s store a set of `._peers` we can try a lookup on that
table as a shortcut before pinging the registry Bo

Impl deats:
- add a new `._discovery.get_peer_by_name()` routine which attempts the
  `._peers` lookup by combining a copy of that `dict` + an entry added
  for `Actor._parent_chan` (since all subs have a parent and often the
  desired contact is just that connection).
- change `.find_actor()` (for the `only_first == True` case),
  `.query_actor()` and `.wait_for_actor()` to call the new helper and
  deliver appropriate outputs if possible.

Other,
- deprecate `get_arbiter()` def and all usage in tests and examples.
- drop lingering use of `arbiter_sockaddr` arg to various routines.
- tweak the `Actor` doc str as well as some code fmting and a tweak to
  the `._stream_handler()`'s initial `con_status: str` logging value
  since the way it was could never be reached.. oh and `.warning()` on
  any new connections which already have a `_pre_chan: Channel` entry in
  `._peers` so we can start minimizing IPC duplications.
2024-07-04 19:40:11 -04:00
Tyler Goodlet ba83bab776 Todo a test for sync-pausing from non-main-root-tasks 2024-06-28 19:26:35 -04:00
Tyler Goodlet 18d440c207 (Re)type annot some tests
- For the (still not finished) `test_caps_based_msging`, switch to
  using the new `PayloadMsg`.
- add `testdir` fixture type.
2024-06-28 19:24:03 -04:00
Tyler Goodlet cb90f3e6ba Update `MsgTypeError` content matching to latest 2024-06-28 14:46:29 -04:00
Tyler Goodlet 268bd0d8ec Demo-abandonment on shielded `trio`-side work
Finally this reproduces the issue as it (originally?) exhibited inside
`piker` where the `Actor.lifetime_stack` wasn't closed in cases where
during `infected_aio`-actor cancellation/shutdown `trio` side tasks
which are doing shielded (teardown) work are NOT being watched/waited on
from the `aio_main()` task-closure inside `run_as_asyncio_guest()`!

This is then the root cause of the guest-run being abandoned since if
our `aio_main()` task-closure doesn't know it should allow the run to
finish, it's going to call `loop.close()` eventually resulting in the
`GeneratorExit` thrown into `trio._core._run.unrolled_run()`..

So, this extends the `test_sigint_closes_lifetime_stack()` suite to
include cases for such shielded `trio`-task ops:
- add a new `trio_side_is_shielded: bool` which will toggle whether to
  add a shielded 0.5s `trio.sleep()` loop to `manage_file()` which
  should outlive the `asyncio` event-loop shutdown sequence and result
  in an abandoned guest-run and thus a leaked file.
- parametrize the existing suite with this case resulting in a total 16
  test set B)

This patch demonstrates the problem with our `aio_main()` task-closure
impl via the now 4 failing tests, a fix is coming in a follow up commit!
2024-06-26 12:01:36 -04:00
Tyler Goodlet 4f1db1ff52 Lel, revert `AsyncioCancelled` inherit, module..
Turns out it somehow breaks our `to_asyncio` error relay since obvi
`asyncio`'s runtime seems to specially handle it (prolly via
`isinstance()` ?) and it caused our
`test_aio_cancelled_from_aio_causes_trio_cancelled()` to hang..
Further, obvi `unpack_error()` won't be able to find the type def if not
kept inside `._exceptions`..

So given all that, revert the change/move as well as:
- tweak the aio-from-aio cancel test to timeout.
- do `trio.sleep()` conc with any bg aio task by moving out nursery
  block.
- add a `send_sigint_to: str` parameter to
  `test_sigint_closes_lifetime_stack()` such that we test the SIGINT
  being relayed to just the parent or the child.
2024-06-25 23:47:14 -04:00
Tyler Goodlet a870df68c0 Hack `asyncio` to not abandon a guest-mode run?
Took me a while to figure out what the heck was going on but, turns out
`asyncio` changed their SIGINT handling in 3.11 as per:

https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption

I'm not entirely sure if it's the 3.11 changes or possibly wtv further
updates were made in 3.12  but more or less due to the way
our current main task was written the `trio` guest-run was getting
abandoned on SIGINTs sent from the OS to the infected child proc..

Note that much of the bug and soln cases are layed out in very detailed
comment-notes both in the new test and `run_as_asyncio_guest()`, right
above the final "fix" lines.

Add new `test_infected_aio.test_sigint_closes_lifetime_stack()` test suite
which reliably triggers all abandonment issues with multiple cases
of different parent behaviour post-sending-SIGINT-to-child:
 1. briefly sleep then raise a KBI in the parent which was originally
    demonstrating the file leak not being cleaned up by `Actor.lifetime_stack.close()`
    and simulates a ctl-c from the console (relayed in tandem by
    the OS to the parent and child processes).
 2. do `Context.wait_for_result()` on the child context which would
    hang and timeout since the actor runtime would never complete and
    thus never relay a `ContextCancelled`.
 3. both with and without running a `asyncio` task in the `manage_file`
    child actor; originally it seemed that with an aio task scheduled in
    the child actor the guest-run abandonment always was the "loud" case
    where there seemed to be some actor teardown but with tbs from
    python failing to gracefully exit the `trio` runtime..

The (seemingly working) "fix" required 2 lines of code to be run inside
a `asyncio.CancelledError` handler around the call to `await trio_done_fut`:
- `Actor.cancel_soon()` which schedules the actor runtime to cancel on
  the next `trio` runner cycle and results in a "self cancellation" of
  the actor.
- "pumping the `asyncio` event loop" with a non-0 `.sleep(0.1)` XD
 |_ seems that a "shielded" pump with some actual `delay: float >= 0`
   did the trick to get `asyncio` to allow the `trio` runner/loop to
   fully complete its guest-run without abandonment.

Other supporting changes:
- move `._exceptions.AsyncioCancelled`, our renamed
  `asyncio.CancelledError` error-sub-type-wrapper, to `.to_asyncio` and make
  it derive from `CancelledError` so as to be sure when raised by our
  `asyncio` x-> `trio` exception relay machinery that `asyncio` is
  getting the specific type it expects during cancellation.
- do "summary status" style logging in `run_as_asyncio_guest()` wherein
  we compile the eventual `startup_msg: str` emitted just before waiting
  on the `trio_done_fut`.
- shield-wait with `out: Outcome = await asyncio.shield(trio_done_fut)`
  even though it seems to do nothing in the SIGINT handling case..(I
  presume it might help avoid abandonment in a `asyncio.Task.cancel()`
  case maybe?)
2024-06-24 16:10:23 -04:00
Tyler Goodlet affc210033 Update pld-rx limiting test(s) to use deco input
The tests only use one input spec (conveniently) so there's not much to
change in the logic,
- only pass the `maybe_msg_spec` to the child-side decorator and obvi
  drop the surrounding `msgops.limit_plds()` block in the child.
- tweak a few `MsgDec` asserts, mostly dropping the
  `msg._ops._def_any_spec` state checks since the child-side won't have
  any pre pld-spec state given the runtime now applies the `pld_spec`
  before running the task's func body.
  - also allowed dropping the `finally:` which did a similar check
    outside the `.limit_plds()` block.
2024-06-17 09:24:03 -04:00
Tyler Goodlet a6058d14ae Use new `._debug._repl_fail_msg` inside `test_pause_from_sync` 2024-06-10 17:57:43 -04:00
Tyler Goodlet 408a74784e Catch `.pause_from_sync()` in root bg thread bugs!
Originally discovered as while using `tractor.pause_from_sync()`
from the `i3ipc` client running in a bg-thread that uses `asyncio`
inside `modden`.

Turns out we definitely aren't correctly handling `.pause_from_sync()`
from the root actor when called from a `trio.to_thread.run_sync()`
bg thread:
- root-actor bg threads which can't `Lock._debug_lock.acquire()` since
  they aren't in `trio.Task`s.
- even if scheduled via `.to_thread.run_sync(_debug._pause)` the
  acquirer won't be the task/thread which calls `Lock.release()` from
  `PdbREPL` hooks; this results in a RTE raised by `trio`..
- multiple threads will step on each other's stdio since cpython's GIL
  seems to ctx switch threads on every input from the user to the REPL
  loop..

Reproduce via reworking our example and test so that they catch and fail
for all edge cases:
- rework the `/examples/debugging/sync_bp.py` example to demonstrate the
  above issues, namely the stdio clobbering in the REPL when multiple
  threads and/or a subactor try to debug simultaneously.
  |_ run one thread using a task nursery to ensure it runs conc with the
     nursery's parent task.
  |_ ensure the bg threads run conc a subactor usage of
     `.pause_from_sync()`.
  |_ gravely detail all the special cases inside a TODO comment.
  |_ add some control flags to `sync_pause()` helper and don't use
     `breakpoint()` by default.
- extend and adjust `test_debugger.test_pause_from_sync` to match (and
  thus currently fail) by ensuring exclusive `PdbREPL` attachment when
  the 2 bg root-actor threads are concurrently interacting alongside the
  subactor:
  |_ should only see one of the `_pause_msg` logs at a time for either
     one of the threads or the subactor.
  |_ ensure each attaches (in no particular order) before expecting the
     script to exit.

Impl adjustments to `.devx._debug`:
- drop `Lock.repl`, no longer used.
- add `Lock._owned_by_root: bool` for the `.ctx_in_debug == None`
  root-actor-task active case.
- always `log.exception()` for any `._debug_lock.release()` ownership
  RTE emitted by `trio`, like we used to..
- add special `Lock.release()` log message for the stale lock but
  `._owned_by_root == True` case; oh yeah and actually
  `log.devx(message)`..
- rename `Lock.acquire()` -> `.acquire_for_ctx()` since it's only ever
  used from subactor IPC usage; well that and for local root-task
  usage we should prolly add a `.acquire_from_root_task()`?
- buncha `._pause()` impl improvements:
 |_ type `._pause()`'s `debug_func` as a `partial` as well.
 |_ offer `called_from_sync: bool` and `called_from_bg_thread: bool`
    for the special case handling when called from `.pause_from_sync()`
 |_ only set `DebugStatus.repl/repl_task` when `debug_func != None`
   (OW ensure the `.repl_task` is not the current one).
 |_ handle error logging even when `debug_func is None`..
 |_ lotsa detailed commentary around root-actor-bg-thread special cases.
- when `._set_trace(hide_tb=False)` do `pdbp.set_trace(frame=currentframe())`
  so the `._debug` internal frames are always included.
- by default always hide tracebacks for `.pause[_from_sync]()` internals.
- improve `.pause_from_sync()` to avoid root-bg-thread crashes:
 |_ pass new `called_from_xxx_` flags and ensure `DebugStatus.repl_task`
    is actually set to the `threading.current_thread()` when needed.
 |_ manually call `Lock._debug_lock.acquire_nowait()` for the non-bg
    thread case.
 |_ TODO: still need to implement the bg-thread case using a bg
    `trio.Task`-in-thread with an `trio.Event` set by thread REPL exit.
2024-06-06 16:56:30 -04:00
Tyler Goodlet d802c8aa90 Woops, set `post_mortem=False` by default again! 2024-05-30 18:33:25 -04:00
Tyler Goodlet 8ea0f08386 Finally, officially support shielded REPL-ing!
It's been a long time prepped and now finally implemented!

Offer a `shield: bool` argument from our async `._debug` APIs:
- `await tractor.pause(shield=True)`,
- `await tractor.post_mortem(shield=True)`

^-These-^ can now be used inside cancelled `trio.CancelScope`s,
something very handy when introspecting complex (distributed) system
tear/shut-downs particularly under remote error or (inter-peer)
cancellation conditions B)

Thanks to previous prepping in a prior attempt and various patches from
the rigorous rework of `.devx._debug` internals around typed msg specs,
there ain't much that was needed!

Impl deats
- obvi passthrough `shield` from the public API endpoints (was already
  done from a prior attempt).
- put ad-hoc internal `with trio.CancelScope(shield=shield):` around all
  checkpoints inside `._pause()` for both the root-process and subactor
  case branches.

Add a fairly rigorous example, `examples/debugging/shielded_pause.py`
with a wrapping `pexpect` test, `test_debugger.test_shield_pause()` and
ensure it covers as many cases as i can think of offhand:

- multiple `.pause()` entries in a loop despite parent scope
  cancellation in a subactor RPC task which itself spawns a sub-task.
- a `trio.Nursery.parent_task` which raises, is handled and
  tries to enter and unshielded `.post_mortem()`, which of course
  internally raises `Cancelled` in a `._pause()` checkpoint, so we catch
  the `Cancelled` again and then debug the debugger's internal
  cancellation with specific checks for the particular raising
  checkpoint-LOC.
- do ^- the latter -^ for both subactor and root cases to ensure we
  can debug `._pause()` itself when it tries to REPL engage from
  a cancelled task scope Bo
2024-05-30 17:52:24 -04:00
Tyler Goodlet 2f854a3e86 Add a `tractor.post_mortem()` API test + example
Since turns out we didn't have a single example using that API Bo

The test granular-ly checks all use cases:
- `.post_mortem()` manual calls in both subactor and root.
- ensuring built-in RPC crash handling activates after each manual one
  from ^.
- drafted some call-stack frame checking that i commented out for now
  since we need to first do ANSI escape code removal due to the
  colorization that `pdbp` does by default.
  |_ added a TODO with SO link on `assert_before()`.

Also todo-staged a shielded-pause test to match with the already
existing-but-needs-refinement example B)
2024-05-30 16:03:28 -04:00
Tyler Goodlet cdb1311e40 Change `reraise` to `post_mortem: bool` in `maybe_expect_raises()` 2024-05-30 16:02:59 -04:00
Tyler Goodlet 02a7c7c276 Ensure only a boxed traceback for MTE on parent side 2024-05-30 01:11:29 -04:00
Tyler Goodlet 4fa71cc01c Ensure ctx error-state matches the MTE scenario
Namely checking that `Context._remote_error` is set to the raised MTE
in the invalid started and return value cases since prior to the recent
underlying changes to the `Context.result()` impl, it would not match.

Further,
- do asserts for non-MTE raising cases in both the parent and child.
- add todos for testing ctx-outcomes for per-side-validation policies
  i anticipate supporting and implied msg-dialog race cases therein.
2024-05-28 20:07:48 -04:00
Tyler Goodlet f7fd8278af Fix `test_basic_payload_spec` bad msg matching
Expecting `Started` or `Return` with respective bad `.pld` values
depending on what type of failure is test parametrized.

This makes the suite run green it seems B)
2024-05-28 11:05:46 -04:00
Tyler Goodlet c2cc12e14f Add basic payload-spec test suite
Starts with some very basic cases:
- verify both subactor-as-child-ctx-task send side validation (failures)
  as well as relay and raise on root-parent-side-task.
- wrap failure expectation cases that bubble out of `@acm`s with
  a `maybe_expect_raises()` equiv wrapper with an embedded timeout.
- add `Return` cases including invalid by `str` and valid by a `None`.

Still ToDo:
- commit impl changes to make the bulk of this suite pass.
- adjust how `MsgTypeError`s format the local (`.started()`) send side
  `.tb_str` such that we don't do a "boxed" error prior to
  `pack_error()` being called normally prior to `Error` transit.
2024-05-27 13:52:35 -04:00
Tyler Goodlet 702dfe47d5 Update debugger tests to expect new pformatting
Mostly the result of the `RemoteActorError.pformat()` and our
new `_pause/crash_msg: str`s which include the `trio.Task.__repr__()`
in the `log.pdb()` message.

Obvi use the `in_prompt_msg()` to accomplish where not used prior.

ToDo later:
-[ ] still some outstanding questions on how detailed inceptions
   should look, eg. in `test_multi_nested_subactors_error_through_nurseries()`
  |_maybe we should be more pedantic at checking `.src_uid` vs.
    `.relay_uid` fields?
-[ ] staged a placeholder test for verifying correct call-stack frame on
   crash handler REPL entry.
-[ ] also need a test to verify that you can't pause from an already paused actor task
   such as can happen if you try to step through runtime code that has
   a recurrent entry to `._debug.pause()`.
2024-05-22 15:01:31 -04:00
Tyler Goodlet 5cb0cc0f0b Update tests for `PldRx` and `Context` changes
Mostly adjustments for the new pld-receiver semantics/shim-layer which
results more often in the direct delivery of `RemoteActorError`s from
IPC API primitives (like `Portal.result()`) instead of being embedded in
an `ExceptionGroup` bundled from an embedded nursery.

Tossed usage of the `debug_mode: bool` fixture to a couple problematic
tests while i was working on them.

Also includes detailed assertion updates to the inter-peer cancellation
suite in terms of,
- `Context.canceller` state correctly matching the true src actor when
  expecting a ctxc.
- any rxed `ContextCancelled` should instance match the `Context._local/remote_error`
  as should the `.msgdata` and `._ipc_msg`.
2024-05-09 16:48:53 -04:00
Tyler Goodlet fbc21a1dec Add a "current IPC `Context`" `ContextVar`
Expose it from `._state.current_ipc_ctx()` and set it inside
`._rpc._invoke()` for child and inside `Portal.open_context()` for
parent.

Still need to write a few more tests (particularly demonstrating usage
throughout multiple nested nurseries on each side) but this suffices as
a proto for testing with some debugger request-from-subactor stuff.

Other,
- use new `.devx.pformat.add_div()` for ctxc messages.
- add a block to always traceback dump on corrupted cs stacks.
- better handle non-RAEs exception output-formatting in context
  termination summary log message.
- use a summary for `start_status` for msg logging in RPC loop.
2024-05-07 15:35:45 -04:00
Tyler Goodlet 3869e91b19 More msg-spec tests tidying
- Drop `test_msg_spec_xor_pld_spec()` since we no longer support
  `ipc_msg_spec` arg to `mk_codec()`.
- Expect `MsgTypeError`s around `.open_context()` calls when
  `add_codec_hooks == False`.
- toss in some `.pause()` points in the subactor ctx body whilst hacking
  out a `.pld` protocol for debug mode TTY locking.
2024-04-14 19:50:09 -04:00
Tyler Goodlet b209990d04 Flip a last `MultiError` to a beg, add todo on `@stream` func 2024-04-14 19:39:57 -04:00
Tyler Goodlet cf48fdecfe Unify `MsgTypeError` as a `RemoteActorError` subtype
Since in the receive-side error case the source of the exception is the
sender side (normally causing a local `TypeError` at decode time), might
as well bundle the error in remote-capture-style using boxing semantics
around the causing local type error raised from the
`msgspec.msgpack.Decoder.decode()` and with a traceback packed from
`msgspec`-specific knowledge of any field-type spec matching failure.

Deats on new `MsgTypeError` interface:
- includes a `.msg_dict` to get access to any `Decoder.type`-applied
  load of the original (underlying and offending) IPC msg into
  a `dict` form using a vanilla decoder which is normally packed into
  the instance as a `._msg_dict`.
- a public getter to the "supposed offending msg" via `.payload_msg`
  which attempts to take the above `.msg_dict` and load it manually into
  the corresponding `.msg.types.MsgType` struct.
- a constructor `.from_decode()` to make it simple to build out error
  instances from a failed decode scope where the aforementioned
  `msgdict: dict` from the vanilla decode can be provided directly.
- ALSO, we now pack into `MsgTypeError` directly just like ctxc in
  `unpack_error()`

This also completes the while-standing todo for `RemoteActorError` to
contain a ref to the underlying `Error` msg as `._ipc_msg` with public
`@property` access that `defstruct()`-creates a pretty struct version
via `.ipc_msg`.

Internal tweaks for this include:
- `._ipc_msg` is the internal literal `Error`-msg instance if provided
  with `.ipc_msg` the dynamic wrapper as mentioned above.
- `.__init__()` now can still take variable `**extra_msgdata` (similar
  to the `dict`-msgdata as before) to maintain support for subtypes
  which are constructed manually (not only by `pack_error()`) and insert
  their own attrs which get placed in a `._extra_msgdata: dict` if no
  `ipc_msg: Error` is provided as input.
- the `.msgdata` is now a merge of any `._extra_msgdata` and
  a `dict`-casted form of any `._ipc_msg`.
- adjust all previous `.msgdata` field lookups to try equivalent field
  reads on `._ipc_msg: Error`.
- drop default single ws indent from `.tb_str` and do a failover lookup
  to `.msgdata` when `._ipc_msg is None` for the manually constructed
  subtype-instance case.
- add a new class attr `.extra_body_fields: list[str]` to allow subtypes
  to declare attrs they want shown in the `.__repr__()` output, eg.
  `ContextCancelled.canceller`, `StreamOverrun.sender` and
  `MsgTypeError.payload_msg`.
- ^-rework defaults pertaining to-^ with rename from
  `_msgdata_keys` -> `_ipcmsg_keys` with latter now just loading directly
  from the `Error` fields def and `_body_fields: list[str]` just taking
  that value and removing the not-so-useful-in-REPL or already shown
  (i.e. `.tb_str: str`) field names.
- add a new mod level `.pack_from_raise()` helper for auto-boxing RAE
  subtypes constructed manually into `Error`s which is normally how
  `StreamOverrun` and `MsgTypeError` get created in the runtime.
- in support of the above expose a `src_uid: tuple` override to
  `pack_error()` such that the runtime can provide any remote actor id
  when packing a locally-created yet remotely-caused RAE subtype.
- adjust all typing to expect `Error`s over `dict`-msgs.

Adjust some tests to match these changes:
- context and inter-peer-cancel tests to make their `.msgdata` related
  checks against the new `.ipc_msg` as well and `.tb_str` directly.
- toss in an extra sleep to `sleep_a_bit_then_cancel_peer()` to keep the
  'canceller' ctx child task cancelled by it's parent in the 'root' for
  the rte-raised-during-ctxc-handling case (apparently now it's
  returning too fast, cool?).
2024-04-09 10:07:10 -04:00
Tyler Goodlet 2f451ab9a3 Caps-msging test tweaks to get correct failures
These are likely temporary changes but still needed to actually see the
desired/correct failures (of which 5 of 6 tests are supposed to fail rn)
mostly to do with `Start` and `Return` msgs which are invalid under each
test's applied msg-spec.

Tweak set here:
- bit more `print()`s in root and sub for grokin test flow.
- never use `pytes.fail()` in subactor.. should know this by now XD
- comment out some bits that can't ever pass rn and make the underlying
  expected failues harder to grok:
  - the sub's child-side-of-ctx task doing sends should only fail
    for certain msg types like `Started` + `Return`, `Yield`s are
    processed receiver/parent side.
  - don't expect `sent` list to match predicate set for the same reason
    as last bullet.

The outstanding msg-type-semantic validation questions are:
- how to handle `.open_context()` with an input `kwargs` set that
  doesn't adhere to the currently applied msg-spec?
  - should the initial `@acm` entry fail before sending to the child
    side?
- where should received `MsgTypeError`s be raised, at the `MsgStream`
  `.receive()` or lower in the stack?
  - i'm thinking we should mk `MsgTypeError` derive from
    `RemoteActorError` and then have it be delivered as an error to the
    `Context`/`MsgStream` for per-ctx-task handling; would lead to more
    flexible/modular policy overrides in user code outside any defaults
    we provide.
2024-04-08 10:13:14 -04:00
Tyler Goodlet 10c98946bd Extend codec test to for msg-spec parameterizing
Set a diff `Msg.pld` spec per test and then send multiple types to
a child actor making sure the child can only send certain types over
a stream and fails with validation or decode errors ow. The test is also
param-ed both with and without hooks demonstrating how a custom type,
`NamespacePath`, needs them for effective use. The subactor IPC context
child is passed a `expect_ipc_send: dict` which relays the values along
with their expected `.send()`-ability.

Deats on technical refinements:
------ - ------
- added a `iter_maybe_sends()` send-value-as-msg-auditor and predicate
  generator (literally) so as to be able to pre-determine if given the
  current codec and `send_values` which values are expected to be IPC
  transmittable.
- as per ^, the diff value-msgs are first round-tripped inside
  a `Started` msg using the configured codec in the parent/root actor
  before bothering with using IPC primitives + a subactor; this is how
  the `expect_ipc_send` table is generated initially.
- for serializing the specs (`Union[Type]`s as required by `msgspec`),
  added a pair of codec hooks: `enc/dec_type_union()` (that ideally we
  move into a `.msg` submod eventually) which code the type-values as
  a `list[str]` of names.
  - the `dec_` hook had to be modified to NOT raise an error when an
    invalid/unhandled value arrives, this is because we do NOT want the
    RPC msg handling loop to raise on the `async for msg in chan:` and
    instead prefer to ignore and warn (for now, but eventually respond
    with error msg - see notes in hook body) these msgs when sent during
    a streaming phase; `Context.started()` will however error on a bad
    input for the current msg-spec since it is part of the "cheap"
    dialog (again see notes in `._context`) wherein the `Started` msg
    is always roundtripped prior to `Channel.send()` to guarantee
    the child adheres to its own spec.
- tossed in lotsa `print()`s for console groking of the run progress.

Further notes on typed-msging breaking cancellation:
------ - ------
- turns out since the runtime's cancellation implementation, being done
  with `Actor.cancel()` methods and friends will actually break when
  a stringent spec is applied (eg. a single type-spec) since the return
  values from said methods are generally `bool`s..
- this means we do indeed need special handling of "runtime RPC method
  invocations" since ideally a user's msg-spec choices do not break core
  functionality on them XD
=> The obvi solution is to add a/some special sub-`Msg` types for such
  cases, possibly just a `RuntimeReturn(Return)` type that will always
  include a `.pld: bool` for these cancel methods such that their
  results are always handled without msg type errors.

More to come on a (hopefully) elegant solution to that last bit!
2024-04-05 11:36:09 -04:00
Tyler Goodlet 5b551dd9fa Use `._testing.break_ipc()` in final advanced fault test child ctx 2024-04-05 10:53:07 -04:00