Prevent asyncio from abandoning guest-runs, .pause_from_sync() support via .to_asyncio #2

goodboy · 2024-06-24T19:11:03Z

goodboy commented

2024-06-24 19:11:03 +00:00

On `asyncio` being super lovely and abandoning our guest-run..

Took me a while to figure out what the heck was going on but, turns out asyncio changed their SIGINT handling in 3.11 as per:

https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption

I’m not entirely sure if it’s the 3.11 changes or possibly wtv further updates were made in 3.12 but more or less due to the way our current main task was written the trio guest-run was getting abandoned on SIGINTs sent from the OS to the infected child proc..

Note that much of the bug and soln cases are laid out in very detailed comment-notes both in the new test and run_as_asyncio_guest(), right above the final “fix” lines.

The (seemingly working) “fix” required 2 lines of code to be run inside an asyncio.CancelledError handler around the call to await trio_done_fut:

Actor.cancel_soon() which schedules the actor runtime to cancel on the next trio runner cycle and results in a “self cancellation” of the actor.
“pumping the asyncio event loop” with a non-0 .sleep(0.1) XD |_ seems that a “shielded” pump with some actual delay: float >= 0 did the trick to get asyncio to allow the trio runner/loop to fully complete its guest-run without abandonment.

Much improved `asyncio`-mode support driven by upcoming py3.13 support,

better handling of simultaneous but “independent” trio vs. asyncio.Task errors such that they are raised in an eg per 1ff79f86b7
support for infected-asyncio-mode in a root actor via the tractor.to_asyncio.run_as_asyncio_guest() entrypoint which can be now used over the std trio.run() from the first main/root process to open the runtime via either .open_root_actor() or .open_nursery() (delivered in commit 4a195eef4c)

New test suites/extensions introduced here,

a new tests/devx subpkg for all tooling affiliated tests
officially testing the proto-ed stackscope integration
a new examples/debugging/asyncio_bp.py to verify to_asyncio based support when using tractor.pause_from_sync()/breakpoint() from asyncio.Tasks.
extensions/reworks to existing test_infected_asyncio suite generally as part of 72035a20d7,
- test_trio_prestarted_task_bubbles
- test_trio_closes_early_and_channel_exits
- test_aio_cancelled_from_aio_causes_trio_cancelled
adding a new dedicated tests/test_root_infect_asyncio suite in f26d487000 to pair with new root actor support (see below).
a new examples/debugging/restore_builtin_breakpoint.py and test_breakpoint_hook_restored to verify breakpoint() restoration in 9af6271e99
- XXX almost lost it re-orging in the next commit a356233b47 but was able to catch that muck-up thanks to this very bullet list! XD now brought back to life in e8111e40f9

History of outstandings from the original `.pause_from_sync()`

and debug REPL from asyncio.Tasks effort:

The chronology started on github in a WIP PR,

https://github.com/goodboy/tractor/pull/362

but then was further followed up by adding a new tractor.devx sub-pkg again via github in PR,

https://github.com/goodboy/tractor/pull/372

There was also an original PR proposed in (gitea) #1 but i (by old habit/accident) landed it via a new GH PR,

https://github.com/goodboy/tractor/pull/374

TODO list from GH #374

To be solved here obvi!

from non-main-threads spawned via trio.to_thread.run_sync()
- doc a test to verify added in 32e12c8b03
- full test verification in 5cdd012417
- better impl polish for handling actor-tree-teardown races in d9662d9b34 and 7443e387b5.
from asyncio tasks when using our .to_asyncio subsys:
- using both .pause_from_sync() and breakpoint() which are verified supported as of commit a356233b47 BUT ALSO shows (via test) that crashes inside an asyncio.Task engage crash-handling correctly (at least mostly)!
- FOLLOW UP TODO #9, we still have outstanding ctl-c handling issues that need to be followed up in the future!
decide on greenback bootstrapping:
- should it be an optional dep?
  - NO, instead we include it with the uv install --dev optional deps
- always override the breakpoint() hook when it is installed?
  - YES, and we also raise a RTE on any breakpoint() usage from code that both does not spec debug_mode=True to open_root_actor() as well as if no greenback is avail! as of test in 9af6271e99
- when greenback is not installed how should we guard against debug_mode=True usage from sync code?
  - we raise an explicit RTE always around breakpoint() usage as per test from 9af6271e99
  - also raise a dedicated error when an “out-of-band-asyncio.Task” tries to call breakpoint as per 1afef149d4
  - doc-prep in b7aa72465d a new test_sync_pause_from_non_greenbacked_aio_task for maybe attempting to support this somehow in the future?
- where is the best place to call devx._debug.maybe_init_greenback()?
  - _root and in _invoke() tasks supported from commit 9811db9ac5 which (will) land(s) in the upstream #7

Unrelated improvements thrown in,

Stuff that was deemed (historically) necessary enough to land alongside all the above,

72fc6fce24 Support for passing pre-conf-ed Logger; super handy for getting tractor-styled console pretty-formatting around an external sys/lib’s logging usage/config.
deliver a new boxed-maybe-error from open_crash_handler() for post crash introspection (often for testing) purposes in a60837550e
example “raise-from-finally:” in trio nursery test which demos a footgun 2bd4cc9727 with a potential “holster” solution for unmasking the underlying suppressed errors in such cases in 1075ea3687

Maybe to cherry from `py313_support` and `ext_type_plds` branches?

maybe from py313_support,
- 8573cd3 Tweak some test asserts to better is style
- 4de4897 Unset $PYTHON_COLORS for test debugger suite..
- 1f951a9 Another is fix..
- 08fa266 Add per-side graceful-exit/cancel excs-as-signals
- 985c5a4 More debug_mode test support, better nursery var names
- 60eca81 Be extra sure to re-raise EoCs from translator
- 5ff2740 Add a mark to pytest.xfail() questionably conc py stuff (ur mam .xfail()s bish!)
- e313cb5 Repair/update stackscope test
- e78223 Mask ctlc borked REPL tests
maybe from ext_type_plds,
- 90287b9 Fix an aio_err ref bug
- 3d54885 Continue supporting py3.11+
  - since it all is in .to_asyncio anyway?
- a66caa2 Drop asyncio-canc error from ._exceptions
- 47ec7e7 Add equiv of AsyncioCancelled for aio side


goodboy added 1 commit 2024-06-24 19:11:04 +00:00

5ed30dec40 Hack `asyncio` to not abandon a guest-mode run?

Took me a while to figure out what the heck was going on but, turns out
`asyncio` changed their SIGINT handling in 3.11 as per:

https://docs.python.org/3/library/asyncio-runner.html#handling-keyboard-interruption

I'm not entirely sure if it's the 3.11 changes or possibly wtv further
updates were made in 3.12  but more or less due to the way
our current main task was written the `trio` guest-run was getting
abandoned on SIGINTs sent from the OS to the infected child proc..

Note that much of the bug and soln cases are laid out in very detailed
comment-notes both in the new test and `run_as_asyncio_guest()`, right
above the final "fix" lines.

Add new `test_infected_aio.test_sigint_closes_lifetime_stack()` test suite
which reliably triggers all abandonment issues with multiple cases
of different parent behaviour post-sending-SIGINT-to-child:
 1. briefly sleep then raise a KBI in the parent which was originally
    demonstrating the file leak not being cleaned up by `Actor.lifetime_stack.close()`
    and simulates a ctl-c from the console (relayed in tandem by
    the OS to the parent and child processes).
 2. do `Context.wait_for_result()` on the child context which would
    hang and timeout since the actor runtime would never complete and
    thus never relay a `ContextCancelled`.
 3. both with and without running an `asyncio` task in the `manage_file`
    child actor; originally it seemed that with an aio task scheduled in
    the child actor the guest-run abandonment always was the "loud" case
    where there seemed to be some actor teardown but with tbs from
    python failing to gracefully exit the `trio` runtime..

The (seemingly working) "fix" required 2 lines of code to be run inside
an `asyncio.CancelledError` handler around the call to `await trio_done_fut`:
- `Actor.cancel_soon()` which schedules the actor runtime to cancel on
  the next `trio` runner cycle and results in a "self cancellation" of
  the actor.
- "pumping the `asyncio` event loop" with a non-0 `.sleep(0.1)` XD
 |_ seems that a "shielded" pump with some actual `delay: float >= 0`
   did the trick to get `asyncio` to allow the `trio` runner/loop to
   fully complete its guest-run without abandonment.

Other supporting changes:
- move `._exceptions.AsyncioCancelled`, our renamed
  `asyncio.CancelledError` error-sub-type-wrapper, to `.to_asyncio` and make
  it derive from `CancelledError` so as to be sure when raised by our
  `asyncio` x-> `trio` exception relay machinery that `asyncio` is
  getting the specific type it expects during cancellation.
- do "summary status" style logging in `run_as_asyncio_guest()` wherein
  we compile the eventual `startup_msg: str` emitted just before waiting
  on the `trio_done_fut`.
- shield-wait with `out: Outcome = await asyncio.shield(trio_done_fut)`
  even though it seems to do nothing in the SIGINT handling case..(I
  presume it might help avoid abandonment in an `asyncio.Task.cancel()`
  case maybe?)

goodboy force-pushed aio_abandons from 5ed30dec40 to 284fa0340e

2024-06-24 19:14:19 +00:00


goodboy force-pushed aio_abandons from 284fa0340e to a870df68c0

2024-06-24 20:11:45 +00:00


goodboy added 3 commits 2024-06-26 18:00:51 +00:00

4f1db1ff52 Lel, revert `AsyncioCancelled` inherit, module..

Turns out it somehow breaks our `to_asyncio` error relay since obvi
`asyncio`'s runtime seems to specially handle it (prolly via
`isinstance()` ?) and it caused our
`test_aio_cancelled_from_aio_causes_trio_cancelled()` to hang..
Further, obvi `unpack_error()` won't be able to find the type def if not
kept inside `._exceptions`..

So given all that, revert the change/move as well as:
- tweak the aio-from-aio cancel test to timeout.
- do `trio.sleep()` conc with any bg aio task by moving out nursery
  block.
- add a `send_sigint_to: str` parameter to
  `test_sigint_closes_lifetime_stack()` such that we test the SIGINT
  being relayed to just the parent or the child.

268bd0d8ec Demo-abandonment on shielded `trio`-side work

Finally this reproduces the issue as it (originally?) exhibited inside
`piker` where the `Actor.lifetime_stack` wasn't closed in cases where
during `infected_aio`-actor cancellation/shutdown `trio` side tasks
which are doing shielded (teardown) work are NOT being watched/waited on
from the `aio_main()` task-closure inside `run_as_asyncio_guest()`!

This is then the root cause of the guest-run being abandoned since if
our `aio_main()` task-closure doesn't know it should allow the run to
finish, it's going to call `loop.close()` eventually resulting in the
`GeneratorExit` thrown into `trio._core._run.unrolled_run()`..

So, this extends the `test_sigint_closes_lifetime_stack()` suite to
include cases for such shielded `trio`-task ops:
- add a new `trio_side_is_shielded: bool` which will toggle whether to
  add a shielded 0.5s `trio.sleep()` loop to `manage_file()` which
  should outlive the `asyncio` event-loop shutdown sequence and result
  in an abandoned guest-run and thus a leaked file.
- parametrize the existing suite with this case resulting in a total 16
  test set B)

This patch demonstrates the problem with our `aio_main()` task-closure
impl via the now 4 failing tests, a fix is coming in a follow up commit!

9133f42b07 Solve our abandonment issues..

To make the recent set of tests pass this (hopefully) finally solves all
`asyncio` embedded `trio` guest-run abandonment by ensuring we "pump the
event loop" until the guest-run future is fully complete.

Accomplished via simple poll loop of the form `while not
trio_done_fut.done(): await asyncio.sleep(.1)` in the `aio_main()`
task's exception teardown sequence. The loop does a naive 100ms
"pump-via-sleep & poll" for the `trio` side to complete before finally
exiting (and presumably raising) from the SIGINT cancellation.
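
The poll loop described above can be sketched with plain `asyncio`, no `trio` required. This is only an illustration under assumptions: `aio_main()` and `trio_done_fut` are named after the text but the bodies here are stand-ins, `pump_delay` is a hypothetical parameter, and the "guest-run" is faked with a timer that completes the future late. Unlike the real teardown (which presumably re-raises), this sketch returns the result so the outcome is easy to observe.

```python
import asyncio


async def aio_main(
    trio_done_fut: asyncio.Future,
    pump_delay: float = 0.1,  # hypothetical knob for the pump cycle
) -> str:
    try:
        # normal path: wait on the (stand-in) guest-run outcome
        return await asyncio.shield(trio_done_fut)
    except asyncio.CancelledError:
        # teardown path: keep yielding to the event loop so callbacks
        # scheduled by the other side can run until the future is done.
        while not trio_done_fut.done():
            await asyncio.sleep(pump_delay)
        # sketch only: report the outcome instead of re-raising
        return trio_done_fut.result()


async def demo() -> str:
    loop = asyncio.get_running_loop()
    trio_done_fut: asyncio.Future = loop.create_future()
    # stand-in for shielded teardown work finishing after the cancel
    loop.call_later(0.3, trio_done_fut.set_result, 'guest-run complete')

    task = asyncio.create_task(aio_main(trio_done_fut))
    await asyncio.sleep(0)  # let the task reach its `await`
    task.cancel()  # stand-in for the SIGINT-driven cancellation
    return await task


result = asyncio.run(demo())
print(result)
```

Without the `while` loop the task would exit as soon as it is cancelled, which in the real runtime is the point where the event loop gets closed underneath the still-running guest-run.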

Other related cleanups and refinements:
- use `asyncio.Task.result()` inside `cancel_trio()` since it also
  inline-raises any exception outcome and we can also log-report the
  result in non-error cases.
- comment out buncha not-sure-we-need-it stuff in `cancel_trio()`.
- remove the botched `AsyncioCancelled(CancelledError):` idea obvi XD
- comment `greenback` init for now in `aio_main()` since (pretty sure)
  we don't ever want to actually REPL in that specific func-as-task?
- always capture any `fute_err: BaseException` from the `main_outcome:
  Outcome` delivered by the `trio` side guest-run task.
- add and raise a new super noisy `AsyncioRuntimeTranslationError`
  whenever we detect that the guest-run `trio_done_fut` has not
  completed before task exit; should avoid abandonment issues ever
  happening again without knowing!

goodboy added 7 commits 2024-06-28 23:05:27 +00:00

9f9b0b17dc Add a `Context.portal`, more cancel tooling

Might as well add a public maybe-getter for use on the "parent" side
since it can be handy to check out-of-band cancellation conditions (like
from `Portal.cancel_actor()`).

Buncha bitty tweaks for more easily debugging cancel conditions:
- add a `@.cancel_called.setter` for hooking into `.cancel_called = True`
  being set in hard to decipher "who cancelled us" scenarios.
- use a new `self_ctxc: bool` var in `.cancel()` to capture the output
  state from `._is_self_cancelled(remote_error)` at call time so it can
  be compared against the measured value at crash-time (when REPL-ing it
  can often have already changed due to runtime teardown sequencing vs.
  the crash handler hook entry).
- proxy `hide_tb` to `.drain_to_final_msg()` from `.wait_for_result()`.
- use `remote_error.sender` attr directly instead of through
  `RAE.msgdata: dict` lookup.
- change var name `our_uid` -> `peer_uid`; it's not "ours"..

Other various docs/comment updates:
- extend the main class doc to include some other name ideas.
- change over all remaining `.result()` refs to `.wait_for_result()`.
- doc more details on how we want `.outcome` to eventually signature.

2ac999cc3c Prep for legacy RPC API factor-n-remove

This change is adding commentary about the upcoming API removal and
simplification of nursery + portal internals; no actual code changes are
included.

The plan to (re)move the old RPC methods:
- `ActorNursery.run_in_actor()`
- `Portal.run()`
- `Portal.run_from_ns()`

and any related impl internals out of each conc-primitive and instead
into something like a `.hilevel.rpc` set of APIs which then are all
implemented using the newer and more lowlevel `Context`/`MsgStream`
primitives instead Bo

Further,
- formally deprecate the `Portal.result()` meth for
  `.wait_for_result()`.
- only `log.info()` about runtime shutdown in the implicit root case.

5739e79645 Use `delay=0` in pump loop..

Turns out it does work XD

Prior presumption was from before I had the fute poll-loop so makes
sense we needed more than one sched-tick's worth of context switch vs.
now we can just keep looping-n-pumping as fast as possible until the
guest-run's main task completes.

Also,
- minimize the preface commentary (as per todo) now that we have tests
  codifying all the edge cases :finger_crossed:
- parameter-ize the pump-loop-cycle delay and default it to 0.

b72a025d0f Always reset `._state._ctxvar_Context` to prior

Not sure how I forgot this but, obviously it's correct context-var
semantics to revert the current IPC `Context` (set in the latest
`.open_context()` block) such that any prior instance is reset..

This ensures the sanity `assert`s pass inside
`.msg._ops.maybe_limit_plds()` and just in general ensures for any task
that the last opened `Context` is the one returned from
`current_ipc_ctx()`.
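
The reset semantics described here are the standard `contextvars` token pattern. A minimal sketch, assuming nothing about the real `._state` internals: `_ctxvar_current`, `open_ctx()` and `current_ctx()` below are hypothetical stand-ins for the context-var, `.open_context()` and `current_ipc_ctx()`.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Iterator, Optional

# hypothetical stand-in for `._state._ctxvar_Context`
_ctxvar_current: ContextVar[Optional[str]] = ContextVar(
    'current_ctx',
    default=None,
)


@contextmanager
def open_ctx(name: str) -> Iterator[str]:
    # `.set()` returns a token remembering whatever was set before
    token = _ctxvar_current.set(name)
    try:
        yield name
    finally:
        # always restore the prior value, even on error
        _ctxvar_current.reset(token)


def current_ctx() -> Optional[str]:
    return _ctxvar_current.get()


seen: list = []
with open_ctx('outer'):
    with open_ctx('inner'):
        seen.append(current_ctx())
    # the inner block reset the var back to the outer value
    seen.append(current_ctx())
seen.append(current_ctx())
print(seen)
```

Dropping the `.reset(token)` call would leave `'inner'` set after its block exits, which is the kind of stale state the sanity `assert`s mentioned above would trip on.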

5e009a8229 Further formalize `greenback` integration

Since we more or less require it for `tractor.pause_from_sync()` this
refines enable toggles and their relay down the actor tree as well as
more explicit logging around init and activation.

Tweaks summary:
- `.info()` report the module if discovered during root boot.
- use a `._state._runtime_vars['use_greenback']: bool` activation flag
  inside `Actor._from_parent()` to determine if the sub should try to
  use it and set to `False` if mod-loading fails / not installed.
- expose `maybe_init_greenback()` from `.devx` subpkg.
- comment out RTE in `._pause()` for now since we already have it in
  `.pause_from_sync()`.
- always `.exception()` on `maybe_init_greenback()` import errors to
  clarify the underlying failure deats.
- always explicitly report if `._state._runtime_vars['use_greenback']`
  was NOT set when `.pause_from_sync()` is called.

Other `._runtime.async_main()` adjustments:
- combine the "internal error call ur parents" message and the failed
  registry contact status into one new `err_report: str`.
- drop the final exception handler's call to
  `Actor.lifetime_stack.close()` since we're already doing it in the
  `finally:` block and the earlier call has no currently known benefit.
- only report on the `.lifetime_stack()` callbacks if any are detected
  as registered.

cb90f3e6ba Update `MsgTypeError` content matching to latest

4fbd469c33 Update `._entry` actor status log

Log-report the different types of actor exit conditions including cancel
via KBI, error or normal return with varying levels depending on case.

Also, start proto-ing out this weird ascii-syntax idea for describing
conc system states and implement the first bit in a `nest_from_op()`
log-message fmter that joins and indents an obj `repr()` with
a tree-like `'>)\n|_'` header.
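
A rough idea of what such a formatter might look like; the signature below is an assumption for illustration and almost certainly differs from the real `nest_from_op()`.

```python
def nest_from_op(op: str, obj: object) -> str:
    '''
    Join an "op" header (like '>)') with an object's `repr()`,
    nesting the repr under a tree-like '|_' prefix.

    NOTE: hypothetical signature, sketch only.

    '''
    first, *rest = repr(obj).splitlines() or ['']
    lines = [op, f'|_{first}']
    # align any continuation lines under the first repr line
    lines += [f'  {line}' for line in rest]
    return '\n'.join(lines)


msg = nest_from_op('>)', [1, 2])
print(msg)
```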

goodboy added 7 commits 2024-07-02 17:02:32 +00:00

7e93b81a83 Don't use pretty struct stuff in `._invoke`

It's too fragile to put inside core RPC machinery since
`msgspec.Struct` defs can fail if a field type can't be
looked up at creation time (like can easily happen if you
conditionally import using `if TYPE_CHECKING:`)
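
The failure mode can be reproduced with the stdlib alone. This sketch uses `typing.get_type_hints()` as a stand-in for whatever type resolution the struct machinery performs; it does not exercise `msgspec` itself and `Price` is a made-up example class.

```python
from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # only static type checkers ever execute this import; at
    # runtime the name `Decimal` is never bound in this module.
    from decimal import Decimal


class Price:
    amount: 'Decimal'


try:
    get_type_hints(Price)
    lookup_error = None
except NameError as err:
    # resolving the annotation needs the real type object
    lookup_error = err

print(repr(lookup_error))
```

Any code which resolves field annotations at runtime hits the same wall, hence keeping that kind of formatting out of the core RPC path.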

Also,
- rename `cs` to `rpc_ctx_cs: CancelScope` since it's literally
  the wrapping RPC `Context._scope`.
- report self cancellation via `explain: str` and add tail case for
  "unknown cause".
- put a ?TODO? around what to do about KBIs if a context is opened
  from an `infected_aio`-actor task.
- similar to our nursery and portal add TODO list for moving all
  `_invoke_non_context()` content out the RPC core and instead implement
  them as `.hilevel` endpoint helpers (maybe as decorators?) which
  underneath define `@context`-funcs.

edac717613 Use `msgspec.Struct.__repr__()` failover impl

In case the struct doesn't import a field type (which will cause the
`.pformat()` to raise) just report the issue and try to fall back to the
original `repr()` version.

18d440c207 (Re)type annot some tests

- For the (still not finished) `test_caps_based_msging`, switch to
  using the new `PayloadMsg`.
- add `testdir` fixture type.

ba83bab776 Todo a test for sync-pausing from non-main-root-tasks

e3d59964af Woops, set `.cancel()` level in custom levels table..

3907cba68e Refine some `.trionics` docs and logging

- allow passing and report the lib name (`trio` or `tractor`) from
  `maybe_open_nursery()`.
- use `.runtime()` level when reporting `_Cache`-hits in
  `maybe_open_context()`.
- tidy up some doc strings.

af3745684c More formal `TransportClosed` reporting/raising

Since it was all ad-hoc defined inside
`._ipc.MsgpackTCPStream._iter_pkts()` more or less, this starts
formalizing a way for particular transport backends to indicate whether
a disconnect condition should be re-raised in the RPC msg loop and if
not what log level to report it at (if any).

Based on our lone transport currently we try to suppress any logging
noise from ephemeral connections expected during normal actor
interaction and discovery subsys ops:
- any short lived discovery related TCP connects are only logged as
  `.transport()` level.
- both `.error()` and raise on any underlying `trio.ClosedResource`
  cause since that normally means some task touched transport layer
  internals that it shouldn't have.
- do a `.warning()` on anything else unexpected.

Impl deats:
- extend the `._exceptions.TransportClosed` to accept an input log
  level, raise-on-report toggle and custom reporting & raising via a new
  `.report_n_maybe_raise()` method.
- construct the TCs with inputs per case in (the newly named) `._iter_pkts()`.
- call ^ this method from the `TransportClosed` handler block inside the
  RPC msg loop thus delegating reporting levels and/or raising to the
  backend's per-case TC instantiating.

Related `._ipc` changes:
- mask out all the `MsgpackTCPStream._codec` debug helper stuff and drop
  any lingering cruft from the initial proto-ing of msg-codecs.
- rename some attrs/methods:
  |_`MsgpackTCPStream._iter_packets()` -> `._iter_pkts()` and
    `._agen` -> `_aiter_pkts`.
  |_`Channel._aiter_recv()` -> `._aiter_msgs()` and
    `._agen` -> `_aiter_msgs`.
- add `hide_tb: bool` support to `Channel.send()` and only show the
  frame on non-MTEs.

goodboy added 4 commits 2024-07-02 21:24:51 +00:00

3c5816c977 Add `Portal.chan` property, to wrap `._chan` attr

02812b9f51 Reraise RAEs in `MsgStream.receive()`; truncate tbs

To avoid showing lowlevel details of exception handling around the
underlying call to `return await self._ctx._pld_rx.recv_pld(ipc=self)`,
any time a `RemoteActorError` is unpacked (and raised locally) we re-raise
it directly from the captured `src_err` so as to present to
the user/app caller-code an exception raised directly from the `.receive()`
frame. This simplifies traceback call-stacks for any `log.exception()`
or `pdb`-REPL output filtering out the lower `PldRx` frames by default.

b46400a86f Use `._entry` proto-ed "lifetime ops" in logging

As per a WIP scribbled out TODO in `._entry.nest_from_op()`, change
a bunch of "supervisor/lifetime mgmt ops" related log messages to
contain some supervisor-annotation "headers" in an effort to give
a terser "visual indication" of how some execution/scope/storage
primitive entity (like an actor/task/ctx/connection) is being operated
on (like, opening/started/closed/cancelled/erroring) from a "supervisor
action" POV.

Also tweak a bunch more emissions to lower levels to reduce noise around
normal inter-actor operations like process and IPC ctx supervision.

9be821a5cf More failed REPL-lock-request refinements

In `lock_stdio_for_peer()` better internal-error handling/reporting:
- only `Lock._blocked.remove(ctx.cid)` if that same cid was added on
  entry to avoid needless key-errors.
- drop all `Lock.release(force: bool)` usage remnants.
- if `req_ctx.cancel()` fails mention it with `ctx_err.add_note()`.
- add more explicit internal-failed-request log messaging via a new
  `fail_reason: str`.
- use new `x)<=\n|_` annots in any failure logging.

Other cleanups/niceties:
- drop `force: bool` flag entirely from the `Lock.release()`.
- use more supervisor-op-annots in `.pdb()` logging
  with both `_pause/crash_msg: str` instead of double '|' lines when
  `.pdb()`-reported from `._set_trace()`/`._post_mortem()`.

goodboy added 9 commits 2024-07-10 22:33:33 +00:00

b56352b0e4 Quieter `Stop` handling on ctx result capture

In the `drain_to_final_msg()` impl, since a stream terminating
gracefully requires this msg, there's really no reason to `log.cancel()`
about it; go `.runtime()` level instead since we're trying de-noise
under "normal operation".

Also,
- passthrough `hide_tb` to taskc-handler's `ctx.maybe_raise()` call.
- raise `MessagingError` for the `MsgType` unmatched `case _:`.
- detail the doc string motivation a little more.

5f8f8e98ba More-n-more scops annots in logging

31207f92ee Finally implement peer-lookup optimization..

There's been a todo for soo long for this XD

Since all `Actor`'s store a set of `._peers` we can try a lookup on that
table as a shortcut before pinging the registry Bo

Impl deats:
- add a new `._discovery.get_peer_by_name()` routine which attempts the
  `._peers` lookup by combining a copy of that `dict` + an entry added
  for `Actor._parent_chan` (since all subs have a parent and often the
  desired contact is just that connection).
- change `.find_actor()` (for the `only_first == True` case),
  `.query_actor()` and `.wait_for_actor()` to call the new helper and
  deliver appropriate outputs if possible.

Other,
- deprecate `get_arbiter()` def and all usage in tests and examples.
- drop lingering use of `arbiter_sockaddr` arg to various routines.
- tweak the `Actor` doc str as well as some code fmting and a tweak to
  the `._stream_handler()`'s initial `con_status: str` logging value
  since the way it was could never be reached.. oh and `.warning()` on
  any new connections which already have a `_pre_chan: Channel` entry in
  `._peers` so we can start minimizing IPC duplications.

e6ccfce751 Adjusts advanced fault tests to match new `TransportClosed` semantics

bef3dd9e97 Another tweak to REPL entry `.pdb()` headers

fc95c6719f Get multi-threaded sync-pausing fully workin!

The final issue was making sure we do the same thing on ctl-c/SIGINT
from the user. That is, if there's already a bg-thread in REPL, we
`log.pdb()` about SIGINT shielding and re-draw the prompt; the same UX
as normal actor-runtime-task behaviour.

Reasons this wasn't workin.. and the fix:
- `.pause_from_sync()` was overriding the local `repl` var with `None`
  delivered by (transitive) calls to `_pause(debug_func=None)`.. so
  remove all that and only assign it OAOO prior to thread-type case
  branching.
- always call `DebugStatus.shield_sigint()` as needed from all requesting
  threads/tasks:
  - in `_pause_from_bg_root_thread()` BEFORE calling `._pause()` AND BEFORE
    yielding back to the bg-thread via `.started(out)` to ensure we're
    definitely overriding the handler in the `trio`-main-thread task
    before unblocking the requesting bg-thread.
  - from any requesting bg-thread in the root actor such that both its
    main-`trio`-thread scheduled task (as per above bullet) AND it are
    SIGINT shielded.
  - always call `.shield_sigint()` BEFORE any `greenback._await()` case
    (don't entirely grok why yet, but it works)?
  - for `greenback._await()` case always set `bg_task` to the current one..
- tweaks to the `SIGINT` handler, now renamed `sigint_shield()` so as
  not to name-collide with the methods when editor-searching:
  - always try to `repr()` the REPL thread/task "owner" as well as the
    active `PdbREPL` instance.
  - add `.devx()` notes around the prompt flushing deats and comments
    for any root-actor-bg-thread edge cases.

Related/supporting refinements:
- add `get_lock()`/`get_debug_req()` factory funcs since the plan is to
  eventually implement both as `@singleton` instances per actor.
- fix `acquire_debug_lock()`'s call-sig-bug for scheduling
  `request_root_stdio_lock()`..
- in `._pause()` only call `mk_pdb()` when `debug_func != None`.
- add some todo/warning notes around the `cls.repl = None` in
  `DebugStatus.release()`

`test_pause_from_sync()` tweaks:
- don't use a `attach_patts.copy()`, since we always `break` on match.
- do `pytest.fail()` on that ^ loop's fallthrough..
- pass `do_ctlc(child, patt=attach_key)` such that we always match the
  current thread's name with the ctl-c triggered `.pdb()` emission.
- oh yeah, return the last `before: str` from `do_ctlc()`.
- in the script, flip `abandon_on_cancel=True` since when `False` it
  seems to cause `trio.run()` to hang on exit from the last bg-thread
  case?!?

131e3e8157 Move `mk_cmd()` to `._testing`

Since we're going to need it more generally for `.devx` sub-sys tooling
tests.

Also, up the sync-pause ctl-c delay another 10ms..

d216068713 Start a new `tests/devx/` tooling-subsuite-pkg

547b957bbf Officially test proto-ed `stackscope` integration

By re-purposing our `pexpect`-based console matching with a new
`debugging/shield_hang_in_sub.py` example, this tests a few "hanging
actor" conditions more formally:

- that despite a hanging actor's task we can dump
  a `stackscope.extract()` tree on relay of `SIGUSR1`.
- the actor tree will terminate despite a shielded forever-sleep by our
  "T-800" zombie reaper machinery activating and hard killing the
  underlying subprocess.

Some test deats:
- simulates the expected actions of a real user by manually using
  `os.kill()` to send both signals to the actor-tree program.
- `pexpect`-matches against `log.devx()` emissions under normal
  `debug_mode == True` usage.
- ensure we get the actual "T-800 deployed" `log.error()` msg and
  that the actor tree eventually terminates!

Surrounding (re-org/impl/test-suite) changes:
- allow disabling usage via a `maybe_enable_greenback: bool` to
  `open_root_actor()` but enable by def.
- pretty up the actual `.devx()` content from `.devx._stackscope`
  including being extra pedantic about the conc-primitives for each signal
  event.
- try to avoid double handles of `SIGUSR1` even though it seems the
  original (what i thought was a) problem was actually just double
  logging in the handler..
  |_ avoid double applying the handler func via `signal.signal()`,
  |_ use a global to avoid double handle func calls and,
  |_ a `threading.RLock` around handling.
- move common fixtures and helper routines from `test_debugger` to
  `tests/devx/conftest.py` and import them for use in both test mods.
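
For illustration, the three double-handling guards listed above (skip re-applying via `signal.signal()`, a global flag, and a `threading.RLock`) might look something like this. All names here are hypothetical, the `dumps` list merely stands in for the real `stackscope` tree dump, and the one-shot flag is a simplification; it is POSIX-only since it references `SIGUSR1`:

```python
import signal
import threading

_handler_lock = threading.RLock()
_already_handled: bool = False
# stand-in for the real task-tree dump output
dumps: list[str] = []


def dump_tree_on_sig(signum, frame) -> None:
    global _already_handled
    # serialize handling so 2 deliveries can't interleave
    with _handler_lock:
        if _already_handled:
            return  # avoid a double dump (and double logging)
        _already_handled = True
        dumps.append(f"task-tree dump for signal {signum}")


def enable_handler(sig: int = signal.SIGUSR1) -> bool:
    # don't re-apply when our handler is already installed
    if signal.getsignal(sig) is dump_tree_on_sig:
        return False
    signal.signal(sig, dump_tree_on_sig)  # main-thread only
    return True
```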

goodboy force-pushed aio_abandons from 547b957bbf to f7469442e3

2024-07-12 00:13:06 +00:00

goodboy added 2 commits 2024-07-13 05:00:57 +00:00

1f1a3f19d5 Fix multi-daemon debug test `break` signal..

It was expecting `AssertionError` as a proceed-in-test signal (by
breaking from a continue loop), but `in_prompt_msg(raise_on_err=True)`
was changed to raise `ValueError`; so instead just use as a predicate
for the `break`.

Also rework `in_prompt_msg()` to accept the `child: BaseSpawn` as input
instead of `before: str`, remove the casting boilerplate, and adjust all
usage to match.

a69bc00593 First draft, `asyncio`-task, sync-pausing Bo

Mostly due to magic from @oremanj (a super-end-level-boss) where we slap
in a little bit of `.from_asyncio`-type stuff to run a `trio`-task from
`asyncio` code

I'm not gonna go into tooo too much detail but basically the primary
thing needed was a way to (blocking-ly) invoke a `trio.lowlevel.Task`
from an `asyncio.Task` (which we now have with a new
`run_trio_task_in_future()` thanks to the "aforementioned jefe") which
we now invoke from a dedicated aio case-branch inside
`.devx._debug.pause_from_sync()`. Further include a case inside
`DebugStatus.release()` to handle using the same func to set the
`repl_release: trio.Event` from the `asyncio` side when releasing the
REPL.

Prolly more refinements to come ;{o

goodboy force-pushed aio_abandons from a69bc00593 to 71cf9e7bd3

2024-07-15 15:37:33 +00:00

goodboy force-pushed aio_abandons from 71cf9e7bd3 to 3b39cce741

2024-07-29 21:59:34 +00:00

goodboy added 1 commit 2024-08-01 01:35:34 +00:00

ae95e0c83e Hide `[maybe]_open_crash_handler()` frame by default

goodboy added 4 commits 2024-09-18 23:57:14 +00:00

f7f738638d More `.pause_from_sync()` in bg-threads "polish"

Various `try`/`except` blocks around external APIs that raise when not
running inside `tractor` and/or some async framework (mostly to avoid
too-late/benign error tbs on certain classes of actor tree teardown):
- for the `log.pdb()` prompts emitted before REPL console entry.
- inside `DebugStatus.is_main_trio_thread()`'s call to `sniffio`.
- in `_post_mortem()` by catching `NoRuntime` when called from a thread
  still active after the `.open_root_actor()` has already exited.

Also,
- create a dedicated `DebugStateError` for raising instead of `assert`s
  when we have actual debug-request inconsistencies (as seem to be most
  likely with bg thread usage of `breakpoint()`).
- show the `open_crash_handler()` frame on `bdb.BdbQuit` (for now?)

7859e743cc Add `tb_hide: bool` ctl flag to `_open_and_supervise_one_cancels_all_nursery()`

59f4024242 Add `indent: str` support to `Context.pformat()` using `textwrap`

5c2e972315 Report any external-rent-task-canceller during msg-drain

As in whenever `Context.cancel()` is not (runtime internally) called
(i.e. `._cancel_called` is not set), we can attempt to detect the parent
`trio` nursery/cancel-scope that is the source. Emit the report with
a `.cancel()` level and attempt to repr in "sclang" form as well as
unhide the stack frame for debug/traceback-in.

goodboy added 5 commits 2024-12-09 21:09:24 +00:00

8af9b0201d Messy-teardown `DebugStatus` related fixes

Mostly fixing edge cases with `asyncio` and/or bg threads where the
`.repl_release: trio.Event` needs to be used from the main `trio`
thread OW confusing-but-valid teardown tracebacks can show under various
races.

Also improve,
- log reporting for such internal bugs to make them more obvious on
  console via `log.exception()`.
- only restore the SIGINT handler when runtime is (still) active.
- reporting when `tractor.pause(shield=True)` should be used and
  unhiding the internal frames from the tb in that case.
- for `pause_from_sync()` some deep fixes..
 |_add a `allow_no_runtime: bool = False` flag to allow
   **not** requiring the actor runtime to be active.
 |_fix the `greenback` case-branch to only trigger on `not
   is_trio_thread`.
 |_add a scope-global `repl_owner: Task|Thread|None = None` to
   avoid ref errors..

cf3e6c1218 Rename `n: trio.Nursery` -> `tn` (task nursery)

b3ee20d3b9 Add `breakpoint()` hook restoration example + test

46ddc214cd Wrap `asyncio_bp.py` ex into test suite

Ensuring we can at least use `breakpoint()` from an infected actor's
`asyncio.Task` spawned via a `.to_asyncio` API.

Also includes a little `tests/devx/` reorging,
- start splitting out non-`tractor.pause()` tests into a new
  `test_pause_from_non_trio.py` for all the `.pause_from_sync()`
  use in bg-threaded or `asyncio` applications.
- factor harness commonalities to the `devx/conftest` (namely
  the `do_ctlc()` masher).
- mv `test_pause_from_sync` to the new non`-trio` mod.

NOTE, the `ctlc=True` is still failing for
`test_pause_from_asyncio_task` which is a user-happiness bug but not
anything fundamentally broken - just need to handle the `asyncio` case
in `.devx._debug.sigint_shield()`!

b875b35b98 Change `tractor.breakpoint()` to new `.pause()` in test suite

goodboy added 11 commits 2024-12-10 19:42:47 +00:00

508ba510a5 Expose a `_ctlc_ignore_header: str` for use in `sigint_shield()`

ad40fcd2bc Support custom `boxer_header: str` provided by `pformat_boxed_tb()` caller

cd14c4fe72 Set `RemoteActorError.pformat(boxer_header=self.relay_uid)` by def

b91ab9e3a8 Add TODO for a tb frame "filterer" sys..

54699d7a0b Denoise duplicate chan logging for now

a2659069c5 Type the inter-loop chans

e26fa8330f Change masked `.pause()` line

8ebc022535 Add TODO for a runtime-vars passing mechanism

9002f608ee Add `notes_to_self/howtorelease.md` reminder doc

aa1f6fa4b5 Spitballing how to expose custom `msgspec` type hooks

Such that maybe we can eventually offer a nicer higher-level API which
implements much of the boilerplate required by `msgspec` (like
type-matched branching to serialization logic) via a type-table
interface or something?

Not sure if the idea is that useful so leaving it all as TODOs for now
obviously.

fb04f74605 Draft a (pretty)`Struct.fields_diff()`

For comparing a `msgspec.Struct` against an input `dict` presumably to
be used as input for struct instantiation. The main diff with
`.__sub__()` is that non-existing fields on either are reported
(loudly).

goodboy referenced this pull request

2025-02-26 17:53:18 +00:00

py3.13 test-suite surgery #8

goodboy added 16 commits 2025-03-10 16:34:06 +00:00

441cf0962d TOSQUASH: 9002f60 howtorelease.md file

9c83f02568 Support and test infected-`asyncio`-mode for root

Such that you can use,

```python

    tractor.to_asyncio.run_as_asyncio_guest(
        trio_main=_trio_main,
    )
```

to bootstrap the root actor (and thus main parent process) to embed
the actor-runtime into an `asyncio` loop. Prove it all works with a
subactor-free version of the aio echo-server test suite B)

7537c6f053 Support passing pre-conf-ed `Logger`

Such that we can hook into 3rd-party-libs more easily to monkey them and
use our (prettier/hipper) console logging with something like (an
example from the client project `modden`),

```python
    import logging

    import i3ipc
    import tractor

    connection_mod = i3ipc.connection
    tractor_style_i3ipc_logger: logging.LoggerAdapter = tractor.log.get_console_log(
        _root_name=connection_mod.__name__,
        logger=connection_mod.logger,
        level='info',
    )
    # monkey the instance-ref in 3rd-party module
    connection_mod.logger = tractor_style_i3ipc_logger
```

Impl deats,
- expose as `get_console_log(logger: logging.Logger)` and add default
  failover logic.
- toss in more typing, also for mod-global instance.

f7cd8739a5 Accept err-type override in `is_multi_cancelled()`

Such that equivalents of `trio.Cancelled` from other runtimes such as
`asyncio.CancelledError` and `subprocess.CalledProcessError` (with
a `.returncode == -2`) can be gracefully ignored as needed by the
caller.

For example this is handy if you want to avoid debug-mode REPL entry on
an exception-group full of only some subset of exception types since you
expect certain tasks to raise such errors after having been cancelled by
a request from some parent supervision sys (some "higher up"
`trio.CancelScope`, a remote triggered `ContextCancelled` or just from
an OS SIGINT).

Impl deats,
- offer a new `ignore_nested: set[BaseException]` param which by
  default we add `trio.Cancelled` to when no other types are provided.
- use `ExceptionGroup.subgroup(tuple(ignore_nested))` to filter to egs of
  the "ignored sub-errors set" and return any such match (instead of
  `True`).
- detail a comment on exclusion case.

52238ade28 Raise explicitly on missing `greenback` portal

When `.pause_from_sync()` is called from an `asyncio.Task` which was
never bestowed a portal we want to be mega pedantic about it; indicate
that the task was NOT spawned from our `.to_asyncio` API and likely by
some out-of-our-control code (normally using
`asyncio.ensure_future()/.create_task()`). Though `greenback` already
errors on such usage, it's not always clear why no portal exists;
explaining the situation of a 3rd-party-bg-spawned-task should avoid
dev confusion for most cases.

Impl deats,
- distinguish between an actor in infected mode versus the actual caller
  of `.pause_from_sync()` being an `asyncio.Task` with more explicit
  `asyncio_task` and `is_infected_aio` vars.
- ONLY in the case of being both an infected-mode-actor AND detecting
  that the caller is an `asyncio.Task`, check `greenback.has_portal()`
  such that when not bestowed we presume the aforementioned
  3rd-party-bg-task case above and raise a new explicit RTE with
  a detailed explanatory message.
- add some masked draft code for handling the special case of a root
  actor `asyncio.Task` caller which could (in theory) not actually
  require gb portal use since the `Lock` can be acquired directly
  without IPC.
 |_this will likely require factoring of various pause machinery funcs
   into a `_pause_from_root_task()` to mk the impl sane XD

Other,
- expose a new `debug_filter: Callable` which can be provided by the
  caller of `_maybe_enter_pm()` to predicate whether to enter the
  debugger REPL based on the caught `BaseException|BaseExceptionGroup`;
  this is handy for customizing the meaning of "graceful cancellations"
  so as to avoid crash handling on expected egs of more than
  `trio.Cancelled`.
|_ make the default as it was implemented: `not is_multi_cancelled(err)`
- pass-through a new `ignore: set[BaseException]` as
  `open_crash_handler(ignore_nested=ignore)` to allow for the same
  silent-cancellation-egs-swallowing as desired from outside the actor
  runtime.

33e5e2c06f Drop extra nl from boxed error fmt

b6608e1c46 Expose `debug_filter` from `open_root_actor()` also

Such that actor-runtime graceful cancel handling can be used throughout
any process tree.

9167fbb0a8 Much more limited `asyncio.Task.cancel()` use

Since it can not only cause the guest-mode run to abandon but also in
some edge cases prevent `trio`-errors from propagating (at least on
py3.12-13?) as discovered as part of supporting this mode officially
in the *root actor*.

As such try to avoid that method as much as possible instead opting to
pass the `trio`-side error via the iter-task channel ref.

Deats,
- add a `LinkedTaskChannel._trio_err: BaseException|None` which gets set
  whenever the `trio.Task` error is caught; ONLY set `AsyncioCancelled`
  when the `trio` task was for sure the cause, whether itself cancelled
  or errored.
- always check for this error when exiting the `asyncio` side (even when
  terminated via a call to `asyncio.Task.cancel()` or during any other
  `CancelledError` handling) such that the `asyncio`-task can expect to
  handle `AsyncioCancelled` due to the above^^ cases.
- never `cs.cancel()` the `trio` side unless that cancel scope has not
  yet been `.cancel_called` whatsoever; it's a noop anyway.
- only raise any exc from `asyncio.Task.result()` when `chan._aio_err`
  does not already match it since the existence of the pre-existing
  `task_err` means `asyncio` prolly intends (or has already) raised and
  interrupted the task elsewhere.

Various supporting tweaks,
- don't bother maybe-init-ing `greenback` from the actor entrypoint
  since we already need to (and do) bestow the portals to each `asyncio`
  task spawned using the `run_task()`/`open_channel_from()` API; further
  the init-ing should be done already by client code that enables
  infected mode (even in the root actor).
 |_we should prolly also codify it from any
   `run_daemon(infected_aio=True, debug_mode=True)` usage we offer.
- pass all the `_<field>`s to `LinkedTaskChannel` explicitly in named
  kwarg style.
- better sclang-style log reports throughout, particularly on teardowns.
- generally more/better comments and docs around (not well understood)
  edge cases.
- prep to just inline `maybe_raise_aio_side_err()` closure..

129dff575f Hm, `asyncio.Task._fut_waiter.set_exception()`?

Since we can't use `Task.set_exception()` (that task method never
seems to work.. XD) and setting the exception on the private/internal
future always seems to do the desired raising in the task? I realize it's
an internal `asyncio` runtime field but i'd rather take the risk of it
breaking than having to rely on our own equivalent hack..

Also, it seems like the case where the task's associated (and internal)
future-waiter field is null, we won't run into the (same?) prior hanging
issues (maybe since there's nothing for `asyncio` internals to use to
wait XD ??) when `Task.cancel()` is used..??

Main deats,
- add and `Future.set_exception()` a new signal-exception
  `class TrioTaskExited(AsyncioCancelled):` whenever the trio-task exits
  gracefully and the asyncio-side task is still doing blocking work (of
  some sort) which *seem to* be predicated by a check that
  `._fut_waiter is not None`.
- always call `asyncio.Queue.shutdown()` for the same^ as well as
  whenever we decide to call `Task.cancel()`; in that case the shutdown
  relays correctly?

Some further refinements,
- only warn about `Task.cancel()` usage when actually used ;)
- more local scope vars setting in the exit phase of
  `translate_aio_errors()`.
- also in ^ use explicit caught-exc var names for each error-type.

095bf28f5d Add an inter-leaved-task error test

Trying to replicate cases where errors are raised in both `trio` and
`asyncio` tasks independently (at least in `.to_asyncio` API terms) with
a new `test_trio_prestarted_task_bubbles` that generates 3 cases inside
an `@acm` call stack composing a `trio.Nursery` with
a `to_asyncio.open_channel_from()` call where a set of `trio` tasks are
started in a loop using `.start()` with various exc raising sequences,
- the aio task raising *before* the last `trio` task spawns.
- the aio task raising just after the last trio task spawns, but before
  it starts.
- after the last trio task `.start()` call returns control to the
  parent - but (for now) did not error.

TODO, still more cases to discover as i'm still fighting a `modden` bug
of this sort atm..

Other,
- tweak some other tests to have timeouts since some recent hangs were
  found..
- started mucking with py3.13 and thus adjustments for strict egs in
  some tests; full patchset to test suite likely coming soon!

71a29d0106 Yield a boxed-maybe-error from `open_crash_handler()`

Along the lines of something like `pytest.raises()` where the handled
exception can be inspected from the `pdbp` REPL using its `.value` field
B)

This is super handy in particular for understanding
`BaseException[Group]`s without manually adding surrounding handler code
to assign the `except[*] Exception as exc_var:` particularly when trying
to understand multi-cancelled eg trees.

917699417f Add a "raise-from-`finally:`" example test

Since i wasted 2 days just to find an example of this inside an `@acm`,
figured I better reproduce for the purposes of maybe implementing
a warning sys (inside our wip proto `open_taskman()`) when a nursery
detects a single `Cancelled` in an eg where the `.__context__` is set to
some non-cancel error (which likely means a cancel-causing source
exception was suppressed by accident).

Left in a buncha commented code using `maybe_open_nursery()` which
i thought might be part of the issue but didn't end up being required;
will likely remove on a follow up refinement.
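
The masking condition itself can be reproduced without any async machinery. In this sketch `FakeCancelled` is a hypothetical stand-in for `trio.Cancelled`; raising it from a `finally:` replaces the in-flight error, which then only survives on `.__context__`:

```python
class FakeCancelled(BaseException):
    """Hypothetical stand-in for `trio.Cancelled`."""


def teardown_masks_error() -> None:
    try:
        raise ValueError("the real cause")
    finally:
        # raising here replaces the in-flight `ValueError`..
        raise FakeCancelled("raised during teardown")


def capture() -> BaseException:
    try:
        teardown_masks_error()
    except FakeCancelled as exc:
        # ..so only the "cancel" is visible to the caller, with the
        # source error reachable solely via `exc.__context__`
        return exc
    raise AssertionError("unreachable")
```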

ae18ceb633 Impl a proto "unmasker" `@acm` alongside our test

Such that the suite verifies the wip `maybe_raise_from_masking_exc()`
will raise from a `trio.Cancelled.__context__` since I can't think of
any reason a `Cancelled` should ever be raised in-place of
a non-`Cancelled` XD

Not sure what should be raised instead (or maybe just a `log.warning()`
emitted?) but this starts a draft for refinement at the least. Use the
new `@pytest.mark.parametrize` explicit tuple-of-params form with a
`pytest.param` + `.mark.xfail()` for the default behaviour case.
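
One possible shape for such an unmasker, sketched synchronously. This is a guess at the idea and not the wip implementation: `FakeCancelled` is a hypothetical stand-in for `trio.Cancelled`, and re-raising the masked error is just one of the options the message leaves open:

```python
from contextlib import contextmanager


class FakeCancelled(BaseException):
    """Hypothetical stand-in for `trio.Cancelled`."""


@contextmanager
def maybe_raise_from_masking_exc(
    unmask_from: type[BaseException] = FakeCancelled,
):
    try:
        yield
    except unmask_from as exc:
        masked = exc.__context__
        # a cancel whose context is a non-cancel error likely
        # suppressed the real source exception by accident
        if masked is not None and not isinstance(masked, unmask_from):
            raise masked from exc
        raise
```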

e2b9c3e769 Add a `tests/test_root_infect_asyncio`

Might as well break apart the specific test set since there are some
(minor) subtleties and the orig test mod is already getting pretty big
XD

Includes both the new "independent"-event-loops test as well as the std
usage base case suite.

85c60095ba Raise "independent" task errors in an eg

The (rare) condition is heavily detailed in new comments in
the `cancel_trio()` callback but, more or less the idea here is to be
extra pedantic in raising an `ExceptionGroup` of errors from each task
(both `asyncio` and `trio`) whenever the 2 tasks raise "independently"
- in the sense that it's not obviously one side's task causing an error
(or cancellation) in the other. In this case we set the error for each
side on the `LinkedTaskChannel` (via new attrs described later).

As a synopsis, most of this work was refined out of supporting
`infected_aio=True` mode in the **root actor** and in particular as part
of getting that to work inside the `modden` daemon which at the time of
writing was still using the `i3ipc` lib and thus `asyncio`.

Impl deats,
- extend the `LinkedTaskChannel` field/API set (and type it),
  - `._trio_task: trio.Task` for test/user introspection.
- also "stage" some ideas for a more refined interface,
  - `.started()` to deliver the value yielded to the `trio.Task` parent.
   |_ also includes some todos for how to implement this design
      underneath.
  - `._aio_first: Any|None = None` to hold that value ^.
  - `.wait_aio_complete()` for syncing to the asyncio task.
- some detailed logging around "asyncio cancelled trio" case.
- Move `AsyncioCancelled` into this module.

Styling changes,
- generally more explicit var naming.
- some todos for getting modern and fancy with typing..

NB, Let it be known this commit msg was written on a friday with the
help of various "mr. white" solns.

a1d75625e4 Draft test-doc for "out-of-band" `asyncio.Task`..

Since there's no way to activate `greenback`'s portal in such cases, we
should at least have a test verifying our very loud error about the
inability to support this usage..

goodboy force-pushed aio_abandons from a1d75625e4 to b7aa72465d

2025-03-22 18:51:08 +00:00

goodboy referenced this pull request

2025-03-22 22:15:32 +00:00

Add `tractor.pause_from_sync()` using the amazing `greenback`! #1

goodboy changed title from ~~Hack `asyncio` to not abandon a guest-mode run?~~ to Hack `asyncio` to not abandon a guest-mode run, `.pause_from_sync()` support via `.to_asyncio`

2025-03-22 22:16:36 +00:00

goodboy requested review from guille 2025-03-22 22:16:51 +00:00

goodboy requested review from jc211 2025-03-22 22:16:51 +00:00

goodboy changed title from ~~Hack `asyncio` to not abandon a guest-mode run, `.pause_from_sync()` support via `.to_asyncio`~~ to Prevent `asyncio` from abandoning guest-runs, `.pause_from_sync()` support via `.to_asyncio`

2025-03-22 22:21:19 +00:00

goodboy referenced this issue from a commit

2025-03-23 00:31:59 +00:00

Save an MIA `breakpoint()`-restore test from prior!?

goodboy added 1 commit 2025-03-23 00:31:59 +00:00

e8111e40f9 Save an MIA `breakpoint()`-restore test from prior!?

It appears that during the reorg commit
a356233b47 this was intended to be moved
(presumably where i have here) to `test_tooling` but was somehow just
never pasted over XD

Good thing this was caught while going through the remaining TODO
bullets in #2 !!

Also includes fixed relative `.conftest` imports!

goodboy added 9 commits 2025-03-24 17:35:42 +00:00

816b82f9fe Tweak some test asserts to better `is` style

34e9e529d2 Unset `$PYTHON_COLORS` for test debugger suite..

Since obvi all our `pexpect` patterns aren't going to match with
a heck-ton of terminal color escape sequences in the output XD

af660c1019 Another `is` fix..

69fd46e1ce Add per-side graceful-exit/cancel excs-as-signals

Such that any combination of task terminations/exits can be explicitly
handled and "dual side independent" crash cases re-raised in egs.

The main error-or-exit impl changes include,

- use of new per-side "signaling exceptions":
  - TrioTaskExited|TrioCancelled for signalling aio.
  - AsyncioTaskExited|AsyncioCancelled for signalling trio.

- NOT overloading the `LinkedTaskChannel._trio/aio_err` fields for
  err-as-signal relay and instead add a new pair of
  `._trio/aio_to_raise` maybe-exc-attrs which allow each side's
  task to specify what it would want the other side to raise to signal
  its/a termination outcome:
  - `._trio_to_raise: AsyncioTaskExited|AsyncioCancelled` to signal,
    |_ the aio task having returned while the trio side was still reading
       from the `asyncio.Queue` or is just not `.done()`.
    |_ the aio task being self or trio-request cancelled where
       a `asyncio.CancelledError` is raised and caught but NOT relayed
       as is back to trio; instead signal a "more explicit" exc type.
  - `._aio_to_raise: TrioTaskExited|TrioCancelled` to signal,
    |_ the trio task having returned while the aio side was still reading
       from the mem chan and indicating that the trio side might not
       care any more about future streamed values (like the
       `Stop/EndOfChannel` equivs for ipc `Context`s).
    |_ when the trio task is cancelled we do
       a `asyncio.Future.set_exception(TrioTaskExited())` to indicate
       to the aio side verbosely that it should cancel due to the trio
       parent.
  - `_aio/trio_err` are now left to only capturing the **actual**
    per-side task excs for introspection / other side's handling logic.

- supporting "graceful exits" depending on API in use from
  `translate_aio_errors()` such that if either side exits but the other
  side isn't expected to consume the final `return`ed value, we just exit
  silently, which required:
  - adding a `suppress_graceful_exits: bool` flag.
  - adjusting the `maybe_raise_aio_side_err()` logic to use that flag
    and suppress only on certain combos of `._trio_to_raise/._trio_err`.
  - prefer to raise `._trio_to_raise` when the aio-side is the src and
    vice versa.

- filling out pedantic logging for cancellation cases indicating which
  side is the cause.

- add a `LinkedTaskChannel._aio_result` modelled after our
  `Context._result` and a similar `.wait_for_result()` interface which
  allows maybe accessing the aio task's final return value if desired
  when using the `open_channel_from()` API.

- rename `cancel_trio()` done handler -> `signal_trio_when_done()`

Also some fairly major test suite updates,
- add a `delay: int` producing fixture which delivers a much larger
  timeout whenever `debug_mode` is set so that the REPL can be used
  without a surrounding cancel firing.
- add a new `test_aio_exits_early_relays_AsyncioTaskExited` including
  a paired `exit_early: bool` flag to `push_from_aio_task()`.
- adjust `test_trio_closes_early_causes_aio_checkpoint_raise` to expect
  a `to_asyncio.TrioTaskExited`.

ecd61226d8 More `debug_mode` test support, better nursery var names

724c22d266 Be extra sure to re-raise EoCs from translator

That is whenever `trio.EndOfChannel` is raised (presumably from the
`._to_trio.receive()` call inside `LinkedTaskChannel.receive()`) we need
to be extra certain that we let it bubble upward transparently DESPITE
special exc-as-signal handling that is normally suppressed from the aio
side; REPEAT we want to ALWAYS bubble any `trio_err ==
trio.EndOfChannel` in the `finally:` handler of `translate_aio_errors()`
despite `chan._trio_to_raise == AsyncioTaskExited` such that the
caller's iterable machinery will operate as normal when the inter-task
stream is stopped (again, presumably by the aio side task terminating
the inter-task stream).

Main impl deats for this,
- in the EoC handler block ensure we assign both `chan._trio_err` and
  the local `trio_err` as well as continue to re-raise.
- add a case to the match block in the `finally:` handler which FOR SURE
  re-raises any `type(trio_err) is EndOfChannel`!

Additionally fix a bad bug,
- a ref bug where we were NOT using the
  `except BaseException as _trio_err` to assign to `chan._trio_err` (by
  accident was missing the leading `_`..)

Unrelated impl tweak,
- move all `maybe_raise_aio_side_err()` content back to inline with its
  parent func - makes it easier to use `tractor.pause()` mostly Bp
- go back to trying to use `aio_task.set_exception(aio_taskc)` for now
  even though i'm pretty sure we're going to move to a try-fute-first
  style helper for this in the future.

Adjust some tests to match/mk-them-green,
- break from `aio_echo_server()` recv loop on
  `to_asyncio.TrioTaskExited` much like how you'd expect to (implicitly
  with a `for`) with a `trio.EndOfChannel`.
- toss in a masked `value is None` pause point i needed for debugging
  inf looping caused by not re-raising EoCs per the main patch
  description.
- add a debug-mode sized delay to root-infected test.

beb7097ab4 Add a mark to `pytest.xfail()` questionably conc py stuff (ur mam `.xfail()`s bish!)

b6d800954a Repair/update `stackscope` test

Seems that on 3.13 it's not showing our script code in the output now?
Gotta get an example for @oremanj to see what's up but really it'd be
nice to just custom format stuff above `trio`'s runtime by def..

Anyway, update the `.devx._stackscope`,
- log formatting to be a little more "sclangy" lookin.
- change the per-actor "delimiter" lines style.
- report the `signal.getsignal(SIGINT)` which i needed in the
  `sync_bp.py` with ctl-c causing a hang..
- mask the `_tree_dumped` duplicator log report as well as the "dumped
  fine" one.
- add an example `pkill --signal SIGUSR1` cmdline.

Tweak the test to cope with,
- not showing our script lines now.. which i've commented in the
  `assert_before()` patts..
- to expect the newly formatted delimiter (ascii) lines to separate the
  root vs. hanger sub-actor sections.

e646ce5c0d Mask ctlc borked REPL tests

Namely the `tractor.pause_from_sync()` examples using both bg threads
and `asyncio` which seem to go into bad states where SIGINT is ignored..

Deats,
- add `maybe_expect_timeout()` cm to ensure the EOF hangs get
  `.xfail()`ed instead.
- `@pytest.mark.ctlcs_bish` `test_pause_from_sync` and don't expect the
  greenback prompt msg.
- also mark `test_sync_pause_from_aio_task`.

goodboy referenced this issue from a commit

2025-03-24 20:03:52 +00:00

Save an MIA `breakpoint()`-restore test from prior!?

goodboy force-pushed aio_abandons from e646ce5c0d to 15f99c313e

2025-03-24 20:03:52 +00:00

goodboy added 2 commits 2025-03-25 16:08:12 +00:00

90287b9875 Fix an `aio_err` ref bug

b1018a13fe Continue supporting py3.11+

Apparently the only thing needing a guard was use of
`asyncio.Queue.shutdown()` and the paired `QueueShutDown` exception?

Cool.
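
A guard like the one described above could look roughly like the following. This is only a sketch under assumptions, not the code from the commit: `asyncio.Queue.shutdown()` and `asyncio.QueueShutDown` were added in Python 3.13, and the names `HAS_QUEUE_SHUTDOWN`, `maybe_shutdown()` and `drain()` are hypothetical.

```python
import asyncio

# `asyncio.QueueShutDown` and `asyncio.Queue.shutdown()` only exist on
# py3.13+, so on older interpreters define a placeholder exc type that
# nothing ever raises; `except` clauses can then stay version agnostic.
try:
    from asyncio import QueueShutDown
    HAS_QUEUE_SHUTDOWN = True
except ImportError:
    HAS_QUEUE_SHUTDOWN = False

    class QueueShutDown(Exception):
        '''Placeholder (hypothetical) for py < 3.13; never raised.'''


def maybe_shutdown(queue: asyncio.Queue) -> bool:
    '''Shut the queue down when supported; report whether we did.'''
    if HAS_QUEUE_SHUTDOWN:
        queue.shutdown()
        return True
    return False


async def drain(queue: asyncio.Queue) -> list:
    '''Collect items until the queue is empty or reports shutdown.'''
    items: list = []
    while True:
        try:
            items.append(queue.get_nowait())
        except (asyncio.QueueEmpty, QueueShutDown):
            break
    return items


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for i in range(3):
        queue.put_nowait(i)
    maybe_shutdown(queue)
    return await drain(queue)


result = asyncio.run(main())
```

Defining the placeholder exception keeps the consuming code identical across versions, at the cost of a name that shadows nothing real on the older interpreters.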

goodboy referenced this pull request

2025-03-25 16:28:34 +00:00

Python 3.13 support #18

goodboy added 3 commits 2025-03-25 20:00:03 +00:00

a66caa2397 Drop `asyncio`-canc error from `._exceptions`

47ec7e7a49 Add equiv of `AsyncioCancelled` for aio side

Such that a `TrioCancelled` is raised in the aio task via
`.set_exception()` to explicitly indicate and allow that task to handle
a taskc request from the parent `trio.Task`.

010d75248e Comment-tag pause points in `asyncio_bp.py`

Thought i already did this but, obvi needed these to make the expect
matches pass in our test.

goodboy commented

2025-03-27 02:26:19 +00:00

Poster

I mean anyone wanting to *click approve* (since they already built a buncha sh#! on top of this ;) would allow us to conduct the (normal) formal [boom, rhyme time] protocol to all things "community" and "foss".. @guille

goodboy changed title from ~~Prevent `asyncio` from abandoning guest-runs, `.pause_from_sync()` support via `.to_asyncio`~~ to Prevent `asyncio` from abandoning guest-runs, `.pause_from_sync()` support via `.to_asyncio`

2025-03-27 17:15:45 +00:00

goodboy changed target branch from runtime_to_msgspec to main

2025-03-27 17:15:45 +00:00

goodboy referenced this issue from a commit

2025-03-27 17:24:38 +00:00

Save an MIA `breakpoint()`-restore test from prior!?

goodboy force-pushed aio_abandons from 010d75248e to c91373148a

2025-03-27 17:24:38 +00:00

guille approved these changes 2025-03-27 17:26:33 +00:00

goodboy reviewed 2025-03-27 17:27:02 +00:00

tractor/_exceptions.py

													
				@ -82,6 +82,48 @@ class InternalError(RuntimeError):

				    '''

				class AsyncioCancelled(Exception):

goodboy commented

2025-03-27 17:27:02 +00:00

Poster

This is the main error-translation-semantics that changed, more or less being more pedantic about which side errored/cancelled/exited-gracefully and whether it was independent of the other side.

goodboy reviewed 2025-03-27 17:30:16 +00:00

tractor/to_asyncio.py

													
				@ -40,0 +39,4 @@

				from tractor._exceptions import (

				    InternalError,

				    is_multi_cancelled,

				    TrioTaskExited,

goodboy commented

2025-03-27 17:30:16 +00:00

Poster

For back-lookers (from the future) these new excs drove the improved error translation semantics throughout the cancel and exit handling machinery. This file's diff consists of the bulk of the changes described by the PR description.

goodboy reviewed 2025-03-27 17:31:05 +00:00

tractor/to_asyncio.py

													
				@ -73,0 +162,4 @@

				    #         self._final_result_is_set()

				    #     )

				    async def wait_for_result(

goodboy commented

2025-03-27 17:31:05 +00:00

Poster

Replicating the same outcome waiting API as `Context`.

goodboy reviewed 2025-03-27 17:33:14 +00:00

tractor/to_asyncio.py

													
				@ -442,0 +987,4 @@

				                # TODO? factor the next 2 branches into a func like

				                # `try_terminate_aio_task()` and use it for the taskc

				                # case above as well?

				                fut: asyncio.Future|None = aio_task._fut_waiter

goodboy commented

2025-03-27 17:33:14 +00:00

Poster

This is one of the critical-yet-questionable changes; `asyncio.Task.cancel()` seems to never work reliably and can often cause full guest-run abandonment, so instead we take the approach of touching any internal `Future` first and hoping for the best (which seems to work in practise!).
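
To illustrate the "touch the internal `Future` first" approach from the comment above, here is a minimal stand-alone sketch. It is not the code from `to_asyncio.py`: `TrioCancelledSketch`, `request_cancel()` and `aio_worker()` are hypothetical names, and `_fut_waiter` is a private CPython attribute which may change between versions.

```python
import asyncio


class TrioCancelledSketch(Exception):
    '''Hypothetical stand-in for the exc relayed to the aio task.'''


def request_cancel(task: asyncio.Task, exc: BaseException) -> str:
    '''
    Prefer raising `exc` through the future the task is blocked on;
    only fall back to `Task.cancel()` when there is no such future.
    '''
    # NOTE: `_fut_waiter` is private to the `asyncio.Task` impl, so
    # look it up defensively.
    fut: asyncio.Future | None = getattr(task, '_fut_waiter', None)
    if fut is not None and not fut.done():
        fut.set_exception(exc)
        return 'future'

    task.cancel()
    return 'cancel'


async def aio_worker(blocker: asyncio.Future) -> str:
    try:
        await blocker
    except TrioCancelledSketch:
        # the task gets to see *why* it was interrupted and can
        # clean up explicitly.
        return 'handled'
    return 'finished'


async def main() -> tuple[str, str]:
    loop = asyncio.get_running_loop()
    blocker: asyncio.Future = loop.create_future()
    task = asyncio.create_task(aio_worker(blocker))
    await asyncio.sleep(0)  # let the worker start and block
    how = request_cancel(task, TrioCancelledSketch('parent exited'))
    return how, await task


how, outcome = asyncio.run(main())
print(how, outcome)
```

The upside of going through the awaited future is that the task receives a normal, catchable exception type instead of an `asyncio.CancelledError`; the downside is the reliance on a private attribute.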

goodboy reviewed 2025-03-27 17:34:09 +00:00

tractor/to_asyncio.py

													
				@ -516,0 +1218,4 @@

				                # a `Return`-msg for IPC ctxs)

				                aio_task: asyncio.Task = chan._aio_task

				                if not aio_task.done():

				                    fut: asyncio.Future|None = aio_task._fut_waiter

goodboy commented

2025-03-27 17:34:09 +00:00

Poster

Same as mentioned above; appears to be the best/most-reliable hack for the moment..

goodboy reviewed 2025-03-27 17:34:39 +00:00

tractor/to_asyncio.py

													
				@ -516,0 +1242,4 @@

				    '''

				def run_trio_task_in_future(

goodboy commented

2025-03-27 17:34:39 +00:00

Poster

Much thanks to @oremanj (from core `trio` team on GH) for this fn!

goodboy reviewed 2025-03-27 17:37:33 +00:00

examples/debugging/asyncio_bp.py

													
				@ -1,8 +1,16 @@

				'''

goodboy commented

2025-03-27 17:37:33 +00:00

Poster

This is a pretty important step forward for the debugger REPL tooling since now you can definitely get multi-actor safe pausing from infected-`asyncio` actors including crash handling B)

goodboy merged commit 222b90940c into main

2025-03-27 17:37:57 +00:00

goodboy referenced this issue from a commit

2025-03-27 17:37:57 +00:00

Merge pull request 'Prevent `asyncio` from abandoning guest-runs, `.pause_from_sync()` support via `.to_asyncio`' (#2) from aio_abandons into main

goodboy referenced this pull request

2025-03-27 18:00:10 +00:00

Python 3.13 support #18

Reference: goodboy/tractor#2
