If `stackscope` is importable and debug_mode is enabled then we by
default call and report `.devx.enable_stack_on_sig()` is set B)
This makes debugging unexpected (SIGINT ignoring) hangs a cinch!
We're passing a `extra_frames_up_when_async=2` now (from prior attempt
to hide `CancelScope.__exit__()` when `shield=True`) and thus both
`debug_func`s must accept it 🤦
On the brighter side found out that the `TypeError` from the call-sig
mismatch was actually being swallowed entirely so add some
`.exception()` msgs for such cases to at least alert the dev they broke
stuff XD
It's been on the todo for a while and I've given up trying to properly
hide the `trio.CancelScope.__exit__()` frame for now instead opting to
just `log.pdb()` a big apology XD
Users can obvi still just not use the flag and wrap `tractor.pause()` in
their own cs block if they want to avoid having to hit `'up'` in the pdb
REPL if needed in a cancelled task-scope.
Impl deatz:
- factor orig `.pause()` impl into new `._pause()` so that we can more tersely
wrap the original content depending on `shield: bool` input; only open
the cancel-scope when shield is set to avoid aforemented extra strack
frame annoyance.
- pass through `shield` to underlying `_pause` and `debug_func()` so we
can actually know when so log our apology.
- add a buncha notes to new `.pause()` wrapper regarding the inability
to hide the cancel-scope `.__exit__()`, inluding that overriding the
code in `trio._core._run.CancelScope` doesn't seem to solve the issue
either..
Unrelated `maybe_wait_for_debugger()` tweaks:
- don't read `Lock.global_actor_in_debug` more then needed, rename local
read var to `in_debug` (since it can also hold the root actor uid, not
just sub-actors).
- shield the `await debug_complete.wait()` since ideally we avoid the
root cancellation child-actors in debug even when the root calls this
func in a cancelled scope.
Since this was changed as part of overall project wide logging format
updates, and i ended up changing the both the crash and pause `.pdb()`
msgs to include some multi-line-ascii-"stuff", might as well make the
pre-prompt checks in the test suite more flexible to match.
As such, this exposes 2 new constants inside the `.devx._debug` mod:
- `._pause_msg: str` for the pre `tractor.pause()` header emitted via
`log.pdb()` and,
- `._crash_msg: str` for the pre `._post_mortem()` equiv when handling
errors in debug mode.
Adjust the test suite to use these values and thus make us more capable
to absorb changes in the future as well:
- add a new `in_prompt_msg()` predicate, very similar to `assert_before()`
but minus `assert`s which takes in a `parts: list[str]` to match
in the pre-prompt stdout.
- delegate to `in_prompt_msg()` in `assert_before()` since it was mostly
duplicate minus `assert`.
- adjust all previous `<patt> in before` asserts to instead use
`in_prompt_msg()` with separated pre-prompt-header vs. actor-name
`parts`.
- use new `._pause/crash_msg` values in all such calls including any
`assert_before()` cases.
Allow callers to stick in a header to the `.pdb()` level emitted msg(s)
such that any "waiting status" content is only shown if the caller
actually get's blocked waiting for the debug lock; use it inside the
`._spawn` sub-process reaper call.
Also, return early if `Lock.global_actor_in_debug == None` and thus
only enter the poll loop when actually needed, consequently raise
if we fall through the loop without acquisition.
When entered by the root actor avoid excessive polling cycles by,
- blocking on the `Lock.no_remote_has_tty: trio.Event` and breaking
*immediately* when set (though we should really also lock
it from the root right?) to avoid extra loops..
- shielding the `await trio.sleep(poll_delay)` call to avoid any local
cancellation causing the (presumably root-actor task) caller to move
on (possibly to cancel its children) and instead to continue
poll-blocking until the lock is actually released by its user.
- `break` the poll loop immediately if no remote locker is detected.
- use `.pdb()` level for reporting lock state changes.
Also add a #TODO to handle calls by non-root actors as it pertains to
Can be optionally enabled via a new `enable_stack_on_sig()` which will
swap in the SIGUSR1 handler. Much thanks to @oremanj for writing this
amazing project, it's thus far helped me fix some very subtle hangs
inside our new IPC-context cancellation machinery that would have
otherwise taken much more manual pdb-ing and hair pulling XD
Full credit for `dump_task_tree()` goes to the original project author
with some minor tweaks as was handed to me via the trio-general matrix
room B)
Slight changes from orig version:
- use a `log.pdb()` emission to pprint to console
- toss in an ex sh CLI cmd to trigger the dump from another terminal
using `kill` + `pgrep`.
Implement it like you'd expect using simply a wrapping
`trio.CancelScope` which is itself shielded by the input `shield: bool`
B)
There's seemingly still some issues with the frame selection when the
REPL engages and not sure how to resolve it yet but at least this does
indeed work for practical purposes. Still needs a test obviously!
Starting of with just a `typer` (and thus transitively `click`)
`typer.Typer.callback` hook which allows passthrough of the `--ll
<loglevel: str>` and `--pdb <debug_mode: bool>` flags for use when
building CLIs that use the runtime Bo
Still needs lotsa refinement and obviously better docs but, the doc
string for `load_runtime_vars()` shows how to use the underlying
`.devx._debug.open_crash_handler()` via a wrapper that can be passed the
`--pdb` flag and then enable debug mode throughout the entire actor
system.
Where `.devx` is "developer experience", a hopefully broad enough subpkg
name for all the slick stuff planned to augment working on the actor
runtime 💥
Move the `._debug` module into the new subpkg and adjust rest of core
code base to reflect import path change. Also add a new
`.devx._debug.open_crash_handler()` manager for wrapping any sync code
outside a `trio.run()` which is handy for eventual CLI addons for
popular frameworks like `click`/`typer`.
For whatever reason pdb(p), and in general, will show the frame of the
*next* python instruction/LOC on initial entry (at least using
`.set_trace()`), as such remove the `try/finally` block in the sync
code entrypoint `.pause_from_sync()`, and also since doesn't seem like
we really need it anyway.
Further, and to this end:
- enable hidden frames support in our default config.
- fix/drop/mask all the frame ref-ing/mangling we had prior since it's no
longer needed as well as manual `Lock` releasing which seems to work
already by having the `greenback` spawned task do it's normal thing?
- move to no `Union` type annots.
- hide all frames that can add "this is the runtime confusion" to
traces.
This works now for supporting a new `tractor.pause_from_sync()`
`tractor`-aware-replacement for `Pdb.set_trace()` from sync functions
which are also scheduled from our runtime. Uses `greenback` to do all
the magic of scheduling the bg `tractor._debug._pause()` task and
engaging the normal TTY locking machinery triggered by `await
tractor.breakpoint()`
Further this starts some public API renaming, making a switch to
`tractor.pause()` from `.breakpoint()` which IMO much better expresses
the semantics of the runtime intervention required to suffice
multi-process "breakpointing"; it also is an alternate name for the same
in computer science more generally: https://en.wikipedia.org/wiki/Breakpoint
It also avoids using the same name as the `breakpoint()` built-in which
is important since there **is alot more going on** when you call our
equivalent API.
Deats of that:
- add deprecation warning for `tractor.breakpoint()`
- add `tractor.pause()` and a shorthand, easier-to-type, alias `.pp()`
for "pause-point" B)
- add `pause_from_sync()` as the new `breakpoint()`-from-sync-function
hack which does all the `greenback` stuff for the user.
Still TODO:
- figure out where in the runtime and when to call
`greenback.ensure_portal()`.
- fix the frame selection issue where
`trio._core._ki._ki_protection_decorator:wrapper` seems to be always
shown on REPL start as the selected frame..
Only found this by luck more or less (while working on something in
a client project) and it turns out we can actually get to (yet another)
hang state where SIGINT will be ignored by the root actor on teardown..
I've added all the necessary logic flags to reproduce. We obviously need
a follow up bug issue and a test suite to replicate!
It appears as though the following are required based on very light
tinkering:
- infected asyncio mode active
- debug mode active
- the `trio` context must breakpoint *before* `.started()`-ing
- the `asyncio` must **not** error
More or less just simplifies to not seeing the stream closure errors and
instead expecting KBIs from the simulated user who 'ctl-cs after hang'.
Toss in a little `stuff_hangin_ctlc()` to the script to wrap all that
and always check stream closure before sending the final KBI.
I swear long ago it used to operate this way but, I guess this finalizes
the design decision. It makes a lot more sense to *not* propagate any
`trio.EndOfChannel` raised from a `Context.open_stream() as stream:`
block when that EoC is due to graceful-explicit stream termination.
We use the EoC much like a `StopAsyncIteration` where the error
indicates termination of the stream due to either:
- reception of a stop IPC msg indicating the far end ended the stream
(gracecfully),
- closure of the underlying `Context._recv_chan` either by the runtime
or due to user code having called `MsgStream.aclose()`.
User code shouldn't expect to handle EoC outside the block since the
`@acm` having closed should indicate the exactly same lifetime state
(of said stream) ;)
Deats:
- add special EoC handler in `.open_stream()` which silently "absorbs"
the error only when the stream is already marked as closed (meaning
the EoC indeed corresponds to IPC closure) with an assert for now
ensuring the error is the same as set to `MsgStream._eoc`.
- in `MsgStream.receive()` break up the handlers for EoC and
`trio.ClosedResourceError` since the error instances are saved to
different variables and we **don't** want to rewrite the exception in
the eoc case (normally to mask `trio` internals in tbs) bc we need the
instance to be the exact one for doing checks inside
`.open_stream().__aexit__()` to absorb it.
Other surrounding "improvements":
- start using the new `Context.maybe_raise()` helper where it can easily
replace existing equivalent block-sections.
- use new `RemoteActorError.src_uid` as required.
- `trio_typing` is nearly obsolete since `trio >= 0.23`
- `exceptiongroup` is built-in to python 3.11
- `async_generator` primitives have lived in `contextlib` for quite
a while!
Since `._runtime` was getting pretty long (> 2k LOC) and much of the RPC
low-level machinery is fairly isolated to a handful of task-funcs, it
makes sense to re-org the RPC task scheduling and driving msg loop to
its own code space.
The move includes:
- `process_messages()` which is the main IPC business logic.
- `try_ship_error_to_remote()` helper, to box local errors for the wire.
- `_invoke()`, the core task scheduler entrypoing used in the msg loop.
- `_invoke_non_context()`, holds impls for non-`@context` task starts.
- `_errors_relayed_via_ipc()` which does all error catch-n-boxing for
wire-msg shipment using `try_ship_error_to_remote()` internally.
Also inside `._runtime` improve some `Actor` methods docs.
Finally, since normally you need the content from `._context.Context`
and surroundings in order to effectively grok `Portal.open_context()`
anyways, might as well move the impl to the ctx module as
`open_context_from_portal()` and just bind it on the `Portal` class def.
Associated/required tweaks:
- avoid circ import on `.devx` by only import
`.maybe_wait_for_debugger()` when debug mode is set.
- drop `async_generator` usage, not sure why this hadn't already been
changed to `contextlib`?
- use `@acm` alias throughout `._portal`
Previously i was trying to approach this using lots of
`__tracebackhide__`'s in various internal funcs but since it's not
exactly straight forward to do this inside core deps like `trio` and the
stdlib, it makes a bit more sense to optionally catch and re-raise
certain classes of errors from their originals using `raise from` syntax
as per:
https://docs.python.org/3/library/exceptions.html#exception-context
Deats:
- litter `._context` methods with `__tracebackhide__`/`hide_tb` which
were previously being shown but that don't need to be to application
code now that cancel semantics testing is finished up.
- i originally did the same but later commented it all out in `._ipc`
since error catch and re-raise instead in higher level layers
(above the transport) seems to be a much saner approach.
- add catch-n-reraise-from in `MsgStream.send()`/.`receive()` to avoid
seeing the depths of `trio` and/or our `._ipc` layers on comms errors.
Further this patch adds some refactoring to use the
same remote-error shipper routine from both the actor-core in the RPC
invoker:
- rename it as `try_ship_error_to_remote()` and call it from
`._invoke()` as well as it's prior usage.
- make it optionally accept `cid: str` a `remote_descr: str` and of
course a `hide_tb: bool`.
Other misc tweaks:
- add some todo notes around `Actor.load_modules()` debug hooking.
- tweak the zombie reaper log msg and timeout value ;)
Since importing from our top level `conftest.py` is not scaleable
or as "future forward thinking" in terms of:
- LoC-wise (it's only one file),
- prevents "external" (aka non-test) example scripts from importing
content easily,
- seemingly(?) can't be used via abs-import if using
a `[tool.pytest.ini_options]` in a `pyproject.toml` vs.
a `pytest.ini`, see:
https://docs.pytest.org/en/8.0.x/reference/customize.html#pyproject-toml)
=> Go back to having an internal "testing" pkg like `trio` (kinda) does.
Deats:
- move generic top level helpers into pkg-mod including the new
`expect_ctxc()` (which i needed in the advanced faults testing script.
- move `@tractor_test` into `._testing.pytest` sub-mod.
- adjust all the helper imports to be a `from tractor._testing import <..>`
Rework `test_ipc_channel_break_during_stream()` and backing script:
- make test(s) pull `debug_mode` from new fixture (which is now
controlled manually from `--tpdb` flag) and drop the previous
parametrized input.
- update logic in ^ test for "which-side-fails" cases to better match
recently updated/stricter cancel/failure semantics in terms of
`ClosedResouruceError` vs. `EndOfChannel` expectations.
- handle `ExceptionGroup`s with expected embedded errors in test.
- better pendantics around whether to expect a user simulated KBI.
- for `examples/advanced_faults/ipc_failure_during_stream.py` script:
- generalize ipc breakage in new `break_ipc()` with support for diff
internal `trio` methods and a #TODO for future disti frameworks
- only make one sub-actor task break and the other just stream.
- use new `._testing.expect_ctxc()` around ctx block.
- add a bit of exception handling with `print()`s around ctxc (unused
except if 'msg' break method is set) and eoc cases.
- don't break parent side ipc in loop any more then once
after first break, checked via flag var.
- add a `pre_close: bool` flag to control whether
`MsgStreama.aclose()` is called *before* any ipc breakage method.
Still TODO:
- drop `pytest.ini` and add the alt section to `pyproject.py`.
-> currently can't get `--rootdir=` opt to work.. not showing in
console header.
-> ^ also breaks on 'tests' `enable_modules` imports in subactors
during discovery tests?
With the seeming cause that some cases occasionally raise
`ExceptionGroup` instead of a (collapsed out) single error which, in
those cases at least try to check that `.exceptions` has the original
error.
Found exactly why trying this won't work when playing around with
opening workspaces in `modden` using a `Portal.open_context()` back to
the 'bigd' root actor: the RPC machinery only registers one entry in
`Actor._contexts` which will get overwritten by each task's side and
then experience race-based IPC msging errors (eg. rxing `{'started': _}`
on the callee side..). Instead make opening a ctx back to the self-actor
a runtime error describing it as an invalid op.
To match:
- add a new test `test_ctx_with_self_actor()` to the context semantics
suite.
- tried out adding a new `side: str` to the `Actor.get_context()` (and
callers) but ran into not being able to determine the value from in
`._push_result()` where it's needed to figure out which side to push
to.. So, just leaving the commented arg (passing) in the runtime core
for now in case we can come back to trying to make it work, tho i'm
thinking it's not the right hack anyway XD
Call it `allow_msg_keys: list[str] = ['yield']` and set it to accept
`['yield', 'return']` from the drain loop in `.aclose()`. Only pass the
last key error to `_raise_from_no_key_in_msg()` in the fall-through
case.
Somehow this seems to prevent all the intermittent test failures i was
seeing in local runs including when running the entire suite all in
sequence; i ain't complaining B)
Much like similar recent changes throughout the core, build out `msg:
str` depending on error cases and emit with `.cancel()` level as
appropes. Also mute (via level) some duplication in the cancel case
inside `_run_asyncio_task()` for console noise reduction.
Since apparently `str(KeyboardInterrupt()) == ''`? So instead add little
`<str> or repr(merr)` expressions throughout to avoid blank strings
rendering if various `repr()`/`.__str__()` outputs..
Not sure if it's really that useful other then for reporting errors from
`current_actor()` but at least it alerts `tractor` devs and/or users
when the runtime has already terminated vs. hasn't been started
yet/correctly.
Set the `._last_actor_terminated: tuple` in the root's final block which
allows testing for an already terminated tree which is the case where
`._state._current_actor == None` and the last is set.
Including mostly tweaking asserts on relayed `ContextCancelled`s and
the new pub ctx properties: `.outcome`, `.maybe_error`, etc. as it
pertains to graceful (absorbed) remote cancellation vs. loud ctxc cases
expected to be raised by any `Portal.cancel_actor()` style teardown.
Start checking a variety internals like `._remote/local_error`,
`._is_self_cancelled()`, `._is_final_result_set()`, `._cancel_msg`
where applicable.
Also factor out the new `expect_ctxc()` checker to our `conftest.py` for
use in other suites.