forked from goodboy/tractor
Commit Graph

1234 Commits (7dd72e042d8f7811a22a264457c6b0accfefa863)

Author SHA1 Message Date
Tyler Goodlet 7dd72e042d Show full KBI trace for help with CI hangs 2022-07-27 11:38:13 -04:00
Tyler Goodlet ee8ead4d7a Move pydantic-click hang example to new dir, skip in test suite 2022-07-27 11:38:13 -04:00
Tyler Goodlet 8a70a52ff9 Add spaces before values in log msg 2022-07-27 11:38:06 -04:00
Tyler Goodlet 70e4458fb0 Add runtime level msg around channel draining 2022-07-27 11:38:06 -04:00
Tyler Goodlet 70d1c98c10 Always undo SIGINT overrides, cancel detached children
Ensure that even when `pdb` resumption methods are called during a crash
where `trio`'s runtime has already terminated (eg. `Event.set()` will
raise) we always revert our sigint handler to the original. Further
inside the handler if we hit a case where a child is in debug and
(thinks it) has the global pdb lock, if it has no IPC connection to
a parent, simply presume tty sync-coordination is now lost and cancel
the child immediately.
2022-07-27 11:38:06 -04:00
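
The "always revert our sigint handler" behaviour described in the commit above can be illustrated with the stdlib alone. This is a minimal sketch, not the project's code: `sigint_override()`, `ignore_sigint()` and `resume_after_runtime_exit()` are hypothetical names, and the `RuntimeError` merely stands in for a resumption call failing after the runtime has terminated.

```python
import signal
from contextlib import contextmanager


@contextmanager
def sigint_override(handler):
    """Install `handler` for SIGINT and always restore the previous one."""
    # signal.signal() returns the handler that was installed before.
    original = signal.signal(signal.SIGINT, handler)
    try:
        yield original
    finally:
        # Runs even if the body raises, so the override never leaks.
        signal.signal(signal.SIGINT, original)


def ignore_sigint(signum, frame):
    """Hypothetical stand-in for a debugger-aware SIGINT handler."""


def resume_after_runtime_exit():
    # Stand-in for a resumption call that fails because the async
    # runtime is already gone (assumption made for illustration).
    raise RuntimeError("runtime already terminated")


before = signal.getsignal(signal.SIGINT)
try:
    with sigint_override(ignore_sigint):
        resume_after_runtime_exit()
except RuntimeError:
    pass
after = signal.getsignal(signal.SIGINT)
```

Putting the restoration in a `finally:` is the point of the sketch: the original handler comes back whether the body finishes or crashes. Note that `signal.signal()` may only be called from the main thread.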
Tyler Goodlet dade6a4b43 Readme formatting tweaks 2022-07-27 11:38:06 -04:00
Tyler Goodlet 7b4049198a Tolerate double `.remove()`s of stream on portal teardowns 2022-07-27 11:37:59 -04:00
Tyler Goodlet 931b20cf35 Always propagate SIGINT when no locking peer found
A hopefully significant fix here is to always avoid suppressing a SIGINT
when the root actor cannot detect an active IPC connection (via
a connected channel) to the supposed debug lock holding actor. In that
case it is most likely that the actor has either terminated or has lost
its connection for debugger control and there is no way the root can
verify the lock is in use; thus we choose to allow KBI cancellation.

Drop the (by comment) `try`-`finally` block in
`_hijack_stdin_for_child()` around the `_acquire_debug_lock()` call
since all that logic should now be handled internal to that locking
manager. Try to catch a weird error around the `.do_longlist()` method
call that seems to sometimes break on py3.10 and latest `pdbpp`.
2022-07-27 11:37:59 -04:00
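
The rule in the commit above (only suppress SIGINT while the root can still see a connected channel to the supposed lock holder) can be sketched as a small predicate. The `Channel` dataclass and `should_suppress_sigint()` are hypothetical stand-ins for illustration and are not the library's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Channel:
    """Hypothetical stand-in for an IPC channel to a peer actor."""
    peer: str
    is_connected: bool


def should_suppress_sigint(
    lock_holder: Optional[str],
    channels: List[Channel],
) -> bool:
    """Return True only if the lock holder is verifiably still reachable."""
    if lock_holder is None:
        # Nobody claims the debug lock: let the KBI through.
        return False
    # Suppress only when a live channel to the lock holder exists;
    # otherwise the holder may be dead or disconnected.
    return any(
        ch.peer == lock_holder and ch.is_connected
        for ch in channels
    )
```

For example, `should_suppress_sigint("child", [Channel("child", False)])` evaluates to `False`, so a ctrl-c would cancel rather than be swallowed.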
Tyler Goodlet 11c1582c39 Always call pdb hook even if tty locking fails 2022-07-27 11:37:59 -04:00
Tyler Goodlet d1f347c21f Log cancels with appropriate level 2022-07-27 11:37:59 -04:00
Tyler Goodlet 2800100b21 Just warn on IPC breaks 2022-07-27 11:37:59 -04:00
Tyler Goodlet 1fd4588243 Only warn on `trio.BrokenResourceError`s from `_invoke()` 2022-07-27 11:37:59 -04:00
Tyler Goodlet 7ecc48b3c9 Make example a subpkg for `python -m <mod>` testing 2022-07-27 11:37:59 -04:00
Tyler Goodlet ebefd6e6b4 Add example that triggers bug #302 2022-07-27 11:37:59 -04:00
Tyler Goodlet 67607a4d37 Add back in async gen loop 2022-07-27 11:37:59 -04:00
Tyler Goodlet df16a0c315 Pre-declare disconnected flag 2022-07-27 11:37:59 -04:00
Tyler Goodlet 1163ec5034 Avoid attr error XD 2022-07-27 11:37:59 -04:00
Tyler Goodlet e2169f227d Type annot updates 2022-07-27 11:37:59 -04:00
Tyler Goodlet 4e06b10502 Drop unneeded backframe traceback hide annotation 2022-07-27 11:37:59 -04:00
Tyler Goodlet a5f543eb22 Make `Actor._process_messages()` report disconnects
The method now returns a `bool` which flags whether the transport died
to the caller and allows for reporting a disconnect in the
channel-transport handler task. This is something a user will normally
want to know about on the caller side especially after seeing
a traceback from the peer (if in tree) on console.
2022-07-27 11:37:59 -04:00
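
A message loop that reports transport death to its caller via a `bool`, as the commit above describes, might look roughly like this. The sketch uses `asyncio` only so that it runs with the stdlib (the project itself is built on `trio`); `FakeChannel`, `process_messages()` and `handle_channel()` are hypothetical names.

```python
import asyncio


class FakeChannel:
    """Hypothetical transport: yields messages, then optionally breaks."""

    def __init__(self, msgs, break_at_end=False):
        self._msgs = list(msgs)
        self._break_at_end = break_at_end

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._msgs:
            return self._msgs.pop(0)
        if self._break_at_end:
            raise ConnectionResetError("peer went away")
        raise StopAsyncIteration


async def process_messages(chan, handled):
    """Return True if the transport died, False on a clean end of stream."""
    try:
        async for msg in chan:
            handled.append(msg)
    except ConnectionError:
        return True
    return False


async def handle_channel(chan):
    """Channel handler task: run the loop and report any disconnect."""
    handled = []
    disconnected = await process_messages(chan, handled)
    report = "peer disconnected" if disconnected else "stream ended cleanly"
    return handled, report


clean = asyncio.run(handle_channel(FakeChannel(["a", "b"])))
broken = asyncio.run(handle_channel(FakeChannel(["a"], break_at_end=True)))
```

Returning the flag, rather than raising from the loop, leaves it to the handler task to decide how loudly to report the disconnect.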
Tyler Goodlet d280c26f15 Only cancel/get-result from a ctx if transport is up
There's no point in sending a cancel message to the remote linked task
and especially no reason to block waiting on a result from that task if
the transport layer is detected to be disconnected. We expect that the
transport shouldn't go down at the layer of the message loop
(reconnection logic should be handled in the transport layer itself) so
if we detect the channel is not connected we don't bother requesting
cancels nor waiting on a final result message.

Why?

- if the connection goes down in error the caller side won't have a way
  to know "how long" it should block to wait for a cancel ack or result
  and causes a potential hang that may require an additional ctrl-c from
  the user especially if using the debugger or if the traceback is not
  seen on console.
- obviously there's no point in waiting for messages when there's no
  transport to deliver them XD

Further, add more detailed cancel logging that includes the task and
actor ids.
2022-07-27 11:37:59 -04:00
Tyler Goodlet 74819821a8 Drop high log level in ctx example 2022-07-27 11:37:59 -04:00
Tyler Goodlet 5dd8adcfb8 Typing fixes, simplify `_set_trace()` 2022-07-27 11:37:59 -04:00
Tyler Goodlet 95ccb27004 Add notes around py3.10 stdlib bug from `pdb++`
There's a bug that's triggered in the stdlib without latest `pdb++`
installed; add a note for that.

Further inside `wait_for_parent_stdin_hijack()` don't `.started()` until
the interactor stream has been opened to avoid races when debugging this
`._debug.py` module (at the least) since we usually don't want the
spawning (parent) task to resume until we know for sure the tty lock has
been acquired. Also, drop the random checkpoint we had inside
`_breakpoint()`, not sure it was actually adding anything useful since
we're (mostly) carefully shielded throughout this func.
2022-07-27 11:37:59 -04:00
Tyler Goodlet 4e6d00918b Add and use a pdb instance factory 2022-07-27 11:37:59 -04:00
Tyler Goodlet a0016bcdc8 A `.open_context()` example that causes a hang!
Finally! I think this may be the root issue we've been seeing in
production in a client project.

No idea yet why this is happening but the fault-causing sequence seems
to be:
- `.open_context()` in a child actor
- enter the debugger via `tractor.breakpoint()`
- continue from that entry via `c` command in REPL
- raise an error just after inside the context task's body

Looking at logging it appears as though the child thinks it has the tty
but no input is accepted on the REPL and a further `ctrl-c` results in
some teardown but also a further hang where both parent and child become
unresponsive..
2022-07-27 11:37:59 -04:00
Tyler Goodlet 9e0bb4f90c Drop all the `@cm.__exit__()` override attempts..
None of it worked (you still will see `.__exit__()` frames on debugger
entry - you'd think this would have been solved by now but, shrug) so
instead wrap the debugger entry-point in a `try:` and put the SIGINT
handler restoration inside `MultiActorPdb` teardown hooks.

This seems to restore the prior UX while also giving the desired
SIGINT override handler behaviour.
2022-07-27 11:37:59 -04:00
Tyler Goodlet a617631400 Try overriding `_GeneratorContextManager.__exit__()`; didn't work..
Using either of `@pdb.hideframe` or `__tracebackhide__` on stdlib
methods doesn't seem to work either.. This all seems to have something
to do with async generator usage, I think?
2022-07-27 11:37:59 -04:00
Tyler Goodlet 4ea2bc5932 Fix example name typo 2022-07-27 11:37:59 -04:00
Tyler Goodlet a8a2110458 Handle a context cancel? Might be a noop 2022-07-27 11:37:59 -04:00
Tyler Goodlet aad9d7e947 Add a pre-started breakpoint example 2022-07-27 11:37:59 -04:00
Tyler Goodlet aee00e6741 Make `mypy` happy 2022-07-27 11:37:59 -04:00
Tyler Goodlet 688e0b9ebe Refine the handler for child vs. root cases
This gets very close to avoiding any possible hangs to do with tty
locking and SIGINT handling minus a special case that will be detailed
below.

Summary of implementation changes:

- convert `_mk_pdb()` -> `with _open_pdb() as pdb:` which implicitly
  handles the `bdb.BdbQuit` case such that debugger teardown hooks are
  always called.
- rename the handler to `shield_sigint()` and handle a variety of new
  cases:
  * the root is in debug but hasn't been cancelled -> call
    `Actor.cancel_soon()`
  * the root is in debug but *has* been cancelled (`Actor.cancel_soon()`
    already called) -> raise KBI
  * a child is in debug *and* has a task locking the debugger -> ignore
    SIGINT in child *and* the root actor.
- if the debugger instance is provided to the handler at acquire time,
  on SIGINT handling completion re-print the last pdb++ REPL output so
  that the user realizes they are still actively in debug.
- ignore the unlock case where a race condition of "no task" holding the
  lock causes the `RuntimeError` normally associated with the "wrong
  task" doing so (not sure if this is a `trio` bug?).
- change debug logs to runtime level.

Unhandled case(s):

- a child is maybe in debug mode but does not itself have any task using
  the debugger.
    * ToDo: we need a way to decide what to do with "intermediate" child
      actors who themselves are not in `debug_mode=True` but have
      children who *are*, such that a SIGINT won't cause cancellation
      of that child-as-parent-of-another-child **iff** any of its
      children are in debug mode.
2022-07-27 11:37:59 -04:00
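
The handled cases listed in the commit above can be read as a small decision table. The sketch below encodes them as a pure function; `SigintAction` and `decide_sigint_action()` are hypothetical names, and both the `DEFAULT` fall-through and the order in which the checks run are assumptions, since the commit message leaves the remaining case explicitly unhandled.

```python
from enum import Enum


class SigintAction(Enum):
    IGNORE = "ignore"            # swallow the SIGINT
    CANCEL_SOON = "cancel_soon"  # request actor cancellation
    RAISE_KBI = "raise_kbi"      # raise KeyboardInterrupt
    DEFAULT = "default"          # assumed fall-through, not from the source


def decide_sigint_action(
    *,
    root_in_debug=False,
    cancel_called=False,
    child_holds_lock=False,
):
    """Map the debugger state onto one of the handled SIGINT cases."""
    if child_holds_lock:
        # A child task holds the debugger lock: ignore in child and root.
        return SigintAction.IGNORE
    if root_in_debug:
        if cancel_called:
            # A cancel was already requested once: escalate.
            return SigintAction.RAISE_KBI
        return SigintAction.CANCEL_SOON
    return SigintAction.DEFAULT
```

Keeping the decision separate from the signal handler itself makes each listed case checkable without sending real signals.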
Tyler Goodlet 1e789ecad2 (facepalm) Reraise `BdbQuit` and discard ownerless lock releases 2022-07-27 11:37:59 -04:00
Tyler Goodlet 0503142332 Add WIP while-debugger-active SIGINT ignore handler 2022-07-27 11:37:59 -04:00
goodboy 4902e184e9
Merge pull request #318 from goodboy/aio_error_propagation
Add context test that opens an inter-task-channel that errors
2022-07-15 12:42:19 -04:00
Tyler Goodlet 05790a20c1 Slight lint fixes 2022-07-15 11:18:48 -04:00
Tyler Goodlet 565c603300 Add nooz 2022-07-15 11:17:57 -04:00
Tyler Goodlet f0d78e1a6e Use local task ref, fixes `mypy` 2022-07-15 10:39:49 -04:00
Tyler Goodlet ce01f6b21c Increase timeout for CI/windows 2022-07-14 20:44:10 -04:00
Tyler Goodlet 0906559ed9 Drop manual stack construction, fix attr typo 2022-07-14 20:43:17 -04:00
Tyler Goodlet 38d03858d7 Fix `asyncio`-task-sync and error propagation
This fixes a previously undetected bug where if an
`.open_channel_from()` spawned task errored the error would not be
propagated to the `trio` side and instead would fail silently with
a console log error. What was most odd is that it only seems easy to
trigger when you put a slight task sleep before the error is raised
(:eyeroll:). This patch adds a few things to address this and just in
general improve inter-task lifetime syncing:

- add `LinkedTaskChannel._trio_exited: bool`, a flag set from the `trio`
  side when the channel block exits.
- add a `wait_on_aio_task: bool` flag to `translate_aio_errors` which
  toggles whether to wait on the `asyncio` task termination event on exit.
- cancel the `asyncio` task if the trio side has ended, when
  `._trio_exited == True`.
- always close the `trio` mem channel when the task exits such that
  the `asyncio` side can error on any next `.send()` call.
2022-07-14 16:35:41 -04:00
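
The failure mode fixed in the commit above, a spawned task whose error only surfaces as a console log, and the general shape of the remedy can be shown with `asyncio` alone. This is an illustrative sketch under that simplification: the real fix spans the `trio` and `asyncio` sides, and `aio_worker()` and `run_linked()` are hypothetical names.

```python
import asyncio


async def aio_worker(delay):
    # The short sleep mirrors the "slight task sleep before the error"
    # that the commit message says made the bug easy to trigger.
    await asyncio.sleep(delay)
    raise ValueError("boom in the asyncio task")


async def run_linked(coro):
    """Run `coro` as a task and re-raise its error in the caller."""
    done = asyncio.Event()
    outcome = {}

    def on_done(task):
        # Record the error instead of leaving it for a console log only.
        if not task.cancelled():
            outcome["error"] = task.exception()
        done.set()

    task = asyncio.ensure_future(coro)
    task.add_done_callback(on_done)
    try:
        # Wait for the task termination event before leaving the block.
        await done.wait()
    finally:
        # If the caller side ends first, cancel the linked task.
        if not task.done():
            task.cancel()
    if outcome.get("error") is not None:
        raise outcome["error"]
    return task.result()


try:
    asyncio.run(run_linked(aio_worker(0.01)))
    propagated = None
except ValueError as err:
    propagated = str(err)
```

The two halves correspond loosely to the bullets above: waiting on the termination event before exiting, and cancelling the linked task when the caller side has already ended.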
Tyler Goodlet 98de2fab31 Add context test that opens an inter-task-channel that errors 2022-07-14 16:13:12 -04:00
goodboy 80121ed211
Merge pull request #317 from goodboy/drop_msgpack
Drop `msgpack`
2022-07-12 13:31:45 -04:00
Tyler Goodlet 41983edc43 Use `str` | `bytes` union for typing msg dump 2022-07-12 11:59:11 -04:00
Tyler Goodlet 5168700fbf Tolerate non-decode-able bytes 2022-07-12 11:55:55 -04:00
Tyler Goodlet 673c4a8c66 Decode bytes prior to log msg 2022-07-12 11:55:55 -04:00
Tyler Goodlet 932b841176 Allow up to 4 `msgspec` decode failures 2022-07-12 11:55:55 -04:00
Tyler Goodlet f594f1bdda Handle a connection reset on `msgspec` transport 2022-07-12 11:55:55 -04:00
Tyler Goodlet 53e3648eca Readme bump 2022-07-12 11:52:42 -04:00