1211 Commits (af205c08f2bdca8957102be7cb5e6ccad205a2dc)

Author SHA1 Message Date
Tyler Goodlet af205c08f2 Show full KBI trace for help with CI hangs 2022-06-26 16:00:14 -04:00
Tyler Goodlet ab557fae21 Move pydantic-click hang example to new dir, skip in test suite 2022-06-26 15:34:33 -04:00
Tyler Goodlet 0b1c1ac568 Drop asyncio-cancelled-itself msg for now, report task names 2022-06-26 15:06:00 -04:00
Tyler Goodlet 9e37bb22e1 Add spaces before values in log msg 2022-06-26 15:06:00 -04:00
Tyler Goodlet 01dea6fe32 Add runtime level msg around channel draining 2022-06-26 15:06:00 -04:00
Tyler Goodlet 79faffd577 Always undo SIGINT overrides, cancel detached children
Ensure that even when `pdb` resumption methods are called during a crash
where `trio`'s runtime has already terminated (e.g. `Event.set()` will
raise) we always revert our SIGINT handler to the original. Further,
inside the handler, if a child is in debug and (thinks it) has the
global pdb lock but has no IPC connection to a parent, simply presume
tty sync-coordination is now lost and cancel the child immediately.
2022-06-26 15:06:00 -04:00
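The "always revert" guarantee described in the commit above can be sketched with a plain `try`/`finally` around the override. This is only an illustration using the stdlib `signal` module; `override_sigint()` and `demo_restores_handler()` are hypothetical names, not functions from this repo, and the code must run in the main thread.

```python
import signal
from contextlib import contextmanager


@contextmanager
def override_sigint(handler):
    """Install `handler` for SIGINT and always put the previous one back."""
    orig = signal.signal(signal.SIGINT, handler)
    try:
        yield orig
    finally:
        # Runs even when the body raises, e.g. because the async
        # runtime has already terminated underneath us.
        signal.signal(signal.SIGINT, orig)


def demo_restores_handler() -> bool:
    """Return True if the original handler survives a crash in the body."""
    before = signal.getsignal(signal.SIGINT)
    try:
        with override_sigint(lambda signum, frame: None):
            raise RuntimeError("runtime already terminated")
    except RuntimeError:
        pass
    return signal.getsignal(signal.SIGINT) is before
```

Putting the restore in `finally` means no resumption path has to remember to undo the override itself.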
Tyler Goodlet b3fd5da1be Allow up to 4 `msgspec` decode failures 2022-06-26 15:06:00 -04:00
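A bounded tolerance for decode failures might look roughly like the sketch below. The stdlib `json` module stands in for the real codec so the example is self-contained, and `decode_frames()` is a hypothetical helper. The commit does not say whether the limit applies to total or consecutive failures; this sketch assumes total.

```python
import json

MAX_DECODE_FAILURES = 4  # limit named in the commit subject


def decode_frames(frames, max_failures=MAX_DECODE_FAILURES):
    """Decode each frame, skipping bad ones until the limit is exceeded."""
    failures = 0
    decoded = []
    for raw in frames:
        try:
            decoded.append(json.loads(raw))
        except json.JSONDecodeError:
            failures += 1
            if failures > max_failures:
                # Too many bad frames: stop tolerating and re-raise.
                raise
    return decoded
```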
Tyler Goodlet 414c59cca6 Readme formatting tweaks 2022-06-26 15:06:00 -04:00
Tyler Goodlet 7710213604 Pin to `pdbpp` upstream master, 3.10 problem?
See issues:
- https://github.com/pdbpp/pdbpp/issues/480
- https://github.com/pdbpp/pdbpp/pull/482
2022-06-26 15:06:00 -04:00
Tyler Goodlet 71e779dca3 Tolerate double `.remove()`s of stream on portal teardowns 2022-06-26 15:05:59 -04:00
Tyler Goodlet 5457aa566c Always propagate SIGINT when no locking peer found
A hopefully significant fix here is to always avoid suppressing a SIGINT
when the root actor cannot detect an active IPC connection (via
a connected channel) to the supposed debug-lock-holding actor. In that
case it is most likely that the actor has either terminated or has lost
its connection for debugger control and there is no way the root can
verify the lock is in use; thus we choose to allow KBI cancellation.

Drop the (by comment) `try`-`finally` block in
`_hijack_stdin_for_child()` around the `_acquire_debug_lock()` call
since all that logic should now be handled internal to that locking
manager. Try to catch a weird error around the `.do_longlist()` method
call that seems to sometimes break on py3.10 and latest `pdbpp`.
2022-06-26 15:05:16 -04:00
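The rule from the commit above (only suppress SIGINT when the root can still see a live connection to the lock holder) can be sketched as a small predicate. `Channel`, `should_ignore_sigint()`, and the uid-keyed dict are hypothetical stand-ins for illustration, not the runtime's actual data structures.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Uid = Tuple[str, str]  # assumed shape of an actor id: (name, uuid)


@dataclass
class Channel:
    """Hypothetical stand-in for an IPC channel to a peer actor."""
    is_connected: bool


def should_ignore_sigint(
    lock_holder: Optional[Uid],
    channels: Dict[Uid, Channel],
) -> bool:
    """Suppress SIGINT only if the lock holder is verifiably reachable."""
    if lock_holder is None:
        return False  # nobody claims the debug lock
    chan = channels.get(lock_holder)
    # No channel, or a dead one: the lock can't be verified, so let the
    # KeyboardInterrupt through instead of risking a hang.
    return chan is not None and chan.is_connected
```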
Tyler Goodlet 20b902b300 Always call pdb hook even if tty locking fails 2022-04-17 13:15:33 -04:00
Tyler Goodlet 3955906654 Handle a connection reset on `msgspec` transport 2022-04-17 13:15:33 -04:00
Tyler Goodlet af1ee3f0a6 Log cancels with appropriate level 2022-04-17 13:15:33 -04:00
Tyler Goodlet f9f4bcf27c Just warn on IPC breaks 2022-04-17 13:15:33 -04:00
Tyler Goodlet 07ac6eb5d0 Only warn on `trio.BrokenResourceError`s from `_invoke()` 2022-04-17 13:15:33 -04:00
Tyler Goodlet 83ed2f6286 Make example a subpkg for `python -m <mod>` testing 2022-04-17 13:15:33 -04:00
Tyler Goodlet a40b168dfb Add example that triggers bug #302 2022-04-17 13:15:33 -04:00
Tyler Goodlet 1822d0b48b Add back in async gen loop 2022-04-17 13:15:33 -04:00
Tyler Goodlet dbc689d55a Pre-declare disconnected flag 2022-04-17 13:15:33 -04:00
Tyler Goodlet e49cccf666 Avoid attr error XD 2022-04-17 13:15:33 -04:00
Tyler Goodlet 5892d64d6e Type annot updates 2022-04-17 13:15:33 -04:00
Tyler Goodlet e83d158bfb Drop unneeded backframe traceback hide annotation 2022-04-17 13:15:33 -04:00
Tyler Goodlet e107257ac0 Make `Actor._process_messages()` report disconnects
The method now returns a `bool` which flags whether the transport died
to the caller and allows for reporting a disconnect in the
channel-transport handler task. This is something a user will normally
want to know about on the caller side especially after seeing
a traceback from the peer (if in tree) on console.
2022-04-17 13:15:33 -04:00
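The "return a `bool` flagging a dead transport" pattern from the commit above is sketched below. To stay dependency-free it uses stdlib `asyncio` rather than `trio`, and `TransportClosed`, `process_messages()`, and the two generators are hypothetical names made up for the example.

```python
import asyncio


class TransportClosed(Exception):
    """Hypothetical error raised when the underlying transport dies."""


async def process_messages(stream) -> bool:
    """Consume messages; return True if the transport died mid-loop."""
    try:
        async for _msg in stream:
            pass  # message dispatch would happen here
    except TransportClosed:
        return True
    return False


async def healthy_stream():
    for i in range(2):
        yield i


async def broken_stream():
    yield 0
    raise TransportClosed


def run_demo():
    """Return the disconnect flags for a healthy and a broken stream."""
    return (
        asyncio.run(process_messages(healthy_stream())),
        asyncio.run(process_messages(broken_stream())),
    )
```

The caller can then log or report the disconnect based on the flag instead of inferring it from a traceback.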
Tyler Goodlet 83367caf42 Only cancel/get-result from a ctx if transport is up
There's no point in sending a cancel message to the remote linked task
and especially no reason to block waiting on a result from that task if
the transport layer is detected to be disconnected. We expect that the
transport shouldn't go down at the layer of the message loop
(reconnection logic should be handled in the transport layer itself) so
if we detect the channel is not connected we don't bother requesting
cancels nor waiting on a final result message.

Why?

- if the connection goes down in error the caller side won't have a way
  to know "how long" it should block to wait for a cancel ack or result,
  which can cause a hang that may require an additional ctrl-c from
  the user, especially if using the debugger or if the traceback is not
  seen on console.
- obviously there's no point in waiting for messages when there's no
  transport to deliver them XD

Further, add some more detailed cancel logging detailing the task and
actor ids.
2022-04-17 13:15:33 -04:00
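A minimal synchronous sketch of the guard described in the commit above: check the channel first, and skip both the cancel request and the blocking wait when it is down. `FakeChannel`, `cancel_and_wait()`, and the message shape are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class FakeChannel:
    """Hypothetical channel that records what was sent over it."""
    is_connected: bool
    sent: List[dict] = field(default_factory=list)

    def send(self, msg: dict) -> None:
        if not self.is_connected:
            raise ConnectionError("transport is down")
        self.sent.append(msg)


def cancel_and_wait(
    chan: FakeChannel,
    cid: str,
    wait_for_result: Callable[[], Any],
) -> Optional[Any]:
    """Request a remote cancel and wait, but only over a live transport."""
    if not chan.is_connected:
        # Nothing can be delivered, so neither send nor block.
        return None
    chan.send({"cmd": "cancel", "cid": cid})
    return wait_for_result()
```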
Tyler Goodlet e6a520c944 Drop high log level in ctx example 2022-04-17 13:15:33 -04:00
Tyler Goodlet 5a83f373ef Typing fixes, simplify `_set_trace()` 2022-04-17 13:15:33 -04:00
Tyler Goodlet 228dfff91c Add notes around py3.10 stdlib bug from `pdb++`
There's a bug that's triggered in the stdlib without latest `pdb++`
installed; add a note for that.

Further inside `wait_for_parent_stdin_hijack()` don't `.started()` until
the interactor stream has been opened to avoid races when debugging this
`._debug.py` module (at the least) since we usually don't want the
spawning (parent) task to resume until we know for sure the tty lock has
been acquired. Also, drop the random checkpoint we had inside
`_breakpoint()`, not sure it was actually adding anything useful since
we're (mostly) carefully shielded throughout this func.
2022-04-17 13:15:33 -04:00
Tyler Goodlet 866f6f9d40 Add and use a pdb instance factory 2022-04-17 13:15:33 -04:00
Tyler Goodlet f9a8543811 A `.open_context()` example that causes a hang!
Finally! I think this may be the root issue we've been seeing in
production in a client project.

No idea yet why this is happening but the fault-causing sequence seems
to be:
- `.open_context()` in a child actor
- enter the debugger via `tractor.breakpoint()`
- continue from that entry via `c` command in REPL
- raise an error just after inside the context task's body

Looking at logging it appears as though the child thinks it has the tty
but no input is accepted on the REPL and a further `ctrl-c` results in
some teardown but also a further hang where both parent and child become
unresponsive..
2022-04-17 13:15:33 -04:00
Tyler Goodlet aa09a31d25 Drop all the `@cm.__exit__()` override attempts..
None of it worked (you still will see `.__exit__()` frames on debugger
entry - you'd think this would have been solved by now but, shrug) so
instead wrap the debugger entry-point in a `try:` and put the SIGINT
handler restoration inside `MultiActorPdb` teardown hooks.

This seems to restore the UX as it was prior while also giving the
desired SIGINT override handler behaviour.
2022-04-17 13:15:33 -04:00
Tyler Goodlet 9a1dadecff Try overriding `_GeneratorContextManager.__exit__()`; didn't work..
Using either of `@pdb.hideframe` or `__tracebackhide__` on stdlib
methods doesn't seem to work either.. This all seems to have something
to do with async generator usage I think ?
2022-04-17 13:15:33 -04:00
Tyler Goodlet 861884e075 Fix example name typo 2022-04-17 13:15:33 -04:00
Tyler Goodlet e204f858ac Handle a context cancel? Might be a noop 2022-04-17 13:15:33 -04:00
Tyler Goodlet 36e92b9faf Add a pre-started breakpoint example 2022-04-17 13:15:33 -04:00
Tyler Goodlet 742e004810 Make `mypy` happy 2022-04-17 13:15:33 -04:00
Tyler Goodlet ef7921ce11 Refine the handler for child vs. root cases
This gets very close to avoiding any possible hangs to do with tty
locking and SIGINT handling minus a special case that will be detailed
below.

Summary of implementation changes:

- convert `_mk_pdb()` -> `with _open_pdb() as pdb:` which implicitly
  handles the `bdb.BdbQuit` case such that debugger teardown hooks are
  always called.
- rename the handler to `shield_sigint()` and handle a variety of new
  cases:
  * the root is in debug but hasn't been cancelled -> call
    `Actor.cancel_soon()`
  * the root is in debug but *has* been cancelled (`Actor.cancel_soon()`
    already called) -> raise KBI
  * a child is in debug *and* has a task locking the debugger -> ignore
    SIGINT in child *and* the root actor.
- if the debugger instance is provided to the handler at acquire time,
  on SIGINT handling completion re-print the last pdb++ REPL output so
  that the user realizes they are still actively in debug.
- ignore the unlock case where a race condition of "no task" holding the
  lock causes the `RuntimeError` normally associated with the "wrong
  task" doing so (not sure if this is a `trio` bug?).
- change debug logs to runtime level.

Unhandled case(s):

- a child is maybe in debug mode but does not itself have any task using
  the debugger.
    * ToDo: we need a way to decide what to do with
      "intermediate" child actors who themselves either are not in
      `debug_mode=True` but have children who *are* such that a SIGINT
      won't cause cancellation of that child-as-parent-of-another-child
      **iff** any of their children are in debug mode.
2022-04-17 13:15:33 -04:00
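The handled cases listed in the commit above can be read as a small decision table. The sketch below encodes them as a pure function; `SigintAction` and `pick_sigint_action()` are hypothetical names, and both the ordering of the checks and the fall-through to raising a KBI for the unhandled cases are assumptions, not the real handler's behaviour.

```python
from enum import Enum, auto


class SigintAction(Enum):
    IGNORE = auto()       # swallow the signal, stay in the REPL
    CANCEL_SOON = auto()  # schedule actor cancellation
    RAISE_KBI = auto()    # let KeyboardInterrupt propagate


def pick_sigint_action(
    *,
    child_holds_lock: bool,
    root_in_debug: bool,
    root_cancel_called: bool,
) -> SigintAction:
    """Map the debugger/lock state to what a SIGINT should do."""
    if child_holds_lock:
        # A child task is locking the debugger: ignore in child and root.
        return SigintAction.IGNORE
    if root_in_debug and not root_cancel_called:
        return SigintAction.CANCEL_SOON
    # Root already cancelled, or a case the commit lists as unhandled
    # (assumed here to propagate).
    return SigintAction.RAISE_KBI
```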
Tyler Goodlet 542fe0372b (facepalm) Reraise `BdbQuit` and discard ownerless lock releases 2022-04-17 13:15:33 -04:00
Tyler Goodlet 3e9998ea83 Add WIP while-debugger-active SIGINT ignore handler 2022-04-17 13:15:33 -04:00
goodboy 71f19f217d
Merge pull request #305 from goodboy/name_query
Add `tractor.query_actor()` an addr looker-upper
2022-04-13 09:19:26 -04:00
Tyler Goodlet 8901272854 Fix typing 2022-04-13 08:20:53 -04:00
Tyler Goodlet 7c151bed48 Add nooz 2022-04-13 08:18:11 -04:00
Tyler Goodlet 80897a8f2b Add `tractor.query_actor()` an addr looker-upper
Sometimes it's handy to just have a non-`Portal` yielding way
to figure out if a "service" actor is up, so add this discovery
helper for that. We'll prolly just leave it undocumented for
now until we figure out a longer-term/better discovery system.
2022-04-13 07:50:42 -04:00
goodboy 62983684d1
Merge pull request #308 from goodboy/sort_subs_results_infected_aio
Sort `.subscribe()` results before comparison in test
2022-04-12 20:06:55 -04:00
Tyler Goodlet 1c63bb6130 Sort fan out results before comparison in test 2022-04-12 19:49:36 -04:00
goodboy bfe99f29b8
Merge pull request #304 from goodboy/aio_explicit_task_cancels
`LinkedTaskChannel.subscribe()`, explicit `asyncio` task cancel logging, `test_trioisms.py`
2022-04-12 17:27:29 -04:00
Tyler Goodlet 9c27858aaf WIP prints to debug frickin windows 2022-04-12 16:48:50 -04:00
Tyler Goodlet 597ae4b690 Add nooz file 2022-04-12 15:59:33 -04:00
Tyler Goodlet fa354ffe2b Handle not all values pulled case 2022-04-12 15:51:06 -04:00
Tyler Goodlet 333fad8819 Facepalm: join nursery first to avoid channel-closed-too-early 2022-04-12 15:06:35 -04:00