Since one-way streaming can be accomplished by just *not* sending on one
side (and/or thus wrapping such usage in a more restrictive API), we
just drop the recv-only parent type. The only method different was
`MsgStream.send()`, now merged in. Further in usage of `.subscribe()`
we monkey patch the underlying stream's `.send()` onto the delivered
broadcast receiver so that subscriber tasks can two-way stream as though
using the stream directly.
This allows us to more definitively drop `tractor.open_stream_from()` in
the longer run if we so choose as well; note currently this will
potentially create an issue if a caller tries to `.send()` on such a one
way stream.
Driven by a bug found in `piker` where we'd get an inf recursion error
due to `BroadcastReceiver.receive()` being called when consumer tasks
are awoken but no value is ready to `.nowait_receive()`.
This new rework takes an approach closer to the interface and internals
of `trio.MemoryReceiveChannel` particularly in terms of,
- implementing a `BroadcastReceiver.receive_nowait()` and using it
within the async `.receive()`.
- failing over to an internal `._receive_from_underlying()` when the
`_nowait()` call raises `trio.WouldBlock`.
- adding `BroadcastState.statistics()` for debugging and testing
dropping recursion from `.receive()`.
We weren't doing this originally I *think* just because of the path
dependent nature of the way the code was developed (originally being
mega pedantic about one-way vs. bidirectional streams) but, it doesn't
seem like there's any issue just calling the stream's `.aclose()`; also
have the benefit of just being less code and logic checks B)
When backpressure is used and a feeder mem chan breaks during msg
delivery (usually because the IPC allocating task already terminated)
instead of raising we simply warn as we do for the non-backpressure
case.
Also, add a proper `Actor.is_arbiter` test inside `._invoke()` to avoid
doing an arbiter-registry lookup if the current actor **is** the
registrar.
The stdlib has all sorts of muckery with ignoring SIGINT in the
`Pdb._cmdloop()` but here we just override all that since we don't trust
their decisions about cancellation handling whatsoever. Adds
a `Lock.repl: MultiActorPdb` attr which is set by any task which
acquires root TTY lock indicating (via actor global state) that the
current actor is using the debugger REPL and can be expected to re-draw
the prompt on SIGINT. Further we mask out log messages from any actor
who also has the `shield_sigint_handler()` enabled to avoid logging
noise when debugging.
If we pack the nursery parent task's error into the `errors` table
directly in the handler, we don't need to specially handle packing that
same error into any exception group raised while handling sub-actor
cancellation; drops some ugly indentation ;)
Pretty sure this is the final touch to alleviate all our debug lock
headaches! Instead of trying to revert to the "last" handler (as `pdb`
does internally in the stdlib) we always just revert to the handler
`trio` registers during startup. Further this seems to allow cancelling
the root-side locking task if it's detected as stale IFF we only do this
when the root actor is in a "no more IPC peers" state.
Deatz:
- (always) set `._debug.Lock._trio_handler` as the `trio` version, not
some last used handler to make sure we're getting the ctrl-c handling
we want when not in debug mode.
- assign the trio handler in `open_root_actor()`
`._runtime._async_main()` to be sure it's applied in subactors as well
as the root.
- only do debug lock blocking and root-side-locking-task cancels when
a "no peers" condition is detected in the root actor: i.e. no IPC
channels are detected by the root meaning it's impossible any actor
has a sane lock-state ongoing for debug mode.
This is a lingering debugger locking race case we needed to handle:
- child crashes acquires TTY lock in root and attaches to `pdb`
- child IPC goes down such that all channels to the root are broken
/ non-functional.
- root is stuck thinking the child is still in debug even though it
can't be contacted and the child actor machinery hasn't been
cancelled by its parent.
- root get's stuck in deadlock with child since it won't send a cancel
request until the child is finished debugging, but the child can't
unlock the debugger bc IPC is down.
To avoid this scenario add debug lock blocking list via
`._debug.Lock._blocked: set[tuple]` which holds actor uids for any actor
that is detected by the root as having no transport channel connections
with said root (of which at least one should exist if this sub-actor at
some point acquired the debug lock). The root consequently checks this
list for any actor that tries to (re)acquire the lock and blocks with
a `ContextCancelled`. When a debug condition is tested in
`._runtime._invoke` the context's `._enter_debugger_on_cancel` which
is set to `False` if the actor is on the block list in which case the
post-mortem entry is skipped.
Further this adds a root-locking-task side cancel scope to
`Lock._root_local_task_cs_in_debug` which can be cancelled by the root
runtime when a stale lock is detected after all IPC channels for the
actor have been torn down. NOTE: right now we're NOT doing this since it
seems to cause test failures likely due because it may cause pre-mature
cancellation and maybe needs a bit more experimenting?
In the case of a callee-side context cancelling itself it can be handy
to let the caller-side task know (even if through logging) that the
cancel was due to some known reason. Make `.cancel()` accept such
a message on the callee side and have it included in the
`._runtime._invoke()` raised `ContextCancelled` emission.
Also add a `Context._trigger_debugger_on_cancel: bool` flag which can be
set to `False` to avoid the debugger post-mortem crash mode from
engaging on cross-context tasks which cancel themselves for a known
reason (as is needed for blocked tasks in the debug TTY-lock machinery).
Turns out the lifetime mgmt of separate nurseries per delegate manager
is tricky; a new nursery can't be naively allocated on cache-misses since
it may get closed by some early terminating task instead of by the "last
using" consumer task. In theory if we allocate using the same logic as
that used for the last-task-triggers-exit then this should work?
For now just go back to a single global nursery per `_Cache` which still
avoids use of the internal actor service nursery.
Instead of sticking all `trionics.maybe_open_context()` tasks inside the
actor's (root) service nursery, open a unique one per manager function
instance (id).
Further, accept a callable for the `key` such that a user can have
more flexible control on the caching logic and move the
`maybe_open_nursery()` helper out of the portal mod and into this
trionics "managers" module.
Instead of the logic branching create a table `._spawn._methods`
which is used to lookup the desired backend framework (in this case
still only one of `multiprocessing` or `trio`) and make the top level
`.new_proc()` do the lookup and any common logic. Use a `typing.Literal`
to define the lookup table's key set.
Repair and ignore a bunch of type-annot related stuff todo with `mypy`
updates and backend-specific process typing.
The common logic to both remove our custom SIGINT handler as well
as signal the actor global event that pdb is complete. Call this
whenever we exit a post mortem call and thus any time some rpc task
get's debugged inside `._actor._invoke()`.
Further, we have to manually print the REPL prompt on 3.9 for some wack
reason, so stick a version guard in the sigint handler for that..
When in an uncertain teardown state and in debug mode a context can be
popped from actor runtime before a child finished debugging (the case
when the parent is tearing down but the child hasn't closed/completed
its tty lock IPC exit phase) and the child sends the "stop" message to
unlock the debugger but it's ignored bc the parent has already dropped
the ctx. Instead we call `._debug.maybe_wait_for_deugger()` before these
context removals to avoid the root getting stuck thinking the lock was
never released.
Further, add special `Actor._cancel_task()` handling code inside
`_invoke()` which continues to execute the method despite the IPC
channel to the caller being broken and thus avoiding potential hangs due
to a target (child) actor task remaining alive.
Ensure that even when `pdb` resumption methods are called during a crash
where `trio`'s runtime has already terminated (eg. `Event.set()` will
raise) we always revert our sigint handler to the original. Further
inside the handler if we hit a case where a child is in debug and
(thinks it) has the global pdb lock, if it has no IPC connection to
a parent, simply presume tty sync-coordination is now lost and cancel
the child immediately.
A hopefully significant fix here is to always avoid suppressing a SIGINT
when the root actor can not detect an active IPC connections (via
a connected channel) to the supposed debug lock holding actor. In that
case it is most likely that the actor has either terminated or has lost
its connection for debugger control and there is no way the root can
verify the lock is in use; thus we choose to allow KBI cancellation.
Drop the (by comment) `try`-`finally` block in
`_hijoack_stdin_for_child()` around the `_acquire_debug_lock()` call
since all that logic should now be handled internal to that locking
manager. Try to catch a weird error around the `.do_longlist()` method
call that seems to sometimes break on py3.10 and latest `pdbpp`.
The method now returns a `bool` which flags whether the transport died
to the caller and allows for reporting a disconnect in the
channel-transport handler task. This is something a user will normally
want to know about on the caller side especially after seeing
a traceback from the peer (if in tree) on console.
There's no point in sending a cancel message to the remote linked task
and especially no reason to block waiting on a result from that task if
the transport layer is detected to be disconnected. We expect that the
transport shouldn't go down at the layer of the message loop
(reconnection logic should be handled in the transport layer itself) so
if we detect the channel is not connected we don't bother requesting
cancels nor waiting on a final result message.
Why?
- if the connection goes down in error the caller side won't have a way
to know "how long" it should block to wait for a cancel ack or result
and causes a potential hang that may require an additional ctrl-c from
the user especially if using the debugger or if the traceback is not
seen on console.
- obviously there's no point in waiting for messages when there's no
transport to deliver them XD
Further, add some more detailed cancel logging detailing the task and
actor ids.
There's a bug that's triggered in the stdlib without latest `pdb++`
installed; add a note for that.
Further inside `wait_for_parent_stdin_hijack()` don't `.started()` until
the interactor stream has been opened to avoid races when debugging this
`._debug.py` module (at the least) since we usually don't want the
spawning (parent) task to resume until we know for sure the tty lock has
been acquired. Also, drop the random checkpoint we had inside
`_breakpoint()`, not sure it was actually adding anything useful since
we're (mostly) carefully shielded throughout this func.