Keep an actor-local (bool) flag which determines if there is already
a running debugger instance for the current process. If another task
tries to enter in this case, simply ignore it since allowing entry may
result in a deadlock where the new task will be sync waiting on the
parent stdio lock (a release that will never arrive due to the current
debugger's active use of it).
In the future we may want to allow FIFO queueing of local tasks where
instead of ignoring re-entrant breakpoints we allow tasks to async wait
for debugger release, though I'm not sure of the implications since
you'd likely want to support switching the debugger to the new task and
that could cause deadlocks where tasks are inter-dependent. It may be
more sane to just error on multiple breakpoint requests within an actor.
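To illustrate the FIFO alternative, here is a sketch that uses stdlib `asyncio` in place of `trio` so it runs without extra dependencies; `debug_session()` and its `order` list are hypothetical. `asyncio.Lock` is documented as fair (waiters acquire it in the order they started waiting); in `trio` the type that explicitly promises this is `trio.StrictFIFOLock`. It only queues whole sessions and does nothing about the inter-dependent-task deadlocks mentioned above.

```python
import asyncio
from typing import List


async def debug_session(lock: asyncio.Lock, task_name: str, order: List[str]) -> None:
    # Hypothetical: instead of ignoring a re-entrant breakpoint, the task
    # async-waits for its turn at the (single) debugger.
    async with lock:
        order.append(task_name)
        await asyncio.sleep(0)  # stand-in for an interactive REPL session


async def main() -> List[str]:
    lock = asyncio.Lock()  # fair: handed to waiters in FIFO order
    order: List[str] = []
    await asyncio.gather(*(debug_session(lock, n, order) for n in ("a", "b", "c")))
    return order
```

Task `a` gets the lock immediately while `b` and `c` queue behind it, so the sessions run in arrival order.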
This is the first step in addressing #113 and the initial support
of #130. Basically this allows (sub)processes to engage the `pdbpp`
debug machinery which reads/writes the root actor's tty but only in
a FIFO semaphored way such that no two processes are using it
simultaneously. That means you can have multiple actors enter a trace or
crash and run the debugger in a sensible way without clobbering each
other's access to stdio. It required adding some "tear down hooks" to
a custom `pdbpp.Pdb` type such that we release a child's lock on the
parent on debugger exit (in this case when either of the "continue" or
"quit" commands are issued to the debugger console).
There's some code left commented in anticipation of full support for
issue #130 where we'll need to actually capture and feed stdin to the
target (remote) actor which won't necessarily be running on the same
host.
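A rough sketch of the "tear down hooks" on a custom debugger type, written against the stdlib `pdb.Pdb` (which `pdbpp`'s `Pdb` extends) rather than the actual custom class; `ReleasingPdb` and its `on_exit` callback are hypothetical, with `on_exit` standing in for releasing the child's lock on the parent. One gotcha: `pdb.Pdb` binds its short command aliases (`c`, `cont`, `q`, `exit`) to the base-class methods, so a subclass must rebind them or typing `c` bypasses the override.

```python
import pdb
from typing import Callable, Optional


class ReleasingPdb(pdb.Pdb):
    """Hypothetical Pdb subclass that fires a hook when the session ends."""

    def __init__(self, *args, on_exit: Optional[Callable[[], None]] = None, **kwargs):
        super().__init__(*args, **kwargs)
        self._on_exit = on_exit

    def _release(self) -> None:
        # One-shot: clear the hook before calling it so a lock is never
        # released twice (e.g. "continue" followed later by "quit").
        hook, self._on_exit = self._on_exit, None
        if hook is not None:
            hook()

    def do_continue(self, arg):
        try:
            return super().do_continue(arg)
        finally:
            self._release()

    # Rebind the aliases; the base class points them at its own method.
    do_c = do_cont = do_continue

    def do_quit(self, arg):
        try:
            return super().do_quit(arg)
        finally:
            self._release()

    do_q = do_exit = do_quit
```

In the flow described above, `on_exit` is where the child would tell the parent it is done with the tty.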
Allow entering and attaching to a `pdb` instance in a child process.
The current hackery is to have the child make an rpc to the parent and
ask it to hijack stdin; once complete, the child enters a `pdb` blocking
method. The parent then relays all stdin input to the child thus
controlling the "remote" debugger.
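The core trick (a debugger whose commands arrive over a stream rather than from a local tty) can be shown in a single process with the stdlib `pdb`. Here an `io.StringIO` buffer is a hypothetical stand-in for the stdin the parent relays over IPC, and `target()` / `run_with_relayed_stdin()` are illustrative names only.

```python
import io
import pdb
from typing import Callable, Tuple


def target() -> int:
    x = 41
    return x + 1


def run_with_relayed_stdin(func: Callable[[], int], commands: str) -> Tuple[int, str]:
    """Run `func` under pdb, feeding debugger commands from a stream."""
    relayed = io.StringIO(commands)  # stand-in for the relayed stdin
    out = io.StringIO()              # stand-in for the output sent back
    dbg = pdb.Pdb(stdin=relayed, stdout=out, nosigint=True, readrc=False)
    # Read commands with stdin.readline() instead of input(), so no tty
    # is needed on this side.
    dbg.use_rawinput = False
    result = dbg.runcall(func)  # stops at the first line of `func`
    return result, out.getvalue()
```

For example, relaying `"n\np x * 100\nc\n"` steps once, prints `4100` into the transcript and lets `target()` return `42`.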
A few things were added to accomplish this:
- tracking the mapping of subactors to their parent nurseries
- in the root actor, cancelling all nurseries under the root `trio` task
on cancellation (i.e. `Actor.cancel()`)
- pass a "runtime vars" map down the actor tree for propagating global state
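A toy model of the last two bullets, assuming a plain `dict` for the runtime vars; `SketchActor`, `spawn()`, `subactor_to_nursery` and the `is_root` / `debug_mode` keys are all hypothetical, not tractor's actual names. Each child gets a copy of its parent's map (plus its own overrides), so global state flows down the tree without a child mutating its parent's view.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchActor:
    """Hypothetical stand-in for an actor; not tractor's `Actor` type."""
    name: str
    runtime_vars: Dict[str, Any]
    parent: Optional["SketchActor"] = None
    # Hypothetical: which (named) nursery spawned each subactor.
    subactor_to_nursery: Dict[str, str] = field(default_factory=dict)

    def spawn(self, name: str, nursery: str) -> "SketchActor":
        # Copy the parent's vars and override the per-actor ones.
        child_vars = dict(self.runtime_vars, is_root=False)
        child = SketchActor(name, child_vars, parent=self)
        self.subactor_to_nursery[name] = nursery
        return child
```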
In an effort to acquire more deterministic actor cancellation,
this adds a clearer and more resilient (whilst possibly a bit
slower) internal nursery structure with explicit semantics for
clarifying the task-scope shutdown sequence.
Namely, on cancellation, the explicit steps are now:
- cancel all currently running rpc tasks and wait
for them to complete
- cancel the channel server and wait for it to complete
- cancel the msg loop for the channel with the immediate parent
- de-register with arbiter if possible
- wait on remaining connections to release
- exit process
To accomplish this add a new nursery called the "service nursery" which
spawns all rpc tasks **instead of using** the "root nursery". The root
is now used solely for async launching the msg loop for the primary
channel with the parent such that it is (nearly) the last thing torn
down on cancellation.
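The ordering above can be sketched with stdlib `asyncio` tasks standing in for `trio` nurseries: rpc tasks are kept in their own "service" collection so they can be cancelled and awaited first, while the parent msg loop is torn down near the end. `shutdown()`, `forever()` and the `deregister` callback are hypothetical; waiting on remaining connections and exiting the process are left out.

```python
import asyncio
from typing import Awaitable, Callable, List


async def forever() -> None:
    # Hypothetical stand-in for a long-running rpc task / server / msg loop.
    await asyncio.Event().wait()


async def _cancel_and_wait(tasks: List["asyncio.Task[None]"]) -> None:
    for task in tasks:
        task.cancel()
    # return_exceptions=True collects each CancelledError as a result
    # instead of raising it, so this just waits for every task to finish.
    await asyncio.gather(*tasks, return_exceptions=True)


async def shutdown(
    service_tasks: List["asyncio.Task[None]"],
    channel_server: "asyncio.Task[None]",
    parent_msg_loop: "asyncio.Task[None]",
    deregister: Callable[[], Awaitable[None]],
    log: List[str],
) -> None:
    # 1. rpc tasks (the "service nursery") go first...
    await _cancel_and_wait(service_tasks)
    log.append("rpc tasks")
    # 2. ...then the channel server...
    await _cancel_and_wait([channel_server])
    log.append("channel server")
    # 3. ...then the msg loop with the parent (the "root nursery" task).
    await _cancel_and_wait([parent_msg_loop])
    log.append("parent msg loop")
    # 4. Best-effort de-registration: the arbiter may already be gone.
    try:
        await deregister()
        log.append("deregistered")
    except OSError:
        log.append("deregistration skipped")
```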
In the future it should also be possible to have `self.cancel()` return
a result to the parent once the runtime is sure that the rest of the
shutdown is atomic; this would allow for a true unbounded shield in
`Portal.cancel_actor()`. This will likely require that the error
handling blocks in `Actor._async_main()` are moved "inside" the root
nursery block such that the msg loop with the parent truly is the last
thing to terminate.
Always shield waiting for the process and always run
``trio.Process.__aexit__()`` on teardown. This enforces
that shutdown happens due to cancellation triggered inside
the sub-actor instead of the process being killed externally
by the parent.
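In `trio` the shielded wait is a `trio.CancelScope(shield=True)` around `await proc.wait()`. The sketch below approximates the same behaviour in stdlib `asyncio`, an assumption made so the example has no third-party dependencies; `wait_shielded()` is a hypothetical helper. If the waiting task is cancelled it keeps waiting until the child exits on its own and only then re-raises the cancellation, so the parent never abandons or kills the process.

```python
import asyncio
import sys


async def wait_shielded(proc: asyncio.subprocess.Process) -> int:
    """Wait for `proc` to exit even if the calling task is cancelled.

    A cancellation is recorded and re-raised only after the child has
    exited by itself.
    """
    waiter = asyncio.ensure_future(proc.wait())
    pending_cancel = None
    while not waiter.done():
        try:
            # shield() keeps `waiter` running when this task is cancelled.
            await asyncio.shield(waiter)
        except asyncio.CancelledError as exc:
            pending_cancel = exc  # remember it, but keep waiting
    if pending_cancel is not None:
        raise pending_cancel
    return waiter.result()
```

Cancelling a task blocked in `wait_shielded()` on a child that sleeps briefly still ends with the child's normal exit code (`0`) before the cancellation propagates.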
The real issue is if the root nursery gets cancelled prior to
de-registration with the arbiter. This doesn't seem easy to
reproduce as a side effect of a KBI (`KeyboardInterrupt`); however, that
is how it was discovered in practice.
There was code from the last de-registration fix PR that I had commented out
(to do with shielding arbiter dereg steps in `Actor._async_main()`) because
the block didn't seem to make a difference under infinite streaming
tests. Turns out it **for sure** is needed under certain conditions (likely
if the actor's root nursery is cancelled prior to actor nursery exit).
This was an attempt to simulate the failure mode if you manually close the
stream **before** cancelling the containing **actor**.
More tests to come I guess.