It's definitely possible to have a nursery spawn task be cancelled
before a `trio.Process` handle is ever returned; we now handle this
case as a cancelled-during-spawn scenario. Zombie collection logic
also is bypassed in this case.
This is actually surprisingly easy to grok having gone through a lot of
pain understanding edge cases in the zombie lord dev branch. Basically
we just need to make sure actors are managed in a 2 step reap sequence.
In the "soft" reap phase we wait for the process to terminate on its own
concurrently with (maybe) waiting for its portal's final result (if it's
a `.run_in_actor()`). If this path is cancelled or errors, then we do
a "hard" reap where we timeout and send a signal to the proc to
terminate immediately. The only last remaining trick is to tie in the
root-is-debugger-aware logic to yet again avoid tty clobbers.
If the root calls `trio.Process.kill()` on immediate child proc teardown
when the child is using pdb, we can get stdstreams clobbering that
results in a pdb++ repl where the user can't see what's been typed. Not
killing such children on cancellation / error seems to resolve this
issue whilst still giving reliable termination. For now, code that
special path until a time it becomes a problem for ensuring zombie
reaps.
It's clear now that special attention is needed to handle the case where
a spawned `multiprocessing` proc is started but then the parent is
cancelled before the child can connect back; in this case we need to be
sure to kill the near-zombie child asap. This may end up being the
solution to other resiliency issues seen around mp with nested process
trees too. More testing is needed to be sure.
Relates to #84#89#134#146
For reliable remote cancellation we need to "report" `trio.Cancelled`s
(just like any other error) when exhausting a portal such that the
caller can make decisions about cancelling the respective actor if need
be.
Resolves#156
Allow entering and attaching to a `pdb` instance in a child process.
The current hackery is to have the child make an rpc to the parent and
ask it to hijack stdin, once complete the child enters a `pdb` blocking
method. The parent then relays all stdin input to the child thus
controlling the "remote" debugger.
A few things were added to accomplish this:
- tracking the mapping of subactors to their parent nurseries
- in the root actor, cancelling all nurseries under the root `trio` task
on cancellation (i.e. `Actor.cancel()`)
- pass a "runtime vars" map down the actor tree for propagating global state
Always shield waiting for he process and always run
``trio.Process.__aexit__()`` on teardown. This enforces
that shutdown happens to due cancellation triggered inside
the sub-actor instead of the process being killed externally
by the parent.
Trio will kill subprocesses via `Process.__aexit__()` using a `finally:`
block (which, yes, will get triggered on cancellation) so we avoid that
until true process "tear down" since subactors do many things during
graceful shutdown (such as de-registering from the name discovery
system). Oddly this only seems to be an issue during cancellation of
infinite stream consumption.
Resolves#141
In order to have reliable subactor startup we need the following
sequence to take place:
- connect to the parent actor, handshake and receive runtime state
- load exposed modules into memory
- start the channel server up fully using the provided bind address
- finally, start processing new messages from the parent
Add a bunch more comments to clarify all this.
Using the context manager interface does some extra teardown beyond simply
calling `.wait()`. Pass the subactor's "uid" on the exec line for
debugging purposes when monitoring the process tree from the OS.
Hard code the child script module path to avoid a double import warning.
This is an initial solution for #120.
Allow spawning `asyncio` based actors which run `trio` in guest
mode. This enables spawning `tractor` actors on top of the `asyncio`
event loop whilst still leveraging the SC focused internal actor
supervision machinery. Add a `tractor.to_syncio.run()` api to allow
spawning tasks on the `asyncio` loop from an embedded (remote) `trio`
task and return or stream results all the way back through the `tractor`
IPC system using a very similar api to portals.
One outstanding problem is getting SC around calls to
`asyncio.create_task()`. Currently a task that crashes isn't able to
easily relay the error to the embedded `trio` task without us fully
enforcing the portals based message protocol (which seems superfluous
given the error ref is in process). Further experiments using `anyio`
task groups may alleviate this.
Instead of hackery trying to map modules manually from the filesystem
let Python do all the work by simply copying what ``multiprocessing``
does to "fixup the __main__ module" in spawned subprocesses. The new
private module ``_mp_fixup_main.py`` is simply cherry picked code from
``multiprocessing.spawn`` which does just that. We only need these
"fixups" when using a backend other then ``multiprocessing``; for
now just when using ``trio_run_in_process``.
Thanks to @salotz for pointing out that the first example in the docs
was broken. Though it's somewhat embarrassing this might also explain
the problem in #79 and certain issues in #59...
The solution here is to import the target RPC module using the its
unique basename and absolute filepath in the sub-actor that requires it.
Special handling for `__main__` and `__mp_main__` is needed since the
spawned subprocess will have no knowledge about these parent-
-state-specific module variables. Solution: map the modules name to the
respective module file basename in the child process since the module
variables will of course have different values in children.