This took a ton of tinkering and a rework of the actor nursery tear down
logic. The main changes include:
- each subprocess is now spawned from inside a trio task
from one of two containing nurseries created in the body of
`tractor.open_nursery()`: one for `run_in_actor()` processes and one for
`start_actor()` "daemons". This is to address the need for
`trio-run-in_process.open_in_process()` opening a nursery which must
be closed from the same task that opened it. Using this same approach
for `multiprocessing` seems to work well. The nurseries are waited in
order (rip actors then daemon actors) during tear down which allows
for avoiding the recursive re-entry of `ActorNursery.wait()` handled
prior.
- pull out all the nested functions / closures that were in
`ActorNursery.wait()` and move into the `_spawn` module such that
that process shutdown logic takes place in each containing task's
code path. This allows for vastly simplifying `.wait()` to just contain an
event trigger which initiates process waiting / result collection.
Likely `.wait()` should just be removed since it can no longer be used
to synchronously wait on the actor nursery.
- drop `ActorNursery.__aenter__()` / `.__atexit__()` and move this
"supervisor" tear down logic into the closing block of `open_nursery()`.
This not only cleans makes the code more comprehensible it also
makes our nursery implementation look more like the one in `trio`.
Resolves#93
Get a few more things working:
- fail reliably when remote module loading goes awry
- do a real hacky job of module loading using `sys.path` stuffsies
- we're still totally borked when trying to spin up and quickly cancel
a bunch of subactors...
It's a small move forward I guess.
If a nursery fails to cancel (some sub-actors presumably) then hard kill
the whole process tree to avoid hangs during a catastrophic failure.
This logic may get factored out (and changed) as we introduce custom
supervisor strategies.
At the expense of a bit more complexity in `ActorNursery.wait()`
(which I commented the heck out of fwiw) this adds far superior and
correct cancellation semantics for when a nursery is cancelled due
to (remote) errors in subactors.
This includes:
- `wait()` will now raise a `trio.MultiError` if multiple subactors
error with the same semantics as in `trio`.
- in `wait()` portals which are paired with `run_in_actor()`
spawned subactors (versus `start_actor()`) are waited on separately
and if the nursery **hasn't** been cancelled but there are errors
those are raised immediately before waiting on `start_actor()`
subactors which will block indefinitely if they haven't been
explicitly cancelled.
- if `wait()` does raise when the nursery hasn't yet been cancelled
it's expected that it will be called again depending on the actor
supervision strategy (i.e. right now we operate with a one-cancels-all
strategy, the same as `trio`, so `ActorNursery.__aexit__() calls
`cancel()` if any error is raised by `wait()`).
Oh and I added `is_main_process()` helper; can't remember why..
This is purely for documentation purposes for now as it should be
obvious a bunch of the signatures aren't using the correct "generics"
syntax (i.e. the use of `(str, int)` instead of `typing.Tuple[str, int])`)
in a bunch of places. We're also not using a type checker yet and besides,
`trio` doesn't really expose a lot of its internal types very well.
2SQASH
Start a forkserver once in the main (parent-most) process
and pass ipc info (fds) to subprocesses manually such that embedded
calls to `multiprocessing.Process.start()` just work. Note that this
relies on our overridden version of the stdlib's
`multiprocessing.forkserver` module.
Resolves#6
Stop worrying about a "main task" in each actor and instead add an
additional `ActorNursery.run_in_actor()` method which wraps calls
to create an actor and run a lone RPC task inside it. Note this
adjusts the public API of `ActorNursery.start_actor()` to drop
its `main` kwarg.
The dirty deats of making this possible:
- each spawned RPC task is now tracked with a specific cancel scope such
that when the actor is cancelled all ongoing responders are cancelled
before any IPC/channel machinery is closed (turns out that spawning
new actors from `outlive_main=True` actors was probably borked before
finally getting this working).
- make each initial RPC response be a packet which describes the
`functype` (eg. `{'functype': 'asyncfunction'}`) allowing for async
calls/submissions by client actors (this was required to make
`run_in_actor()` work - `Portal._submit()` is the new async method).
- hooray we can stop faking "main task" results for daemon actors
- add better handling/raising of internal errors caught in the bowels of
the `Actor` itself.
- drop the rpc spawning nursery; just use the `Actor._root_nursery`
- only wait on `_no_more_peers` if there are existing peer channels that
are actually still connected.
- an `ActorNursery.__aexit__()` now implicitly waits on `Portal.result()` on close
for each `run_in_actor()` spawned actor.
- handle cancelling partial started actors which haven't yet connected
back to the parent
Resolves#24