The stdlib insists on creating multiple forkservers and semaphore trackers
for each sub-sub-process launched. This isn't ideal since it costs each
`tractor` sub-actor an additional 2 more processes then necessary and is
confusing when viewed as a process tree (eg. via `pstree`).
The majority of the change is simply avoiding the call to
`forkserver.ensure_running()` and `semaphore_tracker.ensure_running()`
in `ForkServer.connect_new_process()` and instead treating the user like
an adult and expecting those calls to be made *once* in the parent most
process (i.e. what `multiprocessing` calls the `MainProcess`).
Really a proper patch should be made against cpython which allows for
similar manual management of the server along with a mechanism to communicate
forkserver and semaphore tracker fd info to sub-processes such that
further calls to `Process.start()` work as expected.
Relates to #6
Stop worrying about a "main task" in each actor and instead add an
additional `ActorNursery.run_in_actor()` method which wraps calls
to create an actor and run a lone RPC task inside it. Note this
adjusts the public API of `ActorNursery.start_actor()` to drop
its `main` kwarg.
The dirty deats of making this possible:
- each spawned RPC task is now tracked with a specific cancel scope such
that when the actor is cancelled all ongoing responders are cancelled
before any IPC/channel machinery is closed (turns out that spawning
new actors from `outlive_main=True` actors was probably borked before
finally getting this working).
- make each initial RPC response be a packet which describes the
`functype` (eg. `{'functype': 'asyncfunction'}`) allowing for async
calls/submissions by client actors (this was required to make
`run_in_actor()` work - `Portal._submit()` is the new async method).
- hooray we can stop faking "main task" results for daemon actors
- add better handling/raising of internal errors caught in the bowels of
the `Actor` itself.
- drop the rpc spawning nursery; just use the `Actor._root_nursery`
- only wait on `_no_more_peers` if there are existing peer channels that
are actually still connected.
- an `ActorNursery.__aexit__()` now implicitly waits on `Portal.result()` on close
for each `run_in_actor()` spawned actor.
- handle cancelling partial started actors which haven't yet connected
back to the parent
Resolves#24
Take @njsmith's advice and properly close actor invoked async generators
using `async_generator.aclosing()` instead of hacking it (as previous)
with a shielded cancel scope.
Cancellation requires that each actor cancel it's spawned subactors
before cancelling its own root (nursery's) cancel scope to avoid breaking
channel connections before kill commands (`Actor.cancel()`) have been sent
off to peers. To solve this, ensure each main task is cancelled to
completion first (which will guarantee that all actor nurseries have
completed their cancellation steps) before cancelling the actor's "core"
tasks under the "root" scope.
- steal from `trio` and add a `tractor_test` decorator
- use a random arbiter port to avoid conflicts with locally running
systems
- add all the (obviously) hilarious readme tests
- add a complex cancellation test which works with
`trio.move_on_after()`
Here is a bunch of code tightening to make sure cancellation works even
if recently spawned actors haven't fully started up and the parent is
cancelled.
The fixes include:
- passing the arbiter socket address to each actor
- ensure all spawned actors respect the spawner's log level
- handle process versus final `portal.result()` teardown in multiple
tasks such that if a proc dies before shipping a result we don't wait
- more detailed debug logging in teardown code paths
- don't store peer connected events in the same `dict` as the peer channels
- if necessary fake main task results on peer channel disconnect
- warn when a `trio.Cancelled` is what causes a nursery to bail
otherwise error
- store the subactor portal in the nursery for teardown purposes
- add dedicated `Portal.cancel_actor()` which acts as a "hard cancel"
and never blocks (indefinitely)
- add `Arbiter.unregister_actor()` it's more explicit what's being
requested
- Allow passing in a program-wide `loglevel`
- Add detailed debug logging particularly to do with channel msg processing
and connection handling
- Don't daemonize subprocesses for now as it prevents use of
sub-sub-actors (need to solve #6 first)
- Add a `Portal.close()` which just tells the remote actor to tear down
the channel (for now)
- Add a message to signal the remote `StopAsyncIteration` from an async
gen such that the client side terminates properly as well
- Make `Actor.cancel()` cancel the channel server first
- Actors *must* complete the arbiter registeration steps before moving
on with their main taks and rpc handling
- When delivering rpc responses (using the local per caller queue) use
the blocking interface (`trio.Queue.put()`) to get backpressure
- Properly detect an `partial` wrapped async generators in `_invoke`
Remove all the `piker` stuff and add some further checks including:
- main task result is returned correctly
- remote errors are raised locally
- remote async generator yields values locally
Fix quite a few little bugs:
- async gen func detection in `_invoke()`
- always cancel channel server on main task exit
- wait for remaining channel peers after unsub from arbiter
- return result from main task(s) all the way up to `tractor.run()`
Also add a `Portal.result()` for getting the final result(s) from the
actor's main task and fix up a bunch of docs.
Every actor now registers (and unregisters) with the arbiter at
startup/teardown. For now the registry is stored in a plain `dict` in
the arbiter's memory. This makes it possible to easily coordinate actors
started as plain Python processes or via `multiprocessing`.
A whole smörgåsbord of changes was required to accomplish this:
- factor handshake steps into a func
- track *every* channel connected to an actor including multiples to the
same remote peer (may want to optimize this later)
- handle `trio.ClosedStreamError` gracefully in the message loop
- add an `open_portal` asynccontextmanager which handles channel
creation, handshaking, and spawning a bg task for msg processing
- add a `start_actor()` for starting in-process actors directly
- add working `get_arbiter()` and `find_actor()` public routines
- `_main` now tries an anonymous channel connect to the stated
arbiter sockaddr and uses that to determine whether to crown itself
Fail gracefully (by "aborting") the same way `trio` does. This avoids
ugly sub-proc tracebacks thrown at the console. Unset the local actor
when `tractor._main` completes. Cancel all tasks for a peer channel on
disconnect.
When an error is raised inside a nursery block (in the local actor)
cancel all spawned children and ensure the error is properly
unsuppressed.
Also,
- change `invoke_cmd` to `send_cmd` and expect the caller to use
`result_from_q` (avoids implicit blocking for responses that might
never arrive)
- `nursery.start()` the channel server block such that we wait for the
underlying listener to spawn before making outbound connections
- cancel the channel server when an actor's main task completes
(given that `outlive_main == False`)
- raise subactor errors directly in the local actors's msg loop
- enforce that `treat_as_gen` async functions respond with a caller id
(`cid`) in each yield packet