In order to have reliable subactor startup we need the following
sequence to take place:
- connect to the parent actor, handshake and receive runtime state
- load exposed modules into memory
- start the channel server up fully using the provided bind address
- finally, start processing new messages from the parent
Add a bunch more comments to clarify all this.
Instead of hackery trying to map modules manually from the filesystem
let Python do all the work by simply copying what ``multiprocessing``
does to "fixup the __main__ module" in spawned subprocesses. The new
private module ``_mp_fixup_main.py`` is simply cherry picked code from
``multiprocessing.spawn`` which does just that. We only need these
"fixups" when using a backend other then ``multiprocessing``; for
now just when using ``trio_run_in_process``.
Thanks to @salotz for pointing out that the first example in the docs
was broken. Though it's somewhat embarrassing this might also explain
the problem in #79 and certain issues in #59...
The solution here is to import the target RPC module using the its
unique basename and absolute filepath in the sub-actor that requires it.
Special handling for `__main__` and `__mp_main__` is needed since the
spawned subprocess will have no knowledge about these parent-
-state-specific module variables. Solution: map the modules name to the
respective module file basename in the child process since the module
variables will of course have different values in children.
Set `trio-run-in-process` as the default on *nix systems and
`multiprocessing`'s spawn method on Windows. Enable overriding the
default choice using `tractor._spawn.try_set_start_method()`. Allows
for easy runs of the test suite using a user chosen backend.
Get a few more things working:
- fail reliably when remote module loading goes awry
- do a real hacky job of module loading using `sys.path` stuffsies
- we're still totally borked when trying to spin up and quickly cancel
a bunch of subactors...
It's a small move forward I guess.
`trio.MultiError` isn't an `Exception` (derived instead from
`BaseException`) so we have to specially catch it in the task
invocation machinery and ship it upwards (like regular errors)
since nurseries running in sub-actors can raise them.
Add `@tractor.stream` which must be used to denote non async generator
streaming functions which use the `tractor.Context` API to push values.
This enforces a more explicit denotation as well as allows enforcing the
declaration of the `ctx` argument in definitions.
This begins moving toward explicitly decorated "streaming functions"
instead of checking for a `ctx` arg in the signature.
- provide each context with its task's top level `trio.CancelScope`
such that tasks can cancel themselves explictly if needed via calling
`Context.cancel_scope()`
- make `Actor.cancel_task()` a private method (`_cancel_task()`) and
handle remote rpc calls specially such that the caller does not need
to provide the `chan` argument; non-primitive types can't be passed on
the wire and we don't want the client actor be require knowledge of
the channel instance the request is associated with. This also ties into
how we're tracking tasks right now (`Actor._rpc_tasks` is keyed by the
call id, a UUID, *plus* the channel).
- make `_do_handshake` a private actor method
- use UUID version 4
Add full support for using the "spawn" process starting method as per:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
Add a `spawn_method` argument to `tractor.run()` for specifying the
desired method explicitly. By default use the "fastest" method available.
On *nix systems this is the original "forkserver" method.
This should be the solution to getting windows support!
Resolves#60
As mentioned in prior commits there's currently a bug in Python that
make async gens **not** task safe. Since this is the core cause of almost
all recent problems, instead implement our own async iterator derivative of
`trio.abc.ReceiveChannel` by wrapping a `trio._channel.MemoryReceiveChannel`.
This fits more natively with the memory channel API in ``trio`` and adds
potentially more flexibility for possible bidirectional inter-actor streaming
in the future.
Huge thanks to @oremanj and of course @njsmith for guidance on this one!
Enable cancelling specific tasks from a peer actor such that when
a actor task or the actor itself is cancelled, remotely spawned tasks
can also be cancelled. In much that same way that you'd expect a node
(task) in the `trio` task tree to cancel any subtasks, actors should
be able to cancel any tasks they spawn in separate processes.
To enable this:
- track rpc tasks in a flat dict keyed by (chan, cid)
- store a `is_complete` event to enable waiting on specific
tasks to complete
- allow for shielding the msg loop inside an internal cancel scope
if requested by the caller; there was an issue with `open_portal()`
where the channel would be torn down because the current task was
cancelled but we still need messaging to continue until the portal
block is exited
- throw an error if the arbiter tries to find itself for now
Instead of chan/cid, whenever a remote function defines a `ctx` argument
name deliver a `Context` instance to the function. This allows remote
funcs to provide async generator like streaming replies (and maybe more
later).
Additionally,
- load actor modules *after* establishing a connection to the spawning
parent to avoid crashing before the error can be reported upwards
- fix a bug to do with unpacking and raising local internal actor errors
from received messages
RPC module/function lookups should not cause the target actor to crash.
This change instead ships the error back to the calling actor allowing
for the remote actor to continue running depending on the caller's
error handling logic. Adds a new `ModuleNotExposed` error to accommodate.
I'm not sure how this ever worked but when a "fake" async gen
(i.e. function with special `chan`, `cid` kwargs) is completed
we need to signal the end of the stream just like with normal
async gens. Also don't fail when trying to remove tasks that were
never tracked.
Fixes#46
Use the new custom error types throughout the actor and portal
primitives and set a few new rules:
- internal errors are any error not raised by an rpc task and are
**not** forwarded to portals but instead are raised directly in
the msg loop.
- portals always re-raise a "main task" error for every call to
``Portal.result()``.
When an actor has already been registered with the arbiter it should
exist in the registry and thus the wait event should have been removed.
Check that the registry indeed holds an event before clearing it.
This is purely for documentation purposes for now as it should be
obvious a bunch of the signatures aren't using the correct "generics"
syntax (i.e. the use of `(str, int)` instead of `typing.Tuple[str, int])`)
in a bunch of places. We're also not using a type checker yet and besides,
`trio` doesn't really expose a lot of its internal types very well.
2SQASH
This ensures that internal errors received from a remote actor are
indeed raised even in the `MainProcess` **before** comms tasks are
cancelled. Internal error in this case means any error packet received
on a channel that doesn't have a `cid` header. RPC errors (which **do**
have a `cid` header) are still forwarded to the consuming caller as usual.
If an internal error is bubbled up from some sub-actor throw that error
into the `MainProcess` "main" async function / coro in order to trigger
nursery teardowns (i.e. cancellations) that need to be done.
I'll likely change this shortly back to where we run a "main task"
inside `actor._async_main()`...
Allows for waiting on another actor (by name) to register with the
arbiter. This makes synchronized actor spawning and consecutive task
coordination easier to accomplish from within sub-actors.
Resolves#31
This allows for registering more then one actor with the same "name"
when you have multiple actors fulfilling the same role. Eventually
we'll need support for looking up all actors registered under a given
"service name" (or whatever we decide to call it).
Also, a fix to the arbiter such that each new instance refers to a
separate `_registry` dict (found an issue with duplicate names during
testing).
Resolves#7
Start a forkserver once in the main (parent-most) process
and pass ipc info (fds) to subprocesses manually such that embedded
calls to `multiprocessing.Process.start()` just work. Note that this
relies on our overridden version of the stdlib's
`multiprocessing.forkserver` module.
Resolves#6
Stop worrying about a "main task" in each actor and instead add an
additional `ActorNursery.run_in_actor()` method which wraps calls
to create an actor and run a lone RPC task inside it. Note this
adjusts the public API of `ActorNursery.start_actor()` to drop
its `main` kwarg.
The dirty deats of making this possible:
- each spawned RPC task is now tracked with a specific cancel scope such
that when the actor is cancelled all ongoing responders are cancelled
before any IPC/channel machinery is closed (turns out that spawning
new actors from `outlive_main=True` actors was probably borked before
finally getting this working).
- make each initial RPC response be a packet which describes the
`functype` (eg. `{'functype': 'asyncfunction'}`) allowing for async
calls/submissions by client actors (this was required to make
`run_in_actor()` work - `Portal._submit()` is the new async method).
- hooray we can stop faking "main task" results for daemon actors
- add better handling/raising of internal errors caught in the bowels of
the `Actor` itself.
- drop the rpc spawning nursery; just use the `Actor._root_nursery`
- only wait on `_no_more_peers` if there are existing peer channels that
are actually still connected.
- an `ActorNursery.__aexit__()` now implicitly waits on `Portal.result()` on close
for each `run_in_actor()` spawned actor.
- handle cancelling partial started actors which haven't yet connected
back to the parent
Resolves#24