You can always wrap a sync function in an async one and there seems to
be no good reason to support invoking them directly especially since
cancellation won't work without some thread hackery. If it's requested
we'll point users to `trio-parallel`.
Resolves#77
Add a sync method that can be used to cancel the current actor from
a synchronous context. This is useful in debugging situations where
sync debugger code may need to kill the process tree.
Also, make the internal "lifetime stack" a global var; easier to manage
from client code that may was to add callbacks prior to the actor
runtime being fully setup.
This begins the move to dropping support for `tractor.run()` which we
don't really need since the runtime is started (as it always has been)
from a new sub-task / nursery. Instead this introduces starting the
actor tree through a `open_root_actor()` async context manager which
we'll likely implicitly call (from the root) on the first use of an
actor nursery.
Drop `_actor._start_actor()` and factor its contents into this new api.
Make `run()` and `run_daemon()` use `open_root_actor()` until we decide
to remove them.
Relates to #168 and #177
This resolves and completes #69 allowing all RPC invocation APIs to pass
function references directly instead of explicit `str` names for the
target namespace and function (this is still done implicitly
underneath). This brings us closer to `trio`'s task running API as well
as acknowledges that any inter-host RPC system (and API) will likely
need to be implemented on top of local RPC primitives anyway. Even if
this ends up **not** being true we can always go to "function stubs" as
part of our IAC protocol or, add a new method to do explicit namespace
calls: `.run_from_module()` or whatever everyone votes on.
Resolves#69
Further, this commit drops `Actor.statespace` from the entire system
since a user can easily get this same functionality using module
level variables. Fix docs to match all these changes (luckily mostly
already done due to example scripts referencing).
Turns out this is a lower level issue in terms of the stdlib's default
`pdb.Pdb` settings and how they conflict with `trio`s cancellation and
KBI handling. The details are hashed out more thoroughly in
python-trio/trio#1155. Maybe we can get a fix in trio so things are
solved under our feet :)
The channel server should be torn down *before* the rpc
task/service nursery. Do this explicitly even in the root's main task
to avoid a strange hang I found in the pubsub tests. Start dropping
the `warnings.warn()` usage.
Add `Actor._cancel_called` and `._cancel_complete` making it possible to
determine whether the actor has started the cancellation sequence and
whether that sequence has fully completed. This allows for blocking in
internal machinery tasks as necessary. Also, always trigger the end of
ongoing rpc tasks even if the last task errors; there's no guarantee the
trio cancellation semantics will guarantee us a nice internal "state"
without this.
Every subactor in the tree now receives the socket (or whatever the
mailbox type ends up being) during startup and can call the new
`tractor._discovery.get_root()` function to get a portal to the current
root actor in their tree. The main reason for adding this atm is to
support nested child actors gaining access to the root's tty lock for
debugging.
Also, when a channel disconnects from a message loop, might as well kill
all its rpc tasks.
This is needed in order to avoid the deadlock condition where
a child actor is waiting on the root actor's tty lock but it's parent
(possibly the root) is waiting on it to terminate after sending a cancel
request. The solution is simple: create a cancel scope around the
request in the child and always cancel it when a cancel request from the
parent arrives.
This aids with tearing down resources **after** the crash handling and
debugger have completed. Leaving this internal for now but should
eventually get a public convenience function like
`tractor.context_stack()`.
This is the first step in addressing #113 and the initial support
of #130. Basically this allows (sub)processes to engage the `pdbpp`
debug machinery which read/writes the root actor's tty but only in
a FIFO semaphored way such that no two processes are using it
simultaneously. That means you can have multiple actors enter a trace or
crash and run the debugger in a sensible way without clobbering each
other's access to stdio. It required adding some "tear down hooks" to
a custom `pdbpp.Pdb` type such that we release a child's lock on the
parent on debugger exit (in this case when either of the "continue" or
"quit" commands are issued to the debugger console).
There's some code left commented in anticipation of full support for
issue #130 where we're need to actually capture and feed stdin to the
target (remote) actor which won't necessarily being running on the same
host.
Allow entering and attaching to a `pdb` instance in a child process.
The current hackery is to have the child make an rpc to the parent and
ask it to hijack stdin, once complete the child enters a `pdb` blocking
method. The parent then relays all stdin input to the child thus
controlling the "remote" debugger.
A few things were added to accomplish this:
- tracking the mapping of subactors to their parent nurseries
- in the root actor, cancelling all nurseries under the root `trio` task
on cancellation (i.e. `Actor.cancel()`)
- pass a "runtime vars" map down the actor tree for propagating global state
In an effort acquire more deterministic actor cancellation,
this adds a clearer and more resilient (whilst possibly a bit
slower) internal nursery structure with explicit semantics for
clarifying the task-scope shutdown sequence.
Namely, on cancellation, the explicit steps are now:
- cancel all currently running rpc tasks and wait
for them to complete
- cancel the channel server and wait for it to complete
- cancel the msg loop for the channel with the immediate parent
- de-register with arbiter if possible
- wait on remaining connections to release
- exit process
To accomplish this add a new nursery called the "service nursery" which
spawns all rpc tasks **instead of using** the "root nursery". The root
is now used solely for async launching the msg loop for the primary
channel with the parent such that it is (nearly) the last thing torn
down on cancellation.
In the future it should also be possible to have `self.cancel()` return
a result to the parent once the runtime is sure that the rest of the
shutdown is atomic; this would allow for a true unbounded shield in
`Portal.cancel_actor()`. This will likely require that the error
handling blocks in `Actor._async_main()` are moved "inside" the root
nursery block such that the msg loop with the parent truly is the last
thing to terminate.
In order to have reliable subactor startup we need the following
sequence to take place:
- connect to the parent actor, handshake and receive runtime state
- load exposed modules into memory
- start the channel server up fully using the provided bind address
- finally, start processing new messages from the parent
Add a bunch more comments to clarify all this.
Instead of hackery trying to map modules manually from the filesystem
let Python do all the work by simply copying what ``multiprocessing``
does to "fixup the __main__ module" in spawned subprocesses. The new
private module ``_mp_fixup_main.py`` is simply cherry picked code from
``multiprocessing.spawn`` which does just that. We only need these
"fixups" when using a backend other then ``multiprocessing``; for
now just when using ``trio_run_in_process``.
Thanks to @salotz for pointing out that the first example in the docs
was broken. Though it's somewhat embarrassing this might also explain
the problem in #79 and certain issues in #59...
The solution here is to import the target RPC module using the its
unique basename and absolute filepath in the sub-actor that requires it.
Special handling for `__main__` and `__mp_main__` is needed since the
spawned subprocess will have no knowledge about these parent-
-state-specific module variables. Solution: map the modules name to the
respective module file basename in the child process since the module
variables will of course have different values in children.
Set `trio-run-in-process` as the default on *nix systems and
`multiprocessing`'s spawn method on Windows. Enable overriding the
default choice using `tractor._spawn.try_set_start_method()`. Allows
for easy runs of the test suite using a user chosen backend.
Get a few more things working:
- fail reliably when remote module loading goes awry
- do a real hacky job of module loading using `sys.path` stuffsies
- we're still totally borked when trying to spin up and quickly cancel
a bunch of subactors...
It's a small move forward I guess.
`trio.MultiError` isn't an `Exception` (derived instead from
`BaseException`) so we have to specially catch it in the task
invocation machinery and ship it upwards (like regular errors)
since nurseries running in sub-actors can raise them.
Add `@tractor.stream` which must be used to denote non async generator
streaming functions which use the `tractor.Context` API to push values.
This enforces a more explicit denotation as well as allows enforcing the
declaration of the `ctx` argument in definitions.
This begins moving toward explicitly decorated "streaming functions"
instead of checking for a `ctx` arg in the signature.
- provide each context with its task's top level `trio.CancelScope`
such that tasks can cancel themselves explictly if needed via calling
`Context.cancel_scope()`
- make `Actor.cancel_task()` a private method (`_cancel_task()`) and
handle remote rpc calls specially such that the caller does not need
to provide the `chan` argument; non-primitive types can't be passed on
the wire and we don't want the client actor be require knowledge of
the channel instance the request is associated with. This also ties into
how we're tracking tasks right now (`Actor._rpc_tasks` is keyed by the
call id, a UUID, *plus* the channel).
- make `_do_handshake` a private actor method
- use UUID version 4
Add full support for using the "spawn" process starting method as per:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
Add a `spawn_method` argument to `tractor.run()` for specifying the
desired method explicitly. By default use the "fastest" method available.
On *nix systems this is the original "forkserver" method.
This should be the solution to getting windows support!
Resolves#60
As mentioned in prior commits there's currently a bug in Python that
make async gens **not** task safe. Since this is the core cause of almost
all recent problems, instead implement our own async iterator derivative of
`trio.abc.ReceiveChannel` by wrapping a `trio._channel.MemoryReceiveChannel`.
This fits more natively with the memory channel API in ``trio`` and adds
potentially more flexibility for possible bidirectional inter-actor streaming
in the future.
Huge thanks to @oremanj and of course @njsmith for guidance on this one!
Enable cancelling specific tasks from a peer actor such that when
a actor task or the actor itself is cancelled, remotely spawned tasks
can also be cancelled. In much that same way that you'd expect a node
(task) in the `trio` task tree to cancel any subtasks, actors should
be able to cancel any tasks they spawn in separate processes.
To enable this:
- track rpc tasks in a flat dict keyed by (chan, cid)
- store a `is_complete` event to enable waiting on specific
tasks to complete
- allow for shielding the msg loop inside an internal cancel scope
if requested by the caller; there was an issue with `open_portal()`
where the channel would be torn down because the current task was
cancelled but we still need messaging to continue until the portal
block is exited
- throw an error if the arbiter tries to find itself for now
Instead of chan/cid, whenever a remote function defines a `ctx` argument
name deliver a `Context` instance to the function. This allows remote
funcs to provide async generator like streaming replies (and maybe more
later).
Additionally,
- load actor modules *after* establishing a connection to the spawning
parent to avoid crashing before the error can be reported upwards
- fix a bug to do with unpacking and raising local internal actor errors
from received messages
RPC module/function lookups should not cause the target actor to crash.
This change instead ships the error back to the calling actor allowing
for the remote actor to continue running depending on the caller's
error handling logic. Adds a new `ModuleNotExposed` error to accommodate.
I'm not sure how this ever worked but when a "fake" async gen
(i.e. function with special `chan`, `cid` kwargs) is completed
we need to signal the end of the stream just like with normal
async gens. Also don't fail when trying to remove tasks that were
never tracked.
Fixes#46
Use the new custom error types throughout the actor and portal
primitives and set a few new rules:
- internal errors are any error not raised by an rpc task and are
**not** forwarded to portals but instead are raised directly in
the msg loop.
- portals always re-raise a "main task" error for every call to
``Portal.result()``.
When an actor has already been registered with the arbiter it should
exist in the registry and thus the wait event should have been removed.
Check that the registry indeed holds an event before clearing it.
This is purely for documentation purposes for now as it should be
obvious a bunch of the signatures aren't using the correct "generics"
syntax (i.e. the use of `(str, int)` instead of `typing.Tuple[str, int])`)
in a bunch of places. We're also not using a type checker yet and besides,
`trio` doesn't really expose a lot of its internal types very well.
2SQASH
This ensures that internal errors received from a remote actor are
indeed raised even in the `MainProcess` **before** comms tasks are
cancelled. Internal error in this case means any error packet received
on a channel that doesn't have a `cid` header. RPC errors (which **do**
have a `cid` header) are still forwarded to the consuming caller as usual.
If an internal error is bubbled up from some sub-actor throw that error
into the `MainProcess` "main" async function / coro in order to trigger
nursery teardowns (i.e. cancellations) that need to be done.
I'll likely change this shortly back to where we run a "main task"
inside `actor._async_main()`...
Allows for waiting on another actor (by name) to register with the
arbiter. This makes synchronized actor spawning and consecutive task
coordination easier to accomplish from within sub-actors.
Resolves#31
This allows for registering more then one actor with the same "name"
when you have multiple actors fulfilling the same role. Eventually
we'll need support for looking up all actors registered under a given
"service name" (or whatever we decide to call it).
Also, a fix to the arbiter such that each new instance refers to a
separate `_registry` dict (found an issue with duplicate names during
testing).
Resolves#7
Start a forkserver once in the main (parent-most) process
and pass ipc info (fds) to subprocesses manually such that embedded
calls to `multiprocessing.Process.start()` just work. Note that this
relies on our overridden version of the stdlib's
`multiprocessing.forkserver` module.
Resolves#6