This commit obviously denotes a re-license of all applicable parts of
the code base. Acknowledgement of this change was completed in #274 by
the majority of the current set of contributors. From here henceforth
all changes will be AGPL licensed and distributed. This is purely an
effort to maintain the same copy-left policy whilst closing the
(perceived) SaaS loophole the GPL allows for. It is merely for this
loophole: to avoid code hiding by any potential "network providers" who
are attempting to use the project to make a profit without either
compensating the authors or re-distributing their changes.
I thought quite a bit about this change and can't see a reason not to
close the SaaS loophole in our current license. We still are (hard)
copy-left and I plan to keep the code base this way for a couple
reasons:
- The code base produces income/profit through parent projects and is
demonstrably of high value.
- I believe firms should not get free lunch for the sake of
"contributions from their employees" or "usage as a service" which
I have found to be a dubious argument at best.
- If a firm who intends to profit from the code base wants to use it
they can propose a secondary commercial license to purchase with the
proceeds going to the project's authors under some form of well
defined contract.
- Many successful projects like Qt use this model; I see no reason it
can't work in this case until such a time as the authors feel it
should be loosened.
There has been detailed discussion in #103 on licensing alternatives.
The main point of this AGPL change is to protect the code base for the
time being from exploitation while it grows and as we move into the next
phase of development which will include extension into the multi-host
distributed software space.
This more formally declares the runtime's remote task startingn API
and uses it throughout all the dependent `Portal` API methods.
Allows dropping `Portal._submit()` and simplifying `.run_in_actor()`
style result waiting to be delegated to the context APIs at remote
task `return` response time. We now also track the remote entrypoint
"type` as `Context._remote_func_type`.
Instead of tracking feeder mem chans per RPC dialog, store `Context`
instances which (now) hold refs to the underlying RPC-task feeder chans
and track them inside a `Actor._contexts` map. This begins a transition
to making the "context" idea the primitive abstraction for representing
messaging dialogs between tasks in different memory domains (i.e.
usually separate processes).
A slew of changes made this possible:
- change `Actor.get_memchans()` -> `.get_context()`.
- Add new `Context._send_chan` and `._recv_chan` vars.
- implicitly create a new context on every `Actor.send_cmd()` call.
- use the context created by `.send_cmd()` in `Portal.open_context()`
instead of manually creating one.
- call `Actor.get_context()` inside tasks run from `._invoke()`
such that feeder chans are implicitly created for callee tasks
thus fixing the bug #265.
NB: We might change some of the internal semantics to do with *when* the
feeder chans are actually created to denote whether or not a far end
task is actually *read to receive* messages. For example, in the cases
where it **never** will be ready to receive messages (one-way streaming,
a context that never opens a stream, etc.) we will likely want some kind
of error or at least warning to the caller that messages can't be sent
(yet).
We don't need to any more presuming you get ideal remote cancellation
conditions where the remote actor should teardown and kill the streams
from its end.
Thanks to @richardsheridan for pointing out the limitations of using
*any* kind of value as the result-cached-flag and how it might cause
problems for anyone returning pickled blob-data. This changes the
`Portal` internal result value tracking to stash the full message from
which the value can be retrieved by any `Portal.result()` caller.
The internal change is that `Portal._return_once()` now returns a tuple
of the message *and* its value.
Fixes the issue where if the main remote task returns `None`,
`Portal.result()` would erroneously wait again on the underlying feeder
mem chan since `None` was being used as the cache flag. Instead set the
flag as the channel uid and consider the result collected when set to
anything else (since it would be odd to return that value from a remote
task when you already can read it as part of portal/channel apis).
Now that we're on our way to a (somewhat) serious beta release I think
it's about time to start de-noising the logging emissions. Since we're
trying out this approach of "stack layer oriented" log levels, I figured
this is a good time to move most of the "warnings" to what they should
be: cancellation monitoring status messages. The level is set to 16
which is just above our "runtime" level but just below the traditional
"info" level. I think this will be a decent approach since usually if
you're confused about why your `tractor` app is behaving unlike you
expect, it's 90% of the time going to be to do with cancellation or
error propagation. This this setup a user can specify the 'cancel' level
and see all the msgs pertaining to both actor and task-in-actor
cancellation mechanics.
Revert this change since it really is poking at internals and doesn't
make a lot of sense. If the context is going to be cancelled then the
msg loop will tear down the feed memory channel when ready, we don't
need to be clobbering it and confusing the runtime machinery lol.
Add clear teardown semantics for `Context` such that the remote side
cancellation propagation happens only on error or if client code
explicitly requests it (either by exit flag to `Portal.open_context()`
or by manually calling `Context.cancel()`). Add `Context.result()`
to wait on and capture the final result from a remote context function;
any lingering msg sequence will be consumed/discarded.
Changes in order to make this possible:
- pass the runtime msg loop's feeder receive channel in to the context
on the calling (portal opening) side such that a final 'return' msg
can be waited upon using `Context.result()` which delivers the final
return value from the callee side `@tractor.context` async function.
- always await a final result from the target context function in
`Portal.open_context()`'s `__aexit__()` if the context has not
been (requested to be) cancelled by client code on block exit.
- add an internal `Context._cancel_called` for context "cancel
requested" tracking (much like `trio`'s cancel scope).
- allow flagging a stream as terminated using an internal
`._eoc` flag which will mark the stream as stopped for iteration.
- drop `StopAsyncIteration` catching in `.receive()`; it does
nothing.
This mostly adds the api described in
https://github.com/goodboy/tractor/issues/53#issuecomment-806258798
The first draft summary:
- formalize bidir steaming using the `trio.Channel` style interface
which we derive as a `MsgStream` type.
- add `Portal.open_context()` which provides a `trio.Nursery.start()`
remote task invocation style for setting up and tearing down tasks
contexts in remote actors.
- add a distinct `'started'` message to the ipc protocol to facilitate
`Context.start()` with a first return value.
- for our `ReceiveMsgStream` type, don't cancel the remote task in
`.aclose()`; this is now done explicitly by the surrounding `Context`
usage: `Context.cancel()`.
- streams in either direction still use a `'yield'` message keeping the
proto mostly symmetric without having to worry about which side is the
caller / portal opener.
- subtlety: only allow sending a `'stop'` message during a 2-way
streaming context from `ReceiveStream.aclose()`, detailed comment
with explanation is included.
Relates to #53
NB: this is a breaking change removing support for `Portal.run()` being
able to invoke remote streaming functions and instead replacing the
method call with an async context manager api `Portal.open_stream_from()`
This style explicitly defines stream teardown at the call site instead
of expecting the user to handle tricky things correctly themselves: eg.
`async_geneartor.aclosing()`. Going forward `Portal.run()` can be used
only for invoking async functions.
Move receive stream into streaming modules and rebrand as a "message
stream". Factor out cancellation mechanics in `.aclose()` into the
`Context` type which will soon provide the api for for cancelling portal
invocations. Comment-stage a few methods on both types in anticipation
of a new bi-directional streaming api. Add a `MsgStream` bidirectional
channel type which will be the eventual type yielded from
`Context.open_stream()`. Adjust the response/dialog types to be the set
`{'asyncfun', 'asyncgen', 'context'}`. OH, and add async func checking
in `Portal.run()` to catch and error on sync funcs early.
It turns out in order to maintain our sneaky little "call an `Actor`
method in this remote process" we still need the ability to invoke
functions from a namespace. We're currently using a "self" namespace as
a way to do this for internal inter-process method calling. Either way,
I see no reason not to keep a public method for this invoke style (we
just won't market it) since it is still how the machinery works
underneath.
This resolves and completes #69 allowing all RPC invocation APIs to pass
function references directly instead of explicit `str` names for the
target namespace and function (this is still done implicitly
underneath). This brings us closer to `trio`'s task running API as well
as acknowledges that any inter-host RPC system (and API) will likely
need to be implemented on top of local RPC primitives anyway. Even if
this ends up **not** being true we can always go to "function stubs" as
part of our IAC protocol or, add a new method to do explicit namespace
calls: `.run_from_module()` or whatever everyone votes on.
Resolves#69
Further, this commit drops `Actor.statespace` from the entire system
since a user can easily get this same functionality using module
level variables. Fix docs to match all these changes (luckily mostly
already done due to example scripts referencing).
Add a ``tractor._portal.StreamReceiveChannel.shield_channel()`` context
manager which allows for avoiding the closing of an IPC stream's
underlying channel for the purposes of task re-spawning. Sometimes you
might want to cancel a task consuming a stream but not tear down the IPC
between actors (the default). A common use can might be where the task's
"setup" work might need to be redone but you want to keep the
established portal / channel in tact despite the task restart.
Includes a test.
This begins moving toward explicitly decorated "streaming functions"
instead of checking for a `ctx` arg in the signature.
- provide each context with its task's top level `trio.CancelScope`
such that tasks can cancel themselves explictly if needed via calling
`Context.cancel_scope()`
- make `Actor.cancel_task()` a private method (`_cancel_task()`) and
handle remote rpc calls specially such that the caller does not need
to provide the `chan` argument; non-primitive types can't be passed on
the wire and we don't want the client actor be require knowledge of
the channel instance the request is associated with. This also ties into
how we're tracking tasks right now (`Actor._rpc_tasks` is keyed by the
call id, a UUID, *plus* the channel).
- make `_do_handshake` a private actor method
- use UUID version 4
As mentioned in prior commits there's currently a bug in Python that
make async gens **not** task safe. Since this is the core cause of almost
all recent problems, instead implement our own async iterator derivative of
`trio.abc.ReceiveChannel` by wrapping a `trio._channel.MemoryReceiveChannel`.
This fits more natively with the memory channel API in ``trio`` and adds
potentially more flexibility for possible bidirectional inter-actor streaming
in the future.
Huge thanks to @oremanj and of course @njsmith for guidance on this one!
For now stop `.aclose()`-ing all async gens on portal close since it can
cause hangs and other weird behaviour if another task operates on the
same instance.
See https://bugs.python.org/issue32526.
Turns out you get a bad situation if the target actor who's task you're
trying to cancel has already died (eg. from an external
`KeyboardInterrupt` or other error) and so we need to eventually bail on
the RPC request. Also don't bother closing the channel created in
`open_portal()` manually since the cancel scope should take care of all
that.
Use the new `Actor.cancel_task()` api to remotely cancel streaming
tasks spawned by a portal. This guarantees that if an actor is
cancelled all its (remote) portal spawned tasks will be as well.
On portal teardown only cancel all async
generator calls (though we should cancel all RPC requests in general
eventually) and don't close the channel since it may have been passed
in from some other context that wishes to keep it connected. In
`open_portal()` run the message loop shielded so that if the local
task is cancelled, messaging will continue until the internal scope
is cancelled at end of block.
Use the new custom error types throughout the actor and portal
primitives and set a few new rules:
- internal errors are any error not raised by an rpc task and are
**not** forwarded to portals but instead are raised directly in
the msg loop.
- portals always re-raise a "main task" error for every call to
``Portal.result()``.
This is purely for documentation purposes for now as it should be
obvious a bunch of the signatures aren't using the correct "generics"
syntax (i.e. the use of `(str, int)` instead of `typing.Tuple[str, int])`)
in a bunch of places. We're also not using a type checker yet and besides,
`trio` doesn't really expose a lot of its internal types very well.
2SQASH
This ensures that internal errors received from a remote actor are
indeed raised even in the `MainProcess` **before** comms tasks are
cancelled. Internal error in this case means any error packet received
on a channel that doesn't have a `cid` header. RPC errors (which **do**
have a `cid` header) are still forwarded to the consuming caller as usual.
Start a forkserver once in the main (parent-most) process
and pass ipc info (fds) to subprocesses manually such that embedded
calls to `multiprocessing.Process.start()` just work. Note that this
relies on our overridden version of the stdlib's
`multiprocessing.forkserver` module.
Resolves#6
Stop worrying about a "main task" in each actor and instead add an
additional `ActorNursery.run_in_actor()` method which wraps calls
to create an actor and run a lone RPC task inside it. Note this
adjusts the public API of `ActorNursery.start_actor()` to drop
its `main` kwarg.
The dirty deats of making this possible:
- each spawned RPC task is now tracked with a specific cancel scope such
that when the actor is cancelled all ongoing responders are cancelled
before any IPC/channel machinery is closed (turns out that spawning
new actors from `outlive_main=True` actors was probably borked before
finally getting this working).
- make each initial RPC response be a packet which describes the
`functype` (eg. `{'functype': 'asyncfunction'}`) allowing for async
calls/submissions by client actors (this was required to make
`run_in_actor()` work - `Portal._submit()` is the new async method).
- hooray we can stop faking "main task" results for daemon actors
- add better handling/raising of internal errors caught in the bowels of
the `Actor` itself.
- drop the rpc spawning nursery; just use the `Actor._root_nursery`
- only wait on `_no_more_peers` if there are existing peer channels that
are actually still connected.
- an `ActorNursery.__aexit__()` now implicitly waits on `Portal.result()` on close
for each `run_in_actor()` spawned actor.
- handle cancelling partial started actors which haven't yet connected
back to the parent
Resolves#24