e97ed377b0

This adds remote cancellation semantics to our `tractor.Context` machinery to more closely match that of `trio.CancelScope`, but with operational differences to handle the nature of parallel tasks interoperating across multiple memory boundaries:

- if an actor task cancels some context it has opened via `Context.cancel()`, the remote (scope linked) task will be cancelled using the normal `CancelScope` semantics of `trio`, meaning the remote cancel scope surrounding the far side task is cancelled and `trio.Cancelled`s are expected to be raised in that scope as per normal `trio` operation; in the case where no error is raised in that remote scope, a `ContextCancelled` error is raised inside the runtime machinery and relayed back to the opener/caller side of the context.
- if any actor task cancels a full remote actor runtime using `Portal.cancel_actor()` the same semantics as above apply, except every other remote actor task which also has an open context with the actor which was cancelled will also be sent a `ContextCancelled` **but** with the `.canceller` field set to the uid of the original cancel-requesting actor.

This changeset also includes a more "proper" solution to the issue of "allowing overruns" during streaming without attempting to implement any form of IPC streaming backpressure. Implementing task-granularity backpressure cross-process turns out to be more or less impossible without augmenting our streaming protocol (likely at the cost of performance). Further, allowing overruns requires special care since any blocking of the runtime's RPC msg loop task can effectively block control msgs such as cancels and stream terminations.

The implementation details per abstraction layer are as follows.

._streaming.Context:
- add a new constructor factory func `mk_context()` which provides a strictly private init-er whilst allowing us to not have to define an `.__init__()` on the type def.
- add public `.cancel_called` and `.cancel_called_remote` properties.
- general rename of what was the internal `._backpressure` var to `._allow_overruns: bool`.
- move the old contents of `Actor._push_result()` into a new `._deliver_msg()` allowing for better encapsulation of per-ctx msg handling.
- always check for received 'error' msgs and process them with the new `_maybe_cancel_and_set_remote_error()` **before** any msg delivery to the local task, thus guaranteeing error and cancellation handling despite any overflow handling.
- add a new `._drain_overflows()` task-method for use with the new `._allow_overruns: bool = True` mode.
- add back a `._scope_nursery: trio.Nursery` (allocated in `Portal.open_context()`) whose sole purpose is to spawn a single task which runs the above method; anything else is an error.
- augment `._deliver_msg()` to start a task and run the above method when operating in the new `._allow_overruns` mode; the task queues overflow msgs and attempts to send them to the underlying mem chan using a blocking `.send()` call.
- on context exit, any existing "drainer task" will be cancelled and remaining overflow queued msgs are discarded with a warning.
- rename `._error` -> `._remote_error` and set it in a new method `_maybe_cancel_and_set_remote_error()` which is called before delivery processing.
- adjust `.result()` to always call `._maybe_raise_remote_err()` at its start such that whenever a `ContextCancelled` arrives we decide whether to immediately raise that error or ignore it (because the current actor is the one who requested the cancel) by checking the error's `.canceller` field.
- set the default value of `._result` to be `id(Context())` thus avoiding conflict with any `.result()` actually being `False`.

._runtime.Actor:
- augment `.cancel()`, `._cancel_task()` and `.cancel_rpc_tasks()` to take a `requesting_uid: tuple` indicating the source actor of every cancellation request.
- pass the new `Context._allow_overruns` through `.get_context()`.
- call the new `Context._deliver_msg()` from `._push_result()` (since that method's contents were factored out).

._runtime._invoke:
- make `TaskStatus.started()` deliver back a `Context` (unless an error is raised) instead of the cancel scope, to make it easy to set/get state on that context for the purposes of cancellation and remote error relay.
- always raise any remote error via `Context._maybe_raise_remote_err()` before doing any `ContextCancelled` logic.
- assign any `Context._cancel_called_remote` set by the `requesting_uid` cancel methods (mentioned above) to the `ContextCancelled.canceller`.

._runtime.process_messages:
- always pass a `requesting_uid: tuple` to `Actor.cancel()` and `._cancel_task()` so that any corresponding `ContextCancelled.canceller` can be set inside `._invoke()`.
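As a quick opener-side illustration of the semantics above, here's a minimal, hypothetical sketch (not part of this changeset; the `sleeper()` func and actor name are invented):

import tractor
import trio


@tractor.context
async def sleeper(ctx: tractor.Context) -> None:
    # hypothetical remote task which just idles; a `Context.cancel()`
    # from the opener cancels the `trio.CancelScope` surrounding it.
    await ctx.started()
    await trio.sleep_forever()


async def main():
    async with tractor.open_nursery() as n:
        portal = await n.start_actor('sleepy', enable_modules=[__name__])
        try:
            async with portal.open_context(sleeper) as (ctx, first):
                # request cancellation of the far side task
                await ctx.cancel()
        except tractor.ContextCancelled as err:
            # `.canceller` (new in this changeset) is the uid of the
            # actor which requested the cancel; a self-requested
            # cancel like the one above may instead be swallowed by
            # the `.result()` logic described earlier.
            print(f'cancelled by {err.canceller}')

        # explicitly teardown the daemon actor
        await portal.cancel_actor()


if __name__ == '__main__':
    trio.run(main)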
docs/README.rst
tractor: next-gen Python parallelism
tractor is a structured concurrent, multi-processing runtime built on trio.
Fundamentally tractor gives you parallelism via trio-"actors": our nurseries let you spawn new Python processes which each run a trio scheduled runtime - a call to trio.run().
We believe the system adheres to the 3 axioms of an "actor model" but likely does not look like what you probably think an "actor model" looks like, and that's intentional.
The first step to grok tractor is to get the basics of trio down. A great place to start is the trio docs and this blog post.
Features
- It's just a trio API
- Infinitely nestable process trees
- Builtin IPC streaming APIs with task fan-out broadcasting
- A (first ever?) "native" multi-core debugger UX for Python using pdb++
- Support for a swappable, OS-specific, process spawning layer
- A modular transport stack, allowing for custom serialization (e.g. with msgspec), communications protocols, and environment-specific IPC primitives
- Support for spawning process-level-SC, inter-loop one-to-one-task oriented asyncio actors via "infected asyncio" mode
- structured chadcurrency from the ground up
Run a func in a process
Use trio's style of focussing on tasks as functions:
"""
Run with a process monitor from a terminal using::
$TERM -e watch -n 0.1 "pstree -a $$" \
& python examples/parallelism/single_func.py \
&& kill $!
"""
import os
import tractor
import trio
async def burn_cpu():
= os.getpid()
pid
# burn a core @ ~ 50kHz
for _ in range(50000):
await trio.sleep(1/50000/50)
return os.getpid()
async def main():
async with tractor.open_nursery() as n:
= await n.run_in_actor(burn_cpu)
portal
# burn rubber in the parent too
await burn_cpu()
# wait on result from target function
= await portal.result()
pid
# end of nursery block
print(f"Collected subproc {pid}")
if __name__ == '__main__':
trio.run(main)
This runs burn_cpu() in a new process and reaps it on completion of the nursery block.
If you only need to run a sync function and retrieve a single result, you might want to check out trio-parallel.
Zombie safe: self-destruct a process tree
tractor tries to protect you from zombies, no matter what.
"""
Run with a process monitor from a terminal using::
$TERM -e watch -n 0.1 "pstree -a $$" \
& python examples/parallelism/we_are_processes.py \
&& kill $!
"""
from multiprocessing import cpu_count
import os
import tractor
import trio
async def target():
print(
f"Yo, i'm '{tractor.current_actor().name}' "
f"running in pid {os.getpid()}"
)
await trio.sleep_forever()
async def main():
async with tractor.open_nursery() as n:
for i in range(cpu_count()):
await n.run_in_actor(target, name=f'worker_{i}')
print('This process tree will self-destruct in 1 sec...')
await trio.sleep(1)
# raise an error in root actor/process and trigger
# reaping of all minions
raise Exception('Self Destructed')
if __name__ == '__main__':
try:
trio.run(main)except Exception:
print('Zombies Contained')
If you can create zombie child processes (without using a system signal) it is a bug.
"Native" multi-process debugging
Using the magic of pdb++ and our internal IPC, we've been able to create a native feeling debugging experience for any (sub-)process in your tractor
tree.
from os import getpid

import tractor
import trio


async def breakpoint_forever():
    "Indefinitely re-enter debugger in child actor."
    while True:
        yield 'yo'
        await tractor.breakpoint()


async def name_error():
    "Raise a ``NameError``"
    getattr(doggypants)


async def main():
    """Test breakpoint in a streaming actor.
    """
    async with tractor.open_nursery(
        debug_mode=True,
        loglevel='error',
    ) as n:

        p0 = await n.start_actor('bp_forever', enable_modules=[__name__])
        p1 = await n.start_actor('name_error', enable_modules=[__name__])

        # retrieve results
        stream = await p0.run(breakpoint_forever)
        await p1.run(name_error)


if __name__ == '__main__':
    trio.run(main)
You can run this with:
>>> python examples/debugging/multi_daemon_subactors.py
And, yes, there's a built-in crash handling mode B)
We're hoping to add a respawn-from-repl system soon!
SC compatible bi-directional streaming
Yes, you saw it here first; we provide 2-way streams with reliable, transitive setup/teardown semantics.
Our nascent api is reminiscent of trio.Nursery.start() style invocation:
import trio
import tractor


@tractor.context
async def simple_rpc(

    ctx: tractor.Context,
    data: int,

) -> None:
    '''Test a small ping-pong 2-way streaming server.

    '''
    # signal to parent that we're up much like
    # ``trio_typing.TaskStatus.started()``
    await ctx.started(data + 1)

    async with ctx.open_stream() as stream:

        count = 0
        async for msg in stream:

            assert msg == 'ping'
            await stream.send('pong')
            count += 1

        else:
            assert count == 10


async def main() -> None:

    async with tractor.open_nursery() as n:

        portal = await n.start_actor(
            'rpc_server',
            enable_modules=[__name__],
        )

        # XXX: this syntax requires py3.9
        async with (

            portal.open_context(
                simple_rpc,
                data=10,
            ) as (ctx, sent),

            ctx.open_stream() as stream,
        ):

            assert sent == 11

            count = 0
            # receive msgs using async for style
            await stream.send('ping')

            async for msg in stream:
                assert msg == 'pong'
                await stream.send('ping')
                count += 1

                if count >= 9:
                    break

        # explicitly teardown the daemon-actor
        await portal.cancel_actor()


if __name__ == '__main__':
    trio.run(main)
See original proposal and discussion in #53 as well as follow up improvements in #223 that we'd love to hear your thoughts on!
Worker poolz are easy peasy
The initial ask from most new users is "how do I make a worker pool thing?".
tractor is built to handle any SC (structured concurrent) process tree you can imagine; a "worker pool" pattern is a trivial special case.
We have a full worker pool re-implementation of the std-lib's concurrent.futures.ProcessPoolExecutor example for reference.
You can run it like so (from this dir) to see the process tree in real time:

$TERM -e watch -n 0.1 "pstree -a $$" \
    & python examples/parallelism/concurrent_actors_primes.py \
    && kill $!
This uses no extra threads, fancy semaphores or futures; all we need is tractor's IPC!
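If you just want the gist before diving into the full example, here's a minimal, hypothetical flat "pool" sketch (the toy is_prime() workload and all names are invented) built from just the run_in_actor() API shown above:

import tractor
import trio


async def is_prime(n: int) -> bool:
    # hypothetical toy workload; the repo's example also checks
    # primes but batches the work across actors.
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))


async def main():
    async with tractor.open_nursery() as tn:
        # one subprocess actor per input; extra kwargs are relayed
        # to the target func
        portals = [
            await tn.run_in_actor(is_prime, n=num)
            for num in (10_000_019, 10_000_033, 10_000_103)
        ]
        # harvest results; the nursery block reaps each proc on exit
        print([await p.result() for p in portals])


if __name__ == '__main__':
    trio.run(main)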
"Infected asyncio
" mode
Have a bunch of asyncio
code you want to force to be SC at the process level?
Check out our experimental system for guest-mode controlled asyncio
actors:
import asyncio
from statistics import mean
import time

import trio
import tractor


async def aio_echo_server(
    to_trio: trio.MemorySendChannel,
    from_trio: asyncio.Queue,
) -> None:

    # a first message must be sent **from** this ``asyncio``
    # task or the ``trio`` side will never unblock from
    # ``tractor.to_asyncio.open_channel_from():``
    to_trio.send_nowait('start')

    # XXX: this uses an ``from_trio: asyncio.Queue`` currently but we
    # should probably offer something better.
    while True:
        # echo the msg back
        to_trio.send_nowait(await from_trio.get())
        await asyncio.sleep(0)


@tractor.context
async def trio_to_aio_echo_server(
    ctx: tractor.Context,
):
    # this will block until the ``asyncio`` task sends a "first"
    # message.
    async with tractor.to_asyncio.open_channel_from(
        aio_echo_server,
    ) as (first, chan):

        assert first == 'start'
        await ctx.started(first)

        async with ctx.open_stream() as stream:

            async for msg in stream:
                await chan.send(msg)

                out = await chan.receive()
                # echo back to parent actor-task
                await stream.send(out)


async def main():

    async with tractor.open_nursery() as n:
        p = await n.start_actor(
            'aio_server',
            enable_modules=[__name__],
            infect_asyncio=True,
        )
        async with p.open_context(
            trio_to_aio_echo_server,
        ) as (ctx, first):

            assert first == 'start'

            count = 0
            async with ctx.open_stream() as stream:

                delays = []
                send = time.time()

                await stream.send(count)
                async for msg in stream:
                    recv = time.time()
                    delays.append(recv - send)
                    assert msg == count
                    count += 1
                    send = time.time()
                    await stream.send(count)

                    if count >= 1e3:
                        break

        print(f'mean round trip rate (Hz): {1/mean(delays)}')
        await p.cancel_actor()


if __name__ == '__main__':
    trio.run(main)
Yes, we spawn a python process, run asyncio, start trio on the asyncio loop, then send commands to the trio scheduled tasks to tell asyncio tasks what to do XD
We need help refining the asyncio-side channel API to be more trio-like. Feel free to sling your opinion in #273!
Higher level "cluster" APIs
To be extra terse, the tractor devs have started hacking some "higher level" APIs for managing actor trees/clusters. These interfaces should generally be considered provisional for now but we encourage you to try them and provide feedback. Here's a new API that lets you quickly spawn a flat cluster:
import trio
import tractor


async def sleepy_jane():
    uid = tractor.current_actor().uid
    print(f'Yo i am actor {uid}')
    await trio.sleep_forever()


async def main():
    '''
    Spawn a flat actor cluster, with one process per
    detected core.

    '''
    portal_map: dict[str, tractor.Portal]
    results: dict[str, str]

    # look at this hip new syntax!
    async with (

        tractor.open_actor_cluster(
            modules=[__name__]
        ) as portal_map,

        trio.open_nursery() as n,
    ):

        for (name, portal) in portal_map.items():
            n.start_soon(portal.run, sleepy_jane)

        await trio.sleep(0.5)

        # kill the cluster with a cancel
        raise KeyboardInterrupt


if __name__ == '__main__':
    try:
        trio.run(main)
    except KeyboardInterrupt:
        pass
Install
From PyPi:
pip install tractor
From git:
pip install git+git://github.com/goodboy/tractor.git
Under the hood
tractor is an attempt to pair trionic structured concurrency with distributed Python. You can think of it as a trio-across-processes or simply as an opinionated replacement for the stdlib's multiprocessing but built on async programming primitives from the ground up.
Don't be scared off by this description. tractor is just trio but with nurseries for process management and cancel-able streaming IPC. If you understand how to work with trio, tractor will give you the parallelism you may have been needing.
Wait, huh?! I thought "actors" have messages, and mailboxes and stuff?!
Let's stop and ask how many canon actor model papers have you actually read ;)
From our experience many "actor systems" aren't really "actor models" since they don't adhere to the 3 axioms and pay even less attention to the problem of unbounded non-determinism (which was the whole point for creation of the model in the first place).
From the author's mouth, the only thing required is adherence to the 3 axioms, and that's it.
tractor adheres to said base requirements of an "actor model":
In response to a message, an actor may:
- send a finite number of new messages
- create a finite number of new actors
- designate a new behavior to process subsequent messages
and requires no further api changes to accomplish this. A loose mapping onto tractor primitives is sketched below.
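For a concrete (if contrived) version of that mapping, here's a hypothetical sketch, with all names invented, using only the tractor primitives shown throughout this readme:

import tractor


async def child() -> None:
    # a trivially finite new actor's task
    return None


@tractor.context
async def reactive(ctx: tractor.Context) -> None:
    await ctx.started()
    shout = False  # the current "behavior" state

    async with ctx.open_stream() as stream:
        async for msg in stream:

            # axiom 1: send a finite number of new messages
            await stream.send(msg.upper() if shout else msg)

            # axiom 2: create a finite number of new actors
            if msg == 'spawn':
                async with tractor.open_nursery() as n:
                    await n.run_in_actor(child)

            # axiom 3: designate a new behavior to process
            # subsequent messages
            if msg == 'shout':
                shout = True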
If you want to debate this further please feel free to chime in on our chat or discuss on one of the following issues after you've read everything in them:
Let's clarify our parlance
Whether or not tractor has "actors" underneath should be mostly irrelevant to users other than for referring to the interactions of our primary runtime primitives: each Python process + trio.run() + surrounding IPC machinery. These are our high level, base runtime-units-of-abstraction which both are (as much as they can be in Python) and will be referred to as our "actors".
The main goal of tractor is to allow for highly distributed software that, through the adherence to structured concurrency, results in systems which fail in predictable, recoverable and maybe even understandable ways; being an "actor model" is just one way to describe properties of the system.
What's on the TODO:
Help us push toward the future of distributed Python.
- Erlang-style supervisors via composed context managers (see #22)
- Typed messaging protocols (ex. via msgspec.Struct, see #36)
- Typed capability-based (dialog) protocols (see #196 with draft work started in #311)
- We recently disabled CI-testing on windows and need help getting it running again! (see #327). We do have windows support (and have for quite a while) but since no active hacker exists in the user-base to help test on that OS, for now we're not actively maintaining testing due to the added hassle and general latency.
Feel like saying hi?
This project is very much coupled to the ongoing development of trio (i.e. tractor gets most of its ideas from that brilliant community). If you want to help, have suggestions or just want to say hi, please feel free to reach us in our matrix channel. If matrix seems too hip, we're also mostly all in the trio gitter channel!