distributed structured concurrency https://github.com/goodboy/tractor


Tyler Goodlet c5985169cc Init def of "SC shuttle prot" with "msg-spec-limiting"

As per the long outstanding GH issue this starts our rigorous journey into an attempt at a type-safe, cross-actor SC, IPC protocol Bo boop -> https://github.com/goodboy/tractor/issues/36

The idea is to "formally" define our SC "shuttle (dialog) protocol" by specifying a new `.msg.types.Msg` subtype-set which can fully encapsulate all IPC msg schemas needed in order to accomplish cross-process SC!

The msg set deviated a little in terms of (type) names from the existing `dict`-msgs currently used in the runtime impl but, I think the name changes are much better in terms of explicitly representing the internal semantics of the actor runtime machinery/subsystems and the IPC-msg-dialog required for SC enforced RPC.

------ - ------

In brief, the new formal msg-spec includes the following msg-subtypes of a new top-level `Msg` boxing type (that holds the base field schema for all msgs):

- `Start` to request RPC task scheduling by passing a `FuncSpec` payload (to replace the currently used `{'cmd': ... }` dict msg impl)
- `StartAck` to allow the RPC task callee-side to report an `IpcCtxSpec` payload immediately back to the caller (currently responded naively via a `{'functype': ... }` msg)
- `Started` to deliver the first value from `Context.started()` (instead of the existing `{'started': ... }`)
- `Yield` to shuttle `MsgStream.send()`-ed values (instead of our `{'yield': ... }`)
- `Stop` to terminate a `Context.open_stream()` session/block (over `{'stop': True }`)
- `Return` to deliver the final value from the `Actor.start_remote_task()` (which is a `{'return': ... }`)
- `Error` to box `RemoteActorError` exceptions via a `.pld: ErrorData` payload, planned to replace/extend the current `RemoteActorError.msgdata` mechanism internal to `._exceptions.pack/unpack_error()`

The new `tractor.msg.types` includes all the above msg defs as well as an API for rendering a "payload type specification" using a `payload_type_spec: Union[Type]` that can be passed to `msgspec.msgpack.Decoder(type=payload_type_spec)`. This ensures that (for a subset of the above msg set) `Msg.pld: PayloadT` data is type-parameterized using `msgspec`'s new `Generic[PayloadT]` field support and thus enables an API where IPC `Context` dialogs can strictly define the allowed payload-datatype-set via type union!

Iow, this is the foundation for supporting `Channel`/`Context`/`MsgStream` IPC primitives which are type checked/safe as desired in GH issue:
- https://github.com/goodboy/tractor/issues/365

Misc notes on current impl(s) status:

------ - ------

- add a `.msg.types.mk_msg_spec()` which uses the new `msgspec` support for `class MyStruct(Struct, Generic[T])` parameterize-able fields and delivers our boxing SC-msg-(sub)set with the desired `payload_types` applied to `.pld`:
  - https://jcristharif.com/msgspec/supported-types.html#generic-types
  - as a note this impl seems to need to use `types.new_class()` dynamic subtype generation, though i don't really get why still.. but without that the `msgspec.msgpack.Decoder` doesn't seem to reject `.pld` limited `Msg` subtypes as demonstrated in the new test.
- around this ^ add a `.msg._codec.limit_msg_spec()` cm which exposes this payload type limiting API such that it can be applied per task via a `MsgCodec` in app code.
- the orig approach in https://github.com/goodboy/tractor/pull/311 was the idea of making payload fields `.pld: Raw` wherein we could have per-field/sub-msg decoders dynamically loaded depending on the particular application-layer schema in use. I don't want to lose the idea of this since I think it might be useful for an idea I have about capability-based-fields(-sharing, maybe using field-subset encryption?), and as such i've kept the (ostensibly) working impls in TODO-comments in `.msg._codec` wherein maybe we can add a `MsgCodec._payload_decs: dict` table for this later on.
  |_ also left in the `.msg.types.enc/decmsg()` impls but renamed as `enc/dec_payload()` (but reworked to not rely on the lifo codec stack tables; now removed) such that we can prolly move them to `MsgCodec` methods in the future.
- add an unused `._codec.mk_tagged_union_dec()` helper which was originally factored out of the #311 proto-code but didn't end up working as desired with the new parameterized generic fields approach (now in `msg.types.mk_msg_spec()`)

Testing/deps work:

------ - ------

- new `test_limit_msgspec()` which ensures all the `.types` content is correct but without using the wrapping APIs in `._codec`; i.e. using an in-line `Decoder` instead of a `MsgCodec`.
- pin us to `msgspec>=0.18.5` which has the needed generic-types support (which took me way too long yesterday to figure out when implementing all this XD)!

2025-03-24 14:04:49 -04:00
.github/workflows	Only run CI on py3.11	2025-03-19 15:34:30 -04:00
docs	Add rendezvous proto link	2025-03-24 13:30:12 -04:00
examples	Tweaks to debugger examples	2025-03-20 23:22:45 -04:00
nooz	Add news file	2023-05-15 09:35:59 -04:00
tests	Init def of "SC shuttle prot" with "msg-spec-limiting"	2025-03-24 14:04:49 -04:00
tractor	Init def of "SC shuttle prot" with "msg-spec-limiting"	2025-03-24 14:04:49 -04:00
.gitignore	Initial commit	2018-07-05 16:01:15 -04:00
LICENSE	Re-license code base for distribution under AGPL	2021-12-14 23:33:27 -05:00
MANIFEST.in	Include ./docs/README.rst in src dist	2022-07-11 14:25:26 -04:00
NEWS.rst	Add summary section	2022-08-03 11:42:53 -04:00
mypy.ini	Add mypy.ini lel	2020-01-21 15:28:12 -05:00
pyproject.toml	Drop `trio-typing` as dep	2025-03-23 00:33:44 -04:00
pytest.ini	Unmask `pytest.ini` log-capture lines (again)	2025-03-20 19:50:31 -04:00
ruff.toml	Disable invalid line in `ruff` config?	2025-03-21 00:18:05 -04:00
uv.lock	Drop `trio-typing` as dep	2025-03-23 00:33:44 -04:00

docs/README.rst

tractor: distributed structured concurrency

tractor is a structured concurrency (SC), multi-processing runtime built on trio.

Fundamentally, tractor provides parallelism via trio-"actors": independent Python processes (i.e. non-shared-memory threads) which can schedule trio tasks whilst maintaining end-to-end SC inside a distributed supervision tree.

Cross-process (and thus cross-host) SC is accomplished through the combined use of our:

- "actor nurseries" which provide for spawning multiple, and possibly nested, Python processes each running a trio-scheduled runtime (a call to trio.run()),
- an "SC-transitive supervision protocol" enforced as an IPC-message-spec encapsulating all RPC-dialogs.
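The message-spec half of that list can be pictured as an ordering rule over the msg types named in the latest commit message (`Start`, `StartAck`, `Started`, `Yield`, `Stop`, `Return`, `Error`). The sketch below is a stdlib-only toy, not tractor's implementation: plain dataclasses stand in for the real `msgspec`-based types, both directions of a dialog are flattened into one sequence, and the `ALLOWED_NEXT` ordering table is an assumption inferred from each msg's described role.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Msg:
    # hypothetical base msg: a dialog (context) id plus an opaque payload
    cid: str
    pld: Any = None


# Hypothetical stand-ins named after the commit's msg set; not tractor's classes.
class Start(Msg): ...
class StartAck(Msg): ...
class Started(Msg): ...
class Yield(Msg): ...
class Stop(Msg): ...
class Return(Msg): ...
class Error(Msg): ...


# ASSUMED, simplified ordering: which msg types may follow each one.
# `None` means "no msg seen yet"; `Return`/`Error` are terminal.
ALLOWED_NEXT: dict = {
    None: {Start},
    Start: {StartAck},
    StartAck: {Started, Return, Error},
    Started: {Yield, Stop, Return, Error},
    Yield: {Yield, Stop, Return, Error},
    Stop: {Stop, Return, Error},
    Return: set(),
    Error: set(),
}


def check_dialog(msgs: list) -> bool:
    '''
    Raise `ValueError` on an out-of-order or foreign msg; return
    True only if the dialog was closed by a `Return` or `Error`.

    '''
    prev = None
    cid = msgs[0].cid if msgs else None
    for msg in msgs:
        if msg.cid != cid:
            raise ValueError(f'msg for foreign dialog {msg.cid!r}')
        if type(msg) not in ALLOWED_NEXT[prev]:
            after = prev.__name__ if prev else 'nothing'
            raise ValueError(f'{type(msg).__name__} not allowed after {after}')
        prev = type(msg)
    return prev in (Return, Error)
```

For instance, a trace like `Start, StartAck, Started, Yield, Yield, Stop, Return` passes, while a `Yield` sent straight after `Start` is rejected.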

We believe the system adheres to the 3 axioms of an "actor model" but likely does not look like what you think an "actor model" looks like, and that's intentional.

Where do I start!?

The first step to grok tractor is to get an intermediate knowledge of trio and structured concurrency B)

Some great places to start are:

- the seminal blog post
- obviously the trio docs
- wikipedia's nascent SC page
- the fancy diagrams @ libdill-docs

Features

- It's just a trio API!
- Infinitely nestable process trees running embedded trio tasks.
- Swappable, OS-specific, process spawning via multiple backends.
- Modular IPC stack, allowing for custom interchange formats (e.g. as offered by msgspec), varied transport protocols (TCP, RUDP, QUIC, wireguard), and OS-env specific higher-perf primitives (UDS, shm-ring-buffers).
- Optionally distributed: all IPC and RPC APIs work over multi-host transports the same as local.
- Builtin high-level streaming API that enables your app to easily leverage the benefits of a "cheap or nasty" (un)protocol.
- A "native UX" around a multi-process safe debugger REPL using pdbp (a fork & fix of pdb++).
- "Infected asyncio" mode: support for starting an actor's runtime as a guest on the asyncio loop allowing us to provide stringent SC-style trio.Task-supervision around any asyncio.Task spawned via our tractor.to_asyncio APIs.
- A very naive and still very much work-in-progress inter-actor discovery system with plans to support multiple modern protocol approaches.
- Various trio extension APIs via tractor.trionics such as:
  - task fan-out broadcasting,
  - multi-task-single-resource-caching and fan-out-to-multi __aenter__() APIs for @acm functions,
  - (WIP) a TaskMngr: one-cancels-one style nursery supervisor.

Install

tractor is still in an alpha-near-beta stage for many of its subsystems, however we are very close to having a stable low-level runtime and API.

As such, it's currently recommended that you clone and install the repo from source:

pip install git+https://github.com/goodboy/tractor.git

We use the very hip uv for project mgmt:

git clone https://github.com/goodboy/tractor.git
cd tractor
uv sync --dev
uv run python examples/rpc_bidir_streaming.py

Consider activating a virtual/project-env before starting to hack on the code base:

# you could use plain ol' venvs
# https://docs.astral.sh/uv/pip/environments/
uv venv tractor_py313 --python 3.13

# but @goodboy prefers the more explicit (and shell agnostic)
# https://docs.astral.sh/uv/configuration/environment/#uv_project_environment
UV_PROJECT_ENVIRONMENT="tractor_py313"

# hint hint, enter @goodboy's fave shell B)
uv run --dev xonsh

Alongside all this we ofc offer "releases" on PyPI:

pip install tractor

Just note that YMMV since the main git branch is often much further ahead than the latest release.

Example codez

In tractor's (very lacking) documentation we prefer to point to example scripts in the repo over duplicating them in docs, but with that in mind here are some definitive snippets to try and hook you into digging deeper.

Run a func in a process

Use trio's style of focussing on tasks as functions:

"""
Run with a process monitor from a terminal using::

    $TERM -e watch -n 0.1  "pstree -a $$" \
        & python examples/parallelism/single_func.py \
        && kill $!

"""
import os

import tractor
import trio


async def burn_cpu():

    pid = os.getpid()

    # burn a core @ ~ 50kHz
    for _ in range(50000):
        await trio.sleep(1/50000/50)

    return pid


async def main():

    async with tractor.open_nursery() as n:

        portal = await n.run_in_actor(burn_cpu)

        #  burn rubber in the parent too
        await burn_cpu()

        # wait on result from target function
        pid = await portal.result()

    # end of nursery block
    print(f"Collected subproc {pid}")


if __name__ == '__main__':
    trio.run(main)

This runs burn_cpu() in a new process and reaps it on completion of the nursery block.

If you only need to run a sync function and retrieve a single result, you might want to check out trio-parallel.

Zombie safe: self-destruct a process tree

tractor tries to protect you from zombies, no matter what.

"""
Run with a process monitor from a terminal using::

    $TERM -e watch -n 0.1  "pstree -a $$" \
        & python examples/parallelism/we_are_processes.py \
        && kill $!

"""
from multiprocessing import cpu_count
import os

import tractor
import trio


async def target():
    print(
        f"Yo, i'm '{tractor.current_actor().name}' "
        f"running in pid {os.getpid()}"
    )

    await trio.sleep_forever()


async def main():

    async with tractor.open_nursery() as n:

        for i in range(cpu_count()):
            await n.run_in_actor(target, name=f'worker_{i}')

        print('This process tree will self-destruct in 1 sec...')
        await trio.sleep(1)

        # raise an error in root actor/process and trigger
        # reaping of all minions
        raise Exception('Self Destructed')


if __name__ == '__main__':
    try:
        trio.run(main)
    except Exception:
        print('Zombies Contained')

If you can create zombie child processes (without using a system signal) it is a bug.

"Native" multi-process debugging

Using the magic of pdbp and our internal IPC, we've been able to create a native-feeling debugging experience for any (sub-)process in your tractor tree.

from os import getpid

import tractor
import trio


async def breakpoint_forever():
    "Indefinitely re-enter debugger in child actor."
    while True:
        yield 'yo'
        await tractor.breakpoint()


async def name_error():
    "Raise a ``NameError``"
    getattr(doggypants)


async def main():
    """Test breakpoint in a streaming actor.
    """
    async with tractor.open_nursery(
        debug_mode=True,
        loglevel='error',
    ) as n:

        p0 = await n.start_actor('bp_forever', enable_modules=[__name__])
        p1 = await n.start_actor('name_error', enable_modules=[__name__])

        # retrieve results
        stream = await p0.run(breakpoint_forever)
        await p1.run(name_error)


if __name__ == '__main__':
    trio.run(main)

You can run this with:

$ python examples/debugging/multi_daemon_subactors.py

And, yes, there's a built-in crash handling mode B)

We're hoping to add a respawn-from-repl system soon!

SC compatible bi-directional streaming

Yes, you saw it here first; we provide 2-way streams with reliable, transitive setup/teardown semantics.

Our nascent API is reminiscent of trio.Nursery.start() style invocation:

import trio
import tractor


@tractor.context
async def simple_rpc(

    ctx: tractor.Context,
    data: int,

) -> None:
    '''Test a small ping-pong 2-way streaming server.

    '''
    # signal to parent that we're up much like
    # ``trio.TaskStatus.started()``
    await ctx.started(data + 1)

    async with ctx.open_stream() as stream:

        count = 0
        async for msg in stream:

            assert msg == 'ping'
            await stream.send('pong')
            count += 1

        else:
            assert count == 10


async def main() -> None:

    async with tractor.open_nursery() as n:

        portal = await n.start_actor(
            'rpc_server',
            enable_modules=[__name__],
        )

        # XXX: parenthesized context managers officially require py3.10
        async with (

            portal.open_context(
                simple_rpc,
                data=10,
            ) as (ctx, sent),

            ctx.open_stream() as stream,
        ):

            assert sent == 11

            count = 0
            # receive msgs using async for style
            await stream.send('ping')

            async for msg in stream:
                assert msg == 'pong'
                await stream.send('ping')
                count += 1

                if count >= 9:
                    break


        # explicitly teardown the daemon-actor
        await portal.cancel_actor()


if __name__ == '__main__':
    trio.run(main)

See the original proposal and discussion in #53 as well as follow-up improvements in #223 that we'd love to hear your thoughts on!

Worker poolz are easy peasy

The initial ask from most new users is "how do I make a worker pool thing?".

tractor is built to handle any SC (structured concurrent) process tree you can imagine; a "worker pool" pattern is a trivial special case.

We have a full worker pool re-implementation of the std-lib's concurrent.futures.ProcessPoolExecutor example for reference.

You can run it like so (from the repo root) to see the process tree in real time:

$TERM -e watch -n 0.1  "pstree -a $$" \
    & python examples/parallelism/concurrent_actors_primes.py \
    && kill $!

This uses no extra threads, fancy semaphores or futures; all we need is tractor's IPC!

"Infected `asyncio`" mode

Have a bunch of asyncio code you want to force to be SC at the process level?

Check out our experimental system for guest-mode controlled asyncio actors:

import asyncio
from statistics import mean
import time

import trio
import tractor


async def aio_echo_server(
    to_trio: trio.MemorySendChannel,
    from_trio: asyncio.Queue,
) -> None:

    # a first message must be sent **from** this ``asyncio``
    # task or the ``trio`` side will never unblock from
    # ``tractor.to_asyncio.open_channel_from():``
    to_trio.send_nowait('start')

    # XXX: this uses an ``from_trio: asyncio.Queue`` currently but we
    # should probably offer something better.
    while True:
        # echo the msg back
        to_trio.send_nowait(await from_trio.get())
        await asyncio.sleep(0)


@tractor.context
async def trio_to_aio_echo_server(
    ctx: tractor.Context,
):
    # this will block until the ``asyncio`` task sends a "first"
    # message.
    async with tractor.to_asyncio.open_channel_from(
        aio_echo_server,
    ) as (first, chan):

        assert first == 'start'
        await ctx.started(first)

        async with ctx.open_stream() as stream:

            async for msg in stream:
                await chan.send(msg)

                out = await chan.receive()
                # echo back to parent actor-task
                await stream.send(out)


async def main():

    async with tractor.open_nursery() as n:
        p = await n.start_actor(
            'aio_server',
            enable_modules=[__name__],
            infect_asyncio=True,
        )
        async with p.open_context(
            trio_to_aio_echo_server,
        ) as (ctx, first):

            assert first == 'start'

            count = 0
            async with ctx.open_stream() as stream:

                delays = []
                send = time.time()

                await stream.send(count)
                async for msg in stream:
                    recv = time.time()
                    delays.append(recv - send)
                    assert msg == count
                    count += 1
                    send = time.time()
                    await stream.send(count)

                    if count >= 1e3:
                        break

        print(f'mean round trip rate (Hz): {1/mean(delays)}')
        await p.cancel_actor()


if __name__ == '__main__':
    trio.run(main)

Yes, we spawn a python process, run asyncio, start trio on the asyncio loop, then send commands to the trio scheduled tasks to tell asyncio tasks what to do XD

We need help refining the asyncio-side channel API to be more trio-like. Feel free to sling your opinion in #273!

Higher level "cluster" APIs

To be extra terse the tractor devs have started hacking some "higher level" APIs for managing actor trees/clusters. These interfaces should generally be considered provisional for now but we encourage you to try them and provide feedback. Here's a new API that lets you quickly spawn a flat cluster:

import trio
import tractor


async def sleepy_jane():
    uid = tractor.current_actor().uid
    print(f'Yo i am actor {uid}')
    await trio.sleep_forever()


async def main():
    '''
    Spawn a flat actor cluster, with one process per
    detected core.

    '''
    portal_map: dict[str, tractor.Portal]
    results: dict[str, str]

    # look at this hip new syntax!
    async with (

        tractor.open_actor_cluster(
            modules=[__name__]
        ) as portal_map,

        trio.open_nursery() as n,
    ):

        for (name, portal) in portal_map.items():
            n.start_soon(portal.run, sleepy_jane)

        await trio.sleep(0.5)

        # kill the cluster with a cancel
        raise KeyboardInterrupt


if __name__ == '__main__':
    try:
        trio.run(main)
    except KeyboardInterrupt:
        pass

Under the hood

tractor is an attempt to pair trionic structured concurrency with distributed Python. You can think of it as trio-across-processes or simply as an opinionated replacement for the stdlib's multiprocessing but built on async programming primitives from the ground up.

Don't be scared off by this description. tractor is just trio but with nurseries for process management and cancel-able streaming IPC. If you understand how to work with trio, tractor will give you the parallelism you may have been needing.

Wait, huh?! I thought "actors" have messages, and mailboxes and stuff?!

Let's stop and ask how many canon actor model papers you have actually read ;)

From our experience many "actor systems" aren't really "actor models" since they don't adhere to the 3 axioms and pay even less attention to the problem of unbounded non-determinism (which was the whole point of creating the model in the first place).

From the author's mouth, the only thing required is adherence to the 3 axioms, and that's it.

tractor adheres to said base requirements of an "actor model":

In response to a message, an actor may:

- send a finite number of new messages
- create a finite number of new actors
- designate a new behavior to process subsequent messages

and requires no further api changes to accomplish this.

If you want to debate this further please feel free to chime in on our chat or discuss on one of the following issues after you've read everything in them:

Let's clarify our parlance

Whether or not tractor has "actors" underneath should be mostly irrelevant to users other than for referring to the interactions of our primary runtime primitives: each Python process + trio.run() + surrounding IPC machinery. These are our high-level, base runtime-units-of-abstraction which both are (as much as they can be in Python) and will be referred to as our "actors".

The main goal of tractor is to allow for highly distributed software that, through the adherence to structured concurrency, results in systems which fail in predictable, recoverable and maybe even understandable ways; being an "actor model" is just one way to describe properties of the system.

What's on the TODO:

Help us push toward the future of distributed Python.

- Erlang-style supervisors via composed context managers (see #22)
- Typed messaging protocols (e.g. via msgspec.Struct, see #36)
- Typed capability-based (dialog) protocols (see #196 with draft work started in #311)
- We recently disabled CI-testing on Windows and need help getting it running again! (see #327). We do have Windows support (and have for quite a while) but since no active hacker exists in the user-base to help test on that OS, for now we're not actively maintaining testing due to the added hassle and general latency.
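For the typed-messaging item, the latest commit message describes building on `msgspec.Struct` msgs whose `.pld` field is `Generic[PayloadT]`, decoded via `msgspec.msgpack.Decoder(type=...)` so only a declared union of payload types is accepted. As a dependency-free sketch of that payload-limiting idea, the block below swaps in stdlib dataclasses, JSON and a hand-rolled check; `Yield`, `limit_payloads()` and the exact-type rule are assumptions for illustration, not tractor's `limit_msg_spec()`/`mk_msg_spec()` API.

```python
import json
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, Union, get_args

PayloadT = TypeVar('PayloadT')


@dataclass
class Yield(Generic[PayloadT]):
    # hypothetical generic msg whose `.pld` type is parameterized,
    # loosely mirroring a `msgspec.Struct, Generic[PayloadT]` field
    cid: str
    pld: PayloadT


def limit_payloads(*payload_types: type):
    '''
    Hypothetical helper: build a decoder which only accepts `Yield`
    msgs whose `.pld` is *exactly* one of `payload_types`.

    '''
    spec = Union[payload_types]  # e.g. Union[int, str]
    allowed = get_args(spec) or (spec,)  # a single type has no args

    def decode(raw: bytes) -> Yield[Any]:
        data = json.loads(raw)
        pld = data['pld']
        # exact type match so that e.g. `True` is not accepted as an `int`
        if type(pld) not in allowed:
            raise TypeError(
                f'payload {pld!r} of type {type(pld).__name__} '
                f'not in spec {spec}'
            )
        return Yield(cid=data['cid'], pld=pld)

    return decode
```

With `dec = limit_payloads(int, str)`, a msg carrying `42` or `"pong"` decodes fine while one carrying a list (or a JSON `true`) raises `TypeError`; a schema-aware decoder like msgspec's performs this rejection during decoding instead of after.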

Feel like saying hi?

This project is very much coupled to the ongoing development of trio (i.e. tractor gets most of its ideas from that brilliant community). If you want to help, have suggestions or just want to say hi, please feel free to reach us in our matrix channel. If matrix seems too hip, we're also mostly all in the trio gitter channel!