'structured concurrent `trio`-"actors"' https://github.com/goodboy/tractor
Go to file
Tyler Goodlet 7d41492f53 Don't kill root's immediate children when in debug
If the root calls `trio.Process.kill()` on immediate child proc teardown
when the child is using pdb, we can get stdstreams clobbering that
results in a pdb++ repl where the user can't see what's been typed. Not
killing such children on cancellation / error seems to resolve this
issue whilst still giving reliable termination. For now, code that
special path until a time it becomes a problem for ensuring zombie
reaps.
2021-05-31 09:29:45 -04:00
.github/workflows Include Python 3.9 in CI 2020-12-27 13:28:54 -05:00
docs Link to SC on wikipedia 2021-05-31 09:28:07 -04:00
examples Add debug example that causes pdb stdin clobbering 2021-05-31 09:25:47 -04:00
tests Pass func refs 2021-05-31 09:27:40 -04:00
tractor Don't kill root's immediate children when in debug 2021-05-31 09:29:45 -04:00
.gitignore Initial commit 2018-07-05 16:01:15 -04:00
LICENSE Initial commit 2018-07-05 16:01:15 -04:00
mypy.ini Add mypy.ini lel 2020-01-21 15:28:12 -05:00
requirements-docs.txt Added logo, fixed github links and grammar issues 2020-08-31 11:49:14 -03:00
requirements-test.txt Add `pexpect` dep for debugger tests 2020-10-13 11:04:16 -04:00
setup.py Bump alpha version 2021-04-27 11:35:16 -04:00

docs/README.rst

logo tractor: next-gen Python parallelism

gh_actions Documentation Status

tractor is a structured concurrent "actor model" built on trio and multi-processing.

We pair structured concurrency and true multi-core parallelism with the aim of being the multi-processing framework you always wanted.

The first step to grok tractor is to get the basics of trio down. A great place to start is the trio docs and this blog post.

Features

  • It's just a trio API
  • Infinitely nesteable process trees
  • Built-in APIs for inter-process streaming
  • A (first ever?) "native" multi-core debugger for Python using pdb++
  • Support for multiple process spawning backends
  • A modular transport layer, allowing for custom serialization, communications protocols, and environment specific IPC primitives

Run a func in a process

Use trio's style of focussing on tasks as functions:

"""
Run with a process monitor from a terminal using::

    $TERM -e watch -n 0.1  "pstree -a $$" \
        & python examples/parallelism/single_func.py \
        && kill $!

"""
import os

import tractor
import trio


async def burn_cpu():

    pid = os.getpid()

    # burn a core @ ~ 50kHz
    for _ in range(50000):
        await trio.sleep(1/50000/50)

    return os.getpid()


async def main():

    async with tractor.open_nursery() as n:

        portal = await n.run_in_actor(burn_cpu)

        #  burn rubber in the parent too
        await burn_cpu()

        # wait on result from target function
        pid = await portal.result()

    # end of nursery block
    print(f"Collected subproc {pid}")


if __name__ == '__main__':
    trio.run(main)

This runs burn_cpu() in a new process and reaps it on completion of the nursery block.

If you only need to run a sync function and retreive a single result, you might want to check out trio-parallel.

Zombie safe: self-destruct a process tree

tractor tries to protect you from zombies, no matter what.

"""
Run with a process monitor from a terminal using::

    $TERM -e watch -n 0.1  "pstree -a $$" \
        & python examples/parallelism/we_are_processes.py \
        && kill $!

"""
from multiprocessing import cpu_count
import os

import tractor
import trio


async def target():
    print(
        f"Yo, i'm '{tractor.current_actor().name}' "
        f"running in pid {os.getpid()}"
    )

   await trio.sleep_forever()


async def main():

    async with tractor.open_nursery() as n:

        for i in range(cpu_count()):
            await n.run_in_actor(target, name=f'worker_{i}')

        print('This process tree will self-destruct in 1 sec...')
        await trio.sleep(1)

        # you could have done this yourself
        raise Exception('Self Destructed')


if __name__ == '__main__':
    try:
        trio.run(main)
    except Exception:
        print('Zombies Contained')

If you can create zombie child processes (without using a system signal) it is a bug.

"Native" multi-process debugging

Using the magic of pdb++ and our internal IPC, we've been able to create a native feeling debugging experience for any (sub-)process in your tractor tree.

from os import getpid

import tractor
import trio


async def breakpoint_forever():
    "Indefinitely re-enter debugger in child actor."
    while True:
        yield 'yo'
        await tractor.breakpoint()


async def name_error():
    "Raise a ``NameError``"
    getattr(doggypants)


async def main():
    """Test breakpoint in a streaming actor.
    """
    async with tractor.open_nursery(
        debug_mode=True,
        loglevel='error',
    ) as n:

        p0 = await n.start_actor('bp_forever', enable_modules=[__name__])
        p1 = await n.start_actor('name_error', enable_modules=[__name__])

        # retreive results
        stream = await p0.run(breakpoint_forever)
        await p1.run(name_error)


if __name__ == '__main__':
    trio.run(main)

You can run this with:

>>> python examples/debugging/multi_daemon_subactors.py

And, yes, there's a built-in crash handling mode B)

We're hoping to add a respawn-from-repl system soon!

Worker poolz are easy peasy

The initial ask from most new users is "how do I make a worker pool thing?".

tractor is built to handle any SC (structured concurrent) process tree you can imagine; a "worker pool" pattern is a trivial special case.

We have a full worker pool re-implementation of the std-lib's concurrent.futures.ProcessPoolExecutor example for reference.

You can run it like so (from this dir) to see the process tree in real time:

$TERM -e watch -n 0.1  "pstree -a $$" \
    & python examples/parallelism/concurrent_actors_primes.py \
    && kill $!

This uses no extra threads, fancy semaphores or futures; all we need is tractor's IPC!

Install

From PyPi:

pip install tractor

From git:

pip install git+git://github.com/goodboy/tractor.git

Under the hood

tractor is an attempt to pair trionic structured concurrency with distributed Python. You can think of it as a trio -across-processes or simply as an opinionated replacement for the stdlib's multiprocessing but built on async programming primitives from the ground up.

tractor's nurseries let you spawn trio "actors": new Python processes which each run a trio scheduled runtime - a call to trio.run().

Don't be scared off by this description. tractor is just trio but with nurseries for process management and cancel-able streaming IPC. If you understand how to work with trio, tractor will give you the parallelism you've been missing.

"Actors" communicate by exchanging asynchronous messages and avoid sharing state. The intention of this model is to allow for highly distributed software that, through the adherence to structured concurrency, results in systems which fail in predictable and recoverable ways.

What's on the TODO:

Help us push toward the future.

  • (Soon to land) asyncio support allowing for "infected" actors where trio drives the asyncio scheduler via the astounding "guest mode"
  • Typed messaging protocols (ex. via msgspec)
  • Erlang-style supervisors via composed context managers

Feel like saying hi?

This project is very much coupled to the ongoing development of trio (i.e. tractor gets most of its ideas from that brilliant community). If you want to help, have suggestions or just want to say hi, please feel free to reach us in our matrix channel. If matrix seems too hip, we're also mostly all in the the trio gitter channel!