Compare commits
No commits in common. "4b5176e2c3040491c2b60cc9f321081a4a60e9c9" and "54561959e6cd69876f31a573642777b5df9d4ff7" have entirely different histories.
4b5176e2c3...54561959e6
|
|
@@ -36,279 +36,20 @@ across four scenarios by
|
||||||
py3.14.
|
py3.14.
|
||||||
|
|
||||||
This submodule lifts the validated primitives out of the
|
This submodule lifts the validated primitives out of the
|
||||||
smoke-test and into tractor proper as the
|
smoke-test and into tractor proper, so they can eventually be
|
||||||
`subint_forkserver` spawn backend.
|
wired into a real "subint forkserver" spawn backend — where:
|
||||||
|
|
||||||
Design rationale — why a forkserver, and why in-process
|
|
||||||
-------------------------------------------------------
|
|
||||||
|
|
||||||
There are two design questions worth pinning down up front,
|
|
||||||
since the name "subint_forkserver" intentionally evokes the
|
|
||||||
stdlib `multiprocessing.forkserver` for comparison:
|
|
||||||
|
|
||||||
**(1) Why a forkserver pattern at all, vs. forking directly
|
|
||||||
from the trio task?**
|
|
||||||
|
|
||||||
`os.fork()` is fundamentally hostile to trio: trio owns
|
|
||||||
file descriptors, signal-wakeup-fds, threadpools, and an
|
|
||||||
event loop with non-trivial post-fork lifecycle invariants
|
|
||||||
(see python-trio/trio#1614 et al.). Forking a trio-running
|
|
||||||
thread duplicates all that state into the child, which then
|
|
||||||
either needs surgical reset (fragile) or has to immediately
|
|
||||||
`exec()` (defeats the point of fork-without-exec). The
|
|
||||||
*forkserver* sidesteps this by isolating the `os.fork()`
|
|
||||||
call in a worker that has provably never entered trio — so
|
|
||||||
the child inherits a clean, trio-free image.
|
|
||||||
|
|
||||||
**(2) Why an in-process forkserver, vs. stdlib
|
|
||||||
`multiprocessing.forkserver`?**
|
|
||||||
|
|
||||||
The stdlib design solves the same "fork from clean state"
|
|
||||||
problem by spinning up a **separate sidecar process** at
|
|
||||||
first use of `mp.set_start_method('forkserver')`. The parent
|
|
||||||
then sends each spawn request to that sidecar over a unix
|
|
||||||
socket; the sidecar is the process that actually calls
|
|
||||||
`os.fork()`. This works but pays for cleanliness with three
|
|
||||||
costs:
|
|
||||||
|
|
||||||
- **Sidecar lifecycle**: a second long-lived process per
|
|
||||||
parent, with its own start/stop/health-check semantics.
|
|
||||||
- **IPC overhead per spawn**: every actor-spawn round-trips
|
|
||||||
an `mp` request message through a unix socket before any
|
|
||||||
child code runs.
|
|
||||||
- **State isolation by process boundary**: the sidecar can't
|
|
||||||
share parent state at all — every spawn is a "cold" child
|
|
||||||
re-importing modules from disk.
|
|
||||||
|
|
||||||
The subint architecture lets us keep the forkserver
|
|
||||||
**in-process** because subints already provide the
|
|
||||||
state-isolation guarantee that `mp.forkserver`'s sidecar
|
|
||||||
buys via the process boundary. Concretely: in the envisioned
|
|
||||||
arch (currently partially landed — see "Status" below),
|
|
||||||
|
|
||||||
- the **main interpreter** stays trio-free and hosts the
|
|
||||||
forkserver worker thread that owns `os.fork()`,
|
|
||||||
- the parent actor's **`trio.run()`** lives in a separate
|
|
||||||
*sub-interpreter* (a different worker thread) — fully
|
|
||||||
isolated `sys.modules` / `__main__` / globals from main,
|
|
||||||
- when a spawn is requested, the trio task signals the
|
|
||||||
forkserver thread (intra-process, ~free) and the
|
|
||||||
forkserver forks; the child inherits the parent's full
|
|
||||||
in-memory state cheaply.
|
|
||||||
|
|
||||||
That collapses the three costs above:
|
|
||||||
|
|
||||||
- no sidecar — the forkserver is just another thread,
|
|
||||||
- spawn signal is an in-process, cross-thread event/condition, not IPC,
|
|
||||||
- child inherits the warm parent state (loaded modules,
|
|
||||||
populated caches, etc.) for free.
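
The in-process forkserver pattern described above can be sketched roughly as follows. This is a minimal illustration using only the stdlib, not tractor's actual implementation: the names `_requests`, `_forkserver_worker`, `start_forkserver`, `request_fork` and `stop_forkserver` are all hypothetical, and the sketch keeps the worker on the main interpreter with no subinterpreter involved.

```python
import os
import queue
import threading
from concurrent.futures import Future

# Hypothetical request channel: (child_target, Future) pairs, or None to stop.
_requests: queue.Queue = queue.Queue()


def _forkserver_worker() -> None:
    """Service fork requests on a thread that never runs trio."""
    while True:
        item = _requests.get()
        if item is None:  # shutdown sentinel
            return
        child_target, fut = item
        try:
            pid = os.fork()
        except OSError as exc:
            fut.set_exception(exc)
            continue
        if pid == 0:
            # Child: this worker thread is the only one that survives.
            code = 1
            try:
                code = int(child_target() or 0)
            finally:
                # Skip atexit handlers and buffered-IO flushes inherited
                # from the parent image.
                os._exit(code)
        # Parent: hand the child's pid back to the requester.
        fut.set_result(pid)


def start_forkserver() -> threading.Thread:
    t = threading.Thread(
        target=_forkserver_worker, name="forkserver", daemon=True
    )
    t.start()
    return t


def request_fork(child_target) -> int:
    """Block until the worker has forked; return the child's pid."""
    fut: Future = Future()
    _requests.put((child_target, fut))
    return fut.result(timeout=10)


def stop_forkserver() -> None:
    _requests.put(None)
```

From a trio task, a blocking call like `request_fork()` would presumably be dispatched off the event loop (for example via `trio.to_thread.run_sync`), so the loop itself never calls `os.fork()`.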
|
|
||||||
|
|
||||||
The tradeoff we accept in exchange: this design is
|
|
||||||
3.14-only (legacy-config subints still share the GIL, so
|
|
||||||
the parent's trio loop and the forkserver worker contend
|
|
||||||
on it; once PEP 684 isolated-mode + msgspec
|
|
||||||
[jcrist/msgspec#1026](https://github.com/jcrist/msgspec/issues/1026)
|
|
||||||
land, this constraint relaxes). And the dedicated worker
|
|
||||||
threads here are heavier than `trio.to_thread.run_sync`
|
|
||||||
calls — see the "TODO" section further down for the audit
|
|
||||||
plan once those upstream pieces land.
|
|
||||||
|
|
||||||
Future arch — what subints would buy us
|
|
||||||
---------------------------------------
|
|
||||||
|
|
||||||
The `subint` in this module's name is **family-naming
|
|
||||||
today** — currently the implementation only uses a regular
|
|
||||||
worker thread on the main interp; no subinterpreter is
|
|
||||||
created anywhere in the parent or child. The naming becomes
|
|
||||||
*literal* once jcrist/msgspec#1026 unblocks isolated-mode
|
|
||||||
subints (PEP 684 per-interp GIL). Three concrete wins land
|
|
||||||
at that point:
|
|
||||||
|
|
||||||
**(1) Cheaper forks (smaller main-interp COW image)**
|
|
||||||
|
|
||||||
Today the parent's main interp carries the full tractor
|
|
||||||
stack: trio runtime, msgspec codecs, IPC layer, every
|
|
||||||
user module the actor imported. When the forkserver
|
|
||||||
worker calls `os.fork()` the child inherits ALL of that
|
|
||||||
as COW memory — even though most gets overwritten when
|
|
||||||
the child boots its own `trio.run()`.
|
|
||||||
|
|
||||||
Move the parent's `trio.run()` into a subint (its own
|
|
||||||
`sys.modules` / `__main__` / globals) and the main
|
|
||||||
interp **stays minimal** — just the forkserver-thread
|
|
||||||
plumbing + bare CPython. The main interp becomes the
|
|
||||||
*literal* forkserver: an intentionally-empty execution
|
|
||||||
context whose only job is to call `os.fork()` cleanly.
|
|
||||||
Inherited COW image shrinks proportionally.
|
|
||||||
|
|
||||||
**(2) True parallelism between forkserver and trio
|
|
||||||
(per-interp GIL)**
|
|
||||||
|
|
||||||
Today the forkserver worker and the trio.run() thread
|
|
||||||
share the main GIL — when one runs the other waits.
|
|
||||||
Spawn requests briefly stall trio while the worker
|
|
||||||
takes the GIL to call `os.fork()`. PEP 684 isolated-
|
|
||||||
mode gives each subint its own GIL: forkserver thread
|
|
||||||
on main + trio on subint actually run in parallel.
|
|
||||||
Spawn latency drops, trio loop doesn't notice the
|
|
||||||
fork happening.
|
|
||||||
|
|
||||||
**(3) Multi-actor-per-process (the architectural prize)**
|
|
||||||
|
|
||||||
The bigger payoff and the reason `_subint.py` (the
|
|
||||||
in-thread `subint` backend) exists in parallel with
|
|
||||||
this module. With per-interp-GIL subints, one process
|
|
||||||
can host:
|
|
||||||
|
|
||||||
- main interp: forkserver thread + bookkeeping
|
|
||||||
- subint A: actor 1's `trio.run()`
|
|
||||||
- subint B: actor 2's `trio.run()`
|
|
||||||
- subint C: ...
|
|
||||||
|
|
||||||
`os.fork()` becomes the **last-resort** spawn — used
|
|
||||||
only when a new OS process is actually required
|
|
||||||
(cgroups, namespaces, security boundary, multi-host
|
|
||||||
distribution). Within a single process, subint-per-
|
|
||||||
actor is radically cheaper: no fork, no COW, no
|
|
||||||
inherited-fd cleanup — just `_interpreters.create()`
|
|
||||||
+ `_interpreters.exec()`.
|
|
||||||
|
|
||||||
The two backends converge on a coherent story:
|
|
||||||
`subint` → in-process spawn (cheap, GIL-isolated),
|
|
||||||
`subint_forkserver` → cross-process spawn (when you
|
|
||||||
truly need OS-level isolation). The forkserver isn't
|
|
||||||
the default mechanism; it's the bridge to a new
|
|
||||||
process when subint isolation isn't enough.
|
|
||||||
|
|
||||||
Implementation status — what's wired today
|
|
||||||
-----------------------------------------
|
|
||||||
|
|
||||||
The "envisioned arch" above is the eventual target; the
|
|
||||||
**currently-landed** flow is a partial step toward it:
|
|
||||||
|
|
||||||
- A dedicated main-interp worker thread owns all `os.fork()`
|
- A dedicated main-interp worker thread owns all `os.fork()`
|
||||||
calls (never enters a subint). ✓ landed.
|
calls (never enters a subint).
|
||||||
- Parent actor's `trio.run()` lives **on the main interp**
|
- The tractor parent-actor's `trio.run()` lives in a
|
||||||
for now (not a subint yet). The subint-hosted root
|
sub-interpreter on a different worker thread.
|
||||||
runtime is gated on jcrist/msgspec#1026 (see
|
- When a spawn is requested, the trio-task signals the
|
||||||
`_subint.py` docstring).
|
forkserver thread; the forkserver forks; child re-enters
|
||||||
- Spawn-request signal: trio task → `to_thread.run_sync`
|
the same pattern (trio in a subint + forkserver on main).
|
||||||
to the forkserver-worker thread. ✓ landed.
|
|
||||||
- Forked child: runs `_actor_child_main` against a normal
|
|
||||||
trio runtime. ✓ landed.
|
|
||||||
|
|
||||||
The "subint" in the backend name refers to the *family* —
|
This mirrors the stdlib `multiprocessing.forkserver` design
|
||||||
this backend ships in the same PR series as `_subint.py`
|
but keeps the forkserver in-process for faster spawn latency
|
||||||
(in-thread subint backend) and `_subint_fork.py` (the RFC
|
and inherited parent state.
|
||||||
stub for fork-from-non-main-subint, blocked upstream).
|
|
||||||
Once the parent's trio also lives in a subint we'll have
|
|
||||||
the full envisioned arch; until then the forkserver
|
|
||||||
half is independently useful and ship-able.
|
|
||||||
|
|
||||||
What survives the fork? — POSIX semantics
|
|
||||||
-----------------------------------------
|
|
||||||
|
|
||||||
A natural worry when forking from a parent that's running
|
|
||||||
`trio.run()` on another thread: does that trio thread (and
|
|
||||||
any other threads in the parent) keep running in the child?
|
|
||||||
|
|
||||||
**No.** POSIX `fork()` only preserves the *calling* thread
|
|
||||||
in the child. Every other thread in the parent — trio's
|
|
||||||
runner thread, any `to_thread` cache threads, anything else
|
|
||||||
— is gone the instant `fork()` returns in the child.
|
|
||||||
|
|
||||||
Concretely, after the forkserver worker calls `os.fork()`:
|
|
||||||
|
|
||||||
| thread | parent | child |
|
|
||||||
|-----------------------|-----------|---------------|
|
|
||||||
| forkserver worker | continues | sole survivor |
|
|
||||||
| `trio.run()` thread | continues | gone |
|
|
||||||
| any other thread | continues | gone |
|
|
||||||
|
|
||||||
The forkserver worker becomes the new "main" execution
|
|
||||||
context in the child; `trio.run()` and every other
|
|
||||||
parent thread never executes a single instruction
|
|
||||||
post-fork in the child.
|
|
||||||
|
|
||||||
This is exactly *why* `os.fork()` is delegated to a
|
|
||||||
dedicated worker thread that has provably never entered
|
|
||||||
trio: we want that trio-free thread to be the surviving
|
|
||||||
one in the child.
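
The "sole survivor" behaviour can be checked with a small stdlib-only experiment. The sketch below is an illustration, not tractor code, and the function name `count_threads_in_forked_child` is hypothetical. It forks from a dedicated thread while a bystander thread is parked, then has the child report how many threads `threading.enumerate()` sees.

```python
import os
import threading


def count_threads_in_forked_child() -> tuple[int, int]:
    """Return (parent_thread_count, child_thread_count)."""
    stop = threading.Event()
    # Stand-in for "any other thread" in the parent (e.g. a trio runner).
    bystander = threading.Thread(target=stop.wait, daemon=True)
    bystander.start()
    result: dict[str, int] = {}

    def forker() -> None:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the number of live Python threads, then exit
            # without running any inherited cleanup handlers.
            os.close(r)
            os.write(w, str(len(threading.enumerate())).encode())
            os._exit(0)
        os.close(w)
        with os.fdopen(r, "rb") as f:
            result["child"] = int(f.read())
        os.waitpid(pid, 0)
        result["parent"] = len(threading.enumerate())

    t = threading.Thread(target=forker)
    t.start()
    t.join()
    stop.set()
    bystander.join()
    return result["parent"], result["child"]


if __name__ == "__main__":
    parent_n, child_n = count_threads_in_forked_child()
    print(f"parent threads: {parent_n}, child threads: {child_n}")
```

The parent count includes at least the main thread, the bystander and the forking thread, while the child should see only the thread that called `os.fork()`.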
|
|
||||||
|
|
||||||
That said, dead-thread *artifacts* still cross the fork
|
|
||||||
boundary (canonical "fork in a multithreaded program is
|
|
||||||
dangerous" — see `man pthread_atfork`). What persists, and
|
|
||||||
how we handle each:
|
|
||||||
|
|
||||||
- **Inherited file descriptors** — the dead trio thread's
|
|
||||||
epoll fd, signal-wakeup-fd, eventfds, sockets, IPC
|
|
||||||
pipes, pytest's capture-fds, etc. are all still in the
|
|
||||||
child's fd table (kernel-level inheritance). Handled by
|
|
||||||
`_close_inherited_fds()` in the child prelude — walks
|
|
||||||
`/proc/self/fd` and closes everything except stdio +
|
|
||||||
the channel pipe to the forkserver.
|
|
||||||
- **Memory image** — trio's internal data structures
|
|
||||||
(scheduler, task queues, runner state) sit in COW
|
|
||||||
memory but nobody's executing them. They get GC'd /
|
|
||||||
overwritten when the child's fresh `trio.run()` boots.
|
|
||||||
- **Python thread state** — handled automatically by
|
|
||||||
CPython. `PyOS_AfterFork_Child()` calls
|
|
||||||
`_PyThreadState_DeleteExceptCurrent()`, so dead
|
|
||||||
`PyThreadState` objects are cleaned and
|
|
||||||
`threading.enumerate()` returns just the surviving
|
|
||||||
thread.
|
|
||||||
- **User-level locks (`threading.Lock`)** —
|
|
||||||
held-by-dead-thread state is the canonical fork hazard.
|
|
||||||
Not an issue in practice for tractor: trio doesn't hold
|
|
||||||
cross-thread locks across fork (its synchronization is
|
|
||||||
within the trio task system, which doesn't survive in
|
|
||||||
either direction). CPython's GIL is auto-reset by the
|
|
||||||
fork callback.
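
As a rough picture of the fd cleanup step in the list above, here is a minimal sketch of a helper that walks `/proc/self/fd` and closes everything outside a keep-set. It is not the real `_close_inherited_fds()`: the name `close_inherited_fds_sketch`, its `keep` parameter and its return value are assumptions made for illustration, and it assumes a Linux-style `/proc` filesystem.

```python
import os


def close_inherited_fds_sketch(keep: set[int]) -> list[int]:
    """Close every open fd not in `keep`; return the fds that were closed.

    Hypothetical stand-in for a child-prelude cleanup step. Assumes
    `/proc/self/fd` exists (Linux).
    """
    closed: list[int] = []
    # Snapshot the fd table first; the listing itself uses a transient fd
    # that is already closed again by the time we iterate.
    for name in os.listdir("/proc/self/fd"):
        fd = int(name)
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            # Already closed (e.g. the transient listing fd): ignore.
            continue
        closed.append(fd)
    return closed
```

In a forked child this would be called early, with `keep` holding stdio (`0`, `1`, `2`) plus whatever channel fd the child still needs to talk to its parent.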
|
|
||||||
|
|
||||||
FYI: how this dodges the `trio.run()` × `fork()` hazards
|
|
||||||
--------------------------------------------------------
|
|
||||||
|
|
||||||
`os.fork()` is famously hostile to `trio` (see
|
|
||||||
python-trio/trio#1614 et al.) because trio owns several
|
|
||||||
classes of process-global state that all break across the
|
|
||||||
fork boundary in different ways. The forkserver-thread
|
|
||||||
design dodges each class explicitly:
|
|
||||||
|
|
||||||
- **Signal-wakeup-fd**: trio installs a wakeup-fd via
|
|
||||||
`signal.set_wakeup_fd()` on `trio.run()` startup so
|
|
||||||
signals can interrupt `epoll_wait`. The child inherits
|
|
||||||
this fd, but trio's runner that owns it is gone — so
|
|
||||||
any signal delivery in the child writes to a dead
|
|
||||||
reader. *Dodge*: the inherited wakeup-fd is closed by
|
|
||||||
`_close_inherited_fds()`, then the child's own
|
|
||||||
`trio.run()` installs a fresh one.
|
|
||||||
- **`epoll`/`kqueue` instance**: trio's I/O backend holds
|
|
||||||
one. Inherited as a dead fd; same fix as above.
|
|
||||||
- **Threadpool cache threads** (`trio.to_thread`): worker
|
|
||||||
threads with cached tstate. Don't exist in the child
|
|
||||||
(POSIX); cache state is meaningless garbage that gets
|
|
||||||
reset when the child's trio.run() initializes its own
|
|
||||||
thread cache.
|
|
||||||
- **Cancel scopes / nurseries / open `trio.Process` /
|
|
||||||
open sockets**: these are trio-runtime objects, not
|
|
||||||
kernel objects. The runtime that owns them is gone in
|
|
||||||
the child, so the Python objects exist as zombie data
|
|
||||||
in COW memory and get overwritten as the child runs.
|
|
||||||
Inherited *kernel* fds those objects wrapped (sockets,
|
|
||||||
proc pipes) are caught by `_close_inherited_fds()`.
|
|
||||||
- **`atexit` handlers**: trio doesn't register any that
|
|
||||||
would mis-fire post-fork; trio's lifetime-stack is
|
|
||||||
all `with`-block-scoped and dies with the runner.
|
|
||||||
- **Foreign-language I/O state** (libcurl, OpenSSL session
|
|
||||||
caches, etc.): out of scope — same hazard as any
|
|
||||||
fork-without-exec; users layering those on top of
|
|
||||||
tractor need their own pthread_atfork handlers.
|
|
||||||
|
|
||||||
Net effect: for the runtime surface tractor controls
|
|
||||||
(trio + IPC layer + msgspec), the forkserver-thread
|
|
||||||
isolation + `_close_inherited_fds()` cleanup gives the
|
|
||||||
forked child a clean trio environment. Everything else
|
|
||||||
falls under the standard fork-without-exec disclaimer.
|
|
||||||
|
|
||||||
Status
|
Status
|
||||||
------
|
------
|
||||||
|
|
@@ -359,7 +100,7 @@ to know.
|
||||||
Full analysis + audit plan for when we can revisit is in
|
Full analysis + audit plan for when we can revisit is in
|
||||||
`ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md`.
|
`ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md`.
|
||||||
Intent: file a follow-up GH issue linked to #379 once
|
Intent: file a follow-up GH issue linked to #379 once
|
||||||
[jcrist/msgspec#1026](https://github.com/jcrist/msgspec/issues/1026)
|
[jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)
|
||||||
unblocks isolated-mode subints in tractor.
|
unblocks isolated-mode subints in tractor.
|
||||||
|
|
||||||
See also
|
See also
|
||||||
|
|
|
||||||