Doc `_subint_forkserver` design + fork semantics

Major expansion of the module docstring. Code is
unchanged; this lands the architectural reasoning that
was previously implicit, plus the POSIX/trio fork
mechanics the design relies on.

New sections:
- "Design rationale" — answers two implicit questions:
  (1) why a forkserver pattern at all (vs. forking
  directly from a trio task), (2) why in-process (vs.
  stdlib `mp.forkserver`'s sidecar process). Documents
  the three costs the in-process design avoids
  (sidecar lifecycle, per-spawn IPC, cold-start child)
  and the tradeoffs we accept in exchange (3.14-only,
  heavier than `to_thread.run_sync`).
- "Implementation status" — clarifies what's actually
  landed today vs. the envisioned arch: parent's
  `trio.run()` still lives on main interp (subint-
  hosted root gated on jcrist/msgspec#1026). Names
  why the "subint" prefix is correct anyway — same PR
  series as `_subint.py` / `_subint_fork.py`.
- "What survives the fork? — POSIX semantics" — POSIX
  preserves only the calling thread, so the
  `trio.run()` thread is gone in the child. Includes
  a small parent/child thread-survival table and
  covers the four artifact classes that DO cross the
  fork boundary (inherited fds, COW memory, Python
  thread state, user-level locks) and how each is
  handled.
- "FYI: how this dodges the `trio.run()` × `fork()`
  hazards" — itemizes each class of trio process-
  global state (wakeup-fd, `epoll`/`kqueue`,
  threadpool, cancel scopes / nurseries, `atexit`,
  foreign-language I/O) and explains how the
  forkserver-thread design avoids each.

Also,
- bump the gated msgspec issue link from
  `jcrist/msgspec#563` to `jcrist/msgspec#1026` (the
  PEP 684 isolated-mode tracker).

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
subint_forkserver_backend
Gud Boi 2026-04-27 18:16:50 -04:00
parent 54561959e6
commit 3ab99d557a
1 changed files with 204 additions and 12 deletions

View File

@ -36,20 +36,212 @@ across four scenarios by
py3.14.
This submodule lifts the validated primitives out of the
smoke-test and into tractor proper, so they can eventually be
wired into a real "subint forkserver" spawn backend where:
smoke-test and into tractor proper as the
`subint_forkserver` spawn backend.
Design rationale why a forkserver, and why in-process
-------------------------------------------------------
There are two design questions worth pinning down up front,
since the name "subint_forkserver" intentionally evokes the
stdlib `multiprocessing.forkserver` for comparison:
**(1) Why a forkserver pattern at all, vs. forking directly
from the trio task?**
`os.fork()` is fundamentally hostile to trio: trio owns
file descriptors, signal-wakeup-fds, threadpools, and an
event loop with non-trivial post-fork lifecycle invariants
(see python-trio/trio#1614 et al.). Forking a trio-running
thread duplicates all that state into the child, which then
either needs surgical reset (fragile) or has to immediately
`exec()` (defeats the point of fork-without-exec). The
*forkserver* sidesteps this by isolating the `os.fork()`
call in a worker that has provably never entered trio so
the child inherits a clean, trio-free image.
**(2) Why an in-process forkserver, vs. stdlib
`multiprocessing.forkserver`?**
The stdlib design solves the same "fork from clean state"
problem by spinning up a **separate sidecar process** at
first use of `mp.set_start_method('forkserver')`. The parent
then IPC's each spawn request to that sidecar over a unix
socket; the sidecar is the process that actually calls
`os.fork()`. This works but pays for cleanliness with three
costs:
- **Sidecar lifecycle**: a second long-lived process per
parent, with its own start/stop/health-check semantics.
- **IPC overhead per spawn**: every actor-spawn round-trips
an `mp` request message through a unix socket before any
child code runs.
- **State isolation by process boundary**: the sidecar can't
share parent state at all every spawn is a "cold" child
re-importing modules from disk.
The subint architecture lets us keep the forkserver
**in-process** because subints already provide the
state-isolation guarantee that `mp.forkserver`'s sidecar
buys via the process boundary. Concretely: in the envisioned
arch (currently partially landed see "Status" below),
- the **main interpreter** stays trio-free and hosts the
forkserver worker thread that owns `os.fork()`,
- the parent actor's **`trio.run()`** lives in a separate
*sub-interpreter* (a different worker thread) fully
isolated `sys.modules` / `__main__` / globals from main,
- when a spawn is requested, the trio task signals the
forkserver thread (intra-process, ~free) and the
forkserver forks; the child inherits the parent's full
in-memory state cheaply.
That collapses the three costs above:
- no sidecar the forkserver is just another thread,
- spawn signal is a thread-local event/condition, not IPC,
- child inherits the warm parent state (loaded modules,
populated caches, etc.) for free.
The tradeoff we accept in exchange: this design is
3.14-only (legacy-config subints still share the GIL, so
the parent's trio loop and the forkserver worker contend
on it; once PEP 684 isolated-mode + msgspec
[jcrist/msgspec#1026](https://github.com/jcrist/msgspec/issues/1026)
land, this constraint relaxes). And the dedicated worker
threads here are heavier than `trio.to_thread.run_sync`
calls see the "TODO" section further down for the audit
plan once those upstream pieces land.
Implementation status what's wired today
-----------------------------------------
The "envisioned arch" above is the eventual target; the
**currently-landed** flow is a partial step toward it:
- A dedicated main-interp worker thread owns all `os.fork()`
calls (never enters a subint).
- The tractor parent-actor's `trio.run()` lives in a
sub-interpreter on a different worker thread.
- When a spawn is requested, the trio-task signals the
forkserver thread; the forkserver forks; child re-enters
the same pattern (trio in a subint + forkserver on main).
calls (never enters a subint). landed.
- Parent actor's `trio.run()` lives **on the main interp**
for now (not a subint yet). The subint-hosted root
runtime is gated on jcrist/msgspec#1026 (see
`_subint.py` docstring).
- Spawn-request signal: trio task ` to_thread.run_sync`
to the forkserver-worker thread. landed.
- Forked child: runs `_actor_child_main` against a normal
trio runtime. landed.
This mirrors the stdlib `multiprocessing.forkserver` design
but keeps the forkserver in-process for faster spawn latency
and inherited parent state.
The "subint" in the backend name refers to the *family*
this backend ships in the same PR series as `_subint.py`
(in-thread subint backend) and `_subint_fork.py` (the RFC
stub for fork-from-non-main-subint, blocked upstream).
Once the parent's trio also lives in a subint we'll have
the full envisioned arch; until then the forkserver
half is independently useful and ship-able.
What survives the fork? POSIX semantics
-----------------------------------------
A natural worry when forking from a parent that's running
`trio.run()` on another thread: does that trio thread (and
any other threads in the parent) keep running in the child?
**No.** POSIX `fork()` only preserves the *calling* thread
in the child. Every other thread in the parent trio's
runner thread, any `to_thread` cache threads, anything else
is gone the instant `fork()` returns in the child.
Concretely, after the forkserver worker calls `os.fork()`:
| thread | parent | child |
|-----------------------|-----------|---------------|
| forkserver worker | continues | sole survivor |
| `trio.run()` thread | continues | gone |
| any other thread | continues | gone |
The forkserver worker becomes the new "main" execution
context in the child; `trio.run()` and every other
parent thread never executes a single instruction
post-fork in the child.
This is exactly *why* `os.fork()` is delegated to a
dedicated worker thread that has provably never entered
trio: we want that trio-free thread to be the surviving
one in the child.
That said, dead-thread *artifacts* still cross the fork
boundary (canonical "fork in a multithreaded program is
dangerous" — see `man pthread_atfork`). What persists, and
how we handle each:
- **Inherited file descriptors** the dead trio thread's
epoll fd, signal-wakeup-fd, eventfds, sockets, IPC
pipes, pytest's capture-fds, etc. are all still in the
child's fd table (kernel-level inheritance). Handled by
`_close_inherited_fds()` in the child prelude walks
`/proc/self/fd` and closes everything except stdio +
the channel pipe to the forkserver.
- **Memory image** trio's internal data structures
(scheduler, task queues, runner state) sit in COW
memory but nobody's executing them. Get GC'd /
overwritten when the child's fresh `trio.run()` boots.
- **Python thread state** handled automatically by
CPython. `PyOS_AfterFork_Child()` calls
`_PyThreadState_DeleteExceptCurrent()`, so dead
`PyThreadState` objects are cleaned and
`threading.enumerate()` returns just the surviving
thread.
- **User-level locks (`threading.Lock`)**
held-by-dead-thread state is the canonical fork hazard.
Not an issue in practice for tractor: trio doesn't hold
cross-thread locks across fork (its synchronization is
within the trio task system, which doesn't survive in
either direction). CPython's GIL is auto-reset by the
fork callback.
FYI: how this dodges the `trio.run()` × `fork()` hazards
--------------------------------------------------------
`os.fork()` is famously hostile to `trio` (see
python-trio/trio#1614 et al.) because trio owns several
classes of process-global state that all break across the
fork boundary in different ways. The forkserver-thread
design dodges each class explicitly:
- **Signal-wakeup-fd**: trio installs a wakeup-fd via
`signal.set_wakeup_fd()` on `trio.run()` startup so
signals can interrupt `epoll_wait`. The child inherits
this fd, but trio's runner that owns it is gone — so
any signal delivery in the child writes to a dead
reader. *Dodge*: the inherited wakeup-fd is closed by
`_close_inherited_fds()`, then the child's own
`trio.run()` installs a fresh one.
- **`epoll`/`kqueue` instance**: trio's I/O backend holds
one. Inherited as a dead fd; same fix as above.
- **Threadpool cache threads** (`trio.to_thread`): worker
threads with cached tstate. Don't exist in the child
(POSIX); cache state is meaningless garbage that gets
reset when the child's trio.run() initializes its own
thread cache.
- **Cancel scopes / nurseries / open `trio.Process` /
open sockets**: these are trio-runtime objects, not
kernel objects. The runtime that owns them is gone in
the child, so the Python objects exist as zombie data
in COW memory and get overwritten as the child runs.
Inherited *kernel* fds those objects wrapped (sockets,
proc pipes) are caught by `_close_inherited_fds()`.
- **`atexit` handlers**: trio doesn't register any that
would mis-fire post-fork; trio's lifetime-stack is
all `with`-block-scoped and dies with the runner.
- **Foreign-language I/O state** (libcurl, OpenSSL session
caches, etc.): out of scope same hazard as any
fork-without-exec; users layering those on top of
tractor need their own pthread_atfork handlers.
Net effect: for the runtime surface tractor controls
(trio + IPC layer + msgspec), the forkserver-thread
isolation + `_close_inherited_fds()` cleanup gives the
forked child a clean trio environment. Everything else
falls under the standard fork-without-exec disclaimer.
Status
------
@ -100,7 +292,7 @@ to know.
Full analysis + audit plan for when we can revisit is in
`ai/conc-anal/subint_forkserver_thread_constraints_on_pep684_issue.md`.
Intent: file a follow-up GH issue linked to #379 once
[jcrist/msgspec#563](https://github.com/jcrist/msgspec/issues/563)
[jcrist/msgspec#1026](https://github.com/jcrist/msgspec/issues/1026)
unblocks isolated-mode subints in tractor.
See also