Compare commits

..

3 Commits

Author SHA1 Message Date
Gud Boi 532a9834f3 Add posix-multithreaded-`fork()` explainer doc 2026-04-29 12:50:23 -04:00
Gud Boi 2917b74ba4 Add todo for running `test_debugger` suite on forkserver spawner 2026-04-29 12:49:36 -04:00
Gud Boi 2d4995e08d Route `stackscope` SIGUSR1 onto trio loop
Signal handlers fire in a non-trio stack frame; calling
`stackscope.extract(recurse_child_tasks=True)` from there
only walks the `<init>` task and misses everything inside
`async_main`'s nurseries — exactly the part you want to
see during a hang.

Fix: capture `trio.lowlevel.current_trio_token()` at
`enable_stack_on_sig()` time and stash it as a module-
level `_trio_token`. The SIGUSR1 handler then dispatches
the dump *onto* the trio loop via
`_trio_token.run_sync_soon(_safe_dump_task_tree)`, so
`stackscope.extract` runs from a real trio-task context
and walks the full nursery tree.

Late-binding: pytest's `pytest_configure` calls
`enable_stack_on_sig()` outside any `trio.run`, so token
capture there is a `RuntimeError` — left at `None`. The
runtime re-calls `enable_stack_on_sig()` from inside
`async_main` (subactor side) where the token IS
available, so subactors get the full-tree path.
`dump_tree_on_sig` falls back to a direct call when
`_trio_token is None` (parent process pre-trio.run, or
signal delivered after `trio.run` returns).

`_safe_dump_task_tree()` is a `run_sync_soon`-friendly
wrapper that swallows any exception from
`dump_task_tree()` — trio prints + crashes on uncaught
exceptions in scheduled callbacks; better to log + keep
the run alive so the user can re-trigger.

Other,
- emit `capture-bypass tee: <fpath>` line + `tail -f`
  hint in the rendered dump header so users know where
  to find the artifact even when stdio is captured.
- swap the inline `f'     |_{actor}'` line for a
  `_pformat.nest_from_op` rendering of `actor_repr`
  (matches the rest of the runtime's nested-op style).
- log lines on handler install + already-installed
  branches now note `(trio_token captured: <bool>)`
  so it's obvious from the log whether the full-tree
  path is wired.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-29 12:01:03 -04:00
3 changed files with 385 additions and 18 deletions

View File

@ -0,0 +1,281 @@
# `fork()` in a multi-threaded program — execution-side vs. memory-side of the same coin
A reference doc for readers who've encountered one of two
opposite-sounding framings of POSIX `fork()` semantics in a
multi-threaded program and are confused by the other.
This is a sibling to
`subint_fork_blocked_by_cpython_post_fork_issue.md` — that
doc covers a CPython-level refusal of fork-from-subint;
this one covers the more general POSIX layer, since
tractor's main-thread forkserver design rests on it.
## TL;DR
POSIX `fork()` only preserves the *calling* thread as a
runnable thread in the child — every other thread in the
parent simply never executes another instruction in the
child. trio's docs call this "leaked"; tractor's
`_main_thread_forkserver.py` docstring calls it "gone".
Both are correct: "gone" is the *execution* side (no
scheduler entry, no instructions retired), "leaked" is the
*memory* side (the dead threads' stacks and per-thread
heap structures still ride into the child's address space
as orphaned COW pages with no owner and no cleanup hook).
Same POSIX reality, two halves of the same coin.
## The two framings
[python-trio/trio#1614][trio-1614] (the canonical "trio +
fork" hazards thread) puts it this way:
> If you use `fork()` in a process with multiple threads,
> all the other thread stacks are just leaked: there's
> nothing else you can reasonably do with them.
`tractor.spawn._main_thread_forkserver`'s module docstring
(specifically the "What survives the fork? — POSIX
semantics" section) puts it this way:
> POSIX `fork()` only preserves the *calling* thread as a
> runnable thread in the child. Every other thread in the
> parent — trio's runner thread, any `to_thread` cache
> threads, anything else — never executes another
> instruction post-fork.
A reader bouncing between the two can be forgiven for
asking: well, *which* is it — leaked or gone?
The answer is "yes". They're describing the same POSIX
behavior from two different angles:
- trio is talking about the **bytes** the dead threads
leave behind — stacks, TLS slots, per-thread arena
metadata — and the fact that nothing in the child can
drive them forward, free them, or even safely walk
them. That's a memory leak in the strict sense: held
but unreachable.
- tractor is talking about the **execution** side
relevant to the forkserver design: which threads
retire instructions in the child? Exactly one — the
one that called `fork()`. Everything else, regardless
of the bytes left behind, is dead in a scheduler
sense.
Neither framing is wrong; they're just answering
different questions.
## POSIX `fork()` in a multi-threaded program — what actually happens
Per POSIX (and concretely on Linux glibc), the contract
of `fork()` in a multi-threaded process is:
1. The kernel creates a new process whose virtual
address space is a COW copy of the parent's. *All*
pages map across — code, heap, every thread's stack,
every malloc arena, every mmap region.
2. Of the parent's N threads, exactly **one** is
reified in the child as a runnable kernel task: the
thread that called `fork()`. The other N-1 threads
have *no* corresponding task in the child kernel. They
were never scheduled, never `clone()`d for the child,
never exist as runnable entities.
3. Their **memory artifacts** — pthread stacks, TLS,
`pthread_t` structures, glibc per-thread arena
bookkeeping — are still mapped in the child's address
space, because (1) duplicates *everything* page-wise.
They sit there as inert COW bytes.
4. The kernel does not clean those bytes up. There is no
"phantom-thread cleanup" pass post-fork. The kernel
doesn't know which mapped pages "belonged to" which
thread — at the kernel level mappings are
process-scoped, not thread-scoped.
5. The surviving thread (the caller of `fork()`) cannot
safely access those leaked bytes either. Any state
they encoded — held mutexes, in-flight syscalls,
half-updated invariants — is frozen at whatever
instant the parent's fork-syscall observed it. Some
of those mutexes may even still be locked from the
child's POV (the canonical "fork-in-multithreaded-
program-deadlocks" hazard; see `man pthread_atfork`).
So: from the kernel's PoV, the child has one thread.
From the address-space's PoV, the child has all the
parent's bytes — including the corpses of the N-1 dead
threads' stacks. Both true simultaneously.
## Why trio says "leaked"
trio's framing makes sense from the parent's
PoV, looking at *what those threads were doing*. In a
running `trio.run()` process you typically have:
- The trio runner thread itself — owns the `selectors`
epoll fd, the signal-wakeup-fd, the run-queue.
- Threadpool worker threads (`trio.to_thread`'s cache)
— blocked in `wait()` on the threadpool's work
condvar.
- Whatever other ad-hoc threads the application
started.
Each of those threads owns *real work-state*: epoll
registrations, file descriptors held in
soon-to-be-completed reads, half-released locks, posted
but unconsumed wakeups. After fork, that state is still
encoded in the child's memory. None of it is invalid in
a well-formed-bytes sense. It's just that:
- The thread that was driving it is gone.
- Nothing else in the child knows the layout well
enough to take over.
- Even if it did, the kernel objects backing the work
(epoll fd, signalfd) have separate post-fork
semantics that don't compose with userland trio
state.
So the bytes are *held* (they're in the child's
address space, they count against RSS, they survive
until something clobbers them), and they're
*unreachable* in any meaningful sense — no thread can
safely drive them forward. That is the textbook
definition of a leak.
trio's quote is reminding the user that `fork()` from a
multi-threaded process is a one-way memory hazard:
whatever those threads were doing, that work-state is
now garbage you happen to still be carrying.
## Why tractor says "gone"
tractor's `_main_thread_forkserver` framing is concerned
with a different question: *which thread executes in the
child, and is it safe?*
The forkserver design rests on POSIX's "calling thread
is the sole survivor" guarantee. We pick that calling
thread very deliberately: a dedicated worker that has
provably never entered trio. So the thread that *does*
run in the child is one whose locals, TLS, and stack
contain nothing trio-related. Trio's runner thread —
the one that owned the epoll fd and the run-queue — is
*gone* from the child in the execution sense. It will
never run another instruction. The fact that its stack
bytes still exist in the child's address space (the
"leaked" view) is irrelevant to the forkserver, because
nothing in the child reads or writes those pages.
So when the docstring says "Every other thread … is
gone the instant `fork()` returns in the child", it's
being precise about the surface that matters for the
backend: scheduler-level liveness. Nothing schedules
those threads ever again. Whether their bytes are
hanging around is a separate (and, for the design,
non-load-bearing) fact.
## Cross-table
The same tabular layout the `_main_thread_forkserver`
docstring uses, expanded with a fourth "what handles
it" column:
| thread | parent | child (executing) | child (memory) | what handles it |
|---------------------|-----------|-------------------|------------------------------|-----------------------------|
| forkserver worker | continues | sole survivor | live stack | runs the child's bootstrap |
| `trio.run()` thread | continues | not running | leaked stack (zombie bytes) | overwritten by child's fresh `trio.run()` |
| any other thread | continues | not running | leaked stack (zombie bytes) | overwritten / GC'd / clobbered by `exec()` if used |
The "child (executing)" column is the *execution* side
of the coin — what tractor cares about. The "child
(memory)" column is the *memory* side — what trio
cares about.
The "what handles it" column is the deliberate punchline
of the design: nothing has to handle the leaked bytes
*explicitly*. They get clobbered by ordinary forward
progress in the child:
- The fresh `trio.run()` the child boots up allocates
its own stack, scheduler, and run-queue, which over
time overlaps and overwrites the inherited zombie
pages.
- Python's GC walks live objects only; the dead-thread
Python frames aren't reachable from any
`PyThreadState`, so they get freed at the next
collection cycle.
- If the child eventually `exec()`s, the entire address
space is replaced and the leak vanishes.
## What this means for the forkserver design
The crucial point is that **the design doesn't and
*can't* prevent the leak**. There is no userland fix
for COW thread stacks. The kernel hands the child a
duplicated address space; that's what `fork()` *is*. No
amount of pre-fork hookery, `pthread_atfork()`
gymnastics, or post-fork cleanup can un-COW the dead
threads' pages without unmapping them, and unmapping
arbitrary regions of a duplicated address space is
neither portable nor safe.
What the design *does* ensure is the orthogonal
property: the survivor thread is one that doesn't need
any of that leaked state to function. Concretely:
- Survivor is the forkserver worker thread.
- That worker has provably never imported, called into,
or held any reference to `trio`. (Enforced by keeping
the worker's lifecycle entirely in
`_main_thread_forkserver.py` and never letting trio
task-state cross into it.)
- So the leaked pages — trio runner stack, threadpool
caches, etc. — are inert relative to the survivor.
No code path in the child references them.
- The child then boots its own fresh `trio.run()`,
which allocates new state in new pages. Over the
child's lifetime the COW'd zombie pages get
overwritten, GC'd, or (if the child eventually
`exec()`s) discarded wholesale.
The "leak" is real but inert. It costs RSS until
clobbered; it doesn't cost correctness. That's exactly
the property the forkserver pattern is built on, and
it's also why the design needs the "calling thread is
trio-free" precondition to be airtight: if the survivor
were a trio thread, it *would* try to drive the leaked
trio state, and the leak would no longer be inert.
## See also
- `tractor/spawn/_main_thread_forkserver.py` — module
docstring's "What survives the fork? — POSIX
semantics" section is the in-tree, code-adjacent
prose this doc expands on. The cross-table here is a
fourth-column expansion of the table there.
- [python-trio/trio#1614][trio-1614] — the trio issue
with the "leaked" framing, and the canonical thread
for trio + `fork()` hazards more broadly.
- [`subint_fork_blocked_by_cpython_post_fork_issue.md`](./subint_fork_blocked_by_cpython_post_fork_issue.md)
— sibling analysis covering CPython's *post-fork*
hooks (`PyOS_AfterFork_Child`,
`_PyInterpreterState_DeleteExceptMain`) and why
fork-from-non-main-subint is a CPython-level hard
refusal. Complementary axis: this doc is about POSIX
semantics; that doc is about the CPython runtime
layer that runs *after* POSIX `fork()` returns in
the child.
- `man pthread_atfork(3)` — canonical "fork in a
multithreaded process is dangerous" reference.
Especially the rationale section, which is the
closest thing to a normative statement of "the
surviving thread cannot safely use anything the dead
threads were touching."
- `man fork(2)` (Linux) — "Other than [the calling
thread], … no other threads are replicated …"
paragraph is the kernel-side statement of the
execution-side framing this doc opens with.
[trio-1614]: https://github.com/python-trio/trio/issues/1614

View File

@ -65,9 +65,18 @@ def spawn(
run an `./examples/..` script by name. run an `./examples/..` script by name.
''' '''
if start_method != 'trio': supported_spawners: set[str] = {
'trio',
# ?TODO, other spawners that will work?
# - [ ] need to pass `start_method={spawner}` to underlying
# `examples/debugging/<script>.py` somehow?
# 'main_thread_forkserver',
# 'subint_forkserver',
}
if start_method not in supported_spawners:
pytest.skip( pytest.skip(
'`pexpect` based tests only supported on `trio` backend' f'`pexpect` based tests NOT supported on spawning-backend: {start_method!r}\n'
f'supported-spawners: {supported_spawners!r}'
) )
def unset_colors(): def unset_colors():
@ -148,21 +157,38 @@ def spawn(
def ctlc( def ctlc(
request: pytest.FixtureRequest, request: pytest.FixtureRequest,
ci_env: bool, ci_env: bool,
start_method: str,
) -> bool: ) -> bool:
'''
Parametrize and optionally skip tests which handle
ctlc-in-`pdbp`-REPL testing scenarios; certain spawners and actor-tree depths
cope very poorly with this..
In particular the spawning backends from `multiprocessing` are
fragile, as can be the default `trio` spawner under certain
conditions where SIGINT is relayed down the entire subproc tree.
'''
use_ctlc: bool = request.param use_ctlc: bool = request.param
node = request.node node = request.node
markers = node.own_markers markers = node.own_markers
for mark in markers: for mark in markers:
if mark.name == 'has_nested_actors': if (
mark.name == 'has_nested_actors'
and
start_method not in {
# TODO, any spawners we should try again?
# - [ ] 'trio' but WITHOUT the SIGINT handler setup
# per subproc?
# 'main_thread_forkserver',
}
):
pytest.skip( pytest.skip(
f'Test {node} has nested actors and fails with Ctrl-C.\n' f'Test {node} has nested actors and fails with Ctrl-C.\n'
f'The test can sometimes run fine locally but until' f'The test can sometimes run fine locally but until'
' we solve' 'this issue this CI test will be xfail:\n' ' we solve' 'this issue this CI test will be xfail:\n'
'https://github.com/goodboy/tractor/issues/320' 'https://github.com/goodboy/tractor/issues/320'
) )
if ( if (
mark.name == 'ctlcs_bish' mark.name == 'ctlcs_bish'
and and

View File

@ -47,7 +47,9 @@ from typing import (
import trio import trio
from tractor.runtime import _state from tractor.runtime import _state
from tractor import log as logmod from tractor import log as logmod
from tractor.devx import debug from tractor.devx import (
debug,
)
log = logmod.get_logger() log = logmod.get_logger()
@ -109,16 +111,29 @@ def dump_task_tree() -> None:
# |_{Supervisor/Scope # |_{Supervisor/Scope
# |_[Storage/Memory/IPC-Stream/Data-Struct # |_[Storage/Memory/IPC-Stream/Data-Struct
fpath: str = f'/tmp/tractor-stackscope-{os.getpid()}.log'
from . import _pformat
actor_repr: str = _pformat.nest_from_op(
input_op='|_',
text=f'{actor}',
nest_prefilx='|_',
nest_indent=3,
)
full_dump: str = ( full_dump: str = (
f'Dumping `stackscope` tree for actor\n' f'Dumping `stackscope` tree for actor\n'
f'(>: {actor.uid!r}\n' f'(>: {actor.uid!r}\n'
f' |_{mp.current_process()}\n' f' |_{mp.current_process()}\n'
f' |_{thr}\n' f' |_{thr}\n'
f' |_{actor}\n' # TODO, use the nest_from_op
f'{actor_repr}'
# f' |_{actor}'
f'\n' f'\n'
f'{sigint_handler_report}\n' f'{sigint_handler_report}\n'
f'signal.getsignal(SIGINT) -> {current_sigint_handler!r}\n' f'signal.getsignal(SIGINT) -> {current_sigint_handler!r}\n'
f'\n' f'\n'
f'capture-bypass tee: {fpath}\n'
f'(`tail -f {fpath}` to follow across signals)\n'
f'\n'
f'------ start-of-{actor.uid!r} ------\n' f'------ start-of-{actor.uid!r} ------\n'
f'|\n' f'|\n'
f'{tree_str}' f'{tree_str}'
@ -131,7 +146,6 @@ def dump_task_tree() -> None:
# `--capture=fd` swallows `log.devx()` above; the # `--capture=fd` swallows `log.devx()` above; the
# following two writes guarantee the dump reaches the # following two writes guarantee the dump reaches the
# human even when stdio is captured. # human even when stdio is captured.
fpath: str = f'/tmp/tractor-stackscope-{os.getpid()}.log'
try: try:
with open(fpath, 'a') as f: with open(fpath, 'a') as f:
f.write(full_dump + '\n') f.write(full_dump + '\n')
@ -151,6 +165,34 @@ def dump_task_tree() -> None:
_handler_lock = RLock() _handler_lock = RLock()
_tree_dumped: bool = False _tree_dumped: bool = False
# Captured at `enable_stack_on_sig()` time when running
# inside a trio task. `dump_tree_on_sig` uses this to
# schedule `dump_task_tree` ON the trio loop via
# `token.run_sync_soon` so stackscope sees a real current
# task and can recurse into nursery children. Without
# it (signal handler running in a non-trio stack frame),
# `stackscope.extract` only walks the `<init>` task and
# misses everything inside `async_main`'s nurseries.
_trio_token: trio.lowlevel.TrioToken|None = None
def _safe_dump_task_tree() -> None:
'''
`run_sync_soon`-friendly wrapper that swallows any
exception from `dump_task_tree`. Trio prints
+ crashes on uncaught exceptions in scheduled
callbacks; we'd rather log + keep the test running so
the user can re-trigger the dump.
'''
try:
dump_task_tree()
except BaseException:
log.exception(
'`dump_task_tree()` raised (scheduled via '
'`run_sync_soon`); continuing.\n'
)
def dump_tree_on_sig( def dump_tree_on_sig(
sig: int, sig: int,
@ -174,16 +216,17 @@ def dump_tree_on_sig(
'Trying to dump `stackscope` tree..\n' 'Trying to dump `stackscope` tree..\n'
) )
try: try:
dump_task_tree() # Prefer scheduling on the trio loop — runs the
# await actor._service_n.start_soon( # dump from a real trio-task context so
# partial( # `stackscope.extract(recurse_child_tasks=True)`
# trio.to_thread.run_sync, # walks every nursery child instead of seeing
# dump_task_tree, # only the `<init>` task. Falls back to a direct
# ) # call when no token was captured (e.g. signal
# ) # delivered outside a trio.run).
# trio.lowlevel.current_trio_token().run_sync_soon( if _trio_token is not None:
# dump_task_tree _trio_token.run_sync_soon(_safe_dump_task_tree)
# ) else:
dump_task_tree()
except RuntimeError: except RuntimeError:
log.exception( log.exception(
@ -269,11 +312,27 @@ def enable_stack_on_sig(
) )
return None return None
# Capture the trio token if we're inside `trio.run`
# so SIGUSR1 dispatches the dump *onto* the trio loop
# (full task-tree visibility). When called outside trio
# (e.g. from `pytest_configure`), token capture fails
# silently and `dump_tree_on_sig` falls back to the
# direct-call path.
global _trio_token
try:
_trio_token = trio.lowlevel.current_trio_token()
except RuntimeError:
# not in a `trio.run` — leave None; runtime can
# re-call `enable_stack_on_sig()` later from
# inside `async_main` to capture it.
_trio_token = None
handler: Callable|int = getsignal(sig) handler: Callable|int = getsignal(sig)
if handler is dump_tree_on_sig: if handler is dump_tree_on_sig:
log.devx( log.devx(
'A `SIGUSR1` handler already exists?\n' 'A `SIGUSR1` handler already exists?\n'
f'|_ {handler!r}\n' f'|_ {handler!r}\n'
f'(trio_token captured: {_trio_token is not None})\n'
) )
return return
@ -287,5 +346,6 @@ def enable_stack_on_sig(
f'{stackscope!r}\n\n' f'{stackscope!r}\n\n'
f'With `SIGUSR1` handler\n' f'With `SIGUSR1` handler\n'
f'|_{dump_tree_on_sig}\n' f'|_{dump_tree_on_sig}\n'
f'(trio_token captured: {_trio_token is not None})\n'
) )
return stackscope return stackscope