Refine fork-survival docs + `EBADF` handling
Two cleanup tweaks in `_main_thread_forkserver`:

Doc, "what survives the fork?" section: expand the "non-calling
threads are gone in the child" claim with the precise
execution-vs-memory split that reconciles this module's prior
framing with trio's (canonical
[python-trio/trio#1614][trio-1614]) "leaked stacks" framing:

- execution-side: only the calling thread runs post-fork; all
  others never execute another instruction.
- memory-side: those non-running threads' stacks + per-thread heap
  structures are still COW-inherited as orphaned bytes, which is
  what trio means by "leaked".

Same POSIX reality, opposite sides; the table is extended to a 4-col
`thread | parent | child (executing) | child (memory)` layout to make
both views explicit. Also blank-line-padded the bulleted hazard
classes for cleaner markdown rendering.

[trio-1614]: https://github.com/python-trio/trio/issues/1614

Code, `_close_inherited_fds()` log noise: split the catch-all
`except OSError` into:

- `EBADF`: benign race where the dirfd that
  `os.listdir('/proc/self/fd')` itself opened ends up in
  `candidates`, then auto-closes before the loop reaches it. Demote
  to `log.debug()` and `continue`; the prior `log.exception` drowned
  the post-fork log channel with stack traces on every spawn.
- other errnos (EIO / EPERM / EINTR / ...) keep the loud
  `log.exception` surface, since those ARE genuinely unexpected.

(this patch was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code

subint_forkserver_backend
parent 5418f2dc3c
commit 8c730193f9
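The execution-side claim in the commit message (only the calling thread runs post-fork) can be checked with a small standalone sketch. This is not code from the commit: `count_threads_after_fork`, the bystander thread standing in for the parent's `trio.run()` thread, and the pipe used to report back are all illustrative assumptions. It needs a POSIX platform that provides `os.fork()`.

```python
import os
import threading
import warnings


def count_threads_after_fork() -> tuple[int, int]:
    """Fork from a worker thread; return (parent_count, child_count)."""
    stop = threading.Event()
    # Hypothetical stand-in for the parent's `trio.run()` thread.
    bystander = threading.Thread(target=stop.wait, daemon=True)
    bystander.start()

    result: dict[str, int] = {}

    def forking_worker() -> None:
        read_fd, write_fd = os.pipe()
        result['parent'] = threading.active_count()
        with warnings.catch_warnings():
            # Python 3.12+ warns when forking a multi-threaded process.
            warnings.simplefilter('ignore', DeprecationWarning)
            pid = os.fork()
        if pid == 0:
            # Child: only this (forking) thread is still runnable.
            try:
                os.close(read_fd)
                count = threading.active_count()
                os.write(write_fd, str(count).encode())
            finally:
                os._exit(0)  # skip atexit handlers inherited from the parent
        # Parent: collect the child's report, then reap it.
        os.close(write_fd)
        data = os.read(read_fd, 16)
        os.close(read_fd)
        os.waitpid(pid, 0)
        result['child'] = int(data)

    worker = threading.Thread(target=forking_worker)
    worker.start()
    worker.join()
    stop.set()
    bystander.join()
    return result['parent'], result['child']


if __name__ == '__main__':
    parent_count, child_count = count_threads_after_fork()
    print(f'parent threads: {parent_count}, child threads: {child_count}')
```

The parent sees at least three threads (main, bystander, worker) while the child reports one. That only demonstrates the execution-side view; the memory-side "leaked stack" pages are not observable from Python this way.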
@@ -38,6 +38,7 @@ Two empirical CPython properties drive the design:
 the forked child otherwise (`Fatal Python error: not main
 interpreter`). Full source-level walkthrough:
 `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`.
+
 2. **`os.fork()` from a regular `threading.Thread` attached to
 the *main* interpreter — i.e. a worker thread that has never
 entered a subint — works cleanly.** Empirically validated
@@ -86,9 +87,11 @@ costs:
 
 - **Sidecar lifecycle**: a second long-lived process per
 parent, with its own start/stop/health-check semantics.
+
 - **IPC overhead per spawn**: every actor-spawn round-trips
 an `mp` request message through a unix socket before any
 child code runs.
+
 - **State isolation by process boundary**: the sidecar can't
 share parent state at all — every spawn is a "cold" child
 re-importing modules from disk.
@@ -106,6 +109,7 @@ For the full variant-2 picture see
 1) we already get costs 1 + 2 collapsed; cost 3 will land
 when msgspec#1026 unblocks isolated-mode subints.
 
+
 What survives the fork? — POSIX semantics
 -----------------------------------------
 
@@ -113,33 +117,58 @@ A natural worry when forking from a parent that's running
 `trio.run()` on another thread: does that trio thread (and
 any other threads in the parent) keep running in the child?
 
-**No.** POSIX `fork()` only preserves the *calling* thread
-in the child. Every other thread in the parent — trio's
-runner thread, any `to_thread` cache threads, anything else
-— is gone the instant `fork()` returns in the child.
+**No** — but with a precise meaning that's worth pinning
+down, since the canonical trio framing
+([python-trio/trio#1614](https://github.com/python-trio/trio/issues/1614))
+puts it the opposite-sounding way:
+
+> If you use `fork()` in a process with multiple threads,
+> all the other thread stacks are just leaked: there's
+> nothing else you can reasonably do with them.
+
+Both statements describe the same POSIX reality from
+opposite sides:
+
+- **Execution-side ("gone")**: POSIX `fork()` only
+preserves the *calling* thread as a runnable thread in
+the child. Every other thread in the parent — trio's
+runner thread, any `to_thread` cache threads, anything
+else — never executes another instruction post-fork.
+
+- **Memory-side ("leaked")**: those non-running threads'
+*stacks* and per-thread heap structures are still
+COW-inherited into the child's address space. They
+persist as orphaned bytes with no owning thread, no
+scheduler entry, and no way for the child to clean
+them up — hence trio's word "leaked".
 
 Concretely, after the forkserver worker calls `os.fork()`:
 
-| thread                | parent    | child         |
-|-----------------------|-----------|---------------|
-| forkserver worker     | continues | sole survivor |
-| `trio.run()` thread   | continues | gone          |
-| any other thread      | continues | gone          |
+| thread              | parent    | child (executing) | child (memory)              |
+|---------------------|-----------|-------------------|-----------------------------|
+| forkserver worker   | continues | sole survivor     | live stack                  |
+| `trio.run()` thread | continues | not running       | leaked stack (zombie bytes) |
+| any other thread    | continues | not running       | leaked stack (zombie bytes) |
 
 The forkserver worker becomes the new "main" execution
 context in the child; `trio.run()` and every other parent
-thread never executes a single instruction post-fork in the
-child.
+thread never executes a single instruction post-fork.
+Their stack memory rides along as inert COW pages until
+the child's fresh `trio.run()` boots and overwrites/GCs
+it (or until the child `exec()`s and discards the entire
+image).
 
 This is exactly *why* `os.fork()` is delegated to a
 dedicated worker thread that has provably never entered
 trio: we want that trio-free thread to be the surviving
-one in the child.
+*executing* thread in the child, with the leaked trio
+stack reduced to inert COW pages we don't touch.
 
-That said, dead-thread *artifacts* still cross the fork
-boundary (canonical "fork in a multithreaded program is
-dangerous" — see `man pthread_atfork`). What persists, and
-how we handle each:
+The leaked-stack residue is one slice of the broader
+"fork in a multithreaded program is dangerous" hazard
+class (see `man pthread_atfork`). Other dead-thread
+artifacts that cross the fork boundary, and how we handle
+each:
 
 - **Inherited file descriptors** — the dead trio thread's
 epoll fd, signal-wakeup-fd, eventfds, sockets, IPC
@@ -148,16 +177,20 @@ how we handle each:
 `_close_inherited_fds()` in the child prelude — walks
 `/proc/self/fd` and closes everything except stdio +
 the channel pipe to the forkserver.
+
 - **Memory image** — trio's internal data structures
 (scheduler, task queues, runner state) sit in COW
-memory but nobody's executing them. Get GC'd /
-overwritten when the child's fresh `trio.run()` boots.
+memory alongside the leaked stacks above. Nobody's
+executing them; they get GC'd / overwritten when the
+child's fresh `trio.run()` boots.
+
 - **Python thread state** — handled automatically by
 CPython. `PyOS_AfterFork_Child()` calls
 `_PyThreadState_DeleteExceptCurrent()`, so dead
 `PyThreadState` objects are cleaned and
 `threading.enumerate()` returns just the surviving
 thread.
+
 - **User-level locks (`threading.Lock`)** —
 held-by-dead-thread state is the canonical fork hazard.
 Not an issue in practice for tractor: trio doesn't hold
@@ -166,6 +199,7 @@ how we handle each:
 either direction). CPython's GIL is auto-reset by the
 fork callback.
 
+
 FYI: how this dodges the `trio.run()` × `fork()` hazards
 --------------------------------------------------------
 
@@ -183,13 +217,16 @@ design dodges each class explicitly:
 reader. *Dodge*: the inherited wakeup-fd is closed by
 `_close_inherited_fds()`, then the child's own
 `trio.run()` installs a fresh one.
+
 - **`epoll`/`kqueue` instance**: trio's I/O backend holds
 one. Inherited as a dead fd; same fix as above.
+
 - **Threadpool cache threads** (`trio.to_thread`): worker
 threads with cached tstate. Don't exist in the child
 (POSIX); cache state is meaningless garbage that gets
 reset when the child's trio.run() initializes its own
 thread cache.
+
 - **Cancel scopes / nurseries / open `trio.Process` /
 open sockets**: these are trio-runtime objects, not
 kernel objects. The runtime that owns them is gone in
@@ -197,9 +234,11 @@ design dodges each class explicitly:
 in COW memory and get overwritten as the child runs.
 Inherited *kernel* fds those objects wrapped (sockets,
 proc pipes) are caught by `_close_inherited_fds()`.
+
 - **`atexit` handlers**: trio doesn't register any that
 would mis-fire post-fork; trio's lifetime-stack is
 all `with`-block-scoped and dies with the runner.
+
 - **Foreign-language I/O state** (libcurl, OpenSSL session
 caches, etc.): out of scope — same hazard as any
 fork-without-exec; users layering those on top of
@@ -211,6 +250,7 @@ isolation + `_close_inherited_fds()` cleanup gives the
 forked child a clean trio environment. Everything else
 falls under the standard fork-without-exec disclaimer.
 
+
 Implementation status
 ---------------------
 
@@ -231,10 +271,11 @@ follow-up) including the
 
 Still-open work (tracked on tractor #379):
 
-- no cancellation / hard-kill stress coverage yet
+- [ ] no cancellation / hard-kill stress coverage yet
 (counterpart to `tests/test_subint_cancellation.py` for
 the plain `subint` backend),
-- `child_sigint='trio'` mode (flag scaffolded below; default
+
+- [ ] `child_sigint='trio'` mode (flag scaffolded below; default
 is `'ipc'`). Originally intended as a manual SIGINT →
 trio-cancel bridge, but investigation showed trio's
 handler IS already correctly installed in the fork-child
@@ -287,18 +328,22 @@ See also
 - `tractor.spawn._subint_forkserver` — variant-2 placeholder
 module; reserved for the future subint-isolated-child
 runtime once jcrist/msgspec#1026 unblocks.
+
 - `tractor.spawn._subint_fork` — the stub for the
 fork-from-non-main-subint strategy that DIDN'T work (kept
 in-tree as documentation of the attempt + the CPython-level
 block).
+
 - `ai/conc-anal/subint_fork_blocked_by_cpython_post_fork_issue.md`
 — CPython source walkthrough of why fork-from-subint is dead.
+
 - `ai/conc-anal/subint_fork_from_main_thread_smoketest.py`
 — standalone feasibility check (delegates to this module
 for the primitives it exercises).
 
 '''
 from __future__ import annotations
+import errno
 import os
 import signal
 import sys
@@ -423,9 +468,24 @@ def _close_inherited_fds(
         try:
             os.close(fd)
             closed += 1
-        except OSError:
-            # fd was already closed (race with listdir) or otherwise
-            # unclosable — either is fine.
+        except OSError as oserr:
+            # `EBADF` is the benign-and-expected case: the
+            # `os.listdir('/proc/self/fd')` call above itself
+            # opens a transient dirfd that ends up in
+            # `candidates`, then auto-closes before this loop
+            # reaches it. Same for any fd whose Python wrapper
+            # was GC'd between `listdir` and `os.close`.
+            # Suppress at debug-level — surfacing every
+            # EBADF as a full traceback (prior `log.exception`
+            # behavior) drowned the post-fork log channel.
+            if oserr.errno == errno.EBADF:
+                log.debug(
+                    f'Skip already-closed inherited fd {fd!r} '
+                    f'(EBADF, benign race with listdir)\n'
+                )
+                continue
+            # Other errnos (EIO / EPERM / EINTR / ...) are
+            # genuinely unexpected — keep the loud surface.
             log.exception(
                 f'Failed to close inherited fd in child ??\n'
                 f'{fd!r}\n'
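The errno split in the last hunk can be exercised in isolation. The sketch below is an assumption-laden stand-in, not the module's real `_close_inherited_fds()`: `close_fds`, `demo`, and the returned `(closed, skipped)` pair are hypothetical names, and the "already closed" fd is simulated with a pipe end rather than the transient `/proc/self/fd` dirfd.

```python
import errno
import logging
import os

log = logging.getLogger(__name__)


def close_fds(candidates: list[int]) -> tuple[int, int]:
    """Close each fd; return (closed, skipped_as_ebadf)."""
    closed = 0
    skipped = 0
    for fd in candidates:
        try:
            os.close(fd)
            closed += 1
        except OSError as oserr:
            if oserr.errno == errno.EBADF:
                # Already closed before the loop reached it: benign,
                # so log quietly and move on.
                log.debug('Skip already-closed fd %r (EBADF)', fd)
                skipped += 1
                continue
            # Any other errno is unexpected; keep the full traceback.
            log.exception('Failed to close fd %r', fd)
    return closed, skipped


def demo() -> tuple[int, int]:
    read_fd, write_fd = os.pipe()
    # Simulate an fd that vanished between listing and closing.
    os.close(write_fd)
    return close_fds([read_fd, write_fd])


if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    print(demo())
```

Counting the skipped fds separately is a choice made for the demo only; the diff above just logs at debug level and continues.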