Document `SharedMemory` × `subint_forkserver` incompat

New `ai/conc-anal/` doc: `mp.SharedMemory` is
fork-without-exec unsafe — child inherits parent's
`resource_tracker` fd → EBADF on first shm op;
leaked `/shm_list` cascades `FileExistsError`
across parametrize variants. Canonical CPython
issue class, NOT a tractor bug. Includes two
longer-term mitigation paths (reset inherited
tracker fd vs migrate off `mp.shared_memory`).

Also, update `tests/test_shm.py`:
- comment out `subint_forkserver` from skip list
- rewrite reason with precise failure-mode
  descriptions + link to the analysis doc

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
subint_forkserver_backend
Gud Boi 2026-04-26 20:13:24 -04:00
parent 6d76b60404
commit c99d475d03
2 changed files with 134 additions and 5 deletions

ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md

@@ -0,0 +1,125 @@
# `subint_forkserver` × `multiprocessing.SharedMemory`: incompatible-by-mp-design
Surfaced by `tests/test_shm.py` under
`--spawn-backend=subint_forkserver`. Both test functions
fail with distinct symptoms that share one root cause:
**`multiprocessing.resource_tracker` is fork-without-exec
unsafe.**
## TL;DR
`mp.shared_memory.SharedMemory` registers each shm
allocation with the per-process
`multiprocessing.resource_tracker` singleton. The
tracker is a daemon process started lazily, and the
parent owns a unix-pipe-fd to it. When the parent
forks-without-execing into a `subint_forkserver`
child, the child inherits that fd — but the fd refers
to the *parent's* tracker, which the child has no
business writing to.
Two manifestations:
1. **`test_child_attaches_alot`** — child loops 1000×
`attach_shm_list()`. First `mp.SharedMemory` call
in the child triggers
`resource_tracker._ensure_running_and_write` →
`_teardown_dead_process` → `os.close(self._fd)` on
an fd the child should never have touched. Surfaces
as `OSError: [Errno 9] Bad file descriptor`
wrapped in `tractor.RemoteActorError`.
2. **`test_parent_writer_child_reader[*]`** — first
parametrize variant "passes" (with
`resource_tracker: leaked shared_memory` warning)
because nobody ever cleans up `/shm_list`.
Subsequent variants then fail with
`FileExistsError: '/shm_list'` because the leak
persists across the parametrize loop and forkserver
children can't `shm_open(create=True)` an existing
key. The trio backend doesn't surface this because
each subactor `exec`s a fresh interpreter →
independent resource tracker per subactor → no
inherited-fd issue, and the test's pre-existing
leak is masked by the per-process tracker reset.
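
Both manifestations can be approximated outside `tractor`. The sketch below is not the test suite's code. The first function uses a plain `os.pipe()` as a stand-in for the tracker pipe and assumes the forked child has already closed its inherited fds (the Mitigation section notes that `_child_target` calls `_close_inherited_fds()`), so the fd number cached before the fork is stale. The second uses a randomly named segment (the `shm_list_demo_*` key and both function names are hypothetical) to show why a leaked key makes the next `create=True` fail. POSIX only.

```python
import errno
import os
import secrets
from multiprocessing import shared_memory


def stale_fd_errno_after_fork() -> int:
    """Return the errno a forked child gets when writing to a closed fd."""
    tracker_r, tracker_w = os.pipe()  # stand-in for the tracker pipe
    report_r, report_w = os.pipe()    # child -> parent result channel
    pid = os.fork()
    if pid == 0:  # child: no exec, so every fd number above is inherited
        code = 0
        try:
            os.close(tracker_w)  # assumed: child prelude closes inherited fds
            try:
                os.write(tracker_w, b'PROBE\n')  # cached number is now stale
            except OSError as err:
                code = err.errno
            os.write(report_w, str(code).encode())
        finally:
            os._exit(0)  # skip parent-owned cleanup handlers
    os.close(report_w)
    reported = os.read(report_r, 16)
    os.waitpid(pid, 0)
    for fd in (report_r, tracker_r, tracker_w):
        os.close(fd)
    return int(reported)


def leaked_name_collides() -> bool:
    """True if re-creating a still-linked segment raises FileExistsError."""
    name = f'shm_list_demo_{secrets.token_hex(4)}'  # hypothetical key
    first = shared_memory.SharedMemory(name=name, create=True, size=64)
    try:
        try:
            shared_memory.SharedMemory(name=name, create=True, size=64)
        except FileExistsError:
            return True
        return False
    finally:
        first.close()
        first.unlink()  # without this the key would outlive the process


if __name__ == '__main__':
    print(errno.errorcode[stale_fd_errno_after_fork()])
    print(leaked_name_collides())
```

The second function unlinks its segment on the way out; skipping that `unlink()` is the miniature version of the leak that breaks the later parametrize variants.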
## Why trio backend works
Under `--spawn-backend=trio`, each subactor is born
via `python -m tractor._child` (full `execve`) →
fresh interpreter → fresh module-level globals →
`mp.resource_tracker._resource_tracker` is `None`
until first use → `mp.SharedMemory` constructs its
own tracker, talks to its own pipe-fd. No cross-
process fd inheritance.
Under `subint_forkserver`, the child is
`os.fork()`'d from a worker thread of the parent
(no `exec`) → inherits parent's
`mp.resource_tracker._resource_tracker._fd` →
EBADF / cross-talk on first `mp.SharedMemory`
operation in the child.
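
The fork-versus-exec difference can be observed with stock CPython, independent of `tractor`. This is a minimal sketch, assuming a POSIX host. `resource_tracker._resource_tracker._fd` is a private CPython attribute that may change between versions, and the helper names are hypothetical.

```python
import os
import subprocess
import sys
from multiprocessing import resource_tracker


def tracker_fd_in_forked_child() -> str:
    """repr() of the tracker fd as seen by an os.fork() child (no exec)."""
    report_r, report_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            # Module globals are copied by fork, including the singleton.
            seen = repr(resource_tracker._resource_tracker._fd)
            os.write(report_w, seen.encode())
        finally:
            os._exit(0)
    os.close(report_w)
    seen = os.read(report_r, 64).decode()
    os.close(report_r)
    os.waitpid(pid, 0)
    return seen


def tracker_fd_in_fresh_interpreter() -> str:
    """repr() of the tracker fd in a newly exec'd interpreter."""
    code = (
        'from multiprocessing import resource_tracker as rt; '
        'print(repr(rt._resource_tracker._fd))'
    )
    done = subprocess.run(
        [sys.executable, '-c', code],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()


def compare_tracker_state() -> tuple[int, str, str]:
    parent_fd = resource_tracker.getfd()  # lazily starts the parent's tracker
    return (
        parent_fd,
        tracker_fd_in_forked_child(),
        tracker_fd_in_fresh_interpreter(),
    )


if __name__ == '__main__':
    print(compare_tracker_state())
```

The forked child reports the same fd number the parent holds, while the exec'd interpreter starts with no tracker fd at all, which is the state `mp.SharedMemory` expects to find in a new process.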
## Status
**Not a tractor bug.** This is the canonical
"fork-without-exec breaks `multiprocessing`
internals" class — see CPython issues:
- https://bugs.python.org/issue38119
- https://bugs.python.org/issue45209

The pure-`fork` start method has the same incompatibility.
Fork-without-exec hazards of this kind are part of why
`mp` itself has defaulted to `spawn` on macOS since 3.8
and to `forkserver` on Linux (and other non-macOS POSIX
platforms) since 3.14.
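
To check which start method a given interpreter defaults to, the documented `multiprocessing.get_all_start_methods()` lists the default first. The wrapper name below is hypothetical; unlike `get_start_method()`, this call does not pin the global context as a side effect.

```python
import multiprocessing


def default_start_method() -> str:
    """Name of this interpreter's default multiprocessing start method."""
    # The docs state that the first entry of this list is the default.
    return multiprocessing.get_all_start_methods()[0]


if __name__ == '__main__':
    print(default_start_method())
```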
## Mitigation
`tests/test_shm.py` is module-marked with
`pytest.mark.skipon_spawn_backend('subint_forkserver',
'subint', reason=...)` pointing at this doc.
Two longer-term options if we ever want shm tests under
`subint_forkserver`:
1. **Reset the inherited tracker fd in the child
prelude** —
`tractor/spawn/_subint_forkserver.py::_child_target`
already calls `_close_inherited_fds()`. We could
additionally explicitly clear
`multiprocessing.resource_tracker._resource_tracker`
so the child re-creates a fresh tracker on first
shm op. **Caveat**: this means each
forkserver-subactor spawns its own resource-tracker
daemon-process, multiplying daemon-proc count by
subactor count. mp authors deliberately avoided
this — the tracker is meant to be a per-mp-context
singleton.
2. **Stop using `multiprocessing.shared_memory`**:
migrate to `posix_ipc` directly (no resource
tracker) or finish the `hotbaud`-based ringbuf
transport that already supersedes shm in many
`tractor` IPC paths.

Neither is in scope for the PR that lands the
`subint_forkserver` backend; both are tracked as
future work.
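
Option 1 might look roughly like the following. This is a sketch under stated assumptions, not what `_child_target` does today: `reset_inherited_resource_tracker` is a hypothetical helper, and `_fd` / `_pid` are private CPython attributes that may differ across versions. It resets the existing singleton in place rather than rebinding the module global, because the module-level `register`, `unregister` and `ensure_running` names are bound methods of the original instance.

```python
from multiprocessing import resource_tracker


def reset_inherited_resource_tracker(tracker=None) -> None:
    """Forget the parent's tracker so the next shm op launches a fresh one.

    Intended to run in the forked child only. It does not close the
    inherited fd, on the assumption that the child prelude has already
    closed inherited fds.
    """
    if tracker is None:
        tracker = resource_tracker._resource_tracker  # private singleton
    tracker._fd = None   # "not launched" state, same as a fresh interpreter
    tracker._pid = None  # the parent's tracker pid is not ours to wait on
```

With `_fd` back at `None`, the tracker's launch path should run on the first `mp.SharedMemory` use in the child, at the cost described in the caveat above: one extra tracker daemon per subactor.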
## Reproducer
```sh
# fail mode 1 (EBADF on resource_tracker._fd):
./py314/bin/python -m pytest \
tests/test_shm.py::test_child_attaches_alot \
--spawn-backend=subint_forkserver --tb=short
# fail mode 2 (FileExistsError on /shm_list):
./py314/bin/python -m pytest \
tests/test_shm.py::test_parent_writer_child_reader \
--spawn-backend=subint_forkserver
# baseline (passes):
./py314/bin/python -m pytest \
tests/test_shm.py --spawn-backend=trio
```

tests/test_shm.py

@@ -16,14 +16,18 @@ from tractor.ipc._shm import (
 pytestmark = pytest.mark.skipon_spawn_backend(
     'subint',
-    'subint_forkserver',
+    # 'subint_forkserver',
     reason=(
         'subint: GIL-contention hanging class.\n'
         'subint_forkserver: `multiprocessing.SharedMemory` '
-        'has known issues with fork-without-exec (mp\'s '
-        'resource_tracker and SharedMemory internals assume '
-        'fresh-process state). RemoteActorError surfaces from '
-        'the shm-attach path. TODO, put issue link!\n'
+        'is fork-without-exec unsafe — child inherits parent\'s '
+        '`resource_tracker` fd → EBADF on first shm op '
+        '(`test_child_attaches_alot`); leaked `/shm_list` from '
+        'a "passing" run cascades into `FileExistsError` across '
+        'parametrize variants (`test_parent_writer_child_reader`). '
+        'Canonical CPython issue class, NOT a tractor bug; full '
+        'tracker doc:\n'
+        'ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md'
     )
 )