From c99d475d0317f9e0fbbb4d51abc299c90e4d965a Mon Sep 17 00:00:00 2001
From: goodboy
Date: Sun, 26 Apr 2026 20:13:24 -0400
Subject: [PATCH] =?UTF-8?q?Document=20`SharedMemory`=20=C3=97=20`subint=5F?=
 =?UTF-8?q?forkserver`=20incompat?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New `ai/conc-anal/` doc: `mp.SharedMemory` is fork-without-exec
unsafe — child inherits parent's `resource_tracker` fd → EBADF on
first shm op; leaked `/shm_list` cascades `FileExistsError` across
parametrize variants. Canonical CPython issue class, NOT a tractor
bug.

Includes two longer-term mitigation paths (reset inherited tracker
fd vs migrate off `mp.shared_memory`).

Also, update `tests/test_shm.py`:
- comment out `subint_forkserver` from skip list
- rewrite reason with precise failure-mode descriptions + link to
  the analysis doc

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
---
 ...ubint_forkserver_mp_shared_memory_issue.md | 125 ++++++++++++++++++
 tests/test_shm.py                             |  14 +-
 2 files changed, 134 insertions(+), 5 deletions(-)
 create mode 100644 ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md

diff --git a/ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md b/ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md
new file mode 100644
index 00000000..8351aae0
--- /dev/null
+++ b/ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md
@@ -0,0 +1,125 @@
+# `subint_forkserver` × `multiprocessing.SharedMemory`: incompatible-by-mp-design
+
+Surfaced by `tests/test_shm.py` under
+`--spawn-backend=subint_forkserver`. Both test functions
+fail with distinct symptoms that share one root cause:
+**`multiprocessing.resource_tracker` is fork-without-exec
+unsafe.**
+
+## TL;DR
+
+`mp.shared_memory.SharedMemory` registers each shm
+allocation with the per-process
+`multiprocessing.resource_tracker` singleton. The
+tracker is a daemon process started lazily, and the
+parent owns a unix-pipe-fd to it. When the parent
+forks-without-execing into a `subint_forkserver`
+child, the child inherits that fd — but the fd refers
+to the *parent's* tracker, which the child has no
+business writing to.
+
+Two manifestations:
+
+1. **`test_child_attaches_alot`** — child loops 1000×
+   `attach_shm_list()`. First `mp.SharedMemory` call
+   in the child triggers
+   `resource_tracker._ensure_running_and_write` →
+   `_teardown_dead_process` → `os.close(self._fd)` on
+   an fd the child should never have touched. Surfaces
+   as `OSError: [Errno 9] Bad file descriptor`
+   wrapped in `tractor.RemoteActorError`.
+
+2. **`test_parent_writer_child_reader[*]`** — first
+   parametrize variant "passes" (with
+   `resource_tracker: leaked shared_memory` warning)
+   because nobody ever cleans up `/shm_list`.
+   Subsequent variants then fail with
+   `FileExistsError: '/shm_list'` because the leak
+   persists across the parametrize loop and forkserver
+   children can't `shm_open(create=True)` an existing
+   key. Trio backend doesn't surface this because
+   each subactor `exec`s a fresh interpreter →
+   independent resource tracker per subactor → no
+   inherited-fd issue, and the test's pre-existing
+   leak is masked by the per-process tracker reset.
+
+## Why trio backend works
+
+Under `--spawn-backend=trio`, each subactor is born
+via `python -m tractor._child` (full `execve`) →
+fresh interpreter → fresh module-level globals →
+`mp.resource_tracker._resource_tracker._fd` is `None`
+until first use → `mp.SharedMemory` constructs its
+own tracker, talks to its own pipe-fd. No cross-
+process fd inheritance.
+
+Under `subint_forkserver`, the child is
+`os.fork()`'d from a worker thread of the parent
+(no `exec`) → inherits parent's
+`mp.resource_tracker._resource_tracker._fd` →
+EBADF / cross-talk on first `mp.SharedMemory`
+operation in the child.
+
+## Status
+
+**Not a tractor bug.** This is the canonical
+"fork-without-exec breaks `multiprocessing`
+internals" class — see CPython issues:
+
+- https://bugs.python.org/issue38119
+- https://bugs.python.org/issue45209
+
+The pure-`fork` start method has the same incompatibility;
+that's why `mp` itself defaults to `spawn` on macOS
+and, as of Python 3.14, to `forkserver` on Linux.
+
+## Mitigation
+
+`tests/test_shm.py` is module-marked with
+`pytest.mark.skipon_spawn_backend('subint_forkserver',
+'subint', reason=...)` pointing at this doc.
+
+Two longer-term options if we ever want shm tests under
+`subint_forkserver`:
+
+1. **Reset the inherited tracker fd in the child
+   prelude** —
+   `tractor/spawn/_subint_forkserver.py::_child_target`
+   already calls `_close_inherited_fds()`. We could
+   additionally reset the `_fd`/`_pid` state of
+   `multiprocessing.resource_tracker._resource_tracker`
+   so the child re-creates a fresh tracker on first
+   shm op. **Caveat**: this means each
+   forkserver-subactor spawns its own resource-tracker
+   daemon-process, multiplying daemon-proc count by
+   subactor count. mp authors deliberately avoided
+   this — the tracker is meant to be a per-mp-context
+   singleton.
+
+2. **Stop using `multiprocessing.shared_memory`** —
+   migrate to `posix_ipc` directly (no resource
+   tracker) or finish the `hotbaud`-based ringbuf
+   transport that already supersedes shm in many
+   `tractor` IPC paths.
+
+Neither is in scope for the
+`subint_forkserver`-backend-lands PR; both are tracked
+separately as future work.
+
+## Reproducer
+
+```sh
+# fail mode 1 (EBADF on resource_tracker._fd):
+./py314/bin/python -m pytest \
+  tests/test_shm.py::test_child_attaches_alot \
+  --spawn-backend=subint_forkserver --tb=short
+
+# fail mode 2 (FileExistsError on /shm_list):
+./py314/bin/python -m pytest \
+  tests/test_shm.py::test_parent_writer_child_reader \
+  --spawn-backend=subint_forkserver
+
+# baseline (passes):
+./py314/bin/python -m pytest \
+  tests/test_shm.py --spawn-backend=trio
+```
diff --git a/tests/test_shm.py b/tests/test_shm.py
index 61bcdee2..8ea43457 100644
--- a/tests/test_shm.py
+++ b/tests/test_shm.py
@@ -16,14 +16,18 @@ from tractor.ipc._shm import (

 pytestmark = pytest.mark.skipon_spawn_backend(
     'subint',
-    'subint_forkserver',
+    # 'subint_forkserver',
     reason=(
         'subint: GIL-contention hanging class.\n'
         'subint_forkserver: `multiprocessing.SharedMemory` '
-        'has known issues with fork-without-exec (mp\'s '
-        'resource_tracker and SharedMemory internals assume '
-        'fresh-process state). RemoteActorError surfaces from '
-        'the shm-attach path. TODO, put issue link!\n'
+        'is fork-without-exec unsafe — child inherits parent\'s '
+        '`resource_tracker` fd → EBADF on first shm op '
+        '(`test_child_attaches_alot`); leaked `/shm_list` from '
+        'a "passing" run cascades into `FileExistsError` across '
+        'parametrize variants (`test_parent_writer_child_reader`). '
+        'Canonical CPython issue class, NOT a tractor bug; full '
+        'tracker doc:\n'
+        'ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md'
     )
 )
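
Mitigation option 1 from the analysis doc (resetting the inherited tracker state in the forked child) might look roughly like the sketch below. It is not part of this patch. The helper name `reset_inherited_resource_tracker` is hypothetical, and `_resource_tracker`, `_fd` and `_pid` are private CPython attributes that may change between releases. The sketch assumes the child prelude has already closed the inherited pipe fd, so it only forgets the stale values rather than closing anything.

```python
import multiprocessing.resource_tracker as rt


def reset_inherited_resource_tracker() -> None:
    """Hypothetical child-prelude hook: forget the parent's tracker.

    Assumption: the inherited pipe fd was already closed by the
    forkserver child prelude, so nothing is closed here.
    """
    # Module-level singleton (private API); `register`/`unregister`
    # are bound to this same instance, so its state is reset in
    # place instead of rebinding the module attribute.
    tracker = rt._resource_tracker
    # With `_fd` unset, the tracker should consider itself not yet
    # started and launch a fresh daemon on the next shm operation.
    tracker._fd = None
    tracker._pid = None
```

If wired in, such a hook would presumably run right after `_close_inherited_fds()` in the child. The caveat from the doc still applies: every forkserver subactor that touches shm would then start its own tracker daemon.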