Since `tractor.ipc._mp_bs.disable_mantracker()` turns off
`mp.resource_tracker` entirely (see the conc-anal doc
`subint_forkserver_mp_shared_memory_issue.md`), a
hard-crashing actor can leave `/dev/shm/<key>` segments
that nothing else GCs. New `tractor-reap` phase 2 sweeps
them.
Deats,
- `tractor/_testing/_reap.py`: add `find_orphaned_shm()`
+ `reap_shm()` helpers. Match criteria: regular file
under `/dev/shm`, owned by current uid, AND no live
proc has it open (mmap'd or fd-held). In-use
enumeration via `psutil.Process.memory_maps()` +
`.open_files()` — xplatform, kernel-canonical (same
answer `lsof` would give), no reliance on
tractor-specific shm-key naming.
- `_ensure_shm_supported()` guard: helpers raise
`NotImplementedError` outside Linux/FreeBSD bc macOS
POSIX shm has no fs-visible path (`shm_open` only)
and Windows is a different story.
- `scripts/tractor-reap`: new `--shm` (run after
process reap) and `--shm-only` (skip process phase)
flags. `-n` dry-runs both phases. Exit code is `1`
if either phase had survivors/errors.
- `pyproject.toml` + `uv.lock`: add `psutil>=7.0.0` to
the `testing` dep group; lazy-imported in `_reap.py`
so the process-reap path stays import-clean without
it.
Also,
- doc `--shm` in `.claude/skills/run-tests/SKILL.md`
(new section 10c) — covers match criteria + the
preservation guarantee for unrelated apps.
- flip mitigation status in
`subint_forkserver_mp_shared_memory_issue.md` from
"could extend `tractor-reap`" to "implemented", with
a note that callers should still UUID-pin shm keys to
avoid cross-session collisions.
Verified locally vs 81 in-use segments held by `piker`,
`lttng-ust-*`, `aja-shm-*` — all preserved; only the
genuinely-orphaned tractor segments got unlinked.
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
New `scripts/tractor-reap` CLI wraps the
`_testing._reap` mod for manual zombie-subactor
cleanup after crashed pytest sessions. Two modes:
- orphan-mode (default): finds PPid==1 procs
with cwd matching repo root + `python` in
cmdline.
- descendant-mode (`--parent <pid>`): scoped
sweep under a still-live supervisor.
SC-polite: SIGINT with bounded grace window
(default 3s) before escalating to SIGKILL.
Exit code signals whether escalation was needed
(useful for CI health-checks).
Also, document both the auto-reap fixture and
the CLI in `/run-tests` SKILL.md (section 10).
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code