Add `--shm` orphan sweep to `tractor-reap`

Since `tractor.ipc._mp_bs.disable_mantracker()` turns off
`mp.resource_tracker` entirely (see the conc-anal doc
`subint_forkserver_mp_shared_memory_issue.md`), a hard-crashing actor
can leave `/dev/shm/<key>` segments that nothing else GCs. New
`tractor-reap` phase 2 sweeps them.

Deats,
- `tractor/_testing/_reap.py`: add `find_orphaned_shm()` + `reap_shm()`
  helpers. Match criteria: regular file under `/dev/shm`, owned by
  current uid, AND no live proc has it open (mmap'd or fd-held).
  In-use enumeration via `psutil.Process.memory_maps()` +
  `.open_files()`: xplatform, kernel-canonical (same answer `lsof`
  would give), no reliance on tractor-specific shm-key naming.
- `_ensure_shm_supported()` guard: helpers raise `NotImplementedError`
  outside Linux/FreeBSD bc macOS POSIX shm has no fs-visible path
  (`shm_open` only) and Windows is a different story.
- `scripts/tractor-reap`: new `--shm` (run after process reap) and
  `--shm-only` (skip process phase) flags. `-n` dry-runs both phases.
  Exit code is `1` if either phase had survivors/errors.
- `pyproject.toml` + `uv.lock`: add `psutil>=7.0.0` to the `testing`
  dep group; lazy-imported in `_reap.py` so the process-reap path stays
  import-clean without it.

Also,
- doc `--shm` in `.claude/skills/run-tests/SKILL.md` (new section 10c);
  covers match criteria + the preservation guarantee for unrelated
  apps.
- flip mitigation status in
  `subint_forkserver_mp_shared_memory_issue.md` from "could extend
  `tractor-reap`" to "implemented", with a note that callers should
  still UUID-pin shm keys to avoid cross-session collisions.

Verified locally vs 81 in-use segments held by `piker`, `lttng-ust-*`,
`aja-shm-*`: all preserved; only the genuinely-orphaned tractor
segments got unlinked.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

subint_forkserver_backend
parent aa3e230926
commit 4f12d69b41
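
The match criteria from the commit message (regular file, owned by the given uid, not held open by any live process) can be sketched roughly as below. This is a minimal illustration only, not the patch's code: `orphan_candidates()` and `demo()` are hypothetical names, and the in-use set is passed in as a plain argument rather than being enumerated via `psutil`, so the filter can be tried against a scratch directory instead of the real `/dev/shm`.

```python
import os
import stat
import tempfile


def orphan_candidates(
    shm_dir: str,
    uid: int,
    in_use: set[str],
) -> list[str]:
    '''
    Hypothetical stand-in for the sweep's filter step: return
    paths under `shm_dir` that are regular files, owned by
    `uid`, and absent from the caller-supplied `in_use` set.

    '''
    prefix: str = shm_dir.rstrip('/') + '/'
    leaked: list[str] = []
    for entry in sorted(os.listdir(shm_dir)):
        path: str = prefix + entry
        try:
            st = os.stat(path)
        except OSError:
            # raced: entry vanished between listdir and stat
            continue
        if not stat.S_ISREG(st.st_mode):
            continue  # skip subdirs, sockets, etc.
        if st.st_uid != uid:
            continue  # someone else's segment
        if path in in_use:
            continue  # still mapped or fd-held
        leaked.append(path)
    return leaked


def demo() -> list[str]:
    # Assumption: a temp dir stands in for `/dev/shm`, and the
    # "held" file is simply declared in-use by the caller.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ('held', 'leaked'):
            with open(os.path.join(tmp, name), 'wb') as f:
                f.write(b'\0')
        os.mkdir(os.path.join(tmp, 'subdir'))
        uid: int = os.stat(tmp).st_uid
        in_use: set[str] = {tmp.rstrip('/') + '/held'}
        found = orphan_candidates(tmp, uid, in_use)
        return [os.path.basename(p) for p in found]
```

Keeping the in-use set as an input means any enumeration error can only shrink that set's accuracy toward "preserve more", which matches the safe-by-default stance the patch describes for skipped processes.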
|
|
@ -585,3 +585,41 @@ to force-reap under a still-live supervisor.
|
||||||
active in another terminal. It's safe (won't touch
|
active in another terminal. It's safe (won't touch
|
||||||
that session's live children in orphan-mode) but can
|
that session's live children in orphan-mode) but can
|
||||||
race if the target session is mid-teardown.
|
race if the target session is mid-teardown.
|
||||||
|
|
||||||
|
### c) `--shm` / `--shm-only`: orphan-segment sweep
|
||||||
|
|
||||||
|
Because `tractor.ipc._mp_bs.disable_mantracker()`
|
||||||
|
turns off `mp.resource_tracker` (see
|
||||||
|
`ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`),
|
||||||
|
a hard-crashing actor can leave `/dev/shm/<key>`
|
||||||
|
segments behind that nothing else GCs.
|
||||||
|
|
||||||
|
```sh
|
||||||
|
# process reap THEN shm sweep
|
||||||
|
scripts/tractor-reap --shm
|
||||||
|
|
||||||
|
# shm sweep only (skip process phase)
|
||||||
|
scripts/tractor-reap --shm-only
|
||||||
|
|
||||||
|
# dry-run: list candidates, don't unlink
|
||||||
|
scripts/tractor-reap --shm -n
|
||||||
|
```
|
||||||
|
|
||||||
|
**Match criteria** (very conservative — this is a
|
||||||
|
shared-system path, can't be wrong):
|
||||||
|
- segment is a regular file under `/dev/shm`,
|
||||||
|
- owned by the **current uid** (`stat.st_uid`),
|
||||||
|
- AND **no live process holds it open** —
|
||||||
|
enumerated by walking every readable
|
||||||
|
`/proc/<pid>/maps` (post-mmap mappings) AND
|
||||||
|
`/proc/<pid>/fd/*` (pre-mmap shm-opened fds).
|
||||||
|
|
||||||
|
The "nobody has it open" check is the
|
||||||
|
kernel-canonical "is this leaked?" test — same
|
||||||
|
answer `lsof /dev/shm/<key>` would give. No
|
||||||
|
reliance on tractor-specific naming, so it works
|
||||||
|
for any tractor app. Critically, it WILL NOT touch
|
||||||
|
segments held by other apps you have running
|
||||||
|
(e.g. `piker`, `lttng-ust-*`, `aja-shm-*` —
|
||||||
|
verified locally with 81 in-use segments correctly
|
||||||
|
preserved).
|
||||||
|
|
|
||||||
|
|
@ -132,14 +132,20 @@ segment (legitimate race in shared-key setups).
|
||||||
|
|
||||||
- **Crash-leaked segments.** If an actor segfaults
|
- **Crash-leaked segments.** If an actor segfaults
|
||||||
or is `SIGKILL`'d before its lifetime stack runs,
|
or is `SIGKILL`'d before its lifetime stack runs,
|
||||||
`/dev/shm/<key>` will leak. Mitigations:
|
`/dev/shm/<key>` will leak. Mitigation:
|
||||||
- `tractor-reap` (the new
|
`scripts/tractor-reap --shm` walks `/dev/shm`,
|
||||||
`scripts/tractor-reap` CLI) doesn't yet sweep
|
filters to segments owned by the current uid that
|
||||||
`/dev/shm` — could extend it.
|
no live process is mapping or holding open (via
|
||||||
- Higher-level apps using shm should pin a UUID
|
`/proc/*/maps` + `/proc/*/fd/*`), and unlinks
|
||||||
into the key (the `'shml_<uuid>'` pattern in
|
them. The "nobody-has-it-open" filter is
|
||||||
`test_child_attaches_alot`) so leaks are
|
kernel-canonical so it never touches in-flight
|
||||||
distinct per session and easy to GC out-of-band.
|
segments held by sibling apps (verified locally
|
||||||
|
against 81 piker/lttng/aja-held segments — all
|
||||||
|
preserved).
|
||||||
|
- Higher-level apps using shm should still pin a
|
||||||
|
UUID into the key (the `'shml_<uuid>'` pattern
|
||||||
|
in `test_child_attaches_alot`) so concurrent
|
||||||
|
sessions don't collide on the same key.
|
||||||
- **Cross-actor unlink races.** Two actors holding
|
- **Cross-actor unlink races.** Two actors holding
|
||||||
the same shm key racing on `unlink()` — handled
|
the same shm key racing on `unlink()` — handled
|
||||||
by the `FileNotFoundError` swallow.
|
by the `FileNotFoundError` swallow.
|
||||||
|
|
|
||||||
|
|
@ -84,6 +84,11 @@ testing = [
|
||||||
# known-hanging `subint`-backend audit tests; see
|
# known-hanging `subint`-backend audit tests; see
|
||||||
# `ai/conc-anal/subint_*_issue.md`).
|
# `ai/conc-anal/subint_*_issue.md`).
|
||||||
"pytest-timeout>=2.3",
|
"pytest-timeout>=2.3",
|
||||||
|
# used by `tractor._testing._reap` for the
|
||||||
|
# `tractor-reap` zombie-subactor + leaked-shm
|
||||||
|
# cleanup utility (xplatform `Process.memory_maps`,
|
||||||
|
# `Process.open_files`).
|
||||||
|
"psutil>=7.0.0",
|
||||||
]
|
]
|
||||||
repl = [
|
repl = [
|
||||||
"pyperclip>=1.9.0",
|
"pyperclip>=1.9.0",
|
||||||
|
|
|
||||||
|
|
@ -4,14 +4,26 @@
|
||||||
#
|
#
|
||||||
# SPDX-License-Identifier: AGPL-3.0-or-later
|
# SPDX-License-Identifier: AGPL-3.0-or-later
|
||||||
'''
|
'''
|
||||||
`tractor-reap` — SC-polite zombie-subactor reaper.
|
`tractor-reap` — SC-polite zombie-subactor reaper +
|
||||||
|
optional `/dev/shm/` orphan-segment sweep.
|
||||||
|
|
||||||
Finds `tractor` subactor processes left alive after a
|
Two cleanup phases (run in order when both are enabled):
|
||||||
`pytest` (or any tractor-app) run that failed to fully
|
|
||||||
cancel its actor tree, then sends SIGINT with a bounded
|
|
||||||
grace window before escalating to SIGKILL.
|
|
||||||
|
|
||||||
Detection modes (auto-selected):
|
1. **process reap** — finds `tractor` subactor processes
|
||||||
|
left alive after a `pytest` (or any tractor-app) run
|
||||||
|
that failed to fully cancel its actor tree, then sends
|
||||||
|
SIGINT with a bounded grace window before escalating
|
||||||
|
to SIGKILL.
|
||||||
|
|
||||||
|
2. **shm sweep** (`--shm` / `--shm-only`) — unlinks
|
||||||
|
`/dev/shm/<file>` entries owned by the current uid
|
||||||
|
that no live process has open (mmap'd or fd-held).
|
||||||
|
Needed because `tractor` disables
|
||||||
|
`mp.resource_tracker` (see `tractor.ipc._mp_bs`), so a
|
||||||
|
hard-crashing actor leaves leaked segments that
|
||||||
|
nothing else GCs.
|
||||||
|
|
||||||
|
Process-reap detection modes (auto-selected):
|
||||||
|
|
||||||
--parent <pid> : descendant-mode — kill procs whose
|
--parent <pid> : descendant-mode — kill procs whose
|
||||||
PPid == <pid>. Use when a parent
|
PPid == <pid>. Use when a parent
|
||||||
|
|
@ -29,14 +41,21 @@ Detection modes (auto-selected):
|
||||||
|
|
||||||
Usage:
|
Usage:
|
||||||
|
|
||||||
# after a pytest run crashed/was Ctrl+C'd
|
# process reap only (default)
|
||||||
scripts/tractor-reap
|
scripts/tractor-reap
|
||||||
|
|
||||||
|
# process reap + shm sweep
|
||||||
|
scripts/tractor-reap --shm
|
||||||
|
|
||||||
|
# only the shm sweep, skip process reap
|
||||||
|
scripts/tractor-reap --shm-only
|
||||||
|
|
||||||
# from inside a still-live supervisor
|
# from inside a still-live supervisor
|
||||||
scripts/tractor-reap --parent 12345
|
scripts/tractor-reap --parent 12345
|
||||||
|
|
||||||
# dry-run: list what would be reaped, don't signal
|
# dry-run: list what would be reaped, don't act
|
||||||
scripts/tractor-reap -n
|
scripts/tractor-reap -n
|
||||||
|
scripts/tractor-reap --shm -n
|
||||||
|
|
||||||
'''
|
'''
|
||||||
import argparse
|
import argparse
|
||||||
|
|
@ -83,7 +102,21 @@ def main() -> int:
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
'--dry-run', '-n',
|
'--dry-run', '-n',
|
||||||
action='store_true',
|
action='store_true',
|
||||||
help='list matched pids but do not signal',
|
help='list matched pids/paths but do not signal/unlink',
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
'--shm',
|
||||||
|
action='store_true',
|
||||||
|
help=(
|
||||||
|
'after process reap, also unlink orphaned '
|
||||||
|
'/dev/shm segments owned by the current user '
|
||||||
|
'that no live process is mapping or holding open'
|
||||||
|
),
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
'--shm-only',
|
||||||
|
action='store_true',
|
||||||
|
help='skip process reap; only do the shm sweep',
|
||||||
)
|
)
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
|
@ -95,29 +128,54 @@ def main() -> int:
|
||||||
from tractor._testing._reap import (
|
from tractor._testing._reap import (
|
||||||
find_descendants,
|
find_descendants,
|
||||||
find_orphans,
|
find_orphans,
|
||||||
|
find_orphaned_shm,
|
||||||
reap,
|
reap,
|
||||||
|
reap_shm,
|
||||||
)
|
)
|
||||||
|
|
||||||
if args.parent is not None:
|
rc: int = 0
|
||||||
pids: list[int] = find_descendants(args.parent)
|
|
||||||
mode: str = f'descendants of PPid={args.parent}'
|
|
||||||
else:
|
|
||||||
pids = find_orphans(repo)
|
|
||||||
mode = f'orphans (PPid=1, cwd={repo})'
|
|
||||||
|
|
||||||
if not pids:
|
# --- phase 1: process reap (skipped under --shm-only) ---
|
||||||
print(f'[tractor-reap] no {mode} to reap')
|
if not args.shm_only:
|
||||||
return 0
|
if args.parent is not None:
|
||||||
|
pids: list[int] = find_descendants(args.parent)
|
||||||
|
mode: str = f'descendants of PPid={args.parent}'
|
||||||
|
else:
|
||||||
|
pids = find_orphans(repo)
|
||||||
|
mode = f'orphans (PPid=1, cwd={repo})'
|
||||||
|
|
||||||
if args.dry_run:
|
if not pids:
|
||||||
print(f'[tractor-reap] dry-run — {mode}:\n {pids}')
|
print(f'[tractor-reap] no {mode} to reap')
|
||||||
return 0
|
elif args.dry_run:
|
||||||
|
print(
|
||||||
|
f'[tractor-reap] dry-run — {mode}:\n {pids}'
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
_, survivors = reap(pids, grace=args.grace)
|
||||||
|
if survivors:
|
||||||
|
rc = 1
|
||||||
|
|
||||||
signalled, survivors = reap(pids, grace=args.grace)
|
# --- phase 2: shm sweep (opt-in) ---
|
||||||
# exit 0 if everyone exited cleanly, else 1 to signal
|
if args.shm or args.shm_only:
|
||||||
# escalation happened — makes the command useful in
|
leaked: list[str] = find_orphaned_shm()
|
||||||
# CI health-checks and `||`-chaining.
|
if not leaked:
|
||||||
return 0 if not survivors else 1
|
print(
|
||||||
|
'[tractor-reap] no orphaned /dev/shm '
|
||||||
|
'segments to sweep'
|
||||||
|
)
|
||||||
|
elif args.dry_run:
|
||||||
|
print(
|
||||||
|
f'[tractor-reap] dry-run — {len(leaked)} '
|
||||||
|
f'orphaned shm segment(s):\n {leaked}'
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
_, errors = reap_shm(leaked)
|
||||||
|
if errors:
|
||||||
|
rc = 1
|
||||||
|
|
||||||
|
# exit 0 if everything cleaned cleanly, else 1 — useful
|
||||||
|
# for CI health-check chaining.
|
||||||
|
return rc
|
||||||
|
|
||||||
|
|
||||||
if __name__ == '__main__':
|
if __name__ == '__main__':
|
||||||
|
|
|
||||||
|
|
@ -16,17 +16,25 @@
|
||||||
|
|
||||||
'''
|
'''
|
||||||
Zombie-subactor reaper — SC-polite (SIGINT first, SIGKILL
|
Zombie-subactor reaper — SC-polite (SIGINT first, SIGKILL
|
||||||
as last resort with a bounded grace window).
|
as last resort with a bounded grace window) plus optional
|
||||||
|
`/dev/shm/` orphan-segment sweep.
|
||||||
|
|
||||||
Shared implementation between the `tractor-reap` CLI
|
Shared implementation between the `tractor-reap` CLI
|
||||||
(`scripts/tractor-reap`) and the pytest session-scoped
|
(`scripts/tractor-reap`) and the pytest session-scoped
|
||||||
auto-fixture that guards the test suite against leftover
|
auto-fixture that guards the test suite against leftover
|
||||||
subactor processes.
|
subactor processes.
|
||||||
|
|
||||||
Design notes
|
Design notes — process reap
|
||||||
------------
|
---------------------------
|
||||||
|
|
||||||
|
- Linux-only today: reads `/proc/<pid>/{status,cwd,cmdline}`.
|
||||||
|
Module imports cleanly elsewhere; calling `find_*` on a
|
||||||
|
non-Linux box returns an empty list (no `/proc`
|
||||||
|
enumeration). A future xplatform pass could swap this
|
||||||
|
for `psutil.Process.children()` /
|
||||||
|
`psutil.process_iter()` since `psutil` is already a
|
||||||
|
test-time dependency.
|
||||||
|
|
||||||
- Linux-only: reads `/proc/<pid>/{status,cwd,cmdline}`.
|
|
||||||
- Two detection modes:
|
- Two detection modes:
|
||||||
|
|
||||||
1. **descendant-mode** — when invoked from a still-live
|
1. **descendant-mode** — when invoked from a still-live
|
||||||
|
|
@ -49,14 +57,71 @@ Design notes
|
||||||
we want the subactor runtime to run its trio cancel
|
we want the subactor runtime to run its trio cancel
|
||||||
shield + IPC teardown paths where it can.
|
shield + IPC teardown paths where it can.
|
||||||
|
|
||||||
|
Design notes — shm sweep
|
||||||
|
------------------------
|
||||||
|
|
||||||
|
Since `tractor/ipc/_mp_bs.disable_mantracker()` turns off
|
||||||
|
`mp.resource_tracker` entirely, a hard-crashing actor can
|
||||||
|
leave `/dev/shm/<key>` segments behind that nothing else
|
||||||
|
GCs (see
|
||||||
|
`ai/conc-anal/subint_forkserver_mp_shared_memory_issue.md`,
|
||||||
|
"Trade-offs / known gaps").
|
||||||
|
|
||||||
|
The shm sweep is **Linux-/FreeBSD-only**: both expose
|
||||||
|
POSIX shared-memory segments as regular files under
|
||||||
|
`/dev/shm`, so `os.stat()` + `os.unlink()` are the
|
||||||
|
correct primitives. macOS POSIX shm has no fs-visible
|
||||||
|
path (segments live behind `shm_open`/`shm_unlink`
|
||||||
|
syscalls only), and Windows is a different story
|
||||||
|
entirely. Calling the shm helpers on an unsupported
|
||||||
|
platform raises `NotImplementedError`.
|
||||||
|
|
||||||
|
In-use enumeration delegates to `psutil` —
|
||||||
|
`Process.memory_maps()` (post-mmap) +
|
||||||
|
`Process.open_files()` (pre-mmap shm-opened fds) —
|
||||||
|
xplatform, mature, and handles the per-process
|
||||||
|
permission/race edge cases correctly. Segments matching
|
||||||
|
neither are genuinely leaked → safe to unlink.
|
||||||
|
|
||||||
|
The "nobody has it open" check is the kernel-canonical
|
||||||
|
test — same answer `lsof /dev/shm/<key>` would give. No
|
||||||
|
reliance on tractor-specific naming conventions (shm
|
||||||
|
keys are caller-defined).
|
||||||
|
|
||||||
'''
|
'''
|
||||||
from __future__ import annotations
|
from __future__ import annotations
|
||||||
|
|
||||||
import os
|
import os
|
||||||
import pathlib
|
import pathlib
|
||||||
import signal
|
import signal
|
||||||
|
import stat
|
||||||
|
import sys
|
||||||
import time
|
import time
|
||||||
|
|
||||||
|
# `/dev/shm` is the POSIX-shm filesystem on Linux + FreeBSD.
|
||||||
|
# macOS uses `shm_open` syscalls without a fs-visible path,
|
||||||
|
# so the shm helpers refuse to run there.
|
||||||
|
_SHM_PLATFORM_OK: bool = sys.platform.startswith(
|
||||||
|
('linux', 'freebsd')
|
||||||
|
)
|
||||||
|
SHM_DIR: str = '/dev/shm'
|
||||||
|
|
||||||
|
|
||||||
|
def _ensure_shm_supported() -> None:
|
||||||
|
'''
|
||||||
|
Guard for shm helpers — they assume `/dev/shm` exists
|
||||||
|
as a tmpfs and `os.unlink()` is the right primitive.
|
||||||
|
Both true on Linux + FreeBSD; not true elsewhere.
|
||||||
|
|
||||||
|
'''
|
||||||
|
if not _SHM_PLATFORM_OK:
|
||||||
|
raise NotImplementedError(
|
||||||
|
f'shm reap is only supported on Linux/FreeBSD; '
|
||||||
|
f'got sys.platform={sys.platform!r}. macOS '
|
||||||
|
f'POSIX shm has no fs-visible path; Windows '
|
||||||
|
f'has no /dev/shm equivalent.'
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
def _read_status_ppid(pid: int) -> int | None:
|
def _read_status_ppid(pid: int) -> int | None:
|
||||||
'''
|
'''
|
||||||
|
|
@ -69,7 +134,11 @@ def _read_status_ppid(pid: int) -> int | None:
|
||||||
for line in f:
|
for line in f:
|
||||||
if line.startswith('PPid:'):
|
if line.startswith('PPid:'):
|
||||||
return int(line.split()[1])
|
return int(line.split()[1])
|
||||||
except (FileNotFoundError, PermissionError, ProcessLookupError):
|
except (
|
||||||
|
FileNotFoundError,
|
||||||
|
PermissionError,
|
||||||
|
ProcessLookupError,
|
||||||
|
):
|
||||||
return None
|
return None
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
@ -77,21 +146,32 @@ def _read_status_ppid(pid: int) -> int | None:
|
||||||
def _read_cwd(pid: int) -> str | None:
|
def _read_cwd(pid: int) -> str | None:
|
||||||
try:
|
try:
|
||||||
return os.readlink(f'/proc/{pid}/cwd')
|
return os.readlink(f'/proc/{pid}/cwd')
|
||||||
except (FileNotFoundError, PermissionError, ProcessLookupError):
|
except (
|
||||||
|
FileNotFoundError,
|
||||||
|
PermissionError,
|
||||||
|
ProcessLookupError,
|
||||||
|
):
|
||||||
return None
|
return None
|
||||||
|
|
||||||
|
|
||||||
def _read_cmdline(pid: int) -> str:
|
def _read_cmdline(pid: int) -> str:
|
||||||
try:
|
try:
|
||||||
with open(f'/proc/{pid}/cmdline', 'rb') as f:
|
with open(f'/proc/{pid}/cmdline', 'rb') as f:
|
||||||
return f.read().replace(b'\0', b' ').decode(errors='replace')
|
return f.read().replace(b'\0', b' ').decode(
|
||||||
except (FileNotFoundError, PermissionError, ProcessLookupError):
|
errors='replace',
|
||||||
|
)
|
||||||
|
except (
|
||||||
|
FileNotFoundError,
|
||||||
|
PermissionError,
|
||||||
|
ProcessLookupError,
|
||||||
|
):
|
||||||
return ''
|
return ''
|
||||||
|
|
||||||
|
|
||||||
def _iter_live_pids() -> list[int]:
|
def _iter_live_pids() -> list[int]:
|
||||||
'''
|
'''
|
||||||
Enumerate currently-alive pids from `/proc`.
|
Enumerate currently-alive pids from `/proc`. Returns
|
||||||
|
`[]` on systems without `/proc` (e.g. macOS).
|
||||||
|
|
||||||
'''
|
'''
|
||||||
try:
|
try:
|
||||||
|
|
@ -225,6 +305,158 @@ def _is_alive(pid: int) -> bool:
|
||||||
if line.startswith('State:'):
|
if line.startswith('State:'):
|
||||||
# e.g. 'State:\tZ (zombie)'
|
# e.g. 'State:\tZ (zombie)'
|
||||||
return 'Z' not in line.split()[1]
|
return 'Z' not in line.split()[1]
|
||||||
except (FileNotFoundError, ProcessLookupError):
|
except (
|
||||||
|
FileNotFoundError,
|
||||||
|
ProcessLookupError,
|
||||||
|
):
|
||||||
return False
|
return False
|
||||||
return True
|
return True
|
||||||
|
|
||||||
|
|
||||||
|
def _enumerate_in_use_shm(
|
||||||
|
shm_dir: str = SHM_DIR,
|
||||||
|
) -> set[str]:
|
||||||
|
'''
|
||||||
|
Return the set of `<shm_dir>/<file>` paths currently
|
||||||
|
held open by any live process — via `psutil`'s
|
||||||
|
xplatform `Process.memory_maps()` (post-mmap
|
||||||
|
segments) and `Process.open_files()` (pre-mmap
|
||||||
|
shm-opened fds).
|
||||||
|
|
||||||
|
Lazy-imports `psutil` so the module stays importable
|
||||||
|
on installs without it (it's a `testing` group dep).
|
||||||
|
|
||||||
|
'''
|
||||||
|
_ensure_shm_supported()
|
||||||
|
|
||||||
|
# lazy + actionable failure: leaked shm sweep is the
|
||||||
|
# only thing in this module that needs psutil; we
|
||||||
|
# don't want a top-level ImportError breaking the
|
||||||
|
# process-reap path.
|
||||||
|
try:
|
||||||
|
import psutil
|
||||||
|
except ImportError as exc:
|
||||||
|
raise RuntimeError(
|
||||||
|
'shm reap requires `psutil` — install the '
|
||||||
|
'`testing` dep group, e.g. '
|
||||||
|
'`uv sync --group testing`.'
|
||||||
|
) from exc
|
||||||
|
|
||||||
|
in_use: set[str] = set()
|
||||||
|
prefix: str = shm_dir.rstrip('/') + '/'
|
||||||
|
for proc in psutil.process_iter(['pid']):
|
||||||
|
try:
|
||||||
|
for m in proc.memory_maps(grouped=False):
|
||||||
|
if m.path.startswith(prefix):
|
||||||
|
in_use.add(m.path)
|
||||||
|
for f in proc.open_files():
|
||||||
|
if f.path.startswith(prefix):
|
||||||
|
in_use.add(f.path)
|
||||||
|
except (
|
||||||
|
psutil.NoSuchProcess,
|
||||||
|
psutil.AccessDenied,
|
||||||
|
psutil.ZombieProcess,
|
||||||
|
FileNotFoundError,
|
||||||
|
PermissionError,
|
||||||
|
):
|
||||||
|
# raced — proc died or we can't see its
|
||||||
|
# mappings (e.g. root-owned). Skip; missing
|
||||||
|
# an in-use entry only means we'd preserve
|
||||||
|
# something we could reap, never the
|
||||||
|
# reverse — safe-by-default.
|
||||||
|
continue
|
||||||
|
return in_use
|
||||||
|
|
||||||
|
|
||||||
|
def find_orphaned_shm(
|
||||||
|
*,
|
||||||
|
uid: int | None = None,
|
||||||
|
shm_dir: str = SHM_DIR,
|
||||||
|
) -> list[str]:
|
||||||
|
'''
|
||||||
|
`<shm_dir>/<file>` paths that are:
|
||||||
|
|
||||||
|
- owned by `uid` (default: the current effective uid),
|
||||||
|
- and currently held by NO live process — i.e.
|
||||||
|
genuinely leaked.
|
||||||
|
|
||||||
|
Linux/FreeBSD only — see module docstring. No reliance
|
||||||
|
on caller-defined shm-key naming, so this works for
|
||||||
|
any tractor app (not just the test suite).
|
||||||
|
|
||||||
|
'''
|
||||||
|
_ensure_shm_supported()
|
||||||
|
|
||||||
|
if uid is None:
|
||||||
|
uid = os.geteuid()
|
||||||
|
|
||||||
|
try:
|
||||||
|
entries: list[str] = os.listdir(shm_dir)
|
||||||
|
except OSError:
|
||||||
|
return []
|
||||||
|
|
||||||
|
in_use: set[str] = _enumerate_in_use_shm(shm_dir=shm_dir)
|
||||||
|
leaked: list[str] = []
|
||||||
|
prefix: str = shm_dir.rstrip('/') + '/'
|
||||||
|
for entry in entries:
|
||||||
|
path: str = prefix + entry
|
||||||
|
try:
|
||||||
|
st: os.stat_result = os.stat(path)
|
||||||
|
except OSError:
|
||||||
|
continue
|
||||||
|
# only regular files — skip subdirs / sockets etc.
|
||||||
|
if not stat.S_ISREG(st.st_mode):
|
||||||
|
continue
|
||||||
|
if st.st_uid != uid:
|
||||||
|
continue
|
||||||
|
if path in in_use:
|
||||||
|
continue
|
||||||
|
leaked.append(path)
|
||||||
|
return leaked
|
||||||
|
|
||||||
|
|
||||||
|
def reap_shm(
|
||||||
|
paths: list[str],
|
||||||
|
*,
|
||||||
|
log=print,
|
||||||
|
) -> tuple[list[str], list[tuple[str, OSError]]]:
|
||||||
|
'''
|
||||||
|
Unlink the given `/dev/shm/...` paths.
|
||||||
|
|
||||||
|
Linux/FreeBSD only — `os.unlink()` is the correct
|
||||||
|
primitive on the POSIX-shm tmpfs there. macOS POSIX
|
||||||
|
shm has no fs-visible path; the equivalent there is
|
||||||
|
`posix_ipc.unlink_shared_memory(name)` (not
|
||||||
|
implemented here — see module docstring).
|
||||||
|
|
||||||
|
Returns `(unlinked, errors)` where `errors` is a list
|
||||||
|
of `(path, exc)` for paths that could not be removed
|
||||||
|
(e.g. permissions). Paths that raced to being already-
|
||||||
|
gone are counted as successfully unlinked.
|
||||||
|
|
||||||
|
'''
|
||||||
|
_ensure_shm_supported()
|
||||||
|
|
||||||
|
unlinked: list[str] = []
|
||||||
|
errors: list[tuple[str, OSError]] = []
|
||||||
|
for path in paths:
|
||||||
|
try:
|
||||||
|
os.unlink(path)
|
||||||
|
unlinked.append(path)
|
||||||
|
except FileNotFoundError:
|
||||||
|
# raced — already gone, treat as success
|
||||||
|
unlinked.append(path)
|
||||||
|
except OSError as exc:
|
||||||
|
errors.append((path, exc))
|
||||||
|
|
||||||
|
if unlinked:
|
||||||
|
log(
|
||||||
|
f'[tractor-reap] unlinked {len(unlinked)} '
|
||||||
|
f'orphaned shm segment(s): {unlinked}'
|
||||||
|
)
|
||||||
|
for path, exc in errors:
|
||||||
|
log(
|
||||||
|
f'[tractor-reap] could not unlink {path}: '
|
||||||
|
f'{exc!r}'
|
||||||
|
)
|
||||||
|
return (unlinked, errors)
|
||||||
|
|
|
||||||
2
uv.lock
|
|
@ -716,6 +716,7 @@ sync-pause = [
|
||||||
]
|
]
|
||||||
testing = [
|
testing = [
|
||||||
{ name = "pexpect" },
|
{ name = "pexpect" },
|
||||||
|
{ name = "psutil" },
|
||||||
{ name = "pytest" },
|
{ name = "pytest" },
|
||||||
{ name = "pytest-timeout" },
|
{ name = "pytest-timeout" },
|
||||||
]
|
]
|
||||||
|
|
@ -761,6 +762,7 @@ subints = [{ name = "msgspec", marker = "python_full_version >= '3.14'", specifi
|
||||||
sync-pause = [{ name = "greenback", marker = "python_full_version == '3.13.*'", specifier = ">=1.2.1,<2" }]
|
sync-pause = [{ name = "greenback", marker = "python_full_version == '3.13.*'", specifier = ">=1.2.1,<2" }]
|
||||||
testing = [
|
testing = [
|
||||||
{ name = "pexpect", specifier = ">=4.9.0,<5" },
|
{ name = "pexpect", specifier = ">=4.9.0,<5" },
|
||||||
|
{ name = "psutil", specifier = ">=7.0.0" },
|
||||||
{ name = "pytest", specifier = ">=8.3.5" },
|
{ name = "pytest", specifier = ">=8.3.5" },
|
||||||
{ name = "pytest-timeout", specifier = ">=2.3" },
|
{ name = "pytest-timeout", specifier = ">=2.3" },
|
||||||
]
|
]
|
||||||
|
|
|
||||||