Add `tractor-reap` CLI + document auto-reap
New `scripts/tractor-reap` CLI wraps the `_testing._reap` mod for manual zombie-subactor cleanup after crashed pytest sessions. Two modes: - orphan-mode (default): finds PPid==1 procs with cwd matching repo root + `python` in cmdline. - descendant-mode (`--parent <pid>`): scoped sweep under a still-live supervisor. SC-polite: SIGINT with bounded grace window (default 3s) before escalating to SIGKILL. Exit code signals whether escalation was needed (useful for CI health-checks). Also, document both the auto-reap fixture and the CLI in `/run-tests` SKILL.md (section 10). (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-codesubint_forkserver_backend
parent
eae478f3d5
commit
6d76b60404
|
|
@ -521,3 +521,67 @@ filling log volume. Full post-mortem in
|
|||
`ai/conc-anal/subint_forkserver_test_cancellation_leak_issue.md`.
|
||||
Lesson codified here so future-me grep-finds the
|
||||
workaround before digging.
|
||||
|
||||
## 10. Reaping zombie subactors (`tractor-reap`)
|
||||
|
||||
**Symptom:** after a `pytest` run crashes, times out,
|
||||
or is `Ctrl+C`'d, subactor forks (esp. under
|
||||
`subint_forkserver`) can be reparented to `init`
|
||||
(PPid==1) and linger. They hold onto ports, inherit
|
||||
pytest's capture-pipe fds, and flakify later
|
||||
sessions.
|
||||
|
||||
**Two layers of defense:**
|
||||
|
||||
### a) Session-scoped auto-fixture (always on)
|
||||
|
||||
`tractor/_testing/pytest.py::_reap_orphaned_subactors`
|
||||
runs at pytest session teardown. It walks `/proc` for
|
||||
direct descendants of the pytest pid, SIGINTs them,
|
||||
waits up to 3s, then SIGKILLs survivors. SC-polite:
|
||||
gives the subactor runtime a chance to run its trio
|
||||
cancel shield + IPC teardown before escalation.
|
||||
|
||||
This is *autouse* and session-scoped — you don't need
|
||||
to do anything. It just runs.
|
||||
|
||||
### b) `scripts/tractor-reap` CLI (manual reap)
|
||||
|
||||
For the **pytest-died-mid-session** case (Ctrl+C, OOM
|
||||
kill, hung process you had to `kill -9`), the fixture
|
||||
never ran. Reach for the CLI:
|
||||
|
||||
```sh
|
||||
# default: orphans (PPid==1, cwd==repo, cmd contains python)
|
||||
scripts/tractor-reap
|
||||
|
||||
# descendant-mode: from a still-live supervisor
|
||||
scripts/tractor-reap --parent <pytest-pid>
|
||||
|
||||
# see what would be reaped, don't signal
|
||||
scripts/tractor-reap -n
|
||||
|
||||
# tune the SIGINT → SIGKILL grace window
|
||||
scripts/tractor-reap --grace 5
|
||||
```
|
||||
|
||||
Exit code: `0` if everyone exited on SIGINT, `1` if
|
||||
SIGKILL had to escalate — so you can chain it in CI
|
||||
health-checks (`scripts/tractor-reap || <alert>`).
|
||||
|
||||
**What it matches** (orphan-mode):
|
||||
- `PPid == 1` (reparented to init → definitely
|
||||
orphaned, not just a currently-running child)
|
||||
- `cwd == <repo-root>` (keeps the sweep scoped; won't
|
||||
touch unrelated init-children elsewhere)
|
||||
- `python` in cmdline
|
||||
|
||||
**What it does not do:** kill anything whose PPid is
|
||||
still a live tractor parent. If the parent is alive
|
||||
it's not an orphan; use `--parent <pid>` if you need
|
||||
to force-reap under a still-live supervisor.
|
||||
|
||||
**When NOT to run it:** while a pytest session is
|
||||
active in another terminal. It's safe (won't touch
|
||||
that session's live children in orphan-mode) but can
|
||||
race if the target session is mid-teardown.
|
||||
|
|
|
|||
|
|
@ -0,0 +1,124 @@
|
|||
#!/usr/bin/env python3
|
||||
# tractor: structured concurrent "actors".
|
||||
# Copyright 2018-eternity Tyler Goodlet.
|
||||
#
|
||||
# SPDX-License-Identifier: AGPL-3.0-or-later
|
||||
'''
|
||||
`tractor-reap` — SC-polite zombie-subactor reaper.
|
||||
|
||||
Finds `tractor` subactor processes left alive after a
|
||||
`pytest` (or any tractor-app) run that failed to fully
|
||||
cancel its actor tree, then sends SIGINT with a bounded
|
||||
grace window before escalating to SIGKILL.
|
||||
|
||||
Detection modes (auto-selected):
|
||||
|
||||
--parent <pid> : descendant-mode — kill procs whose
|
||||
PPid == <pid>. Use when a parent
|
||||
is still alive and you want to
|
||||
scope the sweep precisely (e.g.
|
||||
CI wrapper calling in from outside
|
||||
pytest).
|
||||
|
||||
(default) : orphan-mode — kill procs with
|
||||
PPid==1 (init-reparented) whose
|
||||
cwd matches the repo root AND
|
||||
whose cmdline contains `python`.
|
||||
The cwd filter is what prevents
|
||||
sweeping unrelated init-children.
|
||||
|
||||
Usage:
|
||||
|
||||
# after a pytest run crashed/was Ctrl+C'd
|
||||
scripts/tractor-reap
|
||||
|
||||
# from inside a still-live supervisor
|
||||
scripts/tractor-reap --parent 12345
|
||||
|
||||
# dry-run: list what would be reaped, don't signal
|
||||
scripts/tractor-reap -n
|
||||
|
||||
'''
|
||||
import argparse
|
||||
import pathlib
|
||||
import subprocess
|
||||
import sys
|
||||
|
||||
|
||||
def _repo_root() -> pathlib.Path:
|
||||
'''
|
||||
Use `git rev-parse --show-toplevel` when available;
|
||||
fall back to the repo this script lives in.
|
||||
|
||||
'''
|
||||
try:
|
||||
out: str = subprocess.check_output(
|
||||
['git', 'rev-parse', '--show-toplevel'],
|
||||
stderr=subprocess.DEVNULL,
|
||||
text=True,
|
||||
).strip()
|
||||
return pathlib.Path(out)
|
||||
except (subprocess.CalledProcessError, FileNotFoundError):
|
||||
return pathlib.Path(__file__).resolve().parent.parent
|
||||
|
||||
|
||||
def main() -> int:
|
||||
parser = argparse.ArgumentParser(
|
||||
prog='tractor-reap',
|
||||
description=__doc__,
|
||||
formatter_class=argparse.RawDescriptionHelpFormatter,
|
||||
)
|
||||
parser.add_argument(
|
||||
'--parent', '-p',
|
||||
type=int,
|
||||
default=None,
|
||||
help='descendant-mode: reap procs with PPid==<pid>',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--grace', '-g',
|
||||
type=float,
|
||||
default=3.0,
|
||||
help='SIGINT grace window in seconds (default 3.0)',
|
||||
)
|
||||
parser.add_argument(
|
||||
'--dry-run', '-n',
|
||||
action='store_true',
|
||||
help='list matched pids but do not signal',
|
||||
)
|
||||
args = parser.parse_args()
|
||||
|
||||
# import lazily so `--help` doesn't require the tractor
|
||||
# package to be importable (e.g. when running from a
|
||||
# shell not inside a venv).
|
||||
repo = _repo_root()
|
||||
sys.path.insert(0, str(repo))
|
||||
from tractor._testing._reap import (
|
||||
find_descendants,
|
||||
find_orphans,
|
||||
reap,
|
||||
)
|
||||
|
||||
if args.parent is not None:
|
||||
pids: list[int] = find_descendants(args.parent)
|
||||
mode: str = f'descendants of PPid={args.parent}'
|
||||
else:
|
||||
pids = find_orphans(repo)
|
||||
mode = f'orphans (PPid=1, cwd={repo})'
|
||||
|
||||
if not pids:
|
||||
print(f'[tractor-reap] no {mode} to reap')
|
||||
return 0
|
||||
|
||||
if args.dry_run:
|
||||
print(f'[tractor-reap] dry-run — {mode}:\n {pids}')
|
||||
return 0
|
||||
|
||||
signalled, survivors = reap(pids, grace=args.grace)
|
||||
# exit 0 if everyone exited cleanly, else 1 to signal
|
||||
# escalation happened — makes the command useful in
|
||||
# CI health-checks and `||`-chaining.
|
||||
return 0 if not survivors else 1
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
raise SystemExit(main())
|
||||
Loading…
Reference in New Issue