Commit Graph

2682 Commits (084f0fc404bf470454c5017a88762c6afeb50c8e)

Author SHA1 Message Date
Gud Boi 21b17b6a40 Refactor `_runtime_vars` into pure get/set API
Resetting `_runtime_vars` post-(forking-)spawn was
previously only possible via direct mutation of
`_state._runtime_vars` from an external module + an
inline default dict duplicating the
`_state.py`-internal defaults. Split the access
surface into a pure getter + explicit setter so such
a reset call site becomes a one-liner composition:
`set_runtime_vars(get_runtime_vars(clear_values=True))`.

Deats `tractor/runtime/_state.py`,
- extract initial values into a module-level
  `_RUNTIME_VARS_DEFAULTS: dict[str, Any]` constant; the
  live `_runtime_vars` is now initialised from
  `dict(_RUNTIME_VARS_DEFAULTS)`
- `get_runtime_vars()` grows a `clear_values: bool = False`
  kwarg. When True, returns a fresh copy of
  `_RUNTIME_VARS_DEFAULTS` instead of the live dict —
  still a **pure read**, never mutates anything
- new `set_runtime_vars(rtvars: dict | RuntimeVars)` —
  atomic replacement of the live dict's contents via
  `.clear()` + `.update()`, so existing references to the
  same dict object remain valid. Accepts either the
  historical dict form or the `RuntimeVars` struct

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 7804a9fe57693dd5e15bee6a08e7d2fa14b6a98a)
(factored: kept only the tractor/runtime/_state.py part; dropped
 tractor/spawn/_subint_forkserver.py call-site rewire)
2026-06-17 17:39:44 -04:00
Gud Boi d5c549a3c3 Mark `subint`-hanging tests with `skipon_spawn_backend`
Adopt the `@pytest.mark.skipon_spawn_backend('subint',
reason=...)` marker (a617b521) across the suites
reproducing the `subint` GIL-contention / starvation
hang classes doc'd in `ai/conc-anal/subint_*_issue.md`.

Deats,
- Module-level `pytestmark` on full-file-hanging suites:
  - `tests/test_cancellation.py`
  - `tests/test_inter_peer_cancellation.py`
  - `tests/test_pubsub.py`
  - `tests/test_shm.py`
- Per-test decorator where only one test in the file
  hangs:
  - `tests/discovery/test_registrar.py
    ::test_stale_entry_is_deleted` — replaces the
    inline `if start_method == 'subint': pytest.skip`
    branch with a declarative skip.
  - `tests/test_subint_cancellation.py
    ::test_subint_non_checkpointing_child`.
- A few per-test decorators are left commented-in-
  place as breadcrumbs for later finer-grained unskips.

Also, some nearby tidying in the affected files:
- Annotate loose fixture / test params
  (`pytest.FixtureRequest`, `str`, `tuple`, `bool`) in
  `tests/conftest.py`, `tests/devx/conftest.py`, and
  `tests/test_cancellation.py`.
- Normalize `"""..."""` → `'''...'''` docstrings per
  repo convention on a few touched tests.
- Add `timeout=6` / `timeout=10` to
  `@tractor_test(...)` on `test_cancel_infinite_streamer`
  and `test_some_cancels_all`.
- Drop redundant `spawn_backend` param from
  `test_cancel_via_SIGINT`; use `start_method` in the
  `'mp' in ...` check instead.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 4b2a0886c3)
(factored: dropped spawn-backend-only path: tests/test_subint_cancellation.py)
2026-06-17 17:39:44 -04:00
Gud Boi 239c3fb14d Add `skipon_spawn_backend` pytest marker
A reusable `@pytest.mark.skipon_spawn_backend( '<backend>' [, ...],
reason='...')` marker for backend-specific known-hang / -borked cases
— avoids scattering `@pytest.mark.skipif(lambda ...)` branches across
tests that misbehave under a particular `--spawn-backend`.

Deats,
- `pytest_configure()` registers the marker via
  `addinivalue_line('markers', ...)`.
- New `pytest_collection_modifyitems()` hook walks
  each collected item with `item.iter_markers(
  name='skipon_spawn_backend')`, checks whether the
  active `--spawn-backend` appears in `mark.args`, and
  if so injects a concrete `pytest.mark.skip(
  reason=...)`. `iter_markers()` makes the decorator
  work at function, class, or module (`pytestmark =
  [...]`) scope transparently.
- First matching mark wins; default reason is
  `f'Borked on --spawn-backend={backend!r}'` if the
  caller doesn't supply one.

Also, tighten type annotations on nearby `pytest`
integration points — `pytest_configure`, `debug_mode`,
`spawn_backend`, `tpt_protos`, `tpt_proto` — now taking
typed `pytest.Config` / `pytest.FixtureRequest` params.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 3b26b59dad)
2026-06-17 17:39:44 -04:00
Gud Boi 6afeb6b6ef Skip `test_stale_entry_is_deleted` hanger with `subint`s
(cherry picked from commit 985ea76de5)
2026-06-17 17:39:44 -04:00
Gud Boi b2cc4f502c Wall-cap `test_stale_entry_is_deleted` via `pytest-timeout`
Add a hard process-level wall-clock bound on a test
known to wedge un-Ctrl-C-ably under an in-dev spawn
backend, so an unattended suite run can't hang
indefinitely.

Deats,
- New `testing` dep: `pytest-timeout>=2.3`.
- `test_stale_entry_is_deleted`:
  `@pytest.mark.timeout(3, method='thread')`. The
  `method='thread'` choice is deliberate —
  `method='signal'` routes via `SIGALRM` which can be
  starved by the same GIL-hostage path that drops
  `SIGINT`, so it'd never actually fire in the
  starvation case.

At timeout, `pytest-timeout` hard-kills the pytest
process itself — that's the intended behavior here;
the alternative is the suite never returning.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 189f4e3f72e9f1eda5d24bcbab5743f7e35bd913)
(factored: kept pyproject + tests/discovery/test_registrar.py parts of
 "Wall-cap `subint` audit tests via `pytest-timeout`"; dropped
 tests/test_subint_cancellation.py)
2026-06-17 17:39:44 -04:00
Gud Boi fd4caa1903 Arm `dump_on_hang` on `test_stale_entry_is_deleted`
Wrap the test's `trio.run(main)` in
`dump_on_hang(seconds=20)` so any future hang
regression captures a stack dump for triage instead
of wedging CI silently; under the default backends
it's a no-op safety net.

Includes a "KNOWN ISSUE" comment block documenting
the (future) `subint` backend hang classes observed
against this test during Phase B bringup (#379).

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 4a3254583b)
(factored: kept only the tests/discovery/test_registrar.py part of
 "Doc `subint` backend hang classes + arm `dump_on_hang`"; dropped
 subint conc-anal docs + tests/test_subint_cancellation.py)
2026-06-17 17:39:44 -04:00
Gud Boi 253b210bcd Add `._debug_hangs` to `.devx` for hang triage
Bottle up the diagnostic primitives that actually cracked the
silent mid-suite hangs in the `subint` spawn-backend bringup (issue
there" session has them on the shelf instead of reinventing from
scratch.

Deats,
- `dump_on_hang(seconds, *, path)` — context manager wrapping
  `faulthandler.dump_traceback_later()`. Critical gotcha baked in:
  dumps go to a *file*, not `sys.stderr`, bc pytest's stderr
  capture silently eats the output and you can spend an hour
  convinced you're looking at the wrong thing
- `track_resource_deltas(label, *, writer)` — context manager
  logging per-block `(threading.active_count(),
  len(_interpreters.list_all()))` deltas; quickly rules out
  leak-accumulation theories when a suite progressively worsens (if
  counts don't grow, it's not a leak, look for a race on shared
  cleanup instead)
- `resource_delta_fixture(*, autouse, writer)` — factory returning
  a `pytest` fixture wrapping `track_resource_deltas` per-test; opt
  in by importing into a `conftest.py`. Kept as a factory (not a
  bare fixture) so callers own `autouse` / `writer` wiring

Also,
- export the three names from `tractor.devx`
- dep-free on py<3.13 (swallows `ImportError` for `_interpreters`)
- link back to the provenance in the module docstring (issue #379 /
  commit `26fb820`)

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 09466a1e9d)
2026-06-17 17:39:44 -04:00
Gud Boi f7dfd37df4 Extract `_actor_child_main()` as shared child entry
Pull the `_child.py` `__main__` block body out into
a callable `_actor_child_main()` so alternate spawn
backends can bootstrap a subactor without going
through the CLI entrypoint.

Deats,
- new `_actor_child_main(uid, loglevel, parent_addr,
  infect_asyncio, spawn_method='trio')` holds the
  full child-side runtime startup previously inlined
  under `if __name__ == '__main__':`
- `__main__` block reduces to arg-parsing + a call
  into the new func
- add `"subint"` to the `_runtime.py` spawn-method
  check so a child accepts `SpawnSpec` from that
  (future) backend; inert str-compare w/o it

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit b8f243e98d)
(factored: kept only the `_child.py`/`_runtime.py` entry-extraction parts of
 "Impl min-viable `subint` spawn backend (B.2)"; dropped
 tractor/spawn/_subint.py + subint prompt-io logs)
2026-06-17 17:39:44 -04:00
Gud Boi 4b63b7f1cc Handle py3.14+ incompats as test skips
Since we're devving subints we require the 3.14+ stdlib API
and a couple compiled libs don't support it yet, namely:
- `cffi`, which we're only using for the `.ipc._linux` eventfd
  stuff (now factored into `hotbaud` anyway).
- `greenback`, which requires `greenlet` which doesn't seem to be
  wheeled yet
  * on nixos the sdist build was failing due to lack of `g++` which
    i don't care to figure out rn since we don't need `.devx` stuff
    immediately for this subints prototype.
  * [ ] we still need to adjust any dependent suites to skip.

Adjust `test_ringbuf` to skip on import failure.

Also project wide,
- pin us to py 3.13+ in prep for last-2-minor-version policy.
- drop `msgspec>=0.20.0`, the first release with py3.14 support.

(cherry picked from commit d2ea8aa2de)
2026-06-17 17:39:44 -04:00
Gud Boi 54a694d452 Open py-version range + harness gate for py3.14 backends (#379)
Prep for a future sub-interpreter (PEP 734
`concurrent.interpreters`) spawn backend per issue
test-harness error-gating; the backend itself comes
later.

Deats,
- bump `pyproject.toml` `requires-python` to
  `>=3.12, <3.15` and list the `3.14` classifier —
  the new stdlib `concurrent.interpreters` module
  only ships on 3.14
- `_testing.pytest.pytest_configure` wraps
  `try_set_start_method()` in a `pytest.UsageError`
  handler so an unsupported `--spawn-backend` on the
  running py-version prints a clean banner instead
  of a traceback

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit d318f1f8f4)
(factored: kept only the pyproject + `_testing/pytest.py` parts of
 "Add `'subint'` spawn backend scaffold (#379)"; dropped
 tractor/spawn/_spawn.py + tractor/spawn/_subint.py)
2026-06-17 17:39:44 -04:00
Bd eb3e2b3574
Merge pull request #461 from goodboy/tooling_skills_n_config_from_mtf_dev
Tooling skills n config from mtf dev
2026-06-17 17:33:28 -04:00
Gud Boi 26f2b23da2 Address Copilot review nits on PR #461 (round 2)
Apply the genuinely-valid items from the latest GH Copilot
pass; the rest are house-style/local-convenience and get a
reply instead,

- drop the stray space before the colon in
  `test_no_runtime`'s `pytest.raises(...)` (one-off typo).
- drop the unused `cffi` binding in `test_ringbuf` — the
  `importorskip` side-effect is all we need.
- declare the `run-tests` skill's process-cleanup commands
  in `allowed-tools` (`ss`/`pgrep`/`pkill`/`sleep` + a
  scoped `rm -f /tmp/registry@*.sock`).

Review: PR #461 (copilot-pull-request-reviewer)
https://github.com/goodboy/tractor/pull/461#pullrequestreview-4519551472

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-17 17:16:36 -04:00
Gud Boi c0fdfa4c7b Pin sdist-install step to py3.13
The `sdist` CI job's install step ran bare `python -m pip
install`, picking up the runner's default py3.12 — which our
bumped `requires-python = ">=3.13"` now rejects
(`3.12.3 not in '<3.15,>=3.13'`).

- install into a `uv venv --python 3.13` so it matches the
  build step's `--python=3.13`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-17 16:25:41 -04:00
Gud Boi c5feeac4ce Pin `pytest>=9.0.3` for CVE-2025-71176 floor
Tighten the test-dep floor from `>=9.0` to `>=9.0.3` so a
fresh `uv` relock can't resolve back into the vulnerable
9.0.0–9.0.2 window; 9.0.3 is where upstream patched the
insecure-tmpdir advisory (CVE-2025-71176).

- annotate the constraint w/ the CVE id for future readers.
- update the existing bump-rationale comment to name the
  precise patched version.

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-17 16:05:41 -04:00
Gud Boi 9f1a64fcf7 Relock `uv.lock` for py3.13+ & `pytest` CVE
Regenerate the lockfile to match the bumped
`requires-python = ">=3.13, <3.15"` so the shipped venv
state lines up with `pyproject.toml`,

- drop the now-unsupported `cp312` wheels.
- add `cp314` wheels + py3.14 `resolution-markers`.
- bump the lock's own `requires-python` envelope.

The relock also pulls `pytest` 8.3.5 -> 9.1.0 which
resolves the insecure-tmpdir advisory (CVE-2025-71176,
patched in 9.0.3); this closes dependabot alert #3 and
supersedes the bot's standalone bump in #442.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-17 15:56:19 -04:00
Gud Boi 6b0cb17c04 Address Copilot review nits on PR #461
Per the GH Copilot review, clean a few metadata/comment
inconsistencies that would otherwise live on `main`,

- drop the `3.12` Trove classifier since `requires-python`
  is now `>=3.13`.
- remove the `ci.yml` comment referencing the not-yet-landed
  `--spawn-method=main_thread_forkserver` (misleading on
  `main`, and leaks the MTF name early).
- fix the `//tmp` double-slash typo in the local `Read()`
  permission pattern.

Review: PR #461 (copilot-pull-request-reviewer)
https://github.com/goodboy/tractor/pull/461#pullrequestreview-4510034938

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-06-17 15:47:07 -04:00
Gud Boi dd651943d1 Add `logspec` leaf-mod Route B follow-up doc
Follow-up note documenting why the deeper "Route B" fix
for `LogSpec`/`apply_logspec()` true per-leaf-MODULE level
control was NOT taken — in favor of the smaller
sub-PACKAGE fix that shipped in 9c36363b.

Doc covers,
- Status: what 9c36363b already gives (per-sub-pkg
  control at any nesting depth, `devx.debug` ≠ `devx`)
  vs. what remains unaddressed (per-leaf-mod levels,
  top-level lib mods like `tractor.to_asyncio` on the
  root logger).
- "Route B" sketch: make logger *identity* the full
  dotted module path; mv the cosmetic leaf-trim out of
  logger-naming into the *formatter's* `{name}`
  rendering.
- 6 breaking-change costs: every logger name changes,
  formatter rewrite, propagation/double-emit surface
  grows, level-inheritance semantics shift,
  `modden`/`piker` contract churn, `get_logger()`
  refactor risk.
- Migration plan if pursued: extract a pure
  `_mk_logger_name()` helper w/ an exhaustive name-shape
  test matrix, swap `get_logger()` to use it for
  identity, swap formatter to use the display string,
  golden-diff rendered headers, coordinate w/
  downstreams.
- "Route A" alternative: a `logging.Filter` keyed on
  `record.module`/`pathname` for per-leaf control w/o
  name churn — lower risk, narrower power.
- Recommendation: defer Route B; prefer Route A if
  per-leaf is needed soon; the shipped sub-PKG fix
  covers the common ask.

Lives under `ai/tooling-todos/` since it's a deferred-
work decision record, not a triage/conc-anal doc.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 5b3c2e3762)
(cherry picked from commit 232d6ccfbf)
2026-06-17 15:47:07 -04:00
Gud Boi cec0731253 Mk `test_no_runtime()` not require `pytest-trio`
(cherry picked from commit 79dda4cb4a)
(cherry picked from commit f2871ae3ff)
2026-06-17 15:47:07 -04:00
Gud Boi 9defec4922 Bump to latest `pytest` release!
(cherry picked from commit e329c3108c)
(cherry picked from commit 8d917cbf87)
2026-06-17 13:45:31 -04:00
Gud Boi 0a3772c0c1 Add `RuntimeVars` env-var lift design plan
Draft plan for consolidating pytest CLI flags,
ad-hoc env vars, and hardcoded fixture defaults
into the existing (but unused) `RuntimeVars`
struct as the single source of truth.

Deats,
- `_rtvars.py` leaf mod w/ `dump`/`load`/`get`/
  `update` helpers using `str(dict)` +
  `ast.literal_eval` encoding
- phased migration: test infra first, then
  runtime callers, then per-session bindspace
- addresses concurrent pytest session collisions
  and subproc env propagation for `devx/` scripts

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 7882c37ce0)
(cherry picked from commit 79c189bddd)
2026-06-17 13:45:31 -04:00
Gud Boi 3f18caaaac Add `test_register_duplicate_name` race analysis
Document the intermittent connect-refused failure in the registrar
daemon test — root cause is the `daemon` fixture's blind `time.sleep()`
readiness gate racing against the subproc's `bind()`/ `listen()`
completion. Distinct from the cancel- cascade `TooSlowError` flake
class.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 29f9928524)
(cherry picked from commit 15c3b670e1)
2026-06-17 13:45:31 -04:00
Gud Boi 3dc3f8401c Flip back to default `pytest` capture for CI
(cherry picked from commit 22cdf15b73)
(cherry picked from commit 781abf7558)
2026-06-17 13:45:31 -04:00
Gud Boi 8c6b7fdb31 Add posix-multithreaded-`fork()` explainer doc
(cherry picked from commit 532a9834f3)
(cherry picked from commit 23b8a80e15)
2026-06-17 13:45:31 -04:00
Gud Boi 4e9f392ca3 Drop global `pytest-timeout` cap from `pyproject.toml`
`timeout = 200` was firing via SIGALRM (the default
`method='signal'`) which synchronously raises `Failed` in
trio's main thread mid-`epoll.poll()`, abandoning trio's
runner mid-flight and leaving `GLOBAL_RUN_CONTEXT` half-
installed. EVERY subsequent `trio.run()` in the same pytest
session then bails with
`RuntimeError: Attempted to call run() from inside a run()`.

Empirical impact: a session that hits a single 200s hang
cascades into 30-40 false-positive failures across every
downstream test file that uses `trio.run`. Recent UDS run
saw 1 real timeout (`test_unregistered_err_still_relayed`)
poison 38 sibling tests with cascade-fails — a debugging
nightmare.

Same architectural bug we already documented in
`tests/test_advanced_streaming.py::test_dynamic_pub_sub`
(see its module-level NOTE) — both `pytest-timeout`
enforcement modes are incompatible with trio under fork-
based spawn backends. Now scoped session-wide.

For tests that legitimately need a wall-clock cap, the
canonical pattern is `with trio.fail_after(N):` INSIDE the
test — trio's own `Cancelled` machinery cleanly unwinds
the actor nursery without disturbing global state.

For CI: rely on job-level wall-clock timeouts (e.g. GitHub
Actions `timeout-minutes`) to abort genuinely-stuck suites.

`pyproject.toml` comment block spells this all out so a
future contributor doesn't reach back for `timeout =` and
re-introduce the bug.

ALSO, bump `xonsh` to at least `0.23.0` release.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 3c366cac13)
(cherry picked from commit bbf4fe66e3)
2026-06-17 13:45:31 -04:00
Gud Boi cda94235f4 Codify capture-pipe hang lesson in skills
Encode the hard-won lesson from the forkserver
cancel-cascade investigation into two skill docs
so future sessions grep-find it before spelunking
into trio internals.

Deats,
- `.claude/skills/conc-anal/SKILL.md`:
  - new "Unbounded waits in cleanup paths"
    section — rule: bound every `await X.wait()`
    in cleanup paths with `trio.move_on_after()`
    unless the setter is unconditionally
    reachable. Recent example:
    `ipc_server.wait_for_no_more_peers()` in
    `async_main`'s finally (was unbounded,
    deadlocked when any peer handler stuck)
  - new "The capture-pipe-fill hang pattern"
    section — mechanism, grep-pointers to the
    existing `conftest.py` guards (`tests/conftest
    .py:258`, `:316`), cross-ref to the full
    post-mortem doc, and the grep-note: "if a
    multi-subproc tractor test hangs, `pytest -s`
    first, conc-anal second"
- `.claude/skills/run-tests/SKILL.md`: new
  "Section 9: The pytest-capture hang pattern
  (CHECK THIS FIRST)" with symptom / cause /
  pre-existing guards to grep / three-step debug
  recipe (try `-s`, lower loglevel, redirect
  stdout/stderr) / signature of this bug vs. a
  real code hang / historical reference

Cost several investigation sessions before the
capture-pipe issue surfaced — it was masked by
deeper cascade deadlocks. Once the cascades were
fixed, the tree tore down enough to generate
pipe-filling log volume. Lesson: **grep this
pattern first when any multi-subproc tractor test
hangs under default pytest but passes with `-s`.**

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 4106ba73ea)
(cherry picked from commit 45c442060b)
2026-06-17 13:45:31 -04:00
Gud Boi a26619dfea Claude-perms: ensure /commit-msg files can be written!
(cherry picked from commit 76d12060aa)
(cherry picked from commit 1d70d33d9a)
2026-06-17 13:45:31 -04:00
Gud Boi 974cb159e8 Use SIGINT-first ladder in `run-tests` cleanup
The previous cleanup recipe went straight to
SIGTERM+SIGKILL, which hides bugs: tractor is
structured concurrent — `_trio_main` catches SIGINT
as an OS-cancel and cascades `Portal.cancel_actor`
over IPC to every descendant. So a graceful SIGINT
exercises the actual SC teardown path; if it hangs,
that's a real bug to file (the forkserver `:1616`
zombie was originally suspected to be one of these
but turned out to be a teardown gap in
`_ForkedProc.kill()` instead).

Deats,
- step 1: `pkill -INT` scoped to `$(pwd)/py*` — no
  sleep yet, just send the signal
- step 2: bounded wait loop (10 × 0.3s = ~3s) using
  `pgrep` to poll for exit. Loop breaks early on
  clean exit
- step 3: `pkill -9` only if graceful timed out, w/
  a logged escalation msg so it's obvious when SC
  teardown didn't complete
- step 4: same SIGINT-first ladder for the rare
  `:1616`-holding zombie that doesn't match the
  cmdline pattern (find PID via `ss -tlnp`, then
  `kill -INT NNNN; sleep 1; kill -9 NNNN`)
- steps 5-6: UDS-socket `rm -f` + re-verify
  unchanged

Goal: surface real teardown bugs through the test-
cleanup workflow instead of papering over them with
`-9`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 70d58c4bd2)
(cherry picked from commit 222784ccc8)
2026-06-17 13:45:31 -04:00
Gud Boi c170007ddc Add zombie-actor check to `run-tests` skill
Fork-based backends (esp. `subint_forkserver`) can
leak child actor processes on cancelled / SIGINT'd
test runs; the zombies keep the tractor default
registry (`127.0.0.1:1616` / `/tmp/registry@1616.sock`)
bound, so every subsequent session can't bind and
50+ unrelated tests fail with the same
`TooSlowError` / "address in use" signature. Document
the pre-flight + post-cancel check as a mandatory
step 4.

Deats,
- **primary signal**: `ss -tlnp | grep ':1616'` for a
  bound TCP registry listener — the authoritative
  check since :1616 is unique to our runtime
- `pgrep -af` scoped to `$(pwd)/py[0-9]*/bin/python.*
  _actor_child_main|subint-forkserv` for leftover
  actor/forkserver procs — scoped deliberately so we
  don't false-flag legit long-running tractor-
  embedding apps like `piker`
- `ls /tmp/registry@*.sock` for stale UDS sockets
- scoped cleanup recipe (SIGTERM + SIGKILL sweep
  using the same `$(pwd)/py*` pattern, UDS `rm -f`,
  re-verify) plus a fallback for when a zombie holds
  :1616 but doesn't match the pattern: `ss -tlnp` →
  kill by PID
- explicit false-positive warning calling out the
  `piker` case (`~/repos/piker/py*/bin/python3 -m
  tractor._child ...`) so a bare `pgrep` doesn't lead
  to nuking unrelated apps

Goal: short-circuit the "spelunking into test code"
rabbit-hole when the real cause is just a leaked PID
from a prior session, without collateral damage to
other tractor-embedding projects on the same box.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit d093c31979)
(cherry picked from commit 1ebe15db3b)
2026-06-17 13:45:31 -04:00
Gud Boi fc3ab9561b Add global 200s `pytest-timeout`
(cherry picked from commit 5998774535)
(cherry picked from commit 19dd6fc739)
2026-06-17 13:45:31 -04:00
Gud Boi e5d4d9494b Import-or-skip `.devx.` tests requiring `greenback`
Which is for sure true on py3.14+ rn since `greenlet` didn't want to
build for us (yet).

(cherry picked from commit d6e70e9de4)
(cherry picked from commit ba2e474d9d)
2026-06-17 11:53:12 -04:00
Gud Boi 47ac8c0fef Avoid skip `.ipc._ringbuf` import when no `cffi`
(cherry picked from commit 03bf2b931e)
(cherry picked from commit 9157f58c15)
2026-06-17 11:03:12 -04:00
Gud Boi ba111b3042 Split py-version-gated uv dependency-groups
Reshuffle `pyproject.toml` deps into per-python-version
`[tool.uv.dependency-groups]`:
- `subints` group: `msgspec>=0.21.0`, py>=3.14
- `eventfd` group: `cffi>=1.17.1`, py>=3.13,<3.14
- `sync_pause` group: `greenback`, py>=3.13,<3.14
  (was in `devx`; moved out bc no 3.14 yet)

Bump top-level `msgspec>=0.20.0` too.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit 34d9d482e4)
(factored: kept only the pyproject dep-group parts of
 "Raise `subint` floor to py3.14 and split dep-groups"; dropped
 tractor/spawn/_spawn.py + tractor/spawn/_subint.py)
(cherry picked from commit ab6796dd45)
2026-06-16 14:21:08 -04:00
Gud Boi e8e4657e6d Pin `xonsh` to GH `main` in editable mode
(cherry picked from commit 64ddc42ad8)
(cherry picked from commit c4951c86ec)
2026-06-16 14:21:08 -04:00
Gud Boi c72dc8c235 Bump `xonsh` to latest pre `0.23` release
(cherry picked from commit b524ee4633)
(cherry picked from commit 83dab8e24e)
2026-06-16 14:21:08 -04:00
Gud Boi fae2525461 Expand `/run-tests` venv pre-flight to cover all cases
Rework section 3 from a worktree-only check into a
structured 3-step flow: detect active venv, interpret
results (Case A: active, B: none, C: worktree), then
run import + collection checks.

Deats,
- Case B prompts via `AskUserQuestion` when no venv
  is detected, offering `uv sync` or manual activate
- add `uv run` fallback section for envs where venv
  activation isn't practical
- new allowed-tools: `uv run python`, `uv run pytest`,
  `uv pip show`, `AskUserQuestion`

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit b1a0753a3f)
(cherry picked from commit 2dab0fdcc8)
2026-06-16 14:21:08 -04:00
Gud Boi dca79670aa Add `lastfailed` cache inspection to `/run-tests` skill
New "Inspect last failures" section reads the pytest
`lastfailed` cache JSON directly — instant, no
collection overhead, and filters to `tests/`-prefixed
entries to avoid stale junk paths.

Also,
- add `jq` tool permission for `.pytest_cache/` files

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit ba86d482e3)
(cherry picked from commit 2cad99f0bd)
2026-06-16 14:21:08 -04:00
Gud Boi bca7f10915 Reorganize `.gitignore` by skill/purpose
Group `.claude/` ignores per-skill instead of a
flat list: `ai.skillz` symlinks, `/open-wkt`,
`/code-review-changes`, `/pr-msg`, `/commit-msg`.
Add missing symlink entries (`yt-url-lookup` ->
`resolve-conflicts`, `inter-skill-review`). Drop
stale `Claude worktrees` section (already covered
by `.claude/wkts/`).

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code

(cherry picked from commit d3d6f646f9)
(cherry picked from commit 82aa0cca73)
2026-06-16 14:21:08 -04:00
Gud Boi 61f9b62d04 Ignore notes & snippets subdirs in `git`
(cherry picked from commit 9cf3d588e7)
(cherry picked from commit 9a7b595029)
2026-06-16 14:21:08 -04:00
Bd e75e29b1dc
Merge pull request #444 from goodboy/spawn_modularize
Spawner modules: split up subactor spawning  backends
2026-04-23 18:42:33 -04:00
Gud Boi a7b1ee34ef Restore fn-arg `_runtime_vars` in `trio_proc` teardown
During the Phase A extraction of `trio_proc()` out of
`spawn._spawn` into its own submod, the
`debug.maybe_wait_for_debugger(child_in_debug=...)` call site in
the hard-reap `finally` got refactored from the original
`_runtime_vars.get('_debug_mode', ...)` (the fn parameter — the
dict that was constructed by the *parent* for the *child*'s
`SpawnSpec`) to `get_runtime_vars().get(...)` (a global getter that
returns the *parent's* live `_state`). Those are semantically
different — the first asks "is the child we just spawned in debug
mode?", the second asks "are *we* in debug mode?". Under
mixed-debug-mode trees the swap can incorrectly skip (or
unnecessarily delay) the debugger-lock wait during teardown.

Revert to the fn-parameter lookup and add an inline `NOTE` comment
calling out the distinction so it's harder to regress again.

Deats,
- `spawn/_trio.py`: `child_in_debug=get_runtime_vars().get(...)` →
  `child_in_debug=_runtime_vars.get(...)` at the
  `debug.maybe_wait_for_debugger(...)` call in the hard-reap block;
  add 4-line `NOTE` explaining the parent-vs-child distinction.
- `spawn/__init__.py`: drop trailing whitespace after the
  `'mp_forkserver'` docstring bullet.
- `ai/prompt-io/prompts/subints_spawner.md`: drop duplicated `with`
  in `"as with with subprocs"` prose (copilot grammar catch).

Review: PR #444 (Copilot)
https://github.com/goodboy/tractor/pull/444#pullrequestreview-4165928469

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-23 18:30:11 -04:00
Gud Boi ae5b63c0bc Bump to `msgspec>=0.21.0` in lock file 2026-04-17 19:28:11 -04:00
Gud Boi f75865fb2e Tidy `spawn/` subpkg docstrings and imports
Drop unused `TYPE_CHECKING` imports (`Channel`,
`_server`), remove commented-out `import os` in
`_entry.py`, and use `get_runtime_vars()` accessor
instead of bare `_runtime_vars` in `_trio.py`.

Also,
- freshen `__init__.py` layout docstring for the
  new per-backend submod structure
- update `_spawn.py` + `_trio.py` module docstrings

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-17 19:03:00 -04:00
Gud Boi e0b8f23cbc Add prompt-io files for "phase-A", fix typos caught by copilot 2026-04-17 18:26:41 -04:00
Gud Boi 8d662999a4 Bump to `msgspec>=0.21` for py314 support 2026-04-17 16:54:07 -04:00
Gud Boi d7ca68cf61 Mv `trio_proc`/`mp_proc` to per-backend submods
Split the monolithic `spawn._spawn` into a slim
"core" + per-backend submodules so a future
`._subint` backend (per issue #379) can drop in
without piling more onto `_spawn.py`.

`._spawn` retains the cross-backend supervisor
machinery: `SpawnMethodKey`, `_methods` registry,
`_spawn_method`/`_ctx` state, `try_set_start_method()`,
the `new_proc()` dispatcher, and the shared helpers
`exhaust_portal()`, `cancel_on_completion()`,
`hard_kill()`, `soft_kill()`, `proc_waiter()`.

Deats,
- mv `trio_proc()` → new `spawn._trio`
- mv `mp_proc()` → new `spawn._mp`, reads `_ctx` and
  `_spawn_method` via `from . import _spawn` for
  late binding bc both get mutated by
  `try_set_start_method()`
- `_methods` wires up the new submods via late
  bottom-of-module imports to side-step circular
  dep (both backend mods pull shared helpers from
  `._spawn`)
- prune now-unused imports from `_spawn.py` — `sys`,
  `is_root_process`, `current_actor`,
  `is_main_process`, `_mp_main`, `ActorFailure`,
  `pretty_struct`, `_pformat`

Also,
- `_testing.pytest.pytest_generate_tests()` now
  drives the valid-backend set from
  `typing.get_args(SpawnMethodKey)` so adding a
  new backend (e.g. `'subint'`) doesn't require
  touching the harness
- refresh `spawn/__init__.py` docstring for the
  new layout

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-17 16:48:22 -04:00
Gud Boi b5b0504918 Add prompt-IO log for subint spawner design kickoff
Log the `claude-opus-4-7` design session that produced the phased plan
(A: modularize `_spawn`, B: `_subint` backend, C: harness) and concrete
Phase A file-split for #379. Substantive bc the plan directly drives
upcoming impl.

Prompt-IO: ai/prompt-io/claude/20260417T034918Z_9703210_prompt_io.md

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-17 16:48:22 -04:00
Gud Boi de78a6445b Initial prompt to vibe subint support Bo 2026-04-17 16:48:18 -04:00
Bd 5c98ab1fb6
Merge pull request #429 from goodboy/multiaddr_support
Multiaddresses: a novel `libp2p` peep's idea worth embracing
2026-04-16 23:59:11 -04:00
Gud Boi 3867403fab Scale `test_open_local_sub_to_stream` timeout by CPU factor
Import and apply `cpu_scaling_factor()` from
`conftest`; bump base from 3.6 -> 4 and multiply
through so CI boxes with slow CPUs don't flake.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-16 20:03:32 -04:00
Gud Boi 7c8e5a6732 Drop `snippets/multiaddr_ex.py` scratch script
Since we no longer need the example after integrating `multiaddr` into
the `.discovery` subsys.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-04-16 17:45:38 -04:00