tractor/tests/trionics/test_patches.py

'''
Regression tests for `tractor.trionics.patches` —
defensive monkey-patches on upstream `trio` bugs.

Each test asserts:

1. The bug exists (or is gone — skip cleanly if
   upstream shipped the fix and our `is_needed()` now
   returns `False`).
2. Our patch fixes it (post-`apply()` the `repro()`
   returns cleanly within a tight wall-clock cap).

Wall-clock caps are critical here — the bugs we patch
are tight-loops or deadlocks, so a regression would
HANG the test runner unless we hard-cap each
`repro()` call.

'''
import signal

import pytest

from tractor.trionics import patches
from tractor.trionics.patches import _wakeup_socketpair as wsp


@pytest.fixture(autouse=True)
def _alarm_cleanup():
    '''
    Ensure no leftover SIGALRM survives a test failure
    or unexpected return.

    '''
    yield
    signal.alarm(0)


def test_wakeup_socketpair_drain_eof_patch_works():
    '''
    Without the patch, `WakeupSocketpair.drain()` on a
    socketpair whose write-end has been closed spins
    forever. With the patch applied, it returns
    cleanly within milliseconds.

    Wall-clock cap: 2s. If the patch regresses, SIGALRM
    fires and the test hard-fails with a clear signal
    instead of hanging CI indefinitely.

    '''
    if not wsp.is_needed():
        pytest.skip(
            'upstream trio shipped the fix — '
            'patch no longer needed for trio '
            '(see `is_needed()` for version gate)'
        )

    # Apply the patch.
    applied: bool = wsp.apply()
    # First call MUST return True; idempotent guard
    # prevents False on subsequent calls within the
    # same process.
    assert applied is True or applied is False  # idempotent

    # Cap wall-clock at 2s; SIGALRM raises in main
    # thread which interrupts the C-level recv loop
    # IF the patch regresses (since `signal.alarm`
    # uses Python's signal-wakeup-fd which the patch
    # itself relies on... but `repro()` runs OUTSIDE
    # a trio.run, so it's plain stdlib semantics here
    # — alarm WILL fire during `recv` syscall).
    signal.alarm(2)
    wsp.repro()
    signal.alarm(0)


def test_apply_all_idempotent():
    '''
    Calling `apply_all()` twice should not double-
    apply: second call's dict has all-False values
    (every patch reports "already applied").

    '''
    first: dict[str, bool] = patches.apply_all()
    second: dict[str, bool] = patches.apply_all()

    # Second call: every patch reports skipped.
    assert all(v is False for v in second.values()), (
        f'apply_all() not idempotent: {second}'
    )

    # First call: at least one patch was applied
    # (or all are no-ops because `is_needed()` is
    # False everywhere — the all-fixed-upstream future
    # state which is also valid).
    assert isinstance(first, dict)
    for name, applied in first.items():
        assert isinstance(applied, bool), (
            f'patch {name!r} returned non-bool: {applied!r}'
        )
Add `tractor.trionics.patches` subpkg + first fix With a seminal patch fixing `trio`'s `WakeupSocketpair.drain()` which can busy-loop due to lack of handling `EOF`. New `tractor.trionics.patches` subpkg housing defensive monkey-patches for upstream `trio` bugs we've encountered while running `tractor` — particularly as of recent, fork-survival edge cases that haven't been filed/fixed upstream yet. Each patch is idempotent, version-gated via `is_needed()`, and carries a `# REMOVE WHEN:` marker pointing at the upstream release whose adoption allows deletion. Subpkg layout + per-patch contract documented in `tractor/trionics/patches/README.md` — `apply()` / `is_needed()` / `repro()` API, registry pattern via `_PATCHES` in `__init__.py`, single-call entry point `apply_all()`. First patch, `_wakeup_socketpair`: - `trio`'s `WakeupSocketpair.drain()` loops on `recv(64KB)` and exits ONLY on `BlockingIOError`, NEVER on `recv() == b''` (peer-closed FIN). - under `fork()`-spawning backends the COW-inherited socketpair fds & `_close_inherited_fds()` teardown can leave a `WakeupSocketpair` instance whose write-end is closed, and `drain()` then spins forever in C with no Python checkpoints, - this obviously burns 100% CPU and no signal delivery. Standalone repro: from trio._core._wakeup_socketpair import WakeupSocketpair ws = WakeupSocketpair() ws.write_sock.close() ws.drain() # spins forever Patch is one-line — break the drain loop on b'' EOF. Manifested as two distinct test failures: - `tests/test_multi_program.py::test_register_duplicate_name` hung at 100% CPU on the busy-loop directly (fork child's worker thread) - `tests/test_infected_asyncio.py::test_aio_simple_error` Mode-A deadlock — busy-loop wedged trio's scheduler inside `start_guest_run`, both threads parked in `epoll_wait`, no TCP connect-back to parent ever happened. Same patch fixes both. Restored 99.7% pass rate on full suite under `--spawn-backend=main_thread_forkserver` (was hanging indefinitely before). Wired into `tractor._child._actor_child_main` via `apply_all()` BEFORE any trio runtime init. Harmless on non-fork backends. Conc-anal write-ups, including strace + py-spy evidence: - `ai/conc-anal/trio_wakeup_socketpair_busy_loop_under_fork_issue.md` - `ai/conc-anal/infected_asyncio_under_main_thread_forkserver_hang_issue.md` Regression tests in `tests/trionics/test_patches.py`: each test asserts (a) the bug exists pre-patch (or is fixed upstream — skip cleanly), (b) the patch fixes it with a SIGALRM wall-clock cap so a regression hangs loud instead of silently. TODO: - [ ] file the upstream `python-trio/trio` issue + PR. - [ ] use the `repro()` callable in `_wakeup_socketpair.py` IS the issue body's evidence section. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code 2026-05-04 16:03:57 +00:00			`'''`
			Regression tests for `tractor.trionics.patches` —
			defensive monkey-patches on upstream `trio` bugs.

			`Each test asserts:`

			`1. The bug exists (or is gone — skip cleanly if`
			upstream shipped the fix and our `is_needed()` now
			returns `False`).
			2. Our patch fixes it (post-`apply()` the `repro()`
			`returns cleanly within a tight wall-clock cap).`

			`Wall-clock caps are critical here — the bugs we patch`
			`are tight-loops or deadlocks, so a regression would`
			`HANG the test runner unless we hard-cap each`
			`repro()` call.

			`'''`
			`import signal`

			`import pytest`

			`from tractor.trionics import patches`
			`from tractor.trionics.patches import _wakeup_socketpair as wsp`


			`@pytest.fixture(autouse=True)`
			`def _alarm_cleanup():`
			`'''`
			`Ensure no leftover SIGALRM survives a test failure`
			`or unexpected return.`

			`'''`
			`yield`
			`signal.alarm(0)`


			`def test_wakeup_socketpair_drain_eof_patch_works():`
			`'''`
			Without the patch, `WakeupSocketpair.drain()` on a
			`socketpair whose write-end has been closed spins`
			`forever. With the patch applied, it returns`
			`cleanly within milliseconds.`

			`Wall-clock cap: 2s. If the patch regresses, SIGALRM`
			`fires and the test hard-fails with a clear signal`
			`instead of hanging CI indefinitely.`

			`'''`
			`if not wsp.is_needed():`
			`pytest.skip(`
			`'upstream trio shipped the fix — '`
			`'patch no longer needed for trio '`
			'(see `is_needed()` for version gate)'
			`)`

			`# Apply the patch.`
			`applied: bool = wsp.apply()`
			`# First call MUST return True; idempotent guard`
			`# prevents False on subsequent calls within the`
			`# same process.`
			`assert applied is True or applied is False # idempotent`

			`# Cap wall-clock at 2s; SIGALRM raises in main`
			`# thread which interrupts the C-level recv loop`
			# IF the patch regresses (since `signal.alarm`
			`# uses Python's signal-wakeup-fd which the patch`
			# itself relies on... but `repro()` runs OUTSIDE
			`# a trio.run, so it's plain stdlib semantics here`
			# — alarm WILL fire during `recv` syscall).
			`signal.alarm(2)`
			`wsp.repro()`
			`signal.alarm(0)`


			`def test_apply_all_idempotent():`
			`'''`
			Calling `apply_all()` twice should not double-
			`apply: second call's dict has all-False values`
			`(every patch reports "already applied").`

			`'''`
			`first: dict[str, bool] = patches.apply_all()`
			`second: dict[str, bool] = patches.apply_all()`

			`# Second call: every patch reports skipped.`
			`assert all(v is False for v in second.values()), (`
			`f'apply_all() not idempotent: {second}'`
			`)`

			`# First call: at least one patch was applied`
			# (or all are no-ops because `is_needed()` is
			`# False everywhere — the all-fixed-upstream future`
			`# state which is also valid).`
			`assert isinstance(first, dict)`
			`for name, applied in first.items():`
			`assert isinstance(applied, bool), (`
			`f'patch {name!r} returned non-bool: {applied!r}'`
			`)`