Add WIP `subint_fork_proc` backend scaffold
Experimental third spawn backend: use a fresh sub-interpreter purely as a trio-free launchpad from which to `os.fork()` + exec back into `python -m tractor._child`. Per issue #379's "fork()-workaround/hacks" thread. Intent is to sidestep both, - the trio+fork hazards hitting `trio_proc` (python- trio/trio#1614 et al.), since the forking interp is guaranteed trio-free. - the shared-GIL abandoned-thread hazards hitting `subint_proc` (`ai/conc-anal/subint_sigint_starvation_issue.md`), since we don't *stay* in the subint — it only lives long enough to call `os.fork()` Downstream of the fork+exec, all the existing `trio_proc` plumbing is reused verbatim: `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield, soft-kill. Status: NOT wired up beyond scaffolding. The fn raises `NotImplementedError` immediately; the `bootstrap` fork/exec string builder and the `# TODO: orchestrate driver thread` block are kept in-tree as deliberate dead code so the next iteration starts from a concrete shape rather than a blank page. Docstring calls out three open questions that need empirical validation before wiring this up: 1. Does CPython permit `os.fork()` from a non-main legacy subint? 2. Can the child stay fork-without-exec and `trio.run()` directly from within the launchpad subint? 3. How do `signal.set_wakeup_fd()` handlers and other process-global state interact when the forking thread is inside a subint? (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-codesubint_fork_backend
parent
99d70337b7
commit
cf0e3e6f8b
|
|
@ -431,3 +431,205 @@ async def subint_proc(
|
|||
finally:
|
||||
if not cancelled_during_spawn:
|
||||
actor_nursery._children.pop(uid, None)
|
||||
|
||||
|
||||
# ============================================================
|
||||
# WIP PROTOTYPE — `subint_fork_proc`
|
||||
# ============================================================
|
||||
# Experimental: use a sub-interpreter purely as a launchpad
|
||||
# from which to `os.fork()`, sidestepping the well-known
|
||||
# trio+fork issues (python-trio/trio#1614 etc.) by guaranteeing
|
||||
# the forking interp hasn't ever imported / run `trio`.
|
||||
#
|
||||
# The current `tractor.spawn._trio` backend already spawns a
|
||||
# subprocess and has the child connect back to the parent
|
||||
# over IPC. THIS prototype only changes *how* the subproc
|
||||
# comes into existence — everything downstream (parent-side
|
||||
# `ipc_server.wait_for_peer()`, `SpawnSpec`, `Portal` yield,
|
||||
# soft-kill) is reused verbatim.
|
||||
#
|
||||
# Reference: issue #379's "Our own thoughts, ideas for
|
||||
# fork()-workaround/hacks..." section.
|
||||
# ============================================================
|
||||
|
||||
|
||||
async def subint_fork_proc(
|
||||
name: str,
|
||||
actor_nursery: ActorNursery,
|
||||
subactor: Actor,
|
||||
errors: dict[tuple[str, str], Exception],
|
||||
|
||||
# passed through to actor main
|
||||
bind_addrs: list[UnwrappedAddress],
|
||||
parent_addr: UnwrappedAddress,
|
||||
_runtime_vars: dict[str, Any],
|
||||
*,
|
||||
infect_asyncio: bool = False,
|
||||
task_status: TaskStatus[Portal] = trio.TASK_STATUS_IGNORED,
|
||||
proc_kwargs: dict[str, any] = {},
|
||||
|
||||
) -> None:
|
||||
'''
|
||||
EXPERIMENTAL / WIP: `trio`-safe `fork()` via a pristine
|
||||
sub-interpreter launchpad.
|
||||
|
||||
Core trick
|
||||
----------
|
||||
Create a fresh subint that has *never* imported `trio`.
|
||||
From a worker thread, drive that subint to call
|
||||
`os.fork()`. In the forked CHILD process, `exec()` back
|
||||
into `python -m tractor._child` (a fresh process). In the
|
||||
fork PARENT (still inside the launchpad subint), do
|
||||
nothing — just let the subint's `exec` call return and
|
||||
the worker thread exit. The parent-side trio task then
|
||||
waits for the child process to connect back using the
|
||||
same `ipc_server.wait_for_peer()` flow as `trio_proc`.
|
||||
|
||||
Why this matters
|
||||
----------------
|
||||
The existing `trio_proc` backend spawns a subprocess via
|
||||
`trio.lowlevel.open_process()` which ultimately uses
|
||||
`posix_spawn()` (or `fork+exec`) from the parent's main
|
||||
interpreter — the one running `trio.run()`. That path is
|
||||
affected by the trio+fork issues tracked in
|
||||
python-trio/trio#1614 and related, some of which are
|
||||
side-stepped only incidentally because we always `exec()`
|
||||
immediately after fork.
|
||||
|
||||
By forking from a pristine subint instead, we have a
|
||||
known-clean-of-trio fork parent. If we later want to try
|
||||
**fork-without-exec** for faster startup and automatic
|
||||
parent-`__main__` inheritance (the property `mp.fork`
|
||||
gives for free), this approach could unlock that cleanly.
|
||||
|
||||
Relationship to the other backends
|
||||
----------------------------------
|
||||
- `trio_proc`: fork/exec from main interp → affected by
|
||||
trio+fork issues, solved via immediate exec.
|
||||
- `subint_proc`: in-process subint, no fork at all →
|
||||
affected by shared-GIL abandoned-thread hazards (see
|
||||
`ai/conc-anal/subint_sigint_starvation_issue.md`).
|
||||
- `subint_fork_proc` (THIS): OS-level subproc (like
|
||||
`trio_proc`) BUT forked from a trio-free subint →
|
||||
avoids both issue-classes above, at the cost of an
|
||||
extra subint create/destroy per spawn.
|
||||
|
||||
Status
|
||||
------
|
||||
**NOT IMPLEMENTED** beyond the bootstrap scaffolding
|
||||
below. Open questions needing empirical validation:
|
||||
|
||||
1. Does CPython allow `os.fork()` from a non-main
|
||||
sub-interpreter under the legacy config? The public
|
||||
API is silent; there may be PEP 684 safety guards.
|
||||
2. Does the forked child need to fully `exec()` or can
|
||||
we stay fork-without-exec and `trio.run()` directly
|
||||
from within the launchpad subint in the child? The
|
||||
latter is the "interesting" mode — faster startup,
|
||||
`__main__` inheritance — but opens the question of
|
||||
what residual state from the parent's main interp
|
||||
leaks into the child's subint.
|
||||
3. How do `signal.set_wakeup_fd()`, installed signal
|
||||
handlers, and other process-global state interact
|
||||
when the forking thread is inside a subint? The
|
||||
child presumably inherits them but a fresh
|
||||
`trio.run()` resets what it cares about.
|
||||
|
||||
'''
|
||||
if not _has_subints:
|
||||
raise RuntimeError(
|
||||
f'The {"subint_fork"!r} spawn backend requires '
|
||||
f'Python 3.14+ (private stdlib `_interpreters` C '
|
||||
f'module + tractor-usage stability).\n'
|
||||
f'Current runtime: {sys.version}'
|
||||
)
|
||||
|
||||
raise NotImplementedError(
|
||||
'`subint_fork_proc` is a WIP prototype scaffold — '
|
||||
'the driver thread + fork-bootstrap + connect-back '
|
||||
'orchestration below is not yet wired up. See '
|
||||
'issue #379 for context.\n'
|
||||
'(Structure kept in-tree so the next iteration has '
|
||||
'a concrete starting point rather than a blank page.)'
|
||||
)
|
||||
|
||||
# ------------------------------------------------------------
|
||||
# SKETCH (below is intentionally dead code; kept so reviewers
|
||||
# can see the shape we'd plausibly build up to). Roughly
|
||||
# mirrors `subint_proc` structure but WITHOUT the in-process
|
||||
# subint lifetime management — the subint only lives long
|
||||
# enough to call `os.fork()`.
|
||||
# ------------------------------------------------------------
|
||||
|
||||
# Create the launchpad subint. Legacy config matches
|
||||
# `subint_proc`'s reasoning (msgspec / PEP 684). For
|
||||
# fork-via-subint, isolation is moot since we don't
|
||||
# *stay* in the subint — we just need it trio-free.
|
||||
interp_id: int = _interpreters.create('legacy')
|
||||
log.runtime(
|
||||
f'Created launchpad subint for fork-spawn\n'
|
||||
f'(>\n'
|
||||
f' |_interp_id={interp_id}\n'
|
||||
)
|
||||
|
||||
uid: tuple[str, str] = subactor.aid.uid
|
||||
loglevel: str | None = subactor.loglevel
|
||||
|
||||
# Bootstrap fires inside the launchpad subint on a
|
||||
# worker OS-thread. Calls `os.fork()`. In the child,
|
||||
# `execv` back into the existing `python -m tractor._child`
|
||||
# CLI entry — which is what `trio_proc` already uses — so
|
||||
# the connect-back dance is identical. In the fork-parent
|
||||
# (still in the launchpad subint), return so the thread
|
||||
# can exit and we can `_interpreters.destroy()` the
|
||||
# launchpad.
|
||||
#
|
||||
# NOTE, `os.execv()` replaces the entire process image
|
||||
# (all interps, all threads — CPython handles this at the
|
||||
# OS level), so subint cleanup in the child is a no-op.
|
||||
import shlex
|
||||
uid_repr: str = repr(str(uid))
|
||||
parent_addr_repr: str = repr(str(parent_addr))
|
||||
bootstrap: str = (
|
||||
'import os, sys\n'
|
||||
'pid = os.fork()\n'
|
||||
'if pid == 0:\n'
|
||||
' # CHILD: full `exec` into fresh Python for\n'
|
||||
' # maximum isolation. (A `fork`-without-exec\n'
|
||||
' # variant would skip this and call\n'
|
||||
' # `_actor_child_main` directly — see class\n'
|
||||
' # docstring "Open question 2".)\n'
|
||||
' os.execv(\n'
|
||||
' sys.executable,\n'
|
||||
' [\n'
|
||||
' sys.executable,\n'
|
||||
" '-m',\n"
|
||||
" 'tractor._child',\n"
|
||||
f' {shlex.quote("--uid")!r},\n'
|
||||
f' {uid_repr},\n'
|
||||
f' {shlex.quote("--parent_addr")!r},\n'
|
||||
f' {parent_addr_repr},\n'
|
||||
+ (
|
||||
f' {shlex.quote("--loglevel")!r},\n'
|
||||
f' {loglevel!r},\n'
|
||||
if loglevel else ''
|
||||
)
|
||||
+ (
|
||||
f' {shlex.quote("--asyncio")!r},\n'
|
||||
if infect_asyncio else ''
|
||||
)
|
||||
+ ' ],\n'
|
||||
' )\n'
|
||||
'# FORK-PARENT branch falls through — we just want\n'
|
||||
'# the launchpad subint to finish so the driver\n'
|
||||
'# thread exits.\n'
|
||||
)
|
||||
|
||||
# TODO: orchestrate driver thread (mirror `subint_proc`'s
|
||||
# `_subint_target` pattern), then await
|
||||
# `ipc_server.wait_for_peer(uid)` on the parent side —
|
||||
# same as `trio_proc`. Soft-kill path is simpler here
|
||||
# than in `subint_proc`: we're managing an OS subproc,
|
||||
# not a legacy subint, so `Portal.cancel_actor()` + wait
|
||||
# + OS-level `SIGKILL` fallback (like `trio_proc`'s
|
||||
# `hard_kill()`) applies directly.
|
||||
|
|
|
|||
Loading…
Reference in New Issue