Harden cancel-ack hard-kill escalation

Two defensive fixes around the `Portal.cancel_actor()` +
`_try_cancel_then_kill()` escalation from `34f333a0`
"Escalate cancel-ack timeouts to `proc.terminate()`" (the
`trionics.start_or_cancel` follow-up); surfaced by
`/code-review high` on #462,

- guard `proc.terminate()` for backends whose `proc` slot
  isn't a `Process` — the future `subint` backend stores an
  `int` interp-id, so escalation would `AttributeError`
  instead of hard-killing; now it logs + no-ops.
- swap `assert cs.cancelled_caught` for an
  `if cs.cancelled_caught and raise_on_timeout:` guard so an
  unexpected shielded-scope exit returns a soft `False`
  rather than crashing `cancel_actor()` mid-teardown.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
trionics_start_or_cancel
Gud Boi 2026-06-17 19:46:04 -04:00
parent d28173f4c0
commit fdac157d3d
2 changed files with 24 additions and 3 deletions

View File

@ -334,8 +334,11 @@ class Portal:
# `move_on_after` fired — peer didn't ack within # `move_on_after` fired — peer didn't ack within
# bounded window. Behaviour depends on # bounded window. Behaviour depends on
# `raise_on_timeout`: # `raise_on_timeout`:
assert cs.cancelled_caught if (
if raise_on_timeout: cs.cancelled_caught
and
raise_on_timeout
):
raise ActorTooSlowError( raise ActorTooSlowError(
f'Peer {peer_id} did not ack `Actor.cancel()`' f'Peer {peer_id} did not ack `Actor.cancel()`'
f'-RPC within bounded wait of ' f'-RPC within bounded wait of '
@ -344,6 +347,11 @@ class Portal:
# legacy fire-and-forget path: log + return False so # legacy fire-and-forget path: log + return False so
# the caller can decide whether to escalate. # the caller can decide whether to escalate.
#
# NOTE, we also land here in the (unexpected) case where
# the shielded `move_on_after` block exits WITHOUT
# `return True` and WITHOUT the deadline firing — prefer
# a soft `False` over an `assert`-crash mid-teardown.
log.debug( log.debug(
f'May have failed to cancel peer?\n' f'May have failed to cancel peer?\n'
f'\n' f'\n'

View File

@ -151,7 +151,20 @@ async def _try_cancel_then_kill(
f' reason: {too_slow}\n' f' reason: {too_slow}\n'
f'-> escalating to `proc.terminate()` (hard-kill)\n' f'-> escalating to `proc.terminate()` (hard-kill)\n'
) )
proc.terminate() # XXX, the `subint` backend stores an `int` interp-id in the
# `proc` slot (not a `Process`), so it has no `.terminate()`.
# Guard here so a cancel-ack timeout doesn't `AttributeError`
# once that backend lands; its hard-kill path is a TODO.
if hasattr(proc, 'terminate'):
proc.terminate()
else:
log.error(
f'Cannot hard-kill sub-actor — backend proc-handle '
f'{proc!r} ({type(proc).__name__!r}) has no '
f'`.terminate()`!\n'
f' uid: {subactor.aid.reprol()!r}\n'
f'TODO: per-backend cancel-escalation.\n'
)
class ActorNursery: class ActorNursery: