This is a lingering debugger locking race case we needed to handle:

- child crashes, acquires the TTY lock in the root and attaches to `pdb`
- child IPC goes down such that all channels to the root are broken / non-functional.
- root is stuck thinking the child is still in debug even though it can't be contacted and the child actor machinery hasn't been cancelled by its parent.
- root gets stuck in a deadlock with the child since it won't send a cancel request until the child is finished debugging, but the child can't unlock the debugger because IPC is down.

To avoid this scenario, add a debug-lock blocking list via `._debug.Lock._blocked: set[tuple]` which holds the uid of any actor that is detected by the root as having no transport channel connections with said root (of which at least one should exist if the sub-actor at some point acquired the debug lock). The root consequently checks this list for any actor that tries to (re)acquire the lock and blocks it with a `ContextCancelled`. When a debug condition is tested in `._runtime._invoke`, the context's `._enter_debugger_on_cancel` is set to `False` if the actor is on the block list, in which case the post-mortem entry is skipped.

Further, this adds a root-locking-task side cancel scope, `Lock._root_local_task_cs_in_debug`, which can be cancelled by the root runtime when a stale lock is detected after all IPC channels for the actor have been torn down (see the sketch below). NOTE: right now we're NOT doing this since it seems to cause test failures, likely because it may cause premature cancellation and maybe needs a bit more experimenting?
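A minimal sketch of how these pieces might fit together. `Lock._blocked`, `Lock._root_local_task_cs_in_debug`, and `ContextCancelled` are the names from this change; the helper functions, the `(name, uuid)` uid shape, and the call sites are hypothetical stand-ins, not the real tractor internals:

```python
from typing import Optional

import trio
from tractor._exceptions import ContextCancelled


class Lock:
    # uids of actors the root has detected as having *no* remaining
    # transport channels; such actors are barred from (re)acquiring
    # the debug/TTY lock.
    _blocked: set[tuple[str, str]] = set()

    # cancel scope wrapping the root-side locking task so the runtime
    # can cancel a stale lock holder.
    _root_local_task_cs_in_debug: Optional[trio.CancelScope] = None


async def _acquire_debug_lock(uid: tuple[str, str]) -> None:
    # hypothetical root-side entry point for a subactor's lock request
    if uid in Lock._blocked:
        # the actor has no channel back to the root: refuse the lock so
        # the root doesn't deadlock waiting for it to finish debugging.
        raise ContextCancelled(
            f'Debug lock blocked for {uid}: no IPC channels to root'
        )

    with trio.CancelScope() as cs:
        Lock._root_local_task_cs_in_debug = cs
        # ... hand the TTY over to the subactor until it releases ...


def block_stale_locker(uid: tuple[str, str]) -> None:
    # hypothetically called when the last channel for `uid` is torn
    # down: bar re-acquisition and cancel the stale root-side locking
    # task (the cancel is currently disabled per the NOTE above).
    Lock._blocked.add(uid)
    cs = Lock._root_local_task_cs_in_debug
    if cs is not None:
        cs.cancel()
```

On the `._runtime._invoke` side the corresponding gate would then be roughly `if ctx._enter_debugger_on_cancel: await post_mortem(...)`, so a blocked actor skips the debugger entry entirely instead of re-deadlocking on the lock.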