From 72fbda4cefabf698ec096ec0e272a6c81bed7320 Mon Sep 17 00:00:00 2001
From: Tyler Goodlet
Date: Wed, 12 Oct 2022 12:35:11 -0400
Subject: [PATCH] Add nooz file

---
 nooz/337.feature.rst | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)
 create mode 100644 nooz/337.feature.rst

diff --git a/nooz/337.feature.rst b/nooz/337.feature.rst
new file mode 100644
index 0000000..6e3e903
--- /dev/null
+++ b/nooz/337.feature.rst
@@ -0,0 +1,41 @@
+Add support for debug-lock blocking using a ``._debug.Lock._blocked:
+set[tuple]`` set, to which actor ids are added when no more IPC
+connections with the root actor are detected.
+
+This is an enhancement which (mostly) solves a lingering debugger
+locking race case we needed to handle:
+
+- a child crashes, acquires the TTY lock in the root and attaches to
+  ``pdb``
+- the child's IPC goes down such that all channels to the root are
+  broken / non-functional
+- the root is stuck thinking the child is still in debug even though
+  it can't be contacted and the child actor machinery hasn't been
+  cancelled by its parent
+- the root gets stuck in a deadlock with the child since it won't send
+  a cancel request until the child is finished debugging (to avoid
+  clobbering a child that is actually using the debugger), but the
+  child can't unlock the debugger because IPC is down and it can't
+  contact the root
+
+To avoid this scenario, add a debug-lock blocking list via
+``._debug.Lock._blocked: set[tuple]`` which holds the actor uids of any
+actor that is detected by the root as having no transport channel
+connections (of which at least one should exist if this sub-actor at
+some point acquired the debug lock). The root consequently checks this
+list for any actor that tries to (re)acquire the lock and blocks it
+with a ``ContextCancelled``.
+Further, when a debug condition is tested in ``._runtime._invoke``,
+the context's ``._enter_debugger_on_cancel`` is set to ``False`` if
+the actor was put on the block list, so that all post-mortem / crash
+handling is bypassed for that task.
+
+In theory this approach to block-list management may cause problems
+where some nested child actor acquires and releases the lock multiple
+times and gets stuck on the block list after the first use? If this
+turns out to be an issue we can try changing the strategy so blocks
+are only added when the root has zero IPC peers left?
+
+Further, this adds a root-locking-task side cancel scope,
+``Lock._root_local_task_cs_in_debug``, which can be ``.cancel()``-ed
+by the root runtime when a stale lock is detected during IPC channel
+testing. However, right now we're NOT using this since it seems to
+cause test failures, likely due to premature cancellation, and it
+maybe needs a bit more experimenting?