Add per-`ctx_key` isolation tests for `maybe_open_context()`

Add `test_per_ctx_key_resource_lifecycle` to verify that per-key user tracking correctly tears down resources independently - exercises the fix from 02b2ef18 where a global `_Cache.users` counter caused stale cache hits when the same `acm_func` was called with different kwargs. Also, add a paired `acm_with_resource()` helper `@acm` that yields its `resource_id` for per-key testing in the above suite. (this patch was generated in some part by [`claude-code`][claude-code-gh]) [claude-code-gh]: https://github.com/anthropics/claude-code Prompt-IO: ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
2026-04-06 14:33:26 -04:00 · 2026-04-06 14:33:26 -04:00 · 85f9c5df6f
parent ebe9d5e4b5
commit 85f9c5df6f
3 changed files with 193 additions and 0 deletions
--- a/ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
+++ b/ai/prompt-io/claude/20260406T172848Z_02b2ef1_prompt_io.md
@ -0,0 +1,54 @@
 ---
 model: claude-opus-4-6
 service: claude
 session: (ad-hoc, not tracked via conf.toml)
 timestamp: 2026-04-06T17:28:48Z
 git_ref: 02b2ef1
 scope: tests
 substantive: true
 raw_file: 20260406T172848Z_02b2ef1_prompt_io.raw.md
 ---
 ## Prompt
 User asked to extend `tests/test_resource_cache.py` with a test
 that reproduces the edge case fixed in commit `02b2ef18` (per-key
 locking+user tracking in `maybe_open_context()`). The bug was
 originally triggered in piker's `brokerd.kraken` backend where the
 same `acm_func` was called with different kwargs, and the old
 global `_Cache.users` counter caused:
 - teardown skipped for one `ctx_key` bc another key's users kept
  the global count > 0
 - re-entry hitting `assert not resources.get(ctx_key)` during the
  teardown window
 User requested a test that would fail under the old code and pass
 with the fix.
 ## Response summary
 Designed and implemented `test_per_ctx_key_resource_lifecycle`
 which verifies per-`ctx_key` resource isolation by:
 1. Holding resource `'a'` open in a bg task
 2. Opening+closing resource `'b'` (same `acm_func`, different
   kwargs) while `'a'` is still alive
 3. Re-opening `'b'` and asserting cache MISS — proving `'b'` was
   torn down independently despite `'a'` keeping its own user
   count > 0
 With the old global counter, phase 3 would produce a stale cache
 HIT (leaked resource) or crash on the assert.
 Also added a trivial `acm_with_resource(resource_id)` ACM helper
 at module level.
 ## Files changed
 - `tests/test_resource_cache.py` — add `acm_with_resource` ACM +
  `test_per_ctx_key_resource_lifecycle` test fn
 ## Human edits
 None — committed as generated (pending user review).
--- a/ai/prompt-io/claude/README.md
+++ b/ai/prompt-io/claude/README.md
@ -0,0 +1,27 @@
 # AI Prompt I/O Log — claude
 This directory tracks prompt inputs and model
 outputs for AI-assisted development using
 `claude` (Claude Code).
 ## Policy
 Prompt logging follows the
 [NLNet generative AI policy][nlnet-ai].
 All substantive AI contributions are logged
 with:
 - Model name and version
 - Timestamps
 - The prompts that produced the output
 - Unedited model output (`.raw.md` files)
 [nlnet-ai]: https://nlnet.nl/foundation/policies/generativeAI/
 ## Usage
 Entries are created by the `/prompt-io` skill
 or automatically via `/commit-msg` integration.
 Human contributors remain accountable for all
 code decisions. AI-generated content is never
 presented as human-authored work.
--- a/tests/test_resource_cache.py
+++ b/tests/test_resource_cache.py
@ -425,3 +425,115 @@ def test_lock_not_corrupted_on_fast_cancel(
                    )
    trio.run(main)
@acm
 async def acm_with_resource(resource_id: str):
    '''
    Yield `resource_id` as the cached value.
    Used to verify per-`ctx_key` isolation when the same
    `acm_func` is called with different kwargs.
    '''
    yield resource_id
 def test_per_ctx_key_resource_lifecycle(
    debug_mode: bool,
    loglevel: str,
 ):
    '''
    Verify that `maybe_open_context()` correctly isolates resource
    lifecycle **per `ctx_key`** when the same `acm_func` is called
    with different kwargs.
    Previously `_Cache.users` was a single global `int` and
    `_Cache.locks` was keyed on `fid` (function ID), so calling
    the same `acm_func` with different kwargs (producing different
    `ctx_key`s) meant:
    - teardown for one key was skipped bc the *other* key's users
      kept the global count > 0,
    - and re-entry could hit the old
      `assert not resources.get(ctx_key)` crash during the
      teardown window.
    This was the root cause of a long-standing bug in piker's
    `brokerd.kraken` backend.
    '''
    timeout: float = 6
    if debug_mode:
        timeout = 999
    async def main():
        a_ready = trio.Event()
        a_exit = trio.Event()
        async def hold_resource_a():
            '''
            Open resource 'a' and keep it alive until signalled.
            '''
            async with maybe_open_context(
                acm_with_resource,
                kwargs={'resource_id': 'a'},
            ) as (cache_hit, value):
                assert not cache_hit
                assert value == 'a'
                log.info("resource 'a' entered (holding)")
                a_ready.set()
                await a_exit.wait()
                log.info("resource 'a' exiting")
        with trio.fail_after(timeout):
            async with (
                tractor.open_root_actor(
                    debug_mode=debug_mode,
                    loglevel=loglevel,
                ),
                trio.open_nursery() as tn,
            ):
                # Phase 1: bg task holds resource 'a' open.
                tn.start_soon(hold_resource_a)
                await a_ready.wait()
                # Phase 2: open resource 'b' (different kwargs,
                # same acm_func) then exit it while 'a' is still
                # alive.
                async with maybe_open_context(
                    acm_with_resource,
                    kwargs={'resource_id': 'b'},
                ) as (cache_hit, value):
                    assert not cache_hit
                    assert value == 'b'
                    log.info("resource 'b' entered")
                log.info("resource 'b' exited, waiting for teardown")
                await trio.lowlevel.checkpoint()
                # Phase 3: re-open 'b'; must be a fresh cache MISS
                # proving 'b' was torn down independently of 'a'.
                #
                # With the old global `_Cache.users` counter this
                # would be a stale cache HIT (leaked resource) or
                # trigger `assert not resources.get(ctx_key)`.
                async with maybe_open_context(
                    acm_with_resource,
                    kwargs={'resource_id': 'b'},
                ) as (cache_hit, value):
                    assert not cache_hit, (
                        "resource 'b' was NOT torn down despite "
                        "having zero users! (global user count bug)"
                    )
                    assert value == 'b'
                    log.info(
                        "resource 'b' re-entered "
                        "(cache miss, correct)"
                    )
                # Phase 4: let 'a' exit, clean shutdown.
                a_exit.set()
    trio.run(main)