Call it `test_lock_not_corrupted_on_fast_cancel()` and include
a detailed doc string to explain. Implemented it "cleverly" by having
the target `@acm` cancel its parent nursery after a peer cache-hitting
task is already waiting on the task mutex release.
Here I was thinking the bcaster (usage) maybe required a rework but
NOPE, it's just bc a checkpoint was needed in the parent task owning the
`tn` which spawns `get_sub_and_pull()` tasks, to ensure the bg allocated
`an`/portal is eventually cancel-called..
Ah well, at least I started a patch for `MsgStream.subscribe()` to make
it multicast revertible.. XD
Anyway, I tossed in some checks & notes related to all that unnecessary
effort since I do think I'll move forward implementing it:
- for the `cache_hit` case always verify that the `bcast` clone is
unregistered from the common state subs after
`.subscribe().__aexit__()`.
- do a light check that the implicit `MsgStream._broadcaster` is always
the only bcrx instance left-leaked into that state.. that is, until
I get the proper de-allocation/reversion from multicast -> unicast
working.
- put in a mega-detailed note about the required parent-task checkpoint
(see the sketch below).
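A rough sketch of that checkpoint requirement (assuming the
`get_sub_and_pull()` task fn from the suite; not the verbatim code):

```python
import trio

async def parent_owning_tn(tn: trio.Nursery):
    # spawn the peer, cache-hitting, tasks
    for _ in range(2):
        tn.start_soon(get_sub_and_pull)

    # REQUIRED: without an explicit checkpoint the parent task
    # owning the `tn` may never yield back to the runtime, meaning
    # the bg allocated `an`/portal is never cancel-called and the
    # suite can hang!
    await trio.lowlevel.checkpoint()
```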
I recently discovered a very subtle race case that can sometimes cause
the suite to hang, seemingly due to the `an: ActorNursery` being
allocated *behind* the `.trionics.maybe_open_context()` usage; this can
result in never cancelling the 'streamer' subactor despite the `main()`
timeout-guard.
This led me to dig in and find that the underlying issue was 2-fold,
- our `BroadcastReceiver` termination-mgmt semantics in
`MsgStream.subscribe()` can result in the first subscribing task
always keeping the `MsgStream._broadcaster` instance allocated; it's
never `.aclose()`ed, which makes it tough to determine (and thus
trace) when all subscriber-tasks have actually completed and
exited-from-`.subscribe()`..
- I was shield-waiting on `.ipc._server.Server.wait_for_no_more_peers()`
in `._runtime.async_main()`'s shutdown sequence, which would then
compound the issue resulting in a SIGINT-shielded hang.. the worst kind
XD (illustrated below)
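A sketch of the compounding shielded wait (assuming a `server` ref in
the teardown; not the exact runtime code):

```python
import trio

async def shutdown_sequence(server) -> None:
    # inside `._runtime.async_main()`'s teardown..
    with trio.CancelScope(shield=True):
        # if the first subscriber's bcaster is never `.aclose()`ed
        # the peer set never drains, so this wait never returns AND,
        # being shielded, can't be cancelled.. not even via SIGINT!
        await server.wait_for_no_more_peers()
```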
Actual changes here are just styling, printing, and some mucking with
passing the `an`-ref up to the parent task in the root-actor where I was
doing a conditional `ActorNursery.cancel()` to make sure that was
actually the problem. Presuming this is fixed, the `.pause()` I left
unmasked should never hit.
Since it's for beg (`BaseExceptionGroup`) filtering, the current impl
should be renamed anyway; it's not just for filtering cancelled excs.
Deats,
- added a real doc string, links to official eg docs and fixed the
return typing.
- adjust all internal imports to match.
Namely, the more common-and-pertinent case is when
a `@context`-ep-fn contains the `finally`-footgun but without
a surrounding embedded `tn` (which currently still requires its own
scope-embedded `trionics.maybe_raise_from_masking_exc()`) and which
can't be easily compensated-for by `._rpc._invoke()`. Instead the test
is composed such that the `._invoke()`-internal `tn` is the machinery
being addressed in terms of masking user-code excs with `trio.Cancelled`.
Deats,
- rename the test -> `test_unmasked_remote_exc` to reflect what the
runtime should actually be addressing/solving.
- drop the embedded `tn` from `sleep_n_chkpt_in_finally()` (for now)
since that case can't currently easily be addressed without the user
code using its own `trionics.maybe_raise_from_masking_exc()` inside
the nursery scope.
- as such drop all `tn` related params/logic/usage from the ep.
- add in a `Cancelled` handler block which checks for RTE masking and
always prints the occurrence loudly.
Follow up,
- obvi this suite will currently fail until the appropriate adjustment
is made to `._rpc._invoke()` to do the unmasking; coming next.
- we probably still need a case with an embedded user `tn` where if
the default strict-eg mode is used then a ctxc from the parent might
cause a non-graceful `Context.cancel()` outcome?
|_since the embedded user-`tn` will raise
`ExceptionGroup[trio.Cancelled]` upward despite the parent nursery's
scope being the canceller, or will a `collapse_eg()` inside the
`._invoke()` scope handle this as well?
Deats are documented within, but basically a subtlety we already track,
`trio`'s masking of excs by a checkpoint-in-`finally`, can cause
compounded issues with our `@context` endpoints, mostly in terms of
remote error and cancel-ack relay semantics.
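For reference, a minimal sketch of the footgun itself (assuming the
enclosing scope gets cancelled while the error is in flight):

```python
import trio

async def masked_error_ep():
    try:
        raise RuntimeError('the exc you actually care about')
    finally:
        # if the enclosing scope was cancelled, this checkpoint
        # raises `trio.Cancelled` *from inside the `finally:`*,
        # which replaces (masks) the in-flight `RuntimeError` as
        # the propagating exception!
        await trio.sleep(0.1)
```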
Nicely nailing 2 birds by leveraging the new `repl_fixture` support to
actually avoid use of a `pexpect`-style test B)
Functionality audit summary,
- ensures `open_crash_handler() as bxerr:` adheres to,
- `raise_on_exit` semantics including only raising from a list of exc-types,
- `repl_fixture` setup/teardown invocation and that `yield False` blocks REPL
interaction,
- delivering a `BoxedMaybeException` with the correct state set post
crash.
- all of the above working outside of an existing actor-runtime (see
the usage sketch below).
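A hypothetical usage sketch of the audited surface (the import path,
`.value` attr and exact param forms are assumptions on my part):

```python
from contextlib import contextmanager
from tractor.devx.debug import open_crash_handler  # assumed path

@contextmanager
def no_repl_fixture(**kwargs):
    # setup..
    yield False  # yielding `False` should block REPL interaction
    # ..teardown

with open_crash_handler(
    raise_on_exit=[RuntimeError],  # only re-raise these exc-types
    repl_fixture=no_repl_fixture,
) as bxerr:
    raise ValueError('boxed, NOT re-raised on cm exit')

# post crash the `BoxedMaybeException` state should be set
assert isinstance(bxerr.value, ValueError)
```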
Also luckily enough, this seems to have found a bug for which a fix is
coming right up!
It's been in the debug scripts quite a while without a wrapping test and
will be,
- only the 2nd such REPL test which uses a lower-level `@context` ep-API
- the first official and explicit use of `enable_transports=['uds']` in
a suite.
Deats,
- flip to 'uds' tpt and 'devx' level logging in the script.
- add a new 2-case suite `test_ctxep_pauses_n_maybe_ipc_breaks` which
validates both the quit-early (via `BdbQuit`) and
channel-dropped-need-to-ctlc cases from a single test fn.
If the underlying example script fails (say due to a console output
pattern-mismatch, `AssertionError`) the `pexpect` managed subproc with
a `debug_mode=True` crash-handling-REPL engaged will ofc *not terminate*
due to any SIGINT sent by the test harness (since we shield from it as
part of normal sub-actor debugger operation). So instead always send
a 'continue' cmd to the active `PdbREPL`'s stdin so it deactivates and
allows the py-script-process to raise and terminate, unblocking the
`pexpect.spawn`'s internal subproc joiner (which would otherwise hang
without manual intervention, blocking downstream tests..).
Also, use the new `PexpectSpawner` type alias after actually importing
future annots.. XD
Such that we can more easily annotate any consumer tests of our
`.tests.devx.conftest.spawn()` fixture which delivers a closure which, when
called in a test fn body, transitively sub-invokes:
`pytest.Pytester.spawn()` -> `pexpect.spawn()`
IMO expecting `Callable[[str], pexpect.pty_spawn.spawn]` to be used all
over is a bit too.. verbose?
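i.e. something along these lines (exact module placement assumed):

```python
from __future__ import annotations
from typing import Callable, TypeAlias

import pexpect

# alias the long callable sig delivered by our `spawn()` fixture
PexpectSpawner: TypeAlias = Callable[
    [str],  # the test/script name handed to `Pytester.spawn()`
    pexpect.pty_spawn.spawn,
]
```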
Like it sounds, verifying that when that param is passed to the runtime
startup eps (`.open_root_actor()/.open_nursery()`), the appropriate
tpt-protocol is deployed for IPC (both the server and bound endpoints)
in both the root and any sub-actors (as passed down from rent to child
via the `.msg.types.SpawnSpec`).
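A minimal sketch of the opt-in usage being verified (not the suite code
verbatim):

```python
import tractor
import trio

async def main():
    async with tractor.open_nursery(
        enable_transports=['uds'],  # tpt-proto deployed for IPC
    ) as an:
        # the sub should inherit 'uds' via the `SpawnSpec`
        portal = await an.start_actor('sub')
        await portal.cancel_actor()

trio.run(main)
```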
We already have the `.ipc` sub-pkg name so it seems a bit
redundant/noisy for a namespace path Bp
Leave an alias for the `Server` rn since it's already used in a few
other internal mods.. will likely rename later if everyone is cool with
it..
Namely any CLI driven runtime-config fixtures such as,
- `--spawn-backend` and `start_method`,
- `--tpdb` and `debug_mode`,
- `--tpt-proto` and `tpt_protos`/`tpt_proto`,
- `reg_addr` as driven by the above.
This moves all fixtures and necessary hook funcs (CLI parsing,
configuring and test-gen) to the `._testing.pytest` module and thus
allows any dependent project to leverage these fixtures in their own
test suites after pointing to that plugin mod using,
```python
# conftest.py
pytest_plugins: tuple[str] = (
    "tractor._testing.pytest",
)
```
Also, add a new `._testing.addr` helper mod which now contains
a factored `get_rando_addr()` helper for creating test-sesh unique
tpt-specific registry (or other) IPC endpoint addrs.
For now it just boots a server, parametrized over all tpt-protos, sans
any actor runtime bootup. Obvi the future todo is ensuring it all works
with a client connecting via the equivalent lowlevel
`.ipc._chan._connect_chan()` API(s).
Namely, while what I was actually trying to solve was why
`TransportClosed` was getting raised from `Portal.cancel_actor()`, this
is still useful edge-case auditing either way. Also opts into the
`debug_mode` fixture with appropriate timeout adjustment B)
In `tests/test_advanced_faults.py` that is.
Since instead of the zero-responses we'd expect from a network socket,
we can actually get a few differences from the OS when "everything IPC
is known"
XD
Namely it's about underlying `trio` exceptions versus how we wrap them
and how we expect to box them. A `TransportClosed` boxing improvement
is coming in follow up btw to make this all work!
B)
Via a new accumulative `--tpt-proto` arg you can select which
`tpt_protos: list[str]`-fixture protocol keys will be delivered to
opting in tests!
B)
Also includes (see the plumbing sketch below),
- CLI quote handling/stripping.
- default of 'tcp'.
- only support one selection per session at the moment (until we figure
out how we want to support multiples, either simultaneously or
sequentially).
- draft a (masked) dynamic-`metafunc` parametrization in the
`pytest_generate_tests()` hook.
- first proven and working use in the `test_advanced_faults`-suite (and
thus its underlying
`examples/advanced_faults/ipc_failure_during_stream.py` script)!
|_ actually needed this to prove that the suite only has 2 failures on
'uds' seemingly due to low-level `trio` error semantics translation
differences to do with calling `socket.close()`..
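A rough sketch of the opt-to-fixture plumbing (the real wiring in the
plugin mod may differ):

```python
import pytest

def pytest_addoption(parser):
    parser.addoption(
        '--tpt-proto',
        action='append',  # accumulative; may be passed multiple times
        dest='tpt_protos',
        default=[],
        help='transport protocol keys delivered to opting-in tests',
    )

@pytest.fixture(scope='session')
def tpt_protos(request) -> list[str]:
    # strip any CLI-quoting and fall back to the 'tcp' default
    return [
        proto.strip("'\"")
        for proto in request.config.option.tpt_protos
    ] or ['tcp']
```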
On a very nearly related topic,
- draft an (also commented out) `set_script_runtime_args()` fixture idea
for a std way of `partial`-ling in runtime args to `examples/`
scripts-as-modules defining a `main()` which would proxy to
`tractor.open_nursery()`.
Such that we can run (opting-in) tests on both TCP and UDS backends and
ensure the `reg_addr` fixture and various timeouts are adjusted
accordingly.
Impl deats,
- add a new `tpt_proto` CLI option and fixture to allow choosing which
"transport protocol" will be used in the test suites (either globally
or contextually).
- rm `_reg_addr` instead opting for a `_rando_port` which will only be
used for `reg_addr`s which are net-tpt-protos.
- rejig `reg_addr` fixture to set an ideally session-unique `testrun_reg_addr`
based on the `tpt_proto` setting making appropriate calls to `._addr`
APIs as needed.
- refine `daemon` fixture a bit with typing, `tpt_proto` timings, and
stderr capture.
- in `test_discovery` do a ton of type-annots, add `debug_mode` fixture
opt ins, augment `spawn_and_check_registry()` with `psutil.Process`
passing for introspection (when things go wrong..).
Namely transferring the `Actor` peer-`Channel` tracking attrs,
- `._peers` which maps the uids to client channels (with duplicates
apparently..)
- the `._peer_connected: dict[tuple[str, str], trio.Event]` child-peer
syncing table mostly used by parent actors to wait on sub's to connect
back during spawn.
- the `._no_more_peers = trio.Event()` level triggered state signal.
Further we move over with some minor reworks,
- `.wait_for_peer()` verbatim (adjusting all dependants).
- factor the no-more-peers shielded wait branch-block out of
the end of `async_main()` into 2 new server meths,
* `.has_peers()` with optional chan-connected checking flag.
* `.wait_for_no_more_peers()` which *just* does the
maybe-shielded `._no_more_peers.wait()` (shapes sketched below).
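Sketch of the new meths' shapes (signatures assumed from the summary
above; the real `.ipc._server.Server` impl may differ):

```python
import trio

class Server:
    _no_more_peers: trio.Event  # level-triggered "no peers" signal

    def has_peers(
        self,
        check_chans: bool = False,  # also require a connected chan?
    ) -> bool:
        ...

    async def wait_for_no_more_peers(
        self,
        shield: bool = False,
    ) -> None:
        # *just* the maybe-shielded wait factored from `async_main()`
        with trio.CancelScope(shield=shield):
            await self._no_more_peers.wait()
```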
That is moving from `._addr`,
- `TCPAddress` to `.ipc._tcp`
- `UDSAddress` to `.ipc._uds`
Obviously this requires adjusting a buncha stuff in `._addr` to avoid
import cycles (the original reason the module was not also included in
the new `.ipc` subpkg) including,
- avoiding "unnecessary" imports of `[Unwrapped]Address` in various modules.
* since `Address` is a protocol, and the main point is that it **does
not need to be inherited** per
(https://typing.python.org/en/latest/spec/protocol.html#terminology),
I removed the need for it in both transport submods (see the minimal
illustration below).
* and `UnwrappedAddress` is a type alias for tuples.. so we don't
really always need to be importing it since it also kinda obfuscates
what the underlying pairs are.
- not exporting everything in submods at the `.ipc` top level and
importing from specific submods by default.
- only importing various types under a `if typing.TYPE_CHECKING:` guard
as needed.
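That is, structural typing means any addr-type with the right surface
satisfies the protocol sans inheritance; a minimal illustration (the
method name here is for illustration only):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Address(Protocol):
    def unwrap(self) -> tuple:  # hypothetical method for this demo
        ...

class TCPAddress:  # note: NO `Address` base class required
    def unwrap(self) -> tuple:
        return ('127.0.0.1', 1616)

addr: Address = TCPAddress()  # type-checks structurally
assert isinstance(addr, Address)  # works via @runtime_checkable
```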
There was a very strange legacy test
`test_spawning.test_local_arbiter_subactor_global_state` which was
causing unforeseen hangs/errors on the UDS tpt and, looking deeper, this
test was already doing root-actor things that should never have been
valid XD
So rework that test to properly demonstrate something of value
(i guess..) and add a new suite which starts more rigorously auditing our
`open_root_actor()` permitted usage.
For the old test,
- since the main point of this test seemed to be the ability to invoke
the same function in both the parent and child actor (using the very
legacy `ActorNursery.run_in_actor()`.. due to be deprecated) rename it
to `test_run_in_actor_same_func_in_child`,
- don't re-enter `.open_root_actor()` since that's invalid usage (tested
in new suite see below),
- adjust some `spawn()` arg/var naming and ensure we only return in the
child.
For the new suite add tests for,
- ensuring the implicit `open_root_actor()` call under `open_nursery()`.
- double open of `open_root_actor()` from within the same process tree
both from a root and sub.
Intro some new `_exceptions` used in the new suite,
- a top level `RuntimeFailure` for generically expressing faults not of
our own doing that prevent successful operation; this is what we now
(changed in this commit) raise on attempts to open a 2nd root.
- make `ActorFailure` derive from the former; it's already used from
`._spawn` when subprocs fail to boot (hierarchy sketched below).
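i.e. a relation along these lines (the base class is an assumption):

```python
class RuntimeFailure(RuntimeError):
    '''
    A fault not of our own doing which prevents successful runtime
    operation, eg. attempting to open a 2nd root actor in the same
    process tree.
    '''

class ActorFailure(RuntimeFailure):
    '''
    Raised from `._spawn` when subprocs fail to boot.
    '''
```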
The `EventFD` class now expects the fd to already be initialized via
`open_eventfd`.
The `RingBuff` `Sender` and `Receiver` now fully manage the
`SharedMemory` and `EventFD` lifecycles; no additional ctx mngrs needed.
Separate the ring-buf tests into their own test bed.
Add parametrization to the tests as well as cancellation handling.
Add docstrings.
Add a simple testing-data gen module `.samples`.
Demonstrates fixed-size frame-oriented reads by the child where the
parent only transmits a "read" stream msg on "frame fill events" such
that the child incrementally reads the shm list data (much like in
a real-time-buffered streaming system).
Muchas grax to @guilledk for finding the first issue which kicked off
this further scrutiny of the `tractor.Context` and `MsgStream` semantics
test suite with a strange edge case where,
- if the parent opened and immediately closed a stream while the remote
child task started and continued (without terminating) to send msgs,
the parent's `open_context().__aexit__()` would **not block** on the
child to complete!
=> this was seemingly due to a bug discovered inside the
`.msg._ops.drain_to_final_msg()` stream handling case logic where we
are NOT checking if `Context._stream` is non-`None`! (repro sketched
below)
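A rough repro sketch of the edge case (ep/fn names assumed; not the
verbatim suite code):

```python
import tractor
import trio

@tractor.context
async def child_keeps_sending(ctx: tractor.Context):
    await ctx.started()
    async with ctx.open_stream() as stream:
        while True:  # child never terminates on its own
            await stream.send('yo')
            await trio.sleep(0.1)

async def main():
    async with tractor.open_nursery() as an:
        portal = await an.start_actor(
            'streamer',
            enable_modules=[__name__],
        )
        async with portal.open_context(child_keeps_sending) as (ctx, _):
            async with ctx.open_stream():
                pass  # parent opens then immediately closes the stream

            # without this cancel the `.__aexit__()` *should* block on
            # the (never-ending) child task; pre-fix it would instead
            # return early!
            await ctx.cancel()

        await portal.cancel_actor()

trio.run(main)
```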
As such this,
- extends the `test_caller_closes_ctx_after_callee_opens_stream` (now
renamed, see below) to include cases for all combinations of the child
and parent sending before receiving on the stream as well as all
placements of `Context.cancel()` in the parent before, around and after
the stream open.
- uses the new `expect_ctxc()` for expecting the taskc (`trio.Task`
cancelled) cases.
- also extends the `test_callee_closes_ctx_after_stream_open` (also
renamed) to include the case where the parent sends a msg before it
receives.
=> this case has unveiled yet-another-bug where somehow the underlying
`MsgStream._rx_chan: trio.MemoryReceiveChannel` is allowing the
child's `Return[None]` msg to be consumed, and NOT in a place where it
is correctly set as `Context._result`, resulting in the parent hanging
forever inside `._ops.drain_to_final_msg()`..
Alongside,
- start renaming using the new "remote-task-peer-side" semantics
throughout the test module: "caller" -> "parent", "callee" -> "child".
Namely that `add_hooks: bool` should be the same as on the rent side..
Also, just drop the now unused `iter_maybe_sends`.
This makes the entire suite greeeeen btw, including the new sub-suite
which I hadn't run before Bo