Nowhere near ready yet since this generates many test-body-logic
un-handled true-positives which currently fail, but it's a first draft
for the general case set. To start, includes a greater-than-one-`Msg` (a
strict top level tagged-union-of-structs) pld spec alongside
a `AnyFieldMsg`-workaround struct-msg for packing all other builtin
non-`Struct`/`Any` python types alongside the other explicit msgs.
Removing the now masked-for-a-while unit test remnants for
`test_limit_msgspec()` (and its helper `chk_pld_type()`) since these cases
are now covered in the `test_pldrx_limiting` suite at an e2e
IPC-system-spanning level.
Note that the contents of the `chk_pld_type()` might be useful in the
future once we start setting/allowing semantics for various "phases of
IPC with matching msgspecs", but that's a little ways off rn and this
commit can always be looked up, also iirc most of the details were
already somewhat out of date and causing suite failure.
Such that we can parametrize the `@context(pld_spec)` endpoint setting
using `pytest` and of course enable testing more then just the lone
`maybe_msg_spec` case. The implementation was a bit tricky because
subactors import any `enable_modules` just after subproc spawn, so
there's no easy way to indicate from the parent should should be passed
to the `@context()` decorator since it's already resolved by the time an
IPC is established. Thus the bulk of this patch is implementing
a pre-ctx which monkey-patches the (test) `child()`-ep-defining-module
before running test logic.
Impl deats,
- drop `maybe_msg_spec` global instead providing the same value via
a new `pld_spec: Union[Type]` parametrized input to the test suite.
- add a `decorate_child_ep()` helper which (re-)decorates the
mod-defined `child()` IPC-context endpoint with the provided `pld_spec`.
- add a new "pre IPC context" endpoint: `set_chld_pldspec()` which can
be opened (from another actor) just prior to opening the `child()` ep
and it will decorate the latter (using `decorate_child_ep()`)
presuming a `.msg._exts.enc_type_union()` generated `pld_spec_strs`
is provided.
- actually open the `set_chld_pldspec()` as a `deco_ctx` rom the
root-actor and ensure we cancel it on block teardown in non-raising
cases.
With methods to comms similar to those that exist for the `trio` side,
- `.get()` which proxies verbatim to the `._to_aio: asyncio.Queue`,
- `.send_nowait()` which thin-wraps to `._to_trio: trio.MemorySendChannel`.
Obviously the more correct design is to break up the channel type into
a pair of handle types, one for each "side's" task in each event-loop,
that's hopefully coming shortly in a follow up patch B)
Also,
- fill in some missing doc strings, tweak some explanation comments and
update todos.
- adjust the `test_aio_errors_and_channel_propagates_and_closes()` suite
to use the new `chan` fn-sig-API with `.open_channel_from()` including
the new methods for msg comms; ensures everything added here works e2e.
A bit of test driven dev to anticipate support of `.log.get_logger()`
usage such that it can be called from arbitrary sub-modules, themselves
embedded in arbitrary sub-pkgs, of some project; the when not provided,
the `sub_name` passed to the `Logger.getChild(<sub_name>)` will be set
as the sub-pkg path "down to" the calling module.
IOW if you call something like,
`log = tractor.log.get_logger(pkg_name='mypylib')`
from some `submod.py` in a project-dir that looks like,
mypylib/
mod.py
subpkg/
submod.py <- calling module
the `log: StackLevelAdapter` child-`Logger` instance will have a
`.name: str = 'mypylib.subpkg'`, discluding the `submod` part since this
already rendered as the `{filename}` header in `log.LOG_FORMAT`.
Previously similar behaviour would be obtained by passing
`get_logger(name=__name__)` in the calling module and so much so it
motivated me to make this the default, presuming we can introspect for
the info.
Impl deats,
- duplicated a `load_module_from_path()` from `modden` to load the
`testdir` rendered py project dir from its path.
|_should prolly factor it down to this lib anyway bc we're going to
need it for hot code reload? (well that and `watchfiles` Bp)
- in each of `mod.py` and `submod.py` render the `get_logger()` code
sin `name`, expecting the (coming shortly) implicit introspection
feat to do this.
- do `.name` and `.parent` checks against expected sub-logger values
from `StackLevelAdapter.logger.getChildren()`.
To start ensuring that when `name=__name__` is passed we try to
de-duplicate the `_root_name` and any `leaf_mod: str` since it's already
included in the headers as `{filename}`.
Deats,
- heavily document the de-duplication `str.partition()`s in
`.log.get_logger()` and provide the end fix by changing the predicate,
`if rname == 'tractor':` -> `if rname == _root_name`.
* also toss in some warnings for when we still detect duplicates.
- add todo comments around logging "filters" (vs. our "adapter").
- create the new `test_log_sys.test_root_pkg_not_duplicated()` which
runs green with the fixes from ^.
- add a ton of test-suite todos both for existing and anticipated
logging sys feats in the new mod.
While working on a fix to the hang case found from
`test_cancel_ctx_with_parent_side_entered_in_bg_task` an initial
solution caused this test to hang indefinitely; solve it with a small
wrapping `_main()` + `trio.fail_after()` entrypoint.
Further suite refinements,
- move the top-most `try:`->`else:` block
- toss in a masked base-exc block for tracing unexpected
`ctx.wait_for_result()` outcomes.
- tweak the `raise_sub_spawn_error_after` to be an optional `float`
which scales the `rng_seed: int = 50` msg counter to
`tell_little_bro()` so that the abs value to the `range()` can be
changed.
Such that when `maybe_context.cancel()` is not called (explicitly) and
only the subactor is cancelled by its parent we expect to see a ctxc
raised both from any call to `Context.wait_for_result()` and out of
the `Portal.open_context()` scope, up to the `trio.run()`.
Deats,
- obvi call-n-catch the ctxc (in scope) for the oob-only
subactor-cancelled case.
- add branches around `trio.run()` entry to match.
Discovered while writing a `@context` sanity test to verify unmasker
ignore-cases support. Masked code is due to the process of finding the
minimal example causing the original hang discovered in the original
examples script. Details are in the test-fn doc strings and surrounding
comments; more refinement and cleanup coming obviously.
Also moved over the self-cancel todos from the inter-peer tests module.
Demonstrating the guilty `trio.Lock.acquire()` impl which puts
a checkpoint inside its `trio.WouldBlock` handler and which will always
appear to mask the "sync path" case on (graceful) cancellation.
This first script draft demos the issue from within a `tractor.context`
ep bc that's where it was orig discovered, however i'm going to factor
out the `tractor` code and instead just use
a `.trionics.maybe_raise_from_masking_exc()` to demo its low-level
ignore-case feature.
Further, this script exposed a previously unhandled remote graceful
cancellation case which hangs:
- parent actor spawns child and opens a >1 ctxs with it,
- the parent then OoB (out-of-band) cancels the child actor (with
`Portal.cancel_actor()`),
- since the open ctxs raise a ctxc with a `.canceller == parent.uid` the
`Context._is_self_cancelled()` will eval `True`,
- the `Context._scope` will NOT be cancelled in
`._maybe_cancel_and_set_remote_error()` resulting in any bg-task which
is waiting on a `Portal.open_context()` to not be cancelled/unblocked.
So my plan is to factor this ^^ scenario into a standalone unit test
as well as another test which consumes from al low-level `trio`-only
version of **this** script-scenario to sanity check the interaction
of the unmasker-with-ignore-cases usage implicitly around a ctx ep.
Call it `test_trioisms::test_unmask_aclose_as_checkpoint_on_aexit` and
parametrize all script-mod`.main()` toggles including `.xfails()` for
the `raise_unmasked=False` cases.
Including all caller usage throughout. Moving to a non-`except*` impl
means it's never needed as a signal from the caller - we can just catch
the beg outright (like we should have always been doing)..
Such that we audit the `shield=root_tn.cancel_scope.cancel_called,`
passed to `await debug._maybe_enter_pm()` in the `open_root_actor()`
exit handler block.
Verifying that if any exc is raised pre `chan.send_nowait()` (our
currentlly shite version of a `chan.started()`) then that exc is indeed
raised through on the `trio`-parent task side. This case was reproduced
from a `piker.brokers.ib` issue with a similar embedded
`.trionics.maybe_open_context()` call.
Deats,
- call the suite `test_aio_side_raises_before_started`.
- mk the `@context` simply `maybe_open_context(acm_func=open_channel_from)`
with a `target=raise_before_started` which,
- simply sleeps then immediately raises a RTE.
- expect the RTE from the aio-child-side to propagate all the way up to
the root-actor's task right up through the `trio.run()`.
Since it's merely a local-file-sys subdirectory and there should be no
reason file creation conflicts with other bind spaces.
Also add 2 test suites to match,
- `tests/ipc/test_each_tpt::test_uds_bindspace_created_implicitly` to
verify the dir creation when DNE.
- `..test_uds_double_listen_raises_connerr` to ensure a double bind
raises a `ConnectionError` from the src `OSError`.
Call it `test_lock_not_corrupted_on_fast_cancel()` and includes
a detailed doc string to explain. Implemented it "cleverly" by having
the target `@acm` cancel its parent nursery after a peer, cache-hitting
task, is already waiting on the task mutex release.
Here I was thinking the bcaster (usage) maybe required a rework but,
NOPE it's just bc a checkpoint was needed in the parent task owning the
`tn` which spawns `get_sub_and_pull()` tasks to ensure the bg allocated
`an`/portal is eventually cancel-called..
Ah well, at least i started a patch for `MsgStream.subscribe()` to make
it multicast revertible.. XD
Anyway, I tossed in some checks & notes related to all that unnecessary
effort since I do think i'll move forward implementing it:
- for the `cache_hit` case always verify that the `bcast` clone is
unregistered from the common state subs after
`.subscribe().__aexit__()`.
- do a light check that the implicit `MsgStream._broadcaster` is always
the only bcrx instance left-leaked into that state.. that is until
i get the proper de-allocation/reversion from multicast -> unicast
working.
- put in mega detailed note about the required parent-task checkpoint.
Since I recently discovered a very subtle race-case that can sometimes
cause the suite to hang, seemingly due to the `an: ActorNursery`
allocated *behind* the `.trionics.maybe_open_context()` usage; this can
result in never cancelling the 'streamer' subactor despite the `main()`
timeout-guard?
This led me to dig in and find that the underlying issue was 2-fold,
- our `BroadcastReceiver` termination-mgmt semantics in
`MsgStream.subscribe()` can result in the first subscribing task to
always keep the `MsgStream._broadcaster` instance allocated; it's
never `.aclose()`ed, which makes it tough to determine (and thus
trace) when all subscriber-tasks are actually complete and
exited-from-`.subscribe()`..
- i was shield waiting `.ipc._server.Server.wait_for_no_more_peers()` in
`._runtime.async_main()`'s shutdown sequence which would then compound
the issue resulting in a SIGINT-shielded hang.. the worst kind XD
Actual changes here are just styling, printing, and some mucking with
passing the `an`-ref up to the parent task in the root-actor where i was
doing a conditional `ActorNursery.cancel()` to mk sure that was actually
the problem. Presuming this is fixed the `.pause()` i left unmasked
should never hit.
Was failing due to the `.fail_after()` timeout being *too short* and
somehow the new interplay of that with strict-exception groups resulting
in the `TooSlowError` never raising but instead an eg with the embedded
`AssertionError`?? I still don't really get it honestly..
I've written up lengthy notes around the different `delay` settings that
can be used to see the diff outcomes, the failing case being the one
i still don't really grok and think is justification for `trio` to
bubble inner `Cancelled`s differently possibly?
For now i've included the original failing case as an `xfail`
parametrization for now which will hopefully drive a follow lowlevel
`trio` test in `test_trioisms`!
Namely `test_empty_mngrs_input_raises()` was failing due to
lazy-iterator use as input to `mngrs` which i guess i added support for
a while back (by it doing a `list(mngrs)` internally)? So just change it
to `gather_contexts(mngrs=())` and also tweak the `trio.fail_after(3)`
since it appears that the prior 1sec was causing
too-fast-of-a-cancellation (before the cluster fully spawned) and thus
the expected `ValueError` never to show..
Also, mask the `tractor.trionics.collapse_eg()` usage (again?) in
`open_actor_cluster()` since it seems unnecessary.
Seems that the way the actor-nursery interacts with the
`.trionics.gather_contexts()` API on cancellation makes our
`.trionics.collapse_eg()` not work as intended?
I need to dig into how `ActorNursery.cancel()` and `.__aexit__()` might
be causing this discrepancy..
Consider this a commit-of-my-index type save for rn.
Since it's for beg filtering, the current impl should be renamed anyway;
it's not just for filtering cancelled excs.
Deats,
- added a real doc string, links to official eg docs and fixed the
return typing.
- adjust all internal imports to match.
Namely that the more common-and-pertinent case is when
a `@context`-ep-fn contains the `finally`-footgun but without
a surrounding embedded `tn` (which currently still requires its own
scope embedded `trionics.maybe_raise_from_masking_exc()`) which can't
be compensated-for by `._rpc._invoke()` easily. Instead the test is
composed where the `._invoke()`-internal `tn` is the machinery being
addressed in terms of masking user-code excs with `trio.Cancelled`.
Deats,
- rename the test -> `test_unmasked_remote_exc` to reflect what the
runtime should actually be addressing/solving.
- drop the embedded `tn` from `sleep_n_chkpt_in_finally()` (for now)
since that case can't currently easily be addressed without the user
code using its own `trionics.maybe_raise_from_masking_exc()` inside
the nursery scope.
- as such drop all `tn` related params/logic/usage from the ep.
- add in a `Cancelled` handler block which checks for RTE masking and
always prints the occurrence loudly.
Follow up,
- obvi this suite will currently fail until the appropriate adjustment
is made to `._rpc._invoke()` to do the unmasking; coming next.
- we probably still need a case with an embedded user `tn` where if
the default strict-eg mode is used then a ctxc from the parent might
cause a non-graceful `Context.cancel()` outcome?
|_since the embedded user-`tn` will raise
`ExceptionGroup[trio.Cancelled]` upward despite the parent nursery's
scope being the canceller, or will a `collapse_eg()` inside the
`._invoke()` scope handle this as well?
Deats are documented within, but basically a subtlety we already track
with `trio`'s masking of excs by a checkpoint-in-`finally` can cause
compounded issues with our `@context` endpoints, mostly in terms of
remote error and cancel-ack relay semantics.
Nicely nailing 2 birds by leveraging the new `repl_fixture` support to
actually avoid use of a `pexpect`-style test B)
Functionality audit summary,
- ensures `open_crash_handler() as bxerr:` adheres to,
- `raise_on_exit` semantics including only raising from a list of exc-types,
- `repl_fixture` setup/teardown invocation and that `yield False` blocks REPL
interaction,
- delivering a `BoxedMaybeException` with the correct state set post
crash.
- all the above outside the actor-runtime existing.
Also luckily enough, this seems to have found a bug for which a fix is
coming right up!
It's been in the debug scripts quite a while without a wrapping test and
will be,
- only the 2nd such REPL test which uses a lower-level `@context` ep-API
- the first official and explicit use of `enable_transports=['uds']`
a suite.
Deats,
- flip to 'uds' tpt and 'devx' level logging in the script.
- add a new 2-case suite `test_ctxep_pauses_n_maybe_ipc_breaks` which
validates both the quit-early (via `BdbQuit`) and
channel-dropped-need-to-ctlc cases from a single test fn.
If the underlying example script fails (say due to a console output
pattern-mismatch, `AssertionError`) the `pexpect` managed subproc with
a `debug_mode=True` crash-handling-REPL engaged will ofc *not terminate*
due to any SIGINT sent by the test harnesss (since we shield from it as
part of normal sub-actor debugger operation). So instead always send
a 'continue' cmd to the active `PdbREPL`'s stdin so it deactivates and
allows the py-script-process to raise and terminate, unblocking the
`pexpect.spawn`'s internal subproc joiner (which would otherwise hang
without manual intervention, blocking downstream tests..).
Also, use the new `PexpectSpawner` type alias after actually importing
future annots.. XD
Such that we can more easily annotate any consumer test's of our
`.tests.devx.conftest.spawn()` fixture which delivers a closure which, when
called in a test fn body, transitively sub-invokes:
`pytest.Pytester.spawn()` -> `pexpect.spawn()`
IMO Expecting `Callable[[str], pexpect.pty_spawn.spawn]]` to be used all
over is a bit too.. verbose?