With methods to comms similar to those that exist for the `trio` side,
- `.get()` which proxies verbatim to the `._to_aio: asyncio.Queue`,
- `.send_nowait()` which thin-wraps to `._to_trio: trio.MemorySendChannel`.
Obviously the more correct design is to break up the channel type into
a pair of handle types, one for each "side's" task in each event-loop,
that's hopefully coming shortly in a follow up patch B)
Also,
- fill in some missing doc strings, tweak some explanation comments and
update todos.
- adjust the `test_aio_errors_and_channel_propagates_and_closes()` suite
to use the new `chan` fn-sig-API with `.open_channel_from()` including
the new methods for msg comms; ensures everything added here works e2e.
Improve the `spawn` fixture teardown logic in
`tests/devx/conftest.py` fixing the while-else bug, and fix
`test_advanced_faults` genexp for `TransportClosed` exc type
checking.
Deats,
- replace broken `while-else` pattern with direct
`if ptyproc.isalive()` check after the SIGINT loop.
- fix undefined `spawned` ref -> `ptyproc.isalive()` in
while condition.
- improve walrus expr formatting in timeout check (multiline
style).
Also fix `test_ipc_channel_break_during_stream()` assertion,
- wrap genexp in `all()` call so it actually checks all excs
are `TransportClosed` instead of just creating an unused
generator.
(this patch was suggested by copilot in,
https://github.com/goodboy/tractor/pull/411)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust `basic_echo_server()` default sequence len to avoid the race
where the 'tell_little_bro()` finished streaming **before** the
echo-server sub is cancelled by its peer subactor (which is the whole
thing we're testing!).
Deats,
- bump `rng_seed` default from 50 -> 100 to ensure peer
cancel req arrives before echo dialog completes on fast hw.
- add `trio.sleep(0.001)` between send/receive in msg loop on the
"client" streamer side to give cancel request transit more time to
arrive.
Also,
- add more native `tractor`-type hints.
- reflow `basic_echo_server()` doc-string for 67 char limit
- add masked `pause()` call with comment about unreachable
code path
- alphabetize imports: mv `current_actor` and `open_nursery`
below typed imports
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `TransportClosed` to except clauses where `trio`'s own
resource-closed errors are already caught, ensuring our
higher-level tpt exc is also tolerated in those same spots.
Likely i will follow up with a removal of the `trio` variants since most
*should be* caught and re-raised as tpt-closed out of the `.ipc` stack
now?
Add `TransportClosed` to various handler blocks,
- `._streaming.MsgStream.aclose()/.send()` except blocks.
- the broken-channel except in `._context.open_context_from_portal()`.
- obvi import it where necessary in those ^ mods.
Adjust `test_advanced_faults` suite + exs-script to match,
- update `ipc_failure_during_stream.py` example to catch
`TransportClosed` alongside `trio.ClosedResourceError`
in both the break and send-check paths.
- shield the `trio.sleep(0.01)` after tpt close in example to avoid
taskc-raise/masking on that checkpoint since we want to simulate
waiting for a user to send a KBI.
- loosen `ExceptionGroup` assertion to `len(excs) <= 2` and ensure all
excs are `TransportClosed`.
- improve multi-line formatting, minor style/formatting fixes in
condition expressions.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Refine tpt-error reporting to include closure attribution (`'locally'`
vs `'by peer'`), tighten match conditions and reduce needless newlines
in exc reprs.
Deats,
- factor out `trans_err_msg: str` and `by_whom: str` into a `dict`
lookup before the `match:` block to pair specific err msgs to closure
attribution strings.
- use `by_whom` directly as `CRE` case guard condition
(truthy when msg matches known underlying CRE msg content).
- conveniently include `by_whom!r` in `TransportClosed` message.
- fix `'locally ?'` -> `'locally?'` in send-side `CRE`
handler (drop errant space).
- add masked `maybe_pause_bp()` calls at both `CRE` sites (from when
i was tracing a test harness issue where the UDS socket path wasn't
being cleaned up on teardown).
- drop trailing `\n` from `body=` args to `TransportClosed`.
- reuse `trans_err_msg` for the `BRE`/broken-pipe guard.
Also adjust testing, namely `test_ctxep_pauses_n_maybe_ipc_breaks`'s
expected patts-set for new msg formats to be raised out of
`.ipc._transport`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Convert `spawn` fixture to a generator and add post-test graceful
subproc cleanup via `SIGINT`/`SIGKILL` to avoid leaving stale `pexpect`
child procs around between test runs as well as any UDS-tpt socket files
under the system runtime-dir.
Deats,
- convert `return _spawn` -> `yield _spawn` to enable
post-yield teardown logic.
- add a new `nonlocal spawned` ref so teardown logic can access the last
spawned child from outside the delivered spawner fn-closure.
- add `SIGINT`-loop after yield with 5s timeout, then
`SIGKILL` if proc still alive.
- add masked `breakpoint()` and TODO about UDS path cleanup
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Turns out we aren't clearing the `._state._runtime_vars` entries in
between `open_root_actor` calls.. This test refinement catches that by
adding runtime-vars asserts on the expected root-addrs value; ensure
`_runtime_vars['_root_addrs'] ONLY match the values provided by the
test's CURRENT root actor.
This causes a failure when the (just added)
`test_non_registrar_spawns_child` is run as part of the module suite,
it's fine when run standalone.
Ensure non-registrar root actors can spawn children and that
those children receive correct parent contact info. This test
catches the bug reported in,
https://github.com/goodboy/tractor/issues/410
Add new `test_non_registrar_spawns_child()` which spawns a sub-actor
from a non-registrar root and verifies the child can manually connect
back to its parent using `get_root()` API, auditing
`._state._runtime_vars` addr propagation from rent to child.
Also,
- improve type hints throughout test suites
(`subprocess.Popen`, `UnwrappedAddress`, `Aid` etc.)
- rename `n` -> `an` for actor nursery vars
- use multiline style for function signatures
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Test pkg-level init module and sub-pkg module logger naming
to better validate auto-naming logic.
Deats,
- create `pkg_init_mod` and write `mod_code` to it for
testing pkg-level `__init__.py` logger instance creation.
* assert `snakelib.__init__` logger name is `proj_name`.
- write `mod_code` to `subpkg/__init__.py`` as well and check the same.
Also,
- rename some vars,
* `pkg_mod` -> `pkg_submod`,
* `pkgmod` -> `subpkgmod`
- add `ModuleType` import for type hints
- improve comments explaining pkg init vs first-level
sub-module naming expectations.
- drop trailing whitespace and unused TODO comment
- remove masked `breakpoint()` call
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add assertions and comments to better test the reworked
implicit module-name detection in `get_logger()`.
Deats,
- add `assert not tractor.current_actor()` check to verify
no runtime is active during test.
- import `.log` submod directly for use.
- add masked `breakpoint()` for debugging mod loading.
- add comment about using `ranger` to inspect `testdir` layout
of auto-generated py pkg + module-files.
- improve comments explaining pkg-root-log creation.
- add TODO for testing `get_logger()` call from pkg
`__init__.py`
- add comment about first-pkg-level module naming.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Use new implicit module-name detection throughout codebase to simplify
logger creation and leverage auto-naming from caller mod .
Main changes,
- drop `name=__name__` arg from all `get_logger()` calls
(across 29 modules).
- update `get_console_log()` calls to include `name='tractor'` for
enabling root logger in test harness and entry points; this ensures
logic in `get_logger()` triggers so that **all** `tractor`-internal
logging emits to console.
- add info log msg in test `conftest.py` showing test-harness
log level
Also,
- fix `.actor.uid` ref to `.actor.aid.uid` in `._trace`.
- adjust a `._context` log msg formatting for clarity.
- add TODO comments in `._addr`, `._uds` for when we mv to
using `multiaddr`.
- add todo for `RuntimeVars` type hint TODO in `.msg.types` (once we
eventually get that all going obvi!)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
A bit of test driven dev to anticipate support of `.log.get_logger()`
usage such that it can be called from arbitrary sub-modules, themselves
embedded in arbitrary sub-pkgs, of some project; the when not provided,
the `sub_name` passed to the `Logger.getChild(<sub_name>)` will be set
as the sub-pkg path "down to" the calling module.
IOW if you call something like,
`log = tractor.log.get_logger(pkg_name='mypylib')`
from some `submod.py` in a project-dir that looks like,
mypylib/
mod.py
subpkg/
submod.py <- calling module
the `log: StackLevelAdapter` child-`Logger` instance will have a
`.name: str = 'mypylib.subpkg'`, discluding the `submod` part since this
already rendered as the `{filename}` header in `log.LOG_FORMAT`.
Previously similar behaviour would be obtained by passing
`get_logger(name=__name__)` in the calling module and so much so it
motivated me to make this the default, presuming we can introspect for
the info.
Impl deats,
- duplicated a `load_module_from_path()` from `modden` to load the
`testdir` rendered py project dir from its path.
|_should prolly factor it down to this lib anyway bc we're going to
need it for hot code reload? (well that and `watchfiles` Bp)
- in each of `mod.py` and `submod.py` render the `get_logger()` code
sin `name`, expecting the (coming shortly) implicit introspection
feat to do this.
- do `.name` and `.parent` checks against expected sub-logger values
from `StackLevelAdapter.logger.getChildren()`.
To start ensuring that when `name=__name__` is passed we try to
de-duplicate the `_root_name` and any `leaf_mod: str` since it's already
included in the headers as `{filename}`.
Deats,
- heavily document the de-duplication `str.partition()`s in
`.log.get_logger()` and provide the end fix by changing the predicate,
`if rname == 'tractor':` -> `if rname == _root_name`.
* also toss in some warnings for when we still detect duplicates.
- add todo comments around logging "filters" (vs. our "adapter").
- create the new `test_log_sys.test_root_pkg_not_duplicated()` which
runs green with the fixes from ^.
- add a ton of test-suite todos both for existing and anticipated
logging sys feats in the new mod.
While working on a fix to the hang case found from
`test_cancel_ctx_with_parent_side_entered_in_bg_task` an initial
solution caused this test to hang indefinitely; solve it with a small
wrapping `_main()` + `trio.fail_after()` entrypoint.
Further suite refinements,
- move the top-most `try:`->`else:` block
- toss in a masked base-exc block for tracing unexpected
`ctx.wait_for_result()` outcomes.
- tweak the `raise_sub_spawn_error_after` to be an optional `float`
which scales the `rng_seed: int = 50` msg counter to
`tell_little_bro()` so that the abs value to the `range()` can be
changed.
Such that when `maybe_context.cancel()` is not called (explicitly) and
only the subactor is cancelled by its parent we expect to see a ctxc
raised both from any call to `Context.wait_for_result()` and out of
the `Portal.open_context()` scope, up to the `trio.run()`.
Deats,
- obvi call-n-catch the ctxc (in scope) for the oob-only
subactor-cancelled case.
- add branches around `trio.run()` entry to match.
Discovered while writing a `@context` sanity test to verify unmasker
ignore-cases support. Masked code is due to the process of finding the
minimal example causing the original hang discovered in the original
examples script. Details are in the test-fn doc strings and surrounding
comments; more refinement and cleanup coming obviously.
Also moved over the self-cancel todos from the inter-peer tests module.
Demonstrating the guilty `trio.Lock.acquire()` impl which puts
a checkpoint inside its `trio.WouldBlock` handler and which will always
appear to mask the "sync path" case on (graceful) cancellation.
This first script draft demos the issue from within a `tractor.context`
ep bc that's where it was orig discovered, however i'm going to factor
out the `tractor` code and instead just use
a `.trionics.maybe_raise_from_masking_exc()` to demo its low-level
ignore-case feature.
Further, this script exposed a previously unhandled remote graceful
cancellation case which hangs:
- parent actor spawns child and opens a >1 ctxs with it,
- the parent then OoB (out-of-band) cancels the child actor (with
`Portal.cancel_actor()`),
- since the open ctxs raise a ctxc with a `.canceller == parent.uid` the
`Context._is_self_cancelled()` will eval `True`,
- the `Context._scope` will NOT be cancelled in
`._maybe_cancel_and_set_remote_error()` resulting in any bg-task which
is waiting on a `Portal.open_context()` to not be cancelled/unblocked.
So my plan is to factor this ^^ scenario into a standalone unit test
as well as another test which consumes from al low-level `trio`-only
version of **this** script-scenario to sanity check the interaction
of the unmasker-with-ignore-cases usage implicitly around a ctx ep.
Call it `test_trioisms::test_unmask_aclose_as_checkpoint_on_aexit` and
parametrize all script-mod`.main()` toggles including `.xfails()` for
the `raise_unmasked=False` cases.
Including all caller usage throughout. Moving to a non-`except*` impl
means it's never needed as a signal from the caller - we can just catch
the beg outright (like we should have always been doing)..
Such that we audit the `shield=root_tn.cancel_scope.cancel_called,`
passed to `await debug._maybe_enter_pm()` in the `open_root_actor()`
exit handler block.
Verifying that if any exc is raised pre `chan.send_nowait()` (our
currentlly shite version of a `chan.started()`) then that exc is indeed
raised through on the `trio`-parent task side. This case was reproduced
from a `piker.brokers.ib` issue with a similar embedded
`.trionics.maybe_open_context()` call.
Deats,
- call the suite `test_aio_side_raises_before_started`.
- mk the `@context` simply `maybe_open_context(acm_func=open_channel_from)`
with a `target=raise_before_started` which,
- simply sleeps then immediately raises a RTE.
- expect the RTE from the aio-child-side to propagate all the way up to
the root-actor's task right up through the `trio.run()`.
Since it's merely a local-file-sys subdirectory and there should be no
reason file creation conflicts with other bind spaces.
Also add 2 test suites to match,
- `tests/ipc/test_each_tpt::test_uds_bindspace_created_implicitly` to
verify the dir creation when DNE.
- `..test_uds_double_listen_raises_connerr` to ensure a double bind
raises a `ConnectionError` from the src `OSError`.
Call it `test_lock_not_corrupted_on_fast_cancel()` and includes
a detailed doc string to explain. Implemented it "cleverly" by having
the target `@acm` cancel its parent nursery after a peer, cache-hitting
task, is already waiting on the task mutex release.
Here I was thinking the bcaster (usage) maybe required a rework but,
NOPE it's just bc a checkpoint was needed in the parent task owning the
`tn` which spawns `get_sub_and_pull()` tasks to ensure the bg allocated
`an`/portal is eventually cancel-called..
Ah well, at least i started a patch for `MsgStream.subscribe()` to make
it multicast revertible.. XD
Anyway, I tossed in some checks & notes related to all that unnecessary
effort since I do think i'll move forward implementing it:
- for the `cache_hit` case always verify that the `bcast` clone is
unregistered from the common state subs after
`.subscribe().__aexit__()`.
- do a light check that the implicit `MsgStream._broadcaster` is always
the only bcrx instance left-leaked into that state.. that is until
i get the proper de-allocation/reversion from multicast -> unicast
working.
- put in mega detailed note about the required parent-task checkpoint.
Since I recently discovered a very subtle race-case that can sometimes
cause the suite to hang, seemingly due to the `an: ActorNursery`
allocated *behind* the `.trionics.maybe_open_context()` usage; this can
result in never cancelling the 'streamer' subactor despite the `main()`
timeout-guard?
This led me to dig in and find that the underlying issue was 2-fold,
- our `BroadcastReceiver` termination-mgmt semantics in
`MsgStream.subscribe()` can result in the first subscribing task to
always keep the `MsgStream._broadcaster` instance allocated; it's
never `.aclose()`ed, which makes it tough to determine (and thus
trace) when all subscriber-tasks are actually complete and
exited-from-`.subscribe()`..
- i was shield waiting `.ipc._server.Server.wait_for_no_more_peers()` in
`._runtime.async_main()`'s shutdown sequence which would then compound
the issue resulting in a SIGINT-shielded hang.. the worst kind XD
Actual changes here are just styling, printing, and some mucking with
passing the `an`-ref up to the parent task in the root-actor where i was
doing a conditional `ActorNursery.cancel()` to mk sure that was actually
the problem. Presuming this is fixed the `.pause()` i left unmasked
should never hit.
Was failing due to the `.fail_after()` timeout being *too short* and
somehow the new interplay of that with strict-exception groups resulting
in the `TooSlowError` never raising but instead an eg with the embedded
`AssertionError`?? I still don't really get it honestly..
I've written up lengthy notes around the different `delay` settings that
can be used to see the diff outcomes, the failing case being the one
i still don't really grok and think is justification for `trio` to
bubble inner `Cancelled`s differently possibly?
For now i've included the original failing case as an `xfail`
parametrization for now which will hopefully drive a follow lowlevel
`trio` test in `test_trioisms`!
Namely `test_empty_mngrs_input_raises()` was failing due to
lazy-iterator use as input to `mngrs` which i guess i added support for
a while back (by it doing a `list(mngrs)` internally)? So just change it
to `gather_contexts(mngrs=())` and also tweak the `trio.fail_after(3)`
since it appears that the prior 1sec was causing
too-fast-of-a-cancellation (before the cluster fully spawned) and thus
the expected `ValueError` never to show..
Also, mask the `tractor.trionics.collapse_eg()` usage (again?) in
`open_actor_cluster()` since it seems unnecessary.
Seems that the way the actor-nursery interacts with the
`.trionics.gather_contexts()` API on cancellation makes our
`.trionics.collapse_eg()` not work as intended?
I need to dig into how `ActorNursery.cancel()` and `.__aexit__()` might
be causing this discrepancy..
Consider this a commit-of-my-index type save for rn.
Since it's for beg filtering, the current impl should be renamed anyway;
it's not just for filtering cancelled excs.
Deats,
- added a real doc string, links to official eg docs and fixed the
return typing.
- adjust all internal imports to match.