This change is masked out now BUT i'm leaving it in for reference.
I was debugging a multi-actor fault where the primary source actor was
an infected-aio-subactor (`brokerd.ib`) and it seemed like the REPL was only
entering on the `trio` side (at a `.open_channel_from()`) and not
eventually breaking in the `asyncio.Task`. But, since (changing
something?) it seems to be working now, it's just that the `trio` side
seems to sometimes handle before the (source/causing and more
child-ish) `asyncio`-task, which is a bit odd and not expected..
We could likely refine (maybe with an inter-loop-task REPL lock?) this
at some point and ensure a child-`asyncio` task which errors always
grabs the REPL **first**?
Lowlevel deats/further-todos,
- add (masked) `maybe_open_crash_handler()` block around
`asyncio.Task` execution with notes about weird parent-addr
delivery bug in `test_sync_pause_from_aio_task`
* yeah dunno what that's about but made a bug; seems to be IPC
serialization of the `TCPAddress` struct somewhere??
- add inter-loop lock TODO for avoiding aio-task clobbering
trio-tasks when both crash in debug-mode
Also,
- change import from `tractor.devx.debug` to `tractor.devx`
- adjust `get_logger()` call to use new implicit mod-name detection
added to `.log.get_logger()`, i.e. sin `name=__name__`.
- some teensie refinements to `open_channel_from()`:
* swap return type annotation to `tuple[LinkedTaskChannel, Any]`
(was `Any`).
* update doc-string to clarify started-value delivery
* add err-log before `.pause()` in what should be an unreachable path.
* add todo to swap the `(first, chan)` pair to match that of ctx..
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
With methods for comms similar to those that exist for the `trio` side,
- `.get()` which proxies verbatim to the `._to_aio: asyncio.Queue`,
- `.send_nowait()` which thin-wraps to `._to_trio: trio.MemorySendChannel`.
Obviously the more correct design is to break up the channel type into
a pair of handle types, one for each "side's" task in each event-loop,
that's hopefully coming shortly in a follow up patch B)
Also,
- fill in some missing doc strings, tweak some explanation comments and
update todos.
- adjust the `test_aio_errors_and_channel_propagates_and_closes()` suite
to use the new `chan` fn-sig-API with `.open_channel_from()` including
the new methods for msg comms; ensures everything added here works e2e.
Expand and clarify the comment for the default `case _`
block in the `.send()` error matcher, noting that we
console-error and raise-thru for unexpected disconnect
conditions.
(this patch was suggested by copilot in,
https://github.com/goodboy/tractor/pull/411)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Improve the `spawn` fixture teardown logic in
`tests/devx/conftest.py` fixing the while-else bug, and fix
`test_advanced_faults` genexp for `TransportClosed` exc type
checking.
Deats,
- replace broken `while-else` pattern with direct
`if ptyproc.isalive()` check after the SIGINT loop.
- fix undefined `spawned` ref -> `ptyproc.isalive()` in
while condition.
- improve walrus expr formatting in timeout check (multiline
style).
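For reference, a minimal sketch of one way a `while`-`else` can go wrong in this kind of teardown: the `else:` suite only runs when the loop condition goes false *without* a `break`. The `FakeProc` type and both `reap_*()` fns are hypothetical stand-ins for illustration, not the fixture's actual code.

```python
class FakeProc:
    """Hypothetical stand-in for a pty child proc which ignores SIGINT."""

    def __init__(self) -> None:
        self.alive = True
        self.killed = False

    def isalive(self) -> bool:
        return self.alive

    def kill(self) -> None:
        self.killed = True
        self.alive = False


def reap_with_while_else(proc: FakeProc, max_tries: int = 3) -> None:
    tries = 0
    while proc.isalive():
        tries += 1  # pretend a SIGINT was sent here
        if tries >= max_tries:
            break  # "timeout": leaving via `break` SKIPS the `else:`
    else:
        # only reached when the loop condition went false on its own,
        # i.e. the proc is already dead, so this kill never helps.
        proc.kill()


def reap_with_direct_check(proc: FakeProc, max_tries: int = 3) -> None:
    tries = 0
    while proc.isalive() and tries < max_tries:
        tries += 1  # pretend a SIGINT was sent here
    # explicit check after the loop, however it exited
    if proc.isalive():
        proc.kill()
```

With a proc that never exits on its own, the first variant leaves it running while the second escalates to a kill.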
Also fix `test_ipc_channel_break_during_stream()` assertion,
- wrap genexp in `all()` call so it actually checks all excs
are `TransportClosed` instead of just creating an unused
generator.
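A minimal sketch of why the bare genexp never failed: a generator object is always truthy, whatever it would yield. The `TransportClosed` class here is a local stand-in, not the real exc type.

```python
class TransportClosed(Exception):
    """Local stand-in for the real tpt-closed exc type."""


excs = [TransportClosed(), ValueError()]

# a bare generator expression is just an object; it is truthy
# without ever being iterated, so asserting on it checks nothing.
unchecked = (isinstance(exc, TransportClosed) for exc in excs)

# `all()` consumes the generator and actually tests every item.
checked = all(isinstance(exc, TransportClosed) for exc in excs)
```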
(this patch was suggested by copilot in,
https://github.com/goodboy/tractor/pull/411)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Adjust `basic_echo_server()` default sequence len to avoid the race
where the `tell_little_bro()` finishes streaming **before** the
echo-server sub is cancelled by its peer subactor (which is the whole
thing we're testing!).
Deats,
- bump `rng_seed` default from 50 -> 100 to ensure peer
cancel req arrives before echo dialog completes on fast hw.
- add `trio.sleep(0.001)` between send/receive in msg loop on the
"client" streamer side to give cancel request transit more time to
arrive.
Also,
- add more native `tractor`-type hints.
- reflow `basic_echo_server()` doc-string for 67 char limit
- add masked `pause()` call with comment about unreachable
code path
- alphabetize imports: mv `current_actor` and `open_nursery`
below typed imports
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Brief descriptions for both fns in `._discovery` clarifying
what each delivers and under what conditions.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Flesh out missing method doc-strings, improve log msg formatting and
assert -> `RuntimeError` for un-inited tpt layer.
Deats,
- add doc-string to `.send()` noting `TransportClosed` raise
on comms failures.
- add doc-string to `.recv()`.
- expand `._aiter_msgs()` doc-string, line-len reflow.
- add doc-string to `.connected()`.
- convert `assert self._transport` -> `RuntimeError` raise
in `._aiter_msgs()` for more explicit crashing.
- expand `_connect_chan()` doc-string, note it's lowlevel
and suggest `.open_portal()` to user instead.
- factor out `src_exc_str` in `TransportClosed` log handler
to avoid double-call
- use multiline style for `.connected()` return expr.
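A rough sketch of the `assert` -> `RuntimeError` idea: an `assert` is stripped when running under `python -O`, whereas an explicit raise always fires and carries a readable msg. The `Chan` class below is a hypothetical (and sync) simplification, not the real `Channel`.

```python
class Chan:
    """Hypothetical minimal channel, only for illustration."""

    def __init__(self, transport=None) -> None:
        self._transport = transport

    def iter_msgs(self):
        # an `assert self._transport` here would vanish under `-O`;
        # the explicit raise crashes loudly in every mode.
        if self._transport is None:
            raise RuntimeError('No transport layer initialized yet!')
        return iter(self._transport)
```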
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Remove the `trio.ClosedResourceError` and `trio.BrokenResourceError`
handling that should now be subsumed by `TransportClosed` re-raising out
of the `.ipc` stack.
Deats,
- drop CRE and BRE from `._streaming.MsgStream.aclose()/.send()` blocks.
- similarly rm from `._context.open_context_from_portal()`.
- also from `._portal.Portal.cancel_actor()` and drop the
(now-completed-todo) comment about this exact thing.
Also add comment in `._rpc.try_ship_error_to_remote()` noting the
remaining `trio` catches there are bc the `.ipc` layers *should* be
wrapping them; thus `log.critical()` use is warranted.
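The layering this relies on can be sketched as follows, assuming the lower layer re-raises low-level errors as a single higher-level type. Everything here is a stand-in: the builtin `BrokenPipeError`/`ConnectionResetError` play the role of the `trio` resource errors and `TransportClosed` is a local class, not the real one.

```python
class TransportClosed(Exception):
    """Local stand-in for the higher-level tpt-closed exc."""


def low_level_send(sock_ok: bool) -> None:
    if not sock_ok:
        raise BrokenPipeError('pipe broke')


def ipc_send(sock_ok: bool) -> None:
    # the "ipc" layer wraps low-level failures exactly once..
    try:
        low_level_send(sock_ok)
    except (BrokenPipeError, ConnectionResetError) as src_exc:
        raise TransportClosed(
            f'IPC transport closed: {src_exc!r}'
        ) from src_exc


def stream_send(sock_ok: bool) -> str:
    # ..so higher layers only need to catch the one wrapped type.
    try:
        ipc_send(sock_ok)
    except TransportClosed:
        return 'closed'
    return 'sent'
```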
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add `TransportClosed` to except clauses where `trio`'s own
resource-closed errors are already caught, ensuring our
higher-level tpt exc is also tolerated in those same spots.
Likely i will follow up with a removal of the `trio` variants since most
*should be* caught and re-raised as tpt-closed out of the `.ipc` stack
now?
Add `TransportClosed` to various handler blocks,
- `._streaming.MsgStream.aclose()/.send()` except blocks.
- the broken-channel except in `._context.open_context_from_portal()`.
- obvi import it where necessary in those ^ mods.
Adjust `test_advanced_faults` suite + exs-script to match,
- update `ipc_failure_during_stream.py` example to catch
`TransportClosed` alongside `trio.ClosedResourceError`
in both the break and send-check paths.
- shield the `trio.sleep(0.01)` after tpt close in example to avoid
taskc-raise/masking on that checkpoint since we want to simulate
waiting for a user to send a KBI.
- loosen `ExceptionGroup` assertion to `len(excs) <= 2` and ensure all
excs are `TransportClosed`.
- improve multi-line formatting, minor style/formatting fixes in
condition expressions.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Refine tpt-error reporting to include closure attribution (`'locally'`
vs `'by peer'`), tighten match conditions and reduce needless newlines
in exc reprs.
Deats,
- factor out `trans_err_msg: str` and `by_whom: str` into a `dict`
lookup before the `match:` block to pair specific err msgs to closure
attribution strings.
- use `by_whom` directly as `CRE` case guard condition
(truthy when msg matches known underlying CRE msg content).
- conveniently include `by_whom!r` in `TransportClosed` message.
- fix `'locally ?'` -> `'locally?'` in send-side `CRE`
handler (drop errant space).
- add masked `maybe_pause_bp()` calls at both `CRE` sites (from when
i was tracing a test harness issue where the UDS socket path wasn't
being cleaned up on teardown).
- drop trailing `\n` from `body=` args to `TransportClosed`.
- reuse `trans_err_msg` for the `BRE`/broken-pipe guard.
Also adjust testing, namely `test_ctxep_pauses_n_maybe_ipc_breaks`'s
expected patts-set for new msg formats to be raised out of
`.ipc._transport`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Internal helper which falls back to sync `pdb` when the
child actor can't reach root to acquire the TTY lock.
Useful when debugging tpt layer failures (intentional or
otherwise) where a sub-actor can no longer IPC-contact the
root to coordinate REPL access; root uses `.pause()` as
normal while non-root falls back to `mk_pdb().set_trace()`.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Convert `spawn` fixture to a generator and add post-test graceful
subproc cleanup via `SIGINT`/`SIGKILL` to avoid leaving stale `pexpect`
child procs around between test runs as well as any UDS-tpt socket files
under the system runtime-dir.
Deats,
- convert `return _spawn` -> `yield _spawn` to enable
post-yield teardown logic.
- add a new `nonlocal spawned` ref so teardown logic can access the last
spawned child from outside the delivered spawner fn-closure.
- add `SIGINT`-loop after yield with 5s timeout, then
`SIGKILL` if proc still alive.
- add masked `breakpoint()` and TODO about UDS path cleanup
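The teardown escalation can be sketched roughly like so, using stdlib `subprocess` in place of `pexpect` (POSIX only). The `reap()` helper and its params are hypothetical; the real fixture does this inline after its `yield`.

```python
import signal
import subprocess
import time


def reap(
    proc: subprocess.Popen,
    timeout: float = 5.0,
    poll: float = 0.05,
) -> int:
    """SIGINT the child until it exits or `timeout` passes, then kill."""
    deadline = time.monotonic() + timeout
    while proc.poll() is None and time.monotonic() < deadline:
        proc.send_signal(signal.SIGINT)
        time.sleep(poll)

    # still alive after the graceful window: escalate to a hard kill
    if proc.poll() is None:
        proc.kill()

    return proc.wait()
```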
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Unmask the CRE case block for peer-closed socket errors which already
had a TODO about reproducing the condition. It appears this case can
happen during inter-actor comms teardowns in `piker`, but i haven't been
able to figure out exactly what reproduces it yet..
So activate the block again for that 'socket already closed'-msg case,
and add a TODO questioning how to reproduce it.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
A partial revert of commit c05d08e426 since it seems we already
suppress tpt-closed errors lower down in `.ipc.Channel.send()`; given
that i'm pretty sure this new handler code should basically never run?
Left in a todo to remove the masked content once i'm done more
thoroughly testing under `piker`.
For IPC-disconnects-during-teardown edge cases, augment some `._rpc`
machinery,
- in `._invoke()` around the `await chan.send(return_msg)` where we
suppress if the underlying `Channel` already disconnected.
- add a disjoint handler in `_errors_relayed_via_ipc()` which just
reports-n-reraises the exc (same as prior behaviour).
* originally i thought it needed to be handled specially (to avoid
being crash handled) but turns out that isn't necessary?
* hence the also-added-but-masked-out `debug_filter` / guard expression
around the `await debug._maybe_enter_pm()` line.
- show the `._invoke()` frame for the moment.
Turns out we aren't clearing the `._state._runtime_vars` entries in
between `open_root_actor` calls.. This test refinement catches that by
adding runtime-vars asserts on the expected root-addrs value; ensure
`_runtime_vars['_root_addrs']` ONLY match the values provided by the
test's CURRENT root actor.
This causes a failure when the (just added)
`test_non_registrar_spawns_child` is run as part of the module suite,
it's fine when run standalone.
Move `_root_addrs` assignment to after `async_main()` unblocks (via
`.started()`) which now delivers the bind addrs, ensuring correct
`UnwrappedAddress` propagation into `._state._runtime_vars` for
non-registrar root actors..
Previously for non-registrar root actors the `._state._runtime_vars`
entries were being set as `Address` values which ofc IPC serialize
incorrectly rn vs. the unwrapped versions, (well until we add a msgspec
for their structs anyway) and thus are passed in incorrect form to
children/subactors during spawning..
This fixes the issue by waiting for the `.ipc.*` stack to
bind-and-resolve any randomly allocated addrs (by the OS) until after
the initial `Actor` startup is complete.
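Why the ordering matters can be shown with plain stdlib sockets: when asking the OS for a random port, the real addr is only known *after* the bind completes. This is just an illustration of the general principle; `bind_random_port()` is a hypothetical helper and nothing here is `tractor` API.

```python
import socket


def bind_random_port(
    host: str = '127.0.0.1',
) -> tuple[tuple[str, int], tuple[str, int]]:
    requested = (host, 0)  # port 0 -> "OS, pick one for me"
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(requested)
        # only now is the actual port known
        resolved = sock.getsockname()
    return requested, resolved
```

Anything that records `requested` before the bind would hand children an unusable port `0` addr, which is the same class of problem as stashing not-yet-resolved addrs in the runtime vars.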
Deats,
- primarily, mv `_root_addrs` assignment from before `root_tn.start()`
to after, using started(-ed) `accept_addrs` now delivered from
`._runtime.async_main()`..
- update `task_status` type hints to match.
- unpack and set the `(accept_addrs, reg_addrs)` tuple from
`root_tn.start()` call into `._state._runtime_vars` entries.
- improve and embolden comments distinguishing registrar vs non-registrar
init paths, ensure typing reflects wrapped vs. unwrapped addrs.
Also,
- add a masked `mk_pdb().set_trace()` for debugging `raddrs` values
being "off".
- add TODO about using UDS on linux for root mailbox
- rename `trans_bind_addrs` -> `tpt_bind_addrs` for clarity.
- expand comment about random port allocation for
non-registrar case
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Ensure non-registrar root actors can spawn children and that
those children receive correct parent contact info. This test
catches the bug reported in,
https://github.com/goodboy/tractor/issues/410
Add new `test_non_registrar_spawns_child()` which spawns a sub-actor
from a non-registrar root and verifies the child can manually connect
back to its parent using `get_root()` API, auditing
`._state._runtime_vars` addr propagation from rent to child.
Also,
- improve type hints throughout test suites
(`subprocess.Popen`, `UnwrappedAddress`, `Aid` etc.)
- rename `n` -> `an` for actor nursery vars
- use multiline style for function signatures
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Test pkg-level init module and sub-pkg module logger naming
to better validate auto-naming logic.
Deats,
- create `pkg_init_mod` and write `mod_code` to it for
testing pkg-level `__init__.py` logger instance creation.
* assert `snakelib.__init__` logger name is `proj_name`.
- write `mod_code` to `subpkg/__init__.py` as well and check the same.
Also,
- rename some vars,
* `pkg_mod` -> `pkg_submod`,
* `pkgmod` -> `subpkgmod`
- add `ModuleType` import for type hints
- improve comments explaining pkg init vs first-level
sub-module naming expectations.
- drop trailing whitespace and unused TODO comment
- remove masked `breakpoint()` call
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add assertions and comments to better test the reworked
implicit module-name detection in `get_logger()`.
Deats,
- add `assert not tractor.current_actor()` check to verify
no runtime is active during test.
- import `.log` submod directly for use.
- add masked `breakpoint()` for debugging mod loading.
- add comment about using `ranger` to inspect `testdir` layout
of auto-generated py pkg + module-files.
- improve comments explaining pkg-root-log creation.
- add TODO for testing `get_logger()` call from pkg
`__init__.py`
- add comment about first-pkg-level module naming.
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Use new implicit module-name detection throughout codebase to simplify
logger creation and leverage auto-naming from caller mod.
Main changes,
- drop `name=__name__` arg from all `get_logger()` calls
(across 29 modules).
- update `get_console_log()` calls to include `name='tractor'` for
enabling root logger in test harness and entry points; this ensures
logic in `get_logger()` triggers so that **all** `tractor`-internal
logging emits to console.
- add info log msg in test `conftest.py` showing test-harness
log level
Also,
- fix `.actor.uid` ref to `.actor.aid.uid` in `._trace`.
- adjust a `._context` log msg formatting for clarity.
- add TODO comments in `._addr`, `._uds` for when we mv to
using `multiaddr`.
- add TODO for a `RuntimeVars` type hint in `.msg.types` (once we
eventually get that all going obvi!)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Overhaul of the automatic-calling-module-name detection and sub-log
creation logic to avoid (at least warn) on duplication(s) and still
handle the common usage of a call with `name=__name__` from a mod's top
level scope.
Main `get_logger()` changes,
- refactor auto-naming logic for implicit `name=None` case such that we
handle at least `tractor` internal "bare" calls from internal submods.
- factor out the `get_caller_mod()` closure (still inside
`get_logger()`) for introspecting caller's module with configurable
frame depth.
- use `.removeprefix()` instead of `.lstrip()` for stripping pkg-name
from mod paths
- mv root-logger creation before sub-logger name processing
- improve duplicate detection for `pkg_name` in `name`
- add `_strict_debug=True`-only-emitted warnings for duplicate
pkg/leaf-mod names.
- use `print()` fallback for warnings when no actor runtime is up at
call time.
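On the `.removeprefix()` vs `.lstrip()` point, a quick illustration of the difference: `.lstrip()` takes a *set of chars* to strip, not a prefix string, so it can eat into the next path component. The module path used is just an example value.

```python
mod_path = 'tractor.trionics'

# strips ANY leading run of the chars {t, r, a, c, o, .}
stripped_wrong = mod_path.lstrip('tractor.')  # -> 'ionics'

# strips exactly the given prefix, once (py3.9+)
stripped_right = mod_path.removeprefix('tractor.')  # -> 'trionics'
```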
Surrounding tweaks,
- add `.level` property to `StackLevelAdapter` for getting
current emit level as lowercase `str`.
- mv `_proj_name` def to just above `get_logger()`
- use `_curr_actor_no_exc` partial in `_conc_name_getters`
to avoid runtime errors
- improve comments/doc-strings throughout
- keep some masked `breakpoint()` calls for future debugging
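One possible shape for such a `.level` property, sketched on a plain stdlib `logging.LoggerAdapter`. The `LevelAdapter` name and the use of the effective level are assumptions; the real `StackLevelAdapter` impl may differ.

```python
import logging


class LevelAdapter(logging.LoggerAdapter):
    """Hypothetical adapter exposing the emit level as a lowercase str."""

    @property
    def level(self) -> str:
        # map the numeric effective level back to its name
        return logging.getLevelName(
            self.logger.getEffectiveLevel()
        ).lower()
```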
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
That is when no `name` is passed to `get_logger()`, try to introspect
the caller's `module.__name__` and use it to infer/get the "namespace
path" to that module the same as if using `name=__name__` as in the most
common usage.
Further, change the `_root_name` to be `pkg_name: str`, a public and
more obvious param name, and deprecate the former. This obviously adds
the necessary impl to make the new
`test_sys_log::test_implicit_mod_name_applied_for_child` test pass.
Impl detalles for `get_logger()`,
- add `pkg_name` and deprecate `_root_name`, include failover logic
and a warning.
- implement calling module introspection using
`inspect.stack()/getmodule()` to get both the `.__name__` and
`.__package__` info alongside adjusted logic to set the `name`
when not provided but only when a new `mk_sublog: bool` is set.
- tweak the `name` processing for implicitly set case,
- rename `sub_name` -> `pkg_path: str` which is the path
to the calling module minus that module's name component.
- only partition `name` if `pkg_name` is `in` it.
- use the `_root_log` for `pkg_name` duplication warnings.
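A minimal sketch of the caller-introspection idea using `inspect.stack()` and `inspect.getmodule()`. All fn names below are hypothetical, and the "use `__package__`, else fall back to `__name__`" rule is an assumption about how the name could be inferred, not the exact logic in `get_logger()`.

```python
import inspect
from types import ModuleType
from typing import Optional


def get_caller_mod(frames_up: int = 1) -> Optional[ModuleType]:
    """Module of the frame `frames_up` above this fn, if resolvable."""
    frame_info = inspect.stack(context=0)[frames_up]
    return inspect.getmodule(frame_info.frame)


def infer_pkg_path(mod_name: str, mod_pkg: Optional[str]) -> str:
    # for a plain module `__package__` is its containing pkg; for
    # a pkg's `__init__` it equals `__name__`; scripts may have none.
    return mod_pkg or mod_name


def implicit_log_name() -> Optional[str]:
    # 2 frames up: skip `get_caller_mod()` and this fn itself
    mod = get_caller_mod(frames_up=2)
    if mod is None:
        return None
    return infer_pkg_path(
        mod.__name__,
        getattr(mod, '__package__', None),
    )
```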
Other/related,
- add types to various public mod vars missing them.
- rename `.log.log` -> `.log._root_log`.
A bit of test driven dev to anticipate support of `.log.get_logger()`
usage such that it can be called from arbitrary sub-modules, themselves
embedded in arbitrary sub-pkgs, of some project; when not provided,
the `sub_name` passed to the `Logger.getChild(<sub_name>)` will be set
as the sub-pkg path "down to" the calling module.
IOW if you call something like,
`log = tractor.log.get_logger(pkg_name='mypylib')`
from some `submod.py` in a project-dir that looks like,
    mypylib/
        mod.py
        subpkg/
            submod.py  <- calling module
the `log: StackLevelAdapter` child-`Logger` instance will have a
`.name: str = 'mypylib.subpkg'`, excluding the `submod` part since this is
already rendered as the `{filename}` header in `log.LOG_FORMAT`.
Previously similar behaviour would be obtained by passing
`get_logger(name=__name__)` in the calling module and so much so it
motivated me to make this the default, presuming we can introspect for
the info.
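The expected naming maps directly onto the stdlib `logging` hierarchy, which this short sketch shows with plain `Logger` objects (no adapter, no `tractor` code involved):

```python
import logging

root_log = logging.getLogger('mypylib')

# the sub-pkg path "down to" (but excluding) the calling module
sub_log = root_log.getChild('subpkg')

sub_name = sub_log.name      # dotted path under the pkg root
sub_parent = sub_log.parent  # the pkg-root logger created above
```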
Impl deats,
- duplicated a `load_module_from_path()` from `modden` to load the
`testdir` rendered py project dir from its path.
|_should prolly factor it down to this lib anyway bc we're going to
need it for hot code reload? (well that and `watchfiles` Bp)
- in each of `mod.py` and `submod.py` render the `get_logger()` code
sin `name`, expecting the (coming shortly) implicit introspection
feat to do this.
- do `.name` and `.parent` checks against expected sub-logger values
from `StackLevelAdapter.logger.getChildren()`.
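For context, loading a module from a file path is usually done with the stdlib `importlib.util` recipe below. This is only a sketch under that assumption; the signature of the actual `load_module_from_path()` copied from `modden` is not shown in this msg and may differ.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_module_from_path(
    path: Path,
    module_name: Optional[str] = None,
) -> ModuleType:
    module_name = module_name or path.stem
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f'Cannot load module from {path}')

    module = importlib.util.module_from_spec(spec)
    # register before exec so the module can be found by name
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module
```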
To start ensuring that when `name=__name__` is passed we try to
de-duplicate the `_root_name` and any `leaf_mod: str` since it's already
included in the headers as `{filename}`.
Deats,
- heavily document the de-duplication `str.partition()`s in
`.log.get_logger()` and provide the end fix by changing the predicate,
`if rname == 'tractor':` -> `if rname == _root_name`.
* also toss in some warnings for when we still detect duplicates.
- add todo comments around logging "filters" (vs. our "adapter").
- create the new `test_log_sys.test_root_pkg_not_duplicated()` which
runs green with the fixes from ^.
- add a ton of test-suite todos both for existing and anticipated
logging sys feats in the new mod.
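The de-duplication can be sketched with two `str` partitions: strip the root pkg name off the front, then the leaf module off the back. The `dedupe_name()` helper is hypothetical; the point is that the predicate compares against the *configured* root name rather than a hard-coded `'tractor'`.

```python
def dedupe_name(name: str, root_name: str) -> str:
    """Reduce a dotted `__name__` to its sub-pkg path only."""
    rname, _, sub = name.partition('.')
    # compare to the configured root, not a hard-coded pkg name
    if rname == root_name:
        name = sub

    # drop the leaf module; it's already shown via `{filename}`
    pkg_path, _, _leaf_mod = name.rpartition('.')
    return pkg_path
```

With a hard-coded `'tractor'` check, a name like `'mypylib.subpkg.submod'` would keep its `mypylib` prefix and end up duplicated under a `mypylib` root logger.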
Based on the impure template from `pyproject.nix` and providing
a dev-shell for easy bypass-n-hack on nix(os) using `uv`.
Deats,
- include bash completion pkgs for devx/happiness.
- pull `ruff` from <nixpkgs> to avoid wheel (build) issues.
- pin to py313 `cpython` for now.
Namely,
- `devx` for console debugging extras used in `tractor.devx`.
- `repl` for @goodboy's `xonsh` hackin utils.
- `testing` for harness stuffs.
- `lint` for whenever we start doing that; it requires special
separation on nixos in order to pull `ruff` from pkgs.
Oh and bump the lock file.
Such that we are able to (finally) detect when we should
`Context._scope.cancel()` specifically when the `.parent_task` is
**not** blocking on receiving from the underlying `._rx_chan`, since if
the task is blocking on `.receive()` it will call `.cancel()`
implicitly.
This is a lot to explain with very little code actually needed for the
implementation (are we like `trio` yet anyone?? XD) but the main gist is
that `Context._maybe_cancel_and_set_remote_error()` needed the
additional case of calling `._scope.cancel()` whenever we know that
a remote-error/ctxc won't be immediately handled, bc user code is doing
non `Context`-API things, resulting in a similar outcome as if that
task was waiting on `Context.wait_for_result()` or `.__aexit__()`.
Impl details,
- add a new `._is_blocked_on_rx_chan()` method which predicates whether
the (new) `.parent_task` is blocking on `._rx_chan.receive()`.
* see various stipulations about the current impl and how we might
need to adjust for the future given `trio`'s commitment to the
`Task.custom_sleep_data` attr..
- add `.parent_task`, a pub wrapper for `._task`.
- check for `not ._is_blocked_on_rx_chan()` before manually cancelling
the local `.parent_task`
- minimize the surrounding branch case expressions.
Other,
- tweak a couple logs.
- add a new `.cancel()` pre-started msg.
- mask the `.cancel_called` setter, it's only (been) used for tracing.
- todos around maybe moving the `._nursery` allocation "around" the
`.start_remote_task()` call and various subsequent tweaks therein.
While working on a fix to the hang case found from
`test_cancel_ctx_with_parent_side_entered_in_bg_task` an initial
solution caused this test to hang indefinitely; solve it with a small
wrapping `_main()` + `trio.fail_after()` entrypoint.
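The "wrap the entrypoint in a timeout" guard looks roughly like this. Note this sketch swaps in stdlib `asyncio.wait_for()` as an analog since it has no third-party deps; the actual test uses `trio.fail_after()`, and `hangs_forever()`/`run_guarded()` are made-up names.

```python
import asyncio


async def hangs_forever() -> None:
    # stand-in for a test body which never completes
    await asyncio.Event().wait()


async def _main(timeout: float = 0.1) -> None:
    # fail (instead of hanging the suite) if the body stalls
    await asyncio.wait_for(hangs_forever(), timeout=timeout)


def run_guarded() -> bool:
    """Return `True` when the timeout guard fired."""
    try:
        asyncio.run(_main())
    except asyncio.TimeoutError:
        return True
    return False
```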
Further suite refinements,
- move the top-most `try:`->`else:` block
- toss in a masked base-exc block for tracing unexpected
`ctx.wait_for_result()` outcomes.
- tweak the `raise_sub_spawn_error_after` to be an optional `float`
which scales the `rng_seed: int = 50` msg counter to
`tell_little_bro()` so that the abs value to the `range()` can be
changed.