A partial revert of commit c05d08e426 since it seem we already
suppress tpt-closed errors lower down in `.ipc.Channel.send()`; given
that i'm pretty sure this new handler code should basically never run?
Left in a todo to remove the masked content once i'm done more
thoroughly testing under `piker`.
For IPC-disconnects-during-teardown edge cases, augment some `._rpc`
machinery,
- in `._invoke()` around the `await chan.send(return_msg)` where we
suppress if the underlying `Channel` already disconnected.
- add a disjoint handler in `_errors_relayed_via_ipc()` which just
reports-n-reraises the exc (same as prior behaviour).
* originally i thought it needed to be handled specially (to avoid
being crash handled) but turns out that isn't necessary?
* hence the also-added-bu-masked-out `debug_filter` / guard expression
around the `await debug._maybe_enter_pm()` line.
- show the `._invoke()` frame for the moment.
Move `_root_addrs` assignment to after `async_main()` unblocks (via
`.started()`) which now delivers the bind addrs , ensuring correct
`UnwrappedAddress` propagation into `._state._runtime_vars` for
non-registar root actors..
Previously for non-registrar root actors the `._state._runtime_vars`
entries were being set as `Address` values which ofc IPC serialize
incorrectly rn vs. the unwrapped versions, (well until we add a msgspec
for their structs anyway) and thus are passed in incorrect form to
children/subactors during spawning..
This fixes the issue by waiting for the `.ipc.*` stack to
bind-and-resolve any randomly allocated addrs (by the OS) until after
the initial `Actor` startup is complete.
Deats,
- primarily, mv `_root_addrs` assignment from before `root_tn.start()`
to after, using started(-ed) `accept_addrs` now delivered from
`._runtime.async_main()`..
- update `task_status` type hints to match.
- unpack and set the `(accept_addrs, reg_addrs)` tuple from
`root_tn.start()` call into `._state._runtime_vars` entries.
- improve and embolden comments distinguishing registrar vs non-registrar
init paths, ensure typing reflects wrapped vs. unwrapped addrs.
Also,
- add a masked `mk_pdb().set_trace()` for debugging `raddrs` values
being "off".
- add TODO about using UDS on linux for root mailbox
- rename `trans_bind_addrs` -> `tpt_bind_addrs` for clarity.
- expand comment about random port allocation for
non-registrar case
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Use new implicit module-name detection throughout codebase to simplify
logger creation and leverage auto-naming from caller mod .
Main changes,
- drop `name=__name__` arg from all `get_logger()` calls
(across 29 modules).
- update `get_console_log()` calls to include `name='tractor'` for
enabling root logger in test harness and entry points; this ensures
logic in `get_logger()` triggers so that **all** `tractor`-internal
logging emits to console.
- add info log msg in test `conftest.py` showing test-harness
log level
Also,
- fix `.actor.uid` ref to `.actor.aid.uid` in `._trace`.
- adjust a `._context` log msg formatting for clarity.
- add TODO comments in `._addr`, `._uds` for when we mv to
using `multiaddr`.
- add todo for `RuntimeVars` type hint TODO in `.msg.types` (once we
eventually get that all going obvi!)
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Overhaul of the automatic-calling-module-name detection and sub-log
creation logic to avoid (at least warn) on duplication(s) and still
handle the common usage of a call with `name=__name__` from a mod's top
level scope.
Main `get_logger()` changes,
- refactor auto-naming logic for implicit `name=None` case such that we
handle at least `tractor` internal "bare" calls from internal submods.
- factor out the `get_caller_mod()` closure (still inside
`get_logger()`)for introspecting caller's module with configurable
frame depth.
- use `.removeprefix()` instead of `.lstrip()` for stripping pkg-name
from mod paths
- mv root-logger creation before sub-logger name processing
- improve duplicate detection for `pkg_name` in `name`
- add `_strict_debug=True`-only-emitted warnings for duplicate
pkg/leaf-mod names.
- use `print()` fallback for warnings when no actor runtime is up at
call time.
Surrounding tweaks,
- add `.level` property to `StackLevelAdapter` for getting
current emit level as lowercase `str`.
- mv `_proj_name` def to just above `get_logger()`
- use `_curr_actor_no_exc` partial in `_conc_name_getters`
to avoid runtime errors
- improve comments/doc-strings throughout
- keep some masked `breakpoint()` calls for future debugging
(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
That is when no `name` is passed to `get_logger()`, try to introspect
the caller's `module.__name__` and use it to infer/get the "namespace
path" to that module the same as if using `name=__name__` as in the most
common usage.
Further, change the `_root_name` to be `pkg_name: str`, a public and
more obvious param name, and deprecate the former. This obviously adds
the necessary impl to make the new
`test_sys_log::test_implicit_mod_name_applied_for_child` test pass.
Impl detalles for `get_logger()`,
- add `pkg_name` and deprecate `_root_name`, include failover logic
and a warning.
- implement calling module introspection using
`inspect.stack()/getmodule()` to get both the `.__name__` and
`.__package__` info alongside adjusted logic to set the `name`
when not provided but only when a new `mk_sublog: bool` is set.
- tweak the `name` processing for implicitly set case,
- rename `sub_name` -> `pkg_path: str` which is the path
to the calling module minus that module's name component.
- only partition `name` if `pkg_name` is `in` it.
- use the `_root_log` for `pkg_name` duplication warnings.
Other/related,
- add types to various public mod vars missing them.
- rename `.log.log` -> `.log._root_log`.
To start ensuring that when `name=__name__` is passed we try to
de-duplicate the `_root_name` and any `leaf_mod: str` since it's already
included in the headers as `{filename}`.
Deats,
- heavily document the de-duplication `str.partition()`s in
`.log.get_logger()` and provide the end fix by changing the predicate,
`if rname == 'tractor':` -> `if rname == _root_name`.
* also toss in some warnings for when we still detect duplicates.
- add todo comments around logging "filters" (vs. our "adapter").
- create the new `test_log_sys.test_root_pkg_not_duplicated()` which
runs green with the fixes from ^.
- add a ton of test-suite todos both for existing and anticipated
logging sys feats in the new mod.
Such that we are able to (finally) detect when we should
`Context._scope.cancel()` specifically when the `.parent_task` is
**not** blocking on receiving from the underlying `._rx_chan`, since if
the task is blocking on `.receive()` it will call `.cancel()`
implicitly.
This is a lot to explain with very little code actually needed for the
implementation (are we like `trio` yet anyone?? XD) but the main jist is
that `Context._maybe_cancel_and_set_remote_error()` needed the
additional case of calling `._scope.cancel()` whenever we know that
a remote-error/ctxc won't be immediately handled, bc user code is doing
non `Context`-API things, and result in a similar outcome as if that
task was waiting on `Context.wait_for_result()` or `.__aexite__()`.
Impl details,
- add a new `._is_blocked_on_rx_chan()` method which predicates whether
the (new) `.parent_task` is blocking on `._rx_chan.receive()`.
* see various stipulations about the current impl and how we might
need to adjust for the future given `trio`'s commitment to the
`Task.custom_sleep_data` attr..
- add `.parent_task`, a pub wrapper for `._task`.
- check for `not ._is_blocked_on_rx_chan()` before manually cancelling
the local `.parent_task`
- minimize the surrounding branch case expressions.
Other,
- tweak a couple logs.
- add a new `.cancel()` pre-started msg.
- mask the `.cancel_called` setter, it's only (been) used for tracing.
- todos around maybe moving the `._nursery` allocation "around" the
`.start_remote_task()` call and various subsequent tweaks therein.
Since it turns out there's even case(s) in `trio` core that are guilty
(of implementing things like checkpoints in exc handlers), this adds
facility for ignoring explicit cases via `inspect.FrameInfo` field
matching from the unmasked `exc_ctx` within
`maybe_raise_from_masking_exc()`.
Impl deats,
- use `inspect.getinnerframes()/getmodule()` to extract the equivalent
"guilty place in code" which raised the masked error which we'd like
to ignore and **not unmask**.
- start a `_mask_cases: dict` which describes the entries to ignore
by matching against a specific `FrameInfo`'s fields from indexed
from `getinnerframes()`.
- describe in that table the case i hit with `trio.WouldBlock` being
always masked by a `Cancelled` due to way `trio.Lock.acquire()`
implements the blocking case in the would-block handler..
- always call into a new `is_expected_masking_case()` predicate (from
`maybe_raise_from_masking_exc()`) on matching `exc_ctx` types.
The correct ordering is to de-alloc the surrounding `service_n`
+ `trio.Event` **after** the `mng` teardown ensuring the
`mng.__aexit__()` never can hit a ref-error if it touches either (like
if a `tn` is passed to `maybe_open_context()`!
Such that key->value pairs can be defined which should *never be*
unmasked where values of
- the keys are exc-types which might be masked, and
- the values are exc-types which masked the equivalent key.
For example, the default includes:
- KBI->taskc: a kbi should never be unmasked from its masking
`trio.Cancelled`.
For the impl, a new `do_warn: bool` in the fn-body determines the
primary guard for whether a warning or re-raising is necessary.
Including all caller usage throughout. Moving to a non-`except*` impl
means it's never needed as a signal from the caller - we can just catch
the beg outright (like we should have always been doing)..
That is from `maybe_raise_from_masking_exc()` thus minimizing us to
a single `except BaseException` block with logic branching for the beg
vs. `unmask_from` exc cases.
Also,
- raise val-err when `unmask_from` is not a `tuple`.
- tweak the exc-note warning format.
- drop all pausing from dev work.
Turns out we weren't despite the optional `stream_handler_nursery` input
to `Server.listen_on()`; fail over to the `Server._stream_handler_tn`
allocated during server setup in those cases.
Turns out I didn't read my own internals docs/comments and despite it
not being used previously, this adds the real use case: a root,
per-actor, scope which ensures parent comms are the last conc-thing to
be cancelled.
Also, the impl changes here make the test from 6410e45 (or wtv
it's rebased to) pass, i.e. we can support crash handling in the root
actor despite the root-tn having been (self) cancelled.
Superficial adjustments,
- rename `Actor._service_n` -> `._service_tn` everywhere.
- add asserts to `._runtime.async_main()` which ensure that the any
`.trionics.maybe_open_nursery()` calls against optionally passed
`._[root/service]_tn` are allocated-if-not-provided (the
`._service_tn`-case being an i-guess-prep-for-the-future-anti-pattern
Bp).
- obvi adjust all internal usage to match new naming.
Serious/real-use-case changes,
- add (back) a `Actor._root_tn` which sits a scope "above" the
service-tn and is either,
+ assigned in `._runtime.async_main()` for sub-actors OR,
+ assigned in `._root.open_root_actor()` for the root actor.
**THE primary reason** to keep this "upper" tn is that during
a full-`Actor`-cancellation condition (more details below) we want to
ensure that the IPC connection with a sub-actor's parent is **the last
thing to be cancelled**; this is most simply implemented by ensuring
that the `Actor._parent_chan: .ipc.Channel` is handled in an upper
scope in `_rpc.process_messages()`-subtask-terms.
- for the root actor this `root_tn` is allocated in `.open_root_actor()`
body and assigned as such.
- extend `Actor.cancel_soon()` to be cohesive with this entire teardown
"policy" by scheduling a task in the `._root_tn` which,
* waits for the `._service_tn` to complete and then,
* cancels the `._root_tn.cancel_scope`,
* includes "sclangy" console logging throughout.
I masked it bc it doesn't seem to actually work for the case I was
testing (`emsd` clobbering a `paperboi` in `piker`..) but figured I'd
leave it as a reminder for solving this problem more generally (#320)
since this is likely the place in the code for a soln.
When i tested it in my case it just resulted in a hang around the `with
debug.acquire_debug_lock()` for some reason? Can't remember if the child
ended up being able to REPL without issue though..
Such that we handle them despite a cancellation condition. This is
almost always the case, that `root_tn.cancel_scope.cancel_called` is
set, by the time the `debug._maybe_enter_pm()` hits. Previous I guess we
just weren't actually ever REPL-debugging such cases?
TODO, still needs a test obvi!
That is the `target` can declare a `chan: LinkedTaskChannel` instead of
`to_trio`/`from_aio`.
To support it,
- change `.started()` -> the more appropriate `.started_nowait()` which
can be called sync from the aio child task.
- adjust the `provide_channels` assert to accept either fn sig
declaration (for now).
Still needs test(s) obvi..
Makes the newly added `test_aio_side_raises_before_started` test pass by
ensuring errors raised by any `.to_asyncio.open_channel_from()` spawned
child-`asyncio.Task` are relayed by any caught `trio.EndOfChannel` by
checking for a new `LinkedTaskChannel._closed_by_aio_task: bool`.
Impl deats,
- obvi add `LinkedTaskChannel._closed_by_aio_task: bool = False`
- in `translate_aio_errors()` always check for the new flag on EOC
conditions and in such cases set `chan._trio_to_raise = aio_err` such
that the `trio`-parent-task always raises the child's exception
directly, OW keep original EoC passthrough in place.
- include *very* detailed per-case comments around the extended handler.
- adjust re-raising logic with a new `raise_from` where we only give the
`aio_err` priority if it's not already set as to `trio_to_raise`.
Also,
- hide the `_run_asyncio_task()` frame by def.
Been meaning to do this forever and a recent test hang finally drove me
to it Bp
Like it sounds, adopt the "cancel-status" properties on `ActorNursery`
use already on our `Context` and derived from `trio.CancelScope`:
- add new private `._cancel_called` (set in the head of `.cancel()`)
& `._cancelled_caught` (set in the tail) instance vars with matching
read-only `@properties`.
- drop the instance-var and instead delegate a `.cancelled: bool`
property to `._cancel_called` and add a usage deprecation warning
(since removing it breaks a buncha tests).
So i need to either adjust the tests or figure out if/why this is needed
to avoid the crashing in `pikerd` i found when killin the chart during
a long backfill with `binance` backend..
Just like in the BRE case (for UDS) it seems when a peer closes the
(UDS?) socket `trio` instead raises a `ClosedResourceError` which we now
catch and re-raise as a `TransportClosed`. This again results in
`tpt.send()` calls from the rpc-runtime **not** raising when it's known
that the IPC channel is disconnected.
Shift around comments and expressions for better reading, assign
`tpt_closed` for easier introspection from REPL during debug oh and fix
the `MsgpackTransport.pformat()` to render '|_peers: 1' .. XD
Similar to what was just changed for `Context.repr_state`, when the
child task is cancelled but by a different "layer" of the runtime (i.e.
a `Portal.cancel_actor()` / `SIGINT`-to-process canceller) we don't
dump a traceback instead just `log.cancel()` emit.
Such that if the local task hasn't resolved but is `trio.Cancelled` and
a `.canceller` was set, we report a `'actor-cancelled'` from
`.repr_state: str`. Bit of formatting to avoid needless newlines too!
Since it's merely a local-file-sys subdirectory and there should be no
reason file creation conflicts with other bind spaces.
Also add 2 test suites to match,
- `tests/ipc/test_each_tpt::test_uds_bindspace_created_implicitly` to
verify the dir creation when DNE.
- `..test_uds_double_listen_raises_connerr` to ensure a double bind
raises a `ConnectionError` from the src `OSError`.
Using `.get_stream_addrs()` such that we always (*can*) assign the peer
end's PID in the `._raddr`.
Also factor common `ConnectionError` re-raising into
a `_reraise_as_connerr()`-@cm.