tractor

Commit Graph

Author	SHA1	Message	Date
Tyler Goodlet	efb69f9bf9	Flip back `StartAck` timeout to `inf`..	2025-03-24 14:04:51 -04:00
Tyler Goodlet	e4e04c516f	First draft "payload receiver in a new `.msg._ops` As per much tinkering, re-designs and preceding rubber-ducking via many "commit msg novelas", finally this adds the (hopefully) final missing layer for typed msg safety: `tractor.msg._ops.PldRx` (or `PayloadReceiver`? haven't decided how verbose to go..) Design justification summary: ------ - ------ - need a way to be as-close-as-possible to the `tractor`-application such that when `MsgType.pld: PayloadT` validation takes place, it is straightforward and obvious how user code can decide to handle any resulting `MsgTypeError`. - there should be a common and optional-yet-modular way to modify how data delivered via IPC (possibly embedded as user defined, type-constrained `.pld: msgspec.Struct`s) can be handled and processed during fault conditions and/or IPC "msg attacks". - support for nested type constraints within a `MsgType.pld` field should be simple to define, implement and understand at runtime. - a layer between the app-level IPC primitive APIs (`Context`/`MsgStream`) and application-task code (consumer code of those APIs) should be easily customized and prove-to-be-as-such through demonstrably rigorous internal (sub-sys) use! -> eg. via seemless runtime RPC eps support like `Actor.cancel()` -> by correctly implementing our `.devx._debug.Lock` REPL TTY mgmt dialog prot, via a dead simple payload-as-ctl-msg-spec. There are some fairly detailed doc strings included so I won't duplicate that content, the majority of the work here is actually somewhat of a factoring of many similar blocks that are doing more or less the same `msg = await Context._rx_chan.receive()` with boilerplate for `Error`/`Stop` handling via `_raise_from_no_key_in_msg()`. The new `PldRx` basically provides a shim layer for this common "receive msg, decode its payload, yield it up to the consuming app task" by pairing the RPC feeder mem-chan with a msg-payload decoder and expecting IPC API internals to use one API instead of re-implementing the same pattern all over the place XD `PldRx` breakdown ------ - ------ - for now only expects a `._msgdec: MsgDec` which allows for override-able `MsgType.pld` validation and most obviously used in the impl of `.dec_msg()`, the decode message method. - provides multiple mem-chan receive options including: \|_ `.recv_pld()` which does the e2e operation of receiving a payload item. \|_ a sync `.recv_pld_nowait()` version. \|_ a `.recv_msg_w_pld()` which optionally allows retreiving both the shuttling `MsgType` as well as it's `.pld` body for use cases where info on both is important (eg. draining a `MsgStream`). Dirty internal changeover/implementation deatz: ------ - ------ - obvi move over all the IPC "primitives" that previously had the duplicate recv-n-yield logic: - `MsgStream.receive[_nowait]()` delegating instead to the equivalent `PldRx.recv_pld[_nowait]()`. - add `Context._pld_rx: PldRx`, created and passed in by `mk_context()`; use it for the `.started()` -> `first: Started` retrieval inside `open_context_from_portal()`. - all the relevant `Portal` invocation methods: `.result()`, `.run_from_ns()`, `.run()`; also allows for dropping `_unwrap_msg()` and `.Portal_return_once()` outright Bo - rename `Context.ctx._recv_chan` -> `._rx_chan`. - add detailed `Context._scope` info for logging whether or not it's cancelled inside `_maybe_cancel_and_set_remote_error()`. - move `._context._drain_to_final_msg()` -> `._ops.drain_to_final_msg()` since it's really not necessarily ctx specific per say, and it does kinda fit with "msg operations" more abstractly ;)	2025-03-24 14:04:51 -04:00
Tyler Goodlet	dc31f0dac9	Use `DebugStatus` around subactor lock requests Breaks out all the (sub)actor local conc primitives from `Lock` (which is now only used in and by the root actor) such that there's an explicit distinction between a task that's "consuming" the `Lock` (remotely) vs. the root-side service tasks which do the actual acquire on behalf of the requesters. `DebugStatus` changeover deats: ------ - ------ - move all the actor-local vars over `DebugStatus` including: - move `_trio_handler` and `_orig_sigint_handler` - `local_task_in_debug` now `repl_task` - `_debugger_request_cs` now `req_cs` - `local_pdb_complete` now `repl_release` - drop all ^ fields from `Lock.repr()` obvi.. - move over the `.[un]shield_sigint()` and `.is_main_trio_thread()` methods. - add some new attrs/meths: - `DebugStatus.repl` for the currently running `Pdb` in-actor singleton. - `.repr()` for pprint of state (like `Lock`). - Note: that even when a root-actor task is in REPL, the `DebugStatus` is still used for certain actor-local state mgmt, such as SIGINT handler shielding. - obvi change all lock-requester code bits to now use a `DebugStatus` in their local actor-state instead of `Lock`, i.e. change usage from `Lock` in `._runtime` and `._root`. - use new `Lock.get_locking_task_cs()` API in when checking for sub-in-debug from `._runtime.Actor._stream_handler()`. Unrelated to topic-at-hand tweaks: ------ - ------ - drop the commented bits about hiding `@[a]cm` stack frames from `_debug.pause()` and simplify to only one block with the `shield` passthrough since we already solved the issue with cancel-scopes using `@pdbp.hideframe` B) - this includes all the extra logging about the extra frame for the user (good thing i put in that wasted effort back then eh..) - put the `try/except BaseException` with `log.exception()` around the whole of `._pause()` to ensure we don't miss in-func errors which can cause hangs.. - allow passing in `portal: Portal` to `Actor.start_remote_task()` such that `Portal` task spawning methods are always denoted correctly in terms of `Context.side`. - lotsa logging tweaks, decreasing a bit of noise from `.runtime()`s.	2025-03-24 14:04:51 -04:00
Tyler Goodlet	648695a325	Start tidying up `._context`, use `pack_from_raise()` Mostly removing commented (and replaced) code blocks lingering from the ctxc semantics work and new typed-msg-spec `MsgType`s handling AND use the new `._exceptions.pack_from_raise()` helper to construct `StreamOverrun` msgs. Deaterz: - clean out the drain loop now that it's implemented to handle our struct msg types including the `dict`-msg bits left in as fallback-reminders, any notes/todos better summarized at the top of their blocks, remove any `_final_result_is_set()` related duplicate/legacy tidbits. - use a `case Error()` block in drain loop with fallthrough to `_:` always resulting in an rte raise. - move "XXX" notes into the doc-string for `._deliver_msg()` as a "rules" section. - use `match:` syntax for logging the `result_or_err: MsgType` outcome from the final `.result()` call inside `open_context_from_portal()`. - generally speaking use `MsgType` type annotations throughout!	2025-03-24 14:04:51 -04:00
Tyler Goodlet	fb94ecd729	Rename `Actor._push_result()` -> `._deliver_ctx_payload()` Better describes the internal RPC impl/latest-architecture with the msgs delivered being those which either define a `.pld: PayloadT` that gets passed up to user code, or the error-msg subset that similarly is raised in a ctx-linked task.	2025-03-24 14:04:51 -04:00
Tyler Goodlet	8ac9ccf65d	Finally drop masked `chan.send(None)` related code blocks	2025-03-24 14:04:51 -04:00
Tyler Goodlet	939f198dd9	Drop `None`-sentinel cancels RPC loop mechanism Pretty sure we haven't needed it for a while, it was always generally hazardous in terms of IPC msg types, AND it's definitely incompatible with a dynamically applied typed msg spec: you can't just expect a `None` to be willy nilly handled all the time XD For now I'm masking out all the code and leaving very detailed surrounding notes but am not removing it quite yet in case for strange reason it is needed by some edge case (though I haven't found according to the test suite). Backstory: ------ - ------ Originally (i'm pretty sure anyway) it was added as a super naive "remote cancellation" mechanism (back before there were specific `Actor` methods for such things) that was mostly (only?) used before IPC `Channel` closures to "more gracefully cancel" the connection's parented RPC tasks. Since we now have explicit runtime-RPC endpoints for conducting remote cancellation of both tasks and full actors, it should really be removed anyway, because: - a `None`-msg setinel is inconsistent with other RPC endpoint handling input patterns which (even prior to typed msging) had specific msg-value triggers. - the IPC endpoint's (block) implementation should use `Actor.cancel_rpc_tasks(parent_chan=chan)` instead of a manual loop through a `Actor._rpc_tasks.copy()`.. Deats: - mask the `Channel.send(None)` calls from both the `Actor._stream_handler()` tail as well as from the `._portal.open_portal()` was connected block. - mask the msg loop endpoint block and toss in lotsa notes. Unrelated tweaks: - drop `Actor._debug_mode`; unused. - make `Actor.cancel_server()` return a `bool`. - use `.msg.pretty_struct.Struct.pformat()` to show any msg that is ignored (bc invalid) in `._push_result()`.	2025-03-24 14:04:51 -04:00
Tyler Goodlet	09eed9d7e1	WIP porting runtime to use `Msg`-spec	2025-03-24 14:04:51 -04:00
Tyler Goodlet	4621c8c1b9	Change all `\| None` -> `\|None` in `._runtime`	2025-03-20 22:37:51 -04:00
Tyler Goodlet	9082efbe68	Add a `._state._runtime_vars['_registry_addrs']` Such that it's set to whatever `Actor.reg_addrs: list[tuple]` is during the actor's init-after-spawn guaranteeing each actor has at least the registry infos from its parent. Ensure we read this if defined over `_root._default_lo_addrs` in `._discovery` routines, namely `.find_actor()` since it's the one API normally used without expecting the runtime's `current_actor()` to be up. Update the latest inter-peer cancellation test to use the `reg_addr` fixture (and thus test this new runtime-vars value via `find_actor()` usage) since it was failing if run after the infected `asyncio` suite due to registry contact failure.	2025-03-20 19:50:31 -04:00
Tyler Goodlet	dbd79d8beb	Log chan-server-startup failures via `.exception()`	2025-03-20 19:50:31 -04:00
Tyler Goodlet	51bd38976f	Expose per-actor registry addrs via `.reg_addrs` Since it's handy to be able to debug the writing of this instance var (particularly when checking state passed down to a child in `Actor._from_parent()`), rename and wrap the underlying `Actor._reg_addrs` as a settable `@property` and add validation to the `.setter` for sanity - actor discovery is a critical functionality. Other tweaks: - fix `.cancel_soon()` to pass expected argument.. - update internal runtime error message to be simpler and link to GH issues. - use new `Actor.reg_addrs` throughout core.	2025-03-20 19:50:31 -04:00
Tyler Goodlet	7246749137	Add post-mortem catch around failed transport addr binds to aid with runtime debugging	2025-03-20 19:50:31 -04:00
Tyler Goodlet	7ce4bc489e	Init-support for "multi homed" transports Since we'd like to eventually allow a diverse set of transport (protocol) methods and stacks, and a multi-peer discovery system for distributed actor-tree applications, this reworks all runtime internals to support multi-homing for any given tree on a logical host. In other words any actor can now bind its transport server (currently only unsecured TCP + `msgspec`) to more then one address available in its (linux) network namespace. Further, registry actors (now dubbed "registars" instead of "arbiters") can also similarly bind to multiple network addresses and provide discovery services to remote actors via multiple addresses which can now be provided at runtime startup. Deats: - adjust `._runtime` internals to use a `list[tuple[str, int]]` (and thus pluralized) socket address sequence where applicable for transport server socket binds, now exposed via `Actor.accept_addrs`: - `Actor.__init__()` now takes a `registry_addrs: list`. - `Actor.is_arbiter` -> `.is_registrar`. - `._arb_addr` -> `._reg_addrs: list[tuple]`. - always reg and de-reg from all registrars in `async_main()`. - only set the global runtime var `'_root_mailbox'` to the loopback address since normally all in-tree processes should have access to it, right? - `._serve_forever()` task now takes `listen_sockaddrs: list[tuple]` - make `open_root_actor()` take a `registry_addrs: list[tuple[str, int]]` and defaults when not passed. - change `ActorNursery.start_..()` methods take `bind_addrs: list` and pass down through the spawning layer(s) via the parent-seed-msg. - generalize all `._discovery()` APIs to accept `registry_addrs`-like inputs and move all relevant subsystems to adopt the "registry" style naming instead of "arbiter": - make `find_actor()` support batched concurrent portal queries over all provided input addresses using `.trionics.gather_contexts()` Bo - syntax: move to using `async with <tuples>` 3.9+ style chained @acms. - a general modernization of the code to a python 3.9+ style. - start deprecation and change to "registry" naming / semantics: - `._discovery.get_arbiter()` -> `.get_registry()`	2025-03-20 19:50:31 -04:00
Tyler Goodlet	b00ba158f1	Kick off `.devx` subpkg for our dev tools B) Where `.devx` is "developer experience", a hopefully broad enough subpkg name for all the slick stuff planned to augment working on the actor runtime 💥 Move the `._debug` module into the new subpkg and adjust rest of core code base to reflect import path change. Also add a new `.devx._debug.open_crash_handler()` manager for wrapping any sync code outside a `trio.run()` which is handy for eventual CLI addons for popular frameworks like `click`/`typer`.	2025-03-20 15:07:27 -04:00
Tyler Goodlet	1deed8dbee	._runtime: log level tweaks, use crit for stale debug lock detection	2025-03-20 15:07:27 -04:00
Tyler Goodlet	f0417d802b	First proto: use `greenback` for sync func breakpointing This works now for supporting a new `tractor.pause_from_sync()` `tractor`-aware-replacement for `Pdb.set_trace()` from sync functions which are also scheduled from our runtime. Uses `greenback` to do all the magic of scheduling the bg `tractor._debug._pause()` task and engaging the normal TTY locking machinery triggered by `await tractor.breakpoint()` Further this starts some public API renaming, making a switch to `tractor.pause()` from `.breakpoint()` which IMO much better expresses the semantics of the runtime intervention required to suffice multi-process "breakpointing"; it also is an alternate name for the same in computer science more generally: https://en.wikipedia.org/wiki/Breakpoint It also avoids using the same name as the `breakpoint()` built-in which is important since there is alot more going on when you call our equivalent API. Deats of that: - add deprecation warning for `tractor.breakpoint()` - add `tractor.pause()` and a shorthand, easier-to-type, alias `.pp()` for "pause-point" B) - add `pause_from_sync()` as the new `breakpoint()`-from-sync-function hack which does all the `greenback` stuff for the user. Still TODO: - figure out where in the runtime and when to call `greenback.ensure_portal()`. - fix the frame selection issue where `trio._core._ki._ki_protection_decorator:wrapper` seems to be always shown on REPL start as the selected frame..	2025-03-20 15:07:27 -04:00
Tyler Goodlet	11bab13a06	Various adjustments to fix breakage after rebase - Remove `exceptiongroup` import, - pin to py 3.11 in `setup.py` - revert any lingering `tractor.devx` imports; sub-pkg is coming in a downstream PR! - remove weird double `@property` lingering from conflict reso.. - modern `pytest` requires conftest mod mods to be relative imported.	2025-03-19 15:30:59 -04:00
Tyler Goodlet	9a8cd13894	Another cancel-req-invalid log msg fmt tweak	2025-03-16 16:06:26 -04:00
Tyler Goodlet	a87df3009f	Drop now-deprecated deps on modern `trio`/Python - `trio_typing` is nearly obsolete since `trio >= 0.23` - `exceptiongroup` is built-in to python 3.11 - `async_generator` primitives have lived in `contextlib` for quite a while!	2025-03-16 16:06:24 -04:00
Tyler Goodlet	a5bc113fde	Start a `._rpc` module Since `._runtime` was getting pretty long (> 2k LOC) and much of the RPC low-level machinery is fairly isolated to a handful of task-funcs, it makes sense to re-org the RPC task scheduling and driving msg loop to its own code space. The move includes: - `process_messages()` which is the main IPC business logic. - `try_ship_error_to_remote()` helper, to box local errors for the wire. - `_invoke()`, the core task scheduler entrypoing used in the msg loop. - `_invoke_non_context()`, holds impls for non-`@context` task starts. - `_errors_relayed_via_ipc()` which does all error catch-n-boxing for wire-msg shipment using `try_ship_error_to_remote()` internally. Also inside `._runtime` improve some `Actor` methods docs.	2025-03-16 15:52:53 -04:00
Tyler Goodlet	544cb40533	Attempt at better internal traceback hiding Previously i was trying to approach this using lots of `__tracebackhide__`'s in various internal funcs but since it's not exactly straight forward to do this inside core deps like `trio` and the stdlib, it makes a bit more sense to optionally catch and re-raise certain classes of errors from their originals using `raise from` syntax as per: https://docs.python.org/3/library/exceptions.html#exception-context Deats: - litter `._context` methods with `__tracebackhide__`/`hide_tb` which were previously being shown but that don't need to be to application code now that cancel semantics testing is finished up. - i originally did the same but later commented it all out in `._ipc` since error catch and re-raise instead in higher level layers (above the transport) seems to be a much saner approach. - add catch-n-reraise-from in `MsgStream.send()`/.`receive()` to avoid seeing the depths of `trio` and/or our `._ipc` layers on comms errors. Further this patch adds some refactoring to use the same remote-error shipper routine from both the actor-core in the RPC invoker: - rename it as `try_ship_error_to_remote()` and call it from `._invoke()` as well as it's prior usage. - make it optionally accept `cid: str` a `remote_descr: str` and of course a `hide_tb: bool`. Other misc tweaks: - add some todo notes around `Actor.load_modules()` debug hooking. - tweak the zombie reaper log msg and timeout value ;)	2025-03-16 15:30:08 -04:00
Tyler Goodlet	cbaf4fc05b	Add a open-ctx-with-self test Found exactly why trying this won't work when playing around with opening workspaces in `modden` using a `Portal.open_context()` back to the 'bigd' root actor: the RPC machinery only registers one entry in `Actor._contexts` which will get overwritten by each task's side and then experience race-based IPC msging errors (eg. rxing `{'started': _}` on the callee side..). Instead make opening a ctx back to the self-actor a runtime error describing it as an invalid op. To match: - add a new test `test_ctx_with_self_actor()` to the context semantics suite. - tried out adding a new `side: str` to the `Actor.get_context()` (and callers) but ran into not being able to determine the value from in `._push_result()` where it's needed to figure out which side to push to.. So, just leaving the commented arg (passing) in the runtime core for now in case we can come back to trying to make it work, tho i'm thinking it's not the right hack anyway XD	2025-03-16 15:19:51 -04:00
Tyler Goodlet	8e3a2a9297	Make `Actor._cancel_task(requesting_uid: tuple)` required arg	2025-03-16 14:01:50 -04:00
Tyler Goodlet	c5228e7be5	Set `._cancel_msg` to RPC `{cmd: 'self._cancel_task', ..}` msg Like how we set `Context._cancel_msg` in `._deliver_msg()` (in which case normally it's an `{'error': ..}` msg), do the same when any RPC task is remotely cancelled via `Actor._cancel_task` where that task doesn't yet have a cancel msg set yet. This makes is much easier to distinguish between ctx cancellations due to some remote error vs. Explicit remote requests via any of `Actor.cancel()`, `Portal.cancel_actor()` or `Context.cancel()`.	2025-03-16 14:01:50 -04:00
Tyler Goodlet	4fb34772e7	Mega-refactor on `._invoke()` targeting `@context`s Since eventually we want to implement all other RPC "func types" as contexts underneath this starts the rework to move all the other cases into a separate func not only to simplify the main `._invoke()` body but also as a reminder of the intention to do it XD Details of re-factor: - add a new `._invoke_non_context()` which just moves all the old blocks for non-context handling to a single def. - factor what was basically just the `finally:` block handler (doing all the task bookkeeping) into a new `@acm`: `_errors_relayed_via_ipc()` with that content packed into the post-`yield` (also with a `hide_tb: bool` flag added of course). * include a `debug_kbis: bool` for when needed. - since the `@context` block is the only type left in the main `_invoke()` body, de-dent it so it's more grok-able B) Obviously this patch also includes a few improvements regarding context-cancellation-semantics (for the `context` RPC case) on the callee side in order to match previous changes to the `Context` api: - always setting any ctxc as the `Context._local_error`. - using the new convenience `.maybe_raise()` topically (for now). - avoiding any previous reliance on `Context.cancelled_caught` for anything public of meaning. Further included is more logging content updates: - being pedantic in `.cancel()` msgs about whether termination is caused by error or ctxc. - optional `._invoke()` traceback hiding via a `hide_tb: bool`. - simpler log headers throughout instead leveraging new `.__repr__()` on primitives. - buncha `<= <actor-uid>` sent some message emissions. - simplified handshake statuses reporting. Other subsys api changes we need to match: - change to `Channel.transport`. - avoiding any `local_nursery: ActorNursery` waiting when the `._implicit_runtime_started` is set. And yes, lotsa more comments for #TODOs dawg.. since there's always somethin!	2025-03-16 14:01:48 -04:00
Tyler Goodlet	a388d3185b	Few more log msg tweaks in runtime	2025-03-14 22:18:31 -04:00
Tyler Goodlet	9420ea0c14	Tweak `Actor` cancel method signatures Besides improving a bunch more log msg contents similarly as before this changes the cancel method signatures slightly with different arg names: for `.cancel()`: - instead of `requesting_uid: str` take in a `req_chan: Channel` since we can always just read its `.uid: tuple` for logging and further we can then offer the `chan=None` case indicating a "self cancel" (since there's no "requesting channel"). - the semantics of "requesting" here better indicate that the IPC connection is an IPC peer and further (eventually) will allow permission checking against given peers for cancellation requests. - when `chan==None` we also define a meth-internal `requester_type: str` differently for logging content :) - add much more detailed `.cancel()` content around the requester, its type, and any debugger related locking steps. for `._cancel_task()`: - change the `chan` arg to `parent_chan: Channel` since "parent" correctly indicates that the channel is the parent of the locally spawned rpc task to cancel; in fact no other chan should be able to cancel tasks parented/spawned by other channels obvi! - also add more extensive meth-internal `.cancel()` logging with a #TODO around showing only the "relevant/lasest" `Context` state vars in such logging content. for `.cancel_rpc_tasks()`: - shorten `requesting_uid` -> `req_uid`. - add `parent_chan: Channel` to be similar as above in `._cancel_task()` (since it's internally delegated to anyway) which replaces the prior `only_chan` and use it to filter to only tasks spawned by this channel (thus as their "parent") as before. - instead of `if tasks:` to enter, invert and `return` early on `if not tasks`, for less indentation B) - add WIP str-repr format (for `.cancel()` emissions) to show a multi-address (maddr) + task func (via the new `Context._nsf`) and report all cancel task targets with it a "tree"; include #TODO to finalize and implement some utils for all this! To match ensure we adjust `process_messages()` self/`Actor` cancel handling blocks to provide the new `kwargs` (now with `dict`-merge syntax) to `._invoke()`.	2025-03-14 22:18:29 -04:00
Tyler Goodlet	e403d63eb7	Better logging for cancel requests in IPC msg loop As similarly improved in other parts of the runtime, adds much more pedantic (`.cancel()`) logging content to indicate the src of remote cancellation request particularly for `Actor.cancel()` and `._cancel_task()` cases prior to `._invoke()` task scheduling. Also add detailed case comments and much more info to the "request-to-cancel-already-terminated-RPC-task" log emission to include the `Channel` and `Context.cid` deats. This helped me find the src of a race condition causing a test to fail where a callee ctx task was returning a result before an expected `ctx.cancel()` request arrived B). Adding much more pedantic `.cancel()` msg contents around the requester's deats should ensure these cases are much easier to detect going forward! Also, simplify the `._invoke()` final result/error log msg to only put one of either the final error or returned result above the `Context` pprint.	2025-03-14 22:13:12 -04:00
Tyler Goodlet	3c385c6949	Use `NamespacePath` in `Context` mgmt internals The only case where we can't is in `Portal.run_from_ns()` usage (since we pass a path with `self:<Actor.meth>`) and because `.to_tuple()` internally uses `.load_ref()` which will of course fail on such a path.. So or now impl as, - mk `Actor.start_remote_task()` take a `nsf: NamespacePath` but also offer a `load_nsf: bool = False` such that by default we bypass ref loading (maybe this is fine for perf long run as well?) for the `Actor`/'self:'` case mentioned above. - mk `.get_context()` take an instance `nsf` obvi. More logging msg format tweaks: - change msg-flow related content to show the `Context._nsf`, which, right, is coming follow up commit.. - bunch more `.runtime()` format updates to show `msg: dict` contents and internal primitives with trailing `'\n'` for easier reading. - report import loading `stackscope` in subactors.	2025-03-14 22:11:57 -04:00
Tyler Goodlet	4b0aa5e379	Baboso! fix `chan.send(None)` indent..	2025-03-14 15:49:37 -04:00
Tyler Goodlet	6a303358df	Improved log msg formatting in core As part of solving some final edge cases todo with inter-peer remote cancellation (particularly a remote cancel from a separate actor tree-client hanging on the request side in `modden`..) I needed less dense, more line-delimited log msg formats when understanding ipc channel and context cancels from console logging; this adds a ton of that to: - `._invoke()` which now does, - better formatting of `Context`-task info as multi-line `'<field>: <value>\n'` messages, - use of `trio.Task` (from `.lowlevel.current_task()` for full rpc-func namespace-path info, - better "msg flow annotations" with `<=` for understanding `ContextCancelled` flow. - `Actor._stream_handler()` where in we break down IPC peers reporting better as multi-line `\|_<Channel>` log msgs instead of all jammed on one line.. - `._ipc.Channel.send()` use `pformat()` for repr of packet. Also tweak some optional deps imports for debug mode: - add `maybe_import_gb()` for attempting to import `greenback`. - maybe enable `stackscope` tree pprinter on `SIGUSR1` if installed. Add a further stale-debugger-lock guard before removal: - read the `._debug.Lock.global_actor_in_debug: tuple` uid and possibly `maybe_wait_for_debugger()` when the child-user is known to have a live process in our tree. - only cancel `Lock._root_local_task_cs_in_debug: CancelScope` when the disconnected channel maps to the `Lock.global_actor_in_debug`, though not sure this is correct yet? Started adding missing type annots in sections that were modified.	2025-03-14 15:49:36 -04:00
Tyler Goodlet	13fbcc723f	Guarding for IPC failures in `._runtime._invoke()` Took me longer then i wanted to figure out the source of a failed-response to a remote-cancellation (in this case in `modden` where a client was cancelling a workspace layer.. but disconnects before receiving the ack msg) that was triggering an IPC error when sending the error msg for the cancellation of a `Actor._cancel_task()`, but since this (non-rpc) `._invoke()` task was trying to send to a now disconnected canceller it was resulting in a `BrokenPipeError` (or similar) error. Now, we except for such IPC errors and only raise them when, 1. the transport `Channel` is for sure up (bc ow what's the point of trying to send an error on the thing that caused it..) 2. it's definitely for handling an RPC task Similarly if the entire main invoke `try:` excepts, - we only hide the call-stack frame from the debugger (with `__tracebackhide__: bool`) if it's an RPC task that has a connected channel since we always want to see the frame when debugging internal task or IPC failures. - we don't bother trying to send errors to the context caller (actor) when it's a non-RPC request since failures on actor-runtime-internal tasks shouldn't really ever be reported remotely, only maybe raised locally. Also some other tidying, - this properly corrects for the self-cancel case where an RPC context is cancelled due to a local (runtime) task calling a method like `Actor.cancel_soon()`. We now set our own `.uid` as the `ContextCancelled.canceller` value so that other-end tasks know that the cancellation was due to a self-cancellation by the actor itself. We still need to properly test for this though! - add a more detailed module doc-str. - more explicit imports for `trio` core types throughout.	2025-03-14 13:56:23 -04:00
Tyler Goodlet	f5fcd8ca2e	Be mega-pedantic with `ContextCancelled` semantics As part of extremely detailed inter-peer-actor testing, add much more granular `Context` cancellation state tracking via the following (new) fields: - `.canceller: tuple[str, str]` the uuid of the actor responsible for the cancellation condition - always set by `Context._maybe_cancel_and_set_remote_error()` and replaces `._cancelled_remote` and `.cancel_called_remote`. If set, this value should normally always match a value from some `ContextCancelled` raised or caught by one side of the context. - `._local_error` which is always set to the locally raised (and caller or callee task's scope-internal) error which caused any eventual cancellation/error condition and thus any closure of the context's per-task-side-`trio.Nursery`. - `.cancelled_caught: bool` is now always `True` whenever the local task catches (or "silently absorbs") a `ContextCancelled` (a `ctxc`) that indeed originated from one of the context's linked tasks or any other context which raised its own `ctxc` in the current `.open_context()` scope. => whenever there is a case that no `ContextCancelled` was raised in the `.open_context().__aexit__()` (eg. `ctx.result()` called after a call `ctx.cancel()`), we still consider the context's as having "caught a cancellation" since the `ctxc` was indeed silently handled by the cancel requester; all other error cases are already represented by mirroring the state of the `._scope: trio.CancelScope` => IOW there should be no case where an error is not raised in the context's scope and `.cancelled_caught: bool == False`, i.e. no case where `._scope.cancelled_caught == False and ._local_error is not None`! - always raise any `ctxc` from `.open_stream()` if `._cancel_called == True` - if the cancellation request has not already resulted in a `._remote_error: ContextCancelled` we raise a `RuntimeError` to indicate improper usage to the guilty side's task code. - make `._maybe_raise_remote_err()` a sync func and don't raise any `ctxc` which is matched against a `.canceller` determined to be the current actor, aka a "self cancel", and always set the `._local_error` to any such `ctxc`. - `.side: str` taken from inside `.cancel()` and unused as of now since it might be better re-written as a similar `.is_opener() -> bool`? - drop unused `._started_received: bool`.. - TONS and TONS of detailed comments/docs to attempt to explain all the possible cancellation/exit cases and how they should exhibit as either silent closes or raises from the `Context` API! Adjust the `._runtime._invoke()` code to match: - use `ctx._maybe_raise_remote_err()` in `._invoke()`. - adjust to new `.canceller` property. - more type hints. - better `log.cancel()` msging around self-cancels vs. peer-cancels. - always set the `._local_error: BaseException` for the "callee" task just like `Portal.open_context()` now will do B) Prior we were raising any `Context._remote_error` directly and doing (more or less) the same `ContextCancelled` "absorbing" logic (well kinda) in block; instead delegate to the method	2025-03-14 13:42:55 -04:00
Tyler Goodlet	a7c36a9cbe	Tidy/clarify another `._runtime` comment	2025-03-14 13:40:19 -04:00
Tyler Goodlet	ee94d6d62c	Teensie tidy up on actor doc string	2025-03-14 13:36:16 -04:00
Tyler Goodlet	6495688730	Drop `Optional` style from runtime mod	2023-05-25 16:00:05 -04:00
Tyler Goodlet	a0276f41c2	Remote cancellation runtime-internal vars renames - `Context._cancel_called_remote` -> `._cancelled_remote` since "called" implies the cancellation was "requested" when it could be due to another error and the actor uid is the value - only set once the far end task scope is terminated due to either error or cancel, which has nothing to do with what caused the cancellation. - `Actor._cancel_called_remote` -> `._cancel_called_by_remote` which emphasizes that this variable is only set IFF some remote actor requested that this actor's runtime be cancelled via `Actor.cancel()`.	2023-05-19 14:31:55 -04:00
Tyler Goodlet	60791ed546	Oof, fix remaining `Actor.cancel()` in `Actor._from_parent()`	2023-05-15 10:00:45 -04:00
Tyler Goodlet	20d75ff934	Move move context code into new `._context` mod	2023-05-15 10:00:45 -04:00
Tyler Goodlet	968f13f9ef	Set `Context._scope_nursery` on callee side too Because obviously we probably want to support `allow_overruns` on the remote callee side as well XD Only found the bugs fixed in this patch this thanks to writing a much more exhaustive test set for overrun cases B)	2023-05-15 10:00:45 -04:00
Tyler Goodlet	c72026091e	Remote `Context` cancellation semantics rework B) This adds remote cancellation semantics to our `tractor.Context` machinery to more closely match that of `trio.CancelScope` but with operational differences to handle the nature of parallel tasks interoperating across multiple memory boundaries: - if an actor task cancels some context it has opened via `Context.cancel()`, the remote (scope linked) task will be cancelled using the normal `CancelScope` semantics of `trio` meaning the remote cancel scope surrounding the far side task is cancelled and `trio.Cancelled`s are expected to be raised in that scope as per normal `trio` operation, and in the case where no error is raised in that remote scope, a `ContextCancelled` error is raised inside the runtime machinery and relayed back to the opener/caller side of the context. - if any actor task cancels a full remote actor runtime using `Portal.cancel_actor()` the same semantics as above apply except every other remote actor task which also has an open context with the actor which was cancelled will also be sent a `ContextCancelled` but with the `.canceller` field set to the uid of the original cancel requesting actor. This changeset also includes a more "proper" solution to the issue of "allowing overruns" during streaming without attempting to implement any form of IPC streaming backpressure. Implementing task-granularity backpressure cross-process turns out to be more or less impossible without augmenting out streaming protocol (likely at the cost of performance). Further allowing overruns requires special care since any blocking of the runtime RPC msg loop task effectively can block control msgs such as cancels and stream terminations. The implementation details per abstraction layer are as follows. ._streaming.Context: - add a new contructor factor func `mk_context()` which provides a strictly private init-er whilst allowing us to not have to define an `.__init__()` on the type def. - add public `.cancel_called` and `.cancel_called_remote` properties. - general rename of what was the internal `._backpressure` var to `._allow_overruns: bool`. - move the old contents of `Actor._push_result()` into a new `._deliver_msg()` allowing for better encapsulation of per-ctx msg handling. - always check for received 'error' msgs and process them with the new `_maybe_cancel_and_set_remote_error()` before any msg delivery to the local task, thus guaranteeing error and cancellation handling despite any overflow handling. - add a new `._drain_overflows()` task-method for use with new `._allow_overruns: bool = True` mode. - add back a `._scope_nursery: trio.Nursery` (allocated in `Portal.open_context()`) who's sole purpose is to spawn a single task which runs the above method; anything else is an error. - augment `._deliver_msg()` to start a task and run the above method when operating in no overrun mode; the task queues overflow msgs and attempts to send them to the underlying mem chan using a blocking `.send()` call. - on context exit, any existing "drainer task" will be cancelled and remaining overflow queued msgs are discarded with a warning. - rename `._error` -> `_remote_error` and set it in a new method `_maybe_cancel_and_set_remote_error()` which is called before processing - adjust `.result()` to always call `._maybe_raise_remote_err()` at its start such that whenever a `ContextCancelled` arrives we do logic for whether or not to immediately raise that error or ignore it due to the current actor being the one who requested the cancel, by checking the error's `.canceller` field. - set the default value of `._result` to be `id(Context()` thus avoiding conflict with any `.result()` actually being `False`.. ._runtime.Actor: - augment `.cancel()` and `._cancel_task()` and `.cancel_rpc_tasks()` to take a `requesting_uid: tuple` indicating the source actor of every cancellation request. - pass through the new `Context._allow_overruns` through `.get_context()` - call the new `Context._deliver_msg()` from `._push_result()` (since the factoring out that method's contents). ._runtime._invoke: - `TastStatus.started()` back a `Context` (unless an error is raised) instead of the cancel scope to make it easy to set/get state on that context for the purposes of cancellation and remote error relay. - always raise any remote error via `Context._maybe_raise_remote_err()` before doing any `ContextCancelled` logic. - assign any `Context._cancel_called_remote` set by the `requesting_uid` cancel methods (mentioned above) to the `ContextCancelled.canceller`. ._runtime.process_messages: - always pass a `requesting_uid: tuple` to `Actor.cancel()` and `._cancel_task` to that any corresponding `ContextCancelled.canceller` can be set inside `._invoke()`.	2023-05-15 10:00:45 -04:00
Tyler Goodlet	e80e0a551f	Change a bunch of log levels to cancel, including any `ContextCancelled` handling	2023-05-15 10:00:45 -04:00
Tyler Goodlet	d75343106b	More single doc-strs in discovery mod	2023-05-15 10:00:45 -04:00
Tyler Goodlet	df01294bb2	Show more functiony syntax in ctx-cancelled log msgs	2023-01-29 14:55:02 -05:00
Tyler Goodlet	ddf3d0d1b3	Show tracebacks for un-shipped/propagated errors	2023-01-29 14:55:02 -05:00
Tyler Goodlet	97d5f7233b	Fix uid2nursery lookup table type annot	2023-01-29 14:55:02 -05:00
Tyler Goodlet	d27c081a15	Ensure arbiter sockaddr type before usage	2023-01-29 14:55:02 -05:00
Tyler Goodlet	4f977189c0	Handle broken mem chan on `Actor._push_result()` When backpressure is used and a feeder mem chan breaks during msg delivery (usually because the IPC allocating task already terminated) instead of raising we simply warn as we do for the non-backpressure case. Also, add a proper `Actor.is_arbiter` test inside `._invoke()` to avoid doing an arbiter-registry lookup if the current actor is the registrar.	2023-01-29 14:55:02 -05:00
Tyler Goodlet	fca2e7c10e	Simplify closed abruptly log msg	2023-01-26 12:44:13 -05:00

1 2

60 Commits (efb69f9bf9966d3581375db5c50df9d865e43a82)