Compare commits

..

169 Commits

Author SHA1 Message Date
Tyler Goodlet d28c7e17c6 Add `.trionics._broadcast` todos for py 3.12 2024-03-13 16:09:31 -04:00
Tyler Goodlet d23d8c1779 Start a `._rpc` module
Since `._runtime` was getting pretty long (> 2k LOC) and much of the RPC
low-level machinery is fairly isolated to a handful of task-funcs, it
makes sense to re-org the RPC task scheduling and driving msg loop to
its own code space.

The move includes:
- `process_messages()` which is the main IPC business logic.
- `try_ship_error_to_remote()` helper, to box local errors for the wire.
- `_invoke()`, the core task scheduler entrypoing used in the msg loop.
- `_invoke_non_context()`, holds impls for non-`@context` task starts.
- `_errors_relayed_via_ipc()` which does all error catch-n-boxing for
   wire-msg shipment using `try_ship_error_to_remote()` internally.

Also inside `._runtime` improve some `Actor` methods docs.
2024-03-13 15:57:15 -04:00
Tyler Goodlet 58cc57a422 Move `Portal.open_context()` impl to `._context`
Finally, since normally you need the content from `._context.Context`
and surroundings in order to effectively grok `Portal.open_context()`
anyways, might as well move the impl to the ctx module as
`open_context_from_portal()` and just bind it on the `Portal` class def.

Associated/required tweaks:
- avoid circ import on `.devx` by only import
  `.maybe_wait_for_debugger()` when debug mode is set.
- drop `async_generator` usage, not sure why this hadn't already been
  changed to `contextlib`?
- use `@acm` alias throughout `._portal`
2024-03-13 12:09:38 -04:00
Tyler Goodlet da913ef2bb Attempt at better internal traceback hiding
Previously i was trying to approach this using lots of
`__tracebackhide__`'s in various internal funcs but since it's not
exactly straight forward to do this inside core deps like `trio` and the
stdlib, it makes a bit more sense to optionally catch and re-raise
certain classes of errors from their originals using `raise from` syntax
as per:
https://docs.python.org/3/library/exceptions.html#exception-context

Deats:
- litter `._context` methods with `__tracebackhide__`/`hide_tb` which
  were previously being shown but that don't need to be to application
  code now that cancel semantics testing is finished up.
- i originally did the same but later commented it all out in `._ipc`
  since error catch and re-raise instead in higher level layers
  (above the transport) seems to be a much saner approach.
- add catch-n-reraise-from in `MsgStream.send()`/.`receive()` to avoid
  seeing the depths of `trio` and/or our `._ipc` layers on comms errors.

Further this patch adds some refactoring to use the
same remote-error shipper routine from both the actor-core in the RPC
invoker:
- rename it as `try_ship_error_to_remote()` and call it from
  `._invoke()` as well as it's prior usage.
- make it optionally accept `cid: str` a `remote_descr: str` and of
  course a `hide_tb: bool`.

Other misc tweaks:
- add some todo notes around `Actor.load_modules()` debug hooking.
- tweak the zombie reaper log msg and timeout value ;)
2024-03-13 10:44:51 -04:00
Tyler Goodlet 96992bcbb9 Add (back) a `tractor._testing` sub-pkg
Since importing from our top level `conftest.py` is not scaleable
or as "future forward thinking" in terms of:
- LoC-wise (it's only one file),
- prevents "external" (aka non-test) example scripts from importing
  content easily,
- seemingly(?) can't be used via abs-import if using
  a `[tool.pytest.ini_options]` in a `pyproject.toml` vs.
  a `pytest.ini`, see:
  https://docs.pytest.org/en/8.0.x/reference/customize.html#pyproject-toml)

=> Go back to having an internal "testing" pkg like `trio` (kinda) does.

Deats:
- move generic top level helpers into pkg-mod including the new
  `expect_ctxc()` (which i needed in the advanced faults testing script.
- move `@tractor_test` into `._testing.pytest` sub-mod.
- adjust all the helper imports to be a `from tractor._testing import <..>`

Rework `test_ipc_channel_break_during_stream()` and backing script:
- make test(s) pull `debug_mode` from new fixture (which is now
  controlled manually from `--tpdb` flag) and drop the previous
  parametrized input.
- update logic in ^ test for "which-side-fails" cases to better match
  recently updated/stricter cancel/failure semantics in terms of
  `ClosedResouruceError` vs. `EndOfChannel` expectations.
- handle `ExceptionGroup`s with expected embedded errors in test.
- better pendantics around whether to expect a user simulated KBI.
- for `examples/advanced_faults/ipc_failure_during_stream.py` script:
  - generalize ipc breakage in new `break_ipc()` with support for diff
    internal `trio` methods and a #TODO for future disti frameworks
  - only make one sub-actor task break and the other just stream.
  - use new `._testing.expect_ctxc()` around ctx block.
  - add a bit of exception handling with `print()`s around ctxc (unused
    except if 'msg' break method is set) and eoc cases.
  - don't break parent side ipc in loop any more then once
    after first break, checked via flag var.
  - add a `pre_close: bool` flag to control whether
    `MsgStreama.aclose()` is called *before* any ipc breakage method.

Still TODO:
- drop `pytest.ini` and add the alt section to `pyproject.py`.
 -> currently can't get `--rootdir=` opt to work.. not showing in
   console header.
 -> ^ also breaks on 'tests' `enable_modules` imports in subactors
   during discovery tests?
2024-03-13 09:09:08 -04:00
Tyler Goodlet 6533285d7d Add `an: ActorNursery` var placeholder for final log msg 2024-03-12 08:56:17 -04:00
Tyler Goodlet 8c39b8b124 Tweak some tests for spurious failues
With the seeming cause that some cases occasionally raise
`ExceptionGroup` instead of a (collapsed out) single error which, in
those cases at least try to check that `.exceptions` has the original
error.
2024-03-11 10:37:34 -04:00
Tyler Goodlet ededa2e88f More spaceless union type annots 2024-03-11 10:33:06 -04:00
Tyler Goodlet dd168184c3 Add a open-ctx-with-self test
Found exactly why trying this won't work when playing around with
opening workspaces in `modden` using a `Portal.open_context()` back to
the 'bigd' root actor: the RPC machinery only registers one entry in
`Actor._contexts` which will get overwritten by each task's side and
then experience race-based IPC msging errors (eg. rxing `{'started': _}`
on the callee side..). Instead make opening a ctx back to the self-actor
a runtime error describing it as an invalid op.

To match:
- add a new test `test_ctx_with_self_actor()` to the context semantics
  suite.
- tried out adding a new `side: str` to the `Actor.get_context()` (and
  callers) but ran into not being able to determine the value from in
  `._push_result()` where it's needed to figure out which side to push
  to.. So, just leaving the commented arg (passing) in the runtime core
  for now in case we can come back to trying to make it work, tho i'm
  thinking it's not the right hack anyway XD
2024-03-11 10:29:42 -04:00
Tyler Goodlet 37ee477aee Let `MsgStream.receive_nowait()` take in msg key list
Call it `allow_msg_keys: list[str] = ['yield']` and set it to accept
`['yield', 'return']` from the drain loop in `.aclose()`. Only pass the
last key error to `_raise_from_no_key_in_msg()` in the fall-through
case.

Somehow this seems to prevent all the intermittent test failures i was
seeing in local runs including when running the entire suite all in
sequence; i ain't complaining B)
2024-03-11 10:20:55 -04:00
Tyler Goodlet f067cf48a7 Unify some log msgs in `.to_asyncio`
Much like similar recent changes throughout the core, build out `msg:
str` depending on error cases and emit with `.cancel()` level as
appropes. Also mute (via level) some duplication in the cancel case
inside `_run_asyncio_task()` for console noise reduction.
2024-03-08 16:07:17 -05:00
Tyler Goodlet c56d4b0a79 Assign `ctx._local_error` ASAP from `.open_context()`
Such that `.outcome` related fields render nicely asap for logging
withing `Portal.open_context()` itself.
2024-03-08 16:03:13 -05:00
Tyler Goodlet 7cafb59ab7 Tweak `Context.repr_outcome()` for KBIs
Since apparently `str(KeyboardInterrupt()) == ''`? So instead add little
`<str> or repr(merr)` expressions throughout to avoid blank strings
rendering if various `repr()`/`.__str__()` outputs..
2024-03-08 15:46:42 -05:00
Tyler Goodlet 7458f99733 Add a `._state._runtime_vars['_registry_addrs']`
Such that it's set to whatever `Actor.reg_addrs: list[tuple]` is during
the actor's init-after-spawn guaranteeing each actor has at least the
registry infos from its parent. Ensure we read this if defined over
`_root._default_lo_addrs` in `._discovery` routines, namely
`.find_actor()` since it's the one API normally used without expecting
the runtime's `current_actor()` to be up.

Update the latest inter-peer cancellation test to use the `reg_addr`
fixture (and thus test this new runtime-vars value via `find_actor()`
usage) since it was failing if run *after* the infected `asyncio` suite
due to registry contact failure.
2024-03-08 15:34:20 -05:00
Tyler Goodlet 4c3c3e4b56 Support a `._state.last_actor()` getter
Not sure if it's really that useful other then for reporting errors from
`current_actor()` but at least it alerts `tractor` devs and/or users
when the runtime has already terminated vs. hasn't been started
yet/correctly.

Set the `._last_actor_terminated: tuple` in the root's final block which
allows testing for an already terminated tree which is the case where
`._state._current_actor == None` and the last is set.
2024-03-08 14:11:17 -05:00
Tyler Goodlet b29d33d603 Make `Actor._cancel_task(requesting_uid: tuple)` required arg 2024-03-08 14:03:18 -05:00
Tyler Goodlet 1617e0ff2c Woops, fix one last `ctx._cancelled_caught` in drain loop 2024-03-08 13:48:35 -05:00
Tyler Goodlet c025761f15 Adjust `asyncio` test for stricter ctx-self-cancels
Use `expect_ctx()` around the portal cancellation case, toss in
a `'context'` parametrization and return just the `Context.outcome` from
`main()` B)
2024-03-07 21:33:07 -05:00
Tyler Goodlet 2e797ef7ee Update ctx test suites to stricter semantics
Including mostly tweaking asserts on relayed `ContextCancelled`s and
the new pub ctx properties: `.outcome`, `.maybe_error`, etc. as it
pertains to graceful (absorbed) remote cancellation vs. loud ctxc cases
expected to be raised by any `Portal.cancel_actor()` style teardown.

Start checking a variety internals like `._remote/local_error`,
`._is_self_cancelled()`, `._is_final_result_set()`, `._cancel_msg`
where applicable.

Also factor out the new `expect_ctxc()` checker to our `conftest.py` for
use in other suites.
2024-03-07 21:26:57 -05:00
Tyler Goodlet c36deb1f4d Woops, fix `_post_mortem()` type sig..
We're passing a `extra_frames_up_when_async=2` now (from prior attempt
to hide `CancelScope.__exit__()` when `shield=True`) and thus both
`debug_func`s must accept it 🤦

On the brighter side found out that the `TypeError` from the call-sig
mismatch was actually being swallowed entirely so add some
`.exception()` msgs for such cases to at least alert the dev they broke
stuff XD
2024-03-07 21:24:34 -05:00
Tyler Goodlet fa7e37d6ed (Event) more pedantic `.cancel_acked: bool` def
Changes the condition logic to be more strict and moves it to a private
`._is_self_cancelled() -> bool` predicate which can be used elsewhere
(instead of having almost similar duplicate checks all over the
place..) and allows taking in a specific `remote_error` just for
verification purposes (like for tests).

Main strictness distinctions are now:
- obvi that `.cancel_called` is set (this filters any
  `Portal.cancel_actor()` or other out-of-band RPC),
- the received `ContextCancelled` **must** have its `.canceller` set to
  this side's `Actor.uid` (indicating we are the requester).
- `.src_actor_uid` **must** be the same as the `.chan.uid` (so the error
  must have originated from the opposite side's task.
- `ContextCancelled.canceller` should be already set to the `.chan.uid`
  indicating we received the msg via the runtime calling
  `._deliver_msg()` -> `_maybe_cancel_and_set_remote_error()` which
  ensures the error is specifically destined for this ctx-task exactly
  the same as how `Actor._cancel_task()` sets it from an input
  `requesting_uid` arg.

In support of the above adjust some impl deats:
- add `Context._actor: Actor` which is set once in `mk_context()` to
  avoid issues (particularly in testing) where `current_actor()` raises
  after the root actor / runtime is already exited. Use `._actor.uid` in
  both `.cancel_acked` (obvi) and '_maybe_cancel_and_set_remote_error()`
  when deciding whether to call `._scope.cancel()`.
- always cast `.canceller` to `tuple` if not null.
- delegate `.cancel_acked` directly to new private predicate (obvi).
- always set `._canceller` from any `RemoteActorError.src_actor_uid` or
  failing over to the `.chan.uid` when a non-remote error (tho that
  shouldn't ever happen right?).
- more extensive doc-string for `.cancel()` detailing the new strictness
  rules about whether an eventual `.cancel_acked` might be set.

Also tossed in even more logging format tweaks by adding a
`type_only: bool` to `.repr_outcome()` as desired for simpler output in
the `state: <outcome-repr-here>` and `.repr_rpc()` sections of the
`.__str__()`.
2024-03-07 20:35:43 -05:00
Tyler Goodlet 364ea91983 Set `._cancel_msg` to RPC `{cmd: 'self._cancel_task', ..}` msg
Like how we set `Context._cancel_msg` in `._deliver_msg()` (in
which case normally it's an `{'error': ..}` msg), do the same when any
RPC task is remotely cancelled via `Actor._cancel_task` where that task
doesn't yet have a cancel msg set yet.

This makes is much easier to distinguish between ctx cancellations due
to some remote error vs. Explicit remote requests via any of
`Actor.cancel()`, `Portal.cancel_actor()` or `Context.cancel()`.
2024-03-07 18:24:00 -05:00
Tyler Goodlet 7ae9b5319b Tweak inter-peer `._scope` state asserts
We don't expect `._scope.cancelled_caught` to be set really ever on
inter-peer cancellation since no ctx is ever cancelling itself, a peer
cancels some other and then bubbles back to all other peers.

Also add `ids: lambda` for `error_during_ctxerr_handling` param to
`test_peer_canceller()`
2024-03-06 16:09:38 -05:00
Tyler Goodlet 6156ff95f8 Add `shield: bool` support to `.pause()`
It's been on the todo for a while and I've given up trying to properly
hide the `trio.CancelScope.__exit__()` frame for now instead opting to
just `log.pdb()` a big apology XD

Users can obvi still just not use the flag and wrap `tractor.pause()` in
their own cs block if they want to avoid having to hit `'up'` in the pdb
REPL if needed in a cancelled task-scope.

Impl deatz:
- factor orig `.pause()` impl into new `._pause()` so that we can more tersely
  wrap the original content depending on `shield: bool` input; only open
  the cancel-scope when shield is set to avoid aforemented extra strack
  frame annoyance.
- pass through `shield` to underlying `_pause` and `debug_func()` so we
  can actually know when so log our apology.
- add a buncha notes to new `.pause()` wrapper regarding the inability
  to hide the cancel-scope `.__exit__()`, inluding that overriding the
  code in `trio._core._run.CancelScope` doesn't seem to solve the issue
  either..

Unrelated `maybe_wait_for_debugger()` tweaks:
- don't read `Lock.global_actor_in_debug` more then needed, rename local
  read var to `in_debug` (since it can also hold the root actor uid, not
  just sub-actors).
- shield the `await debug_complete.wait()` since ideally we avoid the
  root cancellation child-actors in debug even when the root calls this
  func in a cancelled scope.
2024-03-06 14:37:54 -05:00
Tyler Goodlet 9e3f41a5b1 Tweak inter-peer tests for new/refined semantics
Buncha subtle details changed mostly to do with when `Context.cancel()`
gets called on "real" remote errors vs. (peer requested) cancellation
and then local side handling of `ContextCancelled`.

Specific changes to make tests pass:
- due to raciness with `sleeper_ctx.result()` raising the ctxc locally
  vs. the child-peers receiving similar ctxcs themselves (and then
  erroring and propagating back to the root parent), we might not see
  `._remote_error` set during the sub-ctx loops (except for the sleeper
  itself obvi).
- do not expect `.cancel_called`/`.cancel_caught` to be set on any
  sub-ctx since currently `Context.cancel()` is only called non-shielded
  and thus is not in invoked when `._scope.cancel()` is called as part
  of each root-side ctx ref/block handling the inter-peer ctxc.
- do not expect `Context._scope.cancelled_caught` to be set in most cases
  (even the sleeper)

TODO Outstanding adjustments not fixed yet:
-[ ] `_scope.cancelled_caught` checks outside the `.open_context()`
  blocks.
2024-03-06 10:13:41 -05:00
Tyler Goodlet 7c22f76274 Yahh, add `.devx` package to installed subpkgs.. 2024-03-06 09:55:05 -05:00
Tyler Goodlet 04c99c2749 Woops, add `.msg` sub-pkg to install set 2024-03-06 09:48:46 -05:00
Tyler Goodlet e536057fea `._entry`: use same msg info in start/terminate log 2024-03-05 12:30:34 -05:00
Tyler Goodlet c6b4da5788 Tweak `._portal` log content to use `Context.repr_outcome()` 2024-03-05 12:26:33 -05:00
Tyler Goodlet 1f7f84fdfa Mk debugger tests work for arbitrary pre-REPL format
Since this was changed as part of overall project wide logging format
updates, and i ended up changing the both the crash and pause `.pdb()`
msgs to include some multi-line-ascii-"stuff", might as well make the
pre-prompt checks in the test suite more flexible to match.

As such, this exposes 2 new constants inside the `.devx._debug` mod:
- `._pause_msg: str` for the pre `tractor.pause()` header emitted via
  `log.pdb()` and,
- `._crash_msg: str` for the pre `._post_mortem()` equiv when handling
  errors in debug mode.

Adjust the test suite to use these values and thus make us more capable
to absorb changes in the future as well:
- add a new `in_prompt_msg()` predicate, very similar to `assert_before()`
  but minus `assert`s which takes in a `parts: list[str]` to match
  in the pre-prompt stdout.
- delegate to `in_prompt_msg()` in `assert_before()` since it was mostly
  duplicate minus `assert`.
- adjust all previous `<patt> in before` asserts to instead use
  `in_prompt_msg()` with separated pre-prompt-header vs. actor-name
  `parts`.
- use new `._pause/crash_msg` values in all such calls including any
  `assert_before()` cases.
2024-03-05 12:22:04 -05:00
Tyler Goodlet a5bdc6db66 Flip rpc tests over to use `ExceptionGroup` on new `trio` 2024-03-05 10:34:32 -05:00
Tyler Goodlet 9a18b57d38 Mega-refactor on `._invoke()` targeting `@context`s
Since eventually we want to implement all other RPC "func types" as
contexts underneath this starts the rework to move all the other cases
into a separate func not only to simplify the main `._invoke()` body but
also as a reminder of the intention to do it XD

Details of re-factor:
- add a new `._invoke_non_context()` which just moves all the old blocks
  for non-context handling to a single def.
- factor what was basically just the `finally:` block handler (doing all
  the task bookkeeping) into a new `@acm`: `_errors_relayed_via_ipc()`
  with that content packed into the post-`yield` (also with a `hide_tb:
  bool` flag added of course).
  * include a `debug_kbis: bool` for when needed.
- since the `@context` block is the only type left in the main
  `_invoke()` body, de-dent it so it's more grok-able B)

Obviously this patch also includes a few improvements regarding
context-cancellation-semantics (for the `context` RPC case) on the
callee side in order to match previous changes to the `Context` api:
- always setting any ctxc as the `Context._local_error`.
- using the new convenience `.maybe_raise()` topically (for now).
- avoiding any previous reliance on `Context.cancelled_caught` for
  anything public of meaning.

Further included is more logging content updates:
- being pedantic in `.cancel()` msgs about whether termination is caused
  by error or ctxc.
- optional `._invoke()` traceback hiding via a `hide_tb: bool`.
- simpler log headers throughout instead leveraging new `.__repr__()` on
  primitives.
- buncha `<= <actor-uid>` sent some message emissions.
- simplified handshake statuses reporting.

Other subsys api changes we need to match:
- change to `Channel.transport`.
- avoiding any `local_nursery: ActorNursery` waiting when the
  `._implicit_runtime_started` is set.

And yes, lotsa more comments for #TODOs dawg.. since there's always
somethin!
2024-03-02 22:12:00 -05:00
Tyler Goodlet ed10632d97 Avoid `ctx.cancel()` after ctxc rxed in `.open_context()`
In the case where the callee side delivers us a ctxc with `.canceller`
set we can presume that remote cancellation already has taken place and
thus we don't need to do the normal call-`Context.cancel()`-on-error
step. Further, in the case where we do call it also handle any
`trio.CloseResourceError` gracefully with a `.warning()`.

Also, originally I had added a post-`yield`-maybe-raise to attempt
handling any remote ctxc the same as for the local case (i.e. raised
from `yield` line) wherein if we get a remote ctxc the same handler
branch-path would trigger, thus avoiding different behaviour in that
case. I ended up masking it out (but can't member why.. ) as it seems
the normal `.result()` call and its internal handling gets the same
behaviour? I've left in the heavily commented code in case it ends up
being the better way to go; likely making the move to having a single
code in both cases is better even if it is just a matter of deciding
whether to swallow the ctxc or not in the `.cancel_acked` case.

Further teensie improvements:
- obvi improve/simplify log msg contents as in prior patches.
- use the new `maybe_wait_for_debugger(header_msg: str)` if/when waiting
  to exit in debug mode.
- another `hide_tb: bool` frame hider flag.
- rando type-annot updates of course :)
2024-03-02 17:18:55 -05:00
Tyler Goodlet 299429a278 Deep `Context` refinements
Spanning from the pub API, to instance `repr()` customization (for
logging/REPL content), to the impl details around the notion of a "final
outcome" and surrounding IPC msg draining mechanics during teardown.

A few API and field updates:

- new `.cancel_acked: bool` to replace what we were mostly using
  `.cancelled_caught: bool` for but, for purposes of better mapping the
  semantics of remote cancellation of parallel executing tasks; it's set
  only when `.cancel_called` is set and a ctxc arrives with
  a `.canceller` field set to the current actor uid indicating we
  requested and received acknowledgement from the other side's task
  that is cancelled gracefully.

- strongly document and delegate (and prolly eventually remove as a pub
  attr) the `.cancelled_caught` property entirely to the underlying
  `._scope: trio.CancelScope`; the `trio` semantics don't really map
  well to the "parallel with IPC msging"  case in the sense that for
  us it breaks the concept of the ctx/scope closure having "caught"
  something instead of having "received" a msg that the other side has
  "acknowledged" (i.e. which for us is the completion of cancellation).

- new `.__repr__()`/`.__str__()` format that tries to tersely yet
  comprehensively as possible display everything you need to know about
  the 3 main layers of an SC-linked-IPC-context:
  * ipc: the transport + runtime layers net-addressing and prot info.
  * rpc: the specific linked caller-callee task signature details
    including task and msg-stream instances.
  * state: current execution and final outcome state of the task pair.
  * a teensie extra `.repr_rpc` for a condensed rpc signature.

- new `.dst_maddr` to get a `libp2p` style "multi-address" (though right
  now it's just showing the transport layers so maybe we should move to
  to our `Channel`?)

- new public instance-var fields supporting more granular remote
  cancellation/result/error state:
  * `.maybe_error: Exception|None` for any final (remote) error/ctxc
    which computes logic on the values of `._remote_error`/`._local_error`
    to determine the "final error" (if any) on termination.
  * `.outcome` to the final error or result (or `None` if un-terminated)
  * `.repr_outcome()` for a console/logging friendly version of the
    final result or error as needed for the `.__str__()`.

- new private interface bits to support all of ^:
  * a new "no result yet" sentinel value, `Unresolved`, using a module
    level class singleton that `._result` is set too (instead of
    `id(self)`) to both determine if and present when no final result
    from the callee has-yet-been/was delivered (ever).
    => really we should get rid of `.result()` and change it to
    `.wait_for_result()` (or something)u
  * `_final_result_is_set()` predicate to avoid waiting for an already
    delivered result.
  * `._maybe_raise()` proto-impl that we should use to replace all the
    `if re:` blocks it can XD
  * new `._stream: MsgStream|None` for when a stream is opened to aid
    with the state repr mentioned above.

Tweaks to the termination drain loop `_drain_to_final_msg()`:

- obviously (obvi) use all the changes above when determining whether or
  not a "final outcome" has arrived and thus breaking from the loop ;)
  * like the `.outcome` `.maybe_error`  and `._final_ctx_is_set()` in
    the `while` pred expression.

- drop the `_recv_chan.receive_nowait()` + guard logic since it seems
  with all the surrounding (and coming soon) changes to
  `Portal.open_context()` using all the new API stuff (mentioned in
  first bullet set above) we never hit the case of inf-block?

Oh right and obviously a ton of (hopefully improved) logging msg content
changes, commented code removal and detailed comment-docs strewn about!
2024-03-01 22:37:32 -05:00
Tyler Goodlet 28fefe4ffe Make stream draining status logs `.debug()` level 2024-03-01 19:27:10 -05:00
Tyler Goodlet 08a6a51cb8 Add `._implicit_runtime_started` mark, better logs
After some deep logging improvements to many parts of `._runtime`,
I realized a silly detail where we are always waiting on any opened
`local_nursery: ActorNursery` to signal exit from
`Actor._stream_handler()` even in the case of being an implicitly opened
root actor (`open_root_actor()` wasn't called by user/app code) via
`._supervise.open_nursery()`..

So, to address this add a `ActorNursery._implicit_runtime_started: bool`
that can be set and then checked to avoid doing the unnecessary
`.exited.wait()` (and any subsequent warn logging on an exit timeout) in
that special but most common case XD

Matching with other subsys log format refinements, improve readability
and simplicity of the actor-nursery supervisory log msgs, including:
- simplify and/or remove any content that more or less duplicates msg
  content found in emissions from lower-level primitives and sub-systems
  (like `._runtime`, `_context`, `_portal` etc.).
- add a specific `._open_and_supervise_one_cancels_all_nursery()`
  handler block for `ContextCancelled` to log with `.cancel()` level
  noting that the case is a "remote cancellation".
- put the nursery-exit and actor-tree shutdown status into a single msg
  in the `implicit_runtime` case.
2024-03-01 15:44:01 -05:00
Tyler Goodlet 50465d4b34 Spawn naming and log format tweaks
- rename `.soft_wait()` -> `.soft_kill()`
- rename `.do_hard_kill()` -> `.hard_kill()`
- adjust any `trio.Process.__repr__()` log msg contents to have the
  little tree branch prefix: `'|_'`
2024-03-01 11:37:23 -05:00
Tyler Goodlet 4f69af872c Add field-first subproca `.info()` to `._entry` 2024-02-29 20:01:39 -05:00
Tyler Goodlet 9bc6a61c93 Add "fancier" remote-error `.__repr__()`-ing
Our remote error box types `RemoteActorError`, `ContextCancelled` and
`StreamOverrun` needed a console display makeover particularly for
logging content and `repr()` in higher level primitives like `Context`.

This adds a more "dramatic" str-representation to showcase the
underlying boxed traceback content more sensationally (via ascii-art
emphasis) as well as support a more terse `.reprol()` (representation
for one-line) format that can be used for types that track remote
errors/cancels like with `Context._remote_error`.

Impl deats:
- change `RemoteActorError.__repr__()` formatting to show (sub-type
  specific) `.msgdata` fields in a multi-line format (similar to our new
  `.msg.types.Struct` style) followed by some ascii accented delimiter
  lines to emphasize any `.msgdata["tb_str"]` packed by the remote
- for rme and subtypes allow picking the specifically relevant fields
  via a type defined `.reprol_fields: list[str]` and pick for each
  subtype:
   |_ `RemoteActorError.src_actor_uid`
   |_ `ContextCancelled.canceller`
   |_ `StreamOverrun.sender`

- add `.reprol()` to show a `repr()`-on-one-line formatted string that
  can be used by other multi-line-field-`repr()` styled composite types
  as needed in (high level) logging info.
- toss in some mod level `_body_fields: list[str]` for summary of such
  fields (if needed).
- add some new rae (remote-actor-error) props:
  - `.type` around a newly named `.boxed_type`
  - `.type_str: str`
  - `.tb_str: str`
2024-02-29 18:56:31 -05:00
Tyler Goodlet 23aa97692e Fix `Channel.__repr__()` safety, renames to `._transport`
Hit a reallly weird bug in the `._runtime` IPC msg handling loop where
it seems that by `str.format()`-ing a `Channel` before initializing it
would put the `._MsgTransport._agen()` in an already started state
causing an irrecoverable core startup failure..

I presume it's something to do with delegating to the
`MsgpackTCPStream.__repr__()` and, something something.. the
`.set_msg_transport(stream)` getting called to too early such that
`.msgstream.__init__()` is called thus init-ing the `._agen()` before
necessary? I'm sure there's a design lesson to be learned in here
somewhere XD

This was discovered while trying to add more "fancy" logging throughout
said core for the purposes of cobbling together an init attempt at
libp2p style multi-address representations for our IPC primitives. Thus
I also tinker here with adding some new fields to `MsgpackTCPStream`:
- `layer_key`: int = 4
- `name_key`: str = 'tcp'
- `codec_key`: str = 'msgpack'

Anyway, just changed it so that if `.msgstream` ain't set then we just
return a little "null repr" `str` value thinger.

Also renames `Channel.msgstream` internally to `._transport` with
appropriate pub `@property`s added such that everything else won't break
;p

Also drops `Optional` typing vis-a-vi modern union syntax B)
2024-02-29 18:37:04 -05:00
Tyler Goodlet 1e5810e56c Make `NamespacePath` kinda support methods..
Obviously we can't deterministic-ally call `.load_ref()` (since you'd
have to point to an `id()` or something and presume a particular
py-runtime + virt-mem space for it to exist?) but it at least helps with
the `str` formatting for logging purposes (like `._cancel_rpc_tasks()`)
when `repr`-ing ctxs and their specific "rpc signatures".

Maybe in the future getting this working at least for singleton types
per process (like `Actor` XD ) will be a thing we can support and make
some sense of.. Bo
2024-02-29 17:37:02 -05:00
Tyler Goodlet b54cb6682c Add #TODO for generating func-sig type-annots as `str` for pprinting 2024-02-29 17:21:43 -05:00
Tyler Goodlet 3ed309f019 Add test for `modden` sub-spawner-server hangs on cancel
As per a lot of the recent refinements to `Context` cancellation, add
a new test case to replicate the original hang-on-cancel found with
`modden` when using a client actor to spawn a subactor in some other
tree where despite `Context.cancel()` being called the requesting client
would hang on the opened context with the server.

The specific scenario added here is to have,
- root actor spawns 2 children: a client and a spawn server.
- the spawn server opens with a spawn-request serve loop and begins to
  wait for the client.
- client spawns and connects to the sibling spawn server, requests to
  spawn a sub-actor, the "little bro", connects to it then does some
  echo streaming, cancels the request with it's sibling (the spawn
  server) which should in turn cancel the root's-grandchild and result
  in a cancel-ack back to the client's `.open_context()`.
- root ensures that it can also connect to the grandchild (little bro),
  do the same echo streaming, then ensure everything tears down
  correctly after cancelling all the children.

More refinements to come here obvi in the specific cancellation
semantics and possibly causes.

Also tweaks the other tests in suite to use the new `Context` properties
recently introduced and similarly updated in the previous patch to the
ctx-semantics suite.
2024-02-29 15:45:55 -05:00
Tyler Goodlet d08aeaeafe Make `@context`-cancelled tests more pedantic
In order to match a very significant and coming-soon patch set to the
IPC `Context` and `Channel` cancellation semantics with significant but
subtle changes to the primitives and runtime logic:

- a new set of `Context` state pub meth APIs for checking exact
  inter-actor-linked-task outcomes such as `.outcome`, `.maybe_error`,
  and `.cancel_acked`.

- trying to move away from `Context.cancelled_caught` usage since the
  semantics from `trio` don't really map well (in terms of cancel
  requests and how they result in cancel-scope graceful closure) and
  `.cancel_acked: bool` is a better approach for IPC req-resp msging.
  - change test usage to access `._scope.cancelled_caught` directly.

- more pedantic ctxc-raising expects around the "type of self
  cancellation" and final outcome in ctxc cases:
  - `ContextCancelled` is raised by ctx (`Context.result()`) consumer
    methods when `Portal.cancel_actor()` is called (since it's an
    out-of-band request) despite `Channel._cancel_called` being set.
  - also raised by `.open_context().__aexit__()` on close.
  - `.outcome` is always `.maybe_error` is always one of
    `._local/remote_error`.
2024-02-28 19:25:27 -05:00
Tyler Goodlet c6ee4e5dc1 Add a `pytest.ini` config 2024-02-22 20:37:12 -05:00
Tyler Goodlet ad5eee5666 WIP final impl of ctx-cancellation-semantics 2024-02-22 18:33:18 -05:00
Tyler Goodlet fc72d75061 Support `maybe_wait_for_debugger(header_msg: str)`
Allow callers to stick in a header to the `.pdb()` level emitted msg(s)
such that any "waiting status" content is only shown if the caller
actually get's blocked waiting for the debug lock; use it inside the
`._spawn` sub-process reaper call.

Also, return early if `Lock.global_actor_in_debug == None` and thus
only enter the poll loop when actually needed, consequently raise
if we fall through the loop without acquisition.
2024-02-22 15:08:10 -05:00
Tyler Goodlet de1843dc84 Few more log msg tweaks in runtime 2024-02-22 15:06:39 -05:00
Tyler Goodlet 930d498841 Call `actor.cancel(None)` from root to avoid mismatch with (any future) meth sig changes 2024-02-22 14:45:08 -05:00
Tyler Goodlet 5ea112699d Tweak broadcast fanout test to never inf loop
Since a bug in the new `MsgStream.aclose()` impl's drain block logic was
triggering an actual inf loop (by not ever canceller the streamer child
actor), make sure we put a loop limit on the `inf_streamer`()` XD

Also add a bit more deats to the test `print()`s in each actor and toss
in `debug_mode` fixture support.
2024-02-22 14:41:28 -05:00
Tyler Goodlet e244747bc3 Add note that maybe `Context._eoc` should be set by caller? 2024-02-22 14:22:45 -05:00
Tyler Goodlet 5a09ccf459 Tweak `Actor` cancel method signatures
Besides improving a bunch more log msg contents similarly as before this
changes the cancel method signatures slightly with different arg names:

for `.cancel()`:
- instead of `requesting_uid: str` take in a `req_chan: Channel`
  since we can always just read its `.uid: tuple` for logging and
  further we can then offer the `chan=None` case indicating a
  "self cancel" (since there's no "requesting channel").
- the semantics of "requesting" here better indicate that the IPC connection
  is an IPC peer and further (eventually) will allow permission checking
  against given peers for cancellation requests.
- when `chan==None` we also define a meth-internal `requester_type: str`
  differently for logging content :)
- add much more detailed `.cancel()` content around the requester, its
  type, and any debugger related locking steps.

for `._cancel_task()`:
- change the `chan` arg to `parent_chan: Channel` since "parent"
  correctly indicates that the channel is the parent of the locally
  spawned rpc task to cancel; in fact no other chan should be able to
  cancel tasks parented/spawned by other channels obvi!
- also add more extensive meth-internal `.cancel()` logging with a #TODO
  around showing only the "relevant/lasest" `Context` state vars in such
  logging content.

for `.cancel_rpc_tasks()`:
- shorten `requesting_uid` -> `req_uid`.
- add `parent_chan: Channel` to be similar as above in `._cancel_task()`
  (since it's internally delegated to anyway) which replaces the prior
  `only_chan` and use it to filter to only tasks spawned by this channel
  (thus as their "parent") as before.
- instead of `if tasks:` to enter, invert and `return` early on
  `if not tasks`, for less indentation B)
- add WIP str-repr format (for `.cancel()` emissions) to show
  a multi-address (maddr) + task func (via the new `Context._nsf`) and
  report all cancel task targets with it a "tree"; include #TODO to
  finalize and implement some utils for all this!

To match ensure we adjust `process_messages()` self/`Actor` cancel
handling blocks to provide the new `kwargs` (now with `dict`-merge
syntax) to `._invoke()`.
2024-02-22 14:22:08 -05:00
Tyler Goodlet ce1bcf6d36 Fix overruns test to avoid return-beats-ctxc race
Turns out that py3.11 might be so fast that iterating a EoC-ed
`MsgStream` 1k times is faster then a `Context.cancel()` msg
transmission from a parent actor to it's child (which i guess makes
sense). So tweak the test to delay 5ms between stream async-for iteration
attempts when the stream is detected to be `.closed: bool` (coming in
patch) or `ctx.cancel_called == true`.
2024-02-21 13:53:25 -05:00
Tyler Goodlet 28ba5e5435 Add `pformat()` of `ActorNursery._children` to logging
Such that you see the children entries prior to exit instead of the
prior somewhat detail/use-less logging. Also, rename all `anursery` vars
to just `an` as is the convention in most examples.
2024-02-21 13:21:28 -05:00
Tyler Goodlet 10adf34be5 Set any `._eoc` to the err in `_raise_from_no_key_in_msg()`
Since that's what we're now doing in `MsgStream._eoc` internal
assignments (coming in future patch), do the same in this exception
re-raise-helper and include more extensive doc string detailing all
the msg-type-to-raised-error cases. Also expose a `hide_tb: bool` like
we have already in `unpack_error()`.
2024-02-21 13:17:37 -05:00
Tyler Goodlet 82dcaff8db Better logging for cancel requests in IPC msg loop
As similarly improved in other parts of the runtime, adds much more
pedantic (`.cancel()`) logging content to indicate the src of remote
cancellation request particularly for `Actor.cancel()` and
`._cancel_task()` cases prior to `._invoke()` task scheduling. Also add
detailed case comments and much more info to the
"request-to-cancel-already-terminated-RPC-task" log emission to include
the `Channel` and `Context.cid` deats.

This helped me find the src of a race condition causing a test to fail
where a callee ctx task was returning a result *before* an expected
`ctx.cancel()` request arrived B). Adding much more pedantic
`.cancel()` msg contents around the requester's deats should ensure
these cases are much easier to detect going forward!

Also, simplify the `._invoke()` final result/error log msg to only put
*one of either* the final error or returned result above the `Context`
pprint.
2024-02-21 13:05:22 -05:00
Tyler Goodlet 621b252b0c Use `NamespacePath` in `Context` mgmt internals
The only case where we can't is in `Portal.run_from_ns()` usage (since we
pass a path with `self:<Actor.meth>`) and because `.to_tuple()`
internally uses `.load_ref()` which will of course fail on such a path..

So or now impl as,
- mk `Actor.start_remote_task()` take a `nsf: NamespacePath` but also
  offer a `load_nsf: bool = False` such that by default we bypass ref
  loading (maybe this is fine for perf long run as well?) for the
  `Actor`/'self:'` case mentioned above.
- mk `.get_context()` take an instance `nsf` obvi.

More logging msg format tweaks:
- change msg-flow related content to show the `Context._nsf`, which,
  right, is coming follow up commit..
- bunch more `.runtime()` format updates to show `msg: dict` contents
  and internal primitives with trailing `'\n'` for easier reading.
- report import loading `stackscope` in subactors.
2024-02-20 16:15:48 -05:00
Tyler Goodlet 20a089c331 Drop extra "
" when logging actor nursery errors
2024-02-20 15:58:11 -05:00
Tyler Goodlet df50d78042 Fix `.devx.maybe_wait_for_debugger()` polling deats
When entered by the root actor avoid excessive polling cycles by,
- blocking on the `Lock.no_remote_has_tty: trio.Event` and breaking
  *immediately* when set (though we should really also lock
  it from the root right?) to avoid extra loops..
- shielding the `await trio.sleep(poll_delay)` call to avoid any local
  cancellation causing the (presumably root-actor task) caller to move
  on (possibly to cancel its children) and instead to continue
  poll-blocking until the lock is actually released by its user.
- `break` the poll loop immediately if no remote locker is detected.
- use `.pdb()` level for reporting lock state changes.

Also add a #TODO to handle calls by non-root actors as it pertains to
2024-02-20 15:57:31 -05:00
Tyler Goodlet 114ec36436 Add `stackscope` as dep, drop legacy `pdb` issue cruft 2024-02-20 15:29:31 -05:00
Tyler Goodlet 179d7d2b04 Add `NamespacePath._ns` todo for `self:<ns.meth>` support 2024-02-20 15:28:11 -05:00
Tyler Goodlet f568fca98f Emit warning on any `ContextCancelled.canceller == None` 2024-02-20 15:26:14 -05:00
Tyler Goodlet 6c9bc627d8 Make ctx tests support `debug_mode: bool` fixture
Such that with `--tpdb` passed (sub)actors will engage the `pdbp` REPL
automatically and so that we can use the new `stackscope` support when
complex cases hang Bo

Also,
- simplified some type-annots (ns paths),
- doc-ed an inter-peer test func with some ascii msg flows,
- added a bottom #TODO for replicating the scenario i hit in `modden`
  where a separate client actor-tree was hanging on cancelling a `bigd`
  sub-workspace..
2024-02-20 15:14:58 -05:00
Tyler Goodlet 1d7cf7d1dd Enable `stackscope` render via root in debug mode
If `stackscope` is importable and debug_mode is enabled then we by
default call and report `.devx.enable_stack_on_sig()` is set B)

This makes debugging unexpected (SIGINT ignoring) hangs a cinch!
2024-02-20 13:23:16 -05:00
Tyler Goodlet 54a0a0000d .log: more multi-line styling 2024-02-20 13:22:44 -05:00
Tyler Goodlet 0268b2ce91 Better subproc supervisor logging, todo for #320
Given i just similarly revamped a buncha `._runtime` log msg formatting,
might as well do something similar inside the spawning machinery such
that groking teardown sequences of each supervising task is much more
sane XD

Mostly this includes doing similar `'<field>: <value>\n'` multi-line
formatting when reporting various subproc supervision steps as well as
showing a detailed `trio.Process.__repr__()` as appropriate.

Also adds a detailed #TODO according to the needs of #320 for which
we're going to need some internal mechanism for intermediary parent
actors to determine if a given debug tty locker (sub-actor) is one of
*their* (transitive) children and thus stall the normal
cancellation/teardown sequence until that locker is complete.
2024-02-20 13:12:51 -05:00
Tyler Goodlet 81f8e2d4ac _supervise: iter nice expanded multi-line `._children` tups with typing 2024-02-20 09:18:22 -05:00
Tyler Goodlet bf0739c194 Add `stackscope` tree pprinter triggered by SIGUSR1
Can be optionally enabled via a new `enable_stack_on_sig()` which will
swap in the SIGUSR1 handler. Much thanks to @oremanj for writing this
amazing project, it's thus far helped me fix some very subtle hangs
inside our new IPC-context cancellation machinery that would have
otherwise taken much more manual pdb-ing and hair pulling XD

Full credit for `dump_task_tree()` goes to the original project author
with some minor tweaks as was handed to me via the trio-general matrix
room B)

Slight changes from orig version:
- use a `log.pdb()` emission to pprint to console
- toss in an ex sh CLI cmd to trigger the dump from another terminal
  using `kill` + `pgrep`.
2024-02-20 09:05:34 -05:00
Tyler Goodlet 5fe3f58ea9 Add a `debug_mode: bool` fixture via `--tpdb` flag
Allows tests (including any `@tractor_test`s) to subscribe to a CLI flag
`--tpdb` (for "tractor python debugger") which the session can provide
to tests which can then proxy the value to `open_root_actor()` (via
`open_nursery()`) when booting the runtime - thus enabling our debug
mode globally to any subscribers B)

This is real handy if you have some failures but can't determine the
root issue without jumping into a `pdbp` REPL inside a (sub-)actor's
spawned-task.
2024-02-20 08:53:37 -05:00
Tyler Goodlet 3e1d033708 WIP: solved the modden client hang.. 2024-02-19 17:00:46 -05:00
Tyler Goodlet c35576e196 Baboso! fix `chan.send(None)` indent.. 2024-02-19 14:41:03 -05:00
Tyler Goodlet 8ce26d692f Improved log msg formatting in core
As part of solving some final edge cases todo with inter-peer remote
cancellation (particularly a remote cancel from a separate actor
tree-client hanging on the request side in `modden`..) I needed less
dense, more line-delimited log msg formats when understanding ipc
channel and context cancels from console logging; this adds a ton of
that to:
- `._invoke()` which now does,
  - better formatting of `Context`-task info as multi-line
    `'<field>: <value>\n'` messages,
  - use of `trio.Task` (from `.lowlevel.current_task()` for full
    rpc-func namespace-path info,
  - better "msg flow annotations" with `<=` for understanding
    `ContextCancelled` flow.
- `Actor._stream_handler()` where in we break down IPC peers reporting
  better as multi-line `|_<Channel>` log msgs instead of all jammed on
  one line..
- `._ipc.Channel.send()` use `pformat()` for repr of packet.

Also tweak some optional deps imports for debug mode:
- add `maybe_import_gb()` for attempting to import `greenback`.
- maybe enable `stackscope` tree pprinter on `SIGUSR1` if installed.

Add a further stale-debugger-lock guard before removal:
- read the `._debug.Lock.global_actor_in_debug: tuple` uid and possibly
  `maybe_wait_for_debugger()` when the child-user is known to have
  a live process in our tree.
- only cancel `Lock._root_local_task_cs_in_debug: CancelScope` when
  the disconnected channel maps to the `Lock.global_actor_in_debug`,
  though not sure this is correct yet?

Started adding missing type annots in sections that were modified.
2024-02-19 14:00:23 -05:00
Tyler Goodlet 7f29fd8dcf Let `pack_error()` take a msg injected `cid: str|None` 2024-02-18 17:17:31 -05:00
Tyler Goodlet 7fbada8a15 Add `StreamOverrun.sender: tuple` for better handling
Since it's generally useful to know who is the cause of an overrun (say
bc you want your system to then adjust the writer side to slow tf down)
might as well pack an extra `.sender: tuple[str, str]` actor uid field
which can be relayed through `RemoteActorError` boxing. Add an extra
case for the exc-type to `unpack_error()` to match B)
2024-02-16 15:23:02 -05:00
Tyler Goodlet 286e75d342 Offer `unpack_error(hid_tb: bool)` for `pdbp` REPL config 2024-02-14 16:13:32 -05:00
Tyler Goodlet df641d9d31 Bring in pretty-ified `msgspec.Struct` extension
Originally designed and used throughout `piker`, the subtype adds some
handy pprinting and field diffing extras often handy when viewing struct
types in logging or REPL console interfaces B)

Obvi this rejigs the `tractor.msg` mod into a sub-pkg and moves the
existing namespace obj-pointer stuff into a new `.msg.ptr` sub mod.
2024-01-28 16:33:10 -05:00
Tyler Goodlet 35b0c4bef0 Never mask original `KeyError` in portal-error unwrapper, for now? 2024-01-23 11:14:10 -05:00
Tyler Goodlet c4496f21fc Try allowing multi-pops of `_Cache.locks` for now? 2024-01-23 11:13:07 -05:00
Tyler Goodlet 7e0e627921 Use `import <blah> as blah` over `__all__` in `.trionics` 2024-01-23 11:09:38 -05:00
Tyler Goodlet 28ea8e787a Bump timeout on resource cache test a bitty bit. 2024-01-03 22:27:05 -05:00
Tyler Goodlet 0294455c5e `_root`: drop unused `typing` import 2024-01-02 18:43:43 -05:00
Tyler Goodlet 734bc09b67 Move missing-key-in-msg raiser to `._exceptions`
Since we use basically the exact same set of logic in
`Portal.open_context()` when expecting the first `'started'` msg factor
and generalize `._streaming._raise_from_no_yield_msg()` into a new
`._exceptions._raise_from_no_key_in_msg()` (as per the lingering todo)
which obvi requires a more generalized / optional signature including
a caller specific `log` obj. Obvi call the new func from all the other
modules X)
2024-01-02 18:34:15 -05:00
Tyler Goodlet 0bcdea28a0 Fmt repr as multi-line style call 2024-01-02 11:28:55 -05:00
Tyler Goodlet fdf3a1b01b Only use `greenback` if actor-runtime is up.. 2024-01-02 11:28:02 -05:00
Tyler Goodlet ce7b8a5e18 Drop unused walrus assign of `re` 2024-01-02 11:21:20 -05:00
Tyler Goodlet 00024181cd `StackLevelAdapter._log(stacklevel: int)` for custom levels..
Apparently (and i don't know if this was always broken [i feel like no?]
or is a recent change to stdlib's `logging` stuff) we need increment the
`stacklevel` input by one for our custom level methods now? Without this
you're going to see the path to the method's-callstack-frame on every
emission instead of to the caller's. I first noticed this when debugging
the workspace layer spawning in `modden.bigd` and then verified it in
other depended projects..

I guess we should add some tests for this as well XD
2024-01-02 10:38:04 -05:00
Tyler Goodlet 814384848d Use `import <name> as <name>,` style over `__all__` in pkg mod 2024-01-02 10:25:17 -05:00
Tyler Goodlet bea31f6d19 ._child: remove some unused imports.. 2024-01-02 10:24:39 -05:00
Tyler Goodlet 250275d98d Guarding for IPC failures in `._runtime._invoke()`
Took me longer then i wanted to figure out the source of
a failed-response to a remote-cancellation (in this case in `modden`
where a client was cancelling a workspace layer.. but disconnects before
receiving the ack msg) that was triggering an IPC error when sending the
error msg for the cancellation of a `Actor._cancel_task()`, but since
this (non-rpc) `._invoke()` task was trying to send to a now
disconnected canceller it was resulting in a `BrokenPipeError` (or similar)
error.

Now, we except for such IPC errors and only raise them when,
1. the transport `Channel` is for sure up (bc ow what's the point of
   trying to send an error on the thing that caused it..)
2. it's definitely for handling an RPC task

Similarly if the entire main invoke `try:` excepts,
- we only hide the call-stack frame from the debugger (with
  `__tracebackhide__: bool`) if it's an RPC task that has a connected
  channel since we always want to see the frame when debugging internal
  task or IPC failures.
- we don't bother trying to send errors to the context caller (actor)
  when it's a non-RPC request since failures on actor-runtime-internal
  tasks shouldn't really ever be reported remotely, only maybe raised
  locally.

Also some other tidying,
- this properly corrects for the self-cancel case where an RPC context
  is cancelled due to a local (runtime) task calling a method like
  `Actor.cancel_soon()`. We now set our own `.uid` as the
  `ContextCancelled.canceller` value so that other-end tasks know that
  the cancellation was due to a self-cancellation by the actor itself.
  We still need to properly test for this though!
- add a more detailed module doc-str.
- more explicit imports for `trio` core types throughout.
2024-01-02 10:23:45 -05:00
Tyler Goodlet f415fc43ce `.discovery.get_arbiter()`: add warning around this now deprecated usage 2023-12-11 19:37:45 -05:00
Tyler Goodlet 3f15923537 More thurough hard kill doc strings 2023-12-11 18:17:42 -05:00
Tyler Goodlet 87cd725adb Add `open_root_actor(ensure_registry: bool)`
Allows forcing the opened actor to either obtain the passed registry
addrs or raise a runtime error.
2023-11-07 16:45:24 -05:00
Tyler Goodlet 48accbd28f Fix doc string "its" typo.. 2023-11-06 15:44:21 -05:00
Tyler Goodlet 227c9ea173 Test with `any(portals)` since `gather_contexts()` will return `list[None | tuple]` 2023-11-06 15:43:43 -05:00
Tyler Goodlet d651f3d8e9 Tons of interpeer test cleanup
Drop all the nested `@acm` blocks and defunct comments from initial
validations. Add some todos for cases that are still unclear such as
whether the caller / streamer should have `.cancelled_caught == True` in
it's teardown.
2023-10-25 15:21:41 -04:00
Tyler Goodlet ef0cfc4b20 Get inter-peer suite passing with all `Context` state checks!
Definitely needs some cleaning and refinement but this gets us to stage
1 of being pretty frickin correct i'd say 💃
2023-10-23 18:24:23 -04:00
Tyler Goodlet ecb525a2bc Adjust test details where `Context.cancel()` is called
We can now make asserts on `.cancelled_caught` and `_remote_error` vs.
`_local_error`. Expect a runtime error when `Context.open_stream()` is
called AFTER `.cancel()` and the remote `ContextCancelled` hasn't
arrived (yet). Adjust to `'itself'` string in self-cancel case.
2023-10-23 17:49:02 -04:00
Tyler Goodlet b77d123edd Fix `Context.result()` call to be in runtime scope 2023-10-23 17:48:34 -04:00
Tyler Goodlet f4e63465de Tweak `Channel._cancel_called` comment 2023-10-23 17:47:55 -04:00
Tyler Goodlet df31047ecb Be ultra-correct in `Portal.open_context()`
This took way too long to get right but hopefully will give us grok-able
and correct context exit semantics going forward B)

The main fixes were:
- always shielding the `MsgStream.aclose()` call on teardown to avoid
  bubbling a `Cancelled`.
- properly absorbing any `ContextCancelled` in cases due to "self
  cancellation" using the new `Context.canceller` in the logic.
- capturing any error raised by the `Context.result()` call in the
  "normal exit, result received" case and setting it as the
  `Context._local_error` so that self-cancels can be easily measured via
  `Context.cancelled_caught` in same way as remote-error caused
  cancellations.
- extremely detailed comments around all of the cancellation-error cases
  to avoid ever getting confused about the control flow in the future XD
2023-10-23 17:34:28 -04:00
Tyler Goodlet 131674eabd Be mega-pedantic with `ContextCancelled` semantics
As part of extremely detailed inter-peer-actor testing, add much more
granular `Context` cancellation state tracking via the following (new)
fields:
- `.canceller: tuple[str, str]` the uuid of the actor responsible for
  the cancellation condition - always set by
  `Context._maybe_cancel_and_set_remote_error()` and replaces
  `._cancelled_remote` and `.cancel_called_remote`. If set, this value
  should normally always match a value from some `ContextCancelled`
  raised or caught by one side of the context.
- `._local_error` which is always set to the locally raised (and caller
  or callee task's scope-internal) error which caused any
  eventual cancellation/error condition and thus any closure of the
  context's per-task-side-`trio.Nursery`.
- `.cancelled_caught: bool` is now always `True` whenever the local task
  catches (or "silently absorbs") a `ContextCancelled` (a `ctxc`) that
  indeed originated from one of the context's linked tasks or any other
  context which raised its own `ctxc` in the current `.open_context()` scope.
  => whenever there is a case that no `ContextCancelled` was raised
  **in** the `.open_context().__aexit__()` (eg. `ctx.result()` called
  after a call `ctx.cancel()`), we still consider the context's as
  having "caught a cancellation" since the `ctxc` was indeed silently
  handled by the cancel requester; all other error cases are already
  represented by mirroring the state of the `._scope: trio.CancelScope`
  => IOW there should be **no case** where an error is **not raised** in
  the context's scope and `.cancelled_caught: bool == False`, i.e. no
  case where `._scope.cancelled_caught == False and ._local_error is not
  None`!
- always raise any `ctxc` from `.open_stream()` if `._cancel_called ==
  True` - if the cancellation request has not already resulted in
  a `._remote_error: ContextCancelled` we raise a `RuntimeError` to
  indicate improper usage to the guilty side's task code.
- make `._maybe_raise_remote_err()` a sync func and don't raise
  any `ctxc` which is matched against a `.canceller` determined to
  be the current actor, aka a "self cancel", and always set the
  `._local_error` to any such `ctxc`.
- `.side: str` taken from inside `.cancel()` and unused as of now since
  it might be better re-written as a similar `.is_opener() -> bool`?
- drop unused `._started_received: bool`..
- TONS and TONS of detailed comments/docs to attempt to explain all the
  possible cancellation/exit cases and how they should exhibit as either
  silent closes or raises from the `Context` API!

Adjust the `._runtime._invoke()` code to match:
- use `ctx._maybe_raise_remote_err()` in `._invoke()`.
- adjust to new `.canceller` property.
- more type hints.
- better `log.cancel()` msging around self-cancels vs. peer-cancels.
- always set the `._local_error: BaseException` for the "callee" task
  just like `Portal.open_context()` now will do B)

Prior we were raising any `Context._remote_error` directly and doing
(more or less) the same `ContextCancelled` "absorbing" logic (well
kinda) in block; instead delegate to the method
2023-10-23 16:24:54 -04:00
Tyler Goodlet 5a94e8fb5b Raise a `MessagingError` from the src error on msging edge cases 2023-10-23 14:34:12 -04:00
Tyler Goodlet 0518b3ab04 Move `MessagingError` into `._exceptions` set 2023-10-23 14:17:36 -04:00
Tyler Goodlet 2f0bed3018 Ignore `greenback` import error if not installed 2023-10-19 12:41:15 -04:00
Tyler Goodlet 9da3b63644 Change remaining internals to use `Actor.reg_addrs` 2023-10-19 12:40:37 -04:00
Tyler Goodlet 1d6f55543d Expose per-actor registry addrs via `.reg_addrs`
Since it's handy to be able to debug the *writing* of this instance var
(particularly when checking state passed down to a child in
`Actor._from_parent()`), rename and wrap the underlying
`Actor._reg_addrs` as a settable `@property` and add validation to
the `.setter` for sanity - actor discovery is a critical functionality.

Other tweaks:
- fix `.cancel_soon()` to pass expected argument..
- update internal runtime error message to be simpler and link to GH issues.
- use new `Actor.reg_addrs` throughout core.
2023-10-19 12:38:27 -04:00
Tyler Goodlet a3ed30e62b Get remaining suites passing..
..by ensuring `reg_addr` fixture value passthrough to subactor eps
2023-10-19 11:51:47 -04:00
Tyler Goodlet 42d621bba7 Always dynamically re-read the `._root._default_lo_addrs` value in `find_actor()` 2023-10-18 19:10:04 -04:00
Tyler Goodlet 2e81ccf5b4 Dump `.msgdata` in `RemoteActorError.__repr__()` 2023-10-18 19:09:07 -04:00
Tyler Goodlet 022bf8ce75 Ensure `registry_addrs` is always set to something 2023-10-18 19:08:35 -04:00
Tyler Goodlet 0e9457299c Port all tests to new `reg_addr` fixture name 2023-10-18 15:39:20 -04:00
Tyler Goodlet 6b1ceee19f Type out the full-fledged streaming ex. 2023-10-18 15:36:00 -04:00
Tyler Goodlet 1e689ee701 Rename fixture `arb_addr` -> `reg_addr` and set the session value globally as `._root._default_lo_addrs` 2023-10-18 15:35:35 -04:00
Tyler Goodlet 190845ce1d Add masked super timeout line to `do_hard_kill()` for would-be runtime hackers 2023-10-18 15:29:43 -04:00
Tyler Goodlet 0c74b04c83 Facepalm, `wait_for_actor()` dun take an addr `list`.. 2023-10-18 15:22:54 -04:00
Tyler Goodlet 215fec1d41 Change old `._debug._pause()` name, cherry to #362 re `greenback` 2023-10-18 15:01:04 -04:00
Tyler Goodlet fcc8cee9d3 ._root: set a `_default_lo_addrs` and apply it when not provided by caller 2023-10-18 14:12:58 -04:00
Tyler Goodlet ca3f7a1b6b Add a first serious inter-peer remote cancel suite
Tests that appropriate `Context` exit state, the relay of
a `ContextCancelled` error and its `.canceller: tuple[str, str]` value
are set when an inter-peer cancellation happens via an "out of band"
request method (in this case using `Portal.cancel_actor()` and that
cancellation is propagated "horizontally" to other peers. Verify that
any such cancellation scenario which also experiences an "error during
`ContextCancelled` handling" DOES NOT result in that further error being
suppressed and that the user's exception bubbles out of the
`Context.open_context()` block(s) appropriately!

Likely more tests to come as well as some factoring of the teardown
state checks where possible.

Pertains to serious testing the major work landing in #357
2023-10-18 13:59:08 -04:00
Tyler Goodlet 87c1113de4 Always set default reg addr in `find_actor()` if not defined 2023-10-18 13:20:29 -04:00
Tyler Goodlet 43b659dbe4 Tidy/clarify another `._runtime` comment 2023-10-18 13:19:34 -04:00
Tyler Goodlet 63b1488ab6 Get mega-pedantic in `Portal.open_context()`
Specifically in the `.__aexit__()` phase to ensure remote,
runtime-internal, and locally raised error-during-cancelled-handling
exceptions are NEVER masked by a local `ContextCancelled` or any
exception group of `trio.Cancelled`s.

Also adds a ton of details to doc strings including extreme detail
surrounding the `ContextCancelled` raising cases and their processing
inside `.open_context()`'s exception handler blocks.

Details, details:
- internal rename `err`/`_err` stuff to just be `scope_err` since it's
  effectively the error bubbled up from the context's surrounding (and
  cross-actor) "scope".
- always shield `._recv_chan.aclose()` to avoid any `Cancelled` from
  masking the `scope_err` with a runtime related `trio.Cancelled`.
- explicitly catch the specific set of `scope_err: BaseException` that
  we can reasonably expect to handle instead of the catch-all parent
  type including exception groups, cancels and KBIs.
2023-10-18 13:18:29 -04:00
Tyler Goodlet 7eb31f3fea Runtime import `.get_root()` in stdin hijacker to avoid import cycle 2023-10-17 16:52:31 -04:00
Tyler Goodlet 534e5d150d Drop `msg` kwarg from `Context.cancel()`
Well first off, turns out it's never used and generally speaking
doesn't seem to help much with "runtime hacking/debugging"; why would
we need to "fabricate" a msg when `.cancel()` is called to self-cancel?

Also (and since `._maybe_cancel_and_set_remote_error()` now takes an
`error: BaseException` as input and thus expects error-msg unpacking
prior to being called), we now manually set `Context._cancel_msg: dict`
just prior to any remote error assignment - so any case where we would
have fabbed a "cancel msg" near calling `.cancel()`, just do the manual
assign.

In this vein some other subtle changes:
- obviously don't set `._cancel_msg` in `.cancel()` since it's no longer
  an input.
- generally do walrus-style `error := unpack_error()` before applying
  and setting remote error-msg state.
- always raise any `._remote_error` in `.result()` instead of returning
  the exception instance and check before AND after the underlying mem
  chan read.
- add notes/todos around `raise self._remote_error from None` masking of
  (runtime) errors in `._maybe_raise_remote_err()` and use it inside
  `.result()` since we had the inverse duplicate logic there anyway..

Further, this adds and extends a ton of (internal) interface docs and
details comments around the `Context` API including many subtleties
pertaining to calling `._maybe_cancel_and_set_remote_error()`.
2023-10-17 16:50:52 -04:00
Tyler Goodlet e4a6223256 `._exceptions`: typing and error unpacking updates
Bump type annotations to 3.10+ style throughout module as well as fill
out doc strings a bit. Inside `unpack_error()` pop any `error_dict: dict`
and,
- return `None` early if not found,
- versus pass directly as `**error_dict` to the error constructor
  instead of a double field read.
2023-10-16 16:23:30 -04:00
Tyler Goodlet ab2664da70 Runtime level log on debug REPL exits 2023-10-16 15:46:21 -04:00
Tyler Goodlet ae326cbb9a Ignore kbis in `open_crash_handler()` by default 2023-10-16 15:45:34 -04:00
Tyler Goodlet 07cec02303 Add comments around diff between `C/context` refs 2023-10-16 15:45:02 -04:00
Tyler Goodlet 2fdb8fc25a Factor non-yield stream msg processing into helper
Since both `MsgStream.receive()` and `.receive_nowait()` need the same
raising logic when a non-stream msg arrives (so that maybe an
appropriate IPC translated error can be raised) move the `KeyError`
handler code into a new `._streaming._raise_from_no_yield_msg()` func
and call it from both methods to make the error-interface-raising
symmetrical across both methods.
2023-10-16 15:35:16 -04:00
Tyler Goodlet 6d951c526a Comment all `.pause(shield=True)` attempts again, need to solve cancel scope `.__exit__()` frame hiding issue.. 2023-10-10 09:55:11 -04:00
Tyler Goodlet 575a24adf1 Always raise remote (cancelled) error if set
Previously we weren't raising a remote error if the local scope was
cancelled during a call to `Context.result()` which is problematic if
the caller WAS NOT the requester for said remote cancellation; in that
case we still want a `ContextCancelled` raised with the `.canceller:
str` set to the cancelling actor uid.

Further fix a naming bug where the (seemingly older) `._remote_err` was
being set to such an error instead of `._remote_error` XD
2023-10-10 09:45:49 -04:00
Tyler Goodlet 919e462f88 Write more comprehensive `Portal.cancel_actor()` doc str 2023-10-08 15:57:18 -04:00
Tyler Goodlet a09b8560bb Oof, default reg addrs needs to be in `list[tuple]` form.. 2023-10-07 18:52:37 -04:00
Tyler Goodlet c4cd573b26 Drop pause line from ctx cancel handler block in test 2023-10-07 18:51:59 -04:00
Tyler Goodlet d24a9e158f Msg-ified `ContextCancelled`s sub-error type should always be just, its type.. 2023-10-07 18:51:03 -04:00
Tyler Goodlet 18a1634025 Add shielding support to `.pause()`
Implement it like you'd expect using simply a wrapping
`trio.CancelScope` which is itself shielded by the input `shield: bool`
B)

There's seemingly still some issues with the frame selection when the
REPL engages and not sure how to resolve it yet but at least this does
indeed work for practical purposes. Still needs a test obviously!
2023-10-06 15:49:23 -04:00
Tyler Goodlet 78c0d2b234 Start inter-peer cancellation test mod
Move over relevant test from the "context semantics" test module which
was already verifying peer-caused-`ContextCancelled.canceller: tuple`
error info and propagation during an inter-peer cancellation scenario.

Also begin a more general set of inter-peer cancellation tests starting
with the simplest case where when a peer is cancelled the parent should
NOT get an "muted" `trio.Cancelled` and instead
a `tractor.ContextCancelled` with a `.canceller: tuple` which points to
the sibling actor which requested the peer cancel.
2023-10-06 15:44:26 -04:00
Tyler Goodlet 4314a59327 Add post-mortem catch around failed transport addr binds to aid with runtime debugging 2023-10-03 10:54:46 -04:00
Tyler Goodlet e94f1261b5 Move `maybe_open_crash_handler()` CLI `--pdb`-driven wrapper to debug mod 2023-10-02 18:10:34 -04:00
Tyler Goodlet 86da79a854 Rename to `parse_maddr()` and fill out doc strings 2023-09-29 14:49:18 -04:00
Tyler Goodlet de89e3a9c4 Add libp2p style "multi-address" parser from `piker`
Details are in the module docs; this is a first draft with lotsa room
for refinement and extension.
2023-09-29 14:11:31 -04:00
Tyler Goodlet 7bed470f5c Start `.devx.cli` extensions for pop CLI frameworks
Starting of with just a `typer` (and thus transitively `click`)
`typer.Typer.callback` hook which allows passthrough of the `--ll
<loglevel: str>` and `--pdb <debug_mode: bool>` flags for use when
building CLIs that use the runtime Bo

Still needs lotsa refinement and obviously better docs but, the doc
string for `load_runtime_vars()` shows how to use the underlying
`.devx._debug.open_crash_handler()` via a wrapper that can be passed the
`--pdb` flag and then enable debug mode throughout the entire actor
system.
2023-09-28 15:36:24 -04:00
Tyler Goodlet fa9a9cfb1d Kick off `.devx` subpkg for our dev tools B)
Where `.devx` is "developer experience", a hopefully broad enough subpkg
name for all the slick stuff planned to augment working on the actor
runtime 💥

Move the `._debug` module into the new subpkg and adjust rest of core
code base to reflect import path change. Also add a new
`.devx._debug.open_crash_handler()` manager for wrapping any sync code
outside a `trio.run()` which is handy for eventual CLI addons for
popular frameworks like `click`/`typer`.
2023-09-28 14:14:50 -04:00
Tyler Goodlet 3d0e95513c Init-support for "multi homed" transports
Since we'd like to eventually allow a diverse set of transport
(protocol) methods and stacks, and a multi-peer discovery system for
distributed actor-tree applications, this reworks all runtime internals
to support multi-homing for any given tree on a logical host. In other
words any actor can now bind its transport server (currently only
unsecured TCP + `msgspec`) to more then one address available in its
(linux) network namespace. Further, registry actors (now dubbed
"registars" instead of "arbiters") can also similarly bind to multiple
network addresses and provide discovery services to remote actors via
multiple addresses which can now be provided at runtime startup.

Deats:
- adjust `._runtime` internals to use a `list[tuple[str, int]]` (and
  thus pluralized) socket address sequence where applicable for transport
  server socket binds, now exposed via `Actor.accept_addrs`:
  - `Actor.__init__()` now takes a `registry_addrs: list`.
  - `Actor.is_arbiter` -> `.is_registrar`.
  - `._arb_addr` -> `._reg_addrs: list[tuple]`.
  - always reg and de-reg from all registrars in `async_main()`.
  - only set the global runtime var `'_root_mailbox'` to the loopback
    address since normally all in-tree processes should have access to
    it, right?
  - `._serve_forever()` task now takes `listen_sockaddrs: list[tuple]`
- make `open_root_actor()` take a `registry_addrs: list[tuple[str, int]]`
  and defaults when not passed.
- change `ActorNursery.start_..()` methods take `bind_addrs: list` and
  pass down through the spawning layer(s) via the parent-seed-msg.
- generalize all `._discovery()` APIs to accept `registry_addrs`-like
  inputs and move all relevant subsystems to adopt the "registry" style
  naming instead of "arbiter":
  - make `find_actor()` support batched concurrent portal queries over
    all provided input addresses using `.trionics.gather_contexts()` Bo
  - syntax: move to using `async with <tuples>` 3.9+ style chained
    @acms.
  - a general modernization of the code to a python 3.9+ style.
  - start deprecation and change to "registry" naming / semantics:
    - `._discovery.get_arbiter()` -> `.get_registry()`
2023-09-27 16:25:21 -04:00
Tyler Goodlet ee151b00af Mk `gather_contexts()` support `@acm`s yielding `None`
We were using a `all(<yielded values>)` condition which obviously won't
work if the batched managers yield any non-truthy value. So instead see
the `unwrapped: dict` with the `id(mngrs)` and only unblock once all
values have been filled in to be something that is not that value.
2023-09-27 14:05:22 -04:00
Tyler Goodlet 22c14e235e Expose `Channel` @ pkg level, drop `_debug.pp()` alias 2023-08-18 10:18:25 -04:00
Tyler Goodlet 1102843087 Teensie tidy up on actor doc string 2023-08-18 10:10:36 -04:00
Tyler Goodlet e03bec5efc Move `.to_asyncio` to modern optional value type annots 2023-07-21 15:08:46 -04:00
Tyler Goodlet bee2c36072 Make `NamespacePath` work on object refs
Detect if the input ref is a non-func (like an `object` instance) in
which case grab its type name using `type()`. Wrap all the name-getting
into a new `_mk_fqpn()` static meth: gets the "fully qualified path
name" and returns path and name in tuple; port other methds to use it.
Refine and update the docs B)
2023-07-12 13:07:30 -04:00
Tyler Goodlet b36b3d522f Map `breakpoint()` built-in to new `.pause_from_sync()` ep 2023-07-07 15:35:52 -04:00
Tyler Goodlet 4ace8f6037 Fix frame-selection display on first REPL entry
For whatever reason pdb(p), and in general, will show the frame of the
*next* python instruction/LOC on initial entry (at least using
`.set_trace()`), as such remove the `try/finally` block in the sync
code entrypoint `.pause_from_sync()`, and also since doesn't seem like
we really need it anyway.

Further, and to this end:
- enable hidden frames support in our default config.
- fix/drop/mask all the frame ref-ing/mangling we had prior since it's no
  longer needed as well as manual `Lock` releasing which seems to work
  already by having the `greenback` spawned task do it's normal thing?
- move to no `Union` type annots.
- hide all frames that can add "this is the runtime confusion" to
  traces.
2023-07-07 14:51:44 -04:00
Tyler Goodlet 98a7326c85 ._runtime: log level tweaks, use crit for stale debug lock detection 2023-07-07 14:49:23 -04:00
Tyler Goodlet 46972df041 .log: more correct handling for `get_logger(__name__)` usage 2023-07-07 14:48:37 -04:00
Tyler Goodlet 565d7c3ee5 Add longer "required reading" list B) 2023-07-07 14:47:42 -04:00
Tyler Goodlet ac695a05bf Updates from latest `piker.data._sharedmem` changes 2023-06-22 17:16:17 -04:00
Tyler Goodlet fc56971a2d First proto: use `greenback` for sync func breakpointing
This works now for supporting a new `tractor.pause_from_sync()`
`tractor`-aware-replacement for `Pdb.set_trace()` from sync functions
which are also scheduled from our runtime. Uses `greenback` to do all
the magic of scheduling the bg `tractor._debug._pause()` task and
engaging the normal TTY locking machinery triggered by `await
tractor.breakpoint()`

Further this starts some public API renaming, making a switch to
`tractor.pause()` from `.breakpoint()` which IMO much better expresses
the semantics of the runtime intervention required to suffice
multi-process "breakpointing"; it also is an alternate name for the same
in computer science more generally: https://en.wikipedia.org/wiki/Breakpoint
It also avoids using the same name as the `breakpoint()` built-in which
is important since there **is alot more going on** when you call our
equivalent API.

Deats of that:
- add deprecation warning for `tractor.breakpoint()`
- add `tractor.pause()` and a shorthand, easier-to-type, alias `.pp()`
  for "pause-point" B)
- add `pause_from_sync()` as the new `breakpoint()`-from-sync-function
  hack which does all the `greenback` stuff for the user.

Still TODO:
- figure out where in the runtime and when to call
  `greenback.ensure_portal()`.
- fix the frame selection issue where
  `trio._core._ki._ki_protection_decorator:wrapper` seems to be always
  shown on REPL start as the selected frame..
2023-06-21 16:08:18 -04:00
Tyler Goodlet ee87cf0e29 Add a debug-mode-breakpoint-causes-hang case!
Only found this by luck more or less (while working on something in
a client project) and it turns out we can actually get to (yet another)
hang state where SIGINT will be ignored by the root actor on teardown..

I've added all the necessary logic flags to reproduce. We obviously need
a follow up bug issue and a test suite to replicate!

It appears as though the following are required based on very light
tinkering:
- infected asyncio mode active
- debug mode active
- the `trio` context must breakpoint *before* `.started()`-ing
- the `asyncio` must **not** error
2023-06-21 14:07:31 -04:00
Tyler Goodlet ebcb275cd8 Add (first-draft) infected-`asyncio` actor task uses debugger example 2023-06-21 14:07:31 -04:00
Tyler Goodlet f745da9fb2 Add `numpy` for testing optional integrated shm API layer 2023-06-15 12:20:20 -04:00
Tyler Goodlet 4f442efbd7 Pass `str` dtype for `use_str` case 2023-06-15 12:20:20 -04:00
Tyler Goodlet f9a84f0732 Allocate size-specced "empty" sequence from default values by type 2023-06-15 12:20:20 -04:00
Tyler Goodlet e0bf964ff0 Mod define `_USE_POSIX`, add a of of todos 2023-06-15 12:20:20 -04:00
Tyler Goodlet a9fc4c1b91 Parametrize rw test with variable frame sizes
Demonstrates fixed size frame-oriented reads by the child where the
parent only transmits a "read" stream msg on "frame fill events" such
that the child incrementally reads the shm list data (much like in
a real-time-buffered streaming system).
2023-06-15 12:20:20 -04:00
Tyler Goodlet b52ff270c5 Add `ShmList` slice support in `.__getitem__()` 2023-06-15 12:20:20 -04:00
Tyler Goodlet 1713ecd9f8 Rename token type to `NDToken` in the style of `nptyping` 2023-06-15 12:20:20 -04:00
Tyler Goodlet edb82fdd78 Don't require runtime (for now), type annot fixing 2023-06-15 12:20:20 -04:00
Tyler Goodlet 339d787cf8 Add repetitive attach to existing segment test 2023-06-15 12:20:20 -04:00
Tyler Goodlet c32b21b4b1 Add initial readers-writer shm list tests 2023-06-15 12:20:20 -04:00
Tyler Goodlet 71477290fc Add `ShmList` wrapping the stdlib's `ShareableList`
First attempt at getting `multiprocessing.shared_memory.ShareableList`
working; we wrap the stdlib type with a readonly attr and a `.key` for
cross-actor lookup. Also, rename all `numpy` specific routines to have
a `ndarray` suffix in the func names.
2023-06-15 12:20:20 -04:00
Tyler Goodlet 9716d86825 Initial module import from `piker.data._sharemem`
More or less a verbatim copy-paste minus some edgy variable naming and
internal `piker` module imports. There is a bunch of OHLC related
defaults that need to be dropped and we need to adjust to an optional
dependence on `numpy` by supporting shared lists as per the mp docs.
2023-06-15 12:20:20 -04:00
27 changed files with 1257 additions and 332 deletions

View File

@ -6,7 +6,6 @@ been an outage) and we want to ensure that despite being in debug mode
actor tree will eventually be cancelled without leaving any zombies. actor tree will eventually be cancelled without leaving any zombies.
''' '''
from contextlib import asynccontextmanager as acm
from functools import partial from functools import partial
from tractor import ( from tractor import (
@ -18,7 +17,6 @@ from tractor import (
_testing, _testing,
) )
import trio import trio
import pytest
async def break_ipc( async def break_ipc(
@ -43,13 +41,6 @@ async def break_ipc(
await stream.aclose() await stream.aclose()
method: str = method or def_method method: str = method or def_method
print(
'#################################\n'
'Simulating CHILD-side IPC BREAK!\n'
f'method: {method}\n'
f'pre `.aclose()`: {pre_close}\n'
'#################################\n'
)
match method: match method:
case 'trans_aclose': case 'trans_aclose':
@ -89,17 +80,17 @@ async def break_ipc_then_error(
break_ipc_with: str|None = None, break_ipc_with: str|None = None,
pre_close: bool = False, pre_close: bool = False,
): ):
await break_ipc(
stream=stream,
method=break_ipc_with,
pre_close=pre_close,
)
async for msg in stream: async for msg in stream:
await stream.send(msg) await stream.send(msg)
await break_ipc(
assert 0 stream=stream,
method=break_ipc_with,
pre_close=pre_close,
)
assert 0
# async def close_stream_and_error(
async def iter_ipc_stream( async def iter_ipc_stream(
stream: MsgStream, stream: MsgStream,
break_ipc_with: str|None = None, break_ipc_with: str|None = None,
@ -108,6 +99,20 @@ async def iter_ipc_stream(
async for msg in stream: async for msg in stream:
await stream.send(msg) await stream.send(msg)
# wipe out channel right before raising
# await break_ipc(
# stream=stream,
# method=break_ipc_with,
# pre_close=pre_close,
# )
# send channel close msg at SC-prot level
#
# TODO: what should get raised here if anything?
# await stream.aclose()
# assert 0
@context @context
async def recv_and_spawn_net_killers( async def recv_and_spawn_net_killers(
@ -129,16 +134,14 @@ async def recv_and_spawn_net_killers(
async for i in stream: async for i in stream:
print(f'child echoing {i}') print(f'child echoing {i}')
await stream.send(i) await stream.send(i)
if ( if (
break_ipc_after break_ipc_after
and and
i >= break_ipc_after i > break_ipc_after
): ):
n.start_soon( '#################################\n'
iter_ipc_stream, 'Simulating CHILD-side IPC BREAK!\n'
stream, '#################################\n'
)
n.start_soon( n.start_soon(
partial( partial(
break_ipc_then_error, break_ipc_then_error,
@ -146,23 +149,10 @@ async def recv_and_spawn_net_killers(
pre_close=pre_close, pre_close=pre_close,
) )
) )
n.start_soon(
iter_ipc_stream,
@acm stream,
async def stuff_hangin_ctlc(timeout: float = 1) -> None: )
with trio.move_on_after(timeout) as cs:
yield timeout
if cs.cancelled_caught:
# pretend to be a user seeing no streaming action
# thinking it's a hang, and then hitting ctl-c..
print(
f"i'm a user on the PARENT side and thingz hangin "
f'after timeout={timeout} ???\n\n'
'MASHING CTlR-C..!?\n'
)
raise KeyboardInterrupt
async def main( async def main(
@ -179,6 +169,9 @@ async def main(
) -> None: ) -> None:
# from tractor._state import _runtime_vars as rtv
# rtv['_debug_mode'] = debug_mode
async with ( async with (
open_nursery( open_nursery(
start_method=start_method, start_method=start_method,
@ -197,11 +190,10 @@ async def main(
) )
async with ( async with (
stuff_hangin_ctlc(timeout=2) as timeout,
_testing.expect_ctxc( _testing.expect_ctxc(
yay=( yay=(
break_parent_ipc_after break_parent_ipc_after
or break_child_ipc_after or break_child_ipc_after,
), ),
# TODO: we CAN'T remove this right? # TODO: we CAN'T remove this right?
# since we need the ctxc to bubble up from either # since we need the ctxc to bubble up from either
@ -213,14 +205,12 @@ async def main(
# and KBI in an eg? # and KBI in an eg?
reraise=True, reraise=True,
), ),
portal.open_context( portal.open_context(
recv_and_spawn_net_killers, recv_and_spawn_net_killers,
break_ipc_after=break_child_ipc_after, break_ipc_after=break_child_ipc_after,
pre_close=pre_close, pre_close=pre_close,
) as (ctx, sent), ) as (ctx, sent),
): ):
rx_eoc: bool = False
ipc_break_sent: bool = False ipc_break_sent: bool = False
async with ctx.open_stream() as stream: async with ctx.open_stream() as stream:
for i in range(1000): for i in range(1000):
@ -238,7 +228,6 @@ async def main(
'#################################\n' '#################################\n'
) )
# TODO: other methods? see break func above.
# await stream._ctx.chan.send(None) # await stream._ctx.chan.send(None)
# await stream._ctx.chan.transport.stream.send_eof() # await stream._ctx.chan.transport.stream.send_eof()
await stream._ctx.chan.transport.stream.aclose() await stream._ctx.chan.transport.stream.aclose()
@ -262,12 +251,10 @@ async def main(
# TODO: is this needed or no? # TODO: is this needed or no?
raise raise
# timeout: int = 1 timeout: int = 1
# with trio.move_on_after(timeout) as cs: print(f'Entering `stream.receive()` with timeout={timeout}\n')
async with stuff_hangin_ctlc() as timeout: with trio.move_on_after(timeout) as cs:
print(
f'PARENT `stream.receive()` with timeout={timeout}\n'
)
# NOTE: in the parent side IPC failure case this # NOTE: in the parent side IPC failure case this
# will raise an ``EndOfChannel`` after the child # will raise an ``EndOfChannel`` after the child
# is killed and sends a stop msg back to it's # is killed and sends a stop msg back to it's
@ -279,30 +266,23 @@ async def main(
f'{rx}\n' f'{rx}\n'
) )
except trio.EndOfChannel: except trio.EndOfChannel:
rx_eoc: bool = True
print('MsgStream got EoC for PARENT') print('MsgStream got EoC for PARENT')
raise raise
print( if cs.cancelled_caught:
'Streaming finished and we got Eoc.\n' # pretend to be a user seeing no streaming action
'Canceling `.open_context()` in root with\n' # thinking it's a hang, and then hitting ctl-c..
'CTlR-C..' print(
) f"YOO i'm a PARENT user anddd thingz hangin..\n"
if rx_eoc: f'after timeout={timeout}\n'
assert stream.closed )
try:
await stream.send(i)
pytest.fail('stream not closed?')
except (
trio.ClosedResourceError,
trio.EndOfChannel,
) as send_err:
if rx_eoc:
assert send_err is stream._eoc
else:
assert send_err is stream._closed
raise KeyboardInterrupt print(
"YOO i'm mad!\n"
'The send side is dun but thingz hangin..\n'
'MASHING CTlR-C Ctl-c..'
)
raise KeyboardInterrupt
if __name__ == '__main__': if __name__ == '__main__':

View File

@ -1,9 +0,0 @@
'''
Reproduce a bug where enabling debug mode for a sub-actor actually causes
a hang on teardown...
'''
import asyncio
import trio
import tractor

View File

@ -8,10 +8,7 @@ This uses no extra threads, fancy semaphores or futures; all we need
is ``tractor``'s channels. is ``tractor``'s channels.
""" """
from contextlib import ( from contextlib import asynccontextmanager
asynccontextmanager as acm,
aclosing,
)
from typing import Callable from typing import Callable
import itertools import itertools
import math import math
@ -19,6 +16,7 @@ import time
import tractor import tractor
import trio import trio
from async_generator import aclosing
PRIMES = [ PRIMES = [
@ -46,7 +44,7 @@ async def is_prime(n):
return True return True
@acm @asynccontextmanager
async def worker_pool(workers=4): async def worker_pool(workers=4):
"""Though it's a trivial special case for ``tractor``, the well """Though it's a trivial special case for ``tractor``, the well
known "worker pool" seems to be the defacto "but, I want this known "worker pool" seems to be the defacto "but, I want this

View File

@ -13,7 +13,7 @@ async def simple_rpc(
''' '''
# signal to parent that we're up much like # signal to parent that we're up much like
# ``trio.TaskStatus.started()`` # ``trio_typing.TaskStatus.started()``
await ctx.started(data + 1) await ctx.started(data + 1)
async with ctx.open_stream() as stream: async with ctx.open_stream() as stream:

View File

@ -1,8 +0,0 @@
# vim: ft=ini
# pytest.ini for tractor
[pytest]
# don't show frickin captured logs AGAIN in the report..
addopts = --show-capture='no'
log_cli = false
; minversion = 6.0

View File

@ -6,3 +6,4 @@ mypy
trio_typing trio_typing
pexpect pexpect
towncrier towncrier
numpy

View File

@ -46,11 +46,10 @@ setup(
# trio related # trio related
# proper range spec: # proper range spec:
# https://packaging.python.org/en/latest/discussions/install-requires-vs-requirements/#id5 # https://packaging.python.org/en/latest/discussions/install-requires-vs-requirements/#id5
'trio >= 0.24', 'trio >= 0.22',
'async_generator',
# 'async_generator', # in stdlib mostly! 'trio_typing',
# 'trio_typing', # trio==0.23.0 has type hints! 'exceptiongroup',
# 'exceptiongroup', # in stdlib as of 3.11!
# tooling # tooling
'stackscope', 'stackscope',

View File

@ -41,22 +41,43 @@ no_windows = pytest.mark.skipif(
def pytest_addoption(parser): def pytest_addoption(parser):
parser.addoption( parser.addoption(
"--ll", action="store", dest='loglevel', "--ll",
action="store",
dest='loglevel',
default='ERROR', help="logging level to set when testing" default='ERROR', help="logging level to set when testing"
) )
parser.addoption( parser.addoption(
"--spawn-backend", action="store", dest='spawn_backend', "--spawn-backend",
action="store",
dest='spawn_backend',
default='trio', default='trio',
help="Processing spawning backend to use for test run", help="Processing spawning backend to use for test run",
) )
parser.addoption(
"--tpdb", "--debug-mode",
action="store_true",
dest='tractor_debug_mode',
# default=False,
help=(
'Enable a flag that can be used by tests to to set the '
'`debug_mode: bool` for engaging the internal '
'multi-proc debugger sys.'
),
)
def pytest_configure(config): def pytest_configure(config):
backend = config.option.spawn_backend backend = config.option.spawn_backend
tractor._spawn.try_set_start_method(backend) tractor._spawn.try_set_start_method(backend)
@pytest.fixture(scope='session')
def debug_mode(request):
return request.config.option.tractor_debug_mode
@pytest.fixture(scope='session', autouse=True) @pytest.fixture(scope='session', autouse=True)
def loglevel(request): def loglevel(request):
orig = tractor.log._default_loglevel orig = tractor.log._default_loglevel

View File

@ -85,8 +85,8 @@ def test_ipc_channel_break_during_stream(
''' '''
if spawn_backend != 'trio': if spawn_backend != 'trio':
if debug_mode: # if debug_mode:
pytest.skip('`debug_mode` only supported on `trio` spawner') # pytest.skip('`debug_mode` only supported on `trio` spawner')
# non-`trio` spawners should never hit the hang condition that # non-`trio` spawners should never hit the hang condition that
# requires the user to do ctl-c to cancel the actor tree. # requires the user to do ctl-c to cancel the actor tree.
@ -107,10 +107,7 @@ def test_ipc_channel_break_during_stream(
# AND we tell the child to call `MsgStream.aclose()`. # AND we tell the child to call `MsgStream.aclose()`.
and pre_aclose_msgstream and pre_aclose_msgstream
): ):
# expect_final_exc = trio.EndOfChannel expect_final_exc = trio.EndOfChannel
# ^XXX NOPE! XXX^ since now `.open_stream()` absorbs this
# gracefully!
expect_final_exc = KeyboardInterrupt
# NOTE when ONLY the child breaks or it breaks BEFORE the # NOTE when ONLY the child breaks or it breaks BEFORE the
# parent we expect the parent to get a closed resource error # parent we expect the parent to get a closed resource error
@ -123,25 +120,11 @@ def test_ipc_channel_break_during_stream(
and and
ipc_break['break_parent_ipc_after'] is False ipc_break['break_parent_ipc_after'] is False
): ):
# NOTE: we DO NOT expect this any more since expect_final_exc = trio.ClosedResourceError
# the child side's channel will be broken silently
# and nothing on the parent side will indicate this!
# expect_final_exc = trio.ClosedResourceError
# NOTE: child will send a 'stop' msg before it breaks # if child calls `MsgStream.aclose()` then expect EoC.
# the transport channel BUT, that will be absorbed by the
# `ctx.open_stream()` block and thus the `.open_context()`
# should hang, after which the test script simulates
# a user sending ctl-c by raising a KBI.
if pre_aclose_msgstream: if pre_aclose_msgstream:
expect_final_exc = KeyboardInterrupt expect_final_exc = trio.EndOfChannel
# XXX OLD XXX
# if child calls `MsgStream.aclose()` then expect EoC.
# ^ XXX not any more ^ since eoc is always absorbed
# gracefully and NOT bubbled to the `.open_context()`
# block!
# expect_final_exc = trio.EndOfChannel
# BOTH but, CHILD breaks FIRST # BOTH but, CHILD breaks FIRST
elif ( elif (
@ -151,8 +134,12 @@ def test_ipc_channel_break_during_stream(
> ipc_break['break_child_ipc_after'] > ipc_break['break_child_ipc_after']
) )
): ):
expect_final_exc = trio.ClosedResourceError
# child will send a 'stop' msg before it breaks
# the transport channel.
if pre_aclose_msgstream: if pre_aclose_msgstream:
expect_final_exc = KeyboardInterrupt expect_final_exc = trio.EndOfChannel
# NOTE when the parent IPC side dies (even if the child's does as well # NOTE when the parent IPC side dies (even if the child's does as well
# but the child fails BEFORE the parent) we always expect the # but the child fails BEFORE the parent) we always expect the
@ -173,8 +160,7 @@ def test_ipc_channel_break_during_stream(
ipc_break['break_parent_ipc_after'] is not False ipc_break['break_parent_ipc_after'] is not False
and ( and (
ipc_break['break_child_ipc_after'] ipc_break['break_child_ipc_after']
> > ipc_break['break_parent_ipc_after']
ipc_break['break_parent_ipc_after']
) )
): ):
expect_final_exc = trio.ClosedResourceError expect_final_exc = trio.ClosedResourceError
@ -238,29 +224,25 @@ def test_stream_closed_right_after_ipc_break_and_zombie_lord_engages():
''' '''
async def main(): async def main():
with trio.fail_after(3): async with tractor.open_nursery() as n:
async with tractor.open_nursery() as n: portal = await n.start_actor(
portal = await n.start_actor( 'ipc_breaker',
'ipc_breaker', enable_modules=[__name__],
enable_modules=[__name__], )
)
with trio.move_on_after(1): with trio.move_on_after(1):
async with ( async with (
portal.open_context( portal.open_context(
break_ipc_after_started break_ipc_after_started
) as (ctx, sent), ) as (ctx, sent),
): ):
async with ctx.open_stream(): async with ctx.open_stream():
await trio.sleep(0.5) await trio.sleep(0.5)
print('parent waiting on context') print('parent waiting on context')
print( print('parent exited context')
'parent exited context\n' raise KeyboardInterrupt
'parent raising KBI..\n'
)
raise KeyboardInterrupt
with pytest.raises(KeyboardInterrupt): with pytest.raises(KeyboardInterrupt):
trio.run(main) trio.run(main)

View File

@ -8,6 +8,10 @@ import platform
import time import time
from itertools import repeat from itertools import repeat
from exceptiongroup import (
BaseExceptionGroup,
ExceptionGroup,
)
import pytest import pytest
import trio import trio
import tractor import tractor

View File

@ -6,15 +6,13 @@ sub-sub-actor daemons.
''' '''
from typing import Optional from typing import Optional
import asyncio import asyncio
from contextlib import ( from contextlib import asynccontextmanager as acm
asynccontextmanager as acm,
aclosing,
)
import pytest import pytest
import trio import trio
import tractor import tractor
from tractor import RemoteActorError from tractor import RemoteActorError
from async_generator import aclosing
async def aio_streamer( async def aio_streamer(

View File

@ -12,9 +12,11 @@ TODO:
""" """
from functools import partial from functools import partial
import itertools import itertools
# from os import path
from typing import Optional from typing import Optional
import platform import platform
import pathlib import pathlib
# import sys
import time import time
import pytest import pytest
@ -24,13 +26,13 @@ from pexpect.exceptions import (
EOF, EOF,
) )
from tractor._testing import (
examples_dir,
)
from tractor.devx._debug import ( from tractor.devx._debug import (
_pause_msg, _pause_msg,
_crash_msg, _crash_msg,
) )
from tractor._testing import (
examples_dir,
)
from conftest import ( from conftest import (
_ci_env, _ci_env,
) )

View File

@ -8,6 +8,7 @@ import builtins
import itertools import itertools
import importlib import importlib
from exceptiongroup import BaseExceptionGroup
import pytest import pytest
import trio import trio
import tractor import tractor
@ -19,8 +20,6 @@ from tractor import (
from tractor.trionics import BroadcastReceiver from tractor.trionics import BroadcastReceiver
from tractor._testing import expect_ctxc from tractor._testing import expect_ctxc
from conftest import expect_ctxc
async def sleep_and_err( async def sleep_and_err(
sleep_for: float = 0.1, sleep_for: float = 0.1,

View File

@ -64,8 +64,7 @@ async def test_lifetime_stack_wipes_tmpfile(
except ( except (
tractor.RemoteActorError, tractor.RemoteActorError,
# tractor.BaseExceptionGroup, tractor.BaseExceptionGroup,
BaseExceptionGroup,
): ):
pass pass

167
tests/test_shm.py 100644
View File

@ -0,0 +1,167 @@
"""
Shared mem primitives and APIs.
"""
import uuid
# import numpy
import pytest
import trio
import tractor
from tractor._shm import (
open_shm_list,
attach_shm_list,
)
@tractor.context
async def child_attach_shml_alot(
ctx: tractor.Context,
shm_key: str,
) -> None:
await ctx.started(shm_key)
# now try to attach a boatload of times in a loop..
for _ in range(1000):
shml = attach_shm_list(
key=shm_key,
readonly=False,
)
assert shml.shm.name == shm_key
await trio.sleep(0.001)
def test_child_attaches_alot():
async def main():
async with tractor.open_nursery() as an:
# allocate writeable list in parent
key = f'shml_{uuid.uuid4()}'
shml = open_shm_list(
key=key,
)
portal = await an.start_actor(
'shm_attacher',
enable_modules=[__name__],
)
async with (
portal.open_context(
child_attach_shml_alot,
shm_key=shml.key,
) as (ctx, start_val),
):
assert start_val == key
await ctx.result()
await portal.cancel_actor()
trio.run(main)
@tractor.context
async def child_read_shm_list(
ctx: tractor.Context,
shm_key: str,
use_str: bool,
frame_size: int,
) -> None:
# attach in child
shml = attach_shm_list(
key=shm_key,
# dtype=str if use_str else float,
)
await ctx.started(shml.key)
async with ctx.open_stream() as stream:
async for i in stream:
print(f'(child): reading shm list index: {i}')
if use_str:
expect = str(float(i))
else:
expect = float(i)
if frame_size == 1:
val = shml[i]
assert expect == val
print(f'(child): reading value: {val}')
else:
frame = shml[i - frame_size:i]
print(f'(child): reading frame: {frame}')
@pytest.mark.parametrize(
'use_str',
[False, True],
ids=lambda i: f'use_str_values={i}',
)
@pytest.mark.parametrize(
'frame_size',
[1, 2**6, 2**10],
ids=lambda i: f'frame_size={i}',
)
def test_parent_writer_child_reader(
use_str: bool,
frame_size: int,
):
async def main():
async with tractor.open_nursery(
# debug_mode=True,
) as an:
portal = await an.start_actor(
'shm_reader',
enable_modules=[__name__],
debug_mode=True,
)
# allocate writeable list in parent
key = 'shm_list'
seq_size = int(2 * 2 ** 10)
shml = open_shm_list(
key=key,
size=seq_size,
dtype=str if use_str else float,
readonly=False,
)
async with (
portal.open_context(
child_read_shm_list,
shm_key=key,
use_str=use_str,
frame_size=frame_size,
) as (ctx, sent),
ctx.open_stream() as stream,
):
assert sent == key
for i in range(seq_size):
val = float(i)
if use_str:
val = str(val)
# print(f'(parent): writing {val}')
shml[i] = val
# only on frame fills do we
# signal to the child that a frame's
# worth is ready.
if (i % frame_size) == 0:
print(f'(parent): signalling frame full on {val}')
await stream.send(i)
else:
print(f'(parent): signalling final frame on {val}')
await stream.send(i)
await portal.cancel_actor()
trio.run(main)

View File

@ -5,7 +5,7 @@ want to see changed.
''' '''
import pytest import pytest
import trio import trio
from trio import TaskStatus from trio_typing import TaskStatus
@pytest.mark.parametrize( @pytest.mark.parametrize(

View File

@ -169,7 +169,8 @@ async def _drain_to_final_msg(
# only when we are sure the remote error is # only when we are sure the remote error is
# the source cause of this local task's # the source cause of this local task's
# cancellation. # cancellation.
ctx.maybe_raise() if re := ctx._remote_error:
ctx._maybe_raise_remote_err(re)
# CASE 1: we DID request the cancel we simply # CASE 1: we DID request the cancel we simply
# continue to bubble up as normal. # continue to bubble up as normal.
@ -256,13 +257,6 @@ async def _drain_to_final_msg(
) )
# XXX fallthrough to handle expected error XXX # XXX fallthrough to handle expected error XXX
# TODO: replace this with `ctx.maybe_raise()`
#
# TODO: would this be handier for this case maybe?
# async with maybe_raise_on_exit() as raises:
# if raises:
# log.error('some msg about raising..')
re: Exception|None = ctx._remote_error re: Exception|None = ctx._remote_error
if re: if re:
log.critical( log.critical(
@ -601,7 +595,7 @@ class Context:
if not re: if not re:
return False return False
if from_uid := re.src_uid: if from_uid := re.src_actor_uid:
from_uid: tuple = tuple(from_uid) from_uid: tuple = tuple(from_uid)
our_uid: tuple = self._actor.uid our_uid: tuple = self._actor.uid
@ -831,7 +825,7 @@ class Context:
# cancellation. # cancellation.
maybe_error_src: tuple = getattr( maybe_error_src: tuple = getattr(
error, error,
'src_uid', 'src_actor_uid',
None, None,
) )
self._canceller = ( self._canceller = (
@ -868,9 +862,6 @@ class Context:
# TODO: maybe we should also call `._res_scope.cancel()` if it # TODO: maybe we should also call `._res_scope.cancel()` if it
# exists to support cancelling any drain loop hangs? # exists to support cancelling any drain loop hangs?
# NOTE: this usage actually works here B)
# from .devx._debug import breakpoint
# await breakpoint()
# TODO: add to `Channel`? # TODO: add to `Channel`?
@property @property
@ -1039,8 +1030,8 @@ class Context:
@acm @acm
async def open_stream( async def open_stream(
self, self,
allow_overruns: bool|None = False, allow_overruns: bool | None = False,
msg_buffer_size: int|None = None, msg_buffer_size: int | None = None,
) -> AsyncGenerator[MsgStream, None]: ) -> AsyncGenerator[MsgStream, None]:
''' '''
@ -1080,16 +1071,13 @@ class Context:
# absorbed there (silently) and we DO NOT want to # absorbed there (silently) and we DO NOT want to
# actually try to stream - a cancel msg was already # actually try to stream - a cancel msg was already
# sent to the other side! # sent to the other side!
self.maybe_raise( if self._remote_error:
raise_ctxc_from_self_call=True, # NOTE: this is diff then calling
) # `._maybe_raise_remote_err()` specifically
# NOTE: this is diff then calling # because any task entering this `.open_stream()`
# `._maybe_raise_remote_err()` specifically # AFTER cancellation has already been requested,
# because we want to raise a ctxc on any task entering this `.open_stream()` # we DO NOT want to absorb any ctxc ACK silently!
# AFTER cancellation was already been requested, raise self._remote_error
# we DO NOT want to absorb any ctxc ACK silently!
# if self._remote_error:
# raise self._remote_error
# XXX NOTE: if no `ContextCancelled` has been responded # XXX NOTE: if no `ContextCancelled` has been responded
# back from the other side (yet), we raise a different # back from the other side (yet), we raise a different
@ -1170,6 +1158,7 @@ class Context:
# await trio.lowlevel.checkpoint() # await trio.lowlevel.checkpoint()
yield stream yield stream
# XXX: (MEGA IMPORTANT) if this is a root opened process we # XXX: (MEGA IMPORTANT) if this is a root opened process we
# wait for any immediate child in debug before popping the # wait for any immediate child in debug before popping the
# context from the runtime msg loop otherwise inside # context from the runtime msg loop otherwise inside
@ -1194,23 +1183,12 @@ class Context:
# #
# await stream.aclose() # await stream.aclose()
# NOTE: absorb and do not raise any # if re := ctx._remote_error:
# EoC received from the other side such that # ctx._maybe_raise_remote_err(
# it is not raised inside the surrounding # re,
# context block's scope! # raise_ctxc_from_self_call=True,
except trio.EndOfChannel as eoc: # )
if ( # await trio.lowlevel.checkpoint()
eoc
and stream.closed
):
# sanity, can remove?
assert eoc is stream._eoc
# from .devx import pause
# await pause()
log.warning(
'Stream was terminated by EoC\n\n'
f'{repr(eoc)}\n'
)
finally: finally:
if self._portal: if self._portal:
@ -1226,6 +1204,7 @@ class Context:
# TODO: replace all the instances of this!! XD # TODO: replace all the instances of this!! XD
def maybe_raise( def maybe_raise(
self, self,
hide_tb: bool = True, hide_tb: bool = True,
**kwargs, **kwargs,
@ -1409,41 +1388,33 @@ class Context:
f'{drained_msgs}' f'{drained_msgs}'
) )
self.maybe_raise( if (
raise_overrun_from_self=( (re := self._remote_error)
raise_overrun # and self._result == res_placeholder
and ):
# only when we ARE NOT the canceller self._maybe_raise_remote_err(
# should we raise overruns, bc ow we're re,
# raising something we know might happen # NOTE: obvi we don't care if we
# during cancellation ;) # overran the far end if we're already
(not self._cancel_called) # waiting on a final result (msg).
# raise_overrun_from_self=False,
raise_overrun_from_self=(
raise_overrun
and
# only when we ARE NOT the canceller
# should we raise overruns, bc ow we're
# raising something we know might happen
# during cancellation ;)
(not self._cancel_called)
),
) )
)
# if (
# (re := self._remote_error)
# # and self._result == res_placeholder
# ):
# self._maybe_raise_remote_err(
# re,
# # NOTE: obvi we don't care if we
# # overran the far end if we're already
# # waiting on a final result (msg).
# # raise_overrun_from_self=False,
# raise_overrun_from_self=(
# raise_overrun
# and
# # only when we ARE NOT the canceller
# # should we raise overruns, bc ow we're
# # raising something we know might happen
# # during cancellation ;)
# (not self._cancel_called)
# ),
# )
# if maybe_err: # if maybe_err:
# self._result = maybe_err # self._result = maybe_err
return self.outcome return self.outcome
# None if self._result == res_placeholder
# else self._result
# )
# TODO: switch this with above which should be named # TODO: switch this with above which should be named
# `.wait_for_outcome()` and instead do # `.wait_for_outcome()` and instead do
@ -1892,9 +1863,8 @@ async def open_context_from_portal(
# TODO: if we set this the wrapping `@acm` body will # TODO: if we set this the wrapping `@acm` body will
# still be shown (awkwardly) on pdb REPL entry. Ideally # still be shown (awkwardly) on pdb REPL entry. Ideally
# we can similarly annotate that frame to NOT show? for now # we can similarly annotate that frame to NOT show?
# we DO SHOW this frame since it's awkward ow.. hide_tb: bool = True,
hide_tb: bool = False,
# proxied to RPC # proxied to RPC
**kwargs, **kwargs,

View File

@ -30,10 +30,11 @@ from typing import (
import textwrap import textwrap
import traceback import traceback
import exceptiongroup as eg
import trio import trio
from tractor._state import current_actor from ._state import current_actor
from tractor.log import get_logger from .log import get_logger
if TYPE_CHECKING: if TYPE_CHECKING:
from ._context import Context from ._context import Context
@ -372,6 +373,7 @@ def unpack_error(
for ns in [ for ns in [
builtins, builtins,
_this_mod, _this_mod,
eg,
trio, trio,
]: ]:
if suberror_type := getattr( if suberror_type := getattr(
@ -394,13 +396,12 @@ def unpack_error(
def is_multi_cancelled(exc: BaseException) -> bool: def is_multi_cancelled(exc: BaseException) -> bool:
''' '''
Predicate to determine if a possible ``BaseExceptionGroup`` contains Predicate to determine if a possible ``eg.BaseExceptionGroup`` contains
only ``trio.Cancelled`` sub-exceptions (and is likely the result of only ``trio.Cancelled`` sub-exceptions (and is likely the result of
cancelling a collection of subtasks. cancelling a collection of subtasks.
''' '''
# if isinstance(exc, eg.BaseExceptionGroup): if isinstance(exc, eg.BaseExceptionGroup):
if isinstance(exc, BaseExceptionGroup):
return exc.subgroup( return exc.subgroup(
lambda exc: isinstance(exc, trio.Cancelled) lambda exc: isinstance(exc, trio.Cancelled)
) is not None ) is not None

View File

@ -28,13 +28,12 @@ import os
import warnings import warnings
from exceptiongroup import BaseExceptionGroup
import trio import trio
from ._runtime import ( from ._runtime import (
Actor, Actor,
Arbiter, Arbiter,
# TODO: rename and make a non-actor subtype?
# Arbiter as Registry,
async_main, async_main,
) )
from .devx import _debug from .devx import _debug
@ -326,10 +325,10 @@ async def open_root_actor(
) as err: ) as err:
entered: bool = await _debug._maybe_enter_pm(err) entered: bool = await _debug._maybe_enter_pm(err)
if ( if (
not entered not entered
and and not is_multi_cancelled(err)
not is_multi_cancelled(err)
): ):
logger.exception('Root actor crashed:\n') logger.exception('Root actor crashed:\n')

View File

@ -21,7 +21,6 @@ Remote (task) Procedure Call (scheduling) with SC transitive semantics.
from __future__ import annotations from __future__ import annotations
from contextlib import ( from contextlib import (
asynccontextmanager as acm, asynccontextmanager as acm,
aclosing,
) )
from functools import partial from functools import partial
import inspect import inspect
@ -35,12 +34,17 @@ from typing import (
) )
import warnings import warnings
from async_generator import aclosing
from exceptiongroup import BaseExceptionGroup
import trio import trio
from trio import ( from trio import (
CancelScope, CancelScope,
Nursery, Nursery,
TaskStatus, TaskStatus,
) )
# from trio_typing import (
# TaskStatus,
# )
from .msg import NamespacePath from .msg import NamespacePath
from ._ipc import Channel from ._ipc import Channel

View File

@ -45,7 +45,6 @@ from functools import partial
from itertools import chain from itertools import chain
import importlib import importlib
import importlib.util import importlib.util
import os
from pprint import pformat from pprint import pformat
import signal import signal
import sys import sys
@ -56,11 +55,14 @@ from typing import (
) )
import uuid import uuid
from types import ModuleType from types import ModuleType
import os
import warnings import warnings
import trio import trio
from trio import ( from trio import (
CancelScope, CancelScope,
)
from trio_typing import (
Nursery, Nursery,
TaskStatus, TaskStatus,
) )
@ -78,7 +80,11 @@ from ._exceptions import (
ContextCancelled, ContextCancelled,
TransportClosed, TransportClosed,
) )
from .devx import _debug from .devx import (
# pause,
maybe_wait_for_debugger,
_debug,
)
from ._discovery import get_registry from ._discovery import get_registry
from ._portal import Portal from ._portal import Portal
from . import _state from . import _state
@ -96,7 +102,7 @@ if TYPE_CHECKING:
log = get_logger('tractor') log = get_logger('tractor')
def _get_mod_abspath(module): def _get_mod_abspath(module: ModuleType) -> str:
return os.path.abspath(module.__file__) return os.path.abspath(module.__file__)
@ -388,12 +394,6 @@ class Actor:
self._no_more_peers = trio.Event() # unset by making new self._no_more_peers = trio.Event() # unset by making new
chan = Channel.from_stream(stream) chan = Channel.from_stream(stream)
their_uid: tuple[str, str]|None = chan.uid their_uid: tuple[str, str]|None = chan.uid
if their_uid:
log.warning(
f'Re-connection from already known {their_uid}'
)
else:
log.runtime(f'New connection to us @{chan.raddr}')
con_msg: str = '' con_msg: str = ''
if their_uid: if their_uid:
@ -665,7 +665,7 @@ class Actor:
f'last disconnected child uid: {uid}\n' f'last disconnected child uid: {uid}\n'
f'locking child uid: {pdb_user_uid}\n' f'locking child uid: {pdb_user_uid}\n'
) )
await _debug.maybe_wait_for_debugger( await maybe_wait_for_debugger(
child_in_debug=True child_in_debug=True
) )
@ -717,12 +717,10 @@ class Actor:
f'|_{chan}\n' f'|_{chan}\n'
) )
try: try:
# send msg loop terminate sentinel which # send a msg loop terminate sentinel
# triggers cancellation of all remotely
# started tasks.
await chan.send(None) await chan.send(None)
# XXX: do we want this? no right? # XXX: do we want this?
# causes "[104] connection reset by peer" on other end # causes "[104] connection reset by peer" on other end
# await chan.aclose() # await chan.aclose()
@ -1212,10 +1210,10 @@ class Actor:
# - callee self raises ctxc before caller send request, # - callee self raises ctxc before caller send request,
# - callee errors prior to cancel req. # - callee errors prior to cancel req.
log.cancel( log.cancel(
'Cancel request invalid, RPC task already completed?\n\n' 'Cancel request invalid, RPC task already completed?\n'
f'<= canceller: {requesting_uid}\n\n' f'<= canceller: {requesting_uid}\n\n'
f'=> {cid}@{parent_chan.uid}\n' f'=>{parent_chan}\n'
f' |_{parent_chan}\n' f' |_ctx-id: {cid}\n'
) )
return True return True
@ -1394,8 +1392,8 @@ class Actor:
@property @property
def accept_addrs(self) -> list[tuple[str, int]]: def accept_addrs(self) -> list[tuple[str, int]]:
''' '''
All addresses to which the transport-channel server binds All addresses to which the IPC-transport-channel server
and listens for new connections. binds and listens for new connections.
''' '''
# throws OSError on failure # throws OSError on failure
@ -1514,6 +1512,7 @@ async def async_main(
): ):
accept_addrs = set_accept_addr_says_rent accept_addrs = set_accept_addr_says_rent
# The "root" nursery ensures the channel with the immediate # The "root" nursery ensures the channel with the immediate
# parent is kept alive as a resilient service until # parent is kept alive as a resilient service until
# cancellation steps have (mostly) occurred in # cancellation steps have (mostly) occurred in
@ -1567,9 +1566,9 @@ async def async_main(
# tranport address bind errors - normally it's # tranport address bind errors - normally it's
# something silly like the wrong socket-address # something silly like the wrong socket-address
# passed via a config or CLI Bo # passed via a config or CLI Bo
entered_debug: bool = await _debug._maybe_enter_pm(oserr) entered_debug = await _debug._maybe_enter_pm(oserr)
if not entered_debug: if entered_debug:
log.exception('Failed to init IPC channel server !?\n') log.runtime('Exited debug REPL..')
raise raise
accept_addrs: list[tuple[str, int]] = actor.accept_addrs accept_addrs: list[tuple[str, int]] = actor.accept_addrs

833
tractor/_shm.py 100644
View File

@ -0,0 +1,833 @@
# tractor: structured concurrent "actors".
# Copyright 2018-eternity Tyler Goodlet.
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
SC friendly shared memory management geared at real-time
processing.
Support for ``numpy`` compatible array-buffers is provided but is
considered optional within the context of this runtime-library.
"""
from __future__ import annotations
from sys import byteorder
import time
from typing import Optional
from multiprocessing import shared_memory as shm
from multiprocessing.shared_memory import (
SharedMemory,
ShareableList,
)
from msgspec import Struct
import tractor
from .log import get_logger
_USE_POSIX = getattr(shm, '_USE_POSIX', False)
if _USE_POSIX:
from _posixshmem import shm_unlink
try:
import numpy as np
from numpy.lib import recfunctions as rfn
import nptyping
except ImportError:
pass
log = get_logger(__name__)
def disable_mantracker():
'''
Disable all ``multiprocessing``` "resource tracking" machinery since
it's an absolute multi-threaded mess of non-SC madness.
'''
from multiprocessing import resource_tracker as mantracker
# Tell the "resource tracker" thing to fuck off.
class ManTracker(mantracker.ResourceTracker):
def register(self, name, rtype):
pass
def unregister(self, name, rtype):
pass
def ensure_running(self):
pass
# "know your land and know your prey"
# https://www.dailymotion.com/video/x6ozzco
mantracker._resource_tracker = ManTracker()
mantracker.register = mantracker._resource_tracker.register
mantracker.ensure_running = mantracker._resource_tracker.ensure_running
mantracker.unregister = mantracker._resource_tracker.unregister
mantracker.getfd = mantracker._resource_tracker.getfd
disable_mantracker()
class SharedInt:
'''
Wrapper around a single entry shared memory array which
holds an ``int`` value used as an index counter.
'''
def __init__(
self,
shm: SharedMemory,
) -> None:
self._shm = shm
@property
def value(self) -> int:
return int.from_bytes(self._shm.buf, byteorder)
@value.setter
def value(self, value) -> None:
self._shm.buf[:] = value.to_bytes(self._shm.size, byteorder)
def destroy(self) -> None:
if _USE_POSIX:
# We manually unlink to bypass all the "resource tracker"
# nonsense meant for non-SC systems.
name = self._shm.name
try:
shm_unlink(name)
except FileNotFoundError:
# might be a teardown race here?
log.warning(f'Shm for {name} already unlinked?')
class NDToken(Struct, frozen=True):
'''
Internal represenation of a shared memory ``numpy`` array "token"
which can be used to key and load a system (OS) wide shm entry
and correctly read the array by type signature.
This type is msg safe.
'''
shm_name: str # this servers as a "key" value
shm_first_index_name: str
shm_last_index_name: str
dtype_descr: tuple
size: int # in struct-array index / row terms
# TODO: use nptyping here on dtypes
@property
def dtype(self) -> list[tuple[str, str, tuple[int, ...]]]:
return np.dtype(
list(
map(tuple, self.dtype_descr)
)
).descr
def as_msg(self):
return self.to_dict()
@classmethod
def from_msg(cls, msg: dict) -> NDToken:
if isinstance(msg, NDToken):
return msg
# TODO: native struct decoding
# return _token_dec.decode(msg)
msg['dtype_descr'] = tuple(map(tuple, msg['dtype_descr']))
return NDToken(**msg)
# _token_dec = msgspec.msgpack.Decoder(NDToken)
# TODO: this api?
# _known_tokens = tractor.ActorVar('_shm_tokens', {})
# _known_tokens = tractor.ContextStack('_known_tokens', )
# _known_tokens = trio.RunVar('shms', {})
# TODO: this should maybe be provided via
# a `.trionics.maybe_open_context()` wrapper factory?
# process-local store of keys to tokens
_known_tokens: dict[str, NDToken] = {}
def get_shm_token(key: str) -> NDToken | None:
'''
Convenience func to check if a token
for the provided key is known by this process.
Returns either the ``numpy`` token or a string for a shared list.
'''
return _known_tokens.get(key)
def _make_token(
key: str,
size: int,
dtype: np.dtype,
) -> NDToken:
'''
Create a serializable token that can be used
to access a shared array.
'''
return NDToken(
shm_name=key,
shm_first_index_name=key + "_first",
shm_last_index_name=key + "_last",
dtype_descr=tuple(np.dtype(dtype).descr),
size=size,
)
class ShmArray:
'''
A shared memory ``numpy.ndarray`` API.
An underlying shared memory buffer is allocated based on
a user specified ``numpy.ndarray``. This fixed size array
can be read and written to by pushing data both onto the "front"
or "back" of a set index range. The indexes for the "first" and
"last" index are themselves stored in shared memory (accessed via
``SharedInt`` interfaces) values such that multiple processes can
interact with the same array using a synchronized-index.
'''
def __init__(
self,
shmarr: np.ndarray,
first: SharedInt,
last: SharedInt,
shm: SharedMemory,
# readonly: bool = True,
) -> None:
self._array = shmarr
# indexes for first and last indices corresponding
# to fille data
self._first = first
self._last = last
self._len = len(shmarr)
self._shm = shm
self._post_init: bool = False
# pushing data does not write the index (aka primary key)
self._write_fields: list[str] | None = None
dtype = shmarr.dtype
if dtype.fields:
self._write_fields = list(shmarr.dtype.fields.keys())[1:]
# TODO: ringbuf api?
@property
def _token(self) -> NDToken:
return NDToken(
shm_name=self._shm.name,
shm_first_index_name=self._first._shm.name,
shm_last_index_name=self._last._shm.name,
dtype_descr=tuple(self._array.dtype.descr),
size=self._len,
)
@property
def token(self) -> dict:
"""Shared memory token that can be serialized and used by
another process to attach to this array.
"""
return self._token.as_msg()
@property
def index(self) -> int:
return self._last.value % self._len
@property
def array(self) -> np.ndarray:
'''
Return an up-to-date ``np.ndarray`` view of the
so-far-written data to the underlying shm buffer.
'''
a = self._array[self._first.value:self._last.value]
# first, last = self._first.value, self._last.value
# a = self._array[first:last]
# TODO: eventually comment this once we've not seen it in the
# wild in a long time..
# XXX: race where first/last indexes cause a reader
# to load an empty array..
if len(a) == 0 and self._post_init:
raise RuntimeError('Empty array race condition hit!?')
# breakpoint()
return a
def ustruct(
self,
fields: Optional[list[str]] = None,
# type that all field values will be cast to
# in the returned view.
common_dtype: np.dtype = float,
) -> np.ndarray:
array = self._array
if fields:
selection = array[fields]
# fcount = len(fields)
else:
selection = array
# fcount = len(array.dtype.fields)
# XXX: manual ``.view()`` attempt that also doesn't work.
# uview = selection.view(
# dtype='<f16',
# ).reshape(-1, 4, order='A')
# assert len(selection) == len(uview)
u = rfn.structured_to_unstructured(
selection,
# dtype=float,
copy=True,
)
# unstruct = np.ndarray(u.shape, dtype=a.dtype, buffer=shm.buf)
# array[:] = a[:]
return u
# return ShmArray(
# shmarr=u,
# first=self._first,
# last=self._last,
# shm=self._shm
# )
def last(
self,
length: int = 1,
) -> np.ndarray:
'''
Return the last ``length``'s worth of ("row") entries from the
array.
'''
return self.array[-length:]
def push(
self,
data: np.ndarray,
field_map: Optional[dict[str, str]] = None,
prepend: bool = False,
update_first: bool = True,
start: int | None = None,
) -> int:
'''
Ring buffer like "push" to append data
into the buffer and return updated "last" index.
NB: no actual ring logic yet to give a "loop around" on overflow
condition, lel.
'''
length = len(data)
if prepend:
index = (start or self._first.value) - length
if index < 0:
raise ValueError(
f'Array size of {self._len} was overrun during prepend.\n'
f'You have passed {abs(index)} too many datums.'
)
else:
index = start if start is not None else self._last.value
end = index + length
if field_map:
src_names, dst_names = zip(*field_map.items())
else:
dst_names = src_names = self._write_fields
try:
self._array[
list(dst_names)
][index:end] = data[list(src_names)][:]
# NOTE: there was a race here between updating
# the first and last indices and when the next reader
# tries to access ``.array`` (which due to the index
# overlap will be empty). Pretty sure we've fixed it now
# but leaving this here as a reminder.
if (
prepend
and update_first
and length
):
assert index < self._first.value
if (
index < self._first.value
and update_first
):
assert prepend, 'prepend=True not passed but index decreased?'
self._first.value = index
elif not prepend:
self._last.value = end
self._post_init = True
return end
except ValueError as err:
if field_map:
raise
# should raise if diff detected
self.diff_err_fields(data)
raise err
def diff_err_fields(
self,
data: np.ndarray,
) -> None:
# reraise with any field discrepancy
our_fields, their_fields = (
set(self._array.dtype.fields),
set(data.dtype.fields),
)
only_in_ours = our_fields - their_fields
only_in_theirs = their_fields - our_fields
if only_in_ours:
raise TypeError(
f"Input array is missing field(s): {only_in_ours}"
)
elif only_in_theirs:
raise TypeError(
f"Input array has unknown field(s): {only_in_theirs}"
)
# TODO: support "silent" prepends that don't update ._first.value?
def prepend(
self,
data: np.ndarray,
) -> int:
end = self.push(data, prepend=True)
assert end
def close(self) -> None:
self._first._shm.close()
self._last._shm.close()
self._shm.close()
def destroy(self) -> None:
if _USE_POSIX:
# We manually unlink to bypass all the "resource tracker"
# nonsense meant for non-SC systems.
shm_unlink(self._shm.name)
self._first.destroy()
self._last.destroy()
def flush(self) -> None:
# TODO: flush to storage backend like markestore?
...
def open_shm_ndarray(
size: int,
key: str | None = None,
dtype: np.dtype | None = None,
append_start_index: int | None = None,
readonly: bool = False,
) -> ShmArray:
'''
Open a memory shared ``numpy`` using the standard library.
This call unlinks (aka permanently destroys) the buffer on teardown
and thus should be used from the parent-most accessor (process).
'''
# create new shared mem segment for which we
# have write permission
a = np.zeros(size, dtype=dtype)
a['index'] = np.arange(len(a))
shm = SharedMemory(
name=key,
create=True,
size=a.nbytes
)
array = np.ndarray(
a.shape,
dtype=a.dtype,
buffer=shm.buf
)
array[:] = a[:]
array.setflags(write=int(not readonly))
token = _make_token(
key=key,
size=size,
dtype=dtype,
)
# create single entry arrays for storing an first and last indices
first = SharedInt(
shm=SharedMemory(
name=token.shm_first_index_name,
create=True,
size=4, # std int
)
)
last = SharedInt(
shm=SharedMemory(
name=token.shm_last_index_name,
create=True,
size=4, # std int
)
)
# Start the "real-time" append-updated (or "pushed-to") section
# after some start index: ``append_start_index``. This allows appending
# from a start point in the array which isn't the 0 index and looks
# something like,
# -------------------------
# | | i
# _________________________
# <-------------> <------->
# history real-time
#
# Once fully "prepended", the history section will leave the
# ``ShmArray._start.value: int = 0`` and the yet-to-be written
# real-time section will start at ``ShmArray.index: int``.
# this sets the index to nearly 2/3rds into the the length of
# the buffer leaving at least a "days worth of second samples"
# for the real-time section.
if append_start_index is None:
append_start_index = round(size * 0.616)
last.value = first.value = append_start_index
shmarr = ShmArray(
array,
first,
last,
shm,
)
assert shmarr._token == token
_known_tokens[key] = shmarr.token
# "unlink" created shm on process teardown by
# pushing teardown calls onto actor context stack
stack = tractor.current_actor().lifetime_stack
stack.callback(shmarr.close)
stack.callback(shmarr.destroy)
return shmarr
def attach_shm_ndarray(
token: tuple[str, str, tuple[str, str]],
readonly: bool = True,
) -> ShmArray:
'''
Attach to an existing shared memory array previously
created by another process using ``open_shared_array``.
No new shared mem is allocated but wrapper types for read/write
access are constructed.
'''
token = NDToken.from_msg(token)
key = token.shm_name
if key in _known_tokens:
assert NDToken.from_msg(_known_tokens[key]) == token, "WTF"
# XXX: ugh, looks like due to the ``shm_open()`` C api we can't
# actually place files in a subdir, see discussion here:
# https://stackoverflow.com/a/11103289
# attach to array buffer and view as per dtype
_err: Optional[Exception] = None
for _ in range(3):
try:
shm = SharedMemory(
name=key,
create=False,
)
break
except OSError as oserr:
_err = oserr
time.sleep(0.1)
else:
if _err:
raise _err
shmarr = np.ndarray(
(token.size,),
dtype=token.dtype,
buffer=shm.buf
)
shmarr.setflags(write=int(not readonly))
first = SharedInt(
shm=SharedMemory(
name=token.shm_first_index_name,
create=False,
size=4, # std int
),
)
last = SharedInt(
shm=SharedMemory(
name=token.shm_last_index_name,
create=False,
size=4, # std int
),
)
# make sure we can read
first.value
sha = ShmArray(
shmarr,
first,
last,
shm,
)
# read test
sha.array
# Stash key -> token knowledge for future queries
# via `maybe_opepn_shm_array()` but only after we know
# we can attach.
if key not in _known_tokens:
_known_tokens[key] = token
# "close" attached shm on actor teardown
tractor.current_actor().lifetime_stack.callback(sha.close)
return sha
def maybe_open_shm_ndarray(
key: str, # unique identifier for segment
size: int,
dtype: np.dtype | None = None,
append_start_index: int = 0,
readonly: bool = True,
) -> tuple[ShmArray, bool]:
'''
Attempt to attach to a shared memory block using a "key" lookup
to registered blocks in the users overall "system" registry
(presumes you don't have the block's explicit token).
This function is meant to solve the problem of discovering whether
a shared array token has been allocated or discovered by the actor
running in **this** process. Systems where multiple actors may seek
to access a common block can use this function to attempt to acquire
a token as discovered by the actors who have previously stored
a "key" -> ``NDToken`` map in an actor local (aka python global)
variable.
If you know the explicit ``NDToken`` for your memory segment instead
use ``attach_shm_array``.
'''
try:
# see if we already know this key
token = _known_tokens[key]
return (
attach_shm_ndarray(
token=token,
readonly=readonly,
),
False, # not newly opened
)
except KeyError:
log.warning(f"Could not find {key} in shms cache")
if dtype:
token = _make_token(
key,
size=size,
dtype=dtype,
)
else:
try:
return (
attach_shm_ndarray(
token=token,
readonly=readonly,
),
False,
)
except FileNotFoundError:
log.warning(f"Could not attach to shm with token {token}")
# This actor does not know about memory
# associated with the provided "key".
# Attempt to open a block and expect
# to fail if a block has been allocated
# on the OS by someone else.
return (
open_shm_ndarray(
key=key,
size=size,
dtype=dtype,
append_start_index=append_start_index,
readonly=readonly,
),
True,
)
class ShmList(ShareableList):
'''
Carbon copy of ``.shared_memory.ShareableList`` with a few
enhancements:
- readonly mode via instance var flag `._readonly: bool`
- ``.__getitem__()`` accepts ``slice`` inputs
- exposes the underlying buffer "name" as a ``.key: str``
'''
def __init__(
self,
sequence: list | None = None,
*,
name: str | None = None,
readonly: bool = True
) -> None:
self._readonly = readonly
self._key = name
return super().__init__(
sequence=sequence,
name=name,
)
@property
def key(self) -> str:
return self._key
@property
def readonly(self) -> bool:
return self._readonly
def __setitem__(
self,
position,
value,
) -> None:
# mimick ``numpy`` error
if self._readonly:
raise ValueError('assignment destination is read-only')
return super().__setitem__(position, value)
def __getitem__(
self,
indexish,
) -> list:
# NOTE: this is a non-writeable view (copy?) of the buffer
# in a new list instance.
if isinstance(indexish, slice):
return list(self)[indexish]
return super().__getitem__(indexish)
# TODO: should we offer a `.array` and `.push()` equivalent
# to the `ShmArray`?
# currently we have the following limitations:
# - can't write slices of input using traditional slice-assign
# syntax due to the ``ShareableList.__setitem__()`` implementation.
# - ``list(shmlist)`` returns a non-mutable copy instead of
# a writeable view which would be handier numpy-style ops.
def open_shm_list(
key: str,
sequence: list | None = None,
size: int = int(2 ** 10),
dtype: float | int | bool | str | bytes | None = float,
readonly: bool = True,
) -> ShmList:
if sequence is None:
default = {
float: 0.,
int: 0,
bool: True,
str: 'doggy',
None: None,
}[dtype]
sequence = [default] * size
shml = ShmList(
sequence=sequence,
name=key,
readonly=readonly,
)
# "close" attached shm on actor teardown
try:
actor = tractor.current_actor()
actor.lifetime_stack.callback(shml.shm.close)
actor.lifetime_stack.callback(shml.shm.unlink)
except RuntimeError:
log.warning('tractor runtime not active, skipping teardown steps')
return shml
def attach_shm_list(
key: str,
readonly: bool = False,
) -> ShmList:
return ShmList(
name=key,
readonly=readonly,
)

View File

@ -31,24 +31,25 @@ from typing import (
TYPE_CHECKING, TYPE_CHECKING,
) )
from exceptiongroup import BaseExceptionGroup
import trio import trio
from trio import TaskStatus from trio_typing import TaskStatus
from .devx._debug import ( from .devx import (
maybe_wait_for_debugger, maybe_wait_for_debugger,
acquire_debug_lock, acquire_debug_lock,
) )
from tractor._state import ( from ._state import (
current_actor, current_actor,
is_main_process, is_main_process,
is_root_process, is_root_process,
debug_mode, debug_mode,
) )
from tractor.log import get_logger from .log import get_logger
from tractor._portal import Portal from ._portal import Portal
from tractor._runtime import Actor from ._runtime import Actor
from tractor._entry import _mp_main from ._entry import _mp_main
from tractor._exceptions import ActorFailure from ._exceptions import ActorFailure
if TYPE_CHECKING: if TYPE_CHECKING:
@ -220,10 +221,6 @@ async def hard_kill(
# whilst also hacking on it XD # whilst also hacking on it XD
# terminate_after: int = 99999, # terminate_after: int = 99999,
# NOTE: for mucking with `.pause()`-ing inside the runtime
# whilst also hacking on it XD
# terminate_after: int = 99999,
) -> None: ) -> None:
''' '''
Un-gracefully terminate an OS level `trio.Process` after timeout. Un-gracefully terminate an OS level `trio.Process` after timeout.

View File

@ -136,7 +136,7 @@ class MsgStream(trio.abc.Channel):
# return await self.receive() # return await self.receive()
# except trio.EndOfChannel: # except trio.EndOfChannel:
# raise StopAsyncIteration # raise StopAsyncIteration
#
# see ``.aclose()`` for notes on the old behaviour prior to # see ``.aclose()`` for notes on the old behaviour prior to
# introducing this # introducing this
if self._eoc: if self._eoc:
@ -152,6 +152,7 @@ class MsgStream(trio.abc.Channel):
return msg['yield'] return msg['yield']
except KeyError as kerr: except KeyError as kerr:
# log.exception('GOT KEYERROR')
src_err = kerr src_err = kerr
# NOTE: may raise any of the below error types # NOTE: may raise any of the below error types
@ -165,20 +166,30 @@ class MsgStream(trio.abc.Channel):
stream=self, stream=self,
) )
# XXX: the stream terminates on either of: # XXX: we close the stream on any of these error conditions:
# - via `self._rx_chan.receive()` raising after manual closure
# by the rpc-runtime OR,
# - via a received `{'stop': ...}` msg from remote side.
# |_ NOTE: previously this was triggered by calling
# ``._rx_chan.aclose()`` on the send side of the channel inside
# `Actor._push_result()`, but now the 'stop' message handling
# has been put just above inside `_raise_from_no_key_in_msg()`.
except ( except (
trio.EndOfChannel, # trio.ClosedResourceError, # by self._rx_chan
trio.EndOfChannel, # by self._rx_chan or `stop` msg from far end
) as eoc: ) as eoc:
# log.exception('GOT EOC')
src_err = eoc src_err = eoc
self._eoc = eoc self._eoc = eoc
# a ``ClosedResourceError`` indicates that the internal
# feeder memory receive channel was closed likely by the
# runtime after the associated transport-channel
# disconnected or broke.
# an ``EndOfChannel`` indicates either the internal recv
# memchan exhausted **or** we raisesd it just above after
# receiving a `stop` message from the far end of the stream.
# Previously this was triggered by calling ``.aclose()`` on
# the send side of the channel inside
# ``Actor._push_result()`` (should still be commented code
# there - which should eventually get removed), but now the
# 'stop' message handling has been put just above.
# TODO: Locally, we want to close this stream gracefully, by # TODO: Locally, we want to close this stream gracefully, by
# terminating any local consumers tasks deterministically. # terminating any local consumers tasks deterministically.
# Once we have broadcast support, we **don't** want to be # Once we have broadcast support, we **don't** want to be
@ -199,11 +210,8 @@ class MsgStream(trio.abc.Channel):
# raise eoc # raise eoc
# a ``ClosedResourceError`` indicates that the internal except trio.ClosedResourceError as cre: # by self._rx_chan
# feeder memory receive channel was closed likely by the # log.exception('GOT CRE')
# runtime after the associated transport-channel
# disconnected or broke.
except trio.ClosedResourceError as cre: # by self._rx_chan.receive()
src_err = cre src_err = cre
log.warning( log.warning(
'`Context._rx_chan` was already closed?' '`Context._rx_chan` was already closed?'
@ -229,30 +237,15 @@ class MsgStream(trio.abc.Channel):
# over the end-of-stream connection error since likely # over the end-of-stream connection error since likely
# the remote error was the source cause? # the remote error was the source cause?
ctx: Context = self._ctx ctx: Context = self._ctx
ctx.maybe_raise( if re := ctx._remote_error:
raise_ctxc_from_self_call=True, ctx._maybe_raise_remote_err(
) re,
raise_ctxc_from_self_call=True,
)
# propagate any error but hide low-level frame details # propagate any error but hide low-level frames from
# from the caller by default for debug noise reduction. # caller by default.
if ( if hide_tb:
hide_tb
# XXX NOTE XXX don't reraise on certain
# stream-specific internal error types like,
#
# - `trio.EoC` since we want to use the exact instance
# to ensure that it is the error that bubbles upward
# for silent absorption by `Context.open_stream()`.
and not self._eoc
# - `RemoteActorError` (or `ContextCancelled`) if it gets
# raised from `_raise_from_no_key_in_msg()` since we
# want the same (as the above bullet) for any
# `.open_context()` block bubbled error raised by
# any nearby ctx API remote-failures.
# and not isinstance(src_err, RemoteActorError)
):
raise type(src_err)(*src_err.args) from src_err raise type(src_err)(*src_err.args) from src_err
else: else:
raise src_err raise src_err
@ -377,10 +370,6 @@ class MsgStream(trio.abc.Channel):
# await rx_chan.aclose() # await rx_chan.aclose()
if not self._eoc: if not self._eoc:
log.cancel(
'Stream closed before it received an EoC?\n'
'Setting eoc manually..\n..'
)
self._eoc: bool = trio.EndOfChannel( self._eoc: bool = trio.EndOfChannel(
f'Context stream closed by {self._ctx.side}\n' f'Context stream closed by {self._ctx.side}\n'
f'|_{self}\n' f'|_{self}\n'
@ -425,11 +414,13 @@ class MsgStream(trio.abc.Channel):
@property @property
def closed(self) -> bool: def closed(self) -> bool:
if (
rxc: bool = self._rx_chan._closed (rxc := self._rx_chan._closed)
_closed: bool|Exception = self._closed or
_eoc: bool|trio.EndOfChannel = self._eoc (_closed := self._closed)
if rxc or _closed or _eoc: or
(_eoc := self._eoc)
):
log.runtime( log.runtime(
f'`MsgStream` is already closed\n' f'`MsgStream` is already closed\n'
f'{self}\n' f'{self}\n'
@ -505,11 +496,7 @@ class MsgStream(trio.abc.Channel):
''' '''
__tracebackhide__: bool = hide_tb __tracebackhide__: bool = hide_tb
# raise any alreay known error immediately
self._ctx.maybe_raise() self._ctx.maybe_raise()
if self._eoc:
raise self._eoc
if self._closed: if self._closed:
raise self._closed raise self._closed

View File

@ -26,6 +26,7 @@ from typing import TYPE_CHECKING
import typing import typing
import warnings import warnings
from exceptiongroup import BaseExceptionGroup
import trio import trio
from .devx._debug import maybe_wait_for_debugger from .devx._debug import maybe_wait_for_debugger

View File

@ -31,7 +31,7 @@ from typing import (
Callable, Callable,
) )
from functools import partial from functools import partial
from contextlib import aclosing from async_generator import aclosing
import trio import trio
import wrapt import wrapt

View File

@ -33,9 +33,10 @@ from typing import (
) )
import trio import trio
from trio_typing import TaskStatus
from tractor._state import current_actor from .._state import current_actor
from tractor.log import get_logger from ..log import get_logger
log = get_logger(__name__) log = get_logger(__name__)
@ -183,7 +184,7 @@ class _Cache:
cls, cls,
mng, mng,
ctx_key: tuple, ctx_key: tuple,
task_status: trio.TaskStatus[T] = trio.TASK_STATUS_IGNORED, task_status: TaskStatus[T] = trio.TASK_STATUS_IGNORED,
) -> None: ) -> None:
async with mng as value: async with mng as value: