1
0
Fork 0
Commit Graph

87 Commits (7eb31f3fea8ade7877fa0a423f57f69936571660)

Author SHA1 Message Date
Tyler Goodlet 919e462f88 Write more comprehensive `Portal.cancel_actor()` doc str 2023-10-08 15:57:18 -04:00
Tyler Goodlet fa9a9cfb1d Kick off `.devx` subpkg for our dev tools B)
Where `.devx` is "developer experience", a hopefully broad enough subpkg
name for all the slick stuff planned to augment working on the actor
runtime 💥

Move the `._debug` module into the new subpkg and adjust rest of core
code base to reflect import path change. Also add a new
`.devx._debug.open_crash_handler()` manager for wrapping any sync code
outside a `trio.run()` which is handy for eventual CLI addons for
popular frameworks like `click`/`typer`.
2023-09-28 14:14:50 -04:00
Tyler Goodlet 3d0e95513c Init-support for "multi homed" transports
Since we'd like to eventually allow a diverse set of transport
(protocol) methods and stacks, and a multi-peer discovery system for
distributed actor-tree applications, this reworks all runtime internals
to support multi-homing for any given tree on a logical host. In other
words any actor can now bind its transport server (currently only
unsecured TCP + `msgspec`) to more then one address available in its
(linux) network namespace. Further, registry actors (now dubbed
"registars" instead of "arbiters") can also similarly bind to multiple
network addresses and provide discovery services to remote actors via
multiple addresses which can now be provided at runtime startup.

Deats:
- adjust `._runtime` internals to use a `list[tuple[str, int]]` (and
  thus pluralized) socket address sequence where applicable for transport
  server socket binds, now exposed via `Actor.accept_addrs`:
  - `Actor.__init__()` now takes a `registry_addrs: list`.
  - `Actor.is_arbiter` -> `.is_registrar`.
  - `._arb_addr` -> `._reg_addrs: list[tuple]`.
  - always reg and de-reg from all registrars in `async_main()`.
  - only set the global runtime var `'_root_mailbox'` to the loopback
    address since normally all in-tree processes should have access to
    it, right?
  - `._serve_forever()` task now takes `listen_sockaddrs: list[tuple]`
- make `open_root_actor()` take a `registry_addrs: list[tuple[str, int]]`
  and defaults when not passed.
- change `ActorNursery.start_..()` methods take `bind_addrs: list` and
  pass down through the spawning layer(s) via the parent-seed-msg.
- generalize all `._discovery()` APIs to accept `registry_addrs`-like
  inputs and move all relevant subsystems to adopt the "registry" style
  naming instead of "arbiter":
  - make `find_actor()` support batched concurrent portal queries over
    all provided input addresses using `.trionics.gather_contexts()` Bo
  - syntax: move to using `async with <tuples>` 3.9+ style chained
    @acms.
  - a general modernization of the code to a python 3.9+ style.
  - start deprecation and change to "registry" naming / semantics:
    - `._discovery.get_arbiter()` -> `.get_registry()`
2023-09-27 16:25:21 -04:00
Tyler Goodlet ead9e418de Expose `allow_overruns` to `Portal.open_context()`
Turns out you can get a case where you might be opening multiple
ctx-streams concurrently and during the context opening phase you block
for all contexts to open, but then when you eventually start opening
streams some slow to start context has caused the others become in an
overrun state.. so we need to let the caller control whether that's an
error ;)

This also needs a test!
2023-05-15 10:00:45 -04:00
Tyler Goodlet 20d75ff934 Move move context code into new `._context` mod 2023-05-15 10:00:45 -04:00
Tyler Goodlet c72026091e Remote `Context` cancellation semantics rework B)
This adds remote cancellation semantics to our `tractor.Context`
machinery to more closely match that of `trio.CancelScope` but
with operational differences to handle the nature of parallel tasks interoperating
across multiple memory boundaries:

- if an actor task cancels some context it has opened via
  `Context.cancel()`, the remote (scope linked) task will be cancelled
  using the normal `CancelScope` semantics of `trio` meaning the remote
  cancel scope surrounding the far side task is cancelled and
  `trio.Cancelled`s are expected to be raised in that scope as per
  normal `trio` operation, and in the case where no error is raised
  in that remote scope, a `ContextCancelled` error is raised inside the
  runtime machinery and relayed back to the opener/caller side of the
  context.
- if any actor task cancels a full remote actor runtime using
  `Portal.cancel_actor()` the same semantics as above apply except every
  other remote actor task which also has an open context with the actor
  which was cancelled will also be sent a `ContextCancelled` **but**
  with the `.canceller` field set to the uid of the original cancel
  requesting actor.

This changeset also includes a more "proper" solution to the issue of
"allowing overruns" during streaming without attempting to implement any
form of IPC streaming backpressure. Implementing task-granularity
backpressure cross-process turns out to be more or less impossible
without augmenting out streaming protocol (likely at the cost of
performance). Further allowing overruns requires special care since
any blocking of the runtime RPC msg loop task effectively can block
control msgs such as cancels and stream terminations.

The implementation details per abstraction layer are as follows.

._streaming.Context:
- add a new contructor factor func `mk_context()` which provides
  a strictly private init-er whilst allowing us to not have to define
  an `.__init__()` on the type def.
- add public `.cancel_called` and `.cancel_called_remote` properties.
- general rename of what was the internal `._backpressure` var to
  `._allow_overruns: bool`.
- move the old contents of `Actor._push_result()` into a new
  `._deliver_msg()` allowing for better encapsulation of per-ctx
  msg handling.
 - always check for received 'error' msgs and process them with the new
   `_maybe_cancel_and_set_remote_error()` **before** any msg delivery to
   the local task, thus guaranteeing error and cancellation handling
   despite any overflow handling.
- add a new `._drain_overflows()` task-method for use with new
  `._allow_overruns: bool = True` mode.
 - add back a `._scope_nursery: trio.Nursery` (allocated in
   `Portal.open_context()`) who's sole purpose is to spawn a single task
   which runs the above method; anything else is an error.
 - augment `._deliver_msg()` to start a task and run the above method
   when operating in no overrun mode; the task queues overflow msgs and
   attempts to send them to the underlying mem chan using a blocking
   `.send()` call.
 - on context exit, any existing "drainer task" will be cancelled and
   remaining overflow queued msgs are discarded with a warning.
- rename `._error` -> `_remote_error` and set it in a new method
  `_maybe_cancel_and_set_remote_error()` which is called before
  processing
- adjust `.result()` to always call `._maybe_raise_remote_err()` at its
  start such that whenever a `ContextCancelled` arrives we do logic for
  whether or not to immediately raise that error or ignore it due to the
  current actor being the one who requested the cancel, by checking the
  error's `.canceller` field.
 - set the default value of `._result` to be `id(Context()` thus avoiding
   conflict with any `.result()` actually being `False`..

._runtime.Actor:
- augment `.cancel()` and `._cancel_task()` and `.cancel_rpc_tasks()` to
  take a `requesting_uid: tuple` indicating the source actor of every
  cancellation request.
- pass through the new `Context._allow_overruns` through `.get_context()`
- call the new `Context._deliver_msg()` from `._push_result()` (since
  the factoring out that method's contents).

._runtime._invoke:
- `TastStatus.started()` back a `Context` (unless an error is raised)
  instead of the cancel scope to make it easy to set/get state on that
  context for the purposes of cancellation and remote error relay.
- always raise any remote error via `Context._maybe_raise_remote_err()`
  before doing any `ContextCancelled` logic.
- assign any `Context._cancel_called_remote` set by the `requesting_uid`
  cancel methods (mentioned above) to the `ContextCancelled.canceller`.

._runtime.process_messages:
- always pass a `requesting_uid: tuple` to `Actor.cancel()` and
  `._cancel_task` to that any corresponding `ContextCancelled.canceller`
  can be set inside `._invoke()`.
2023-05-15 10:00:45 -04:00
Tyler Goodlet 9f9907271b Merge `ReceiveMsgStream` and `MsgStream`
Since one-way streaming can be accomplished by just *not* sending on one
side (and/or thus wrapping such usage in a more restrictive API), we
just drop the recv-only parent type. The only method different was
`MsgStream.send()`, now merged in. Further in usage of `.subscribe()`
we monkey patch the underlying stream's `.send()` onto the delivered
broadcast receiver so that subscriber tasks can two-way stream as though
using the stream directly.

This allows us to more definitively drop `tractor.open_stream_from()` in
the longer run if we so choose as well; note currently this will
potentially create an issue if a caller tries to `.send()` on such a one
way stream.
2023-01-29 15:03:34 -05:00
Tyler Goodlet 13c9eadc8f Move result log msg up and drop else block 2023-01-29 14:55:02 -05:00
Tyler Goodlet 6c8cacc9d1 Adjust all default is `None` annots (per new `mypy`) 2022-12-12 13:18:22 -05:00
Tyler Goodlet 38326e8c15 Avoid error on context double pops 2022-12-11 23:46:33 -05:00
Tyler Goodlet 33f2234baf Hide some stack layers the user doesn't really need to see 2022-10-14 18:18:01 -04:00
Tyler Goodlet 50fe098e06 First pass, swap `MultiError` for `BaseExceptionGroup` 2022-10-14 18:16:51 -04:00
Tyler Goodlet 7a719ac2a7 Use one nursery per unique manager (signature)
Instead of sticking all `trionics.maybe_open_context()` tasks inside the
actor's (root) service nursery, open a unique one per manager function
instance (id).

Further, accept a callable for the `key` such that a user can have
more flexible control on the caching logic and move the
`maybe_open_nursery()` helper out of the portal mod and into this
trionics "managers" module.
2022-10-09 21:27:23 -04:00
Tyler Goodlet a3a5bc267e Make `process_messages()` a mod func 2022-09-15 23:41:28 -04:00
Tyler Goodlet b21f2e16ad Always consider the debugger when exiting contexts
When in an uncertain teardown state and in debug mode a context can be
popped from actor runtime before a child finished debugging (the case
when the parent is tearing down but the child hasn't closed/completed
its tty lock IPC exit phase) and the child sends the "stop" message to
unlock the debugger but it's ignored bc the parent has already dropped
the ctx. Instead we call `._debug.maybe_wait_for_deugger()` before these
context removals to avoid the root getting stuck thinking the lock was
never released.

Further, add special `Actor._cancel_task()` handling code inside
`_invoke()` which continues to execute the method despite the IPC
channel to the caller being broken and thus avoiding potential hangs due
to a target (child) actor task remaining alive.
2022-08-02 12:17:28 -04:00
Tyler Goodlet e2453fd3da Add spaces before values in log msg 2022-08-02 12:16:58 -04:00
Tyler Goodlet f2671ed026 Type annot updates 2022-07-27 11:40:02 -04:00
Tyler Goodlet bf0ac3116c Only cancel/get-result from a ctx if transport is up
There's no point in sending a cancel message to the remote linked task
and especially no reason to block waiting on a result from that task if
the transport layer is detected to be disconnected. We expect that the
transport shouldn't go down at the layer of the message loop
(reconnection logic should be handled in the transport layer itself) so
if we detect the channel is not connected we don't bother requesting
cancels nor waiting on a final result message.

Why?

- if the connection goes down in error the caller side won't have a way
  to know "how long" it should block to wait for a cancel ack or result
  and causes a potential hang that may require an additional ctrl-c from
  the user especially if using the debugger or if the traceback is not
  seen on console.
- obviously there's no point in waiting for messages when there's no
  transport to deliver them XD

Further, add some more detailed cancel logging detailing the task and
actor ids.
2022-07-27 11:40:02 -04:00
Tyler Goodlet c265f3f94e Move namespace path type into `msg` mod 2022-01-30 11:27:34 -05:00
Tyler Goodlet 2900ceb003 Not all objects have a `.__name__` 2022-01-30 11:26:34 -05:00
Tyler Goodlet b6ae77b5ac Use `pkgutils.resolve_name()` and a `str` subtype
Python 3.9's new object resolver + a `str` is much simpler then mucking
with tuples (and easier to serialize). Include a `.to_tuple()` formatter
since we still are passing the module namespace and function name
separately inside the runtime's message format but in theory we might be
able to simplify this depending on how we would change the support for
`enable_modules:list[str]` in the spawn API.

Thanks to @Fuyukai for pointing `resolve_name()` which I didn't know
about before!
2022-01-30 11:26:34 -05:00
Tyler Goodlet 949cb2c9fe First draft "namespace path" named tuple; probably will discard 2022-01-30 11:26:34 -05:00
Tyler Goodlet 8c004c1f36 Add an explicit messaging error for reporting an illegal context transaction 2021-12-16 19:59:10 -05:00
Tyler Goodlet 5d424e3703 Hide the key error tb on remote starting errors 2021-12-16 19:59:10 -05:00
Tyler Goodlet 6f94ffc304 Re-license code base for distribution under AGPL
This commit obviously denotes a re-license of all applicable parts of
the code base. Acknowledgement of this change was completed in #274 by
the majority of the current set of contributors. From here henceforth
all changes will be AGPL licensed and distributed. This is purely an
effort to maintain the same copy-left policy whilst closing the
(perceived) SaaS loophole the GPL allows for. It is merely for this
loophole: to avoid code hiding by any potential "network providers" who
are attempting to use the project to make a profit without either
compensating the authors or re-distributing their changes.

I thought quite a bit about this change and can't see a reason not to
close the SaaS loophole in our current license. We still are (hard)
copy-left and I plan to keep the code base this way for a couple
reasons:

- The code base produces income/profit through parent projects and is
  demonstrably of high value.
- I believe firms should not get free lunch for the sake of
  "contributions from their employees" or "usage as a service" which
  I have found to be a dubious argument at best.
- If a firm who intends to profit from the code base wants to use it
  they can propose a secondary commercial license to purchase with the
  proceeds going to the project's authors under some form of well
  defined contract.
- Many successful projects like Qt use this model; I see no reason it
  can't work in this case until such a time as the authors feel it
  should be loosened.

There has been detailed discussion in #103 on licensing alternatives.
The main point of this AGPL change is to protect the code base for the
time being from exploitation while it grows and as we move into the next
phase of development which will include extension into the multi-host
distributed software space.
2021-12-14 23:33:27 -05:00
Tyler Goodlet 4ea5c9b5db Pop context on `.open_context()` exit 2021-12-06 11:54:21 -05:00
Tyler Goodlet 2680a9473d Always set `Context._portal` on the caller task side 2021-12-05 19:28:00 -05:00
Tyler Goodlet d307eab118 Rework `Actor.send_cmd()` to `.start_remote_task()`
This more formally declares the runtime's remote task startingn API
and uses it throughout all the dependent `Portal` API methods.
Allows dropping `Portal._submit()` and simplifying `.run_in_actor()`
style result waiting to be delegated to the context APIs at remote
task `return` response time. We now also track the remote entrypoint
"type` as `Context._remote_func_type`.
2021-12-04 18:20:43 -05:00
Tyler Goodlet c5c3f7e789 Use `tractor.Context` throughout the runtime core
Instead of tracking feeder mem chans per RPC dialog, store `Context`
instances which (now) hold refs to the underlying RPC-task feeder chans
and track them inside a `Actor._contexts` map. This begins a transition
to making the "context" idea the primitive abstraction for representing
messaging dialogs between tasks in different memory domains (i.e.
usually separate processes).

A slew of changes made this possible:
- change `Actor.get_memchans()` -> `.get_context()`.
- Add new `Context._send_chan` and `._recv_chan` vars.
- implicitly create a new context on every `Actor.send_cmd()` call.
- use the context created by `.send_cmd()` in `Portal.open_context()`
  instead of manually creating one.
- call `Actor.get_context()` inside tasks run from `._invoke()`
  such that feeder chans are implicitly created for callee tasks
  thus fixing the bug #265.

NB: We might change some of the internal semantics to do with *when* the
feeder chans are actually created to denote whether or not a far end
task is actually *read to receive* messages. For example, in the cases
where it **never** will be ready to receive messages (one-way streaming,
a context that never opens a stream, etc.) we will likely want some kind
of error or at least warning to the caller that messages can't be sent
(yet).
2021-12-03 14:49:55 -05:00
Tyler Goodlet 08e9593306 Suppress broken resources errors in `Portal.cancel_actor()` 2021-12-02 15:29:04 -05:00
Tyler Goodlet 14f84571fb Don't cancel receive streams inside `.cancel_actor()`
We don't need to any more presuming you get ideal remote cancellation
conditions where the remote actor should teardown and kill the streams
from its end.
2021-12-02 15:29:04 -05:00
Tyler Goodlet a23afb0bb8 Set channel cancel called flag on cancel requests 2021-12-02 08:18:04 -05:00
Tyler Goodlet f6de7e0afd Factor out msg unwrapping into a func 2021-11-29 08:46:35 -05:00
Tyler Goodlet 0e7234aa68 Cache the return message instead of the value
Thanks to @richardsheridan for pointing out the limitations of using
*any* kind of value as the result-cached-flag and how it might cause
problems for anyone returning pickled blob-data. This changes the
`Portal` internal result value tracking to stash the full message from
which the value can be retrieved by any `Portal.result()` caller.
The internal change is that `Portal._return_once()` now returns a tuple
of the message *and* its value.
2021-11-29 07:44:44 -05:00
Tyler Goodlet 095c94b1d2 Fix `Portal.run_in_actor()` returns `None` bug
Fixes the issue where if the main remote task returns `None`,
`Portal.result()` would erroneously wait again on the underlying feeder
mem chan since `None` was being used as the cache flag. Instead set the
flag as the channel uid and consider the result collected when set to
anything else (since it would be odd to return that value from a remote
task when you already can read it as part of portal/channel apis).
2021-11-20 13:02:08 -05:00
Tyler Goodlet 10f66e5141 De-noise warnings, add a 'cancel' log level
Now that we're on our way to a (somewhat) serious beta release I think
it's about time to start de-noising the logging emissions. Since we're
trying out this approach of "stack layer oriented" log levels, I figured
this is a good time to move most of the "warnings" to what they should
be: cancellation monitoring status messages. The level is set to 16
which is just above our "runtime" level but just below the traditional
"info" level. I think this will be a decent approach since usually if
you're confused about why your `tractor` app is behaving unlike you
expect, it's 90% of the time going to be to do with cancellation or
error propagation. This this setup a user can specify the 'cancel' level
and see all the msgs pertaining to both actor and task-in-actor
cancellation mechanics.
2021-10-06 17:02:13 -04:00
Tyler Goodlet bd31f47d5f Handle kbi in ctx blocks via `BaseException`
Fixes prior committed tests by more generally handling `BaseExcepion` in
context blocks. Left in the commented concrete list for reference.
2021-10-04 10:38:44 -04:00
Tyler Goodlet 7857a9ac6d Add `shield: bool` kwarg to `Portal.open_stream_from()` 2021-09-02 21:12:54 -04:00
Tyler Goodlet fde52d2464 Mypy fixes 2021-07-08 12:48:34 -04:00
Tyler Goodlet 91640facbc Always shield cancel the caller on cancel-causing-errors, add teardown logging 2021-07-06 08:23:30 -04:00
Tyler Goodlet af701c16ee Consider relaying context error via raised-in-scope-nursery task 2021-07-06 08:23:29 -04:00
Tyler Goodlet 3d633408fc Don't clobber msg loop mem chan on rx stream close
Revert this change since it really is poking at internals and doesn't
make a lot of sense. If the context is going to be cancelled then the
msg loop will tear down the feed memory channel when ready, we don't
need to be clobbering it and confusing the runtime machinery lol.
2021-07-06 08:23:29 -04:00
Tyler Goodlet 348148ff1e Explicitly formalize context/streaming teardown
Add clear teardown semantics for `Context` such that the remote side
cancellation propagation happens only on error or if client code
explicitly requests it (either by exit flag to `Portal.open_context()`
or by manually calling `Context.cancel()`).  Add `Context.result()`
to wait on and capture the final result from a remote context function;
any lingering msg sequence will be consumed/discarded.

Changes in order to make this possible:
- pass the runtime msg loop's feeder receive channel in to the context
  on the calling (portal opening) side such that a final 'return' msg
  can be waited upon using `Context.result()` which delivers the final
  return value from the callee side `@tractor.context` async function.
- always await a final result from the target context function in
  `Portal.open_context()`'s `__aexit__()` if the context has not
  been (requested to be) cancelled by client code on block exit.
- add an internal `Context._cancel_called` for context "cancel
  requested" tracking (much like `trio`'s cancel scope).
- allow flagging a stream as terminated using an internal
  `._eoc` flag which will mark the stream as stopped for iteration.
- drop `StopAsyncIteration` catching in `.receive()`; it does
  nothing.
2021-07-06 08:23:29 -04:00
Tyler Goodlet 39b9896a62 Only close recv chan if we get a ref 2021-07-06 08:23:29 -04:00
Tyler Goodlet e311430d25 Be more pedantic with error handling 2021-07-06 08:23:29 -04:00
Tyler Goodlet 08eb6bd019 Fix typing 2021-07-06 08:23:29 -04:00
Tyler Goodlet e3955bb62b Add initial bi-directional streaming
This mostly adds the api described in
https://github.com/goodboy/tractor/issues/53#issuecomment-806258798

The first draft summary:
- formalize bidir steaming using the `trio.Channel` style interface
  which we derive as a `MsgStream` type.
- add `Portal.open_context()` which provides a `trio.Nursery.start()`
  remote task invocation style for setting up and tearing down tasks
  contexts in remote actors.
- add a distinct `'started'` message to the ipc protocol to facilitate
  `Context.start()` with a first return value.
- for our `ReceiveMsgStream` type, don't cancel the remote task in
  `.aclose()`; this is now done explicitly by the surrounding `Context`
   usage: `Context.cancel()`.
- streams in either direction still use a `'yield'` message keeping the
  proto mostly symmetric without having to worry about which side is the
  caller / portal opener.
- subtlety: only allow sending a `'stop'` message during a 2-way
  streaming context from `ReceiveStream.aclose()`, detailed comment
  with explanation is included.

Relates to #53
2021-07-06 08:23:29 -04:00
Tyler Goodlet 83af295b45 Fix func type checking 2021-04-28 12:23:08 -04:00
Tyler Goodlet 36251357b3 Add a new one-way stream API
NB: this is a breaking change removing support for `Portal.run()` being
able to invoke remote streaming functions and instead replacing the
method call with an async context manager api `Portal.open_stream_from()`
This style explicitly defines stream teardown at the call site instead
of expecting the user to handle tricky things correctly themselves: eg.
`async_geneartor.aclosing()`. Going forward `Portal.run()` can be used
only for invoking async functions.
2021-04-28 12:23:08 -04:00
Tyler Goodlet 7f38b7225d Aggregate and organize streaming components
Move receive stream into streaming modules and rebrand as a "message
stream".  Factor out cancellation mechanics in `.aclose()` into the
`Context` type which will soon provide the api for for cancelling portal
invocations.  Comment-stage a few methods on both types in anticipation
of a new bi-directional streaming api.  Add a `MsgStream` bidirectional
channel type which will be the eventual type yielded from
`Context.open_stream()`.  Adjust the response/dialog types to be the set
`{'asyncfun', 'asyncgen', 'context'}`. OH, and add async func checking
in `Portal.run()` to catch and error on sync funcs early.
2021-04-28 12:23:08 -04:00