Commit Graph

1678 Commits (336db8425e8e14085ac94778d4e40082499d378c)

Author SHA1 Message Date
Tyler Goodlet a79cdc7b44 Make cancel case expect multi-error 2021-12-06 16:32:23 -05:00
Tyler Goodlet c9132de7dc Move maybe-raise-error-msg logic into context
A context method handling all this logic makes the most sense since it
contains all the state related to whether the error should be raised in
a nursery scope or is expected to be raised by a consumer task which
reads and processes the msg directly (via a `Portal` API call). This
also makes it easy to always process remote errors even when there is no
(stream) overrun condition.
2021-12-06 16:32:23 -05:00
Tyler Goodlet 1f8e1cccbb Only pop contexts on decorated entrypoints 2021-12-06 13:48:19 -05:00
Tyler Goodlet 58805a0430 Slight delay to avoid flaky bcast race 2021-12-06 12:17:37 -05:00
Tyler Goodlet 142083d81b Don't cancel the context on overrun cases 2021-12-06 11:54:21 -05:00
Tyler Goodlet 318027ebd1 Raise stream overruns on one side never opened
A context stream overrun should normally never take place since if
a stream is opened (via ``Context.open_stream()``) backpressure is
applied on the message buffer (unless explicitly disabled by the
``backpressure=False`` flag) such that an overrun on the receiving task
should result in blocking the (remote) sender task (eventually depending
on the underlying ``MsgStream`` transport).

Here we add a special error message that reports if one side never
opened a stream and let's the user know in the overrun error message
that they may be trying to push messages to a task that isn't ready to
receive them.

Further fixes / details:
- pop any `Context` at the end of any `_invoke()` task that creates
  one and registers with the runtime.
- ignore but warn about messages received for a context that either
  no longer exists or is unknown (guarding against crashes by malicious
  packets in the latter case)
2021-12-06 11:54:21 -05:00
Tyler Goodlet b826ec8103 Better idea, enable backpressure on opened streams
Keeping it disabled on context open will help with detecting any stream
connection which was never opened on one side of the task pair.  In that
case we can report that there was an overrun **and** a stream wasn't
opened versus if the stream is explicitly configured not to use bp then
we throw the standard overflow.

Use `trio.Nursery._closed` to detect "closure" XD since it seems to be
the most reliable way to determine if a spawn call will trigger
a runtime error.
2021-12-06 11:54:21 -05:00
Tyler Goodlet 4ea5c9b5db Pop context on `.open_context()` exit 2021-12-06 11:54:21 -05:00
Tyler Goodlet f3432bd8fb Enable bp on clustering test 2021-12-05 20:02:55 -05:00
Tyler Goodlet 41a3e6a9ca Type check fixes 2021-12-05 20:00:58 -05:00
Tyler Goodlet 7b9d410c4d Adjust remaining examples and tests for non-backpressure default 2021-12-05 19:52:09 -05:00
Tyler Goodlet 2b05ffcc23 Add context stream overrun tests 2021-12-05 19:50:39 -05:00
Tyler Goodlet 185dbc7e3f Disable msg stream backpressure by default
Half of portal API usage requires a 1 message response (`.run()`,
`.run_in_actor()`) and the streaming APIs should probably be explicitly
enabled for backpressure if desired by the user. This makes more sense
in (psuedo) realtime systems where it's better to notify on a block then
freeze without notice. Make this default behaviour with a new error to
be raised: `tractor._exceptions.StreamOverrun` when a sender overruns
a stream by the default size (2**6 for now). The old behavior can be
enabled with `Context.open_stream(backpressure=True)` but now with
warning log messages when there are overruns.

Add task-linked-context error propagation using a "nursery raising"
technique such that if either end of context linked pair of tasks
errors, that error can be relayed to other side and raised as a form of
interrupt at the receiving task's next `trio` checkpoint. This enables
reliable error relay without expecting the (error) receiving task to
call an API which would raise the remote exception (which it might never
currently if using `tractor.MsgStream` APIs).

Further internal implementation details:
- define the default msg buffer size as `Actor.msg_buffer_size`
- expose a `msg_buffer_size: int` kwarg from `Actor.get_context()`
- maybe raise aforementioned context errors using
  `Context._maybe_error_from_remote_msg()` inside `Actor._push_result()`
- support optional backpressure on a stream when pushing messages
  in `Actor._push_result()`
- in `_invote()` handle multierrors raised from a `@tractor.context`
  entrypoint as being potentially caused by a relayed error from the
  remote caller task, if `Context._error` has been set then raise that
  error inside the `RemoteActorError` that will be relayed back to that
  caller more or less proxying through the source side error back to its
  origin.
2021-12-05 19:31:41 -05:00
Tyler Goodlet 2680a9473d Always set `Context._portal` on the caller task side 2021-12-05 19:28:00 -05:00
Tyler Goodlet 92b540d518 Add internal msg stream backpressure controls
In preparation for supporting both backpressure detection (through an
optional error) as well as control over the msg channel buffer size, add
internal configuration flags for both to contexts. Also adjust
`Context._err_on_from_remote_msg()` -> `._maybe..` such that it can be
called and will only raise if a scope nursery has been set. Add
a `Context._error` for stashing the remote task's error that may be
delivered in an `'error'` message.
2021-12-05 19:19:53 -05:00
Tyler Goodlet 6751349987 Add a stream overrun exception 2021-12-05 18:28:02 -05:00
Tyler Goodlet d307eab118 Rework `Actor.send_cmd()` to `.start_remote_task()`
This more formally declares the runtime's remote task startingn API
and uses it throughout all the dependent `Portal` API methods.
Allows dropping `Portal._submit()` and simplifying `.run_in_actor()`
style result waiting to be delegated to the context APIs at remote
task `return` response time. We now also track the remote entrypoint
"type` as `Context._remote_func_type`.
2021-12-04 18:20:43 -05:00
Tyler Goodlet 872b24aedd Prove we've fixed #265 2021-12-03 14:49:55 -05:00
Tyler Goodlet c5c3f7e789 Use `tractor.Context` throughout the runtime core
Instead of tracking feeder mem chans per RPC dialog, store `Context`
instances which (now) hold refs to the underlying RPC-task feeder chans
and track them inside a `Actor._contexts` map. This begins a transition
to making the "context" idea the primitive abstraction for representing
messaging dialogs between tasks in different memory domains (i.e.
usually separate processes).

A slew of changes made this possible:
- change `Actor.get_memchans()` -> `.get_context()`.
- Add new `Context._send_chan` and `._recv_chan` vars.
- implicitly create a new context on every `Actor.send_cmd()` call.
- use the context created by `.send_cmd()` in `Portal.open_context()`
  instead of manually creating one.
- call `Actor.get_context()` inside tasks run from `._invoke()`
  such that feeder chans are implicitly created for callee tasks
  thus fixing the bug #265.

NB: We might change some of the internal semantics to do with *when* the
feeder chans are actually created to denote whether or not a far end
task is actually *read to receive* messages. For example, in the cases
where it **never** will be ready to receive messages (one-way streaming,
a context that never opens a stream, etc.) we will likely want some kind
of error or at least warning to the caller that messages can't be sent
(yet).
2021-12-03 14:49:55 -05:00
Tyler Goodlet 3f6099f161 Add a double started error checking test 2021-12-03 10:08:55 -05:00
Tyler Goodlet 568902a5a9 Add test for #265: "msg sent before stream opened"
This always triggered the mentioned race condition.
We need to figure out the best approach to avoid this case.
2021-12-03 10:08:55 -05:00
Tyler Goodlet f4793af2b9 Error on mal-use of `Context.started()`
Previously we were ignoring a race where the callee an opened task
context could enter `Context.open_stream()` before calling `.started().
Disallow this as well as calling `.started()` more then once.
2021-12-03 10:08:55 -05:00
goodboy ae6d751d71
Merge pull request #267 from goodboy/acked_remote_cancels
Acked remote cancels
2021-12-03 09:51:41 -05:00
Tyler Goodlet 94a3cc532c Add nooz 2021-12-02 18:09:07 -05:00
Tyler Goodlet 08e9593306 Suppress broken resources errors in `Portal.cancel_actor()` 2021-12-02 15:29:04 -05:00
Tyler Goodlet 14f84571fb Don't cancel receive streams inside `.cancel_actor()`
We don't need to any more presuming you get ideal remote cancellation
conditions where the remote actor should teardown and kill the streams
from its end.
2021-12-02 15:29:04 -05:00
Tyler Goodlet e561a4908f Appease mypy 2021-12-02 15:29:04 -05:00
Tyler Goodlet a29924f330 Don't assume exception order from nursery 2021-12-02 08:45:58 -05:00
Tyler Goodlet 46070f99de Factor soft-wait logic into a helper, use with mp 2021-12-02 08:18:04 -05:00
Tyler Goodlet d81eb1a51e Finally, deterministic remote cancellation support
On msg loop termination we now check and see if a channel is associated
with a child-actor registered in some local task's nursery. If so, we
attempt to wait on channel closure initiated from the child side (by
draining the underlying msg stream) so as to avoid closing it too early
resulting in the child not relaying its termination status response. This
means we now support the ideal case in 2-general's where we get back the
ack to the closure request instead of just ignoring it and timing out XD

The main implementation detail is that when `Portal.cancel_actor()`
remotely calls `Actor.cancel()` we actually wait for the RPC response
from that request before allowing the channel shutdown sequence to
engage. The new msg stream draining support enables this.

Also, factor child-to-parent error propagation logic into a helper func
and improve some docs (yeah yeah y'all don't like the ''', i don't
care - it makes my eyes not hurt).
2021-12-02 08:18:04 -05:00
Tyler Goodlet d817f1a658 Add a nursery "exited" signal
Use a `trio.Event` to enable nursery closure detection such that core
runtime tasks can be notified when a local nursery exits and allow
shutdown protocols to operate without close-before-terminate issues
(such as IPC channel closure during remote peer cancellation).
2021-12-02 08:18:04 -05:00
Tyler Goodlet a23afb0bb8 Set channel cancel called flag on cancel requests 2021-12-02 08:18:04 -05:00
Tyler Goodlet 1976e61d1a Add `.drain()` support to msg streams
Enables "draining" the last set of messages after a channel/stream has
been terminated mostly for the purposes of receiving a final ACK to
a remote cancel command. Also, add an internal `Channel._cancel_called`
flag which can be set by `Portal.cancel_actor()`.
2021-12-02 08:18:04 -05:00
Tyler Goodlet 0ac3397dbb Only soft-acquire debug lock if a proc was spawned 2021-12-02 08:17:03 -05:00
Tyler Goodlet 62b2867e07 Tweak doc strings 2021-12-02 08:16:49 -05:00
Tyler Goodlet bf6958cdbe Handle cancelled-before-proc-created spawn case
It's definitely possible to have a nursery spawn task be cancelled
before a `trio.Process` handle is ever returned; we now handle this
case as a cancelled-during-spawn scenario. Zombie collection logic
also is bypassed in this case.
2021-12-02 08:16:05 -05:00
goodboy d05885d650
Merge pull request #266 from goodboy/faster_daemon_cancels
Faster graceful daemon cancels
2021-11-30 09:29:13 -05:00
Tyler Goodlet 77fc705b1f Add nooz 2021-11-29 22:52:19 -05:00
Tyler Goodlet 16a3321a38 Increase timeout for windows.. 2021-11-29 21:52:30 -05:00
Tyler Goodlet 7eb465a699 Graceful cancel actors before hard reaping 2021-11-29 16:03:23 -05:00
Tyler Goodlet 121f7fd844 Draft test that shows a slow daemon cancellation
Currently if the spawn task is waiting on a daemon actor it is likely in
`await proc.wait()`, however, if the actor nursery is subsequently
cancelled this checkpoint will be abandoned and the hard proc reaping
sequence will execute which results in a up to 3 second wait before
a "hard" system signal is sent to the child.  Ideally such
a cancelled-during-daemon-actor-wait condition is instead handled by
first trying to cancel the remote actor using `Portal.cancel_actor()` (a
"graceful" remote cancel request) which should (presuming normal runtime
operation) result in an immediate collection of the process after normal
actor (remotely triggered) runtime cancellation.
2021-11-29 16:03:14 -05:00
goodboy ac821bdd94
Merge pull request #264 from goodboy/runinactor_none_result
Fix `Portal.run_in_actor()` returns `None` result
2021-11-29 09:21:24 -05:00
Tyler Goodlet f6de7e0afd Factor out msg unwrapping into a func 2021-11-29 08:46:35 -05:00
Tyler Goodlet 0e7234aa68 Cache the return message instead of the value
Thanks to @richardsheridan for pointing out the limitations of using
*any* kind of value as the result-cached-flag and how it might cause
problems for anyone returning pickled blob-data. This changes the
`Portal` internal result value tracking to stash the full message from
which the value can be retrieved by any `Portal.result()` caller.
The internal change is that `Portal._return_once()` now returns a tuple
of the message *and* its value.
2021-11-29 07:44:44 -05:00
Tyler Goodlet 83da92d4cb Add nooz 2021-11-28 19:16:47 -05:00
Tyler Goodlet 57e98b25e7 Increase timeout, windows... 2021-11-20 13:08:19 -05:00
Tyler Goodlet 095c94b1d2 Fix `Portal.run_in_actor()` returns `None` bug
Fixes the issue where if the main remote task returns `None`,
`Portal.result()` would erroneously wait again on the underlying feeder
mem chan since `None` was being used as the cache flag. Instead set the
flag as the channel uid and consider the result collected when set to
anything else (since it would be odd to return that value from a remote
task when you already can read it as part of portal/channel apis).
2021-11-20 13:02:08 -05:00
Tyler Goodlet f32ccd76aa Add `Portal.result()` is None test case
This demonstrates a bug where if the remote `.run_in_actor()` task
returns `None` then multiple calls to `Portal.result()` will hang
forever...
2021-11-20 13:02:08 -05:00
goodboy b527fdbe1a
Merge pull request #263 from goodboy/early_deth_fixes
Early deth fixes
2021-11-08 21:23:09 -05:00
Tyler Goodlet 6b0366fe04 Guard against TCP server never started on cancel 2021-11-07 23:49:32 -05:00