Commit Graph

1252 Commits (925d5c1cebe8f037bb0b290e33400b7b373fef6a)

Author SHA1 Message Date
Tyler Goodlet 8477d21499 Restructure actor runtime nursery scoping
In an effort acquire more deterministic actor cancellation,
this adds a clearer and more resilient (whilst possibly a bit
slower) internal nursery structure with explicit semantics for
clarifying the task-scope shutdown sequence.

Namely, on cancellation, the explicit steps are now:
- cancel all currently running rpc tasks and wait
  for them to complete
- cancel the channel server and wait for it to complete
- cancel the msg loop for the channel with the immediate parent
- de-register with arbiter if possible
- wait on remaining connections to release
- exit process

To accomplish this add a new nursery called the "service nursery" which
spawns all rpc tasks **instead of using** the "root nursery". The root
is now used solely for async launching the msg loop for the primary
channel with the parent such that it is (nearly) the last thing torn
down on cancellation.

In the future it should also be possible to have `self.cancel()` return
a result to the parent once the runtime is sure that the rest of the
shutdown is atomic; this would allow for a true unbounded shield in
`Portal.cancel_actor()`. This will likely require that the error
handling blocks in `Actor._async_main()` are moved "inside" the root
nursery block such that the msg loop with the parent truly is the last
thing to terminate.
2020-08-08 14:55:41 -04:00
Tyler Goodlet 90c7fa6963 Allow shielding in `open_portal()` 2020-08-08 14:47:52 -04:00
Tyler Goodlet 532429aec9 Harden `trio` spawner process waiting
Always shield waiting for he process and always run
``trio.Process.__aexit__()`` on teardown. This enforces
that shutdown happens to due cancellation triggered inside
the sub-actor instead of the process being killed externally
by the parent.
2020-08-08 14:43:25 -04:00
Tyler Goodlet fe45d99f65 Allow opening a portal through an existing channel 2020-08-07 12:02:06 -04:00
Tyler Goodlet ae8488a578 Always shield de-register step with arbiter 2020-08-07 11:36:26 -04:00
Tyler Goodlet 3a868fec30 Cancel root nursery to trigger failure
The real issue is if the root nursery gets cancelled prior to
de-registration with the arbiter. This doesn't seem easy to
reproduce by side effect of a KBI however that is how it was
discovered in practise.
2020-08-07 11:34:17 -04:00
Tyler Goodlet d2d8860dad Add test for dereg failure on manual stream close
There was code from the last de-registration fix PR that I had commented
(to do with shielding arbiter dereg steps in `Actor._async_main()`) because
the block didn't seem to make a difference under infinite streaming
tests. Turns out it **for sure** is needed under certain conditions (likely
if the actor's root nursery is cancelled prior to actor nursery exit).
This was an attempt to simulate the failure mode if you manually close the
stream **before** cancelling the containing **actor**.

More tests to come I guess.
2020-08-07 09:16:01 -04:00
Guillermo Rodriguez 8da45eedf4
Merge pull request #143 from goodboy/ensure_deregister
Ensure actors de-register with arbiter when cancelled during infitinite streaming.
2020-08-04 12:19:02 -03:00
Tyler Goodlet 09ae51900d Better clarify uid comment 2020-08-04 09:52:49 -04:00
Tyler Goodlet 4f92cfe74f Don't `.aclose` `trio` processes until the very end
Trio will kill subprocesses via `Process.__aexit__()` using a `finally:`
block (which, yes, will get triggered on cancellation) so we avoid that
until true process "tear down" since subactors do many things during
graceful shutdown (such as de-registering from the name discovery
system). Oddly this only seems to be an issue during cancellation of
infinite stream consumption.

Resolves #141
2020-08-03 18:57:00 -04:00
Tyler Goodlet ae9016c06a Log on KBI cancelled termination 2020-08-03 18:46:18 -04:00
Tyler Goodlet a24c6bfdd2 Correctly catch cancelled nursery case (purely for logging) 2020-08-03 18:44:50 -04:00
Tyler Goodlet 56b81f07e5 Return `Dict[Tuple, Tuple]` from `.get_registry()` 2020-08-03 18:42:23 -04:00
Tyler Goodlet fbd68d2d91 Allow for tuple keys with std `msgpack` 2020-08-03 18:41:21 -04:00
Tyler Goodlet a5279f80a7 Actually reproduce the de-registration problem
This truly reproduces #141. It turns out the problem only occurs when
we're cancelled in the middle of consuming "infinite streams".
Good news is this tests a lot of edge cases :)
2020-08-03 18:28:09 -04:00
Tyler Goodlet 699bfd1857 Run unreg on cancel tests with remote arbiter as well 2020-08-03 15:41:41 -04:00
Tyler Goodlet 639299e6eb Expose a `.get_registry()` method on the arbiter 2020-08-03 15:40:41 -04:00
Tyler Goodlet 2ccaa94c60 Move daemon fixture up to conftest 2020-08-03 15:39:54 -04:00
Tyler Goodlet 0d9483376d Test cancel with SIGINT on non-windows as well 2020-08-03 13:01:56 -04:00
Tyler Goodlet cd2d8c217a Test that subactors deregister on cancel 2020-08-03 12:53:03 -04:00
goodboy a399bd3033
Merge pull request #133 from guilledk/drop_cloudpickle
Drop cloudpickle dependency
2020-07-29 18:24:27 -04:00
Guillermo Rodriguez 3e29fcf1ea
Docstring to the top\!, and redundant spaces goodbye\! 2020-07-29 15:39:38 -03:00
Guillermo Rodriguez a565d38251
Merge pull request #2 from goodboy/start_up_sequence_trickery
Start up sequence trickery
2020-07-29 15:02:51 -03:00
Tyler Goodlet da56d0f043 Add slight delays to SIGINT tests on mp 2020-07-29 13:27:15 -04:00
Tyler Goodlet 8f17c89cf9 Skip **every** quad test for mp on ci 2020-07-29 10:26:19 -04:00
Tyler Goodlet 9a40291d4a Repair startup sequence around parent state transfer
In order to have reliable subactor startup we need the following
sequence to take place:
- connect to the parent actor, handshake and receive runtime state
- load exposed modules into memory
- start the channel server up fully using the provided bind address
- finally, start processing new messages from the parent

Add a bunch more comments to clarify all this.
2020-07-28 22:25:22 -04:00
Guillermo Rodriguez 0a5691e0a8
Removed arbiter_addr local, and bind_addr is now passed through channel, in early child actor init. 2020-07-28 11:55:11 -03:00
Guillermo Rodriguez 8b44ec7a5d
Actually dropping the cloudpickle dependency from setup.py 2020-07-27 21:10:04 -03:00
Guillermo Rodriguez ef053eb070
Added named arguments to child init, and now passing less of them. 2020-07-27 21:05:00 -03:00
Guillermo Rodriguez e5dbf14ec3
Onlt await params in trio mode 2020-07-27 15:20:55 -03:00
Guillermo Rodriguez 2a407be532
Now passing additional initialization parameters through channel early after handshake. 2020-07-27 14:55:37 -03:00
goodboy 2cc4d7ce04
Merge pull request #135 from goodboy/fix_win_ci_again
Fix windows CI, again.
2020-07-27 13:19:01 -04:00
Tyler Goodlet 5715fd4599 Skip streaming tests 2020-07-27 12:20:46 -04:00
Tyler Goodlet e8a38e4d15 Fix cancelled type handling 2020-07-27 11:15:05 -04:00
goodboy ed96672136
Merge pull request #128 from goodboy/flaky_tests
Drop trio-run-in-process,  use pure trio process spawner, test out of channel ctrl-c subactor cancellation
2020-07-26 23:59:58 -04:00
Tyler Goodlet 3c7ec72f8e Fix SIGINT test names 2020-07-26 23:37:44 -04:00
Tyler Goodlet 5a27065a10 Finally tame the super flaky tests
- ease up on first stream test run deadline
- skip streaming tests in CI for mp backend, period
- give up on > 1 depth nested spawning with mp
- completely give up on slow spawning on windows
2020-07-26 22:53:40 -04:00
Tyler Goodlet 891edbab5f Run the trio spawner in nested tests 2020-07-25 18:19:17 -04:00
Tyler Goodlet dddbeb0e71 Run Windows on trio and mp backends
The new pure trio spawning backend uses `subprocess` internally which is
also supported on windows so let's run it in CI.
2020-07-25 13:41:48 -04:00
Tyler Goodlet 7c3928f0bf Oh mypy.. 2020-07-24 17:31:24 -04:00
Tyler Goodlet d3acb8d061 Wait on proc before killing stdio 2020-07-24 17:08:52 -04:00
Tyler Goodlet efde3a5773 Simplify the `_child.py` script
We don't really need stdin for anything but passing the entry point and
detaching it seemed to just cause errors on cancellation teardown.
2020-07-24 17:08:52 -04:00
Tyler Goodlet aa620fe61d Use `trio.Process.__aexit__()` and pass the actor uid
Using the context manager interface does some extra teardown beyond simply
calling `.wait()`. Pass the subactor's "uid" on the exec line for
debugging purposes when monitoring the process tree from the OS.
Hard code the child script module path to avoid a double import warning.
2020-07-24 17:08:52 -04:00
Tyler Goodlet a215df8dfc Add true ctrl-c tests using an out-of-band SIGINT
Verify ctrl-c, as a user would trigger it, properly cancels the actor
tree. This was an issue with `trio-run-in-process` that clearly wasn't
being handled correctly but for sure is now with the plain old
`trio` process spawner.

Resolves #115
2020-07-24 17:08:52 -04:00
Tyler Goodlet 4de75c3a9d Test cancel via api and keyboard interrupt
An initial attempt to discover an issue with trio-run-inprocess.
This is a good test to have regardless.
2020-07-24 17:08:52 -04:00
Tyler Goodlet 5adf2f3b0c Add logging to some cancel tests 2020-07-24 17:08:52 -04:00
Tyler Goodlet 4516febe26 Make sure to wait trio processes on teardown 2020-07-24 17:08:52 -04:00
Tyler Goodlet 0b305fd78a Change spawn method name in `Actor.load_modules()` 2020-07-24 17:08:52 -04:00
Tyler Goodlet 0936bdc592 Add back subactor logging 2020-07-24 17:08:52 -04:00
Guillermo Rodriguez 56463a08df First attempt at removing trip & updating hazmat -> lowlevel 2020-07-24 17:08:52 -04:00