It turns out recent improvements have made the debugger too good,
so we need to just terminate the continue loop in this test when
we finally see the "spawn error" crash out, because the breakpoint
forever case will literally continue forever XD
Currently if the spawn task is waiting on a daemon actor it is likely in
`await proc.wait()`, however, if the actor nursery is subsequently
cancelled this checkpoint will be abandoned and the hard proc reaping
sequence will execute, which results in an up to 3 second wait before
a "hard" system signal is sent to the child. Ideally such
a cancelled-during-daemon-actor-wait condition is instead handled by
first trying to cancel the remote actor using `Portal.cancel_actor()` (a
"graceful" remote cancel request) which should (presuming normal runtime
operation) result in an immediate collection of the process after normal
actor (remotely triggered) runtime cancellation.
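A minimal sketch of the "graceful first, hard kill as fallback" ordering described above. It uses plain `asyncio` and a made-up `FakeChild` handle rather than the real spawner or `Portal.cancel_actor()`; the names `FakeChild`, `request_cancel()` and `reap()` are all hypothetical and only illustrate the control flow.

```python
import asyncio


class FakeChild:
    """Hypothetical stand-in for a spawned process handle."""

    def __init__(self, honours_cancel_request: bool) -> None:
        self.honours_cancel_request = honours_cancel_request
        self._exited = asyncio.Event()
        self.killed = False

    async def request_cancel(self) -> None:
        # Plays the role of the graceful remote cancel request.
        if self.honours_cancel_request:
            self._exited.set()

    def kill(self) -> None:
        # Plays the role of the "hard" system signal.
        self.killed = True
        self._exited.set()

    async def wait(self) -> None:
        await self._exited.wait()


async def reap(child: FakeChild, grace: float = 3.0) -> str:
    """Try a graceful cancel first; only kill if the child outlives `grace`."""
    await child.request_cancel()
    try:
        await asyncio.wait_for(child.wait(), timeout=grace)
        return "graceful"
    except asyncio.TimeoutError:
        child.kill()
        await child.wait()
        return "hard"
```

With a well behaved child the wait returns right away and the grace period is never spent; the timeout only costs anything when the graceful request is ignored.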
The API we've made here is actually closer to `asyncio.gather()` but
with opening async context managers instead of funcs. Use another event
to allow for graceful teardown of children on non-cancellation exits
and add a doc string.
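To make the "`gather()` but for async context managers" idea concrete, here is a rough sketch written against `asyncio` rather than `trio`. The name `gather_contexts` and its internals are assumptions for illustration, not the code from this commit. One event signals that every manager has been entered, and a second event tells the child tasks to exit their managers.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def gather_contexts(*mngrs):
    """Enter all managers concurrently and yield their values in order.

    Sketch only: an error raised while *entering* a manager is not handled.
    """
    values = [None] * len(mngrs)
    all_entered = asyncio.Event()
    teardown = asyncio.Event()
    entered = 0

    async def enter_and_hold(index, mngr):
        nonlocal entered
        async with mngr as value:
            values[index] = value
            entered += 1
            if entered == len(mngrs):
                all_entered.set()
            # Keep the manager open until the parent block is done.
            await teardown.wait()

    if not mngrs:
        all_entered.set()
    tasks = [
        asyncio.ensure_future(enter_and_hold(i, m))
        for i, m in enumerate(mngrs)
    ]
    try:
        await all_entered.wait()
        yield tuple(values)
    finally:
        # Second event: lets children exit their managers on any exit path.
        teardown.set()
        await asyncio.gather(*tasks)
```

Each manager is entered and exited inside the same task, which matters for managers that care about task identity.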
With the new fixes to the trio spawner we can expect that both root
*and* depth > 1 nursery owning actors will now not clobber any children
that are in debug (either via breakpoint or through crashing). The tests
changed now include more checks which ensure the 2nd level parent-ish
actors also bubble up through into `pdb` and don't kill any of their
(crashed) children before they themselves are done debugging.
Follow up to previous commit: extend our simple context test set to
include cancellation via kbi in the parent as well as timeout logic and
testing of the parent opening a stream even though the target actor does
not.
Thanks again to https://github.com/adder46/wrath for discovering this
bug.
Not sure we even have a test for this yet. The main issue discovered by
a user project (https://github.com/adder46/wrath) was that a kbi raised
inside a block like this (with both recv-only and send-recv streams)
would not cancel on the first ctrl-c sent from console and instead
SIGINT had to be repeatedly sent as many times as there are subactors in
the first level tree. This test catches that as well as just verifies
the basic side-by-side functionality.
Add a couple more tests to check that a parent and sub-task stream can
be lagged and recovered (depending on who's slower). Factor some of the
test machinery into a new ctx mngr to make it all happen.
The whole origin was not having an explicit open/close semantic for
streams. We have that now so this internal mechanic isn't needed and
further our streams become more correct by having `.aclose()` be
independent of cancellation.
We may get multiple re-entries to debugger by `bp_forever` sub-actor
now since the root will incrementally try to cancel it only when the tty
lock is not held.
This resolves and completes #69 allowing all RPC invocation APIs to pass
function references directly instead of explicit `str` names for the
target namespace and function (this is still done implicitly
underneath). This brings us closer to `trio`'s task running API as well
as acknowledges that any inter-host RPC system (and API) will likely
need to be implemented on top of local RPC primitives anyway. Even if
this ends up **not** being true we can always go to "function stubs" as
part of our IAC protocol or add a new method to do explicit namespace
calls: `.run_from_module()` or whatever everyone votes on.
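As an illustration of the "still done implicitly underneath" part, a function reference can be reduced to a module path plus a name and looked up again on the receiving side. This is only a sketch using the standard library; `to_ns_path` and `from_ns_path` are hypothetical helper names, not the project's API.

```python
import importlib
from typing import Callable, Tuple


def to_ns_path(func: Callable) -> Tuple[str, str]:
    """Reduce a function reference to (module path, function name)."""
    return func.__module__, func.__name__


def from_ns_path(modpath: str, funcname: str) -> Callable:
    """Look the function back up, as a receiving side might do."""
    module = importlib.import_module(modpath)
    return getattr(module, funcname)
```

This only round-trips for module-level functions that are importable under the same path on both sides; lambdas and nested functions would need something else.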
Resolves #69
Further, this commit drops `Actor.statespace` from the entire system
since a user can easily get this same functionality using module
level variables. Fix docs to match all these changes (luckily mostly
already done due to example scripts referencing).
Add a ``tractor._portal.StreamReceiveChannel.shield_channel()`` context
manager which allows for avoiding the closing of an IPC stream's
underlying channel for the purposes of task re-spawning. Sometimes you
might want to cancel a task consuming a stream but not tear down the IPC
between actors (the default). A common use case might be where the task's
"setup" work needs to be redone but you want to keep the
established portal / channel intact despite the task restart.
Includes a test.
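A toy, synchronous model of the shielding idea, to show the intended behaviour rather than the real implementation: `ToyChannel` and `ToyStream` are invented for this sketch, and only the method name `shield_channel()` is taken from the text above.

```python
from contextlib import contextmanager


class ToyChannel:
    """Hypothetical stand-in for the IPC channel shared between two actors."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class ToyStream:
    """Hypothetical receive stream that normally closes its channel on teardown."""

    def __init__(self, channel: ToyChannel) -> None:
        self._channel = channel
        self._shielded = False

    @contextmanager
    def shield_channel(self):
        # While active, tearing down the stream leaves the channel open.
        self._shielded = True
        try:
            yield self
        finally:
            self._shielded = False

    def close(self) -> None:
        if not self._shielded:
            self._channel.close()
```

Inside the `with stream.shield_channel():` block a consumer can be torn down and restarted while the channel stays usable; outside it, closing the stream closes the channel as usual.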
This appears to demonstrate the same bug found in #156. It looks like
cancelling a subactor with a child, while that child is running sync code,
can result in the child never getting cancelled due to some strange
condition where the internal nurseries aren't being torn down as
expected when a `trio.Cancelled` is raised.
The real issue is if the root nursery gets cancelled prior to
de-registration with the arbiter. This doesn't seem easy to
reproduce by side effect of a KBI however that is how it was
discovered in practise.
There was code from the last de-registration fix PR that I had commented
out (to do with shielding arbiter dereg steps in `Actor._async_main()`) because
the block didn't seem to make a difference under infinite streaming
tests. Turns out it **for sure** is needed under certain conditions (likely
if the actor's root nursery is cancelled prior to actor nursery exit).
This was an attempt to simulate the failure mode if you manually close the
stream **before** cancelling the containing **actor**.
More tests to come I guess.
This truly reproduces #141. It turns out the problem only occurs when
we're cancelled in the middle of consuming "infinite streams".
Good news is this tests a lot of edge cases :)
- ease up on first stream test run deadline
- skip streaming tests in CI for mp backend, period
- give up on > 1 depth nested spawning with mp
- completely give up on slow spawning on windows
Verify ctrl-c, as a user would trigger it, properly cancels the actor
tree. This was an issue with `trio-run-in-process` that clearly wasn't
being handled correctly but for sure is now with the plain old
`trio` process spawner.
Resolves #115
Parametrize our docs example test to include all (now fixed) examples
from the `README.rst`. The examples themselves have been fixed/corrected
to run but they haven't yet been updated in the actual docs. Once #99
lands these example scripts will be directly included in our
documentation so there will be no possibility of presenting incorrect
examples to our users! This technically fixes #108 even though the new
examples aren't going to be included directly in our docs until #99
lands.
Apply the fix from @chrizzFTD where we invoke the entry point using
module exec mode on a ``__main__.py`` and import the
``test_example``'s ``main()`` from within that entry point script.
As per #98 we need tests for examples from the docs as they would be run
by a user copy and pasting the code. This adds a small system for loading
examples from an "examples/" directory and executing them in
a subprocess while checking the output. We can use this to also verify
end-to-end expected logging output on std streams (ex. logging on
stderr).
To expand this further we can parameterize the test list using the
contents of the examples directory instead of hardcoding the script
names as I've done here initially.
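The mechanism could look roughly like the sketch below: collect the scripts from a directory and run each one in a fresh interpreter while capturing both std streams. The helper names `list_examples` and `run_example` and the 30 second timeout are assumptions made for illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def list_examples(examples_dir: Path) -> List[Path]:
    """Collect example scripts so the test list isn't hardcoded."""
    return sorted(examples_dir.glob("*.py"))


def run_example(script: Path, timeout: float = 30.0) -> subprocess.CompletedProcess:
    """Run one example the way a user would: as a script in a new process."""
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,  # keep stdout and stderr for later checks
        text=True,
        timeout=timeout,
    )
```

The result of `list_examples()` could then be fed to `pytest.mark.parametrize` so that every script dropped into the directory becomes its own test case, with assertions on `returncode`, `stdout` and `stderr`.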
Also, fix up the current readme examples to have the required/proper `if
__name__ == '__main__'` script guard.
Add a `--spawn-backend` option which can be set to one of {'mp',
'trio_run_in_process'} which will either run the test suite using the
`multiprocessing` or `trio-run-in-process` backend respectively.
Currently trying to run both in the same session can result in hangs
seemingly due to a lack of cleanup of forkservers / resource trackers
from `multiprocessing` which cause broken pipe errors on occasion (no
idea on the details).
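For reference, an option like this is usually registered through pytest's `pytest_addoption` hook in a `conftest.py`. The sketch below is not the exact code from this commit; in particular the `dest` name, the help text and the choice of `'mp'` as default are assumptions.

```python
def pytest_addoption(parser):
    """Register ``--spawn-backend`` (meant to live in a ``conftest.py``)."""
    parser.addoption(
        "--spawn-backend",
        action="store",
        dest="spawn_backend",  # assumed attribute name
        default="mp",  # assumed default; the text above does not state one
        choices=("mp", "trio_run_in_process"),
        help="Process spawning backend to run the test suite with",
    )
```

A fixture or hook can then read the value with `config.getoption("--spawn-backend")` and configure the backend once per session, which also keeps the two backends from being mixed in a single run.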
For `test_cancellation.py::test_nested_multierrors`, use less nesting
when mp is used since it breaks if we push it too hard with the
whole recursive subprocess spawning thing...
It seems that mixing the two backends in the test suite results in hangs
due to lingering forkservers and resource managers from
`multiprocessing`? Likely we'll need either 2 separate CI runs to work
or some way to be sure that these lingering servers are killed in between
tests.
Another step toward having a complete test for #89.
Subactor breadth still seems to cause the most havoc and is why I've
kept that value to just 2 for now.
Add a test to verify that `trio.MultiError`s are properly propagated up
a simple actor nursery tree. We don't have any exception marshalling
between processes (yet) so we can't validate much more than a simple
2-depth tree. This satisfies the final bullet in #43.
Note I've limited the number of subactors per layer to around 5 since
any more than this seems to break the `multiprocessing` forkserver;
zombie subprocesses seem to be blocking teardown somehow...
Also add a single depth fast fail test just to verify that it's the
nested spawning that triggers this forkserver bug.
Can't seem to get the `capfd` fixture to capture subprocess logging to
stderr even though the console report shows the log message as being
captured? Skipping the test on the forkserver method for now.
In combination with `.aclose()`-ing the async gen instance returned from
`Portal.run()` this demonstrates the python bug:
https://bugs.python.org/issue32526
I've commented out the line that triggers the bug for now since this
case provides motivation for adding our own `trio.abc.ReceiveChannel`
implementation to be used instead of async gens directly (returned from
`Portal.run()`) since the latter is **not** task safe.
- steal from `trio` and add a `tractor_test` decorator
- use a random arbiter port to avoid conflicts with locally running
systems
- add all the (obviously) hilarious readme tests
- add a complex cancellation test which works with
`trio.move_on_after()`
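A rough idea of what a test decorator along these lines might do, sketched with `asyncio` so it runs without extra dependencies: wrap an async test function so it can be called synchronously, and inject a randomly chosen port. The names `async_test` and `arb_addr`, the loopback host and the port range are all assumptions, not the actual `tractor_test` code.

```python
import asyncio
import functools
import inspect
import random


def async_test(fn):
    """Run an async test function to completion from a sync test runner."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Only inject the address if the test asks for it by name.
        if "arb_addr" in inspect.signature(fn).parameters:
            port = random.randint(10000, 60000)  # assumed range
            kwargs.setdefault("arb_addr", ("127.0.0.1", port))
        return asyncio.run(fn(*args, **kwargs))

    return wrapper
```

A random port lowers the chance of colliding with a locally running system but does not guarantee that the port is free.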
Remove all the `piker` stuff and add some further checks including:
- main task result is returned correctly
- remote errors are raised locally
- remote async generator yields values locally