Such that you can use,
```python
tractor.to_asyncio.run_as_asyncio_guest(
trio_main=_trio_main,
)
```
to boostrap the root actor (and thus main parent process) to embed
the actor-rumtime into an `asyncio` loop. Prove it all works with an
subactor-free version of the aio echo-server test suite B)
By re-purposing our `pexpect`-based console matching with a new
`debugging/shield_hang_in_sub.py` example, this tests a few "hanging
actor" conditions more formally:
- that despite a hanging actor's task we can dump
a `stackscope.extract()` tree on relay of `SIGUSR1`.
- the actor tree will terminate despite a shielded forever-sleep by our
"T-800" zombie reaper machinery activating and hard killing the
underlying subprocess.
Some test deats:
- simulates the expect actions of a real user by manually using
`os.kill()` to send both signals to the actor-tree program.
- `pexpect`-matches against `log.devx()` emissions under normal
`debug_mode == True` usage.
- ensure we get the actual "T-800 deployed" `log.error()` msg and
that the actor tree eventually terminates!
Surrounding (re-org/impl/test-suite) changes:
- allow disabling usage via a `maybe_enable_greenback: bool` to
`open_root_actor()` but enable by def.
- pretty up the actual `.devx()` content from `.devx._stackscope`
including be extra pedantic about the conc-primitives for each signal
event.
- try to avoid double handles of `SIGUSR1` even though it seems the
original (what i thought was a) problem was actually just double
logging in the handler..
|_ avoid double applying the handler func via `signal.signal()`,
|_ use a global to avoid double handle func calls and,
|_ a `threading.RLock` around handling.
- move common fixtures and helper routines from `test_debugger` to
`tests/devx/conftest.py` and import them for use in both test mods.
Since we more or less require it for `tractor.pause_from_sync()` this
refines enable toggles and their relay down the actor tree as well as
more explicit logging around init and activation.
Tweaks summary:
- `.info()` report the module if discovered during root boot.
- use a `._state._runtime_vars['use_greenback']: bool` activation flag
inside `Actor._from_parent()` to determine if the sub should try to
use it and set to `False` if mod-loading fails / not installed.
- expose `maybe_init_greenback()` from `.devx` sugpkg.
- comment out RTE in `._pause()` for now since we already have it in
`.pause_from_sync()`.
- always `.exception()` on `maybe_init_greenback()` import errors to
clarify the underlying failure deats.
- always explicitly report if `._state._runtime_vars['use_greenback']`
was NOT set when `.pause_from_sync()` is called.
Other `._runtime.async_main()` adjustments:
- combine the "internal error call ur parents" message and the failed
registry contact status into one new `err_report: str`.
- drop the final exception handler's call to
`Actor.lifetime_stack.close()` since we're already doing it in the
`finally:` block and the earlier call has no currently known benefit.
- only report on the `.lifetime_stack()` callbacks if any are detected
as registered.
From both `open_root_actor()` and `._entry._trio_main()`.
Other `breakpoint()`-from-sync-func fixes:
- properly disable the default hook using `"0"` XD
- offer a `hide_tb: bool` from `open_root_actor()`.
- disable hiding the `._trio_main()` frame, bc pretty sure it doesn't
help anyone (either way) when REPL-ing/tb-ing from a subactor..?
- start tossing in `__tracebackhide__`s to various eps which don't need
to show in tbs or in the pdb REPL.
- port final `._maybe_enter_pm()` to pass a `api_frame`.
- start comment-marking up some API eps with `@api_frame`
in prep for actually using the new frame-stack tracing.
Breaks out all the (sub)actor local conc primitives from `Lock` (which
is now only used in and by the root actor) such that there's an explicit
distinction between a task that's "consuming" the `Lock` (remotely) vs.
the root-side service tasks which do the actual acquire on behalf of the
requesters.
`DebugStatus` changeover deats:
------ - ------
- move all the actor-local vars over `DebugStatus` including:
- move `_trio_handler` and `_orig_sigint_handler`
- `local_task_in_debug` now `repl_task`
- `_debugger_request_cs` now `req_cs`
- `local_pdb_complete` now `repl_release`
- drop all ^ fields from `Lock.repr()` obvi..
- move over the `.[un]shield_sigint()` and
`.is_main_trio_thread()` methods.
- add some new attrs/meths:
- `DebugStatus.repl` for the currently running `Pdb` in-actor
singleton.
- `.repr()` for pprint of state (like `Lock`).
- Note: that even when a root-actor task is in REPL, the `DebugStatus`
is still used for certain actor-local state mgmt, such as SIGINT
handler shielding.
- obvi change all lock-requester code bits to now use a `DebugStatus` in
their local actor-state instead of `Lock`, i.e. change usage from
`Lock` in `._runtime` and `._root`.
- use new `Lock.get_locking_task_cs()` API in when checking for
sub-in-debug from `._runtime.Actor._stream_handler()`.
Unrelated to topic-at-hand tweaks:
------ - ------
- drop the commented bits about hiding `@[a]cm` stack frames from
`_debug.pause()` and simplify to only one block with the `shield`
passthrough since we already solved the issue with cancel-scopes using
`@pdbp.hideframe` B)
- this includes all the extra logging about the extra frame for the
user (good thing i put in that wasted effort back then eh..)
- put the `try/except BaseException` with `log.exception()` around the
whole of `._pause()` to ensure we don't miss in-func errors which can
cause hangs..
- allow passing in `portal: Portal` to
`Actor.start_remote_task()` such that `Portal` task spawning methods
are always denoted correctly in terms of `Context.side`.
- lotsa logging tweaks, decreasing a bit of noise from `.runtime()`s.
Such that the top level `maybe_enable_greenback` from
`open_root_actor()` can toggle the entire actor tree's usage.
Read the rtv in `._rpc` tasks and only enable if set.
Also, rigor up the `._rpc.process_messages()` loop to handle `Error()`
and `case _:` separately such that we now raise an explicit rte for
unknown / invalid msgs. Use "parent" / "child" for side descriptions in
loop comments and put a fat comment before the `StartAck` in `_invoke()`.
Not sure if this is a good tactic (yet) but it at least covers us from
getting user's confused by `breakpoint()` usage causing REPL clobbering.
Always set an explicit rte raising breakpoint hook such that the user
realizes they can't use `.pause_from_sync()` without enabling debug
mode.
** CHERRY-PICK into `pause_from_sync_w_greenback` branch! **
Now supports use from any `trio` task, any sync thread started with
`trio.to_thread.run_sync()` AND also via `breakpoint()` builtin API!
The only bit missing now is support for `asyncio` tasks when in infected
mode.. Bo
`greenback` setup/API adjustments:
- move `._rpc.maybe_import_gb()` to -> `devx._debug` and factor out the cached
import checking into a sync func whilst placing the async `.ensure_portal()`
bootstrapping into a new async `maybe_init_greenback()`.
- use the new init-er func inside `open_root_actor()` with the output
predicating whether we override the `breakpoint()` hook.
core `devx._debug` implementation deatz:
- make `mk_mpdb()` only return the `pdp.Pdb` subtype instance since
the sigint unshielding func is now accessible from the `Lock`
singleton from anywhere.
- add non-main thread support (at least for `trio.to_thread` use cases)
to our `Lock` with a new `.is_trio_thread()` predicate that delegates
directly to `trio`'s internal version.
- do `Lock.is_trio_thread()` checks inside any methods which require
special provisions when invoked from a non-main `trio` thread:
- `.[un]shield_sigint()` methods since `signal.signal` usage is only
allowed from cpython's main thread.
- `.release()` since `trio.StrictFIFOLock` can only be called from
a `trio` task.
- rework `.pause_from_sync()` itself to directly call `._set_trace()`
and don't bother with `greenback._await()` when we're already calling
it from a `.to_thread.run_sync()` thread, oh and try to use the
thread/task name when setting `Lock.local_task_in_debug`.
- make it an RTE for now if you try to use `.pause_from_sync()` from any
infected-`asyncio` task, but support is (hopefully) coming soon!
For testing we add a new `test_debugger.py::test_pause_from_sync()`
which includes a ctrl-c parametrization around the
`examples/debugging/sync_bp.py` script which includes all currently
supported/working usages:
- `tractor.pause_from_sync()`.
- via `breakpoint()` overload.
- from a `trio.to_thread.run_sync()` spawn.
- `trio_typing` is nearly obsolete since `trio >= 0.23`
- `exceptiongroup` is built-in to python 3.11
- `async_generator` primitives have lived in `contextlib` for quite
a while!
Not sure if it's really that useful other then for reporting errors from
`current_actor()` but at least it alerts `tractor` devs and/or users
when the runtime has already terminated vs. hasn't been started
yet/correctly.
Set the `._last_actor_terminated: tuple` in the root's final block which
allows testing for an already terminated tree which is the case where
`._state._current_actor == None` and the last is set.
If `stackscope` is importable and debug_mode is enabled then we by
default call and report `.devx.enable_stack_on_sig()` is set B)
This makes debugging unexpected (SIGINT ignoring) hangs a cinch!
Where `.devx` is "developer experience", a hopefully broad enough subpkg
name for all the slick stuff planned to augment working on the actor
runtime 💥
Move the `._debug` module into the new subpkg and adjust rest of core
code base to reflect import path change. Also add a new
`.devx._debug.open_crash_handler()` manager for wrapping any sync code
outside a `trio.run()` which is handy for eventual CLI addons for
popular frameworks like `click`/`typer`.
Since we'd like to eventually allow a diverse set of transport
(protocol) methods and stacks, and a multi-peer discovery system for
distributed actor-tree applications, this reworks all runtime internals
to support multi-homing for any given tree on a logical host. In other
words any actor can now bind its transport server (currently only
unsecured TCP + `msgspec`) to more then one address available in its
(linux) network namespace. Further, registry actors (now dubbed
"registars" instead of "arbiters") can also similarly bind to multiple
network addresses and provide discovery services to remote actors via
multiple addresses which can now be provided at runtime startup.
Deats:
- adjust `._runtime` internals to use a `list[tuple[str, int]]` (and
thus pluralized) socket address sequence where applicable for transport
server socket binds, now exposed via `Actor.accept_addrs`:
- `Actor.__init__()` now takes a `registry_addrs: list`.
- `Actor.is_arbiter` -> `.is_registrar`.
- `._arb_addr` -> `._reg_addrs: list[tuple]`.
- always reg and de-reg from all registrars in `async_main()`.
- only set the global runtime var `'_root_mailbox'` to the loopback
address since normally all in-tree processes should have access to
it, right?
- `._serve_forever()` task now takes `listen_sockaddrs: list[tuple]`
- make `open_root_actor()` take a `registry_addrs: list[tuple[str, int]]`
and defaults when not passed.
- change `ActorNursery.start_..()` methods take `bind_addrs: list` and
pass down through the spawning layer(s) via the parent-seed-msg.
- generalize all `._discovery()` APIs to accept `registry_addrs`-like
inputs and move all relevant subsystems to adopt the "registry" style
naming instead of "arbiter":
- make `find_actor()` support batched concurrent portal queries over
all provided input addresses using `.trionics.gather_contexts()` Bo
- syntax: move to using `async with <tuples>` 3.9+ style chained
@acms.
- a general modernization of the code to a python 3.9+ style.
- start deprecation and change to "registry" naming / semantics:
- `._discovery.get_arbiter()` -> `.get_registry()`
This adds remote cancellation semantics to our `tractor.Context`
machinery to more closely match that of `trio.CancelScope` but
with operational differences to handle the nature of parallel tasks interoperating
across multiple memory boundaries:
- if an actor task cancels some context it has opened via
`Context.cancel()`, the remote (scope linked) task will be cancelled
using the normal `CancelScope` semantics of `trio` meaning the remote
cancel scope surrounding the far side task is cancelled and
`trio.Cancelled`s are expected to be raised in that scope as per
normal `trio` operation, and in the case where no error is raised
in that remote scope, a `ContextCancelled` error is raised inside the
runtime machinery and relayed back to the opener/caller side of the
context.
- if any actor task cancels a full remote actor runtime using
`Portal.cancel_actor()` the same semantics as above apply except every
other remote actor task which also has an open context with the actor
which was cancelled will also be sent a `ContextCancelled` **but**
with the `.canceller` field set to the uid of the original cancel
requesting actor.
This changeset also includes a more "proper" solution to the issue of
"allowing overruns" during streaming without attempting to implement any
form of IPC streaming backpressure. Implementing task-granularity
backpressure cross-process turns out to be more or less impossible
without augmenting out streaming protocol (likely at the cost of
performance). Further allowing overruns requires special care since
any blocking of the runtime RPC msg loop task effectively can block
control msgs such as cancels and stream terminations.
The implementation details per abstraction layer are as follows.
._streaming.Context:
- add a new contructor factor func `mk_context()` which provides
a strictly private init-er whilst allowing us to not have to define
an `.__init__()` on the type def.
- add public `.cancel_called` and `.cancel_called_remote` properties.
- general rename of what was the internal `._backpressure` var to
`._allow_overruns: bool`.
- move the old contents of `Actor._push_result()` into a new
`._deliver_msg()` allowing for better encapsulation of per-ctx
msg handling.
- always check for received 'error' msgs and process them with the new
`_maybe_cancel_and_set_remote_error()` **before** any msg delivery to
the local task, thus guaranteeing error and cancellation handling
despite any overflow handling.
- add a new `._drain_overflows()` task-method for use with new
`._allow_overruns: bool = True` mode.
- add back a `._scope_nursery: trio.Nursery` (allocated in
`Portal.open_context()`) who's sole purpose is to spawn a single task
which runs the above method; anything else is an error.
- augment `._deliver_msg()` to start a task and run the above method
when operating in no overrun mode; the task queues overflow msgs and
attempts to send them to the underlying mem chan using a blocking
`.send()` call.
- on context exit, any existing "drainer task" will be cancelled and
remaining overflow queued msgs are discarded with a warning.
- rename `._error` -> `_remote_error` and set it in a new method
`_maybe_cancel_and_set_remote_error()` which is called before
processing
- adjust `.result()` to always call `._maybe_raise_remote_err()` at its
start such that whenever a `ContextCancelled` arrives we do logic for
whether or not to immediately raise that error or ignore it due to the
current actor being the one who requested the cancel, by checking the
error's `.canceller` field.
- set the default value of `._result` to be `id(Context()` thus avoiding
conflict with any `.result()` actually being `False`..
._runtime.Actor:
- augment `.cancel()` and `._cancel_task()` and `.cancel_rpc_tasks()` to
take a `requesting_uid: tuple` indicating the source actor of every
cancellation request.
- pass through the new `Context._allow_overruns` through `.get_context()`
- call the new `Context._deliver_msg()` from `._push_result()` (since
the factoring out that method's contents).
._runtime._invoke:
- `TastStatus.started()` back a `Context` (unless an error is raised)
instead of the cancel scope to make it easy to set/get state on that
context for the purposes of cancellation and remote error relay.
- always raise any remote error via `Context._maybe_raise_remote_err()`
before doing any `ContextCancelled` logic.
- assign any `Context._cancel_called_remote` set by the `requesting_uid`
cancel methods (mentioned above) to the `ContextCancelled.canceller`.
._runtime.process_messages:
- always pass a `requesting_uid: tuple` to `Actor.cancel()` and
`._cancel_task` to that any corresponding `ContextCancelled.canceller`
can be set inside `._invoke()`.
Previously we were leaking our (pdb++) override into the Python runtime
which would always result in a runtime error whenever `breakpoint()` is
called outside our runtime; after exit of the root actor . This
explicitly restores any previous hook override (detected during startup)
or deletes the hook and restores the environment if none existed prior.
Also adds a new WIP debugging example script to ensure breakpointing
works as normal after runtime close; this will be added to the test
suite.
Pretty sure this is the final touch to alleviate all our debug lock
headaches! Instead of trying to revert to the "last" handler (as `pdb`
does internally in the stdlib) we always just revert to the handler
`trio` registers during startup. Further this seems to allow cancelling
the root-side locking task if it's detected as stale IFF we only do this
when the root actor is in a "no more IPC peers" state.
Deatz:
- (always) set `._debug.Lock._trio_handler` as the `trio` version, not
some last used handler to make sure we're getting the ctrl-c handling
we want when not in debug mode.
- assign the trio handler in `open_root_actor()`
`._runtime._async_main()` to be sure it's applied in subactors as well
as the root.
- only do debug lock blocking and root-side-locking-task cancels when
a "no peers" condition is detected in the root actor: i.e. no IPC
channels are detected by the root meaning it's impossible any actor
has a sane lock-state ongoing for debug mode.
Instead of the logic branching create a table `._spawn._methods`
which is used to lookup the desired backend framework (in this case
still only one of `multiprocessing` or `trio`) and make the top level
`.new_proc()` do the lookup and any common logic. Use a `typing.Literal`
to define the lookup table's key set.
Repair and ignore a bunch of type-annot related stuff todo with `mypy`
updates and backend-specific process typing.
This commit obviously denotes a re-license of all applicable parts of
the code base. Acknowledgement of this change was completed in #274 by
the majority of the current set of contributors. From here henceforth
all changes will be AGPL licensed and distributed. This is purely an
effort to maintain the same copy-left policy whilst closing the
(perceived) SaaS loophole the GPL allows for. It is merely for this
loophole: to avoid code hiding by any potential "network providers" who
are attempting to use the project to make a profit without either
compensating the authors or re-distributing their changes.
I thought quite a bit about this change and can't see a reason not to
close the SaaS loophole in our current license. We still are (hard)
copy-left and I plan to keep the code base this way for a couple
reasons:
- The code base produces income/profit through parent projects and is
demonstrably of high value.
- I believe firms should not get free lunch for the sake of
"contributions from their employees" or "usage as a service" which
I have found to be a dubious argument at best.
- If a firm who intends to profit from the code base wants to use it
they can propose a secondary commercial license to purchase with the
proceeds going to the project's authors under some form of well
defined contract.
- Many successful projects like Qt use this model; I see no reason it
can't work in this case until such a time as the authors feel it
should be loosened.
There has been detailed discussion in #103 on licensing alternatives.
The main point of this AGPL change is to protect the code base for the
time being from exploitation while it grows and as we move into the next
phase of development which will include extension into the multi-host
distributed software space.