tractor

Commit Graph

Author	SHA1	Message	Date
Tyler Goodlet	687852f368	Add stale entry deleted from registrar test By spawning an actor task that immediately shuts down the transport server and then sleeps, verify that attempting to connect via the `._discovery.find_actor()` helper delivers `None` for the `Portal` value. Relates to #184 and #216	2023-08-28 12:20:12 -04:00
Tyler Goodlet	d83d991f21	Handle stale registrar entries; detect and delete In cases where an actor's transport server task (by default handling new TCP connections) terminates early but does not de-register from the pertaining registry (aka the registrar) actor's address table, the trying-to-connect client actor will get a connection error on that address. In the case where the client handles a (local) `OSError` (meaning the target actor address is likely being contacted over `localhost`) exception, make a further call to the registrar to delete the stale entry and `yield None` gracefully indicating to calling code that no `Portal` can be delivered to the target address. This issue was originally discovered in `piker` where the `emsd` (clearing engine) actor would sometimes crash on rapid client re-connects and then leave a `pikerd` stale entry. With this fix new clients will attempt to connect via an endpoint which will re-spawn the `emsd` when a `None` portal is delivered (via `maybe_spawn_em()`).	2023-08-28 11:26:36 -04:00
Tyler Goodlet	1cf712cfac	Add `Arbiter.delete_sockaddr()` to remove addrs Since stale addrs can be leaked where the actor transport server task crashes but doesn't (successfully) unregister from the registrar, we need a remote way to remove such entries; hence this new (registrar) method. To implement this make use of the `bidict` lib for the `._registry` table thus making it super simple to do reverse uuid lookups from an input socket-address.	2023-08-21 19:07:14 -04:00
Tyler Goodlet	22c14e235e	Expose `Channel` @ pkg level, drop `_debug.pp()` alias	2023-08-18 10:18:25 -04:00
Tyler Goodlet	1102843087	Teensie tidy up on actor doc string	2023-08-18 10:10:36 -04:00
Tyler Goodlet	e03bec5efc	Move `.to_asyncio` to modern optional value type annots	2023-07-21 15:08:46 -04:00
Tyler Goodlet	bee2c36072	Make `NamespacePath` work on object refs Detect if the input ref is a non-func (like an `object` instance) in which case grab its type name using `type()`. Wrap all the name-getting into a new `_mk_fqpn()` static meth: gets the "fully qualified path name" and returns path and name in tuple; port other methods to use it. Refine and update the docs B)	2023-07-12 13:07:30 -04:00
Tyler Goodlet	b36b3d522f	Map `breakpoint()` built-in to new `.pause_from_sync()` ep	2023-07-07 15:35:52 -04:00
Tyler Goodlet	4ace8f6037	Fix frame-selection display on first REPL entry For whatever reason pdb(p), and in general, will show the frame of the next python instruction/LOC on initial entry (at least using `.set_trace()`), as such remove the `try/finally` block in the sync code entrypoint `.pause_from_sync()`, since it doesn't seem like we really need it anyway. Further, and to this end: - enable hidden frames support in our default config. - fix/drop/mask all the frame ref-ing/mangling we had prior since it's no longer needed as well as manual `Lock` releasing which seems to work already by having the `greenback` spawned task do its normal thing? - move to no `Union` type annots. - hide all frames that can add "this is the runtime confusion" to traces.	2023-07-07 14:51:44 -04:00
Tyler Goodlet	98a7326c85	._runtime: log level tweaks, use crit for stale debug lock detection	2023-07-07 14:49:23 -04:00
Tyler Goodlet	46972df041	.log: more correct handling for `get_logger(__name__)` usage	2023-07-07 14:48:37 -04:00
Tyler Goodlet	565d7c3ee5	Add longer "required reading" list B)	2023-07-07 14:47:42 -04:00
Tyler Goodlet	ac695a05bf	Updates from latest `piker.data._sharedmem` changes	2023-06-22 17:16:17 -04:00
Tyler Goodlet	fc56971a2d	First proto: use `greenback` for sync func breakpointing This works now for supporting a new `tractor.pause_from_sync()` `tractor`-aware-replacement for `Pdb.set_trace()` from sync functions which are also scheduled from our runtime. Uses `greenback` to do all the magic of scheduling the bg `tractor._debug._pause()` task and engaging the normal TTY locking machinery triggered by `await tractor.breakpoint()`. Further this starts some public API renaming, making a switch to `tractor.pause()` from `.breakpoint()` which IMO much better expresses the semantics of the runtime intervention required to suffice multi-process "breakpointing"; it also is an alternate name for the same in computer science more generally: https://en.wikipedia.org/wiki/Breakpoint It also avoids using the same name as the `breakpoint()` built-in which is important since there is a lot more going on when you call our equivalent API. Deats of that: - add deprecation warning for `tractor.breakpoint()` - add `tractor.pause()` and a shorthand, easier-to-type, alias `.pp()` for "pause-point" B) - add `pause_from_sync()` as the new `breakpoint()`-from-sync-function hack which does all the `greenback` stuff for the user. Still TODO: - figure out where in the runtime and when to call `greenback.ensure_portal()`. - fix the frame selection issue where `trio._core._ki._ki_protection_decorator:wrapper` seems to be always shown on REPL start as the selected frame..	2023-06-21 16:08:18 -04:00
Tyler Goodlet	ee87cf0e29	Add a debug-mode-breakpoint-causes-hang case! Only found this by luck more or less (while working on something in a client project) and it turns out we can actually get to (yet another) hang state where SIGINT will be ignored by the root actor on teardown.. I've added all the necessary logic flags to reproduce. We obviously need a follow up bug issue and a test suite to replicate! It appears as though the following are required based on very light tinkering: - infected asyncio mode active - debug mode active - the `trio` context must breakpoint before `.started()`-ing - the `asyncio` must not error	2023-06-21 14:07:31 -04:00
Tyler Goodlet	ebcb275cd8	Add (first-draft) infected-`asyncio` actor task uses debugger example	2023-06-21 14:07:31 -04:00
Tyler Goodlet	f745da9fb2	Add `numpy` for testing optional integrated shm API layer	2023-06-15 12:20:20 -04:00
Tyler Goodlet	4f442efbd7	Pass `str` dtype for `use_str` case	2023-06-15 12:20:20 -04:00
Tyler Goodlet	f9a84f0732	Allocate size-specced "empty" sequence from default values by type	2023-06-15 12:20:20 -04:00
Tyler Goodlet	e0bf964ff0	Mod define `_USE_POSIX`, add a lot of todos	2023-06-15 12:20:20 -04:00
Tyler Goodlet	a9fc4c1b91	Parametrize rw test with variable frame sizes Demonstrates fixed size frame-oriented reads by the child where the parent only transmits a "read" stream msg on "frame fill events" such that the child incrementally reads the shm list data (much like in a real-time-buffered streaming system).	2023-06-15 12:20:20 -04:00
Tyler Goodlet	b52ff270c5	Add `ShmList` slice support in `.__getitem__()`	2023-06-15 12:20:20 -04:00
Tyler Goodlet	1713ecd9f8	Rename token type to `NDToken` in the style of `nptyping`	2023-06-15 12:20:20 -04:00
Tyler Goodlet	edb82fdd78	Don't require runtime (for now), type annot fixing	2023-06-15 12:20:20 -04:00
Tyler Goodlet	339d787cf8	Add repetitive attach to existing segment test	2023-06-15 12:20:20 -04:00
Tyler Goodlet	c32b21b4b1	Add initial readers-writer shm list tests	2023-06-15 12:20:20 -04:00
Tyler Goodlet	71477290fc	Add `ShmList` wrapping the stdlib's `ShareableList` First attempt at getting `multiprocessing.shared_memory.ShareableList` working; we wrap the stdlib type with a readonly attr and a `.key` for cross-actor lookup. Also, rename all `numpy` specific routines to have a `ndarray` suffix in the func names.	2023-06-15 12:20:20 -04:00
Tyler Goodlet	9716d86825	Initial module import from `piker.data._sharedmem` More or less a verbatim copy-paste minus some edgy variable naming and internal `piker` module imports. There is a bunch of OHLC related defaults that need to be dropped and we need to adjust to an optional dependence on `numpy` by supporting shared lists as per the mp docs.	2023-06-15 12:20:20 -04:00
Tyler Goodlet	7507e269ec	Just import `mp` top level in `._spawn`	2023-06-14 15:32:15 -04:00
Tyler Goodlet	17ae449160	Tidy up `typing` imports in broadcaster mod	2023-06-14 15:31:52 -04:00
Tyler Goodlet	6495688730	Drop `Optional` style from runtime mod	2023-05-25 16:00:05 -04:00
Tyler Goodlet	a0276f41c2	Remote cancellation runtime-internal vars renames - `Context._cancel_called_remote` -> `._cancelled_remote` since "called" implies the cancellation was "requested" when it could be due to another error and the actor uid is the value - only set once the far end task scope is terminated due to either error or cancel, which has nothing to do with what caused the cancellation. - `Actor._cancel_called_remote` -> `._cancel_called_by_remote` which emphasizes that this variable is only set IFF some remote actor requested that this actor's runtime be cancelled via `Actor.cancel()`.	2023-05-19 14:31:55 -04:00
Tyler Goodlet	ead9e418de	Expose `allow_overruns` to `Portal.open_context()` Turns out you can get a case where you might be opening multiple ctx-streams concurrently and during the context opening phase you block for all contexts to open, but then when you eventually start opening streams some slow to start context has caused the others to enter an overrun state.. so we need to let the caller control whether that's an error ;) This also needs a test!	2023-05-15 10:00:45 -04:00
Tyler Goodlet	60791ed546	Oof, fix remaining `Actor.cancel()` in `Actor._from_parent()`	2023-05-15 10:00:45 -04:00
Tyler Goodlet	7293b82bcc	Tweak doc string	2023-05-15 10:00:45 -04:00
Tyler Goodlet	20d75ff934	Move context code into new `._context` mod	2023-05-15 10:00:45 -04:00
Tyler Goodlet	041d7da721	Drop caller cancels overrun test; covered in new tests	2023-05-15 10:00:45 -04:00
Tyler Goodlet	04e4397a8f	Ignore drainer-task nursery RTE during context exit	2023-05-15 10:00:45 -04:00
Tyler Goodlet	968f13f9ef	Set `Context._scope_nursery` on callee side too Because obviously we probably want to support `allow_overruns` on the remote callee side as well XD Only found the bugs fixed in this patch thanks to writing a much more exhaustive test set for overrun cases B)	2023-05-15 10:00:45 -04:00
Tyler Goodlet	f9911c22a4	Seriously cover all overrun cases This actually caught further runtime bugs so it's gud i tried.. Add overrun-ignore enabled / disabled cases and error catching for all of them. More or less this should cover every possible outcome when it comes to setting `allow_overruns: bool` i hope XD	2023-05-15 10:00:45 -04:00
Tyler Goodlet	63adf73b4b	Adjust aio test for silent cancellation by parent	2023-05-15 10:00:45 -04:00
Tyler Goodlet	f1e9c0be93	Fix cluster test to use `allow_overruns`	2023-05-15 10:00:45 -04:00
Tyler Goodlet	6db656fecf	Flip allocate log msgs to debug	2023-05-15 10:00:45 -04:00
Tyler Goodlet	6994d2026d	Drop backpressure usage from fan out tests	2023-05-15 10:00:45 -04:00
Tyler Goodlet	c72026091e	Remote `Context` cancellation semantics rework B) This adds remote cancellation semantics to our `tractor.Context` machinery to more closely match that of `trio.CancelScope` but with operational differences to handle the nature of parallel tasks interoperating across multiple memory boundaries: - if an actor task cancels some context it has opened via `Context.cancel()`, the remote (scope linked) task will be cancelled using the normal `CancelScope` semantics of `trio` meaning the remote cancel scope surrounding the far side task is cancelled and `trio.Cancelled`s are expected to be raised in that scope as per normal `trio` operation, and in the case where no error is raised in that remote scope, a `ContextCancelled` error is raised inside the runtime machinery and relayed back to the opener/caller side of the context. - if any actor task cancels a full remote actor runtime using `Portal.cancel_actor()` the same semantics as above apply except every other remote actor task which also has an open context with the actor which was cancelled will also be sent a `ContextCancelled` but with the `.canceller` field set to the uid of the original cancel requesting actor. This changeset also includes a more "proper" solution to the issue of "allowing overruns" during streaming without attempting to implement any form of IPC streaming backpressure. Implementing task-granularity backpressure cross-process turns out to be more or less impossible without augmenting our streaming protocol (likely at the cost of performance). Further allowing overruns requires special care since any blocking of the runtime RPC msg loop task effectively can block control msgs such as cancels and stream terminations. The implementation details per abstraction layer are as follows. ._streaming.Context: - add a new constructor factory func `mk_context()` which provides a strictly private init-er whilst allowing us to not have to define an `.__init__()` on the type def.
- add public `.cancel_called` and `.cancel_called_remote` properties. - general rename of what was the internal `._backpressure` var to `._allow_overruns: bool`. - move the old contents of `Actor._push_result()` into a new `._deliver_msg()` allowing for better encapsulation of per-ctx msg handling. - always check for received 'error' msgs and process them with the new `_maybe_cancel_and_set_remote_error()` before any msg delivery to the local task, thus guaranteeing error and cancellation handling despite any overflow handling. - add a new `._drain_overflows()` task-method for use with new `._allow_overruns: bool = True` mode. - add back a `._scope_nursery: trio.Nursery` (allocated in `Portal.open_context()`) whose sole purpose is to spawn a single task which runs the above method; anything else is an error. - augment `._deliver_msg()` to start a task and run the above method when operating in no overrun mode; the task queues overflow msgs and attempts to send them to the underlying mem chan using a blocking `.send()` call. - on context exit, any existing "drainer task" will be cancelled and remaining overflow queued msgs are discarded with a warning. - rename `._error` -> `_remote_error` and set it in a new method `_maybe_cancel_and_set_remote_error()` which is called before processing - adjust `.result()` to always call `._maybe_raise_remote_err()` at its start such that whenever a `ContextCancelled` arrives we do logic for whether or not to immediately raise that error or ignore it due to the current actor being the one who requested the cancel, by checking the error's `.canceller` field. - set the default value of `._result` to be `id(Context())` thus avoiding conflict with any `.result()` actually being `False`.. ._runtime.Actor: - augment `.cancel()` and `._cancel_task()` and `.cancel_rpc_tasks()` to take a `requesting_uid: tuple` indicating the source actor of every cancellation request.
- pass through the new `Context._allow_overruns` through `.get_context()` - call the new `Context._deliver_msg()` from `._push_result()` (since factoring out that method's contents). ._runtime._invoke: - `TaskStatus.started()` back a `Context` (unless an error is raised) instead of the cancel scope to make it easy to set/get state on that context for the purposes of cancellation and remote error relay. - always raise any remote error via `Context._maybe_raise_remote_err()` before doing any `ContextCancelled` logic. - assign any `Context._cancel_called_remote` set by the `requesting_uid` cancel methods (mentioned above) to the `ContextCancelled.canceller`. ._runtime.process_messages: - always pass a `requesting_uid: tuple` to `Actor.cancel()` and `._cancel_task` so that any corresponding `ContextCancelled.canceller` can be set inside `._invoke()`.	2023-05-15 10:00:45 -04:00
Tyler Goodlet	90e41016b9	Only tuplize `.canceller` if non-`None`	2023-05-15 10:00:45 -04:00
Tyler Goodlet	f54c415060	Move `NoRuntime` import inside `current_actor()` to avoid cycle	2023-05-15 10:00:45 -04:00
Tyler Goodlet	03644f59cc	Augment test cases for callee-returns-result early Turns out stuff was totally broken in these cases because we're either closing the underlying mem chan too early or not handling the "allow_overruns" mode's cancellation correctly..	2023-05-15 10:00:45 -04:00
Tyler Goodlet	67f82c6ebd	Add new remote error introspection attrs To handle remote cancellation this adds `ContextCancelled.canceller: tuple`, the uid of the cancel requesting actor, which is expected to be set by the runtime when servicing any remote cancel request. This makes it possible for `ContextCancelled` receivers to know whether "their actor runtime" is the source of the cancellation. Also add an explicit `RemoteActor.src_actor_uid` which better formalizes the notion of "which remote actor" the error originated from. Both of these new attrs are expected to be packed in the `.msgdata` when the errors are loaded locally.	2023-05-15 10:00:45 -04:00
Tyler Goodlet	71cd445319	Add new set of context cancellation tests These will verify new changes to the runtime/messaging core which allows us to adopt an "ignore cancel if requested by us" style handling of `ContextCancelled` more like how `trio` does with `trio.Nursery.cancel_scope.cancel()`. We now expect a `ContextCancelled.canceller: tuple` which is set to the actor uid of the actor which requested the cancellation which eventually resulted in the remote error-msg. Also adds some experimental tweaks to the "backpressure" test which it turns out is very problematic in coordination with context cancellation since blocking on the feed mem chan to some task will block the ipc msg loop and thus handling of cancellation.. More to come to both the test and core to address this hopefully since right now this test is failing.	2023-05-15 10:00:45 -04:00
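
The stale-entry handling described in commits d83d991f21 and 1cf712cfac can be sketched roughly as below. This is a toy, stdlib-only illustration and not `tractor`'s implementation: the module-level `registry` dict, the plain-socket `find_actor()` and the helper `unused_local_addr()` are all hypothetical stand-ins, and the linear scan in `delete_sockaddr()` replaces the `bidict` reverse lookup that the commit message mentions.

```python
import socket

# Hypothetical stand-in for the registrar's address table: name -> (host, port).
registry: dict[str, tuple[str, int]] = {}


def delete_sockaddr(sockaddr):
    """Remove every entry registered at ``sockaddr``; return the removed names."""
    stale = [name for name, addr in registry.items() if addr == sockaddr]
    for name in stale:
        del registry[name]
    return stale


def find_actor(name, timeout=1.0):
    """Return a connected socket for ``name``, or None if unknown or stale."""
    sockaddr = registry.get(name)
    if sockaddr is None:
        return None
    try:
        return socket.create_connection(sockaddr, timeout=timeout)
    except OSError:
        # Nothing is listening at that address any more: purge the stale entry
        # and signal "no connection available" to the caller.
        delete_sockaddr(sockaddr)
        return None


def unused_local_addr():
    """Bind to an ephemeral port, then close it so nothing is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()
```

Registering a name at an address returned by `unused_local_addr()` and then calling `find_actor()` on it should return `None` and leave the name absent from `registry`, which mirrors the "deliver `None` so the caller can re-spawn" behaviour described in the log.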

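The "ignore cancel if requested by us" handling of `ContextCancelled.canceller` described in commits 71cd445319, 67f82c6ebd and 90e41016b9 might look something like the following minimal sketch. The class here is a toy stand-in rather than `tractor`'s real exception type, and the free function `maybe_raise_remote_err()` with its `our_uid` parameter is a hypothetical simplification of the method named in the log.

```python
class ContextCancelled(Exception):
    """Toy stand-in for the remote-cancel error; not tractor's real class."""

    def __init__(self, message, canceller=None):
        super().__init__(message)
        # uid of the actor that requested the cancel, e.g. ("name", "uuid").
        # Only convert to a tuple when a value was actually provided.
        self.canceller = tuple(canceller) if canceller is not None else None


def maybe_raise_remote_err(err, our_uid):
    """Swallow a cancel that we requested ourselves; re-raise everything else."""
    if isinstance(err, ContextCancelled) and err.canceller == our_uid:
        return None  # self-requested cancel: treat it as a graceful exit
    raise err
```

The design point is that the receiver compares the error's `.canceller` against its own uid, so a cancel it asked for is not surfaced as a failure while a cancel requested by any other actor still propagates.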

1516 Commits (687852f368257dc5b4870ad8112bb9a7fd3bff5c) All Branches Search
