This should hopefully make teardown more reliable and includes better
logic to fail over to a hard kill path after a 3 second timeout waiting
for the instance to complete using the `docker-py` wait API. Also
generalize the supervisor teardown loop by allowing the container config
endpoint to return 2 msgs to expect:
- a startup message that can be read from the container's internal
process logging that indicates it is fully up and ready.
- a teardown msg that can be polled for that indicates the container has
gracefully terminated after a cancellation request which is passed to
our container wrappers `.cancel()` method.
Make the marketstore config endpoint return the 2 messages we previously
had hard coded and use this new api.
Which is basically just "deleting" rows from a column series.
You can only use the trim command from the `.cmd` cli and only with a so
called `LocalClient` currently; it's also sketchy af and caused
a machine to hang due to mem usage..
Ideally we can patch in this functionality for use by the rpc api
and have it not hang like this XD
Pertains to https://github.com/alpacahq/marketstore/issues/264
We return a copy (since since a view doesn't seem to work..) of the
(field filtered) shm array contents which is the same index-length as
the source data.
Further, fence off the resource tracker disable-hack into a helper
routine.
It seems once in a while a frame can get missed or dropped (at least
with binance?) so in those cases, when the request erlangs is already at
max, we just manually request the missing frame and presume things will
work out XD
Further, discard out of order frames that are "from the future" that
somehow end up in the async queue once in a while? Not sure why this
happens but it seems thus far just discarding them is nbd.
Bleh/🤦, the ``end_dt`` in scope is not the "earliest" frame's
`end_dt` in the async response queue.. Parse the queue's latest epoch
and use **that** to compare to the last last pushed datetime index..
Add more detailed logging to help debug any (un)expected datetime index
gaps.
When the tsdb has a last datum that is in the past less then a "frame's
worth" of sample steps we need to slice out only the data from the
latest frame that doesn't overlap; this fixes that slice logic..
Previously i dunno wth it was doing..
When the market isn't open the feed layer won't create a subscriber
entry in the sampler broadcast loop and so if a manual call to
``broadcast()`` is made (like when trying to update a chart from
a history prepend) we need to handle that case and just broadcast
a random `-1` for now..BD
Expect each backend to deliver a `config: dict[str, Any]` which provides
concurrency controls to `trimeter`'s batch task scheduler such that
backends can define their own concurrency limits.
The dirty deats in this patch include handling history "gaps" where
a query returns a history-frame-result which spans more then the typical
frame size (in seconds). In such cases we reset the target frame index
(datetime index sequence implemented with a `pendulum.Period`) using
a generator protocol `.send()` such that the sequence can be dynamically
re-indexed starting at the new (possibly) pre-gap datetime. The new gap
logic also allows us to detect out of order frames easier and thus wait
for the next-in-order to arrive before making more requests.
Use the new `open_history_client()` endpoint/API and expect backends to
provide a history "getter" routine that can be called to load historical
data into shm even when **not** using a tsdb. Add logic for filling in
data from the tsdb once the backend has provided data up to the last
recorded in the db. Add logic for avoiding overruns of the shm buffer
with more-then-necessary queries of tsdb data.
It turns out (i guess not so shockingly?) that `marketstore` doesn't
always teardown "gracefully" under SIGINT (seems to hang if there are
open client connections which are also in the midst of teardown?) so
this instead first tries the SIGINT and then fails over to a SIGKILL
(destroy loop) which seems to be much more reliable to ensure shutdown
without any downside - in terms of a "hard kill".
Originally i was thinking the issue was root perms related (which get
relegated solely to the `marketstored` daemon actor after spawn) but
actually it was indeed the signalling / application layer causing the
hold-up/latency on teardown. There's a bunch of lingering (now
commented) code which tried to solve this non-problem as well as a bunch
logging/prints to help decipher the root of the issue - this will all
get cleaned out shortly.
If `marketstore` is detected try to only load most recent missing data
from the data provider (broker) and the rest from the tsdb and push it
all to shm for display in the UI. If the provider/broker doesn't have
the history client endpoint, just use the old one for now so we can
start to incrementally add support. Don't start the ohlc step
incrementer task until the backend signals that the feed is live.
Add some basic `numpy` epoch slice logic to generate append and prepend
arrays to write to the db.
Mooar cool things,
- add a `Storage.delete_ts()` method to wipe a column series from the db
easily.
- don't attempt to read in any OHLC series by default on client load
- add some `pyqtgraph` profiling and drop manual latency measures
- if no db series for the fqsn exists write the entire shm array
Also, Start tinkering with `tractor.trionics.ipython_embed()`
In effort to get back to a usable REPL around the mkts client
this adds usage of the new `tractor` integration api as well as logic
for skipping backfilling if existing tsdb arrays are found.
Starts a wrapper around the `marketstore` client to do basic ohlcv query
and retrieval and prototypes out write methods for ohlc and tick.
Try to connect to `marketstore` automatically (which will fail if not
started currently) but we will eventually first do a service query.
Further:
- get `pikerd` working with and without `--tsdb` flag.
- support spawning `brokerd` with no real-time quotes.
- bring back in "fqsn" support that was originally not
in this history before commits factoring.
Not sure how I missed mapping the 5995 grpc port 🤦; done now.
Also adds graceful teardown using SIGINT with included container
logging relayed to the piker console B).
In order to support instruments with lifetimes (aka derivatives) we need
generally need special symbol annotations which detail such meta data
(such as `MNQ.GLOBEX.20220717` for daq futes). Further there is really
no reason for the public api for this feed layer to care about getting
a special "brokername" field since generally the data is coming directly
from UIs (eg. search selection) so we might as well accept a fqsn (fully
qualified symbol name) which includes the broker name; for now a suffix
like `'.ib'`. We may change this schema (soon) but this at least gets us
to a point where we expect the full name including broker/provider.
An additional detail: for certain "generic" symbol names (like for
futes) we will pull a so called "front contract" and map this to
a specific fqsn underneath, so there is a double (cached) entry for that
entry such that other consumers can use it the same way if desired.
Some other machinery changes:
- expect the `stream_quotes()` endpoint to deliver it's `.started()` msg
almost immediately since we now need it deliver any fqsn asap (yes
this means the ep should no longer wait on a "live" first quote and
instead deliver what quote data it can right away.
- expect the quotes ohlc sampler task to add in the broker name before
broadcast to remote (actor) consumers since the backend isn't (yet)
expected to do that add in itself.
- obviously we start using all the new fqsn related `Symbol` apis
Break up real-time quote feed and history loading into 2 separate tasks
and deliver a client side `data.Feed` as soon as history is loaded
(instead of waiting for a rt quote - the previous logic). If
a symbol doesn't have history then likely the feed shouldn't be loaded
(since presumably client code will need at least "some" datums history
to do anything) and waiting on a real-time quote is dumb, since it'll
hang if the market isn't open XD. If a symbol doesn't have history we
can always write a zero/null array when we run into that case. This also
greatly speeds up feed loading when both history and quotes are available.
TL;DR summary:
- add a `_Feedsbus.start_task()` one-cancel-scope-per-task method for
assisting with (re-)starting and stopping long running persistent
feeds (basically a "one cancels one" style nursery API).
- add a `manage_history()` task which does all history loading (and
eventually real-time writing) which has an independent signal and
start it in a separate task.
- drop the "sample rate per symbol" stuff since client code doesn't really
care when it can just inspect shm indexing/time-steps itself.
- run throttle tasks in the bus nursery thus avoiding cancelling the
underlying sampler task on feed client disconnects.
- don't store a repeated ref the bus nursery's cancel scope..
This should in theory result in increased burstiness since we remove
the plain `trio.sleep()` and instead always wait on the receive channel
as much as possible until the `trio.move_on_after()` (+ time diffing
calcs) times out and signals the next throttled send cycle. This also is
slightly easier to grok code-wise instead of the `try, except` and
another tight while loop until a `trio.WouldBlock`. The only simpler
way i can think to do it is with 2 tasks: 1 to collect ticks and the
other to read and send at the throttle rate.
Comment out the log msg for now to avoid latency and add much more
detailed comments. Add an overrun log msg to the main sample loop.
There was a lingering issue where the fsp daemon would sync its shm
array with the source data and we'd set the start/end indices to the
same value. Under some races a reader would then read an empty `.array`
which it wasn't expecting. This fixes that as well as tidies up the
`ShmArray.push()` logic and adds a temporary check in `.array` for zero
length if the array hasn't been written yet.
We can now start removing read array length checks in consumer code
and hopefully no more races will show up.
Try out he new broadcast channels from `tractor` for data feeds
we already have cached. Any time there's a cache hit we load the
cached feed and just slap a broadcast receiver on it for the local
consumer task.
If a client attaches to a quotes data feed and requests a throttle rate,
be sure to unsub that side-band memchan + task when it detaches and
especially so on any transport connection error.
Also, use an explicit `tractor.Context.cancel()` on the client feed
block exit since we removed the implicit cancel option from the
`tractor` api.
Adding binance's "hft" ws feeds has resulted in a lot of context
switching in our Qt charts, so much so it's chewin CPU and definitely
worth it to throttle to the detected display rate as per discussion in
issue #192.
This is a first very very naive attempt at throttling L1 tick feeds on
the `brokerd` end (producer side) using a constant and uniform delivery
rate by way of a `trio` task + mem chan. The new func is
`data._sampling.uniform_rate_send()`. Basically if a client request
a feed and provides a throttle rate we just spawn a task and queue up
ticks until approximately the next display rate's worth period of time
has passed before forwarding. It's definitely nothing fancy but does
provide fodder and a start point for an up and coming queueing eng to
start digging into both #107 and #109 ;)
This allows for more deterministically managing long running sub-daemon
services under `pikerd` using the new context api from `tractor`.
The contexts are allocated in an async exit stack and torn down at root
daemon termination. Spawn brokerds using this method by changing the
persistence entry point to be a `@tractor.context`.