There never was any underlying db bug, it was a hardcoded timeframe in
the column series write key.. Now we always assert a matching timeframe
in results.
Not only improves startup latency but also avoids a bug where the rt
buffer was being tsdb-history prepended *before* the backfilling of
recent data from the backend was complete resulting in our of order
frames in shm.
If a history manager raises a `DataUnavailable` just assume the sample
rate isn't supported and that no shm prepends will be done. Further seed
the shm array in such cases as before from the 1m history's last datum.
Also, fix tsdb -> shm back-loading, cancelling tsdb queries when either
no array-data is returned or a frame is delivered which has a start time
no lesser then the least last retrieved. Use strict timeframes for every
`Storage` API call.
Turns out querying for a high freq timeframe (like 1sec) will still
return a lower freq timeframe (like 1Min) SMH, and no idea if it's the
server or the client's fault, so we have to explicitly check the sample
step size and discard lower freq series-results. Do this inside
`Storage.read_ohlcv()` and return an empty `dict` when the wrong time
step is detected from the query result.
Further enforcements,
- both `.load()` and `read_ohlcv()` now require an explicit `timeframe:
int` input to guarantee the time step of the output array.
- drop all calls `.load()` with non-timeframe specific input.
Our default sample periods are 60s (1m) for the history chart and 1s for
the fast chart. This patch adds concurrent loading of both (or more)
different sample period data sets using the existing loading code but
with new support for looping through a passed "timeframe" table which
points to each shm instance.
More detailed adjustments include:
- breaking the "basic" and tsdb loading into 2 new funcs:
`basic_backfill()` and `tsdb_backfill()` the latter of which is run
when the tsdb daemon is discovered.
- adjust the fast shm buffer to offset with one day's worth of 1s so
that only up to a day is backfilled as history in the fast chart.
- adjust bus task starting in `manage_history()` to deliver back the
offset indices for both fast and slow shms and set them on the
`Feed` object as `.izero_hist/rt: int` values:
- allows the chart-UI linked view region handlers to use the offsets
in the view-linking-transform math to index-align the history and
fast chart.
It doesn't seem to be any slower on our least throttled backend
(binance) and it removes a bunch of hard to get correct frame
re-ordering logic that i'm not sure really ever fully worked XD
Commented some issues we still need to resolve as well.
Adjust all history query machinery to pass a `timeframe: int` in seconds
and set default of 60 (aka 1m) such that history views from here forward
will be 1m sampled OHLCV. Further when the tsdb is detected as up load
a full 10 years of data if possible on the 1m - backends will eventually
get a config section (`brokers.toml`) that allow user's to tune this.
The `Store.load()`, `.read_ohlcv()` and `.write_ohlcv()` and
`.delete_ts()` now can take a `timeframe: Optional[float]` param which
is used to look up the appropriate sampling period table-key from
`marketstore`.
As part of supporting a "history view" chart which shows downsampled
datums alongside our 1s (or higher) sampled OHLC we need a separate
buffer to store a the slower history from broker backends. This begins
that design by allocating 2 buffers:
- `rt_shm: ShmArray` which maps to a `/dev/shm/` file with `_rt` suffix
- `hist_shm: ShmArray` which maps to a file with `_hist` suffix
Deliver both of these shms back from both `manage_history()` and load
them as `Feed.rt_shm`/`.hist_shm` on the client side.
Impl deats:
- init the rt buffer with the first datum from loaded history and
assign all OHLC values to that row's 'close' and the vlm to 0.
- pass the hist buffer to the backfiller task
- only spawn **one** global sampler array-row increment task per
`brokerd` and pass in the 1s delay which we presume is our lowest
OHLC sample rate for now.
- drop `open_sample_step_stream()` and just move its body contents into
`Feed.index_stream()`
Instead of worrying about the increment period per shm subscription,
just use the value passed as input and presume the caller knows that
only one task is necessary and that the wakeup (sampling) period should
be the shortest that is needed.
It's very unlikely we don't want at least a 1s sampling (both in terms
of task switching cost and general usage) which will eventually ship as
the default "real-time" feed "timeframe". Further, this "fast" increment
sampling task can handle all lower sampling periods (eg. 1m, 5m, 1H)
based on the current implementation just the same.
Also, add a global default sample period as `_defaul_delay_s` for use in
other internal modules.
Since we figured out how to pass through ems dialog ids to the
`openOrders` sub we don't really need to do much with status updates
other then error handling. This drops `process_status()` and moves the
error handling logic into a status handler sub-block; we now just
info-log status updates for troubleshooting purposes.
This should hopefully make teardown more reliable and includes better
logic to fail over to a hard kill path after a 3 second timeout waiting
for the instance to complete using the `docker-py` wait API. Also
generalize the supervisor teardown loop by allowing the container config
endpoint to return 2 msgs to expect:
- a startup message that can be read from the container's internal
process logging that indicates it is fully up and ready.
- a teardown msg that can be polled for that indicates the container has
gracefully terminated after a cancellation request which is passed to
our container wrappers `.cancel()` method.
Make the marketstore config endpoint return the 2 messages we previously
had hard coded and use this new api.
Which is basically just "deleting" rows from a column series.
You can only use the trim command from the `.cmd` cli and only with a so
called `LocalClient` currently; it's also sketchy af and caused
a machine to hang due to mem usage..
Ideally we can patch in this functionality for use by the rpc api
and have it not hang like this XD
Pertains to https://github.com/alpacahq/marketstore/issues/264
We return a copy (since since a view doesn't seem to work..) of the
(field filtered) shm array contents which is the same index-length as
the source data.
Further, fence off the resource tracker disable-hack into a helper
routine.
It seems once in a while a frame can get missed or dropped (at least
with binance?) so in those cases, when the request erlangs is already at
max, we just manually request the missing frame and presume things will
work out XD
Further, discard out of order frames that are "from the future" that
somehow end up in the async queue once in a while? Not sure why this
happens but it seems thus far just discarding them is nbd.
Bleh/🤦, the ``end_dt`` in scope is not the "earliest" frame's
`end_dt` in the async response queue.. Parse the queue's latest epoch
and use **that** to compare to the last last pushed datetime index..
Add more detailed logging to help debug any (un)expected datetime index
gaps.
When the tsdb has a last datum that is in the past less then a "frame's
worth" of sample steps we need to slice out only the data from the
latest frame that doesn't overlap; this fixes that slice logic..
Previously i dunno wth it was doing..
When the market isn't open the feed layer won't create a subscriber
entry in the sampler broadcast loop and so if a manual call to
``broadcast()`` is made (like when trying to update a chart from
a history prepend) we need to handle that case and just broadcast
a random `-1` for now..BD
Expect each backend to deliver a `config: dict[str, Any]` which provides
concurrency controls to `trimeter`'s batch task scheduler such that
backends can define their own concurrency limits.
The dirty deats in this patch include handling history "gaps" where
a query returns a history-frame-result which spans more then the typical
frame size (in seconds). In such cases we reset the target frame index
(datetime index sequence implemented with a `pendulum.Period`) using
a generator protocol `.send()` such that the sequence can be dynamically
re-indexed starting at the new (possibly) pre-gap datetime. The new gap
logic also allows us to detect out of order frames easier and thus wait
for the next-in-order to arrive before making more requests.
Use the new `open_history_client()` endpoint/API and expect backends to
provide a history "getter" routine that can be called to load historical
data into shm even when **not** using a tsdb. Add logic for filling in
data from the tsdb once the backend has provided data up to the last
recorded in the db. Add logic for avoiding overruns of the shm buffer
with more-then-necessary queries of tsdb data.
It turns out (i guess not so shockingly?) that `marketstore` doesn't
always teardown "gracefully" under SIGINT (seems to hang if there are
open client connections which are also in the midst of teardown?) so
this instead first tries the SIGINT and then fails over to a SIGKILL
(destroy loop) which seems to be much more reliable to ensure shutdown
without any downside - in terms of a "hard kill".
Originally i was thinking the issue was root perms related (which get
relegated solely to the `marketstored` daemon actor after spawn) but
actually it was indeed the signalling / application layer causing the
hold-up/latency on teardown. There's a bunch of lingering (now
commented) code which tried to solve this non-problem as well as a bunch
logging/prints to help decipher the root of the issue - this will all
get cleaned out shortly.
If `marketstore` is detected try to only load most recent missing data
from the data provider (broker) and the rest from the tsdb and push it
all to shm for display in the UI. If the provider/broker doesn't have
the history client endpoint, just use the old one for now so we can
start to incrementally add support. Don't start the ohlc step
incrementer task until the backend signals that the feed is live.
Add some basic `numpy` epoch slice logic to generate append and prepend
arrays to write to the db.
Mooar cool things,
- add a `Storage.delete_ts()` method to wipe a column series from the db
easily.
- don't attempt to read in any OHLC series by default on client load
- add some `pyqtgraph` profiling and drop manual latency measures
- if no db series for the fqsn exists write the entire shm array
Also, Start tinkering with `tractor.trionics.ipython_embed()`
In effort to get back to a usable REPL around the mkts client
this adds usage of the new `tractor` integration api as well as logic
for skipping backfilling if existing tsdb arrays are found.
Starts a wrapper around the `marketstore` client to do basic ohlcv query
and retrieval and prototypes out write methods for ohlc and tick.
Try to connect to `marketstore` automatically (which will fail if not
started currently) but we will eventually first do a service query.
Further:
- get `pikerd` working with and without `--tsdb` flag.
- support spawning `brokerd` with no real-time quotes.
- bring back in "fqsn" support that was originally not
in this history before commits factoring.
Not sure how I missed mapping the 5995 grpc port 🤦; done now.
Also adds graceful teardown using SIGINT with included container
logging relayed to the piker console B).