There never was any underlying db bug, it was a hardcoded timeframe in
the column series write key.. Now we always assert a matching timeframe
in results.
Not only improves startup latency but also avoids a bug where the rt
buffer was being tsdb-history prepended *before* the backfilling of
recent data from the backend was complete resulting in our of order
frames in shm.
If a history manager raises a `DataUnavailable` just assume the sample
rate isn't supported and that no shm prepends will be done. Further seed
the shm array in such cases as before from the 1m history's last datum.
Also, fix tsdb -> shm back-loading, cancelling tsdb queries when either
no array-data is returned or a frame is delivered which has a start time
no lesser then the least last retrieved. Use strict timeframes for every
`Storage` API call.
Turns out querying for a high freq timeframe (like 1sec) will still
return a lower freq timeframe (like 1Min) SMH, and no idea if it's the
server or the client's fault, so we have to explicitly check the sample
step size and discard lower freq series-results. Do this inside
`Storage.read_ohlcv()` and return an empty `dict` when the wrong time
step is detected from the query result.
Further enforcements,
- both `.load()` and `read_ohlcv()` now require an explicit `timeframe:
int` input to guarantee the time step of the output array.
- drop all calls `.load()` with non-timeframe specific input.
Our default sample periods are 60s (1m) for the history chart and 1s for
the fast chart. This patch adds concurrent loading of both (or more)
different sample period data sets using the existing loading code but
with new support for looping through a passed "timeframe" table which
points to each shm instance.
More detailed adjustments include:
- breaking the "basic" and tsdb loading into 2 new funcs:
`basic_backfill()` and `tsdb_backfill()` the latter of which is run
when the tsdb daemon is discovered.
- adjust the fast shm buffer to offset with one day's worth of 1s so
that only up to a day is backfilled as history in the fast chart.
- adjust bus task starting in `manage_history()` to deliver back the
offset indices for both fast and slow shms and set them on the
`Feed` object as `.izero_hist/rt: int` values:
- allows the chart-UI linked view region handlers to use the offsets
in the view-linking-transform math to index-align the history and
fast chart.
It doesn't seem to be any slower on our least throttled backend
(binance) and it removes a bunch of hard to get correct frame
re-ordering logic that i'm not sure really ever fully worked XD
Commented some issues we still need to resolve as well.
Adjust all history query machinery to pass a `timeframe: int` in seconds
and set default of 60 (aka 1m) such that history views from here forward
will be 1m sampled OHLCV. Further when the tsdb is detected as up load
a full 10 years of data if possible on the 1m - backends will eventually
get a config section (`brokers.toml`) that allow user's to tune this.
The `Store.load()`, `.read_ohlcv()` and `.write_ohlcv()` and
`.delete_ts()` now can take a `timeframe: Optional[float]` param which
is used to look up the appropriate sampling period table-key from
`marketstore`.
As part of supporting a "history view" chart which shows downsampled
datums alongside our 1s (or higher) sampled OHLC we need a separate
buffer to store a the slower history from broker backends. This begins
that design by allocating 2 buffers:
- `rt_shm: ShmArray` which maps to a `/dev/shm/` file with `_rt` suffix
- `hist_shm: ShmArray` which maps to a file with `_hist` suffix
Deliver both of these shms back from both `manage_history()` and load
them as `Feed.rt_shm`/`.hist_shm` on the client side.
Impl deats:
- init the rt buffer with the first datum from loaded history and
assign all OHLC values to that row's 'close' and the vlm to 0.
- pass the hist buffer to the backfiller task
- only spawn **one** global sampler array-row increment task per
`brokerd` and pass in the 1s delay which we presume is our lowest
OHLC sample rate for now.
- drop `open_sample_step_stream()` and just move its body contents into
`Feed.index_stream()`
Instead of worrying about the increment period per shm subscription,
just use the value passed as input and presume the caller knows that
only one task is necessary and that the wakeup (sampling) period should
be the shortest that is needed.
It's very unlikely we don't want at least a 1s sampling (both in terms
of task switching cost and general usage) which will eventually ship as
the default "real-time" feed "timeframe". Further, this "fast" increment
sampling task can handle all lower sampling periods (eg. 1m, 5m, 1H)
based on the current implementation just the same.
Also, add a global default sample period as `_defaul_delay_s` for use in
other internal modules.
Since we figured out how to pass through ems dialog ids to the
`openOrders` sub we don't really need to do much with status updates
other then error handling. This drops `process_status()` and moves the
error handling logic into a status handler sub-block; we now just
info-log status updates for troubleshooting purposes.
This should hopefully make teardown more reliable and includes better
logic to fail over to a hard kill path after a 3 second timeout waiting
for the instance to complete using the `docker-py` wait API. Also
generalize the supervisor teardown loop by allowing the container config
endpoint to return 2 msgs to expect:
- a startup message that can be read from the container's internal
process logging that indicates it is fully up and ready.
- a teardown msg that can be polled for that indicates the container has
gracefully terminated after a cancellation request which is passed to
our container wrappers `.cancel()` method.
Make the marketstore config endpoint return the 2 messages we previously
had hard coded and use this new api.
Which is basically just "deleting" rows from a column series.
You can only use the trim command from the `.cmd` cli and only with a so
called `LocalClient` currently; it's also sketchy af and caused
a machine to hang due to mem usage..
Ideally we can patch in this functionality for use by the rpc api
and have it not hang like this XD
Pertains to https://github.com/alpacahq/marketstore/issues/264
We return a copy (since since a view doesn't seem to work..) of the
(field filtered) shm array contents which is the same index-length as
the source data.
Further, fence off the resource tracker disable-hack into a helper
routine.
It seems once in a while a frame can get missed or dropped (at least
with binance?) so in those cases, when the request erlangs is already at
max, we just manually request the missing frame and presume things will
work out XD
Further, discard out of order frames that are "from the future" that
somehow end up in the async queue once in a while? Not sure why this
happens but it seems thus far just discarding them is nbd.
Bleh/🤦, the ``end_dt`` in scope is not the "earliest" frame's
`end_dt` in the async response queue.. Parse the queue's latest epoch
and use **that** to compare to the last last pushed datetime index..
Add more detailed logging to help debug any (un)expected datetime index
gaps.
When the tsdb has a last datum that is in the past less then a "frame's
worth" of sample steps we need to slice out only the data from the
latest frame that doesn't overlap; this fixes that slice logic..
Previously i dunno wth it was doing..
When the market isn't open the feed layer won't create a subscriber
entry in the sampler broadcast loop and so if a manual call to
``broadcast()`` is made (like when trying to update a chart from
a history prepend) we need to handle that case and just broadcast
a random `-1` for now..BD
Expect each backend to deliver a `config: dict[str, Any]` which provides
concurrency controls to `trimeter`'s batch task scheduler such that
backends can define their own concurrency limits.
The dirty deats in this patch include handling history "gaps" where
a query returns a history-frame-result which spans more then the typical
frame size (in seconds). In such cases we reset the target frame index
(datetime index sequence implemented with a `pendulum.Period`) using
a generator protocol `.send()` such that the sequence can be dynamically
re-indexed starting at the new (possibly) pre-gap datetime. The new gap
logic also allows us to detect out of order frames easier and thus wait
for the next-in-order to arrive before making more requests.
Use the new `open_history_client()` endpoint/API and expect backends to
provide a history "getter" routine that can be called to load historical
data into shm even when **not** using a tsdb. Add logic for filling in
data from the tsdb once the backend has provided data up to the last
recorded in the db. Add logic for avoiding overruns of the shm buffer
with more-then-necessary queries of tsdb data.
It turns out (i guess not so shockingly?) that `marketstore` doesn't
always teardown "gracefully" under SIGINT (seems to hang if there are
open client connections which are also in the midst of teardown?) so
this instead first tries the SIGINT and then fails over to a SIGKILL
(destroy loop) which seems to be much more reliable to ensure shutdown
without any downside - in terms of a "hard kill".
Originally i was thinking the issue was root perms related (which get
relegated solely to the `marketstored` daemon actor after spawn) but
actually it was indeed the signalling / application layer causing the
hold-up/latency on teardown. There's a bunch of lingering (now
commented) code which tried to solve this non-problem as well as a bunch
logging/prints to help decipher the root of the issue - this will all
get cleaned out shortly.
If `marketstore` is detected try to only load most recent missing data
from the data provider (broker) and the rest from the tsdb and push it
all to shm for display in the UI. If the provider/broker doesn't have
the history client endpoint, just use the old one for now so we can
start to incrementally add support. Don't start the ohlc step
incrementer task until the backend signals that the feed is live.
Add some basic `numpy` epoch slice logic to generate append and prepend
arrays to write to the db.
Mooar cool things,
- add a `Storage.delete_ts()` method to wipe a column series from the db
easily.
- don't attempt to read in any OHLC series by default on client load
- add some `pyqtgraph` profiling and drop manual latency measures
- if no db series for the fqsn exists write the entire shm array
Also, Start tinkering with `tractor.trionics.ipython_embed()`
In effort to get back to a usable REPL around the mkts client
this adds usage of the new `tractor` integration api as well as logic
for skipping backfilling if existing tsdb arrays are found.
Starts a wrapper around the `marketstore` client to do basic ohlcv query
and retrieval and prototypes out write methods for ohlc and tick.
Try to connect to `marketstore` automatically (which will fail if not
started currently) but we will eventually first do a service query.
Further:
- get `pikerd` working with and without `--tsdb` flag.
- support spawning `brokerd` with no real-time quotes.
- bring back in "fqsn" support that was originally not
in this history before commits factoring.
Not sure how I missed mapping the 5995 grpc port 🤦; done now.
Also adds graceful teardown using SIGINT with included container
logging relayed to the piker console B).
In order to support instruments with lifetimes (aka derivatives) we need
generally need special symbol annotations which detail such meta data
(such as `MNQ.GLOBEX.20220717` for daq futes). Further there is really
no reason for the public api for this feed layer to care about getting
a special "brokername" field since generally the data is coming directly
from UIs (eg. search selection) so we might as well accept a fqsn (fully
qualified symbol name) which includes the broker name; for now a suffix
like `'.ib'`. We may change this schema (soon) but this at least gets us
to a point where we expect the full name including broker/provider.
An additional detail: for certain "generic" symbol names (like for
futes) we will pull a so called "front contract" and map this to
a specific fqsn underneath, so there is a double (cached) entry for that
entry such that other consumers can use it the same way if desired.
Some other machinery changes:
- expect the `stream_quotes()` endpoint to deliver it's `.started()` msg
almost immediately since we now need it deliver any fqsn asap (yes
this means the ep should no longer wait on a "live" first quote and
instead deliver what quote data it can right away.
- expect the quotes ohlc sampler task to add in the broker name before
broadcast to remote (actor) consumers since the backend isn't (yet)
expected to do that add in itself.
- obviously we start using all the new fqsn related `Symbol` apis
Break up real-time quote feed and history loading into 2 separate tasks
and deliver a client side `data.Feed` as soon as history is loaded
(instead of waiting for a rt quote - the previous logic). If
a symbol doesn't have history then likely the feed shouldn't be loaded
(since presumably client code will need at least "some" datums history
to do anything) and waiting on a real-time quote is dumb, since it'll
hang if the market isn't open XD. If a symbol doesn't have history we
can always write a zero/null array when we run into that case. This also
greatly speeds up feed loading when both history and quotes are available.
TL;DR summary:
- add a `_Feedsbus.start_task()` one-cancel-scope-per-task method for
assisting with (re-)starting and stopping long running persistent
feeds (basically a "one cancels one" style nursery API).
- add a `manage_history()` task which does all history loading (and
eventually real-time writing) which has an independent signal and
start it in a separate task.
- drop the "sample rate per symbol" stuff since client code doesn't really
care when it can just inspect shm indexing/time-steps itself.
- run throttle tasks in the bus nursery thus avoiding cancelling the
underlying sampler task on feed client disconnects.
- don't store a repeated ref the bus nursery's cancel scope..
This should in theory result in increased burstiness since we remove
the plain `trio.sleep()` and instead always wait on the receive channel
as much as possible until the `trio.move_on_after()` (+ time diffing
calcs) times out and signals the next throttled send cycle. This also is
slightly easier to grok code-wise instead of the `try, except` and
another tight while loop until a `trio.WouldBlock`. The only simpler
way i can think to do it is with 2 tasks: 1 to collect ticks and the
other to read and send at the throttle rate.
Comment out the log msg for now to avoid latency and add much more
detailed comments. Add an overrun log msg to the main sample loop.
There was a lingering issue where the fsp daemon would sync its shm
array with the source data and we'd set the start/end indices to the
same value. Under some races a reader would then read an empty `.array`
which it wasn't expecting. This fixes that as well as tidies up the
`ShmArray.push()` logic and adds a temporary check in `.array` for zero
length if the array hasn't been written yet.
We can now start removing read array length checks in consumer code
and hopefully no more races will show up.
Try out he new broadcast channels from `tractor` for data feeds
we already have cached. Any time there's a cache hit we load the
cached feed and just slap a broadcast receiver on it for the local
consumer task.
If a client attaches to a quotes data feed and requests a throttle rate,
be sure to unsub that side-band memchan + task when it detaches and
especially so on any transport connection error.
Also, use an explicit `tractor.Context.cancel()` on the client feed
block exit since we removed the implicit cancel option from the
`tractor` api.
Adding binance's "hft" ws feeds has resulted in a lot of context
switching in our Qt charts, so much so it's chewin CPU and definitely
worth it to throttle to the detected display rate as per discussion in
issue #192.
This is a first very very naive attempt at throttling L1 tick feeds on
the `brokerd` end (producer side) using a constant and uniform delivery
rate by way of a `trio` task + mem chan. The new func is
`data._sampling.uniform_rate_send()`. Basically if a client request
a feed and provides a throttle rate we just spawn a task and queue up
ticks until approximately the next display rate's worth period of time
has passed before forwarding. It's definitely nothing fancy but does
provide fodder and a start point for an up and coming queueing eng to
start digging into both #107 and #109 ;)
This allows for more deterministically managing long running sub-daemon
services under `pikerd` using the new context api from `tractor`.
The contexts are allocated in an async exit stack and torn down at root
daemon termination. Spawn brokerds using this method by changing the
persistence entry point to be a `@tractor.context`.
Avoid bothering with a trio event and expect the caller to do manual shm
registering with the write loop. Provide OHLC sample period indexing
through a re-branded pub-sub func ``iter_ohlc_periods()``.
Move all feed/stream agnostic logic and shared mem writing into a new
set of routines inside the ``data`` sub-package. This lets us move
toward a more standard API for broker and data backends to provide
cache-able persistent streams to client apps.
The data layer now takes care of
- starting a single background brokerd task to start a stream for as
symbol if none yet exists and register that stream for later lookups
- the existing broker backend actor is now always re-used if possible
if it can be found in a service tree
- synchronization with the brokerd stream's startup sequence is now
oriented around fast startup concurrency such that client code gets
a handle to historical data and quote schema as fast as possible
- historical data loading is delegated to the backend more formally by
starting a ``backfill_bars()`` task
- write shared mem in the brokerd task and only destruct it once requested
either from the parent actor or further clients
- fully de-duplicate stream data by using a dynamic pub-sub strategy
where new clients register for copies of the same quote set per symbol
This new API is entirely working with the IB backend; others will need
to be ported. That's to come shortly.
The min tick size is the smallest step an instrument can move in value
(think the number of decimals places of precision the value can have).
We start leveraging this in a few places:
- make our internal "symbol" type expose it as part of it's api
so that it can be passed around by UI components
- in y-axis view box scaling, use it to keep the bid/ask spread (L1 UI)
always on screen even in the case where the spread has moved further
out of view then the last clearing price
- allows the EMS to determine dark order live order submission offsets
If you have a common broker feed daemon then likely you don't want to
create superfluous shared mem buffers for the same symbol. This adds an
ad hoc little context manger which keeps a bool state of whether
a buffer writer task currently is running in this process. Before we
were checking the shared array token cache and **not** clearing it when
the writer task exited, resulting in incorrect writer/loader logic on
the next entry..
Really, we need a better set of SC semantics around the shared mem stuff
presuming there's only ever one writer per shared buffer at given time.
Hopefully that will come soon!