Always use `open_sample_stream()` to register fast and slow quote feed
buffers and get a sampler stream which we use to trigger
`Sampler.broadcast_all()` calls on the service side after backfill
events.
Now spawned under the `pikerd` tree as a singleton-daemon-actor we offer
a slew of new routines in support of this micro-service:
- `maybe_open_samplerd()` and `spawn_samplerd()` which provide the
`._daemon.Services` integration to conduct service spawning.
- `open_sample_stream()` which is a client-side endpoint which does all
the work of (lazily) starting the `samplerd` service (if dne) and
registers shm buffers for update as well as connect a sample-index
stream for iterator by the caller.
- `register_with_sampler()` which is the `samplerd`-side service task
endpoint implementing all the shm buffer and index-stream registry
details as well as logic to ensure a lone service task runs
`Services.increment_ohlc_buffer()`; it increments at the shortest period
registered which, for now, is the default 1s duration.
Further impl notes:
- fixes to `Services.broadcast()` to ensure broken streams get discarded
gracefully.
- we use a `pikerd` side singleton mutex `trio.Lock()` to ensure
one-and-only-one `samplerd` is ever spawned per `pikerd` actor tree.
We're moving toward a single actor managing sampler work and distributed
independently of `brokerd` services such that a user can run samplers on
different hosts then real-time data feed infra. Most of the
implementation details include aggregating `.data._sampling` routines
into a new `Sampler` singleton type.
Move the following methods to class methods:
- `.increment_ohlc_buffer()` to allow a single task to increment all
registered shm buffers.
- `.broadcast()` for IPC relay to all registered clients/shms.
Further add a new `maybe_open_global_sampler()` which allocates
a service nursery and assigns it to the `Sampler.service_nursery`; this
is prep for putting the step incrementer in a singleton service task
higher up the data-layer actor tree.
When we see multiple history frames that are duplicate to the request
set, bail re-trying after a number of tries (6 just cuz) and return
early from the tsdb backfill loop; presume that this many duplicates
means we've hit the beginning of history. Use a `collections.Counter`
for the duplicate counts. Make sure and warn log in such cases.
Wow, turns out tick framing was totally borked since we weren't framing
on "greater then throttle period long waits" XD
This moves all the framing logic into a common func and calls it in
every case:
- every (normal) "pre throttle period expires" quote receive
- each "no new quote before throttle period expires" (slow case)
- each "no clearing tick yet received" / only burst on clears case
Add some (untested) data slicing util methods for mapping time ranges to
source data indices:
- `.get_index()` which maps a single input epoch time to an equiv array
(int) index.
- add `slice_from_time()` which returns a view of the shm data from an
input epoch range presuming the underlying struct array contains
a `'time'` field with epoch stamps.
- `.view_data()` which slices out the "in view" data according to the
current state of the passed in `pg.PlotItem`'s view box.
This has been an outstanding idea for a while and changes the framing
format of tick events into a `dict[str, list[dict]]` wherein for each
tick "type" (eg. 'bid', 'ask', 'trade', 'asize'..etc) we create an FIFO
ordered `list` of events (data) and then pack this table into each
(throttled) send. This gives an additional implied downsample reduction
(in terms of iteration on the consumer side) from `N` tick-events to
a (max) `T` tick-types presuming the rx side only needs the latest tick
event.
Drop the `types: set` and adjust clearing event test to use the new
`ticks_by_type` map's keys.
Instead of uniformly distributing the msg send rate for a given
aggregate subscription, choose to be more bursty around clearing ticks
so as to avoid saturating the consumer with L1 book updates and vs.
delivering real trade data as-fast-as-possible.
Presuming the consumer is in the "UI land of slow" (eg. modern display
frame rates) such an approach serves more useful for seeing "material
changes" in the market as-bursty-as-possible (i.e. more short lived fast
changes in last clearing price vs. many slower changes in the bid-ask
spread queues). Such an approach also lends better to multi-feed
overlays which in aggregate tend to scale linearly with the number of
feeds/overlays; centralization of bursty arrival rates allows for
a higher overall throttle rate if used cleverly with framing.
Allows using `set` ops for subscription management and guarantees no
duplicates per `brokerd` actor. New API is simpler for dynamic
pause/resume changes per `Feed`:
- `_FeedsBus.add_subs()`, `.get_subs()`, `.remove_subs()` all accept multi-sub
`set` inputs.
- `Feed.pause()` / `.resume()` encapsulates management of *only* sending
a msg on each unique underlying IPC msg stream.
Use new api in sampler task.
Previously we would only detect overruns and drop subscriptions on
non-throttled feed subs, however you can get the same issue with
a wrapping throttler task:
- the intermediate mem chan can be blocked either by the throttler task
being too slow, in which case we still want to warn about it
- the stream's IPC channel actually breaks and we still want to drop
the connection and subscription so it doesn't be come a source of
stale backpressure.
Set each quote-stream by matching the provider for each `Flume` and thus
results in some flumes mapping to the same (multiplexed) stream.
Monkey-patch the equivalent `tractor.MsgStream._ctx: tractor.Context` on
each broadcast-receiver subscription to allow use by feed bus methods as
well as other internals which need to reference IPC channel/portal info.
Start a `_FeedsBus` subscription management API:
- add `.get_subs()` which returns the list of tuples registered for the
given key (normally the fqsn).
- add `.remove_sub()` which allows removing by key and tuple value and
provides encapsulation for sampler task(s) which deal with dropped
connections/subscribers.
Adds provider-list-filtered (quote) stream multiplexing support allowing
for merged real-time `tractor.MsgStream`s using an `@acm` interface.
Behind the scenes we are just doing a classic multi-task push to common
mem chan approach.
Details to make it work on `Feed`:
- add `Feed.mods: dict[str, Moduletype]` and
`Feed.portals[ModuleType, tractor.Portal]` which are both populated
during init in `open_feed()`
- drop `Feed.portal` and `Feed.name`
Also fix a final lingering tsdb history loading loop termination bug.
A slight facepalm but, the main issue was a simple indexing logic error:
we need to slice with `tsdb_history[-shm._first.value:]` to push most
recent history not oldest.. This allows cleanup of tsdb backfill loop as
well.
Further, greatly simply `diff_history()` time slicing by using the
classic `numpy` conditional slice on the epoch field.
This had a bug prior where the end of a frame (a partial) wasn't being
sliced correctly and we'd get odd gaps showing up in the backfilled from
`brokerd` vs. tsdb end index. Repair this by doing timeframe aware index
diffing in `diff_history()` which seems to resolve it. Also, use the
frame-result's `end_dt: datetime` for the loop exit condition.
Sync per-symbol sampler loop start to subscription registers such that
the loop can't start until the consumer's stream subscription is added;
the task-sync uses a `trio.Event`. This patch also drops a ton of
commented cruft.
Further adjustments needed to get parity with prior functionality:
- pass init msg 'symbol_info' field to the `Symbol.broker_info: dict`.
- ensure the `_FeedsBus._subscriptions` table uses the broker specific
(without brokername suffix) as keys for lookup so that the sampler
loop doesn't have to append in the brokername as a suffix.
- ensure the `open_feed_bus()` flumes-table-msg returned sent by
`tractor.Context.started()` uses the `.to_msg()` form of all flume
structs.
- ensure `maybe_open_feed()` uses `tractor.MsgStream.subscribe()` on all
`Flume.stream`s on cache hits using the
`tractor.trionics.gather_contexts()` helper.
Orient shm-flow-arrays around the new idea of a `Flume` which provides
access, mgmt and basic measure of real-time data flow sets (see water
flow management semantics).
- We discard the previous idea of a "init message" which contained all
the shm attachment info and instead send a startup message full of
`Flume.to_msg()`s which are symmetrically loaded on the caller actor
side.
- Create data-flows "entries" for every passed in fqsn such that the consumer gets back
streams and shm for each, now all wrapped in `Flume` types. For now we
allocate `brokermod.stream_quotes()` tasks 1-to-1 for each fqsn
(instead of expecting each backend to do multi-plexing, though we
might want that eventually) as well a `_FeedsBus._subscriber` entry
for each. The pause/resume management loop is adjusted to match.
Previously `Feed`s were allocated 1-to-1 with each fqsn.
- Make `Feed` a `Struct` subtype instead of a `@dataclass` and move all
flow specific attrs to the new `Flume`:
- move `.index_stream()`, `.get_ds_info()` to `Flume`.
- drop `.receive()`: each fqsn entry will now require knowledge of
separate streams by feed users.
- add multi-fqsn tables: `.flumes`, `.streams` which point to the
appropriate per-symbol entries.
- Async load all `Flume`s from all contexts and all quote streams using
`tractor.trionics.gather_contexts()` on the client `open_feed()` side.
- Update feeds test to include streaming 2 symbols on the same (binance)
backend.
There never was any underlying db bug, it was a hardcoded timeframe in
the column series write key.. Now we always assert a matching timeframe
in results.
Not only improves startup latency but also avoids a bug where the rt
buffer was being tsdb-history prepended *before* the backfilling of
recent data from the backend was complete resulting in our of order
frames in shm.
If a history manager raises a `DataUnavailable` just assume the sample
rate isn't supported and that no shm prepends will be done. Further seed
the shm array in such cases as before from the 1m history's last datum.
Also, fix tsdb -> shm back-loading, cancelling tsdb queries when either
no array-data is returned or a frame is delivered which has a start time
no lesser then the least last retrieved. Use strict timeframes for every
`Storage` API call.
Turns out querying for a high freq timeframe (like 1sec) will still
return a lower freq timeframe (like 1Min) SMH, and no idea if it's the
server or the client's fault, so we have to explicitly check the sample
step size and discard lower freq series-results. Do this inside
`Storage.read_ohlcv()` and return an empty `dict` when the wrong time
step is detected from the query result.
Further enforcements,
- both `.load()` and `read_ohlcv()` now require an explicit `timeframe:
int` input to guarantee the time step of the output array.
- drop all calls `.load()` with non-timeframe specific input.
Our default sample periods are 60s (1m) for the history chart and 1s for
the fast chart. This patch adds concurrent loading of both (or more)
different sample period data sets using the existing loading code but
with new support for looping through a passed "timeframe" table which
points to each shm instance.
More detailed adjustments include:
- breaking the "basic" and tsdb loading into 2 new funcs:
`basic_backfill()` and `tsdb_backfill()` the latter of which is run
when the tsdb daemon is discovered.
- adjust the fast shm buffer to offset with one day's worth of 1s so
that only up to a day is backfilled as history in the fast chart.
- adjust bus task starting in `manage_history()` to deliver back the
offset indices for both fast and slow shms and set them on the
`Feed` object as `.izero_hist/rt: int` values:
- allows the chart-UI linked view region handlers to use the offsets
in the view-linking-transform math to index-align the history and
fast chart.
It doesn't seem to be any slower on our least throttled backend
(binance) and it removes a bunch of hard to get correct frame
re-ordering logic that i'm not sure really ever fully worked XD
Commented some issues we still need to resolve as well.
Adjust all history query machinery to pass a `timeframe: int` in seconds
and set default of 60 (aka 1m) such that history views from here forward
will be 1m sampled OHLCV. Further when the tsdb is detected as up load
a full 10 years of data if possible on the 1m - backends will eventually
get a config section (`brokers.toml`) that allow user's to tune this.
The `Store.load()`, `.read_ohlcv()` and `.write_ohlcv()` and
`.delete_ts()` now can take a `timeframe: Optional[float]` param which
is used to look up the appropriate sampling period table-key from
`marketstore`.
As part of supporting a "history view" chart which shows downsampled
datums alongside our 1s (or higher) sampled OHLC we need a separate
buffer to store a the slower history from broker backends. This begins
that design by allocating 2 buffers:
- `rt_shm: ShmArray` which maps to a `/dev/shm/` file with `_rt` suffix
- `hist_shm: ShmArray` which maps to a file with `_hist` suffix
Deliver both of these shms back from both `manage_history()` and load
them as `Feed.rt_shm`/`.hist_shm` on the client side.
Impl deats:
- init the rt buffer with the first datum from loaded history and
assign all OHLC values to that row's 'close' and the vlm to 0.
- pass the hist buffer to the backfiller task
- only spawn **one** global sampler array-row increment task per
`brokerd` and pass in the 1s delay which we presume is our lowest
OHLC sample rate for now.
- drop `open_sample_step_stream()` and just move its body contents into
`Feed.index_stream()`
Instead of worrying about the increment period per shm subscription,
just use the value passed as input and presume the caller knows that
only one task is necessary and that the wakeup (sampling) period should
be the shortest that is needed.
It's very unlikely we don't want at least a 1s sampling (both in terms
of task switching cost and general usage) which will eventually ship as
the default "real-time" feed "timeframe". Further, this "fast" increment
sampling task can handle all lower sampling periods (eg. 1m, 5m, 1H)
based on the current implementation just the same.
Also, add a global default sample period as `_defaul_delay_s` for use in
other internal modules.
Since we figured out how to pass through ems dialog ids to the
`openOrders` sub we don't really need to do much with status updates
other then error handling. This drops `process_status()` and moves the
error handling logic into a status handler sub-block; we now just
info-log status updates for troubleshooting purposes.