Since porting all backends to the new `FeedInit` + `MktPair` + `Asset`
style init, we can now just directly pass a `MktPair` instance to the
history endpoint(s) since it's always called *after* the live feed
`.stream_quotes()` ep B)
This has a lot of benefits including allowing brokerd backends to have
more flexible, pre-processed market endpoint meta-data that piker has
already validated; makes handling special cases in much more straight
forward as well such as forex pairs from legacy brokers XD
First pass changes all crypto backends to expect this new input, ib will
come next after handling said special cases..
Previously we were passing the `fqme: str` which isn't as extensive nor
were we able to pass `MktPair` direct to backend history manager-loading
routines (which should be able to rely on always receiving it since
currently `stream_quotes()` is always called first for setup).
This also starts a slight bit of configuration oriented tsdb info
loading (via a new `conf.toml`) such that a user can decide to host
their (marketstore) db on a remote host and our container spawning and
client code will do the right startup automatically based on the config.
|-> Related to this I've added some comments about doing storage
backend module loading which should get actually written out as part of
patches coming in #486 (or something related).
Don't allow overruns again in history context since it seems it was
never a problem?
We need to allow overruns during the async multi-broker context spawning
init bc some backends might take longer then others to setup (eg.
binance vs. kucoin) and result in some context (stream) being overrun by
the time we get to the `.open_stream()` phase. Ideally, we can maybe
adjust the concurrent setup to be more of a task-per-provider style to
avoid this in the future - which would also in theory result in
more-immediate per-provider setup in terms showing ready feeds asap.
Also, does a bunch of renaming from fqsn -> fqme and drops the lower
casing of input symbols instead expecting the caller to know what the
data backend it's requesting is going to be able to handle in terms of
symbology.
Since it's a bit weird having service specific implementation details
inside the general service `._daemon` mod, and since i'd mentioned
trying this re-org; let's do it B)
Requires enabling the new mod in both `pikerd` and `brokerd` and
obviously a bit more runtime-loading of the service modules in the
`brokerd` service eps to avoid import cycles.
Also moved `_setup_persistent_brokerd()` into the new mod since the
naming would place it there even though the implementation really
wouldn't (longer run) since we want to split up `.data.feed` layer
backend-invoked eps into a separate actor eventually from the "actual"
`brokerd` which will be the actor running **only** the trade control eps
(eg. trades_dialogue()` and friends).
`trio`'s internals don't allow for async generator (and thus by
consequence dynamic reset of async exit stacks containing `@acm`s)
interleaving since doing so corrupts the cancel-scope stack. See details
in:
- https://github.com/python-trio/trio/issues/638
- https://trio-util.readthedocs.io/en/latest/#trio_util.trio_async_generator
- `trio._core._run.MISNESTING_ADVICE`
We originally tried to address this using
`@trio_util.trio_async_generator` in backend streaming code but for
whatever reason stopped working recently (at least for me) and it's more
or less implemented the same way as this patch but with more layers and
an extra dep. I also don't want us to have to address this problem again
if/when that lib isn't able to keep up to date with wtv `trio` is
doing..
So instead this is a complete rewrite of the conc design of our
auto-reconnect ws API to move all reset logic and msg relay into a bg
task which is respawned on reset-requiring events: user spec-ed msg recv
latency, network errors, roaming events.
Deatz:
- drop all usage of `AsyncExitStack` and no longer require client code
to (hackily) call `NoBsWs._connect()` on msg latency conditions,
intead this is all done behind the scenes and the user can instead
pass in a `msg_recv_timeout: float`.
- massively simplify impl of `NoBsWs` and move all reset logic into a
new `_reconnect_forever()` task.
- offer use of `reset_after: int` a count value that determines how many
`msg_recv_timeout` events are allowed to occur before reconnecting the
entire ws from scratch again.
Since we have made `MktPair.bs_mktid` mean something else now, change
all the feed setup var names to instead be more representative of the
actual value: `bs_fqme: str` and use the new `MktPair.bs_fqme` where
necessary.
The legacy version was a `dict` of `dicts` vs. now we want to be handed
a `list[FeedInit]`; process both in a factored way.
Drop `FeedInit.bs_mktid` since it's already defined on `.mkt.bs_mktid`
and we don't really need it top level.
More or less a replacement for what @guilledk did with the initial
attempt at a "broker check" type script a while back except in this case
we're going to always run this validation routine and it now uses a new
`FeedInit` struct to ensure backends are delivering the right schema-ed
data during startup. Also allows us to stick deprecation warnings / and
or strict API compat errors all in one spot (at least for live feeds).
Factors out a bunch of `MktPair` related adapter-logic into a new
`.validate.valiate_backend()` which warns to the backend implementer via
log msgs all the problems outstanding. Ideally we do our backend module
endpoint scan-and-complain regarding missing feature support from here
as well (eg. search, broker/trade ctl, ledger processing, etc.).
In `.feed` and `._sampling` move to using the new
`tractor.Context.open_stream(allow_overruns: bool)` (cough, A BREAKING
CHANGE).
Also set `Flume.mkt` during construction in `.feed.open_feed()`.
Might as well try and flip it over to the new type; make appropriate
dict serialization changes in `.to_msg()`. Alias back to `.symbol:
Symbol` with a property.
Initial attempt at getting the sampling and shm layer to use the new mkt
info meta-data type. Draft out a potential `BackendInitMsg:
msgspec.Struct` for validating the init msg returned from the
`stream_quotes()` start value; obvs don't actually use it yet.
Add `MktPair` handling block for when a backend delivers
a `mkt_info`-field containing init msg. Adjust the original
`Symbol`-style `'symbol_info'` msg processing to do `Decimal` defaults
and convert to `MktPair` including slapping in a hacky `_atype: str`
field XD
General initial name changes to `bs_mktid` and `_fqme` throughout!
Not sure how i missed this (and left in handling of `list.remove()` and
it ever worked for that?) after the `samplerd` impl in 5ec1a72 but, this
adjusts the remove-broken-subscriber loop to catch the correct
`set.remove()` exception type on a missing (likely already removed)
subscription entry.
There's been way too many issues when trying to calculate this
dynamically from the input array, so just expect the caller to know what
it's doing and don't bother with ever hitting the error case of
calculating and incorrect value internally.
Not sure why this was ever allowed but, for slicing to the sample
*before* whatever target time stamp is passed in we should definitely
not return the prior index as for the slice start since that might
include a very large gap prior to whatever sample is scanned to have
the earliest matching time stamp.
This was essential to fixing overlay intersect points searching in our
``ui.view_mode`` machinery..
In situations where clients are (dynamically) subscribing *while*
broadcasts are starting to taking place we need to handle the
`set`-modified-during-iteration case. This scenario seems to be more
common during races on concurrent startup of multiple symbols. The
solution here is to use another set to take note of subscribers which
are successfully sent-to and then skipping them on re-try.
This also contains an attempt to exception-handle throttled stream
overruns caused by higher frequency feeds (like binance) pushing more
quotes then can be handled during (UI) client startup.
Not really sure there's much we can do besides dump Grpc stuff when we
detect an "error" `str` for the moment..
Either way leave a buncha complaints (como siempre) and do linting
fixups..
Previously we would make the `ahabd` supervisor-actor sync to docker
container startup using pseudo-blocking log message processing.
This has issues,
- we're forced to do a hacky "yield back to `trio`" in order to be
"fake async" when reading the log stream and further,
- blocking on a message is fragile and often slow.
Instead, run the log processor in a background task and in the parent
task poll for the container to be in the client list using a similar
pseudo-async poll pattern. This allows the super to `Context.started()`
sooner (when the container is actually registered as "up") and thus
unblock its (remote) caller faster whilst still doing full log msg
proxying!
Deatz:
- adds `Container.cuid: str` a unique container id for logging.
- correctly proxy through the `loglevel: str` from `pikerd` caller task.
- shield around `Container.cancel()` in the teardown block and use
cancel level logging in that method.
With the addition of a new `elastixsearch` docker support in
https://github.com/pikers/piker/pull/464, adjustments were made
to container startup sync logic (particularly the `trio` checkpoint
sleep period - which itself is a hack around a sync client api) which
caused a regression in upstream startup logic wherein container error
logs were not being bubbled up correctly causing a silent failure mode:
- `marketstore` container started with corrupt input config
- `ahabd` super code timed out on startup phase due to a larger log
polling period, skipped processing startup logs from the container,
and continued on as though the container was started
- history client fails on grpc connection with no clear error on why the
connection failed.
Here we revert to the old poll period (1ms) to avoid any more silent
failures and further extend supervisor control through a configuration
override mechanism. To address the underlying design issue, this patch
adds support for container-endpoint-callbacks to override supervisor
startup configuration parameters via the 2nd value in their returned
tuple: the already delivered configuration `dict` value.
The current exposed values include:
{
'startup_timeout': 1.0,
'startup_query_period': 0.001,
'log_msg_key': 'msg',
},
This allows for container specific control over the startup-sync query
period (the hack mentioned above) as well as the expected log msg key
and of course the startup timeout.
Adds a `piker storage` subcmd with a `-d` flag to wipe a particular
fqsn's time series (both 1s and 60s). Obviously this needs to be
extended much more but provides a start point.
In order to support existing `pps.toml` files in the wild which don't
have the `asset_type, price_tick_size, lot_tick_size` fields, we need to
only optionally read them and instead expect that backends will write
the fields going forward (coming in follow patches).
Further this makes some small asset-size (vlm accounting) quantization
related adjustments:
- rename `Symbol.decimal_quant()` -> `.quantize_size()` since that is
explicitly what this method is doing.
- and expect an input `size: float` which we cast to decimal instead of
doing it inside the `.calc_size()` caller code.
- drop `Symbol.iterfqsns()` which wasn't being used anywhere at all..
Additionally, this drafts out a new replacement market-trading-pair data
type to eventually replace `.data._source.Symbol` -> `MktPair` which we
aren't using yet, but serves as the documentation-driven motivator ;)
and, it relates to https://github.com/pikers/piker/issues/467.
Add decimal quantize API to Symbol to simplify by-broker truncation
Add symbol info to `pps.toml`
Move _assert call to outside the _async_main context manager
Minor indentation and styling changes, also convert a few prints to log calls
Fix multi write / race condition on open_pps call
Switch open_pps to not write by default
Fix integer math kraken syminfo _tick_size initialization
For the purposes of avoiding another full format call we can stash the
last rendered 1d xy pre-graphics formats as
`IncrementalFormatter.x/y_1d: np.ndarray`s and allow readers in the viz
and render machinery to use this data easily for things like "only
drawing the last uppx's worth of data as a line". Also add
a `.flat_index_ratio: float` which can be used similarly as a scalar
applied to indexes into the src array but instead when indexing
(flattened) 1d xy formatted outputs. Finally, this drops the way
overdone/noisy `.__repr__()` meth we had XD
We obviously don't want to be debugging a sample-index issue if/when the
market for the asset is closed (since we'll be guaranteed to have
a mismatch, lul). Pass in the `feed_is_live: trio.Event` throughout the
backfilling routines to allow first checking for the live feed being active
so as to avoid breakpointing on false +ves. Also, add a detailed warning
log message for when *actually* investigating a mismatch.
This should never really happen but when it does it appears to be a race
with writing startup pre-graphics-formatter array data where we get
`x_end` epoch value subtracting some really small offset value (like
`-/+0.5`) or the opposite where the `x_start` is epoch and `x_end` is
small.
This adds a warning msg and `breakpoint()` as well as guards around the
entire code downsampling code path so that when resumed the downsampling
cycle should just be skipped and avoid a crash.
Whenever the last datum is in view `slice_from_time()` need to always
spec the final array index (i.e. the len - 1 value we set as
`read_i_max`) to avoid a uniform-step arithmetic error where gaps in the
underlying time series causes an index that's too low to be returned.
Doesn't seem like we really need to handle the situation where the start
or stop input time stamps are outside the index range of the data since
the new binary search handling via `numpy.searchsorted()` covers this
case at minimal runtime cost and with an equally correct output. Allows
us to drop some other indexing endpoint internal variables as well.
Define the x-domain coords "offset" (determining the curve graphics
per-datum placement) for each formatter such that there's only on place
to change it when needed. Obviously each graphics type has it's own
dimensionality and this is reflected by the array shapes on each
subtype.
Previously we were drawing with the middle of the bar on each index with
arms to either side: +/- some arm length. Instead this changes so that
each bar is drawn *after* each index/timestamp such that in graphics
coords the bar span more correctly matches the time span in the
x-domain. This makes the linked region between slow and fast chart
directly match (without any transform) for epoch-time indexing such that
the last x-coord in view on the fast chart is no more then the
next time step in (downsampled) slow view.
Deats:
- adjust in `._pathops.path_arrays_from_ohlc()` and take an `bar_w` bar
width input (normally taken from the data step size).
- change `.ui._ohlc.bar_from_ohlc_row()` and
`BarItems.draw_last_datum()` to match.
Allows easily switching between normal array `int` indexing and time
indexing by just flipping the `Viz._index_field: str`.
Also, guard all the x-data audit breakpoints with a time indexing
condition.
First allocation vs. first "prepend" of source data to an xy `ndarray`
format **must be mutex** in order to avoid a double prepend.
Previously when both blocks were executed we'd end up with
a `.xy_nd_start` that was decremented (at least) twice as much as it
should be on the first `.format_to_1d()` call which is obviously
incorrect (and causes problems for m4 downsampling as discussed below).
Further, since the underlying `ShmArray` buffer indexing is managed
(i.e. write-updated) completely independently from the incremental
formatter updates and internal xy indexing, we can't use
`ShmArray._first.value` and instead need to use the particular `.diff()`
output's prepend length value to decrement the `.xy_nd_start` on updates
after initial alloc.
Problems this resolves with m4:
- m4 uses a x-domain diff to calculate the number of "frames" to
downsample to, this is normally based on the ratio of pixel columns on
screen vs. the size of the input xy data.
- previously using an int-index (not epoch time) the max diff between
first and last index would be the size of the input buffer and thus
would never cause a large mem allocation issue (though it may have
been inefficient in terms of needed size).
- with an epoch time index this max diff could explode if you had some
near-now epoch time stamp **minus** an x-allocation value: generally
some value in `[0.5, -0.5]` which would result in a massive frames and
thus internal `np.ndarray()` allocation causing either a crash in
`numba` code or actual system mem over allocation.
Further, put in some more x value checks that trigger breakpoints if we
detect values that caused this issue - we'll remove em after this has
been tested enough.
If we presume that time indexing using a uniform step we can calculate
the exact index (using `//`) for the input time presuming the data
set has zero gaps. This gives a massive speedup over `numpy` fancy
indexing and (naive) `numba` iteration. Further in the case where time
gaps are detected, we can use `numpy.searchsorted()` to binary search
for the nearest expected index at lower latency.
Deatz,
- comment-disable the call to the naive `numba` scan impl.
- add a optional `step: int` input (calced if not provided).
- add todos for caching binary search results in the gap detection
cases.
- drop returning the "absolute buffer indexing" slice since the caller
can always just use the read-relative slice to acquire it.
Gives approx a 3-4x speedup using plain old iterate-with-for-loop style
though still not really happy with this .5 to 1 ms latency..
Move the core `@njit` part to a `_slice_from_time()` with a pure python
func with orig name around it. Also, drop the output `mask` array since
we can generally just use the slices in the caller to accomplish the
same input array slicing, duh..
We need to subtract the first index in the array segment read, not the
first index value in the time-sliced output, to get the correct offset
into the non-absolute (`ShmArray.array` read) array..
Further we **do** need the `&` between the advance indexing conditions
and this adds profiling to see that it is indeed real slow (like 20ms
ish even when using `np.where()`).
Planning to put the formatters into a new mod and aggregate all path
gen/op helpers into this module.
Further tweak include:
- moving `path_arrays_from_ohlc()` back to module level
- slice out the last xy datum for `OHLCBarsAsCurveFmtr` 1d formatting
- always copy the new x-value from the source to `.x_nd`
This was a major cause of error (particularly trying to get epoch
indexing working) and really isn't necessary; instead just have
`.diff()` always read from the underlying source array for current
index-step diffing and append/prepend slice construction.
Allows us to,
- drop `._last_read` state management and thus usage.
- better handle startup indexing by setting `.xy_nd_start/stop` to
`None` initially so that the first update can be done in one large
prepend.
- better understand and document the step curve "slice back to previous
level" logic which is now heavily commented B)
- drop all the `slice_to_head` stuff from and instead allow each
formatter to choose it's 1d segmenting.
Don't expect values (array + slice) to be returned and applied by
`.incr_update_xy_nd()` and instead presume this will implemented
internally in each (sub)formatter.
Attempt to simplify some incr-update routines, (particularly in the step
curve formatter, though most of it was reverted to just a simpler form
of the original implementation XD) including:
- dropping the need for the `slice_to_head: int` control.
- using the `xy_nd_start/stop` index counters over custom lookups.
Remove harcoded `'index'` field refs from all formatters in a first
attempt at moving towards epoch-time alignment (though don't actually
use it it yet).
Adjustments to the formatter interface:
- property for `.xy_nd` the x/y nd arrays.
- property for and `.xy_slice` the nd format array(s) start->stop index
slice.
Internal routine tweaks:
- drop `read_src_from_key` and always pass full source array on updates
and adjust handlers to expect to have to index the data field of
interest.
- set `.last_read` right after update calls instead of after 1d
conversion.
- drop `slice_to_head` array read slicing.
- add some debug points for testing 'time' indexing (though not used
here yet).
- add `.x_nd` array update logic for when the `.index_field` is not
'index' - i.e. when we begin to try and support epoch time.
- simplify some new y_nd updates to not require use of `np.broadcast()`
where possible.
Probably means it doesn't need to be a `Flume` method but it's
convenient to expect the caller to pass in the `np.ndarray` with
a `'time'` field instead of a `timeframe: str` arg; also, return the
slice mask instead of the sliced array as output (again allowing the
caller to do any slicing). Also, handle the slice-outside-time-range
case by just returning the entire index range with a `None` mask.
Adjust `Viz.view_data()` to instead do timeframe (for rt vs. hist shm
array) lookup and equiv array slicing with the returned mask.
Since these modules no longer contain Qt specific code we might
as well include them in the data sub-package.
Also, add `IncrementalFormatter.index_field` as single point to def the
indexing field that should be used for all x-domain graphics-data
rendering.
Since higher level charting and fsp management need access to the
new `Flume` indexing apis this adjusts some func sigs to pass through
(and/or create) flume instances:
- `LinkedSplits.add_plot()` and dependents.
- `ChartPlotWidget.draw_curve()` and deps, and it now returns a `Flow`.
- `.ui._fsp.open_fsp_admin()` and `FspAdmin.open_fsp_ui()` related
methods => now we wrap the destination fsp shm in a flume on the admin
side and is returned from `.start_engine_method()`.
Drop a bunch of (unused) chart widget methods including some already
moved to flume methods: `.get_index()`, `.in_view()`,
`.last_bar_in_view()`, `.is_valid_index()`.
Allows running simultaneous data feed services on the same (linux) host
by avoiding file-name collisions instead keying shm buffer sets by the
given `brokerd` instance. This allows, for example, either multiple dev
versions of the data layer to run side-by-side or for the test suite to
be seamlessly run alongside a production instance.
Always use `open_sample_stream()` to register fast and slow quote feed
buffers and get a sampler stream which we use to trigger
`Sampler.broadcast_all()` calls on the service side after backfill
events.
Now spawned under the `pikerd` tree as a singleton-daemon-actor we offer
a slew of new routines in support of this micro-service:
- `maybe_open_samplerd()` and `spawn_samplerd()` which provide the
`._daemon.Services` integration to conduct service spawning.
- `open_sample_stream()` which is a client-side endpoint which does all
the work of (lazily) starting the `samplerd` service (if dne) and
registers shm buffers for update as well as connect a sample-index
stream for iterator by the caller.
- `register_with_sampler()` which is the `samplerd`-side service task
endpoint implementing all the shm buffer and index-stream registry
details as well as logic to ensure a lone service task runs
`Services.increment_ohlc_buffer()`; it increments at the shortest period
registered which, for now, is the default 1s duration.
Further impl notes:
- fixes to `Services.broadcast()` to ensure broken streams get discarded
gracefully.
- we use a `pikerd` side singleton mutex `trio.Lock()` to ensure
one-and-only-one `samplerd` is ever spawned per `pikerd` actor tree.
We're moving toward a single actor managing sampler work and distributed
independently of `brokerd` services such that a user can run samplers on
different hosts then real-time data feed infra. Most of the
implementation details include aggregating `.data._sampling` routines
into a new `Sampler` singleton type.
Move the following methods to class methods:
- `.increment_ohlc_buffer()` to allow a single task to increment all
registered shm buffers.
- `.broadcast()` for IPC relay to all registered clients/shms.
Further add a new `maybe_open_global_sampler()` which allocates
a service nursery and assigns it to the `Sampler.service_nursery`; this
is prep for putting the step incrementer in a singleton service task
higher up the data-layer actor tree.
When we see multiple history frames that are duplicate to the request
set, bail re-trying after a number of tries (6 just cuz) and return
early from the tsdb backfill loop; presume that this many duplicates
means we've hit the beginning of history. Use a `collections.Counter`
for the duplicate counts. Make sure and warn log in such cases.
Wow, turns out tick framing was totally borked since we weren't framing
on "greater then throttle period long waits" XD
This moves all the framing logic into a common func and calls it in
every case:
- every (normal) "pre throttle period expires" quote receive
- each "no new quote before throttle period expires" (slow case)
- each "no clearing tick yet received" / only burst on clears case