Commit Graph

3789 Commits (659649ec48f81805142fb047bf16f0b004a66a7d)

Author SHA1 Message Date
Tyler Goodlet 659649ec48 Bah, fix nursery indents for maybe tsdb backloading
Can't ref `dt_eps` and `tsdb_entry` if they don't exist.. like for 1s
sampling from `binance` (which dne). So make sure to add better logic
guard and only open the finaly backload nursery if we actually need to
fill the gap between latest history and where tsdb history ends.

TO CHERRY #486
2023-12-18 19:46:59 -05:00
Tyler Goodlet f7cc43ee0b Add pauses to `store anal/ldshm` only on bad segs
Particularly halting before maybe writing the repaired timeseries
history in `store anal` to optionally allow user to avoid writing to
storage.
2023-12-18 11:56:57 -05:00
Tyler Goodlet f5dc21d3f4 Adjust all `.tsp` imports to use new sub-pkg
Also toss in a poll loop around the `hist_shm: ShmArray` backfill
read-check in the `.data.allocate_persisten_feed()`  init to cope with
possible racy-ness from the increased tsdb history loading concurrency
now implemented.
2023-12-18 11:54:28 -05:00
Tyler Goodlet 4568c55f17 Create `piker.tsp` "time series processing" subpkg
Move `.data.history` -> `.tsp.__init__.py` for now as main pkg-mod
and `.data.tsp` -> `.tsp._anal` (for analysis).

Obviously follow commits will change surrounding codebase (imports) to
match..
2023-12-18 11:53:27 -05:00
Tyler Goodlet d5d68f75ea ib: only raise first quote timeout err after tries
Previously we were actually failing silently too fast instead of
actually trying multiple times (now we do for 100) before finally
raising any timeout in the final loop `else:` block.
2023-12-18 11:45:19 -05:00
Tyler Goodlet 1f9a497637 Fixup symcache annot for kucoin as well 2023-12-15 16:01:31 -05:00
Tyler Goodlet 40c5d88a9b Fixup symcache type annots; no more `Pair` type 2023-12-15 16:00:51 -05:00
Tyler Goodlet 8989c73a93 Move `iter_dfs_from_shms` into `.data.history`
Thinking about just moving all of that module (after a content breakup)
to a new `.piker.tsp` which will mostly depend on the `.data` and
`.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and
orchestration with real-time (shm) buffers and the graphics layer into
a common spot for both manual analysis/research work and better
separation of low level data structure primitives from their higher
level usage.

Add a better `data.history` mod doc string in prep for this move
as well as clean out a bunch of legacy commented cruft from the
`trimeter` and `marketstore` days.

TO CHERRY #486 (if we can)
2023-12-15 15:53:02 -05:00
Tyler Goodlet 3639f360c3 Reactivate forced viz updates from sampler broadcasts in hist display loop 2023-12-15 13:59:19 -05:00
Tyler Goodlet afd0781b62 Add (shm) abs index to `ContextLabel` 2023-12-15 13:57:10 -05:00
Tyler Goodlet ba154ef413 ib: don't bother with recursive not-enough-bars queries for now, causes more problems then it solves.. 2023-12-15 13:56:42 -05:00
Tyler Goodlet 97e2403fb1 Rework backfiller and null-segment task conc
For each timeframe open a sub-nursery to do the backfilling + tsdb load
+ null-segment scanning in an effort to both speed up load time (though
we need to reverse the current order to really make it faster rn since
moving to the much faster parquet file backend) and do concurrent
time-gap/null-segment checking of tsdb history while mrf (most recent
frame) history is backfilling.

The details are more or less just `trio` related task-func composition
tricks and a reordering of said funcs for optimal startup latency.
Also commented the `back_load_from_tsdb()` task for now since it's
unused.
2023-12-15 13:11:00 -05:00
Tyler Goodlet a4084d6a0b Bleh, fix another off-by-one issue in `np.argwhere()`
Apparently it returns the index of the prior zero-row (prolly since we
do the backward difference) so ensure `fi_zgaps += 1`..

Also fix remaining edge case handling when there's only 2 zero-segs
which was borked after a refactor to the special case blocks (like
a single zero row) prior to the `absi_zsegs` building loop AND make sure
to always return abs indices OUTSIDE the zero seg, i.e. the indices of
the non-zero row just before and just after so that the history
backfiller can use non-zero timestamps to generate range datetimes for
backend frame queries.

Add much more detailed doc-comments with a small ascii diagram to
explain how all these somewhat subtle vec ops work. Also toss in some
sanity checks on the output indices to ensure they don't point to
zero (time) valued rows when used to read the frame.
2023-12-15 12:48:50 -05:00
Tyler Goodlet 83bdca46a2 Wrap null-gap detect and fill in async gen
Call it `iter_null_segs()` (for now?) and use in the final (sequential)
stage of the `.history.start_backfill()` task-func. Delivers abs,
frame-relative, and equiv time stamps on each iteration pertaining to
each detected null-segment to make it easy to do piece-wise history
queries for each.

Further,
- handle edge case in `get_null_segs()` where there is only 1 zeroed
  row value, in which case we deliver `absi_zsegs` as a single pair of
  the same index value and,
  - when this occurs `iter_null_seqs()` delivers `None` for all the
    `start_` related indices/timestamps since all `get_hist()` routines
    (delivered by `open_history_client()`) should handle it as being a
    "get max history from this end_dt" type query.
- add note about needing to do time gap handling where there's a gap in
  the timeseries-history that isn't actually IN the data-history.
2023-12-13 18:29:06 -05:00
Tyler Goodlet c129f5bb4a Finally write a general purpose null-gap detector!
Using a bunch of fancy `numpy` vec ops (and ideally eventually extending
the same to `polars`) this is a first draft of `get_null_segs()`
a `col: str` field-value-is-zero detector which filters to all zero-valued
input frame segments and returns the corresponding useful slice-indexes:
- gap absolute (in shm buffer terms) index-endpoints as
  `absi_zsegs` for slicing to each null-segment in the src frame.
- ALL abs indices of rows with zeroed `col` values as `absi_zeros`.
- the full set of the input frame's row-entries (view) which are
  null valued for the chosen `col` as `zero_t`.

Use this new null-segment-detector in the
`.data.history.start_backfill()` task to attempt to fill null gaps that
might be extant from some prior backfill attempt. Since
`get_null_segs()` should now deliver a sequence of slices for each gap
we don't really need to have the `while gap_indices:` loop any more, so
just move that to the end-of-func and warn log (for now) if all gaps
aren't eventually filled.

TODO:
-[ ] do the null-seg detection and filling concurrently from
  most-recent-frame backfilling.
-[ ] offer the same detection in `.storage.cli` cmds for manual tsp
  anal.
-[ ] make the graphics layer actually update correctly when null-segs
  are filled (currently still broken somehow in the `Viz` caching
  layer?)

CHERRY INTO #486
2023-12-13 15:26:33 -05:00
Tyler Goodlet c4853a3fee Drop inter-method NL 2023-12-13 09:27:23 -05:00
Tyler Goodlet f274c3db3b Import `np2pl()` from `.data.tsp`
Also toss in todo for a timeseries search CLI cmd which can be handy
when doing offine store mgmt.
2023-12-13 09:25:44 -05:00
Tyler Goodlet b95932ea09 `.data.history`: run `.tsp.dedupe()` in backloader
In an effort to catch out-of-order and/or partial-frame-duplicated
segments, add some `.tsp` calls throughout the backloader tasks
including a call to the new `.sort_diff()` to catch the out-of-order
history cases.
2023-12-12 19:57:46 -05:00
Tyler Goodlet e8bf4c6e04 Return the `.len()` diff from `dedupe()` instead
Since the `diff: int` serves as a predicate anyway (when `0` nothing
duplicate was detected) might as well just return it directly since it's
likely also useful for the caller when doing deeper anal.

Also, handle the zero-diff case by just returning early with a copy of
the input frame and a `diff=0`.

CHERRY INTO #486
2023-12-12 16:48:56 -05:00
Tyler Goodlet 8e4d1a48ed Bleh, fix ib's `Client.bars()` recursion..
Turns out this was the main source of all sorts of gaps and overlaps
in history frame backfilling. The original idea was that when a gap
causes not enough (1m) bars to be delivered (like over a weekend or
holiday) when we just implicitly do another frame query to try and at
least fill out the default duration (normally 1-2 days). Doing the
recursion sloppily was causing all sorts of stupid problems..

It's kinda obvious now what was wrong in hindsight:
- always pass the sampling period (timeframe) when recursing
- adjust the logic to not be mutex with the no-data case (since it
  already is mutex..)
- pack to the `numpy` array BEFORE the recursive call to ensure the
  `end_dt: DateTime` is selected and passed correctly!

Toss in some other helpfuls:
- more explicit `pendulum` typing imports
- some masked out sorted-diffing checks (that can be enabled when
  debugging out-of-order frame issues)
- always error log about less-than time step mismatches since we should never
  have time-diff steps **smaller** then specified in the
  `sample_period_s`!
2023-12-12 16:19:21 -05:00
Tyler Goodlet b03eceebef data.tsp: drop masked `return` one liner 2023-12-11 20:11:42 -05:00
Tyler Goodlet f7a8d79b7b Add `NativeStorageClient._cache_df()` use it in `.write_ohlcv()` for caching on writes as well 2023-12-11 20:10:53 -05:00
Tyler Goodlet 49c458710e Move `numpy` <-> `polars` converters into `.data.tsp`
Yet again these are (going to be) generally useful in the data proc
layer as well as going forward with (possibly) moving the history and
shm rt-processing layer to apache (arrow or other) shared-ds
equivalents.
2023-12-11 17:53:31 -05:00
Tyler Goodlet b94582cb35 Move `dedupe()` to `.data.tsp` (so it has pals)
Includes a rename of `.data._timeseries` -> `.data.tsp` for "time series
processing", making it a public sub-mod; it contains a highly useful set
of data-frame and `numpy.ndarray` ops routines in various subsystems Bo
2023-12-11 16:24:27 -05:00
Tyler Goodlet 7311000846 Facepalm, set `was_deduped` as bool not the deduped frame.. 2023-12-11 13:18:10 -05:00
Tyler Goodlet e719733f97 Comment out overlap case block for now too? 2023-12-08 19:08:10 -05:00
Tyler Goodlet cb941a5554 BABOSO.. fix last history frame overlap slicing!
I guess since i started supporting the whole "allow a gap between
the latest tsdb sample and the latest retrieved history frame" the
overlap slicing has been completely borked XD where we've been sticking
in duplicate history samples and this has caused all sorts of down
stream time-series processing issues..

So fix that but ensuring whenever there IS an overlap between history in
the latest frame and the tsdb that we always prefer the latest frame's
data and slice OUT the tsdb's duplicate indices..

CHERRY TO #486
2023-12-08 18:56:38 -05:00
Tyler Goodlet 2d72a052aa Woops, make sure non-disti mode still works wen maybe getting `pikerd` XD 2023-12-08 17:43:52 -05:00
Tyler Goodlet 2eeef2a123 Add `dedupe()` to help with gap detection/resolution
Think i finally figured out the weird issue without out-of-order OHLC
history getting jammed in the wrong place:
- gap is detected in parquet/offline ts (likely due to a zero dt or
  other gap),
- query for history in the gap is made BUT that frame is then inserted
  in the shm buffer **at the end** (likely using array int-entry
  indexing) which inserts it at the wrong location,
- later this out-of-order frame is written to the storage layer
  (parquet) and then is repeated on further reboots with the original
  gap causing further queries for the same frame on every history
  backfill.

A set of tools useful for detecting these issues and annotating them
nicely on chart part of this patch's intent:
- `dedupe()` will detect any dt gaps, deduplicate datetime rows and
  return the de-duplicated df along with gaps table.
- use this in both `piker store anal` such that we potentially
  resolve and backfill the gaps correctly if some rows were removed.
- possibly also use this to detect the backfilling error in logic at
  the time of backfilling the frame instead of after the fact (which
  would require re-writing the shm array from something like `store
  ldshm` and would be a manual post-hoc solution, not a fix to the
  original issue..
2023-12-08 15:11:34 -05:00
Tyler Goodlet b6d2550f33 Add datetime col de-duplicator 2023-12-08 14:38:27 -05:00
Tyler Goodlet b9af6176c5 Factor `TimeseriesNotFound` to top level
TO CHERRY into #486
2023-12-07 12:31:14 -05:00
Tyler Goodlet dd0167b9a5 Make `fsp.cascade()` expect src/dst `Flume`s
Been meaning to this for a while, and there's still a few design
/ interface kinks (like `.mkt: MktPair` which should be better
generalized?) but this flips over all of the fsp chaining engine
to operate on the higher level `Flume` APIs via the newly cobbled
`Cascade` thinger..
2023-12-06 17:53:35 -05:00
Tyler Goodlet 9e71e0768f Define and pass a default `Flume._readonly: bool`
Allows opening with `.from_msg(readonly=False)` for write permissions
making underlyig shm arrays readonly. Also, make sure to pop the
`ShmArray` field entries prior to msg-ization, not sure how that worked
with the `Feed.flumes` equivalent..but?
2023-12-06 17:25:49 -05:00
Tyler Goodlet 6029f39a3f Allow `MktPair.from/to_msg()` to still do `.dst: str` for fsp flumes 2023-12-06 17:09:52 -05:00
Tyler Goodlet 656e2c6a88 fsp: intro a `Cascade` type that connects `Flume`s of streams 2023-12-05 16:59:07 -05:00
Tyler Goodlet 9245d24b47 ib: add `.pause()` on symbol query overruns to aid in fixing the issue 2023-12-04 13:10:15 -05:00
Tyler Goodlet 22bd83943b .storage: support `store anal --pdb` flag 2023-12-04 13:00:33 -05:00
Tyler Goodlet b94931bbdd Fix `Portal.channel: Channel` attr name error 2023-12-04 13:00:04 -05:00
Tyler Goodlet 239c1c457e Sort fqme suggestions pre-print 2023-12-04 11:34:39 -05:00
Tyler Goodlet 24a54a7085 Add `TimeseriesNotFound` for fqme lookup failures
A common usage error is to run `piker anal mnq.cme.ib` where the CLI
passed fqme is not actually fully-qualified (in this case missing an
expiry token) and we get an underlying `FileNotFoundError` from the
`StorageClient.read_ohlcv()` call. In such key misses, scan the existing
`StorageClient._index` for possible matches and report in a `raise from`
the new error.

CHERRY into #486
2023-12-04 11:22:55 -05:00
Tyler Goodlet ebd1eb114e Port runtime init to new `tractor.Actor.reg_addrs` related changes 2023-11-21 15:18:52 -05:00
Tyler Goodlet d3dab17939 order_mode: fix to avoid `Dialog.uuid` on null dialog.. 2023-10-20 13:57:52 -04:00
Tyler Goodlet cadc200818 Always ignore untracked-order error msgs from `brokerd` 2023-10-16 13:15:12 -04:00
Tyler Goodlet 363c8dfdb1 Default spec registrar set as empty addr list
Since it probably IS sane to just assume a root-actor-as-registrar
listening on the localhost as a default, AND allows NOT expecting every
caller of `open_piker_runtime()` to not have to pass an addr set XD

This makes a bucha CLI shit work again after breakage due to no
default..
2023-10-03 13:36:22 -04:00
Tyler Goodlet 00c046c280 Factor transport-ep parser/loader into helper
For now def it `.cli.load_trans_eps()` just inside the pkg mod; only
loads the ep for `pikerd` which currently acts as the main service-actor
registrar per host. Delegate to this new `.load_trans_eps()`
as-it-was-used from the `pikerd` cmd body and add fresh support for
`piker chart --maddr <addr: str>` using the routine in the body of the
`piker.cli.cli` cmd group after loading the `conf.toml::network` section
B)

Also, toss in runtime debug mode wrapping around `piker chart` using the
new `tractor.devx.maybe_open_crash_handler()` and pull the switch from
a `--pdb` flag now factored into the `.cli.cli` click group.
2023-10-03 10:00:01 -04:00
Tyler Goodlet 9165515811 ib: more detailed comments on wait-for-quote-task todo 2023-10-02 17:57:47 -04:00
Tyler Goodlet 543c11f377 ib: only normalize and log first quote if it arrives 2023-10-01 19:14:08 -04:00
Tyler Goodlet 637d33d7cc Make `.config.load_accounts()` load `brokers.toml`.. 2023-10-01 19:09:15 -04:00
Tyler Goodlet e5fdb33e31 Port cache-`dict` search to new `rapidfuzz` api 2023-10-01 17:46:46 -04:00
Tyler Goodlet 81a8cd1685 binance: always load the `brokers.toml` file since default is `conf.toml` now 2023-10-01 17:37:09 -04:00