Commit Graph

601 Commits (12e371b0274591155b153359ecf4a3d95fd228b3)

Author SHA1 Message Date
Tyler Goodlet 07331a160e Expose "bar gap margin" as `.ui._formatters.BGM: float` 2023-12-28 10:37:20 -05:00
Tyler Goodlet 1231c459aa Track data feed subscribers using a new `Sub(Struct)`
In prep for supporting reverse-ipc connect-back to UI actors from
middle-ware systems (for the purposes of triggering data-view canvas
re-renders and built-in tsp annotations), add a new struct type to
better generalize the management of remote feed subscriptions. Include
a `Sub.rc_ui: bool` for now (with nearby todo-comment) and expose an
`allow_remote_ctl_ui: bool` through the feed endpoints to help drive
/ prep for all that ^

Rework all the sampler tasks to expect the `Sub`'s new iface:

- split up the `Sub.ipc: MsgStream`  and `.send_chan` as separate fields
  since we're handling the throttle case in separate
  `sample_and_broadcast()` logic blocks anyway and avoids needing to
  monkey-patch on the `._ctx` malarky..
- explicitly provide the optional handle to the `_throttle_cs:
  CancelScope` again for the case where throttling/event-downsampling is
  requested.
- add `_FeedsBus.subs_items()` as a public iterator.
2023-12-26 20:48:06 -05:00
Tyler Goodlet a681b2f0bb Drop passing `bus` to `tsp.manage_history()` in feed allocator 2023-12-22 21:44:38 -05:00
Tyler Goodlet 5a60974990 Use explicit `.data.feed` import of `tractor.trionics` 2023-12-21 20:26:45 -05:00
Tyler Goodlet f5dc21d3f4 Adjust all `.tsp` imports to use new sub-pkg
Also toss in a poll loop around the `hist_shm: ShmArray` backfill
read-check in the `.data.allocate_persisten_feed()`  init to cope with
possible racy-ness from the increased tsdb history loading concurrency
now implemented.
2023-12-18 11:54:28 -05:00
Tyler Goodlet 4568c55f17 Create `piker.tsp` "time series processing" subpkg
Move `.data.history` -> `.tsp.__init__.py` for now as main pkg-mod
and `.data.tsp` -> `.tsp._anal` (for analysis).

Obviously follow commits will change surrounding codebase (imports) to
match..
2023-12-18 11:53:27 -05:00
Tyler Goodlet 40c5d88a9b Fixup symcache type annots; no more `Pair` type 2023-12-15 16:00:51 -05:00
Tyler Goodlet 8989c73a93 Move `iter_dfs_from_shms` into `.data.history`
Thinking about just moving all of that module (after a content breakup)
to a new `.piker.tsp` which will mostly depend on the `.data` and
`.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and
orchestration with real-time (shm) buffers and the graphics layer into
a common spot for both manual analysis/research work and better
separation of low level data structure primitives from their higher
level usage.

Add a better `data.history` mod doc string in prep for this move
as well as clean out a bunch of legacy commented cruft from the
`trimeter` and `marketstore` days.

TO CHERRY #486 (if we can)
2023-12-15 15:53:02 -05:00
Tyler Goodlet 97e2403fb1 Rework backfiller and null-segment task conc
For each timeframe open a sub-nursery to do the backfilling + tsdb load
+ null-segment scanning in an effort to both speed up load time (though
we need to reverse the current order to really make it faster rn since
moving to the much faster parquet file backend) and do concurrent
time-gap/null-segment checking of tsdb history while mrf (most recent
frame) history is backfilling.

The details are more or less just `trio` related task-func composition
tricks and a reordering of said funcs for optimal startup latency.
Also commented the `back_load_from_tsdb()` task for now since it's
unused.
2023-12-15 13:11:00 -05:00
Tyler Goodlet a4084d6a0b Bleh, fix another off-by-one issue in `np.argwhere()`
Apparently it returns the index of the prior zero-row (prolly since we
do the backward difference) so ensure `fi_zgaps += 1`..

Also fix remaining edge case handling when there's only 2 zero-segs
which was borked after a refactor to the special case blocks (like
a single zero row) prior to the `absi_zsegs` building loop AND make sure
to always return abs indices OUTSIDE the zero seg, i.e. the indices of
the non-zero row just before and just after so that the history
backfiller can use non-zero timestamps to generate range datetimes for
backend frame queries.

Add much more detailed doc-comments with a small ascii diagram to
explain how all these somewhat subtle vec ops work. Also toss in some
sanity checks on the output indices to ensure they don't point to
zero (time) valued rows when used to read the frame.
2023-12-15 12:48:50 -05:00
Tyler Goodlet 83bdca46a2 Wrap null-gap detect and fill in async gen
Call it `iter_null_segs()` (for now?) and use in the final (sequential)
stage of the `.history.start_backfill()` task-func. Delivers abs,
frame-relative, and equiv time stamps on each iteration pertaining to
each detected null-segment to make it easy to do piece-wise history
queries for each.

Further,
- handle edge case in `get_null_segs()` where there is only 1 zeroed
  row value, in which case we deliver `absi_zsegs` as a single pair of
  the same index value and,
  - when this occurs `iter_null_seqs()` delivers `None` for all the
    `start_` related indices/timestamps since all `get_hist()` routines
    (delivered by `open_history_client()`) should handle it as being a
    "get max history from this end_dt" type query.
- add note about needing to do time gap handling where there's a gap in
  the timeseries-history that isn't actually IN the data-history.
2023-12-13 18:29:06 -05:00
Tyler Goodlet c129f5bb4a Finally write a general purpose null-gap detector!
Using a bunch of fancy `numpy` vec ops (and ideally eventually extending
the same to `polars`) this is a first draft of `get_null_segs()`
a `col: str` field-value-is-zero detector which filters to all zero-valued
input frame segments and returns the corresponding useful slice-indexes:
- gap absolute (in shm buffer terms) index-endpoints as
  `absi_zsegs` for slicing to each null-segment in the src frame.
- ALL abs indices of rows with zeroed `col` values as `absi_zeros`.
- the full set of the input frame's row-entries (view) which are
  null valued for the chosen `col` as `zero_t`.

Use this new null-segment-detector in the
`.data.history.start_backfill()` task to attempt to fill null gaps that
might be extant from some prior backfill attempt. Since
`get_null_segs()` should now deliver a sequence of slices for each gap
we don't really need to have the `while gap_indices:` loop any more, so
just move that to the end-of-func and warn log (for now) if all gaps
aren't eventually filled.

TODO:
-[ ] do the null-seg detection and filling concurrently from
  most-recent-frame backfilling.
-[ ] offer the same detection in `.storage.cli` cmds for manual tsp
  anal.
-[ ] make the graphics layer actually update correctly when null-segs
  are filled (currently still broken somehow in the `Viz` caching
  layer?)

CHERRY INTO #486
2023-12-13 15:26:33 -05:00
Tyler Goodlet b95932ea09 `.data.history`: run `.tsp.dedupe()` in backloader
In an effort to catch out-of-order and/or partial-frame-duplicated
segments, add some `.tsp` calls throughout the backloader tasks
including a call to the new `.sort_diff()` to catch the out-of-order
history cases.
2023-12-12 19:57:46 -05:00
Tyler Goodlet e8bf4c6e04 Return the `.len()` diff from `dedupe()` instead
Since the `diff: int` serves as a predicate anyway (when `0` nothing
duplicate was detected) might as well just return it directly since it's
likely also useful for the caller when doing deeper anal.

Also, handle the zero-diff case by just returning early with a copy of
the input frame and a `diff=0`.

CHERRY INTO #486
2023-12-12 16:48:56 -05:00
Tyler Goodlet b03eceebef data.tsp: drop masked `return` one liner 2023-12-11 20:11:42 -05:00
Tyler Goodlet 49c458710e Move `numpy` <-> `polars` converters into `.data.tsp`
Yet again these are (going to be) generally useful in the data proc
layer as well as going forward with (possibly) moving the history and
shm rt-processing layer to apache (arrow or other) shared-ds
equivalents.
2023-12-11 17:53:31 -05:00
Tyler Goodlet b94582cb35 Move `dedupe()` to `.data.tsp` (so it has pals)
Includes a rename of `.data._timeseries` -> `.data.tsp` for "time series
processing", making it a public sub-mod; it contains a highly useful set
of data-frame and `numpy.ndarray` ops routines in various subsystems Bo
2023-12-11 16:24:27 -05:00
Tyler Goodlet e719733f97 Comment out overlap case block for now too? 2023-12-08 19:08:10 -05:00
Tyler Goodlet cb941a5554 BABOSO.. fix last history frame overlap slicing!
I guess since i started supporting the whole "allow a gap between
the latest tsdb sample and the latest retrieved history frame" the
overlap slicing has been completely borked XD where we've been sticking
in duplicate history samples and this has caused all sorts of down
stream time-series processing issues..

So fix that but ensuring whenever there IS an overlap between history in
the latest frame and the tsdb that we always prefer the latest frame's
data and slice OUT the tsdb's duplicate indices..

CHERRY TO #486
2023-12-08 18:56:38 -05:00
Tyler Goodlet b6d2550f33 Add datetime col de-duplicator 2023-12-08 14:38:27 -05:00
Tyler Goodlet b9af6176c5 Factor `TimeseriesNotFound` to top level
TO CHERRY into #486
2023-12-07 12:31:14 -05:00
Tyler Goodlet 9e71e0768f Define and pass a default `Flume._readonly: bool`
Allows opening with `.from_msg(readonly=False)` for write permissions
making underlyig shm arrays readonly. Also, make sure to pop the
`ShmArray` field entries prior to msg-ization, not sure how that worked
with the `Feed.flumes` equivalent..but?
2023-12-06 17:25:49 -05:00
Tyler Goodlet 656e2c6a88 fsp: intro a `Cascade` type that connects `Flume`s of streams 2023-12-05 16:59:07 -05:00
Tyler Goodlet 50ddef0985 data.feed: dynamically load `ui._search` mod for headless installs 2023-09-28 12:30:10 -04:00
Tyler Goodlet ad59a581c7 symcache: passthrough `rapidfuzz.process.extract` kwargs 2023-09-22 15:56:49 -04:00
Tyler Goodlet 0ba75df877 Add `data.match_from_pairs` fuzzy symbology scanner
A helper for scanning a "pairs table" that most backends should expose
as part of their (internal) symbology set using `rapidfuzz` over
a `dict[str, Struct]` input table.

Also expose the `data.types.Struct` at the subpkg top level.
2023-09-22 13:54:25 -04:00
Tyler Goodlet 4a180019f0 Swap out `fuzzywuzzy` for the newer `rapidfuzz` lib 2023-09-13 11:57:02 -04:00
Tyler Goodlet 481618cc51 kraken: handle ws live trading API symbology
Of course I missed this first try but, we need to use the ws market pair
symbology set (since apparently kraken loves redundancy at least 3 times
XD) when processing transactions that arrive from live clears since it's
an entirely different `LTC/EUR` style key then the `XLTCEUR` style
delivered from the ReST eps..

As part of this:
- add `Client._altnames`, `._wsnames` as `dict[str, Pair]` tables,
  leaving the `._AssetPairs` table as is keyed by the "xname"s.
- Change `Pair.respname: str` -> `.xname` since these keys all just seem
  to have a weird 'X' prefix.
- do the appropriately keyed pair table lookup via a new `api_name_set:
  str` to `norm_trade_records()` and set is correctly in the ws live txn
  handler task.
2023-08-30 16:32:34 -04:00
Tyler Goodlet ad37cfbe2f Break backfill loop on `end_dt < start_dt` 2023-08-29 08:43:14 -04:00
Tyler Goodlet 546049b62f data.history: handle venue-closure gap edge case 2023-08-25 17:47:30 -04:00
Tyler Goodlet 5ed8544fd1 Bleh, move `.data.types` back up to top level pkg
Since it's depended on by `.data` stuff as well as pretty much
everything else, makes more sense to expose it as a top level module
(and maybe eventually as a subpkg as we add to it).
2023-08-05 15:57:10 -04:00
Tyler Goodlet 100be54641 data.history: add TODO for non-zero epochs and some typing 2023-07-31 17:21:11 -04:00
Tyler Goodlet 08e8990fe3 Do single `ShmArray.array` read on zero-time filtering 2023-07-26 15:41:04 -04:00
Tyler Goodlet 2c6ae5d994 Drop the `gap_dt_unit: str` column
We don't need it in `detect_time_gaps()` since doing straight up
datetime diffs in `polars` already has a humanized `str` representation
but with higher precision like '2d 1h 24m 1s' B)
2023-07-26 15:37:59 -04:00
Tyler Goodlet 7802febd20 Backfill history gaps with pre-gap close 2023-07-26 12:56:06 -04:00
Tyler Goodlet 64329d44e7 Flip `tractor.breakpoint()`s to new `.pause()` 2023-07-26 12:48:19 -04:00
Tyler Goodlet 58cf7ce10e Add `norm_trade()` ep to validator warnings 2023-07-26 12:39:08 -04:00
Tyler Goodlet d0f72bf269 Wrap symcache loading into `.from_scratch()`
Since we need it both when explicitly reloading **and**
whenever either the file or data in the file doesn't exist.
2023-07-26 12:27:26 -04:00
Tyler Goodlet 759ebe71e9 Allow disabling symcache load via kwarg as well 2023-07-20 15:27:46 -04:00
Tyler Goodlet e88913e1f3 .data._pathops: drop profiler imports, fix some naming to appease `ruff` 2023-07-20 15:27:22 -04:00
Tyler Goodlet 5e7916a0df Start `piker.toolz` subpkg for all our tooling B)
Since there's a growing list of top level mods which are more or less
utils/tools for working with the runtime; begin to move them into a new
subpkg starting with a new `.toolz.debug`.

Start with,
- a new `open_crash_handller()` for doing breakpoints around blocks that
  might error.
- move in what was `piker._profile` into `.toolz.profile` and adjust all
  importing appropriately.
2023-07-20 15:23:01 -04:00
Tyler Goodlet dfa13afe22 Allow backends to "bypass" symcache loading
Some backends like `ib` don't have an obvious (nor practical) way to
easily download the entire symbology set available from all its mkt
venues. For such backends loading might require a non-std approach (like
using the contract search from some input mkt-key set) and can't be
expected to necessarily be supported out of the box. As such, allow
annotating a broker sub-pkg module with a `_no_symcache: bool = True`
attr which will make `open_symcache()` yield early with an empty
`SymbologyCache` instance for use by the caller to fill in the mkt and
assets tables in whatever ad-hoc way desired.
2023-07-17 17:12:40 -04:00
Tyler Goodlet 2dab0e2e56 Expose `.data._symcache` stuff at subpkg toplevel
The list is `open_symcache()`, `get_symcache()`, `SymbologyCache`, and
`Stuct` which seems more or less fine to make part of the public
namespace. Also, make `._timeseries.t_unit` an instance of literal to make
`ruff` happy?
2023-07-17 01:20:52 -04:00
Tyler Goodlet e8025d0985 .data.types.Struct: by default include non-members from `.to_dict()`.. 2023-07-16 21:32:36 -04:00
Tyler Goodlet da206f5242 Store "namespace path" for each backend's pair struct
Since some backends have multiple venues keyed by the same
symbol-pair-name, AND often the market/symbol info for those different
market-venues is entirely different (cough binance), we will have to
(sometimes) save the struct namespace-path as str for lookup when
deserializing a symcache to object form.

NOTE: this change is reliant on the following `tractor` dev commit which
improves support for constructing a path from object-instance:
bee2c36072

Add a backend(-wide) default struct path stored as a (TOML top level)
field `pair_ns_path: str` in the serialized `dict`-table as well as
allow for a per pair-`Struct` value optionally defined on each type def;
the global is only used if none was defined per struct via a `ns_path:
str`.

Further deats:
- don't write non-struct-member fields to dict for TOML file cache.
- always keep object forms, well as objects (in tables).. XD
- factor cache loading from `dict` (and thus from TOML or presumably any
  other interchange form) into a `@classmethod` constructor method B)
- all choosing the subtable for `.search()` by name.
2023-07-13 17:58:50 -04:00
Tyler Goodlet 7f4884a6d9 data.types.Struct.to_dict(): discard non-member struct by default 2023-07-12 12:33:30 -04:00
Tyler Goodlet 8b9494281d Don't verify the history step period for now in `tsdb_backfill()` 2023-07-12 08:45:55 -04:00
Tyler Goodlet 8330b36e58 User/return explicit `symcache` var name in sync case 2023-07-12 08:45:55 -04:00
Tyler Goodlet 243821aab1 Bleh! Ok make `open_symcache()` and `@acm`..
Turns in order to make things much cleaner from inside-the-runtime usage
we do probably want to just make the manager async so that we can
generate the cache on demand from async UI inits as well as daemon
actors.. So change to that and instead make `get_symcache()` the helper
that should ONLY be called from sync funcs / offline ledger processing
utils!
2023-07-12 08:45:55 -04:00
Tyler Goodlet 8f40e522ef Add handy `DiffDump`ing for our `.types.Struct`
So you can do a `Struct1` - `Struct2` and we dump a little diff `list`
of tuples for anal on the REPL B)

Prolly can be broken out into it's own micro-patch?
2023-07-12 08:45:55 -04:00