Commit Graph

29 Commits (21fe469f430c14057a86f9ee74534be096e7b231)

Author SHA1 Message Date
Tyler Goodlet 03e21e1f85 `.storage.cli`: collect gap-markup-aids into `tf2aids: dict` prior to pause for introspection 2025-02-13 21:20:35 -05:00
Tyler Goodlet 52b349fe79 Always reload shm data before annotating gaps, so they line up.. 2024-01-09 15:55:16 -05:00
Tyler Goodlet 6959429af8 Factor gap annotating into new `markup_gaps()`
Since we definitely want to markup gaps that are both data-errors and
just plain old venue closures (time gaps), generalize the `gaps:
pl.DataFrame` loop in a func and call it from the `ldshm` cmd for now.

Some other tweaks to `store ldshm`:
- add `np.median()` failover when detecting (ohlc) ts sample rate.
- use new `tsp.dedupe()` signature.
- differentiate between sample-period size gaps and venue closure gaps
  by calling `tsp.detect_time_gaps()` with diff size thresholds.
- add todo for backfilling null-segs when detected.
2024-01-04 11:01:21 -05:00
Tyler Goodlet a86573b5a2 Fix .parquet filenaming..
Apparently `.storage.nativedb.mk_ohlcv_shm_keyed_filepath()` was always
kinda broken if you passed in a `period: float` with an actual non-`int`
to the format string? Fixed it to strictly cast to `int()` before
str-ifying so that you don't get weird `60.0s.parquet` in there..

Further this rejigs the `sotre ldshm` gap correction-annotation loop to,
- use `StorageClient.write_ohlcv()` instead of hackily re-implementing
  it.. now that problem from above is fixed!
- use a `needs_correction: bool` var to determine if gap markup and
  de-duplictated data should be pushed to the shm buffer,
- go back to using `AnnotCtl.add_rect()` for all detected gaps such that
  they all persist (and thus are shown together) until the client
  disconnects.
2023-12-26 17:14:26 -05:00
Tyler Goodlet b064a5f94d A working remote annotations controller B)
Obvi took a little `.ui` component fixing (as per prior commits) but
this is now a working PoC for gap detection and markup from a remote
(data) non-`chart` actor!

Iface and impl deats from `.ui._remote_ctl`:
- add new `open_annot_ctl()` mngr which attaches to all locally
  discoverable chart actors, gathers annot-ctl streams per fqme set, and
  delivers a new `AnnotCtl` client which allows adding annotation
  rectangles via a `.add_rect()` method.
  - also template out some other soon-to-get methods for removing and
    modifying pre-exiting annotations on some `ChartView` 💥
- ensure the `chart` CLI subcmd starts the (`qtloops`) guest-mode init
  with the `.ui._remote_ctl` module enabled.
- actually use this stuff in the `piker store ldshm` CLI to submit
  markup rects around any detected null/time gaps in the tsdb data!

Still lots to do:
-  probably colorization of gaps depending on if they're venue
   closures (aka real mkt gaps) vs. "missing data" from the backend (aka
   timeseries consistency gaps).
- run gap detection and markup as part of the std `.tsp` sub-sys
   runtime such that gap annots are a std "built-in" feature of
   charting.
- support for epoch time stamp AND abs-shm-index rect x-values
  (depending on chart operational state).
2023-12-22 15:19:20 -05:00
Tyler Goodlet f7cc43ee0b Add pauses to `store anal/ldshm` only on bad segs
Particularly halting before maybe writing the repaired timeseries
history in `store anal` to optionally allow user to avoid writing to
storage.
2023-12-18 11:56:57 -05:00
Tyler Goodlet 8989c73a93 Move `iter_dfs_from_shms` into `.data.history`
Thinking about just moving all of that module (after a content breakup)
to a new `.piker.tsp` which will mostly depend on the `.data` and
`.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and
orchestration with real-time (shm) buffers and the graphics layer into
a common spot for both manual analysis/research work and better
separation of low level data structure primitives from their higher
level usage.

Add a better `data.history` mod doc string in prep for this move
as well as clean out a bunch of legacy commented cruft from the
`trimeter` and `marketstore` days.

TO CHERRY #486 (if we can)
2023-12-15 15:53:02 -05:00
Tyler Goodlet f274c3db3b Import `np2pl()` from `.data.tsp`
Also toss in todo for a timeseries search CLI cmd which can be handy
when doing offine store mgmt.
2023-12-13 09:25:44 -05:00
Tyler Goodlet 49c458710e Move `numpy` <-> `polars` converters into `.data.tsp`
Yet again these are (going to be) generally useful in the data proc
layer as well as going forward with (possibly) moving the history and
shm rt-processing layer to apache (arrow or other) shared-ds
equivalents.
2023-12-11 17:53:31 -05:00
Tyler Goodlet b94582cb35 Move `dedupe()` to `.data.tsp` (so it has pals)
Includes a rename of `.data._timeseries` -> `.data.tsp` for "time series
processing", making it a public sub-mod; it contains a highly useful set
of data-frame and `numpy.ndarray` ops routines in various subsystems Bo
2023-12-11 16:24:27 -05:00
Tyler Goodlet 7311000846 Facepalm, set `was_deduped` as bool not the deduped frame.. 2023-12-11 13:18:10 -05:00
Tyler Goodlet 2eeef2a123 Add `dedupe()` to help with gap detection/resolution
Think i finally figured out the weird issue without out-of-order OHLC
history getting jammed in the wrong place:
- gap is detected in parquet/offline ts (likely due to a zero dt or
  other gap),
- query for history in the gap is made BUT that frame is then inserted
  in the shm buffer **at the end** (likely using array int-entry
  indexing) which inserts it at the wrong location,
- later this out-of-order frame is written to the storage layer
  (parquet) and then is repeated on further reboots with the original
  gap causing further queries for the same frame on every history
  backfill.

A set of tools useful for detecting these issues and annotating them
nicely on chart part of this patch's intent:
- `dedupe()` will detect any dt gaps, deduplicate datetime rows and
  return the de-duplicated df along with gaps table.
- use this in both `piker store anal` such that we potentially
  resolve and backfill the gaps correctly if some rows were removed.
- possibly also use this to detect the backfilling error in logic at
  the time of backfilling the frame instead of after the fact (which
  would require re-writing the shm array from something like `store
  ldshm` and would be a manual post-hoc solution, not a fix to the
  original issue..
2023-12-08 15:11:34 -05:00
Tyler Goodlet 22bd83943b .storage: support `store anal --pdb` flag 2023-12-04 13:00:33 -05:00
Tyler Goodlet 24a54a7085 Add `TimeseriesNotFound` for fqme lookup failures
A common usage error is to run `piker anal mnq.cme.ib` where the CLI
passed fqme is not actually fully-qualified (in this case missing an
expiry token) and we get an underlying `FileNotFoundError` from the
`StorageClient.read_ohlcv()` call. In such key misses, scan the existing
`StorageClient._index` for possible matches and report in a `raise from`
the new error.

CHERRY into #486
2023-12-04 11:22:55 -05:00
Tyler Goodlet c3f8b089be Drop `.service._ahab` from storage cli runtime mods 2023-08-25 13:33:59 -04:00
Tyler Goodlet 385561276b Add gap detection into the `store ldshm` cmd 2023-07-26 15:45:55 -04:00
Tyler Goodlet 64329d44e7 Flip `tractor.breakpoint()`s to new `.pause()` 2023-07-26 12:48:19 -04:00
Tyler Goodlet d704d631ba Add `store ldshm` subcmd
Changed from the old `store clone` to instead simply load any shm buffer
matching a user provided `FQME: str` pattern; writing to parquet file is
only done if an explicit option flag is passed by user.

Implement new `iter_dfs_from_shms()` generator which allows interatively
loading both 1m and 1s buffers delivering the `Path`, `ShmArray` and
`polars.DataFrame` instances per matching file B)

Also add a todo for a `NativeStorageClient.clear_range()` method.
2023-06-27 13:41:47 -04:00
Tyler Goodlet 58c096bfad Bleh go back to using pdbp for REPL in anal 2023-06-27 13:41:47 -04:00
Tyler Goodlet fda7111305 Import from new `.data._timeseries` mod for anal 2023-06-27 13:41:47 -04:00
Tyler Goodlet 54f8a615fc Use `code.interact()` in anal subcmd for now 2023-06-27 13:41:47 -04:00
Tyler Goodlet 9fd412f631 Add basic time-sampling gap detection via `polars`
For OHLCV time series we normally presume a uniform sampling period
(1s or 60s by default) and it's handy to have tools to ensure a series
is gapless or contains expected gaps based on (legacy) market hours.

For this we leverage `polars`:
- add `.nativedb.with_dts()` a datetime-from-epoch-time-column frame
  "column-expander" which inserts datetime-casted, epoch-diff and
  dt-diff columns.
- add `.nativedb.detect_time_gaps()` which filters to any larger then
  expected sampling period rows.
- wrap the above (for now) in a `piker store anal` (analysis) cmd which
  atm always enters a breakpoint for tinkering.

Supporting storage client additions:
- add a `detect_period()` helper for extracting expected OHLC time step.
- add new `NativedbStorageClient` methods and attrs to provide for the above:
    - `.mk_path()` to **only** deliver a parquet-file path for use in
      other methods.
    - `._dfs` to house cached `pl.DataFrame`s loaded from `.parquet` files.
    - `.as_df()` which loads cached frames or loads them from disk and
      then caches (for next use).
    - `_write_ohlcv()` a private-sync version of the public equivalent
      meth since we don't currently have any actual async file IO
      underneath; add a flag for whether to return as a `numpy.ndarray`.
2023-06-27 13:41:47 -04:00
Tyler Goodlet d2accdac9b Drop remaining mkts nonsense from `store delete` 2023-06-27 13:41:47 -04:00
Tyler Goodlet c020ab76be Clean out marketstore specifics
- drop buncha cruft from `store ls` cmd and make it work for
  multi-backend fqme listing.
  - including adding an `.address` to the mkts client which shows the
    grpc socketaddr details.
- change defauls to new `'nativedb'.
- drop 'marketstore' from built-in backend list (for now)
2023-06-27 13:41:47 -04:00
Tyler Goodlet 7b4f4bf804 First draft `.storage.nativedb.` using parquet files
After much frustration with a particular tsdb (cough) this instead
implements a new native-file (and apache tech based) backend which
stores time series in parquet files (for now) using the `polars` apis
(since we plan to use that lib as well for processing).

Note this code is currently **very** rough and in draft mode.

Details:
- add conversion routines for going from `polars.DataFrame` to
  `numpy.ndarray` and back.
- lay out a simple file-name as series key symbology:
  `fqme.<datadescriptions>.parquet`, though probably it will evolve.
- implement the entire `StorageClient` interface as it stands.
- adjust `storage.cli` cmds to instead expect to use this new backend,
  which means it's a complete mess XD

Main benefits/motivation:
- wayy faster load times with no "datums to load limit" required.
- smaller space footprint and we haven't even touched compression
  settings yet!
- wayyy more compatible with other systems which can lever the apache
  ecosystem.
- gives us finer grained control over the filesystem usage so we can
  choose to swap out stuff like the replication system or networking
  access.
2023-06-27 13:41:47 -04:00
Tyler Goodlet 94733c4a0b A PoC tsdb prototype: `parqdb` using `polars`
Turns out just (over)writing `.parquet` files with >= 1M datums is like
less then a second, and we can likely speed up appends using
`fastparquet` (usage coming soon).

Includes:
- a new `clone` CLI subcmd to test this all out by ad-hoc copy of
  (literally hardcoded to a daemon-actor specific shm allocation X) an
  existing `/dev/shm/<ShmArray>` and push to `.parquet` file.
  - code to convert from our `ShmArray.array: np.ndarray` ->
    `polars.DataFrame` (thanks SO).
  - timing checks around the file IO and np -> polars conversion.
- a `read` subcmd which i was using to test the sync `pymarketstore`
  client against our async one to see if the issues from
  https://github.com/pikers/piker/issues/443 were resolved, but nope!
2023-06-27 13:41:47 -04:00
Tyler Goodlet e83de2906f Relegate old marketstore cli eps to masked module 2023-06-27 13:41:47 -04:00
Tyler Goodlet cb774e5a5d Re-implement `piker store` CLI with `typer`
Turns out you can mix and match `click` with `typer` so this moves what
was the `.data.cli` stuff into `storage.cli` and uses the integration
api to make it all work B)

New subcmd: `piker store`
- add `piker store ls` which lists all fqme keyed time-series from backend.
- add `store delete` to remove any such key->time-series.
  - now uses a nursery for multi-timeframe concurrency B)

Mask out all the old `marketstore` specific subcmds for now (streaming,
ingest, storesh, etc..) in anticipation of moving them into
a subpkg-module and make sure to import the sub-cmd module in our top
level cli package.

Other `.storage` api tweaks:
- drop the reraising with custom error (for now).
- rename `Storage` -> `StorageClient` (or should it be API?).
2023-06-27 13:41:47 -04:00
Tyler Goodlet 1ec9b0565f Move `.data.cli` to `.storage.cli` 2023-06-27 13:41:47 -04:00