Commit Graph

23 Commits (fed89562dc46e26de2f6869494c6238d43f79cf9)

Author SHA1 Message Date
Tyler Goodlet c3f8b089be Drop `.service._ahab` from storage cli runtime mods 2023-08-25 13:33:59 -04:00
Tyler Goodlet 385561276b Add gap detection into the `store ldshm` cmd 2023-07-26 15:45:55 -04:00
Tyler Goodlet 64329d44e7 Flip `tractor.breakpoint()`s to new `.pause()` 2023-07-26 12:48:19 -04:00
Tyler Goodlet 5e7916a0df Start `piker.toolz` subpkg for all our tooling B)
Since there's a growing list of top level mods which are more or less
utils/tools for working with the runtime; begin to move them into a new
subpkg starting with a new `.toolz.debug`.

Start with,
- a new `open_crash_handller()` for doing breakpoints around blocks that
  might error.
- move in what was `piker._profile` into `.toolz.profile` and adjust all
  importing appropriately.
2023-07-20 15:23:01 -04:00
Tyler Goodlet c9681d0aa2 .nativedb: ignore an `expired/` subdir 2023-07-12 08:45:55 -04:00
Tyler Goodlet d704d631ba Add `store ldshm` subcmd
Changed from the old `store clone` to instead simply load any shm buffer
matching a user provided `FQME: str` pattern; writing to parquet file is
only done if an explicit option flag is passed by user.

Implement new `iter_dfs_from_shms()` generator which allows interatively
loading both 1m and 1s buffers delivering the `Path`, `ShmArray` and
`polars.DataFrame` instances per matching file B)

Also add a todo for a `NativeStorageClient.clear_range()` method.
2023-06-27 13:41:47 -04:00
Tyler Goodlet 58c096bfad Bleh go back to using pdbp for REPL in anal 2023-06-27 13:41:47 -04:00
Tyler Goodlet c1546eb043 Add note about appending parquet files on write 2023-06-27 13:41:47 -04:00
Tyler Goodlet fda7111305 Import from new `.data._timeseries` mod for anal 2023-06-27 13:41:47 -04:00
Tyler Goodlet f25248c871 Add `.data._timeseries` utility mod
Org all the new (time) gap detection routines here and also move in the
`slice_from_time()` epoch -> index converter routine from `._pathops` B)
2023-06-27 13:41:47 -04:00
Tyler Goodlet 54f8a615fc Use `code.interact()` in anal subcmd for now 2023-06-27 13:41:47 -04:00
Tyler Goodlet 2dbcecdac7 Generalize time-gap detector to accept unit and threshold 2023-06-27 13:41:47 -04:00
Tyler Goodlet 9fd412f631 Add basic time-sampling gap detection via `polars`
For OHLCV time series we normally presume a uniform sampling period
(1s or 60s by default) and it's handy to have tools to ensure a series
is gapless or contains expected gaps based on (legacy) market hours.

For this we leverage `polars`:
- add `.nativedb.with_dts()` a datetime-from-epoch-time-column frame
  "column-expander" which inserts datetime-casted, epoch-diff and
  dt-diff columns.
- add `.nativedb.detect_time_gaps()` which filters to any larger then
  expected sampling period rows.
- wrap the above (for now) in a `piker store anal` (analysis) cmd which
  atm always enters a breakpoint for tinkering.

Supporting storage client additions:
- add a `detect_period()` helper for extracting expected OHLC time step.
- add new `NativedbStorageClient` methods and attrs to provide for the above:
    - `.mk_path()` to **only** deliver a parquet-file path for use in
      other methods.
    - `._dfs` to house cached `pl.DataFrame`s loaded from `.parquet` files.
    - `.as_df()` which loads cached frames or loads them from disk and
      then caches (for next use).
    - `_write_ohlcv()` a private-sync version of the public equivalent
      meth since we don't currently have any actual async file IO
      underneath; add a flag for whether to return as a `numpy.ndarray`.
2023-06-27 13:41:47 -04:00
Tyler Goodlet d2accdac9b Drop remaining mkts nonsense from `store delete` 2023-06-27 13:41:47 -04:00
Tyler Goodlet c020ab76be Clean out marketstore specifics
- drop buncha cruft from `store ls` cmd and make it work for
  multi-backend fqme listing.
  - including adding an `.address` to the mkts client which shows the
    grpc socketaddr details.
- change defauls to new `'nativedb'.
- drop 'marketstore' from built-in backend list (for now)
2023-06-27 13:41:47 -04:00
Tyler Goodlet 7b4f4bf804 First draft `.storage.nativedb.` using parquet files
After much frustration with a particular tsdb (cough) this instead
implements a new native-file (and apache tech based) backend which
stores time series in parquet files (for now) using the `polars` apis
(since we plan to use that lib as well for processing).

Note this code is currently **very** rough and in draft mode.

Details:
- add conversion routines for going from `polars.DataFrame` to
  `numpy.ndarray` and back.
- lay out a simple file-name as series key symbology:
  `fqme.<datadescriptions>.parquet`, though probably it will evolve.
- implement the entire `StorageClient` interface as it stands.
- adjust `storage.cli` cmds to instead expect to use this new backend,
  which means it's a complete mess XD

Main benefits/motivation:
- wayy faster load times with no "datums to load limit" required.
- smaller space footprint and we haven't even touched compression
  settings yet!
- wayyy more compatible with other systems which can lever the apache
  ecosystem.
- gives us finer grained control over the filesystem usage so we can
  choose to swap out stuff like the replication system or networking
  access.
2023-06-27 13:41:47 -04:00
Tyler Goodlet 94733c4a0b A PoC tsdb prototype: `parqdb` using `polars`
Turns out just (over)writing `.parquet` files with >= 1M datums is like
less then a second, and we can likely speed up appends using
`fastparquet` (usage coming soon).

Includes:
- a new `clone` CLI subcmd to test this all out by ad-hoc copy of
  (literally hardcoded to a daemon-actor specific shm allocation X) an
  existing `/dev/shm/<ShmArray>` and push to `.parquet` file.
  - code to convert from our `ShmArray.array: np.ndarray` ->
    `polars.DataFrame` (thanks SO).
  - timing checks around the file IO and np -> polars conversion.
- a `read` subcmd which i was using to test the sync `pymarketstore`
  client against our async one to see if the issues from
  https://github.com/pikers/piker/issues/443 were resolved, but nope!
2023-06-27 13:41:47 -04:00
Tyler Goodlet 7d1cc47db9 ROFL, even using `pymarketstore`'s json-RPC it's borked..
Turns out trying to switch to the old sync client and going back to
using the old json-RPC API (after having had to patch the upstream repo
to not import gRPC machinery to avoid crashes..) I'm basically getting
the exact same issues.

New tinkering results does possibly tell some new stuff:
- the EOF error seems to indeed be due to trying fetch records which haven't been
  written (properly) - like asking for a `end=<epoch_int>` that is
  earlier then the earliest record.
- the "snappy input corrupt" error seems to have something to do with
  the `Params.end` field not being an `int` and/or the int precision not
  being chosen correctly?
  - toying with this a bunch manually shows that the internals of the
    client (particularly `.build_query()` stuff) is parsing/calcing the
    `Epoch` and `Nanoseconds` values out incorrectly.. which is likely
    part of the problem.
  - we also changed `anyio_marketstore.MarketStoreclient.build_query()`
    logic when removing `pandas` a while back, which also seems to be
    part of the problem on the async side, however reverting those
    changes also didn't fix the issue entirely; likely something else
    more subtle going on (maybe with the write vs. read `Epoch` field
    type we pass?).

Despite all this malarky, we're already underway more or less obsoleting
this whole thing with a much less complex approach of using apache
parquet files and modern filesystem tools to get a more flexible and
numerics-native dataframe-oriented tsdb B)
2023-06-27 13:41:47 -04:00
Tyler Goodlet e83de2906f Relegate old marketstore cli eps to masked module 2023-06-27 13:41:47 -04:00
Tyler Goodlet cb774e5a5d Re-implement `piker store` CLI with `typer`
Turns out you can mix and match `click` with `typer` so this moves what
was the `.data.cli` stuff into `storage.cli` and uses the integration
api to make it all work B)

New subcmd: `piker store`
- add `piker store ls` which lists all fqme keyed time-series from backend.
- add `store delete` to remove any such key->time-series.
  - now uses a nursery for multi-timeframe concurrency B)

Mask out all the old `marketstore` specific subcmds for now (streaming,
ingest, storesh, etc..) in anticipation of moving them into
a subpkg-module and make sure to import the sub-cmd module in our top
level cli package.

Other `.storage` api tweaks:
- drop the reraising with custom error (for now).
- rename `Storage` -> `StorageClient` (or should it be API?).
2023-06-27 13:41:47 -04:00
Tyler Goodlet 1ec9b0565f Move `.data.cli` to `.storage.cli` 2023-06-27 13:41:47 -04:00
Tyler Goodlet 7ab97fb21d Add marketstore client as storage-backend module
To kick off our (tsdb) storage backends this adds our first implementing
a new `Storage(Protocol)` client interface. Going foward, the top level
`.storage` pkg-module will now expose backend agnostic APIs and helpers
whilst specific backend implementations will adhere to that middle-ware
layer.

Deats:
- add `.storage.marketstore.Storage` as the first client implementation,
  moving all needed (import) dependencies out from
  `.service.marketstore` as well as `.ohlc_key_map` and `get_client()`.
- move root `conf.toml` loading from `.data.history` into
  `.storage.__init__.open_storage_client()` which now takes in a `name:
  str` and does all the work of loading the correct backend module, its
  config, and determining if a service-instance can be contacted and
  a client loaded; in the case where this fails we raise a new
  `StorageConnectionError`.
- add a new `.storage.get_storagemod()` just like we have for brokers.
- make `open_storage_client()` also return the backend module such that
  the history-data layer can make backend specific calls as needed (eg.
  ohlc_key_map).
- fall back to a basic non-tsdb backfill when `open_storage_client()`
  raises the new connection error.
2023-06-27 13:41:47 -04:00
Tyler Goodlet 29211b200d Start `piker.storage` subsys: cross-(ts)db middlewares
The plan is to offer multiple tsdb and other storage backends (for
a variety of use cases) and expose them similarly to how we do for
broker and data providers B)
2023-06-27 13:41:47 -04:00