piker

Commit Graph

Author	SHA1	Message	Date
Tyler Goodlet	52b349fe79	Always reload shm data before annotating gaps, so they line up..	2024-01-09 15:55:16 -05:00
Tyler Goodlet	6959429af8	Factor gap annotating into new `markup_gaps()` Since we definitely want to markup gaps that are both data-errors and just plain old venue closures (time gaps), generalize the `gaps: pl.DataFrame` loop in a func and call it from the `ldshm` cmd for now. Some other tweaks to `store ldshm`: - add `np.median()` failover when detecting (ohlc) ts sample rate. - use new `tsp.dedupe()` signature. - differentiate between sample-period size gaps and venue closure gaps by calling `tsp.detect_time_gaps()` with diff size thresholds. - add todo for backfilling null-segs when detected.	2024-01-04 11:01:21 -05:00
Tyler Goodlet	a86573b5a2	Fix .parquet filenaming.. Apparently `.storage.nativedb.mk_ohlcv_shm_keyed_filepath()` was always kinda broken if you passed in a `period: float` with an actual non-`int` to the format string? Fixed it to strictly cast to `int()` before str-ifying so that you don't get weird `60.0s.parquet` in there.. Further this rejigs the `store ldshm` gap correction-annotation loop to, - use `StorageClient.write_ohlcv()` instead of hackily re-implementing it.. now that problem from above is fixed! - use a `needs_correction: bool` var to determine if gap markup and de-duplicated data should be pushed to the shm buffer, - go back to using `AnnotCtl.add_rect()` for all detected gaps such that they all persist (and thus are shown together) until the client disconnects.	2023-12-26 17:14:26 -05:00
Tyler Goodlet	b064a5f94d	A working remote annotations controller B) Obvi took a little `.ui` component fixing (as per prior commits) but this is now a working PoC for gap detection and markup from a remote (data) non-`chart` actor! Iface and impl deats from `.ui._remote_ctl`: - add new `open_annot_ctl()` mngr which attaches to all locally discoverable chart actors, gathers annot-ctl streams per fqme set, and delivers a new `AnnotCtl` client which allows adding annotation rectangles via a `.add_rect()` method. - also template out some other soon-to-get methods for removing and modifying pre-existing annotations on some `ChartView` 💥 - ensure the `chart` CLI subcmd starts the (`qtloops`) guest-mode init with the `.ui._remote_ctl` module enabled. - actually use this stuff in the `piker store ldshm` CLI to submit markup rects around any detected null/time gaps in the tsdb data! Still lots to do: - probably colorization of gaps depending on if they're venue closures (aka real mkt gaps) vs. "missing data" from the backend (aka timeseries consistency gaps). - run gap detection and markup as part of the std `.tsp` sub-sys runtime such that gap annots are a std "built-in" feature of charting. - support for epoch time stamp AND abs-shm-index rect x-values (depending on chart operational state).	2023-12-22 15:19:20 -05:00
Tyler Goodlet	f7cc43ee0b	Add pauses to `store anal/ldshm` only on bad segs Particularly halting before maybe writing the repaired timeseries history in `store anal` to optionally allow user to avoid writing to storage.	2023-12-18 11:56:57 -05:00
Tyler Goodlet	f5dc21d3f4	Adjust all `.tsp` imports to use new sub-pkg Also toss in a poll loop around the `hist_shm: ShmArray` backfill read-check in the `.data.allocate_persisten_feed()` init to cope with possible racy-ness from the increased tsdb history loading concurrency now implemented.	2023-12-18 11:54:28 -05:00
Tyler Goodlet	8989c73a93	Move `iter_dfs_from_shms` into `.data.history` Thinking about just moving all of that module (after a content breakup) to a new `.piker.tsp` which will mostly depend on the `.data` and `.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and orchestration with real-time (shm) buffers and the graphics layer into a common spot for both manual analysis/research work and better separation of low level data structure primitives from their higher level usage. Add a better `data.history` mod doc string in prep for this move as well as clean out a bunch of legacy commented cruft from the `trimeter` and `marketstore` days. TO CHERRY #486 (if we can)	2023-12-15 15:53:02 -05:00
Tyler Goodlet	c4853a3fee	Drop inter-method NL	2023-12-13 09:27:23 -05:00
Tyler Goodlet	f274c3db3b	Import `np2pl()` from `.data.tsp` Also toss in todo for a timeseries search CLI cmd which can be handy when doing offline store mgmt.	2023-12-13 09:25:44 -05:00
Tyler Goodlet	f7a8d79b7b	Add `NativeStorageClient._cache_df()` use it in `.write_ohlcv()` for caching on writes as well	2023-12-11 20:10:53 -05:00
Tyler Goodlet	49c458710e	Move `numpy` <-> `polars` converters into `.data.tsp` Yet again these are (going to be) generally useful in the data proc layer as well as going forward with (possibly) moving the history and shm rt-processing layer to apache (arrow or other) shared-ds equivalents.	2023-12-11 17:53:31 -05:00
Tyler Goodlet	b94582cb35	Move `dedupe()` to `.data.tsp` (so it has pals) Includes a rename of `.data._timeseries` -> `.data.tsp` for "time series processing", making it a public sub-mod; it contains a highly useful set of data-frame and `numpy.ndarray` ops routines in various subsystems Bo	2023-12-11 16:24:27 -05:00
Tyler Goodlet	7311000846	Facepalm, set `was_deduped` as bool not the deduped frame..	2023-12-11 13:18:10 -05:00
Tyler Goodlet	2eeef2a123	Add `dedupe()` to help with gap detection/resolution Think i finally figured out the weird issue with out-of-order OHLC history getting jammed in the wrong place: - gap is detected in parquet/offline ts (likely due to a zero dt or other gap), - query for history in the gap is made BUT that frame is then inserted in the shm buffer at the end (likely using array int-entry indexing) which inserts it at the wrong location, - later this out-of-order frame is written to the storage layer (parquet) and then is repeated on further reboots with the original gap causing further queries for the same frame on every history backfill. A set of tools useful for detecting these issues and annotating them nicely on chart is part of this patch's intent: - `dedupe()` will detect any dt gaps, deduplicate datetime rows and return the de-duplicated df along with gaps table. - use this in both `piker store anal` such that we potentially resolve and backfill the gaps correctly if some rows were removed. - possibly also use this to detect the backfilling error in logic at the time of backfilling the frame instead of after the fact (which would require re-writing the shm array from something like `store ldshm` and would be a manual post-hoc solution, not a fix to the original issue)..	2023-12-08 15:11:34 -05:00
Tyler Goodlet	b9af6176c5	Factor `TimeseriesNotFound` to top level TO CHERRY into #486	2023-12-07 12:31:14 -05:00
Tyler Goodlet	22bd83943b	.storage: support `store anal --pdb` flag	2023-12-04 13:00:33 -05:00
Tyler Goodlet	239c1c457e	Sort fqme suggestions pre-print	2023-12-04 11:34:39 -05:00
Tyler Goodlet	24a54a7085	Add `TimeseriesNotFound` for fqme lookup failures A common usage error is to run `piker anal mnq.cme.ib` where the CLI passed fqme is not actually fully-qualified (in this case missing an expiry token) and we get an underlying `FileNotFoundError` from the `StorageClient.read_ohlcv()` call. In such key misses, scan the existing `StorageClient._index` for possible matches and report in a `raise from` the new error. CHERRY into #486	2023-12-04 11:22:55 -05:00
Tyler Goodlet	a382f01c85	Move tsdb section to `service.tsdb.name` and get host from `.maddrs`	2023-10-01 17:23:39 -04:00
Tyler Goodlet	f94244aad4	Load `network` section from `conf.toml` for service-addr map	2023-09-28 12:04:24 -04:00
Tyler Goodlet	c3f8b089be	Drop `.service._ahab` from storage cli runtime mods	2023-08-25 13:33:59 -04:00
Tyler Goodlet	385561276b	Add gap detection into the `store ldshm` cmd	2023-07-26 15:45:55 -04:00
Tyler Goodlet	64329d44e7	Flip `tractor.breakpoint()`s to new `.pause()`	2023-07-26 12:48:19 -04:00
Tyler Goodlet	5e7916a0df	Start `piker.toolz` subpkg for all our tooling B) Since there's a growing list of top level mods which are more or less utils/tools for working with the runtime; begin to move them into a new subpkg starting with a new `.toolz.debug`. Start with, - a new `open_crash_handller()` for doing breakpoints around blocks that might error. - move in what was `piker._profile` into `.toolz.profile` and adjust all importing appropriately.	2023-07-20 15:23:01 -04:00
Tyler Goodlet	c9681d0aa2	.nativedb: ignore an `expired/` subdir	2023-07-12 08:45:55 -04:00
Tyler Goodlet	d704d631ba	Add `store ldshm` subcmd Changed from the old `store clone` to instead simply load any shm buffer matching a user provided `FQME: str` pattern; writing to parquet file is only done if an explicit option flag is passed by user. Implement new `iter_dfs_from_shms()` generator which allows iteratively loading both 1m and 1s buffers delivering the `Path`, `ShmArray` and `polars.DataFrame` instances per matching file B) Also add a todo for a `NativeStorageClient.clear_range()` method.	2023-06-27 13:41:47 -04:00
Tyler Goodlet	58c096bfad	Bleh go back to using pdbp for REPL in anal	2023-06-27 13:41:47 -04:00
Tyler Goodlet	c1546eb043	Add note about appending parquet files on write	2023-06-27 13:41:47 -04:00
Tyler Goodlet	fda7111305	Import from new `.data._timeseries` mod for anal	2023-06-27 13:41:47 -04:00
Tyler Goodlet	f25248c871	Add `.data._timeseries` utility mod Org all the new (time) gap detection routines here and also move in the `slice_from_time()` epoch -> index converter routine from `._pathops` B)	2023-06-27 13:41:47 -04:00
Tyler Goodlet	54f8a615fc	Use `code.interact()` in anal subcmd for now	2023-06-27 13:41:47 -04:00
Tyler Goodlet	2dbcecdac7	Generalize time-gap detector to accept unit and threshold	2023-06-27 13:41:47 -04:00
Tyler Goodlet	9fd412f631	Add basic time-sampling gap detection via `polars` For OHLCV time series we normally presume a uniform sampling period (1s or 60s by default) and it's handy to have tools to ensure a series is gapless or contains expected gaps based on (legacy) market hours. For this we leverage `polars`: - add `.nativedb.with_dts()` a datetime-from-epoch-time-column frame "column-expander" which inserts datetime-casted, epoch-diff and dt-diff columns. - add `.nativedb.detect_time_gaps()` which filters to any larger than expected sampling period rows. - wrap the above (for now) in a `piker store anal` (analysis) cmd which atm always enters a breakpoint for tinkering. Supporting storage client additions: - add a `detect_period()` helper for extracting expected OHLC time step. - add new `NativedbStorageClient` methods and attrs to provide for the above: - `.mk_path()` to only deliver a parquet-file path for use in other methods. - `._dfs` to house cached `pl.DataFrame`s loaded from `.parquet` files. - `.as_df()` which loads cached frames or loads them from disk and then caches (for next use). - `_write_ohlcv()` a private-sync version of the public equivalent meth since we don't currently have any actual async file IO underneath; add a flag for whether to return as a `numpy.ndarray`.	2023-06-27 13:41:47 -04:00
Tyler Goodlet	d2accdac9b	Drop remaining mkts nonsense from `store delete`	2023-06-27 13:41:47 -04:00
Tyler Goodlet	c020ab76be	Clean out marketstore specifics - drop buncha cruft from `store ls` cmd and make it work for multi-backend fqme listing. - including adding an `.address` to the mkts client which shows the grpc socketaddr details. - change defaults to new `'nativedb'`. - drop 'marketstore' from built-in backend list (for now)	2023-06-27 13:41:47 -04:00
Tyler Goodlet	7b4f4bf804	First draft `.storage.nativedb.` using parquet files After much frustration with a particular tsdb (cough) this instead implements a new native-file (and apache tech based) backend which stores time series in parquet files (for now) using the `polars` apis (since we plan to use that lib as well for processing). Note this code is currently very rough and in draft mode. Details: - add conversion routines for going from `polars.DataFrame` to `numpy.ndarray` and back. - lay out a simple file-name as series key symbology: `fqme.<datadescriptions>.parquet`, though probably it will evolve. - implement the entire `StorageClient` interface as it stands. - adjust `storage.cli` cmds to instead expect to use this new backend, which means it's a complete mess XD Main benefits/motivation: - wayy faster load times with no "datums to load limit" required. - smaller space footprint and we haven't even touched compression settings yet! - wayyy more compatible with other systems which can lever the apache ecosystem. - gives us finer grained control over the filesystem usage so we can choose to swap out stuff like the replication system or networking access.	2023-06-27 13:41:47 -04:00
Tyler Goodlet	94733c4a0b	A PoC tsdb prototype: `parqdb` using `polars` Turns out just (over)writing `.parquet` files with >= 1M datums is like less than a second, and we can likely speed up appends using `fastparquet` (usage coming soon). Includes: - a new `clone` CLI subcmd to test this all out by ad-hoc copy of (literally hardcoded to a daemon-actor specific shm allocation X) an existing `/dev/shm/<ShmArray>` and push to `.parquet` file. - code to convert from our `ShmArray.array: np.ndarray` -> `polars.DataFrame` (thanks SO). - timing checks around the file IO and np -> polars conversion. - a `read` subcmd which i was using to test the sync `pymarketstore` client against our async one to see if the issues from https://github.com/pikers/piker/issues/443 were resolved, but nope!	2023-06-27 13:41:47 -04:00
Tyler Goodlet	7d1cc47db9	ROFL, even using `pymarketstore`'s json-RPC it's borked.. Turns out trying to switch to the old sync client and going back to using the old json-RPC API (after having had to patch the upstream repo to not import gRPC machinery to avoid crashes..) I'm basically getting the exact same issues. New tinkering results does possibly tell some new stuff: - the EOF error seems to indeed be due to trying to fetch records which haven't been written (properly) - like asking for a `end=<epoch_int>` that is earlier than the earliest record. - the "snappy input corrupt" error seems to have something to do with the `Params.end` field not being an `int` and/or the int precision not being chosen correctly? - toying with this a bunch manually shows that the internals of the client (particularly `.build_query()` stuff) is parsing/calcing the `Epoch` and `Nanoseconds` values out incorrectly.. which is likely part of the problem. - we also changed `anyio_marketstore.MarketStoreclient.build_query()` logic when removing `pandas` a while back, which also seems to be part of the problem on the async side, however reverting those changes also didn't fix the issue entirely; likely something else more subtle going on (maybe with the write vs. read `Epoch` field type we pass?). Despite all this malarky, we're already underway more or less obsoleting this whole thing with a much less complex approach of using apache parquet files and modern filesystem tools to get a more flexible and numerics-native dataframe-oriented tsdb B)	2023-06-27 13:41:47 -04:00
Tyler Goodlet	e83de2906f	Relegate old marketstore cli eps to masked module	2023-06-27 13:41:47 -04:00
Tyler Goodlet	cb774e5a5d	Re-implement `piker store` CLI with `typer` Turns out you can mix and match `click` with `typer` so this moves what was the `.data.cli` stuff into `storage.cli` and uses the integration api to make it all work B) New subcmd: `piker store` - add `piker store ls` which lists all fqme keyed time-series from backend. - add `store delete` to remove any such key->time-series. - now uses a nursery for multi-timeframe concurrency B) Mask out all the old `marketstore` specific subcmds for now (streaming, ingest, storesh, etc..) in anticipation of moving them into a subpkg-module and make sure to import the sub-cmd module in our top level cli package. Other `.storage` api tweaks: - drop the reraising with custom error (for now). - rename `Storage` -> `StorageClient` (or should it be API?).	2023-06-27 13:41:47 -04:00
Tyler Goodlet	1ec9b0565f	Move `.data.cli` to `.storage.cli`	2023-06-27 13:41:47 -04:00
Tyler Goodlet	7ab97fb21d	Add marketstore client as storage-backend module To kick off our (tsdb) storage backends this adds our first implementing a new `Storage(Protocol)` client interface. Going forward, the top level `.storage` pkg-module will now expose backend agnostic APIs and helpers whilst specific backend implementations will adhere to that middle-ware layer. Deats: - add `.storage.marketstore.Storage` as the first client implementation, moving all needed (import) dependencies out from `.service.marketstore` as well as `.ohlc_key_map` and `get_client()`. - move root `conf.toml` loading from `.data.history` into `.storage.__init__.open_storage_client()` which now takes in a `name: str` and does all the work of loading the correct backend module, its config, and determining if a service-instance can be contacted and a client loaded; in the case where this fails we raise a new `StorageConnectionError`. - add a new `.storage.get_storagemod()` just like we have for brokers. - make `open_storage_client()` also return the backend module such that the history-data layer can make backend specific calls as needed (eg. ohlc_key_map). - fall back to a basic non-tsdb backfill when `open_storage_client()` raises the new connection error.	2023-06-27 13:41:47 -04:00
Tyler Goodlet	29211b200d	Start `piker.storage` subsys: cross-(ts)db middlewares The plan is to offer multiple tsdb and other storage backends (for a variety of use cases) and expose them similarly to how we do for broker and data providers B)	2023-06-27 13:41:47 -04:00
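
Several commits above (9fd412f631, 2eeef2a123, 6959429af8) describe detecting the sample period, de-duplicating rows and flagging time gaps. The following is only a rough stdlib sketch of that idea, not piker's `polars`-based code: the helper names `guess_period()`, `drop_duplicate_times()` and `find_time_gaps()` and the `thresh` multiplier are hypothetical and chosen purely for illustration.

```python
from statistics import median


def guess_period(times):
    """Guess the sampling period as the median step between timestamps."""
    steps = [b - a for a, b in zip(times, times[1:])]
    return median(steps)


def drop_duplicate_times(times):
    """Drop repeated timestamps and return the rest in sorted order."""
    return sorted(set(times))


def find_time_gaps(times, period, thresh=1.0):
    """Return (prev, next) timestamp pairs whose step exceeds `thresh * period`."""
    return [
        (a, b)
        for a, b in zip(times, times[1:])
        if (b - a) > thresh * period
    ]


# Made-up 60s epoch stamps: one duplicated row (120) and one
# missing stretch between 180 and 420.
epochs = [0, 60, 120, 120, 180, 420, 480]
clean = drop_duplicate_times(epochs)
period = guess_period(clean)
gaps = find_time_gaps(clean, period)
```

Calling `find_time_gaps()` again with a larger `thresh` would be one way to separate long (venue-closure sized) gaps from single missing samples, which is roughly the distinction the commit messages draw.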

43 Commits (89e241c132b86615bebc44d6c8eb1973e80a2f71)
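
The filename fix in a86573b5a2 comes down to casting a float period to `int` before formatting it into a string. A minimal sketch of that behaviour, assuming a hypothetical helper `mk_parquet_filename()` and a made-up key layout (neither is piker's actual API or naming scheme):

```python
def mk_parquet_filename(fqme: str, period: float) -> str:
    """Build a parquet file name from a series key and sample period.

    Hypothetical layout for illustration only; the point is the
    `int()` cast, which keeps `60.0` from leaking into the name.
    """
    return f"{fqme}.ohlcv{int(period)}s.parquet"


# Formatting a float directly keeps the fractional part.
uncast = f"{60.0}s.parquet"

# Casting first yields the intended whole-second suffix.
name = mk_parquet_filename("example.venue.broker", 60.0)
```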