Design: data schema for opts ts storage #24
The fundamental “source” data schema we need to store for computing max pain at this moment:
Maybe we need to evaluate which data we want to add to the above data schema.
For the storage we need to add methods like `write_ohlcv()` and `read_ohlcv()` for the new data schema: `write_oi()` and `read_oi()` (of course, this can change). Also, add a method like `mk_ohlcv_shm_keyed_filepath()` for the open interest shm keyed filepath: `mk_oi_shm_keyed_filepath()`.
Add storage support for derivs → Design: data schema for opts ts storage

@ntorres one thing i can see immediately missing is an expiry field 😉
One in-depth question i have (and don’t have an answer for yet) is whether we want to orient the table schema such that all contracts can be interleaved in a big continuous table (for say a given strike/expiry/type) and then broken up for processing/viewing via a `polars` unpack, or should we be keeping all contracts in entirely separate `.parquet` files for long term storage and then expecting loader code to do the correct thing (parsing of filenames) to make it easy to load multiple contract-mkts at once?

Ok, yeah after a little thought i figured i’d make a table diagram (see attached) of what i envision being the most useful joined end-table for analysis and processing (like max-pain).
I’m hoping to re-render this using a `d2` table (maybe inside a larger diagram) and/or possibly using a real screen shot from `visidata`.

summary of approaches
granular per-mkt named by FQME files
the most file-granular approach would be to keep the table fields relatively simple with a min of 2 but possibly optionally 3:

- `time[_ns]: int|float`, the timestamp of the OI update
- `oi: float|int`, the actual purported open interest of the contract reported by a provider (feed)
- `oi_cal: float|int`, the `piker` options sys calculated-by-us OI based on a mkt’s prior state (initial OI) and ongoing (real-time) vlm feed (likely ohlcv of some sort)

this would mean we leverage a unique fqme schema to index mkts in the file sys very distinctly, in such a way that:

- you can tell which `.parquet` file contains which deriv mkt’s data by glancing at the filesys
- `visidata` (which yes supports parquet ;) works from the console
- `piker.fsp` and `piker.ui` consumers can be more flexible with logic around certain fields existing optionally from various providers and then dynamically processing/displaying different output
- `polars` can be used when needed but will have the further flexibility that not all mkts for a given “super scope” need to be loaded into mem if the user doesn’t require it

all-in-one-table NON-granular, which would mean less `.parquet` files

not my preferred approach since it means constantly having to load a large table scoped by some grander mkt-property (like provider or expiry or settling asset etc.)
However, this likely would result in a much simpler `piker.storage` implementation as well as improved loading performance for very large (multi asset) derivatives data sets, since all “sub contract mkts” could be allocated in a single table like originally put in the descr.

Another impl detail wrt how a `datad` provider can offer OI info for a deriv.. likely they either provide it like we should, `oi`/`oi_calc` updated with every clearing event (every tick that contains non-zero vlm), or only via a separate slower feed (e.g. `deribit`).
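For the per-clearing-event case, the `oi_calc` accumulator described above might be sketched roughly as follows. Note that true OI changes depend on whether trades open or close positions, which a plain vlm feed can't tell us, so this sketch assumes the feed carries a signed `oi_delta` per tick; all names are hypothetical:

```python
def calc_oi(initial_oi: float, ticks: list[dict]) -> list[float]:
    '''
    Return the running `oi_calc` series for a sequence of tick msgs,
    starting from the mkt's prior state (initial provider OI).

    '''
    oi = initial_oi
    series: list[float] = []
    for tick in ticks:
        # only clearing events (non-zero vlm) can mutate OI
        if tick.get('size'):
            oi += tick.get('oi_delta', 0.0)
        series.append(oi)
    return series
```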
how we want to aggregate

given there is likely going to be 2 feeds for most providers, we need a way to merge these to enable getting a single table for processing such that you can easily get the normal trade event info (ticks) but with at least an added `oi`/`oi_calc` field included as a column.

For a granular per-mkt named by FQME files approach.
Open interest storage: #42
Here we’re using the `storage.nativedb` mod to write in a parquet file (8f1e082c91); at the moment it creates a file if it doesn’t exist for the `fqme` and appends the last processed row.

As discussed above, this is the struct dtype:
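A hedged sketch of such a struct dtype, assuming just the minimal `time` and `oi` fields from the granular approach above (field names and widths are guesses, not the committed impl):

```python
import numpy as np

# hypothetical per-row OI record layout
oi_dtype = np.dtype([
    ('time', np.float64),  # timestamp of the OI update
    ('oi', np.float64),    # provider-reported open interest
])
```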
The file name is the `fqme` for each instrument; in the `max_pain` we construct this for each incoming msg, a better solution is to pass the `fqme` already built (see the `fqme` github discussion).

So far this script appends the last row on each update for the instruments, on separate files for a specific `expiry_date`.
`write_oi()` and `_write_oi()` interface and impl for writing the `oi` struct in parquet in `storage.nativedb`; also add `mk_oi_shm_keyed_filepath()` for managing the shm file.