Design: data schema for opts ts storage #24
The fundamental “source” data schema we need to store for computing max pain at this moment:
Maybe we need to evaluate which data we want to add to the above data schema.
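For concreteness, a minimal sketch (using `polars`, since the storage target discussed below is `.parquet`) of what such a per-contract table could look like; the field names `time_ns`/`oi`/`oi_calc` are taken from the granular-approach discussion further down, so treat this as an illustration rather than the settled schema:

```python
import polars as pl

# hypothetical per-contract OI time series layout; strike, expiry,
# option type etc. would live in the fqme-derived file name rather
# than as columns (per the granular approach discussed below).
oi_schema = {
    'time_ns': pl.Int64,    # timestamp of the OI update
    'oi': pl.Float64,       # provider-reported open interest
    'oi_calc': pl.Float64,  # OI as (re)calculated by piker from the vlm feed
}

df = pl.DataFrame(schema=oi_schema)
print(df.schema)
```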
For the storage we need to add methods like `write_ohlcv()` and `read_ohlcv()` for the new data schema: `write_oi()` and `read_oi()` (of course, this can be changed). Also, add a method like `mk_ohlcv_shm_keyed_filepath()` for the open interest shm keyed filepath: `mk_oi_shm_keyed_filepath()`.
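A rough sketch of how those helpers might be shaped, mirroring the ohlcv ones; the method names come from above, but the signatures, the `period` parameter and the file naming are assumptions, not the current `piker.storage` API:

```python
from pathlib import Path

import polars as pl


def mk_oi_shm_keyed_filepath(
    fqme: str,
    period: float,  # sampling period in seconds, mirroring the ohlcv helper
    datadir: Path,
) -> Path:
    # hypothetical naming: tag the file as open-interest data so it can
    # be told apart from ohlcv files at a glance in the filesys.
    return datadir / f'{fqme}.oi.{int(period)}s.parquet'


def write_oi(
    fqme: str,
    oi: pl.DataFrame,  # expected columns: time[_ns], oi, oi_calc
    datadir: Path,
    period: float = 60,
) -> Path:
    path = mk_oi_shm_keyed_filepath(fqme, period, datadir)
    oi.write_parquet(path)
    return path


def read_oi(
    fqme: str,
    datadir: Path,
    period: float = 60,
) -> pl.DataFrame:
    path = mk_oi_shm_keyed_filepath(fqme, period, datadir)
    return pl.read_parquet(path)
```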
@ntorres one thing i can see immediately missing is an expiry field 😉
One in-depth question i have (and don’t have an answer for yet) is whether we want to orient the table schema such that all contracts can be interleaved in a big continuous table (for say a given strike/expiry/type) and then broken up for processing/viewing via a `polars` unpack, or whether we should be keeping all contracts as entirely separate `.parquet` files for long term storage and then expecting loader code to do the correct (parsing of filenames) thing to make it easy to load multiple contract-mkts at once?

Ok, yeah after a little thought i figured i’d make a table diagram (see attached) of what i envision being the most useful joined end-table for analysis and processing (like max-pain).
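To ground what “processing (like max-pain)” would consume from such a joined end-table, here is a hypothetical calculation over a single-expiry slice; the column names (`strike`, `option_type`, `oi`) and the numbers are purely illustrative assumptions about what the joined table would expose:

```python
import polars as pl

# hypothetical joined end-table slice for a single expiry: one row per
# (strike, option type) with its open interest (values made up).
chain = pl.DataFrame({
    'strike': [80_000.0, 90_000.0, 100_000.0] * 2,
    'option_type': ['c', 'c', 'c', 'p', 'p', 'p'],
    'oi': [500.0, 900.0, 300.0, 200.0, 700.0, 800.0],
})


def max_pain(chain: pl.DataFrame) -> float:
    '''
    Return the candidate settlement price (taken from the strike grid)
    which minimizes the total intrinsic value paid out to option
    holders, weighted by open interest.

    '''
    strikes = chain.get_column('strike').unique().sort().to_list()
    payouts: dict[float, float] = {}

    for s in strikes:
        call_intrinsic = pl.when(
            (pl.col('option_type') == 'c') & (pl.col('strike') < s)
        ).then(
            pl.lit(s) - pl.col('strike')
        ).otherwise(pl.lit(0.0))

        put_intrinsic = pl.when(
            (pl.col('option_type') == 'p') & (pl.col('strike') > s)
        ).then(
            pl.col('strike') - pl.lit(s)
        ).otherwise(pl.lit(0.0))

        payouts[s] = chain.select(
            ((call_intrinsic + put_intrinsic) * pl.col('oi')).sum()
        ).item()

    # max pain = the settlement price with the smallest total payout
    return min(payouts, key=payouts.get)


print(max_pain(chain))  # -> 90000.0 for the made up numbers above
```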
I’m hoping to re-render this diagram using a `d2` table (maybe inside a larger diagram) and/or possibly using a real screen shot from..

summary of approaches
granular per-mkt named by FQME files
the most file-granular approach would be to keep the table fields relatively simple with a min of 2 but possibly optionally 3:

- `time[_ns]: int|float`, the timestamp of the OI update
- `oi: float|int`, the actual purported open interest of the contract reported by a provider (feed)
- `oi_calc: float|int`, the OI calculated by us (the `piker` options sys) based on a mkt’s prior state (initial OI) and the ongoing (real-time) vlm feed (likely ohlcv of some sort)

this would mean we leverage a unique fqme schema to index mkts in the file sys very distinctly, in such a way that:

- you can tell which `.parquet` file contains which deriv mkt’s data just by glancing at the filesys, or with `visidata` (which yes, supports parquet ;) from console
- `piker.fsp` and `piker.ui` consumers can be more flexible with logic around certain fields existing optionally from various providers and then dynamically processing/displaying different output
- tables can still be joined/unpacked via `polars` when needed, but with the further flexibility that not all mkts for a given “super scope” need to be loaded into mem if the user doesn’t require it

all-in-one-table NON-granular, which would mean fewer `.parquet` files

not my preferred approach since it means constantly having to load a large table scoped by some grander mkt-property (like provider or expiry or settling asset etc.)
However, this likely would result in a much simpler `piker.storage` implementation as well as improved loading performance for very large (multi asset) derivatives data sets, since all “sub contract mkts” could be allocated in a single table like originally put in the descr.
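Worth noting that even with the granular per-fqme layout, loader code can still assemble an all-in-one style table on demand; a rough `polars` sketch of that, assuming files are simply named `<fqme>.parquet` under some data dir (the naming, the location, and the `time_ns` column are assumptions carried over from the field list above):

```python
from pathlib import Path

import polars as pl

datadir = Path('path/to/piker/datadir')  # hypothetical storage location

# lazily scan every per-contract file for one provider, tagging each
# frame with the fqme parsed from its filename so rows stay
# distinguishable per-mkt after concatenation.
frames = [
    pl.scan_parquet(path).with_columns(pl.lit(path.stem).alias('fqme'))
    for path in datadir.glob('*.deribit.parquet')
]

# one big interleaved table, only materialized when actually requested
big_table = pl.concat(frames).sort('time_ns').collect()
```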
Another impl detail wrt how a `datad` provider can offer OI info for a deriv.. Likely they either provide it like we should, `oi`/`oi_calc` updated with every clearing event (every tick that contains non-zero vlm), or via a separate dedicated feed (eg. `deribit`).
how we want to aggregate
given there is likely going to be 2 feeds for most providers, we need a way to merge these to enable getting a single table for processing, such that you can easily get the normal trade event info (ticks) but with at least an added `oi`/`oi_calc` field included as a column.
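One way such a merge could look with `polars` is a backward as-of join, so every trade event row picks up the most recent OI state as extra columns; the frame layouts and column names here are illustrative assumptions:

```python
import polars as pl

# trade event (tick) feed, sorted by time
ticks = pl.DataFrame({
    'time': [1.0, 2.0, 3.0, 4.0],
    'price': [101.0, 102.5, 102.0, 103.0],
    'size': [1.0, 0.5, 2.0, 1.0],
}).sort('time')

# (slower) open interest feed, also sorted by time
oi_updates = pl.DataFrame({
    'time': [0.5, 2.5],
    'oi': [1000.0, 1003.0],
    'oi_calc': [1000.0, 1002.5],
}).sort('time')

# "backward" as-of join: each tick row carries the latest OI update at
# or before its timestamp, giving one table for processing.
merged = ticks.join_asof(oi_updates, on='time', strategy='backward')
print(merged)
```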
I’m debating myself on which “table storage” approach to take; if we have a solid fqme, I think it’s better to go for the granular one (granular per-mkt named by FQME files), and then we can load whatever fqme we want.
for deribit a fqme looks like this: `btc-26feb25-90k-c.reversed_option.deribit`, we already have: `currency`, `expiry_date`, `strike_price`, `instrument_kind`, `option_type` and `exchange`, so we need to store for each fqme this:

And then write all the machinery necessary to handle the fqme files (something like `write_derv`, `get_by_strike`, `get_by_expiry_date`, etc.)
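A rough sketch of what part of that fqme-file machinery could look like; the helper names come from above, but the signatures, the `<fqme>.parquet` file naming, and the fqme field layout (inferred from the deribit example) are assumptions:

```python
from pathlib import Path

import polars as pl

# assumed fqme layout, following the deribit example above:
#   <currency>-<expiry_date>-<strike_price>-<option_type>.<instrument_kind>.<exchange>


def parse_fqme(fqme: str) -> dict[str, str]:
    symbol, instrument_kind, exchange = fqme.split('.')
    currency, expiry_date, strike_price, option_type = symbol.split('-')
    return {
        'currency': currency,
        'expiry_date': expiry_date,
        'strike_price': strike_price,
        'option_type': option_type,
        'instrument_kind': instrument_kind,
        'exchange': exchange,
    }


def get_by_expiry_date(
    datadir: Path,
    expiry_date: str,  # eg. '26feb25'
) -> dict[str, pl.LazyFrame]:
    # map fqme -> lazily scanned table for every contract file whose
    # fqme matches the requested expiry; a `get_by_strike()` would be
    # the same thing keyed on the parsed strike field instead.
    return {
        path.stem: pl.scan_parquet(path)
        for path in datadir.glob('*.parquet')
        if parse_fqme(path.stem)['expiry_date'] == expiry_date
    }
```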