Topically, throughout various (seemingly) console-UX-affecting or benign
spots in the code base; nothing that required more intervention beyond
things superficial. A few spots also include `trio.Nursery` ref renames
(always to something with a `tn` in it) and log-level reductions to
quiet (benign) console noise oriented around issues meant to be solved
long..
Note there's still a couple spots i left with the loose-ify flag because
i haven't fully tested them without using the latest version of
`tractor.trionics.collapse_eg()`, but more then likely they should flip
over fine.
That is to use the new `tractor.msg.types.Aid` struct to pull the
`brokerd` info from the `tractor.Channel.aid: Aid` attr as well as more
generally handling the new `Channel.raddr.proto_key: str` and no longer
assuming a TCP IPC transport; this per the recent `tractor.ipc`
subsys which adds multi-IPC-transports!
Downstream tweaks to match,
- use an "opt-in" field set to display in the `brokerd` info pane in
`.ui._feedstatus.mk_feed_label()`.
|_ also add some todos and drop some seemingly unneeded form sizing
calcs?
- tweak `.ui._label` to allow not using markdown, though ended up not
doing that since it looked too plain..
- use more compact optional value style with `|`-union
- fix `.flows` typing-only import since we need `MktPair` to be
immediately defined for use on a `msgspec.Struct` field.
- more "tree-like" warning msg in `.validate()` reporting.
A new class var `tuple[Exception]` such that the err set can be reffed
externally as needed for catching other similar pub-sub/IPC failures in
other (related) real-time sub-systems.
Also added some now-masked logging for debugging live-feed stream reading
issues that should ONLY be used for debugging since they'll greatly
degrade HFT perf. Used the new `log.mk_repr()` stuff (that one day we
should prolly pull from `modden` as a dep) for pretty console emissions.
Since now `tractor` will raise this native `trio`-exc translated from
a `Stop` msg when the peer gracefully terminates a `tractor.MsgStream`.
Just `info()` log in such cases versus continuing to warn for the
others.
Since currently we're only using this IPC subsys for `deribit`, and
generally speaking we're primarly supporting options markets (which are
fairly "slow moving"), flip to a default of NOT resetting the `NoBsWs`
on timeout since doing so normally breaks the jsron-rpc IPC session.
Without a proper `fixture` passed to `open_autorecon_ws()` (which we
should eventually implement!!) relying on a timeout-to-reset more or
less will just cause breakage issues - a proper reconnect sequence must
be implemented before using that feature.
Deats,
- expose and proxy through the `msg_recv_timeout` from
`open_jsonrpc_session()` into the underlying `open_autorecon_ws()`
call.
Add a doc-string reflecting recent refinements, drop all the old hook
params, rename `n: trio.Nursery` -> `tn` for "task nursery" fitting with
code base's naming style.
In prep for supporting reverse-ipc connect-back to UI actors from
middle-ware systems (for the purposes of triggering data-view canvas
re-renders and built-in tsp annotations), add a new struct type to
better generalize the management of remote feed subscriptions. Include
a `Sub.rc_ui: bool` for now (with nearby todo-comment) and expose an
`allow_remote_ctl_ui: bool` through the feed endpoints to help drive
/ prep for all that ^
Rework all the sampler tasks to expect the `Sub`'s new iface:
- split up the `Sub.ipc: MsgStream` and `.send_chan` as separate fields
since we're handling the throttle case in separate
`sample_and_broadcast()` logic blocks anyway and avoids needing to
monkey-patch on the `._ctx` malarky..
- explicitly provide the optional handle to the `_throttle_cs:
CancelScope` again for the case where throttling/event-downsampling is
requested.
- add `_FeedsBus.subs_items()` as a public iterator.
Also toss in a poll loop around the `hist_shm: ShmArray` backfill
read-check in the `.data.allocate_persisten_feed()` init to cope with
possible racy-ness from the increased tsdb history loading concurrency
now implemented.
Move `.data.history` -> `.tsp.__init__.py` for now as main pkg-mod
and `.data.tsp` -> `.tsp._anal` (for analysis).
Obviously follow commits will change surrounding codebase (imports) to
match..
Thinking about just moving all of that module (after a content breakup)
to a new `.piker.tsp` which will mostly depend on the `.data` and
`.storage` sub-pkgs; the idea is to move biz-logic for tsdb IO/mgmt and
orchestration with real-time (shm) buffers and the graphics layer into
a common spot for both manual analysis/research work and better
separation of low level data structure primitives from their higher
level usage.
Add a better `data.history` mod doc string in prep for this move
as well as clean out a bunch of legacy commented cruft from the
`trimeter` and `marketstore` days.
TO CHERRY #486 (if we can)
For each timeframe open a sub-nursery to do the backfilling + tsdb load
+ null-segment scanning in an effort to both speed up load time (though
we need to reverse the current order to really make it faster rn since
moving to the much faster parquet file backend) and do concurrent
time-gap/null-segment checking of tsdb history while mrf (most recent
frame) history is backfilling.
The details are more or less just `trio` related task-func composition
tricks and a reordering of said funcs for optimal startup latency.
Also commented the `back_load_from_tsdb()` task for now since it's
unused.
Apparently it returns the index of the prior zero-row (prolly since we
do the backward difference) so ensure `fi_zgaps += 1`..
Also fix remaining edge case handling when there's only 2 zero-segs
which was borked after a refactor to the special case blocks (like
a single zero row) prior to the `absi_zsegs` building loop AND make sure
to always return abs indices OUTSIDE the zero seg, i.e. the indices of
the non-zero row just before and just after so that the history
backfiller can use non-zero timestamps to generate range datetimes for
backend frame queries.
Add much more detailed doc-comments with a small ascii diagram to
explain how all these somewhat subtle vec ops work. Also toss in some
sanity checks on the output indices to ensure they don't point to
zero (time) valued rows when used to read the frame.
Call it `iter_null_segs()` (for now?) and use in the final (sequential)
stage of the `.history.start_backfill()` task-func. Delivers abs,
frame-relative, and equiv time stamps on each iteration pertaining to
each detected null-segment to make it easy to do piece-wise history
queries for each.
Further,
- handle edge case in `get_null_segs()` where there is only 1 zeroed
row value, in which case we deliver `absi_zsegs` as a single pair of
the same index value and,
- when this occurs `iter_null_seqs()` delivers `None` for all the
`start_` related indices/timestamps since all `get_hist()` routines
(delivered by `open_history_client()`) should handle it as being a
"get max history from this end_dt" type query.
- add note about needing to do time gap handling where there's a gap in
the timeseries-history that isn't actually IN the data-history.
Using a bunch of fancy `numpy` vec ops (and ideally eventually extending
the same to `polars`) this is a first draft of `get_null_segs()`
a `col: str` field-value-is-zero detector which filters to all zero-valued
input frame segments and returns the corresponding useful slice-indexes:
- gap absolute (in shm buffer terms) index-endpoints as
`absi_zsegs` for slicing to each null-segment in the src frame.
- ALL abs indices of rows with zeroed `col` values as `absi_zeros`.
- the full set of the input frame's row-entries (view) which are
null valued for the chosen `col` as `zero_t`.
Use this new null-segment-detector in the
`.data.history.start_backfill()` task to attempt to fill null gaps that
might be extant from some prior backfill attempt. Since
`get_null_segs()` should now deliver a sequence of slices for each gap
we don't really need to have the `while gap_indices:` loop any more, so
just move that to the end-of-func and warn log (for now) if all gaps
aren't eventually filled.
TODO:
-[ ] do the null-seg detection and filling concurrently from
most-recent-frame backfilling.
-[ ] offer the same detection in `.storage.cli` cmds for manual tsp
anal.
-[ ] make the graphics layer actually update correctly when null-segs
are filled (currently still broken somehow in the `Viz` caching
layer?)
CHERRY INTO #486
In an effort to catch out-of-order and/or partial-frame-duplicated
segments, add some `.tsp` calls throughout the backloader tasks
including a call to the new `.sort_diff()` to catch the out-of-order
history cases.
Since the `diff: int` serves as a predicate anyway (when `0` nothing
duplicate was detected) might as well just return it directly since it's
likely also useful for the caller when doing deeper anal.
Also, handle the zero-diff case by just returning early with a copy of
the input frame and a `diff=0`.
CHERRY INTO #486
Yet again these are (going to be) generally useful in the data proc
layer as well as going forward with (possibly) moving the history and
shm rt-processing layer to apache (arrow or other) shared-ds
equivalents.
Includes a rename of `.data._timeseries` -> `.data.tsp` for "time series
processing", making it a public sub-mod; it contains a highly useful set
of data-frame and `numpy.ndarray` ops routines in various subsystems Bo
I guess since i started supporting the whole "allow a gap between
the latest tsdb sample and the latest retrieved history frame" the
overlap slicing has been completely borked XD where we've been sticking
in duplicate history samples and this has caused all sorts of down
stream time-series processing issues..
So fix that but ensuring whenever there IS an overlap between history in
the latest frame and the tsdb that we always prefer the latest frame's
data and slice OUT the tsdb's duplicate indices..
CHERRY TO #486
Allows opening with `.from_msg(readonly=False)` for write permissions
making underlyig shm arrays readonly. Also, make sure to pop the
`ShmArray` field entries prior to msg-ization, not sure how that worked
with the `Feed.flumes` equivalent..but?
A helper for scanning a "pairs table" that most backends should expose
as part of their (internal) symbology set using `rapidfuzz` over
a `dict[str, Struct]` input table.
Also expose the `data.types.Struct` at the subpkg top level.
Of course I missed this first try but, we need to use the ws market pair
symbology set (since apparently kraken loves redundancy at least 3 times
XD) when processing transactions that arrive from live clears since it's
an entirely different `LTC/EUR` style key then the `XLTCEUR` style
delivered from the ReST eps..
As part of this:
- add `Client._altnames`, `._wsnames` as `dict[str, Pair]` tables,
leaving the `._AssetPairs` table as is keyed by the "xname"s.
- Change `Pair.respname: str` -> `.xname` since these keys all just seem
to have a weird 'X' prefix.
- do the appropriately keyed pair table lookup via a new `api_name_set:
str` to `norm_trade_records()` and set is correctly in the ws live txn
handler task.
Since it's depended on by `.data` stuff as well as pretty much
everything else, makes more sense to expose it as a top level module
(and maybe eventually as a subpkg as we add to it).
We don't need it in `detect_time_gaps()` since doing straight up
datetime diffs in `polars` already has a humanized `str` representation
but with higher precision like '2d 1h 24m 1s' B)