gap_annotator: super fast (time-)gap detection API #75

Open
goodboy wants to merge 14 commits from gap_annotator into ib_venue_closures

Adds a new `.ui._annotate.GapAnnotations(GraphicsObject)` and surrounding IPC-serialization wrapping for remote control, written with the help of claudey using the new `pyqtgraph.Qt.internals.PrimitiveArray` introduced in the `0.13.2` release.

This provides us with a (remote-ctl) API for annotating large time-series with arrows, rects and txt-labels; visual markings that are very handy for verifying/analyzing (the quality of) time-series data-sets backfilled for storage.

This feature set was originally initiated to help an introspecting human solve core conc-logic issues in the backfiller that can result in piecewise-gappy OHLCV series due to a variety of edge cases,

  • (rarest) actual bad data from the provider

  • an undetected/unacknowledged venue closure which causes already-backfilled frames to be re-retrieved and duplicated.

  • backfiller multitasking race conditions during load-from-offline-storage (.parquet) where the previously saved series was in some way truncated/mis-aligned/corrupted/null-filled due to,

    • the data-daemon (rn brokerd but eventually should be a separate datad) being killed during backfill resulting in partially updated frames/bars and/or null-segments pushed to the offline format.

    • races on the shm-buffer being flushed to disk (due to its ring-buffer semantics) while the backfiller task(s) are mid write; possibly we are lacking better locking around this condition.

The longer-term goal is obviously to avoid out-of-order/invalid tsdb writes at the outset, instead of the current workarounds using de-duplication/null-segment-filling helpers and other surrounding immediate hacks.
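To make the two failure shapes above concrete (holes in the series vs. re-delivered frames), here is a minimal stdlib-only sketch. The `Gap` type and the `find_gaps()` / `find_dupes()` helpers are hypothetical names used for illustration only; they are not piker's actual detection helpers, which operate on shm/`.parquet` backed arrays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gap:
    # hypothetical record type, not part of piker
    start: float  # epoch time of the last bar before the hole
    end: float    # epoch time of the first bar after the hole

    @property
    def duration(self) -> float:
        return self.end - self.start


def find_gaps(times: list[float], period: float) -> list[Gap]:
    '''
    Return every adjacent timestamp pair whose step is larger
    than the expected sample ``period``.

    '''
    return [
        Gap(start=prev, end=nxt)
        for prev, nxt in zip(times, times[1:])
        if (nxt - prev) > period
    ]


def find_dupes(times: list[float]) -> list[int]:
    '''
    Return the indices of samples which repeat, or step
    backwards from, the previous timestamp.

    '''
    return [
        i
        for i, (prev, nxt) in enumerate(zip(times, times[1:]), start=1)
        if nxt <= prev
    ]


# 60s bars with a 3-bar hole after t=120 and a
# re-delivered frame at t=360
times = [0, 60, 120, 360, 360, 420]
gaps = find_gaps(times, period=60)
dupes = find_dupes(times)
```

Here `gaps` holds a single `Gap(start=120, end=360)` and `dupes` points at the second `360` sample; whether such a gap is a venue closure or bad data still has to be decided separately.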


Follow up from #62


To-use/test/do here!

  • the piker store ldshm <fqme> cmd will now request the chart actor to annotate all gaps with rects + arrows indicating both known-venue-closure and unexpected time gaps.

  • UX to handle the diff between valid venue closure time-gaps and bad(ly written)/missing data time gaps.

    • txt labels which indicate the missing time duration using a “humanized” label (2h, 1.2d, 16s etc.) should likely only be shown when requested by the user (likely via key-combo/config/button in the chart UI) but any unexpected non-closure gaps should likely be marked with a ?? or something?

    • a UX-mechanism to manually piecewise backfill gaps using the mouse + ctx menu?
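As a rough idea of how the "humanized" duration labels and the `??` marker for unexpected gaps could fit together, consider the sketch below. Both `humanize()` and `gap_label()` are hypothetical helpers, and treating a gap as "expected" whenever it overlaps a known closure window is a simplifying assumption, not the chart actor's actual logic.

```python
def humanize(secs: float) -> str:
    '''
    Render a duration in seconds using its largest fitting
    unit, eg. ``16s``, ``2h`` or ``1.2d``.

    '''
    units = [('d', 86400), ('h', 3600), ('m', 60), ('s', 1)]
    for suffix, size in units:
        if secs >= size:
            # `:g` trims a trailing `.0` so 2.0 renders as "2"
            return f'{round(secs / size, 1):g}{suffix}'

    # sub-second durations
    return f'{secs:g}s'


def gap_label(
    start: float,
    end: float,
    closures: list[tuple[float, float]],
    show_duration: bool = False,
) -> str:
    '''
    Build the text for one gap: the duration only when the
    user asked for it, plus a `??` for any gap which does not
    overlap a known venue closure.

    '''
    expected: bool = any(
        start < c_end and c_start < end
        for c_start, c_end in closures
    )
    parts: list[str] = []
    if show_duration:
        parts.append(humanize(end - start))
    if not expected:
        parts.append('??')

    return ' '.join(parts)
```

For example `gap_label(100, 7300, [(200, 7000)], show_duration=True)` gives `2h`, while the same call with no known closures gives `2h ??`.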

Follow through if not included here

  • follow up todos for integrating said gap-detector/checker into the chart actor’s core runtime and UX.

  • (if not done above) a UX to re-fill missing/bad ts data manually?

    • like a context-menu (on right-click) that lets a user (auto)select regions which obviously/ostensibly contain time gaps that can be filled by the provider.

Follow-up-todos from GH,

Those that would be nice to knock out here but it’s fine if we just start tracking them throughout all follow up PRs.

Not sure how many are practical to (mark) solve(d) immediately but at least to get my head back in the problem set.

- [ ] landing orig GH pr (as a formality),
  - https://github.com/pikers/piker/pull/486
- [ ] pull out issues from ^ (and any others) ideally using `claude` GH integration to summarize all the follow-up bugs solved here!
  - [ ] `/install-github-app`,
- [ ] storage layer draft PR?
  - https://github.com/pikers/piker/pull/446
- [ ] various outstanding `tsdb` tagged stuffs,
  - https://github.com/pikers/piker/issues?q=is%3Aissue%20state%3Aopen%20label%3Atsdb

---

### Follow up from #62

- [x] pinning to latest upstream `pyqtgraph` release!
  - [x] `0.14.0` appears to contain everything we require atm.
  - [x] do we keep maintaining our fork?
- [ ] doc-if-kept the "smart dedupe" helper drafted in #62,
  https://pikers.dev/pikers/piker/pulls/62/files#diff-6740f589de8d885050fe2d16873c3c5c9bb18539
- [ ] doc and detail the remote-annot API refined in #62,
  * https://pikers.dev/pikers/piker/pulls/62/files#diff-198a0edc98e916a33c4bdbcc94de86c1d37681bc
  * https://pikers.dev/pikers/piker/pulls/62/files#diff-869be8a6a60c6c8e9feb6b264a1b7175726fa6e1
  * also the extensions made to various annotation "shapes" by exposing more `pg` object kwargs passing to each.
- [ ] doc and note the new offline REPL helper which enables `claude` to "batch-of-cmds" experience what a dev is seeing from a `pdbp` interaction.
  * https://pikers.dev/pikers/piker/pulls/62/files#diff-86d2c9c17effd0857dcaed9f3914df637064f4ef
- [ ] (@goodboy) actually document the `piker store [ls/delete/ldshm]` CLI..
- [ ] ?ideally bring in the skills stuff in #69 beforehand?

goodboy added 7 commits 2026-02-23 03:45:24 +00:00
49f75ee1fe Add info log for shm processing in `ldshm` CLI cmd
Log shm file name and detected period before null segment
processing to aid debugging.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
53b4775698 Add a `GapAnnotations` path-renderer
For a ~1000x perf gain says ol' claudy, our boi who wrote this entire
patch! Bo

Introduce `GapAnnotations` in `.ui._annotate` for batch-rendering gap
rects/arrows instead of individual `QGraphicsItem` instances. Uses
upstream's `pyqtgraph.Qt.internals.PrimitiveArray` for rects and
a `QPainterPath` for arrows. This API-replicates our prior annotator's
in view shape-graphics but now using (what we're dubbing)
"single-array-multiple-graphics" tech much like our `.ui._curve`
extensions to `pg` B)

Impl deats,
- batch draw ~1000 gaps in single paint call vs 1000 items
- arrows render in scene coords to maintain pixel size on zoom
- add vectorized timestamp-to-index lookup for repositioning
- cache bounding rect, rebuild on `reposition()` calls
- match `SelectRect` + `ArrowItem` visual style/colors
- skip reposition when timeframe doesn't match gap's period

Other,
- fix typo in `LevelMarker` docstring: "graphich" -> "graphic"
- reflow docstring in `qgo_draw_markers()` to 67 char limit

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
e433f24917 Add batch-submit API for gap annotations
Introduce `AnnotCtl.add_batch()` and `serve_rc_annots()` batch
handler to submit 1000s of gaps in single IPC msg instead of
per-annot round-trips. Server builds `GapAnnotations` from specs
and handles vectorized timestamp-to-index lookups.

Deats,
- add `'cmd': 'batch'` handler in `serve_rc_annots()`
- vectorized timestamp lookup via `np.searchsorted()` + masking
- build `gap_specs: list[dict]` from rect+arrow specs client-side
- create single `GapAnnotations` item for all gaps server-side
- handle `GapAnnotations.reposition()` in redraw handler
- add profiling to batch path for perf measurement
- support optional individual arrows for A/B comparison

Also,
- refactor `markup_gaps()` to collect specs + single batch call
- add `no_qt_updates()` context mgr for batch render ops
- add profiling to annotation teardown path
- add `GapAnnotations` case to `rm_annot()` match block

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
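
The "vectorized timestamp lookup via `np.searchsorted()` + masking" mentioned in the batch-submit commit can be sketched roughly as follows. The function name `times_to_indices()` and its argument names are assumptions made for illustration; the real lookup lives server-side in the batch handler and may differ in detail.

```python
import numpy as np


def times_to_indices(
    series_times: np.ndarray,  # sorted epoch timestamps of the series
    gap_times: np.ndarray,     # timestamps to locate in that series
) -> tuple[np.ndarray, np.ndarray]:
    '''
    Map many timestamps to array indices in one vectorized
    call, plus a boolean mask flagging which of them were
    actually found in the series.

    '''
    # leftmost insertion point for every query at once
    idx = np.searchsorted(series_times, gap_times)

    # an insertion point equal to `len(series_times)` is out of
    # range, so clip before indexing..
    clipped = np.clip(idx, 0, len(series_times) - 1)

    # ..then mask off any query without an exact match
    found = series_times[clipped] == gap_times
    return clipped, found


series = np.array([0, 60, 120, 360, 420])
queries = np.array([120, 360, 200, 999])
idx, found = times_to_indices(series, queries)

# only annotations whose timestamps exist in the series
# get (re)positioned
valid_idx = idx[found]
```

With the sample data, the first two queries resolve to indices `2` and `3`, while `200` and `999` are masked out since no sample carries those timestamps.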
89f9763744 Pin `pg` at latest official `0.14.0` release
Keep in masked GH sources lines for easy hackin against upstream
`master` branch when needed as well!
goodboy changed title from gap_annotator: super fast (time-)gap annotator + API to gap_annotator: super fast (time-)gap detection API 2026-02-23 03:53:19 +00:00
goodboy force-pushed gap_annotator from 89f9763744 to 47cd48fea7 2026-02-24 17:31:59 +00:00
goodboy force-pushed gap_annotator from 47cd48fea7 to 4ff6cd9afa 2026-02-24 20:50:57 +00:00
goodboy force-pushed gap_annotator from 4ff6cd9afa to a06fe473cf 2026-02-24 21:07:01 +00:00
This pull request can be merged automatically.
Reference: pikers/piker#75