Namely insertion writes which over-fill the shm buffer past the latest
tsdb sample via `.tsp._history.shm_push_in_between()`.
Deats,
- check earliest `to_push` timestamp and enter pause point if it's
earlier than the tsdb's `backfill_until_dt` stamp.
- requires actually passing the `backfill_until_dt: datetime` thru,
* `get_null_segs()`
* `maybe_fill_null_segments()`
* `shm_push_in_between()` (obvi XD)
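Roughly the idea (a sketch only; the `to_push['time']` field access
and the guard's exact placement are assumptions):
```python
from datetime import datetime
from pendulum import from_timestamp
import tractor

async def maybe_pause_on_overfill(
    to_push,  # samples about to be written to the shm buffer
    backfill_until_dt: datetime,
) -> None:
    # earliest timestamp in the batch about to be pushed
    earliest = from_timestamp(float(to_push['time'][0]))
    if earliest < backfill_until_dt:
        # would over-fill past the tsdb's backfill boundary,
        # so drop into the debugger for introspection
        await tractor.pause()
```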
Namely such that when the previous-df-row by our shm-abs-'index' doesn't
exist we ignore certain cases which are likely due to borked-but-benign
samples written to the tsdb or rt shm buffers prior.
Particularly we now ignore,
- any `dt`/`prev_dt` values which are UNIX-epoch timestamped (val of 0).
- any row-is-first-row in the df; there is no previous.
- any missing previous datum by 'index', in which case we lookup the
`wdts` prior row and use that instead.
* this would indicate a missing sample for the time-step but we can
still detect a "gap" by looking at the prior row, by df-abs-index
`i`, and use its timestamp to determine the period/size of missing
samples (which need to likely still be retrieved).
* in this case i'm leaving in a pause-point for introspecting these
rarer cases when `--pdb` is passed via CLI.
Relatedly in the `piker store` CLI ep,
- add `--pdb` flag to `piker store`, pass it verbatim as `debug_mode`.
- when `times` has only a single row, don't calc a `period_s` median.
- only trace `null_segs` when in debug mode.
- always markup/dedupe gaps for `period_s==60`
Using `claude`, add a `.tsp._dedupe_smart` module that attempts
"smarter" de-duplication of bars by distinguishing between erroneous
bars partially written during concurrent backfill race conditions vs.
**actual** data quality issues from historical providers.
Problem:
--------
Concurrent writes (live updates vs. backfilling) can create
duplicate timestamped ohlcv bars with different values. Some
potential scenarios include,
- a market live feed is cancelled mid-update resulting in the
"last" datum being only partially updated, missing some of the
ticks for the time step.
- when the feed is rebooted during charting, the backfiller will not
finalize this bar since rn it presumes it should only fill data for
time steps not already in the tsdb storage.
Our current naive `.unique()` approach obvi keeps the incomplete bar;
a "smarter" approach is to compare the provider's final vlm
amount vs. the maybe-cancelled tsdb bar's: a higher vlm value from
the provider likely indicates a cancelled-during-live-write bar and
**not** a datum discrepancy from said data provider.
Analysis (with `claude`) of `zecusdt` data revealed:
- 1000 duplicate timestamps
- 999 identical bars (pure duplicates from 2022 backfill overlap)
- 1 volume-monotonic conflict (live partial vs backfill complete)
A soln from `claude` -> `tsp._dedupe_smart.dedupe_ohlcv_smart()`
which:
- sorts by vlm **before** deduplication and keeps the most complete
bar based on vlm monotonicity as well as the following OHLCV
validation assumptions:
* volume should always increase,
* high should be non-decreasing,
* low should be non-increasing,
* open should be identical
- Separates valid race conditions from provider data quality issues
and reports and returns both dfs.
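The core keep-the-most-complete-bar step might look something like
this minimal sketch (assuming polars with 'time'/'volume' columns; the
real `dedupe_ohlcv_smart()` additionally splits out rows violating the
OHLCV assumptions above):
```python
import polars as pl

def dedupe_by_max_vlm(df: pl.DataFrame) -> pl.DataFrame:
    return (
        df
        .sort('volume')  # lowest-vlm (partial) bars first
        .unique(
            subset=['time'],
            keep='last',  # keep the highest-vlm (most complete) bar
            maintain_order=True,
        )
        .sort('time')
    )
```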
Change summary by `claude`:
- `.tsp._dedupe_smart`: new module with validation logic
- `.tsp.__init__`: expose `dedupe_ohlcv_smart()`
- `.storage.cli`: integrate smart dedupe, add logging for:
* duplicate counts (identical vs monotonic races)
* data quality violations (non-monotonic, invalid OHLC ranges)
* warnings for provider data issues
- Remove `assert not diff` (duplicates are valid now)
Verified on `zecusdt`: correctly keeps index 3143645
(volume=287.777) over 3143644 (volume=140.299) for
the conflicting 2026-01-16 18:54 UTC bar.
`claude`'s Summary of reasoning
-------------------------------
- volume monotonicity is critical: a bar's volume only increases
during its time window.
- a backfilled bar should always have volume >= the live-updated one.
- violations indicate any of:
* Provider data corruption
* Non-OHLCV aggregation semantics
* Timestamp misalignment
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Add a new `snippets/claude_debug_helper.py` to
provide a programmatic interface to `tractor.pause()` debugger
sessions for incremental data inspection matching the interactive UX,
but runnable by `claude` "offline" since it can't seem to feed
stdin (so it claims) to the `pdb` instance due to being unable to
allocate a tty internally.
The script-wrapper is based on `tractor`'s `tests/devx/` suite's use of
`pexpect` patterns for driving `pdbp` prompts and thus enables
automated-offline execution of REPL-inspection commands **without**
a human at the tty, using incremental-realtime output capture (just
like a human would).
Features:
- `run_pdb_commands()`: batch command execution
- `InteractivePdbSession`: context manager for step-by-step REPL interaction
- `expect()` wrapper: timeout handling with buffer display
- Proper stdin/stdout handling via `pexpect.spawn()`
Example usage:
```python
from debug_helper import InteractivePdbSession
with InteractivePdbSession(
    cmd='piker store ldshm zecusdt.usdtm.perp.binance'
) as session:
    session.run('deduped.shape')
    session.run('step_gaps.shape')
```
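Under the hood the batch driver is roughly the following sketch
(assuming a `pdbp`-style `(Pdb++)` prompt; pattern and names are
assumptions, not the exact impl):
```python
import pexpect

def run_pdb_commands(
    cmd: str,
    commands: list[str],
    timeout: float = 30,
) -> list[str]:
    child = pexpect.spawn(cmd, encoding='utf-8', timeout=timeout)
    prompt: str = r'\(Pdb\+\+\)'
    child.expect(prompt)  # wait for the first debugger prompt
    outputs: list[str] = []
    for line in commands:
        child.sendline(line)
        child.expect(prompt)
        outputs.append(child.before)  # output printed by `line`
    child.sendline('quit')
    child.close()
    return outputs
```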
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
Polars tightened type safety for `.dt` accessor methods: duration
dtypes now require the `total_*` methods while datetime component
accessors like `day()` only work on datetime dtypes.
`detect_time_gaps()` in `.tsp._anal` was calling `.dt.day()`
on the `dt_diff` column (a duration from `.diff()`) which throws
`InvalidOperationError` on modern polars.
Changes:
- use f-string to add pluralization to map time unit strings to
`total_<unit>s` form for the new duration API.
- Handle singular/plural forms: 'day' -> 'days' -> 'total_days'
- Ensure trailing 's' before applying 'total_' prefix
Also updates inline comments explaining the polars type distinction
between datetime components vs duration totals.
Fixes `piker store ldshm` crashes on datasets with time gaps.
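A minimal repro of the unit-name mapping (helper name is an
assumption):
```python
from datetime import datetime
import polars as pl

df = pl.DataFrame({
    'dt': [datetime(2024, 1, 1), datetime(2024, 1, 4)],
})
dt_diff: pl.Series = df['dt'].diff()  # Duration dtype

unit: str = 'day'
unit += '' if unit.endswith('s') else 's'  # 'day' -> 'days'
total_attr: str = f'total_{unit}'  # -> 'total_days'

# old code called `dt_diff.dt.day()` -> `InvalidOperationError`;
# modern polars wants the duration-total form:
days = getattr(dt_diff.dt, total_attr)()
assert days[1] == 3
```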
(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
The `await feed_is_live.wait()` is more or less pointless and would only
cause slower startup afaig (as-far-as-i-grok) so i'm masking it here.
This also removes the final `strict_exception_groups=False` use from the
non-tests code base, flipping to the `tractor.trionics` collapser once
and for all!
Since i'm seeing IB records with a `None` value and i don't want to be
debugging every time order-mode boots up..
Also use `pdb=debug` in `.open_ledger_dfs()`
Note, this had conflicts on `piker/accounting/calc.py` when rebasing
onto the refactored `brokers_refinery` history which were resolved
manually!
To start, this is just a shell for the test; there's no checking logic
yet.. put it as `test_accounting.test_ib_account_with_duplicated_mktids()`.
The test is composed for now to be completely runtime-free using only
the offline txn-ledger / symcache / account loading APIs, ideally we
fill in the activated symbology-data-runtime cases once we figure a sane
way to handle incremental symcache updates for backends like IB..
To actually fill the test out with real checks we still need to,
- extract the problem account file from my ib.algopape into the test
harness data.
- pick some contracts with multiple fqmes despite a single bs_mktid and
ensure they're aggregated as a single `Position` as well as,
* ideally de-duplicating txns from the account file section for the
mkt..
* warning appropriately about greater-than-one fqme for the bs_mktid
and providing a way for the ledger re-writing to choose the
appropriate `<venue>` as the "primary" when the
data-symbology-runtime is up and possibly use it to incrementally
update the IB symcache and store offline for next use?
Since under `trio`-cancellation the `.remove()` is a checkpoint and will
be masked by a taskc AND we **always want to remove the rect** despite
the surrounding teardown conditions.
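The general pattern (sketch only, assuming an async `.remove()`):
```python
import trio

async def remove_rect(rect) -> None:
    # shield so the removal checkpoint can't be masked by
    # a surrounding task cancellation
    with trio.CancelScope(shield=True):
        await rect.remove()
```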
In the `translate_and_relay_brokerd_events()` loop task that is, such
that we never crash on a `status_msg = book._active.pop(oid)` in the
'closed' status handler whenever a double removal happens.
Turns out there were unforeseen races here when a benign backend error
would cause an order-mode dialog to be cancelled (incorrectly) and then
a UI side `.on_cancel()` would trigger too-early removal from the
`book._active` table despite the backend sending an actual 'closed'
event (much) later, this would crash on the now missing entry..
So instead we now,
- obviously use `book._active.pop(oid, None)`
- emit a `log.warning()` (not info lol) on a null-read with a less
"one-line-y" message explaining the double removal and maybe *why*.
For backends which opt to set the new `BrokerdPosition.bs_mktid` field,
give (matching logic) priority to it such that even if the `.symbol`
field doesn't match the mkt currently focused on chart, it will
always match on a provider's own internal asset-mapping-id. The original
fallback logic for `.fqme` matching is left as is.
As an example with IB, a qqq.nasdaq.ib txn may have been filled on
a non-primary venue as qqq.directedea.ib, in this case if the mkt is
displayed and focused on chart we want the **entire position info** to
be overlayed by the `OrderMode` UX without discrepancy.
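The matching priority is roughly (hypothetical sketch, type shapes
assumed):
```python
def pos_msg_matches_mkt(msg, mkt) -> bool:
    # prefer the backend's own internal asset-mapping-id when set,
    bs_mktid = getattr(msg, 'bs_mktid', None)
    if bs_mktid and str(bs_mktid) == str(mkt.bs_mktid):
        return True
    # original fallback: `.fqme`/`.symbol` string matching
    return msg.symbol == mkt.fqme
```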
Other refinements,
- improve logging and add a detailed edge-case-comment around the
`.on_fill()` handler to clarify where if a benign 'error' msg is
relayed from a backend it will cause the UI to operate as though the
order **was not-cleared/cancelled** since the `.on_cancel()` handler
will have likely been called just before, popping the `.dialogs`
entry. Return `bool` to indicate whether the UI removed-lines
/ added-fill-arrows.
- invert the `return` branching logic in `.on_cancel()` to reduce
indent.
- add a very loud `log.error()` in `Status(resp='error')` case-block
ensuring the console yells about the order being cancelled, also
a todo for the weird msg-field recursion nonsense..
Such that backends can deliver their own internal unique
`MktPair.bs_mktid` when they can't seem to get it right via the
`.fqme: str` export.. (COUGH ib, you piece of sh#$).
Also add todo for possibly replacing the msg with a `Position.summary()`
"snapshot" as a better and more rigorously generated wire-ready msg.
Despite a `.bs_mktid` ideally being a bijection with `MktPair.fqme`
values, apparently some backends (cough IB) will switch the `.<venue>`
part in txn records resulting in multiple account-conf-file sections for
the same dst asset. Obviously that means we can't allocate new
`Position` entries keyed by that `bs_mktid`; be sure to **update
them instead**!
Deats,
- add case logic to avoid pp overwrites using a `pp_objs.get()` check.
- warn on duplicated pos entries whenever the current account-file
entry's `mkt` doesn't match the pre-existing position's.
- mk `Position.add_clear()` return a `bool` indicating if the record was
newly added, warn when it was already existing/added prior.
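In pseudocode-ish form (names from the commit, signatures assumed;
`Position` and `log` are from piker context):
```python
def load_pos(pp_objs: dict, bs_mktid: str, mkt, txns: list) -> None:
    pos = pp_objs.get(bs_mktid)  # avoid overwriting a prior entry
    if pos is None:
        pos = pp_objs[bs_mktid] = Position(mkt=mkt)
    elif pos.mkt != mkt:
        log.warning(
            f'Duplicated pos entry for bs_mktid={bs_mktid!r}?\n'
            f'account-file mkt {mkt} != pre-existing {pos.mkt}'
        )
    for txn in txns:
        if not pos.add_clear(txn):  # now returns a bool
            log.warning(f'Clear {txn} was already added prior!')
```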
Also,
- drop the already deprecated `open_pps()`, also from sub-pkg exports.
- draft TODO for `Position.summary()` idea as a replacement for
`BrokerdPosition`-msgs.
That is, inside the embedded `.accounting.calc.dyn_parse_to_dt()`
closure, add an optional `_invalid: list` param to which we can report
bad-timestamped records; these we instead override and return as
`from_timestamp(0.)` (when the parser loop falls through) and report
later (in summary) from the `.accounting.calc.iter_by_dt()` caller. Add
some logging and an optional debug block for future tracing.
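The fallthrough behaviour, sketched (the real closure tries multiple
timestamp formats; a single `pendulum.parse()` stands in here):
```python
from pendulum import DateTime, from_timestamp, parse

def dyn_parse_to_dt(
    value,
    _invalid: list | None = None,
) -> DateTime:
    try:
        return parse(str(value))
    except Exception:
        pass
    # parser "loop" fell through: stash the bad record so the
    # `iter_by_dt()` caller can report it in summary later
    if _invalid is not None:
        _invalid.append(value)
    return from_timestamp(0.)
```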
NOTE, this commit was re-edited during a conflict between the orig
branches: `dev/binance_api_3.1` & `dev/alt_tpts_for_perf`.
For `[ib]` adjust content to match changes to the
`dockering/ib/README.rst` and for `[deribit]` toss in the WIP options
related params for anyone who wants to play around with @nt's work.
Such that the `brokers.toml` can contain any of the following
<port> = dict|tuple styles,
```toml
[ib.vnc_addrs]
4002 = {host = 'localhost', port = 5900, pw = 'doggy'} # dict style: host, port + pw
4002 = {host = 'localhost', port = 5900} # dict style: host, port (no pw)
4002 = ['localhost', 5900] # tuple style: host, port
```
With the first line demonstrating a vnc-server password (as normally set
via a `.env` file in the `dockering/ib/` subdir) with the `pw =` field.
This obviously removes the hardcoded `'doggy'` password from prior.
Impl details in `.brokers.ib._util`:
- pass the `ib.api.Client` down into `vnc_click_hack()` doing all config
reading within and removing host, port unpacking in the calling
`data_reset_hack()`.
- also pass the client to `try_xdo_manual()` and comment (with plans to
remove) the recently added localhost-only fallback section since
we now have a fully working py vnc client again with `pyvnc` B)
- in `vnc_click_hack()` match for all the possible config line styles
and,
* pass any `pw` field to `pyvnc`'s `VNCConfig`,
* continue matching host, port without password,
* fallthrough to raising a val-err when neither ^ match.
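The style-matching boils down to something like (exact `pyvnc` config
kwarg names are assumptions):
```python
def parse_vnc_entry(entry) -> dict:
    match entry:
        case {'host': host, 'port': port, 'pw': pw}:
            return {'host': host, 'port': port, 'password': pw}
        case {'host': host, 'port': port}:
            return {'host': host, 'port': port}
        case [host, port]:
            return {'host': host, 'port': port}
        case _:
            raise ValueError(
                f'Invalid `[ib.vnc_addrs]` entry: {entry!r}'
            )
```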
For the new github image, a high-level look at its basic
features/usage/docs and prosing around our expected default usage with
the `piker.brokers.ib` backend.
It actually works for vncAuth(2) (thank god!) which the previous
`asyncvnc` **did not**, and seems to be mostly based on the work
from the `asyncvnc` author anyway (so all my past efforts don't seem to
have been in vain XD).
NOTE, the below deats ended up being factored in earlier into the
`pyproject.toml` alongside nix(os) support needed for testing and
landing this history. As such, the comments are the originals but
the changes are not.
Deats,
- switch to `pyvnc` async API (using `asyncio` again obvi) in
`.ib._util._vnc_click_hack()`.
- add `pyvnc` as src installed dep from GH.
- drop `asyncvnc` as dep.
Other,
- update `pytest` version range to avoid weird auto-load plugin exposed
by `xonsh`?
- add a `tool.pytest.ini_options` to project file with vars to,
* disable that^ `xonsh` plug using `addopts = '-p no:xonsh'`.
* set a `testpaths` to avoid running anything but that subdir.
* try out the `'progress'` style console output (does it work?).
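I.e. roughly the following section (the `testpaths` value is an
assumption):
```toml
[tool.pytest.ini_options]
addopts = '-p no:xonsh'            # disable the xonsh auto-load plugin
testpaths = ['tests']              # only ever run this subdir
console_output_style = 'progress'
```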
Such that if/when the `push()` ticker callback (closure) errors
internally, we actually eventually bubble the error out-and-up from the
`asyncio.Task` and from there out of the `.to_asyncio.open_channel_from()` to
the parent `trio.Task`..
It ended up being much more subtle to solve than i would have liked
thanks to,
- whatever `Ticker.updateEvent.connect()` does behind the scenes in
terms of (clearly) swallowing, with only log reporting, any exc raised
in the registered callback (in our case `push()`),
- `asyncio.Task.set_exception()` never working and instead needing to
resort to `Task.cancel()`, catching `CancelledError` and re-raising
the stashed `maybe_exc` from `push()` when set..
Further this ports `.to_asyncio.open_channel_from()` usage to use
the new `chan: tractor.to_asyncio.LinkedTaskChannel` fn-sig API, namely
for `_setup_quote_stream()` task. Requires the latest `tractor` updates
to the inter-eventloop-chan iface providing a `.set_nowait()` and
`.get()` for the `asyncio`-side.
Impl deats within `_setup_quote_stream()`,
- implement `push()` error-bubbling by adding a `maybe_exc` which can be
set by that callback itself or by its registering task; when set it is
both,
* reported on by the `teardown()` cb,
* re-raised by the terminated (via `.cancel()`) `asyncio.Task` after
woken from its sleep, aka "cancelled" (since that's apparently one
of the only options.. see the big-rant todo comments further down).
- add explicit error-tolerance-tuning via a `handler_tries: int` counter
and `tries_before_raise: int` limit such that we only bubble
a `push()` raised exc once enough tries have consecutively failed.
- as mentioned, use the new `chan` fn-sig support and thus the new
method API for `asyncio` -> `trio` comms.
- a big TODO XXX around the need to use a better sys for terminating
`asyncio.Task`s whether it's by delegating to some `.to_asyncio`
internals after a factor-out OR by potentially going full bore `anyio`
throughout `.to_asyncio`'s impl in general..
- mk `teardown()` use appropriate `log.<level>()`s based on outcome.
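The bubbling pattern from the list above, sketched (the real impl in
`_setup_quote_stream()` differs in detail):
```python
import asyncio

maybe_exc: BaseException | None = None
handler_tries: int = 0
tries_before_raise: int = 3

def push(ticker) -> None:
    # registered via `Ticker.updateEvent.connect()` which swallows
    # (only log-reports) any exc raised in here..
    global maybe_exc, handler_tries
    try:
        ...  # normalize + relay the quote to the trio side
        handler_tries = 0  # reset on success
    except Exception as exc:
        handler_tries += 1
        if handler_tries >= tries_before_raise:
            maybe_exc = exc  # registering task will `.cancel()` us

async def quote_task_body() -> None:
    # `.set_exception()` doesn't work here, so the task is instead
    # `.cancel()`ed and we re-raise the stashed `maybe_exc` once
    # woken from our "sleep"
    try:
        await asyncio.sleep(float('inf'))
    except asyncio.CancelledError:
        if maybe_exc is not None:
            raise maybe_exc from None
        raise
```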
Surroundingly,
- add a ton of doc-strings to mod fns previously missing them.
- improved / added-new comments to `wait_on_data_reset()` internals and
anything changed per ^above.
NOTE, resolved conflicts on `piker/brokers/ib/feed.py` due to
`brokers_refinery` commit:
d809c797 `.brokers.ib.feed`: better `tractor.to_asyncio` typing and var naming throughout!
Such that we can avoid other (pretty unreliable) "alternative" checks to
determine whether a real-time quote should be waited on or (when venue
is closed) we should just signal that historical backfilling can
commence immediately.
This has been a todo for a very long time and it turned out to be much
easier to accomplish than anticipated..
Deats,
- add a new `is_current_time_in_range()` dt range checker to predicate
whether an input range contains `datetime.now(start_dt.tzinfo)`.
- in `.ib.feed.stream_quotes()` add a `venue_is_open: bool` which uses
all of the new ^^ to determine whether to branch for the
short-circuit-and-do-history-now-case or the std real-time-quotes
should-be-awaited-since-venue-is-open, case; drop all the old hacks
trying to workaround not figuring that venue state stuff..
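The checker is basically (sig assumed from the description above):
```python
from datetime import datetime

def is_current_time_in_range(
    start_dt: datetime,
    end_dt: datetime,
) -> bool:
    # does the (venue session) range contain "now" in the
    # range's own tz?
    now = datetime.now(start_dt.tzinfo)
    return start_dt <= now <= end_dt
```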
Other,
- also add a gpt5 composed parser to `._util` for the
`ib_insync.ContractDetails.tradingHours: str` from before i realized
there was a `.tradingSessions` property XD
- in `.ib.feed`,
* add various EG-collapsings per recent tractor/trio updates.
* better logging / exc-handling around ticker quote pushes.
* stop clearing `Ticker.ticks` each quote iteration; not sure if this
is needed/correct tho?
* add masked `Ticker.ticks` poll loop that logs.
- fix some `str.format()` usage in `._util.try_xdo_manual()`
NOTE, resolved conflicts on `piker/brokers/ib/feed.py` due to
rebasing onto upstream `brokers_refinery` commit,
d809c797 `.brokers.ib.feed`: better `tractor.to_asyncio` typing and var naming throughout
You'd think they could be bothered to make either a "log" or "warning"
msg type instead of a `type='error'`.. but alas, this attempts to detect
all such "warning"-errors and never proxy them to the clearing engine
thus avoiding the cancellation of any associated (by `reqid`)
pre-existing orders (control dialogs).
Also update all surrounding log messages to a more multiline style.
Since apparently porting to the new docker container enforces using
a vnc password and `asyncvnc` seems to have a bug/mis-config whenever
i've tried a pw over a wg tunnel..?
Soo, this tries out the old `i3ipc`-win-focus + `xdo` click hack when
the above fails.
Deats,
- add a mod-level `try_xdo_manual()` to wrap calling
`i3ipc_xdotool_manual_click_hack()` with an oserr handler, ensure we
don't bother trying if `i3ipc` import fails beforehand tho.
- call ^ from both the orig case block and the failover from the
vnc-client case.
- factor the `no_setup_msg: str` out to mod level and expect it to be
`.format()`-ed.
- refresh todo around `asyncvnc` pw ish..
- add a new `i3ipc_fin_wins_titled()` window-title scanner which
predicates input `titles` and delivers any matches alongside the orig
focused win at call time.
- tweak `i3ipc_xdotool_manual_click_hack()` to call ^ and remove prior
unfactored window scanning logic.
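A sketch of the title-scanner idea (the actual
`i3ipc_fin_wins_titled()` impl/signature may differ):
```python
import i3ipc

def find_wins_titled(titles: list[str]):
    i3 = i3ipc.Connection()
    tree = i3.get_tree()
    # stash the orig focused win so callers can restore focus
    # after driving clicks on any matches
    focused = tree.find_focused()
    matches = [
        win for win in tree.leaves()
        if any(t in (win.name or '') for t in titles)
    ]
    return focused, matches
```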