Compare commits

..

47 Commits

Author SHA1 Message Date
Gud Boi a97f6c8dcf Flip `.tsp._history` logger to explicit mod-name (again) 2026-02-22 22:08:35 -05:00
Gud Boi a940018721 Adjust binance stale-bar detection to 2x tolerance
Change the stale-bar check in `.binance.feed` from `timeframe` to
`timeframe * 2` tolerance to avoid false-positive pauses when bars
are slightly delayed but still within acceptable bounds.

Styling,
- add walrus operator to capture `_time_step` for debugger
  inspection.
- add comment explaining the debug purpose of this check.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 964f207150 Replace assert with warn for no-gaps in `.storage.cli`
Change `assert aids` to a warning log when no history gaps are found
during `ldshm` gap detection; it is the **ideal case** OBVI. This avoids
crashing the CLI when gap detection finds no issues, which is actually
good news!

Bp

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi fdb3999902 .tsp._history: add gap detection in backfill loop
Add frame-gap detection when `frame_last_dt < end_dt_param` to
warn about potential venue closures or missing data during the
backfill loop in `start_backfill()`.

Deats,
- add `frame_last_dt < end_dt_param` check after frame recv
- log warnings with EST-converted timestamps for clarity
- add `await tractor.pause()` for REPL-investigation on gaps
- add TODO comment about venue closure hour checking
- capture `_until_was_none` walrus var for null-check clarity
- add `last_time` assertion for `time[-1] == next_end_dt`
- rename `_daterr` to `nodata` with `_nodata` capture

Also,
- import `pendulum.timezone` and create `est` tz instance
- change `get_logger()` import from `.data._util` to `.log`
- add parens around `(next_prepend_index - ln) < 0` check

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi d1991b3300 Guard against `None` chart in `ArrowEditor.remove()`
Add null check for `linked.chart` before calling
`.plotItem.removeItem()` to prevent `AttributeError` when chart
is `None`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi b90edf95a7 .ib.feed: only set `feed_is_live` after first quote
Move `feed_is_live.set()` to after receiving the first valid
quote instead of setting early on venue-closed path. Prevents
sampler registration when no live data expected.

Also,
- drop redundant `.set()` call in quote iteration loop
- add TODO note about sleeping until venue opens vs forever
- init `first_quote: dict` early for consistency

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi ce3d8e7a1e Only register shms w sampler when `feed_is_live`
Add timeout-gated wait for `feed_is_live: trio.Event` before passing shm
tokens to `open_sample_stream()`; skip registering shm-buffers with the
sampler if the feed doesn't "go live" within a new timeout.

The main motivation here is to avoid the sampler incrementing shm-array
bufs when the mkt-venue is closed so that a trailing "same price"
line/bars isn't updated/rendered in the chart's view when unnecessary.

Deats,
- add `wait_for_live_timeout: float = 0.5` param to `manage_history()`
- warn-log the fqme when timeout triggers
- add error log for invalid `frame_start_dt` comparisons to
  `maybe_fill_null_segments()`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi f2b04c4071 Clarify `register_with_sampler()` started type and vars
Markup `ctx.started()` type-sig as `set[int]`, rename binding var
`first` to `shm_periods` and add type hints for clarity on context mgr
unpacking.

Also,
- whitespace cleanup: `Type | None` -> `Type|None` throughout
- format long lines: `.setdefault()`, `await ctx.started()`
- fix backtick style in docstrings for consistency
- add placeholder TODO comment for `feed_is_live` check; it might be
  more rigorous to pass the syncing state down thru all this?

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi e75c3d8a34 Ignore single-zero-sample trace on no runtime.. 2026-02-22 22:08:35 -05:00
Gud Boi be4adfc202 ib.feed: drop legacy "quote-with-vlm" polling
Since now we explicitly check each mkt's venue hours now we don't need
this mega hacky "waiting on a quote with real vlm" stuff to determing
whether historical data should be loaded immediately. This approach also
had the added complexity that we needed to handle edge cases for tickers
(like xauusd.cmdty) which never have vlm.. so it's nice to be rid of it
all ;p
2026-02-22 22:08:35 -05:00
Gud Boi 763faa0cc1 Always overwrite tsdb duplicates found during backfill
Enable the previously commented-out dedupe-and-write logic in
`start_backfill()` to ensure tsdb stays clean of duplicate
entries.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 0f1f2e263d For claude, ignore no runtime for offline shm reading 2026-02-22 22:08:35 -05:00
Gud Boi fd92cd99c2 .ib._util: ignore attr err on click-hack twm wakeups? 2026-02-22 22:08:35 -05:00
Gud Boi 8c2fd7c780 Use `get_fonts()`, add `show_txt` flag to gap annots
Switch `.tsp._annotate.markup_gaps()` to use new
`.ui._style.get_fonts()` API for font size calc on client side and add
optional `show_txt: bool` flag to toggle gap duration labels (with
default `False`).

Also,
- replace `sgn` checks with named bools: `up_gap`, `down_gap`
- use `small_font.px_size - 1` for gap label font sizing
- wrap text creation in `if show_txt:` block
- update IPC handler to use `get_fonts()` vs direct `_font` import

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 1016f54c98 Add `get_fonts()` API and fix `.px_size` for non-Qt ctxs
Add a public `.ui._style.get_fonts()` helper to retrieve the
`_font[_small]: DpiAwareFont` singleton pair. Adjust
`DpiAwareFont.px_size` to return `conf.toml` value when Qt returns `-1`
(no active Qt app).

Also,
- raise `ValueError` with detailed msg if both Qt and a conf-lookup fail
- add some more type union whitespace cleanups: `int | None` -> `int|None`

(this commit-msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi b4944916c9 Relay annot creation failures with err-dict resps
Change annot-ctl APIs to return `None` on failure instead of invalid
`aid`s. Server now sends `{'error': msg}` dict on failures, client
match-blocks handle gracefully.

Also,
- update return types: `.add_rect()`, `.add_arrow()`, `.add_text()`
  now return `int|None`
- match on `{'error': str(msg)}` in client IPC receive blocks
- send error dicts from server on timestamp lookup failures
- add failure handling in `markup_gaps()` to skip bad rects

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 3f001cc1f6 Do time-based shm-index lookup for annots on server
Fix annotation misalignment during backfill by switching from
client-computed indices to server-side timestamp lookups against
current shm state. Store absolute coords on annotations and
reposition on viz redraws.

Lowlevel impl deats,
- add `time` param to `.add_arrow()`, `.add_text()`, `.add_rect()`
- lookup indices from shm via timestamp matching in IPC handlers
- force chart redraw before `markup_gaps()` annotation creation
- wrap IPC send/receive in `trio.fail_after(3)` for timeout when
  server fails to respond, likely hangs on no-case-match/error.
- cache `_meth`/`_kwargs` on rects, `_abs_x`/`_abs_y` on arrows
- auto-reposition all annotations after viz reset in redraw cmd

Also,
- handle `KeyError` for missing timeframes in chart lookup
- return `-1` aid on annotation creation failures (lol oh `claude`..)
- reconstruct rect positions from timestamps + BGM offset logic
- log repositioned annotation counts on viz redraw

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 0845b257d9 Add buffer capacity checks to backfill loop
Prevent `ValueError` from negative prepend index in
`start_backfill()` by checking buffer space before push
attempts. Truncate incoming frame if needed and stop gracefully
when buffer full.

Also,
- add pre-push capacity check with frame truncation logic
- stop backfill when `next_prepend_index <= 0`
- log warnings for capacity exceeded and buffer-full conditions

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 7964cc3cf4 Drop decimal points for whole-number durations
Adjust `humanize_duration()` to show "3h" instead of "3.0h" when the
duration value is a whole number, making labels cleaner.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 4e7e4a7a1b Add `font_size` param to `AnnotCtl.add_text()` API
Expose font sizing control for `pg.TextItem` annotations thru the
annot-ctl API. Default to `_font.font.pixelSize() - 3` when no
size provided.

Also,
- thread `font_size` param thru IPC handler in `serve_rc_annots()`
- apply font via `QFont.setPixelSize()` on text item creation
- add `?TODO` note in `markup_gaps()` re using `conf.toml` value
- update `add_text()` docstring with font_size param desc

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi fe11f79f21 Add humanized duration labels to gap annotations
Introduce `humanize_duration()` helper in `.tsp._annotate` to
convert seconds to short human-readable format (d/h/m/s). Extend
annot-ctl API with `add_text()` method for placing `pg.TextItem`
labels on charts.

Also,
- add duration labels on RHS of gap arrows in `markup_gaps()`
- handle text item removal in `rm_annot()` match block
- expose `TextItem` cmd in `serve_rc_annots()` IPC handler
- use `hcolor()` for named-to-hex color conversion
- set anchor positioning for up vs down gaps

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:35 -05:00
Gud Boi 40cbc8546d .ib.feed: trim bars frame to `start_dt` 2026-02-22 22:08:35 -05:00
Gud Boi e3d7077f18 ib._util: ignore timeout-errs when crash-handling `pyvnc` connects 2026-02-22 22:08:35 -05:00
Gud Boi 7ddcf5893e Lul, woops compare against first-dt in `.ib.feed` bars frame.. 2026-02-22 22:08:35 -05:00
Gud Boi b1d6c595ec Expose more `pg.ArrowItem` params thru annot-ctl API 2026-02-22 22:08:35 -05:00
Gud Boi 33ec37a83f Add `pexpect`, `xonsh`@github:main to deps
The former bc `claude` needs it for its new "offline" REPL simulator
script `snippets/claude_debug_helper.py` and pin to `xonsh` git mainline
to get the fancy new next cmd/suggestion prompt feats (which @goodboy is
using from `modden` already). Bump lock file to match.

Ah right, and for now while hackin pin to a local `tractor` Bp
2026-02-22 22:08:35 -05:00
Gud Boi e6c7834a01 Add break for single bar null segments 2026-02-22 22:08:35 -05:00
Gud Boi 4ef5a5beb8 Space gap rect-annots "between" start-end bars 2026-02-22 22:08:35 -05:00
Gud Boi 11e95d9cbf Catch too-early ib hist frames
For now by REPLing them and raising an RTE inside `.ib.feed` as well as
tracing any such cases that make it (from other providers) up to the
`.tsp._history` layer during null-segment backfilling.
2026-02-22 22:08:34 -05:00
Gud Boi 51ca9cd4d9 Add arrow indicators to time gaps
Such that they're easier to spot when zoomed out, a similar color to the
`RectItem`s and also remote-controlled via the `AnnotCtl` api.

Deats,
- request an arrow per gap from `markup_gaps()` using a new
  `.add_arrow()` meth, set the color, direction and alpha with
  position always as the `iend`/close of the last valid bar.
- extend the `.ui._remote_ctl` subys to support the above,
  * add a new `AnnotCtl.add_arrow()`.
  * add the service-side IPC endpoint for a 'cmd': 'ArrowEditor'.
- add a new `rm_annot()` helper to ensure the right graphics removal
  API is used by annotation type:
  * `pg.ArrowItem` looks up the `ArrowEditor` and calls `.remove(annot).
  * `pg.SelectRect` keeps with calling `.delete()`.
- global-ize an `_editors` table to enable the prior.
- add an explicit RTE for races on the chart-actor's `_dss` init.
2026-02-22 22:08:34 -05:00
Gud Boi 27d077ade5 Arrow editor refinements in prep for gap checker
Namely exposing `ArrowEditor.add()` params to provide access to
coloring/transparency settings over the remote-ctl annotation API and
also adding a new `.remove_all()` to easily clear all arrows from
a single call. Also add `.remove()` compat methods to the other editors
(i.e. for lines, rects).
2026-02-22 22:08:34 -05:00
Gud Boi 9257af02b9 Mv `markup_gaps()` to new `.tsp._annotate` mod 2026-02-22 22:08:34 -05:00
Gud Boi 582f9be02f Enable tracing back insert backfills
Namely insertion writes which over-fill the shm buffer past the latest
tsdb sample via `.tsp._history.shm_push_in_between()`.

Deats,
- check earliest `to_push` timestamp and enter pause point if it's
  earlier then the tsdb's `backfill_until_dt` stamp.
- requires actually passing the `backfill_until_dt: datetime` thru,
  * `get_null_segs()`
  * `maybe_fill_null_segments()`
  * `shm_push_in_between()` (obvi XD)
2026-02-22 22:08:34 -05:00
Gud Boi 5f6e24f55c Tolerate various "bad data" cases in `markup_gaps()`
Namely such that when the previous-df-row by our shm-abs-'index' doesn't
exist we ignore certain cases which are likely due to borked-but-benign
samples written to the tsdb or rt shm buffers prior.

Particularly we now ignore,
- any `dt`/`prev_dt` values which are UNIX-epoch timestamped (val of 0).
- any row-is-first-row in the df; there is no previous.
- any missing previous datum by 'index', in which case we lookup the
  `wdts` prior row and use that instead.
  * this would indicate a missing sample for the time-step but we can
    still detect a "gap" by looking at the prior row, by df-abs-index
    `i`, and use its timestamp to determine the period/size of missing
    samples (which need to likely still be retrieved).
  * in this case i'm leaving in a pause-point for introspecting these
    rarer cases when `--pdb` is passed via CLI.

Relatedly in the `piker store` CLI ep,
- add `--pdb` flag to `piker store`, pass it verbatim as `debug_mode`.
- when `times` has only a single row, don't calc a `period_s` median.
- only trace `null_segs` when in debug mode.
- always markup/dedupe gaps for `period_s==60`
2026-02-22 22:08:34 -05:00
Gud Boi bd418078ca ib: up API timeout default for remote host conns 2026-02-22 22:08:34 -05:00
Gud Boi d5af471192 Add vlm-based "smart" OHLCV de-duping & bar validation
Using `claude`, add a `.tsp._dedupe_smart` module that attemps "smarter"
duplicate bars by attempting to distinguish between erroneous bars
partially written during concurrent backfill race conditions vs.
**actual** data quality issues from historical providers.

Problem:
--------
Concurrent writes (live updates vs. backfilling) can result in create
duplicate timestamped ohlcv vars with different values. Some
potential scenarios include,

- a market live feed is cancelled during live update resulting in the
  "last" datum being partially updated with all the ticks for the
  time step.
- when the feed is rebooted during charting, the backfiller will not
  finalize this bar since rn it presumes it should only fill data for
  time steps not already in the tsdb storage.

Our current naive  `.unique()` approach obvi keeps the incomplete bar
and a "smarter" approach is to compare the provider's final vlm
amount vs. the maybe-cancelled tsdb's bar; a higher vlm value from
the provider likely indicates the cancelled-during-live-write and
**not** a datum discrepancy from said data provider.

Analysis (with `claude`) of `zecusdt` data revealed:
- 1000 duplicate timestamps
- 999 identical bars (pure duplicates from 2022 backfill overlap)
- 1 volume-monotonic conflict (live partial vs backfill complete)

A soln from `claude` -> `tsp._dedupe_smart.dedupe_ohlcv_smart()`
which:
- sorts by vlm **before** deduplication and keep the most complete
  bar based on vlm monotonicity as well as the following OHLCV
  validation assumptions:
  * volume should always increase
  * high should be non-decreasing,
  * low should be non-increasing
  * open should be identical
- Separates valid race conditions from provider data quality issues
  and reports and returns both dfs.

Change summary by `claude`:
- `.tsp._dedupe_smart`: new module with validation logic
- `.tsp.__init__`: expose `dedupe_ohlcv_smart()`
- `.storage.cli`: integrate smart dedupe, add logging for:
  * duplicate counts (identical vs monotonic races)
  * data quality violations (non-monotonic, invalid OHLC ranges)
  * warnings for provider data issues
- Remove `assert not diff` (duplicates are valid now)

Verified on `zecusdt`: correctly keeps index 3143645
(volume=287.777) over 3143644 (volume=140.299) for
conflicting 2026-01-16 18:54 UTC bar.

`claude`'s Summary of reasoning
-------------------------------
- volume monotonicity is critical: a bar's volume only increases
  during its time window.
- a backfilled bar should always have volume >= live updated.
- violations indicate any of:
  * Provider data corruption
  * Non-OHLCV aggregation semantics
  * Timestamp misalignment

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:34 -05:00
Gud Boi 6c28b1cbbc Add `pexpect`-based `pdbp`-REPL offline helper
Add a new `snippets/claude_debug_helper.py` to
provide a programmatic interface to `tractor.pause()` debugger
sessions for incremental data inspection matching the interactive UX
but able to be run by `claude` "offline" since it can't seem to feed
stdin (so it claims) to the `pdb` instance due to lack of ability to
allocate a tty internally.

The script-wrapper is based on `tractor`'s `tests/devx/` suite's use of
`pexpect` patterns for driving `pdbp` prompts and thus enables
automated-offline execution of REPL-inspection commands **without**
using incremental-realtime output capture (like a human would use it).

Features:
- `run_pdb_commands()`: batch command execution
- `InteractivePdbSession`: context manager for step-by-step REPL interaction
- `expect()` wrapper: timeout handling with buffer display
- Proper stdin/stdout handling via `pexpect.spawn()`

Example usage:
```python
from debug_helper import InteractivePdbSession

with InteractivePdbSession(
    cmd='piker store ldshm zecusdt.usdtm.perp.binance'
) as session:
    session.run('deduped.shape')
    session.run('step_gaps.shape')
```

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:34 -05:00
Gud Boi 8af8ac4f7b Fix polars 1.36.0 duration API
Polars tightened type safety for `.dt` accessor methods requiring
`total_*` methods for duration types vs datetime component accessors
like `day()` which now only work on datetime dtypes.

`detect_time_gaps()` in `.tsp._anal` was calling `.dt.day()`
on `dt_diff` column (a duration from `.diff()`) which throws
`InvalidOperationError` on modern polars.

Changes:
- use f-string to add pluralization to map time unit strings to
  `total_<unit>s` form for the new duration API.
- Handle singular/plural forms: 'day' -> 'days' -> 'total_days'
- Ensure trailing 's' before applying 'total_' prefix

Also updates inline comments explaining the polars type distinction
between datetime components vs duration totals.

Fixes `piker store ldshm` crashes on datasets with time gaps.

(this patch was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 22:08:34 -05:00
Tyler Goodlet f4d9090d6d `.storage.__init__`: code styling updates 2026-02-22 22:08:34 -05:00
Tyler Goodlet b0953ecbee `.tsp._history`: drop `feed_is_live` syncing, another seg flag
The `await feed_is_live.wait()` is more or less pointless and would only
cause slower startup afaig (as-far-as-i-grok) so i'm masking it here.
This also removes the final `strict_exception_groups=False` use from the
non-tests code base, flipping to the `tractor.trionics` collapser once
and for all!
2026-02-22 22:08:34 -05:00
Tyler Goodlet b5e4c83341 Woops, keep `np2pl` exposed from `.tsp` 2026-02-22 22:08:34 -05:00
Tyler Goodlet 0ef98ba25c Factor to a new `.tsp._history` sub-mod
Cleaning out the `piker.tsp` pkg-mod to be only the (re)exports needed
for `._anal`/`._history` refs-use elsewhere!
2026-02-22 22:08:34 -05:00
Gud Boi 6b70fea5d4 Merge pull request 'Tpt-tolerance adjustments for latest `tractor`' (#73)
Reviewed-on: #73
2026-02-23 03:08:18 +00:00
Gud Boi 4e24cb1bff Adjust sampler's "IPC-dropped" log msg styling
Refmt the "connection-dropped" error-log in `Sampler`'s broadcast loop
to show error type first, then the IPC context details; mks it all
easier to grok/less-noisy on console imo.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 20:08:20 -05:00
Gud Boi 3d83b61f3f Wrap `open_autorecon_ws()` body for comms failures
Add outer `try/except` around the nursery block in
`open_autorecon_ws()` to catch any `NoBsWs.recon_errors` that
escape the inner reconnect loop, logging a warning instead of
propagating.

Also,
- correct `NoBsWs.recon_errors` typing to `tuple[Type[Exception]]`.

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 20:08:20 -05:00
Gud Boi 6f390dc88c Add timeout + shielding to `NoBsWs` reconnect logic
Add timeout param to `.reset()` and `.send_msg()` to prevent
indefinite blocking on reconnect attempts. Shield reconnect
sleeps from cancellation to ensure we avoid any "finally footgun" type
scenarios where `trio.Cancelled` masks an underlying exc per,
- https://github.com/goodboy/tractor/pull/387
- https://github.com/goodboy/tractor/pull/391

Deats,
- add `timeout` param to `.reset()`, return `bool` for success
- add `timeout=3` default to `.send_msg()` for reconnect wait
- shield `.reset()` call in `.send_msg()` error handler
- log warning when reconnect timeout exceeded
- shield throttled sleeps in `_reconnect_forever()` error paths

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 20:08:20 -05:00
Gud Boi e1f3d7c3f8 Handle `tractor.TransportClosed` as "stream-closed"
In both the ems and sampler since on new `tractor` this is the
"wrapping" exception raised when the transport layer terminates early
but in a psuedo-"graceful" way, expected when a peer actors disconnect.
Previously we were crashing in this case since old `tractor` just raised
the underlying `trio`-source-exceptions verbatim.

Also,
- use `Aid.reprol()` in log msgs vs old `.chan.uid` refs

(this commit msg was generated in some part by [`claude-code`][claude-code-gh])
[claude-code-gh]: https://github.com/anthropics/claude-code
2026-02-22 20:08:20 -05:00
3 changed files with 64 additions and 34 deletions

View File

@ -729,6 +729,7 @@ class Router(Struct):
except ( except (
trio.ClosedResourceError, trio.ClosedResourceError,
trio.BrokenResourceError, trio.BrokenResourceError,
tractor.TransportClosed,
): ):
to_remove.add(client_stream) to_remove.add(client_stream)
log.warning( log.warning(
@ -1699,5 +1700,5 @@ async def _emsd_main(
if not client_streams: if not client_streams:
log.warning( log.warning(
f'Order dialog is not being monitored:\n' f'Order dialog is not being monitored:\n'
f'{oid} ->\n{client_stream._ctx.chan.uid}' f'{oid!r} <-> {client_stream.chan.aid.reprol()}\n'
) )

View File

@ -99,6 +99,7 @@ class Sampler:
trio.BrokenResourceError, trio.BrokenResourceError,
trio.ClosedResourceError, trio.ClosedResourceError,
trio.EndOfChannel, trio.EndOfChannel,
tractor.TransportClosed,
) )
# holds all the ``tractor.Context`` remote subscriptions for # holds all the ``tractor.Context`` remote subscriptions for
@ -291,9 +292,10 @@ class Sampler:
except self.bcast_errors as err: except self.bcast_errors as err:
log.error( log.error(
f'Connection dropped for IPC ctx\n' f'Connection dropped for IPC ctx due to,\n'
f'{stream._ctx}\n\n' f'{type(err)!r}\n'
f'Due to {type(err)}' f'\n'
f'{stream._ctx}'
) )
borked.add(stream) borked.add(stream)
else: else:
@ -753,7 +755,7 @@ async def sample_and_broadcast(
log.warning( log.warning(
f'Feed OVERRUN {sub_key}' f'Feed OVERRUN {sub_key}'
f'@{bus.brokername} -> \n' f'@{bus.brokername} -> \n'
f'feed @ {chan.uid}\n' f'feed @ {chan.aid.reprol()}\n'
f'throttle = {throttle} Hz' f'throttle = {throttle} Hz'
) )

View File

@ -31,6 +31,7 @@ from typing import (
AsyncContextManager, AsyncContextManager,
AsyncGenerator, AsyncGenerator,
Iterable, Iterable,
Type,
) )
import json import json
@ -67,7 +68,7 @@ class NoBsWs:
''' '''
# apparently we can QoS for all sorts of reasons..so catch em. # apparently we can QoS for all sorts of reasons..so catch em.
recon_errors = ( recon_errors: tuple[Type[Exception]] = (
ConnectionClosed, ConnectionClosed,
DisconnectionTimeout, DisconnectionTimeout,
ConnectionRejected, ConnectionRejected,
@ -105,7 +106,10 @@ class NoBsWs:
def connected(self) -> bool: def connected(self) -> bool:
return self._connected.is_set() return self._connected.is_set()
async def reset(self) -> None: async def reset(
self,
timeout: float,
) -> bool:
''' '''
Reset the underlying ws connection by cancelling Reset the underlying ws connection by cancelling
the bg relay task and waiting for it to signal the bg relay task and waiting for it to signal
@ -114,18 +118,31 @@ class NoBsWs:
''' '''
self._connected = trio.Event() self._connected = trio.Event()
self._cs.cancel() self._cs.cancel()
with trio.move_on_after(timeout) as cs:
await self._connected.wait() await self._connected.wait()
return True
assert cs.cancelled_caught
return False
async def send_msg( async def send_msg(
self, self,
data: Any, data: Any,
timeout: float = 3,
) -> None: ) -> None:
while True: while True:
try: try:
msg: Any = self._dumps(data) msg: Any = self._dumps(data)
return await self._ws.send_message(msg) return await self._ws.send_message(msg)
except self.recon_errors: except self.recon_errors:
await self.reset() with trio.CancelScope(shield=True):
reconnected: bool = await self.reset(
timeout=timeout,
)
if not reconnected:
log.warning(
'Failed to reconnect after {timeout!r}s ??'
)
async def recv_msg(self) -> Any: async def recv_msg(self) -> Any:
msg: Any = await self._rx.receive() msg: Any = await self._rx.receive()
@ -191,7 +208,9 @@ async def _reconnect_forever(
f'{src_mod}\n' f'{src_mod}\n'
f'{url} connection bail with:' f'{url} connection bail with:'
) )
with trio.CancelScope(shield=True):
await trio.sleep(0.5) await trio.sleep(0.5)
rent_cs.cancel() rent_cs.cancel()
# go back to reonnect loop in parent task # go back to reonnect loop in parent task
@ -291,6 +310,7 @@ async def _reconnect_forever(
log.exception( log.exception(
'Reconnect-attempt failed ??\n' 'Reconnect-attempt failed ??\n'
) )
with trio.CancelScope(shield=True):
await trio.sleep(0.2) # throttle await trio.sleep(0.2) # throttle
raise berr raise berr
@ -351,6 +371,7 @@ async def open_autorecon_ws(
rcv: trio.MemoryReceiveChannel rcv: trio.MemoryReceiveChannel
snd, rcv = trio.open_memory_channel(616) snd, rcv = trio.open_memory_channel(616)
try:
async with ( async with (
tractor.trionics.collapse_eg(), tractor.trionics.collapse_eg(),
trio.open_nursery() as tn trio.open_nursery() as tn
@ -378,6 +399,12 @@ async def open_autorecon_ws(
finally: finally:
tn.cancel_scope.cancel() tn.cancel_scope.cancel()
except NoBsWs.recon_errors as con_err:
log.warning(
f'Entire ws-channel disconnect due to,\n'
f'con_err: {con_err!r}\n'
)
''' '''
JSONRPC response-request style machinery for transparent multiplexing JSONRPC response-request style machinery for transparent multiplexing