`.accounting._ledger`: typing anda more multiline styling

Drop some bps and style logic to multiline
`.accounting` add synopsis section to readme
2025-02-19 17:56:10 -05:00 · 2025-02-19 17:56:10 -05:00 · 2025-02-19 17:56:10 -05:00 · 2025-02-19 17:52:14 -05:00 · 2025-02-19 17:31:10 -05:00 · 2025-02-19 17:31:10 -05:00
15 changed files with 355 additions and 154 deletions
--- a/piker/accounting/README.rst
+++ b/piker/accounting/README.rst
@ -1,8 +1,40 @@
-.accounting
-----------
+piker.accounting
+________________
 A subsystem for transaction processing, storage and historical
 measurement.

+synopsis
+--------
+The big question for any trader is this:
+
+*what is the price that determines whether i take a loss or a gain on my
+trade?*
+
+In other words, at any given state of accounting your current assets,
+what is the price between any 2 assets you've transacted that determines
+at which price you can conduct **the next** transaction and know if you
+are making or losing more (or less) of the *source* asset versus the
+*destination* asset?
+
+Let's do a very simple example:
+
+> Joe wants to buy some tacos bc they're super hungo.
+> Joe has a friend who also likes tacos but doesn't mind if they're fresh; he doesn't mind having day old tacos.
+> Inflation is rampant and taco prices are trending up for no good reason besides everyone thinks prices are going up.
+> Joe goes to the taco stand and buys 4 tacos at 25 mxn.
+> This makes Joe's net cost `4 * 25 = 200` mxn.
+> Joe eats 3 tacos and realizes that he can't finish the last, so he puts it in the fridge to save for the next day (since he owns a comal).
+> The next day the price of tacos goes up to 30 mxn (for no good reason > besides the taco stand noticing Joe is a tourist and that > "inflation" is some thing that's used as an excuse for price changes).
+> Joe's friend from before got lit up (like he does every morning) and msgs Joe to buy him 2 tacos for when he shows up in the late morning.
+> Joe says "sure, but i also have a leftover if you want it, and I'm fasting today so you can have my sobras and i'll buy you a new one".
+> The friend coughs a couple times, and says "yee no problem man, just make sure you get them"
+>
+
+
+Prior *suit* definitions:
+
+- the canucks equiv of the IRS call this idea ["Adjusted cost base"](https://www.canada.ca/en/revenue-agency/services/tax/individuals/topics/about-your-tax-return/tax-return/completing-a-tax-return/personal-income/line-12700-capital-gains/definitions-capital-gains.html#Adjustedcostbase)
+

 .pnl
 ----
--- a/piker/accounting/_ledger.py
+++ b/piker/accounting/_ledger.py
@ -40,7 +40,7 @@ import tomli_w  # for fast ledger writing

 from piker.types import Struct
 from piker import config
-from ..log import get_logger
+from piker.log import get_logger
 from .calc import (
    iter_by_dt,
 )
@ -239,7 +239,9 @@ class TransactionLedger(UserDict):

        symcache: SymbologyCache = self._symcache
        towrite: dict[str, Any] = {}
-        for tid, txdict in self.tx_sort(self.data.copy()):
+        for tid, txdict in self.tx_sort(
+            self.data.copy()
+        ):
            # write blank-str expiry for non-expiring assets
            if (
                'expiry' in txdict
@ -377,7 +379,7 @@ def open_trade_ledger(
        account,
        dirpath=_fp,
    )
-    cpy = ledger_dict.copy()
+    cpy: dict = ledger_dict.copy()

    # XXX NOTE: if not provided presume we are being called from
    # sync code and need to maybe run `trio` to generate..
@ -406,7 +408,13 @@ def open_trade_ledger(
        account=account,
        mod=mod,
        symcache=symcache,
-        tx_sort=getattr(mod, 'tx_sort', tx_sort),
+
+        # NOTE: allow backends to provide custom ledger sorting
+        tx_sort=getattr(
+            mod,
+            'tx_sort',
+            tx_sort,
+        ),
    )
    try:
        yield ledger
--- a/piker/accounting/_mktinfo.py
+++ b/piker/accounting/_mktinfo.py
@ -305,8 +305,8 @@ class MktPair(Struct, frozen=True):
    # config right?
    # src_type: AssetTypeName

-    # for derivs, info describing contract, egs.
-    # strike price, call or put, swap type, exercise model, etc.
+    # for derivs, info describing contract, egs. strike price, call
+    # or put, swap type, exercise model, etc.
    contract_info: list[str] | None = None

    # TODO: rename to sectype since all of these can
--- a/piker/accounting/_pos.py
+++ b/piker/accounting/_pos.py
@ -30,7 +30,8 @@ from types import ModuleType
 from typing import (
    Any,
    Iterator,
-    Generator
+    Generator,
+    TYPE_CHECKING,
 )

 import pendulum
@ -59,8 +60,10 @@ from ..clearing._messages import (
    BrokerdPosition,
 )
 from piker.types import Struct
-from piker.data._symcache import SymbologyCache
-from ..log import get_logger
+from piker.log import get_logger
+
+if TYPE_CHECKING:
+    from piker.data._symcache import SymbologyCache

 log = get_logger(__name__)

@ -493,6 +496,17 @@ class Account(Struct):

        _mktmap_table: dict[str, MktPair] | None = None,

+        only_require: list[str]|True = True,
+        # ^list of fqmes that are "required" to be processed from
+        # this ledger pass; we often don't care about others and
+        # definitely shouldn't always error in such cases.
+        # (eg. broker backend loaded that doesn't yet supsport the
+        # symcache but also, inside the paper engine we don't ad-hoc
+        # request `get_mkt_info()` for every symbol in the ledger,
+        # only the one for which we're simulating against).
+        # TODO, not sure if there's a better soln for this, ideally
+        # all backends get symcache support afap i guess..
+
    ) -> dict[str, Position]:
        '''
        Update the internal `.pps[str, Position]` table from input
@ -535,11 +549,32 @@ class Account(Struct):
                if _mktmap_table is None:
                    raise

+                required: bool = (
+                    only_require is True
+                    or (
+                        only_require is not True
+                        and
+                        fqme in only_require
+                    )
+                )
                # XXX: caller is allowed to provide a fallback
                # mktmap table for the case where a new position is
                # being added and the preloaded symcache didn't
                # have this entry prior (eg. with frickin IB..)
-                mkt = _mktmap_table[fqme]
+                if (
+                    not (mkt := _mktmap_table.get(fqme))
+                    and
+                    required
+                ):
+                    raise
+
+                elif not required:
+                    continue
+
+                else:
+                    # should be an entry retreived somewhere
+                    assert mkt
+

            if not (pos := pps.get(bs_mktid)):

@ -656,7 +691,7 @@ class Account(Struct):
    def write_config(self) -> None:
        '''
        Write the current account state to the user's account TOML file, normally
-        something like ``pps.toml``.
+        something like `pps.toml`.

        '''
        # TODO: show diff output?
--- a/piker/accounting/calc.py
+++ b/piker/accounting/calc.py
@ -251,10 +251,16 @@ def iter_by_dt(
        for k in parsers:
            if (
                isdict and k in tx
-                 or getattr(tx, k, None)
+                or
+                getattr(tx, k, None)
            ):
-                v = tx[k] if isdict else tx.dt
-                assert v is not None, f'No valid value for `{k}`!?'
+                v = (
+                    tx[k] if isdict
+                    else tx.dt
+                )
+                assert v is not None, (
+                    f'No valid value for `{k}`!?'
+                )

                # only call parser on the value if not None from
                # the `parsers` table above (when NOT using
@ -269,8 +275,21 @@ def iter_by_dt(
                    return v

        else:
-            # XXX: should never get here..
-            breakpoint()
+            # TODO: move to top?
+            from piker.log import get_logger
+            log = get_logger(__name__)
+
+            # XXX: we should really never get here..
+            # only if a ledger record has no expected sort(able)
+            # field will we likely hit this.. like with ze IB.
+            # if no sortable field just deliver epoch?
+            log.warning(
+                'No (time) sortable field for TXN:\n'
+                f'{tx}\n'
+            )
+            return from_timestamp(0)
+            # breakpoint()
+

    entry: tuple[str, dict] | Transaction
    for entry in sorted(
--- a/piker/accounting/cli.py
+++ b/piker/accounting/cli.py
@ -300,7 +300,8 @@ def disect(
        assert not df.is_empty()

        # muck around in pdbp REPL
-        breakpoint()
+        # tractor.devx.mk_pdb().set_trace()
+        # breakpoint()

        # TODO: we REALLY need a better console REPL for this
        # kinda thing..
--- a/piker/clearing/_paper_engine.py
+++ b/piker/clearing/_paper_engine.py
@ -653,6 +653,7 @@ async def open_trade_dialog(
                # in) use manually constructed table from calling
                # the `.get_mkt_info()` provider EP above.
                _mktmap_table=mkt_by_fqme,
+                only_require=list(mkt_by_fqme),
            )

            pp_msgs: list[BrokerdPosition] = []
--- a/piker/data/_symcache.py
+++ b/piker/data/_symcache.py
@ -31,6 +31,7 @@ from pathlib import Path
 from pprint import pformat
 from typing import (
    Any,
+    Callable,
    Sequence,
    Hashable,
    TYPE_CHECKING,
@ -56,7 +57,7 @@ from piker.brokers import (
 )

 if TYPE_CHECKING:
-    from ..accounting import (
+    from piker.accounting import (
        Asset,
        MktPair,
    )
@ -149,57 +150,68 @@ class SymbologyCache(Struct):
                    'Implement `Client.get_assets()`!'
                )

-            if get_mkt_pairs := getattr(client, 'get_mkt_pairs', None):
-
-                pairs: dict[str, Struct] = await get_mkt_pairs()
-                for bs_fqme, pair in pairs.items():
-
-                    # NOTE: every backend defined pair should
-                    # declare it's ns path for roundtrip
-                    # serialization lookup.
-                    if not getattr(pair, 'ns_path', None):
-                        raise TypeError(
-                            f'Pair-struct for {self.mod.name} MUST define a '
-                            '`.ns_path: str`!\n'
-                            f'{pair}'
-                        )
-
-                    entry = await self.mod.get_mkt_info(pair.bs_fqme)
-                    if not entry:
-                        continue
-
-                    mkt: MktPair
-                    pair: Struct
-                    mkt, _pair = entry
-                    assert _pair is pair, (
-                        f'`{self.mod.name}` backend probably has a '
-                        'keying-symmetry problem between the pair-`Struct` '
-                        'returned from `Client.get_mkt_pairs()`and the '
-                        'module level endpoint: `.get_mkt_info()`\n\n'
-                        "Here's the struct diff:\n"
-                        f'{_pair - pair}'
-                    )
-                    # NOTE XXX: this means backends MUST implement
-                    # a `Struct.bs_mktid: str` field to provide
-                    # a native-keyed map to their own symbol
-                    # set(s).
-                    self.pairs[pair.bs_mktid] = pair
-
-                    # NOTE: `MktPair`s are keyed here using piker's
-                    # internal FQME schema so that search,
-                    # accounting and feed init can be accomplished
-                    # a sane, uniform, normalized basis.
-                    self.mktmaps[mkt.fqme] = mkt
-
-                self.pair_ns_path: str = tractor.msg.NamespacePath.from_ref(
-                    pair,
-                )
-
-            else:
+            get_mkt_pairs: Callable|None = getattr(
+                client,
+                'get_mkt_pairs',
+                None,
+            )
+            if not get_mkt_pairs:
                log.warning(
                    'No symbology cache `Pair` support for `{provider}`..\n'
                    'Implement `Client.get_mkt_pairs()`!'
                )
+                return self
+
+            pairs: dict[str, Struct] = await get_mkt_pairs()
+            if not pairs:
+                log.warning(
+                    'No pairs from intial {provider!r} sym-cache request?\n\n'
+                    '`Client.get_mkt_pairs()` -> {pairs!r} ?'
+                )
+                return self
+
+            for bs_fqme, pair in pairs.items():
+                if not getattr(pair, 'ns_path', None):
+                    # XXX: every backend defined pair must declare
+                    # a `.ns_path: tractor.NamespacePath` to enable
+                    # roundtrip serialization lookup from a local
+                    # cache file.
+                    raise TypeError(
+                        f'Pair-struct for {self.mod.name} MUST define a '
+                        '`.ns_path: str`!\n\n'
+                        f'{pair!r}'
+                    )
+
+                entry = await self.mod.get_mkt_info(pair.bs_fqme)
+                if not entry:
+                    continue
+
+                mkt: MktPair
+                pair: Struct
+                mkt, _pair = entry
+                assert _pair is pair, (
+                    f'`{self.mod.name}` backend probably has a '
+                    'keying-symmetry problem between the pair-`Struct` '
+                    'returned from `Client.get_mkt_pairs()`and the '
+                    'module level endpoint: `.get_mkt_info()`\n\n'
+                    "Here's the struct diff:\n"
+                    f'{_pair - pair}'
+                )
+                # NOTE XXX: this means backends MUST implement
+                # a `Struct.bs_mktid: str` field to provide
+                # a native-keyed map to their own symbol
+                # set(s).
+                self.pairs[pair.bs_mktid] = pair
+
+                # NOTE: `MktPair`s are keyed here using piker's
+                # internal FQME schema so that search,
+                # accounting and feed init can be accomplished
+                # a sane, uniform, normalized basis.
+                self.mktmaps[mkt.fqme] = mkt
+
+            self.pair_ns_path: str = tractor.msg.NamespacePath.from_ref(
+                pair,
+            )

        return self

--- a/piker/data/feed.py
+++ b/piker/data/feed.py
@ -786,7 +786,6 @@ async def install_brokerd_search(

@acm
 async def maybe_open_feed(
-
    fqmes: list[str],
    loglevel: str | None = None,

@ -840,13 +839,12 @@ async def maybe_open_feed(

@acm
 async def open_feed(
-
    fqmes: list[str],

-    loglevel: str | None = None,
+    loglevel: str|None = None,
    allow_overruns: bool = True,
    start_stream: bool = True,
-    tick_throttle: float | None = None,  # Hz
+    tick_throttle: float|None = None,  # Hz

    allow_remote_ctl_ui: bool = False,

--- a/piker/data/flows.py
+++ b/piker/data/flows.py
@ -36,10 +36,10 @@ from ._sharedmem import (
    ShmArray,
    _Token,
 )
+from piker.accounting import MktPair

 if TYPE_CHECKING:
-    from ..accounting import MktPair
-    from .feed import Feed
+    from piker.data.feed import Feed


 class Flume(Struct):
@ -82,7 +82,7 @@ class Flume(Struct):

    # TODO: do we need this really if we can pull the `Portal` from
    # ``tractor``'s internals?
-    feed: Feed | None = None
+    feed: Feed|None = None

    @property
    def rt_shm(self) -> ShmArray:
--- a/piker/data/validate.py
+++ b/piker/data/validate.py
@ -113,9 +113,9 @@ def validate_backend(
            )
            if ep is None:
                log.warning(
-                    f'Provider backend {mod.name} is missing '
-                    f'{daemon_name} support :(\n'
-                    f'The following endpoint is missing: {name}'
+                    f'Provider backend {mod.name!r} is missing '
+                    f'{daemon_name!r} support?\n'
+                    f'|_module endpoint-func missing: {name!r}\n'
                )

    inits: list[
--- a/piker/storage/cli.py
+++ b/piker/storage/cli.py
@ -386,6 +386,8 @@ def ldshm(
            open_annot_ctl() as actl,
        ):
            shm_df: pl.DataFrame | None = None
+            tf2aids: dict[float, dict] = {}
+
            for (
                shmfile,
                shm,
@ -526,16 +528,17 @@ def ldshm(
                            new_df,
                            step_gaps,
                        )
-
                        # last chance manual overwrites in REPL
-                        await tractor.pause()
+                        # await tractor.pause()
                        assert aids
+                        tf2aids[period_s] = aids

                else:
                    # allow interaction even when no ts problems.
-                    await tractor.pause()
-                    # assert not diff
+                    assert not diff

+            await tractor.pause()
+            log.info('Exiting TSP shm anal-izer!')

            if shm_df is None:
                log.error(
--- a/piker/storage/nativedb.py
+++ b/piker/storage/nativedb.py
@ -161,7 +161,13 @@ class NativeStorageClient:

    def index_files(self):
        for path in self._datadir.iterdir():
-            if path.name in {'borked', 'expired',}:
+            if (
+                path.is_dir()
+                or
+                '.parquet' not in str(path)
+                # or
+                # path.name in {'borked', 'expired',}
+            ):
                continue

            key: str = path.name.rstrip('.parquet')
--- a/piker/tsp/init.py
+++ b/piker/tsp/init.py
@ -44,8 +44,10 @@ import trio
 from trio_typing import TaskStatus
 import tractor
 from pendulum import (
+    Interval,
    DateTime,
    Duration,
+    duration as mk_duration,
    from_timestamp,
 )
 import numpy as np
@ -214,7 +216,8 @@ async def maybe_fill_null_segments(
        # pair, immediately stop backfilling?
        if (
            start_dt
-            and end_dt < start_dt
+            and
+            end_dt < start_dt
        ):
            await tractor.pause()
            break
@ -262,6 +265,7 @@ async def maybe_fill_null_segments(
        except tractor.ContextCancelled:
            # log.exception
            await tractor.pause()
+            raise

    null_segs_detected.set()
    # RECHECK for more null-gaps
@ -349,7 +353,7 @@ async def maybe_fill_null_segments(

 async def start_backfill(
    get_hist,
-    frame_types: dict[str, Duration] | None,
+    def_frame_duration: Duration,
    mod: ModuleType,
    mkt: MktPair,
    shm: ShmArray,
@ -379,22 +383,23 @@ async def start_backfill(
        update_start_on_prepend: bool = False
        if backfill_until_dt is None:

-            # TODO: drop this right and just expose the backfill
-            # limits inside a [storage] section in conf.toml?
-            # when no tsdb "last datum" is provided, we just load
-            # some near-term history.
-            # periods = {
-            #     1: {'days': 1},
-            #     60: {'days': 14},
-            # }
-
-            # do a decently sized backfill and load it into storage.
+            # TODO: per-provider default history-durations?
+            # -[ ] inside the `open_history_client()` config allow
+            #    declaring the history duration limits instead of
+            #    guessing and/or applying the same limits to all?
+            #
+            # -[ ] allow declaring (default) per-provider backfill
+            #     limits inside a [storage] sub-section in conf.toml?
+            #
+            # NOTE, when no tsdb "last datum" is provided, we just
+            # load some near-term history by presuming a "decently
+            # large" 60s duration limit and a much shorter 1s range.
            periods = {
                1: {'days': 2},
                60: {'years': 6},
            }
            period_duration: int = periods[timeframe]
-            update_start_on_prepend = True
+            update_start_on_prepend: bool = True

            # NOTE: manually set the "latest" datetime which we intend to
            # backfill history "until" so as to adhere to the history
@ -416,7 +421,6 @@ async def start_backfill(
                f'backfill_until_dt: {backfill_until_dt}\n'
                f'last_start_dt: {last_start_dt}\n'
            )
-
            try:
                (
                    array,
@ -426,71 +430,114 @@ async def start_backfill(
                    timeframe,
                    end_dt=last_start_dt,
                )
-
            except NoData as _daterr:
-                # 3 cases:
-                # - frame in the middle of a legit venue gap
-                # - history actually began at the `last_start_dt`
-                # - some other unknown error (ib blocking the
-                #   history bc they don't want you seeing how they
-                #   cucked all the tinas..)
-                if dur := frame_types.get(timeframe):
-                    # decrement by a frame's worth of duration and
-                    # retry a few times.
-                    last_start_dt.subtract(
-                        seconds=dur.total_seconds()
+                orig_last_start_dt: datetime = last_start_dt
+                gap_report: str = (
+                    f'EMPTY FRAME for `end_dt: {last_start_dt}`?\n'
+                    f'{mod.name} -> tf@fqme: {timeframe}@{mkt.fqme}\n'
+                    f'last_start_dt: {orig_last_start_dt}\n\n'
+                    f'bf_until: {backfill_until_dt}\n'
+                )
+                # EMPTY FRAME signal with 3 (likely) causes:
+                #
+                # 1. range contains legit gap in venue history
+                # 2. history actually (edge case) **began** at the
+                #    value `last_start_dt`
+                # 3. some other unknown error (ib blocking the
+                #    history-query bc they don't want you seeing how
+                #    they cucked all the tinas.. like with options
+                #    hist)
+                #
+                if def_frame_duration:
+                    # decrement by a duration's (frame) worth of time
+                    # as maybe indicated by the backend to see if we
+                    # can get older data before this possible
+                    # "history gap".
+                    last_start_dt: datetime = last_start_dt.subtract(
+                        seconds=def_frame_duration.total_seconds()
                    )
-                    log.warning(
-                        f'{mod.name} -> EMPTY FRAME for end_dt?\n'
-                        f'tf@fqme: {timeframe}@{mkt.fqme}\n'
-                        'bf_until <- last_start_dt:\n'
-                        f'{backfill_until_dt} <- {last_start_dt}\n'
-                        f'Decrementing `end_dt` by {dur} and retry..\n'
+                    gap_report += (
+                        f'Decrementing `end_dt` and retrying with,\n'
+                        f'def_frame_duration: {def_frame_duration}\n'
+                        f'(new) last_start_dt: {last_start_dt}\n'
                    )
+                    log.warning(gap_report)
+                    # skip writing to shm/tsdb and try the next
+                    # duration's worth of prior history.
                    continue

-            # broker says there never was or is no more history to pull
-            except DataUnavailable:
-                log.warning(
-                    f'NO-MORE-DATA in range?\n'
-                    f'`{mod.name}` halted history:\n'
-                    f'tf@fqme: {timeframe}@{mkt.fqme}\n'
-                    'bf_until <- last_start_dt:\n'
-                    f'{backfill_until_dt} <- {last_start_dt}\n'
-                )
+                else:
+                    # await tractor.pause()
+                    raise DataUnavailable(gap_report)

-                # ugh, what's a better way?
-                # TODO: fwiw, we probably want a way to signal a throttle
-                # condition (eg. with ib) so that we can halt the
-                # request loop until the condition is resolved?
-                if timeframe > 1:
-                    await tractor.pause()
+            # broker says there never was or is no more history to pull
+            except DataUnavailable as due:
+                message: str = due.args[0]
+                log.warning(
+                    f'Provider {mod.name!r} halted backfill due to,\n\n'
+
+                    f'{message}\n'
+
+                    f'fqme: {mkt.fqme}\n'
+                    f'timeframe: {timeframe}\n'
+                    f'last_start_dt: {last_start_dt}\n'
+                    f'bf_until: {backfill_until_dt}\n'
+                )
+                # UGH: what's a better way?
+                # TODO: backends are responsible for being correct on
+                # this right!?
+                # -[ ] in the `ib` case we could maybe offer some way
+                #     to halt the request loop until the condition is
+                #     resolved or should the backend be entirely in
+                #     charge of solving such faults? yes, right?
                return

+            time: np.ndarray = array['time']
            assert (
-                array['time'][0]
+                time[0]
                ==
                next_start_dt.timestamp()
            )

-            diff = last_start_dt - next_start_dt
-            frame_time_diff_s = diff.seconds
+            assert time[-1] == next_end_dt.timestamp()
+
+            expected_dur: Interval = last_start_dt - next_start_dt

            # frame's worth of sample-period-steps, in seconds
            frame_size_s: float = len(array) * timeframe
-            expected_frame_size_s: float = frame_size_s + timeframe
-            if frame_time_diff_s > expected_frame_size_s:
-
+            recv_frame_dur: Duration = (
+                from_timestamp(array[-1]['time'])
+                -
+                from_timestamp(array[0]['time'])
+            )
+            if (
+                (lt_frame := (recv_frame_dur < expected_dur))
+                or
+                (null_frame := (frame_size_s == 0))
+                # ^XXX, should NEVER hit now!
+            ):
                # XXX: query result includes a start point prior to our
                # expected "frame size" and thus is likely some kind of
                # history gap (eg. market closed period, outage, etc.)
                # so just report it to console for now.
+                if lt_frame:
+                    reason = 'Possible GAP (or first-datum)'
+                else:
+                    assert null_frame
+                    reason = 'NULL-FRAME'
+
+                missing_dur: Interval = expected_dur.end - recv_frame_dur.end
                log.warning(
-                    'GAP DETECTED:\n'
-                    f'last_start_dt: {last_start_dt}\n'
-                    f'diff: {diff}\n'
-                    f'frame_time_diff_s: {frame_time_diff_s}\n'
+                    f'{timeframe}s-series {reason} detected!\n'
+                    f'fqme: {mkt.fqme}\n'
+                    f'last_start_dt: {last_start_dt}\n\n'
+                    f'recv interval: {recv_frame_dur}\n'
+                    f'expected interval: {expected_dur}\n\n'
+
+                    f'Missing duration of history of {missing_dur.in_words()!r}\n'
+                    f'{missing_dur}\n'
                )
+                # await tractor.pause()

            to_push = diff_history(
                array,
@ -565,22 +612,27 @@ async def start_backfill(
            # long-term storage.
            if (
                storage is not None
-                and write_tsdb
+                and
+                write_tsdb
            ):
                log.info(
                    f'Writing {ln} frame to storage:\n'
                    f'{next_start_dt} -> {last_start_dt}'
                )

-                # always drop the src asset token for
+                # NOTE, always drop the src asset token for
                # non-currency-pair like market types (for now)
+                #
+                # THAT IS, for now our table key schema is NOT
+                # including the dst[/src] source asset token. SO,
+                # 'tsla.nasdaq.ib' over 'tsla/usd.nasdaq.ib' for
+                # historical reasons ONLY.
                if mkt.dst.atype not in {
                    'crypto',
                    'crypto_currency',
                    'fiat',  # a "forex pair"
+                    'perpetual_future',  # stupid "perps" from cex land
                }:
-                    # for now, our table key schema is not including
-                    # the dst[/src] source asset token.
                    col_sym_key: str = mkt.get_fqme(
                        delim_char='',
                        without_src=True,
@ -685,7 +737,7 @@ async def back_load_from_tsdb(
        last_tsdb_dt
        and latest_start_dt
    ):
-        backfilled_size_s = (
+        backfilled_size_s: Duration = (
            latest_start_dt - last_tsdb_dt
        ).seconds
        # if the shm buffer len is not large enough to contain
@ -908,6 +960,8 @@ async def tsdb_backfill(
            f'{pformat(config)}\n'
        )

+        # concurrently load the provider's most-recent-frame AND any
+        # pre-existing tsdb history already saved in `piker` storage.
        dt_eps: list[DateTime, DateTime] = []
        async with trio.open_nursery() as tn:
            tn.start_soon(
@ -918,7 +972,6 @@ async def tsdb_backfill(
                timeframe,
                config,
            )
-
            tsdb_entry: tuple = await load_tsdb_hist(
                storage,
                mkt,
@ -947,6 +1000,25 @@ async def tsdb_backfill(
                mr_end_dt,
            ) = dt_eps

+            first_frame_dur_s: Duration = (mr_end_dt - mr_start_dt).seconds
+            calced_frame_size: Duration = mk_duration(
+                seconds=first_frame_dur_s,
+            )
+            # NOTE, attempt to use the backend declared default frame
+            # sizing (as allowed by their time-series query APIs) and
+            # if not provided try to construct a default from the
+            # first frame received above.
+            def_frame_durs: dict[
+                int,
+                Duration,
+            ]|None = config.get('frame_types', None)
+            if def_frame_durs:
+                def_frame_size: Duration = def_frame_durs[timeframe]
+                assert def_frame_size == calced_frame_size
+            else:
+                # use what we calced from first frame above.
+                def_frame_size = calced_frame_size
+
            # NOTE: when there's no offline data, there's 2 cases:
            # - data backend doesn't support timeframe/sample
            #   period (in which case `dt_eps` should be `None` and
@ -977,7 +1049,7 @@ async def tsdb_backfill(
                    partial(
                        start_backfill,
                        get_hist=get_hist,
-                        frame_types=config.get('frame_types', None),
+                        def_frame_duration=def_frame_size,
                        mod=mod,
                        mkt=mkt,
                        shm=shm,
--- a/piker/tsp/_anal.py
+++ b/piker/tsp/_anal.py
@ -616,6 +616,18 @@ def detect_price_gaps(
    # ])
    ...

+# TODO: probably just use the null_segs impl above?
+def detect_vlm_gaps(
+    df: pl.DataFrame,
+    col: str = 'volume',
+
+) -> pl.DataFrame:
+
+    vnull: pl.DataFrame = w_dts.filter(
+        pl.col(col) == 0
+    )
+    return vnull
+

 def dedupe(
    src_df: pl.DataFrame,
@ -626,7 +638,6 @@ def dedupe(

 ) -> tuple[
    pl.DataFrame,  # with dts
-    pl.DataFrame,  # gaps
    pl.DataFrame,  # with deduplicated dts (aka gap/repeat removal)
    int,  # len diff between input and deduped
 ]:
@ -639,19 +650,22 @@ def dedupe(
    '''
    wdts: pl.DataFrame = with_dts(src_df)

-    # maybe sort on any time field
-    if sort:
-        wdts = wdts.sort(by='time')
-        # TODO: detect out-of-order segments which were corrected!
-        # -[ ] report in log msg
-        # -[ ] possibly return segment sections which were moved?
+    deduped = wdts

    # remove duplicated datetime samples/sections
    deduped: pl.DataFrame = wdts.unique(
-        subset=['dt'],
+        # subset=['dt'],
+        subset=['time'],
        maintain_order=True,
    )

+    # maybe sort on any time field
+    if sort:
+        deduped = deduped.sort(by='time')
+        # TODO: detect out-of-order segments which were corrected!
+        # -[ ] report in log msg
+        # -[ ] possibly return segment sections which were moved?
+
    diff: int = (
        wdts.height
        -
Author	SHA1	Message	Date
Tyler Goodlet	eb06fc79f1	`.accounting._ledger`: typing anda more multiline styling	2025-02-19 17:56:10 -05:00
Tyler Goodlet	76ca316b9d	Drop some bps and style logic to multiline	2025-02-19 17:56:10 -05:00
Tyler Goodlet	3e8481978b	`.accounting` add synopsis section to readme	2025-02-19 17:56:10 -05:00
Tyler Goodlet	9e6bfa0926	Teensie `piker.data` styling tweaks - use more compact optional value style with `\|`-union - fix `.flows` typing-only import since we need `MktPair` to be immediately defined for use on a `msgspec.Struct` field. - more "tree-like" warning msg in `.validate()` reporting.	2025-02-19 17:52:14 -05:00
Tyler Goodlet	a945bb33f3	Invert `getattr()` check for `get_mkt_pairs()` ep Such that we `return` early when not defined by the provider backend to reduce an indent level in `SymbologyCache.load()`.	2025-02-19 17:31:10 -05:00
Tyler Goodlet	850cdbfe59	Allow ledger passes to ignore (symcache) unknown fqmes For example in the paper-eng, if you have a backend that doesn't fully support a symcache (yet) it's handy to be able to ignore processing other paper-eng txns when all you care about at the moment is the simulated symbol. NOTE, that currently this will still result in a key-error when you load more then one mkt with the paper engine (for which the backend does not have the symcache implemented) since no fqme ad-hoc query was made for the 2nd symbol (and i'm not sure we should support that kinda hackery over just encouraging the sym-cache being added?). Def needs a little more thought depending on how many backends are never going to be able to (easily) support caching..	2025-02-19 17:31:10 -05:00
Tyler Goodlet	d49608f74e	Refine history gap/termination signalling Namely handling backends which do not provide a default "frame size-duration" in their init-config by making the backfiller guess the value based on the first frame received. Deats, - adjust `start_backfill()` to take a more explicit `def_frame_duration: Duration` expected to be unpacked from any backend hist init-config by the `tsdb_backfill()` caller which now also computes a value from the first received frame when the config section isn't provided. - in `start_backfill()` we now always expect the `def_frame_duration` input and always decrement the query range by this value whenever a `NoData` is raised by the provider-backend paired with an explicit `log.warning()` about the handling. - also relay any `DataUnavailable.args[0]` message from the provider in the handler. - repair "gap reporting" which checks for expected frame duration vs. that received with much better humanized logging on the missing segment using `pendulum.Interval/Duration.in_words()` output.	2025-02-19 17:01:24 -05:00
Tyler Goodlet	bf0ac93aa3	Only use `frame_types` if delivered during enter The `open_history_client()` provider endpoint can optionally deliver a `frame_types: dict[int, pendulum.Duration]` subsection in its `config: dict[str, dict]` (as was implemented with the `ib` backend). This allows the `tsp` backfilling machinery to use this "recommended frame duration" to subtract from the `last_start_dt` any time a `NoData` gap is signalled by the `get_hist()` call allowing gaps to be ignored safely without missing history by knowing the next earliest dt we can query from using the `end_dt`. However, currently all crypto$ providers haven't implemented this feat yet.. As such only try to use the `frame_types` feature if provided when handling `NoData` conditions inside `tsp.start_backfill()` and otherwise raise as normal.	2025-02-19 17:01:24 -05:00
Tyler Goodlet	d7179d47b0	`.tsp._anal`: add (unused) `detect_vlm_gaps()`	2025-02-19 17:01:24 -05:00
Tyler Goodlet	c390e87536	`.storage.cli`: collect gap-markup-aids into `tf2aids: dict` prior to pause for introspection	2025-02-19 17:01:24 -05:00
Tyler Goodlet	5e4a6d61c7	Ignore any non-`.parquet` files under `.config/piker/nativedb/` subdir	2025-02-19 17:01:24 -05:00
Tyler Goodlet	3caaa30b03	Mask no-data pause, add perps to no-`/src`-in-fqme asset set Was orig for debugging an issue with `kucoin` i think but definitely shouldn't be left in XD Also add `'perpetual_future'` to the `.start_backfill()` input literal set since we don't expect the 'btc/usd.perp.binance' for now.	2025-02-19 17:01:24 -05:00