provider "spec" (aka backends)
piker abstracts and encapsulates real-time data feeds across a slew of providers covering many (pretty much any) instrument class.
This doc is a shoddy attempt at specifying what a backend must provide as a basic api, per functionality-feature set, in order to be supported for any of real-time and historical data feeds and order control via the emsd clearing system.
"providers" must offer a top level namespace (normally exposed as a python module) which offers a certain set of (async) functions to deliver info through a real-time, normalized data layer.
Generally speaking we break each piker.brokers.<backend_name> into a python package containing 3 sub-modules:
- .api containing the lowest level client code used to interact specifically with the APIs of the exchange, broker or data provider.
- .feed which provides historical and real-time quote stream data provider endpoints called by piker's data layer in piker.data.feed.
- .broker which defines endpoints expected by piker.clearing._ems and which are expected to adhere to the msg protocol defined in piker.clearing._messages.
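As a rough illustration of the "top level namespace" requirement, here is a hypothetical helper (missing_feed_endpoints is not part of piker) that checks whether a backend module exposes the two data-feed endpoints documented below, stream_quotes and open_history_client. It only inspects names and the kind of callable; the real data layer and the ems may look up more endpoints than these.

```python
import inspect
from types import ModuleType

# hypothetical subset: only the feed endpoints described in this doc
REQUIRED_FEED_ENDPOINTS: tuple[str, ...] = (
    'stream_quotes',
    'open_history_client',
)


def missing_feed_endpoints(mod: ModuleType) -> list[str]:
    """Return required feed endpoint names that ``mod`` lacks or gets wrong."""
    missing: list[str] = []
    for name in REQUIRED_FEED_ENDPOINTS:
        func = getattr(mod, name, None)
        if func is None or not callable(func):
            missing.append(name)
        elif name == 'stream_quotes' and not inspect.iscoroutinefunction(func):
            # the quote streaming endpoint is an ``async def`` task function
            missing.append(name)
    return missing
```

The check is applied to the package's top level, since that is the namespace the doc says providers must expose.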
Our current set of "production" grade backends includes:
- kraken
- ib
data feeds
real-time quotes and tick streaming:
```python
async def stream_quotes(

    send_chan: trio.abc.SendChannel,
    symbols: List[str],
    shm: ShmArray,
    feed_is_live: trio.Event,
    loglevel: str = None,  # log level passed in from user config

    # startup sync via ``trio``
    task_status: TaskStatus[Tuple[Dict, Dict]] = trio.TASK_STATUS_IGNORED,

) -> None:
    ...
```

this routine must eventually deliver real-time quote messages by sending them on the passed in send_chan; these messages must have a specific format. there is a very simple but required startup sequence:
message startup sequence:
at a minimum, and asap, a first quote message should be returned for each requested symbol in symbols. the message should have a minimum format:
```python
quote_msg: dict[str, Any] = {
    'symbol': 'xbtusd',  # or wtv symbol was requested

    # this field is required in the initial first quote only (though
    # is recommended in all follow up quotes) but can be
    'last': <last clearing price>,  # float

    # tick stream fields (see below for schema/format)
    'ticks': list[dict[str, Any]],
}
```

further streamed quote messages should be in this same format. ticks is an optional sequence
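The startup rule and minimal message format above can be checked mechanically. The sketch below is a hypothetical validator (check_quote_msg and check_startup are illustrative names, not piker APIs) that enforces only what this section states: every quote carries a symbol, the first quote for each requested symbol carries a float last price, and ticks, when present, is a list of dicts. Tick-level fields are not inspected since their schema isn't covered here.

```python
from typing import Any


def check_quote_msg(msg: dict[str, Any], first: bool = False) -> None:
    """Raise ValueError if ``msg`` breaks the minimal quote format.

    ``first=True`` also requires the ``last`` price, as for the initial
    quote delivered per requested symbol at startup.
    """
    symbol = msg.get('symbol')
    if not isinstance(symbol, str) or not symbol:
        raise ValueError(f'quote has no symbol: {msg!r}')

    if first and 'last' not in msg:
        raise ValueError(f'first quote for {symbol!r} has no last price')
    if 'last' in msg and not isinstance(msg['last'], float):
        raise ValueError(f'last must be a float, got {msg["last"]!r}')

    # ``ticks`` is optional, but when present must be a list of dicts
    ticks = msg.get('ticks', [])
    if not isinstance(ticks, list) or not all(isinstance(t, dict) for t in ticks):
        raise ValueError(f'ticks must be a list of dicts for {symbol!r}')


def check_startup(symbols: list[str], first_msgs: list[dict[str, Any]]) -> None:
    """Check that a valid first quote arrived for every requested symbol."""
    seen: set[str] = set()
    for msg in first_msgs:
        check_quote_msg(msg, first=True)
        seen.add(msg['symbol'])

    missing = sorted(set(symbols) - seen)
    if missing:
        raise ValueError(f'no first quote for: {missing}')
```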
historical OHLCV sampling
Example endpoint copied from the binance backend:
```python
import time
from contextlib import asynccontextmanager as acm
from datetime import datetime
from typing import AsyncIterator, Callable

import numpy as np
import pendulum
import tractor

# ``open_cached_client`` and ``DataUnavailable`` are piker-internal
# helpers from the brokers package (imports not shown here).


@acm
async def open_history_client(
    symbol: str,

) -> AsyncIterator[tuple[Callable, dict[str, int]]]:

    # TODO implement history getter for the new storage layer.
    async with open_cached_client('binance') as client:

        async def get_ohlc(
            timeframe: float,
            end_dt: datetime | None = None,
            start_dt: datetime | None = None,

        ) -> tuple[
            np.ndarray,
            datetime,  # start
            datetime,  # end
        ]:
            if timeframe != 60:
                raise DataUnavailable('Only 1m bars are supported')

            array = await client.bars(
                symbol,
                start_dt=start_dt,
                end_dt=end_dt,
            )
            times = array['time']
            if end_dt is None:
                inow = round(time.time())
                if (inow - times[-1]) > 60:
                    # latest bar is more than 1m stale: drop into
                    # tractor's debugger to inspect (dev-time check)
                    await tractor.breakpoint()

            start_dt = pendulum.from_timestamp(times[0])
            end_dt = pendulum.from_timestamp(times[-1])

            return array, start_dt, end_dt

        yield get_ohlc, {'erlangs': 3, 'rate': 3}
```

This @acm routine is responsible for setting up an async historical data query routine for both charting and any local storage requirements.
The returned async func should retrieve, normalize and deliver a tuple[np.ndarray, pendulum.DateTime, pendulum.DateTime] of the numpy-ified data plus the start and stop datetimes for the delivered history "frame". The history backloading routines inside piker.data.feed expect this interface both for loading history into ShmArray real-time buffers and for any configured time-series database (tsdb). Normally the format of this data is OHLCV sampled price and volume data, but in theory it could be high resolution tick/trades/book time series in the future.
Currently sampling routines for charting and fsp processing expect a max resolution of 1s (second) OHLCV sampled data.
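A backend (or its tests) can sanity check that a history frame respects this 1s limit by looking at the spacing of its time column. A minimal sketch, assuming epoch-second timestamps as in the schema below; sample_period and check_max_resolution are hypothetical helpers:

```python
import numpy as np


def sample_period(times: np.ndarray) -> float:
    """Smallest step, in seconds, between consecutive epoch timestamps."""
    if len(times) < 2:
        raise ValueError('need at least 2 samples to infer a period')
    return float(np.diff(times).min())


def check_max_resolution(times: np.ndarray, min_period_s: float = 1.0) -> None:
    """Reject frames sampled finer than charting/fsp routines accept."""
    period = sample_period(times)
    if period < min_period_s:
        raise ValueError(
            f'{period}s sampling is finer than the {min_period_s}s maximum'
        )
```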
OHLCV minimal schema
ohlcv at a minimum is normally pushed to local shared memory (shm) numpy compatible arrays which are read by both UI components for display as well as auto-strats and algorithmic trading engines. shm is obviously used for speed. we also intend to eventually support pure shm tick streams for ultra low latency processing by external processes/services.
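The general mechanism behind these shm buffers can be shown with the stdlib multiprocessing.shared_memory module: a writer creates a named segment and places a numpy structured array in it, and any other process can attach by name and read the same bytes without sending them over a socket. This is a sketch of that mechanism only, not piker's ShmArray API; write_bars, read_bars and the reduced bar_dtype are hypothetical.

```python
from multiprocessing.shared_memory import SharedMemory

import numpy as np

# hypothetical reduced bar schema, just for this illustration
bar_dtype = np.dtype([('time', int), ('close', float), ('volume', float)])


def write_bars(name: str, bars: np.ndarray) -> SharedMemory:
    """Create a named shm segment and copy ``bars`` into it (writer side)."""
    shm = SharedMemory(name=name, create=True, size=bars.nbytes)
    view = np.ndarray(bars.shape, dtype=bars.dtype, buffer=shm.buf)
    view[:] = bars
    # drop the numpy view so the segment can later be closed cleanly
    del view
    return shm


def read_bars(name: str, length: int, dtype: np.dtype) -> np.ndarray:
    """Attach to an existing segment by name and read ``length`` rows (reader side)."""
    shm = SharedMemory(name=name)
    try:
        view = np.ndarray((length,), dtype=dtype, buffer=shm.buf)
        # copy out so the result outlives this attachment; a long-lived
        # reader would instead keep the view open and read in place
        out = view.copy()
        del view
    finally:
        shm.close()
    return out
```

The writer owns the segment and is responsible for calling close() and unlink() on it once no reader needs it.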
the provider module at a minimum must define a numpy structured array dtype ohlc_dtype = np.dtype(_ohlc_dtype) where _ohlc_dtype is normally defined in standard list-tuple syntax as:
```python
import numpy as np

# Broker specific ohlc schema which includes a vwap field
_ohlc_dtype = [
    ('index', int),
    ('time', int),
    ('open', float),
    ('high', float),
    ('low', float),
    ('close', float),
    ('volume', float),
    ('count', int),
    ('bar_wap', float),
]

ohlc_dtype = np.dtype(_ohlc_dtype)
```
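As an example of how a row in this schema might be filled, the sketch below folds one sampling period's trades into a single-row array of ohlc_dtype. The ticks_to_bar helper is hypothetical, and two field meanings are assumptions rather than something this doc states: count as the number of trades in the bar, and bar_wap as the volume-weighted average trade price over the bar.

```python
import numpy as np

# same fields as the schema above
ohlc_dtype = np.dtype([
    ('index', int), ('time', int),
    ('open', float), ('high', float), ('low', float), ('close', float),
    ('volume', float), ('count', int), ('bar_wap', float),
])


def ticks_to_bar(
    index: int,
    bar_time: int,
    prices: np.ndarray,  # trade prices in arrival order, at least one
    sizes: np.ndarray,   # matching trade sizes
) -> np.ndarray:
    """Fold one sampling period's trades into a single-row OHLCV array."""
    bar = np.zeros(1, dtype=ohlc_dtype)
    bar['index'] = index
    bar['time'] = bar_time
    bar['open'] = prices[0]
    bar['high'] = prices.max()
    bar['low'] = prices.min()
    bar['close'] = prices[-1]
    volume = sizes.sum()
    bar['volume'] = volume
    bar['count'] = len(prices)  # assumed: number of trades in the bar
    # assumed: volume-weighted average trade price, falling back to the
    # close when the bar has no volume
    bar['bar_wap'] = (prices * sizes).sum() / volume if volume else prices[-1]
    return bar
```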