piker/.claude/skills/timeseries-optimization/polars-patterns.md


# Polars Integration Patterns

Polars usage patterns for piker's timeseries processing, including NumPy interop.

## NumPy <-> Polars Conversion

```python
import polars as pl

# numpy to polars
df = pl.from_numpy(
    arr,
    schema=[
        'index', 'time', 'open', 'high',
        'low', 'close', 'volume',
    ],
)

# polars to numpy (via arrow)
arr = df.to_numpy()

# piker convenience
from piker.tsp import np2pl, pl2np
df = np2pl(arr)
arr = pl2np(df)
```

## Polars Performance Patterns

### Lazy Evaluation

```python
# build query lazily
lazy_df = (
    df.lazy()
    .filter(pl.col('volume') > 1000)
    .with_columns([
        (
            pl.col('close') - pl.col('open')
        ).alias('change')
    ])
    .sort('time')
)

# execute once
result = lazy_df.collect()
```

### Groupby Aggregations

```python
# resample to 5-minute bars
# (renamed from `groupby_dynamic` to `group_by_dynamic` in polars 0.19)
resampled = df.group_by_dynamic(
    index_column='time',
    every='5m',
).agg([
    pl.col('open').first(),
    pl.col('high').max(),
    pl.col('low').min(),
    pl.col('close').last(),
    pl.col('volume').sum(),
])
```

## When to Use Polars vs NumPy

Use Polars when:

- Complex queries with multiple filters/joins
- Need SQL-like operations (groupby, window fns)
- Working with heterogeneous column types
- Want lazy evaluation optimization

Use NumPy when:

- Simple array operations (indexing, slicing)
- Direct memory access needed (e.g., SHM arrays)
- Compatibility with Qt/pyqtgraph (expects NumPy)
- Maximum performance for numerical computation