piker/.claude/skills/timeseries-optimization/polars-patterns.md

# Polars Integration Patterns

Polars usage patterns for piker's timeseries
processing, including NumPy interop.

## NumPy <-> Polars Conversion

```python
import polars as pl

# numpy to polars (`arr` assumed to be a plain 2D
# array with one column per schema name)
df = pl.from_numpy(
    arr,
    schema=[
        'index', 'time', 'open', 'high',
        'low', 'close', 'volume',
    ],
)

# polars to numpy (returns a plain 2D array; pass
# `structured=True` for a struct array instead)
arr = df.to_numpy()

# piker convenience
from piker.tsp import np2pl, pl2np
df = np2pl(arr)
arr = pl2np(df)
```

## Polars Performance Patterns

### Lazy Evaluation

```python
# build query lazily
lazy_df = (
    df.lazy()
    .filter(pl.col('volume') > 1000)
    .with_columns([
        (
            pl.col('close') - pl.col('open')
        ).alias('change')
    ])
    .sort('time')
)

# execute once
result = lazy_df.collect()
```

### Groupby Aggregations

```python
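# resample to 5-minute bars; `time` must be a sorted
# Datetime column for a duration like '5m' to apply
# (method was named `groupby_dynamic` in polars < 0.19)
resampled = df.sort('time').group_by_dynamic(
    index_column='time',
    every='5m',
).agg([
    pl.col('open').first(),
    pl.col('high').max(),
    pl.col('low').min(),
    pl.col('close').last(),
    pl.col('volume').sum(),
])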
```

## When to Use Polars vs NumPy

### Use Polars when:
- Complex queries with multiple filters/joins
- Need SQL-like operations (groupby, window fns)
- Working with heterogeneous column types
- Want lazy evaluation optimization

### Use NumPy when:
- Simple array operations (indexing, slicing)
- Direct memory access needed (e.g., SHM arrays)
- Compatibility with Qt/pyqtgraph (expects NumPy)
- Tight numerical loops over contiguous arrays where
  per-call overhead matters