piker/.claude/skills/timeseries-optimization/polars-patterns.md

79 lines
1.5 KiB
Markdown
Raw Permalink Normal View History

# Polars Integration Patterns
Polars usage patterns for piker's timeseries
processing, including NumPy interop.
## NumPy <-> Polars Conversion
```python
import polars as pl
# numpy to polars
df = pl.from_numpy(
arr,
schema=[
'index', 'time', 'open', 'high',
'low', 'close', 'volume',
],
)
# polars to numpy (via arrow)
arr = df.to_numpy()
# piker convenience
from piker.tsp import np2pl, pl2np
df = np2pl(arr)
arr = pl2np(df)
```
## Polars Performance Patterns
### Lazy Evaluation
```python
# build query lazily
lazy_df = (
df.lazy()
.filter(pl.col('volume') > 1000)
.with_columns([
(
pl.col('close') - pl.col('open')
).alias('change')
])
.sort('time')
)
# execute once
result = lazy_df.collect()
```
### Groupby Aggregations
```python
# resample to 5-minute bars
resampled = df.groupby_dynamic(
index_column='time',
every='5m',
).agg([
pl.col('open').first(),
pl.col('high').max(),
pl.col('low').min(),
pl.col('close').last(),
pl.col('volume').sum(),
])
```
## When to Use Polars vs NumPy
### Use Polars when:
- Complex queries with multiple filters/joins
- Need SQL-like operations (groupby, window fns)
- Working with heterogeneous column types
- Want lazy evaluation optimization
### Use NumPy when:
- Simple array operations (indexing, slicing)
- Direct memory access needed (e.g., SHM arrays)
- Compatibility with Qt/pyqtgraph (expects NumPy)
- Maximum performance for numerical computation