# Polars Integration Patterns

Polars usage patterns for piker's timeseries
processing, including NumPy interop.

## NumPy <-> Polars Conversion

```python
import polars as pl

# numpy to polars
# (`arr` is assumed to be an existing OHLCV array whose
# columns match the schema below)
df = pl.from_numpy(
    arr,
    schema=[
        'index', 'time', 'open', 'high',
        'low', 'close', 'volume',
    ],
)

# polars to numpy (via arrow)
arr = df.to_numpy()

# piker convenience
from piker.tsp import np2pl, pl2np
df = np2pl(arr)
arr = pl2np(df)
```

## Polars Performance Patterns

### Lazy Evaluation

```python
# build query lazily
lazy_df = (
    df.lazy()
    .filter(pl.col('volume') > 1000)
    .with_columns([
        (
            pl.col('close') - pl.col('open')
        ).alias('change')
    ])
    .sort('time')
)

# execute once
result = lazy_df.collect()
```

### Groupby Aggregations

```python
# resample to 5-minute bars
# NOTE: `groupby_dynamic` was renamed `group_by_dynamic` in
# polars 0.19; for `every='5m'` the index column must be
# sorted and of a temporal dtype.
resampled = df.group_by_dynamic(
    index_column='time',
    every='5m',
).agg([
    pl.col('open').first(),
    pl.col('high').max(),
    pl.col('low').min(),
    pl.col('close').last(),
    pl.col('volume').sum(),
])
```

## When to Use Polars vs NumPy

### Use Polars when:
- Complex queries with multiple filters/joins
- Need SQL-like operations (groupby, window fns)
- Working with heterogeneous column types
- Want lazy evaluation optimization

### Use NumPy when:
- Simple array operations (indexing, slicing)
- Direct memory access needed (e.g., SHM arrays)
- Compatibility with Qt/pyqtgraph (expects NumPy)
- Maximum performance for numerical computation