piker/.claude/skills/timeseries-optimization/polars-patterns.md

# Polars Integration Patterns

Polars usage patterns for piker's timeseries
processing, including NumPy interop.

## NumPy <-> Polars Conversion

```python
import polars as pl

# numpy to polars
df = pl.from_numpy(
    arr,
    schema=[
        'index', 'time', 'open', 'high',
        'low', 'close', 'volume',
    ],
)

# polars to numpy: returns a plain 2D ndarray by default;
# pass `structured=True` to get a structured array back
arr = df.to_numpy()

# piker convenience
from piker.tsp import np2pl, pl2np
df = np2pl(arr)
arr = pl2np(df)
```

## Polars Performance Patterns

### Lazy Evaluation

```python
# build query lazily
lazy_df = (
    df.lazy()
    .filter(pl.col('volume') > 1000)
    .with_columns([
        (
            pl.col('close') - pl.col('open')
        ).alias('change')
    ])
    .sort('time')
)

# execute once
result = lazy_df.collect()
```

### Groupby Aggregations

```python
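# resample to 5-minute bars
# NOTE: spelled `groupby_dynamic()` in older polars releases;
# `time` must be sorted and, for `every='5m'`, a temporal dtype.
resampled = df.group_by_dynamic(
    index_column='time',
    every='5m',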
).agg([
    pl.col('open').first(),
    pl.col('high').max(),
    pl.col('low').min(),
    pl.col('close').last(),
    pl.col('volume').sum(),
])
```

## When to Use Polars vs NumPy

### Use Polars when:
- Complex queries with multiple filters/joins
- Need SQL-like operations (groupby, window fns)
- Working with heterogeneous column types
- Want lazy evaluation optimization

### Use NumPy when:
- Simple array operations (indexing, slicing)
- Direct memory access needed (e.g., SHM arrays)
- Compatibility with Qt/pyqtgraph (expects NumPy)
- Maximum performance for numerical computation

Factor `.claude/skills/` into proper subdirs w/ frontmatter

Reorganize all 5 skills from loose `.md` files (and one
partially-formatted `commit_msg/`) into the documented
`subdirectory/SKILL.md` format with YAML frontmatter.

Deats,
- `commit_msg/` -> `commit-msg/` w/ enhanced frontmatter:
  `argument-hint`, `disable-model-invocation`, `allowed-tools`,
  dynamic `!` context injection for staged diff + recent log,
  `$ARGUMENTS` support
- `piker_profiling.md` -> `piker-profiling/SKILL.md` +
  `patterns.md` for detailed profiling patterns
- `piker_slang_and_communication_style.md` ->
  `piker-slang/SKILL.md` + `dictionary.md` + `examples.md`
- `pyqtgraph_rendering_optimization.md` ->
  `pyqtgraph-optimization/SKILL.md` + `examples.md`
- `timeseries_numpy_polars_optimization.md` ->
  `timeseries-optimization/SKILL.md` + `numpy-patterns.md` +
  `polars-patterns.md`

Also,
- all background skills use `user-invocable: false` for
  auto-application when relevant.
- use a hyphen convention across all dir names.
- content is now split into supporting files linked from each
  `SKILL.md`.

(this patch was generated in some part by [`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code

2026-02-27 17:13:24 +00:00
