piker/.claude/skills/timeseries-optimization/SKILL.md

---
name: timeseries-optimization
description: >
  High-performance timeseries processing with NumPy
  and Polars for financial data. Apply when working
  with OHLCV arrays, timestamp lookups, gap
  detection, or any array/dataframe operations in
  piker.
user-invocable: false
---

# Timeseries Optimization: NumPy & Polars

Skill for high-performance timeseries processing
using NumPy and Polars, with focus on patterns
common in financial/trading applications.

## Core Principle: Vectorization Over Iteration

**Never write Python loops over large arrays.**
Always look for vectorized alternatives.

```python
# BAD: Python loop (slow!)
results = []
for i in range(len(array)):
    if array['time'][i] == target_time:
        results.append(array[i])

# GOOD: vectorized boolean indexing (fast!)
results = array[array['time'] == target_time]
```
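As a self-contained illustration of the two styles above, here is a minimal sketch on a toy structured array. The dtype and field names (`index`, `time`, `close`) are assumptions for the example, not piker's actual schema.

```python
import numpy as np

# Hypothetical OHLCV-like dtype, for illustration only.
ohlcv_dtype = np.dtype([
    ('index', 'i8'),
    ('time', 'f8'),
    ('close', 'f8'),
])

array = np.zeros(5, dtype=ohlcv_dtype)
array['index'] = np.arange(5)
array['time'] = [10.0, 20.0, 30.0, 40.0, 50.0]
array['close'] = [1.0, 1.5, 1.2, 1.8, 2.0]

target_time = 30.0

# Loop version: one Python-level comparison per row.
loop_rows = [
    array[i]
    for i in range(len(array))
    if array['time'][i] == target_time
]

# Vectorized version: one comparison over the whole
# 'time' column, then a single boolean-mask selection.
mask = array['time'] == target_time
vec_rows = array[mask]
```

Both select the same row; the vectorized form does the comparison in compiled code rather than in the interpreter.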

## Timestamp Lookup Patterns

Timestamp lookup is the most critical optimization
in piker timeseries code. Choose the right lookup
strategy:

### Linear Scan (O(n)) - Avoid!

```python
# BAD: O(n) scan through entire array
for target_ts in timestamps:  # m iterations
    matches = array[array['time'] == target_ts]
    # Total: O(m * n) - catastrophic!
```

**Performance:**
- 1000 lookups x 10k array = 10M comparisons
- Timing: ~50-100ms for 1k lookups

### Binary Search (O(log n)) - Good!

```python
# GOOD: O(m log n) using searchsorted
import numpy as np

time_arr = array['time']  # extract once
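ts_array = np.asarray(timestamps)

# binary search for all timestamps at once
indices = np.searchsorted(time_arr, ts_array)

# clamp before indexing: searchsorted returns
# len(time_arr) for values past the last element,
# which would raise IndexError in the equality
# check (assumes time_arr is non-empty)
safe = np.minimum(indices, len(time_arr) - 1)

# exact match verification
valid_mask = time_arr[safe] == ts_array

valid_indices = safe[valid_mask]
matched_rows = array[valid_indices]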
```

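**Requirements for `searchsorted()`:**
- Input array MUST be sorted (ascending)
- Works on any sortable dtype (floats, ints)
- Returns insertion indices, not matches: a
  missing value yields the index where it would
  be inserted (`len(array)` only when it is
  greater than every element), so always verify
  equality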

**Performance:**
- 1000 lookups x 10k array = ~13k comparisons
  (log2(10k) is ~13 per lookup)
- Timing: <1ms for 1k lookups
- **~100-1000x faster than linear scan**
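A minimal runnable sketch of the batched lookup, including timestamps that fall between bars or past the last one. The helper name `lookup_rows` and the toy dtype are hypothetical; the only assumption about the data is that `array['time']` is sorted ascending.

```python
import numpy as np

def lookup_rows(array, timestamps):
    """Return (matched_rows, found_mask) for exact matches.

    Assumes array['time'] is sorted ascending.
    """
    time_arr = array['time']
    ts_array = np.asarray(timestamps, dtype=time_arr.dtype)
    if len(time_arr) == 0:
        return array[:0], np.zeros(len(ts_array), dtype=bool)

    # Insertion points for every timestamp in one call.
    indices = np.searchsorted(time_arr, ts_array)

    # Values past the last element map to len(time_arr);
    # clamp so the equality check cannot go out of bounds.
    safe = np.minimum(indices, len(time_arr) - 1)

    # An insertion point is a hit only if the value there
    # equals the timestamp that was searched for.
    found = time_arr[safe] == ts_array
    return array[safe[found]], found

array = np.zeros(4, dtype=[('time', 'f8'), ('close', 'f8')])
array['time'] = [10.0, 20.0, 30.0, 40.0]
array['close'] = [1.0, 2.0, 3.0, 4.0]

# 25.0 falls between bars, 99.0 is past the end.
rows, found = lookup_rows(array, [20.0, 25.0, 40.0, 99.0])
```

Returning the mask alongside the rows lets the caller tell which of the requested timestamps were missing.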

### Hash Table (O(1)) - Best for Repeated Lookups!

If you'll do many lookups on the same array,
build a dict once:

```python
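# build lookup once: .tolist() converts the whole
# column to Python scalars in a single pass,
# avoiding per-element struct field access
time_to_idx = {
    t: i
    for i, t in enumerate(array['time'].tolist())
}

# O(1) average-case lookups
for target_ts in timestamps:
    idx = time_to_idx.get(target_ts)
    if idx is not None:
        row = array[idx]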
```

**When to use:**
- Many repeated lookups on same array
- Array doesn't change between lookups
- Can afford upfront dict building cost
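One caveat worth checking before relying on a dict index is that keys must be unique. The sketch below, on a toy two-field dtype (hypothetical field names), shows that duplicate timestamps collapse to the last matching index and that a miss comes back as `None`.

```python
import numpy as np

array = np.zeros(4, dtype=[('time', 'f8'), ('close', 'f8')])
array['time'] = [10.0, 20.0, 20.0, 30.0]  # note the duplicate 20.0
array['close'] = [1.0, 2.0, 2.5, 3.0]

# Build the index once from Python floats.
time_to_idx = dict(
    zip(array['time'].tolist(), range(len(array)))
)

# Later rows overwrite earlier ones for the same key.
dup_idx = time_to_idx[20.0]

# Absent timestamps return None via .get().
miss = time_to_idx.get(25.0)

# Cheap sanity check for duplicate timestamps.
has_dups = len(time_to_idx) != len(array)
```

If `has_dups` is true and every matching row is needed, a binary search or boolean mask is likely the safer choice.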

## Performance Checklist

When optimizing timeseries operations:

- [ ] Is the array sorted? (enables binary search)
- [ ] Are you doing repeated lookups?
  (build hash table)
- [ ] Are struct fields accessed in loops?
  (extract to plain arrays)
- [ ] Are you using boolean indexing?
  (vectorized vs loop)
- [ ] Can operations be batched?
  (minimize round-trips)
- [ ] Is memory being copied unnecessarily?
  (use views)
- [ ] Are you using the right tool?
  (NumPy vs Polars)
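For the "use views" item, a small sketch (toy dtype, hypothetical field names) of which common operations on a structured array return views and which return copies. `np.shares_memory` is used only to make the difference visible.

```python
import numpy as np

array = np.zeros(4, dtype=[('time', 'f8'), ('close', 'f8')])
array['time'] = [10.0, 20.0, 30.0, 40.0]

closes = array['close']              # single-field access: view
tail = array[2:]                     # basic slice: view
picked = array[array['time'] > 15]   # boolean mask: copy

closes[0] = 9.5        # writes through to `array`
picked['close'] = -1   # only touches the copy

is_view = np.shares_memory(closes, array)
is_copy = not np.shares_memory(picked, array)
```

Field extraction and slicing are therefore cheap, while each boolean or fancy-index selection allocates a new array.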

## Common Bottlenecks and Fixes

### Bottleneck: Timestamp Lookups

```python
# BEFORE: O(n*m) - 100ms for 1k lookups
for ts in timestamps:
    matches = array[array['time'] == ts]

# AFTER: O(m log n) - <1ms for 1k lookups
# (insertion points only; verify exact matches
# as in the Binary Search section above)
indices = np.searchsorted(
    array['time'], timestamps,
)
```

### Bottleneck: Dict Building from Struct Array

```python
# BEFORE: 100ms for 3k rows
result = {
    float(row['time']): {
        'index': float(row['index']),
        'close': float(row['close']),
    }
    for row in matched_rows
}

# AFTER: <5ms for 3k rows
times = matched_rows['time'].astype(float)
indices = matched_rows['index'].astype(float)
closes = matched_rows['close'].astype(float)

result = {
    t: {'index': idx, 'close': cls}
    for t, idx, cls in zip(
        times, indices, closes,
    )
}
```
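To check that the column-wise rewrite preserves the result, here is a hedged sketch on a toy array (hypothetical dtype). It uses `.tolist()`, a variant not in the snippet above, so that keys and values come out as native Python floats exactly as in the row-wise version.

```python
import numpy as np

matched_rows = np.zeros(
    3,
    dtype=[('index', 'i8'), ('time', 'f8'), ('close', 'f8')],
)
matched_rows['index'] = [0, 1, 2]
matched_rows['time'] = [10.0, 20.0, 30.0]
matched_rows['close'] = [1.0, 1.5, 1.2]

# Row-wise: three struct field lookups per row.
per_row = {
    float(row['time']): {
        'index': float(row['index']),
        'close': float(row['close']),
    }
    for row in matched_rows
}

# Column-wise: extract and convert each field once,
# then iterate over plain Python lists.
times = matched_rows['time'].astype(float).tolist()
indices = matched_rows['index'].astype(float).tolist()
closes = matched_rows['close'].astype(float).tolist()

per_column = {
    t: {'index': idx, 'close': cls}
    for t, idx, cls in zip(times, indices, closes)
}

same = per_row == per_column
```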

### Bottleneck: Repeated Field Access

```python
# BEFORE: 50ms for 1k iterations
for spec in specs:
    start_row = array[
        array['time'] == spec['start_time']
    ][0]
    end_row = array[
        array['time'] == spec['end_time']
    ][0]
    process(
        start_row['index'],
        end_row['close'],
    )

# AFTER: <5ms for 1k iterations
# 1. Build lookup once
time_to_row = {...}  # via searchsorted

# 2. Extract fields to plain arrays
indices_arr = array['index']
closes_arr = array['close']

# 3. Use lookup + plain array indexing
for spec in specs:
    start_idx = time_to_row[
        spec['start_time']
    ]['array_idx']
    end_idx = time_to_row[
        spec['end_time']
    ]['array_idx']
    process(
        indices_arr[start_idx],
        closes_arr[end_idx],
    )
```
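When every spec can be resolved up front, the per-spec loop can sometimes be dropped entirely. The sketch below is one possible fully batched variant, not the `time_to_row` structure referenced above. The toy dtype and `specs` contents are assumptions, and it requires `array['time']` to be sorted ascending.

```python
import numpy as np

array = np.zeros(
    4,
    dtype=[('index', 'i8'), ('time', 'f8'), ('close', 'f8')],
)
array['index'] = [0, 1, 2, 3]
array['time'] = [10.0, 20.0, 30.0, 40.0]
array['close'] = [1.0, 1.5, 1.2, 1.8]

# Hypothetical specs, each naming a start and end time.
specs = [
    {'start_time': 10.0, 'end_time': 30.0},
    {'start_time': 20.0, 'end_time': 40.0},
]

# Extract fields to plain arrays once.
time_arr = array['time']
indices_arr = array['index']
closes_arr = array['close']

# Gather all spec times, then search them in two calls.
start_ts = np.array([s['start_time'] for s in specs])
end_ts = np.array([s['end_time'] for s in specs])
last = len(time_arr) - 1
start_idx = np.minimum(np.searchsorted(time_arr, start_ts), last)
end_idx = np.minimum(np.searchsorted(time_arr, end_ts), last)

# Fail loudly if any spec time is not an exact match.
if not (
    np.array_equal(time_arr[start_idx], start_ts)
    and np.array_equal(time_arr[end_idx], end_ts)
):
    raise ValueError('spec time missing from array')

# One value per spec, ready for batched processing.
start_indices = indices_arr[start_idx]
end_closes = closes_arr[end_idx]
```

This only pays off if the downstream processing can itself consume arrays; otherwise the dict-plus-plain-array loop above is the simpler fit.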

## References

- NumPy structured arrays:
  https://numpy.org/doc/stable/user/basics.rec.html
- `np.searchsorted`:
  https://numpy.org/doc/stable/reference/generated/numpy.searchsorted.html
- Polars: https://pola-rs.github.io/polars/
- `piker.tsp` - timeseries processing utilities
- `piker.data._formatters` - OHLC array handling

See [numpy-patterns.md](numpy-patterns.md) for
detailed NumPy structured array patterns and
[polars-patterns.md](polars-patterns.md) for
Polars integration.

---

*Last updated: 2026-01-31*
*Key win: 100ms -> 5ms dict building via field
extraction*

Factor `.claude/skills/` into proper subdirs w/ frontmatter

Reorganize all 5 skills from loose `.md` files (and one
partially-formatted `commit_msg/`) into the documented
`subdirectory/SKILL.md` format with YAML frontmatter.

Deats,
- `commit_msg/` -> `commit-msg/` w/ enhanced frontmatter:
  `argument-hint`, `disable-model-invocation`, `allowed-tools`,
  dynamic `!` context injection for staged diff + recent log,
  `$ARGUMENTS` support
- `piker_profiling.md` -> `piker-profiling/SKILL.md` +
  `patterns.md` for detailed profiling patterns
- `piker_slang_and_communication_style.md` ->
  `piker-slang/SKILL.md` + `dictionary.md` + `examples.md`
- `pyqtgraph_rendering_optimization.md` ->
  `pyqtgraph-optimization/SKILL.md` + `examples.md`
- `timeseries_numpy_polars_optimization.md` ->
  `timeseries-optimization/SKILL.md` + `numpy-patterns.md` +
  `polars-patterns.md`

Also,
- all background skills use `user-invocable: false` for
  auto-application when relevant.
- use a hyphen convention across all dir names.
- content is now split into supporting files linked from each
  `SKILL.md`.

(this patch was generated in some part by
[`claude-code`][claude-code-gh])

[claude-code-gh]: https://github.com/anthropics/claude-code

2026-02-27 17:13:24 +00:00