Gives approx a 3-4x speedup using plain old iterate-with-for-loop style
though still not really happy with this .5 to 1 ms latency..
Move the core `@njit` part to a `_slice_from_time()` with a pure python
func with orig name around it. Also, drop the output `mask` array since
we can generally just use the slices in the caller to accomplish the
same input array slicing, duh..