WebSocket: LocalProtocolError: Pong … cannot be sent in state LOCAL_CLOSING after remote reset (WinError 10054) causes actor cascade & zombie cleanup #47

Open
opened 2025-10-08 18:23:24 +00:00 by powers · 0 comments
Collaborator

Summary

On Windows, the Binance data feed occasionally drops with WinError 10054 (remote reset). During shutdown, the WebSocket stack (wsproto/trio-websocket) tries to send a Pong while the connection is already closing, raising LocalProtocolError. The feed task fails; internal actor/IPC channels are torn down; the supervisor cleans up “zombie” sub-actors. Reconnect then races with close, resulting in repeated failures.

Environment

OS: Windows 10/11 (x64)

Runtime: Python 3.12.x (venv)

Packages (observed): piker, trio, trio-websocket, wsproto, tractor

Endpoint: wss://stream.binance.com:9443 (public stream)

Steps to Reproduce

Start pikerd with Binance feed enabled (chart/EMS optional).

Let it run in a typical Windows network environment (Wi-Fi/VPN/consumer firewall).

Trigger or wait for a transient network hiccup (remote RST).

Observe reconnect loop and eventual crash.

Expected Behavior

On remote close/reset: gracefully stop ping/pong, finish the close handshake, and attempt reconnection with backoff, without raising LocalProtocolError or tearing down unrelated actors.
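A minimal sketch of the "reconnect with backoff" behavior described above, using only the standard library. The names `backoff_delays` and `run_with_reconnect` are hypothetical and not part of piker, tractor, or trio-websocket; the real feed is async under trio, so this synchronous version only illustrates the retry policy, not how it would be wired in.

```python
import time
from typing import Callable, Iterator


def backoff_delays(
    base: float = 0.5, factor: float = 2.0, cap: float = 30.0
) -> Iterator[float]:
    """Yield base, base*factor, ... with each delay capped at `cap` seconds."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def run_with_reconnect(
    session: Callable[[], None],
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `session()` until it returns normally, retrying on remote resets.

    `session` is a hypothetical callable that opens the stream and blocks
    until it ends. Only ConnectionResetError is retried; anything else
    propagates. Returns the number of attempts used.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            session()
            return attempt
        except ConnectionResetError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide
            sleep(next(delays))  # wait before reconnecting
    raise ValueError("max_attempts must be >= 1")
```

Passing `sleep` as a parameter keeps the policy testable without real waiting. A production version would probably add jitter and reset the delay sequence after a session has stayed up for some time.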

Actual Behavior

Socket error: WinError 10054 (connection forcibly closed by remote host).

Immediately after, LocalProtocolError: Event Pong(…) cannot be sent in state ConnectionState.LOCAL_CLOSING.

Actor nursery cancellations and IPC channel closures; “zombie” processes cleaned by supervisor.

Reconnect attempt races with shutdown, leading to additional errors.
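To treat the reset as a recoverable condition rather than a crash, the feed code would first need to recognize it. The sketch below is one possible check, assuming the reset surfaces as an `OSError` somewhere in the exception chain. `is_remote_reset` is a hypothetical helper, not an existing piker or trio function.

```python
import errno

WSAECONNRESET = 10054  # Windows socket error code quoted in this issue


def is_remote_reset(exc: BaseException) -> bool:
    """Return True if `exc`, or an exception it was raised from, is a reset.

    Walks __cause__/__context__ because stream libraries often wrap the
    original OSError in their own exception type.
    """
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))
        if isinstance(current, ConnectionResetError):
            return True
        if isinstance(current, OSError):
            # `winerror` only exists on Windows, hence getattr.
            if getattr(current, "winerror", None) == WSAECONNRESET:
                return True
            if current.errno == errno.ECONNRESET:
                return True
        current = current.__cause__ or current.__context__
    return False
```

A caller could use this to decide between "back off and reconnect" and "re-raise"; whether the exception chain is actually preserved in this stack is an assumption that should be checked against a real trace.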

Representative Logs / Trace

Impact

Intermittent feed loss on Windows networks (Wi-Fi/VPN/middleboxes).

The crash cascades and requires a manual restart; charts/EMS sessions are interrupted.

Root Cause Analysis (RCA)

After a close/EOF (or remote RST) is detected, the local side starts closing the connection and the WebSocket state transitions to LOCAL_CLOSING.

A background ping task or on-ping callback still attempts to send Pong, which wsproto disallows during closing → LocalProtocolError.

The exception bubbles out of the feed task, cancelling the actor nursery. Reconnect logic then competes with shutdown, exacerbating the failure.
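One possible mitigation for the Pong step in the analysis above is to check the connection state before replying and to treat a rejected Pong as a no-op. The sketch below uses stand-in classes so it runs without wsproto installed: `ConnectionState`, `LocalProtocolError`, and `FakeConnection` only mimic the behavior described in this issue, and `pong_if_open` is a hypothetical helper. With the real library the check would be against the connection's `state` and the send would go through its own API.

```python
import enum


class ConnectionState(enum.Enum):
    """Stand-in mirroring a subset of wsproto's connection state names."""
    OPEN = enum.auto()
    REMOTE_CLOSING = enum.auto()
    LOCAL_CLOSING = enum.auto()
    CLOSED = enum.auto()


class LocalProtocolError(Exception):
    """Stand-in for the error raised when an event is sent in a bad state."""


class FakeConnection:
    """Hypothetical connection that only accepts Pong while OPEN."""

    def __init__(self) -> None:
        self.state = ConnectionState.OPEN
        self.sent: list[bytes] = []

    def send_pong(self, payload: bytes) -> None:
        if self.state is not ConnectionState.OPEN:
            raise LocalProtocolError(
                f"Pong cannot be sent in state {self.state}"
            )
        self.sent.append(payload)


def pong_if_open(conn: FakeConnection, payload: bytes) -> bool:
    """Reply to a ping only while the connection is OPEN.

    Returns True if the Pong was sent, False if it was skipped. The
    except clause covers a state change between the check and the send.
    """
    if conn.state is not ConnectionState.OPEN:
        return False
    try:
        conn.send_pong(payload)
    except LocalProtocolError:
        return False
    return True


conn = FakeConnection()
pong_if_open(conn, b"keepalive")        # sent while OPEN
conn.state = ConnectionState.LOCAL_CLOSING
pong_if_open(conn, b"late")             # skipped, no exception raised
```

Skipping the Pong keeps the error from reaching the feed task, so the nursery is not cancelled by it. Where such a guard belongs (in the feed code or upstream in the WebSocket library) is left open here.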

Reference: pikers/piker#47