# macOS Compatibility Fixes for Piker/Tractor

This guide documents macOS-specific issues encountered when running `piker`, along with their solutions. These fixes address platform differences between Linux and macOS in areas like socket credentials, shared memory naming, and async runtime coordination.

## Table of Contents

1. [Socket Credential Passing](#1-socket-credential-passing)
2. [Shared Memory Name Length Limits](#2-shared-memory-name-length-limits)
3. [Shared Memory Cleanup Race Conditions](#3-shared-memory-cleanup-race-conditions)
4. [Async Runtime (Trio/AsyncIO) Coordination](#4-async-runtime-trioasyncio-coordination)

---

## 1. Socket Credential Passing

### Problem

On Linux, `tractor` uses the `SO_PASSCRED` and `SO_PEERCRED` socket options for Unix domain socket credential passing. macOS doesn't define these constants, so importing them raises `AttributeError`.

```python
# Linux code that fails on macOS
from socket import SO_PASSCRED, SO_PEERCRED  # AttributeError on macOS
```

### Error Message

```
AttributeError: module 'socket' has no attribute 'SO_PASSCRED'
```

### Root Cause

- **Linux**: Uses `SO_PASSCRED` (to enable credential passing) and `SO_PEERCRED` (to retrieve peer credentials)
- **macOS**: Uses `LOCAL_PEERCRED` (value `0x0001`) instead, and doesn't require enabling credential passing

### Solution

Make the socket credential imports platform-conditional:

**File**: `tractor/ipc/_uds.py` (or equivalent in `piker` if duplicated)

```python
import struct
import sys
from socket import (
    socket,
    AF_UNIX,
    SOCK_STREAM,
    SOL_SOCKET,
)

# Platform-specific credential passing constants
if sys.platform == 'linux':
    from socket import SO_PASSCRED, SO_PEERCRED

elif sys.platform == 'darwin':  # macOS
    # macOS uses LOCAL_PEERCRED instead of SO_PEERCRED and doesn't
    # need SO_PASSCRED. NOTE: LOCAL_PEERCRED is read at option level
    # SOL_LOCAL (0), not SOL_SOCKET, and yields a `struct xucred`.
    LOCAL_PEERCRED = 0x0001
    SO_PEERCRED = LOCAL_PEERCRED  # Alias for compatibility
    SO_PASSCRED = None  # Not needed on macOS

else:
    # Other platforms - may need additional handling
    SO_PASSCRED = None
    SO_PEERCRED = None

# When creating a socket (given some connected AF_UNIX `sock`)
if SO_PASSCRED is not None:
    sock.setsockopt(SOL_SOCKET, SO_PASSCRED, 1)

# When getting peer credentials (on Linux: 3 ints -> pid, uid, gid)
if SO_PEERCRED is not None:
    creds = sock.getsockopt(SOL_SOCKET, SO_PEERCRED, struct.calcsize('3i'))
```

### Implementation Notes

- The `LOCAL_PEERCRED` value `0x0001` is specific to macOS (from `<sys/un.h>`)
- macOS doesn't require explicitly enabling credential passing like Linux does
- Consider using `ctypes` or `cffi` for a more robust solution if available
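Expanding on the last note, here is a hedged sketch of actually reading peer credentials on macOS with plain `socket.getsockopt()` — it is not tractor code, and it assumes the `struct xucred` layout from macOS's `<sys/ucred.h>` (version, uid, group count, 16 groups):

```python
import socket
import struct

# macOS constants; names mirror <sys/un.h> / <sys/ucred.h> but are
# NOT exported by Python's socket module, so we define them by hand.
SOL_LOCAL = 0            # option level for AF_UNIX-local options
LOCAL_PEERCRED = 0x0001  # yields a `struct xucred` for the peer
XUCRED_VERSION = 0

# struct xucred layout (assumed): u_int cr_version; uid_t cr_uid;
# short cr_ngroups; gid_t cr_groups[16];
_XUCRED_FMT = 'IIh2x16I'


def get_peer_creds(sock: socket.socket) -> tuple[int, int]:
    '''
    Return (euid, egid) of the peer on a connected AF_UNIX stream
    socket. macOS-only illustrative helper, not a drop-in fix.

    '''
    raw = sock.getsockopt(
        SOL_LOCAL,
        LOCAL_PEERCRED,
        struct.calcsize(_XUCRED_FMT),
    )
    version, euid, ngroups, *groups = struct.unpack(_XUCRED_FMT, raw)
    if version != XUCRED_VERSION:
        raise RuntimeError(f'unexpected xucred version: {version}')

    # the first group entry is the effective gid
    return euid, groups[0]
```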
---

## 2. Shared Memory Name Length Limits

### Problem

macOS limits POSIX shared memory names to **31 characters** (defined as `PSHMNAMLEN` in `<sys/posix_shm_internal.h>`). Piker generates long descriptive names that exceed this limit, causing `OSError`.

```python
# Long name that works on Linux but fails on macOS
shm_name = "piker_quoter_tsla.nasdaq.ib_hist_1m"  # 35 chars - too long!
```

### Error Message

```
OSError: [Errno 63] File name too long: '/piker_quoter_tsla.nasdaq.ib_hist_1m'
```

### Root Cause

- **Linux**: Supports shared memory names up to 255 characters
- **macOS**: Limits names to 31 characters (including the leading `/`)

### Solution

Implement automatic name shortening for macOS while preserving the original key for lookups:

**File**: `piker/data/_sharedmem.py`

```python
import hashlib
import sys

import numpy as np
from msgspec import Struct  # or piker's own `Struct` wrapper

# NOTE: `def_iohlcv_fields` (the default OHLCV dtype) is defined
# elsewhere in this module.


def _shorten_key_for_macos(key: str) -> str:
    '''
    macOS has a 31 character limit for POSIX shared memory names.
    Hash long keys to fit within this limit while maintaining
    uniqueness.

    '''
    # macOS shm_open() has a 31 char limit (PSHMNAMLEN).
    # Use format: /p_<hash> where <hash> is the first 16 hex chars of
    # the sha256 of the full key. This gives us:
    # / + p_ + 16 hex chars = 19 chars, well under the limit.
    # We keep the 'p' prefix to indicate it's from piker.
    if len(key) <= 31:
        return key

    # Create a hash of the full key
    key_hash = hashlib.sha256(key.encode()).hexdigest()[:16]
    short_key = f'p_{key_hash}'
    return short_key


class _Token(Struct, frozen=True):
    '''
    Internal representation of a shared memory "token" which can be
    used to key a system-wide POSIX shm entry.

    '''
    shm_name: str  # actual OS-level name (may be shortened on macOS)
    shm_first_index_name: str
    shm_last_index_name: str
    dtype_descr: tuple
    size: int  # in struct-array index / row terms

    key: str | None = None  # original descriptive key (for lookup)

    def __eq__(self, other) -> bool:
        '''
        Compare tokens based on shm names and dtype, ignoring the
        `key` field. The key field is only used for lookups, not for
        token identity.

        '''
        if not isinstance(other, _Token):
            return False
        return (
            self.shm_name == other.shm_name
            and self.shm_first_index_name == other.shm_first_index_name
            and self.shm_last_index_name == other.shm_last_index_name
            and self.dtype_descr == other.dtype_descr
            and self.size == other.size
        )

    def __hash__(self) -> int:
        '''Hash based on the same fields used in `__eq__`.'''
        return hash((
            self.shm_name,
            self.shm_first_index_name,
            self.shm_last_index_name,
            self.dtype_descr,
            self.size,
        ))


def _make_token(
    key: str,
    size: int,
    dtype: np.dtype | None = None,
) -> _Token:
    '''
    Create a serializable token that uniquely identifies a shared
    memory segment.

    '''
    if dtype is None:
        dtype = def_iohlcv_fields

    # On macOS, shorten long keys to fit the 31-char limit
    if sys.platform == 'darwin':
        shm_name = _shorten_key_for_macos(key)
        shm_first = _shorten_key_for_macos(key + "_first")
        shm_last = _shorten_key_for_macos(key + "_last")
    else:
        shm_name = key
        shm_first = key + "_first"
        shm_last = key + "_last"

    return _Token(
        shm_name=shm_name,
        shm_first_index_name=shm_first,
        shm_last_index_name=shm_last,
        dtype_descr=tuple(np.dtype(dtype).descr),
        size=size,
        key=key,  # Store original key for lookup
    )
```

### Key Design Decisions

1. **Hash-based shortening**: Uses SHA256 to ensure uniqueness and avoid collisions
2. **Preserve original key**: Store the original descriptive key in the `_Token` for debugging and lookups
3. **Custom equality**: The `__eq__` and `__hash__` methods ignore the `key` field so that tokens are compared by their actual shm properties
4. **Platform detection**: Only applies shortening on macOS (`sys.platform == 'darwin'`)

### Edge Cases to Consider

- Token serialization across processes (the `key` field must survive IPC)
- Token lookup in dictionaries and caches
- Debugging output (use the `key` field for human-readable names)
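For reference, a short usage sketch of the above (assuming the module path named in the **File** header; the output comments are illustrative, not captured output):

```python
import sys
from piker.data._sharedmem import _make_token

token = _make_token(
    key='piker_quoter_tsla.nasdaq.ib_hist_1m',  # 35 chars
    size=1000,
)
print(token.key)       # always the original descriptive key
print(token.shm_name)  # 'p_<16 hex chars>' on macOS, the raw key elsewhere

if sys.platform == 'darwin':
    # even with the leading '/' added by shm_open() we stay
    # under the 31-char PSHMNAMLEN limit
    assert len(token.shm_name) + 1 <= 31
```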
---

## 3. Shared Memory Cleanup Race Conditions

### Problem

During teardown, shared memory segments may be unlinked by one process while another is still trying to clean them up, causing a `FileNotFoundError` that crashes the application.

### Error Message

```
FileNotFoundError: [Errno 2] No such file or directory: '/p_74c86c7228dd773b'
```

### Root Cause

In multi-process architectures like `tractor`, multiple processes may attempt to clean up shared resources simultaneously. Race conditions during shutdown can cause:

1. Process A unlinks the shared memory
2. Process B tries to unlink the same memory → `FileNotFoundError`
3. The uncaught exception crashes Process B

### Solution

Add defensive error handling to catch and log cleanup races:

**File**: `piker/data/_sharedmem.py`

```python
class ShmArray:
    # ... existing code ...

    def destroy(self) -> None:
        '''
        Destroy the shared memory segment and cleanup OS resources.

        '''
        if _USE_POSIX:
            # We manually unlink to bypass all the "resource tracker"
            # nonsense meant for non-SC systems.
            shm = self._shm
            name = shm.name
            try:
                shm_unlink(name)
            except FileNotFoundError:
                # Might be a teardown race where another process
                # already unlinked it - this is fine, just log it
                log.warning(f'Shm for {name} already unlinked?')

        # Also cleanup the index counters
        if hasattr(self, '_first'):
            try:
                self._first.destroy()
            except FileNotFoundError:
                log.warning('First index shm already unlinked?')

        if hasattr(self, '_last'):
            try:
                self._last.destroy()
            except FileNotFoundError:
                log.warning('Last index shm already unlinked?')


class SharedInt:
    # ... existing code ...

    def destroy(self) -> None:
        if _USE_POSIX:
            # We manually unlink to bypass all the "resource tracker"
            # nonsense meant for non-SC systems.
            name = self._shm.name
            try:
                shm_unlink(name)
            except FileNotFoundError:
                # might be a teardown race here?
                log.warning(f'Shm for {name} already unlinked?')
```

### Implementation Notes

- This fix is platform-agnostic but particularly important on macOS, where the shortened names make debugging harder
- The warnings help identify cleanup races during development
- Consider adding metrics/counters if cleanup races become frequent

---

## 4. Async Runtime (Trio/AsyncIO) Coordination

### Problem

A `TrioTaskExited` error occurs when trio tasks are cancelled while asyncio tasks are still running, indicating improper coordination between the two async runtimes.

### Error Message

```
tractor._exceptions.TrioTaskExited: but the child `asyncio` task is still running?
>> |_ ...>
```

### Root Cause

`tractor` uses "guest mode" to run trio as a guest in asyncio's event loop (or vice versa). The error occurs when:

1. A trio task is cancelled (e.g., the user closes the UI)
2. The cancellation propagates to cleanup handlers
3. Cleanup tries to exit while asyncio tasks are still running
4. The `translate_aio_errors` context manager detects this inconsistent state

### Current State

This issue is **partially resolved** by the other fixes (socket credentials and shared memory), which eliminate the underlying errors that trigger premature cancellation. However, it may still occur in edge cases.

### Potential Solutions

#### Option 1: Improve Cancellation Propagation (Tractor-level)

**File**: `tractor/to_asyncio.py`

```python
from contextlib import asynccontextmanager

import trio


@asynccontextmanager
async def translate_aio_errors(
    chan,
    wait_on_aio_task: bool = False,
    suppress_graceful_exits: bool = False,
):
    '''
    Context manager to translate asyncio errors to trio equivalents.

    '''
    try:
        yield
    except trio.Cancelled:
        # When trio is cancelled, ensure asyncio tasks are also
        # cancelled. NOTE: `aio_task` and
        # `wait_for_aio_task_completion()` are placeholders for the
        # channel's asyncio task handle and a completion-wait helper.
        if wait_on_aio_task and aio_task and not aio_task.done():
            # Cancel it gracefully
            aio_task.cancel()

            # Wait briefly for the cancellation to land; the scope
            # must be shielded since we're already cancelled and any
            # unshielded await would re-raise `Cancelled` immediately.
            with trio.move_on_after(0.5) as cleanup_scope:  # 500ms timeout
                cleanup_scope.shield = True
                await wait_for_aio_task_completion(aio_task)

        raise  # Re-raise the cancellation
```
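The shielded wait above is the crux: in trio, once a scope has been cancelled, every subsequent `await` re-raises `Cancelled`, so any async cleanup must run inside a shielded `CancelScope`. A minimal standalone illustration of this rule (plain trio, not tractor code):

```python
import trio


async def worker_with_async_cleanup():
    try:
        await trio.sleep_forever()
    except trio.Cancelled:
        # Any `await` here would instantly re-raise `Cancelled`
        # unless we shield the cleanup work:
        with trio.CancelScope(shield=True):
            await trio.sleep(0.1)  # stand-in for real async teardown
        raise  # always re-raise `Cancelled`


async def main():
    async with trio.open_nursery() as nursery:
        nursery.start_soon(worker_with_async_cleanup)
        await trio.sleep(0.5)
        nursery.cancel_scope.cancel()


trio.run(main)
```

Option 2 below applies the same idiom at the application level.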
#### Option 2: Proper Shutdown Sequence (Application-level)

**File**: `piker/brokers/ib/api.py` (or similar broker modules)

```python
import trio


async def load_clients_for_trio(
    client: Client,
    ...
) -> None:
    '''
    Load an asyncio client and keep it running for trio.

    '''
    try:
        # Setup client
        await client.connect()

        # Keep alive - but make it cancellable
        await trio.sleep_forever()

    except trio.Cancelled:
        # Explicit cleanup before propagating cancellation; this must
        # be shielded since we've already been cancelled (see the
        # sketch above).
        log.info("Shutting down asyncio client gracefully")

        with trio.CancelScope(shield=True):
            # Disconnect client
            if client.isConnected():
                await client.disconnect()

            # Small delay to let asyncio cleanup
            await trio.sleep(0.1)

        raise  # Now safe to propagate
```

#### Option 3: Detection and Warning (Current Approach)

The current code detects the issue and raises a clear error. This is acceptable if:

1. The error is rare (only during abnormal shutdown)
2. It doesn't cause data loss
3. Logs provide enough info for debugging

### Recommended Approach

For **piker**: Implement Option 2 (proper shutdown sequence) in broker modules where asyncio is used.

For **tractor**: Consider Option 1 (improved cancellation propagation) as a library-level enhancement.

### Testing

Test the fix by triggering cancellation mid-stream and checking for a clean exit:

```python
# Test graceful shutdown
async def test_asyncio_trio_shutdown():
    async with open_channel_from(...) as (first, chan):
        # Do some work
        await chan.send(msg)

        # Trigger cancellation
        raise KeyboardInterrupt

    # Should cleanup without a `TrioTaskExited` error
```

---

## Summary of Changes

### Files Modified in Piker

1. **`piker/data/_sharedmem.py`**
   - Added the `_shorten_key_for_macos()` function
   - Modified the `_Token` class to store the original `key`
   - Modified `_make_token()` to use shortened names on macOS
   - Added `FileNotFoundError` handling in `destroy()` methods

2. **`piker/ui/_display.py`**
   - Removed an assertion that checked for 'hist' in the shm name (incompatible with shortened names)

### Files to Modify in Tractor (Recommended)

1. **`tractor/ipc/_uds.py`**
   - Make socket credential imports platform-conditional
   - Handle the macOS-specific `LOCAL_PEERCRED`

2. **`tractor/to_asyncio.py`** (Optional)
   - Improve cancellation propagation between trio and asyncio
   - Add a graceful shutdown timeout for asyncio tasks

### Platform Detection Pattern

Use this pattern consistently:

```python
import sys

if sys.platform == 'darwin':  # macOS
    # macOS-specific code
    pass
elif sys.platform == 'linux':  # Linux
    # Linux-specific code
    pass
else:
    # Other platforms / fallback
    pass
```

### Testing Checklist

- [ ] Test on macOS (Darwin)
- [ ] Test on Linux
- [ ] Test shared memory with names > 31 chars
- [ ] Test multi-process cleanup race conditions
- [ ] Test graceful shutdown (Ctrl+C)
- [ ] Test abnormal shutdown (kill signal)
- [ ] Verify no leaked segments (check `/dev/shm` on Linux; note that `ipcs -m` lists only SysV segments, so POSIX shm names are harder to inspect on macOS)

---

## Additional Resources

- **macOS System Headers**:
  - `/usr/include/sys/un.h` - Unix domain socket constants
  - `/usr/include/sys/posix_shm_internal.h` - Shared memory limits
- **Python Documentation**:
  - [`socket` module](https://docs.python.org/3/library/socket.html)
  - [`multiprocessing.shared_memory`](https://docs.python.org/3/library/multiprocessing.shared_memory.html)
- **Trio/AsyncIO**:
  - [Trio Guest Mode](https://trio.readthedocs.io/en/stable/reference-lowlevel.html#using-guest-mode-to-run-trio-on-top-of-other-event-loops)
  - [Tractor Documentation](https://github.com/goodboy/tractor)

---

## Contributing

When implementing these fixes in your own project:

1. **Test thoroughly** on both macOS and Linux
2. **Add platform guards** to prevent cross-platform breakage
3. **Document platform-specific behavior** in code comments
4. **Consider CI/CD** testing on multiple platforms
5. **Handle edge cases** gracefully with proper logging

If you find additional macOS-specific issues, please contribute to this guide!
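As a starting point for the testing checklist above, here is a minimal cross-platform pytest sketch (module path and helper name as shown in section 2); it exercises only the name-shortening logic, which is platform-independent even though `_make_token()` only applies it on darwin:

```python
from piker.data._sharedmem import _shorten_key_for_macos


def test_long_names_fit_macos_limit():
    long_key = 'piker_quoter_tsla.nasdaq.ib_hist_1m'  # 35 chars
    short = _shorten_key_for_macos(long_key)

    # deterministic: the same key always maps to the same name
    assert short == _shorten_key_for_macos(long_key)
    # the leading '/' added by shm_open() must still fit PSHMNAMLEN
    assert len(short) + 1 <= 31


def test_short_names_pass_through():
    # names already under the limit are returned unchanged
    assert _shorten_key_for_macos('tiny') == 'tiny'
```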