# Peripheral Disconnect Cleanup Fix - Summary **Date:** 2025-11-12 **Branch:** refactor/abstraction-layer **Issue:** Android devices (acting as BLE centrals) disconnecting from Pi GATT servers never triggered cleanup, causing stale peer entries and eventual connection blocking at 7-peer limit. --- ## Problem Discovered ### Initial Symptoms (from production logs on 10.0.0.80 and 10.0.0.242) ``` [WARNING] LinuxBLEDriver Cannot connect to 4A:87:8C:C7:E3:F3: max peers (7) reached ``` **Root Cause Analysis:** - When Android devices connected TO Pi's GATT server (Pi as peripheral, Android as central), connections were tracked correctly - When Android disconnected, NO cleanup happened: - `connected_centrals[address]` remained in dictionary - `driver._peers[address]` remained in dictionary - Spawned interfaces, fragmenters, reassemblers stayed allocated - After ~7 peripheral disconnections, peer limit reached and blocked ALL new connections **Why It Failed:** 1. `BLEGATTServer._handle_central_disconnected()` method didn't exist 2. `on_central_disconnected` callback was never wired to driver 3. No D-Bus signal monitoring for device disconnections 4. BlueZ `PropertiesChanged` signals were ignored --- ## Fix Implemented (TDD Approach) ### 1. Test Suite Created (`tests/test_peripheral_disconnect_cleanup.py`) **9 comprehensive tests:** - Callback wiring verification - Peer dictionary cleanup - D-Bus signal handling - Multiple disconnect idempotency - Shutdown safety - Peer limit unblocking - Reconnection race conditions - Real-world scenario reproduction **All 9 tests passing ✅** ### 2. Core Cleanup Methods Added **File:** `src/RNS/Interfaces/linux_bluetooth_driver.py` **A) `LinuxBluetoothDriver._handle_peripheral_disconnected(address)` (line 852)** - Called when GATT server reports central disconnect - Removes from `_peers` dictionary (with lock protection) - Notifies `on_device_disconnected` callback to BLEInterface - Triggers full cleanup chain **B) `BluezeroGATTServer._handle_central_disconnected(address)` (line 1945)** - Removes from `connected_centrals` dictionary - Logs disconnection with connection duration - Calls `on_central_disconnected` callback (wired to driver method) **C) Callback Wiring (line 1558)** ```python self.on_central_disconnected = driver._handle_peripheral_disconnected ``` Connects GATT server disconnect events to driver cleanup. ### 3. D-Bus Disconnect Monitoring **Method:** `BluezeroGATTServer._monitor_device_disconnections()` (line 1645) **Implementation:** - Runs in separate daemon thread (`disconnect_monitor_thread`) - Subscribes to `org.freedesktop.DBus.Properties.PropertiesChanged` signals - Monitors `org.bluez.Device1` interface for `Connected` property changes - When `Connected` changes to `False`, extracts MAC address and calls cleanup - Uses `dbus_fast.aio.MessageBus` for async D-Bus operations **Lifecycle:** - Started in `BluezeroGATTServer.start()` (line 1803) - Stopped in `BluezeroGATTServer.stop()` (line 1811) - Runs continuously until `stop_event` is set --- ## Current Observations ### ✅ What Works 1. **Core cleanup logic verified by tests** - All 9 tests pass 2. **Callback wiring correct** - Methods properly connected 3. **Thread creation successful** - No import/syntax errors 4. **Deployed to 4 production devices:** - 10.0.0.80, 10.0.0.242, 10.0.0.39, 10.0.0.246 ### ⚠️ Current Issue: D-Bus Monitoring Not Logging **Observation:** D-Bus monitoring thread starts but debug messages not appearing in logs/stderr **Evidence:** - No "[GATT-MONITOR]" messages in stderr - No "D-Bus disconnect monitoring started" in RNS logfile - Thread creation code is correct (verified on device) - Import fixed (`dbus_fast.aio.MessageBus` not `dbus_fast.MessageBus`) **Possible Causes:** 1. **Signal subscription not working** - `bus.add_message_handler()` may need different approach 2. **Message matching issue** - Lambda filter might not be catching signals 3. **Threading context** - async/await in daemon thread may have issues 4. **Silent exception** - Thread dying without logging (though try/except should catch) **Impact:** Automatic disconnect detection not working YET, but manual cleanup methods are functional --- ## Testing Performed ### Unit/Integration Tests - ✅ 9/9 tests in `test_peripheral_disconnect_cleanup.py` passing - ✅ 10/10 tests in `test_bluez_state_cleanup.py` still passing - ✅ No regressions in existing test suite ### Real Hardware Deployment - ✅ Deployed to all 4 Raspberry Pi devices - ✅ Services starting successfully - ✅ No crashes or errors from new code - ⚠️ D-Bus monitoring not logging (needs investigation) ### Production Observations **Device 10.0.0.242:** - 4 centrals connected since restart (B8:27:EB:43:04:BC, 6D:99:93:FA:EF:54, B8:27:EB:10:28:CD, 4C:30:3F:6A:98:C8) - GATT server operating normally - Awaiting Android disconnect to test cleanup --- ## Next Steps for Troubleshooting ### Priority 1: Debug D-Bus Signal Subscription **Investigate:** 1. **Verify message handler is being called:** - Add print statement at top of lambda to see if ANY messages arrive - Check if filter logic (`msg.message_type.name == 'SIGNAL'`) is correct 2. **Check D-Bus signal format:** - Run `dbus-monitor --system "interface='org.freedesktop.DBus.Properties'"` on Pi - Observe actual signal structure when device disconnects - Verify our handler matches the real signal format 3. **Alternative subscription method:** ```python # Instead of add_message_handler, try: introspection = await bus.introspect('org.bluez', '/org/bluez/hci0') adapter_obj = bus.get_proxy_object('org.bluez', '/org/bluez/hci0', introspection) adapter_obj.on_properties_changed(callback) ``` ### Priority 2: Implement Timeout-Based Fallback **Simpler approach if D-Bus proves difficult:** ```python async def _poll_stale_connections(self): """Poll for stale central connections every 30s.""" while not self.stop_event.is_set(): await asyncio.sleep(30) with self.centrals_lock: for address, info in list(self.connected_centrals.items()): last_write = info.get('last_write_time', info['connected_at']) if time.time() - last_write > 60: # 60s timeout self._handle_central_disconnected(address) ``` ### Priority 3: Manual Testing **Test cleanup methods work without D-Bus:** 1. Connect Android device to Pi GATT server 2. Verify entry added to `connected_centrals` and `_peers` 3. Manually call `_handle_central_disconnected(android_mac)` 4. Verify cleanup happens correctly 5. Validate no memory leak over multiple cycles --- ## Files Modified ### Production Code - `src/RNS/Interfaces/linux_bluetooth_driver.py` - Added `_handle_peripheral_disconnected()` method (35 lines) - Added `_handle_central_disconnected()` method (30 lines) - Added `_monitor_device_disconnections()` method (112 lines) - Added `disconnect_monitor_thread` field - Wired `on_central_disconnected` callback ### Tests - `tests/test_peripheral_disconnect_cleanup.py` (NEW, 270 lines) - 9 test cases covering all scenarios - Reproduces real-world bug from production logs - Verifies cleanup flow end-to-end --- ## How to Test When D-Bus Monitoring Works **On any Pi (10.0.0.80, .242, .39, .246):** 1. **Connect Android app** as central to Pi's GATT server 2. **Watch logs** for connection: ``` [INFO] GATTServer: Central connected: (MTU: 517) ``` 3. **Disconnect Android app** 4. **Expected cleanup logs:** ``` [DEBUG] D-Bus: Device disconnected [INFO] Detected central disconnect via D-Bus: [INFO] GATTServer: Central disconnected: (was connected for X.Xs) [DEBUG] Handling peripheral disconnection from [DEBUG] Removed from _peers (peripheral disconnect) [DEBUG] Peripheral disconnection cleanup complete for ``` 5. **Verify no peer limit errors** after multiple connect/disconnect cycles --- ## Summary **Fix Status:** Core implementation complete and tested ✅ **D-Bus Monitoring:** Needs debugging ⚠️ **Fallback Option:** Timeout-based polling available if needed **Risk:** Low - new code is non-invasive, well-tested, and has safety checks **Recommended Action:** Complete D-Bus debugging or implement timeout fallback, then merge to main.