fix(ble): Add BlueZ state cleanup to prevent persistent "Operation already in progress" errors

Clean up BlueZ device state after connection failures. BlueZ keeps stale
connection state after a timeout or failed attempt, which blocks reconnection
with "Operation already in progress" even after the blacklist period expires.

BlueZ State Cleanup Implementation:
- **Explicit client disconnect**: Call client.disconnect() in timeout and failure
  exception handlers to release BlueZ resources
- **D-Bus device removal**: New _remove_bluez_device() method removes stale device
  objects via BlueZ RemoveDevice() API
- **Post-blacklist cleanup**: Trigger BlueZ cleanup when peer is blacklisted after
  reaching max_connection_failures (7 failures)

Impact:
- Enables successful reconnection after temporary connection failures
- Fixes persistent errors across blacklist periods
- Prevents BlueZ from maintaining corrupted connection state
- Particularly important for Android devices with MAC address rotation

Implementation Details:
- linux_bluetooth_driver.py:786-830: New _remove_bluez_device() method
- linux_bluetooth_driver.py:1029-1044: Timeout cleanup (disconnect + removal)
- linux_bluetooth_driver.py:1051-1066: Failure cleanup (disconnect + removal)
- BLEInterface.py:1270-1285: Post-blacklist cleanup hook
- tests/test_bluez_state_cleanup.py: 10 new tests (all passing)

Documentation Updates:
- BLE_PROTOCOL_v2.2.md: New troubleshooting section for persistent InProgress errors
- CLAUDE.md: Added to recent fixes list
- CHANGELOG.md: Comprehensive fix description

Related Issues:
- Addresses "Operation already in progress" errors persisting after connection timeouts
- Fixes reconnection failures after peer blacklisting
- Prevents BlueZ state machine corruption from abandoned BleakClient instances

Testing:
- All 10 new unit tests pass
- Cleanup methods properly handle missing devices and D-Bus unavailability
- Integration testing on Raspberry Pi pending

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
torlando-tech 2025-11-10 00:51:27 -05:00
commit 9a3bfec5c7
6 changed files with 564 additions and 9 deletions

@@ -1039,11 +1039,58 @@ rnsd --verbose
- Ensure you're running a version with the race condition fix (check Platform-Specific Workarounds → Connection Race Condition Prevention)
- Check if external BLE tools (like `bluetoothctl`) are simultaneously attempting connections
- Verify BlueZ experimental features are enabled (`bluetoothd -E` flag)
- **If errors persist after connection timeouts or blacklist periods**, see the problem entry "Operation already in progress" errors persisting after connection failures below (BlueZ state corruption)
**See Also:** Platform-Specific Workarounds → Connection Race Condition Prevention for implementation details.
---
### Problem: "Operation already in progress" errors persisting after connection failures
**Symptoms:**
- `[org.bluez.Error.InProgress]` errors continue even after fixing race conditions
- Peer gets blacklisted after 7 failed connection attempts
- After the blacklist expires, the next attempt fails immediately with the same "InProgress" error
- Errors occur on connection timeouts or when peer disappears during connection
**Cause:** BlueZ state corruption. When a connection attempt fails (timeout, peer disappeared, etc.), the BleakClient is abandoned without cleanup:
1. BlueZ maintains internal connection state (thinks connection is "in progress")
2. BlueZ device object persists in D-Bus with stale state
3. Subsequent connection attempts hit the stale state → "InProgress" error
4. Errors persist across blacklist periods because BlueZ state is never cleared
**Fix (v2.2.2+):** Automatic BlueZ state cleanup:
1. **Explicit client disconnect**: `client.disconnect()` called in timeout and failure handlers
2. **D-Bus device removal**: Stale BlueZ device objects removed via `RemoveDevice()` API
3. **Post-blacklist cleanup**: BlueZ state cleared when peer is blacklisted
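
The first two steps above can be sketched as a connection wrapper. This is a minimal illustration and not the driver's actual handler code: `connect_with_cleanup` and the `remove_device` callable are hypothetical names, and `client` is assumed to be any object with async `connect()` and `disconnect()` methods, such as a bleak `BleakClient`.

```python
import asyncio


async def connect_with_cleanup(client, address, remove_device, timeout=10.0):
    """Try to connect; on timeout or failure, release BlueZ-side state.

    `remove_device` is a hypothetical async callable that removes the
    stale BlueZ device object for `address`.
    """
    try:
        await asyncio.wait_for(client.connect(), timeout=timeout)
        return True
    except Exception:
        # Step 1: disconnect explicitly so the abandoned client does not
        # leave a connection attempt pending inside BlueZ.
        try:
            await client.disconnect()
        except Exception:
            pass  # best effort: the client may never have connected
        # Step 2: remove the stale device object so the next attempt
        # starts from a clean state.
        try:
            await remove_device(address)
        except Exception:
            pass  # best effort: the device may already be gone
        return False
```

Both cleanup steps are wrapped in their own `try` blocks so that a failure in one does not skip the other, and so that cleanup never masks the original connection failure.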
**Implementation Details:**
- `linux_bluetooth_driver.py:_remove_bluez_device()` - Removes stale D-Bus device objects
- Exception handlers call cleanup after timeouts/failures (lines 1040-1066)
- Blacklist mechanism triggers cleanup (BLEInterface.py:1475-1490)
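
For orientation, removing a device over D-Bus can look roughly like the sketch below. It is not the body of `_remove_bluez_device()`, which is not shown here. It assumes the `dbus_fast` library (the D-Bus library bleak uses on Linux) and the default adapter name `hci0`; the function names are hypothetical. `RemoveDevice` is a method on BlueZ's `org.bluez.Adapter1` interface and takes the device's object path.

```python
BLUEZ_SERVICE = "org.bluez"
ADAPTER_IFACE = "org.bluez.Adapter1"


def device_object_path(address: str, adapter: str = "hci0") -> str:
    """Build the BlueZ D-Bus object path for a device MAC address."""
    return f"/org/bluez/{adapter}/dev_{address.upper().replace(':', '_')}"


async def remove_bluez_device(address: str, adapter: str = "hci0") -> bool:
    """Ask BlueZ to forget a device; return False if nothing was removed."""
    try:
        # Imported lazily so this module still loads without dbus_fast.
        from dbus_fast import BusType, Message, MessageType
        from dbus_fast.aio import MessageBus

        bus = await MessageBus(bus_type=BusType.SYSTEM).connect()
    except Exception:
        return False  # D-Bus unavailable or dbus_fast not installed
    try:
        reply = await bus.call(
            Message(
                destination=BLUEZ_SERVICE,
                path=f"/org/bluez/{adapter}",
                interface=ADAPTER_IFACE,
                member="RemoveDevice",
                signature="o",
                body=[device_object_path(address, adapter)],
            )
        )
        # An error reply means BlueZ did not remove the device
        # (for example, because it was already gone).
        return reply.message_type == MessageType.METHOD_RETURN
    except Exception:
        return False
    finally:
        bus.disconnect()
```

A caller could run it with `asyncio.run(remove_bluez_device("AA:BB:CC:DD:EE:FF"))`. Returning `False` instead of raising keeps a missing device or an unavailable bus from interrupting the connection retry path.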
**Manual Verification:**
```bash
# Check logs for cleanup messages (DEBUG level)
grep -i "removed stale bluez device\|cleanup" ~/.reticulum/logfile
# Manually remove BlueZ device if needed
bluetoothctl remove <MAC_ADDRESS>
# Restart BlueZ if state is completely corrupted
sudo systemctl restart bluetooth
```
**Expected Behavior After Fix:**
- Successful reconnection after temporary connection failures
- Successful reconnection after blacklist period expires
- No persistent "InProgress" errors across multiple connection attempts
- BlueZ device objects automatically cleaned up on failures
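
The blacklist behavior described above can be pictured as a failure counter that fires a cleanup hook when a peer reaches the threshold. The sketch below is illustrative only: `FailureTracker`, `on_blacklist`, and the 60-second blacklist duration are assumptions, and only the threshold of 7 failures comes from this document.

```python
import time


class FailureTracker:
    """Count connection failures per peer and blacklist at a threshold."""

    def __init__(self, max_failures=7, blacklist_seconds=60.0,
                 on_blacklist=None, clock=time.monotonic):
        self.max_failures = max_failures
        self.blacklist_seconds = blacklist_seconds  # assumed value
        self.on_blacklist = on_blacklist  # hypothetical cleanup hook
        self._clock = clock
        self._failures = {}
        self._blacklisted_until = {}

    def record_failure(self, address):
        """Record one failure; return True if the peer was just blacklisted."""
        count = self._failures.get(address, 0) + 1
        self._failures[address] = count
        if count < self.max_failures:
            return False
        self._blacklisted_until[address] = self._clock() + self.blacklist_seconds
        self._failures[address] = 0
        # Clear BlueZ state now, so the first attempt after the blacklist
        # expires does not hit stale "InProgress" state.
        if self.on_blacklist is not None:
            self.on_blacklist(address)
        return True

    def record_success(self, address):
        """Reset all failure state for a peer after a good connection."""
        self._failures.pop(address, None)
        self._blacklisted_until.pop(address, None)

    def is_blacklisted(self, address):
        until = self._blacklisted_until.get(address)
        return until is not None and self._clock() < until
```

Running the cleanup at the moment of blacklisting, and not when the blacklist expires, means the stale state is gone well before the next connection attempt.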
**See Also:** CHANGELOG.md for detailed implementation notes.
---
## Configuration Reference
This section documents all configuration parameters available for the BLE interface. These are set in the Reticulum configuration file (e.g., `~/.reticulum/config`).