Add tests to test_zombie_connection_detection.py (which CI runs) to cover:
- _handle_identity_handshake: non-16-byte rejection, duplicate handling
- _pending_identity_connections cleanup after handshake
- _spawn_peer_interface zombie tracking initialization
These tests cover the same code paths as test_v2_2_identity_handshake.py
but are in a file that CI includes, achieving 100% patch coverage.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace Mock-based fixtures with real BLEInterface instances in
stale identity check tests. This ensures coverage.py properly
tracks execution of production code paths.
The Mock approach with method binding executed the production code
but coverage tracking was inconsistent. Using real instances
guarantees proper coverage attribution.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test for _pending_identity_connections cleanup during successful
identity handshake (lines 1272-1275), achieving 100% patch coverage
for PR #38 changes.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Tests verify that:
- Duplicate 16-byte handshake matching known identity is consumed
- Different 16-byte data is also consumed to prevent reassembler errors
- Non-16-byte data is not incorrectly consumed as handshake
- Normal handshake processing works when identity not yet known
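
The four cases above can be sketched as a single predicate. This is an illustration only: `consume_if_handshake` and `IDENTITY_LEN` are hypothetical names, and the real check lives inside the BLEInterface data path, which is not shown in this log.

```python
from typing import Optional

IDENTITY_LEN = 16  # handshake payload size stated in the commit message


def consume_if_handshake(data: bytes, known_identity: Optional[bytes]) -> bool:
    """Return True when `data` should be swallowed as an identity handshake.

    Hypothetical helper: it only models the case where the peer's identity
    is already known. With no known identity, normal handshake processing
    is assumed to run elsewhere.
    """
    if known_identity is None:
        return False  # identity not yet known: normal handshake path
    if len(data) != IDENTITY_LEN:
        return False  # ordinary fragment, pass it on to the reassembler
    # Any 16-byte payload is consumed here, whether or not it matches the
    # known identity, so it never reaches the reassembler.
    return True
```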
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a BLE link degrades, 1-byte keepalives may still work while larger data
packets fail. Both sides think the connection is "alive" based on keepalives,
but data can't flow. This causes a deadlock where new connections are
rejected as "duplicates" even though the existing connection is non-functional.
This change adds zombie detection by tracking when real data (not keepalives)
was last received. If an existing connection has only exchanged keepalives
for > 30 seconds (configurable via _zombie_timeout), new connections from
the same identity are allowed and the zombie connection is disconnected.
Changes:
- Add _last_real_data dict to track last real data timestamp per identity
- Add _zombie_timeout (default 30s) for configurable zombie threshold
- Update _check_duplicate_identity with Check 3: zombie detection
- Update _handle_ble_data to track real data activity after keepalive filter
- Initialize tracking in _handle_identity_handshake and _spawn_peer_interface
- Clean up tracking in _process_pending_detaches
- Add comprehensive test suite for zombie detection
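
A minimal sketch of the tracking logic described above, assuming keepalives are exactly 1 byte long. `ZombieTracker` and its method names are hypothetical; in the real code the state lives on BLEInterface as `_last_real_data` and `_zombie_timeout`.

```python
import time
from typing import Dict, Optional

KEEPALIVE_LEN = 1  # the commit message describes keepalives as 1-byte packets


class ZombieTracker:
    """Hypothetical stand-in for the tracking state kept on BLEInterface."""

    def __init__(self, zombie_timeout: float = 30.0) -> None:
        self._zombie_timeout = zombie_timeout
        self._last_real_data: Dict[bytes, float] = {}

    def _now(self, now: Optional[float]) -> float:
        return time.monotonic() if now is None else now

    def start_tracking(self, identity: bytes, now: Optional[float] = None) -> None:
        # Called at handshake / interface spawn so a fresh link is not a zombie.
        self._last_real_data[identity] = self._now(now)

    def on_data(self, identity: bytes, data: bytes, now: Optional[float] = None) -> None:
        if len(data) == KEEPALIVE_LEN:
            return  # keepalives do not count as real data
        self._last_real_data[identity] = self._now(now)

    def is_zombie(self, identity: bytes, now: Optional[float] = None) -> bool:
        last = self._last_real_data.get(identity)
        if last is None:
            return False  # untracked identity: nothing to judge
        return self._now(now) - last > self._zombie_timeout

    def forget(self, identity: bytes) -> None:
        # Mirrors the cleanup the commit places in _process_pending_detaches.
        self._last_real_data.pop(identity, None)
```

The optional `now` argument exists only so tests can inject timestamps instead of sleeping.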
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a peer disconnects, identity_to_address is NOT cleaned up immediately -
it's only removed after a 2-second grace period. However, _check_duplicate_identity
was not checking if the existing address is still connected before rejecting.
This caused legitimate reconnections from the same identity (after MAC rotation
or reconnection) to be incorrectly rejected as "duplicates" during the grace
period or when cleanup was delayed.
The fix adds two checks before rejecting:
1. If pending_detach exists for this identity (old connection already gone)
2. If existing address is not in connected_peers or peers dict
Also adds TDD tests that demonstrate the bug and verify the fix.
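
The two checks can be sketched as a pure function. This is an assumption-laden illustration: `is_duplicate_identity` is a hypothetical name, and the `connected_peers` and `peers` lookups are collapsed into a single `connected_addresses` set.

```python
from typing import Dict, Set


def is_duplicate_identity(
    identity_hash: str,
    new_address: str,
    identity_to_address: Dict[str, str],
    pending_detach: Set[str],
    connected_addresses: Set[str],
) -> bool:
    """Return True only when a live connection already owns this identity."""
    existing = identity_to_address.get(identity_hash)
    if existing is None or existing == new_address:
        return False  # unknown identity, or the same address reconnecting
    if identity_hash in pending_detach:
        return False  # check 1: old connection is already gone
    if existing not in connected_addresses:
        return False  # check 2: mapping is stale, address not connected
    return True
```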
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add tests that exercise the actual code path in linux_bluetooth_driver.py
for duplicate identity exception handling. These tests patch BleakClient
to verify that:
- Duplicate identity exceptions are logged as WARNING, not ERROR
- on_error callback uses 'info' severity for duplicate identity errors
- Normal connection failures still use 'error' severity
This improves patch coverage for the duplicate identity handling fix
by testing the driver code directly rather than just the logic in isolation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix BLEInterface.handle_peripheral_data to use _compute_identity_hash
instead of RNS.Identity.full_hash for consistent identity hash computation
- Update MockBLEDriver.on_device_connected callback to match the
(address, peer_identity) signature in bluetooth_driver.py
- Fix test_v2_2_identity_handshake.py and test_v2_2_race_conditions.py
to properly mock ble_reticulum.Interface without breaking the namespace
- Use BLEFragmenter/BLEReassembler directly in tests instead of
non-existent _create_fragmenter/_create_reassembler methods
- Fix asyncio.get_event_loop() deprecation in test_ble_peer_interface.py
for Python 3.10+ compatibility
- Update MAC address test fixtures to account for v2.2 MAC sorting logic
- Fix test_peer_address_mac_rotation to properly simulate MAC rotation
where old connection is dropped before new one arrives
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Previously, _handle_identity_handshake (peripheral mode) did not check
for duplicate identities. If a peer connected via two MACs simultaneously,
both connections could be accepted.
Now, _handle_identity_handshake calls _check_duplicate_identity before
accepting the handshake. If the identity is already connected at a
different MAC, the new connection is rejected and disconnected.
This makes both central and peripheral modes consistent in rejecting
duplicate connections during MAC rotation overlap.
Also adds tests for peripheral mode duplicate rejection.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a peer connects with an identity already connected at a different
MAC address (Android MAC rotation), the connection is correctly rejected.
However, the error message format "Connection failed to {address}" was
matching the blacklist regex, causing the new MAC to be blacklisted.
After 3 duplicate rejections, the new MAC would be blacklisted for 60s+,
creating connectivity gaps when the old MAC finally disconnected.
Fix:
- Detect "Duplicate identity" in exception message
- Use severity "info" instead of "error" (doesn't trigger blacklist)
- Use safe message format "Duplicate identity rejected for {address}"
which doesn't match the blacklist regex pattern
Also adds comprehensive tests for MAC rotation blacklist behavior.
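
A sketch of the severity and message selection described above. The blacklist regex is not shown in this log, so `BLACKLIST_PATTERN` below is a hypothetical stand-in, and `classify_connect_error` is an illustrative name rather than the driver's actual function.

```python
import re
from typing import Tuple

# Hypothetical pattern; the real blacklist regex is not shown in this log.
BLACKLIST_PATTERN = re.compile(r"Connection failed to ([0-9A-F:]{17})")


def classify_connect_error(address: str, exc: Exception) -> Tuple[str, str]:
    """Pick the severity and message passed to the on_error callback."""
    if "Duplicate identity" in str(exc):
        # "info" severity and a message the blacklist regex cannot match.
        return "info", f"Duplicate identity rejected for {address}"
    return "error", f"Connection failed to {address}: {exc}"
```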
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test suite to verify that BLEPeerInterface.peer_address is properly
updated when BLE MAC address rotation occurs. Tests cover all 4 code paths:
- _address_changed_callback: primary path for address migration
- _mtu_negotiated_callback: when interface exists for identity at new address
- _handle_identity_handshake: when identity arrives at new address
- _spawn_peer_interface: when reusing interface for new address
Also includes tests for:
- Proper logging of address updates for debugging
- Consistency between peer_address and address_to_interface mapping
- Multiple consecutive MAC rotations
These tests prevent regression of the bidirectional BLE communication bug
where peripheral->central sends failed after MAC rotation.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add test coverage for new BLEInterface cleanup functionality:
- _cleanup_pending_identity_connections: timeout handling for non-Reticulum devices
- _process_pending_detaches: delayed interface detachment with grace period
- _validate_spawned_interfaces: orphaned interface cleanup
- Multi-address disconnect handling (keeping interface alive for MAC rotation)
- Thread-safe _spawn_peer_interface with locking
- Central disconnect callback with address fallback
These tests cover the 195 lines of new code in BLEInterface.py to improve
patch coverage for PR #35.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix mock_ble_driver.py import path (src/RNS/Interfaces -> src/ble_reticulum)
- Add address_to_interface, _pending_detach, _pending_detach_grace_period to test fixture
- Update test_identity_cache to expect grace period behavior (deferred cleanup)
- Update test_v2_2_mac_sorting to use renamed _cleanup_stale_address function
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous patch changed RNS.log in the test's namespace, but
BLEInterface.py imports RNS at module load time. To capture log
calls from _address_changed_callback(), we need to patch RNS.log
where it's used: ble_reticulum.BLEInterface.RNS.log
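
The difference can be shown without the real packages. In this sketch `module_under_test` is a hypothetical stand-in for ble_reticulum.BLEInterface and `real_rns` for the RNS module it imported; `mock.patch.object` on the module's own reference plays the role of patching `ble_reticulum.BLEInterface.RNS.log`.

```python
import types
from unittest import mock

# Stand-in for the RNS package that the module under test imported.
real_rns = types.SimpleNamespace(log=lambda msg: None)

# Stand-in for ble_reticulum.BLEInterface: it holds its own reference to RNS.
module_under_test = types.SimpleNamespace(RNS=real_rns)


def address_changed(old: str, new: str) -> None:
    # Production-style code: resolves RNS through its own module namespace.
    module_under_test.RNS.log(f"address changed {old} -> {new}")


# A mock that exists only in the test's namespace is never called.
test_local_rns = mock.MagicMock()
address_changed("AA", "BB")
missed = test_local_rns.log.call_count

# Patching the attribute on the object the module actually uses captures it.
with mock.patch.object(module_under_test.RNS, "log") as patched_log:
    address_changed("AA", "BB")
    captured = patched_log.call_count
```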
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The test was using mock_rns.log.assert_called() but the real
BLEInterface code calls the actual RNS.log function, not the mock.
Fixed by patching RNS.log at module level to capture actual log calls,
then asserting the "no identity found" warning was logged.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add 14 tests that exercise the REAL BLEInterface code:
- TestIdentityCacheOnDisconnect: verify caching on disconnect
- TestIdentityCacheOnDataReceive: verify cache restoration on data
- TestAddressChangedCallback: verify address migration
- TestCacheTTL: verify 60-second TTL behavior
- TestReassemblyCodePath: verify cache in reassembly path
- TestEdgeCases: concurrent access, multiple disconnects
Tests skip locally if RNS is not installed but run in CI to provide
actual line coverage on the identity cache changes.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add TestIdentityCacheIntegration class that imports and tests actual
BLEInterface methods instead of just mocking the logic. This should
provide codecov coverage on the changed lines.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update install.sh to copy from src/ble_reticulum/
- Update test files with new source paths
- Update GitHub workflows for new package structure
- Remove temporary refactoring helper scripts
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fixes namespace collision with Reticulum's own RNS.Interfaces package.
When both packages were installed, the collision caused import issues
and prevented BLE discovery between devices.
Changes:
- Rename src/RNS/Interfaces/ to src/ble_reticulum/
- Update pyproject.toml package configuration
- Update all imports in source and test files
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(ble): Increase D-Bus monitoring intervals to prevent HCI errors
The D-Bus monitoring threads were polling too frequently (0.5s and 30s),
causing HCI command collisions on BCM43xx single-radio chips. These chips
cannot handle concurrent BLE operations, and the frequent D-Bus activity
was interfering with scan/advertise cycles.
Changes:
- Increase D-Bus disconnect monitor interval from 0.5s to 5s
- Increase stale connection poll interval from 30s to 120s
This eliminates HCI errors (Opcode 0x2005/0x2006) while preserving
disconnect detection functionality with slightly higher latency.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* refactor(ble): Convert D-Bus monitoring to event-driven approach
Replace polling-based D-Bus monitoring with true event-driven pattern:
1. D-Bus monitor thread:
- Use asyncio.Event instead of periodic sleep
- Store loop reference for thread-safe shutdown
- Use call_soon_threadsafe to wake loop on stop
2. Stale poll thread:
- Replace busy-wait loop (240 x 0.5s) with single Event.wait()
- Increase interval from 120s to 300s (safety net only)
- Immediate response to stop signal
Benefits:
- Zero CPU usage while waiting (no periodic wakeups)
- Immediate shutdown response (ms instead of 5s)
- Cleaner, simpler code
- Maintains disconnect detection via D-Bus signals
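
A minimal sketch of the stale-poll half of this change, assuming a plain worker thread. `StalePoller` and `on_poll` are hypothetical names; the point is that a single `Event.wait(timeout)` serves as both the sleep and the stop signal.

```python
import threading
from typing import Callable, Optional


class StalePoller:
    """Hypothetical safety-net poller that stops immediately on request."""

    def __init__(self, interval: float = 300.0,
                 on_poll: Optional[Callable[[], None]] = None) -> None:
        self._interval = interval
        self._on_poll = on_poll or (lambda: None)
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._thread.start()

    def _run(self) -> None:
        # wait() returns True as soon as the event is set and False on
        # timeout, so the loop body runs once per interval until stopped.
        while not self._stop_event.wait(self._interval):
            self._on_poll()

    def stop(self, join_timeout: float = 2.0) -> None:
        self._stop_event.set()  # wakes the waiting thread right away
        self._thread.join(join_timeout)

    def is_running(self) -> bool:
        return self._thread.is_alive()
```

With the 300s interval from this commit, `stop()` still returns almost immediately because the waiting thread is woken by the event rather than by a timer.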
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* test(ble): Add comprehensive unit tests for HCI error fixes
Add 26 new unit tests for the event-driven D-Bus monitoring fixes
that eliminated HCI errors on BCM43xx single-radio chips.
Test coverage:
- TestEventDrivenDBusMonitor: Tests asyncio.Event usage, immediate
wake response, call_soon_threadsafe cross-thread signaling
- TestStalePollImprovements: Tests threading.Event.wait() usage,
300s interval, immediate stop response
- TestStopShutdownBehavior: Tests stop() async signaling, RuntimeError
handling, shutdown latency improvement
- TestIntegrationScenarios: Tests full lifecycle, multiple stop calls
- TestCodeVerification: Verifies actual source patterns match expected
All 26 tests pass without requiring pytest-asyncio plugin.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(tests): Use threading.RLock instead of asyncio.Lock in test fixtures
In Python 3.8/3.9, asyncio.Lock() binds to the current event loop when it is
constructed. When test_hci_error_fixes.py runs first (alphabetically) and uses
asyncio.run(), the event loop is closed and unset after each test. Subsequent
test fixtures that create asyncio.Lock() then fail with "no current event loop"
errors.
Since these are mock fixtures that don't need async semantics, use
threading.RLock() instead.
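
A sketch of the fixture shape this implies. `create_mock_interface`, `peers` and `peer_lock` are illustrative names, not the actual fixture attributes.

```python
import asyncio
import threading
from types import SimpleNamespace


def create_mock_interface() -> SimpleNamespace:
    """Hypothetical mock helper; the real fixtures carry more attributes."""
    return SimpleNamespace(
        peers={},
        # threading.RLock needs no event loop, so construction is safe even
        # after an earlier test's asyncio.run() has closed its loop.
        peer_lock=threading.RLock(),
    )


async def _noop() -> None:
    return None


# Simulate an earlier test that used asyncio.run(), then build the fixture.
asyncio.run(_noop())
iface = create_mock_interface()
with iface.peer_lock:
    with iface.peer_lock:  # re-entrant: nested acquisition does not deadlock
        iface.peers["AA:BB"] = "peer"
```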
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix(tests): Replace all asyncio.Lock() with threading.RLock() in test mocks
In Python 3.8/3.9, asyncio.Lock() binds to the current event loop when it is
constructed. When test files using asyncio.run() execute first, the event loop
is closed and unset, causing subsequent test fixtures to fail when creating
asyncio.Lock().
Fixed in:
- test_peripheral_disconnect_cleanup.py (mock_gatt_server fixture)
- test_bluez_state_cleanup.py (mock_driver fixture)
- test_ble_peer_interface.py (create_mock_peer_interface helper)
- conftest.py (create_mock_interface helper)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
---------
Co-authored-by: torlando-tech <torlando-tech@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- Add TestMACRotationBypassesSorting class with 4 tests for the MAC rotation fix
- Fix MockInterface to properly mock RNS.Interfaces.Interface module
- Add get_config_obj() to MockInterface for BLEInterface initialization
- Fix inverted test expectations in test_sequential_mac_addresses and
test_mac_sorting_with_multiple_peers (lower MAC initiates connection)
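
The corrected expectation (lower MAC initiates) can be sketched as follows. Comparing the addresses as integers is an assumption made for this illustration, and `mac_to_int` and `should_initiate` are hypothetical names; the actual v2.2 sorting code is not shown in this log.

```python
def mac_to_int(mac: str) -> int:
    """Convert 'AA:BB:CC:DD:EE:FF' to an integer for comparison."""
    return int(mac.replace(":", ""), 16)


def should_initiate(local_mac: str, peer_mac: str) -> bool:
    # The side with the lower MAC acts as central and opens the connection.
    return mac_to_int(local_mac) < mac_to_int(peer_mac)
```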
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reorder operations in handle_peripheral_data() to create
fragmenter/reassembler BEFORE spawning peer interface. This
prevents data from being dropped during the brief window when
the interface exists but the reassembler doesn't.
Also adds unit tests to verify the fix and prevent regression.
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes integration test failures where TestRealWorldScenario tests
couldn't access the mock_driver fixture.
The mock_driver fixture was defined inside TestPeripheralDisconnectCleanup
class, making it unavailable to TestRealWorldScenario class. This caused
pytest fixture lookup errors:
- test_both_monitoring_mechanisms_detect_disconnect_idempotent
- test_polling_catches_missed_dbus_signal
Solution: Move mock_driver to module level (outside class) so all test
classes can access it as a shared fixture.
All integration tests now pass locally.
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes three CI failures identified in workflow run #19395416465:
1. **Missing threading import** (test_peripheral_disconnect_cleanup.py)
- Added missing `import threading` to fix NameError during test setup
- Tests use threading.RLock() but import was missing
2. **Timing race condition** (test_stale_connection_polling.py)
- Increased sleep from 0.15s to 1.5s in test_polling_interval_30_seconds
- Test expects 2 polling cycles at 0.6s each, was timing out in CI
3. **Container-aware Bluetooth checks** (install.sh)
- Added is_container() helper to detect Docker/container environments
- Skip Bluetooth adapter power checks in containers (no hardware access)
- Prevents false failures from bluetoothctl crashes in CI environments
All changes are test/installer infrastructure only - no production code changes.
Co-Authored-By: Claude <noreply@anthropic.com>
Fix stale connection issue where identity mappings persist after disconnect,
preventing automatic reconnection when peer returns with different MAC address.
ROOT CAUSE:
- _device_disconnected_callback() cleaned up spawned_interfaces but NOT:
- address_to_identity mapping
- identity_to_address mapping
- handle_central_disconnected() had same issue
- Result: Laptop thinks it's still connected after Android restarts
- Manual rnsd restart required to clear stale state
THE FIX (TDD Approach):
1. RED: Wrote 5 tests demonstrating the bug (all FAILED initially)
2. GREEN: Added identity mapping cleanup to both disconnect methods
3. GREEN: All 5 tests now PASS
Changes:
- BLEInterface.py _device_disconnected_callback():
- Added del address_to_identity[address]
- Added del identity_to_address[identity_hash]
- BLEInterface.py handle_central_disconnected():
- Added del address_to_identity[address]
- Added del identity_to_address[identity_hash]
- linux_bluetooth_driver.py:
- Added RNS warning handler for better logging
- tests/test_identity_mapping_cleanup.py (NEW):
- 5 tests verifying identity mapping cleanup
- Tests both central and peripheral disconnect modes
- Reproduces real-world stale connection scenario
- Verifies automatic reconnection after fix
Test Results:
✅ All 5 tests PASS after fix
✅ Mappings properly cleaned up on disconnect
✅ Automatic reconnection enabled
Impact:
- No more manual rnsd restart needed
- Android MAC rotation handled correctly
- Stale connections automatically cleaned up
- Reconnection works without intervention
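
A sketch of the mapping cleanup, written as a standalone function. The commit uses plain `del` statements inside the two disconnect methods; the `pop` with a guard on the reverse mapping below is a defensive variation chosen for this illustration, and `cleanup_identity_mappings` is a hypothetical name.

```python
from typing import Dict


def cleanup_identity_mappings(
    address: str,
    address_to_identity: Dict[str, bytes],
    identity_to_address: Dict[bytes, str],
) -> None:
    """Drop both directions of the mapping for a disconnected address."""
    identity = address_to_identity.pop(address, None)
    if identity is None:
        return  # nothing was mapped for this address
    # Only remove the reverse entry if it still points at this address, so a
    # peer that already returned under a new MAC keeps its mapping.
    if identity_to_address.get(identity) == address:
        del identity_to_address[identity]
```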
Co-Authored-By: Claude <noreply@anthropic.com>
The original D-Bus monitoring implementation (from peripheral disconnect fix)
wasn't receiving signals due to improper low-level API usage. This commit
replaces it with two reliable solutions:
Solution A: High-Level ObjectManager API
- Uses proper D-Bus proxy interface with automatic signal subscription
- Discovers and subscribes to all BlueZ devices (existing + new)
- PropertiesChanged callbacks properly integrated with asyncio event loop
- Signals now correctly delivered when centrals disconnect
Solution B: Timeout-Based Polling Fallback
- Polls BlueZ device state every 30 seconds as safety net
- Detects stale connections missed by D-Bus signals
- Uses sync dbus-python for simplicity and reliability
- Guaranteed cleanup within 30s even if signals fail
Implementation:
- Replaced _monitor_device_disconnections() with ObjectManager-based approach
- Added _poll_stale_connections() as polling fallback
- Both threads run concurrently for dual-layer monitoring
- Cleanup is idempotent (both detecting same disconnect is safe)
Testing:
- Added test_dbus_disconnect_monitoring.py (10 test cases)
- Added test_stale_connection_polling.py (8 test cases)
- Added 2 integration tests to test_peripheral_disconnect_cleanup.py
- All tests mock D-Bus libraries, no real D-Bus required
- Manual validation script (test_monitoring.py) verified locally
Impact:
- Peripheral disconnects now detected within ~1s (D-Bus) or 30s max (polling)
- Prevents "max peers (7) reached" blocking after multiple disconnect cycles
- System can handle unlimited connect/disconnect cycles without memory leaks
Reference: DBUS_MONITORING_FIX.md for complete analysis and troubleshooting
Co-Authored-By: Claude <noreply@anthropic.com>
Fixes a critical bug where Android devices (acting as BLE centrals) disconnecting
from Pi GATT servers (acting as peripherals) never triggered cleanup, causing stale
peer entries to accumulate until the 7-peer limit was reached and blocked all new
connections.
## Root Cause
- When centrals disconnected from peripheral mode, no cleanup occurred
- `BLEGATTServer._handle_central_disconnected()` method didn't exist
- `on_central_disconnected` callback was never wired to driver
- No D-Bus signal monitoring for device disconnections
- Stale entries remained in `_peers` dict until daemon restart
## Implementation (TDD Approach)
**New Methods:**
- `LinuxBluetoothDriver._handle_peripheral_disconnected()` (line 852)
- Removes peer from `_peers` dictionary
- Notifies on_device_disconnected callback
- Triggers full cleanup chain in BLEInterface
- `BluezeroGATTServer._handle_central_disconnected()` (line 1945)
- Removes from `connected_centrals` dictionary
- Logs connection duration
- Invokes driver callback
- `BluezeroGATTServer._monitor_device_disconnections()` (line 1645)
- Monitors D-Bus PropertiesChanged signals
- Detects when Connected property becomes False
- Runs in separate daemon thread
- Automatically triggers cleanup on disconnect
**Callback Wiring:** (line 1558)
`on_central_disconnected = driver._handle_peripheral_disconnected`
## Testing
- Created comprehensive test suite (9 tests, all passing)
- `tests/test_peripheral_disconnect_cleanup.py`:
- Callback wiring verification
- Peer dictionary cleanup
- D-Bus signal handling simulation
- Edge cases (multiple disconnects, race conditions, shutdown)
- Reproduces real-world bug from production logs
- No regressions in existing tests (test_bluez_state_cleanup.py passes)
## Current Status
✅ Core cleanup logic implemented and tested
✅ Deployed to 4 production devices (10.0.0.80, .242, .39, .246)
⚠️ D-Bus monitoring thread needs debugging (not logging yet)
**Known Issue:** D-Bus signal subscription may need alternative approach.
See PERIPHERAL_DISCONNECT_FIX_SUMMARY.md for troubleshooting steps.
**Fallback Option:** Timeout-based polling can be implemented if D-Bus proves difficult.
Reference: Production logs showed device 4A:87:8C:C7:E3:F3 repeatedly blocked
by "max peers (7) reached" due to uncleaned peripheral disconnections.
Co-Authored-By: Claude <noreply@anthropic.com>
Fixed race condition where started_event fires before peripheral.publish()
fully exports GATT services to D-Bus, causing "Reticulum service not found"
errors when central devices connect immediately after server startup.
Root cause:
- started_event.set() called on line 1665
- peripheral_obj.publish() called on line 1669 (exports to D-Bus)
- 50-200ms gap where server thinks it's ready but services aren't on D-Bus yet
- Central connects during gap -> "service not found" error
Fix:
- Added _verify_services_on_dbus() method to poll D-Bus adapter introspection
- Polls every 200ms with 5-second timeout after started_event fires
- Only returns from start() after confirming services are exported
- Graceful degradation: warns on timeout but doesn't fail startup
Impact:
- Eliminates "service not found" errors during server startup
- Ensures services are actually available before accepting connections
- Typical verification time: 100-300ms
- No runtime performance impact (only affects startup)
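
The poll-with-timeout behaviour can be sketched generically. `wait_until` and the `predicate` callable are hypothetical; in the real fix the check is D-Bus introspection inside `_verify_services_on_dbus()`, which is not reproduced here.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout: float = 5.0,
               interval: float = 0.2) -> bool:
    """Poll `predicate` every `interval` seconds until it is true.

    Returns True on success and False once `timeout` seconds have passed,
    leaving it to the caller to warn and continue (graceful degradation).
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

A caller would log a warning when this returns False and continue startup, matching the "warns on timeout but doesn't fail startup" behaviour described above.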
Files changed:
- src/RNS/Interfaces/linux_bluetooth_driver.py: Add D-Bus polling
- tests/test_gatt_server_readiness.py: Add test coverage
- BLE_PROTOCOL_v2.2.md: Document initialization race fix
- CHANGELOG.md: Record fix details
Co-Authored-By: Claude <noreply@anthropic.com>
The ConnectDevice() D-Bus method returns an object path (signature 'o'), which
should be treated as success, not error. Previously, the return value was
not captured or logged, causing confusion when error messages like
"br-connection-profile-unavailable" appeared (which is expected for LE-only
connections).
Changes:
- Capture object path returned by call_connect_device()
- Log object path for debugging visibility
- Document that object path indicates successful LE connection initiation
- Clarify that BR/EDR profile unavailable is expected for BLE-only connections
Impact:
- Eliminates confusion from "profile unavailable" error messages
- Confirms LE connection was successfully initiated
- Improved debugging visibility through object path logging
Files changed:
- src/RNS/Interfaces/linux_bluetooth_driver.py: Capture and log object path
- tests/test_breddr_fallback_prevention.py: Add test coverage
- BLE_PROTOCOL_v2.2.md: Document object path behavior
- CHANGELOG.md: Record fix details
Co-Authored-By: Claude <noreply@anthropic.com>
Implements comprehensive BlueZ device state cleanup after connection failures
to prevent persistent "Operation already in progress" errors. This addresses
the issue where BlueZ maintains stale connection state after timeouts or
failures, preventing successful reconnection even after blacklist periods expire.
BlueZ State Cleanup Implementation:
- **Explicit client disconnect**: Call client.disconnect() in timeout and failure
exception handlers to release BlueZ resources
- **D-Bus device removal**: New _remove_bluez_device() method removes stale device
objects via BlueZ RemoveDevice() API
- **Post-blacklist cleanup**: Trigger BlueZ cleanup when peer is blacklisted after
reaching max_connection_failures (7 failures)
Impact:
- Enables successful reconnection after temporary connection failures
- Fixes persistent errors across blacklist periods
- Prevents BlueZ from maintaining corrupted connection state
- Particularly important for Android devices with MAC address rotation
Implementation Details:
- linux_bluetooth_driver.py:786-830: New _remove_bluez_device() method
- linux_bluetooth_driver.py:1029-1044: Timeout cleanup (disconnect + removal)
- linux_bluetooth_driver.py:1051-1066: Failure cleanup (disconnect + removal)
- BLEInterface.py:1270-1285: Post-blacklist cleanup hook
- tests/test_bluez_state_cleanup.py: 10 new tests (all passing)
Documentation Updates:
- BLE_PROTOCOL_v2.2.md: New troubleshooting section for persistent InProgress errors
- CLAUDE.md: Added to recent fixes list
- CHANGELOG.md: Comprehensive fix description
Related Issues:
- Addresses "Operation already in progress" errors persisting after connection timeouts
- Fixes reconnection failures after peer blacklisting
- Prevents BlueZ state machine corruption from abandoned BleakClient instances
Testing:
- All 10 new unit tests pass
- Cleanup methods properly handle missing devices and D-Bus unavailability
- Integration testing on Raspberry Pi pending
Co-Authored-By: Claude <noreply@anthropic.com>
Removed test_refactor_suite.py as it is completely superseded by the
comprehensive test suite.
Reasons for removal:
- Broken: Import errors, cannot run
- Incomplete: Contains TODO comments, no actual assertions
- Overlapped: Functionality covered by test_multi_device_simulation.py
- Inferior: 1 broken test vs 20 passing comprehensive tests
- Wrong approach: Tried to run real BLE instances instead of using mocks
- Already excluded: Ignored in CI via --ignore flag
The multi_device_simulation test suite provides superior coverage:
- MockBLEComponents (5 tests)
- SimulatedBLENode (3 tests)
- TwoDeviceSimulator (6 tests)
- IntegrationScenarios (4 tests)
- Performance (2 tests)
This was leftover scaffolding from the driver abstraction refactor.
Co-Authored-By: Claude <noreply@anthropic.com>
Updated tests to reflect the new driver-based architecture where GATT
server and connection management are handled by the driver layer instead
of directly in BLEInterface.
Changes:
- test_integration.py: Updated to check for driver callbacks instead of
old GATT server methods (_data_received_callback vs on_data_received)
- test_integration.py: Added test for driver abstraction layer
- test_prioritization.py: Updated to check for driver.connect() instead
of removed _connect_to_peer() method
All 106 tests now pass (excluding test_refactor_suite.py which has
import issues and appears to be obsolete).
Co-Authored-By: Claude <noreply@anthropic.com>
Only install libffi-dev on armhf (32-bit ARM) systems where cffi needs
to compile from source. x86_64 and arm64 have pre-built cffi wheels
available, so they don't need the development headers.
Changes:
- install.sh: Detect architecture and conditionally add libffi-dev for armhf
- test_installer.sh: Show libffi-dev in output only for armhf systems
- test.yml: Update ARM CI summary to reflect conditional dependency
This reduces unnecessary dependencies on x86_64 and arm64 systems while
maintaining full compatibility with 32-bit Raspberry Pi devices.
Co-Authored-By: Claude <noreply@anthropic.com>
Add libffi-dev to system dependencies for Debian/Ubuntu/Raspberry Pi OS
to provide FFI headers needed when cffi compiles from source on ARM
platforms. This fixes ARM 32-bit and 64-bit installation failures.
Co-Authored-By: Claude <noreply@anthropic.com>
The test script was failing because it changed directories during execution
(to /tmp/test-config/interfaces) and then tried to use a relative path
to navigate back to the repo root, which failed.
Fix: Save the absolute path to repo root at the beginning and reuse it
when needed instead of calculating it relative to the current directory.
Fixes the error: 'cd: tests/..: No such file or directory'
Fixes #3
BlueZ experimental mode is required for proper BLE connectivity. Without it,
BlueZ attempts Classic Bluetooth (BR/EDR) connections instead of BLE (LE)
connections, causing connection errors like "br-connection-profile-unavailable"
and immediate disconnections after pairing.
Changes:
- install.sh: Automatically enables BlueZ experimental mode during installation
- Detects BlueZ version (requires >= 5.49)
- Creates systemd override to add -E flag to bluetoothd
- Checks if already enabled to avoid duplicate configuration
- Shows strong warning if user skips with --skip-experimental flag
- Added --skip-experimental flag to opt-out (not recommended)
- Updated help text to document new flag
- tests/test_installer.sh: Added tests for experimental mode configuration
- README.md: Documented BlueZ experimental mode in installation sections
- Added to automated installation description
- Added as required step in manual installation
- Added troubleshooting section for BR/EDR connection errors
- examples/config_example.toml: Added troubleshooting entry for BR/EDR errors
The installer now:
1. Detects BlueZ version >= 5.49 (required for experimental mode)
2. Checks if already enabled (graceful skip)
3. Enables experimental mode by default unless --skip-experimental is used
4. Shows prominent warning if skipped (may cause BLE to break)
5. Handles edge cases (no systemd, old BlueZ, container environments)
This addresses the root cause reported in issue #3 where devices were
connecting then immediately disconnecting with BR/EDR profile errors.
Co-Authored-By: Claude <noreply@anthropic.com>
Arch Linux has PyGObject 3.54.5 in python-gobject package, but bluezero
requires PyGObject <3.52.0, causing pip to fail when trying to replace
the system version.
Solution: Don't install python-gobject system package on Arch. Let pip
compile the compatible PyGObject version (3.50.2) instead.
Changes:
- install.sh: Remove python-gobject from Arch pacman install
- install.sh: Add explanatory warning about PyGObject compilation
- tests/test_installer.sh: Don't check for python-gobject on Arch
- tests/test_installer.sh: Add comment explaining why it's skipped
- tests/test_installer.sh: Update summary for Arch (PyGObject compiled)
- README.md: Remove python-gobject from Arch instructions
- README.md: Explain version incompatibility and compilation requirement
Result:
- Debian/Ubuntu: All system packages, zero compilation (~1 min)
- Arch Linux: System packages + PyGObject compilation (~2-3 min)
Trade-off accepted: Arch users get longer install time in exchange for
compatibility with bluezero's PyGObject version requirement.
Fixes: error: uninstall-no-record-file (PyGObject 3.54.5 conflict)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Arch Linux has unique pip/system package integration where pip doesn't
recognize system python-gobject as satisfying PyGObject dependency,
causing bluezero to try compiling PyGObject from source.
Solution: Install base-devel on Arch to provide build tools (gcc, make, meson)
Changes:
- install.sh: Add base-devel to Arch system dependencies
- install.sh: Add note explaining why build tools needed on Arch
- install.sh: Use --needed flag to skip already installed packages
- README.md: Document base-devel requirement for Arch users
- README.md: Explain Arch vs Debian/Ubuntu compilation differences
- tests/test_installer.sh: Expect build tools on Arch (verify base-devel installed)
- tests/test_installer.sh: Update summary to reflect Arch compilation
Rationale:
- AUR python-bluezero is outdated (v0.9.0 vs pip v0.9.1)
- AUR package has 0 votes (rarely used by community)
- base-devel commonly installed on Arch systems anyway
- Keeps latest bluezero version
- Simpler than full AUR integration
Impact:
- Debian/Ubuntu: No compilation (< 1 min install)
- Arch Linux: Some compilation (~3 min install)
- Still faster than compiling everything on Debian
Fixes Arch Linux CI failure: "Unknown compiler(s): gcc not found"
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Fix two issues preventing installer tests from passing:
1. Arch Linux: Sync package database before installing packages
   - Fresh Arch containers have no package database (core, extra)
   - Added pacman -Sy before pacman -S in both basic prereqs and system deps
   - Error was: "warning: database file for 'core' does not exist"
   - Applied to both root and non-root installation paths
2. Debian/Ubuntu: Fix package check pattern for architecture suffixes
   - dpkg shows packages as "python3-cairo:amd64" not "python3-cairo "
   - Changed grep pattern from "^ii $pkg " to "^ii $pkg"
   - Now matches packages with or without :amd64/:arm64 suffixes
   - Error was: "FAIL: python3-cairo not installed" (even though it was)
Changes:
- install.sh lines 132-134, 233-234: Add pacman -Sy sync before install
- tests/test_installer.sh line 41: Fix dpkg grep pattern
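To illustrate the architecture-suffix problem, here is a small sketch that
runs against sample text instead of a live dpkg database. It deliberately uses
a stricter regex than the prefix pattern in this commit, because a pure prefix
match on "python3-gi" would also accept "python3-gi-cairo". The function name
and the sample listing are hypothetical, and package names containing regex
metacharacters (such as "+") would need escaping.

```shell
# Succeeds when package $1 is marked installed ("ii") in `dpkg -l`-style
# text read from stdin, with or without an architecture suffix.
is_installed_in_listing() {
    grep -Eq "^ii +$1(:[a-z0-9-]+)? "
}

# Sample lines shaped like `dpkg -l` output (illustrative, not captured).
listing='ii  bluez                5.66-1    amd64  Bluetooth tools and daemons
ii  python3-cairo:amd64  1.20.1-5  amd64  Python3 bindings for Cairo
ii  python3-gi-cairo     3.42.2-3  amd64  Python 3 Cairo bindings'

for pkg in bluez python3-cairo python3-gi; do
    if printf '%s\n' "$listing" | is_installed_in_listing "$pkg"; then
        echo "PASS: $pkg installed"
    else
        echo "FAIL: $pkg not installed"
    fi
done
```

In this sample python3-gi is correctly reported as missing, even though a
package whose name starts with the same prefix is present.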
This allows all 5 OS versions to pass:
- Debian 12 (Bookworm)
- Debian Trixie (testing)
- Ubuntu 22.04 LTS
- Ubuntu 24.04 LTS
- Arch Linux (rolling) [NEW]
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive Arch Linux testing to installer-test job.
Changes to .github/workflows/test.yml:
- Add archlinux:latest to test matrix (5 OS versions tested now)
- Set continue-on-error for Arch (rolling release can expose bleeding-edge issues)
- Arch tests run in parallel with Debian/Ubuntu tests
Changes to tests/test_installer.sh:
- Refactored to be OS-agnostic (supports Debian/Ubuntu AND Arch Linux)
- Added OS type detection (apt-get vs pacman)
- Added check_package() helper function (uses dpkg or pacman based on OS)
- Conditional Debian environment setup (DEBIAN_FRONTEND only for Debian/Ubuntu)
- OS-specific package name verification:
  - Debian/Ubuntu: python3-gi, python3-dbus, python3-cairo, bluez
  - Arch Linux: python-gobject, python-dbus, python-cairo, bluez, bluez-utils
- OS-specific build tool checks (dpkg -l vs pacman -Q)
- Updated summary output to show correct packages per OS
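The OS detection and per-OS package check listed above could look roughly like
the sketch below. It is an assumption about the shape of the logic, not a copy
of tests/test_installer.sh. The function names are hypothetical, and the
second function only prints the command it would run so the sketch works on
any system.

```shell
# Prints the package manager family, based on which tool is on PATH.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo apt
    elif command -v pacman >/dev/null 2>&1; then
        echo pacman
    else
        echo unknown
    fi
}

# Prints the command that would check package $2 under manager $1.
# Fails for managers the sketch does not know about.
check_command_for() {
    case "$1" in
        apt)    echo "dpkg -l $2" ;;
        pacman) echo "pacman -Q $2" ;;
        *)      return 1 ;;
    esac
}

manager=$(detect_pkg_manager)
echo "detected: $manager"
check_command_for "$manager" bluez || echo "no package check for '$manager'"
```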
install.sh changes:
- NONE - Arch Linux support already complete and correct!
CI Matrix now tests:
- Debian 12 (Bookworm - current stable)
- Debian Trixie (testing - next release) [non-blocking]
- Ubuntu 22.04 LTS (Jammy)
- Ubuntu 24.04 LTS (Noble)
- Arch Linux (rolling release) [non-blocking] [NEW]
Benefits:
- Validates install.sh Arch support works in practice
- Tests with newer BlueZ/Python versions (rolling release)
- Forward compatibility testing
- Broader Linux distribution coverage
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
bleak doesn't always expose a __version__ attribute, which caused test failures.
The test now only verifies that the module can be imported successfully.
Fixes: AttributeError: module 'bleak' has no attribute '__version__'
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
Major architectural improvement: install.sh now handles all prerequisites,
eliminating duplicate logic and making CI test exactly what users run.
## Changes to install.sh:
**1. Added pip_install() helper function (lines 37-49)**
- Detects pip version capabilities
- Uses --break-system-packages flag on pip 23.0+ (Debian 12+, Ubuntu 24.04+)
- Falls back to no flag on pip 22.x (Ubuntu 22.04)
- Single source of truth for all pip operations
- Fixes compatibility across all OS versions
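A hedged sketch of how such a helper might pick its flags. The name
pip_install comes from this commit, but the body below is an assumption, and
pip_flags_for_version is a hypothetical helper introduced only to make the
version rule easy to test. The threshold (major version 23) follows the
description above.

```shell
# Prints the extra pip flags for pip version string $1 (e.g. "23.0.1").
# Prints nothing for pip 22.x and older.
pip_flags_for_version() {
    major=${1%%.*}
    if [ "$major" -ge 23 ]; then
        echo "--break-system-packages"
    fi
}

# Assumed wrapper shape: forwards its arguments to pip with the right flags.
pip_install() {
    # `pip --version` prints "pip X.Y.Z from ..."; take the second field.
    version=$(python3 -m pip --version | awk '{print $2}')
    # The flag list is left unquoted on purpose so an empty result vanishes.
    python3 -m pip install $(pip_flags_for_version "$version") "$@"
}

# Show the decision for the pip versions named in this commit.
for v in 22.0.2 23.0.1 24.0; do
    echo "pip $v -> '$(pip_flags_for_version "$v")'"
done
```

Keeping the version rule in its own function means the decision can be checked
without pip being installed at all.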
**2. Added basic system package installation (lines 91-128)**
- Checks and installs: python3, python3-pip, git, sudo
- Supports both Debian/Ubuntu (apt-get) and Arch (pacman)
- Only installs missing packages (idempotent)
**3. Changed Reticulum check to auto-install (lines 171-190)**
- Previously: exited with error if Reticulum not found
- Now: automatically installs Reticulum using pip_install()
- Verifies installation succeeded
- Falls back to manual instructions if auto-install fails
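The check, install, verify, fall back sequence above can be sketched as a
small generic function. Everything here is hypothetical: ensure_installed and
the fake_* stand-ins are illustration only. A real installer might check with
python3 -c "import RNS" and install through its pip helper.

```shell
# Runs check function $1; if it fails, runs install function $2 and
# re-checks. Returns non-zero when the package is still missing afterwards.
ensure_installed() {
    if "$1"; then
        echo "already installed"
        return 0
    fi
    "$2" || echo "install command reported failure" >&2
    if "$1"; then
        echo "installed"
        return 0
    fi
    echo "auto-install failed; install manually and re-run" >&2
    return 1
}

# Stand-ins so the sketch runs anywhere without touching the system.
installed=0
fake_check()   { [ "$installed" -eq 1 ]; }
fake_install() { installed=1; }

ensure_installed fake_check fake_install   # first call installs
ensure_installed fake_check fake_install   # second call finds it present
```

Verifying after the install step, instead of trusting the installer's exit
code, is what lets the function fall back to manual instructions reliably.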
**4. Updated all pip install commands to use helper (lines 242, 251)**
- Consistent --break-system-packages handling
- Works on Ubuntu 22.04, Debian 12, Trixie, Ubuntu 24.04
**5. Updated header comment**
- Reflects that script is now self-contained
- Documents all responsibilities
## Changes to tests/test_installer.sh:
**Only one line shorter (127 to 126 lines), but the content changed substantially:**
**Removed (no longer needed):**
- Manual apt-get install of base packages
- Manual pip install of Reticulum
- Duplicate pip compatibility logic
**Kept:**
- Non-interactive environment setup
- Verification tests
- BLE interface import test
**Added:**
- Reticulum verification check
- Updated summary to reflect self-contained nature
## Benefits:
1. ✅ **Single source of truth** - No duplicate pip logic
2. ✅ **CI tests real workflow** - Exactly what users run
3. ✅ **Better user experience** - One command does everything
4. ✅ **Cross-version compatibility** - Works on all OS/pip versions
5. ✅ **Easier maintenance** - Changes in one place
6. ✅ **Self-contained** - install.sh installs its own prerequisites
## Testing:
Works across all CI matrix OS versions:
- Ubuntu 22.04 (pip 22.0.2 - no --break-system-packages)
- Debian 12 (pip 23.0+ - requires --break-system-packages)
- Debian Trixie (pip 23.0+ - requires --break-system-packages)
- Ubuntu 24.04 (pip 24.0+ - supports --break-system-packages)
Fixes #4
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>