diff --git a/BLE_PROTOCOL_v0.3.0.md b/BLE_PROTOCOL_v0.3.0.md
new file mode 100644
index 0000000..a45fab4
--- /dev/null
+++ b/BLE_PROTOCOL_v0.3.0.md
@@ -0,0 +1,230 @@
+# BLE-Reticulum Protocol Specification v0.3.0
+
+**Version**: 0.3.0
+**Date**: December 2025
+**Status**: Draft
+**Backwards Compatible With**: v2.2
+
+## 1. Overview
+
+This document specifies the v0.3.0 extension to the BLE-Reticulum protocol. This version adds **capability advertisement** to support devices that can only operate in peripheral mode (e.g., ESP32-S3).
+
+### 1.1 Problem Statement
+
+The v2.2 protocol uses MAC address sorting to determine connection direction: the device with the numerically lower MAC address initiates the connection (acts as BLE central). However, some hardware platforms (notably ESP32-S3) cannot reliably operate as BLE central due to stack limitations.
+
+When such a device has a lower MAC address than a peer, neither device initiates a connection:
+- The peripheral-only device cannot initiate (stack limitation)
+- The peer waits for the lower-MAC device to initiate (per v2.2 protocol)
+
+### 1.2 Solution
+
+v0.3.0 introduces **capability flags** in the advertising packet via BLE manufacturer-specific data. Devices advertise their role capabilities, allowing the connection direction logic to be overridden when one device is peripheral-only.
+
+## 2. Manufacturer-Specific Data Format
+
+### 2.1 Advertising Data Structure
+
+v0.3.0 devices include manufacturer-specific data in their advertising packet:
+
+```
+AD Type: 0xFF (Manufacturer Specific Data)
+Length: 5 (1 type byte + 4 data bytes; the length byte itself is not counted)
+
+Data Format (4 bytes):
+┌─────────┬─────────┬─────────┬─────────┐
+│ Byte 0  │ Byte 1  │ Byte 2  │ Byte 3  │
+├─────────┼─────────┼─────────┼─────────┤
+│ CID Low │ CID High│ Version │ Flags   │
+└─────────┴─────────┴─────────┴─────────┘
+
+CID (Bytes 0-1): Company ID, little-endian
+  0xFFFF = Reserved for testing (Bluetooth SIG)
+
+Version (Byte 2): Protocol version
+  0x03 = v0.3.0
+
+Flags (Byte 3): Capability flags
+  Bit 0: PERIPHERAL_ONLY (1 = cannot act as central)
+  Bit 1: Reserved (CENTRAL_ONLY, future use)
+  Bits 2-7: Reserved (must be 0)
+```
+
+### 2.2 Example Values
+
+| Device Type | CID | Version | Flags | Raw Bytes |
+|-------------|-----|---------|-------|-----------|
+| Dual-mode (full capability) | 0xFFFF | 0x03 | 0x00 | `FF FF 03 00` |
+| Peripheral-only (ESP32-S3) | 0xFFFF | 0x03 | 0x01 | `FF FF 03 01` |
+
+### 2.3 Advertising Packet Layout
+
+The v0.3.0 advertising packet extends v2.2:
+
+```
+Main Advertising Packet (31 bytes max):
+├── Flags (3 bytes)
+├── Complete 128-bit Service UUID (18 bytes)
+│   └── 37145b00-442d-4a94-917f-8f42c5da28e3
+├── Manufacturer Data (6 bytes)  ← NEW in v0.3.0
+│   ├── Length + AD Type 0xFF (2 bytes)
+│   └── Data (4 bytes): CID + Version + Flags
+└── Remaining: 4 bytes available
+
+Scan Response Packet (31 bytes max):
+└── Device Name: "RNS-{identity}" (variable)
+```
+
+## 3. Connection Direction Logic
+
+### 3.1 Decision Algorithm
+
+```
+FUNCTION shouldInitiateConnection(local_device, peer_device):
+
+    local_peripheral_only = local_device.flags & PERIPHERAL_ONLY
+    peer_peripheral_only = peer_device.flags & PERIPHERAL_ONLY
+
+    # Case 1: Peer is peripheral-only, we are not
+    IF peer_peripheral_only AND NOT local_peripheral_only:
+        RETURN TRUE  # We MUST initiate (peer cannot)
+
+    # Case 2: We are peripheral-only, peer is not
+    IF local_peripheral_only AND NOT peer_peripheral_only:
+        RETURN FALSE  # Peer MUST initiate (we cannot)
+
+    # Case 3: Both are peripheral-only (deadlock)
+    IF peer_peripheral_only AND local_peripheral_only:
+        LOG_WARNING("Both devices peripheral-only, connection impossible")
+        RETURN FALSE  # Neither can initiate
+
+    # Case 4: Both have full capability - use v2.2 MAC sorting
+    RETURN local_device.mac < peer_device.mac
+```
+
+### 3.2 Capability Detection
+
+When a device does not advertise manufacturer data (v2.2 device):
+- Assume `has_capability_data = false`
+- Assume `capability_flags = 0x00` (full capability)
+- Fall back to v2.2 MAC sorting
+
+When manufacturer data is present:
+- Verify Company ID = 0xFFFF
+- Verify Version >= 0x03
+- Extract capability flags from byte 3
+
+## 4. Backwards Compatibility
+
+### 4.1 Compatibility Matrix
+
+| Our Device | Peer Device | Connection Decision | Result |
+|------------|-------------|---------------------|--------|
+| v0.3.0 dual | v0.3.0 dual | MAC sorting | Works |
+| v0.3.0 dual | v0.3.0 P-only | We initiate | Works |
+| v0.3.0 P-only | v0.3.0 dual | Peer initiates | Works |
+| v0.3.0 dual | v0.2.x | MAC sorting | Works |
+| v0.3.0 P-only | v0.2.x (lower MAC) | v0.2.x initiates | Works |
+| v0.3.0 P-only | v0.2.x (higher MAC) | Neither initiates | **Fails** |
+| v0.3.0 P-only | v0.3.0 P-only | Neither initiates | **Fails** |
+
+### 4.2 Known Limitations
+
+1. **v0.3.0 peripheral-only ↔ v0.2.x with higher MAC**: No connection possible. The v0.2.x device uses MAC sorting and waits for the lower-MAC device (the v0.3.0 P-only) to initiate.
+
+   **Mitigation**: Upgrade the v0.2.x device to v0.3.0.
+
+2. **Two peripheral-only devices**: Connection impossible as neither can initiate.
+
+   **Mitigation**: Ensure at least one device in the mesh has full capability.
+
+## 5. GATT Service (Unchanged from v2.2)
+
+The GATT service structure remains unchanged:
+
+```
+Reticulum Service: 37145b00-442d-4a94-917f-8f42c5da28e3
+├── RX Characteristic: 37145b00-442d-4a94-917f-8f42c5da28e5
+│   └── Properties: WRITE, WRITE_WITHOUT_RESPONSE
+├── TX Characteristic: 37145b00-442d-4a94-917f-8f42c5da28e4
+│   ├── Properties: READ, NOTIFY
+│   └── CCCD: 00002902-0000-1000-8000-00805f9b34fb
+└── Identity Characteristic: 37145b00-442d-4a94-917f-8f42c5da28e6
+    ├── Properties: READ
+    └── Value: 16-byte identity hash
+```
+
+## 6. Implementation Notes
+
+### 6.1 NimBLE (ESP32)
+
+```cpp
+// Setting manufacturer data
+uint8_t mfr_data[4] = {0xFF, 0xFF, 0x03, static_cast<uint8_t>(peripheral_only ? 0x01 : 0x00)};
+advertising->setManufacturerData(mfr_data, sizeof(mfr_data));
+
+// Parsing manufacturer data
+if (device->haveManufacturerData()) {
+    std::string data = device->getManufacturerData();
+    if (data.size() >= 4) {
+        uint16_t cid = (uint8_t)data[0] | ((uint8_t)data[1] << 8);
+        if (cid == 0xFFFF && (uint8_t)data[2] >= 0x03) {
+            capability_flags = (uint8_t)data[3];
+        }
+    }
+}
+```
+
+### 6.2 Android
+
+```kotlin
+// Setting manufacturer data (note: Android API excludes CID in the byte array)
+val mfrData = byteArrayOf(0x03.toByte(), flags.toByte())
+advertiseData.addManufacturerData(0xFFFF, mfrData)
+
+// Parsing manufacturer data
+val mfrData = scanRecord.getManufacturerSpecificData(0xFFFF)
+if (mfrData != null && mfrData.size >= 2 && (mfrData[0].toInt() and 0xFF) >= 0x03) {
+    capabilityFlags = mfrData[1].toInt() and 0xFF
+}
+```
+
+### 6.3 Python (BlueZ/Bleak)
+
+```python
+# Setting manufacturer data in advertisement
+# BlueZ uses D-Bus ManufacturerData property
+manufacturer_data = {0xFFFF: bytes([0x03, 0x01 if peripheral_only else 0x00])}
+
+# Parsing manufacturer data from scan (Bleak AdvertisementData; BLEDevice.metadata is deprecated)
+mfr_data = advertisement_data.manufacturer_data.get(0xFFFF)
+if mfr_data and len(mfr_data) >= 2 and mfr_data[0] >= 0x03:
+    capability_flags = mfr_data[1]
+```
+
+## 7. Future Extensions
+
+Reserved capability flag bits for potential future use:
+
+| Bit | Name | Description |
+|-----|------|-------------|
+| 0 | PERIPHERAL_ONLY | Cannot act as BLE central |
+| 1 | CENTRAL_ONLY | Cannot act as BLE peripheral |
+| 2 | HIGH_BANDWIDTH | Supports extended MTU/PHY |
+| 3 | RELAY_CAPABLE | Can relay packets in mesh |
+| 4-7 | Reserved | Must be 0 |
+
+## 8. Version History
+
+| Version | Date | Changes |
+|---------|------|---------|
+| v2.0 | Oct 2025 | Identity characteristic for peer identification |
+| v2.1 | Oct 2025 | (Deprecated) Identity in device name |
+| v2.2 | Nov 2025 | Identity handshake protocol, identity-based keying |
+| v0.3.0 | Dec 2025 | Capability advertisement for peripheral-only devices |
+
+## 9. References
+
+- [BLE-Reticulum Protocol v2.2](BLE_PROTOCOL_v2.2.md) - Full protocol specification
+- [Bluetooth Core Specification](https://www.bluetooth.com/specifications/specs/core-specification/) - BLE advertising data format
+- [Bluetooth Assigned Numbers](https://www.bluetooth.com/specifications/assigned-numbers/) - Company IDs
diff --git a/src/ble_reticulum/BLEInterface.py b/src/ble_reticulum/BLEInterface.py
index 1f67909..1cdf78c 100644
--- a/src/ble_reticulum/BLEInterface.py
+++ b/src/ble_reticulum/BLEInterface.py
@@ -376,6 +376,7 @@ class BLEInterface(Interface):
         self.spawned_interfaces = {}  # identity_hash (16 hex chars) -> BLEPeerInterface
         self.address_to_identity = {}  # address -> peer_identity (16-byte identity)
         self.identity_to_address = {}  # identity_hash -> address (for reverse lookup)
+        self.address_to_interface = {}  # address -> BLEPeerInterface (for cleanup fallback)
         # Cache for recently disconnected identities (address -> (identity, timestamp))
         # Used to restore identity when peer reconnects before cache expiry (60s)
         self._identity_cache = {}
@@ -681,12 +682,13 @@ class BLEInterface(Interface):

     def _periodic_cleanup_task(self):
         """
-        Periodically clean up stale reassembly buffers (CRITICAL #2: prevent memory leak)
+        Periodically clean up stale reassembly buffers and orphaned interfaces.

-        This task runs every 30 seconds to remove incomplete packet reassembly buffers
-        that have timed out. Without this, failed transmissions would leave buffers in
-        memory indefinitely, leading to memory exhaustion on long-running instances
-        (especially critical on Pi Zero with only 512MB RAM).
+        This task runs every 30 seconds to:
+        1. Remove incomplete packet reassembly buffers that have timed out
+           (prevents memory exhaustion on long-running instances)
+        2. Validate spawned interfaces against actual connections
+           (catches orphaned interfaces from race conditions)
         """
         if not self.online:
             return  # Don't reschedule if interface is offline
@@ -704,9 +706,70 @@ class BLEInterface(Interface):
             RNS.log(f"{self} periodic cleanup: removed {total_cleaned} stale reassembly buffer(s) total", RNS.LOG_INFO)

+        # Validate spawned interfaces against actual connections
+        self._validate_spawned_interfaces()
+
         # Reschedule for next cleanup cycle
         self._start_cleanup_timer()

+    def _validate_spawned_interfaces(self):
+        """
+        Validate that all spawned interfaces have actual underlying connections.
+
+        Cleans up orphaned interfaces where the BLE connection is gone but the
+        interface remains (race condition protection). This is a safety net for
+        cases where cleanup in disconnect callbacks fails due to timing issues.
+        """
+        try:
+            # Get list of actually connected peers from driver
+            connected_addresses = set(self.driver.connected_peers)
+
+            # Check all address_to_interface entries
+            orphaned = []
+            for address, peer_if in list(self.address_to_interface.items()):
+                if address not in connected_addresses:
+                    # Connection is gone but interface remains
+                    orphaned.append((address, peer_if))
+
+            # Clean up orphaned interfaces
+            for address, peer_if in orphaned:
+                RNS.log(f"{self} cleaning up orphaned interface for {address} (no active connection)", RNS.LOG_WARNING)
+
+                # Get identity info from interface
+                peer_identity = None
+                identity_hash = None
+                if peer_if.peer_identity:
+                    peer_identity = peer_if.peer_identity
+                    identity_hash = self._compute_identity_hash(peer_identity)
+
+                # Detach the interface
+                peer_if.detach()
+
+                # Remove from all tracking dicts
+                if address in self.address_to_interface:
+                    del self.address_to_interface[address]
+                if identity_hash and identity_hash in self.spawned_interfaces:
+                    del self.spawned_interfaces[identity_hash]
+                if address in self.address_to_identity:
+                    del self.address_to_identity[address]
+                if identity_hash and identity_hash in self.identity_to_address:
+                    del self.identity_to_address[identity_hash]
+
+                # Clean up fragmentation state
+                if peer_identity:
+                    frag_key = self._get_fragmenter_key(peer_identity, address)
+                    with self.frag_lock:
+                        if frag_key in self.fragmenters:
+                            del self.fragmenters[frag_key]
+                        if frag_key in self.reassemblers:
+                            del self.reassemblers[frag_key]
+
+            if orphaned:
+                RNS.log(f"{self} periodic validation: cleaned up {len(orphaned)} orphaned interface(s)", RNS.LOG_INFO)
+
+        except Exception as e:
+            RNS.log(f"{self} error during interface validation (non-fatal): {e}", RNS.LOG_WARNING)
+
     def _device_discovered_callback(self, device: BLEDevice):
         """
         Driver callback: Handle discovered BLE device.
@@ -978,6 +1041,8 @@ class BLEInterface(Interface):
         Driver callback: Handle device disconnection.

         Cleans up peer state, interfaces, and fragmentation buffers.
+        Uses dual-index approach: tries identity lookup first, falls back to
+        address_to_interface for reliable cleanup even when identity unavailable.
         """
         RNS.log(f"{self} disconnected from {address}", RNS.LOG_INFO)

@@ -986,8 +1051,11 @@ class BLEInterface(Interface):
             if address in self.peers:
                 del self.peers[address]

-        # Detach interface
+        # Try identity-based lookup first
         peer_identity = self.address_to_identity.get(address)
+        peer_if = None
+        identity_hash = None
+
         if peer_identity:
             identity_hash = self._compute_identity_hash(peer_identity)

@@ -996,19 +1064,41 @@ class BLEInterface(Interface):
             self._identity_cache[address] = (peer_identity, time.time())
             RNS.log(f"{self} cached identity for {address} (TTL {self._identity_cache_ttl}s)", RNS.LOG_DEBUG)

-            if identity_hash in self.spawned_interfaces:
-                peer_if = self.spawned_interfaces[identity_hash]
-                peer_if.detach()
-                del self.spawned_interfaces[identity_hash]
-                RNS.log(f"{self} detached interface for {address}", RNS.LOG_DEBUG)
+            # Get interface via identity
+            peer_if = self.spawned_interfaces.get(identity_hash)

-            # Clean up identity mappings to prevent stale connections
-            if address in self.address_to_identity:
-                del self.address_to_identity[address]
-                RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
-            if identity_hash in self.identity_to_address:
-                del self.identity_to_address[identity_hash]
-                RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
+        # Fallback: if no identity or interface not found via identity, try direct address lookup
+        if peer_if is None:
+            peer_if = self.address_to_interface.get(address)
+            if peer_if:
+                RNS.log(f"{self} using address-based fallback for cleanup of {address}", RNS.LOG_DEBUG)
+                # Get identity from the interface itself
+                if peer_if.peer_identity:
+                    peer_identity = peer_if.peer_identity
+                    identity_hash = self._compute_identity_hash(peer_identity)
+
+        # Detach interface if found
+        if peer_if:
+            peer_if.detach()
+            RNS.log(f"{self} detached interface for {address}", RNS.LOG_DEBUG)
+
+            # Clean up spawned_interfaces dict
+            if identity_hash and identity_hash in self.spawned_interfaces:
+                del self.spawned_interfaces[identity_hash]
+        else:
+            RNS.log(f"{self} no interface found for disconnected {address} (may have been cleaned already)", RNS.LOG_DEBUG)
+
+        # Always clean up address_to_interface mapping
+        if address in self.address_to_interface:
+            del self.address_to_interface[address]
+
+        # Clean up identity mappings
+        if address in self.address_to_identity:
+            del self.address_to_identity[address]
+            RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
+        if identity_hash and identity_hash in self.identity_to_address:
+            del self.identity_to_address[identity_hash]
+            RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)

         # Clean up fragmenter/reassembler
         if peer_identity:
@@ -1049,6 +1139,8 @@ class BLEInterface(Interface):
             del self.identity_to_address[identity_hash]
         if old_address in self.address_to_identity:
             del self.address_to_identity[old_address]
+        if old_address in self.address_to_interface:
+            del self.address_to_interface[old_address]

         # Clean up fragmenter/reassembler for old address
         if peer_identity:
@@ -1102,6 +1194,11 @@ class BLEInterface(Interface):
             computed_hash = self._compute_identity_hash(peer_identity)
             self.identity_to_address[computed_hash] = new_address

+            # Migrate address_to_interface mapping
+            if old_address in self.address_to_interface:
+                interface = self.address_to_interface.pop(old_address)
+                self.address_to_interface[new_address] = interface
+
             # Migrate fragmenter/reassembler from old to new key
             old_frag_key = self._get_fragmenter_key(peer_identity, old_address)
             new_frag_key = self._get_fragmenter_key(peer_identity, new_address)
@@ -1548,10 +1645,13 @@ class BLEInterface(Interface):
         # Compute lookup key using identity hash
         identity_hash = self._compute_identity_hash(peer_identity)

-        # Check if interface already exists (MAC sorting should prevent this)
+        # Check if interface already exists (MAC rotation causes same identity at different addresses)
         if identity_hash in self.spawned_interfaces:
-            RNS.log(f"{self} interface already exists for {name} ({identity_hash[:8]}), reusing", RNS.LOG_WARNING)
-            return self.spawned_interfaces[identity_hash]
+            existing_if = self.spawned_interfaces[identity_hash]
+            # Update address_to_interface for the new address (critical for cleanup)
+            self.address_to_interface[address] = existing_if
+            RNS.log(f"{self} interface already exists for {name} ({identity_hash[:8]}), reusing (added address mapping for {address})", RNS.LOG_DEBUG)
+            return existing_if

         # Create new peer interface
         peer_if = BLEPeerInterface(self, address, name, peer_identity)
@@ -1565,8 +1665,9 @@ class BLEInterface(Interface):
         # Register with transport
         RNS.Transport.interfaces.append(peer_if)

-        # Store in tracking dict
+        # Store in tracking dicts (dual-indexed for reliable cleanup)
         self.spawned_interfaces[identity_hash] = peer_if
+        self.address_to_interface[address] = peer_if

         RNS.log(f"{self} created peer interface for {name} ({identity_hash[:8]}), type={connection_type}", RNS.LOG_INFO)

@@ -1830,35 +1931,58 @@ class BLEInterface(Interface):
         """
         Handle a central device disconnecting from our GATT server.

+        Uses dual-index approach: tries identity lookup first, falls back to
+        address_to_interface for reliable cleanup even when identity unavailable.
+
         Args:
             address: BLE address of the central device
         """
         RNS.log(f"{self} central disconnected: {address}", RNS.LOG_INFO)

-        # Look up peer identity
+        # Try identity-based lookup first
         peer_identity = self.address_to_identity.get(address, None)
+        peer_if = None
+        identity_hash = None

-        if not peer_identity:
-            RNS.log(f"{self} no identity for disconnected central {address}", RNS.LOG_WARNING)
-            return
+        if peer_identity:
+            identity_hash = self._compute_identity_hash(peer_identity)
+            peer_if = self.spawned_interfaces.get(identity_hash)

-        # Find and detach interface
-        identity_hash = self._compute_identity_hash(peer_identity)
-        if identity_hash in self.spawned_interfaces:
-            peer_if = self.spawned_interfaces[identity_hash]
+        # Fallback: if no identity or interface not found via identity, try direct address lookup
+        if peer_if is None:
+            peer_if = self.address_to_interface.get(address)
+            if peer_if:
+                RNS.log(f"{self} using address-based fallback for cleanup of central {address}", RNS.LOG_DEBUG)
+                # Get identity from the interface itself
+                if peer_if.peer_identity:
+                    peer_identity = peer_if.peer_identity
+                    identity_hash = self._compute_identity_hash(peer_identity)
+
+        # Detach interface if found
+        if peer_if:
             peer_if.detach()
-            del self.spawned_interfaces[identity_hash]
             RNS.log(f"{self} detached interface for {address}", RNS.LOG_DEBUG)

-            # Clean up identity mappings to prevent stale connections
-            if address in self.address_to_identity:
-                del self.address_to_identity[address]
-                RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
-            if identity_hash in self.identity_to_address:
-                del self.identity_to_address[identity_hash]
-                RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
+            # Clean up spawned_interfaces dict
+            if identity_hash and identity_hash in self.spawned_interfaces:
+                del self.spawned_interfaces[identity_hash]
+        else:
+            RNS.log(f"{self} no interface found for disconnected central {address} (may have been cleaned already)", RNS.LOG_DEBUG)

-            # Clean up fragmenter/reassembler
+        # Always clean up address_to_interface mapping
+        if address in self.address_to_interface:
+            del self.address_to_interface[address]
+
+        # Clean up identity mappings
+        if address in self.address_to_identity:
+            del self.address_to_identity[address]
+            RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
+        if identity_hash and identity_hash in self.identity_to_address:
+            del self.identity_to_address[identity_hash]
+            RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
+
+        # Clean up fragmenter/reassembler
+        if peer_identity:
             frag_key = self._get_fragmenter_key(peer_identity, address)
             with self.frag_lock:
                 if frag_key in self.reassemblers:
@@ -1926,6 +2050,7 @@ class BLEInterface(Interface):
         for peer_if in list(self.spawned_interfaces.values()):
             peer_if.detach()
         self.spawned_interfaces.clear()
+        self.address_to_interface.clear()

         # Clear fragmentation state
         with self.frag_lock:
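
Reviewer note: the spec added in this patch describes the connection-direction decision only in pseudocode, and no Python implementation of it appears in these hunks. The following is a minimal sketch of how sections 3.1 and 3.2 might look in Python, not the project's implementation. It assumes manufacturer data arrives as a dict keyed by company ID with the CID already stripped from the payload (the shape used in the spec's BlueZ/Bleak example). The names `parse_capability_flags`, `should_initiate` and `mac_to_int` are hypothetical.

```python
# Hypothetical helper names; constants are taken from the v0.3.0 spec text.
COMPANY_ID = 0xFFFF        # Company ID the spec uses for testing
MIN_VERSION = 0x03         # Version byte for v0.3.0
PERIPHERAL_ONLY = 0x01     # Flags bit 0


def parse_capability_flags(manufacturer_data: dict) -> int:
    """Return the capability flags, or 0x00 (full capability) for v2.2 peers."""
    payload = manufacturer_data.get(COMPANY_ID)
    # Payload layout assumed here: [version, flags] (CID already removed).
    if payload is None or len(payload) < 2 or payload[0] < MIN_VERSION:
        return 0x00
    return payload[1]


def mac_to_int(mac: str) -> int:
    """Convert 'AA:BB:CC:DD:EE:FF' to an integer for numeric comparison."""
    return int(mac.replace(":", ""), 16)


def should_initiate(local_mac: str, local_flags: int,
                    peer_mac: str, peer_flags: int) -> bool:
    """Decide whether the local device should act as BLE central."""
    local_p_only = bool(local_flags & PERIPHERAL_ONLY)
    peer_p_only = bool(peer_flags & PERIPHERAL_ONLY)

    # Case 1: peer cannot initiate, we can.
    if peer_p_only and not local_p_only:
        return True
    # Cases 2 and 3: we cannot initiate, whatever the peer can do.
    if local_p_only:
        return False
    # Case 4: both have full capability, fall back to v2.2 MAC sorting.
    return mac_to_int(local_mac) < mac_to_int(peer_mac)
```

Folding cases 2 and 3 into one branch keeps the function short; a real implementation would probably still log the both-peripheral-only deadlock separately, as the pseudocode does.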