fix: add address-based fallback for peer interface cleanup

BLE peer interfaces weren't being cleaned up when connections dropped
if the identity-to-address mapping wasn't available at disconnect time.
This caused orphaned interfaces to persist (peer interfaces shown with
zero active connections).

Changes:
- Add address_to_interface mapping for direct address-based cleanup
- Update _device_disconnected_callback with dual-index approach:
  try identity lookup first, fall back to address_to_interface
- Update handle_central_disconnected with same dual-index approach
- Add _validate_spawned_interfaces() periodic validation (every 30s)
  that cross-checks interfaces against driver.connected_peers
- Update _cleanup_stale_interface and _address_changed_callback to
  maintain the new mapping
- Clear address_to_interface on detach()

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
torlando-tech 2025-12-31 14:01:54 -05:00
commit ed625d4f0f
2 changed files with 394 additions and 39 deletions

BLE_PROTOCOL_v0.3.0.md Normal file

@ -0,0 +1,230 @@
# BLE-Reticulum Protocol Specification v0.3.0
**Version**: 0.3.0
**Date**: December 2025
**Status**: Draft
**Backwards Compatible With**: v2.2
## 1. Overview
This document specifies the v0.3.0 extension to the BLE-Reticulum protocol. This version adds **capability advertisement** to support devices that can only operate in peripheral mode (e.g., ESP32-S3).
### 1.1 Problem Statement
The v2.2 protocol uses MAC address sorting to determine connection direction: the device with the numerically lower MAC address initiates the connection (acts as BLE central). However, some hardware platforms (notably ESP32-S3) cannot reliably operate as BLE central due to stack limitations.
When such a device has a lower MAC address than a peer, neither device initiates a connection:
- The peripheral-only device cannot initiate (hardware limitation)
- The peer waits for the lower-MAC device to initiate (per v2.2 protocol)
### 1.2 Solution
v0.3.0 introduces **capability flags** in the advertising packet via BLE manufacturer-specific data. Devices advertise their role capabilities, allowing the connection direction logic to be overridden when one device is peripheral-only.
## 2. Manufacturer-Specific Data Format
### 2.1 Advertising Data Structure
v0.3.0 devices include manufacturer-specific data in their advertising packet:
```
AD Type: 0xFF (Manufacturer Specific Data)
Length: 5 bytes (1 type + 4 data)
Data Format (4 bytes):
┌─────────┬─────────┬─────────┬─────────┐
│ Byte 0 │ Byte 1 │ Byte 2 │ Byte 3 │
├─────────┼─────────┼─────────┼─────────┤
│ CID Low │ CID High│ Version │ Flags │
└─────────┴─────────┴─────────┴─────────┘
CID (Bytes 0-1): Company ID, little-endian
    0xFFFF = Reserved for testing (Bluetooth SIG)
Version (Byte 2): Protocol version
    0x03 = v0.3.0
Flags (Byte 3): Capability flags
    Bit 0: PERIPHERAL_ONLY (1 = cannot act as central)
    Bit 1: Reserved (CENTRAL_ONLY, future use)
    Bits 2-7: Reserved (must be 0)
```
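The 4-byte layout above can be packed and unpacked with Python's `struct` module. This is a minimal sketch, not the project's implementation: the function and constant names are hypothetical, and only the byte layout (little-endian CID, version, flags) comes from the format description.

```python
import struct

# Values taken from the format description; the names are illustrative.
COMPANY_ID_TESTING = 0xFFFF    # CID, sent little-endian
PROTOCOL_VERSION = 0x03        # v0.3.0
FLAG_PERIPHERAL_ONLY = 0x01    # bit 0 of the flags byte


def encode_capability_data(peripheral_only: bool) -> bytes:
    """Build the 4-byte payload: CID low, CID high, version, flags."""
    flags = FLAG_PERIPHERAL_ONLY if peripheral_only else 0x00
    return struct.pack("<HBB", COMPANY_ID_TESTING, PROTOCOL_VERSION, flags)


def decode_capability_data(payload: bytes):
    """Return (cid, version, flags), or None if the payload is too short."""
    if len(payload) < 4:
        return None
    return struct.unpack("<HBB", payload[:4])


if __name__ == "__main__":
    print(encode_capability_data(True).hex(" "))                # ff ff 03 01
    print(decode_capability_data(bytes.fromhex("ffff0300")))    # (65535, 3, 0)
```

Note that some scanner APIs strip the Company ID and hand back only the version and flags bytes, in which case only the last two fields need decoding.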
### 2.2 Example Values
| Device Type | CID | Version | Flags | Raw Bytes |
|-------------|-----|---------|-------|-----------|
| Dual-mode (full capability) | 0xFFFF | 0x03 | 0x00 | `FF FF 03 00` |
| Peripheral-only (ESP32-S3) | 0xFFFF | 0x03 | 0x01 | `FF FF 03 01` |
### 2.3 Advertising Packet Layout
The v0.3.0 advertising packet extends v2.2:
```
Main Advertising Packet (31 bytes max):
├── Flags (3 bytes)
├── Complete 128-bit Service UUID (18 bytes)
│ └── 37145b00-442d-4a94-917f-8f42c5da28e3
├── Manufacturer Data (6 bytes) ← NEW in v0.3.0
│ ├── Length (1 byte)
│ ├── AD Type 0xFF (1 byte)
│ └── Data (4 bytes): CID + Version + Flags
└── Remaining: 4 bytes available
Scan Response Packet (31 bytes max):
└── Device Name: "RNS-{identity}" (variable)
```
## 3. Connection Direction Logic
### 3.1 Decision Algorithm
```
FUNCTION shouldInitiateConnection(local_device, peer_device):
    local_peripheral_only = local_device.flags & PERIPHERAL_ONLY
    peer_peripheral_only = peer_device.flags & PERIPHERAL_ONLY

    # Case 1: Peer is peripheral-only, we are not
    IF peer_peripheral_only AND NOT local_peripheral_only:
        RETURN TRUE   # We MUST initiate (peer cannot)

    # Case 2: We are peripheral-only, peer is not
    IF local_peripheral_only AND NOT peer_peripheral_only:
        RETURN FALSE  # Peer MUST initiate (we cannot)

    # Case 3: Both are peripheral-only (deadlock)
    IF peer_peripheral_only AND local_peripheral_only:
        LOG_WARNING("Both devices peripheral-only, connection impossible")
        RETURN FALSE  # Neither can initiate

    # Case 4: Both have full capability - use v2.2 MAC sorting
    RETURN local_device.mac < peer_device.mac
```
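The pseudocode above translates almost line for line into Python. The sketch below is illustrative only: the `Device` dataclass and the `mac_to_int` helper are hypothetical names, and MAC addresses are assumed to be colon-separated hex strings compared numerically.

```python
import logging
from dataclasses import dataclass

FLAG_PERIPHERAL_ONLY = 0x01  # bit 0 of the capability flags


@dataclass(frozen=True)
class Device:
    """Hypothetical container for the two inputs the algorithm needs."""
    mac: str            # e.g. "AA:BB:CC:DD:EE:01"
    flags: int = 0x00   # 0x00 = full capability (also used for v2.2 peers)


def mac_to_int(mac: str) -> int:
    """Convert a colon-separated MAC string to an integer for comparison."""
    return int(mac.replace(":", ""), 16)


def should_initiate_connection(local: Device, peer: Device) -> bool:
    local_p_only = bool(local.flags & FLAG_PERIPHERAL_ONLY)
    peer_p_only = bool(peer.flags & FLAG_PERIPHERAL_ONLY)

    if peer_p_only and not local_p_only:
        return True    # Case 1: we must initiate, the peer cannot
    if local_p_only and not peer_p_only:
        return False   # Case 2: the peer must initiate, we cannot
    if local_p_only and peer_p_only:
        logging.warning("Both devices peripheral-only, connection impossible")
        return False   # Case 3: neither side can initiate
    # Case 4: both have full capability, fall back to MAC sorting
    return mac_to_int(local.mac) < mac_to_int(peer.mac)
```

For example, a dual-mode device with the higher MAC still initiates towards a peripheral-only peer: `should_initiate_connection(Device("AA:00:00:00:00:02"), Device("AA:00:00:00:00:01", flags=0x01))` returns `True`.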
### 3.2 Capability Detection
When a device does not advertise manufacturer data (v2.2 device):
- Assume `has_capability_data = false`
- Assume `capability_flags = 0x00` (full capability)
- Fall back to v2.2 MAC sorting
When manufacturer data is present:
- Verify Company ID = 0xFFFF
- Verify Version >= 0x03
- Extract capability flags from byte 3
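A possible shape for these detection rules is sketched below. It assumes the scanner delivers manufacturer data as a mapping from Company ID to payload with the CID already stripped, so the payload holds only the version and flags bytes. The `Capability` tuple and `detect_capability` name are hypothetical, and treating a payload with a version below 0x03 as "no capability data" is an assumption, since the rules above do not say what to do when verification fails.

```python
from typing import Mapping, NamedTuple

RETICULUM_CID = 0xFFFF   # Company ID used by v0.3.0
MIN_VERSION = 0x03       # lowest version that carries capability flags


class Capability(NamedTuple):
    has_capability_data: bool
    capability_flags: int


# Defaults for a v2.2 peer: no capability data, full capability.
NO_CAPABILITY_DATA = Capability(False, 0x00)


def detect_capability(manufacturer_data: Mapping[int, bytes]) -> Capability:
    """Apply the detection rules to one peer's manufacturer data."""
    payload = manufacturer_data.get(RETICULUM_CID)
    if payload is None or len(payload) < 2:
        return NO_CAPABILITY_DATA       # v2.2 device or malformed payload
    version, flags = payload[0], payload[1]
    if version < MIN_VERSION:
        return NO_CAPABILITY_DATA       # assumption: treat as a v2.2 device
    return Capability(True, flags)
```

Returning the defaults instead of raising keeps the caller on the MAC-sorting path for any peer that cannot be positively identified as v0.3.0.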
## 4. Backwards Compatibility
### 4.1 Compatibility Matrix
| Our Device | Peer Device | Connection Decision | Result |
|------------|-------------|---------------------|--------|
| v0.3.0 dual | v0.3.0 dual | MAC sorting | Works |
| v0.3.0 dual | v0.3.0 P-only | We initiate | Works |
| v0.3.0 P-only | v0.3.0 dual | Peer initiates | Works |
| v0.3.0 dual | v2.2 | MAC sorting | Works |
| v0.3.0 P-only | v2.2 (lower MAC) | v2.2 initiates | Works |
| v0.3.0 P-only | v2.2 (higher MAC) | Neither initiates | **Fails** |
| v0.3.0 P-only | v0.3.0 P-only | Neither initiates | **Fails** |
### 4.2 Known Limitations
1. **v0.3.0 peripheral-only ↔ v2.2 with higher MAC**: No connection possible. The v2.2 device ignores capability flags, uses MAC sorting, and waits for the lower-MAC device (the v0.3.0 P-only) to initiate.
**Mitigation**: Upgrade the v2.2 device to v0.3.0.
2. **Two peripheral-only devices**: Connection impossible as neither can initiate.
**Mitigation**: Ensure at least one device in the mesh has full capability.
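The compatibility matrix in section 4.1 can be reproduced with a small simulation in which each side decides independently whether to initiate. This is a sketch under stated assumptions: the kind labels `"dual"`, `"p-only"` and `"legacy"` are hypothetical, and a legacy device is modelled as one that ignores capability flags and applies MAC sorting only.

```python
def initiates(kind: str, mac: int, peer_kind: str, peer_mac: int) -> bool:
    """Would a device of this kind initiate a connection to the peer?"""
    if kind == "p-only":
        return False                  # cannot act as BLE central
    if kind == "dual" and peer_kind == "p-only":
        return True                   # capability override
    return mac < peer_mac             # MAC sorting (all a legacy device knows)


def connects(a_kind: str, a_mac: int, b_kind: str, b_mac: int) -> bool:
    """A link forms if at least one side initiates."""
    return (initiates(a_kind, a_mac, b_kind, b_mac)
            or initiates(b_kind, b_mac, a_kind, a_mac))


if __name__ == "__main__":
    LOW, HIGH = 0x10, 0x20
    pairs = [("dual", "dual"), ("dual", "p-only"), ("dual", "legacy"),
             ("p-only", "legacy"), ("p-only", "p-only")]
    for ours, peer in pairs:
        for our_mac, peer_mac in [(LOW, HIGH), (HIGH, LOW)]:
            ok = connects(ours, our_mac, peer, peer_mac)
            print(f"{ours:7} (mac {our_mac:#x}) <-> {peer:7} (mac {peer_mac:#x}): "
                  f"{'works' if ok else 'fails'}")
```

The only failing combinations are the two listed as known limitations: a peripheral-only device with the lower MAC facing a legacy peer, and two peripheral-only devices.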
## 5. GATT Service (Unchanged from v2.2)
The GATT service structure remains unchanged:
```
Reticulum Service: 37145b00-442d-4a94-917f-8f42c5da28e3
├── RX Characteristic: 37145b00-442d-4a94-917f-8f42c5da28e5
│ └── Properties: WRITE, WRITE_WITHOUT_RESPONSE
├── TX Characteristic: 37145b00-442d-4a94-917f-8f42c5da28e4
│ └── Properties: READ, NOTIFY
│ └── CCCD: 00002902-0000-1000-8000-00805f9b34fb
└── Identity Characteristic: 37145b00-442d-4a94-917f-8f42c5da28e6
└── Properties: READ
└── Value: 16-byte identity hash
```
## 6. Implementation Notes
### 6.1 NimBLE (ESP32)
```cpp
// Setting manufacturer data
// (explicit cast avoids a narrowing conversion inside the braced initializer)
uint8_t mfr_data[4] = {0xFF, 0xFF, 0x03, static_cast<uint8_t>(peripheral_only ? 0x01 : 0x00)};
advertising->setManufacturerData(mfr_data, sizeof(mfr_data));
// Parsing manufacturer data
if (device->haveManufacturerData()) {
    std::string data = device->getManufacturerData();
    if (data.size() >= 4) {
        uint16_t cid = (uint8_t)data[0] | ((uint8_t)data[1] << 8);
        // Cast to uint8_t before comparing: plain char may be signed
        if (cid == 0xFFFF && (uint8_t)data[2] >= 0x03) {
            capability_flags = (uint8_t)data[3];
        }
    }
}
```
### 6.2 Android
```kotlin
// Setting manufacturer data (note: Android API excludes CID in the byte array)
val mfrData = byteArrayOf(0x03.toByte(), flags.toByte())
advertiseData.addManufacturerData(0xFFFF, mfrData)
// Parsing manufacturer data (Byte is signed in Kotlin, so compare as unsigned)
val peerMfrData = scanRecord.getManufacturerSpecificData(0xFFFF)
if (peerMfrData != null && peerMfrData.size >= 2 && (peerMfrData[0].toInt() and 0xFF) >= 0x03) {
    capabilityFlags = peerMfrData[1].toInt() and 0xFF
}
```
### 6.3 Python (BlueZ/Bleak)
```python
# Setting manufacturer data in advertisement
# BlueZ uses D-Bus ManufacturerData property
manufacturer_data = {0xFFFF: bytes([0x03, 0x01 if peripheral_only else 0x00])}
# Parsing manufacturer data from scan
# (advertisement_data is the Bleak AdvertisementData passed to the scan
# callback; the older BLEDevice.metadata dict is deprecated)
mfr_data = advertisement_data.manufacturer_data.get(0xFFFF)
if mfr_data and len(mfr_data) >= 2 and mfr_data[0] >= 0x03:
    capability_flags = mfr_data[1]
```
## 7. Future Extensions
Capability flag bits, with bits 1-3 earmarked for potential future use (only bit 0 is defined in v0.3.0):
| Bit | Name | Description |
|-----|------|-------------|
| 0 | PERIPHERAL_ONLY | Cannot act as BLE central |
| 1 | CENTRAL_ONLY | Cannot act as BLE peripheral |
| 2 | HIGH_BANDWIDTH | Supports extended MTU/PHY |
| 3 | RELAY_CAPABLE | Can relay packets in mesh |
| 4-7 | Reserved | Must be 0 |
## 8. Version History
| Version | Date | Changes |
|---------|------|---------|
| v2.0 | Oct 2025 | Identity characteristic for peer identification |
| v2.1 | Oct 2025 | (Deprecated) Identity in device name |
| v2.2 | Nov 2025 | Identity handshake protocol, identity-based keying |
| v0.3.0 | Dec 2025 | Capability advertisement for peripheral-only devices |
## 9. References
- [BLE-Reticulum Protocol v2.2](BLE_PROTOCOL_v2.2.md) - Full protocol specification
- [Bluetooth Core Specification](https://www.bluetooth.com/specifications/specs/core-specification/) - BLE advertising data format
- [Bluetooth Assigned Numbers](https://www.bluetooth.com/specifications/assigned-numbers/) - Company IDs


@ -376,6 +376,7 @@ class BLEInterface(Interface):
self.spawned_interfaces = {} # identity_hash (16 hex chars) -> BLEPeerInterface
self.address_to_identity = {} # address -> peer_identity (16-byte identity)
self.identity_to_address = {} # identity_hash -> address (for reverse lookup)
self.address_to_interface = {} # address -> BLEPeerInterface (for cleanup fallback)
# Cache for recently disconnected identities (address -> (identity, timestamp))
# Used to restore identity when peer reconnects before cache expiry (60s)
self._identity_cache = {}
@ -681,12 +682,13 @@ class BLEInterface(Interface):
def _periodic_cleanup_task(self):
"""
Periodically clean up stale reassembly buffers (CRITICAL #2: prevent memory leak)
Periodically clean up stale reassembly buffers and orphaned interfaces.
This task runs every 30 seconds to remove incomplete packet reassembly buffers
that have timed out. Without this, failed transmissions would leave buffers in
memory indefinitely, leading to memory exhaustion on long-running instances
(especially critical on Pi Zero with only 512MB RAM).
This task runs every 30 seconds to:
1. Remove incomplete packet reassembly buffers that have timed out
(prevents memory exhaustion on long-running instances)
2. Validate spawned interfaces against actual connections
(catches orphaned interfaces from race conditions)
"""
if not self.online:
return # Don't reschedule if interface is offline
@ -704,9 +706,70 @@ class BLEInterface(Interface):
RNS.log(f"{self} periodic cleanup: removed {total_cleaned} stale reassembly buffer(s) total",
RNS.LOG_INFO)
# Validate spawned interfaces against actual connections
self._validate_spawned_interfaces()
# Reschedule for next cleanup cycle
self._start_cleanup_timer()
def _validate_spawned_interfaces(self):
"""
Validate that all spawned interfaces have actual underlying connections.
Cleans up orphaned interfaces where the BLE connection is gone but the
interface remains (race condition protection). This is a safety net for
cases where cleanup in disconnect callbacks fails due to timing issues.
"""
try:
# Get list of actually connected peers from driver
connected_addresses = set(self.driver.connected_peers)
# Check all address_to_interface entries
orphaned = []
for address, peer_if in list(self.address_to_interface.items()):
if address not in connected_addresses:
# Connection is gone but interface remains
orphaned.append((address, peer_if))
# Clean up orphaned interfaces
for address, peer_if in orphaned:
RNS.log(f"{self} cleaning up orphaned interface for {address} (no active connection)", RNS.LOG_WARNING)
# Get identity info from interface
peer_identity = None
identity_hash = None
if peer_if.peer_identity:
peer_identity = peer_if.peer_identity
identity_hash = self._compute_identity_hash(peer_identity)
# Detach the interface
peer_if.detach()
# Remove from all tracking dicts
if address in self.address_to_interface:
del self.address_to_interface[address]
if identity_hash and identity_hash in self.spawned_interfaces:
del self.spawned_interfaces[identity_hash]
if address in self.address_to_identity:
del self.address_to_identity[address]
if identity_hash and identity_hash in self.identity_to_address:
del self.identity_to_address[identity_hash]
# Clean up fragmentation state
if peer_identity:
frag_key = self._get_fragmenter_key(peer_identity, address)
with self.frag_lock:
if frag_key in self.fragmenters:
del self.fragmenters[frag_key]
if frag_key in self.reassemblers:
del self.reassemblers[frag_key]
if orphaned:
RNS.log(f"{self} periodic validation: cleaned up {len(orphaned)} orphaned interface(s)", RNS.LOG_INFO)
except Exception as e:
RNS.log(f"{self} error during interface validation (non-fatal): {e}", RNS.LOG_WARNING)
def _device_discovered_callback(self, device: BLEDevice):
"""
Driver callback: Handle discovered BLE device.
@ -978,6 +1041,8 @@ class BLEInterface(Interface):
Driver callback: Handle device disconnection.
Cleans up peer state, interfaces, and fragmentation buffers.
Uses dual-index approach: tries identity lookup first, falls back to
address_to_interface for reliable cleanup even when identity unavailable.
"""
RNS.log(f"{self} disconnected from {address}", RNS.LOG_INFO)
@ -986,8 +1051,11 @@ class BLEInterface(Interface):
if address in self.peers:
del self.peers[address]
# Detach interface
# Try identity-based lookup first
peer_identity = self.address_to_identity.get(address)
peer_if = None
identity_hash = None
if peer_identity:
identity_hash = self._compute_identity_hash(peer_identity)
@ -996,19 +1064,41 @@ class BLEInterface(Interface):
self._identity_cache[address] = (peer_identity, time.time())
RNS.log(f"{self} cached identity for {address} (TTL {self._identity_cache_ttl}s)", RNS.LOG_DEBUG)
if identity_hash in self.spawned_interfaces:
peer_if = self.spawned_interfaces[identity_hash]
peer_if.detach()
del self.spawned_interfaces[identity_hash]
RNS.log(f"{self} detached interface for {address}", RNS.LOG_DEBUG)
# Get interface via identity
peer_if = self.spawned_interfaces.get(identity_hash)
# Clean up identity mappings to prevent stale connections
if address in self.address_to_identity:
del self.address_to_identity[address]
RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
if identity_hash in self.identity_to_address:
del self.identity_to_address[identity_hash]
RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
# Fallback: if no identity or interface not found via identity, try direct address lookup
if peer_if is None:
peer_if = self.address_to_interface.get(address)
if peer_if:
RNS.log(f"{self} using address-based fallback for cleanup of {address}", RNS.LOG_DEBUG)
# Get identity from the interface itself
if peer_if.peer_identity:
peer_identity = peer_if.peer_identity
identity_hash = self._compute_identity_hash(peer_identity)
# Detach interface if found
if peer_if:
peer_if.detach()
RNS.log(f"{self} detached interface for {address}", RNS.LOG_DEBUG)
# Clean up spawned_interfaces dict
if identity_hash and identity_hash in self.spawned_interfaces:
del self.spawned_interfaces[identity_hash]
else:
RNS.log(f"{self} no interface found for disconnected {address} (may have been cleaned already)", RNS.LOG_DEBUG)
# Always clean up address_to_interface mapping
if address in self.address_to_interface:
del self.address_to_interface[address]
# Clean up identity mappings
if address in self.address_to_identity:
del self.address_to_identity[address]
RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
if identity_hash and identity_hash in self.identity_to_address:
del self.identity_to_address[identity_hash]
RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
# Clean up fragmenter/reassembler
if peer_identity:
@ -1049,6 +1139,8 @@ class BLEInterface(Interface):
del self.identity_to_address[identity_hash]
if old_address in self.address_to_identity:
del self.address_to_identity[old_address]
if old_address in self.address_to_interface:
del self.address_to_interface[old_address]
# Clean up fragmenter/reassembler for old address
if peer_identity:
@ -1102,6 +1194,11 @@ class BLEInterface(Interface):
computed_hash = self._compute_identity_hash(peer_identity)
self.identity_to_address[computed_hash] = new_address
# Migrate address_to_interface mapping
if old_address in self.address_to_interface:
interface = self.address_to_interface.pop(old_address)
self.address_to_interface[new_address] = interface
# Migrate fragmenter/reassembler from old to new key
old_frag_key = self._get_fragmenter_key(peer_identity, old_address)
new_frag_key = self._get_fragmenter_key(peer_identity, new_address)
@ -1548,10 +1645,13 @@ class BLEInterface(Interface):
# Compute lookup key using identity hash
identity_hash = self._compute_identity_hash(peer_identity)
# Check if interface already exists (MAC sorting should prevent this)
# Check if interface already exists (MAC rotation causes same identity at different addresses)
if identity_hash in self.spawned_interfaces:
RNS.log(f"{self} interface already exists for {name} ({identity_hash[:8]}), reusing", RNS.LOG_WARNING)
return self.spawned_interfaces[identity_hash]
existing_if = self.spawned_interfaces[identity_hash]
# Update address_to_interface for the new address (critical for cleanup)
self.address_to_interface[address] = existing_if
RNS.log(f"{self} interface already exists for {name} ({identity_hash[:8]}), reusing (added address mapping for {address})", RNS.LOG_DEBUG)
return existing_if
# Create new peer interface
peer_if = BLEPeerInterface(self, address, name, peer_identity)
@ -1565,8 +1665,9 @@ class BLEInterface(Interface):
# Register with transport
RNS.Transport.interfaces.append(peer_if)
# Store in tracking dict
# Store in tracking dicts (dual-indexed for reliable cleanup)
self.spawned_interfaces[identity_hash] = peer_if
self.address_to_interface[address] = peer_if
RNS.log(f"{self} created peer interface for {name} ({identity_hash[:8]}), type={connection_type}", RNS.LOG_INFO)
@ -1830,35 +1931,58 @@ class BLEInterface(Interface):
"""
Handle a central device disconnecting from our GATT server.
Uses dual-index approach: tries identity lookup first, falls back to
address_to_interface for reliable cleanup even when identity unavailable.
Args:
address: BLE address of the central device
"""
RNS.log(f"{self} central disconnected: {address}", RNS.LOG_INFO)
# Look up peer identity
# Try identity-based lookup first
peer_identity = self.address_to_identity.get(address, None)
peer_if = None
identity_hash = None
if not peer_identity:
RNS.log(f"{self} no identity for disconnected central {address}", RNS.LOG_WARNING)
return
if peer_identity:
identity_hash = self._compute_identity_hash(peer_identity)
peer_if = self.spawned_interfaces.get(identity_hash)
# Find and detach interface
identity_hash = self._compute_identity_hash(peer_identity)
if identity_hash in self.spawned_interfaces:
peer_if = self.spawned_interfaces[identity_hash]
# Fallback: if no identity or interface not found via identity, try direct address lookup
if peer_if is None:
peer_if = self.address_to_interface.get(address)
if peer_if:
RNS.log(f"{self} using address-based fallback for cleanup of central {address}", RNS.LOG_DEBUG)
# Get identity from the interface itself
if peer_if.peer_identity:
peer_identity = peer_if.peer_identity
identity_hash = self._compute_identity_hash(peer_identity)
# Detach interface if found
if peer_if:
peer_if.detach()
del self.spawned_interfaces[identity_hash]
RNS.log(f"{self} detached interface for {address}", RNS.LOG_DEBUG)
# Clean up identity mappings to prevent stale connections
if address in self.address_to_identity:
del self.address_to_identity[address]
RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
if identity_hash in self.identity_to_address:
del self.identity_to_address[identity_hash]
RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
# Clean up spawned_interfaces dict
if identity_hash and identity_hash in self.spawned_interfaces:
del self.spawned_interfaces[identity_hash]
else:
RNS.log(f"{self} no interface found for disconnected central {address} (may have been cleaned already)", RNS.LOG_DEBUG)
# Clean up fragmenter/reassembler
# Always clean up address_to_interface mapping
if address in self.address_to_interface:
del self.address_to_interface[address]
# Clean up identity mappings
if address in self.address_to_identity:
del self.address_to_identity[address]
RNS.log(f"{self} cleaned up address_to_identity for {address}", RNS.LOG_DEBUG)
if identity_hash and identity_hash in self.identity_to_address:
del self.identity_to_address[identity_hash]
RNS.log(f"{self} cleaned up identity_to_address for {identity_hash}", RNS.LOG_DEBUG)
# Clean up fragmenter/reassembler
if peer_identity:
frag_key = self._get_fragmenter_key(peer_identity, address)
with self.frag_lock:
if frag_key in self.reassemblers:
@ -1926,6 +2050,7 @@ class BLEInterface(Interface):
for peer_if in list(self.spawned_interfaces.values()):
peer_if.detach()
self.spawned_interfaces.clear()
self.address_to_interface.clear()
# Clear fragmentation state
with self.frag_lock: