Phase 2 Gate 2A plan and report

John Poole 2026-05-18 14:44:25 -07:00
commit a7b1ca02de
12 changed files with 3696 additions and 0 deletions

@ -0,0 +1,437 @@
# Gate 2A: BLE Protocol Session Manager Design
Date: 2026-05-18 14:34 America/Los_Angeles
Scope: design only. No source implementation is included in this gate.
## 1. Executive Summary
Phase 2 should proceed, but not by porting `BLEInterface._handle_identity_handshake()` line-for-line. The correct C++ unit is best named `BLEPeerSessionManager`: it should own BLE peer/session protocol state, identity decisions, pending-handshake state, MTU/session metadata, and adapter action decisions. `BLEProtocolSession` is a reasonable internal type name for one peer, but the migrated unit needs to manage multiple peers and address/identity mappings, so `BLEPeerSessionManager` is the clearer top-level name.
`_handle_identity_handshake()` should not be ported literally because it mixes several layers:
- pure protocol/session decisions: 16-byte identity detection, duplicate identity classification, identity key derivation, consumed/not-consumed routing;
- Linux/Python adapter actions: `RNS.log`, `driver.disconnect`, Python dict mutation, Python lock use, `BLEPeerInterface` creation;
- Reticulum integration: `RNS.Transport.interfaces` mutation through `_spawn_peer_interface`;
- fragmentation allocation using the current backend shim.
The C++ target should return explicit decisions and adapter actions. Python and ESP32 code should perform platform effects.
## 2. Reference Behavior Map
Reference source: `src/ble_reticulum/BLEInterface.py`.
| Function / behavior | Pure protocol/session decision | Linux/Python/Reticulum adapter action | ESP32/microReticulum adapter equivalent | State read | State mutated | Risk if changed |
|---|---|---|---|---|---|---|
| `_data_received_callback` lines 1309-1321 | Identity handshakes are checked before normal data routing. | Calls `_handle_identity_handshake`; if false, calls `_handle_ble_data`. | BLE callback calls session manager first; if not consumed, pass bytes to C++ reassembler/data path. | none directly | none directly | Reordering can pass identity bytes into reassembler or drop real data. |
| `_handle_identity_handshake`: `len(data) != 16` | Non-16-byte payload is not a handshake. | Return `False`, normal data path continues. | Return `PassToReassembler` / `consumed=false`. | incoming data length | none | Misclassifying normal payloads breaks data transfer. |
| Known identity, 16-byte payload matches | Duplicate identity data is consumed silently. | Log debug; return `True`; do not reassemble. | Return `ConsumeInput` with duplicate-same decision. | `address_to_identity[address]` | none | Pyxis currently diverges here; passing this to reassembler may create false fragment errors. |
| Known identity, 16-byte payload differs | Identity-like 16-byte data is consumed with warning. | Log warning; return `True`. | Return `ConsumeInput` plus warning event. | `address_to_identity[address]` | none | Could consume legitimate 16-byte Reticulum payload; preserving current behavior avoids behavior drift during migration. |
| New 16-byte identity | Identity accepted if duplicate policy allows. | Convert to `bytes`; compute identity hash. | Store identity as 16-byte stable peer identity. | incoming data | local result state | Incorrect identity size or hash handling breaks peer keying. |
| `_compute_identity_hash` lines 1899-1918 | Short key is first 8 bytes rendered as 16 hex chars. | Used for Python map keys. | Use as compact display/session key where collision risk is acceptable; keep full identity as authoritative key. | peer identity | none | Truncation collision is unlikely but not impossible; full identity should be primary in C++ state. |
| `_get_fragmenter_key` lines 1886-1897 | Fragmentation key is full identity hex; address is ignored. | Python dict key for fragmenters/reassemblers. | Use 16-byte identity or full hex as fragmenter key; avoid MAC keying. | peer identity | none | Address-keyed fragmentation would break MAC rotation recovery. |
| `_check_duplicate_identity`: invalid/missing identity | Invalid identity does not trigger duplicate rejection. | Return false. | Return no rejection. | identity length | none | Over-strict rejection can disconnect peers before handshake completes. |
| `_check_duplicate_identity`: same identity, different address, pending detach | Allow reconnection; stale old address can be cleaned. | Logs; calls `_cleanup_stale_address`; return false. | Return `AcceptNewIdentity`, `UpdatePeerAddress`, maybe `CleanupOldAddress`. | `identity_to_address`, `_pending_detach` | stale address maps through adapter | Rejecting this breaks MAC rotation and reconnect-after-disconnect. |
| `_check_duplicate_identity`: old address not connected and not in `peers` | Allow reconnection; cleanup stale old address. | Logs; calls `_cleanup_stale_address`; return false. | Return `AcceptNewIdentity`, `UpdatePeerAddress`, `CleanupOldAddress`. | `driver.connected_peers`, `peers` | stale address maps through adapter | False duplicate rejection after stale cleanup race. |
| `_check_duplicate_identity`: old connection zombie | Allow new connection and disconnect old one. | Logs; cleanup stale address; `driver.disconnect(existing_address)`; return false. | Return `DisconnectOldPeer` and accept new peer. | `_last_real_data`, `_zombie_timeout` | stale mapping cleanup through adapter | Keeping zombie can block recovery; disconnecting wrong peer can flap stable links. |
| `_check_duplicate_identity`: old connection alive | Reject new duplicate. | Logs; return true. `_handle_identity_handshake` disconnects current address. | Return `RejectDuplicateIdentity`, `DisconnectCurrentPeer`. | `identity_to_address`, driver/peer liveness | none in core; adapter disconnects current | Allowing both creates duplicate interfaces and routing ambiguity. |
| Accept identity mappings | New identity becomes authoritative for address and identity key. | Mutates `address_to_identity`, `identity_to_address`. | C++ manager owns address-to-identity and identity-to-session maps. Python mirrors during transition. | identity hash | maps | Divergent Python/C++ maps are a transition risk. |
| MTU lookup | Use negotiated MTU if available, else BLE minimum 23. | Calls `driver.get_peer_mtu(address)`. | Adapter supplies MTU from connection handle; core falls back to `23`. | driver MTU | result MTU | Wrong MTU causes fragmentation failures or throughput regression. |
| Fragmentation state | Accepted identity needs fragmenter and reassembler. | Creates `BLEFragmenter(mtu)` and `BLEReassembler()`. | C++ session manager owns/requests creation of C++ fragmenter/reassembler keyed by identity. | MTU, identity | fragmenter/reassembler pools | Missing reassembler races early data. |
| Peer interface ready | Accepted identity should make peer routable. | Calls `_spawn_peer_interface` or updates existing interface address. | Register/update microReticulum peer/session object. | `spawned_interfaces` | `RNS.Transport.interfaces` indirectly, `address_to_interface` | Creating duplicate peer objects causes routing ambiguity. |
| Mark real data | Identity handshake counts as real data for zombie tracking. | Mutates `_last_real_data[identity_hash]`. | C++ session liveness timestamp updated. | time | liveness state | Zombie replacement decisions become inaccurate. |
| Remove pending identity | Successful handshake clears pending timeout. | Deletes `_pending_identity_connections[address]`. | C++ pending identity table clears connection/address. | pending map | pending map | Timeout may later disconnect a valid peer. |
| Exception path | Any exception during handshake processing still consumes the packet. | Logs error; returns `True`. | Return `ErrorConsumed` if adapter/core detects invalid transition. | exception | none or partial adapter state | Changing to pass-through can feed malformed identity into data path. |
| `handle_peripheral_data` lines 2113-2178 | Parallel/legacy peripheral handshake path: if no identity and len 16, accept; otherwise if no identity, drop. | Mutates maps, creates fragmenter/reassembler, spawns peer interface; lacks duplicate policy nuance from `_handle_identity_handshake`. | Should be retired or routed through same C++ session manager. | `address_to_identity`, `connection_timeout` | maps, fragmentation, interface | Divergent handshake paths are a major behavior-drift risk. |
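
The two key derivations in the table (`_compute_identity_hash` and `_get_fragmenter_key`) can be sketched in C++ as below. This is a minimal illustration, not the reference implementation: `hexPrefix` and `exampleKeys` are hypothetical helper names, and lowercase hex output is an assumption chosen to match what Python's `bytes.hex()` produces.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

using PeerIdentity = std::array<std::uint8_t, 16>;

// Hypothetical helper: render the first `count` identity bytes as lowercase hex.
// Lowercase is an assumption intended to match Python's bytes.hex().
inline std::string hexPrefix(const PeerIdentity& identity, std::size_t count) {
    static constexpr char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(count * 2);
    for (std::size_t i = 0; i < count && i < identity.size(); ++i) {
        out.push_back(kDigits[identity[i] >> 4]);
        out.push_back(kDigits[identity[i] & 0x0f]);
    }
    return out;
}

// Short compatibility key: first 8 identity bytes as 16 hex chars.
inline std::string computeIdentityKey(const PeerIdentity& identity) {
    return hexPrefix(identity, 8);
}

// Fragmenter key: full identity as 32 hex chars; the address is never involved.
inline std::string computeFragmenterKey(const PeerIdentity& identity) {
    return hexPrefix(identity, identity.size());
}

// Usage check; call from a test runner or main().
inline void exampleKeys() {
    PeerIdentity id{};
    for (std::size_t i = 0; i < id.size(); ++i) {
        id[i] = static_cast<std::uint8_t>(i * 17);  // 0x00, 0x11, ..., 0xff
    }
    assert(computeIdentityKey(id) == "0011223344556677");
    assert(computeFragmenterKey(id) == "00112233445566778899aabbccddeeff");
}
```

Because the short key is a strict prefix of the fragmenter key, the Gate 2C equivalence tests could check both against the Python helpers with the same input vectors.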
## 3. Pyxis Comparison
Comparative project inspected: `/usr/local/src/pyxis/lib/ble_interface`.
Relevant C++ structure:
- `BLETypes.h`: UUIDs, MTU constants, timing constants, roles, connection state, peer state, `BLEAddress`, `ConnectionHandle`.
- `BLEPlatform.h`: BLE hardware abstraction for scanning, advertising, connection management, GATT reads/writes, notification callbacks.
- `BLEIdentityManager.h/.cpp`: local identity, identity handshake, address-to-identity mapping, MAC rotation callback, fixed-size handshake pools.
- `BLEPeerManager.h/.cpp`: discovered/connected peer state, identity-keyed peer storage, MAC-only peer storage, handle lookup, MTU, scoring, blacklist, zombie cleanup.
- `BLEInterface.h/.cpp`: orchestration layer that wires platform callbacks, identity manager callbacks, peer manager, fragmenters, reassembler, and microReticulum interface calls.
- `BLEFragmenter` and `BLEReassembler`: portable packet protocol components.
Pyxis splits transport/interface state into useful layers: a platform callback layer, an identity manager, a peer manager, fragmentation components, and a top-level microReticulum `InterfaceImpl`. It also uses fixed-size pools in several places, which is a strong precedent for an ESP32-S3 design.
Ideas to reuse:
- Centralize BLE protocol constants and connection/session structs.
- Use a platform abstraction rather than embedding NimBLE or BlueZ calls in protocol logic.
- Key peers and reassembly by stable 16-byte identity, not MAC.
- Use fixed-size pools or bounded containers for ESP32 builds.
- Defer heavy work out of BLE stack callbacks when stack safety requires it.
- Keep microReticulum interface registration outside the pure identity/session manager.
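
The "key by identity, not MAC" idea from the list above can be sketched as a small index in which the address is only metadata on an identity-keyed session. `IdentityIndex`, `Session`, and `exampleMacRotation` are hypothetical names for illustration, and `std::map` stands in for whatever bounded container an embedded build would use.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

using PeerIdentity = std::array<std::uint8_t, 16>;

// Hypothetical per-peer record; the address is metadata, not the key.
struct Session {
    std::string current_address;
    std::uint16_t mtu = 23;
};

class IdentityIndex {
public:
    // Bind an identity to an address. Returns the previous address when the
    // identity was already known under a different one (e.g. MAC rotation).
    std::optional<std::string> bind(const PeerIdentity& identity,
                                    const std::string& address) {
        std::optional<std::string> previous;
        auto it = sessions_.find(identity);
        if (it != sessions_.end() && it->second.current_address != address) {
            previous = it->second.current_address;
            by_address_.erase(*previous);
        }
        sessions_[identity].current_address = address;
        by_address_[address] = identity;
        return previous;
    }

    const Session* byAddress(const std::string& address) const {
        auto a = by_address_.find(address);
        if (a == by_address_.end()) return nullptr;
        auto s = sessions_.find(a->second);
        return s == sessions_.end() ? nullptr : &s->second;
    }

    std::size_t sessionCount() const { return sessions_.size(); }

private:
    std::map<PeerIdentity, Session> sessions_;
    std::map<std::string, PeerIdentity> by_address_;
};

// Usage check: one identity seen under two addresses stays one session.
inline void exampleMacRotation() {
    IdentityIndex index;
    PeerIdentity id{};
    id[0] = 0x42;
    auto first = index.bind(id, "AA:BB:CC:00:00:01");
    auto second = index.bind(id, "AA:BB:CC:00:00:02");
    assert(!first.has_value());
    assert(second.has_value() && *second == "AA:BB:CC:00:00:01");
    assert(index.sessionCount() == 1);
    assert(index.byAddress("AA:BB:CC:00:00:01") == nullptr);
    assert(index.byAddress("AA:BB:CC:00:00:02") != nullptr);
}
```

The sketch deliberately ignores the case where the new address is already bound to a different identity; that decision belongs to the duplicate-identity policy rather than to the index.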
Ideas not to reuse blindly:
- Pyxis `BLEIdentityManager::isHandshakeData()` treats a 16-byte payload as handshake only if no identity mapping exists. Current Python also consumes matching or mismatching 16-byte identity-like data when identity is already known.
- Pyxis duplicate connected identity handling is simpler than current Python. Current Python allows replacement for pending detach, stale disconnected address, and zombie old connection.
- Pyxis uses `std::function`, `std::vector`, `std::shared_ptr`, `std::string`, and dynamic `Bytes` in some interfaces. These may be acceptable, but ESP32 memory behavior should be measured before adopting the exact style.
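
To make the first divergence concrete, the current Python classification of an incoming payload can be sketched as a pure function. `PayloadClass`, `classifyPayload`, and `consumesInput` are hypothetical names; the rules themselves follow the reference behavior map in section 2.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

using PeerIdentity = std::array<std::uint8_t, 16>;

// Hypothetical classification mirroring the current Python behavior.
enum class PayloadClass : std::uint8_t {
    NotHandshake,       // length != 16: goes to the normal data path
    NewIdentity,        // 16 bytes, no identity known for this address
    DuplicateSame,      // 16 bytes, equal to the known identity
    DuplicateMismatch,  // 16 bytes, differs from the known identity
};

inline PayloadClass classifyPayload(const std::uint8_t* data, std::size_t size,
                                    const std::optional<PeerIdentity>& known) {
    if (data == nullptr || size != 16) return PayloadClass::NotHandshake;
    if (!known) return PayloadClass::NewIdentity;
    return std::equal(known->begin(), known->end(), data)
               ? PayloadClass::DuplicateSame
               : PayloadClass::DuplicateMismatch;
}

// Every 16-byte payload is consumed, whether or not an identity is known.
inline bool consumesInput(PayloadClass c) {
    return c != PayloadClass::NotHandshake;
}

// Usage check covering all four classes.
inline void exampleClassify() {
    PeerIdentity known{};
    known[0] = 1;
    const std::uint8_t same[16] = {1};
    const std::uint8_t other[16] = {2};
    const std::uint8_t short_data[5] = {};
    assert(classifyPayload(short_data, 5, known) == PayloadClass::NotHandshake);
    assert(classifyPayload(same, 16, std::nullopt) == PayloadClass::NewIdentity);
    assert(classifyPayload(same, 16, known) == PayloadClass::DuplicateSame);
    assert(classifyPayload(other, 16, known) == PayloadClass::DuplicateMismatch);
    assert(consumesInput(PayloadClass::DuplicateMismatch));
}
```

Under the Pyxis rule described above, the last two cases would fall through to the reassembler instead of being consumed, which is the behavior difference the migration tests need to pin down.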
Pyxis gives a useful ESP32-side adapter pattern, especially `ConnectionHandle`, `IBLEPlatform`, and the separation between identity/peer managers and `BLEInterface`. Its BLE code is early enough that it should guide architecture, not define behavior.
## 4. Proposed C++ Header
This is a proposal only; no header should be created in Gate 2A.
```cpp
#pragma once
#include <array>
#include <cstddef>  // size_t
#include <cstdint>
#include <optional>
#include <string>
#include <vector>
namespace ble_reticulum {

using PeerIdentity = std::array<uint8_t, 16>;

enum class LocalRole : uint8_t {
    Unknown,
    Central,
    Peripheral,
};

enum class InputDecision : uint8_t {
    PassToReassembler,
    ConsumedDuplicateSameIdentity,
    ConsumedDuplicateMismatchedIdentity,
    AcceptedNewIdentity,
    RejectedDuplicateIdentity,
    ErrorConsumed,
};

enum class SessionActionType : uint8_t {
    ConsumeInput,
    PassToReassembler,
    AcceptNewIdentity,
    RejectDuplicateIdentity,
    DisconnectCurrentPeer,
    DisconnectOldPeer,
    CreateFragmentationState,
    MarkPeerReady,
    UpdatePeerAddress,
    RemovePendingIdentity,
    MarkRealData,
    CleanupOldAddress,
    Warn,
};

struct ConnectionId {
    // Linux adapter may use address string. ESP32 adapter may use handle.
    // The core stores both when supplied but does not call platform APIs.
    std::string address;
    uint16_t handle = 0xffff;
};

struct ConnectionSnapshot {
    ConnectionId current;
    LocalRole local_role = LocalRole::Unknown;
    std::optional<PeerIdentity> known_identity_for_address;
    std::optional<uint16_t> negotiated_mtu;

    // Adapter-supplied liveness facts needed to preserve current Python
    // duplicate identity behavior without letting C++ call BlueZ/NimBLE.
    std::optional<std::string> existing_address_for_identity;
    bool identity_has_pending_detach = false;
    bool existing_address_connected = false;
    bool existing_address_in_peer_table = false;
    bool existing_connection_is_zombie = false;
    double existing_last_real_data = 0.0;
};

struct SessionAction {
    SessionActionType type = SessionActionType::ConsumeInput;
    ConnectionId target;
    std::string old_address;
    std::string new_address;
    std::string message;
};

struct HandshakeResult {
    InputDecision decision = InputDecision::PassToReassembler;
    std::vector<SessionAction> actions;
    bool consumed = false;
    bool accepted = false;
    bool should_disconnect_current = false;
    bool should_disconnect_old = false;
    std::optional<PeerIdentity> peer_identity;
    std::string identity_key; // Compatibility key: first 8 identity bytes as 16 hex chars.
    std::string fragmenter_key; // Compatibility key: full identity as 32 hex chars.
    uint16_t mtu = 23;
};

struct PeerSessionView {
    PeerIdentity identity;
    std::string identity_key;
    std::string current_address;
    uint16_t current_handle = 0xffff;
    uint16_t mtu = 23;
    bool has_fragmentation_state = false;
    bool peer_ready = false;
    double pending_identity_since = 0.0;
    double last_real_data = 0.0;
};

class BLEPeerSessionManager {
public:
    explicit BLEPeerSessionManager(double pending_identity_timeout = 30.0,
                                   double zombie_timeout = 45.0);

    HandshakeResult handleIdentityHandshake(const ConnectionSnapshot& connection,
                                            const uint8_t* data,
                                            size_t data_size,
                                            double now_seconds);

    void markConnected(const ConnectionSnapshot& connection, double now_seconds);
    void markDisconnected(const ConnectionId& connection, double now_seconds);
    void markMtu(const ConnectionId& connection, uint16_t mtu);
    void markPendingIdentity(const ConnectionId& connection, double now_seconds);

    std::vector<ConnectionId> expiredPendingIdentities(double now_seconds) const;
    std::optional<PeerSessionView> sessionByAddress(const std::string& address) const;
    std::optional<PeerSessionView> sessionByIdentity(const PeerIdentity& identity) const;

    static bool isIdentityHandshakePayload(const uint8_t* data, size_t data_size);
    static PeerIdentity identityFromPayload(const uint8_t* data, size_t data_size);
    static std::string computeIdentityKey(const PeerIdentity& identity);
    static std::string computeFragmenterKey(const PeerIdentity& identity);

private:
    // Ownership: this manager owns protocol/session truth. Adapters may mirror it
    // during migration, but RNS objects, BLE stack objects, and actual I/O remain
    // outside this class.
    double pending_identity_timeout_;
    double zombie_timeout_;

    // Implementation may use std containers for native/Linux tests and fixed pools
    // for embedded builds. The public behavior should not depend on allocation model.
};

} // namespace ble_reticulum
```
## 5. State Ownership Model
Move to C++:
- address-to-identity mapping;
- identity-to-active-session mapping;
- current address/connection metadata per identity;
- MTU per peer/session;
- pending identity state and timeout age;
- last real data / zombie timing if duplicate replacement policy remains;
- peer-ready state;
- fragmentation state ownership or at minimum authoritative requests to create/update it;
- reassembler ownership once the Python adapter can stop owning `reassemblers`.
Keep in adapters:
- `RNS.Transport.interfaces`;
- Python `BLEPeerInterface` instances;
- BlueZ/Bleak/DBus objects;
- ESP32 NimBLE/Bluedroid connection handles as platform resources, although handle values can be stored as opaque IDs in session state;
- logging mechanism;
- actual disconnect calls;
- actual GATT reads/writes/notifies;
- thread/mutex policy specific to Python runtime or ESP32 BLE stack callbacks.
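
The pending identity state and timeout age listed under "Move to C++" can be sketched as a table that only reports expired connections and never performs platform calls. `PendingIdentityTable` and `PendingEntry` are hypothetical names. The sketch makes two assumptions that would need checking against the Python reference: an entry expires when its age strictly exceeds the timeout, and re-marking an address keeps its first timestamp.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of a connection still waiting for its identity.
struct PendingEntry {
    std::string address;
    double since_seconds;
};

class PendingIdentityTable {
public:
    explicit PendingIdentityTable(double timeout_seconds = 30.0)
        : timeout_seconds_(timeout_seconds) {}

    // Assumption: a repeated mark keeps the original timestamp.
    void markPending(const std::string& address, double now_seconds) {
        for (const auto& entry : entries_) {
            if (entry.address == address) return;
        }
        entries_.push_back({address, now_seconds});
    }

    // Called when a handshake succeeds or the connection goes away.
    void clear(const std::string& address) {
        entries_.erase(
            std::remove_if(entries_.begin(), entries_.end(),
                           [&](const PendingEntry& e) { return e.address == address; }),
            entries_.end());
    }

    // Pure query: the adapter decides what to do with expired connections.
    // Assumption: expiry uses a strict "age > timeout" comparison.
    std::vector<std::string> expired(double now_seconds) const {
        std::vector<std::string> out;
        for (const auto& entry : entries_) {
            if (now_seconds - entry.since_seconds > timeout_seconds_) {
                out.push_back(entry.address);
            }
        }
        return out;
    }

private:
    double timeout_seconds_;
    std::vector<PendingEntry> entries_;
};

// Usage check with the 30 second default from the proposed header.
inline void examplePendingTimeout() {
    PendingIdentityTable table;
    table.markPending("a", 0.0);
    table.markPending("b", 10.0);
    assert(table.expired(29.0).empty());
    assert(table.expired(31.0) == std::vector<std::string>{"a"});
    table.clear("a");
    assert(table.expired(45.0) == std::vector<std::string>{"b"});
}
```

Passing `now_seconds` in, rather than reading a clock inside the table, keeps the logic deterministic for the native unit tests planned in Gate 2B.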
## 6. Result / Action Model
The core should return decisions and adapter actions. Required actions:
- `ConsumeInput`;
- `PassToReassembler`;
- `AcceptNewIdentity`;
- `RejectDuplicateIdentity`;
- `DisconnectCurrentPeer`;
- `DisconnectOldPeer`;
- `CreateFragmentationState`;
- `MarkPeerReady`;
- `UpdatePeerAddress`;
- `RemovePendingIdentity`;
- `MarkRealData`;
- `CleanupOldAddress`;
- `Warn`.
The adapter executes these actions in platform order. For example, a zombie replacement result can include `CleanupOldAddress`, `DisconnectOldPeer`, `AcceptNewIdentity`, `CreateFragmentationState`, `MarkPeerReady`, `RemovePendingIdentity`, and `MarkRealData`.
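
As a sketch of how the core might assemble such an action list for a new 16-byte identity, the function below maps adapter-supplied facts to an ordered plan. `Action`, `DuplicateFacts`, and `planNewIdentity` are hypothetical names; the branch order follows the `_check_duplicate_identity` rows in section 2, and the position of `UpdatePeerAddress` within the plan is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Subset of the proposed SessionActionType values used by this sketch.
enum class Action : std::uint8_t {
    AcceptNewIdentity,
    RejectDuplicateIdentity,
    DisconnectCurrentPeer,
    DisconnectOldPeer,
    CreateFragmentationState,
    MarkPeerReady,
    UpdatePeerAddress,
    RemovePendingIdentity,
    MarkRealData,
    CleanupOldAddress,
};

// Hypothetical flattened view of the liveness facts in ConnectionSnapshot.
struct DuplicateFacts {
    bool identity_on_other_address = false;
    bool pending_detach = false;
    bool old_connected = false;
    bool old_in_peer_table = false;
    bool old_is_zombie = false;
};

inline std::vector<Action> planNewIdentity(const DuplicateFacts& f) {
    std::vector<Action> plan;
    // Stale: pending detach, or old address neither connected nor in peers.
    const bool stale = f.pending_detach || (!f.old_connected && !f.old_in_peer_table);
    if (f.identity_on_other_address) {
        if (stale) {
            plan.push_back(Action::CleanupOldAddress);
        } else if (f.old_is_zombie) {
            plan = {Action::CleanupOldAddress, Action::DisconnectOldPeer};
        } else {
            // Old connection is alive: reject the newcomer.
            return {Action::RejectDuplicateIdentity, Action::DisconnectCurrentPeer};
        }
    }
    plan.push_back(Action::AcceptNewIdentity);
    if (f.identity_on_other_address && stale) {
        plan.push_back(Action::UpdatePeerAddress);  // placement is an assumption
    }
    plan.insert(plan.end(), {Action::CreateFragmentationState, Action::MarkPeerReady,
                             Action::RemovePendingIdentity, Action::MarkRealData});
    return plan;
}

// Usage check: the zombie case yields the sequence given in the text above.
inline void exampleZombiePlan() {
    DuplicateFacts zombie;
    zombie.identity_on_other_address = true;
    zombie.old_connected = true;
    zombie.old_in_peer_table = true;
    zombie.old_is_zombie = true;
    const std::vector<Action> expected = {
        Action::CleanupOldAddress, Action::DisconnectOldPeer,
        Action::AcceptNewIdentity, Action::CreateFragmentationState,
        Action::MarkPeerReady, Action::RemovePendingIdentity, Action::MarkRealData};
    assert(planNewIdentity(zombie) == expected);
}
```

Returning a plain vector keeps the decision testable without a fake driver; only the adapter needs to know how each action maps to a platform call.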
## 7. Python Adapter Boundary
During transition, `BLEInterface.py` would:
1. Build `ConnectionSnapshot` from current Python dictionaries and driver state:
- `address_to_identity`;
- `identity_to_address`;
- `_pending_detach`;
- `driver.connected_peers`;
- `peers`;
- `_last_real_data`;
- `driver.get_peer_mtu(address)`.
2. Call `BLEPeerSessionManager.handleIdentityHandshake(...)`.
3. If `consumed` is false, call `_handle_ble_data(address, data)`.
4. Mirror accepted C++ session state into Python dictionaries initially.
5. Execute adapter actions:
- `RNS.log` for messages/warnings;
- `driver.disconnect(address)` for current/old peer disconnect;
- create/update Python `BLEFragmenter` and `BLEReassembler` until owned by C++;
- call `_spawn_peer_interface(...)` or update existing `BLEPeerInterface.peer_address`;
- delete `_pending_identity_connections[address]`.
The Python dictionaries should become mirrors, not the authoritative state.
## 8. ESP32 / microReticulum Adapter Boundary
On T-Beam SUPREME / ESP32-S3:
1. BLE stack callback receives bytes and a connection handle/address.
2. Adapter maps handle/address/role/MTU into `ConnectionSnapshot`.
3. Adapter calls `BLEPeerSessionManager.handleIdentityHandshake(...)`.
4. If result says `PassToReassembler`, pass payload to C++ `BLEReassembler`.
5. If result says `CreateFragmentationState`, initialize or update C++ fragmenter/reassembler pools keyed by identity.
6. If result says `MarkPeerReady`, register or update the peer/session with the microReticulum interface abstraction.
7. If result says disconnect, call NimBLE/Bluedroid disconnect through the ESP32 BLE platform layer.
8. Logging goes through the firmware logging mechanism, not the session core.
The adapter should prefer BLE connection handles for platform operations and stable 16-byte identity for Reticulum/session routing. MAC address should remain metadata because rotation is expected.
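
Steps 4 to 8 amount to a dispatch loop over the returned actions. A minimal sketch follows, assuming a hypothetical `PlatformEffects` interface that the ESP32 adapter (or a test fake) would implement. The `SessionAction` here is simplified to a single `address` field and is not the proposed header's struct.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Subset of action types that map directly to platform effects.
enum class SessionActionType : std::uint8_t {
    DisconnectCurrentPeer,
    DisconnectOldPeer,
    MarkPeerReady,
    Warn,
};

// Simplified action for illustration only.
struct SessionAction {
    SessionActionType type;
    std::string address;
    std::string message;
};

// Hypothetical platform boundary: the session core never calls these itself.
class PlatformEffects {
public:
    virtual ~PlatformEffects() = default;
    virtual void disconnect(const std::string& address) = 0;
    virtual void registerPeer(const std::string& address) = 0;
    virtual void log(const std::string& message) = 0;
};

// Execute actions in the order the core returned them.
inline void executeActions(const std::vector<SessionAction>& actions,
                           PlatformEffects& platform) {
    for (const auto& action : actions) {
        switch (action.type) {
            case SessionActionType::DisconnectCurrentPeer:
            case SessionActionType::DisconnectOldPeer:
                platform.disconnect(action.address);
                break;
            case SessionActionType::MarkPeerReady:
                platform.registerPeer(action.address);
                break;
            case SessionActionType::Warn:
                platform.log(action.message);
                break;
        }
    }
}

// Fake platform that records calls, in the spirit of the Gate 2D fake driver.
class RecordingPlatform : public PlatformEffects {
public:
    void disconnect(const std::string& address) override {
        calls.push_back("disconnect:" + address);
    }
    void registerPeer(const std::string& address) override {
        calls.push_back("register:" + address);
    }
    void log(const std::string& message) override {
        calls.push_back("log:" + message);
    }
    std::vector<std::string> calls;
};

// Usage check: effects happen in action order.
inline void exampleDispatch() {
    RecordingPlatform platform;
    executeActions({{SessionActionType::Warn, "", "zombie replaced"},
                    {SessionActionType::DisconnectOldPeer, "old", ""},
                    {SessionActionType::MarkPeerReady, "new", ""}},
                   platform);
    const std::vector<std::string> expected = {
        "log:zombie replaced", "disconnect:old", "register:new"};
    assert(platform.calls == expected);
}
```

On real hardware the `disconnect` implementation may need to defer work out of the BLE stack callback; that policy stays inside the adapter and does not change the dispatch loop.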
## 9. Test Matrix
| Gate | Test | Expected decision/action |
|---|---|---|
| 2B | non-16-byte payload | `PassToReassembler`, `consumed=false` |
| 2B | new 16-byte identity | `AcceptedNewIdentity`, `ConsumeInput`, identity key and fragmenter key set |
| 2B | known identity duplicate same | `ConsumedDuplicateSameIdentity`, `consumed=true` |
| 2B | known identity duplicate mismatch | `ConsumedDuplicateMismatchedIdentity`, `consumed=true`, warning action |
| 2B | duplicate identity active elsewhere | `RejectedDuplicateIdentity`, `DisconnectCurrentPeer` |
| 2B | duplicate identity with stale/pending detach | accept new identity, `CleanupOldAddress`, `UpdatePeerAddress` |
| 2B | duplicate identity with zombie old connection | accept new identity, `DisconnectOldPeer`, `CleanupOldAddress` |
| 2B | MTU provided | result MTU equals provided MTU |
| 2B | MTU missing | result MTU falls back to `23` |
| 2B | pending identity timeout | expired connection IDs are returned without platform calls |
| 2B | peer address update | identity session current address changes; fragmenter key unchanged |
| 2C | pybind11 identity key equivalence | C++ `computeIdentityKey` equals Python `_compute_identity_hash` |
| 2C | pybind11 fragmenter key equivalence | C++ `computeFragmenterKey` equals Python `_get_fragmenter_key` |
| 2D | consumed packet behavior | fake Python harness sees same return/consume behavior as `_handle_identity_handshake` |
| 2D | pass-to-reassembler behavior | non-handshake calls `_handle_ble_data` path in harness |
| 2D | duplicate active elsewhere | fake driver records disconnect current peer |
| 2D | zombie old connection | fake driver records disconnect old peer and accepts new peer |
| 2D | invalid/exception compatibility | invalid transition returns `ErrorConsumed` where current Python would consume after exception |
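
Rows such as "MTU provided" and "MTU missing" lend themselves to table-driven native tests. A minimal sketch, assuming a hypothetical `resolveMtu` helper and no test framework beyond `assert`:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// BLE minimum ATT MTU used as the fallback in the reference behavior.
constexpr std::uint16_t kBleMinimumMtu = 23;

// Hypothetical helper: use the negotiated MTU when the adapter supplies one.
inline std::uint16_t resolveMtu(const std::optional<std::uint16_t>& negotiated) {
    return negotiated.value_or(kBleMinimumMtu);
}

// One row of the test table: adapter input and expected result MTU.
struct MtuCase {
    std::optional<std::uint16_t> negotiated;
    std::uint16_t expected;
};

// Returns the number of failing cases so a runner can report them.
inline int runMtuCases() {
    const MtuCase cases[] = {
        {std::nullopt, 23},          // MTU missing: fall back to 23
        {std::uint16_t{185}, 185},   // MTU provided: pass through
        {std::uint16_t{23}, 23},     // provided value equal to the minimum
    };
    int failures = 0;
    for (const auto& c : cases) {
        if (resolveMtu(c.negotiated) != c.expected) ++failures;
    }
    return failures;
}

// Usage check; call from a test runner or main().
inline void exampleMtu() {
    const int failures = runMtuCases();
    assert(failures == 0);
    (void)failures;  // silence unused warning when NDEBUG is defined
}
```

The sketch does not clamp values below 23, because the reference table only describes a fallback when the MTU is missing.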
## 10. Migration Plan
- Gate 2A: design only; this report and SQL tracking only.
- Gate 2B: C++ session manager skeleton plus pure native unit tests.
- Gate 2C: pybind11 binding plus Python tests for result structs and helper equivalence.
- Gate 2D: Python equivalence harness with fake driver/state, no live BLE changes.
- Gate 2E: optional Python integration behind environment flag; mirror C++ state into Python dictionaries.
- Gate 2F: bilateral US Constitution field test with C++ session manager enabled.
- Gate 2G: ESP32/microReticulum adapter design using Pyxis as architectural prior art, not as behavior authority.
## 11. Risk Register
| Risk | Mitigation |
|---|---|
| Accidentally changing current BLE behavior | Tests must encode current Python decisions before integration. |
| Duplicate handshake logic in `handle_peripheral_data` | Route both paths through one C++ session manager or retire legacy path deliberately. |
| MAC rotation assumptions | Keep identity as primary key; model address changes as session metadata. |
| Identity collision/truncation assumptions | Use full 16-byte identity internally; keep 16-hex short key for compatibility/display only. |
| Consuming legitimate 16-byte real data | Preserve current behavior initially; document and test it before any protocol change. |
| Python and C++ state divergence | Make C++ authoritative and Python dictionaries mirrors during transition. |
| ESP32 memory constraints | Use fixed pools or bounded containers for embedded builds; avoid unbounded allocation in hot paths. |
| Linux/BlueZ vs ESP32 callback concurrency | Core returns actions only; adapters own locking, deferral, and stack-safe execution. |
| Fragmenter/reassembler ownership transition | Move ownership in stages with equivalence tests and field transfer tests. |
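
For the ESP32 memory-constraints row, a fixed-capacity pool is one way to avoid unbounded allocation. The sketch below is illustrative only: `FixedPool` and `PeerSlot` are hypothetical names, not Pyxis types, and the pool is not thread-safe.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal fixed-capacity pool: storage is part of the object, so capacity
// is known at compile time and no heap allocation occurs.
template <typename T, std::size_t N>
class FixedPool {
public:
    // Returns a freshly reset slot, or nullptr when the pool is full.
    T* acquire() {
        for (std::size_t i = 0; i < N; ++i) {
            if (!used_[i]) {
                used_[i] = true;
                slots_[i] = T{};
                return &slots_[i];
            }
        }
        return nullptr;
    }

    // Returns false if the pointer is not a live slot of this pool.
    bool release(const T* item) {
        for (std::size_t i = 0; i < N; ++i) {
            if (used_[i] && &slots_[i] == item) {
                used_[i] = false;
                return true;
            }
        }
        return false;
    }

    std::size_t inUse() const {
        std::size_t count = 0;
        for (bool used : used_) {
            if (used) ++count;
        }
        return count;
    }

private:
    std::array<T, N> slots_{};
    std::array<bool, N> used_{};
};

// Hypothetical per-peer slot with the defaults used in the proposed header.
struct PeerSlot {
    std::uint16_t handle = 0xffff;
    std::uint16_t mtu = 23;
};

// Usage check: the third acquire fails until a slot is released.
inline void examplePool() {
    FixedPool<PeerSlot, 2> pool;
    PeerSlot* a = pool.acquire();
    PeerSlot* b = pool.acquire();
    PeerSlot* c = pool.acquire();
    assert(a != nullptr && b != nullptr && c == nullptr);
    assert(pool.inUse() == 2);
    const bool released = pool.release(a);
    assert(released);
    assert(pool.acquire() == a);
    (void)released;
}
```

A full pool surfaces as a `nullptr` that the session manager must turn into an explicit decision, such as rejecting the connection, rather than a silent allocation failure.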
## 12. SQL Output
Companion SQL file:
`migration/sql/start_gate2a_protocol_session_design_20260518_1434.sql`
```sql
BEGIN TRANSACTION;
INSERT INTO symbols (
source_file,
symbol_name,
symbol_type,
class_name,
line_number,
tag,
phase,
status,
cpp_candidate,
confidence,
rationale,
callers,
callees,
notes,
first_seen_at,
updated_at
)
VALUES (
'src/ble_reticulum/BLEInterface.py',
'_handle_identity_handshake',
'method',
'BLEInterface',
1202,
'GLUE',
'2_ble_protocol_session_manager',
'DESIGN',
1,
'high',
'Gate 2A treats this method as reference behavior for a C++ BLEPeerSessionManager, not as a literal function port. The method mixes protocol decisions with Python, Linux driver, and Reticulum adapter side effects.',
'_data_received_callback',
'_compute_identity_hash; _check_duplicate_identity; driver.disconnect; driver.get_peer_mtu; _get_fragmenter_key; BLEFragmenter; BLEReassembler; _spawn_peer_interface',
'Gate 2A opened as design-only. Preserve tag GLUE because BLEInterface remains adapter/glue; cpp_candidate=1 means reference behavior for C++ session ownership, not direct port.',
CURRENT_TIMESTAMP,
CURRENT_TIMESTAMP
)
ON CONFLICT(source_file, class_name, symbol_name, line_number) DO UPDATE SET
tag = 'GLUE',
phase = '2_ble_protocol_session_manager',
status = 'DESIGN',
cpp_candidate = 1,
confidence = 'high',
rationale = excluded.rationale,
callers = excluded.callers,
callees = excluded.callees,
notes = excluded.notes,
updated_at = CURRENT_TIMESTAMP;
INSERT INTO reviews (
symbol_id,
reviewed_at,
reviewer,
old_tag,
new_tag,
old_status,
new_status,
note
)
SELECT
symbol_id,
CURRENT_TIMESTAMP,
'Codex',
'GLUE',
'GLUE',
'REVIEWED',
'DESIGN',
'Gate 2A opened as design-only. _handle_identity_handshake is reference behavior for a future C++ BLEPeerSessionManager, not a literal function port. Phase 1 records remain untouched.'
FROM symbols
WHERE source_file = 'src/ble_reticulum/BLEInterface.py'
AND class_name = 'BLEInterface'
AND symbol_name = '_handle_identity_handshake'
AND line_number = 1202;
COMMIT;
```