ble-reticulum/migration/phase2/Gate2A_BLEProtocolSession_design_20260518_1434.md


Gate 2A: BLE Protocol Session Manager Design

Date: 2026-05-18 14:34 America/Los_Angeles

Scope: design only. No source implementation is included in this gate.

1. Executive Summary

Phase 2 should proceed, but not by porting BLEInterface._handle_identity_handshake() line-for-line. The C++ unit should be named BLEPeerSessionManager. It should own BLE peer/session protocol state, identity decisions, pending-handshake state, MTU/session metadata, and adapter action decisions. BLEProtocolSession remains a reasonable internal type name for a single peer, but the migrated unit must manage multiple peers and address/identity mappings, so BLEPeerSessionManager is the clearer top-level name.

_handle_identity_handshake() should not be ported literally because it mixes several layers:

  • pure protocol/session decisions: 16-byte identity detection, duplicate identity classification, identity key derivation, consumed/not-consumed routing;
  • Linux/Python adapter actions: RNS.log, driver.disconnect, Python dict mutation, Python lock use, BLEPeerInterface creation;
  • Reticulum integration: RNS.Transport.interfaces mutation through _spawn_peer_interface;
  • fragmentation allocation using the current backend shim.

The C++ target should return explicit decisions and adapter actions. Python and ESP32 code should perform platform effects.
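
As a minimal sketch of the "pure decision" layer, the 16-byte classification can be written as a function with no logging, disconnects, or map mutation. The names Classification and classifyPayload are hypothetical and are not part of the proposed header; the branch outcomes follow the behavior described in this report.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical names for illustration only.
using Identity = std::array<std::uint8_t, 16>;

enum class Classification {
    PassToReassembler,           // not 16 bytes: normal data path
    ConsumedSameIdentity,        // identity known, payload matches it
    ConsumedMismatchedIdentity,  // identity known, payload differs (adapter warns)
    CandidateNewIdentity,        // no identity yet: duplicate policy runs next
};

// Pure function: reads its inputs and returns a decision, nothing else.
inline Classification classifyPayload(const std::uint8_t* data,
                                      std::size_t size,
                                      const std::optional<Identity>& known) {
    // Defensive assumption: a null buffer is treated as non-handshake data.
    if (data == nullptr || size != 16) {
        return Classification::PassToReassembler;
    }
    if (!known) {
        return Classification::CandidateNewIdentity;
    }
    return std::memcmp(known->data(), data, 16) == 0
               ? Classification::ConsumedSameIdentity
               : Classification::ConsumedMismatchedIdentity;
}
```

The adapter would then map each value to its own platform effects, for example a debug log for the matching case and a warning for the mismatched case.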

2. Reference Behavior Map

Reference source: src/ble_reticulum/BLEInterface.py.

| Function / behavior | Pure protocol/session decision | Linux/Python/Reticulum adapter action | ESP32/microReticulum adapter equivalent | State read | State mutated | Risk if changed |
|---|---|---|---|---|---|---|
| _data_received_callback lines 1309-1321 | Identity handshakes are checked before normal data routing. | Calls _handle_identity_handshake; if false, calls _handle_ble_data. | BLE callback calls session manager first; if not consumed, pass bytes to C++ reassembler/data path. | none directly | none directly | Reordering can pass identity bytes into reassembler or drop real data. |
| _handle_identity_handshake: len(data) != 16 | Non-16-byte payload is not a handshake. | Return False, normal data path continues. | Return PassToReassembler / consume=false. | incoming data length | none | Misclassifying normal payloads breaks data transfer. |
| Known identity, 16-byte payload matches | Duplicate identity data is consumed silently. | Log debug; return True; do not reassemble. | Return ConsumeInput with duplicate-same decision. | address_to_identity[address] | none | Pyxis currently diverges here; passing this to reassembler may create false fragment errors. |
| Known identity, 16-byte payload differs | Identity-like 16-byte data is consumed with warning. | Log warning; return True. | Return ConsumeInput plus warning event. | address_to_identity[address] | none | Could consume legitimate 16-byte Reticulum payload; preserving current behavior avoids behavior drift during migration. |
| New 16-byte identity | Identity accepted if duplicate policy allows. | Convert to bytes; compute identity hash. | Store identity as 16-byte stable peer identity. | incoming data | local result state | Incorrect identity size or hash handling breaks peer keying. |
| _compute_identity_hash lines 1899-1918 | Short key is first 8 bytes rendered as 16 hex chars. | Used for Python map keys. | Use as compact display/session key where collision risk is acceptable; keep full identity as authoritative key. | peer identity | none | Truncation collision is unlikely but not impossible; full identity should be primary in C++ state. |
| _get_fragmenter_key lines 1886-1897 | Fragmentation key is full identity hex; address is ignored. | Python dict key for fragmenters/reassemblers. | Use 16-byte identity or full hex as fragmenter key; avoid MAC keying. | peer identity | none | Address-keyed fragmentation would break MAC rotation recovery. |
| _check_duplicate_identity: invalid/missing identity | Invalid identity does not trigger duplicate rejection. | Return false. | Return no rejection. | identity length | none | Over-strict rejection can disconnect peers before handshake completes. |
| _check_duplicate_identity: same identity, different address, pending detach | Allow reconnection; stale old address can be cleaned. | Logs; calls _cleanup_stale_address; return false. | Return AcceptNewIdentity, UpdatePeerAddress, maybe CleanupOldAddress. | identity_to_address, _pending_detach | stale address maps through adapter | Rejecting this breaks MAC rotation and reconnect-after-disconnect. |
| _check_duplicate_identity: old address not connected and not in peers | Allow reconnection; cleanup stale old address. | Logs; calls _cleanup_stale_address; return false. | Return AcceptNewIdentity, UpdatePeerAddress, CleanupOldAddress. | driver.connected_peers, peers | stale address maps through adapter | False duplicate rejection after stale cleanup race. |
| _check_duplicate_identity: old connection zombie | Allow new connection and disconnect old one. | Logs; cleanup stale address; driver.disconnect(existing_address); return false. | Return DisconnectOldPeer and accept new peer. | _last_real_data, _zombie_timeout | stale mapping cleanup through adapter | Keeping zombie can block recovery; disconnecting wrong peer can flap stable links. |
| _check_duplicate_identity: old connection alive | Reject new duplicate. | Logs; return true. _handle_identity_handshake disconnects current address. | Return RejectDuplicateIdentity, DisconnectCurrentPeer. | identity_to_address, driver/peer liveness | none in core; adapter disconnects current | Allowing both creates duplicate interfaces and routing ambiguity. |
| Accept identity mappings | New identity becomes authoritative for address and identity key. | Mutates address_to_identity, identity_to_address. | C++ manager owns address-to-identity and identity-to-session maps. Python mirrors during transition. | identity hash | maps | Divergent Python/C++ maps are a transition risk. |
| MTU lookup | Use negotiated MTU if available, else BLE minimum 23. | Calls driver.get_peer_mtu(address). | Adapter supplies MTU from connection handle; core falls back to 23. | driver MTU | result MTU | Wrong MTU causes fragmentation failures or throughput regression. |
| Fragmentation state | Accepted identity needs fragmenter and reassembler. | Creates BLEFragmenter(mtu) and BLEReassembler(). | C++ session manager owns/requests creation of C++ fragmenter/reassembler keyed by identity. | MTU, identity | fragmenter/reassembler pools | Missing reassembler races early data. |
| Peer interface ready | Accepted identity should make peer routable. | Calls _spawn_peer_interface or updates existing interface address. | Register/update microReticulum peer/session object. | spawned_interfaces | RNS.Transport.interfaces indirectly, address_to_interface | Creating duplicate peer objects causes routing ambiguity. |
| Mark real data | Identity handshake counts as real data for zombie tracking. | Mutates _last_real_data[identity_hash]. | C++ session liveness timestamp updated. | time | liveness state | Zombie replacement decisions become inaccurate. |
| Remove pending identity | Successful handshake clears pending timeout. | Deletes _pending_identity_connections[address]. | C++ pending identity table clears connection/address. | pending map | pending map | Timeout may later disconnect a valid peer. |
| Exception path | Any exception during handshake processing still consumes the packet. | Logs error; returns True. | Return ErrorConsumed if adapter/core detects invalid transition. | exception | none or partial adapter state | Changing to pass-through can feed malformed identity into data path. |
| handle_peripheral_data lines 2113-2178 | Parallel/legacy peripheral handshake path: if no identity and len 16, accept; otherwise if no identity, drop. | Mutates maps, creates fragmenter/reassembler, spawns peer interface; lacks duplicate policy nuance from _handle_identity_handshake. | Should be retired or routed through same C++ session manager. | address_to_identity, connection_timeout | maps, fragmentation, interface | Divergent handshake paths are a major behavior-drift risk. |
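
The four _check_duplicate_identity outcomes in the map above can be sketched as one pure decision function over adapter-supplied facts. This is an illustration under assumptions: DuplicateFacts, DuplicateVerdict, and checkDuplicateIdentity are hypothetical names, and the check order is assumed to follow the row order of the map. The Python source remains the authority on ordering.

```cpp
#include <cstdint>

// Hypothetical facts the adapter would supply; the core never queries a BLE stack.
struct DuplicateFacts {
    bool identity_known_elsewhere = false;  // identity already maps to another address
    bool pending_detach = false;
    bool old_address_connected = false;
    bool old_address_in_peer_table = false;
    bool old_connection_is_zombie = false;
};

enum class DuplicateVerdict : std::uint8_t {
    NoConflict,
    AllowAfterPendingDetach,  // adapter may clean up the old address
    AllowStaleAddress,        // adapter cleans up the old address
    AllowReplaceZombie,       // adapter also disconnects the old peer
    RejectNewConnection,      // adapter disconnects the current peer
};

// Assumed order: pending detach, stale address, zombie, then alive.
inline DuplicateVerdict checkDuplicateIdentity(const DuplicateFacts& f) {
    if (!f.identity_known_elsewhere) {
        return DuplicateVerdict::NoConflict;
    }
    if (f.pending_detach) {
        return DuplicateVerdict::AllowAfterPendingDetach;
    }
    if (!f.old_address_connected && !f.old_address_in_peer_table) {
        return DuplicateVerdict::AllowStaleAddress;
    }
    if (f.old_connection_is_zombie) {
        return DuplicateVerdict::AllowReplaceZombie;
    }
    return DuplicateVerdict::RejectNewConnection;
}
```

Only the last verdict rejects the new connection; every other branch lets the handshake proceed, which is what keeps MAC rotation and reconnect-after-disconnect working.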

3. Pyxis Comparison

Comparative project inspected: /usr/local/src/pyxis/lib/ble_interface.

Relevant C++ structure:

  • BLETypes.h: UUIDs, MTU constants, timing constants, roles, connection state, peer state, BLEAddress, ConnectionHandle.
  • BLEPlatform.h: BLE hardware abstraction for scanning, advertising, connection management, GATT reads/writes, notification callbacks.
  • BLEIdentityManager.h/.cpp: local identity, identity handshake, address-to-identity mapping, MAC rotation callback, fixed-size handshake pools.
  • BLEPeerManager.h/.cpp: discovered/connected peer state, identity-keyed peer storage, MAC-only peer storage, handle lookup, MTU, scoring, blacklist, zombie cleanup.
  • BLEInterface.h/.cpp: orchestration layer that wires platform callbacks, identity manager callbacks, peer manager, fragmenters, reassembler, and microReticulum interface calls.
  • BLEFragmenter and BLEReassembler: portable packet protocol components.

Pyxis organizes transport/interface state with a useful layered split: platform callback layer, identity manager, peer manager, fragmentation components, and a top-level microReticulum InterfaceImpl. It also uses fixed-size pools in several places, which is a strong ESP32-S3 design prior.

Ideas to reuse:

  • Centralize BLE protocol constants and connection/session structs.
  • Use a platform abstraction rather than embedding NimBLE or BlueZ calls in protocol logic.
  • Key peers and reassembly by stable 16-byte identity, not MAC.
  • Use fixed-size pools or bounded containers for ESP32 builds.
  • Defer heavy work out of BLE stack callbacks when stack safety requires it.
  • Keep microReticulum interface registration outside the pure identity/session manager.
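
To make the fixed-size pool idea concrete, here is a minimal sketch of a bounded, identity-keyed table that never allocates from the heap. FixedIdentityPool is a hypothetical name and does not reproduce Pyxis code; the linear scan assumes a small capacity, as would be typical for a handful of BLE peers.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Identity = std::array<std::uint8_t, 16>;

// Hypothetical bounded table: capacity is fixed at compile time.
template <typename Value, std::size_t Capacity>
class FixedIdentityPool {
public:
    // Returns the slot for `id`, claiming a free slot if needed.
    // Returns nullptr when the pool is full.
    Value* findOrInsert(const Identity& id) {
        Slot* free_slot = nullptr;
        for (Slot& s : slots_) {
            if (s.used && s.id == id) {
                return &s.value;
            }
            if (!s.used && free_slot == nullptr) {
                free_slot = &s;
            }
        }
        if (free_slot == nullptr) {
            return nullptr;
        }
        free_slot->used = true;
        free_slot->id = id;
        free_slot->value = Value{};
        return &free_slot->value;
    }

    // Releases the slot for `id`; returns false if it was not present.
    bool erase(const Identity& id) {
        for (Slot& s : slots_) {
            if (s.used && s.id == id) {
                s = Slot{};
                return true;
            }
        }
        return false;
    }

    std::size_t size() const {
        std::size_t n = 0;
        for (const Slot& s : slots_) {
            n += s.used ? 1 : 0;
        }
        return n;
    }

private:
    struct Slot {
        bool used = false;
        Identity id{};
        Value value{};
    };
    std::array<Slot, Capacity> slots_{};
};
```

A full pool reports failure instead of growing, so the caller has to decide explicitly what happens when one more peer than expected connects.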

Ideas not to reuse blindly:

  • Pyxis BLEIdentityManager::isHandshakeData() treats a 16-byte payload as handshake only if no identity mapping exists. Current Python also consumes matching or mismatching 16-byte identity-like data when identity is already known.
  • Pyxis duplicate connected identity handling is simpler than current Python. Current Python allows replacement for pending detach, stale disconnected address, and zombie old connection.
  • Pyxis uses std::function, std::vector, std::shared_ptr, std::string, and dynamic Bytes in some interfaces. These may be acceptable, but ESP32 memory behavior should be measured before adopting the exact style.

Pyxis gives a useful ESP32-side adapter pattern, especially ConnectionHandle, IBLEPlatform, and the separation between identity/peer managers and BLEInterface. Its BLE code is early enough that it should guide architecture, not define behavior.

4. Proposed C++ Header

This is a proposal only; no header should be created in Gate 2A.

#pragma once

#include <array>
#include <cstddef>  // size_t
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

namespace ble_reticulum {

using PeerIdentity = std::array<uint8_t, 16>;

enum class LocalRole : uint8_t {
    Unknown,
    Central,
    Peripheral,
};

enum class InputDecision : uint8_t {
    PassToReassembler,
    ConsumedDuplicateSameIdentity,
    ConsumedDuplicateMismatchedIdentity,
    AcceptedNewIdentity,
    RejectedDuplicateIdentity,
    ErrorConsumed,
};

enum class SessionActionType : uint8_t {
    ConsumeInput,
    PassToReassembler,
    AcceptNewIdentity,
    RejectDuplicateIdentity,
    DisconnectCurrentPeer,
    DisconnectOldPeer,
    CreateFragmentationState,
    MarkPeerReady,
    UpdatePeerAddress,
    RemovePendingIdentity,
    MarkRealData,
    CleanupOldAddress,
    Warn,
};

struct ConnectionId {
    // Linux adapter may use address string. ESP32 adapter may use handle.
    // The core stores both when supplied but does not call platform APIs.
    std::string address;
    uint16_t handle = 0xffff;
};

struct ConnectionSnapshot {
    ConnectionId current;
    LocalRole local_role = LocalRole::Unknown;
    std::optional<PeerIdentity> known_identity_for_address;
    std::optional<uint16_t> negotiated_mtu;

    // Adapter-supplied liveness facts needed to preserve current Python
    // duplicate identity behavior without letting C++ call BlueZ/NimBLE.
    std::optional<std::string> existing_address_for_identity;
    bool identity_has_pending_detach = false;
    bool existing_address_connected = false;
    bool existing_address_in_peer_table = false;
    bool existing_connection_is_zombie = false;
    double existing_last_real_data = 0.0;
};

struct SessionAction {
    SessionActionType type = SessionActionType::ConsumeInput;
    ConnectionId target;
    std::string old_address;
    std::string new_address;
    std::string message;
};

struct HandshakeResult {
    InputDecision decision = InputDecision::PassToReassembler;
    std::vector<SessionAction> actions;

    bool consumed = false;
    bool accepted = false;
    bool should_disconnect_current = false;
    bool should_disconnect_old = false;

    std::optional<PeerIdentity> peer_identity;
    std::string identity_key;    // Compatibility key: first 8 identity bytes as 16 hex chars.
    std::string fragmenter_key;  // Compatibility key: full identity as 32 hex chars.
    uint16_t mtu = 23;
};

struct PeerSessionView {
    PeerIdentity identity{};  // zero-initialized so a default view never exposes indeterminate bytes
    std::string identity_key;
    std::string current_address;
    uint16_t current_handle = 0xffff;
    uint16_t mtu = 23;
    bool has_fragmentation_state = false;
    bool peer_ready = false;
    double pending_identity_since = 0.0;
    double last_real_data = 0.0;
};

class BLEPeerSessionManager {
public:
    explicit BLEPeerSessionManager(double pending_identity_timeout = 30.0,
                                   double zombie_timeout = 45.0);

    HandshakeResult handleIdentityHandshake(const ConnectionSnapshot& connection,
                                            const uint8_t* data,
                                            size_t data_size,
                                            double now_seconds);

    void markConnected(const ConnectionSnapshot& connection, double now_seconds);
    void markDisconnected(const ConnectionId& connection, double now_seconds);
    void markMtu(const ConnectionId& connection, uint16_t mtu);
    void markPendingIdentity(const ConnectionId& connection, double now_seconds);
    std::vector<ConnectionId> expiredPendingIdentities(double now_seconds) const;

    std::optional<PeerSessionView> sessionByAddress(const std::string& address) const;
    std::optional<PeerSessionView> sessionByIdentity(const PeerIdentity& identity) const;

    static bool isIdentityHandshakePayload(const uint8_t* data, size_t data_size);
    static PeerIdentity identityFromPayload(const uint8_t* data, size_t data_size);
    static std::string computeIdentityKey(const PeerIdentity& identity);
    static std::string computeFragmenterKey(const PeerIdentity& identity);

private:
    // Ownership: this manager owns protocol/session truth. Adapters may mirror it
    // during migration, but RNS objects, BLE stack objects, and actual I/O remain
    // outside this class.
    double pending_identity_timeout_;
    double zombie_timeout_;

    // Implementation may use std containers for native/Linux tests and fixed pools
    // for embedded builds. The public behavior should not depend on allocation model.
};

} // namespace ble_reticulum
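
The two compatibility keys declared in the header could be implemented along these lines. This is a sketch, not the final implementation: hexPrefix is a hypothetical helper, and lowercase hex output is an assumption that must be checked against the Python _compute_identity_hash and _get_fragmenter_key results in Gate 2C.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

using PeerIdentity = std::array<std::uint8_t, 16>;

// Hypothetical helper: lowercase hex of the first `count` identity bytes.
inline std::string hexPrefix(const PeerIdentity& identity, std::size_t count) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(count * 2);
    for (std::size_t i = 0; i < count && i < identity.size(); ++i) {
        out.push_back(kDigits[identity[i] >> 4]);
        out.push_back(kDigits[identity[i] & 0x0f]);
    }
    return out;
}

// Short key: first 8 identity bytes as 16 hex characters.
inline std::string computeIdentityKey(const PeerIdentity& identity) {
    return hexPrefix(identity, 8);
}

// Fragmenter key: full identity as 32 hex characters; the address plays no part.
inline std::string computeFragmenterKey(const PeerIdentity& identity) {
    return hexPrefix(identity, identity.size());
}
```

Because the short key is a prefix of the fragmenter key, both can be derived from the stored 16-byte identity on demand rather than cached as strings.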

5. State Ownership Model

Move to C++:

  • address-to-identity mapping;
  • identity-to-active-session mapping;
  • current address/connection metadata per identity;
  • MTU per peer/session;
  • pending identity state and timeout age;
  • last real data / zombie timing if duplicate replacement policy remains;
  • peer-ready state;
  • fragmentation state ownership or at minimum authoritative requests to create/update it;
  • reassembler ownership once the Python adapter can stop owning reassemblers.

Keep in adapters:

  • RNS.Transport.interfaces;
  • Python BLEPeerInterface instances;
  • BlueZ/Bleak/DBus objects;
  • ESP32 NimBLE/Bluedroid connection handles as platform resources, although handle values can be stored as opaque IDs in session state;
  • logging mechanism;
  • actual disconnect calls;
  • actual GATT reads/writes/notifies;
  • thread/mutex policy specific to Python runtime or ESP32 BLE stack callbacks.

6. Result / Action Model

The core should return decisions and adapter actions. Required actions:

  • ConsumeInput;
  • PassToReassembler;
  • AcceptNewIdentity;
  • RejectDuplicateIdentity;
  • DisconnectCurrentPeer;
  • DisconnectOldPeer;
  • CreateFragmentationState;
  • MarkPeerReady;
  • UpdatePeerAddress;
  • RemovePendingIdentity;
  • MarkRealData;
  • CleanupOldAddress;
  • Warn.

The adapter executes these actions in platform order. For example, a zombie replacement result can include CleanupOldAddress, DisconnectOldPeer, AcceptNewIdentity, CreateFragmentationState, MarkPeerReady, RemovePendingIdentity, and MarkRealData.
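
The zombie replacement example can be sketched as a core function that builds the action list and an adapter that walks it. Everything here is illustrative: the enum is a subset of the action names above, and Action, zombieReplacement, and RecordingAdapter are hypothetical. The adapter is a test double that records effects as strings instead of calling a BLE stack.

```cpp
#include <string>
#include <vector>

// Subset of the action names listed above.
enum class ActionType {
    CleanupOldAddress,
    DisconnectOldPeer,
    AcceptNewIdentity,
    CreateFragmentationState,
    MarkPeerReady,
    RemovePendingIdentity,
    MarkRealData,
};

struct Action {
    ActionType type;
    std::string address;  // address the action applies to
};

// Core side: builds the list for a zombie replacement; performs no I/O.
inline std::vector<Action> zombieReplacement(const std::string& old_address,
                                             const std::string& new_address) {
    return {
        {ActionType::CleanupOldAddress, old_address},
        {ActionType::DisconnectOldPeer, old_address},
        {ActionType::AcceptNewIdentity, new_address},
        {ActionType::CreateFragmentationState, new_address},
        {ActionType::MarkPeerReady, new_address},
        {ActionType::RemovePendingIdentity, new_address},
        {ActionType::MarkRealData, new_address},
    };
}

// Adapter side: records each effect in the order the actions arrive.
struct RecordingAdapter {
    std::vector<std::string> effects;

    void execute(const std::vector<Action>& actions) {
        for (const Action& a : actions) {
            switch (a.type) {
            case ActionType::CleanupOldAddress:        effects.push_back("cleanup " + a.address); break;
            case ActionType::DisconnectOldPeer:        effects.push_back("disconnect " + a.address); break;
            case ActionType::AcceptNewIdentity:        effects.push_back("accept " + a.address); break;
            case ActionType::CreateFragmentationState: effects.push_back("fragmentation " + a.address); break;
            case ActionType::MarkPeerReady:            effects.push_back("ready " + a.address); break;
            case ActionType::RemovePendingIdentity:    effects.push_back("unpend " + a.address); break;
            case ActionType::MarkRealData:             effects.push_back("real-data " + a.address); break;
            }
        }
    }
};
```

A recording adapter of this kind is also a natural fit for the Gate 2D harness, where a fake driver has to prove which peer was disconnected.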

7. Python Adapter Boundary

During transition, BLEInterface.py would:

  1. Build ConnectionSnapshot from current Python dictionaries and driver state:
    • address_to_identity;
    • identity_to_address;
    • _pending_detach;
    • driver.connected_peers;
    • peers;
    • _last_real_data;
    • driver.get_peer_mtu(address).
  2. Call BLEPeerSessionManager.handleIdentityHandshake(...).
  3. If consumed is false, call _handle_ble_data(address, data).
  4. Mirror accepted C++ session state into Python dictionaries initially.
  5. Execute adapter actions:
    • RNS.log for messages/warnings;
    • driver.disconnect(address) for current/old peer disconnect;
    • create/update Python BLEFragmenter and BLEReassembler until owned by C++;
    • call _spawn_peer_interface(...) or update existing BLEPeerInterface.peer_address;
    • delete _pending_identity_connections[address].

The Python dictionaries should become mirrors, not the authoritative state.

8. ESP32 / microReticulum Adapter Boundary

On T-Beam SUPREME / ESP32-S3:

  1. BLE stack callback receives bytes and a connection handle/address.
  2. Adapter maps handle/address/role/MTU into ConnectionSnapshot.
  3. Adapter calls BLEPeerSessionManager.handleIdentityHandshake(...).
  4. If result says PassToReassembler, pass payload to C++ BLEReassembler.
  5. If result says CreateFragmentationState, initialize or update C++ fragmenter/reassembler pools keyed by identity.
  6. If result says MarkPeerReady, register or update the peer/session with the microReticulum interface abstraction.
  7. If result says disconnect, call NimBLE/Bluedroid disconnect through the ESP32 BLE platform layer.
  8. Logging goes through the firmware logging mechanism, not the session core.

The adapter should prefer BLE connection handles for platform operations and stable 16-byte identity for Reticulum/session routing. MAC address should remain metadata because rotation is expected.
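
The handle/identity/MAC split can be illustrated with a small routing table keyed by identity, where the handle and address are replaceable metadata. SessionRecord and IdentityRoutingTable are hypothetical names. std::map is used here only to keep the sketch short for native tests; an embedded build would likely substitute a bounded container.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

using PeerIdentity = std::array<std::uint8_t, 16>;

struct SessionRecord {
    std::uint16_t handle = 0xffff;  // opaque platform handle, used for BLE operations
    std::string address;            // metadata only; expected to rotate
    std::uint16_t mtu = 23;
};

class IdentityRoutingTable {
public:
    // Binds (or rebinds) an identity to the connection it currently uses.
    // Returns true when an existing session moved to a different address.
    bool bind(const PeerIdentity& id, std::uint16_t handle,
              const std::string& address, std::uint16_t mtu) {
        auto it = sessions_.find(id);
        const bool moved = it != sessions_.end() && it->second.address != address;
        SessionRecord& rec = sessions_[id];
        rec.handle = handle;
        rec.address = address;
        rec.mtu = mtu;
        return moved;
    }

    // Returns the current record for `id`, or nullptr if unknown.
    const SessionRecord* lookup(const PeerIdentity& id) const {
        auto it = sessions_.find(id);
        return it == sessions_.end() ? nullptr : &it->second;
    }

    std::size_t size() const { return sessions_.size(); }

private:
    std::map<PeerIdentity, SessionRecord> sessions_;
};
```

A rotated MAC updates the existing record rather than creating a second session, so identity-keyed fragmentation state stays attached to the same peer.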

9. Test Matrix

| Gate | Test | Expected decision/action |
|---|---|---|
| 2B | non-16-byte payload | PassToReassembler, consumed=false |
| 2B | new 16-byte identity | AcceptedNewIdentity, ConsumeInput, identity key and fragmenter key set |
| 2B | known identity duplicate same | ConsumedDuplicateSameIdentity, consumed=true |
| 2B | known identity duplicate mismatch | ConsumedDuplicateMismatchedIdentity, consumed=true, warning action |
| 2B | duplicate identity active elsewhere | RejectedDuplicateIdentity, DisconnectCurrentPeer |
| 2B | duplicate identity with stale/pending detach | accept new identity, CleanupOldAddress, UpdatePeerAddress |
| 2B | duplicate identity with zombie old connection | accept new identity, DisconnectOldPeer, CleanupOldAddress |
| 2B | MTU provided | result MTU equals provided MTU |
| 2B | MTU missing | result MTU falls back to 23 |
| 2B | pending identity timeout | expired connection IDs are returned without platform calls |
| 2B | peer address update | identity session current address changes; fragmenter key unchanged |
| 2C | pybind11 identity key equivalence | C++ computeIdentityKey equals Python _compute_identity_hash |
| 2C | pybind11 fragmenter key equivalence | C++ computeFragmenterKey equals Python _get_fragmenter_key |
| 2D | consumed packet behavior | fake Python harness sees same return/consume behavior as _handle_identity_handshake |
| 2D | pass-to-reassembler behavior | non-handshake calls _handle_ble_data path in harness |
| 2D | duplicate active elsewhere | fake driver records disconnect current peer |
| 2D | zombie old connection | fake driver records disconnect old peer and accepts new peer |
| 2D | invalid/exception compatibility | invalid transition returns ErrorConsumed where current Python would consume after exception |
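
Two of the Gate 2B rows (MTU fallback and pending identity timeout) reduce to small pure helpers that native unit tests can exercise without any BLE stack. The sketch below uses hypothetical names (resolveMtu, PendingEntry, expiredPending) and assumes an entry expires once its age reaches the timeout; whether the comparison should be inclusive is a detail to confirm against the Python reference.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

constexpr std::uint16_t kMinimumBleMtu = 23;

// "MTU provided" / "MTU missing" rows: fall back to 23 when nothing was negotiated.
inline std::uint16_t resolveMtu(std::optional<std::uint16_t> negotiated) {
    return negotiated.value_or(kMinimumBleMtu);
}

// Pending entry: (address, time the connection was first seen without identity).
using PendingEntry = std::pair<std::string, double>;

// "pending identity timeout" row: report expired addresses, perform no platform calls.
// Assumption: an entry expires when its age is greater than or equal to the timeout.
inline std::vector<std::string> expiredPending(const std::vector<PendingEntry>& pending,
                                               double now_seconds,
                                               double timeout_seconds) {
    std::vector<std::string> expired;
    for (const PendingEntry& entry : pending) {
        if (now_seconds - entry.second >= timeout_seconds) {
            expired.push_back(entry.first);
        }
    }
    return expired;
}
```

Passing the current time in as a parameter, as the proposed header does with now_seconds, keeps these checks deterministic in tests.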

10. Migration Plan

  • Gate 2A: design only; this report and SQL tracking only.
  • Gate 2B: C++ session manager skeleton plus pure native unit tests.
  • Gate 2C: pybind11 binding plus Python tests for result structs and helper equivalence.
  • Gate 2D: Python equivalence harness with fake driver/state, no live BLE changes.
  • Gate 2E: optional Python integration behind environment flag; mirror C++ state into Python dictionaries.
  • Gate 2F: bilateral US Constitution field test with C++ session manager enabled.
  • Gate 2G: ESP32/microReticulum adapter design using Pyxis as architectural prior art, not as behavior authority.

11. Risk Register

| Risk | Mitigation |
|---|---|
| Accidentally changing current BLE behavior | Tests must encode current Python decisions before integration. |
| Duplicate handshake logic in handle_peripheral_data | Route both paths through one C++ session manager or retire legacy path deliberately. |
| MAC rotation assumptions | Keep identity as primary key; model address changes as session metadata. |
| Identity collision/truncation assumptions | Use full 16-byte identity internally; keep 16-hex short key for compatibility/display only. |
| Consuming legitimate 16-byte real data | Preserve current behavior initially; document and test it before any protocol change. |
| Python and C++ state divergence | Make C++ authoritative and Python dictionaries mirrors during transition. |
| ESP32 memory constraints | Use fixed pools or bounded containers for embedded builds; avoid unbounded allocation in hot paths. |
| Linux/BlueZ vs ESP32 callback concurrency | Core returns actions only; adapters own locking, deferral, and stack-safe execution. |
| Fragmenter/reassembler ownership transition | Move ownership in stages with equivalence tests and field transfer tests. |

12. SQL Output

Companion SQL file:

migration/sql/start_gate2a_protocol_session_design_20260518_1434.sql

BEGIN TRANSACTION;

INSERT INTO symbols (
    source_file,
    symbol_name,
    symbol_type,
    class_name,
    line_number,
    tag,
    phase,
    status,
    cpp_candidate,
    confidence,
    rationale,
    callers,
    callees,
    notes,
    first_seen_at,
    updated_at
)
VALUES (
    'src/ble_reticulum/BLEInterface.py',
    '_handle_identity_handshake',
    'method',
    'BLEInterface',
    1202,
    'GLUE',
    '2_ble_protocol_session_manager',
    'DESIGN',
    1,
    'high',
    'Gate 2A treats this method as reference behavior for a C++ BLEPeerSessionManager, not as a literal function port. The method mixes protocol decisions with Python, Linux driver, and Reticulum adapter side effects.',
    '_data_received_callback',
    '_compute_identity_hash; _check_duplicate_identity; driver.disconnect; driver.get_peer_mtu; _get_fragmenter_key; BLEFragmenter; BLEReassembler; _spawn_peer_interface',
    'Gate 2A opened as design-only. Preserve tag GLUE because BLEInterface remains adapter/glue; cpp_candidate=1 means reference behavior for C++ session ownership, not direct port.',
    CURRENT_TIMESTAMP,
    CURRENT_TIMESTAMP
)
ON CONFLICT(source_file, class_name, symbol_name, line_number) DO UPDATE SET
    tag = 'GLUE',
    phase = '2_ble_protocol_session_manager',
    status = 'DESIGN',
    cpp_candidate = 1,
    confidence = 'high',
    rationale = excluded.rationale,
    callers = excluded.callers,
    callees = excluded.callees,
    notes = excluded.notes,
    updated_at = CURRENT_TIMESTAMP;

INSERT INTO reviews (
    symbol_id,
    reviewed_at,
    reviewer,
    old_tag,
    new_tag,
    old_status,
    new_status,
    note
)
SELECT
    symbol_id,
    CURRENT_TIMESTAMP,
    'Codex',
    'GLUE',
    'GLUE',
    'REVIEWED',
    'DESIGN',
    'Gate 2A opened as design-only. _handle_identity_handshake is reference behavior for a future C++ BLEPeerSessionManager, not a literal function port. Phase 1 records remain untouched.'
FROM symbols
WHERE source_file = 'src/ble_reticulum/BLEInterface.py'
  AND class_name = 'BLEInterface'
  AND symbol_name = '_handle_identity_handshake'
  AND line_number = 1202;

COMMIT;