# TODO
Outstanding work for the spec repo.
## Outreach
- [ ] **File a community-documentation issue on `markqvist/Reticulum`.**
Link this repo as a community-maintained byte-level spec. Ask
whether the maintainer would like to endorse it or link to it from the
official Reticulum manual. Frame it as a complement to (not a
replacement for) the existing operator-focused docs.
- [x] **File a `random_hash` interop issue on `attermann/microReticulum`.**
Filed as [attermann/microReticulum#48](https://github.com/attermann/microReticulum/issues/48)
on 2026-05-04. Documents the missing 5-byte timestamp half of
`random_hash`, the path-table replacement effect on mixed-vendor
meshes, and a fix recipe (the existing TODO comment, with a
suggestion that `millis()/1000` is acceptable for clockless
devices since the path-table comparison cares about ordering
not absolute time).
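
For the `random_hash` issue above, here is a minimal Python sketch of the upstream layout: 5 random bytes followed by a 5-byte big-endian uint40 of Unix seconds. It also shows the receiver-side read that Python RNS performs on bytes 5..10. `os.urandom(5)` stands in for upstream's `get_random_hash()[0:5]`, and both function names are hypothetical. Passing a seconds-since-boot counter as `seconds` models the clockless fallback proposed in the issue.

```python
import os
import time


def make_random_hash(seconds=None) -> bytes:
    """Build a 10-byte announce random_hash: 5 random bytes || uint40 BE seconds.

    `seconds` defaults to wall-clock Unix time. A clockless device could pass a
    monotonic counter instead (e.g. seconds since boot): ordering between its
    own announces is preserved even though the absolute value is wrong.
    """
    if seconds is None:
        seconds = time.time()
    ts = int(seconds)
    if not 0 <= ts < 2**40:
        raise ValueError("timestamp does not fit in a uint40")
    # os.urandom stands in for upstream's get_random_hash()[0:5] (assumption).
    return os.urandom(5) + ts.to_bytes(5, "big")


def timebase_from_random_hash(random_hash: bytes) -> int:
    """Read the trailing 5 bytes back as a big-endian uint40 emission time."""
    return int.from_bytes(random_hash[5:10], "big")
```

A uniformly random uint40 is almost always far larger than any real Unix timestamp, so an emitter that fills all 10 bytes randomly will almost always look "newer" than a correctly timestamped announce when these values are compared.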
## Test infrastructure
- [x] **Bootstrap `test-vectors/identities.json`** — Alice + Bob
identities populated against RNS 1.2.0. Regenerator at
`tools/regen_identities.py`.
- [x] **Bootstrap `test-vectors/announces.json`** — two vectors
(no-ratchet + with-ratchet) signed by Alice. Regenerator at
`tools/regen_announces.py` (deterministic via patched
`Identity.get_random_hash` + module-local `time.time` shim).
- [x] **Bootstrap `test-vectors/lxmf.json`** — two opportunistic
LXMF vectors Alice → Bob, full plaintext + Token-encrypted
ciphertext. Regenerator at `tools/regen_lxmf.py` (deterministic
via patched `LXMessage.timestamp`, ephemeral X25519, and
Token CBC IV).
- [x] **Bootstrap `test-vectors/links.json`** — Link handshake
vector with deterministic ephemerals. Regenerator at
`tools/regen_links.py`. Records LINKREQUEST + LRPROOF wire
bytes plus the derived session key both sides must agree on.
- [x] **Write the priority verifier scripts** listed in
`tools/README.md` — all eight done plus four follow-ons
(`verify_proof_packet.py`, `verify_rnode_split.py`,
`verify_stamps.py`, `verify_ratchet_dedup.py`). Status table
lives in `tools/README.md`.
## Open `⚠️ UNVERIFIED` items in SPEC.md
These need either a runtime test or a stronger upstream source citation
to remove their markers:
- [x] **§2.3 Originator HEADER_1 → HEADER_2 conversion.** Verified
against RNS 1.2.0 by `tools/verify_packet_header.py`, which
seeds `Transport.path_table` with a multi-hop entry and confirms
the converted wire bytes via stubbed `Transport.transmit`.
Citation updated to `RNS/Transport.py:1074-1083`.
- [x] **§4.3 The 3-element `[name, stamp_cost, [capabilities]]`
app_data variant.** Verified against LXMF 0.9.6 by
`tools/verify_announce_app_data.py`. Finding: in this LXMF
version the producer emits a 2-element form only (the
`supported_functionality` line at `LXMF/LXMRouter.py:999` is
dead code); the parser is prepared for a 3-element form via
`compression_support_from_app_data`. SPEC.md §4.3 updated to
describe the actual current behavior.
- [x] **§7.1 A path request always precedes LXMF DATA.** Verified against
LXMF 0.9.6 by `tools/verify_path_request.py`. Finding: the
preamble fires only when `not has_path()` AND method is
OPPORTUNISTIC; the retry path can fire a second `request_path`
after `MAX_PATHLESS_TRIES` (`LXMRouter.py:2571+`). SPEC.md §7.1
rewritten accordingly. Also fixed a documentation bug in §1.2
(path-request name_hash column).
- [x] **§7.4 Ratchet ring count default = 8.** False — actual upstream
default is `Destination.RATCHET_COUNT = 512` at
`RNS/Destination.py:85` in RNS 1.2.0, with
`RATCHET_INTERVAL = 30*60` (line 90) and
`RATCHET_EXPIRY = 60*60*24*30` (`RNS/Identity.py:69`).
SPEC.md §7.4 corrected.
## Open `⚠️` items needing a runtime verifier
- [x] **`tools/verify_proof_packet.py` locks in §6.5.** Done.
- [x] **`tools/verify_rnode_split.py` locks in §8.3.** Done.
- [x] **`tools/verify_link_handshake.py` locks in §6.2 / §6.3.** Done.
## Spec gaps for a functional client (priority-ordered)
The items below are missing pieces that prevent a client built only from
this spec (plus the existing flows/) from interoperating with upstream.
Tier 1 = required to talk at all to the mesh as a leaf LXMF client.
Tier 2 = required for a client that's actually useful (chat that works
in the wild). Tier 3 = required to act as a transport node / relay.
Where I've already done the source reading, I've left the file/line
citations inline so whoever picks the item up can start without
re-research.
### Tier 1 — required for a barebones leaf LXMF client to interop
- [x] **`flows/receive-announce.md` + SPEC.md §4.5 announce validation
rules.** Done. SPEC.md §4.5 covers the MUST validation rules
(body parse with `context_flag` branch, signed_data
reconstruction, signature verification, dest_hash recomputation,
public-key collision rejection, blackhole list, cache update
order, PATH_RESPONSE handling). `flows/receive-announce.md` walks
the chronology end-to-end. Side fixes: SPEC.md §4.1 corrected
(`random_hash` is 5 random bytes + 5 bytes big-endian uint40
unix_seconds, not 10 random bytes); SPEC.md §2.5 contexts table
now lists `0x0B PATH_RESPONSE`.
- [x] **SPEC.md §10 / `flows/send-resource.md`: Reticulum Resource
fragmentation.** Done. SPEC.md §10 covers the wire-level MUST
rules: 13 sub-sections from "when Resource runs" through wire
contexts (ADV / REQ / RESOURCE / HMU / PRF / ICL / RCL),
hashmap collision-guard, sliding window, multi-segment cutover
at MAX_EFFICIENT_SIZE = 1 MiB - 1, and the encryption-then-split
layering. `flows/send-resource.md` walks the chronology in 10
steps with a wire-byte ladder diagram. Side fixes during the
drafting: SPEC.md §2.5 contexts table now lists ALL upstream
contexts (was missing all RESOURCE_*, REQUEST/RESPONSE,
COMMAND, CHANNEL, LINKIDENTIFY, LINKCLOSE, LRRTT entries) and
corrects KEEPALIVE from 0xFD (which is actually LINKPROOF) to
0xFA per `RNS/Packet.py:87`. SPEC.md §6.5 wording updated to
use the correct LINKPROOF context name. The previously-existing
§10 "Test vectors" and §11 "Source map" were renumbered to §11
and §12 to put §10 in the protocol-stack flow.
- [x] **SPEC.md §6.5 expansion: regular (non-LRPROOF) PROOF body.**
Done. SPEC.md §6.5 now has six sub-sections covering explicit
(96B `packet_hash || signature`) vs implicit (64B
`signature`-only) forms, the upstream default
(`Reticulum.__use_implicit_proof = True` per
`RNS/Reticulum.py:259` — opportunistic DATA proofs default to
the implicit form on the wire), the Link DATA proof exception
(always explicit per `RNS/Link.py:383-394`), receiver-side
dispatch by body length, where the proof packet is
addressed (`packet_hash[:16]` as a synthetic ProofDestination
vs `link.link_id` for Link proofs), and wire-byte ladders for both
forms.
`LINKPROOF (0xFD)` is corrected — it's a defined-but-unused
constant in RNS 1.2.0; the actual proof packets carry
`context = NONE (0x00)`. todo for `tools/verify_proof_packet.py`
moves to "needs a runtime verifier" section.
- [x] **SPEC.md §6 sub-section: 3-byte MTU/mode signalling field.**
Done. SPEC.md §6.6 covers the full 24-bit packed format
(3-bit mode in the top of byte 0, 21-bit MTU in the low 21
bits), the encode/decode primitives, the seven defined modes
(only `MODE_AES256_CBC = 0x01` is enabled in RNS 1.2.0; six
others are reserved for AES-128, AES-256-GCM, OTP, and the
post-quantum migration), the responder-side MTU clamp
mechanism (an in-place rewrite of the LINKREQUEST data buffer
so the LRPROOF signed_data carries the clamped value but the
link_id stays invariant), the length-only presence detection,
and the inclusion-in-signed_data trap that breaks link
handshakes when one side emits signalling and the other
doesn't. §6.1 and §6.2 inline references updated to point at
§6.6 for the bit layout. Existing §6.6 "Source" renamed to §6.7.
- [x] **SPEC.md §7.2 expansion + new flow `flows/path-discovery.md`:
path-response announce vs periodic announce.** Done. SPEC.md
§7.2 now has six sub-sections: parse rules for the path-request
packet, tag-based dedup via `discovery_pr_tags`, the five-way
dispatch in `Transport.path_request` (local responder /
transit-knows-path / local-client-forward / discovery-recursive
/ drop), the path-response announce wire format (regular
announce body + `context = PATH_RESPONSE = 0x0B`), the
`PR_TAG_WINDOW = 30s` body-cache mechanism that lets multiple
relays receive the same wire bytes for dedup convergence,
timing rules (`PATH_REQUEST_GRACE = 0.4s` + `PATH_REQUEST_RG =
1.5s` for roaming-mode), and a minimum-leaf-responsibility
summary. `flows/path-discovery.md` walks the 9-step chronology
with two wire-byte ladders (single-hop leaf-owns-target and
two-hop transit-relay-knows-path).
- [x] **SPEC.md §1.3 expansion: identity on-disk format.** Done — and
the previous wording was actually wrong about the byte order!
Empirically verified by reading `Identity.get_private_key()` at
`RNS/Identity.py:694-698` and `load_private_key` at line 706-717,
then round-tripping `to_file(path)` and reading back the bytes
against `test-vectors/identities.json`: the on-disk order is
X25519_priv(32) || Ed25519_priv(32), **same** as the public_key
concatenation, NOT opposite as the previous spec text claimed.
Implementations following the prior wording would have corrupted
identity files when interoperating with upstream Python RNS.
§1.3 now covers: 64-byte raw blob with no header/version/checksum/
encryption; the from_bytes HAZARD note (raw random bytes skip the
`cryptography` library's keypair invariants); cross-implementation
portability is automatic since there's nothing in the file but
the bytes; a ⚠️ "Spec correction" callout warning future readers
that prior revisions had this wrong. `tools/verify_destination_hash.py`
gets a new §1.3 round-trip section that writes via `to_file`,
reads back, asserts the byte slice matches the test vector, and
reloads via `from_file` to confirm identity_hash invariance.
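
For the §4.5 announce validation item above, here is a minimal Python sketch of the body parse with the `context_flag` branch, the `signed_data` reconstruction (an absent ratchet contributes empty bytes rather than being skipped), and the dest_hash recomputation. It assumes `identity_hash = SHA-256(public_key)[:16]` and `dest_hash = SHA-256(name_hash || identity_hash)[:16]`, which is the upstream recipe as we understand it. The Ed25519 check itself needs a crypto library and is left to the caller, using `public_key[32:]` (the Ed25519 half). All names here are hypothetical.

```python
import hashlib
from dataclasses import dataclass

PUBKEY_LEN = 64       # X25519 pub (32) || Ed25519 pub (32)
NAME_HASH_LEN = 10
RANDOM_HASH_LEN = 10  # 5 random bytes || uint40 big-endian seconds
RATCHET_LEN = 32      # present only when context_flag is set
SIG_LEN = 64
TRUNCATED_LEN = 16


@dataclass
class Announce:
    public_key: bytes
    name_hash: bytes
    random_hash: bytes
    ratchet: bytes    # b"" (not None) when context_flag is clear
    signature: bytes
    app_data: bytes   # b"" when absent


def parse_announce(body: bytes, context_flag: bool) -> Announce:
    """Slice an announce body; the ratchet slot exists only if context_flag is set."""
    ratchet_len = RATCHET_LEN if context_flag else 0
    sizes = [PUBKEY_LEN, NAME_HASH_LEN, RANDOM_HASH_LEN, ratchet_len, SIG_LEN]
    if len(body) < sum(sizes):
        raise ValueError("announce body too short")
    fields = []
    offset = 0
    for size in sizes:
        fields.append(body[offset:offset + size])
        offset += size
    return Announce(*fields, app_data=body[offset:])


def signed_data(dest_hash: bytes, a: Announce) -> bytes:
    """Bytes the Ed25519 signature covers; an absent ratchet contributes b""."""
    return dest_hash + a.public_key + a.name_hash + a.random_hash + a.ratchet + a.app_data


def expected_dest_hash(a: Announce) -> bytes:
    """Recompute the destination hash the announce claims to come from."""
    identity_hash = hashlib.sha256(a.public_key).digest()[:TRUNCATED_LEN]
    return hashlib.sha256(a.name_hash + identity_hash).digest()[:TRUNCATED_LEN]
```

A receiver would reject the announce if `expected_dest_hash(a)` differs from the packet's destination hash, before any cache update.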
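
For the §10 Resource item, here is a sketch of the part-split and 4-byte hashmap steps. It assumes each map hash is the first 4 bytes of SHA-256 over `part || random_hash`, and it simplifies the collision guard to "no repeat within the last `guard_size` map hashes". Because encryption happens before the split, the parts hashed here are ciphertext. `build_hashmap` and its parameters are hypothetical; in practice `sdu` would come from the link MTU and `guard_size` from `COLLISION_GUARD_SIZE`.

```python
import hashlib
from collections import deque

MAPHASH_LEN = 4  # 4-byte map hash per part


def map_hash(part: bytes, random_hash: bytes) -> bytes:
    """First 4 bytes of SHA-256(part || random_hash) (assumed recipe)."""
    return hashlib.sha256(part + random_hash).digest()[:MAPHASH_LEN]


def build_hashmap(blob: bytes, sdu: int, random_hash: bytes, guard_size: int):
    """Split an already-encrypted blob into SDU-sized parts and hash each one.

    Returns (parts, hashmap), or None if two map hashes collide within the last
    `guard_size` entries. The caller would then choose a new random_hash and
    retry, because a receiver could not tell the colliding parts apart.
    """
    parts = [blob[i:i + sdu] for i in range(0, len(blob), sdu)]
    recent = deque(maxlen=guard_size)
    hashmap = []
    for part in parts:
        h = map_hash(part, random_hash)
        if h in recent:
            return None
        recent.append(h)
        hashmap.append(h)
    return parts, hashmap
```

Identical parts always collide regardless of `random_hash`, so a retry loop would in practice also change the encrypted blob (new IV or new random prefix), not only the hash salt.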
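
For the §6.5 PROOF item, the receiver-side length dispatch can be sketched as follows. A 96-byte body is explicit (`packet_hash || signature`); a 64-byte body is implicit (`signature` only). Opportunistic DATA proofs are addressed to `packet_hash[:16]`. The function names are hypothetical, and verifying the returned signature is left to the caller and the rules in SPEC §6.5.

```python
EXPLICIT_PROOF_LEN = 96  # packet_hash (32) || signature (64)
IMPLICIT_PROOF_LEN = 64  # signature (64) only
PACKET_HASH_LEN = 32


def proof_destination(packet_hash: bytes) -> bytes:
    """Synthetic destination an opportunistic DATA proof is addressed to."""
    return packet_hash[:16]


def split_proof_body(body: bytes, expected_packet_hash: bytes) -> bytes:
    """Return the 64-byte signature, dispatching purely on body length."""
    if len(body) == EXPLICIT_PROOF_LEN:
        if body[:PACKET_HASH_LEN] != expected_packet_hash:
            raise ValueError("explicit proof names a different packet")
        return body[PACKET_HASH_LEN:]
    if len(body) == IMPLICIT_PROOF_LEN:
        return body
    raise ValueError(f"unexpected proof body length {len(body)}")
```

Accepting both lengths is what §6.5.5 asks of validators, since the upstream default (implicit) differs from what many non-RNS clients assume.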
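
For the §6.6 signalling item, here is a sketch of the 24-bit packing (3-bit mode in the top of byte 0, 21-bit MTU below it) and of length-only presence detection (64 vs 67 bytes for LINKREQUEST, 96 vs 99 for LRPROOF). Only the bit ranges are checked here; refusing modes outside the enabled set (only `MODE_AES256_CBC = 0x01` in RNS 1.2.0) is left to the caller. The function names are hypothetical.

```python
MTU_MASK = 0x1FFFFF        # low 21 bits carry the MTU
MODE_MAX = 0x07            # top 3 bits of byte 0 carry the mode
SIGNALLING_LEN = 3
LINKREQUEST_BASE_LEN = 64  # X25519 pub (32) || Ed25519 pub (32)
LRPROOF_BASE_LEN = 96


def encode_signalling(mode: int, mtu: int) -> bytes:
    if not 0 <= mode <= MODE_MAX:
        raise ValueError("mode must fit in 3 bits")
    if not 0 <= mtu <= MTU_MASK:
        raise ValueError("mtu must fit in 21 bits")
    return ((mode << 21) | mtu).to_bytes(SIGNALLING_LEN, "big")


def decode_signalling(raw: bytes) -> tuple[int, int]:
    value = int.from_bytes(raw, "big")
    return value >> 21, value & MTU_MASK


def trailing_signalling(body: bytes, base_len: int):
    """Presence is inferred from length alone: base_len or base_len + 3."""
    if len(body) == base_len:
        return None
    if len(body) == base_len + SIGNALLING_LEN:
        return body[base_len:]
    raise ValueError(f"unexpected body length {len(body)}")
```

Because the LRPROOF signature covers these 3 bytes when they are present, a responder must sign exactly the body form it received (after any MTU clamp), neither adding nor dropping the trailer.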
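
For the §7.2 path-discovery item, here is a sketch of the path-request parse and tag dedup. It assumes the length branches work as follows: more than 32 bytes means the transport form `target || transport_id || tag`; 17 to 32 bytes means the leaf form `target || tag`; anything shorter is tagless and rejected. Tags are capped at 16 bytes, and the dedup key is `dest_hash || tag`, bounded at 32000 entries. `PathRequestTagCache` is a hypothetical stand-in for `Transport.discovery_pr_tags`.

```python
from collections import deque

HASH_LEN = 16
MAX_TAG_LEN = 16


def parse_path_request(data: bytes):
    """Return (target, transport_id or None, tag), or None for tagless requests."""
    if len(data) <= HASH_LEN:
        return None  # too short to carry a tag: rejected
    target = data[:HASH_LEN]
    if len(data) > 2 * HASH_LEN:
        transport_id = data[HASH_LEN:2 * HASH_LEN]
        tag = data[2 * HASH_LEN:]
    else:
        transport_id = None
        tag = data[HASH_LEN:]
    return target, transport_id, tag[:MAX_TAG_LEN]


class PathRequestTagCache:
    """Bounded first-sighting filter keyed on dest_hash || tag."""

    def __init__(self, max_entries: int = 32000):
        self._order = deque()
        self._seen = set()
        self._max = max_entries

    def first_sighting(self, target: bytes, tag: bytes) -> bool:
        key = target + tag
        if key in self._seen:
            return False  # retransmit of a request we already handled
        self._seen.add(key)
        self._order.append(key)
        if len(self._order) > self._max:
            self._seen.discard(self._order.popleft())
        return True
```

Dropping repeats before dispatch is what keeps a leaf from answering every retransmitted request and amplifying a broadcast storm.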
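
For the §1.3 item, here is a sketch of the 64-byte on-disk layout, `X25519_priv(32) || Ed25519_priv(32)`, with no header, version, checksum, or encryption. It does not generate or validate keys (the `from_bytes` hazard note still applies), and it leaves file permissions to the caller. All names are hypothetical.

```python
from pathlib import Path

IDENTITY_FILE_LEN = 64  # X25519_priv (32) || Ed25519_priv (32), nothing else


def pack_identity(x25519_priv: bytes, ed25519_priv: bytes) -> bytes:
    if len(x25519_priv) != 32 or len(ed25519_priv) != 32:
        raise ValueError("both private keys must be 32 bytes")
    # Same order as the public_key concatenation: X25519 first.
    return x25519_priv + ed25519_priv


def unpack_identity(blob: bytes) -> tuple[bytes, bytes]:
    if len(blob) != IDENTITY_FILE_LEN:
        raise ValueError("identity file must be exactly 64 bytes")
    return blob[:32], blob[32:]


def write_identity(path, x25519_priv: bytes, ed25519_priv: bytes) -> None:
    Path(path).write_bytes(pack_identity(x25519_priv, ed25519_priv))


def read_identity(path) -> tuple[bytes, bytes]:
    return unpack_identity(Path(path).read_bytes())
```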
### Tier 2 — required for a client to be useful in the wild
- [x] **SPEC.md §5.8: Propagation node protocol.** Done. Offline message retrieval
via store-and-forward propagation nodes. Without this, every
message requires both peers online simultaneously. Authoritative
source: `LXMF/LXMRouter.py::process_propagated`, the
`lxmf.propagation` peering exchange (`peer()` / `sync()` between
nodes — `LXMRouter.py:1892+, 2118+`). The `propagated` method is
already in `LXMessage.py`, but the wire protocol between
propagation nodes was previously undocumented. Cross-flow:
`flows/send-propagated-lxmf.md` (already a `⏳` entry in
`flows/README.md`).
- [x] **SPEC.md §6 expansion: KEEPALIVE / link teardown protocol.**
Done in §6.7 (old §6.7 Source moved to §6.8). Five
sub-sections: KEEPALIVE wire form (`0xFA` context, initiator-
originated `0xFF` ping → responder `0xFE` pong, body
Token-encrypted), cadence (`RTT × 205.7` clamped to `[5,360]s`),
STALE→CLOSED watchdog transitions, LINKCLOSE wire form
(`0xFC` context, body = 16-byte `link_id` Token-encrypted with
`plaintext == link_id` auth check), teardown reason codes
(`TIMEOUT/INITIATOR_CLOSED/DESTINATION_CLOSED`), and the
six-step minimum-receiver-responsibility recipe.
- [x] **SPEC.md §5.x (new): LXMF stamps + tickets for spam control.**
`LXMF.Stamp` (proof-of-work field in the optional 5th element of
the msgpack payload), `FIELD_TICKET` lookup. Modern Sideband 1.x
treats missing-stamp messages as spam in the UI. Spec currently
doesn't mention stamps at all. Authoritative source:
`LXMF/LXMessage.py::validate_stamp`, `LXMF/LXMRouter.py:1741-1774`
(the stamp-check branch in `lxmf_delivery`).
- [x] **SPEC.md §11 (new): REQUEST/RESPONSE protocol (NomadNet pages).** Distinct from
LXMF — pages are fetched over a Link with `context = CTX_REQUEST (0x09)`
/ `CTX_RESPONSE (0x0a)` (already in §2.5 contexts table). The request
body is a msgpack array `[timestamp, path_hash, data]`, where the path
travels as its 16-byte truncated SHA-256 hash rather than as a string;
the response is a msgpack array `[request_id, response_data]`.
Without this, a client can do LXMF chat but can't render NomadNet
content (nodes serving content, telemetry, micron pages).
- [x] **SPEC.md §1.4 (new): GROUP destinations.** Done. Five
sub-sections: key generation (`Token.generate_key()` 64-byte
AES-256 default), wire format (Token form same as Link-derived
`iv || ciphertext || hmac`, no eph_pub prefix because no ECDH),
destination hash recipe with optional identity disambiguation,
on-disk format (raw key bytes, no header/encryption/checksum),
and a why-rarely-used note covering forward-secrecy gaps and
key-distribution being unsolved at the protocol layer.
- [x] **SPEC.md §8.4 (new): RNode KISS configuration handshake.**
Done. Full bring-up sequence: command-byte inventory, the
`CMD_DETECT`/`DETECT_REQ`/`DETECT_RESP` exchange, 4-byte
big-endian encoding for `FREQUENCY`/`BANDWIDTH`, single-byte
payloads for `TXPOWER`/`SF`/`CR`/`RADIO_STATE`, the 12-step
bring-up recipe, and the receive sidecar metadata format
(`RSSI = byte - 157`, `SNR = signed Q6.2 / 4`).
- [x] **SPEC.md §8.5 (new): CSMA / airtime tracking.** Done as a
follow-on to §8.4. Airtime caps via `CMD_ST_ALOCK` /
`CMD_LT_ALOCK` (2-byte big-endian uint16 of `limit_percent ×
100`), `Reticulum.ANNOUNCE_CAP = 2.0` default; pre-TX carrier
sense is firmware-private and not exposed to the host — host
clients don't implement their own LBT, but native-LoRa clients
(e.g. the repeater repo) need the algorithm from
`RNode_Firmware.ino:683-712`.
- [x] **SPEC.md §6.5 second sub-bullet: implicit vs explicit proof
mode.** Done as part of the §6.5 expansion (Tier 1 #3). The
length-dispatch validator at `PacketReceipt.validate_proof`
and the `should_use_implicit_proof()` config switch are
documented in §6.5.1-§6.5.2 with full citations.
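
For the §5.8 propagation item, here is a sketch of the decoded request and response shapes. The `/offer` request carries `[peering_key, [transient_id, ...]]` and is answered with `False` (peer has everything), `True` (peer wants everything), or a list (peer wants that subset). The `/get` request carries `[wanted_ids, have_ids, optional_limit_kb]`. msgpack encoding and the Link REQUEST/RESPONSE transport are left out. Whether the optional limit is omitted or sent as `None` is an assumption of this sketch, and all names are hypothetical.

```python
def build_offer_request(peering_key: bytes, transient_ids) -> list:
    """/offer request data: [peering_key, [transient_id_1, ...]]."""
    return [peering_key, list(transient_ids)]


def wanted_from_offer_response(response, offered_ids) -> list:
    """Interpret the three /offer response shapes, keeping offer order."""
    if response is False:           # peer already has everything
        return []
    if response is True:            # peer wants everything
        return list(offered_ids)
    if isinstance(response, list):  # peer wants a subset
        wanted = set(response)
        return [tid for tid in offered_ids if tid in wanted]
    raise ValueError(f"unexpected /offer response: {response!r}")


def build_get_request(wanted_ids=None, have_ids=None, limit_kb=None) -> list:
    """/get request data: [wanted_ids, have_ids, optional_limit_kb].

    Both lists None = listing query; wanted_ids set = fetch; have_ids set = purge.
    """
    request = [wanted_ids, have_ids]
    if limit_kb is not None:
        request.append(limit_kb)
    return request
```

Checking `response is False` / `is True` explicitly, rather than truthiness, keeps an empty subset list from being confused with "has everything".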
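
For the §6.7 KEEPALIVE / teardown item, here is a sketch of the cadence rule (`RTT × 205.7`, clamped to `[5, 360]` s), the ping/pong exchange (the initiator sends `0xFF`, the responder answers `0xFE`), and the LINKCLOSE authenticity check (decrypted body must equal `link_id`). Token encryption and decryption are omitted, and the function names are hypothetical. The constant-time comparison is a choice of this sketch, not something the spec requires.

```python
import hmac

KEEPALIVE_FACTOR = 205.7  # keepalive seconds per second of RTT
KEEPALIVE_MIN = 5.0
KEEPALIVE_MAX = 360.0
KEEPALIVE_PING = b"\xff"  # initiator -> responder
KEEPALIVE_PONG = b"\xfe"  # responder -> initiator


def keepalive_interval(rtt: float) -> float:
    """RTT x 205.7, clamped to [5, 360] seconds."""
    return max(KEEPALIVE_MIN, min(rtt * KEEPALIVE_FACTOR, KEEPALIVE_MAX))


def keepalive_reply(plaintext: bytes, is_initiator: bool):
    """Decrypted KEEPALIVE body in; reply body out, or None if no reply is due."""
    if not is_initiator and plaintext == KEEPALIVE_PING:
        return KEEPALIVE_PONG
    return None


def linkclose_is_authentic(plaintext: bytes, link_id: bytes) -> bool:
    """Honour a LINKCLOSE only if its decrypted body equals the 16-byte link_id."""
    return hmac.compare_digest(plaintext, link_id)
```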
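
For the §11 REQUEST/RESPONSE item, here is a sketch of the decoded request and response arrays. It assumes the upstream `RNS.Link.request` layout: the request is `[timestamp, path_hash, data]`, with `path_hash` being the 16-byte truncated SHA-256 of the UTF-8 path, and the response is `[request_id, response_data]`. msgpack serialization, Link encryption, and `request_id` derivation are not shown. The example path is illustrative and the function names are hypothetical.

```python
import hashlib
import time

TRUNCATED_LEN = 16


def request_path_hash(path: str) -> bytes:
    """The path travels as a truncated SHA-256 hash, not as a string."""
    return hashlib.sha256(path.encode("utf-8")).digest()[:TRUNCATED_LEN]


def build_request(path: str, data=None, now=None) -> list:
    """Decoded REQUEST body: [timestamp, path_hash, data]."""
    return [time.time() if now is None else now, request_path_hash(path), data]


def build_response(request_id: bytes, response_data) -> list:
    """Decoded RESPONSE body: [request_id, response_data]."""
    return [request_id, response_data]
```

Because only the hash is sent, a server matches incoming requests by precomputing `request_path_hash` for each path it has registered a handler for.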
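
For the §1.4 GROUP destinations item, here is a sketch of a destination-hash recipe with optional identity disambiguation. It assumes `name_hash = SHA-256("app_name.aspect1.aspect2...")[:10]` and `dest_hash = SHA-256(name_hash [|| identity_hash])[:16]`, which is the upstream recipe as we understand it. The app and aspect names in the usage are made up, and the function names are hypothetical.

```python
import hashlib
from typing import Optional

NAME_HASH_LEN = 10
TRUNCATED_LEN = 16


def name_hash(app_name: str, *aspects: str) -> bytes:
    full_name = ".".join((app_name, *aspects))
    return hashlib.sha256(full_name.encode("utf-8")).digest()[:NAME_HASH_LEN]


def destination_hash(app_name: str, *aspects: str,
                     identity_hash: Optional[bytes] = None) -> bytes:
    """SHA-256(name_hash [|| identity_hash])[:16]."""
    material = name_hash(app_name, *aspects)
    if identity_hash is not None:
        # Mixing in an identity hash lets two groups share a name without colliding.
        material += identity_hash
    return hashlib.sha256(material).digest()[:TRUNCATED_LEN]
```

For example, `destination_hash("example", "group")` and `destination_hash("example", "group", identity_hash=some_id)` name two different destinations even though the application name and aspects are identical.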
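
For the §8.4 RNode item, here is a sketch of KISS framing for a configuration command (4-byte big-endian payload for `FREQUENCY`) and of the receive-sidecar decoding (`RSSI = byte - 157`, `SNR = signed byte / 4`). The escape bytes are the standard KISS ones. `CMD_FREQUENCY = 0x01` is an assumed value that should be checked against the SPEC §8.4 command table. The function names are hypothetical.

```python
FEND = 0xC0
FESC = 0xDB
TFEND = 0xDC
TFESC = 0xDD
CMD_FREQUENCY = 0x01  # assumed value; confirm against the §8.4 command table
RSSI_OFFSET = 157


def kiss_escape(data: bytes) -> bytes:
    out = bytearray()
    for b in data:
        if b == FEND:
            out += bytes((FESC, TFEND))
        elif b == FESC:
            out += bytes((FESC, TFESC))
        else:
            out.append(b)
    return bytes(out)


def kiss_frame(cmd: int, payload: bytes = b"") -> bytes:
    """FEND, command byte, escaped payload, FEND."""
    return bytes((FEND, cmd)) + kiss_escape(payload) + bytes((FEND,))


def frequency_command(hz: int) -> bytes:
    return kiss_frame(CMD_FREQUENCY, hz.to_bytes(4, "big"))


def decode_rssi(raw: int) -> int:
    return raw - RSSI_OFFSET


def decode_snr(raw: int) -> float:
    # Interpret the byte as signed, then scale by 1/4 dB.
    return int.from_bytes(bytes((raw,)), "big", signed=True) / 4
```

For instance, `frequency_command(868_000_000)` produces `c0 01 33 bc a1 00 c0`.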
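
For the §8.5 airtime item, here is a sketch of the `CMD_ST_ALOCK` / `CMD_LT_ALOCK` payload: a 2-byte big-endian uint16 of `limit_percent × 100`. Command byte values and KISS framing are not shown. Rounding to the nearest integer, rather than truncating, is a choice of this sketch that avoids float artefacts such as `33.3 * 100 = 3329.999...`. The function names are hypothetical.

```python
def encode_airtime_lock(limit_percent: float) -> bytes:
    """Payload for an airtime-lock command: uint16 BE of percent x 100."""
    if not 0.0 <= limit_percent <= 100.0:
        raise ValueError("limit must be between 0 and 100 percent")
    return round(limit_percent * 100).to_bytes(2, "big")


def decode_airtime_lock(payload: bytes) -> float:
    return int.from_bytes(payload[:2], "big") / 100
```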
### Tier 3 — required to act as a transport node / relay (DONE)
All five Tier 3 items consolidated into SPEC.md §12 "Transport-relay
behaviour" (single section, seven sub-sections) since they share state
(path_table, announce_table, link_table, reverse_table, tunnels,
discovery_path_requests):
- [x] **DATA forwarding rules** — §12.2 covers the three-case branch
on remaining_hops (>1 forward as HEADER_2 with new transport_id;
==1 strip transport_id and forward as HEADER_1 broadcast; ==0
local destination, just bump hops). LINKREQUEST gets an extra
link_table entry and the §6.6 MTU clamp; non-LINKREQUEST DATA
gets a reverse_table entry.
- [x] **ANNOUNCE rebroadcasting** — §12.3 covers the announce_table
retransmit queue, per-interface ANNOUNCE_CAP throttling and
announce_queue, random_blob replay defence with MAX_RANDOM_BLOBS
sliding-window cap, and the PATH_RESPONSE short-circuit.
- [x] **Path table management** — §12.4 covers the entry shape, three
TTL constants by interface mode (AP/ROAMING/default 30 days),
stale-paths eviction in Transport.jobs, and persistence to
storagepath/paths.
- [x] **Tunnels and shared-instance protocol** — §12.6 covers
discovery_path_requests recursive search, the tunnels[] state
that survives interface flap, and the shared-instance wire
protocol (just regular Reticulum packets over a TCP loopback;
what's "shared" is the Transport state, not the wire format).
- [x] **Reverse-table link transport** — §12.5 covers LRPROOF
forwarding via link_table, Link DATA forwarding in both
directions once the link_table entry is validated, and PROOF
receipt forwarding via reverse_table (one-shot pop on use).
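
A minimal sketch of the §12.2 three-case branch, as we read it. The function and dataclass names are hypothetical. `remaining_hops` is assumed to be the hop count stored in the path-table entry for the destination, and `HEADER_1 = 0x00` / `HEADER_2 = 0x01` are assumed to match the RNS `Packet` constants. The sketch only decides what to do; re-encoding the header and bumping `hops` are left out.

```python
from dataclasses import dataclass
from typing import Optional

HEADER_1 = 0x00  # assumed to match RNS Packet.HEADER_1
HEADER_2 = 0x01  # assumed to match RNS Packet.HEADER_2


@dataclass(frozen=True)
class ForwardDecision:
    local: bool                    # True: destination is local, don't retransmit
    header_type: Optional[int]     # header type of the retransmitted packet
    transport_id: Optional[bytes]  # next-hop transport id, or None when stripped
    broadcast: bool                # True: send as a HEADER_1 broadcast


def decide_forward(remaining_hops: int, next_hop: Optional[bytes]) -> ForwardDecision:
    """Hypothetical helper mirroring the >1 / ==1 / ==0 branch on remaining_hops."""
    if remaining_hops > 1:
        # More relays ahead: keep HEADER_2, addressed to the next transport node.
        if next_hop is None:
            raise ValueError("HEADER_2 forwarding needs a next-hop transport id")
        return ForwardDecision(False, HEADER_2, next_hop, False)
    if remaining_hops == 1:
        # Destination is one hop away: strip transport_id, broadcast as HEADER_1.
        return ForwardDecision(False, HEADER_1, None, True)
    if remaining_hops == 0:
        # Destination is local to this node; only the hop counter changes.
        return ForwardDecision(True, None, None, False)
    raise ValueError("remaining_hops must be >= 0")
```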
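
The random_blob replay defence from §12.3 can be pictured as a per-destination sliding window. This sketch uses a hypothetical `RandomBlobWindow` class and takes its default cap from the `MAX_RANDOM_BLOBS` value listed in §16. It is not the RNS data structure: it ignores the timestamp half of the blob, which Python RNS also consults for path replacement (§9.10).

```python
from collections import deque
from typing import Deque, Dict

MAX_RANDOM_BLOBS = 32  # cap as listed in §16


class RandomBlobWindow:
    """Hypothetical per-destination sliding window of recently seen random_blobs."""

    def __init__(self, max_blobs: int = MAX_RANDOM_BLOBS) -> None:
        self._max = max_blobs
        self._windows: Dict[bytes, Deque[bytes]] = {}

    def is_replay(self, destination_hash: bytes, random_blob: bytes) -> bool:
        window = self._windows.get(destination_hash)
        if window is None:
            window = self._windows[destination_hash] = deque(maxlen=self._max)
        if random_blob in window:
            return True  # seen recently: treat the announce as a replay
        # Once maxlen is reached, append() silently drops the oldest blob.
        window.append(random_blob)
        return False
```

The bounded deque is what keeps memory fixed per destination; the cost is that a blob older than the window is no longer recognised as a replay.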
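
A sketch of the one-shot reverse_table behaviour from §12.5, under stated assumptions: entries are keyed by the forwarded packet's hash and hold (received-on interface, forwarded-on interface, timestamp); a PROOF is relayed only if it arrives on the interface the original packet was forwarded on; and the entry is popped whether or not that check passes. The class is hypothetical, interfaces are plain strings, and `timeout_s` stands in for `REVERSE_TIMEOUT` without assuming its value.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class ReverseTable:
    """Hypothetical reverse_table: packet hash -> (received_on, forwarded_on, timestamp)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_s
        self._clock = clock
        self._entries: Dict[bytes, Tuple[str, str, float]] = {}

    def record(self, packet_hash: bytes, received_on: str, forwarded_on: str) -> None:
        """Called when a non-LINKREQUEST packet is forwarded."""
        self._entries[packet_hash] = (received_on, forwarded_on, self._clock())

    def route_proof(self, proved_hash: bytes, arrived_on: str) -> Optional[str]:
        """Return the interface to relay a PROOF on, or None to drop it."""
        entry = self._entries.pop(proved_hash, None)  # one-shot: consumed on first use
        if entry is None:
            return None
        received_on, forwarded_on, _ = entry
        # Only relay a PROOF that came back the way the original packet left.
        return received_on if arrived_on == forwarded_on else None

    def cull(self) -> int:
        """Drop entries older than the timeout; returns how many were removed."""
        now = self._clock()
        stale = [h for h, (_, _, t) in self._entries.items() if now - t > self._timeout]
        for h in stale:
            del self._entries[h]
        return len(stale)
```

Injecting the clock keeps the timeout logic testable without sleeping.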
## Developer-experience gaps (would save real implementers real time)
The following aren't strictly wire-format issues — they're things that
bite anyone building a clean-room client. Listed in rough priority
order: top three save the most debugging hours.
- [x] **§13 (new): Threading / concurrency model.** Done. Five
sub-sections covering long-running threads (jobloop,
count_traffic, per-link watchdog, per-resource watchdog,
per-interface RX, per-handler dispatch), a full lock inventory
table, callback-thread guarantees with race notes, and
implementation-private timing constants. Reticulum is heavily
threaded: a periodic `Transport.jobs` loop, per-Link watchdog
daemon threads, per-Resource transfer threads, and announce-handler
callbacks that fire on fresh daemon threads, with shared state
guarded by named locks (`Transport.path_table_lock`,
`Transport.announce_table_lock`, `Identity.known_destinations_lock`,
etc.). A client built single-threaded mostly works for
opportunistic LXMF but breaks on Resource transfers and Link
keepalives; this is the #1 cause of "my client compiles and almost
works but is flaky." The section is a roundup of which loop runs
when, which thread fires each callback, and which locks must be
held to mutate which state.
- [x] **§14 (new): Failure-mode → root-cause cheatsheet.** Done.
Eight tables (Identity/announce, Token crypto / opportunistic
LXMF, Link establishment / proofs, Resource transfers, Path
discovery, Transport / framing, LXMF specifics, Concurrency)
keyed by symptom, each pointing at the root-cause section and the
relevant verifier. Closes with the §9.9 "rx-log every inbound
packet" diagnostic. §9 lists gotchas by cause; §14 is the inverse
index, organised by symptom. Worked examples:
- "messages send but no PROOF returns" → §6.5
implicit/explicit length mismatch
- "links establish then disconnect within a minute" →
§6.7 KEEPALIVE not implemented or wrong sentinel byte
- "first contact works but every subsequent send fails" →
§7.5 periodic re-announce missing
- "Sideband announces validate but mine don't" →
§4.1 random_hash timestamp not encoded (§9.10)
- "everything works on TCP but breaks on RNode" →
§8.4 KISS handshake or §8.3 split-packet protocol bug
High value because debugging Reticulum is a known multi-hour
exercise; a symptom-keyed lookup shortcuts diagnosis to seconds.
- [x] **§15 (new): Time / clock requirements roundup.** Done.
Seven sub-sections covering the three clock kinds (wall time,
boot-relative monotonic, hi-res monotonic), what's required
vs recommended vs optional, the no-RTC strategy for
`random_hash` timestamps (boot-relative is fine; random
bytes are the §9.10 bug), wall-time-only LXMF features
(e.g. ticket expiry, where monotonic time can't substitute), and
an explicit what-fails / what-works inventory for clockless
devices with their interop consequences. This material was
previously scattered across §4.1 (random_hash timestamp), §9.6
(clockless LXMF senders), §5.7 (ticket expiry), §6.7 (RTT-driven
keepalive) and §7.5 (re-announce cadence). A no-RTC device (Faketec, RAK4631
stock, Heltec_T114) needs a clear "what fails / what works /
how to substitute monotonic-seconds" roundup so embedded
implementers don't have to hunt for the constraints.
- [x] **§6.8 (new): Channel mode (`CHANNEL = 0x0E` context).** Done.
Six sub-sections: wire form (6-byte BE header msgtype+sequence+
length followed by payload, Token-encrypted by link session
key), reserved SystemMessageTypes (`SMT_STREAM_DATA = 0xff00`),
MSGTYPE registration via `Channel.register_message_type`,
reliable delivery via the standard §6.5 PROOF mechanism plus
a sliding window, when-to-use-Channel-vs-Resource-vs-REQUEST
decision matrix. Old §6.8 Source moved to §6.9.
Channel is a multiplexed application-data channel that runs over an
established Link, distinct from DATA/REQUEST/RESPONSE.
`RNS/Channel.py` is the reference. NomadNet uses it for the
"channel" API beyond simple page fetches. Previously it had only a
one-line entry in §2.5, short of the body format and lifecycle that
a §6.x sub-section now provides.
- [x] **§8.6 (new): AutoInterface multicast discovery.** Done.
Seven sub-sections: IPv6 multicast group derivation from
`SHA256(group_id)` with scope/address-type bits, default
UDP ports (29716 discovery / 29717 unicast probe / 42671 data),
discovery cadence constants, discovery announce body format
(msgpack with group_hash + MTU + optional IFAC seal),
post-discovery data flow as plain unicast UDP on the data
port carrying full Reticulum packets, IFAC integration,
source map. HW_MTU = 1196 (Ethernet-MTU-friendly). This matters
because AutoInterface uses UDP multicast on a known group/port for
LAN auto-detection of peers, with a specific multicast group, ports,
magic bytes and beacon cadence; `RNS/Interfaces/AutoInterface.py`
is the reference. Needed for any client that wants to participate
in auto-discovered LAN meshes (the "share_instance" deployment
pattern with multiple physical hosts).
- [x] **§16 (new): Bounded-state inventory.** Done. Eight sub-section
tables covering per-node Transport state, per-interface state,
per-destination, per-Link, per-Resource, identity caches,
LXMF-level, Channel state — every memory-bounded structure across
the protocol with its cap and pointer to the explanatory section.
Closes with explicit guidance for embedded targets (~64KB-RAM
class) on what to bound, what to reject, and what to skip
(transport-mode operation). The original ask was a single table of
every memory-bounded structure across the protocol with its cap:
`MAX_RANDOM_BLOBS = 32`, `Transport.max_pr_tags = 32000`,
`Interface.MAX_HELD_ANNOUNCES = 256`, `Destination.RATCHET_COUNT
= 512`, `Identity.known_destinations` (unbounded, which is the
gotcha itself), `Transport.MAX_HASHLIST_LENGTH`, `Resource.WINDOW_MAX_FAST
= 75`, `LXMRouter.propagation_entries` (operator-bounded), etc.
This matters most on embedded targets, where heap is finite. The
caps were previously implicit in §4.5 / §7.x / §10 / §12; the
appendix tables now serve as a quick reference card.
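
Two of the §13 callback-thread implications, sketched with hypothetical names and only the Python standard library: keep the receive thread non-blocking by queueing work to a worker thread, and make the link-closed path idempotent because it can be reached from both the watchdog and a peer LINKCLOSE. Neither class is part of the RNS API.

```python
import queue
import threading
from typing import Callable


class LinkCloseGuard:
    """Runs a link-closed callback at most once, whichever thread gets there first."""

    def __init__(self, on_closed: Callable[[str], None]) -> None:
        self._lock = threading.Lock()
        self._closed = False
        self._on_closed = on_closed

    def close(self, reason: str) -> bool:
        with self._lock:
            if self._closed:
                return False  # second path (watchdog or LINKCLOSE) is a no-op
            self._closed = True
        # Call outside the lock so a slow or re-entrant callback can't deadlock us.
        self._on_closed(reason)
        return True


class InboundWorker:
    """The receive thread enqueues and returns; a worker thread does the slow work."""

    def __init__(self, handler: Callable[[bytes], None]) -> None:
        self._q: "queue.Queue[object]" = queue.Queue()
        self._handler = handler
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_packet(self, packet: bytes) -> None:
        self._q.put(packet)  # never block the receive thread

    def _run(self) -> None:
        while True:
            item = self._q.get()
            if item is None:  # sentinel from stop()
                break
            self._handler(item)

    def stop(self) -> None:
        self._q.put(None)
        self._thread.join()
```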
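
For the §15 no-RTC strategy, here is a sketch of `random_hash` in the layout described in the §4.1 callout: 5 random bytes followed by a 5-byte big-endian uint40 of seconds. The function names are hypothetical. `time.monotonic()` stands in for an embedded `millis()/1000` counter, which only preserves ordering within one boot.

```python
import os
import time


def make_random_hash(emission_seconds: int) -> bytes:
    """5 random bytes followed by a 5-byte big-endian uint40 emission time."""
    if not 0 <= emission_seconds < (1 << 40):
        raise ValueError("emission time must fit in an unsigned 40-bit integer")
    return os.urandom(5) + emission_seconds.to_bytes(5, "big")


def emission_time(random_hash: bytes) -> int:
    """The timestamp half that Python RNS receivers read when ordering announces."""
    return int.from_bytes(random_hash[5:10], "big")


# Device with an RTC or NTP: wall-clock seconds.
wall_clock_hash = make_random_hash(int(time.time()))
# Clockless device: boot-relative seconds (stand-in for millis() / 1000).
boot_relative_hash = make_random_hash(int(time.monotonic()))
```

Either clock yields a small, plausible timestamp; filling all ten bytes from the RNG is what produces the far-future values behind the §9.10 bug.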
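
A sketch of the §6.8 Channel envelope: a 6-byte big-endian header (msgtype, sequence, length) followed by the payload, before the link's Token encryption is applied. The helper names are hypothetical, and wrapping the sequence modulo 2^16 is an assumption of this sketch.

```python
import struct
from typing import Tuple

SMT_STREAM_DATA = 0xFF00  # reserved SystemMessageType from §6.8
_HEADER = struct.Struct(">HHH")  # msgtype, sequence, length: 6 bytes, big-endian


def pack_envelope(msgtype: int, sequence: int, payload: bytes) -> bytes:
    if len(payload) > 0xFFFF:
        raise ValueError("payload length must fit in the 16-bit length field")
    # Sequence wraps modulo 2**16 (assumption).
    return _HEADER.pack(msgtype, sequence & 0xFFFF, len(payload)) + payload


def unpack_envelope(raw: bytes) -> Tuple[int, int, bytes]:
    if len(raw) < _HEADER.size:
        raise ValueError("truncated envelope header")
    msgtype, sequence, length = _HEADER.unpack_from(raw)
    payload = raw[_HEADER.size:_HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated envelope payload")
    return msgtype, sequence, payload
```

For example, `pack_envelope(0x0101, 7, b"hello")` starts with the header bytes `01 01 00 07 00 05`.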
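
The §8.6 multicast-group derivation can be sketched as follows. This is our reading of the reference, not normative. It assumes a default group_id of `reticulum`, a link-scope nibble of `2`, a temporary address-type nibble of `1`, a second address group fixed at zero, and digest bytes 2 to 13 of `SHA256(group_id)` packed into the remaining six 16-bit groups. Check it against `RNS/Interfaces/AutoInterface.py` before relying on it.

```python
import hashlib

# Assumed nibble values (our reading of AutoInterface.py, not normative).
SCOPE_LINK = "2"
MULTICAST_TEMPORARY = "1"


def discovery_address(group_id: bytes = b"reticulum",
                      scope: str = SCOPE_LINK,
                      address_type: str = MULTICAST_TEMPORARY) -> str:
    """Derive an IPv6 multicast discovery group from SHA256(group_id)."""
    g = hashlib.sha256(group_id).digest()
    groups = ["0"]  # second group fixed at zero in this sketch
    # Digest bytes 2..13 become six 16-bit groups (printed with at least 2 hex digits).
    for i in range(2, 14, 2):
        groups.append("{:02x}".format((g[i] << 8) + g[i + 1]))
    return "ff" + address_type + scope + ":" + ":".join(groups)
```

Because the address is a pure function of group_id, every host configured with the same group joins the same multicast group without any coordination.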
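
`Identity.known_destinations` is listed as unbounded. On an embedded target, one option is an LRU cap, sketched here with a hypothetical class; the capacity is an operator choice, not a protocol constant. The trade-off is that an evicted destination's public key is unknown again until that destination is heard announcing, so its signed traffic can't be validated in the meantime.

```python
from collections import OrderedDict
from typing import Optional


class BoundedIdentityCache:
    """Hypothetical LRU-capped stand-in for an unbounded known_destinations map."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._cap = capacity
        self._entries: "OrderedDict[bytes, bytes]" = OrderedDict()

    def remember(self, destination_hash: bytes, public_key: bytes) -> Optional[bytes]:
        """Store a key; returns the evicted destination hash, if any."""
        if destination_hash in self._entries:
            self._entries.move_to_end(destination_hash)
        self._entries[destination_hash] = public_key
        if len(self._entries) > self._cap:
            evicted, _ = self._entries.popitem(last=False)  # least recently used
            return evicted
        return None

    def recall(self, destination_hash: bytes) -> Optional[bytes]:
        key = self._entries.get(destination_hash)
        if key is not None:
            self._entries.move_to_end(destination_hash)  # mark as recently used
        return key
```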
## Spec polishing (lower priority)
- [x] **Navigation polish for `SPEC.md`** — at ~3300 lines, splitting
into per-layer files would have broken ~37 cross-references
(flow docs, verifier docstrings, agent.md, README) for
relatively little reader benefit. Picked the lighter polish
instead: a collapsible Table of Contents at the top of the
doc with anchor links to every H2 + H3, plus a `<details>`
wrap on §11.6 (NomadNet specifics — informational/non-normative,
and the longest H3 sub-tree in the document). Helper script at
`tools/_gen_toc.py` regenerates the ToC if headings change.
- [x] **Add a "last-verified-against-rns" line** to SPEC.md
frontmatter (per `agent.md` §7). Done — `RNS 1.2.0 / LXMF
0.9.6` is now in the document header.
- [x] **`tools/verify_stamps.py`** runtime-locks §5.7. Done.
Verifies workblock determinism (confirms exactly 768,000 bytes at
3000 rounds), PoW search-and-validate at target_cost=4 (fast),
`LXMessage.validate_stamp` end-to-end accepts/rejects PoW
stamps, and the ticket shortcut path:
`SHA256(ticket || message_id)` is accepted with a matching
ticket and rejected with a wrong one.
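
`tools/_gen_toc.py` itself isn't reproduced here. As a hypothetical approximation of the GitHub-style heading anchors a ToC like this links to: lowercase the heading, drop punctuation other than hyphens and spaces, turn spaces into hyphens, and append `-1`, `-2`, ... to duplicates.

```python
import re
from typing import Dict


def heading_anchor(heading: str, seen: Dict[str, int]) -> str:
    """Approximate a GitHub-style heading anchor; `seen` tracks duplicate slugs."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation except hyphens and spaces
    slug = slug.replace(" ", "-")
    n = seen.get(slug, 0)
    seen[slug] = n + 1
    return slug if n == 0 else f"{slug}-{n}"
```

For example, `§12.4 Path table management` maps to `124-path-table-management`, since `§` and `.` are dropped.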
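
The PoW search-and-validate step can be sketched as below. It assumes the LXMF acceptance rule is that SHA256(workblock || stamp), read as a big-endian integer, must not exceed 2^(256 - target_cost), and that stamps are 32 bytes. A toy workblock stands in for the real 3000-round one, and the counter-based search (rather than random candidates) is chosen here only to keep the example reproducible.

```python
import hashlib
from itertools import count

STAMP_SIZE = 32  # assumed stamp length in bytes


def stamp_valid(workblock: bytes, stamp: bytes, target_cost: int) -> bool:
    """Accept if SHA256(workblock || stamp), big-endian, is <= 2**(256 - target_cost)."""
    target = 1 << (256 - target_cost)
    digest = hashlib.sha256(workblock + stamp).digest()
    return int.from_bytes(digest, "big") <= target


def find_stamp(workblock: bytes, target_cost: int) -> bytes:
    """Deterministic counter search; expected ~2**target_cost attempts."""
    for n in count():
        candidate = n.to_bytes(STAMP_SIZE, "big")
        if stamp_valid(workblock, candidate, target_cost):
            return candidate
    raise AssertionError("unreachable: count() never terminates")
```

At `target_cost=4` roughly one candidate in 16 passes, which is why the verifier can run the search quickly.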