Add §6.7 KEEPALIVE / link teardown (Tier 2 #1+#2)

Documents the link control plane that's required for any client
that wants links to survive idle periods. Five sub-sections:

  §6.7.1  KEEPALIVE wire form: context = 0xFA, initiator-originated
          0xFF ping body → responder 0xFE pong reply body, both
          sent unencrypted (Packet.pack() exempts the KEEPALIVE
          context from Token encryption). Cadence formula
          RTT × (KEEPALIVE_MAX/KEEPALIVE_MAX_RTT) = RTT × 205.7,
          clamped to [5s, 360s]. Initial value is 360s before RTT
          is measured by validate_proof.

  §6.7.2  STALE → CLOSED transition. Watchdog moves link to STALE
          when last_inbound + 2*keepalive elapses, then on next
          watchdog pass emits LINKCLOSE and goes to CLOSED.
          teardown_reason = TIMEOUT.

  §6.7.3  LINKCLOSE wire form: context = 0xFC, body = 16-byte
          link_id Token-encrypted. Receiver MUST verify
          plaintext == link_id before accepting the close. After
          accepting, link.shared_key/derived_key zeroed for forward
          secrecy.

  §6.7.4  Teardown reason codes: TIMEOUT(0x01), INITIATOR_CLOSED
          (0x02), DESTINATION_CLOSED(0x03). Local-state values, not
          on the wire.

  §6.7.5  Six-step minimum-receiver-responsibility recipe.

Also marks Tier 2 implicit/explicit proof item done — already
covered as part of §6.5's Tier 1 #3 expansion.

Old §6.7 "Source" renumbered to §6.8.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rob 2026-05-03 11:59:13 -04:00
commit 22ee7636ef
2 changed files with 137 additions and 12 deletions

SPEC.md

@@ -721,7 +721,128 @@ The `[reticulum]` config option `link_mtu_discovery = No` makes `Reticulum.link_
A receiver doesn't need its own copy of the disable switch — it just stops seeing trailing signalling bytes from peers that have it disabled. Its own MTU reporting on the LRPROOF return path runs unaffected for peers that send it.
### 6.7 Source
### 6.7 KEEPALIVE and link teardown
A Link goes through five states (`RNS/Link.py:110-114`): `PENDING → HANDSHAKE → ACTIVE → STALE → CLOSED`. `KEEPALIVE` and `LINKCLOSE` are the two control-plane packet contexts (both carried in `DATA` packets) that drive transitions out of `ACTIVE`.
#### 6.7.1 KEEPALIVE (`context = 0xFA`)
Cadence (`RNS/Link.py:844-846`):
```python
def __update_keepalive(self):
    # Keepalive interval scales with RTT, clamped to [KEEPALIVE_MIN, KEEPALIVE_MAX].
    self.keepalive = max(min(self.rtt * (KEEPALIVE_MAX / KEEPALIVE_MAX_RTT), KEEPALIVE_MAX), KEEPALIVE_MIN)
    self.stale_time = self.keepalive * STALE_FACTOR
```
with constants `KEEPALIVE_MAX = 360s`, `KEEPALIVE_MIN = 5s`, `KEEPALIVE_MAX_RTT = 1.75s`, `STALE_FACTOR = 2`. The interval is `RTT × 205.7` clamped to `[5, 360]` seconds. Before the first RTT is measured (set in `validate_proof`), the link uses `KEEPALIVE = KEEPALIVE_MAX = 360s`.
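As a worked check of the clamp, here is the same arithmetic as a standalone function. It is a sketch: `keepalive_for_rtt` is a hypothetical name, RTT is assumed to be in seconds, and the constants are the upstream values listed above.

```python
# Constants as quoted from RNS/Link.py (seconds).
KEEPALIVE_MAX = 360.0
KEEPALIVE_MIN = 5.0
KEEPALIVE_MAX_RTT = 1.75
STALE_FACTOR = 2


def keepalive_for_rtt(rtt: float) -> tuple[float, float]:
    """Return (keepalive, stale_time) for a measured RTT, clamped to [MIN, MAX]."""
    keepalive = max(min(rtt * (KEEPALIVE_MAX / KEEPALIVE_MAX_RTT), KEEPALIVE_MAX), KEEPALIVE_MIN)
    return keepalive, keepalive * STALE_FACTOR


for rtt in (0.01, 0.5, 3.0):
    keepalive, stale = keepalive_for_rtt(rtt)
    print(f"rtt={rtt}s keepalive={keepalive:.2f}s stale_time={stale:.2f}s")
```

With RTT = 0.5 s this gives a keepalive of about 102.9 s and a `stale_time` of about 205.7 s. An RTT below roughly 0.024 s hits the 5 s floor, and an RTT above 1.75 s is pinned at 360 s.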
The watchdog (`Link.__watchdog_job`, lines 751-821) runs for every active link. When `now >= last_inbound + keepalive` AND the local node is the **initiator**, it emits a KEEPALIVE:
```python
def send_keepalive(self):
    # Ping: a single 0xFF byte in a KEEPALIVE-context packet on this link.
    keepalive_packet = RNS.Packet(self, bytes([0xFF]), context=RNS.Packet.KEEPALIVE)
    keepalive_packet.send()
```
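Body is a single byte `0xFF`, the "ping" sentinel. Unlike ordinary link DATA, KEEPALIVE packets are not Token-encrypted: `Packet.pack()` (`RNS/Packet.py`) special-cases the KEEPALIVE context and puts the data on the wire unchanged (upstream comment: "Keepalive packets contain no actual data"). The wire body is therefore just `b'\xff'`, which is why the responder below can compare `packet.data` against `bytes([0xFF])` without decrypting it.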
The **responder** receives this in `Link.receive` at `RNS/Link.py:1149-1153` and answers with the "pong" sentinel:
```python
elif packet.context == RNS.Packet.KEEPALIVE:
    # Only the responder answers, and only a ping (0xFF) gets a pong (0xFE).
    if not self.initiator and packet.data == bytes([0xFF]):
        keepalive_packet = RNS.Packet(self, bytes([0xFE]), context=RNS.Packet.KEEPALIVE)
        keepalive_packet.send()
```
So:
- **Ping** = initiator → responder, body `0xFF`.
- **Pong** = responder → initiator, body `0xFE`.
- Only the initiator originates KEEPALIVE traffic. The responder never spontaneously pings.
The sentinel values carry no meaning beyond telling ping from pong. What keeps the link alive is that *any* inbound packet on the link refreshes `last_inbound`, the watchdog's anchor for staleness decisions. The ping refreshes `last_inbound` on the responder and the pong refreshes it on the initiator, so one successful exchange resets the staleness clock on **both** sides. No §6.5 PROOF is involved: the KEEPALIVE branch of `Link.receive` quoted above never calls `packet.prove()`, so neither the ping nor the pong is acknowledged with a receipt.
A clean-room responder MUST emit the pong on inbound `0xFF`. Without it, an otherwise idle link goes STALE on the initiator as soon as `stale_time` (`2 × keepalive`) passes with no inbound traffic.
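The ping/pong rules reduce to a small dispatch function. This is a sketch, not upstream code: `on_keepalive` and its arguments are hypothetical names, and `body` is compared the same way the quoted upstream branch compares `packet.data`.

```python
from typing import Optional

PING = bytes([0xFF])  # initiator -> responder
PONG = bytes([0xFE])  # responder -> initiator


def on_keepalive(is_initiator: bool, body: bytes) -> Optional[bytes]:
    """Return the KEEPALIVE body to send in reply, or None if no reply is due.

    The caller is assumed to have already refreshed last_inbound for this
    packet, since any inbound packet on the link counts as liveness.
    """
    if not is_initiator and body == PING:
        return PONG  # responders answer pings
    return None      # initiators never answer; pongs and unknown bodies need no reply
```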
#### 6.7.2 STALE → CLOSED transition
When `now >= last_inbound + stale_time` (= `2 × keepalive`), the watchdog moves the link from `ACTIVE` to `STALE` (lines 796-800), then on its next pass emits a teardown packet and transitions to `CLOSED` (lines 805-810):
```python
elif self.status == Link.STALE:
    sleep_time = 0.001
    self.__teardown_packet()  # see §6.7.3
    self.status = Link.CLOSED
    self.teardown_reason = Link.TIMEOUT
    self.link_closed()
```
`teardown_reason` is set to `Link.TIMEOUT` (constant value `0x01`) so the application's `link_closed_callback` can distinguish "the peer went dark" from "the peer cleanly closed".
When it marks the link STALE (line 797), the watchdog also computes a grace period of `RTT × KEEPALIVE_TIMEOUT_FACTOR + STALE_GRACE` (= `RTT × 4 + 5s`), intended to let a delayed reply bring the link back to ACTIVE before final teardown. In upstream RNS 1.2.0, however, the `STALE → CLOSED` transition runs on the next watchdog pass without consulting that grace period: the STALE branch shown above tears the link down unconditionally. The grace constant remains in case a future revision restores the soft-stale window.
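The ACTIVE → STALE → CLOSED logic above can be modelled as a pure function of the current time. This is a sketch under stated assumptions: `watchdog_pass`, `Status`, and the action strings are hypothetical, the ordering of the ping check before the stale check is an assumption, and upstream's sleep scheduling and grace period are ignored. It simply closes on the pass after STALE, as RNS 1.2.0 is described as doing.

```python
from enum import Enum

STALE_FACTOR = 2  # stale_time = keepalive * STALE_FACTOR


class Status(Enum):
    ACTIVE = "ACTIVE"
    STALE = "STALE"
    CLOSED = "CLOSED"


def watchdog_pass(status: Status, now: float, last_inbound: float,
                  keepalive: float, is_initiator: bool) -> tuple[Status, list[str]]:
    """Run one watchdog pass; return the new status and the actions to perform."""
    actions: list[str] = []
    if status is Status.ACTIVE:
        if is_initiator and now >= last_inbound + keepalive:
            actions.append("send_keepalive")  # only the initiator pings
        if now >= last_inbound + keepalive * STALE_FACTOR:
            return Status.STALE, actions      # no inbound traffic for stale_time
        return Status.ACTIVE, actions
    if status is Status.STALE:
        # Close unconditionally on the pass after STALE.
        return Status.CLOSED, ["send_linkclose", "link_closed(TIMEOUT)"]
    return status, actions                    # CLOSED is terminal
```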
#### 6.7.3 LINKCLOSE (`context = 0xFC`)
Either side can cleanly tear down a link by calling `Link.teardown()` (lines 699-708), which sends a single LINKCLOSE packet and transitions the local state to `CLOSED`. The LINKCLOSE packet is built as in `__teardown_packet`:
```python
def __teardown_packet(self):
    # Body is the 16-byte link_id, Token-encrypted like other link DATA.
    teardown_packet = RNS.Packet(self, self.link_id, context=RNS.Packet.LINKCLOSE)
    teardown_packet.send()
```
Wire form:
- `packet_type = DATA (0)`, `context = 0xFC`, `dest_hash = link_id`.
- Body is the **16-byte link_id**, Token-encrypted by the link's session key.
The peer's receiver path at `RNS/Link.py:1061-1063` calls `teardown_packet(packet)` (lines 710-722):
```python
def teardown_packet(self, packet):
    plaintext = self.decrypt(packet.data)
    if plaintext == self.link_id:  # auth check
        self.status = Link.CLOSED
        if self.initiator:
            self.teardown_reason = Link.DESTINATION_CLOSED
        else:
            self.teardown_reason = Link.INITIATOR_CLOSED
        self.link_closed()
```
The body's plaintext **MUST** equal `link_id` for the close to take effect; this is the on-link auth check. A third party without the session key cannot produce a body that passes the Token HMAC and decrypts at all, and the `link_id` comparison additionally rejects a correctly encrypted body that carries anything else. Together, the Token layer and the `link_id` check make the teardown signal both encrypted and authenticated.
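The receiver-side check can be sketched as a standalone predicate. `accept_linkclose` is a hypothetical name, and `decrypt` is a stand-in for the link's Token decryption, assumed to return `None` when the HMAC check fails; no real Token implementation is shown.

```python
import hmac
from typing import Callable, Optional


def accept_linkclose(body: bytes, link_id: bytes,
                     decrypt: Callable[[bytes], Optional[bytes]]) -> bool:
    """Return True only if the LINKCLOSE body authenticates and carries link_id."""
    plaintext = decrypt(body)
    if plaintext is None:
        return False  # wrong key or tampered packet
    return hmac.compare_digest(plaintext, link_id)  # constant-time comparison
```

Using `hmac.compare_digest` instead of `==` is a defensive choice of this sketch; the upstream code quoted above compares with `==`.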
When `link_closed()` (lines 724-743) runs:
- All `incoming_resources` and `outgoing_resources` are cancelled, and the cancellations propagate into the §10 Resource state machine.
- The Link's session keys (`self.shared_key`, `self.derived_key`) are purged by reassigning them to `None`. The upstream comment at lines 700-702 describes this as the forward-secrecy property: "encryption keys are purged. New keys will be used if a new link to the same destination is established." Reassignment only drops the Python references; it does not overwrite the old key bytes in memory.
- The `link_closed_callback` registered via `set_link_closed_callback` fires.
- The Link is removed from its destination's `links` list (responders only — initiators don't have a destination-list entry).
#### 6.7.4 Teardown reason codes
`Link.teardown_reason` is set to one of (`RNS/Link.py:116-118`):
| Constant | Hex | Meaning |
|---|---|---|
| `TIMEOUT` | `0x01` | Watchdog STALE → CLOSED transition. No LINKCLOSE was received. |
| `INITIATOR_CLOSED` | `0x02` | This side is the responder; the initiator sent a LINKCLOSE. |
| `DESTINATION_CLOSED` | `0x03` | This side is the initiator; the responder sent a LINKCLOSE. |
These are local-state values, not on the wire: the LINKCLOSE packet itself carries no reason code. A receiver derives the reason from its own role, as in `teardown_packet` above: an initiator that accepts a LINKCLOSE records `DESTINATION_CLOSED`, and a responder records `INITIATOR_CLOSED`.
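For a clean-room port, the three codes and the receive-side inference map naturally onto an `IntEnum`. `TeardownReason` and `reason_for_received_close` are hypothetical names; the values are the upstream ones from the table.

```python
from enum import IntEnum


class TeardownReason(IntEnum):
    TIMEOUT = 0x01             # watchdog STALE -> CLOSED, no LINKCLOSE received
    INITIATOR_CLOSED = 0x02
    DESTINATION_CLOSED = 0x03


def reason_for_received_close(is_initiator: bool) -> TeardownReason:
    """Reason recorded when an authenticated LINKCLOSE arrives from the peer."""
    if is_initiator:
        return TeardownReason.DESTINATION_CLOSED  # the responder closed
    return TeardownReason.INITIATOR_CLOSED        # the initiator closed
```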
#### 6.7.5 Receiver responsibilities (minimum)
For a clean-room implementation that wants links to survive idle periods longer than a few seconds:
1. Keep a per-link `last_inbound` timestamp updated on every inbound packet on the link (DATA, PROOF, KEEPALIVE — anything).
2. On the **initiator** side, run a watchdog that emits a `0xFF` KEEPALIVE whenever `link.keepalive` seconds have passed since `last_inbound`. The default `link.keepalive = 360s` is fine until you measure RTT.
3. On the **responder** side, reply to every `0xFF` KEEPALIVE with a `0xFE` KEEPALIVE. Don't originate.
4. On both sides, transition to `CLOSED` if `last_inbound + 2*keepalive` elapses with no traffic, AND emit a `LINKCLOSE` packet so the peer doesn't have to wait for its own watchdog to time out.
5. On every inbound `LINKCLOSE`, decrypt, verify body equals `link_id`, transition to `CLOSED`.
6. On `CLOSED`, zero the session keys and cancel any in-progress Resources.
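The six steps fit in one small per-link object. This is a hedged sketch, not upstream code: `MinimalLink`, the injected `send`/`decrypt`/`clock` callables, and the `resources` cancel hooks are hypothetical. It pings at most once per keepalive interval (an assumption), closes directly once `2 × keepalive` passes (step 4) instead of modelling a separate STALE pass, and for step 6 overwrites a `bytearray` key in place, whereas upstream reassigns its key attributes to `None`.

```python
import hmac
import time
from typing import Callable, Optional

KEEPALIVE_CTX = 0xFA
LINKCLOSE_CTX = 0xFC
PING, PONG = b"\xff", b"\xfe"


class MinimalLink:
    """Hypothetical per-link state covering the six steps above."""

    def __init__(self, link_id: bytes, is_initiator: bool,
                 send: Callable[[int, bytes], None],
                 decrypt: Callable[[bytes], Optional[bytes]],
                 session_key: bytearray,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.link_id = link_id
        self.is_initiator = is_initiator
        self.send = send            # assumed to apply the link's per-context encryption
        self.decrypt = decrypt      # assumed to return None when the Token HMAC fails
        self.session_key = session_key
        self.clock = clock
        self.keepalive = 360.0      # KEEPALIVE_MAX until an RTT is measured
        self.last_inbound = clock()
        self.last_ping = float("-inf")
        self.resources: list[Callable[[], None]] = []  # cancel hooks for in-flight transfers
        self.closed = False

    def on_packet(self, context: int, body: bytes) -> None:
        if self.closed:
            return
        self.last_inbound = self.clock()          # step 1: any inbound packet counts
        if context == KEEPALIVE_CTX:
            if not self.is_initiator and body == PING:
                self.send(KEEPALIVE_CTX, PONG)    # step 3: answer, never originate
        elif context == LINKCLOSE_CTX:
            plaintext = self.decrypt(body)        # step 5: authenticate before closing
            if plaintext is not None and hmac.compare_digest(plaintext, self.link_id):
                self._close()

    def watchdog(self) -> None:
        """Call periodically from a timer (steps 2 and 4)."""
        if self.closed:
            return
        now = self.clock()
        idle = now - self.last_inbound
        if idle >= 2 * self.keepalive:
            self.send(LINKCLOSE_CTX, self.link_id)  # spare the peer its own timeout
            self._close()
        elif (self.is_initiator and idle >= self.keepalive
              and now - self.last_ping >= self.keepalive):
            self.last_ping = now
            self.send(KEEPALIVE_CTX, PING)

    def _close(self) -> None:
        self.closed = True
        self.session_key[:] = bytes(len(self.session_key))  # step 6: overwrite in place
        for cancel in self.resources:
            cancel()
        self.resources.clear()
```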
### 6.8 Source
`RNS/Link.py`, `RNS/Packet.py::prove`, `RNS/Identity.py::prove`, `RNS/PacketReceipt.py::validate_proof`. The webclient's `reference/js-reference/link.js` is a faithful port.

todo.md

@@ -236,10 +236,16 @@ re-research.
propagation nodes is undocumented. Cross-flow:
`flows/send-propagated-lxmf.md` (already a `⏳` entry in
`flows/README.md`).
- [ ] **SPEC.md §6 expansion: KEEPALIVE / link teardown protocol.**
`CTX_KEEPALIVE = 0xfd` packets — exact wire body, exact cadence
(`Link.KEEPALIVE` constant), exact teardown packet (`Link.PROOF`
context). Real clients drop links incorrectly without this.
- [x] **SPEC.md §6 expansion: KEEPALIVE / link teardown protocol.**
Done in §6.7 (old §6.7 Source moved to §6.8). Five
sub-sections: KEEPALIVE wire form (`0xFA` context,
initiator-originated `0xFF` ping → responder `0xFE` pong,
body sent unencrypted), cadence (`RTT × 205.7` clamped to `[5,360]s`),
STALE→CLOSED watchdog transitions, LINKCLOSE wire form
(`0xFC` context, body = 16-byte `link_id` Token-encrypted with
`plaintext == link_id` auth check), teardown reason codes
(`TIMEOUT/INITIATOR_CLOSED/DESTINATION_CLOSED`), and the
six-step minimum-receiver-responsibility recipe.
- [ ] **SPEC.md §5.x (new): LXMF stamps + tickets for spam control.**
`LXMF.Stamp` (proof-of-work field in the optional 5th element of
the msgpack payload), `FIELD_TICKET` lookup. Modern Sideband 1.x
@@ -271,13 +277,11 @@ re-research.
to bring up the radio. All defined in `RNode_Firmware/Framing.h:24-95`.
Spec just says "send Reticulum packets via CMD_DATA" — that's
not enough.
- [ ] **SPEC.md §6.5 second sub-bullet: implicit vs explicit proof
mode.** `RNS.Reticulum.should_use_implicit_proof()` mode trims
the proof body to just the signature (no `packet_hash` prefix),
saving 32 bytes. `RNS/Link.py:386-389` has the explicit form
hard-coded with the implicit branch commented out, but at least
one upstream branch toggles it — a client that hard-codes the
explicit form will eventually meet a peer in implicit mode.
- [x] **SPEC.md §6.5 second sub-bullet: implicit vs explicit proof
mode.** Done as part of the §6.5 expansion (Tier 1 #3). The
length-dispatch validator at `PacketReceipt.validate_proof`
and the `should_use_implicit_proof()` config switch are
documented in §6.5.1-§6.5.2 with full citations.
### Tier 3 — required to act as a transport node / relay