From f13985846c2b2626670d6c44c0a882365363187a Mon Sep 17 00:00:00 2001
From: Rob
Date: Sun, 3 May 2026 15:05:47 -0400
Subject: [PATCH] =?UTF-8?q?Add=20=C2=A715=20time/clock=20requirements=20ro?=
 =?UTF-8?q?undup=20(dev-experience=20#3)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Consolidates time-sensitive behaviour scattered across §4.1, §5.3,
§5.7.3, §5.8.5, §6.7.1, §7.1, §7.2, §7.5, §9.6, §10, §13.4 into one
reference for what kind of clock you need where. Seven sub-sections:

  §15.1 Three clock kinds (wall time, boot-relative monotonic,
        hi-res monotonic). Embedded clean-rooms must be careful
        which call site needs which.
  §15.2 Required: monotonic seconds. Seven specific use sites that
        break a single-clock implementation if missing. All can be
        satisfied by boot-relative seconds.
  §15.3 random_hash timestamp encoding strategy for no-RTC devices:
        emit boot-relative seconds (look stale, lose path-replace
        ties; that's correct). Don't emit fully-random bytes (the
        §9.10 microReticulum bug, which locks you in as 'latest'
        forever).
  §15.4 Wall-time-required (LXMF body timestamp, ticket expiry,
        propagation timebase). Ticket expiry has no substitute:
        no-RTC devices must use PoW stamps instead.
  §15.5 Optional hi-res monotonic for diagnostics.
  §15.6 Explicit fails-vs-works inventory for a no-RTC, no-NTP-sync
        device. Net: opportunistic LXMF, propagated LXMF retrieval,
        and Links all work; only ticket-based shortcuts fail. A
        one-time clock sync flips most of the ⚠️ items to ✅.
  §15.7 Source map across all sections that touch time.

Test vectors renumbered to §16; Source map renumbered to §17.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 SPEC.md | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 todo.md |  10 +++++-
 2 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/SPEC.md b/SPEC.md
index a8d57d0..d0bf50c 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -2440,7 +2440,106 @@ A client running on a constrained device (less RAM, slower CPU) can scale all of
 
 ---
 
-## 15. Test vectors
+## 15. Time and clock requirements
+
+Reticulum has time-sensitive behaviour scattered across many sections. This is the consolidated reference for what kind of clock you need where, what tolerance the protocol gives you, and what fails on a no-RTC device (Faketec, RAK4631 stock, Heltec_T114, generic nRF52 LoRa boards).
+
+### 15.1 Three clock kinds
+
+| Kind | What it tells you | Typical embedded availability |
+|---|---|---|
+| **Absolute wall time** | Current Unix-seconds (e.g. `1714780800`) | Only with NTP sync, GPS, or hand-set clock |
+| **Boot-relative monotonic seconds** | Seconds since the device booted | Always, via `millis()` / `time.monotonic()` |
+| **High-resolution monotonic** | Sub-second timing for RTT and watchdog | Always, but precision varies |
+
+Most upstream Python code assumes wall time is available because it runs on hosts with NTP. Embedded clean-room implementations need to be careful about which kind each call site needs.
+
+### 15.2 Required: monotonic seconds (every implementation)
+
+These break a single-clock implementation if missing. All can be satisfied by **boot-relative seconds**, because these sites only need elapsed time and ordering, not an absolute value.
+
+| Use | Section | What fails without it |
+|---|---|---|
+| Link RTT measurement | §6.7.1 | `keepalive` interval can't adapt; defaults to `KEEPALIVE_MAX = 360s` worst case |
+| Link watchdog (last_inbound, last_outbound, last_keepalive timestamps) | §6.7 | Link can't detect staleness; lingers forever or tears down spuriously |
+| Resource transfer watchdog (last_activity, advertisement retry timing) | §10 | Resource transfers stall without retry; SENDER_GRACE_TIME never triggers |
+| `Transport.path_requests` rate-limit (`PATH_REQUEST_MI = 20s` minimum interval) | §7.1 | Path-request storms: requests repeat faster than the rate limit allows |
+| `Transport.tables_last_culled` periodic eviction trigger | §13.4 | Path/reverse/link tables grow without bound |
+| `Transport.discovery_pr_tags` aging (`PATH_REQUEST_GATE_TIMEOUT = 120s`) | §7.2.2 | Path-request dedup table never evicts old entries |
+| `Interface.ic_burst_freq` rolling deque for ingress rate limiting | §4.5 step 8 | Per-interface ingress limiter can't compute Hz |
+
+### 15.3 Recommended: monotonic-with-no-skew across announces (timestamp encoding)
+
+The §4.1 `random_hash` carries a 5-byte big-endian uint40 timestamp:
+
+```python
+random_hash = get_random_hash()[:5] + int(time.time()).to_bytes(5, "big")
+```
+
+Transit relays read `random_hash[5:10]` as a Unix-seconds value and use it for path-table replay-ordering decisions (§4.5 step 6.3). Two requirements:
+
+1. **Monotonic across announces from the same destination.** A new announce should have a higher timestamp than older ones from the same destination, or relays will reject it as "older than what we have cached" in the equal-or-greater-hop branch.
+2. **Comparable to other peers' timestamps.** If all your announces always look like "year 1970" (boot-relative seconds presented as Unix seconds), you'll consistently lose path-replay comparisons against peers with real wall time.
That's actually fine (your announces just won't replace cached entries from real-time peers), but the inverse case is the §9.10 microReticulum bug: a fully random `random_hash[5:10]` looks "far future" and freezes the path table.
+
+**No-RTC strategy:** emit boot-relative seconds. You'll always look stale to wall-time peers (their announces win in path-replace decisions, which is correct because their data is fresher), and you'll get monotonic-from-boot ordering between your own announces within a single uptime (correct).
+
+**Wrong strategy:** emit fully-random bytes (the §9.10 microReticulum bug). This locks you in as "latest" forever.
+
+### 15.4 Recommended: wall time (LXMF-level)
+
+These use absolute Unix-seconds. A device without wall time can substitute another value, with caveats:
+
+| Use | Section | Substitution if no wall time |
+|---|---|---|
+| LXMF body `timestamp` (`payload[0]`) | §5.3 | Use boot-relative seconds. Recipients per §9.6 should treat any timestamp before `1577836800` (2020-01-01) as "no clock" and substitute their local receive time. |
+| Outgoing message `LXMessage.timestamp` for sender-side ordering | §5.3 | Same as above. |
+| Stamp ticket expiry (`fields[FIELD_TICKET][0]`) | §5.7.3 | **You can't substitute here.** Tickets you issue with boot-relative seconds will look either already expired or implausibly far in the future to recipients. If your device has no wall time, don't issue tickets; fall back to PoW stamps (§5.7.2). |
+| Propagation node `timebase` field in `/offer` requests | §5.8.5 | Same as the `random_hash` strategy: boot-relative is fine; you'll appear "stale" but your peers' state stays consistent. |
+
+### 15.5 Optional: high-resolution monotonic for diagnostics
+
+These are nice-to-have; missing them just degrades observability:
+
+- Per-packet RX timestamp for RTT decomposition.
+- Airtime accounting (sub-second precision improves `ANNOUNCE_CAP` enforcement; integer seconds is fine).
+- Resource transfer `establishment_rate` calculation.
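
A sub-second interval timer for the diagnostics listed above could be sketched as follows. This is only an illustration, assuming CPython's `time.monotonic_ns()` as the high-resolution source; `IntervalTimer` is a hypothetical name and is not part of Reticulum or this spec.

```python
import time


class IntervalTimer:
    """Hypothetical helper: measures elapsed time on a monotonic clock."""

    def __init__(self, now_ns=time.monotonic_ns):
        # now_ns is injectable so a platform tick counter can replace it.
        self._now_ns = now_ns
        self._start_ns = None

    def start(self):
        """Record the starting tick, e.g. when a request packet is sent."""
        self._start_ns = self._now_ns()

    def elapsed_seconds(self):
        """Seconds since start(), e.g. when the matching reply arrives."""
        if self._start_ns is None:
            raise RuntimeError("start() was never called")
        return (self._now_ns() - self._start_ns) / 1e9
```

Because only differences between two readings are used, the absolute value of the counter never matters, which is why a boot-relative source is sufficient here.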
+
+Use whatever monotonic source your platform provides; even 1 ms resolution from `millis()` is plenty.
+
+### 15.6 What fails on a no-RTC, no-NTP-sync device
+
+A device that boots with no clock at all (`time.time()` returns a small integer, RTC chip absent or empty) and never syncs:
+
+- ✅ **Sending and receiving opportunistic LXMF** works fine. The §9.6 receiver-side fix-up (substitute local receive time when the timestamp is before 2020-01-01) handles your "year 1970" timestamps cleanly.
+- ✅ **Receiving propagated LXMF** works. The propagation node tags messages with its own timestamp; you don't need yours.
+- ✅ **Establishing Links** works. RTT is measured locally and only used for relative cadences.
+- ⚠️ **Periodic re-announces** work, but your `random_hash[5:10]` will always look stale to wall-time peers. Your announces propagate fine; they just don't win path-table replacement races against fresher peers (which is correct: they ARE fresher).
+- ⚠️ **Path-table updates from your own announces** work the first time (no cached entry to compare against), but subsequent re-announces may not replace stale cache entries on transit relays. Practical effect: your destination is reachable but transit relays keep trying older paths longer than ideal.
+- ❌ **Issuing LXMF tickets** doesn't work: the expiry timestamp in `FIELD_TICKET` is meaningless without wall time. Don't issue tickets; rely on PoW stamps.
+- ❌ **Sending propagated LXMF with ticket-based stamp shortcuts** doesn't work for the same reason.
+
+A single one-time clock sync (BLE config, web flasher, manual button-press at known time, GPS, `rnstatus` peer query) flips most of the ⚠️ items to ✅. The repeater repo's BLE config protocol can carry a clock value in the connection handshake; that's the simplest fix.
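
As a rough illustration of the one-time sync idea above, the sketch below layers an optional wall-time offset over boot-relative seconds, then uses that clock for the §15.3 `random_hash` timestamp and the §9.6 receiver-side fix-up. `DeviceClock`, `announce_random_hash`, and `fixup_lxmf_timestamp` are hypothetical names, `os.urandom` is an assumed stand-in for the random prefix that upstream takes from `get_random_hash()`, and how the synced value reaches the device is out of scope.

```python
import os
import time

# 2020-01-01 00:00:00 UTC: the "no clock" threshold quoted from §9.6.
NO_CLOCK_CUTOFF = 1577836800


class DeviceClock:
    """Hypothetical: boot-relative seconds plus an optional wall-time offset."""

    def __init__(self, monotonic=time.monotonic):
        self._monotonic = monotonic
        self._boot = monotonic()
        self._offset = None  # Unix seconds at boot, unknown until sync()

    def uptime(self):
        """Whole seconds since this object was created (boot-relative)."""
        return int(self._monotonic() - self._boot)

    def sync(self, unix_seconds):
        """One-time sync: remember what wall time corresponds to boot."""
        self._offset = int(unix_seconds) - self.uptime()

    def has_wall_time(self):
        return self._offset is not None

    def now(self):
        """Wall time if synced, otherwise boot-relative seconds."""
        if self._offset is None:
            return self.uptime()
        return self._offset + self.uptime()


def announce_random_hash(clock, random_source=os.urandom):
    """5 random bytes followed by a 5-byte big-endian timestamp."""
    return random_source(5) + clock.now().to_bytes(5, "big")


def fixup_lxmf_timestamp(sender_timestamp, local_receive_time):
    """Receiver side: treat pre-2020 sender timestamps as 'no clock'."""
    if sender_timestamp < NO_CLOCK_CUTOFF:
        return local_receive_time
    return sender_timestamp
```

Before `sync()` the emitted timestamp is small and looks stale to wall-time peers; after it, the value jumps forward, so ordering between the device's own announces is preserved across the sync. A caller would also check `has_wall_time()` before deciding whether to issue tickets.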
+
+### 15.7 Source map
+
+| Section | What relies on time |
+|---|---|
+| §4.1 | `random_hash[5:10]` emission timestamp |
+| §4.5 step 6.3 | Path-table replacement using `random_blob` timestamps |
+| §5.3 | LXMF body timestamp |
+| §5.7.3 | LXMF ticket expiry |
+| §5.8.5 | Propagation node timebase field |
+| §6.7.1 | Link KEEPALIVE / RTT cadence |
+| §7.1 | `Transport.path_requests` rate limit |
+| §7.2 | `discovery_pr_tags` aging |
+| §7.5 | Periodic re-announce cadence |
+| §9.6 | Clockless sender LXMF timestamp fix-up |
+| §10 | Resource watchdog timeouts |
+| §13.4 | All `Transport.jobs` periodic intervals |
+
+---
+
+## 16. Test vectors
 
 See [`test-vectors/`](test-vectors/). Currently populated:
 
@@ -2452,7 +2551,7 @@ An implementation that round-trips every test vector — both directions — sho
 
 ---
 
-## 16. Source map
+## 17. Source map
 
 Upstream Python sources, in rough order of frequency-of-reference:
 
diff --git a/todo.md b/todo.md
index c0431ac..8b8b8af 100644
--- a/todo.md
+++ b/todo.md
@@ -365,7 +365,15 @@ order: top three save the most debugging hours.
   High value because debugging Reticulum is a known multi-hour
   exercise; this would shortcut diagnosis to seconds.
 
-- [ ] **§17 (new): Time / clock requirements roundup.** Currently
+- [x] **§15 (new): Time / clock requirements roundup.** Done.
+  Seven sub-sections covering three clock kinds (wall time vs
+  boot-relative monotonic vs hi-res monotonic), what's required
+  vs recommended vs optional, the no-RTC strategy for
+  `random_hash` timestamps (boot-relative is fine; random
+  bytes are the §9.10 bug), wall-time-only LXMF features
+  (ticket expiry can't substitute), and an explicit
+  what-fails / what-works inventory for clockless devices
+  with their interop consequences. Currently
   scattered across §4.1 (random_hash timestamp), §9.6 (clockless
   LXMF senders), §5.7 (ticket expiry), §6.7 (RTT-driven keepalive),
   §7.5 (re-announce cadence). A no-RTC device (Faketec, RAK4631