docs(spec): fix §10.2 Resource integrity hash — prefix is not r, not hashed

§10.2 step 3 wrongly equated the random-hash prefix prepended to the Resource body with the advertisement's `r` field, and step 5 fed that prefix into the hash/expected_proof input.

Upstream RNS uses two distinct get_random_hash()[:4] values: a throwaway prefix the receiver strips and discards, and self.random_hash (the adv `r` field). The integrity hash is SHA256(uncompressed_plaintext || r) over the prefix-stripped, decompressed body, exactly as §10.8 already stated.

- §10.2 steps 3 & 5 corrected to agree with §10.8
- §10.8: renamed misleading plaintext_with_random / data_with_random
- §10.12: wire-layering block rewritten to match
- README: errata entry under Spec corrections

Verified against RNS 1.2.5 (Resource.py:332,405,412,440-443,682-694,755).

Resolves #9.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 10:28:20 -04:00 · 1b955d19a9
commit 1b955d19a9
parent 3eea25977a
2 changed files with 18 additions and 11 deletions
--- a/SPEC.md
+++ b/SPEC.md
@@ -2267,13 +2267,15 @@ Given input data and an `RNS.Link` in `ACTIVE` state (`RNS/Resource.py:248-478`)

 1. **Optional metadata prefix.** If the caller supplied a `metadata` dict, msgpack-pack it and prepend `length(3 bytes, big-endian uint24) || packed_metadata` to the body. The `has_metadata` (`x`) flag in the advertisement signals this. Receivers strip the prefix during reassembly (line 699-707).
 2. **Optional bz2 compression.** If `auto_compress` is true and the data fits within `auto_compress_limit` (default 64 MiB), the body is bz2-compressed and the `compressed` (`c`) flag is set. If compression doesn't shrink the data, the uncompressed form is sent and `c` is cleared.
-3. **Random hash prefix.** A 4-byte (`Resource.RANDOM_HASH_SIZE`) random hash is prepended to the (compressed-or-not) body. This is the `r` field in the advertisement and is part of the input to `hash` and `expected_proof`.
+3. **Random hash prefix.** A 4-byte (`Resource.RANDOM_HASH_SIZE`) random hash is prepended to the (compressed-or-not) body — `Resource.py:405`/`412`, a fresh `RNS.Identity.get_random_hash()[:4]` call. This prefix is **not** the `r` field, and is **not** part of the `hash` / `expected_proof` input. It is a separate throwaway value that travels inside the encrypted blob; the receiver strips and discards it (§10.8 step 3). The advertisement's `r` field carries a *different* value — `self.random_hash`, generated by its own `get_random_hash()[:4]` call at `Resource.py:440` — which is the actual integrity-hash and hashmap salt.
 4. **Link encryption.** The full `random_hash || (compressed?) data` blob is encrypted using `link.encrypt(...)` — i.e. the link-derived Token form (§3.1), no ephemeral_pub prefix. The `encrypted` (`e`) flag is set.
-5. **Hash and proof material.**
-   - `data_with_random = random_hash || (compressed?) plaintext`
-   - `hash = SHA256(data_with_random || random_hash)` (32 bytes)
+5. **Hash and proof material** (`Resource.py:440-443`). All three are computed over the **original uncompressed `plaintext`** — the caller's input, including any metadata prefix from step 1 (`Resource.py:332`) — *not* the compressed body, and *not* the random-prefixed wire blob from step 3:
+   - `random_hash = RNS.Identity.get_random_hash()[:4]` — the value the advertisement's `r` field carries.
+   - `hash = SHA256(plaintext || random_hash)` (32 bytes)
   - `truncated_hash = hash[:16]`
-   - `expected_proof = SHA256(data_with_random || hash)` (32 bytes) — what the receiver will eventually return in the RESOURCE_PRF packet.
+   - `expected_proof = SHA256(plaintext || hash)` (32 bytes) — what the receiver will eventually return in the RESOURCE_PRF packet.
+
+   The 4-byte prefix from step 3 is **not** in any of these inputs. The receiver strips the prefix and bz2-decompresses *before* hashing (§10.8 steps 3-5), so the sender must hash the uncompressed, unprefixed `plaintext` for the two sides to agree. A receiver that includes the prefix, or hashes the compressed form, rejects every legitimate Resource as `CORRUPT`.
 6. **Part split.** The encrypted body is sliced into parts of size `SDU = link.mtu - HEADER_MAXSIZE - IFAC_MIN_SIZE`. Each part becomes a packed `RNS.Packet(link, part_data, context=RESOURCE)`; the packed wire bytes are stored in `parts[i]` for later sending.
 7. **Hashmap.** Each part is fingerprinted to `MAPHASH_LEN = 4 bytes`. The full hashmap is `b"".join(map_hashes)`. **Hash collisions within the COLLISION_GUARD_SIZE = 2 × WINDOW_MAX + HASHMAP_MAX_LEN window are detected at construction time** — if two parts hash to the same 4-byte map_hash within that window, the random hash is regenerated and the whole hashmap is recomputed. Without this guard, the receiver can't disambiguate which part it just received from a part-request that named a colliding map_hash.
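
The corrected steps 3 and 5 can be sketched as follows. This is a minimal illustration, not the upstream implementation: `build_sender_material` and its dict keys are hypothetical names, `os.urandom(4)` stands in for `RNS.Identity.get_random_hash()[:4]`, and the `auto_compress_limit` check and the link encryption of step 4 are left out.

```python
import bz2
import hashlib
import os

RANDOM_HASH_SIZE = 4  # mirrors Resource.RANDOM_HASH_SIZE as quoted in the spec


def build_sender_material(plaintext: bytes, compress: bool = True) -> dict:
    """Hypothetical helper sketching §10.2 steps 2, 3 and 5 (no encryption)."""
    # Step 2: keep the compressed form only if it is actually smaller.
    body, compressed = plaintext, False
    if compress:
        candidate = bz2.compress(plaintext)
        if len(candidate) < len(plaintext):
            body, compressed = candidate, True

    # Step 3: throwaway prefix. It travels inside the encrypted blob only.
    prefix = os.urandom(RANDOM_HASH_SIZE)  # assumption: stand-in for get_random_hash()[:4]
    wire_blob = prefix + body  # this is what link.encrypt(...) would receive

    # Step 5: a second, independent value. This one is the advertisement's `r`.
    random_hash = os.urandom(RANDOM_HASH_SIZE)
    resource_hash = hashlib.sha256(plaintext + random_hash).digest()
    expected_proof = hashlib.sha256(plaintext + resource_hash).digest()

    return {
        "wire_blob": wire_blob,
        "compressed": compressed,
        "r": random_hash,
        "hash": resource_hash,
        "truncated_hash": resource_hash[:16],
        "expected_proof": expected_proof,
    }
```

Note that `prefix` never appears in any SHA-256 input, and that the hashes are taken over `plaintext`, not over `body`.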

@@ -2445,7 +2447,7 @@ When the receiver has assembled the full resource (`received_count == total_part
 2. `link.decrypt(...)` to plaintext.
 3. Strip the 4-byte `random_hash` prefix — **discard, do NOT compare to advertisement.r** (see callout below).
 4. If `compressed`: bz2-decompress.
-5. Recompute `SHA256(plaintext_with_random || random_hash)` and compare to `h`.
+5. Recompute `SHA256(plaintext || random_hash)` — over the prefix-stripped, decompressed body — and compare to `h`.
 6. If match: peel off metadata if `x` is set, write `data` to the destination; status = `COMPLETE`.
 7. If mismatch: status = `CORRUPT`; cancel.

@@ -2459,7 +2461,7 @@ When the receiver has assembled the full resource (`received_count == total_part
 > formula `SHA256(data || r)`). A receiver that does
 > `assert prefix == advertisement.r` will reject every legitimate
 > Resource as corrupt. Just strip and discard. Integrity is proven
-> exclusively by step 5's `SHA256(plaintext_with_random || random_hash)`
+> exclusively by step 5's `SHA256(plaintext || random_hash)`
 > against `h` — that's the only check that matters; the prefix
 > bytes are scaffolding.

@@ -2467,7 +2469,7 @@ On `COMPLETE`, the receiver emits the proof:

 ```
 proof_data = resource_hash(32) || full_proof(32)
-where full_proof = SHA256(data_with_random || resource_hash)
+where full_proof = SHA256(plaintext || resource_hash)
 ```

 sent as `RNS.Packet(link, proof_data, packet_type=PROOF, context=RESOURCE_PRF)` (`Resource.py:755-766`). The `full_proof` is exactly what the initiator pre-computed as `expected_proof` in §10.2 step 5 — it can validate the proof bytewise without re-running the SHA-256.
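
The receiver side of §10.8 (strip, decompress, verify, prove) might look roughly like this. It is a sketch under stated assumptions: `verify_and_build_proof` and its parameter names are hypothetical, the input is taken to be the output of `link.decrypt(...)`, and the constant-time comparison is a choice made here rather than something the spec text requires.

```python
import bz2
import hashlib
import hmac
from typing import Optional

PREFIX_SIZE = 4  # length of the throwaway random-hash prefix


def verify_and_build_proof(decrypted: bytes, adv_r: bytes, adv_h: bytes,
                           compressed: bool) -> Optional[bytes]:
    """Hypothetical helper: returns proof_data on COMPLETE, None on CORRUPT."""
    # Step 3: strip the prefix and discard it. It is never compared to adv_r.
    body = decrypted[PREFIX_SIZE:]

    # Step 4: decompress before hashing if the `c` flag was set.
    plaintext = bz2.decompress(body) if compressed else body

    # Step 5: integrity check over the prefix-stripped, decompressed body.
    resource_hash = hashlib.sha256(plaintext + adv_r).digest()
    if not hmac.compare_digest(resource_hash, adv_h):
        return None  # CORRUPT

    # Proof: resource_hash(32) || full_proof(32).
    full_proof = hashlib.sha256(plaintext + resource_hash).digest()
    return resource_hash + full_proof
```

Because `full_proof` is derived from the same `plaintext` and hash the initiator used, the initiator can compare the returned bytes against its stored `expected_proof` directly.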
@@ -2511,10 +2513,12 @@ The 3-byte big-endian uint24 metadata length encoding (§10.2 step 1) is what li
 Encryption layering is **outermost** — the wire bytes look like:

 ```
-plaintext           = data_with_random || random_hash    # SHA-256 input
-data_with_random    = random_hash(4) || maybe_compressed_body
+wire_blob           = prefix(4) || maybe_compressed       # the body that gets encrypted
+prefix              = fresh get_random_hash()[:4]         # NOT `r`; receiver strips & discards
 maybe_compressed    = compressed_body iff `c` flag, else uncompressed
-parts[i]            = link.encrypt( data_with_random[i*SDU : (i+1)*SDU] )
+parts[i]            = link.encrypt(wire_blob)[i*SDU : (i+1)*SDU]   # encrypt whole, then slice
+
+hash                = SHA256(uncompressed_body || random_hash)    # integrity; random_hash = adv `r`
 ```

 Critically, **the link encryption is applied to the WHOLE concatenated data first, then sliced into parts** — not to each part individually. This means part boundaries don't align with cipher block boundaries; a missing part can't be decrypted in isolation. The receiver must accumulate all parts before calling `link.decrypt()` (`Resource.py:676-679`).
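
The encrypt-whole-then-slice ordering can be illustrated with a short sketch. The function names and the `encrypt` / `decrypt` callables are hypothetical placeholders for `link.encrypt` / `link.decrypt`; the byte-reversal "cipher" in the usage lines is a toy chosen only because, like a real whole-blob cipher, it cannot be undone one part at a time.

```python
from typing import Callable, List


def split_encrypted(wire_blob: bytes, encrypt: Callable[[bytes], bytes],
                    sdu: int) -> List[bytes]:
    """Encrypt the whole blob once, then slice the ciphertext into SDU-sized parts."""
    if sdu <= 0:
        raise ValueError("sdu must be positive")
    ciphertext = encrypt(wire_blob)
    return [ciphertext[i:i + sdu] for i in range(0, len(ciphertext), sdu)]


def reassemble_and_decrypt(parts: List[bytes],
                           decrypt: Callable[[bytes], bytes]) -> bytes:
    """Concatenate every part first; only then is decryption meaningful."""
    return decrypt(b"".join(parts))


# Toy stand-in cipher (assumption, NOT link encryption): reverse the bytes.
toy = lambda data: data[::-1]

parts = split_encrypted(b"0123456789", toy, sdu=4)
recovered = reassemble_and_decrypt(parts, toy)
```

Applying `toy` to each part separately and joining the results would not reproduce the original blob, which mirrors why a receiver cannot decrypt an individual part in isolation.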