docs(spec): fix §10.2 Resource integrity hash — prefix is not r, not hashed

§10.2 step 3 wrongly equated the random-hash prefix prepended to the
Resource body with the advertisement's `r` field, and step 5 fed that
prefix into the hash/expected_proof input. Upstream RNS uses two
distinct get_random_hash()[:4] values: a throwaway prefix the receiver
strips and discards, and self.random_hash (the adv `r` field). The
integrity hash is SHA256(uncompressed_plaintext || r) over the
prefix-stripped, decompressed body — exactly as §10.8 already stated.

- §10.2 steps 3 & 5 corrected to agree with §10.8
- §10.8: renamed misleading plaintext_with_random / data_with_random
- §10.12: wire-layering block rewritten to match
- README: errata entry under Spec corrections

Verified against RNS 1.2.5 (Resource.py:332,405,412,440-443,682-694,755).

Resolves #9.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rob 2026-05-17 10:28:20 -04:00
commit 1b955d19a9
2 changed files with 18 additions and 11 deletions

SPEC.md

@ -2267,13 +2267,15 @@ Given input data and an `RNS.Link` in `ACTIVE` state (`RNS/Resource.py:248-478`)
1. **Optional metadata prefix.** If the caller supplied a `metadata` dict, msgpack-pack it and prepend `length(3 bytes, big-endian uint24) || packed_metadata` to the body. The `has_metadata` (`x`) flag in the advertisement signals this. Receivers strip the prefix during reassembly (line 699-707).
2. **Optional bz2 compression.** If `auto_compress` is true and the data fits within `auto_compress_limit` (default 64 MiB), the body is bz2-compressed and the `compressed` (`c`) flag is set. If compression doesn't shrink the data, the uncompressed form is sent and `c` is cleared.
3. **Random hash prefix.** A 4-byte (`Resource.RANDOM_HASH_SIZE`) random hash is prepended to the (compressed-or-not) body. This is the `r` field in the advertisement and is part of the input to `hash` and `expected_proof`.
3. **Random hash prefix.** A 4-byte (`Resource.RANDOM_HASH_SIZE`) random hash is prepended to the (compressed-or-not) body (`Resource.py:405`/`412`, a fresh `RNS.Identity.get_random_hash()[:4]` call). This prefix is **not** the `r` field, and is **not** part of the `hash` / `expected_proof` input. It is a separate throwaway value that travels inside the encrypted blob; the receiver strips and discards it (§10.8 step 3). The advertisement's `r` field carries a *different* value, `self.random_hash`, generated by its own `get_random_hash()[:4]` call at `Resource.py:440`, which is the actual integrity-hash and hashmap salt.
4. **Link encryption.** The full `random_hash || (compressed?) data` blob is encrypted using `link.encrypt(...)` — i.e. the link-derived Token form (§3.1), no ephemeral_pub prefix. The `encrypted` (`e`) flag is set.
5. **Hash and proof material.**
- `data_with_random = random_hash || (compressed?) plaintext`
- `hash = SHA256(data_with_random || random_hash)` (32 bytes)
5. **Hash and proof material** (`Resource.py:440-443`). `hash`, `truncated_hash`, and `expected_proof` are all derived from the **original uncompressed `plaintext`** (the caller's input, including any metadata prefix from step 1; `Resource.py:332`), *not* from the compressed body and *not* from the random-prefixed wire blob from step 3:
- `random_hash = RNS.Identity.get_random_hash()[:4]` — the value the advertisement's `r` field carries.
- `hash = SHA256(plaintext || random_hash)` (32 bytes)
- `truncated_hash = hash[:16]`
- `expected_proof = SHA256(data_with_random || hash)` (32 bytes) — what the receiver will eventually return in the RESOURCE_PRF packet.
- `expected_proof = SHA256(plaintext || hash)` (32 bytes) — what the receiver will eventually return in the RESOURCE_PRF packet.
The 4-byte prefix from step 3 is **not** in any of these inputs. The receiver strips the prefix and bz2-decompresses *before* hashing (§10.8 steps 3-5), so the sender must hash the uncompressed, unprefixed `plaintext` for the two sides to agree. A receiver that includes the prefix, or hashes the compressed form, rejects every legitimate Resource as `CORRUPT`.
6. **Part split.** The encrypted body is sliced into parts of size `SDU = link.mtu - HEADER_MAXSIZE - IFAC_MIN_SIZE`. Each part becomes a packed `RNS.Packet(link, part_data, context=RESOURCE)`; the packed wire bytes are stored in `parts[i]` for later sending.
7. **Hashmap.** Each part is fingerprinted to `MAPHASH_LEN = 4 bytes`. The full hashmap is `b"".join(map_hashes)`. **Hash collisions within the COLLISION_GUARD_SIZE = 2 × WINDOW_MAX + HASHMAP_MAX_LEN window are detected at construction time** — if two parts hash to the same 4-byte map_hash within that window, the random hash is regenerated and the whole hashmap is recomputed. Without this guard, the receiver can't disambiguate which part it just received from a part-request that named a colliding map_hash.
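
The corrected steps 3 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the upstream code: `random_hash4` is a hypothetical stand-in for `RNS.Identity.get_random_hash()[:4]`, `build_sender_material` and its dictionary keys are invented for this example, and compression and link encryption are assumed to happen elsewhere.

```python
import hashlib
import os

RANDOM_HASH_SIZE = 4  # Resource.RANDOM_HASH_SIZE in the text above


def random_hash4() -> bytes:
    # Hypothetical stand-in for RNS.Identity.get_random_hash()[:4].
    return hashlib.sha256(os.urandom(16)).digest()[:RANDOM_HASH_SIZE]


def build_sender_material(plaintext: bytes, body: bytes) -> dict:
    """plaintext: the caller's uncompressed input (plus any metadata prefix).
    body: what is sent before encryption (bz2 output, or plaintext itself)."""
    prefix = random_hash4()       # step 3: throwaway value, never hashed
    random_hash = random_hash4()  # step 5: the advertisement's `r` field
    resource_hash = hashlib.sha256(plaintext + random_hash).digest()
    return {
        "wire_blob": prefix + body,  # would be passed to link.encrypt(...)
        "r": random_hash,
        "hash": resource_hash,
        "truncated_hash": resource_hash[:16],
        "expected_proof": hashlib.sha256(plaintext + resource_hash).digest(),
    }
```

Note that `prefix` appears only in `wire_blob`; none of the hash inputs reference it, which is the point of the correction.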
@ -2445,7 +2447,7 @@ When the receiver has assembled the full resource (`received_count == total_part
2. `link.decrypt(...)` to plaintext.
3. Strip the 4-byte `random_hash` prefix — **discard, do NOT compare to advertisement.r** (see callout below).
4. If `compressed`: bz2-decompress.
5. Recompute `SHA256(plaintext_with_random || random_hash)` and compare to `h`.
5. Recompute `SHA256(plaintext || random_hash)` — over the prefix-stripped, decompressed body — and compare to `h`.
6. If match: peel off metadata if `x` is set, write `data` to the destination; status = `COMPLETE`.
7. If mismatch: status = `CORRUPT`; cancel.
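
As a rough sketch of receiver steps 3 to 5, assuming the parts have already been joined and passed through `link.decrypt(...)`: the function name `verify_assembled` and its parameters are hypothetical, with `adv_r` and `adv_h` standing for the advertisement's `r` and `h` fields.

```python
import bz2
import hashlib
from typing import Optional

PREFIX_LEN = 4  # size of the throwaway random-hash prefix


def verify_assembled(decrypted: bytes, adv_r: bytes, adv_h: bytes,
                     compressed: bool) -> Optional[bytes]:
    """Return the plaintext on success, or None if the resource is corrupt."""
    body = decrypted[PREFIX_LEN:]  # step 3: strip and discard, no comparison
    plaintext = bz2.decompress(body) if compressed else body  # step 4
    digest = hashlib.sha256(plaintext + adv_r).digest()       # step 5
    return plaintext if digest == adv_h else None
```

The stripped prefix is never inspected; only the digest comparison decides between `COMPLETE` and `CORRUPT`.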
@ -2459,7 +2461,7 @@ When the receiver has assembled the full resource (`received_count == total_part
> formula `SHA256(data || r)`). A receiver that does
> `assert prefix == advertisement.r` will reject every legitimate
> Resource as corrupt. Just strip and discard. Integrity is proven
> exclusively by step 5's `SHA256(plaintext_with_random || random_hash)`
> exclusively by step 5's `SHA256(plaintext || random_hash)`
> against `h` — that's the only check that matters; the prefix
> bytes are scaffolding.
@ -2467,7 +2469,7 @@ On `COMPLETE`, the receiver emits the proof:
```
proof_data = resource_hash(32) || full_proof(32)
where full_proof = SHA256(data_with_random || resource_hash)
where full_proof = SHA256(plaintext || resource_hash)
```
sent as `RNS.Packet(link, proof_data, packet_type=PROOF, context=RESOURCE_PRF)` (`Resource.py:755-766`). The `full_proof` is exactly what the initiator pre-computed as `expected_proof` in §10.2 step 5 — it can validate the proof bytewise without re-running the SHA-256.
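
A minimal sketch of the proof payload and the initiator's check follows. The helper names are hypothetical, and the bytewise comparison of the second half against `expected_proof` is an assumption drawn from the sentence above rather than a copy of the upstream validation code.

```python
import hashlib

HASH_LEN = 32  # SHA-256 digest size in bytes


def build_proof(plaintext: bytes, resource_hash: bytes) -> bytes:
    # Receiver side: proof_data = resource_hash(32) || full_proof(32)
    full_proof = hashlib.sha256(plaintext + resource_hash).digest()
    return resource_hash + full_proof


def proof_is_valid(proof_data: bytes, expected_proof: bytes) -> bool:
    # Initiator side: compare against the value pre-computed in §10.2 step 5.
    return (len(proof_data) == 2 * HASH_LEN
            and proof_data[HASH_LEN:] == expected_proof)
```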
@ -2511,10 +2513,12 @@ The 3-byte big-endian uint24 metadata length encoding (§10.2 step 1) is what li
Encryption layering is **outermost** — the wire bytes look like:
```
plaintext = data_with_random || random_hash # SHA-256 input
data_with_random = random_hash(4) || maybe_compressed_body
wire_blob = prefix(4) || maybe_compressed # the body that gets encrypted
prefix = fresh get_random_hash()[:4] # NOT `r`; receiver strips & discards
maybe_compressed = compressed_body iff `c` flag, else uncompressed
parts[i] = link.encrypt( data_with_random[i*SDU : (i+1)*SDU] )
parts[i] = link.encrypt(wire_blob)[i*SDU : (i+1)*SDU] # encrypt whole, then slice
hash = SHA256(uncompressed_body || random_hash) # integrity; random_hash = adv `r`
```
Critically, **the link encryption is applied to the WHOLE concatenated data first, then sliced into parts** — not to each part individually. This means part boundaries don't align with cipher block boundaries; a missing part can't be decrypted in isolation. The receiver must accumulate all parts before calling `link.decrypt()` (`Resource.py:676-679`).
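
The encrypt-then-slice order can be illustrated as follows. This sketch treats the output of `link.encrypt(wire_blob)` as an opaque byte string; `split_parts` and `reassemble` are hypothetical helper names, and the packet wrapping of each part is omitted.

```python
from typing import List


def split_parts(encrypted: bytes, sdu: int) -> List[bytes]:
    # Slice the already-encrypted blob into SDU-sized parts.
    return [encrypted[i:i + sdu] for i in range(0, len(encrypted), sdu)]


def reassemble(parts: List[bytes]) -> bytes:
    # Receiver side: join every part in order before any decryption.
    return b"".join(parts)
```

Only the joined result is a valid ciphertext; an individual element of `parts` is an arbitrary slice of it.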