Flow: LXMF outbound retry loop and per-message state machine
What LXMRouter.process_outbound actually does on each tick — the layer that wraps send-opportunistic-lxmf.md, send-link-lxmf.md, and send-propagated-lxmf.md and decides when each happy-path operation runs, retries, gives up, or falls through.
The three send-* flows describe what happens for one attempt of each method. This doc describes how attempts are scheduled, how the per-message state advances, and when a message moves from retry-eligible to terminally FAILED. It is the missing piece for any client that wants delivery semantics matching upstream Sideband.
Pinned against RNS 1.2.4 / LXMF 0.9.7. Line numbers below are from those versions.
Cadence: how often process_outbound runs
LXMRouter.jobloop (LXMF/LXMRouter.py:889-899) is a daemon thread that wakes every PROCESSING_INTERVAL seconds and calls LXMRouter.jobs, which dispatches to process_outbound whenever its tick counter is divisible by JOB_OUTBOUND_INTERVAL:
| Constant | Value | File:line |
|---|---|---|
| `PROCESSING_INTERVAL` | `4` (seconds) | `LXMF/LXMRouter.py:31` |
| `JOB_OUTBOUND_INTERVAL` | `1` | `LXMF/LXMRouter.py:852` |
So the effective outbound tick is every 4 seconds. Any per-message timer (path-request defer, retry wait, link-establish timeout) is sampled at this granularity: a 10-second wait isn't actually 10 seconds, it's "first tick at or after now + 10s."
handle_outbound also kicks process_outbound directly on a fresh thread when a new message is queued (LXMF/LXMRouter.py:1691), so the first attempt doesn't wait for the next jobloop tick.
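The sampling effect described above can be made concrete with a small calculation. This is a sketch only: `effective_wait` is a hypothetical helper, not an LXMF function, and it assumes the timer was armed on a jobloop tick boundary (a send kicked directly from handle_outbound starts off-tick, so its first wait can differ).

```python
import math

PROCESSING_INTERVAL = 4      # seconds, value from the table above
JOB_OUTBOUND_INTERVAL = 1    # process_outbound runs on every jobloop tick


def effective_wait(requested_wait, tick=PROCESSING_INTERVAL * JOB_OUTBOUND_INTERVAL):
    """Delay observed when a timer is armed on a tick and only checked on ticks.

    Simplification: processing jitter is ignored, and a deadline landing
    exactly on a tick is treated as met on that tick.
    """
    return math.ceil(requested_wait / tick) * tick


print(effective_wait(10))  # DELIVERY_RETRY_WAIT -> 12
print(effective_wait(7))   # PATH_REQUEST_WAIT   -> 8
```

Under that assumption the 10-second DELIVERY_RETRY_WAIT is observed as 12 seconds and the 7-second PATH_REQUEST_WAIT as 8 seconds.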
Constants that drive retry behavior
All are class attributes of LXMRouter, cited from LXMF/LXMRouter.py:30-38:
| Constant | Value | Meaning |
|---|---|---|
| `MAX_DELIVERY_ATTEMPTS` | `5` | Per-message attempt cap. Crossing this triggers `fail_message`. |
| `DELIVERY_RETRY_WAIT` | `10` (seconds) | Wait between attempts when path is known but the prior attempt didn't yield delivery proof. |
| `PATH_REQUEST_WAIT` | `7` (seconds) | Wait after issuing a `path?` request before the next attempt. |
| `MAX_PATHLESS_TRIES` | `1` | OPPORTUNISTIC only: number of attempts before forcing a path request. |
| `MESSAGE_EXPIRY` | `30*24*60*60` (30 days) | Used by propagation-node store cleanup, not the per-message retry path. |
| `LINK_MAX_INACTIVITY` | `10*60` | Direct-link idle teardown threshold (`clean_links`). |
| `P_LINK_MAX_INACTIVITY` | `3*60` | Propagation-link idle teardown threshold. |
A full single-message retry budget for DIRECT or PROPAGATED is therefore 5 attempts × 10 seconds ≈ 50 seconds of wall-clock before fail_message runs, plus whatever each attempt itself spends inside the link-establishment / proof-wait window.
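As a rough check of that figure, the waiting portion of the budget can be computed directly. The helper name `retry_budget` and the tick-rounding step are illustrative assumptions rather than anything LXMF exposes; the constants are the values from the table above.

```python
MAX_DELIVERY_ATTEMPTS = 5
DELIVERY_RETRY_WAIT = 10   # seconds
TICK = 4                   # seconds: PROCESSING_INTERVAL * JOB_OUTBOUND_INTERVAL


def retry_budget(attempts=MAX_DELIVERY_ATTEMPTS, wait=DELIVERY_RETRY_WAIT, tick=None):
    """Seconds spent waiting between attempts; time inside each attempt is excluded."""
    if tick is not None:
        # Ceiling division: round each wait up to a whole number of ticks.
        wait = -(-wait // tick) * tick
    return attempts * wait


nominal = retry_budget()              # 5 * 10
quantized = retry_budget(tick=TICK)   # 5 * 12
print(nominal, quantized)             # 50 60
```

In this model the nominal 50 seconds is a lower bound: once every wait is rounded up to a 4-second tick, the same five waits take about 60 seconds.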
Per-message state machine
States from LXMF/LXMessage.py:13-22:
| State | Value | When |
|---|---|---|
| `GENERATING` | `0x00` | Stamp generation in progress (deferred-stamp messages only) |
| `OUTBOUND` | `0x01` | Queued in `pending_outbound`; not currently transmitting |
| `SENDING` | `0x02` | A send is in flight on the wire (packet sent / Resource transferring) |
| `SENT` | `0x04` | Wire send completed, but no end-to-end PROOF yet; also the terminal state for PROPAGATED (delivery to the recipient is the propagation node's job) |
| `DELIVERED` | `0x08` | End-to-end PROOF received from the final recipient; only reachable for OPPORTUNISTIC and DIRECT |
| `REJECTED` | `0xFD` | Receiver explicitly rejected (e.g. stamp validation failed on a propagation node) |
| `CANCELLED` | `0xFE` | Sender called `LXMessage.cancel` while still queued |
| `FAILED` | `0xFF` | `MAX_DELIVERY_ATTEMPTS` exhausted, or unrecoverable error |
The valid-method enum is LXMessage.OPPORTUNISTIC = 0x01, DIRECT = 0x02, PROPAGATED = 0x03, PAPER = 0x05 (LXMF/LXMessage.py:29-32).
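For a client that mirrors these values, one option is a pair of IntEnum classes, which keep the numeric values from the tables while giving readable names in logs. The class names `State` and `Method` and the `describe` helper are this sketch's own; upstream exposes the values as attributes of LXMessage.

```python
from enum import IntEnum


class State(IntEnum):
    GENERATING = 0x00
    OUTBOUND = 0x01
    SENDING = 0x02
    SENT = 0x04
    DELIVERED = 0x08
    REJECTED = 0xFD
    CANCELLED = 0xFE
    FAILED = 0xFF


class Method(IntEnum):
    OPPORTUNISTIC = 0x01
    DIRECT = 0x02
    PROPAGATED = 0x03
    PAPER = 0x05


def describe(state, method):
    """Readable label for a (state, method) pair, e.g. for log lines."""
    return f"{Method(method).name}/{State(state).name}"


print(describe(0x04, 0x03))   # PROPAGATED/SENT
```

Because IntEnum members compare equal to plain integers, values read from an upstream message object can be passed in unchanged.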
Per-tick decision tree
process_outbound (LXMF/LXMRouter.py:2513) holds outbound_processing_lock across the whole tick (line 2514-2515) and walks pending_outbound once. For each message, the top of the loop branches on terminal state first:
| Branch | File:line | Effect |
|---|---|---|
| `state == DELIVERED` | 2517-2542 | Remove from queue. If method was DIRECT, perform backchannel-identify on the link so the recipient can reply over the same link. |
| `method == PROPAGATED and state == SENT` | 2544-2546 | Remove from queue (PROPAGATED's terminal success state is SENT, not DELIVERED; see state table). |
| `state == CANCELLED` | 2548-2552 | Remove and fire `failed_callback`. |
| `state == REJECTED` | 2554-2558 | Remove and fire `failed_callback`. |
| Else (OUTBOUND or SENDING) | 2560+ | Per-method retry/send branch (see below). |
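The top-of-loop checks in the table can be summarized as a small pure function. This is a paraphrase for illustration, not upstream code: `triage` and its string return values are hypothetical names, and the numeric constants are copied from the state and method tables.

```python
# Values copied from the state and method tables in this document.
OUTBOUND, SENT, DELIVERED, REJECTED, CANCELLED = 0x01, 0x04, 0x08, 0xFD, 0xFE
DIRECT, PROPAGATED = 0x02, 0x03


def triage(state, method):
    """Return (remove_from_queue, follow_up) for one queued message."""
    if state == DELIVERED:
        follow_up = "backchannel_identify" if method == DIRECT else None
        return (True, follow_up)
    if method == PROPAGATED and state == SENT:
        return (True, None)           # terminal success for PROPAGATED
    if state in (CANCELLED, REJECTED):
        return (True, "failed_callback")
    return (False, "per_method_branch")


print(triage(DELIVERED, DIRECT))      # (True, 'backchannel_identify')
print(triage(SENT, PROPAGATED))       # (True, None)
print(triage(OUTBOUND, DIRECT))       # (False, 'per_method_branch')
```

The order of the checks matters only in that the terminal cases are tested before any per-method work is considered.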
The non-terminal branch in turn switches on lxmessage.method:
OPPORTUNISTIC branch (LXMF/LXMRouter.py:2566-2592)
    if lxmessage.method == LXMessage.OPPORTUNISTIC:
        if lxmessage.delivery_attempts <= LXMRouter.MAX_DELIVERY_ATTEMPTS:
            if lxmessage.delivery_attempts >= LXMRouter.MAX_PATHLESS_TRIES \
               and not RNS.Transport.has_path(lxmessage.get_destination().hash):
                # Force a path request, defer PATH_REQUEST_WAIT seconds
                ...
                lxmessage.next_delivery_attempt = time.time() + LXMRouter.PATH_REQUEST_WAIT
            elif lxmessage.delivery_attempts == LXMRouter.MAX_PATHLESS_TRIES + 1 \
                 and RNS.Transport.has_path(...):
                # Path is known but prior attempt failed: drop_path + re-discover
                RNS.Reticulum.get_instance().drop_path(...)
                ...
                lxmessage.next_delivery_attempt = time.time() + LXMRouter.PATH_REQUEST_WAIT
            else:
                if not hasattr(lxmessage, "next_delivery_attempt") \
                   or time.time() > lxmessage.next_delivery_attempt:
                    lxmessage.delivery_attempts += 1
                    lxmessage.next_delivery_attempt = time.time() + LXMRouter.DELIVERY_RETRY_WAIT
                    lxmessage.send()
        else:
            self.fail_message(lxmessage)
Key behaviors:
- First attempt is "pathless-tolerant": if `delivery_attempts < MAX_PATHLESS_TRIES (=1)` and there's no path, the message still tries a send (relying on `handle_outbound`'s pre-emptive `path?` at `LXMF/LXMRouter.py:1675-1679`).
- After the pathless tries are exhausted, an explicit `path?` is fired and the message defers `PATH_REQUEST_WAIT (=7s)`.
- The `MAX_PATHLESS_TRIES + 1` case is the "I have a stale path that didn't deliver" recovery: `Reticulum.drop_path` evicts the bad path table entry, then a fresh `path?` is requested.
- The `else` branch is the actual retransmit: increment attempts, schedule `+ DELIVERY_RETRY_WAIT (=10s)`, fire `lxmessage.send()`.
- `fail_message` runs only after `delivery_attempts > MAX_DELIVERY_ATTEMPTS`; that is, attempts 1..5 are tried, attempt 6 trips `fail_message`.
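The branch order can be exercised without a router by reducing it to a pure function of the inputs it reads. This sketch only selects the branch; it leaves out the elided upstream lines and every side effect, and `opportunistic_branch` and its return strings are hypothetical names.

```python
MAX_DELIVERY_ATTEMPTS = 5
MAX_PATHLESS_TRIES = 1


def opportunistic_branch(attempts, has_path, now, next_attempt=None):
    """Name the branch the OPPORTUNISTIC code above would take for these inputs."""
    if attempts > MAX_DELIVERY_ATTEMPTS:
        return "fail_message"
    if attempts >= MAX_PATHLESS_TRIES and not has_path:
        return "request_path"
    if attempts == MAX_PATHLESS_TRIES + 1 and has_path:
        return "drop_path_and_rediscover"
    if next_attempt is None or now > next_attempt:
        return "send"
    return "wait"   # retry timer has not elapsed yet


print(opportunistic_branch(0, False, now=0))                    # send
print(opportunistic_branch(1, False, now=20, next_attempt=10))  # request_path
print(opportunistic_branch(2, True, now=20, next_attempt=10))   # drop_path_and_rediscover
print(opportunistic_branch(3, True, now=5, next_attempt=10))    # wait
print(opportunistic_branch(6, True, now=100, next_attempt=90))  # fail_message
```

Keeping the selection separate from the side effects makes it straightforward to table-test a reimplementation against the quoted upstream conditions.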
DIRECT branch (LXMF/LXMRouter.py:2596-2673)
Two sub-paths, decided by whether a usable link already exists in direct_links or backchannel_links:
Existing link, status == ACTIVE (line 2616-2627):
- If `state != SENDING`, set the link as the delivery destination and call `lxmessage.send()`.
- If `state == SENDING`, just log progress: the prior send is still pending its proof.
Existing link, status == CLOSED (line 2628-2647):
- If the link was previously activated (`activated_at != None`), the link died unexpectedly: issue a fresh `path?` and schedule `PATH_REQUEST_WAIT`.
- Else (link was never activated, i.e. LRPROOF never arrived on the prior attempt), retry the path request once via `path_request_retried`, then schedule `PATH_REQUEST_WAIT`.
- Either way, drop the dead link from `direct_links` / `backchannel_links` and schedule the next attempt at `+ DELIVERY_RETRY_WAIT`.
No link exists (line 2651-2670):
    if not hasattr(lxmessage, "next_delivery_attempt") \
       or time.time() > lxmessage.next_delivery_attempt:
        lxmessage.delivery_attempts += 1
        lxmessage.next_delivery_attempt = time.time() + LXMRouter.DELIVERY_RETRY_WAIT
        if lxmessage.delivery_attempts < LXMRouter.MAX_DELIVERY_ATTEMPTS:
            if RNS.Transport.has_path(lxmessage.get_destination().hash):
                delivery_link = RNS.Link(lxmessage.get_destination())
                delivery_link.set_link_established_callback(self.process_outbound)
                self.direct_links[delivery_destination_hash] = delivery_link
            else:
                RNS.Transport.request_path(lxmessage.get_destination().hash)
                lxmessage.next_delivery_attempt = time.time() + LXMRouter.PATH_REQUEST_WAIT
The set_link_established_callback(self.process_outbound) re-entry is what lets the next tick after a successful LRPROOF immediately enter the "existing ACTIVE link" branch and fire send() — see send-link-lxmf.md §2 for why this works.
fail_message runs at line 2671-2673 once delivery_attempts > MAX_DELIVERY_ATTEMPTS.
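The no-link code quoted above can be traced with a pure function that returns the chosen action together with the updated counters. The function name, the tuple shape, and the action strings are this sketch's assumptions; the comparisons follow the quoted excerpt.

```python
MAX_DELIVERY_ATTEMPTS = 5
DELIVERY_RETRY_WAIT = 10   # seconds
PATH_REQUEST_WAIT = 7      # seconds


def direct_no_link_step(attempts, has_path, now, next_attempt=None):
    """Return (action, attempts, next_attempt) for one pass with no link present."""
    if next_attempt is not None and now <= next_attempt:
        return ("wait", attempts, next_attempt)   # retry timer still running
    attempts += 1
    next_attempt = now + DELIVERY_RETRY_WAIT
    action = "none"
    if attempts < MAX_DELIVERY_ATTEMPTS:
        if has_path:
            action = "establish_link"
        else:
            action = "request_path"
            next_attempt = now + PATH_REQUEST_WAIT
    return (action, attempts, next_attempt)


print(direct_no_link_step(0, True, now=100))                     # ('establish_link', 1, 110)
print(direct_no_link_step(0, False, now=100))                    # ('request_path', 1, 107)
print(direct_no_link_step(1, True, now=105, next_attempt=110))   # ('wait', 1, 110)
print(direct_no_link_step(4, True, now=200, next_attempt=150))   # ('none', 5, 210)
```

Because the inner check in the excerpt uses `<` rather than `<=`, the pass that raises delivery_attempts to MAX_DELIVERY_ATTEMPTS returns no action in this model: the counter advances but no link is opened.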
PROPAGATED branch (LXMF/LXMRouter.py:2677-2730)
Structurally mirrors DIRECT but against outbound_propagation_link / outbound_propagation_node instead of per-recipient direct links. Two early failures:
- `outbound_propagation_node == None` → immediate `fail_message` (line 2680-2682). LXMF will not attempt PROPAGATED without an explicitly configured node: there is no automatic fallback from DIRECT/OPPORTUNISTIC to PROPAGATED. Sideband configures one via `LXMRouter.set_outbound_propagation_node` at startup; a clean-room client must do the same before the user picks PROPAGATED.
- All `MAX_DELIVERY_ATTEMPTS` exhausted → `fail_message` (line 2728-2730).
Otherwise the link-state branching is identical to DIRECT: ACTIVE → send / CLOSED → drop and retry / no-link → establish-or-path-request.
The terminal transition: fail_message
LXMF/LXMRouter.py:2395-2402:
    def fail_message(self, lxmessage):
        RNS.log(str(lxmessage)+" failed to send", RNS.LOG_DEBUG)
        lxmessage.progress = 0.0
        if lxmessage in self.pending_outbound: self.pending_outbound.remove(lxmessage)
        if lxmessage.state != LXMessage.REJECTED: lxmessage.state = LXMessage.FAILED
        if lxmessage.failed_callback != None and callable(lxmessage.failed_callback):
            lxmessage.failed_callback(lxmessage)
A few non-obvious properties:
- `REJECTED` is preserved when present (the receiver explicitly rejected; don't overwrite the reason).
- The message is removed from `pending_outbound` synchronously; the `failed_callback` fires on the same thread as `process_outbound`. Callbacks must not block.
- There is no automatic re-queue or method change on FAIL. A failed DIRECT message does not get re-tried as PROPAGATED. Apps that want that fallback have to implement it themselves on top of the `failed_callback`.
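An app-level fallback of the kind described in the last point might look like the following. It is a sketch under assumptions: `fallback_queue`, `on_failed`, `drain_once`, the `fallback_tried` marker, and the `resubmit` callable are hypothetical app-side names, and a SimpleNamespace stands in for an LXMessage so the example runs without LXMF installed. The callback only enqueues and returns, since it runs on the router's thread.

```python
import queue
from types import SimpleNamespace

DIRECT, PROPAGATED = 0x02, 0x03   # method values from the delivery-method enum

fallback_queue = queue.Queue()    # drained by an app-owned worker thread


def on_failed(message):
    """failed_callback: record the fallback request and return immediately."""
    if message.method == DIRECT and not getattr(message, "fallback_tried", False):
        message.fallback_tried = True              # hypothetical app-side marker
        fallback_queue.put((message, PROPAGATED))


def drain_once(resubmit):
    """Run on the app's own thread; resubmit(message, method) is app-supplied."""
    while True:
        try:
            message, method = fallback_queue.get_nowait()
        except queue.Empty:
            return
        resubmit(message, method)


demo = SimpleNamespace(method=DIRECT)   # stand-in for a failed DIRECT message
on_failed(demo)
on_failed(demo)                         # a second failure is ignored
resubmitted = []
drain_once(lambda message, method: resubmitted.append(method))
print(resubmitted)                      # [3]
```

A real `resubmit` would build a new message with the PROPAGATED method and hand it to the router, which requires an outbound propagation node to be configured beforehand.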
What does NOT happen
These are common assumptions that don't match upstream behavior. Listed here so reimplementers don't trust their intuition:
- No automatic DIRECT→PROPAGATED fallback: see PROPAGATED branch above. The user (or app) chose `desired_method` at message construction time; LXMF never overrides it on failure.
- No exponential backoff: `DELIVERY_RETRY_WAIT = 10s` is constant across attempts 1..5.
- No persistence of `pending_outbound` to disk by default: pending outbound messages live in process memory. An LXMRouter restart drops them. (Sideband persists messages at the app level, not via LXMRouter.)
- `MESSAGE_EXPIRY` is not a per-message send timeout. It governs the propagation-node store (how long the node retains a message for offline pickup); it does not bound how long a single sender will keep retrying. The retry loop bounds itself via `MAX_DELIVERY_ATTEMPTS`, which at ~10s per attempt is ~50 seconds, not 30 days.
- `SENT` is not `DELIVERED`. PROPAGATED reaches `SENT` after the propagation node accepts the message; the recipient may pick it up minutes, hours, or days later. There is no end-to-end delivery proof for PROPAGATED messages until the recipient comes online and emits it (see `send-propagated-lxmf.md` §6).
- Path-request preamble is OPPORTUNISTIC-only at submit time. `handle_outbound` only fires the pre-emptive `path?` when `lxmessage.method == OPPORTUNISTIC` (`LXMF/LXMRouter.py:1675`). DIRECT and PROPAGATED rely on `process_outbound`'s no-link branch to discover the path on the first tick.
Source map
| Concern | File | Function / line |
|---|---|---|
| Class constants | `LXMF/LXMRouter.py` | 30-83 |
| Job interval table | `LXMF/LXMRouter.py` | 852-859 |
| `jobs` dispatcher | `LXMF/LXMRouter.py` | 860-887 |
| `jobloop` daemon | `LXMF/LXMRouter.py` | 889-899 |
| Pre-emptive path request on submit | `LXMF/LXMRouter.py` | 1675-1679 |
| `handle_outbound` thread kick | `LXMF/LXMRouter.py` | 1691 |
| `process_outbound` entry + lock | `LXMF/LXMRouter.py` | 2513-2515 |
| Terminal-state branches (DELIVERED / SENT-PROPAGATED / CANCELLED / REJECTED) | `LXMF/LXMRouter.py` | 2517-2558 |
| OPPORTUNISTIC retry branch | `LXMF/LXMRouter.py` | 2566-2592 |
| DIRECT retry branch | `LXMF/LXMRouter.py` | 2596-2673 |
| PROPAGATED retry branch | `LXMF/LXMRouter.py` | 2677-2730 |
| `fail_message` | `LXMF/LXMRouter.py` | 2395-2402 |
| Message states | `LXMF/LXMessage.py` | 13-22 |
| Delivery methods | `LXMF/LXMessage.py` | 29-33 |