Diffstat (limited to 'news/phase4-nat-traversal/index.html')
| -rw-r--r-- | news/phase4-nat-traversal/index.html | 228 |
1 files changed, 228 insertions, 0 deletions
diff --git a/news/phase4-nat-traversal/index.html b/news/phase4-nat-traversal/index.html new file mode 100644 index 0000000..1d7748b --- /dev/null +++ b/news/phase4-nat-traversal/index.html @@ -0,0 +1,228 @@ +<!DOCTYPE html> +<html lang="en"> +<head> + <meta charset="utf-8"> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <title>Phase 4: Punching Through NATs — Tesseras</title> + <meta name="description" content="Tesseras nodes can now discover their NAT type via STUN, coordinate UDP hole punching through introducers, and fall back to transparent relay forwarding when direct connectivity fails."> + <!-- Open Graph --> + <meta property="og:type" content="article"> + <meta property="og:title" content="Phase 4: Punching Through NATs"> + <meta property="og:description" content="Tesseras nodes can now discover their NAT type via STUN, coordinate UDP hole punching through introducers, and fall back to transparent relay forwarding when direct connectivity fails."> + <meta property="og:image" content="https://tesseras.net/images/social.jpg"> + <meta property="og:image:width" content="1200"> + <meta property="og:image:height" content="630"> + <meta property="og:site_name" content="Tesseras"> + <!-- Twitter Card --> + <meta name="twitter:card" content="summary_large_image"> + <meta name="twitter:title" content="Phase 4: Punching Through NATs"> + <meta name="twitter:description" content="Tesseras nodes can now discover their NAT type via STUN, coordinate UDP hole punching through introducers, and fall back to transparent relay forwarding when direct connectivity fails."> + <meta name="twitter:image" content="https://tesseras.net/images/social.jpg"> + <link rel="stylesheet" href="https://tesseras.net/style.css?h=21f0f32121928ee5c690"> + + + <link rel="alternate" type="application/atom+xml" title="Tesseras" href="https://tesseras.net/atom.xml"> + + + <link rel="icon" type="image/png" sizes="32x32" 
href="https://tesseras.net/images/favicon.png?h=be4e123a23393b1a027d"> + +</head> +<body> + <header> + <h1> + <a href="https://tesseras.net/"> + <img src="https://tesseras.net/images/logo-64.png?h=c1b8d0c4c5f93b49d40b" alt="Tesseras" width="40" height="40" class="logo"> + Tesseras + </a> + </h1> + <nav> + + <a href="https://tesseras.net/about/">About</a> + <a href="https://tesseras.net/news/">News</a> + <a href="https://tesseras.net/releases/">Releases</a> + <a href="https://tesseras.net/faq/">FAQ</a> + <a href="https://tesseras.net/subscriptions/">Subscriptions</a> + <a href="https://tesseras.net/contact/">Contact</a> + + </nav> + <nav class="lang-switch"> + + <strong>English</strong> | <a href="/pt-br/news/phase4-nat-traversal/">Português</a> + + </nav> + </header> + + <main> + +<article> + <h2>Phase 4: Punching Through NATs</h2> + <p class="news-date">2026-02-15</p> + <p>Most people's devices sit behind a NAT — a network address translator that lets +them reach the internet but blocks unsolicited incoming connections. For a P2P network, +this is an existential problem: if two nodes behind NATs can't talk to each +other, the network fragments. Phase 4 continues with a full NAT traversal stack: +STUN-based discovery, coordinated hole punching, and relay fallback.</p> +<p>The approach follows the same pattern as most battle-tested P2P systems (WebRTC, +BitTorrent, IPFS): try the cheapest option first and escalate only when necessary. +Direct connectivity costs nothing. Hole punching costs a few coordinated +packets. Relaying costs sustained bandwidth from a third party. Tesseras tries +them in that order.</p> +<h2 id="what-was-built">What was built</h2> +<p><strong>NatType classification</strong> (<code>tesseras-core/src/network.rs</code>) — A new <code>NatType</code> +enum (Public, Cone, Symmetric, Unknown) added to the core domain layer.
This +type is shared across the entire stack: the STUN client writes it, the DHT +advertises it in Pong messages, and the punch coordinator reads it to decide +whether hole punching is even worth attempting (Cone-to-Cone works ~80% of the +time; Symmetric-to-Symmetric almost never works).</p> +<p><strong>STUN client</strong> (<code>tesseras-net/src/stun.rs</code>) — A minimal STUN implementation +(RFC 5389 Binding Request/Response) that discovers a node's external address. +The codec encodes 20-byte binding requests with a random transaction ID and +decodes XOR-MAPPED-ADDRESS responses. The <code>discover_nat()</code> function queries +multiple STUN servers in parallel (Google, Cloudflare by default), compares the +mapped addresses, and classifies the NAT type:</p> +<ul> +<li>Every server reports the node's own local IP and port → <strong>Public</strong> (no NAT)</li> +<li>Every server reports the same mapped address, which differs from the local one → <strong>Cone</strong> (hole punching works)</li> +<li>Different mapped addresses → <strong>Symmetric</strong> (hole punching unreliable)</li> +<li>No responses → <strong>Unknown</strong></li> +</ul> +<p>Queries retry with exponential backoff and configurable timeouts. 12 tests cover +codec roundtrips, all classification paths, and async loopback queries.</p> +<p><strong>Signed punch coordination</strong> (<code>tesseras-net/src/punch.rs</code>) — Ed25519 signing +and verification for <code>PunchIntro</code>, <code>RelayRequest</code>, and <code>RelayMigrate</code> messages. +Every introduction is signed by the initiator and carries a timestamp that must fall +within a 30-second window. This prevents reflection attacks, where an attacker forges or +replays an introduction to redirect traffic. The payload format is <code>target || external_addr || timestamp</code> +— changing any field invalidates the signature.
6 unit tests plus 3 +property-based tests with proptest (arbitrary node IDs, ports, and session +tokens).</p> +<p><strong>Relay session manager</strong> (<code>tesseras-net/src/relay.rs</code>) — Manages transparent +UDP relay sessions between NATed peers. Each session has a random 16-byte token; +peers prefix their packets with the token, and the relay strips it and forwards the rest. +Features:</p> +<ul> +<li>Bidirectional forwarding (A→R→B and B→R→A)</li> +<li>Rate limiting: 256 KB/s for reciprocal peers, 64 KB/s for non-reciprocal</li> +<li>10-minute maximum duration for bootstrap (non-reciprocal) sessions</li> +<li>Address migration: when a peer's IP changes (Wi-Fi to cellular), a signed +<code>RelayMigrate</code> updates the session without tearing it down</li> +<li>Idle cleanup with configurable timeout</li> +<li>8 unit tests plus 2 property-based tests</li> +</ul> +<p><strong>DHT message extensions</strong> (<code>tesseras-dht/src/message.rs</code>) — Seven new message +variants added to the DHT protocol:</p> +<table><thead><tr><th>Message</th><th>Purpose</th></tr></thead><tbody> +<tr><td><code>PunchIntro</code></td><td>"I want to connect to node X, here's my signed external address"</td></tr> +<tr><td><code>PunchRequest</code></td><td>Introducer forwards the request to the target</td></tr> +<tr><td><code>PunchReady</code></td><td>Target confirms readiness and sends its external address</td></tr> +<tr><td><code>RelayRequest</code></td><td>"Create a relay session to node X"</td></tr> +<tr><td><code>RelayOffer</code></td><td>Relay responds with its address and session token</td></tr> +<tr><td><code>RelayClose</code></td><td>Tear down a relay session</td></tr> +<tr><td><code>RelayMigrate</code></td><td>Update a session after a network change</td></tr> +</tbody></table> +<p>The <code>Pong</code> message was extended with NAT metadata: <code>nat_type</code>, +<code>relay_slots_available</code>, and <code>relay_bandwidth_used_kbps</code>.
All new fields use +<code>#[serde(default)]</code> for backward compatibility — old nodes ignore what they +don't recognize, new nodes fall back to defaults. 9 new serialization roundtrip +tests.</p> +<p><strong>NatHandler trait and dispatch</strong> (<code>tesseras-dht/src/engine.rs</code>) — A new +<code>NatHandler</code> async trait (5 methods) injected into the DHT engine, following the +same dependency injection pattern as the existing <code>ReplicationHandler</code>. The +engine's message dispatch loop now routes all punch/relay messages to the +handler. This keeps the DHT engine protocol-agnostic while allowing the NAT +traversal logic to live in <code>tesseras-net</code>.</p> +<p><strong>Mobile reconnection types</strong> (<code>tesseras-embedded/src/reconnect.rs</code>) — A +three-phase reconnection state machine for mobile devices:</p> +<ol> +<li><strong>QuicMigration</strong> (0-2s) — try QUIC connection migration for all active peers</li> +<li><strong>ReStun</strong> (2-5s) — re-discover the external address via STUN</li> +<li><strong>ReEstablish</strong> (5-10s) — reconnect peers that migration couldn't save</li> +</ol> +<p>Peers are reconnected in priority order: bootstrap nodes first, then nodes +holding our fragments, then nodes whose fragments we hold, then general DHT +neighbors. A new <code>NetworkChanged</code> event variant was added to the FFI event +stream so the Flutter app can show reconnection progress.</p> +<p><strong>Daemon NAT configuration</strong> (<code>tesd/src/config.rs</code>) — A new <code>[nat]</code> section in +the TOML config with a STUN server list, a relay toggle, a maximum number of relay sessions, +bandwidth limits (reciprocal vs bootstrap), and an idle timeout.
All fields have +sensible defaults; relay is disabled by default.</p> +<p><strong>Prometheus metrics</strong> (<code>tesseras-net/src/metrics.rs</code>) — 16 metrics across four +subsystems:</p> +<ul> +<li><strong>STUN</strong>: requests, failures, latency histogram</li> +<li><strong>Punch</strong>: attempts/successes/failures (by NAT type pair), latency histogram</li> +<li><strong>Relay</strong>: active sessions, total sessions, bytes forwarded, idle timeouts, +rate limit hits</li> +<li><strong>Reconnect</strong>: network changes, attempts/successes by phase, duration +histogram</li> +</ul> +<p>6 tests verifying registration, increment, label cardinality, and +double-registration detection.</p> +<p><strong>Integration tests</strong> — Two end-to-end tests using <code>MemTransport</code> (in-memory +simulated network):</p> +<ul> +<li><code>punch_integration.rs</code> — Full 3-node hole-punch flow: A sends signed +<code>PunchIntro</code> to introducer I, I verifies and forwards <code>PunchRequest</code> to B, B +verifies the original signature and sends <code>PunchReady</code> back, A and B exchange +messages directly. 
Also tests that a bad signature is correctly rejected.</li> +<li><code>relay_integration.rs</code> — Full 3-node relay flow: A requests a relay from R, R +creates a session and sends <code>RelayOffer</code> to both peers, A and B exchange +token-prefixed packets through R, A migrates to a new address mid-session, A +closes the session, and the test verifies the session is torn down and further +forwarding fails.</li> +</ul> +<p><strong>Property tests</strong> — 7 proptest-based tests covering: signature round-trips for +all three signed message types (arbitrary node IDs, ports, tokens), NAT +classification determinism (the same inputs always produce the same output), STUN +binding request validity, session token uniqueness, and relay rejection of +too-short packets.</p> +<p><strong>Justfile targets</strong> — <code>just test-nat</code> runs all NAT traversal tests across +<code>tesseras-net</code> and <code>tesseras-dht</code>. <code>just test-chaos</code> is a placeholder for future +Docker Compose chaos tests with <code>tc netem</code>.</p> +<h2 id="architecture-decisions">Architecture decisions</h2> +<ul> +<li><strong>STUN over TURN</strong>: we implement STUN (discovery) and a custom relay rather than +full TURN. TURN requires authenticated allocation and is designed for media +relay; our relay is simpler — token-prefixed UDP forwarding with rate limits. +This keeps the protocol minimal and avoids depending on external TURN servers.</li> +<li><strong>Signatures on introductions</strong>: every <code>PunchIntro</code> is signed by the +initiator. Without this, an attacker could send forged introductions to +redirect a node's hole-punch attempts to an attacker-controlled address (a +reflection attack). The 30-second timestamp window limits replay.</li> +<li><strong>Reciprocal bandwidth tiers</strong>: relay nodes give 4x more bandwidth (256 vs 64 +KB/s) to peers with good reciprocity scores.
This incentivizes nodes to store +fragments for others — if you contribute, you get better relay service when +you need it.</li> +<li><strong>Backward-compatible Pong extension</strong>: new NAT fields in <code>Pong</code> use +<code>#[serde(default)]</code> and <code>Option&lt;T&gt;</code>. Old nodes that don't understand these +fields simply skip them during deserialization. No protocol version bump +needed.</li> +<li><strong>NatHandler as async trait</strong>: the NAT traversal logic is injected into the +DHT engine via a trait, just like <code>ReplicationHandler</code>. This keeps the DHT +engine focused on routing and peer management, and allows the NAT +implementation to be swapped or disabled without touching core DHT code.</li> +</ul> +<h2 id="what-comes-next">What comes next</h2> +<ul> +<li><strong>Phase 4 continued</strong> — performance tuning (connection pooling, fragment +caching, SQLite WAL), security audits, institutional node onboarding, OS +packaging</li> +<li><strong>Phase 5: Exploration and Culture</strong> — public tessera browser by +era/location/theme/language, institutional curation, genealogy integration, +physical media export (M-DISC, microfilm, acid-free paper with QR)</li> +</ul> +<p>With NAT traversal, Tesseras can connect nodes regardless of their network +topology. Public nodes talk directly. Cone-NATed nodes punch through with an +introducer's help. Symmetric-NATed or firewalled nodes relay through willing +peers. The network adapts to the real world, where most devices are behind a NAT +and network conditions change constantly.</p> + +</article> + + </main> + + <footer> + <p>© 2026 Tesseras Project. <a href="/atom.xml">News Feed</a> · <a href="https://git.sr.ht/~ijanc/tesseras">Source</a></p> + </footer> +</body> +</html> |
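Two mechanisms from the post are small enough to sketch in std-only Rust: the STUN-based NAT classification and the relay's token-prefixed framing. This is a minimal sketch under stated assumptions, not the code in `tesseras-net`. The names `classify_nat`, `split_relay_frame`, and `TOKEN_LEN` are hypothetical. The classifier assumes that "Public" is detected by comparing each server's mapped address with the node's own socket address. The parser assumes the 16-byte session token sits at the very start of each datagram.

```rust
use std::net::SocketAddr;

/// Local stand-in for the `NatType` enum described for `tesseras-core`;
/// the real definition may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatType {
    Public,
    Cone,
    Symmetric,
    Unknown,
}

/// Hypothetical classifier. `mapped` holds the XOR-MAPPED-ADDRESS reported by
/// each STUN server that answered; servers that timed out are simply absent.
/// Assumption: "Public" means the mapping equals the node's own socket address.
pub fn classify_nat(local: SocketAddr, mapped: &[SocketAddr]) -> NatType {
    let Some(first) = mapped.first() else {
        // No STUN server answered at all.
        return NatType::Unknown;
    };
    if mapped.iter().any(|addr| addr != first) {
        // Different servers saw different external mappings.
        return NatType::Symmetric;
    }
    if *first == local {
        // Every server saw the node's own address: no translation happening.
        NatType::Public
    } else {
        // One stable external mapping shared by all servers.
        NatType::Cone
    }
}

/// Length of the random per-session token described in the post.
pub const TOKEN_LEN: usize = 16;

/// Hypothetical relay-side parser. Assumes each datagram is laid out as
/// `token (16 bytes) || payload` and returns `None` for too-short packets.
pub fn split_relay_frame(datagram: &[u8]) -> Option<([u8; TOKEN_LEN], &[u8])> {
    if datagram.len() < TOKEN_LEN {
        return None;
    }
    let (token_bytes, payload) = datagram.split_at(TOKEN_LEN);
    let mut token = [0u8; TOKEN_LEN];
    token.copy_from_slice(token_bytes);
    // A real relay would look up the session by `token` and forward `payload`
    // to the other peer; that lookup is out of scope for this sketch.
    Some((token, payload))
}
```

Take a local address of `192.168.1.10:4433`. If two servers both report `203.0.113.7:51000`, the result is `Cone`. If one reports port `51000` and the other `51001`, the result is `Symmetric`. Returning `None` for datagrams shorter than the token corresponds to the "relay rejection of too-short packets" property listed among the tests.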