Phase 4: Punching Through NATs
2026-02-15

Most people's devices sit behind a NAT, a network address translator that lets
them reach the internet but prevents incoming connections. For a P2P network,
this is an existential problem: if two nodes behind NATs can't talk to each
other, the network fragments. Phase 4 continues with a full NAT traversal stack:
STUN-based discovery, coordinated hole punching, and relay fallback.

The approach follows the same pattern as most battle-tested P2P systems (WebRTC,
BitTorrent, IPFS): try the cheapest option first, escalate only when necessary.
Direct connectivity costs nothing. Hole punching costs a few coordinated
packets. Relaying costs sustained bandwidth from a third party. Tesseras tries
them in that order.

What was built

NatType classification (tesseras-core/src/network.rs): a new NatType
enum (Public, Cone, Symmetric, Unknown) added to the core domain layer. This
type is shared across the entire stack: the STUN client writes it, the DHT
advertises it in Pong messages, and the punch coordinator reads it to decide
whether hole punching is even worth attempting (Cone-to-Cone works ~80% of the
time; Symmetric-to-Symmetric almost never works).

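To make the "cheapest option first" decision concrete, here is a minimal sketch of how a coordinator could map a pair of NAT types to a strategy. The variant names of NatType come from the text above; the Strategy enum, the choose_strategy function, and the treatment of mixed pairs (for example Cone-to-Symmetric) are assumptions made for illustration, not the actual Tesseras code.

```rust
/// NAT classification; variant names are the ones listed in the post.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatType {
    Public,
    Cone,
    Symmetric,
    Unknown,
}

/// Hypothetical escalation ladder: direct, then hole punch, then relay.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Direct,
    HolePunch,
    Relay,
}

/// Illustrative policy only; the real punch coordinator may weigh more inputs.
pub fn choose_strategy(local: NatType, remote: NatType) -> Strategy {
    match (local, remote) {
        // If either side is publicly reachable, the other side can dial it.
        (NatType::Public, _) | (_, NatType::Public) => Strategy::Direct,
        // Cone-to-Cone is the case the post calls worth attempting.
        (NatType::Cone, NatType::Cone) => Strategy::HolePunch,
        // Assumption: anything involving Symmetric or Unknown skips straight
        // to the relay instead of spending packets on an unlikely punch.
        _ => Strategy::Relay,
    }
}
```

Under these assumptions, choose_strategy(NatType::Cone, NatType::Cone) selects a hole punch, while a Symmetric-to-Symmetric pair goes directly to a relay.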
STUN client (tesseras-net/src/stun.rs): a minimal STUN implementation
(RFC 5389 Binding Request/Response) that discovers a node's external address.
The codec encodes 20-byte binding requests with a random transaction ID and
decodes XOR-MAPPED-ADDRESS responses. The discover_nat() function queries
multiple STUN servers in parallel (Google, Cloudflare by default), compares the
mapped addresses, and classifies the NAT type:

- Same IP and port from all servers → Public (no NAT)
- Same mapped address from all servers → Cone (hole punching works)
- Different mapped addresses → Symmetric (hole punching unreliable)
- No responses → Unknown

Retries with exponential backoff and configurable timeouts. 12 tests covering
codec roundtrips, all classification paths, and async loopback queries.

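The codec and the classification rules can be sketched with the standard library alone. The wire layout (message type 0x0001, magic cookie 0x2112A442, XOR of port and address with the cookie) follows RFC 5389. The function names are hypothetical, the transaction ID is passed in by the caller rather than generated randomly, and distinguishing Public from Cone by comparing the mapped address with the local socket address is an assumption about how the first two rules are told apart.

```rust
use std::net::{Ipv4Addr, SocketAddr, SocketAddrV4};

const MAGIC_COOKIE: u32 = 0x2112_A442;

/// Encode a 20-byte Binding Request header (RFC 5389, section 6).
pub fn encode_binding_request(txn_id: [u8; 12]) -> [u8; 20] {
    let mut buf = [0u8; 20];
    buf[0..2].copy_from_slice(&0x0001u16.to_be_bytes()); // type: Binding Request
    // Bytes 2..4 stay zero: message length is 0 because there are no attributes.
    buf[4..8].copy_from_slice(&MAGIC_COOKIE.to_be_bytes());
    buf[8..20].copy_from_slice(&txn_id);
    buf
}

/// Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute (section 15.2).
pub fn decode_xor_mapped_v4(value: &[u8]) -> Option<SocketAddr> {
    if value.len() < 8 || value[1] != 0x01 {
        return None; // too short, or not the IPv4 address family
    }
    let port = u16::from_be_bytes([value[2], value[3]]) ^ (MAGIC_COOKIE >> 16) as u16;
    let raw = u32::from_be_bytes([value[4], value[5], value[6], value[7]]) ^ MAGIC_COOKIE;
    Some(SocketAddr::V4(SocketAddrV4::new(Ipv4Addr::from(raw), port)))
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NatType {
    Public,
    Cone,
    Symmetric,
    Unknown,
}

/// Classify from the mapped addresses reported by each STUN server.
pub fn classify(local: SocketAddr, mapped: &[SocketAddr]) -> NatType {
    match mapped.split_first() {
        None => NatType::Unknown, // no server answered
        Some((first, rest)) => {
            if rest.iter().any(|m| m != first) {
                NatType::Symmetric // mapping depends on the destination
            } else if *first == local {
                NatType::Public // assumption: no translation means no NAT
            } else {
                // Note: with a single response this branch cannot rule out
                // a symmetric NAT; the sketch ignores that caveat.
                NatType::Cone
            }
        }
    }
}
```

The decoder can be checked against the sample in RFC 5769, where the attribute value 00 01 a1 47 e1 12 a6 43 corresponds to 192.0.2.1:32853.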
Signed punch coordination (tesseras-net/src/punch.rs): Ed25519 signing
and verification for PunchIntro, RelayRequest, and RelayMigrate messages.
Every introduction is signed by the initiator with a 30-second timestamp window,
preventing reflection attacks (where an attacker replays an old introduction to
redirect traffic). The payload format is target || external_addr || timestamp,
so changing any field invalidates the signature. 6 unit tests plus 3
property-based tests with proptest (arbitrary node IDs, ports, and session
tokens).

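The following sketch shows the two pieces that do not need a cryptography library: building the bytes that get signed, and checking the timestamp window. The field order comes from the payload format above. The 32-byte node ID, the big-endian encodings, the function names, and the symmetric treatment of clock skew are all assumptions; the Ed25519 signing and verification step itself is left out.

```rust
use std::net::{IpAddr, SocketAddr};

/// Maximum accepted age of an introduction, per the post.
const WINDOW_SECS: u64 = 30;

/// Build target || external_addr || timestamp as a byte string to be signed.
/// Encodings here are illustrative, not the real wire format.
pub fn punch_intro_payload(target: &[u8; 32], external: SocketAddr, timestamp: u64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(32 + 16 + 2 + 8);
    buf.extend_from_slice(target);
    match external.ip() {
        IpAddr::V4(ip) => buf.extend_from_slice(&ip.octets()),
        IpAddr::V6(ip) => buf.extend_from_slice(&ip.octets()),
    }
    buf.extend_from_slice(&external.port().to_be_bytes());
    buf.extend_from_slice(&timestamp.to_be_bytes());
    buf
}

/// Accept a timestamp only if it is within the window of the local clock.
/// Assumption: skew is tolerated in both directions.
pub fn is_fresh(timestamp: u64, now: u64) -> bool {
    now.abs_diff(timestamp) <= WINDOW_SECS
}
```

Because every field feeds into the signed bytes, an attacker who swaps the external address or reuses an introduction after the window has passed ends up with either a signature mismatch or a stale timestamp.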
Relay session manager (tesseras-net/src/relay.rs): manages transparent
UDP relay sessions between NATed peers. Each session has a random 16-byte token;
peers prefix their packets with the token, the relay strips it and forwards.
Features:

- Bidirectional forwarding (A→R→B and B→R→A)
- Rate limiting: 256 KB/s for reciprocal peers, 64 KB/s for non-reciprocal
- 10-minute maximum duration for bootstrap (non-reciprocal) sessions
- Address migration: when a peer's IP changes (Wi-Fi to cellular), a signed
  RelayMigrate updates the session without tearing it down
- Idle cleanup with configurable timeout
- 8 unit tests plus 2 property-based tests

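The token-prefix forwarding described above can be sketched as a lookup table keyed by the session token. The type and method names (Session, RelayTable, route) are hypothetical, the token is supplied by the caller instead of being generated randomly, and rate limiting, expiry, and migration are omitted.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

pub const TOKEN_LEN: usize = 16;

/// The two endpoints of one relay session (illustrative shape).
pub struct Session {
    pub peer_a: SocketAddr,
    pub peer_b: SocketAddr,
}

#[derive(Default)]
pub struct RelayTable {
    sessions: HashMap<[u8; TOKEN_LEN], Session>,
}

impl RelayTable {
    pub fn insert(&mut self, token: [u8; TOKEN_LEN], session: Session) {
        self.sessions.insert(token, session);
    }

    /// Strip the token and return (destination, payload). Returns None when
    /// the packet is too short, the token is unknown, or the sender is not
    /// one of the two peers in the session.
    pub fn route<'a>(&self, from: SocketAddr, packet: &'a [u8]) -> Option<(SocketAddr, &'a [u8])> {
        if packet.len() < TOKEN_LEN {
            return None;
        }
        let mut token = [0u8; TOKEN_LEN];
        token.copy_from_slice(&packet[..TOKEN_LEN]);
        let session = self.sessions.get(&token)?;
        let payload = &packet[TOKEN_LEN..];
        if from == session.peer_a {
            Some((session.peer_b, payload)) // A→R→B
        } else if from == session.peer_b {
            Some((session.peer_a, payload)) // B→R→A
        } else {
            None
        }
    }
}
```

The relay never needs to inspect the payload: the token alone selects the session, and the sender's address selects the direction.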
DHT message extensions (tesseras-dht/src/message.rs): seven new message
variants added to the DHT protocol:

| Message | Purpose |
|---|---|
| PunchIntro | "I want to connect to node X, here's my signed external address" |
| PunchRequest | Introducer forwards the request to the target |
| PunchReady | Target confirms readiness, sends its external address |
| RelayRequest | "Create a relay session to node X" |
| RelayOffer | Relay responds with its address and session token |
| RelayClose | Tear down a relay session |
| RelayMigrate | Update session after network change |

The Pong message was extended with NAT metadata: nat_type,
relay_slots_available, and relay_bandwidth_used_kbps. All new fields use
#[serde(default)] for backward compatibility: old nodes ignore what they
don't recognize, new nodes fall back to defaults. 9 new serialization roundtrip
tests.

NatHandler trait and dispatch (tesseras-dht/src/engine.rs): a new
NatHandler async trait (5 methods) injected into the DHT engine, following the
same dependency injection pattern as the existing ReplicationHandler. The
engine's message dispatch loop now routes all punch/relay messages to the
handler. This keeps the DHT engine protocol-agnostic while allowing the NAT
traversal logic to live in tesseras-net.

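A simplified sketch of the injection pattern is shown below. It is deliberately not the real interface: the actual trait is async and has five methods, whereas this version is synchronous and collapses them into two hypothetical ones (on_punch and on_relay). The Engine struct, the Ping variant, and the RecordingHandler test double are likewise assumptions. Only the seven punch/relay variant names come from the post.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Message {
    Ping, // stand-in for the existing core DHT messages
    PunchIntro,
    PunchRequest,
    PunchReady,
    RelayRequest,
    RelayOffer,
    RelayClose,
    RelayMigrate,
}

/// Hypothetical, synchronous stand-in for the NatHandler trait.
pub trait NatHandler {
    fn on_punch(&mut self, msg: &Message);
    fn on_relay(&mut self, msg: &Message);
}

/// The engine owns routing; NAT logic is supplied from outside.
pub struct Engine<H: NatHandler> {
    pub nat: H,
    pub pings_seen: u32,
}

impl<H: NatHandler> Engine<H> {
    pub fn dispatch(&mut self, msg: Message) {
        match msg {
            // Core DHT traffic is handled by the engine itself.
            Message::Ping => self.pings_seen += 1,
            // Everything NAT-related is delegated to the injected handler.
            Message::PunchIntro | Message::PunchRequest | Message::PunchReady => {
                self.nat.on_punch(&msg)
            }
            Message::RelayRequest
            | Message::RelayOffer
            | Message::RelayClose
            | Message::RelayMigrate => self.nat.on_relay(&msg),
        }
    }
}

/// Test double that records what it was asked to handle.
#[derive(Default)]
pub struct RecordingHandler {
    pub punch: Vec<Message>,
    pub relay: Vec<Message>,
}

impl NatHandler for RecordingHandler {
    fn on_punch(&mut self, msg: &Message) {
        self.punch.push(msg.clone());
    }
    fn on_relay(&mut self, msg: &Message) {
        self.relay.push(msg.clone());
    }
}
```

Swapping RecordingHandler for a no-op implementation would disable NAT traversal without changing the dispatch code, which is the property the trait boundary is meant to provide.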
Mobile reconnection types (tesseras-embedded/src/reconnect.rs): a
three-phase reconnection state machine for mobile devices:

- QuicMigration (0-2s): try QUIC connection migration for all active peers
- ReStun (2-5s): re-discover external address via STUN
- ReEstablish (5-10s): reconnect peers that migration couldn't save

Peers are reconnected in priority order: bootstrap nodes first, then nodes
holding our fragments, then nodes whose fragments we hold, then general DHT
neighbors. A new NetworkChanged event variant was added to the FFI event
stream so the Flutter app can show reconnection progress.

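The phase windows and the priority order can be sketched as two small pure functions. The phase names and time ranges are taken from the list above; treating the ranges as half-open intervals driven purely by elapsed time, and the names phase_at, PeerPriority, and reconnect_order, are assumptions for illustration. The real state machine may also advance on success or failure of a phase.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReconnectPhase {
    QuicMigration,
    ReStun,
    ReEstablish,
}

/// Map time since the network change to a phase. Returns None once the
/// 10-second budget is spent. Boundaries are treated as half-open.
pub fn phase_at(elapsed: Duration) -> Option<ReconnectPhase> {
    match elapsed.as_secs() {
        0..=1 => Some(ReconnectPhase::QuicMigration),
        2..=4 => Some(ReconnectPhase::ReStun),
        5..=9 => Some(ReconnectPhase::ReEstablish),
        _ => None,
    }
}

/// Declaration order doubles as reconnection priority via the derived Ord.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum PeerPriority {
    Bootstrap,
    HoldsOurFragments,
    WeHoldTheirFragments,
    DhtNeighbor,
}

/// Return peer names sorted by priority; ties keep their input order
/// because sort_by_key is a stable sort.
pub fn reconnect_order(mut peers: Vec<(String, PeerPriority)>) -> Vec<String> {
    peers.sort_by_key(|entry| entry.1);
    peers.into_iter().map(|entry| entry.0).collect()
}
```

Deriving Ord on the priority enum keeps the ordering rule in one place: reordering the variants changes the reconnection order everywhere it is used.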
Daemon NAT configuration (tesd/src/config.rs): a new [nat] section in
the TOML config with STUN server list, relay toggle, max relay sessions,
bandwidth limits (reciprocal vs bootstrap), and idle timeout. All fields have
sensible defaults; relay is disabled by default.

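As a rough picture of what the [nat] section might deserialize into, here is a plain struct with a Default implementation. Only three things are grounded in the post: relay is off by default, the bandwidth tiers are 256 and 64 KB/s, and the default STUN servers are Google's and Cloudflare's. The field names, the exact STUN hostnames and ports, the session cap, and the idle timeout are placeholders chosen for the example.

```rust
/// Hypothetical shape of the [nat] config section.
#[derive(Debug, Clone, PartialEq)]
pub struct NatConfig {
    pub stun_servers: Vec<String>,
    pub relay_enabled: bool,
    pub max_relay_sessions: usize,
    pub reciprocal_kb_per_sec: u32,
    pub bootstrap_kb_per_sec: u32,
    pub idle_timeout_secs: u64,
}

impl Default for NatConfig {
    fn default() -> Self {
        NatConfig {
            // Assumed hostnames for the Google and Cloudflare STUN services.
            stun_servers: vec![
                "stun.l.google.com:19302".to_string(),
                "stun.cloudflare.com:3478".to_string(),
            ],
            relay_enabled: false,       // from the post: relay is opt-in
            max_relay_sessions: 16,     // placeholder value
            reciprocal_kb_per_sec: 256, // from the post
            bootstrap_kb_per_sec: 64,   // from the post
            idle_timeout_secs: 60,      // placeholder value
        }
    }
}
```

A node operator who wants to act as a relay would then only need to override relay_enabled and leave the rest at its defaults.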
Prometheus metrics (tesseras-net/src/metrics.rs): 16 metrics across four
subsystems:

- STUN: requests, failures, latency histogram
- Punch: attempts/successes/failures (by NAT type pair), latency histogram
- Relay: active sessions, total sessions, bytes forwarded, idle timeouts,
  rate limit hits
- Reconnect: network changes, attempts/successes by phase, duration
  histogram

6 tests verifying registration, increment, label cardinality, and
double-registration detection.

Integration tests: two end-to-end tests using MemTransport (in-memory
simulated network):

- punch_integration.rs: full 3-node hole-punch flow. A sends signed
  PunchIntro to introducer I, I verifies and forwards PunchRequest to B, B
  verifies the original signature and sends PunchReady back, A and B exchange
  messages directly. Also tests that a bad signature is correctly rejected.
- relay_integration.rs: full 3-node relay flow. A requests relay from R, R
  creates session and sends RelayOffer to both peers, A and B exchange
  token-prefixed packets through R, A migrates to a new address mid-session, A
  closes the session, and the test verifies the session is torn down and further
  forwarding fails.

Property tests: 7 proptest-based tests covering signature round-trips for
all three signed message types (arbitrary node IDs, ports, tokens), NAT
classification determinism (same inputs always produce same output), STUN
binding request validity, session token uniqueness, and relay rejection of
too-short packets.

Justfile targets: just test-nat runs all NAT traversal tests across
tesseras-net and tesseras-dht. just test-chaos is a placeholder for future
Docker Compose chaos tests with tc netem.

Architecture decisions
- STUN over TURN: we implement STUN (discovery) and custom relay rather than
  full TURN. TURN requires authenticated allocation and is designed for media
  relay; our relay is simpler: token-prefixed UDP forwarding with rate limits.
  This keeps the protocol minimal and avoids depending on external TURN servers.
- Signatures on introductions: every PunchIntro is signed by the
  initiator. Without this, an attacker could send forged introductions to
  redirect a node's hole-punch attempts to an attacker-controlled address (a
  reflection attack). The 30-second timestamp window limits replay.
- Reciprocal bandwidth tiers: relay nodes give 4x more bandwidth (256 vs 64
  KB/s) to peers with good reciprocity scores. This incentivizes nodes to store
  fragments for others: if you contribute, you get better relay service when
  you need it.
- Backward-compatible Pong extension: new NAT fields in Pong use
  #[serde(default)] and Option<T>. Old nodes that don't understand these
  fields simply skip them during deserialization. No protocol version bump
  needed.
- NatHandler as async trait: the NAT traversal logic is injected into the
  DHT engine via a trait, just like ReplicationHandler. This keeps the DHT
  engine focused on routing and peer management, and allows the NAT
  implementation to be swapped or disabled without touching core DHT code.

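To illustrate the reciprocal bandwidth tiers, here is a sketch of a per-session rate limiter. The post gives the two rates (256 and 64 KB/s) but does not say how they are enforced; the token-bucket algorithm, the one-second burst capacity, the reading of KB as 1024 bytes, and the names TokenBucket and bucket_for are assumptions.

```rust
/// Simple token bucket measured in bytes, refilled once per elapsed second.
pub struct TokenBucket {
    capacity: u64,       // maximum burst, in bytes
    tokens: u64,         // bytes currently available
    refill_per_sec: u64, // sustained rate, in bytes per second
    last_refill_secs: u64,
}

impl TokenBucket {
    pub fn new(kb_per_sec: u64, now_secs: u64) -> Self {
        let rate = kb_per_sec * 1024;
        TokenBucket {
            capacity: rate,
            tokens: rate,
            refill_per_sec: rate,
            last_refill_secs: now_secs,
        }
    }

    /// Returns true and consumes tokens if the packet fits the budget.
    pub fn allow(&mut self, packet_len: u64, now_secs: u64) -> bool {
        let elapsed = now_secs.saturating_sub(self.last_refill_secs);
        let refilled = self.tokens.saturating_add(elapsed.saturating_mul(self.refill_per_sec));
        self.tokens = refilled.min(self.capacity);
        self.last_refill_secs = now_secs.max(self.last_refill_secs);
        if packet_len <= self.tokens {
            self.tokens -= packet_len;
            true
        } else {
            false // this would count as a "rate limit hit"
        }
    }
}

/// Pick the tier from the post: 256 KB/s for reciprocal peers, 64 otherwise.
pub fn bucket_for(reciprocal: bool, now_secs: u64) -> TokenBucket {
    TokenBucket::new(if reciprocal { 256 } else { 64 }, now_secs)
}
```

With this shape, a non-reciprocal peer that sends 64 KB in one burst has to wait for the next refill, while a reciprocal peer can send four times as much in the same second.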
What comes next
- Phase 4 continued: performance tuning (connection pooling, fragment
  caching, SQLite WAL), security audits, institutional node onboarding, OS
  packaging
- Phase 5 (Exploration and Culture): public tessera browser by
  era/location/theme/language, institutional curation, genealogy integration,
  physical media export (M-DISC, microfilm, acid-free paper with QR)

With NAT traversal, Tesseras can connect nodes regardless of their network
topology. Public nodes talk directly. Cone-NATed nodes punch through with an
introducer's help. Symmetric-NATed or firewalled nodes relay through willing
peers. The network adapts to the real world, where most devices are behind a NAT
and network conditions change constantly.