From f186b71ca51e83837db60de13322394bb5e6d348 Mon Sep 17 00:00:00 2001
From: murilo ijanc
Date: Tue, 24 Mar 2026 21:41:06 -0300
Subject: Initial commit

Import existing tesseras.net website content.

 news/phase4-storage-deduplication/index.html    | 217 +++
 news/phase4-storage-deduplication/index.html.gz | Bin 0 -> 4807 bytes
 2 files changed, 217 insertions(+)
Phase 4: Storage Deduplication


2026-02-15


When multiple tesseras share the same photo, the same audio clip, or the same +fragment data, the old storage layer kept separate copies of each. On a node +storing thousands of tesseras for the network, this duplication adds up fast. +Phase 4 continues with storage deduplication: a content-addressable store (CAS) +that ensures every unique piece of data is stored exactly once on disk, +regardless of how many tesseras reference it.


The design is simple and proven: hash the content with BLAKE3, use the hash as +the filename, and maintain a reference count in SQLite. When two tesseras +include the same 5 MB photo, one file exists on disk with a refcount of 2. When +one tessera is deleted, the refcount drops to 1 and the file stays. When the +last reference is released, a periodic sweep cleans up the orphan.
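The put/release cycle described above can be sketched in a few lines. This is an illustrative in-memory model, not the real CasStore: std's DefaultHasher stands in for BLAKE3, a HashMap stands in for the SQLite-backed object table, and all names are assumptions.

```rust
// Illustrative in-memory model of the refcounted CAS described above.
// Assumptions: DefaultHasher stands in for BLAKE3; a HashMap stands in
// for the SQLite object table; the real store persists files on disk.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct CasSketch {
    // stand-in hash -> (content, refcount)
    objects: HashMap<u64, (Vec<u8>, u64)>,
}

impl CasSketch {
    fn new() -> Self {
        CasSketch { objects: HashMap::new() }
    }

    fn hash_of(content: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        content.hash(&mut h);
        h.finish()
    }

    // Store content exactly once; duplicates only bump the refcount.
    // Returns (key, true) when the object already existed.
    fn put(&mut self, content: &[u8]) -> (u64, bool) {
        let key = Self::hash_of(content);
        let dup = self.objects.contains_key(&key);
        if dup {
            self.objects.get_mut(&key).unwrap().1 += 1;
        } else {
            self.objects.insert(key, (content.to_vec(), 1));
        }
        (key, dup)
    }

    // Drop one reference; the object disappears only at refcount zero.
    fn release(&mut self, key: u64) {
        let gone = match self.objects.get_mut(&key) {
            Some(entry) => {
                entry.1 -= 1;
                entry.1 == 0
            }
            None => false,
        };
        if gone {
            self.objects.remove(&key);
        }
    }

    fn refcount(&self, key: u64) -> u64 {
        self.objects.get(&key).map_or(0, |e| e.1)
    }
}
```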


What was built


CAS schema migration (tesseras-storage/migrations/004_dedup.sql) — Three
new tables:

- cas_objects — one row per unique object: its BLAKE3 hash and reference count
- blob_refs — maps a logical blob path (tessera + memory + filename) to a CAS hash
- fragment_refs — maps an erasure-coded fragment to a CAS hash
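A hypothetical sketch of what 004_dedup.sql could contain; the table names come from this phase, but every column name and constraint here is an assumption:

```sql
-- Hypothetical sketch of 004_dedup.sql; column names are assumptions.
CREATE TABLE cas_objects (
    hash     TEXT PRIMARY KEY,   -- BLAKE3 hex digest
    size     INTEGER NOT NULL,
    refcount INTEGER NOT NULL
);

CREATE TABLE blob_refs (
    tessera_id TEXT NOT NULL,
    memory_id  TEXT NOT NULL,
    filename   TEXT NOT NULL,
    hash       TEXT NOT NULL REFERENCES cas_objects(hash),
    PRIMARY KEY (tessera_id, memory_id, filename)
);

CREATE TABLE fragment_refs (
    tessera_id TEXT NOT NULL,
    frag_index INTEGER NOT NULL,
    hash       TEXT NOT NULL REFERENCES cas_objects(hash),
    PRIMARY KEY (tessera_id, frag_index)
);

CREATE INDEX idx_blob_refs_hash     ON blob_refs(hash);
CREATE INDEX idx_fragment_refs_hash ON fragment_refs(hash);
```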

Indexes on the hash columns keep lookups cheap during reads and reference
counting.


CasStore (tesseras-storage/src/cas.rs) — The core content-addressable
storage engine. Files are stored under a two-level prefix directory:
<root>/<2-char-hex-prefix>/<full-hash>.blob. The store provides five
operations, among them put(), get(), and sweep().
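The two-level prefix layout keeps any single directory from growing unmanageably large. A minimal sketch of the path computation, assuming hex-encoded hashes (the function name is illustrative):

```rust
// Sketch of the layout: <root>/<2-char-hex-prefix>/<full-hash>.blob.
// Only the layout itself comes from the text; the name is illustrative.
use std::path::PathBuf;

fn object_path(root: &str, hash_hex: &str) -> PathBuf {
    // The first two hex characters fan objects out across at most 256
    // subdirectories, keeping per-directory entry counts small.
    let prefix = &hash_hex[..2];
    PathBuf::from(root).join(prefix).join(format!("{hash_hex}.blob"))
}
```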

All operations are atomic within a single SQLite transaction. The refcount is +the source of truth — if the refcount says the object exists, the file must be +on disk.


CAS-backed FsBlobStore (tesseras-storage/src/blob.rs) — Rewritten to +delegate all storage to the CAS. When a blob is written, its BLAKE3 hash is +computed and passed to cas.put(). A row in blob_refs maps the logical path +(tessera + memory + filename) to the CAS hash. Reads look up the CAS hash via +blob_refs and fetch from cas.get(). Deleting a tessera releases all its blob +references in a single transaction.
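The logical-to-physical indirection can be sketched in memory (type and method names are assumptions; the real store keeps blob_refs in SQLite and decrements CAS refcounts on delete):

```rust
// In-memory sketch of the blob_refs indirection: a logical path
// (tessera, memory, filename) resolves to a CAS hash, which locates
// the single physical copy. All names here are illustrative.
use std::collections::HashMap;

type LogicalPath = (String, String, String);

struct BlobRefs {
    refs: HashMap<LogicalPath, String>, // logical path -> CAS hash (hex)
}

impl BlobRefs {
    fn new() -> Self {
        BlobRefs { refs: HashMap::new() }
    }

    // Record that this logical path now points at the given CAS object.
    fn insert(&mut self, tessera: &str, memory: &str, file: &str, hash: &str) {
        self.refs.insert(
            (tessera.to_string(), memory.to_string(), file.to_string()),
            hash.to_string(),
        );
    }

    // Reads resolve the logical path to a hash, then fetch from the CAS.
    fn resolve(&self, tessera: &str, memory: &str, file: &str) -> Option<&str> {
        self.refs
            .get(&(tessera.to_string(), memory.to_string(), file.to_string()))
            .map(|s| s.as_str())
    }

    // Deleting a tessera drops every logical reference it owned; the
    // real store would also release the matching CAS refcounts.
    fn delete_tessera(&mut self, tessera: &str) -> usize {
        let before = self.refs.len();
        self.refs.retain(|(t, _, _), _| t.as_str() != tessera);
        before - self.refs.len()
    }
}
```

Note how two tesseras can reference the same hash: deleting one leaves the other's mapping (and the single on-disk copy) untouched.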


CAS-backed FsFragmentStore (tesseras-storage/src/fragment.rs) — Same +pattern for erasure-coded fragments. Each fragment's BLAKE3 checksum is already +computed during Reed-Solomon encoding, so it's used directly as the CAS key. +Fragment verification now checks the CAS hash instead of recomputing from +scratch — if the CAS says the data is intact, it is.


Sweep garbage collector (cas.rs:sweep()) — A periodic GC pass that handles +three edge cases the normal refcount path can't:

1. Orphan files — files on disk with no corresponding row in cas_objects.
Can happen after a crash mid-write. Files younger than 1 hour are skipped
(grace period for in-flight writes); older orphans are deleted.
2. Leaked refcounts — rows in cas_objects with refcount zero that weren't
cleaned up (e.g., if the process died between decrementing and deleting).
These rows are removed.
3. Idempotent — running sweep twice produces the same result.
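The behaviors above can be sketched against in-memory stand-ins for the disk and the cas_objects table (the function signature, the age bookkeeping, and the tuple of counters are all illustrative):

```rust
// Sketch of the sweep pass over in-memory stand-ins: `disk` maps a hash
// to its file age in seconds; `refcounts` mirrors cas_objects. Returns
// (orphans deleted, leaked rows removed, young orphans skipped).
use std::collections::{HashMap, HashSet};

fn sweep(
    disk: &mut HashMap<String, u64>,
    refcounts: &mut HashMap<String, u64>,
    grace_secs: u64,
) -> (usize, usize, usize) {
    let mut orphans = 0;
    let mut skipped = 0;
    // 1. Orphan files: on disk with no cas_objects row. Young files get
    //    a grace period so in-flight writes are not destroyed.
    let known: HashSet<String> = refcounts.keys().cloned().collect();
    disk.retain(|hash, age| {
        if known.contains(hash) {
            true
        } else if *age < grace_secs {
            skipped += 1;
            true
        } else {
            orphans += 1;
            false
        }
    });
    // 2. Leaked refcounts: rows that reached zero but were never removed.
    let before = refcounts.len();
    refcounts.retain(|_, rc| *rc > 0);
    let leaked = before - refcounts.len();
    (orphans, leaked, skipped)
}
```

Running the sketch twice illustrates the idempotence property: the second pass deletes nothing new.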

The sweep is wired into the existing repair loop in tesseras-replication, so +it runs automatically every 24 hours alongside fragment health checks.


Migration from old layout (tesseras-storage/src/migration.rs) — A +copy-first migration strategy that moves data from the old directory-based +layout (blobs/<tessera>/<memory>/<file> and +fragments/<tessera>/<index>.shard) into the CAS. The migration:

1. Checks the storage version in storage_meta (version 1 = old layout, version
2 = CAS)
2. Walks the old blobs/ and fragments/ directories
3. Computes BLAKE3 hashes and inserts into CAS via put() — duplicates are
automatically deduplicated
4. Creates corresponding blob_refs / fragment_refs entries
5. Removes old directories only after all data is safely in CAS
6. Updates the storage version to 2
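The dedup accounting in step 3 can be sketched over an in-memory stand-in for the old layout (DefaultHasher stands in for BLAKE3, and the struct and function names are assumptions, not the real migration code):

```rust
// Sketch of dedup accounting during migration. `old_layout` maps a
// legacy path to its bytes; duplicates are detected by a stand-in hash.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

struct MigrationStats {
    files_migrated: usize,
    duplicates_found: usize,
    bytes_saved: u64,
}

fn migrate(old_layout: &HashMap<String, Vec<u8>>) -> MigrationStats {
    let mut cas: HashMap<u64, u64> = HashMap::new(); // hash -> refcount
    let mut stats = MigrationStats {
        files_migrated: 0,
        duplicates_found: 0,
        bytes_saved: 0,
    };
    for (_path, content) in old_layout {
        let mut h = DefaultHasher::new();
        content.hash(&mut h);
        let key = h.finish();
        let rc = cas.entry(key).or_insert(0);
        if *rc > 0 {
            // Duplicate: no new bytes hit the disk, only a refcount bump.
            stats.duplicates_found += 1;
            stats.bytes_saved += content.len() as u64;
        }
        *rc += 1;
        stats.files_migrated += 1;
    }
    // A real migration would then write blob_refs/fragment_refs rows,
    // remove the old directories, and bump storage_meta to version 2.
    stats
}
```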

The migration runs on daemon startup, is idempotent (safe to re-run), and +reports statistics: files migrated, duplicates found, bytes saved.


Prometheus metrics (tesseras-storage/src/metrics.rs) — Ten new metrics for +observability:

cas_objects_total — Total unique objects in the CAS
cas_bytes_total — Total bytes stored
cas_dedup_hits_total — Number of writes that found an existing object
cas_bytes_saved_total — Bytes saved by deduplication
cas_gc_refcount_deletions_total — Objects deleted when refcount reached zero
cas_gc_sweep_orphans_cleaned_total — Orphan files removed by sweep
cas_gc_sweep_leaked_refs_cleaned_total — Leaked refcount rows cleaned
cas_gc_sweep_skipped_young_total — Young orphans skipped (grace period)
cas_gc_sweep_duration_seconds — Time spent in sweep GC

Property-based tests — Two proptest tests verify CAS invariants under random
inputs.
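proptest is an external crate, so as an illustration here is the flavor of invariant such tests pin down, exercised with a plain deterministic loop (this is not the actual test code; every name is an assumption):

```rust
// Illustrative CAS invariant check: after any sequence of puts, an
// object's refcount equals the number of puts for its content, and
// lookups round-trip the exact bytes. DefaultHasher stands in for BLAKE3.
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn key_of(content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

// Returns true if the invariants hold for the given sequence of writes.
fn check_invariants(writes: &[Vec<u8>]) -> bool {
    let mut store: HashMap<u64, (Vec<u8>, u64)> = HashMap::new();
    let mut expected: HashMap<u64, u64> = HashMap::new();
    for content in writes {
        let k = key_of(content);
        store.entry(k).or_insert_with(|| (content.clone(), 0)).1 += 1;
        *expected.entry(k).or_insert(0) += 1;
    }
    writes.iter().all(|content| {
        let k = key_of(content);
        match store.get(&k) {
            Some((bytes, rc)) => bytes == content && rc == expected.get(&k).unwrap(),
            None => false,
        }
    })
}
```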

Integration test updates — All integration tests across tesseras-core, +tesseras-replication, tesseras-embedded, and tesseras-cli updated for the +new CAS-backed constructors. Tamper-detection tests updated to work with the CAS +directory layout.

+

347 tests pass across the workspace. Clippy clean with -D warnings.

+

Architecture decisions


What comes next


Storage deduplication completes the storage efficiency story for Tesseras. A +node that stores fragments for thousands of users — common for institutional +nodes and always-on full nodes — now pays the disk cost of unique data only. +Combined with Reed-Solomon erasure coding (which already minimizes redundancy at +the network level), the system achieves efficient storage at both the local and +distributed layers.
