Storage and Synchronization of Distributed Ledgers: How Data Is Stored, Replicated, and Verified
- Marc Griffith

- Dec 1, 2025

A full Bitcoin node today stores about 650 GB of block data plus about 6 GB for the UTXO set. An archival Ethereum node can exceed 14 TB because it preserves every historical state, not just the latest one. In this monolithic model the replication factor is essentially equal to the number of full nodes (tens of thousands for Bitcoin, several thousand for Ethereum mainnet), since each one keeps a complete copy. Economic incentives (mining rewards, staking yields) keep enough operators running such heavy software.
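To make the replication cost concrete, here is a rough back-of-the-envelope calculation in Python. The per-node sizes are the figures quoted above; the node count of 20,000 is an assumed example value standing in for "tens of thousands", not a measurement.

```python
GB = 10**9
TB = 10**12
PB = 10**15


def total_replicated_bytes(node_size_bytes: int, full_nodes: int) -> int:
    """Network-wide storage when every full node keeps a complete copy."""
    return node_size_bytes * full_nodes


# Per-node figures from the text: ~650 GB of blocks plus ~6 GB of UTXO set.
bitcoin_node_bytes = 650 * GB + 6 * GB

# Assumed example value for "tens of thousands" of full nodes.
assumed_bitcoin_nodes = 20_000

total = total_replicated_bytes(bitcoin_node_bytes, assumed_bitcoin_nodes)
print(f"Network-wide footprint: {total / PB:.2f} PB")
```

Under these assumptions the network as a whole stores roughly 13 PB to protect well under 1 TB of unique data, which is the overhead that modular designs try to reduce.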
In a modular stack:
– The consensus and data availability (DA) layer orders transactions and guarantees that block data remains available for about one to two weeks.
– Execution environments (EVM rollups, SVM rollups, custom VMs) execute transactions off-chain and publish compressed transaction data or state differences back to the DA layer, backed by validity proofs or, in optimistic designs, by fraud proofs submitted when a result is disputed.
– Full DA nodes still download everything, but there are far fewer of them.
– Full rollup nodes download only data relevant to their own chain.
This drastically reduces the replication burden for most participants, without compromising cryptographic guarantees.
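Here’s how it works in practice on Ethereum: under EIP-4844, rollups post their data as blobs, and consensus nodes are required to keep those blobs only for a retention window of about 18 days. After the window, the blobs are deleted, but the KZG commitments to them remain forever on the beacon chain, so anyone who still holds the data can prove it is authentic. This is the first step toward breaking up Ethereum’s monolithic model.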
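The pattern of keeping a small commitment forever while pruning the bulky data can be sketched in a few lines of Python. This is an illustration only: `BlobStore`, its methods, and the 18-day constant are hypothetical names and values chosen for the example, and a plain SHA-256 hash stands in for the KZG commitments that Ethereum actually uses.

```python
import hashlib
from dataclasses import dataclass, field

# Assumed retention window for the sketch (about 18 days, in seconds).
RETENTION_SECONDS = 18 * 24 * 3600


@dataclass
class BlobStore:
    """Hypothetical store: commitments are permanent, blob bodies are not."""
    commitments: dict = field(default_factory=dict)  # slot -> hex digest
    blobs: dict = field(default_factory=dict)        # slot -> (timestamp, data)

    def publish(self, slot: int, data: bytes, now: float) -> None:
        # SHA-256 is a stand-in here; it is NOT a KZG commitment.
        self.commitments[slot] = hashlib.sha256(data).hexdigest()
        self.blobs[slot] = (now, data)

    def prune(self, now: float) -> int:
        """Delete blob bodies older than the window; return how many were removed."""
        expired = [s for s, (ts, _) in self.blobs.items()
                   if now - ts > RETENTION_SECONDS]
        for slot in expired:
            del self.blobs[slot]
        return len(expired)

    def verify(self, slot: int, data: bytes) -> bool:
        """Check data supplied by an archive against the permanent commitment."""
        return self.commitments.get(slot) == hashlib.sha256(data).hexdigest()


store = BlobStore()
store.publish(slot=1, data=b"rollup batch", now=0.0)
removed = store.prune(now=RETENTION_SECONDS + 1)
print(removed, 1 in store.blobs, store.verify(1, b"rollup batch"))
```

After pruning, the node no longer serves the blob, yet data retrieved from any third-party archive can still be checked against the commitment it kept.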
Synchronizing state between chains follows the same principle. Three designs are common:
– Lock-and-mint bridges (e.g., Wormhole, LayerZero): off-chain observers attest to source-chain headers and Merkle proofs, and the destination chain acts on their reports.
– Light-client relayers (IBC, Polygon AggLayer, Near Rainbow): full nodes of chain A run a light client of chain B inside their VM, and vice versa.
– Shared sequencing + asynchronous state roots (Espresso, Astria): multiple rollups agree on ordering via a common service, eliminating traditional bridges.
Each approach replicates only the minimum necessary state rather than entire chains. Typically that is a Merkle path linking a specific event to a root committed in a block header.
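A Merkle path check of the kind these bridges rely on can be sketched as follows. This is a minimal binary SHA-256 tree written for illustration, assuming a non-empty leaf list; real chains differ in hash function, leaf encoding, and tree shape (Ethereum, for example, uses Merkle-Patricia tries), and the function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list) -> list:
    # Pair up nodes; an odd node is paired with itself (a simplification).
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Return [(sibling_hash, sibling_is_left), ...] from leaf up to the root."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


events = [b"deposit:alice:5", b"deposit:bob:7", b"withdraw:carol:2"]
root = merkle_root(events)          # the value a header would commit to
proof = merkle_proof(events, 1)     # path for bob's deposit
print(verify(b"deposit:bob:7", proof, root))
```

The verifier needs only the 32-byte root and a proof whose length grows logarithmically with the number of events, which is why a destination chain can check a single event without storing the source chain.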
Data availability layers drastically reduce hardware requirements, but they open a new attack surface: data withholding. If block data disappears before proofs are finalized, the chain can halt, or an invalid state transition can go unchallenged because nobody has the data needed to dispute it. For this reason, data availability sampling (DAS) confidence thresholds are set very high (often 99.99% statistical certainty).
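As a rough illustration of what such a threshold costs a light node, the sketch below computes how many random samples are needed to reach a target confidence. It assumes a simplified model: samples are independent and uniform, and because of erasure coding an attacker must withhold at least a fixed fraction of chunks to make the block unrecoverable (about 50% for a one-dimensional rate-1/2 code, about 25% for a two-dimensional scheme). The function name and parameters are hypothetical.

```python
import math


def samples_needed(confidence: float, max_available_fraction: float) -> int:
    """Smallest k such that max_available_fraction**k <= 1 - confidence.

    max_available_fraction is the largest share of chunks an attacker can
    still serve while keeping the block unrecoverable (model assumption).
    """
    if not (0 < confidence < 1 and 0 < max_available_fraction < 1):
        raise ValueError("both arguments must lie strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(max_available_fraction))


# Attacker must withhold at least half of the chunks.
print(samples_needed(0.9999, 0.5))

# Attacker must withhold at least a quarter of the chunks.
print(samples_needed(0.9999, 0.75))
```

In this model, a few dozen small samples are enough for a single light node to reach 99.99% confidence, compared with downloading the full block.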
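This trade-off, in which a small set of well-provisioned nodes holds the full data for a limited time while everyone else verifies through sampling and proofs, is what actually enables the system to operate at scale.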




