Data is expensive. Building a system that maintains a history of every state event, along with the metadata for each one, is daunting. Actually running the system and ensuring the data is always available is something else entirely.
Maintaining a Massive Historical Record
This is a consideration most blockchains have to make, including Ethereum. Ethereum must keep track of all transactions on the chain, but at the same time, nodes in the network can’t simply keep every transaction ever created. At least not all nodes. This would be prohibitively expensive from both a cost and a performance standpoint. It’s for this reason that Ethereum nodes have a pruning process.
This doesn’t mean that the data is lost forever. Instead, it simply shifts the expectation for where one might find that data. In the Ethereum world, there is the concept of an archive node. These nodes maintain the entire state history for the blockchain. They are not necessarily designed to be used for validation and ongoing participation in the network, but they are designed to allow for permissionless querying of historical Ethereum transactions that may have been pruned from the main stateful nodes powering the network.
It’s a brilliant system that allows for both the ongoing use of the network with reasonable performance and cost and access to historical data that may have been pruned from full Ethereum nodes. Archival is not cheap though. Continuing with our Ethereum example, the current estimated storage requirement for running an archive node is 12+ TB of disk space. The egress (bandwidth) cost for accessing this data is another consideration.
All of this is necessary to keep the Ethereum network running as expected and to provide historical data, but there are lighter-weight archiving solutions that many other use cases (and in some cases even Ethereum) can make use of. For example, the InterPlanetary File System (IPFS) is well suited to maintaining historical records that remain publicly accessible.
In the Ethereum example, if historical data were stored on IPFS, it would almost surely be used just for data aggregation, not for powering the network. However, data aggregation is a massive benefit to many developers. In fact, it is a massive benefit to everyone.
This topic becomes increasingly interesting as the Ethereum blockchain pushes ahead with its roadmap. On the roadmap is “The Purge”, which includes a proposal currently put forward as EIP-4444. Under this proposal, Ethereum clients would stop serving historical block data more than one year old and would be free to prune it locally. Much like the pruning mentioned above, if anyone wants to access the pruned data, it has to live somewhere else. The proposal itself mentions IPFS as one possible solution.
Preserving The History of The Web
Ethereum isn’t the only protocol or platform in need of an archiving solution. Archiving happens in industries big and small. It happens in your own home at a micro-level. Every time you open up that rusty old filing cabinet and put some bill you will probably never look at again inside, you’re archiving. On a larger scale, there are organizations like The Internet Archive that are archiving much, much more than old bills and doctors’ records.
The Internet Archive has, in essence, a data aggregation service that snapshots the entire web, or most of it. And that’s not all. They archive books and music and film and anything that seems like it is of cultural significance to the world, or will one day be significant. To help them create and maintain their massive archive, the organization turned to IPFS. IPFS provides content verifiability through its content addressing system that generates content identifiers (CIDs) for every file. IPFS also provides open access to all. It’s the ideal solution for archives.
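To make content addressing a little more concrete, here is a minimal sketch of how a CID can be derived from a file's bytes. It assumes the simplest case: a CIDv1 using the raw codec and a SHA-256 multihash, encoded as lowercase base32. The function names `raw_cid_v1` and `matches_cid` are made up for this illustration. Real IPFS tools usually split larger files into chunks and wrap them in a DAG, so the CID they report for the same file may differ from what this sketch produces.

```python
import base64
import hashlib


def raw_cid_v1(content: bytes) -> str:
    """Build a CIDv1 (raw codec, sha2-256) for a single block of bytes.

    Illustrative only: this treats the whole input as one raw block.
    """
    digest = hashlib.sha256(content).digest()
    # Multihash = hash function code (0x12 = sha2-256) + digest length + digest
    multihash = bytes([0x12, len(digest)]) + digest
    # CIDv1 = version (0x01) + content codec (0x55 = raw) + multihash
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # Multibase prefix "b" means lowercase base32 without padding
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def matches_cid(content: bytes, cid: str) -> bool:
    """Check that some bytes really are the content a CID refers to."""
    return raw_cid_v1(content) == cid


story = b"A short story from fifth grade."
cid = raw_cid_v1(story)
print(cid)
print(matches_cid(story, cid))            # True: same bytes, same CID
print(matches_cid(story + b" The end.", cid))  # False: any change alters the CID
```

Because the identifier is derived from the content itself, anyone who retrieves an archived file can recompute the hash and confirm that nothing was altered, no matter who served it.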
IPFS isn’t the only tool The Internet Archive uses for archival, but it’s a powerful one.
As the archival needs of our web-based world grow, so does the need for solutions that can store large amounts of data and retrieve it quickly. Archives give a look back in history, but only if the archive is accessible. IPFS storage and IPFS gateways provide both the archival and the retrieval layer to ensure we can preserve the things most important to us.
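On the retrieval side, IPFS gateways expose content over ordinary HTTP, most commonly at a path of the form `/ipfs/<CID>`. The sketch below only builds such a URL. The hostname `gateway.example.com`, the sample CID, and the helper name `gateway_url` are placeholders for illustration, not real endpoints or part of any library.

```python
from urllib.parse import quote


def gateway_url(gateway_host: str, cid: str, path: str = "") -> str:
    """Build a path-style gateway URL for a CID and an optional sub-path.

    Illustrative helper: the host is whatever gateway you choose to use.
    """
    url = f"https://{gateway_host}/ipfs/{cid}"
    if path:
        # Percent-encode the sub-path but keep "/" separators intact
        url += "/" + quote(path.lstrip("/"))
    return url


# Placeholder host and CID, used purely as an example
print(gateway_url("gateway.example.com", "bafkreiexamplecid", "stories/fifth grade.txt"))
```

Since the CID in the URL identifies the content rather than the server, the same archived file can be requested from any gateway that can reach it.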
If you want to archive anything, even that random short story you wrote in fifth grade, Pinata is here to make access to IPFS easy. Through file storage called pinning, and file retrieval through Dedicated IPFS Gateways, you can archive as little or as much as you want and need.
October 6, 2023