IPFS As An Archival Storage Solution

Data is expensive. Building a system that maintains a history of every state event, along with the metadata for each of those events, is daunting in any context. Actually running the system and ensuring the data is always available is something else entirely.

Maintaining a Massive Historical Record

This is a consideration most blockchains have to make, including Ethereum. Ethereum must keep track of all transactions on the chain, but at the same time, nodes in the network can’t simply keep every transaction ever created. At least not all nodes. This would be prohibitively expensive from both a cost and a performance standpoint. It’s for this reason that Ethereum nodes have a pruning process.

Pruning uses snapshots of the state database as an indicator to determine which nodes in the state trie can be kept and which ones are stale and can be discarded. Geth identifies the target state trie based on a stored snapshot layer which has at least 128 block confirmations on top (for surviving reorgs). Data that isn't part of the target state trie or genesis state is then discarded.
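To make the idea concrete, here is a minimal sketch of that keep-or-discard decision. It is a toy model, not Geth's actual implementation: the state database is assumed to be a plain dictionary mapping a node ID to its child node IDs, and the names `pick_target_block`, `reachable`, and `prune` are hypothetical.

```python
from __future__ import annotations

CONFIRMATIONS = 128  # depth quoted above; deep enough to survive reorgs


def pick_target_block(head: int, confirmations: int = CONFIRMATIONS) -> int:
    """Return the newest block that has enough confirmations on top of it."""
    return max(head - confirmations, 0)


def reachable(db: dict[str, list[str]], root: str) -> set[str]:
    """Collect every node reachable from root with an iterative depth-first walk."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(db.get(node, []))
    return seen


def prune(db: dict[str, list[str]], target_root: str, genesis_root: str) -> dict[str, list[str]]:
    """Keep only nodes that belong to the target state trie or the genesis state."""
    keep = reachable(db, target_root) | reachable(db, genesis_root)
    return {node: children for node, children in db.items() if node in keep}


# Toy database: "old_root" and "b" are stale, "a" is shared with newer state.
db = {
    "genesis": ["a"],
    "a": [],
    "old_root": ["a", "b"],
    "b": [],
    "target_root": ["a", "c"],
    "c": [],
}
pruned = prune(db, "target_root", "genesis")
```

In this sketch, anything that cannot be reached from the target root or the genesis root is dropped, which is the same mark-and-sweep shape the quote describes.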

This doesn’t mean that the data is lost forever. Instead, it simply shifts the expectation for where one might find that data. In the Ethereum world, there is the concept of an archive node. These nodes maintain the entire state history for the blockchain. They are not necessarily designed to be used for validation and ongoing participation in the network, but they are designed to allow for permissionless querying of historical Ethereum transactions that may have been pruned from the main stateful nodes powering the network.

…accessing a historical state on a full node consumes a lot of computation. The client might need to execute all past transactions and compute one historical state from genesis. Archive nodes solve this by storing not only the most recent states but every historical state created after each block. It basically makes a trade-off with bigger disk space requirement.
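The trade-off is easier to see in code. The sketch below is a deliberately simplified model, assuming that state is just a dictionary of account balances and that a block is a list of `(sender, recipient, amount)` transfers. The names `apply_block`, `state_by_replay`, and `ArchiveStore` are hypothetical and do not correspond to any Ethereum client API.

```python
from __future__ import annotations


def apply_block(state: dict[str, int], block: list[tuple[str, str, int]]) -> dict[str, int]:
    """Return the new state after executing every transfer in one block."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        new_state[sender] = new_state.get(sender, 0) - amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def state_by_replay(genesis: dict[str, int], blocks: list, height: int) -> dict[str, int]:
    """Full-node style: recompute the state at `height` by re-executing blocks from genesis."""
    state = dict(genesis)
    for block in blocks[:height]:
        state = apply_block(state, block)
    return state


class ArchiveStore:
    """Archive-node style: store the state after every block, trading disk space for lookups."""

    def __init__(self, genesis: dict[str, int], blocks: list) -> None:
        self.states = [dict(genesis)]
        for block in blocks:
            self.states.append(apply_block(self.states[-1], block))

    def state_at(self, height: int) -> dict[str, int]:
        # No re-execution needed: the historical state is already on disk.
        return self.states[height]


genesis = {"alice": 10, "bob": 0}
blocks = [
    [("alice", "bob", 3)],
    [("bob", "carol", 1)],
    [("alice", "carol", 2)],
]
archive = ArchiveStore(genesis, blocks)
```

Both approaches return the same answer for any height. The replay version pays in computation on every query, while the archive version pays once in storage that grows with every block.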

It’s a brilliant system that allows for both the ongoing use of the network with reasonable performance and cost and access to historical data that may have been pruned from full Ethereum nodes. Archival is not cheap though. Continuing with our Ethereum example, the current estimated storage requirement for running an archive node is 12+ TB of disk space. The egress (bandwidth) cost for accessing this data is another consideration.

All of this is necessary to keep the Ethereum network running as expected and to provide historical data, but there are lighter-weight solutions to archiving that many other use cases (and in some cases even Ethereum) can make use of. For example, the InterPlanetary File System (IPFS) provides a tailor-made solution to maintaining historical records that remain publicly accessible.

In the Ethereum example, if historical data were stored on IPFS, it would almost surely be used just for data aggregation, not for powering the network. However, data aggregation is a massive benefit to many developers. In fact, it benefits anyone who relies on historical data, not only the people building on it.

This topic becomes increasingly interesting as Ethereum pushes ahead with its roadmap. One item on the roadmap is “The Purge”, which includes a proposal known as EIP-4444. Under this proposal, Ethereum clients would stop serving historical block data (headers, bodies, and receipts) older than one year over the peer-to-peer network, and would be free to prune that data locally. Much like the pruning mentioned above, if anyone wants to access the pruned data, it has to live somewhere else. The proposal itself names IPFS as one possible solution.

This proposal impacts nodes that make use of historical data (e.g. web3 applications that display history of blocks, transactions or accounts). Preserving the history of Ethereum is fundamental and we believe there are various out-of-band ways to achieve this. Historical data can be packaged and shared via torrent magnet links or over networks like IPFS. Furthermore, systems like the Portal Network or The Graph can be used to acquire historical data. Clients should allow importing and exporting of historical data. Clients can provide scripts that fetch/verify data and automatically import them.
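As an illustration of the kind of fetch-and-verify script the proposal has in mind, here is a minimal sketch. It assumes the historical data is published as a file reachable through an IPFS HTTP gateway at the usual `/ipfs/<cid>` path, and that the publisher also shares the file's expected SHA-256 digest. The function names, the default gateway host, and the digest-based check are assumptions made for this example and are not part of EIP-4444.

```python
import hashlib
import urllib.request


def gateway_url(cid: str, gateway: str = "https://ipfs.io") -> str:
    """Build a path-style gateway URL for a CID (gateway host is an assumption)."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download the full response body over HTTP."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_and_verify(url: str, expected_sha256: str, fetcher=fetch) -> bytes:
    """Fetch data and refuse to return it unless its SHA-256 digest matches."""
    data = fetcher(url)
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        raise ValueError(f"digest mismatch: expected {expected_sha256}, got {actual}")
    return data
```

A client would call `fetch_and_verify(gateway_url(cid), digest)` and only import the bytes if no error is raised. The `fetcher` parameter exists so the download step can be swapped out, for example for a torrent client or a local file reader.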

Preserving The History of The Web

Ethereum isn’t the only protocol or platform in need of an archiving solution. Archiving happens in industries big and small. It happens in your own home at a micro-level. Every time you open up that rusty old filing cabinet and put some bill you will probably never look at again inside, you’re archiving. On a larger scale, there are organizations like The Internet Archive that are archiving much, much more than old bills and doctors’ records.

The Internet Archive has, in essence, a data aggregation service that snapshots the entire web, or most of it. And that’s not all. They archive books and music and film and anything that seems like it is of cultural significance to the world, or will one day be significant. To help them create and maintain their massive archive, the organization turned to IPFS. IPFS provides content verifiability through its content addressing system that generates content identifiers (CIDs) for every file. IPFS also provides open access to all. It’s the ideal solution for archives.
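To show what content verifiability means in practice, here is a minimal sketch that derives a CID from the bytes of a file and checks content against it. It covers only the simplest case: a CIDv1 for a single raw block hashed with SHA-256. Larger files are split into chunks and wrapped in additional structure by IPFS, so the CID you get from an IPFS node for a real file will often differ from what this sketch produces. The function names are hypothetical.

```python
import base64
import hashlib

CID_VERSION = 0x01    # CIDv1
RAW_CODEC = 0x55      # multicodec code for raw binary content
SHA2_256 = 0x12       # multihash code for SHA-256
DIGEST_LENGTH = 0x20  # 32-byte digest


def raw_block_cid(data: bytes) -> str:
    """Derive a base32 CIDv1 for a single raw block (assumes no chunking)."""
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([CID_VERSION, RAW_CODEC, SHA2_256, DIGEST_LENGTH]) + digest
    # Multibase base32: lowercase, no padding, prefixed with "b".
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


def matches_cid(data: bytes, cid: str) -> bool:
    """Check that the content hashes to the CID it was requested by."""
    return raw_block_cid(data) == cid


cid = raw_block_cid(b"hello")
```

Because the identifier is derived from the content itself, changing even one byte yields a different CID. That is what lets anyone who retrieves archived data confirm it is exactly what was originally stored.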

IPFS isn’t the only solution The Internet Archive uses for archival, but it’s a powerful one that provides open access to all.

Conclusion

As the archival needs of our web-based world grow, the need for solutions to store large amounts of data that can be retrieved performantly increases. Archives give a look back in history, but only if the archive is accessible. IPFS storage and IPFS gateways provide both the archival and the retrieval layer to ensure we can preserve the things most important to us.

If you want to archive anything, even that random short story you wrote in fifth grade, Pinata is here to make access to IPFS easy. Through file storage called pinning, and file retrieval through Dedicated IPFS Gateways, you can archive as little or as much as you want and need.

Happy pinning!
