Ethereum’s most recent fork was (eventually) successful, but it came with an unintended consequence: the greater cryptocurrency community learned of its special “big, scary nodes,” and immediately freaked out.
Blockchain infrastructure provider BlockCypher recently posted its version of events leading up to (and directly following) last month’s Constantinople upgrade.
It details its quest to reboot a special, dedicated machine (archive node) used to record all the “states” (settings) Ethereum has taken in its history.
At a high level, archive nodes store Ethereum snapshots. Not just a record of all the transactions processed, but a complete map of the entire blockchain each time a block is added. They are different to full nodes, which are just concerned with transactions, not states.
According to BlockCypher’s blog, rebooting archive nodes is supremely difficult – so much so that literally nobody else is bothering to run them, which the company says could present a security risk.
After examining every which way we could think of to add the Trie state to our Ethereum state, we asked [Ethereum co-founder Vitalik Buterin] for assistance. His first comment to us was “oh, you’re one of the few running one of those big, scary nodes.” We asked him if he knew of anyone else running a “big, scary node” to see if we could possibly sync with them. He knew of no one, not even the Ethereum Foundation keeps a full archival copy of the Ethereum chain. We were back to square [one]: starting the full sync again, this time including the Trie state.
It then declared a lesson learned: BlockCypher might be the only one keeping an “entire history of Ethereum transactions.” This might be a problem, especially in scenarios where the blockchain is under attack.
But according to the Ethereum insiders Hard Fork spoke with, BlockCypher’s concerns don’t necessarily represent the true nature of the network, as archive nodes have no bearing on its overall security.
Archive nodes have ‘no impact on Ethereum’s security or trust model’
Ethereum’s ecosystem relies on Infura, a ConsenSys-backed blockchain infrastructure firm. Participants pay Infura to run resource-intensive processes on their behalf, particularly those associated with deploying apps on the network.
Infura co-founder E.G. Galano told Hard Fork that archive nodes are only necessary in certain circumstances. In particular, they are used to check the state of an Ethereum account at any given block height. Other than that specific use, he claims there is no real need to keep them around.
“For example, if you wanted to know the Ether balance an account had at block #4,000,000, you would need to run and query an archive node,” Galano explained. “They are use case dependent and have no impact on the security or trust model of the blockchain.”
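Galano’s example maps onto Ethereum’s standard JSON-RPC interface, where the eth_getBalance method takes an address and a block parameter. The following is a minimal sketch of how such a request could be built in Python. The function name and the all-zero address are placeholders chosen for illustration, and nothing is sent over the network.

```python
import json


def balance_request(address: str, block_number: int, request_id: int = 1) -> str:
    """Build a JSON-RPC eth_getBalance request body for a given block height."""
    payload = {
        "jsonrpc": "2.0",
        "method": "eth_getBalance",
        # The block parameter is a hex-encoded number (or a tag such as "latest").
        "params": [address, hex(block_number)],
        "id": request_id,
    }
    return json.dumps(payload)


# Placeholder address, for illustration only.
EXAMPLE_ADDRESS = "0x" + "00" * 20

body = balance_request(EXAMPLE_ADDRESS, 4_000_000)
print(body)
```

A node that has discarded old state would typically be unable to answer a request like this for a block far in the past, while an archive node can look the value up directly.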
To be clear: Ethereum’s archive nodes are not equivalent to full nodes. A full node stores transaction history. An archive node stores that, as well as additional data related to Ethereum’s state.
Galano said that full nodes propagate exactly the same information across Ethereum as archive nodes. The “archive” label simply means the node computes and stores additional blockchain-derived data so that historical information can be queried more efficiently.
Martin Holst Swende, security team lead for the Ethereum Foundation, told Hard Fork that from a network perspective, archive nodes do not help network robustness any more than a full (or fast-synced) node.
“It is full nodes that are the key to maintaining and synchronizing the Ethereum blockchain, including all transactions and state transitions,” said Swende. “A ‘full node’ is a node which has performed a fast-sync or a so-called full-sync. If the node additionally stores a snapshot of each state at every block, then it is commonly referred to as an ‘archive node.’”
He then confirmed that all three types of Ethereum node (fast-sync, full-sync, and archive), not just archive nodes, keep the data necessary to replay (or restore) all chain events.
The only data stored by archive nodes (and not by the others) is the full history of Ethereum states, which can be derived using data stored in other nodes anyway.
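The difference can be illustrated with a toy model. The sketch below is not how Ethereum clients work internally: real state lives in a Merkle Patricia trie and transactions are executed by the EVM, whereas this example assumes a chain of simple balance transfers with made-up account names. It only shows that stored snapshots and replayed transactions arrive at the same historical state.

```python
# Toy chain: each block is a list of (sender, recipient, amount) transfers.
GENESIS = {"alice": 100, "bob": 0}
BLOCKS = [
    [("alice", "bob", 30)],
    [("bob", "alice", 5)],
    [("alice", "bob", 10)],
]


def apply_block(state, block):
    """Return the new state after applying every transfer in one block."""
    new_state = dict(state)
    for sender, recipient, amount in block:
        new_state[sender] -= amount
        new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state


def state_at(height):
    """Replay approach: recompute the state at a height from genesis."""
    state = dict(GENESIS)
    for block in BLOCKS[:height]:
        state = apply_block(state, block)
    return state


def build_archive():
    """Archive approach: keep a snapshot of the state after every block."""
    snapshots = [dict(GENESIS)]
    for block in BLOCKS:
        snapshots.append(apply_block(snapshots[-1], block))
    return snapshots


ARCHIVE = build_archive()

# A stored snapshot and a replay from genesis agree at every height.
for height in range(len(BLOCKS) + 1):
    assert ARCHIVE[height] == state_at(height)
```

In this model the archive answers a historical query with a single lookup, while the replay approach has to reprocess every earlier block. That is the trade-off between storage and computation described above.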
OK – but who is running archive nodes? Is anybody?
Hard Fork spoke with Parity Technologies, a startup building Ethereum infrastructure, to confirm how prevalent Ethereum’s archive nodes really are.
Just like Infura and the Ethereum Foundation, Parity’s technology chief Fredrik Harrysson isn’t quite convinced of BlockCypher’s claims.
“It is certainly not true that there is only one node with a full record of Ethereum’s transaction history, we usually have an archive node running in-house at [Parity], although there is really no need for one,” said Harrysson.
Infura’s Galano sided with Harrysson. “BlockCypher is not the only one running archive nodes. We run many, as do other API and infrastructure providers,” he told Hard Fork. “I don’t know the number that the Ethereum Foundation runs, but they at least run a few for their own use.”
The Ethereum Foundation, too, told Hard Fork it maintains multiple archive nodes, despite how unnecessary they may be. “At the current moment, we’re running three pairs of benchmarks (six machines), two on fast-sync, two on full-sync and two on archive-mode,” said Swende.
If you want to talk security, full nodes are what matter
The crux of BlockCypher’s blog was that it took more than two weeks to ‘reboot’ the version of Ethereum stored by its archive node. It also emphasized that Ethereum’s “state” is very different to that of other blockchains, in that it cannot be restored using any traditional backup method.
Galano suggested BlockCypher’s issue was a lack of solid backup and restore procedure. He said the correct process involves creating a backup and a copy, later using that copy to restore state. This, he insisted, should have preserved the integrity of the original “backup.”
“Everyone makes mistakes and we ran into similar issues earlier in the life of Infura. My issue with that blog post is not that it says running an archive node is resource intensive, it is,” Galano admitted. “My issue is that if you are trusted to run infrastructure as a service and issue a post-mortem, take responsibility for your failures and don’t blame the protocol that your users expect you to understand better than they do.”
The total number of full nodes is essentially what matters for Ethereum, not its archive node count. Current numbers indicate the network consists of almost 12,000 full nodes.
“A reasonable lower-bounded number of nodes required is debatable – essentially if one trusted party had and served all the history it would be fine, but we’d be relying on a centralised party not disappearing, which thankfully is not the case here,” concluded Harrysson.
Archive nodes might not be strict requirements for Ethereum to operate securely, but as it turns out, they aren’t as rare as they seem, despite their apparent lack of utility.