

Intro to Merkle Trees, Blockchains, and Scalability on Accumulate

In a similar way to how a Kindle can minimize the weight load of carrying around millions of books, Merkle Trees allow for many transactions to be securely stored and accessed while reducing the amount of data that is required to be communicated for each validator.

Merkle Trees allow for private data to not only be secure and high quality, but also scalable. A Merkle Tree is a type of binary tree that is extremely useful for the verification of large amounts of data. They are used in many applications such as Git, NoSQL databases, and the blockchain.

If a Merkle Tree is being used to represent some amount of data, an outside user can find out if a particular piece of data is contained in that set of data without directly having access to any of the data in the set – thus keeping it secure with a limited footprint. This is accomplished through the use of cryptographic hashes and cryptographic hash functions.

Accumulate uses Merkle Trees for ADIs in order to lighten the overall volume of data that is being communicated on the network while maintaining high levels of security.

Today we’re going to talk about

What is a cryptographic hash?
What is a Merkle Tree?
How does blockchain technology use Merkle Trees?
How Accumulate ADIs use Merkle Trees?

What is a Cryptographic Hash?

A cryptographic hash function (CHF) takes an arbitrary amount of data as an input and outputs a fixed-sized hash. These hashes are deterministic, which means they will always be the same if the same data is inputted, and that no two inputs will produce the same hash. In an oversimplified example, one could input a text file of the book Great Expectations and receive the hash “xD3vL2G”. If one inputted a different text file of Great Expectations with a single typo, like a misspelled word or an extra comma, into the same cryptographic hash function you’d instead receive the hash “M4LL8Ta”. Every unique data input will have a different unique output.

One of the major benefits of cryptographic hashes is that it is impossible to backwards engineer the original data input from a hash. This is why hashes are often used in password authentication. A company will often store a hash of your password on their database, and when you input your password on their site, they will run it through a Cryptographic Hash Function (CHF) to see if the hash matches as expected. Just like no function can generate the full text of Great Expectations from the input “xD3vL2G”, no malicious hacker will be able to generate your password from a hash of the password stored in the database.

Now that you understand cryptographic hashes, you can understand Merkle Trees and how they enable blockchain technology.

What is a Merkle Tree?

Merkle Trees are a type of binary tree where every node contains a cryptographic hash. The leaves, or base level, of the binary Merkle Tree each contains the hash of data that a developer wishes to store for verification purposes. For instance, each leaf of a Merkle Tree could contain the hash of a single transaction in a block of the blockchain. The parent nodes of these leaves will each contain a new hash that’s calculated from the hashes contained in its children. This process continues up until you reach the root node, which will contain a unique hash calculated using every hash contained in the tree.

Continuing the earlier example, if you have a Merkle Tree which represents a block with four transactions (A, B, C, D), then the leaf nodes of the tree will contain Hash-A, Hash-B, Hash-C, and Hash-D. The parents of those nodes will contain Hash-AB and Hash-CD. Finally, the root node of the Merkle Tree will calculate a new hash based on its children and will contain Hash-ABCD.

Merkle Trees and their cryptographic underpinnings are fundamental to the blockchain. That’s why Bitcoin is called a (crypto)currency.

How Does Blockchain Use Merkle Trees?

Every time a transaction is committed to the blockchain, a hash is generated representing the transaction, for example a1075db55d416d3ca199f55b6084e2115b9345e16c5cf302fc80e9d5fbf5d48d. These transactions are added onto the blockchain in blocks containing thousands of transactions. Each of these blocks has a unique header ID, can you guess where that header comes from? It is the root hash of the Merkle Tree representing all of the transactions in that block!

If you wanted to verify that a transaction was contained in a certain block, you would only need to check the transaction’s hash against the Merkle root hash – rather than every transaction ID in the entire block, which requires a tremendous amount of memory and processing power. With the Merkle Tree, you can check if a value is contained in a dataset without downloading the entire dataset.

Merkle Trees also allow for the creation of light-client nodes, such as hardware wallets, which do not contain a copy of the entire blockchain. The whole Bitcoin blockchain is currently over 300 GBs of data, it isn’t practical to expect every user to have that much storage space simply to hold crypto. That is multiple times as much data as most phones and laptops store, so thank Merkle Trees for the relatively small size of your mobile wallet.

Light-client nodes have small sizes of around 100 MBs because they mostly contain the Merkle roots of all blocks in the blockchain, rather than the full history of every transaction ever made on the blockchain. The full blockchain is contained in full nodes that light-client nodes communicate with. Light-client can use these Merkle roots and transaction IDs to verify any transaction that has ever occurred on the blockchain.

Merkle Trees are instrumental to cryptocurrencies because they allow for efficient verification of transactions with low data storage costs!

How Accumulate ADIs Use Merkle Trees

Merkle Trees can be seen in the Accumulate Network architecture in our ADI accounts. As you may know, an ADI account is a Distributed Digital Identity and Identifier (DDII) that can do things like:

Send a receive tokens like Bitcoin
Issue smart contracts like Ethereum or Polkadot
Facilitate consensus building and key security preferences for enterprises

ADIs and identity serve as the core part of Accumulate’s platform, so it is important to discuss them in our conversation about Merkle Trees.

Every ADI has its very own, constantly-growing Merkle Tree. Merkle Trees make it possible for Accumulate to scale by verifying any content within a chain is immutable and complete without requiring proof from another chain. Each tree is a hierarchical data structure that separates data validation from the data itself. Once a transaction is validated, it is fed to an accumulator that adds the hashes to a particular Merkle Tree’s account.

Another way to describe accumulators is Block Validator Networks (BVNs)– which is used to describe the interconnectedness in communication between validators in which the entire network sends/ receives a summary output hash of the transaction. This process reduces the volume of data required for a transaction to be executed by a validator, thus lightening the load of the network.

Other uses for Merkle Trees across the Accumulate ecosystem include:

Scratch Space Creation: Merkle Trees define order of transactions in a chain as well as membership. Order membership and state are required for authorship validation as well as to provide audit trails for events. This allows Accumulate to provide a token chain called a Scratch Space that deletes transaction history but maintains a cryptographically provable balance of accounts.
Sharding: Although you may have gathered this from an early reference to how Merkle Trees unlock scalability, Merkle Trees are fundamental to sharding on the Accumulate protocol. Accumulate uses three types of sharding: Network, Route, and Execution Sharding. Each identity is its own chain that is ever-growing on a Merkle Tree, which proves contents are immutable without requiring additional proof from other structures. Sharding of identity ensures that all of these essential operations are processed in parallel.
Anchoring: Every now and then, Merkle root proofs of data are inserted by Accumulate into Layer 1 chains, adding to the protocol’s universality between chains. This process is known as anchoring because it places a hash from one system into a hash from another system, allowing you to buy the security of a larger and more secure blockchain for a fractional cost (of 1 transaction). This process has been updated by Accumulate to involve the collection of many entries submitted across many applications, then reducing the state into just a single hash. An entire group of blocks can be put into a single anchor.

In Conclusion

Merkle Trees allow for large quantities of data to be scalable and secure. They are central to how enterprises will build large networks dependent on data to meet the growing needs of AI, blockchain, and other Web 3 technology. Accumulate uses Merkle Trees to scale because they are able to minimize the amount of data that needs to be communicated on the network for transactions to be validated.

← Prev: 2021 Recap: Accumulate Updates and Milestones Next: How Accumulate Connects and Enhances Layer 1 & 2 Protocols →