The State of the Machine: A Deep Dive into Ethereum's Core - Introduction
"To the user, the database is a black box that stores their money. To the engineer, the database is a lie that we desperately try to keep consistent. But to the Merkle Trie, the database is a mathematical proof. In this world, if you flip a single bit, you do not just corrupt a file, you create a parallel universe."
Welcome to a 5-part technical series documenting the development of merkle-trie-rs, a from-scratch, production-grade implementation of the Ethereum Modified Merkle Patricia Trie (MPT). Over the course of 15 chapters, we'll dissect the Ethereum Yellow Paper, implement recursive RLP serialization, and navigate the bitwise madness required to store the World State.
The Manifest
This is not a tutorial. This is a technical breakdown.
We are going to take the engine of the Ethereum Virtual Machine, disassemble it bolt by bolt, and put it back together using Rust.
Each part is designed to be standalone yet connected, so you can jump to any topic that interests you, though reading in order provides the complete journey from fundamentals to production.
Part I: The Alphabet (Nibbles)
We begin with the physics of the disk. We explore why Binary Trees fail at scale, why Ethereum chose the Hexary Radix, and the bitwise surgery required to implement a virtual u4 type system in a world that only speaks u8.
Chapters:
- Chapter 1: The Hexary Constraint (The Physics of the Disk)
- Chapter 2: The Bitwise Surgery (Nibbles)
- Chapter 3: The Compact Encoding (Hex-Prefix)
Part II: The Four Horsemen (Nodes)
We architect the recursive Node Enum. We dive into the genius of the "Extension Node"—the mechanism that allows Ethereum to compress empty space. We tackle the Rust-specific hell of managing infinite recursive types using Box and Arc.
Chapters:
- Chapter 4: The Architecture of Recursion (The Node Enum)
- Chapter 5: The Logic of Lookup (Reading the Mind)
- Chapter 6: The Cryptographic Link (The Merkle Property)
Part III: The Serialization Trap (RLP)
We implement the Recursive Length Prefix standard from scratch. We face the rule that troubled me most, the Inline Node Optimization, a recursive logic bomb that has broken countless implementations before this one.
Chapters:
- Chapter 7: The Recursive Grammar (RLP)
- Chapter 8: The Inline Node Optimization (The Final Boss)
- Chapter 9: The Test of Truth (Determinism)
Part IV: The Surgical Strike (Insertion)
We implement the algorithm for modifying immutable structures. We visualize the "Split Condition" where keys diverge, and we write the recursive "surgery code" that tears a node apart and rebuilds it without breaking the cryptographic chain.
Chapters:
- Chapter 10: The Theory of Mutation (Recursive Surgery)
- Chapter 11: The Collision (Splitting Nodes)
Part V: The Witness (Proofs)
We reach the summit. We discuss the theory of Light Clients and "Trustless" verification. We implement the get_proof function, generating the cryptographic witnesses that power the modern decentralized web.
Chapters:
- Chapter 12: The Theory of Light Clients (Merkle Proofs)
- Chapter 13: The Visualizer (Seeing the Invisible)
- Chapter 14: The Cost of Truth (Performance Analysis)
- Chapter 15: The Epilogue (The State of the Machine)
The Project Statistics
The Goal: Implement Appendix D of the Ethereum Yellow Paper.
The Constraint: Strict adherence. No shortcuts. 100% compatibility with Mainnet vectors.
The Stack: Rust, tiny-keccak, rlp, hex, thiserror, clap, serde.
test suite: 64 tests
determinism: 100%
codebase: pure rust
status: production-ready
Prologue: The Search for Truth
1. The Dopamine of Speed
In my previous project, Rusty Redis (as in all my other projects), I was chasing a very specific high: Speed.
I lived in the profiler. I obsessed over L1 cache hits. I fought against lock contention like it was my personal enemy. I optimized context switching and shaved microseconds off syscall overheads. I wanted to move 1.5 million packets per second through a TCP socket, and I wanted to do it on consumer hardware.
In that world, i.e., the world of high-performance caching and distributed databases, you are permitted a certain degree of engineering laxity. We operate under the doctrine of "Good Enough."
- If a log buffer doesn't flush to disk the exact millisecond a write happens? It's fine.
- If a packet gets dropped during a micro-burst network partition? We call it "Eventual Consistency."
- If the clock skews by 5 milliseconds between nodes? We use NTP and shrug.
Speed is the god we worship in Web2 systems. And occasionally, Accuracy is the sacrifice we make on its altar. We accept that 99.999% is not 100%, and we build retry logic to hide the difference.
But then I started looking at Ethereum.
2. The Paranoia of the Machine
Ethereum does NOT care about your benchmark scores.
Ethereum does NOT care if you can handle 1 million requests per second. (In fact, Ethereum mainnet manages a pitiful, almost laughable 15-30 transactions per second.)
Ethereum cares about one thing, and one thing only: Truth.
The "World State" of Ethereum is a single, canonical, immutable mapping of 180 million accounts to their balances, nonces, and storage roots. This state is not stored in a central server. It is replicated across thousands of nodes, from massive Geth archive clusters in AWS data centers to Raspberry Pis humming in basements in Berlin and laptops in coffee shops in Tokyo.
Every single one of those nodes must agree on the exact state of the world, down to the last byte.
Not "roughly agree." Not "eventually agree." Exactly agree.
- If my balance is 100 ETH and your node calculates it as 100.000000000000000001 ETH, we have a HUGE failure.
- If a single bit flips in the storage of a smart contract, the Keccak-256 hash that forms its storage root changes.
- If the storage hash changes, the Account Hash changes.
- If the Account Hash changes, the State Root changes.
- If the State Root changes, the Block Hash changes.
- And if the Block Hash changes, the chain splits.
We have created a hard fork. The consensus breaks. The money vanishes. The universe fractures into two competing realities.
This realization terrified me.
In Rusty Redis, a bug meant a user got an error message. In Ethereum, a bug in the state trie implementation means you have forked the network and lost consensus with the rest of the planet. The stakes are existential.
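The cascade described above can be sketched in a few lines. This is a toy model, not Ethereum's real hashing: it assumes a hypothetical four-level chain (storage root, account hash, state root, block hash) and uses the standard library's `DefaultHasher` as a stand-in for Keccak-256 so that it runs without external crates. The names `digest` and `hash_chain` are mine, for illustration only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest. Real Ethereum uses Keccak-256 (256-bit output);
/// DefaultHasher is used here only so the sketch needs no dependencies.
fn digest(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    data.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical, heavily simplified chain of commitments:
/// storage -> storage root -> account hash -> state root -> block hash.
fn hash_chain(storage: &[u8]) -> [u64; 4] {
    let storage_root = digest(storage);
    let account_hash = digest(&storage_root.to_be_bytes());
    let state_root = digest(&account_hash.to_be_bytes());
    let block_hash = digest(&state_root.to_be_bytes());
    [storage_root, account_hash, state_root, block_hash]
}

fn main() {
    let original = [0u8; 32];
    let mut flipped = original;
    flipped[31] ^= 0b0000_0001; // flip a single bit in the last byte

    let a = hash_chain(&original);
    let b = hash_chain(&flipped);

    let levels = ["storage root", "account hash", "state root", "block hash"];
    for (level, (x, y)) in levels.iter().zip(a.iter().zip(b.iter())) {
        // Each level is printed side by side so the divergence is visible.
        println!("{level}: {x:016x} vs {y:016x}");
    }
}
```

Every level is computed only from the level below it, so a one-bit difference at the bottom has nowhere to hide on the way up. The real structure is a trie rather than a straight line, but the propagation works the same way.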
3. The Black Box Problem
For 99% of blockchain developers, the "State" is a black box.
We are what I call "JSON-RPC Script Kiddies."
We use libraries like ethers-rs, viem, or web3.js. We call functions like provider.getBalance(address). We get a number back. We trust the number.
We build Multi-Sig wallets, DeFi protocols, and NFT marketplaces on top of this infrastructure, assuming that the number the node gave us is correct.
But we never care to ask the real question: why do we trust it?
We are told that "Blockchains are trustless." But if you are querying an Infura node or an Alchemy node, and you are just blindly accepting the JSON response they send you, you aren't being trustless. You are trusting a centralized service provider. You are trusting that Mr. Jeff Bezos's servers (AWS) are running the code correctly.
I realized that, despite writing smart contracts and using these tools for years, I didn't actually understand the mechanism of verification, and neither do 99% of developers. When I first learnt about it, I treated the EVM State like magic.
I knew it involved "Merkle Trees." I knew it involved "Hashes." I knew it involved "Keccak-256." But I didn't know how the machine actually worked. I couldn't explain how a balance of 10 ETH is physically represented on the disk drive of a node.
4. The Yellow Paper Trauma
And then there is Vitalik Buterin and Dr. Gavin Wood.
When you read the early Ethereum literature, specifically the Yellow Paper, you realize that these people didn't just build a database. They built a mathematical proof of existence.
When I first read it, the density was shocking. The Yellow Paper is not a technical spec in the way REST API documentation is. It is a dense, academic paper filled with Greek notation, and it describes the system not in terms of code, but in terms of mathematical relations that feel alien to a modern software engineer.
The data structure they chose to back this system, i.e., the Modified Merkle Patricia Trie, is not something you find in a standard CS textbook or lecture. You cannot just pip install it.
It is a "Frankenstein's monster" of data structures.
It takes the cryptographic security of a Merkle Tree, blends it with the path compression of a Patricia Trie (to save space), forces it into a Base-16 (Hexary) structure to optimize for disk depth, and then wraps the whole thing in a custom serialization format (RLP) just to make it as annoying as possible to implement.
It is elegant. It is genius. And trust me when I say this, it is a pain in the ass to build.
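The Base-16 part of that description is the easiest to make concrete. A hexary trie walks a key one 4-bit nibble at a time, so every key byte contributes two steps of the path. Below is a minimal sketch of that conversion; the function names `bytes_to_nibbles` and `nibbles_to_bytes` are illustrative assumptions, not necessarily the API of merkle-trie-rs.

```rust
/// Split each byte into its high and low 4-bit halves (nibbles).
/// Every returned element is in the range 0x0..=0xF.
fn bytes_to_nibbles(bytes: &[u8]) -> Vec<u8> {
    let mut nibbles = Vec::with_capacity(bytes.len() * 2);
    for &byte in bytes {
        nibbles.push(byte >> 4); // high nibble
        nibbles.push(byte & 0x0F); // low nibble
    }
    nibbles
}

/// Pack nibbles back into bytes. Returns None if the slice has odd length
/// or contains a value that does not fit in 4 bits.
fn nibbles_to_bytes(nibbles: &[u8]) -> Option<Vec<u8>> {
    if nibbles.len() % 2 != 0 || nibbles.iter().any(|&n| n > 0x0F) {
        return None;
    }
    Some(
        nibbles
            .chunks(2)
            .map(|pair| (pair[0] << 4) | pair[1])
            .collect(),
    )
}

fn main() {
    let key = [0xA7, 0x1F];
    let path = bytes_to_nibbles(&key);
    // The two-byte key becomes a four-step path through the trie.
    assert_eq!(path, vec![0x0A, 0x07, 0x01, 0x0F]);
    // Packing the path again recovers the original key.
    assert_eq!(nibbles_to_bytes(&path), Some(key.to_vec()));
    println!("path: {:x?}", path);
}
```

Rust has no native u4, so each nibble lives in a full u8 here and the range check in `nibbles_to_bytes` is the only thing enforcing the invariant. A newtype wrapper would be a stricter alternative.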
5. The Challenge
I built merkle-trie-rs because I wanted to understand the Black Box from inside.
I set a rule for myself: No Abstractions.
I would not use the eth-trie crate. I would not look at the Geth (Go-Ethereum) source code for copy-pasting. I would not use high-level helpers.
I would open the Yellow Paper, turn to Appendix D, and I would translate the mathematical notation directly into Rust code (with a little bit of AI obviously).
I wanted to answer three questions:
- How does Vitalik think? What kind of mind designs a system where the key to a value is the path you take to find it? I wanted to reverse-engineer the thought process of the creators.
- Can I build it? Can I take one of the most notoriously complex specifications in distributed systems and implement it from scratch?
- Why Rust? Everyone says Rust is the future of blockchain (Polkadot, Solana, Near, Reth are all Rust). I did not just want to know why; I wanted to feel it. I wanted to see if Rust's strict ownership model (which usually fights you) would actually become a superpower when dealing with recursive, cryptographic structures.
6. The Weapon of Choice (Why Rust?)
In C or C++, implementing a Merkle Trie is a minefield of segfaults and dangling pointers. You have nodes pointing to nodes pointing to nodes. Who deletes them? When?
In Go (where Geth is written), you have a Garbage Collector. It's easier, but you lose that fine-grained control over memory layout, and the Garbage Collector pauses can kill your latency during high-load block validation.
Rust offers a third way.
The Enum.
Rust's Enums are algebraic data types. They are not just C-style integers. They can hold data. They can be recursive.
The ability to define the World State as:
    enum Node {
        Null,
        Leaf {
            key: Vec<u8>,
            value: Vec<u8>,
        },
        Extension {
            prefix: Vec<u8>,
            next: Box<Node>,
        },
        Branch {
            children: [Box<Node>; 16],
            value: Option<Vec<u8>>,
        },
    }
is a superpower. It allows the compiler to force you to handle every single state. You cannot forget to handle the Null case. You cannot accidentally access a child that doesn't exist. The compiler forces you to cover the entire mathematical surface area of the Trie.
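To see that exhaustiveness guarantee in action, here is a small sketch that repeats the enum above and walks it with a single `match`. The function `count_values` is a hypothetical helper written for this illustration, not part of the project. If you delete any one of the four arms, the program stops compiling.

```rust
#[allow(dead_code)] // key and prefix are stored but not read in this sketch
enum Node {
    Null,
    Leaf { key: Vec<u8>, value: Vec<u8> },
    Extension { prefix: Vec<u8>, next: Box<Node> },
    Branch { children: [Box<Node>; 16], value: Option<Vec<u8>> },
}

/// Count how many values are stored beneath `node`.
/// The match has no wildcard arm, so the compiler rejects the function
/// if any variant is left unhandled.
fn count_values(node: &Node) -> usize {
    match node {
        Node::Null => 0,
        Node::Leaf { .. } => 1,
        Node::Extension { next, .. } => count_values(next),
        Node::Branch { children, value } => {
            let own = usize::from(value.is_some());
            own + children.iter().map(|child| count_values(child)).sum::<usize>()
        }
    }
}

fn main() {
    // Start with 16 empty slots, then fill two of them.
    let mut children: [Box<Node>; 16] = std::array::from_fn(|_| Box::new(Node::Null));
    children[0xA] = Box::new(Node::Leaf {
        key: vec![0x7],
        value: b"alice".to_vec(),
    });
    children[0x1] = Box::new(Node::Extension {
        prefix: vec![0xF, 0x0],
        next: Box::new(Node::Leaf {
            key: vec![],
            value: b"bob".to_vec(),
        }),
    });
    let root = Node::Branch { children, value: None };

    assert_eq!(count_values(&root), 2);
    println!("values stored: {}", count_values(&root));
}
```

The same pattern carries over to lookup, hashing, and insertion: each is a recursive function with one arm per node type, and the compiler checks that none is missing.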
What followed was absolute obsession.
I dealt with bitwise offsets. I dealt with infinitely recursive types. I fought with the borrow checker over who owns a node that doesn't exist yet. I debugged hash mismatches where a single byte was off because I encoded a length prefix as 0x81 instead of 0x80 (the cost of trying to be fast when accuracy matters).
But at the end of it, I had something that didn't just store data.
I had a system that could prove, with mathematical certainty, that the data it held was true.
This is the story of how I built it.
And more importantly, this is the manual on how you can understand it, and hopefully even build it by yourself.
Welcome to the state of the machine.
Navigation
Next: Part I: The Alphabet (Nibbles) - Begin your journey with the physics of the hexary constraint and the bitwise surgery of nibbles.
Repository: github.com/bit2swaz/merkle-trie-rs
~ @bit2swaz
