The State of the Machine - Part IV: The Surgical Strike (Insertion)
"When we 'insert' into a Merkle Trie, we are not modifying the existing tree. We are cloning the path we touch, creating a parallel reality, and returning a new Root that points to this new reality."
Series Navigation:
- Introduction
- Part I: The Alphabet (Nibbles)
- Part II: The Four Horsemen (Nodes)
- Part III: The Serialization Trap (RLP)
- Part IV: The Surgical Strike (Insertion) <- You are here
- Part V: The Witness (Merkle Proofs)
PART IV: THE SURGICAL STRIKE
Chapter 10: The Theory of Mutation (Recursive Surgery)
In a mutable world (like C or Python), inserting into a tree is intuitive. You walk down to the node, change node.value = "new", and you're done. The memory address stays the same. The parent pointer stays the same.
In the Merkle world, this is forbidden.
If you change a Leaf, its Hash changes.
If its Hash changes, the Parent must point to a new Hash.
This means the Parent effectively becomes a new node.
This ripples all the way up.
When we "insert" into a Merkle Trie, we are not modifying the existing tree. We are cloning the path we touch, creating a parallel reality, and returning a new Root that points to this new reality. The old nodes (that weren't touched) are shared between the old and new trees. This is called a Persistent Data Structure (or Structural Sharing).
1. The Rust Ownership Challenge
In Rust, this concept of "Structural Sharing" usually implies Rc (Reference Counting).
However, for this implementation, we are keeping it simple: Ownership Passing.
Our insert function will consume the old node (taking ownership), perform the surgery, and return the new node.
fn insert(old_node: Node) -> Node
This aligns perfectly with Rust's affine types.
- If we recurse, we take the child, modify it, and put it back.
- If we split, we destroy the old node and return a new structure.
2. The Algorithm Signature
Let's define the recursive engine.
We place this in src/trie.rs.
impl EthTrie {
pub fn insert(&mut self, key: &[u8], value: &[u8]) {
let nibbles = Nibbles::from_raw(key, false);
let nibbles_vec = nibbles.as_slice().to_vec();
self.root = Box::new(Self::insert_at(*self.root.clone(), &nibbles_vec, value.to_vec()));
}
/// The Recursive Core
/// Takes ownership of 'node', returns the new version of 'node'
fn insert_at(node: Node, path: &[u8], value: Vec<u8>) -> Node {
match node {
Node::Null => {
// CASE 1: The Void
// If we are inserting into nothing, we create something.
Node::Leaf {
key: path.to_vec(),
value,
}
}
Node::Leaf { key, value: leaf_value } => {
// CASE 2: The Collision (Leaf)
// We hit an existing leaf. Do we update it? Or split it?
// (Implementation details shown in Chapter 11)
todo!("See full implementation below")
}
Node::Extension { prefix, next } => {
// CASE 3: The Collision (Extension)
// We hit a shortcut. Does it match our key?
// (Implementation details shown in Chapter 11)
todo!("See full implementation below")
}
Node::Branch { mut children, value: branch_value } => {
// CASE 4: The Fork
// We just need to pick the right lane and recurse.
if path.is_empty() {
Node::Branch {
children,
value: Some(value),
}
} else {
let idx = path[0] as usize;
children[idx] = Box::new(Self::insert_at(
*children[idx].clone(),
&path[1..],
value,
));
Node::Branch {
children,
value: branch_value,
}
}
}
}
}
}
This match statement is the map of our logic. Let's tackle the easy cases first.
3. Case 1: Inserting into the Void (Null)
This is the "Genesis" moment.
The tree is empty (or this branch is empty). We want to insert Path: [A, B], Value: "Data".
We simply create a new Leaf Node.
Node::Null => {
Node::Leaf {
key: path.to_vec(),
value,
}
}
Complexity: O(1).
Result: A new universe is born.
4. Case 2: The Fork in the Road (Branch)
We arrive at a Branch Node.
The path we want to insert is [A, B, C].
The Branch Node is a switchboard. It asks: "What is your first nibble?"
Answer: A (10).
We don't need to change the Branch Node itself. We just need to update Index 10.
We extract the child at Index 10, recursively call insert_at on it with the rest of the path ([B, C]), and plug the result back in.
Node::Branch { mut children, value: branch_value } => {
// Edge Case: If the path is empty, it means we are updating the value
// of the Branch Node itself (e.g. key "do" vs "dog").
if path.is_empty() {
Node::Branch {
children,
value: Some(value),
}
} else {
// Standard Case: Recurse into a child
let idx = path[0] as usize;
children[idx] = Box::new(Self::insert_at(
*children[idx].clone(),
&path[1..],
value,
));
Node::Branch {
children,
value: branch_value,
}
}
}
5. The Cliffhanger
Cases 1 and 2 are easy. They are "Add" and "Follow."
But what happens when we Collide?
Scenario:
- Existing Node:
Leaf { path: "dog", value: "puppy" } - Insert:
Key: "do", Value: "verb"
We are standing on a Leaf. The Leaf says "I am dog." We say "I am do."
We share a prefix (do), but we diverge.
We cannot just "add" to the Leaf. A Leaf cannot have children.
We have to Destroy the Leaf.
We have to take its atoms ("dog", "puppy") and rearrange them into a new structure:
Extension("do") -> Branch -> Leaf("g", "puppy").
This operation, The Split, is the heart of the Merkle Patricia Trie complexity. It requires precise nibble math and structural reallocation.
In the next chapter, we will perform this surgery. We will implement insert_into_leaf and insert_into_extension.
Chapter 11: The Collision (Splitting Nodes)
We are now deep in the operating room.
We have an existing node that claims ownership of a path.
- Existing Leaf:
Path: "dog",Value: "puppy"
We have an incoming insertion that conflicts with it. - Insert:
Key: "dot",Value: "point"
They share a prefix (do), but they diverge at the third letter (g vs t).
We cannot simply "add" dot to dog. A Leaf node cannot have children. It is terminal.
We must perform a Split Operation.
We need to take the existing Leaf, break it apart, and construct a new subgraph that preserves the shared history while separating the future.
The Goal Structure:
- Extension Node (
do) -> The shared history. - Branch Node -> The divergence point.
- Leaf A (
g) -> "puppy" (The old data). - Leaf B (
t) -> "point" (The new data).
1. The Common Prefix Logic
The first step in both Leaf and Extension collisions is to measure how much they agree.
// Inside insert_into_leaf or insert_into_extension
let match_len = path.common_prefix(&existing_path);
This match_len tells us where to make the cut.
- If
match_len == 0: We need a Branch immediately at the root. - If
match_len > 0: We need an Extension node to cover the shared part, followed by a Branch.
2. Splitting the Leaf (insert_into_leaf)
This is the code that handles the dog vs dot scenario.
Node::Leaf { key: leaf_key, value: leaf_value } => {
// Step 1: Find the Split Point
let common_len = Self::common_prefix_len(&leaf_key, path);
// CASE A: Perfect Match (Update)
// The key is identical to the existing leaf. Just update the value.
if common_len == leaf_key.len() && common_len == path.len() {
return Node::Leaf {
key: leaf_key,
value,
};
}
// CASE B: The Split
// We need to create a Branch Node at the point of divergence.
// 1. Create the empty Branch
let mut children: [Box<Node>; 16] = [
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
];
let mut branch_value = None;
// 2. Handle the Old Leaf Data
if common_len == leaf_key.len() {
// The old leaf ended exactly here. (e.g. Existing: "do", Insert: "dog")
// The old value goes into the Branch Value slot.
branch_value = Some(leaf_value);
} else {
// The old leaf continues past the split. (e.g. Existing: "dog", Insert: "do")
// We need to put the old data into a child of the branch.
// Determine the divergence nibble (e.g. 'g' in "dog")
let index = leaf_key[common_len] as usize;
// Create a shortened leaf for the remainder
let new_leaf = Node::Leaf {
key: leaf_key[common_len + 1..].to_vec(),
value: leaf_value,
};
children[index] = Box::new(new_leaf);
}
// 3. Handle the New Insert Data
if common_len == path.len() {
// The new key ends exactly here. (e.g. Insert: "do", Existing: "dog")
branch_value = Some(value);
} else {
// The new key continues. (e.g. Insert: "dot", Existing: "dog")
let index = path[common_len] as usize;
let new_leaf = Node::Leaf {
key: path[common_len + 1..].to_vec(),
value,
};
children[index] = Box::new(new_leaf);
}
// 4. Construct the Branch
let branch = Node::Branch {
children,
value: branch_value,
};
// 5. The "Extension Wrapper"
// If they shared a prefix (e.g. "do"), we must wrap the branch in an Extension.
if common_len > 0 {
Node::Extension {
prefix: path[..common_len].to_vec(),
next: Box::new(branch),
}
} else {
branch
}
}
This function is a masterpiece of case handling. It handles:
- Updates (Match == Length)
- Prefix Inserts (Match == Path Length)
- Superfix Inserts (Match == Leaf Length)
- Divergence (Match < Both Lengths)
3. Splitting the Extension (insert_into_extension)
Splitting an Extension is similar, but slightly trickier because an Extension doesn't hold a value. Instead, it holds a Next Node.
When we split an Extension, we might be cutting the bridge in half.
- Existing:
Extension("1234") -> Child - Insert:
Key "12"
We match 2 nibbles (12).
We must slice the Extension into:
Extension("12") -> Branch -> Extension("34") -> Child.
Node::Extension { prefix, next } => {
let common_len = Self::common_prefix_len(&prefix, path);
// CASE A: Full Traversal
// The new key matches the entire extension prefix.
// We just recurse into the 'next' node.
if common_len == prefix.len() {
let remaining = &path[common_len..];
return Node::Extension {
prefix,
next: Box::new(Self::insert_at(*next, remaining, value)),
};
}
// CASE B: The Split
// We diverged inside the extension path.
let shared = prefix[..common_len].to_vec();
let ext_remainder = &prefix[common_len..];
let path_remainder = &path[common_len..];
// 1. Create the Branch at the split point
let mut children: [Box<Node>; 16] = [
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null), Box::new(Node::Null),
];
// 2. Handle the "Old Path" (The rest of the extension)
if ext_remainder.len() == 1 {
// The extension ended right after this nibble.
children[ext_remainder[0] as usize] = next;
} else {
// We still have more extension path left.
children[ext_remainder[0] as usize] = Box::new(Node::Extension {
prefix: ext_remainder[1..].to_vec(),
next,
});
}
// 3. Handle the "New Path" (The inserted key)
if path_remainder.is_empty() {
// Insert key ends at the branch
let branch = Node::Branch {
children,
value: Some(value),
};
if common_len > 0 {
return Node::Extension {
prefix: shared,
next: Box::new(branch),
};
}
return branch;
} else {
// Insert key continues
children[path_remainder[0] as usize] = Box::new(Node::Leaf {
key: path_remainder[1..].to_vec(),
value,
});
}
// 4. Construct the Branch
let branch = Node::Branch {
children,
value: None,
};
// 5. Wrap in Extension (Shared Prefix)
if common_len > 0 {
Node::Extension {
prefix: shared,
next: Box::new(branch),
}
} else {
branch
}
}
4. And We're Done
We have now implemented all 4 cases of insertion.
- Null: Create Life.
- Branch: Follow the Path.
- Leaf: Split and Evolve.
- Extension: Cut and Bridge.
This recursive logic ensures that no matter what key you throw at the Trie, short, long, overlapping, distinct, the structure will morph to accommodate it perfectly.
And because every function returns a Node, the root hash of the tree naturally updates as the recursion unwinds.
We have a working, mutable, verify-in-memory database.
But there is one feature left. The feature that makes this useful for the actual blockchain.
A Full Node has the whole database. A Light Client (your phone) has nothing.
How does the Full Node prove to your phone that dog equals puppy?
It needs to send a Witness.
It needs to send a Merkle Proof.
In Part V, we build the final feature.
Next: Part V: The Witness (Merkle Proofs) - Learn how to generate cryptographic proofs for light clients.
Repository: github.com/bit2swaz/merkle-trie-rs
