| Priority | What It Does | Reth Component |
|---|---|---|
| 1. BAL → SR | Start state root computation immediately using BAL's final values | crates/trie/parallel/ |
| 2. BAL → Cache | Populate execution cache with exact slots from BAL | crates/engine/tree/...rs |
| 3. BAL → Parallel Exec | Group non-conflicting txs using BAL access info | crates/engine/tree/...essor/ |
The key insight: BAL transforms execution from discovery to verification. Instead of "execute to find out what happens," it becomes "verify that what BAL says happened is correct."
A Block Access List is a "cheat sheet" attached to each block that tells you exactly what state (accounts, storage slots) the block will touch and what the final values will be — before you execute anything.
Current Ethereum execution is blind:
┌─────────────────────────────────────────────────────────────────────┐
│ Block arrives with transactions │
│ │
│ tx1: Call contract 0xABC... │
│ tx2: Transfer to 0xDEF... │
│ tx3: Call contract 0x123... │
│ │
│ Question: What storage slots will these touch? │
│ Answer: ¯\_(ツ)_/¯ You have to execute to find out │
└─────────────────────────────────────────────────────────────────────┘
This creates three major bottlenecks that BAL solves.
A BAL is an RLP-encoded data structure appended to each block:
Block
├── Header (with new field: block_access_list_hash)
├── Transactions
├── Withdrawals
└── BlockAccessList ◄── NEW
├── Account 0xAAA...
│ ├── balance_changes: [(tx_idx: 1, value: 5 ETH), (tx_idx: 3, value: 4.5 ETH)]
│ ├── nonce_changes: [(tx_idx: 1, value: 42)]
│ ├── storage_writes: [(slot: 0x01, tx_idx: 2, value: 0x1234)]
│ └── storage_reads: [slot: 0x02, slot: 0x03]
├── Account 0xBBB...
│ └── ...
└── Account 0xCCC...
└── ...
Every access is tagged with a block access index indicating when it happened. This ordering lets you reconstruct exact state at any point during execution.
From EIP-7928, the RLP schema:
BlockAccessList := RLP([
AccountEntry_1,
AccountEntry_2,
...
])
AccountEntry := RLP([
address, # 20 bytes
storage_writes, # List of (slot, block_access_index, value)
storage_reads, # List of slots (no values — just "was read")
balance_changes, # List of (block_access_index, post_tx_balance)
nonce_changes, # List of (block_access_index, post_tx_nonce)
code_changes # List of (block_access_index, code_bytes)
])
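For readers who prefer types to schemas, here is a minimal Rust sketch of how these entries could be modeled in memory. The names are illustrative, not reth's actual types; only the shape mirrors the schema above.

```rust
// Illustrative in-memory model of the schema above (hypothetical names,
// not reth types). Indices are positions within the block.
type Address = [u8; 20];
type Slot = [u8; 32];
type Value = [u8; 32];

struct StorageWrite {
    slot: Slot,
    tx_idx: u64,  // block access index of the writing transaction
    value: Value, // post-transaction value
}

struct AccountEntry {
    address: Address,
    storage_writes: Vec<StorageWrite>,
    storage_reads: Vec<Slot>,          // read-only slots, no values
    balance_changes: Vec<(u64, u128)>, // (tx_idx, post-tx balance)
    nonce_changes: Vec<(u64, u64)>,    // (tx_idx, post-tx nonce)
    code_changes: Vec<(u64, Vec<u8>)>, // (tx_idx, new code bytes)
}

struct BlockAccessList {
    entries: Vec<AccountEntry>,
}
```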
Writes include the final value:
storage_writes: [
(slot: 0x01, tx_idx: 2, value: 0xABCD), # Slot 0x01 set to 0xABCD by tx 2
(slot: 0x01, tx_idx: 5, value: 0x0000), # Slot 0x01 zeroed by tx 5
]
Reads are just slot identifiers (value comes from pre-state or prior write):
storage_reads: [0x02, 0x03, 0x04] # These slots were read but not modified
Why the distinction?
- Writes: You need the value to compute state root
- Reads: You just need to know what to prefetch from DB
Without BAL: With BAL:
Execute tx1 ─┐ Parse BAL
│ │ │
▼ │ ▼
SLOAD 0x01 ──┼──► Cache miss! Know: [0x01, 0x02, 0x03, 0x07]
│ │ Fetch from disk │
▼ │ ▼
SLOAD 0x02 ──┼──► Cache miss! Prefetch ALL in parallel
│ │ Fetch from disk │
▼ │ ▼
Execute tx2 ─┘ Execute ──► 100% cache hits
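A minimal sketch of what that prefetch step could look like, reusing the illustrative types above and assuming a hypothetical `Db::fetch_slot` helper: collect every slot the BAL names and warm a shared cache in parallel before the first transaction runs.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical prefetcher: warm the cache with every slot the BAL names,
// one scoped thread per account entry, before execution starts.
fn prefetch_from_bal(
    bal: &BlockAccessList,
    db: &Db, // hypothetical handle to the state database
    cache: &Mutex<HashMap<(Address, Slot), Value>>,
) {
    thread::scope(|s| {
        for entry in &bal.entries {
            s.spawn(move || {
                // Reads and writes alike: execution will touch all of them.
                let slots = entry
                    .storage_reads
                    .iter()
                    .chain(entry.storage_writes.iter().map(|w| &w.slot));
                for slot in slots {
                    let value = db.fetch_slot(entry.address, *slot); // hypothetical DB call
                    cache.lock().unwrap().insert((entry.address, *slot), value);
                }
            });
        }
    });
}
```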
BAL tells you exactly which txs touch which state:
BAL Analysis:
┌─────────────────────────────────────────────────────────────────────┐
│ tx1: reads [0x01], writes [0x02] │
│ tx2: reads [0x05], writes [0x06] ──► DISJOINT! Parallelize │
│ tx3: reads [0x02], writes [0x07] ──► Conflicts with tx1 │
│ tx4: reads [0x08], writes [0x09] ──► DISJOINT! Parallelize │
└─────────────────────────────────────────────────────────────────────┘
Execution schedule:
Time ────────────────────────────►
┌──────┐ ┌──────┐
│ tx1 │ │ tx3 │ (sequential - tx3 reads tx1's write)
└──────┘ └──────┘
┌──────┐
│ tx2 │ (parallel with tx1)
└──────┘
┌──────┐
│ tx4 │ (parallel with tx1)
└──────┘
Per EIP-7928: 60-80% of transactions access disjoint state — huge parallelization opportunity.
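As a sketch of how an executor could exploit that disjointness, the scheduler below groups transactions into waves: each transaction lands in the first wave that comes after every earlier transaction it conflicts with, and every wave can then run fully in parallel. The `TxAccess` shape and the conflict rule are illustrative, not reth's scheduler.

```rust
use std::collections::HashSet;

type Address = [u8; 20];
type Slot = [u8; 32];

// Illustrative per-transaction access sets, derivable from the BAL.
struct TxAccess {
    reads: HashSet<(Address, Slot)>,
    writes: HashSet<(Address, Slot)>,
}

// Two txs conflict if either writes something the other reads or writes.
fn conflicts(a: &TxAccess, b: &TxAccess) -> bool {
    a.writes.iter().any(|k| b.reads.contains(k) || b.writes.contains(k))
        || b.writes.iter().any(|k| a.reads.contains(k))
}

// Greedy wave scheduling: wave 0 runs first, each wave is fully parallel.
fn schedule_waves(txs: &[TxAccess]) -> Vec<Vec<usize>> {
    let mut wave_of: Vec<usize> = Vec::with_capacity(txs.len());
    let mut waves: Vec<Vec<usize>> = Vec::new();
    for (idx, tx) in txs.iter().enumerate() {
        // A tx must run strictly after every earlier tx it conflicts with.
        let wave = (0..idx)
            .filter(|&i| conflicts(&txs[i], tx))
            .map(|i| wave_of[i] + 1)
            .max()
            .unwrap_or(0);
        if wave == waves.len() {
            waves.push(Vec::new());
        }
        waves[wave].push(idx);
        wave_of.push(wave);
    }
    waves
}
```

For the four-transaction example above, this puts tx1, tx2, and tx4 in wave 0 and tx3 in wave 1, matching the schedule shown.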
The state root only cares about what changed and final values. BAL has both:
Without BAL: With BAL:
Execute tx1 Parse BAL ──► Start SR immediately
│ │ │
Execute tx2 Execute tx1 │
│ │ │
Execute tx3 Execute tx2 SR computing
│ │ in parallel
... wait ... Execute tx3 │
│ ▼ ▼
▼ Verify SR matches ◄─┘
Start SR computation
│
▼
Finish SR
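The overlap in that diagram maps naturally onto scoped threads: start state-root work from the BAL while the block executes, then compare at the end. In the sketch below, `compute_state_root_from_bal`, `execute_block`, and the surrounding types are hypothetical stand-ins, not reth APIs.

```rust
use std::thread;

// Hypothetical overlap of SR computation and execution; every helper and
// type here is a stand-in for illustration only.
fn process_block(block: &Block, bal: &BlockAccessList) -> Result<(), BlockError> {
    let (bal_root, exec) = thread::scope(|s| {
        // Start SR computation immediately from the BAL's final values...
        let sr = s.spawn(|| compute_state_root_from_bal(bal));
        // ...while execution proceeds concurrently on this thread.
        let exec = execute_block(block);
        (sr.join().expect("SR thread panicked"), exec)
    });
    // The block is only valid if execution lands on the BAL-derived root.
    if exec?.state_root != bal_root? {
        return Err(BlockError::StateRootMismatch);
    }
    Ok(())
}
```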
BAL comes from the block builder. How do we trust it?
Answer: We don't — we verify.
fn validate_block_with_bal(block: &Block, bal: &BlockAccessList) -> Result<()> {
// 1. Verify BAL hash matches header
let computed_hash = keccak256(bal.rlp_encode());
assert_eq!(computed_hash, block.header.block_access_list_hash);
// 2. Execute block, collect actual accesses
let (result, actual_accesses) = execute_and_trace(block)?;
// 3. Verify BAL matches what actually happened
assert_eq!(actual_accesses, bal); // Must match exactly
Ok(())
}
If BAL is wrong:
- Missing entry → Block invalid
- Extra entry → Block invalid
- Wrong value → Block invalid
The builder must provide an accurate BAL or the block is rejected.
From EIP-7928 empirical data:
| Metric | Size |
|---|---|
| Average (compressed) | ~40 KiB |
| Worst case | < 1 MiB |
This is smaller than worst-case calldata today, so network overhead is acceptable.
┌─────────────────────────────────────────────────────────────────────┐
│ EXECUTOR │
│ (follower node, syncing node) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: Block + BAL │
│ KNOWS: Everything upfront │
│ CAN: Prefetch, parallelize, early SR │
│ TASK: Verify execution matches BAL │
│ │
│ ══════════════════════════════════════════════════════════════ │
│ BAL transforms execution from DISCOVERY to VERIFICATION │
│ ══════════════════════════════════════════════════════════════ │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ BUILDER │
│ (block producer, MEV searcher) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: Mempool transactions │
│ KNOWS: Nothing upfront │
│ MUST: Execute to discover accesses │
│ TASK: Create block AND generate BAL │
│ │
│ ══════════════════════════════════════════════════════════════ │
│ BAL provides ZERO benefit — builder has the REVERSE problem │
│ ══════════════════════════════════════════════════════════════ │
│ │
│ Sophisticated builders use: │
│ • Speculative execution with rollback │
│ • Conflict graphs for tx ordering │
│ • Parallel simulation + state merging │
│ │
└─────────────────────────────────────────────────────────────────────┘
BAL flows through the consensus-execution layer boundary:
Consensus Layer (Beacon Chain)
│
│ engine_newPayloadV5(payload, block_access_list)
▼
Execution Layer (reth)
│
├──► Validate BAL hash matches header
├──► Use BAL for prefetching
├──► Use BAL for parallel execution
├──► Use BAL for early SR
├──► Execute and verify BAL accuracy
│
▼
Return validity
New Engine API methods:
- `engine_newPayloadV5` — includes BAL in request
- `engine_getPayloadV6` — builder returns payload with BAL
Look at what ParallelStateRoot needs to start computing:
// From root.rs:87-93
let storage_root_targets = StorageRootTargets::new(
self.prefix_sets
.account_prefix_set
.iter()
.map(|nibbles| B256::from_slice(&nibbles.pack())), // Which accounts changed
self.prefix_sets.storage_prefix_sets, // Which storage slots changed
);
That's it. The SR computation only needs to know which accounts and storage slots changed.
From EIP-7928, a BAL contains exactly this information:
- All accounts accessed/modified
- All storage slots accessed/modified
- Post-execution values for everything
// BAL → TriePrefixSets is almost 1:1
fn bal_to_prefix_sets(bal: &BlockAccessList) -> TriePrefixSets {
let mut account_prefix_set = PrefixSetMut::default();
let mut storage_prefix_sets = B256Map::default();
let mut destroyed_accounts = B256Set::default();
for entry in bal.entries() {
let hashed_addr = keccak256(entry.address);
account_prefix_set.insert(Nibbles::unpack(hashed_addr));
if entry.is_destroyed() {
destroyed_accounts.insert(hashed_addr);
}
if !entry.storage_changes.is_empty() {
let mut storage_set = PrefixSetMut::default();
for slot in entry.storage_changes.keys() {
storage_set.insert(Nibbles::unpack(keccak256(slot)));
}
storage_prefix_sets.insert(hashed_addr, storage_set.freeze());
}
}
TriePrefixSets {
account_prefix_set: account_prefix_set.freeze(),
storage_prefix_sets,
destroyed_accounts,
}
}
That's basically the entire integration. Parse BAL → build TriePrefixSets → feed to the existing ParallelStateRoot.
┌─────────────────────────────────────────────────────────────────────┐
│ CURRENT FLOW │
└─────────────────────────────────────────────────────────────────────┘
Execute Block ──► Collect state changes ──► Build HashedPostState
│
▼
construct_prefix_sets()
│
▼
TriePrefixSets
│
▼
ParallelStateRoot::new()
│
▼
Compute SR
┌─────────────────────────────────────────────────────────────────────┐
│ WITH BAL │
└─────────────────────────────────────────────────────────────────────┘
Parse BAL ──► bal_to_prefix_sets() ──► TriePrefixSets
│
▼
ParallelStateRoot::new() ◄── SAME CODE
│
▼
Compute SR
(can happen BEFORE or IN PARALLEL with execution)
The downstream code (ParallelStateRoot) doesn't change at all. You just provide the TriePrefixSets from a different source.
- No algorithm changes: The trie walking, hash building, and parallel storage root computation all stay the same
- Interface already exists: TriePrefixSets is already the input type - BAL just provides an alternative source
- BAL has MORE info than needed: BAL includes final values, but for SR we only need "what changed" (the prefix sets). Less parsing required.
- Already parallel: The spawn_blocking infrastructure for parallel storage roots just works - you're giving it the same data structure
Compare this to Priority 2 (cache population) which requires hooking into the execution cache system, or Priority 3 (parallel execution) which requires conflict detection and transaction grouping - those touch many more components.
It's simple because:
- The interface already exists (TriePrefixSets)
- BAL provides a superset of what's needed
- Zero changes to SR computation logic - just a new data source
- One small conversion function bridges BAL → existing types
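Concretely, the swap can be pictured as choosing where the prefix sets come from. The function below is a sketch that assumes the `bal_to_prefix_sets` helper above plus a hypothetical `block.block_access_list()` accessor; the fallback path is the existing `construct_prefix_sets()` flow.

```rust
// Hypothetical wiring: same downstream ParallelStateRoot, different
// prefix-set source depending on whether the block carries a BAL.
fn prefix_sets_for(block: &Block, hashed_post_state: &HashedPostState) -> TriePrefixSets {
    match block.block_access_list() {
        // New source: derive targets straight from the BAL, available
        // before (or in parallel with) execution.
        Some(bal) => bal_to_prefix_sets(bal),
        // Current source: derive targets from the executed block's changes.
        None => hashed_post_state.construct_prefix_sets(),
    }
}
```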
This section covers how to achieve conflict-free parallel execution using BAL.
┌─────────────────────────────────────────────────────────────────────┐
│ TRADITIONAL EXECUTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ tx1 ──► tx2 ──► tx3 ──► tx4 ──► tx5 │
│ │
│ Each tx commits state, next tx reads committed state │
│ MUST be sequential — don't know dependencies │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ BAL-ENABLED PARALLEL EXECUTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ BAL tells us: tx1 writes slot A, tx3 reads slot A │
│ tx2, tx4, tx5 are independent │
│ │
│ ┌─────┐ │
│ │ tx1 │────────────┐ │
│ └─────┘ ▼ │
│ ┌─────┐ ┌─────┐ │
│ │ tx2 │ │ tx3 │ (tx3 gets tx1's output as input) │
│ └─────┘ └─────┘ │
│ ┌─────┐ │
│ │ tx4 │ │
│ └─────┘ │
│ ┌─────┐ │
│ │ tx5 │ │
│ └─────┘ │
│ │
│ tx2, tx4, tx5 run in parallel with tx1 │
│ │
└─────────────────────────────────────────────────────────────────────┘
The key insight is that each transaction can be modeled as a pure function:
Transaction = f(input_set) → output_set
For transaction N, the input is:
struct TxInput {
// Parent state (from DB) for slots NOT modified by earlier txs
parent_state: HashMap<(Address, Slot), Value>,
// Values from earlier txs in THIS block that we depend on
intra_block_dependencies: HashMap<(Address, Slot), Value>,
// Account state (balance, nonce) as modified by earlier txs
account_state: HashMap<Address, Account>,
}
struct TxOutput {
// Storage slots we wrote
storage_writes: HashMap<(Address, Slot), Value>,
// Account changes (balance, nonce, code)
account_changes: HashMap<Address, AccountDiff>,
// Storage slots we read (but didn't write)
storage_reads: HashSet<(Address, Slot)>,
}
BAL tracks changes with transaction indices:
BAL Entry for Account 0xAAA:
storage_writes:
- slot 0x01: tx_idx=2, value=0x1234 ◄── tx2 wrote this
- slot 0x01: tx_idx=5, value=0x5678 ◄── tx5 overwrote it
- slot 0x02: tx_idx=3, value=0xABCD ◄── tx3 wrote this
balance_changes:
- tx_idx=1, value=5 ETH ◄── tx1 changed balance
- tx_idx=4, value=4.5 ETH ◄── tx4 changed balance
From this, you can derive:
- tx3's input for slot 0x01: value written by tx2 (0x1234)
- tx5's input for slot 0x01: value written by tx2 (0x1234) — tx5 sees tx2's write
- tx4's input for balance: value from tx1 (5 ETH)
The approach is "folding the account changeset with transaction indexes":
fn derive_tx_inputs(bal: &BlockAccessList, parent_state: &State, num_txs: usize) -> Vec<TxInput> {
    let mut tx_inputs = vec![TxInput::default(); num_txs];
    for account_entry in bal.entries() {
        let account = account_entry.address;
        // Track the "current" value as we process tx indices
        let mut current_values: HashMap<Slot, Value> = HashMap::new();
        // Sort writes by tx index
        let mut sorted_writes = account_entry.storage_writes.clone();
        sorted_writes.sort_by_key(|w| w.tx_idx);
        for write in sorted_writes {
            // What was the value BEFORE this tx wrote?
            let input_value = current_values
                .get(&write.slot)
                .cloned()
                .unwrap_or_else(|| parent_state.get(account, write.slot));
            // This tx's input includes this slot's prior value
            tx_inputs[write.tx_idx].insert(account, write.slot, input_value);
            // Update current value for subsequent txs
            current_values.insert(write.slot, write.value);
        }
    }
    tx_inputs
}
A worked example:
Parent state: slot 0x01 = 100
BAL writes for slot 0x01:
tx2: value=200
tx5: value=300
Derived inputs:
┌─────────────────────────────────────────────────────────────────────┐
│ tx1: input[0x01] = 100 (from parent) │
│ tx2: input[0x01] = 100 (from parent, tx2 is first to write) │
│ tx3: input[0x01] = 200 (from tx2's write) │
│ tx4: input[0x01] = 200 (from tx2's write) │
│ tx5: input[0x01] = 200 (from tx2's write, tx5 will overwrite) │
└─────────────────────────────────────────────────────────────────────┘
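To sanity-check that table, here is a tiny self-contained re-derivation for slot 0x01, with plain integers standing in for slots and values.

```rust
// Fold BAL writes in tx-index order; each tx's input for the slot is whatever
// the slot held just before that tx ran.
fn main() {
    let parent_value: u64 = 100;
    let bal_writes: &[(usize, u64)] = &[(2, 200), (5, 300)]; // (tx_idx, post-value)

    let mut current = parent_value;
    let mut pending = bal_writes.iter().peekable();
    for tx_idx in 1..=5 {
        // Input for this tx is the value before its own (possible) write.
        println!("tx{tx_idx}: input[0x01] = {current}");
        // Apply this tx's write (if any) so later txs see it.
        if let Some(&&(w_idx, value)) = pending.peek() {
            if w_idx == tx_idx {
                current = value;
                pending.next();
            }
        }
    }
    // Prints 100, 100, 200, 200, 200 for tx1..tx5, matching the table.
}
```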
The critical part — we don't trust the BAL, we verify it:
fn validate_tx_against_bal(
    tx_idx: usize,
    actual_output: &TxOutput,
    expected_from_bal: &TxOutput,
) -> Result<(), InvalidBal> {
    // 1. All storage writes must match
    for (slot, expected_value) in &expected_from_bal.storage_writes {
        let actual_value = actual_output.storage_writes.get(slot)
            .ok_or(InvalidBal::MissingWrite(*slot))?;
        if actual_value != expected_value {
            return Err(InvalidBal::ValueMismatch {
                slot: *slot,
                expected: expected_value.clone(),
                actual: actual_value.clone(),
            });
        }
    }
    // 2. No unexpected writes
    for slot in actual_output.storage_writes.keys() {
        if !expected_from_bal.storage_writes.contains_key(slot) {
            return Err(InvalidBal::SpuriousWrite(*slot));
        }
    }
// 3. Account modifications must match
// (similar checks for balance, nonce, code)
// 4. Reads validation (caveat - see below)
// ...
Ok(())
}
This is subtle and important:
Problem: BAL tracks reads at the account level, not per-transaction.
BAL Entry for Account 0xAAA:
storage_reads: [0x05, 0x06, 0x07] ◄── No tx index! Just "these were read"
Why this matters: We can't validate reads per-transaction. We can only validate that all reads across the entire block are accounted for.
fn validate_reads(bal: &BlockAccessList, all_tx_outputs: &[TxOutput]) -> Result<(), InvalidBal> {
    // Collect ALL reads from ALL transactions
    let actual_reads: HashSet<(Address, Slot)> = all_tx_outputs
        .iter()
        .flat_map(|output| output.storage_reads.iter().copied())
        .collect();
    // Collect expected reads from BAL
    let expected_reads: HashSet<(Address, Slot)> = bal.entries()
        .flat_map(|entry| {
            entry.storage_reads.iter().map(move |slot| (entry.address, *slot))
        })
        .collect();
    // Must match exactly
    if actual_reads != expected_reads {
        return Err(InvalidBal::ReadsMismatch);
    }
    Ok(())
}
Implication: Read validation must wait until all transactions complete.
Parallel execution can be achieved by using multiple instances of the BlockExecutor, like a pool of executors:
┌─────────────────────────────────────────────────────────────────────┐
│ EXECUTOR POOL │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor 4 │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ tx1 │ │ tx2 │ │ tx4 │ │ tx5 │ │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
│ │ │
│ ▼ │
│ ┌─────┐ │
│ │ tx3 │ (waits for tx1's output as input) │
│ └─────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key insight: "Transaction changes don't need to be committed across transactions"
Each executor works in isolation:
- Gets its `TxInput` derived from BAL
- Executes the transaction
- Produces `TxOutput`
- Does NOT commit to shared state
- Validation compares output to BAL expectations
No shared mutable state between executors = no locks = true parallelism.
1. PARSE BAL
└── Extract all account entries with indexed changes
2. DERIVE TX INPUTS
└── For each tx, compute its input set from:
• Parent state (for slots not modified by earlier txs)
• Earlier tx outputs (for slots modified in this block)
3. PARALLEL EXECUTION
└── Spawn executor pool
└── Each executor gets: (transaction, TxInput)
└── Each executor produces: TxOutput
└── No cross-executor state sharing
4. VALIDATE OUTPUTS
└── For each tx:
• Storage writes match BAL
• Account changes match BAL
• No spurious writes
└── After all txs:
• Validate all reads are accounted for
5. IF VALID
└── Block is valid, apply final state
IF INVALID
└── Block rejected, BAL was incorrect
The beauty: Execution becomes embarrassingly parallel because each transaction is a pure function with known inputs and expected outputs.
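A minimal sketch of steps 2 to 4 under that model: because every `TxInput` is pre-derived from the BAL, all transactions can be launched at once, each on its own thread with no shared mutable state, and each output is then checked against the BAL. `Transaction` and `execute_tx` are stand-ins, not reth's BlockExecutor.

```rust
use std::thread;

// Stand-in executor pool: one scoped thread per transaction, no cross-executor
// state sharing, per-tx validation against BAL expectations afterwards.
fn execute_block_in_parallel(
    txs: &[Transaction],
    tx_inputs: &[TxInput],         // step 2: derived from the BAL
    expected_outputs: &[TxOutput], // per-tx expectations folded from the BAL
) -> Result<Vec<TxOutput>, InvalidBal> {
    // Step 3: every executor works in isolation on (transaction, TxInput).
    let actual_outputs: Vec<TxOutput> = thread::scope(|s| {
        let handles: Vec<_> = txs
            .iter()
            .zip(tx_inputs)
            .map(|(tx, input)| s.spawn(move || execute_tx(tx, input))) // hypothetical pure executor
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("executor panicked"))
            .collect()
    });

    // Step 4: storage writes and account changes must match the BAL per tx.
    for (idx, (actual, expected)) in actual_outputs.iter().zip(expected_outputs).enumerate() {
        validate_tx_against_bal(idx, actual, expected)?;
    }
    // Block-level read validation happens after all txs (see the caveat above).
    Ok(actual_outputs)
}
```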
BAL helps executors but provides ZERO benefit to builders — it's extra work.
| | Executor | Builder |
|---|---|---|
| Has | Block + BAL (given) | Mempool of unordered txs |
| BAL is | INPUT → use to optimize | OUTPUT → must generate |
| Benefit | Prefetch, parallelize, early SR | None |
Executors know what will be accessed → can optimize
Builders must discover what will be accessed → can't optimize the same way
MEMPOOL:
┌───────────────────────────────────────────────────────────┐
│ tx1: Swap on Uniswap (touches pool state) │
│ tx2: Swap on Uniswap (SAME pool) ← CONFLICT │
│ tx3: Transfer ETH (independent) │
│ tx4: Arbitrage (Uniswap + Sushiswap) │
└───────────────────────────────────────────────────────────┘
BUILDER MUST:
1. Execute to discover conflicts (no shortcuts)
2. Order optimally (maximize MEV/tips)
3. Handle failures gracefully
4. Do this in <12 seconds
5. Track all accesses + encode BAL (extra overhead)
BAL DOESN'T HELP WITH ANY OF THIS
Beaverbuild is one of the two dominant Ethereum block builders (alongside Titan). Together they build ~90% of Ethereum blocks.
These builders solve the problem BAL can't help with:
"Given a mempool of transactions, find the optimal ordering that maximizes value while handling conflicts"
| Technique | Description |
|---|---|
| Speculative Execution | Execute txs in parallel, detect conflicts, rollback |
| Conflict Graphs | Build dependency graph, find optimal ordering |
| Parallel Simulation | Test many orderings simultaneously |
| Incremental State | Track diffs efficiently across simulations |
| MEV Extraction | Optimize for maximum extractable value |
Beaverbuild is partnering with Flashbots on BuilderNet — a decentralized block building network using TEEs (Trusted Execution Environments) to reduce block building centralization.
Simple node:           Execute blocks, use BAL for speedup ✓
Sophisticated builder: Build blocks, generate BAL as overhead,
                       get no BAL benefits during building
To build blocks competitively, you need Beaverbuild-level infrastructure regardless of BAL. BAL just adds overhead — extra work with no building-time benefit.
┌─────────────────────────────────────────────────────────────────────┐
│ BLOCK ACCESS LIST (BAL) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ WHAT: RLP-encoded manifest of all state access in a block │
│ │
│ CONTAINS: │
│ • Every account touched │
│ • Every storage slot read or written │
│ • Post-transaction values for all changes │
│ • Ordering via block access index │
│ │
│ ENABLES: │
│ 1. Perfect I/O prefetching (know exactly what to fetch) │
│ 2. Parallel tx execution (know which txs are independent) │
│ 3. Early state root computation (know final values upfront) │
│ │
│ TRUST MODEL: │
│ • Builder provides BAL │
│ • Executor verifies by executing │
│ • Mismatch = invalid block │
│ │
│ OVERHEAD: │
│ • ~40 KiB average (compressed) │
│ • < 1 MiB worst case │
│ • Acceptable given 10-100x execution speedup potential │
│ │
└─────────────────────────────────────────────────────────────────────┘