| Priority | What It Does | Reth Component |
|---|---|---|
| 1. BAL → SR | Start state root computation immediately using BAL's final values | crates/trie/parallel/ |
| 2. BAL → Cache | Populate execution cache with exact slots from BAL | crates/engine/tree/...rs |
| 3. BAL → Parallel Exec | Group non-conflicting txs using BAL access info | crates/engine/tree/...essor/ |
The key insight: BAL transforms execution from discovery to verification. Instead of "execute to find out what happens," it becomes "verify that what BAL says happened is correct."
A Block Access List is a "cheat sheet" attached to each block that tells you exactly what state (accounts, storage slots) the block will touch and what the final values will be — before you execute anything.
Current Ethereum execution is blind:
┌─────────────────────────────────────────────────────────────────────┐
│ Block arrives with transactions │
│ │
│ tx1: Call contract 0xABC... │
│ tx2: Transfer to 0xDEF... │
│ tx3: Call contract 0x123... │
│ │
│ Question: What storage slots will these touch? │
│ Answer: ¯\_(ツ)_/¯ You have to execute to find out │
└─────────────────────────────────────────────────────────────────────┘
This creates three major bottlenecks that BAL solves.
A BAL is an RLP-encoded data structure appended to each block:
Block
├── Header (with new field: block_access_list_hash)
├── Transactions
├── Withdrawals
└── BlockAccessList ◄── NEW
├── Account 0xAAA...
│ ├── balance_changes: [(tx_idx: 1, value: 5 ETH), (tx_idx: 3, value: 4.5 ETH)]
│ ├── nonce_changes: [(tx_idx: 1, value: 42)]
│ ├── storage_writes: [(slot: 0x01, tx_idx: 2, value: 0x1234)]
│ └── storage_reads: [slot: 0x02, slot: 0x03]
├── Account 0xBBB...
│ └── ...
└── Account 0xCCC...
└── ...
Every access is tagged with a block access index indicating when it happened. This ordering lets you reconstruct exact state at any point during execution.
From EIP-7928, the RLP schema:
BlockAccessList := RLP([
AccountEntry_1,
AccountEntry_2,
...
])
AccountEntry := RLP([
address, # 20 bytes
storage_writes, # List of (slot, block_access_index, value)
storage_reads, # List of slots (no values — just "was read")
balance_changes, # List of (block_access_index, post_tx_balance)
nonce_changes, # List of (block_access_index, post_tx_nonce)
code_changes # List of (block_access_index, code_bytes)
])
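For readers who prefer types to schemas, here is a minimal Rust sketch of how these entries could be modeled in memory. The names are illustrative, not reth's actual types; only the shape mirrors the schema above.

```rust
// Illustrative in-memory model of the schema above (hypothetical names,
// not reth types). Indices are positions within the block.
type Address = [u8; 20];
type Slot = [u8; 32];
type Value = [u8; 32];

struct StorageWrite {
    slot: Slot,
    tx_idx: u64,  // block access index of the writing transaction
    value: Value, // post-transaction value
}

struct AccountEntry {
    address: Address,
    storage_writes: Vec<StorageWrite>,
    storage_reads: Vec<Slot>,          // read-only slots, no values
    balance_changes: Vec<(u64, u128)>, // (tx_idx, post-tx balance)
    nonce_changes: Vec<(u64, u64)>,    // (tx_idx, post-tx nonce)
    code_changes: Vec<(u64, Vec<u8>)>, // (tx_idx, new code bytes)
}

struct BlockAccessList {
    entries: Vec<AccountEntry>,
}
```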
Writes include the final value:
storage_writes: [
(slot: 0x01, tx_idx: 2, value: 0xABCD), # Slot 0x01 set to 0xABCD by tx 2
(slot: 0x01, tx_idx: 5, value: 0x0000), # Slot 0x01 zeroed by tx 5
]
Reads are just slot identifiers (value comes from pre-state or prior write):
storage_reads: [0x02, 0x03, 0x04] # These slots were read but not modified
Why the distinction?
- Writes: You need the value to compute state root
- Reads: You just need to know what to prefetch from DB
Without BAL: With BAL:
Execute tx1 ─┐ Parse BAL
│ │ │
▼ │ ▼
SLOAD 0x01 ──┼──► Cache miss! Know: [0x01, 0x02, 0x03, 0x07]
│ │ Fetch from disk │
▼ │ ▼
SLOAD 0x02 ──┼──► Cache miss! Prefetch ALL in parallel
│ │ Fetch from disk │
▼ │ ▼
Execute tx2 ─┘ Execute ──► 100% cache hits
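A minimal sketch of what that prefetch step could look like, reusing the illustrative types above and assuming a hypothetical `Db::fetch_slot` helper: collect every slot the BAL names and warm a shared cache in parallel before the first transaction runs.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

// Hypothetical prefetcher: warm the cache with every slot the BAL names,
// one scoped thread per account entry, before execution starts.
fn prefetch_from_bal(
    bal: &BlockAccessList,
    db: &Db, // hypothetical handle to the state database
    cache: &Mutex<HashMap<(Address, Slot), Value>>,
) {
    thread::scope(|s| {
        for entry in &bal.entries {
            s.spawn(move || {
                // Reads and writes alike: execution will touch all of them.
                let slots = entry
                    .storage_reads
                    .iter()
                    .chain(entry.storage_writes.iter().map(|w| &w.slot));
                for slot in slots {
                    let value = db.fetch_slot(entry.address, *slot); // hypothetical DB call
                    cache.lock().unwrap().insert((entry.address, *slot), value);
                }
            });
        }
    });
}
```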
BAL tells you exactly which txs touch which state:
BAL Analysis:
┌─────────────────────────────────────────────────────────────────────┐
│ tx1: reads [0x01], writes [0x02] │
│ tx2: reads [0x05], writes [0x06] ──► DISJOINT! Parallelize │
│ tx3: reads [0x02], writes [0x07] ──► Conflicts with tx1 │
│ tx4: reads [0x08], writes [0x09] ──► DISJOINT! Parallelize │
└─────────────────────────────────────────────────────────────────────┘
Execution schedule:
Time ────────────────────────────►
┌──────┐ ┌──────┐
│ tx1 │ │ tx3 │ (sequential - tx3 reads tx1's write)
└──────┘ └──────┘
┌──────┐
│ tx2 │ (parallel with tx1)
└──────┘
┌──────┐
│ tx4 │ (parallel with tx1)
└──────┘
Per EIP-7928: 60-80% of transactions access disjoint state — huge parallelization opportunity.
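As a sketch of how an executor could exploit that disjointness, the scheduler below groups transactions into waves: each transaction lands in the first wave that comes after every earlier transaction it conflicts with, and every wave can then run fully in parallel. The `TxAccess` shape and the conflict rule are illustrative, not reth's scheduler.

```rust
use std::collections::HashSet;

type Address = [u8; 20];
type Slot = [u8; 32];

// Illustrative per-transaction access sets, derivable from the BAL.
struct TxAccess {
    reads: HashSet<(Address, Slot)>,
    writes: HashSet<(Address, Slot)>,
}

// Two txs conflict if either writes something the other reads or writes.
fn conflicts(a: &TxAccess, b: &TxAccess) -> bool {
    a.writes.iter().any(|k| b.reads.contains(k) || b.writes.contains(k))
        || b.writes.iter().any(|k| a.reads.contains(k))
}

// Greedy wave scheduling: wave 0 runs first, each wave is fully parallel.
fn schedule_waves(txs: &[TxAccess]) -> Vec<Vec<usize>> {
    let mut wave_of: Vec<usize> = Vec::with_capacity(txs.len());
    let mut waves: Vec<Vec<usize>> = Vec::new();
    for (idx, tx) in txs.iter().enumerate() {
        // A tx must run strictly after every earlier tx it conflicts with.
        let wave = (0..idx)
            .filter(|&i| conflicts(&txs[i], tx))
            .map(|i| wave_of[i] + 1)
            .max()
            .unwrap_or(0);
        if wave == waves.len() {
            waves.push(Vec::new());
        }
        waves[wave].push(idx);
        wave_of.push(wave);
    }
    waves
}
```

For the four-transaction example above, this puts tx1, tx2, and tx4 in wave 0 and tx3 in wave 1, matching the schedule shown.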
The state root only cares about what changed and final values. BAL has both:
Without BAL: With BAL:
Execute tx1 Parse BAL ──► Start SR immediately
│ │ │
Execute tx2 Execute tx1 │
│ │ │
Execute tx3 Execute tx2 SR computing
│ │ in parallel
... wait ... Execute tx3 │
│ ▼ ▼
▼ Verify SR matches ◄─┘
Start SR computation
│
▼
Finish SR
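The overlap in that diagram maps naturally onto scoped threads: start state-root work from the BAL while the block executes, then compare at the end. In the sketch below, `compute_state_root_from_bal`, `execute_block`, and the surrounding types are hypothetical stand-ins, not reth APIs.

```rust
use std::thread;

// Hypothetical overlap of SR computation and execution; every helper and
// type here is a stand-in for illustration only.
fn process_block(block: &Block, bal: &BlockAccessList) -> Result<(), BlockError> {
    let (bal_root, exec) = thread::scope(|s| {
        // Start SR computation immediately from the BAL's final values...
        let sr = s.spawn(|| compute_state_root_from_bal(bal));
        // ...while execution proceeds concurrently on this thread.
        let exec = execute_block(block);
        (sr.join().expect("SR thread panicked"), exec)
    });
    // The block is only valid if execution lands on the BAL-derived root.
    if exec?.state_root != bal_root? {
        return Err(BlockError::StateRootMismatch);
    }
    Ok(())
}
```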
BAL comes from the block builder. How do we trust it?
Answer: We don't — we verify.
fn validate_block_with_bal(block: &Block, bal: &BlockAccessList) -> Result<()> {
// 1. Verify BAL hash matches header
let computed_hash = keccak256(bal.rlp_encode());
assert_eq!(computed_hash, block.header.block_access_list_hash);
// 2. Execute block, collect actual accesses
let (result, actual_accesses) = execute_and_trace(block)?;
// 3. Verify BAL matches what actually happened
assert_eq!(actual_accesses, bal); // Must match exactly
Ok(())
}
If BAL is wrong:
- Missing entry → Block invalid
- Extra entry → Block invalid
- Wrong value → Block invalid
The builder must provide an accurate BAL or the block is rejected.
From EIP-7928 empirical data:
| Metric | Size |
|---|---|
| Average (compressed) | ~40 KiB |
| Worst case | < 1 MiB |
This is smaller than worst-case calldata today, so network overhead is acceptable.
┌─────────────────────────────────────────────────────────────────────┐
│ EXECUTOR │
│ (follower node, syncing node) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: Block + BAL │
│ KNOWS: Everything upfront │
│ CAN: Prefetch, parallelize, early SR │
│ TASK: Verify execution matches BAL │
│ │
│ ══════════════════════════════════════════════════════════════ │
│ BAL transforms execution from DISCOVERY to VERIFICATION │
│ ══════════════════════════════════════════════════════════════ │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ BUILDER │
│ (block producer, MEV searcher) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ INPUT: Mempool transactions │
│ KNOWS: Nothing upfront │
│ MUST: Execute to discover accesses │
│ TASK: Create block AND generate BAL │
│ │
│ ══════════════════════════════════════════════════════════════ │
│ BAL provides ZERO benefit — builder has the REVERSE problem │
│ ══════════════════════════════════════════════════════════════ │
│ │
│ Sophisticated builders use: │
│ • Speculative execution with rollback │
│ • Conflict graphs for tx ordering │
│ • Parallel simulation + state merging │
│ │
└─────────────────────────────────────────────────────────────────────┘
BAL flows through the consensus-execution layer boundary:
Consensus Layer (Beacon Chain)
│
│ engine_newPayloadV5(payload, block_access_list)
▼
Execution Layer (reth)
│
├──► Validate BAL hash matches header
├──► Use BAL for prefetching
├──► Use BAL for parallel execution
├──► Use BAL for early SR
├──► Execute and verify BAL accuracy
│
▼
Return validity
New Engine API methods:
- `engine_newPayloadV5` — includes BAL in request
- `engine_getPayloadV6` — builder returns payload with BAL
Look at what ParallelStateRoot needs to start computing:
// From root.rs:87-93
let storage_root_targets = StorageRootTargets::new(
self.prefix_sets
.account_prefix_set
.iter()
.map(|nibbles| B256::from_slice(&nibbles.pack())), // Which accounts changed
self.prefix_sets.storage_prefix_sets, // Which storage slots changed
);
That's it. The SR computation only needs to know which accounts and storage slots changed.
From EIP-7928, a BAL contains exactly this information:
- All accounts accessed/modified
- All storage slots accessed/modified
- Post-execution values for everything
// BAL → TriePrefixSets is almost 1:1
fn bal_to_prefix_sets(bal: &BlockAccessList) -> TriePrefixSets {
let mut account_prefix_set = PrefixSetMut::default();
let mut storage_prefix_sets = B256Map::default();
let mut destroyed_accounts = B256Set::default();
for entry in bal.entries() {
let hashed_addr = keccak256(entry.address);
account_prefix_set.insert(Nibbles::unpack(hashed_addr));
if entry.is_destroyed() {
destroyed_accounts.insert(hashed_addr);
}
if !entry.storage_changes.is_empty() {
let mut storage_set = PrefixSetMut::default();
for slot in entry.storage_changes.keys() {
storage_set.insert(Nibbles::unpack(keccak256(slot)));
}
storage_prefix_sets.insert(hashed_addr, storage_set.freeze());
}
}
TriePrefixSets {
account_prefix_set: account_prefix_set.freeze(),
storage_prefix_sets,
destroyed_accounts,
}
}
That's basically the entire integration. Parse BAL → build TriePrefixSets → feed to the existing ParallelStateRoot.
┌─────────────────────────────────────────────────────────────────────┐
│ CURRENT FLOW │
└─────────────────────────────────────────────────────────────────────┘
Execute Block ──► Collect state changes ──► Build HashedPostState
│
▼
construct_prefix_sets()
│
▼
TriePrefixSets
│
▼
ParallelStateRoot::new()
│
▼
Compute SR
┌─────────────────────────────────────────────────────────────────────┐
│ WITH BAL │
└─────────────────────────────────────────────────────────────────────┘
Parse BAL ──► bal_to_prefix_sets() ──► TriePrefixSets
│
▼
ParallelStateRoot::new() ◄── SAME CODE
│
▼
Compute SR
(can happen BEFORE or IN PARALLEL with execution)
The downstream code (ParallelStateRoot) doesn't change at all. You just provide the TriePrefixSets from a different source.
- No algorithm changes: The trie walking, hash building, and parallel storage root computation all stay the same
- Interface already exists: TriePrefixSets is already the input type - BAL just provides an alternative source
- BAL has MORE info than needed: BAL includes final values, but for SR we only need "what changed" (the prefix sets). Less parsing required.
- Already parallel: The spawn_blocking infrastructure for parallel storage roots just works - you're giving it the same data structure
Compare this to Priority 2 (cache population) which requires hooking into the execution cache system, or Priority 3 (parallel execution) which requires conflict detection and transaction grouping - those touch many more components.
It's simple because:
- The interface already exists (TriePrefixSets)
- BAL provides a superset of what's needed
- Zero changes to SR computation logic - just a new data source
- One small conversion function bridges BAL → existing types
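Concretely, the swap can be pictured as choosing where the prefix sets come from. The function below is a sketch that assumes the `bal_to_prefix_sets` helper above plus a hypothetical `block.block_access_list()` accessor; the fallback path is the existing `construct_prefix_sets()` flow.

```rust
// Hypothetical wiring: same downstream ParallelStateRoot, different
// prefix-set source depending on whether the block carries a BAL.
fn prefix_sets_for(block: &Block, hashed_post_state: &HashedPostState) -> TriePrefixSets {
    match block.block_access_list() {
        // New source: derive targets straight from the BAL, available
        // before (or in parallel with) execution.
        Some(bal) => bal_to_prefix_sets(bal),
        // Current source: derive targets from the executed block's changes.
        None => hashed_post_state.construct_prefix_sets(),
    }
}
```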
This section covers how to achieve conflict-free parallel execution using BAL.
┌─────────────────────────────────────────────────────────────────────┐
│ TRADITIONAL EXECUTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ tx1 ──► tx2 ──► tx3 ──► tx4 ──► tx5 │
│ │
│ Each tx commits state, next tx reads committed state │
│ MUST be sequential — don't know dependencies │
│ │
└─────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────────┐
│ BAL-ENABLED PARALLEL EXECUTION │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ BAL tells us: tx1 writes slot A, tx3 reads slot A │
│ tx2, tx4, tx5 are independent │
│ │
│ ┌─────┐ │
│ │ tx1 │────────────┐ │
│ └─────┘ ▼ │
│ ┌─────┐ ┌─────┐ │
│ │ tx2 │ │ tx3 │ (tx3 gets tx1's output as input) │
│ └─────┘ └─────┘ │
│ ┌─────┐ │
│ │ tx4 │ │
│ └─────┘ │
│ ┌─────┐ │
│ │ tx5 │ │
│ └─────┘ │
│ │
│ tx2, tx4, tx5 run in parallel with tx1 │
│ │
└─────────────────────────────────────────────────────────────────────┘
The key insight is that each transaction can be modeled as a pure function:
Transaction = f(input_set) → output_set
For transaction N, the input is:
struct TxInput {
// Parent state (from DB) for slots NOT modified by earlier txs
parent_state: HashMap<(Address, Slot), Value>,
// Values from earlier txs in THIS block that we depend on
intra_block_dependencies: HashMap<(Address, Slot), Value>,
// Account state (balance, nonce) as modified by earlier txs
account_state: HashMap<Address, Account>,
}
struct TxOutput {
// Storage slots we wrote
storage_writes: HashMap<(Address, Slot), Value>,
// Account changes (balance, nonce, code)
account_changes: HashMap<Address, AccountDiff>,
// Storage slots we read (but didn't write)
storage_reads: HashSet<(Address, Slot)>,
}
BAL tracks changes with transaction indices:
BAL Entry for Account 0xAAA:
storage_writes:
- slot 0x01: tx_idx=2, value=0x1234 ◄── tx2 wrote this
- slot 0x01: tx_idx=5, value=0x5678 ◄── tx5 overwrote it
- slot 0x02: tx_idx=3, value=0xABCD ◄── tx3 wrote this
balance_changes:
- tx_idx=1, value=5 ETH ◄── tx1 changed balance
- tx_idx=4, value=4.5 ETH ◄── tx4 changed balance
From this, you can derive:
- tx3's input for slot 0x01: value written by tx2 (0x1234)
- tx5's input for slot 0x01: value written by tx2 (0x1234) — tx5 sees tx2's write
- tx4's input for balance: value from tx1 (5 ETH)
The approach is "folding the account changeset with transaction indexes":
fn derive_tx_inputs(bal: &BlockAccessList, parent_state: &State, num_txs: usize) -> Vec<TxInput> {
    let mut tx_inputs = vec![TxInput::default(); num_txs];
    for account_entry in bal.entries() {
        let account = account_entry.address;
        // Track the "current" value as we process tx indices
        let mut current_values: HashMap<Slot, Value> = HashMap::new();
        // Sort writes by tx index
        let mut sorted_writes = account_entry.storage_writes.clone();
        sorted_writes.sort_by_key(|w| w.tx_idx);
        for write in sorted_writes {
            // What was the value BEFORE this tx wrote?
            let input_value = current_values
                .get(&write.slot)
                .cloned()
                .unwrap_or_else(|| parent_state.get(account, write.slot));
            // This tx's input includes this slot's prior value
            tx_inputs[write.tx_idx].insert(account, write.slot, input_value);
            // Update current value for subsequent txs
            current_values.insert(write.slot, write.value);
        }
    }
    tx_inputs
}
A worked example:
Parent state: slot 0x01 = 100
BAL writes for slot 0x01:
tx2: value=200
tx5: value=300
Derived inputs:
┌─────────────────────────────────────────────────────────────────────┐
│ tx1: input[0x01] = 100 (from parent) │
│ tx2: input[0x01] = 100 (from parent, tx2 is first to write) │
│ tx3: input[0x01] = 200 (from tx2's write) │
│ tx4: input[0x01] = 200 (from tx2's write) │
│ tx5: input[0x01] = 200 (from tx2's write, tx5 will overwrite) │
└─────────────────────────────────────────────────────────────────────┘
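To sanity-check that table, here is a tiny self-contained re-derivation for slot 0x01, with plain integers standing in for slots and values.

```rust
// Fold BAL writes in tx-index order; each tx's input for the slot is whatever
// the slot held just before that tx ran.
fn main() {
    let parent_value: u64 = 100;
    let bal_writes: &[(usize, u64)] = &[(2, 200), (5, 300)]; // (tx_idx, post-value)

    let mut current = parent_value;
    let mut pending = bal_writes.iter().peekable();
    for tx_idx in 1..=5 {
        // Input for this tx is the value before its own (possible) write.
        println!("tx{tx_idx}: input[0x01] = {current}");
        // Apply this tx's write (if any) so later txs see it.
        if let Some(&&(w_idx, value)) = pending.peek() {
            if w_idx == tx_idx {
                current = value;
                pending.next();
            }
        }
    }
    // Prints 100, 100, 200, 200, 200 for tx1..tx5, matching the table.
}
```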
The critical part — we don't trust the BAL, we verify it:
fn validate_tx_against_bal(
    tx_idx: usize,
    actual_output: &TxOutput,
    expected_from_bal: &TxOutput,
) -> Result<(), InvalidBal> {
    // 1. All storage writes must match
    for (slot, expected_value) in &expected_from_bal.storage_writes {
        let actual_value = actual_output.storage_writes.get(slot)
            .ok_or(InvalidBal::MissingWrite(*slot))?;
        if actual_value != expected_value {
            return Err(InvalidBal::ValueMismatch {
                slot: *slot,
                expected: expected_value.clone(),
                actual: actual_value.clone(),
            });
        }
    }
    // 2. No unexpected writes
    for slot in actual_output.storage_writes.keys() {
        if !expected_from_bal.storage_writes.contains_key(slot) {
            return Err(InvalidBal::SpuriousWrite(*slot));
        }
    }
// 3. Account modifications must match
// (similar checks for balance, nonce, code)
// 4. Reads validation (caveat - see below)
// ...
Ok(())
}
This is subtle and important:
Problem: BAL tracks reads at the account level, not per-transaction.
BAL Entry for Account 0xAAA:
storage_reads: [0x05, 0x06, 0x07] ◄── No tx index! Just "these were read"
Why this matters: We can't validate reads per-transaction. We can only validate that all reads across the entire block are accounted for.
fn validate_reads(bal: &BlockAccessList, all_tx_outputs: &[TxOutput]) -> Result<(), InvalidBal> {
    // Collect ALL reads from ALL transactions
    let actual_reads: HashSet<(Address, Slot)> = all_tx_outputs
        .iter()
        .flat_map(|output| output.storage_reads.iter().copied())
        .collect();
    // Collect expected reads from BAL
    let expected_reads: HashSet<(Address, Slot)> = bal.entries()
        .flat_map(|entry| {
            entry.storage_reads.iter().map(move |slot| (entry.address, *slot))
        })
        .collect();
    // Must match exactly
    if actual_reads != expected_reads {
        return Err(InvalidBal::ReadsMismatch);
    }
    Ok(())
}
Implication: Read validation must wait until all transactions complete.
Parallel execution can be achieved by using multiple instances of the BlockExecutor, like a pool of executors:
┌─────────────────────────────────────────────────────────────────────┐
│ EXECUTOR POOL │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │
│ │ Executor 1 │ │ Executor 2 │ │ Executor 3 │ │ Executor 4 │ │
│ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ ┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐ │
│ │ tx1 │ │ tx2 │ │ tx4 │ │ tx5 │ │
│ └─────┘ └─────┘ └─────┘ └─────┘ │
│ │ │
│ ▼ │
│ ┌─────┐ │
│ │ tx3 │ (waits for tx1's output as input) │
│ └─────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
Key insight: "Transaction changes don't need to be committed across transactions"
Each executor works in isolation:
- Gets its `TxInput` derived from BAL
- Executes the transaction
- Produces `TxOutput`
- Does NOT commit to shared state
- Validation compares output to BAL expectations
No shared mutable state between executors = no locks = true parallelism.
1. PARSE BAL
└── Extract all account entries with indexed changes
2. DERIVE TX INPUTS
└── For each tx, compute its input set from:
• Parent state (for slots not modified by earlier txs)
• Earlier tx outputs (for slots modified in this block)
3. PARALLEL EXECUTION
└── Spawn executor pool
└── Each executor gets: (transaction, TxInput)
└── Each executor produces: TxOutput
└── No cross-executor state sharing
4. VALIDATE OUTPUTS
└── For each tx:
• Storage writes match BAL
• Account changes match BAL
• No spurious writes
└── After all txs:
• Validate all reads are accounted for
5. IF VALID
└── Block is valid, apply final state
IF INVALID
└── Block rejected, BAL was incorrect
The beauty: Execution becomes embarrassingly parallel because each transaction is a pure function with known inputs and expected outputs.
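A minimal sketch of steps 2 to 4 under that model: because every `TxInput` is pre-derived from the BAL, all transactions can be launched at once, each on its own thread with no shared mutable state, and each output is then checked against the BAL. `Transaction` and `execute_tx` are stand-ins, not reth's BlockExecutor.

```rust
use std::thread;

// Stand-in executor pool: one scoped thread per transaction, no cross-executor
// state sharing, per-tx validation against BAL expectations afterwards.
fn execute_block_in_parallel(
    txs: &[Transaction],
    tx_inputs: &[TxInput],         // step 2: derived from the BAL
    expected_outputs: &[TxOutput], // per-tx expectations folded from the BAL
) -> Result<Vec<TxOutput>, InvalidBal> {
    // Step 3: every executor works in isolation on (transaction, TxInput).
    let actual_outputs: Vec<TxOutput> = thread::scope(|s| {
        let handles: Vec<_> = txs
            .iter()
            .zip(tx_inputs)
            .map(|(tx, input)| s.spawn(move || execute_tx(tx, input))) // hypothetical pure executor
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("executor panicked"))
            .collect()
    });

    // Step 4: storage writes and account changes must match the BAL per tx.
    for (idx, (actual, expected)) in actual_outputs.iter().zip(expected_outputs).enumerate() {
        validate_tx_against_bal(idx, actual, expected)?;
    }
    // Block-level read validation happens after all txs (see the caveat above).
    Ok(actual_outputs)
}
```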
BAL helps executors but provides ZERO benefit to builders — it's extra work.
| | Executor | Builder |
|---|---|---|
| Has | Block + BAL (given) | Mempool of unordered txs |
| BAL is | INPUT → use to optimize | OUTPUT → must generate |
| Benefit | Prefetch, parallelize, early SR | None |
Executors know what will be accessed → can optimize
Builders must discover what will be accessed → can't optimize the same way
MEMPOOL:
┌───────────────────────────────────────────────────────────┐
│ tx1: Swap on Uniswap (touches pool state) │
│ tx2: Swap on Uniswap (SAME pool) ← CONFLICT │
│ tx3: Transfer ETH (independent) │
│ tx4: Arbitrage (Uniswap + Sushiswap) │
└───────────────────────────────────────────────────────────┘
BUILDER MUST:
1. Execute to discover conflicts (no shortcuts)
2. Order optimally (maximize MEV/tips)
3. Handle failures gracefully
4. Do this in <12 seconds
5. Track all accesses + encode BAL (extra overhead)
BAL DOESN'T HELP WITH ANY OF THIS
Beaverbuild is one of the two dominant Ethereum block builders (alongside Titan). Together they build ~90% of Ethereum blocks.
These builders solve the problem BAL can't help with:
"Given a mempool of transactions, find the optimal ordering that maximizes value while handling conflicts"
| Technique | Description |
|---|---|
| Speculative Execution | Execute txs in parallel, detect conflicts, rollback |
| Conflict Graphs | Build dependency graph, find optimal ordering |
| Parallel Simulation | Test many orderings simultaneously |
| Incremental State | Track diffs efficiently across simulations |
| MEV Extraction | Optimize for maximum extractable value |
Beaverbuild is partnering with Flashbots on BuilderNet — a decentralized block building network using TEEs (Trusted Execution Environments) to reduce block building centralization.
Simple node:           Execute blocks, use BAL for speedup ✓
Sophisticated builder: Build blocks, generate BAL as overhead,
                       get no BAL benefits during building
To build blocks competitively, you need Beaverbuild-level infrastructure regardless of BAL. BAL just adds overhead — extra work with no building-time benefit.
┌─────────────────────────────────────────────────────────────────────┐
│ BLOCK ACCESS LIST (BAL) │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ WHAT: RLP-encoded manifest of all state access in a block │
│ │
│ CONTAINS: │
│ • Every account touched │
│ • Every storage slot read or written │
│ • Post-transaction values for all changes │
│ • Ordering via block access index │
│ │
│ ENABLES: │
│ 1. Perfect I/O prefetching (know exactly what to fetch) │
│ 2. Parallel tx execution (know which txs are independent) │
│ 3. Early state root computation (know final values upfront) │
│ │
│ TRUST MODEL: │
│ • Builder provides BAL │
│ • Executor verifies by executing │
│ • Mismatch = invalid block │
│ │
│ OVERHEAD: │
│ • ~40 KiB average (compressed) │
│ • < 1 MiB worst case │
│ • Acceptable given 10-100x execution speedup potential │
│ │
└─────────────────────────────────────────────────────────────────────┘