# Block-level Access Lists (BAL): Benchmark and Overhead
## TL;DR
* Evaluated six BAL designs to trade off data overhead and parallelism.
→ Type-2/5/6 enable fully parallel execution; Type-3/4 offer maximum I/O reduction; Type-1 has the lowest overhead.
* I/O dominates Geth runtime (\~70%), primarily from account and storage reads.
→ Execution isn’t the bottleneck — most CPU time is spent on trie access, keccak hashing, and sender recovery.
* Implemented Type-1, Type-3, and Type-4 BAL in the latest Geth.
→ Benchmarks show Type-1 cuts total block processing time by \~60%, Type-3 by \~68% and Type-4 by \~70%.
## Overview of BAL Designs
### Definitions
- Address: Account address in the BAL.
- Slot: Storage key in a contract account referenced in the BAL.
- Account value:
- Pre-block execution values: [nonce, balance, codeHash]
- Post-tx execution modified values: [nonce, balance]
- Slot value: Value of a specific storage slot.
- Pre-block execution values: Account or storage values before block execution, represented as:
- `Map[address/slot, account/storage value]`
- Post-tx execution modified values: Account or storage values modified by each transaction in the block, represented as:
- `List[tx_index, modified account and storage values]`
- Parallel I/O: Concurrent fetching of account/slot values via Account(addr) and Storage(addr, slot).
- Near-zero I/O: I/O operations are mostly eliminated, except for account/contract code reads and post-execution commits.
- Optimistic parallel execution: Executes transactions in parallel under the assumption of no conflicts, with rollbacks if conflicts are detected.
- Fully parallel execution: Parallel execution without re-executing transactions with conflicting dependencies.
- BAL validation overhead: Cost of verifying that the BAL accurately reflects access patterns and state changes during execution.
### BAL Design Variants
We explored six BAL design variants (excluding contract code for brevity):
1. **Addresses and slots**: Enables parallel I/O.
2. **Addresses and slots + post-tx modified values (EIP-7928)**: Supports parallel I/O and fully parallel execution.
3. **Addresses and slots + pre-block slot values**: Enables near-zero I/O.
4. **Addresses and slots + pre-block account and slot values**: Enables near-zero I/O.
5. **Addresses and slots + pre-block slot values + post-tx modified values**: Combines near-zero I/O, fully parallel execution, and parallel state root computation.
6. **Addresses and slots + pre-tx account and slot values**: Enables near-zero I/O and fully parallel execution.
The key difference between Type-5 and Type-6 lies in BAL validation:
- **Type-5** supports parallel validation of the provided BAL and parallel computation of the new state root. However, it requires merging the pre-block-execution values with the post-tx-execution modified values to reconstruct the prestate needed for fully parallel transaction execution.
- **Type-6** enables near-zero I/O reads but requires sequential validation, since the BAL for a transaction can only be verified after all preceding transactions have been executed.
- We expect the validation overhead difference between Type-5 and Type-6 to be minimal, since it's limited to in-memory operations.
[Overhead Table (excludes contract code sizes), 2000 blocks: 22347001-22349000](https://docs.google.com/spreadsheets/d/1W0tWBEoG-hV_REilXyu99F1WBplLBmbQscedoYGOEIA/edit?usp=sharing)

> Although contract codes were not included here, for optimistic or fully parallel execution, only newly created contract codes need to be provided. For existing contracts, only the `codeHash` is required, since parallel fetching of existing code is fast enough.

A key observation is that the **average block-size overhead of Type-3 and Type-6 BALs is smaller than that of Type-2 (EIP-7928)**, since most account and storage writes are typically not accessed again by later transactions within the same block. This makes Type-6 an attractive BAL option, offering a better balance between overhead and performance.
Due to the space efficiency of Type-1, Type-3, and Type-4, as well as the complexity involved in enabling fully parallel execution for Type-2, Type-5, and Type-6 in Geth, we implemented only Type-1, Type-3, and Type-4 in the latest Geth.
## Prototype Implementations of BAL in Geth 1.15.8
### Type-1 BAL
The BAL data structure:
```go
var AllBlockAccessLists = map[uint64]types.AccessList{}
```
During `processBlock`, all BAL values are prefetched before transaction execution.
```go
func (p *StateProcessor) Process(block *types.Block, statedb *state.StateDB, cfg vm.Config) (*ProcessResult, error) {
...
statedb.PrefetchAccessList(block.Number().Uint64())
...
```
Prefetched states are stored into `StateDB.stateObjects[addr].originStorage`.
We first fetch accounts in parallel, then fetch storage values in parallel, thus eliminating almost all I/O reads during later execution (except for loading contract code). (*Note: this is an initial version; optimizations such as overlapping transaction execution with the parallel account and storage fetching have not yet been applied.*)
Snippet for parallel prefetching of accounts and slot values, and setting slot values:
```go
type StateObject struct {
	obj  *stateObject
	keys []common.Hash
}

type StorageKV struct {
	obj *stateObject
	key *common.Hash
	val *common.Hash
}

tp := NewThreadPool(25)
tp.Start()
lenAccts := 0
lenMaxSlots := 0
accts := make(chan StateObject, len(bals))
accountReadStart := time.Now()
for _, bal := range bals {
	addr := bal.Address
	lenAccts++
	keys := bal.StorageKeys
	lenMaxSlots += len(keys)
	tp.AddTask(func() {
		// s.reader.Account is not thread-safe due to keccak hashing, so it's changed to AccountBAL
		acct, err := s.reader.AccountBAL(addr)
		if err != nil {
			log.Error("fail to fetch account:", addr)
		}
		obj := newObject(s, addr, acct)
		accts <- StateObject{obj, keys}
	})
}
acctsArr := []StateObject{}
// set prefetched accounts in stateDB
for range lenAccts {
	state := <-accts
	acctsArr = append(acctsArr, state)
	// must be set first so accounts are not read again later in StorageBAL
	s.setStateObject(state.obj)
}
close(accts)
s.AccountReads += time.Since(accountReadStart)

storages := make(chan *StorageKV, lenMaxSlots)
lenSlots := 0
storageReadStart := time.Now()
for _, state := range acctsArr {
	obj := state.obj
	keys := state.keys
	addr := obj.Address()
	if obj.origin == nil {
		continue
	}
	lenSlots += len(keys)
	acct := obj.origin
	tr, err := trie.NewStateTrie(trie.StorageTrieID(s.originalRoot, obj.addrHash, acct.Root), s.db.TrieDB())
	if err != nil {
		log.Error("fail to create trie for:", addr)
		panic("fail to create trie for:")
	}
	for _, key := range keys {
		key := key
		tp.AddTask(func() {
			val, err := obj.db.reader.StorageBAL(addr, key, tr)
			if err != nil {
				log.Error("fail to fetch storage:", addr, key)
			}
			kv := &StorageKV{obj, &key, &val}
			storages <- kv
		})
	}
}
tp.Stop()
s.StorageReads += time.Since(storageReadStart)

// set prefetched storage values in stateDB
for range lenSlots {
	kv := <-storages
	kv.obj.originStorage[*kv.key] = *kv.val
	s.setStateObject(kv.obj)
}
```
### Type-3 BAL
The Type-3 implementation is largely similar to Type-1, with the key difference being that slot values are set directly in the `stateDB` without fetching them from the database.
The BAL data structure:
```go
type StorageKV struct {
	Key common.Hash `json:"key"`
	Val common.Hash `json:"val"`
}

type AccessListKV struct {
	Address   common.Address `json:"address" gencodec:"required"`
	Nonce     uint64         `json:"nonce"`
	Balance   *uint256.Int   `json:"balance"`
	Root      common.Hash    `json:"root"`
	CodeHash  []byte         `json:"codeHash"`
	StorageKV []StorageKV    `json:"storageKeys" gencodec:"required"`
}
```
During `processBlock`, all BAL values are prefetched before transaction execution.
Prefetched states are also stored into `StateDB.stateObjects[addr].originStorage`.
We first fetch accounts in parallel, then set corresponding storage values directly from the BAL.
Snippet for parallel prefetching of accounts and direct setting of storage values:
```go
tp := NewThreadPool(25)
tp.Start()
lenAccts := 0
accts := make(chan *stateObject, len(bals))
for _, bal := range bals {
	addr := bal.Address
	lenAccts++
	kvs := bal.StorageKV
	tp.AddTask(func() {
		acct, err := s.reader.AccountBAL(addr)
		if err != nil {
			log.Error("fail to fetch account:", addr)
		}
		obj := newObject(s, addr, acct)
		if acct != nil {
			// set slot values directly from the BAL, no database fetch
			for _, kv := range kvs {
				obj.originStorage[kv.Key] = kv.Val
			}
		}
		accts <- obj
	})
}
for range lenAccts {
	obj := <-accts
	s.setStateObject(obj)
}
```
### Type-4 BAL
The Type-4 implementation closely follows Type-3, but both account and slot values are directly set in the `stateDB` without any fetching. The BAL data structure remains the same as in Type-3.
```go
for _, bal := range bals {
	addr := bal.Address
	kvs := bal.StorageKV
	if bal.Balance == nil {
		// account absent from the pre-state: fill in empty-account defaults
		bal.Balance = ZeroBalance
		bal.Root = types.EmptyRootHash
		bal.CodeHash = types.EmptyCodeHash.Bytes()
	}
	acct := types.StateAccount{
		Nonce:    bal.Nonce,
		Balance:  bal.Balance,
		CodeHash: bal.CodeHash,
		Root:     bal.Root,
	}
	obj := newObject(s, addr, &acct)
	for _, kv := range kvs {
		obj.originStorage[kv.Key] = kv.Val
	}
	s.setStateObject(obj)
}
```
## Benchmark Results and Discussion
### Experimental Setup
- **Software**: Geth 1.15.8 with `state.scheme=path`
- **Hardware**: 32 cores, 7TB SSD, 128GB RAM
- **BAL generation and loading**:
Used `debug_traceTransaction` with `prestateTracer` to collect the BAL across 2000 blocks ([22347001, 22349000]), stored it in JSON format, and loaded it at Geth startup.
- **Benchmarking method**:
Used Geth’s `export` and `import` commands across 2000 blocks ([22347001, 22349000]).
```bash
geth export --datadir ./geth_full/snap/ dump.dat 22347001 22349000
geth --pprof.cpuprofile cpu1.prof --datadir ./geth_snap import --nocompaction true dump.dat
```
### Results
The [time cost breakdown](https://docs.google.com/spreadsheets/d/1W0tWBEoG-hV_REilXyu99F1WBplLBmbQscedoYGOEIA/edit?gid=779708818#gid=779708818) covers I/O, execution, validation and commit phases:

Key findings:
- **I/O costs for account and storage reads** account for nearly **70%** of vanilla Geth's total block execution time.
- Using Type-1 BAL (account addresses and storage keys only), our prototype achieved a ~60% reduction in total execution time by eliminating 85% of I/O reads. Type-4 BAL eliminated all I/O reads and achieved a ~70% reduction in total execution time. Notably, account read time is much lower than storage read time.
- In an earlier prototype (implemented 3 years ago, not listed here), **optimistic parallel execution** reduced total execution time by **80%**.
- As Toni pointed out, including **post-execution storage key values per transaction** would enable **fully parallel execution**, making total block execution time limited only by the slowest transaction.
- Commit time will become the main bottleneck once I/O reads are eliminated and execution is fully parallelized with BAL.
We also profiled CPU time (excluding I/O), and the top 10 most time-consuming functions are shown below:

The detailed call stacks are [attached here](https://hackmd.io/_uploads/rJkuBb1bxg.svg?sanitize=true).
Observations:
- Most CPU time is spent on:
- **PathDB node fetching: 16%** (`keccak`, `atomic.(*Int32).Add(RLock)`)
- **Caching operations: 7%** (`maps.ctrlGroup.matchH2`)
- **Sender recovery: 4%** (`runtime.cgocall`)
- **Transaction execution** itself is not the critical path for performance.
## Future Work
- **Parallelize trie node fetching**: We observed that the average trie disk read depth during account and storage reads in the stateDB—excluding the depth contributed by intermediate cached nodes—is around 5. By parallelizing trie node fetching, I/O read costs could be reduced to approximately 25% of the current Geth implementation without BAL. As a result, **the performance gain from BAL alone may drop to around 15%**. It’s therefore worthwhile to benchmark Geth's performance under parallel trie fetching without BAL. It’s also worth noting that other clients (e.g., Nethermind) have adopted similar optimization approaches.

- **Experiment with fully parallel execution using BAL**: Investigate the performance impact and feasibility of fully parallel execution enabled by BAL.
## References
[1] [EIP-7928: Block-Level Access Lists](https://github.com/ethereum/EIPs/blob/a5cc19cda650213cbb74e20bcbd1e1474a842acc/EIPS/eip-7928.md)
[2] [Code for Reproducing BAL Overhead](https://github.com/dajuguan/lab/blob/main/eth/go/access_list_count.go)
[3] [Type-1, Type-3 and Type-4 BAL Implementations in Geth](https://github.com/dajuguan/go-ethereum/tree/po/bal)
[4] [Script for Generating BAL](https://github.com/dajuguan/lab/blob/main/eth/go/access_list.go)
[5] [ESP Academic Grant 2023: EthStorage/Quarkchain Earlier BAL Submission](https://docs.google.com/document/d/1FoW_iuIPoYAjxy-AUGr_vRbIK_Zon3XDAb6heS_GWL0/edit?usp=sharing)