## Concurrent read benchmarking
### Introduction
The Block-level Access List is designed to enable parallel transaction execution. By leveraging this feature, block execution time can be reduced, providing room to increase the network's gas limit without compromising block processing latency.
To ensure that the Block-level Access List (BAL) provides benefits across all scenarios, we need to identify the current worst case for Ethereum block execution. In other words, the extent to which we can safely increase the network's gas limit depends on how much the BAL can improve this worst-case performance.
*The state bloat issue is outside the scope of this writeup.*
### Background
One potential worst case for block processing today is the read-heavy block, specifically a block filled with SLOAD opcodes.
Given that the current Ethereum block gas limit is 60 million, and that EIP-7825 caps the gas of a single transaction at 2^24 (16,777,216), at least four transactions are needed to fully utilize the available block space. Our benchmark therefore focuses specifically on this scenario.
**Note**: It is very likely that the block gas limit will be gradually increased in the near future, and the minimum number of transactions required to fully utilize the block space will rise proportionally.
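
The four-transaction figure follows from dividing the block gas limit by the per-transaction cap. The sketch below assumes the EIP-7825 cap of 2^24 gas (worth double-checking against the EIP text); `min_txs_to_fill` is a hypothetical helper name, and the 100M and 300M limits are illustrative values only, not proposals from this writeup.

```python
TX_GAS_CAP = 2**24  # 16,777,216: assumed per-transaction gas cap (EIP-7825)


def min_txs_to_fill(block_gas_limit: int, tx_gas_cap: int = TX_GAS_CAP) -> int:
    """Smallest number of capped transactions whose combined gas can reach the block limit."""
    # Integer ceiling division avoids floating-point rounding on large limits.
    return (block_gas_limit + tx_gas_cap - 1) // tx_gas_cap


# 60M is the current limit; the other two values are hypothetical future limits.
for limit in (60_000_000, 100_000_000, 300_000_000):
    print(f"{limit / 1e6:.0f} Mgas -> at least {min_txs_to_fill(limit)} transactions")
```

Since the minimum transaction count is also the minimum degree of transaction-level parallelism in a worst-case block, it gives the baseline thread count to look up in the result tables.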
The goal of this benchmark is to assess how read throughput and latency scale with increasing concurrency, and identify the point of saturation.
More importantly, the results serve as input for deciding whether state read locations should be included in the initial Block-level Access List structure.
### Benchmark
The benchmark consists of two parts:
- measure the concurrent read performance directly on the storage device
- measure the concurrent read performance on a real Ethereum mainnet database
#### Concurrent read over the SSD
*hardware*
storage: Samsung 980 Pro
cpu: Intel Core i7-14700K
*program*
https://gist.github.com/rjl493456442/beb41c8d95bd3537488a2f8944c7009b
*others*
The OS page cache is dropped after each benchmark round, so that the performance numbers are not skewed by cached data.
`sync; echo 3 | sudo tee /proc/sys/vm/drop_caches`
**results**
| Threads | Per-Thread Read | Total Time | QPS | Aggregated Thread Time | Total Read | Throughput (MB/s) |
|---------|----------------|---------------|------------|-----------------------|------------|-----------------|
| 1 | 10240 MB | 2m18.76s | 18891.26 | 2m18.76s | 10.74 GB | 73.79 |
| 2 | 5120 MB | 1m13.56s | 35634.98 | 2m20.02s | 10.74 GB | 139.20 |
| 4 | 2560 MB | 37.56s | 69799.93 | 2m22.40s | 10.74 GB | 272.66 |
| 8 | 1280 MB | 19.43s | 134893.44 | 2m27.65s | 10.74 GB | 526.93 |
| 16 | 640 MB | 10.78s | 243192.34 | 2m44.10s | 10.74 GB | 949.97 |
| 32 | 320 MB | 6.61s | 396702.02 | 3m21.90s | 10.74 GB | 1549.62 |
| 64 | 160 MB | 4.97s | 526962.68 | 5m6.39s | 10.74 GB | 2058.45 |
| 128 | 80 MB | 4.60s | 570218.58 | 9m34.71s | 10.74 GB | 2227.42 |

The concurrent read performance scales almost linearly up to about 16 threads, then flattens out and approaches an upper limit of roughly 570k to 600k QPS.
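
To make the scaling claim concrete, the speedup and parallel efficiency can be derived from the QPS column of the table above. This is a minimal sketch: the numbers are copied from the table, while `scaling` and the efficiency definition (speedup divided by thread count) are choices made here for illustration.

```python
# Threads -> QPS, copied from the raw SSD results table above.
SSD_QPS = {
    1: 18891.26,
    2: 35634.98,
    4: 69799.93,
    8: 134893.44,
    16: 243192.34,
    32: 396702.02,
    64: 526962.68,
    128: 570218.58,
}


def scaling(qps_by_threads):
    """Return {threads: (speedup vs. 1 thread, parallel efficiency)}."""
    base = qps_by_threads[1]
    return {n: (qps / base, qps / base / n) for n, qps in qps_by_threads.items()}


for n, (speedup, eff) in scaling(SSD_QPS).items():
    print(f"{n:>3} threads: speedup {speedup:5.2f}x, efficiency {eff:4.0%}")
```

An efficiency close to 100% means the extra threads translate directly into extra reads per second; the drop at higher thread counts marks where the device stops scaling.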
#### Concurrent read over the Ethereum mainnet dataset
*hardware*
storage: Samsung 980 Pro
cpu: Intel Core i7-14700K
*program*
https://github.com/rjl493456442/go-ethereum/blob/state-batch-read-tool/cmd/geth/dbcmd.go#L1197
```shell
# Step 1: dump the account and storage keys
geth snapshot export-state-keys account.keys storage.keys
# Step 2: run the benchmark
geth db batch-read-benchmark --raw --mode account --threads 1,2,4,8,16,32,64,128 --entries 1000000 --accounts ./account.keys
```
*database*
Pebble: https://github.com/cockroachdb/pebble
*dataset*
Ethereum mainnet at block `23,711,574`, with a key-value store of `288` GB.
**results**
| Threads | Elapsed (s) | Throughput (QPS) | Latency Mean (us) | Latency p50 (us) | Latency p75 (us) | Latency p95 (us) | Latency p99 (us) |
|---------|-------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| 1 | 179.984 | 5,556.04 | 171.72 | 188 | 212.75 | 242 | 362.06 |
| 2 | 92.635 | 10,795.00 | 180.53 | 193 | 220 | 254.15 | 358.09 |
| 4 | 45.158 | 22,143.99 | 172.94 | 182 | 216 | 265 | 367 |
| 8 | 20.527 | 48,715.30 | 158.82 | 168.5 | 200 | 257 | 338.03 |
| 16 | 10.148 | 98,535.45 | 157.08 | 162 | 198 | 277 | 369.03 |
| 32 | 5.747 | 173,995.16 | 179.22 | 172 | 221 | 366.15 | 568 |
| 64 | 5.239 | 190,861.99 | 281.02 | 223 | 315 | 705.30 | 1,388.63 |
| 128 | 4.620 | 216,426.50 | 524.01 | 255 | 513 | 1,929.05 | 3,678.15 |

The concurrent read performance scales almost linearly up to 16 to 32 threads, reaching an upper limit of approximately 200k QPS.
The performance gap between database reads and raw file reads arises from the database's multi-level architecture. For a single point read, Pebble may need to check several places: the in-memory table (memtable), the overlapping level-0 files, and the files in the non-zero levels.
More importantly, if we take the 4-thread result as the baseline, representing the block-level Access List (BAL) without read locations, and compare it with the 128-thread result, which corresponds to the BAL with read locations, we observe roughly a **10x** speedup.
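Besides, the concurrent read performance saturates at around 32 threads, with no significant gains at higher concurrency. This indicates that the BAL with read locations can provide noticeable benefits until the network's block gas limit reaches **512 Mgas**: at that point roughly 32 transactions are needed to fill a block, so transaction-level parallelism alone would already saturate the read throughput.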
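
The 10x speedup and the 32-thread saturation point quoted above can be re-derived from the results table. The sketch below is illustrative only: `saturation_point` and its `min_gain=1.5` threshold are assumptions made here (a doubling of threads that yields less than 1.5x QPS is treated as saturated), and the per-transaction cap of 2^24 gas is the EIP-7825 value assumed for mapping thread counts to gas limits.

```python
TX_GAS_CAP = 2**24  # assumed EIP-7825 per-transaction gas cap (16,777,216)

# Threads -> QPS, copied from the mainnet database results table above.
MAINNET_QPS = {
    1: 5556.04,
    2: 10795.00,
    4: 22143.99,
    8: 48715.30,
    16: 98535.45,
    32: 173995.16,
    64: 190861.99,
    128: 216426.50,
}


def saturation_point(qps_by_threads, min_gain=1.5):
    """First thread count after which doubling the threads gains less than min_gain x."""
    counts = sorted(qps_by_threads)
    for cur, nxt in zip(counts, counts[1:]):
        if qps_by_threads[nxt] / qps_by_threads[cur] < min_gain:
            return cur
    return counts[-1]


speedup = MAINNET_QPS[128] / MAINNET_QPS[4]  # BAL with vs. without read locations
knee = saturation_point(MAINNET_QPS)
gas_at_knee = knee * TX_GAS_CAP  # block gas limit that needs `knee` full transactions

print(f"128 vs 4 threads: {speedup:.2f}x")
print(f"saturation at {knee} threads, about {gas_at_knee / 2**20:.0f} Mi gas")
```

A different threshold would move the detected knee, so the result is best read as a rough indicator rather than a precise boundary.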
### Mainnet db inspection
Key-value store:
```
gary@dev:~/mount/eth-mainnet$ du -sh geth
288G geth
```
Flat-file store:
```shell
gary@dev:~/mount/eth-mainnet$ du -sh ~/hdd2/mainnet-full/
640G /home/gary/hdd2/mainnet-full/
```
```shell
+-----------------------+-----------------------------+------------+------------+
| DATABASE | CATEGORY | SIZE | ITEMS |
+-----------------------+-----------------------------+------------+------------+
| Key-Value store | Headers | 90.42 KiB | 135 |
| Key-Value store | Bodies | 16.18 MiB | 135 |
| Key-Value store | Receipt lists | 13.79 MiB | 135 |
| Key-Value store | Difficulties (deprecated) | 0.00 B | 0 |
| Key-Value store | Block number->hash | 5.54 KiB | 135 |
| Key-Value store | Block hash->number | 927.14 MiB | 23711575 |
| Key-Value store | Transaction index | 15.78 GiB | 457910445 |
| Key-Value store | Log index filter-map rows | 0.00 B | 0 |
| Key-Value store | Log index last-block-of-map | 0.00 B | 0 |
| Key-Value store | Log index block-lv | 0.00 B | 0 |
| Key-Value store | Log bloombits (deprecated) | 0.00 B | 0 |
| Key-Value store | Contract codes | 11.20 GiB | 1892776 |
| Key-Value store | Hash trie nodes | 0.00 B | 0 |
| Key-Value store | Path trie state lookups | 176.17 KiB | 4400 |
| Key-Value store | Path trie account nodes | 51.31 GiB | 445804960 |
| Key-Value store | Path trie storage nodes | 191.09 GiB | 1900819058 |
| Key-Value store | Path state history indexes | 0.00 B | 0 |
| Key-Value store | Verkle trie nodes | 0.00 B | 0 |
| Key-Value store | Verkle trie state lookups | 0.00 B | 0 |
| Key-Value store | Trie preimages | 0.00 B | 0 |
| Key-Value store | Account snapshot | 14.93 GiB | 324341342 |
| Key-Value store | Storage snapshot | 101.31 GiB | 1404655749 |
| Key-Value store | Beacon sync headers | 657.00 B | 1 |
| Key-Value store | Clique snapshots | 0.00 B | 0 |
| Key-Value store | Singleton metadata | 695.81 KiB | 16 |
| Ancient store (Chain) | Headers | 11.51 GiB | 8174048 |
| Ancient store (Chain) | Hashes | 859.29 MiB | 8174048 |
| Ancient store (Chain) | Bodies | 460.23 GiB | 8174048 |
| Ancient store (Chain) | Receipts | 167.17 GiB | 8174048 |
| Ancient store (State) | Account.Data | 47.90 MiB | 4397 |
| Ancient store (State) | Storage.Data | 19.38 MiB | 4397 |
| Ancient store (State) | History.Meta | 339.23 KiB | 4397 |
| Ancient store (State) | Account.Index | 46.96 MiB | 4397 |
| Ancient store (State) | Storage.Index | 69.00 MiB | 4397 |
+-----------------------+-----------------------------+------------+------------+
| TOTAL | 1.00 TIB | 4559140862 |
+-----------------------+-----------------------------+------------+------------+
```
### Concurrent read over the Ethereum bloatnet dataset
*hardware spec*: https://www.hetzner.com/dedicated-rootserver/ax52/
```
AMD Ryzen™ 7 7700
Simultaneous Multithreading
RAM: 64 GB DDR5
optional max. 192 GB DDR5 (for additional charge)
Disk: 2 x 1 TB NVMe SSD (Gen4)
(Software-RAID 1)
```
*database size*: 503.7GB
*program*:
```shell
# Step 1: dump the account and storage keys
geth snapshot export-state-keys account.keys storage.keys
# Step 2: run the benchmark
geth db batch-read-benchmark --raw --mode account --threads 1,2,4,8,16,32,64,128 --entries 1000000 --accounts ./account.keys
```
*result (database read)*
| Threads | Throughput (qps) | Mean Latency (us) | P50 (us) | P75 (us) | P95 (us) | P99 (us) |
|----------|------------------|-------------------|-----------|-----------|-----------|-----------|
| 1 | 9033.89 | 109.09 | 122.00 | 129.00 | 151.00 | 199.03 |
| 2 | 17848.64 | 111.08 | 123.00 | 131.00 | 153.00 | 205.00 |
| 4 | 35004.48 | 112.22 | 124.00 | 134.00 | 157.00 | 213.03 |
| 8 | 67568.91 | 115.59 | 126.00 | 144.00 | 164.00 | 217.00 |
| 16 | 129835.26 | 119.41 | 128.00 | 148.00 | 182.00 | 229.03 |
| 32 | 196250.11 | 146.78 | 134.00 | 162.00 | 275.00 | 573.00 |
| 64 | 198070.71 | 280.27 | 146.00 | 222.00 | 1071.45 | 2277.24 |
| 128 | 200950.44 | 624.43 | 161.00 | 530.75 | 2390.70 | 7251.52 |

As a reference, concurrent reads over a raw file were also tested on this machine (see the raw file table after the interpretation below).
They show that the hardware is very powerful and can reach about 1.5M QPS, which is roughly 3 times faster than the machine used in the previous benchmark.
For reads over the bloatnet database, however, there isn't a comparable speedup: throughput tops out at around 200k QPS, about the same as the mainnet database on the slower machine. I don't think we can directly conclude from this that reading from the bloatnet database is three times slower than reading from the mainnet database. It would be better to re-measure the read performance using a mainnet setup on the same machine for consistency.
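
The comparison above boils down to two ratios between the peak (128-thread) QPS values reported in the tables of this writeup. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Peak QPS at 128 threads, copied from the four result tables in this writeup.
raw_prev_machine = 570218.58    # raw file read, Samsung 980 Pro machine
raw_this_machine = 1522411.62   # raw file read, bloatnet machine
db_mainnet = 216426.50          # mainnet database, Samsung 980 Pro machine
db_bloatnet = 200950.44         # bloatnet database, bloatnet machine

raw_ratio = raw_this_machine / raw_prev_machine
db_ratio = db_bloatnet / db_mainnet

print(f"raw file read: {raw_ratio:.2f}x the previous machine")
print(f"database read: {db_ratio:.2f}x the mainnet result")
```

The two database numbers come from different machines and different datasets, so the second ratio mixes a hardware effect with a dataset effect and cannot separate them.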
*interpretation*
We can draw similar conclusions for the bloatnet:
- concurrent read performance saturates at around 32 threads;
- before saturation, read performance improves almost linearly with increased concurrency;
- BAL read locations are beneficial in the read-heavy scenario.
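
The throughput and mean latency columns of the bloatnet database table can be cross-checked against each other with Little's law (throughput ≈ concurrency / mean latency). This is a sketch under the assumption that each benchmark thread issues one read at a time, back to back; `predicted_qps` and `relative_error` are hypothetical helper names.

```python
# (threads, measured QPS, mean latency in microseconds) from the bloatnet database table.
BLOATNET_ROWS = [
    (1, 9033.89, 109.09),
    (2, 17848.64, 111.08),
    (4, 35004.48, 112.22),
    (8, 67568.91, 115.59),
    (16, 129835.26, 119.41),
    (32, 196250.11, 146.78),
    (64, 198070.71, 280.27),
    (128, 200950.44, 624.43),
]


def predicted_qps(threads, mean_latency_us):
    """Little's law for a closed loop: N outstanding reads / mean time per read."""
    return threads / (mean_latency_us * 1e-6)


def relative_error(threads, measured_qps, mean_latency_us):
    return abs(predicted_qps(threads, mean_latency_us) - measured_qps) / measured_qps


for threads, measured, latency in BLOATNET_ROWS:
    err = relative_error(threads, measured, latency)
    print(f"{threads:>3} threads: predicted {predicted_qps(threads, latency):>9.0f} QPS, "
          f"measured {measured:>9.0f} QPS, off by {err:.1%}")
```

Once throughput stops growing, the same law implies that adding threads can only show up as higher latency, which matches the rising mean and tail latencies at 64 and 128 threads.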
*result (raw file read)*
| Threads | QPS | Throughput (MB/s) | Total Time (s) |
|----------|------------|------------------|----------------|
| 1 | 54825.80 | 214.16 | 47.81 |
| 2 | 108901.66 | 425.40 | 24.07 |
| 4 | 211277.18 | 825.30 | 12.41 |
| 8 | 398724.81 | 1557.52 | 6.57 |
| 16 | 751116.07 | 2934.05 | 3.49 |
| 32 | 1242143.39 | 4852.12 | 2.11 |
| 64 | 1515170.38 | 5918.63 | 1.73 |
| 128 | 1522411.62 | 5946.92 | 1.72 |

### Bloatnet db inspection
```
/data # du -sh geth/chaindata/ancient/
929.9G geth/chaindata/ancient/
/data # du -sh geth/chaindata
1.4T geth/chaindata
```
```shell
+-----------------------+-----------------------------+------------+------------+
| DATABASE | CATEGORY | SIZE | ITEMS |
+-----------------------+-----------------------------+------------+------------+
| Key-Value store | Headers | 59.13 KiB | 90 |
| Key-Value store | Bodies | 49.32 KiB | 90 |
| Key-Value store | Receipt lists | 3.69 KiB | 90 |
| Key-Value store | Difficulties (deprecated) | 0.00 B | 0 |
| Key-Value store | Block number->hash | 3.69 KiB | 90 |
| Key-Value store | Block hash->number | 921.82 MiB | 23575665 |
| Key-Value store | Transaction index | 9.77 GiB | 283516326 |
| Key-Value store | Log index filter-map rows | 8.83 GiB | 89290916 |
| Key-Value store | Log index last-block-of-map | 1.86 MiB | 40657 |
| Key-Value store | Log index block-lv | 45.30 MiB | 2375188 |
| Key-Value store | Log bloombits (deprecated) | 0.00 B | 0 |
| Key-Value store | Contract codes | 31.32 GiB | 19657204 |
| Key-Value store | Hash trie nodes | 0.00 B | 0 |
| Key-Value store | Path trie state lookups | 5.75 MiB | 146933 |
| Key-Value store | Path trie account nodes | 61.68 GiB | 539666763 |
| Key-Value store | Path trie storage nodes | 352.53 GiB | 3285201457 |
| Key-Value store | Path state history indexes | 0.00 B | 0 |
| Key-Value store | Verkle trie nodes | 0.00 B | 0 |
| Key-Value store | Verkle trie state lookups | 0.00 B | 0 |
| Key-Value store | Trie preimages | 0.00 B | 0 |
| Key-Value store | Account snapshot | 19.46 GiB | 389862800 |
| Key-Value store | Storage snapshot | 193.48 GiB | 2421466190 |
| Key-Value store | Beacon sync headers | 647.00 B | 1 |
| Key-Value store | Clique snapshots | 0.00 B | 0 |
| Key-Value store | Singleton metadata | 696.08 KiB | 17 |
| Ancient store (Chain) | Headers | 11.26 GiB | 23575576 |
| Ancient store (Chain) | Hashes | 854.37 MiB | 23575576 |
| Ancient store (Chain) | Bodies | 665.67 GiB | 23575576 |
| Ancient store (Chain) | Receipts | 252.13 GiB | 23575576 |
| Ancient store (State) | History.Meta | 7.11 MiB | 90000 |
| Ancient store (State) | Account.Index | 8.65 MiB | 90000 |
| Ancient store (State) | Storage.Index | 3.67 MiB | 90000 |
| Ancient store (State) | Account.Data | 14.75 MiB | 90000 |
| Ancient store (State) | Storage.Data | 7.19 MiB | 90000 |
+-----------------------+-----------------------------+------------+------------+
| TOTAL | 1.57 TIB | 7054800477 |
+-----------------------+-----------------------------+------------+------------+
```