# Impact of IO on PEVM (WIP)
# Introduction
This is a short research piece exploring the high-level relationship between parallelism and in-memory vs. disk storage in [Pevm](https://github.com/risechain/pevm/tree/main).
# Background
The [Pevm](https://github.com/risechain/pevm/tree/main/benches) benchmarks are conducted using `InMemoryStorage`, where all data is loaded into RAM before processing. This makes CPU bottlenecks easier to identify and keeps the test environment more consistent.
We'd like to explore the performance improvement PEVM unlocks when using disk storage, and whether that relative improvement is larger or smaller than with in-memory storage.
# Steps
We run [the benchmark](https://github.com/risechain/pevm/blob/main/benches/README.md) twice: once with the unmodified code using `InMemoryStorage`, and once with our newly introduced `OnDiskStorage`.
`OnDiskStorage` is a simple key-value store in MDBX that keeps data on disk, in the following tables:
| Table | Key | Value |
| --- | --- | --- |
| accounts | address | (code_hash, balance, nonce) |
| storage | address, index | storage_value |
| bytecodes | code_hash | code |
| block_hashes | block_number | block_hash |
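To make the table layout concrete, here is a minimal Rust sketch of the four tables as a key-value interface. It is an illustration only, assuming simplified types: `KvTables`, `AccountRow`, and `InMemoryTables` are hypothetical names, not PEVM's actual API, and the balance is a `u128` rather than a 256-bit integer. An MDBX-backed store such as `OnDiskStorage` would answer the same lookups from disk tables instead of `HashMap`s.

```rust
use std::collections::HashMap;

// Hypothetical simplified types (assumption): real EVM values are 256-bit words,
// fixed-size byte arrays and u128 keep this sketch std-only.
type Address = [u8; 20];
type B256 = [u8; 32];

/// Value stored in the `accounts` table: (code_hash, balance, nonce).
#[derive(Clone, Debug, PartialEq)]
struct AccountRow {
    code_hash: B256,
    balance: u128,
    nonce: u64,
}

/// One lookup per table in the layout above (hypothetical trait).
trait KvTables {
    fn account(&self, address: &Address) -> Option<AccountRow>;
    fn storage(&self, address: &Address, index: &B256) -> Option<B256>;
    fn bytecode(&self, code_hash: &B256) -> Option<Vec<u8>>;
    fn block_hash(&self, block_number: u64) -> Option<B256>;
}

/// In-memory stand-in: every table is a HashMap held in RAM.
#[derive(Default)]
struct InMemoryTables {
    accounts: HashMap<Address, AccountRow>,
    storage: HashMap<(Address, B256), B256>,
    bytecodes: HashMap<B256, Vec<u8>>,
    block_hashes: HashMap<u64, B256>,
}

impl KvTables for InMemoryTables {
    fn account(&self, address: &Address) -> Option<AccountRow> {
        self.accounts.get(address).cloned()
    }
    fn storage(&self, address: &Address, index: &B256) -> Option<B256> {
        // Composite key (address, index), matching the `storage` table.
        self.storage.get(&(*address, *index)).copied()
    }
    fn bytecode(&self, code_hash: &B256) -> Option<Vec<u8>> {
        self.bytecodes.get(code_hash).cloned()
    }
    fn block_hash(&self, block_number: u64) -> Option<B256> {
        self.block_hashes.get(&block_number).copied()
    }
}
```

Keeping the lookups behind a single interface is what would let the same benchmark swap backends: only the implementation behind each call changes, not the execution code.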
# Result
Switching from `InMemoryStorage` to `OnDiskStorage` significantly slows down both *Sequential* and *Parallel* execution (PEVM). This is expected, as IO operations are much slower than memory operations. However, the results show that *Parallel* execution is **less affected** by IO constraints, since parallelism lets other threads keep working while one waits on disk.
| Algorithm | InMemoryStorage (on average, per block) | OnDiskStorage (on average, per block) | Slowdown (relative) |
| --- | --- | --- | --- |
| Sequential | 5.244820513 seconds | 6.54424359 seconds | 25% slower |
| Parallel | 2.778320513 seconds | 3.224512821 seconds | 16% slower |
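The slowdown column is the ratio of the two averages minus one. A minimal Rust sketch of that arithmetic using the table's numbers; `ROWS`, `slowdown_pct`, and `print_slowdowns` are illustrative names, not part of the benchmark harness:

```rust
/// (algorithm, InMemoryStorage secs, OnDiskStorage secs): per-block averages from the table.
const ROWS: [(&str, f64, f64); 2] = [
    ("Sequential", 5.244820513, 6.54424359),
    ("Parallel", 2.778320513, 3.224512821),
];

/// Relative slowdown of `on_disk_secs` versus `in_memory_secs`, in percent.
fn slowdown_pct(in_memory_secs: f64, on_disk_secs: f64) -> f64 {
    (on_disk_secs / in_memory_secs - 1.0) * 100.0
}

/// Print the slowdown for each algorithm, one decimal place.
fn print_slowdowns() {
    for (name, in_mem, on_disk) in ROWS {
        println!("{name}: +{:.1}%", slowdown_pct(in_mem, on_disk));
    }
}
```

This gives about 24.8% for *Sequential* and 16.1% for *Parallel*, which the table rounds to 25% and 16%.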
<aside>
💡 The raw benchmark result (a `.tar.gz` file) can be found at the bottom of this doc.
</aside>
## Representative blocks
About 75% of the blocks look like this:

As observed, `OnDiskStorage` introduces some slowdown, but *Parallel* execution is less impacted than *Sequential*. This is likely due to the nature of parallelism: in *Sequential* execution, the whole process is blocked during every IO operation, whereas in *Parallel* execution, if one thread is blocked by IO, the other threads continue to run.
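The blocking argument above can be illustrated with a toy Rust model. This is not PEVM's scheduler: each simulated transaction sleeps to stand in for a blocking disk read and then does a little CPU work, and the transactions are assumed fully independent, so the dependency tracking and re-execution a real parallel EVM needs are ignored. `simulated_tx`, `run_sequential`, and `run_parallel` are hypothetical helpers.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Simulated transaction: a blocking "IO read" followed by a little CPU work.
/// The sleep stands in for a disk lookup; no real IO happens.
fn simulated_tx(io_latency: Duration) -> u64 {
    thread::sleep(io_latency); // the calling thread is blocked here
    (0..1_000u64).sum() // small amount of CPU work
}

/// Run all transactions one after another on the current thread.
fn run_sequential(n_txs: usize, io_latency: Duration) -> Duration {
    let start = Instant::now();
    for _ in 0..n_txs {
        simulated_tx(io_latency);
    }
    start.elapsed()
}

/// Split transactions across `n_threads` worker threads. While one thread is
/// blocked in its simulated IO, the others keep running.
fn run_parallel(n_txs: usize, n_threads: usize, io_latency: Duration) -> Duration {
    assert!(n_threads > 0, "need at least one worker thread");
    let start = Instant::now();
    thread::scope(|s| {
        for t in 0..n_threads {
            s.spawn(move || {
                // Thread t takes transactions t, t + n_threads, t + 2 * n_threads, ...
                for _ in (t..n_txs).step_by(n_threads) {
                    simulated_tx(io_latency);
                }
            });
        }
    }); // scope joins all worker threads before returning
    start.elapsed()
}
```

With, for example, 8 transactions and a 20 ms simulated latency, the sequential run takes at least 160 ms, while 4 threads overlap their waits and should finish in roughly a quarter of that.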
---
The remaining 25% of the blocks look like the chart below, which occurs when PEVM decides to fall back to *Sequential* execution. (Note that the timescale is in microseconds: small blocks often fall back to sequential execution.)

## Impact on the speedup ratio
Switching from `InMemoryStorage` to `OnDiskStorage` improves the speedup value by **7.5%**. The speedup values (sequential time / parallel time) for each scenario are:
- `InMemoryStorage`: 5.244820513 / 2.778320513 ≈ 1.8878x
- `OnDiskStorage`: 6.54424359 / 3.224512821 ≈ 2.0295x
(2.0295 / 1.8878 ≈ 1.075)
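The speedup figures are sequential time divided by parallel time, and the 7.5% is the ratio of the two speedups. A small Rust sketch reproducing that arithmetic from the table's averages; the `Run` struct, `speedup_gain_pct`, and the two constants are illustrative names only:

```rust
/// Average per-block timings (seconds) for one storage backend.
struct Run {
    sequential_secs: f64,
    parallel_secs: f64,
}

impl Run {
    /// Speedup of parallel over sequential execution.
    fn speedup(&self) -> f64 {
        self.sequential_secs / self.parallel_secs
    }
}

/// How much larger `b`'s speedup is than `a`'s, in percent.
fn speedup_gain_pct(a: &Run, b: &Run) -> f64 {
    (b.speedup() / a.speedup() - 1.0) * 100.0
}

// Averages taken from the results table.
const IN_MEMORY: Run = Run { sequential_secs: 5.244820513, parallel_secs: 2.778320513 };
const ON_DISK: Run = Run { sequential_secs: 6.54424359, parallel_secs: 3.224512821 };
```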
The next step is to rerun this with the gigagas benchmark to gain insight into how the relative IO impact scales with block size and with the depth of transaction dependencies within blocks.
## Takeaway
This is an early test. Since no major effort has yet gone into improving the parallelism of in-memory storage in Reth, there is a lot of juice left to squeeze here. The key insight is that `OnDiskStorage` benefits more from parallelism than `InMemoryStorage` does.
## Contributions
If you're interested in contributing to the Pevm effort, please contact [@hai_rise](https://x.com/hai_rise)!
[criterion-8daf0c65ffbcd516b65f88d0dd5787119d1f8d40.tar.gz](Impact%20of%20IO%20on%20PEVM%20d9ccaf5cd9ce4c9eaacfabf1080d0d3b/criterion-8daf0c65ffbcd516b65f88d0dd5787119d1f8d40.tar.gz)