PR: https://github.com/sigp/lighthouse/pull/4718
During the EPF, I put together code that abstracts away the LevelDB-specific database functionality within the beacon node backend. The existing `KeyValueStore`
trait already provided some abstraction; however, the code always assumed the KV store was LevelDB.
TL;DR: I copied the slasher DB architecture! I created a `BeaconNodeBackend` type that implements the `KeyValueStore` and `ItemStore` traits, then replaced all references to LevelDB with the new `BeaconNodeBackend` type. Within the `BeaconNodeBackend` type, a `cfg!` macro activates the LevelDB- or Redb-specific code paths based on config values.
Recently I added Redb as an alternative database implementation. Redb is an ACID-compliant embedded key-value store implemented in pure Rust; data is stored in copy-on-write B-trees. See the design doc for more info.
I've run both the LevelDB and Redb implementations on mainnet; the results are below.
In this round of metrics gathering I've also included block processing metrics and disk reads/writes from the process itself. I used macOS's built-in Activity Monitor to track disk reads/writes for the Lighthouse process.
I made two separate runs for each DB implementation. The data for each run can be found below.
Note: my computer fell asleep during Run #1, so some of that data may be skewed.
Run #1
Block Processing Metrics: https://snapshots.raintank.io/dashboard/snapshot/pQ0M8E9EOV2w7V3mSsxPSH1auzGXonOL
Database Metrics: https://snapshots.raintank.io/dashboard/snapshot/t4xr2Xia6alpL46P9cW9w3UtzZ8qayx6
Start Time | End Time | Bytes Written To Disk | Bytes Read From Disk |
---|---|---|---|
2024-08-02 16:10 GMT+2 | 2024-08-02 23:39 GMT+2 | 90.55 GB | 4.90 GB |
Bytes written per hour: 11.82 GB/hr
Run #2
DB metrics:
https://lhm.sigp.io/dashboard/snapshot/iapHRvhuL5H9ba7FQNiZ4Sr92ZiJXad7
Block processing metrics:
https://snapshots.raintank.io/dashboard/snapshot/RKXtZTa8uKVrYxaE9QJ3GjllevRGdUkI
Start Time | End Time | Bytes Written To Disk | Bytes Read From Disk |
---|---|---|---|
2024-09-03 18:12 GMT+2 | 2024-09-03 23:30 GMT+2 | 64.37 GB | 491.0 MB |
Bytes written per hour: 12.12 GB/hr
Run #3
Block processing metrics:
https://snapshots.raintank.io/dashboard/snapshot/XN3zWddfAvoMn0R52WnYZ4u5ZxH9UPQi
Database Metrics: https://snapshots.raintank.io/dashboard/snapshot/6w5E73M0C47zCfeZsrYStYoBAYqFnTkE
Start Time | End Time | Bytes Written To Disk | Bytes Read From Disk |
---|---|---|---|
2024-09-03 00:00 GMT+2 | 2024-09-03 09:03 GMT+2 | 296.37 GB | 20.25 GB |
Bytes written per hour: 32.4 GB/hr
Run #4
DB Metrics: https://snapshots.raintank.io/dashboard/snapshot/4olVlEPmopbZ7vqQMAIg46m5Miqwirpf
Block processing metrics:
https://snapshots.raintank.io/dashboard/snapshot/TyIl4iLERJRKohI43xT8g3xPZzubvlxR
Start Time | End Time | Bytes Written To Disk | Bytes Read From Disk |
---|---|---|---|
2024-09-03 09:10 GMT+2 | 2024-09-03 18:01 GMT+2 | 284.79 GB | 27.56 GB |
Bytes written per hour: 31.7 GB/hr
In this round I've disabled backfill sync.
DB Metrics: https://snapshots.raintank.io/dashboard/snapshot/hMwTYy5rQbQcoOVbNjh4cM5qpPNuh28w
Block processing metrics:
https://snapshots.raintank.io/dashboard/snapshot/pAL8KxKXv6Dv4D94CMXjE59Zl2GkGb4C
Start Time | End Time | Bytes Written To Disk | Bytes Read From Disk |
---|---|---|---|
2024-10-02 00:46 GMT+2 | 2024-10-02 09:22 GMT+2 | 105.33 GB | 899.2 MB |
Bytes written per hour: 12.24 GB/hr
I think we should stress-test the Redb implementation. The goal would be to understand under what scenarios the database could become corrupted. We'll also want to trigger a DB compaction.