Try   HackMD

Flat db healing

Bonsai operates with a flat database and a trie structure simultaneously. This combination allows for faster and more efficient SLOAD operations.

In a traditional trie structure, the accounts and slots are located at the bottom of the trie (leaf). To find specific account or slot, you have to go through the entire trie. Each pass through a node of the trie equals a read in the database. The deeper the information is in the trie, the more reads are needed, which can slow down the operation.

The introduction of a flat database structure alongside the trie in Bonsai presents a significant advantage. Instead of going through the entire trie, all the leaves (accounts and slots) are directly accessible. This means that you can find any information with a single read in the database, no matter its "depth" in the trie. This makes the search process much faster and more efficient.

We have run a series of tests to compare the traditional trie structure with the new combined flat database and trie structure. The results, which we will present later, clearly show that this feature greatly improves performance. The tests show that the combined structure greatly reduces the time and resources needed for SLOAD operations, which allows for faster block processing.

Why a new flag for bonsai ?

When using snapsync or fastsync, keeping a valid flat database becomes tricky because we sync with many pivot blocks. This leads to our state being a mix of different blocks.

The Problem

To fix this mixed state at the end of the sync, there's a healing step for the trie. But there wasn't a similar healing step for the flat database. This meant that we didn't update the flat database during the initial sync. Consequently, it was necessary to clear and rebuild the flat database post-sync, which significantly impacted SLOAD performance.

When we couldn't find data in the flat database, we had to look for it in the trie. This method had two main issues:

  1. More Reads: We couldn't always find the data with just one read. Sometimes, we had to read the trie to find an account or a slot. This meant more reads, which slowed things down.

  2. Reads zero: When an account or a slot didn't exist (read zero), we still had to look in the trie because we didn't fully trust the flat database. This led to many unnecessary reads, which impact performance.

When we looked at the blocks, we found a lot of reads zero. These unnecessary reads were hurting performance, making things slower and less efficient.

The Solution

The solution to these problems is to introduce a healing step for the flat database. This step will fix the state at the end of the sync, making sure the flat database is valid and up-to-date. This will cut down on unnecessary reads and ultimately make block processing operations faster and more efficient.

Healing process

The healing process for the flat database involves the following steps:

  1. Trie Healing: Before healing the flat database, the Merkle trie is healed using the existing process, ensuring the trie accurately represents the state of Ethereum.

  2. Flat Database Verification: After the trie healing process, the flat database is verified for validity by traversing it in ranges and comparing the data with the Merkle trie. The purpose is to identify any inconsistencies between the flat database and the trie.

    • Define a range within the flat database to be verified.
    • Retrieve the corresponding data from the flat database within the specified range.
    • Generate a range proof for the retrieved data using the Merkle trie.
  3. Healing the Flat Database: If any inconsistencies are found during the verification process, the flat database needs to be fixed. To achieve this, the following steps are performed:

    • Identify the range within the flat database that contains incorrect data.
    • Traverse the corresponding range in the Merkle trie to locate the incorrect leaf nodes.
    • Replace the incorrect leaf nodes in the flat database with the correct ones obtained from the Merkle trie.
    • Repeat this process for all identified inconsistent ranges within the flat database.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

  1. Completing the Healing Process: Once all identified inconsistent ranges have been fixed, the healing process for the flat database is complete. The flat database now accurately reflects the state of Ethereum after the sync process.

Performance improvement

The healing mechanism for the flat database provides significant performance improvements, particularly for SLOAD operations and READ ZERO operations. With a complete and accurate flat database, the need for fallbacks to the Merkle trie is eliminated.

By healing the flat database and ensuring its completeness, the number of fallbacks to the Merkle trie is reduced to 0, resulting in improved SLOAD performance. READ ZERO operations also benefit from the elimination of unnecessary fallbacks.

We can notice a 20% improvement (274 ms instead of 344 ms on 50th percentile) on the node running the flat database feature compared the node running current main branch.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Additionally, there has been a significant enhancement in outliers (99th and 100th percentiles), which will enhance the attestions' performance impacted by such outliers.

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →
Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

 

CPU profiling:
The improvement is pretty clear when checking the profiling of both nodes, especially on the SLOAD operation

Without this flag

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

With this flag

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →