# Notes on Caching Blob Sidecars

## Background

In Prysm, blob data is saved directly to the DB without a KZG commitment check. This has the potential downside of storing unverified or malicious blobs in the DB. We aim to implement a cache layer that stores blobs temporarily before they are migrated to the DB, ensuring that only verified blobs are permanently stored.

## Pre-requisites

- The block must contain KZG commitments.
- The node must compute each blob's KZG commitment, verify that it matches the corresponding commitment in the block, and verify that the KZG itself is correct.

## Current Behavior

When blobs are received via p2p gossip, a Prysm node responds in the following ways:

1. **Blobs verified**: Saved to the DB.
2. **Blobs without parents**: Saved to a pending queue.
3. **Blob validation fails**: Not saved.

## Security Concerns Based on Current Behavior

1. **Limited risk for verified blobs**: The gossip validation rule limits blob acceptance to a maximum of 6 blobs per slot, keyed by block root and blob index.
2. **DOS risk for blobs without a parent**: Every pending blob has a TTL lasting until the end of the next slot, which mitigates DOS attacks.
3. **No risk for failed blobs**: Blobs that fail validation are not saved and thus present no risk.

### Attack Vectors

1.) **Single Valid Block with Equivocating Blobs**

In this scenario, an attacker takes advantage of the system's limit of one sidecar per `(sidecar.block_root, sidecar.index)`. The attacker sends two different sidecars with the same block root and index, but only one of them matches `block.body.kzg_commitments`. If Prysm receives the incorrect sidecar first, it becomes stuck because it cannot save the second, correct sidecar to the DB.

The immediate solution is to remove the incorrect sidecar when it fails validation. We then rely on an honest attestation to resolve the issue. Once such an attestation arrives and we find the block is not in the DB, we manually request it using "blocks by root". If this block contains the required KZG commitments, we request the correct sidecars using "blob sidecars by root". I believe this is the simplest and most effective solution. It's important to note that we should not mark any block as "bad" during this process.

Another solution is to modify the DB to tolerate equivocated sidecars. While this approach is more complex, it could be feasible because p2p validation still prevents equivocated blobs from being saved to the DB, so the node will not get spammed through gossip. The only time equivocated blobs could be saved to the DB is when we manually request them. However, this solution involves the added complexity of manually selecting the blob sidecars from the DB, which is not worth the effort. Moreover, there isn't a strong use case for saving equivocated yet invalid blob sidecars in the first place.

2.) **Equivocating Blocks with Equivocating Blobs**

This scenario is intriguing, but it doesn't pose a significant issue. It can be considered similar to the first attack, particularly because of how blob sidecars are stored in the database. The unique key for each blob sidecar is composed of `bytes(slot_to_rotating_buffer(blob.slot)) ++ bytes(blob.slot) ++ blob.block_root`, ensuring that each equivocating block gets its own unique key in the DB.

One question that arises is what would happen if a proposer were to generate 10,000 equivocating blocks along with 60,000 associated blobs. Although I'm optimistic that libp2p would manage this situation effectively, it's certainly worth further investigation.
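To make the key layout described above concrete, here is a minimal Go sketch of how such a key could be assembled. The package name, rotating-buffer size, endianness, and function name are assumptions for illustration, not taken from Prysm's actual DB code.

```go
package blob

import "encoding/binary"

// bufferSlots is a hypothetical rotating-buffer size; the real value
// depends on the configured blob retention window.
const bufferSlots = 8192

// sidecarDBKey mirrors the layout
// bytes(slot_to_rotating_buffer(slot)) ++ bytes(slot) ++ block_root.
// Because the block root is part of the key, sidecars from equivocating
// blocks land under distinct keys rather than overwriting one another.
func sidecarDBKey(slot uint64, blockRoot [32]byte) []byte {
	key := make([]byte, 0, 8+8+32)
	key = binary.BigEndian.AppendUint64(key, slot%bufferSlots) // slot_to_rotating_buffer
	key = binary.BigEndian.AppendUint64(key, slot)
	return append(key, blockRoot[:]...)
}
```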
3.) **Eclipse Attack Leading to Stagnation in the DA Check**

An attacker can eclipse a Prysm node if Prysm does not manually request blobs while stuck in a DA check. The blobs are then never gossiped to the Prysm node, causing the block to hang for 30 seconds until the blobs are manually requested. Consequently, this leads to 30 seconds of liveness delay for the Prysm node. This issue is likely to be addressed in future updates.

4.) **Handling Excessive Blob Sidecars Referencing a Block**

A node can accept up to 6 valid blob sidecars per block. Consider a scenario where a block has only 4 KZG commitments, but a node receives 6 blob sidecars, of which 4 are valid. In this situation, there is a data availability issue for the two excess sidecars, and they should be removed from storage. However, the block should still be processed as normal; failing the block is against the spec. Currently, Prysm will fail the block and rely on manual requests, using either the block root in an attestation or the parent root of a child block, to retrieve the blob sidecars again. On the manual retry, the block will be valid.

### Design Thoughts

### The Difference in Caching Between Initial Sync and Regular Sync

#### Initial Sync:

- The node has the flexibility to determine the timing and manner of requesting blocks and blob sidecars, simplifying some aspects of the design.
- All blob validations can be executed within the sync package. These include:
  - Ensuring each blob sidecar aligns with its corresponding block.
  - Verifying the blob has a valid KZG commitment.
- The node cannot perform complete block validation within the sync package.
  - This raises a crucial question: what happens if a block turns out to be invalid during `ReceiveBlockBatch`?
  - Considering open design possibilities, one approach could be to cache the blob sidecars initially and save them to the DB only after `ReceiveBlockBatch` executes successfully.
  - Alternatively, the blob sidecars could be saved to the DB first and removed later if `ReceiveBlockBatch` fails.
- Cache size considerations: should the size be determined per batch? `ReceiveBlockBatch` needs to stay synchronized with `RoundRobin` regarding this usage.

#### Regular Sync:

- Unlike initial sync, the node has no control over when it receives blocks and blob sidecars, which introduces more complexity.
- Saving unverified blob sidecars to the DB should be avoided. One solution could be to temporarily cache blob sidecars and move them to the DB once verified (see the sketch below).
- P2P validation mechanisms ensure that the node only accepts unique index values per block root for blob sidecars, mitigating DOS and spamming concerns.
- Caching allows the flexibility to remove unneeded blob sidecars later and retain only the useful ones.
  - For example, if a block has two commitments but four sidecars are received, it's not advisable to let the unreferenced sidecars occupy database space.
- Cache size considerations: a few slots should suffice for this scenario.
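As a rough illustration of the regular-sync idea (cache first, persist only after verification), below is a minimal sketch of such a cache. All type and method names, and the slot-based pruning policy standing in for the "few slots" TTL, are assumptions for this note rather than Prysm's actual types.

```go
package blobcache

import "sync"

// BlobSidecar is a minimal stand-in for the real sidecar type; only the
// fields this sketch needs are included.
type BlobSidecar struct {
	Slot          uint64
	BlockRoot     [32]byte
	Index         uint64
	KzgCommitment [48]byte
}

type key struct {
	root  [32]byte
	index uint64
}

// Cache temporarily holds gossip-verified blob sidecars until the
// corresponding block passes the DA check and the sidecars are migrated
// to the DB.
type Cache struct {
	mu       sync.Mutex
	sidecars map[key]*BlobSidecar
}

func New() *Cache {
	return &Cache{sidecars: make(map[key]*BlobSidecar)}
}

// Put stores a sidecar keyed by (block_root, index). Gossip validation
// already guarantees at most one accepted sidecar per key, so duplicates
// are ignored.
func (c *Cache) Put(sc *BlobSidecar) {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := key{root: sc.BlockRoot, index: sc.Index}
	if _, ok := c.sidecars[k]; !ok {
		c.sidecars[k] = sc
	}
}

// Pop removes and returns every cached sidecar for a block root, e.g. once
// the block has passed the DA check and the sidecars can move to the DB.
func (c *Cache) Pop(root [32]byte) []*BlobSidecar {
	c.mu.Lock()
	defer c.mu.Unlock()
	var out []*BlobSidecar
	for k, sc := range c.sidecars {
		if k.root == root {
			out = append(out, sc)
			delete(c.sidecars, k)
		}
	}
	return out
}

// PruneBefore drops sidecars older than the given slot, acting as a simple
// TTL so sidecars that are never referenced by a processed block do not
// accumulate.
func (c *Cache) PruneBefore(slot uint64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for k, sc := range c.sidecars {
		if sc.Slot < slot {
			delete(c.sidecars, k)
		}
	}
}
```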
#### Long-term Considerations:

The ultimate goal is to eliminate the need for a separate initial sync, focusing on a single path for block processing. We are also working on redesigning the DB to use the filesystem.

### Open Questions:

1.) Should we reuse the cache for the pending queue?
- Leaning toward not reusing the cache for the pending queue.

2.) What should the TTL be for verified cache blobs?
- Verified blobs do not need a strict TTL; they can remain in the cache longer. The data size for two epochs of full blobs is approximately 48 MB.
- Consider using a Least Recently Used (LRU) cache for longer retention.

3.) What are the implications of using an LRU cache if no block is received for two epochs?
- The LRU cache may get overwritten, potentially causing data loss or requiring data to be re-fetched.

### Workflow:

1.) Blobs are first verified in the p2p layer and then saved to the cache.
2.) After the block has been processed and passes the DA check, blobs are migrated from the cache to the DB (a sketch of this step follows below).
3.) TODO: Additional considerations and steps need to be outlined.
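Building on the cache sketch above, the snippet below illustrates workflow step 2: once a block has been processed and its DA check passes, only the sidecars whose index and commitment match `block.body.kzg_commitments` are persisted, and any excess sidecars (attack vector 4) are simply dropped. The `blobDB` interface and function name are hypothetical.

```go
// blobDB is a hypothetical persistence interface standing in for the real DB.
type blobDB interface {
	SaveBlobSidecars(scs []*BlobSidecar) error
}

// migrateOnDACheck moves the cached sidecars for a block into the DB after
// the DA check has passed, discarding sidecars the block does not reference.
func migrateOnDACheck(c *Cache, db blobDB, blockRoot [32]byte, commitments [][48]byte) error {
	cached := c.Pop(blockRoot)
	verified := make([]*BlobSidecar, 0, len(commitments))
	for _, sc := range cached {
		// Keep a sidecar only if its index is in range and its commitment
		// matches the one the block actually references.
		if sc.Index < uint64(len(commitments)) && sc.KzgCommitment == commitments[sc.Index] {
			verified = append(verified, sc)
		}
	}
	return db.SaveBlobSidecars(verified)
}
```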