Block Replay - Before (2023-02-21) & After (2023-03-17)

This document compares the numbers from our last VKT call with today's. The comparison was run on **my desktop machine**. Results may vary in other setups, so treat the numbers as guides that signal "good ideas" rather than as precise speedups.

Here we only cover "CPU work". The overall benchmark has other kinds of bottlenecks, such as disk IO, which are a separate topic.

## TL;DR

Here's a summary of the speedups:

- Total running time: 1.36x speedup (recall that the overall running time isn't only about VKT work).
- CPU seconds spent in `fp.Inverse`: 1.57x speedup.
- CPU seconds spent in Pedersen hashing (i.e., MSM, `CommitToPoly(...)`): 1.59x speedup.

Main optimizations worth mentioning:

- Saving inverses:
    - When updating trie commitments, batch the inverses (Point->Fr) _per level_ (h/t Dankrad for the idea). This didn't have a big impact on the numbers, but I think that's related to the benchmark data; it's still a good idea to keep. ([PR reference](https://github.com/gballet/go-verkle/pull/332))
    - While doing the above, I realized that when serializing touched nodes to save them on disk, we have to do many Projective->Affine transformations for all involved commitments (the root and the children of nodes). Each of these requires a division (i.e., `/Z`), and thus an inverse. So we now batch _all_ Projective->Affine transformations. This doesn't need to be per level, since each node's serialization is independent. This had the greatest impact on saving inverse CPU work. ([PR reference](https://github.com/gballet/go-verkle/pull/333))
    - TL;DR: we saved inverses by batching both the Point->Fr _and_ the Projective->Affine transformations, which are different cases.
- Saving MSM:
    - Use a separate, "wider" precomputed table for MSM on the first five elements of the vector. This exploits the fact that many MSMs transform addresses into trie keys, rather than operating on "arbitrary" vectors.
      ([PR reference](https://github.com/crate-crypto/go-ipa/pull/37))
- Saving Sqrt:
    - Nothing concrete for now, but there's an opportunity regarding Tonelli-Shanks.

See below for more details about the numbers. Note that above we only mentioned the changes relevant to this doc; we've also done other optimizations that are more specific to clients, or plain "engineering" work (those are important too, for other reasons).

## Before

Log tail:

```
INFO [03-17|10:21:21.269] Writing cached state to disk block=499,873 hash=6008be..c7ff16 root=9096aa..bef3cf
INFO [03-17|10:21:21.269] Persisted trie from memory database nodes=0 size=0.00B time=371ns gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [03-17|10:21:21.269] Blockchain stopped
Import done in 8m1.330480953s.
```

CPU pprof:

![CPU pprof (before)](https://i.imgur.com/2mROllU.png)

## After (current)

Code: [reference branch](https://github.com/gballet/go-ethereum/pull/183).

Log tail:

```
INFO [03-17|10:05:11.087] Writing cached state to disk block=499,873 hash=6008be..c7ff16 root=9096aa..bef3cf
INFO [03-17|10:05:11.087] Persisted trie from memory database nodes=0 size=0.00B time=792ns gcnodes=0 gcsize=0.00B gctime=0s livenodes=1 livesize=0.00B
INFO [03-17|10:05:11.088] Blockchain stopped
Import done in 5m53.898622086s
```

CPU pprof:

![CPU pprof (after)](https://i.imgur.com/OTVSxDl.png)