# ignacio (jsign) - Update 11

In these last two weeks I worked on building a [custom Geth native tracer](https://geth.ethereum.org/docs/developers/evm-tracing/custom-tracer), which replays a transaction and captures a bunch of information about its execution:

- For every `SLOAD` or `SSTORE` opcode execution, capture the tree depth of that branch.
- For every `SLOAD` or `SSTORE` opcode execution, simulate the future Verkle Trie branch and track whether each access is a first-time access or a repeated one.

## What's the motivation for all this?

There are two angles.

### 1. Static depth analysis vs dynamic depth analysis

Previously, I did a static analysis of the current Ethereum mainnet Merkle Trie depth: for example, "XX% of contracts have a Storage Trie of depth Y" kinds of results. That's interesting, but not all contracts are equally popular, so even if, say, 90% of contracts have a tree depth of 1, those contracts are probably not accessed much.

The idea of using a Geth tracer is that we'd be collecting data from _real_ usage. Apart from building the Geth tracer, I built a side-daemon that replays each transaction in a synced Geth node on every new block. Collecting data on real transaction execution means the data correctly weighs which contracts are more popular. So, even if only 1% of contracts have depth Y, if that 1% is targeted by 80% of transactions, we'll now account for that.

### 2. Insights around the VKT intentional design decision for locality

In VKTs, there was an intentional design decision to bucket storage slots together in 256-slot spans. So, for example, storage slots 0...255 will be in the same branch, 256...511 will all be in the same branch, etc. This makes intuitive sense: if we access storage slot X, there's a better chance we'll access a storage slot close to it than any other random one.
(This isn't a new idea; today's computer architectures have the same design, called "data locality" (e.g. fetching memory in cache lines).)

Although this optimization makes intuitive sense, it would be good to explore how well it plays out in real usage. But unless we're on mainnet with VKTs, we can never be totally sure how it will work. One could say that current (or future) devnets using VKTs can help with that, but devnet usage is low-volume and not representative of real usage.

So my idea is to use the same Geth tracer to, on each storage slot access (of real transaction executions), simulate which VKT branch that access would map to. That lets us know how the data-locality optimization behaves _on real usage_. For example, if a transaction accesses storage slots 1 and 23, those count as a single VKT branch access (remember that storage slots are bucketed in spans of 256).

Knowing how many VKT branch accesses are done is also relevant for the new gas model that VKTs will introduce. In the new model, the cost depends on how many unique VKT branches are accessed. A transaction accessing slots 1 and 23 will pay for 1 branch access, but a transaction accessing slots 1 and 258 will pay for 2 branch accesses. Note that both access 2 storage slots, but they'll pay differently!

In summary, the idea is to have preliminary insight into the impact of this VKT design decision on real data (not abstract!).

## Weekly update

This was a reasonable amount of work, but I advanced a lot these last two weeks. I'll be closing out this work next week, which will have as outputs:

- The Geth tracer collecting all the raw information.
- A separate daemon that calls `debug_traceTransaction` targeting this tracer for every transaction in new blocks in a continuously syncing Geth node. It also provides a separate command to summarize previously collected data.
- Paired with the above, I'll summarize all the findings and results in a HackMD document to be shared in #verkle-trie. It's mainly targeted at Dankrad, who was interested in this research direction.