# EPF5 Dev Update - Week 10

## Weekly Highlights

Attended the EPF5 weekly standup, the Grandine standup, and our in-house meeting, which happens three times a week (Monday, Wednesday, and Friday).

### Meeting with David Theodore

We had our first meeting with him on Tuesday, and he shared many insightful points, such as the challenge that blockchains are stateful, which makes fuzzing them more difficult than stateless systems. To support fuzzing we will need to build additional components, and Kurtosis is one promising way forward; he suggested that we may want to focus on this approach. Other directions he raised:

* Manual inspection.
* Fuzzing requests and responses (p2p).
* Fuzzing peer discovery.
* Thread sanitizer.

### Meeting with Saulius

The first update concerns the RSA crate, which I reported to Saulius. His feedback helped me understand things better: the RSA crate is actually an optional dependency, and Grandine doesn't enable that option.

On the flamegraph, I shared my report with him along with my concerns about the flamegraph and thread creation. In his experience, profiling tools are hard to deal with, although the team hasn't tried many of them, so his suggestion was simply to try out a lot of them. The best way to profile Grandine is to run a validator, wait about 10 minutes until the client is fully connected to peers and working normally, and then attach the profiler for roughly 12 seconds, i.e. one slot.

He also mentioned that they have used flamegraph before and found it tricky. I completely understand this, because it's very hard to tell from a flamegraph where exactly the problem is. He still suggested I go ahead and try different profilers; I might spot something that is not efficient enough. The hard part, however, is that I may not know what is inefficient, because the application is doing a lot.
To say that something is inefficient, you need to compare it to something else that is efficient. Say a third of the CPU is consumed by signatures; on its own, I can't tell whether that is good or bad. One way to find out is to run the profiler on different clients, map out the corresponding functions, and compare, for example, whether another client's signature verification takes only one-tenth of the CPU. This is a high-level guideline rather than a recipe, because profiling Grandine is tricky; still, he mentioned that the Grandine team's main interest is whether there is any current inefficiency.

Having gone through all this, I think comparing against other clients such as Lighthouse will be complex because their naming conventions differ, so I will need to apply some kind of labeling or similar mapping. While reviewing Lighthouse, I came across the flamegraphs they generated, went through them, and tried to make sense of them. So I'm thinking of asking the Lighthouse team for help or guidance on the best way to profile with flamegraphs: specifically, an overview of how they did theirs, the steps they took, what they considered efficient or not, and any inefficiencies they discovered, whether related to flamegraphs or not.

My task last week was to profile Grandine with a focus on thread creation and destruction, which I had identified as key areas from the graphs I generated earlier using different parameters. Profiling Grandine with an emphasis on thread creation produced this output:

![Screenshot 2024-08-15 at 19.43.37](https://hackmd.io/_uploads/SyIlQq1s0.png)

I generated two graphs using different parameters and conditions; based on these, here is a high-level interpretation.

**Potential inefficiency:** The high percentage of samples in the blocking pool could indicate that some CPU-bound tasks are blocking the async runtime more than necessary.
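To make the thread creation/destruction concern concrete, here is a minimal std-only sketch (not Grandine code; the workload is a hypothetical stand-in for something like signature verification) contrasting spawning one OS thread per task, which is what shows up as thread creation on a hot path in a profile, with a single long-lived worker fed over a channel:

```rust
use std::sync::mpsc;
use std::thread;

// CPU-bound toy task standing in for real work such as signature checks.
fn busy_work(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

fn main() {
    const TASKS: usize = 100;

    // Variant A: spawn (and destroy) one OS thread per task.
    // Each `thread::spawn` here is the kind of churn a profiler
    // attributes to thread creation and destruction.
    let per_task: u64 = (0..TASKS)
        .map(|_| thread::spawn(|| busy_work(10_000)).join().unwrap())
        .sum();

    // Variant B: one long-lived worker thread fed over a channel,
    // so thread creation happens once instead of per task.
    let (task_tx, task_rx) = mpsc::channel::<u64>();
    let (result_tx, result_rx) = mpsc::channel::<u64>();
    let worker = thread::spawn(move || {
        for n in task_rx {
            result_tx.send(busy_work(n)).unwrap();
        }
    });
    for _ in 0..TASKS {
        task_tx.send(10_000).unwrap();
    }
    drop(task_tx); // close the channel so the worker exits
    let pooled: u64 = result_rx.iter().sum();
    worker.join().unwrap();

    // Both variants compute the same result; only the threading differs.
    assert_eq!(per_task, pooled);
    println!("per-task and pooled results match: {per_task}");
}
```

Whether per-task spawning is actually a problem in Grandine is exactly the open question; this only illustrates the pattern a high blocking-pool percentage might point at.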
However, considering the discussion we had with Saulius about what is efficient and what is not, it's difficult to state definitively that this is an inefficiency. In line with that discussion, the approach I would likely take is to further investigate potential inefficiencies in Grandine in the following areas:

**Block Processing Optimization**

**Strategy:** Profile individual steps of block processing (transaction validation, state updates, receipts generation) to identify bottlenecks.

**State Management**

**Strategy:** Analyze cache hit rates and state access patterns. Benchmark different storage models.

**Network Layer Optimization**

**Strategy:** Measure block propagation times and network message overhead. Profile peer connection handling.

**Consensus Algorithm Efficiency**

**Strategy:** Benchmark the time taken for consensus decisions. Analyze validator participation rates.

**Transaction Pool Management**

**Strategy:** Profile mempool operations under high transaction volume. Measure transaction ingress and egress rates.

**Database Optimization**

**Strategy:** Analyze database read/write patterns. Measure I/O wait times during peak operations.

## To-dos for next week:

* Get in contact with the Lighthouse team to understand how they profiled Lighthouse, draw insights from them, and see whether I can combine their approach with my own ideas.
* Compare Grandine's performance metrics with other Ethereum consensus clients, including Lighthouse, under similar conditions, focusing on Ethereum-specific benchmarks such as time to finality, block propagation time, and transaction throughput.
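For the cross-client comparison, one way around the differing naming conventions is to work from folded flamegraph stacks (the `frame;frame;frame count` lines produced by the stackcollapse tools) and map each leaf frame into a shared category via keyword labels, then compare per-category CPU fractions. A sketch of that idea, with entirely hypothetical keywords and stack data:

```rust
use std::collections::HashMap;

/// Map one stack frame to a coarse category using keyword labels.
/// The keywords are hypothetical; real ones would come from reading
/// each client's code (e.g. Grandine vs Lighthouse function names).
fn categorize(frame: &str) -> &'static str {
    let f = frame.to_lowercase();
    if f.contains("signature") || f.contains("bls") {
        "signatures"
    } else if f.contains("state") {
        "state"
    } else {
        "other"
    }
}

/// Aggregate folded stacks ("frame;frame;frame count" per line) into
/// per-category CPU fractions, attributing each sample to the
/// category of its leaf (last) frame.
fn cpu_fractions(folded: &str) -> HashMap<&'static str, f64> {
    let mut samples: HashMap<&'static str, u64> = HashMap::new();
    let mut total = 0u64;
    for line in folded.lines() {
        let (stack, count) = match line.trim().rsplit_once(' ') {
            Some(pair) => pair,
            None => continue, // skip malformed lines
        };
        let count: u64 = match count.parse() {
            Ok(c) => c,
            Err(_) => continue,
        };
        let leaf = stack.rsplit(';').next().unwrap_or(stack);
        *samples.entry(categorize(leaf)).or_insert(0) += count;
        total += count;
    }
    samples
        .into_iter()
        .map(|(k, v)| (k, v as f64 / total as f64))
        .collect()
}

fn main() {
    // Toy folded stacks standing in for two clients' profiles.
    let client_a = "main;process_block;verify_signature 30\n\
                    main;process_block;update_state 60\n\
                    main;network;poll 10";
    let client_b = "start;block;bls_verify 10\n\
                    start;block;state_root 80\n\
                    start;net;poll 10";
    let a = cpu_fractions(client_a);
    let b = cpu_fractions(client_b);
    println!("client A signatures: {:.0}%", a["signatures"] * 100.0);
    println!("client B signatures: {:.0}%", b["signatures"] * 100.0);
}
```

This is only a starting point; the hard part remains choosing the keyword mapping so that equivalent work in each client really lands in the same bucket, which is exactly what input from the Lighthouse team could help with.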