_2024-07-28_

# EPF5: Week 7

Apologies for not posting an update last week - I was hoping to be able to report some success, but was not quite there yet. Even now, the network simulation does not quite work. In my [project proposal](https://github.com/eth-protocol-fellows/cohort-five/blob/main/projects/network-simulations-with-shadow.md) I wrote that I planned three to four weeks to get the simulations working, so while I am still on schedule, I really hope to get the simulation running soon:tm:, especially since I still need to implement some features such as convenient metrics extraction for my experiments.

However, I still feel like I am progressing well, and in this update, I will tell you about the issues I fixed in the past two weeks and more about the current state. But first, a small refresher about my project.

## Blockchain or kernel dev?

In my project, I want to simulate Ethereum networks using Shadow, which allows us to simulate with unmodified client software (at least in theory). This works by intercepting system calls and reimplementing them as needed to route packets between clients and keep the simulation deterministic. There are _a lot_ of system calls, and _a lot_ of options for many of them. Shadow does not implement all of them yet, so the simulation might fail if the client software cannot cope with a missing system call.

Tracking them down is rather difficult - while Shadow helpfully reports which client called an unsupported system call or system call option, it of course cannot tell us exactly which line caused that call. Luckily, the inevitably crashing client itself may report an error message or stack trace to point us in the right direction. Still, finding the source of those errors took a while.

As of now, I have identified roughly five missing features, [and track them in my fork of `ethereum-shadow`](https://github.com/dknopik/ethereum-shadow/issues). I successfully worked around all of them!
One was in Lighthouse, worked around with a config change, and four were in Reth. I was able to work around one of those with a build option, but the other three required code changes in Reth or in Shadow. Long term, these workarounds should be replaced with proper implementations of the required features in Shadow, but as that is quite complex and time-intensive, I will take care of that later:tm:, so that I can focus on getting the simulation to work first.

Special shoutout to MDBX, a key-value database. It was the cause of almost all my issues with Reth, and a huge pain to deal with, as it is implemented in a [single C file with 35k lines](https://github.com/paradigmxyz/reth/blob/fe2af8fa5cf334e49c4001080065d1019b63f41a/crates/storage/libmdbx-rs/mdbx-sys/libmdbx/mdbx.c), filled to the brim with very low-level code and mazes of C preprocessor directives.

I will spare you, and won't explain every single workaround in detail. Let's just say that I read way too many man pages about syscalls, and too many kernel header files - after all, this fellowship is about Ethereum, not the Linux kernel. :)

## You get a PR, you get a PR, and you get a PR!

I was very happy when Shadow first managed to simulate 30 minutes of an Ethereum network. However, I quickly realized that it was not quite working yet, as not a single block was produced, and all nodes reported 0 (zero) peers. Some of these issues were caused by wrong configuration, but some were caused by bugs in the clients!

Last week, I found and fixed one bug in each of the involved clients: [Lighthouse](https://github.com/sigp/lighthouse/pull/6170), [Geth via `bootnode`](https://github.com/ethereum/go-ethereum/pull/30234), and [Reth](https://github.com/paradigmxyz/reth/pull/9858)! I am happy that I could contribute through that, and hope to do so even more as soon as I actually get started with simulations.

## Status Quo & Next Steps

As already said in the beginning, the simulation is still not running smoothly.
The Reth nodes still refuse to peer with each other, and for some reason, the CL clients split the chain even though they peer properly in the beginning. Of course, fixing these issues remains my #1 priority for next week. Hopefully I can finish this next week so I can look into metric extraction. Metric extraction _should_ be fairly easy if I can get the clients to nicely dump the Prometheus metrics they provide.

After that, I want to implement "pseudo-geographic" latency between nodes: currently, all nodes have a latency of 100 ms to each other, but that is unrealistic, as real latency between nodes varies with their location around the globe. Ideally, I should be able to assign a number of nodes to each continent, and also be able to specify how many of those have a perfect, acceptable or bad connection, to reflect the fact that Ethereum nodes are running on different network setups IRL.

Another TODO is migrating from the shell scripts which currently generate the Shadow configuration and client configurations to a more sophisticated solution. The current solution uses environment variables for the configuration input, which will get messier the more features I want to implement. Moving to TOML or YAML would make this far nicer.

Lastly, special thanks to the EthPandaOps team and their excellent [ethereum-genesis-generator](https://github.com/ethpandaops/ethereum-genesis-generator)! It really made my life easier as I did not have to reimplement everything.
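
To make the pseudo-geographic latency idea from the next steps a bit more concrete, here is a minimal Python sketch of how it could work. It is not the actual implementation. Everything in it is an assumption made for illustration: the region names, the latency numbers (placeholders, not measurements), the per-node quality penalties, and the function names `assign_nodes` and `latency_matrix`. It only computes a pairwise latency table; turning that table into a Shadow network graph is left out.

```python
import itertools

# Hypothetical base latencies in milliseconds between regions.
# Placeholder values for illustration, not real measurements.
REGION_LATENCY_MS = {
    ("eu", "eu"): 15,
    ("eu", "na"): 45,
    ("eu", "asia"): 90,
    ("na", "na"): 20,
    ("na", "asia"): 80,
    ("asia", "asia"): 25,
}

# Hypothetical extra latency contributed by each endpoint,
# depending on how good its connection is.
QUALITY_PENALTY_MS = {"perfect": 0, "acceptable": 20, "bad": 100}


def region_latency(a, b):
    """Look up the base latency for a pair of regions in either order."""
    key = (a, b) if (a, b) in REGION_LATENCY_MS else (b, a)
    return REGION_LATENCY_MS[key]


def assign_nodes(plan):
    """Expand {region: {quality: count}} into (node_id, region, quality) tuples.

    Node ids are handed out in iteration order, so the result is deterministic.
    """
    nodes = []
    for region, qualities in plan.items():
        for quality, count in qualities.items():
            for _ in range(count):
                nodes.append((len(nodes), region, quality))
    return nodes


def latency_matrix(nodes):
    """Return {(id_a, id_b): latency_ms} for every unordered pair of nodes."""
    latencies = {}
    for (i, ri, qi), (j, rj, qj) in itertools.combinations(nodes, 2):
        latencies[(i, j)] = (
            region_latency(ri, rj)
            + QUALITY_PENALTY_MS[qi]
            + QUALITY_PENALTY_MS[qj]
        )
    return latencies


if __name__ == "__main__":
    # Example plan: two nodes in Europe, one in Asia.
    plan = {"eu": {"perfect": 1, "bad": 1}, "asia": {"acceptable": 1}}
    nodes = assign_nodes(plan)
    for pair, ms in latency_matrix(nodes).items():
        print(pair, f"{ms} ms")
```

The sketch deliberately avoids randomness so that the same plan always yields the same latencies, which fits the goal of deterministic simulations. The `plan` dictionary is also the kind of nested input that would be awkward to pass through environment variables but maps naturally onto a TOML or YAML file.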