# EPF5 Week 12 Updates

This week wasn't super productive. I thought I found a meaningful improvement to lighthouse and grandine's list decoding functions back in [week 10](/Njz0P7sASKuu4804DkFutg). The issue with using `map_while` was that it buried any errors during the decoding process and would simply terminate early. If a list of 10 items hit an error 3 items in, the remaining items would never get decoded and the error wouldn't be resurfaced. Of course, we can definitely *tell* if there was an issue while decoding. However, there's no natural and efficient way to surface that (none that I can think of, at least). I'll keep thinking about it, but I didn't want to get stuck on it, so I'm moving on for now.

## What Does Speeding Up SSZ Entail Anyway?

A number of the best-performing serialization schemes owe their performance to the types they use and their representation in memory (like rkyv). These aspects are either not amenable to ssz (and incompatible with the goals of an *Ethereum* serialization scheme), or are irrelevant to how I could improve an *implementation* of ssz.

The easiest way to speed up serialization is to use efficient data structures that are easier to encode/decode. This is obvious if all you care about is encoding and decoding, but sometimes the best data structure for the task is at odds with the overall goals of the application. A good example of this is the use of `milhouse::List` in lighthouse and `grandine_ssz::PersistentList` in grandine. While they're slower to encode/decode than a regular list, they also help clients manage state more efficiently, which is arguably more important.

Optimizing the types used for serialization doesn't have that much of an impact in the grand scheme of things, while also massively complicating the implementation of ssz in these clients. So we need to eke out whatever performance we can while staying within the confines of what clients can manage in terms of complexity. What do?
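As an aside, the `map_while` problem described above is easy to reproduce in miniature. The sketch below uses a toy `decode_chunk` helper (hypothetical, not from any ssz crate) standing in for a list-element decoder: `map_while` drops both the error and every item after it, while collecting into a `Result` at least surfaces the failure to the caller.

```rust
// Toy element decoder: parses a 2-char hex chunk into a byte.
// Stands in for a real ssz list-element decoder.
fn decode_chunk(s: &str) -> Result<u8, String> {
    u8::from_str_radix(s, 16).map_err(|e| e.to_string())
}

fn main() {
    // One bad chunk ("zz") sits in the middle of the list.
    let chunks = ["0a", "1b", "zz", "2c", "3d"];

    // map_while silently stops at the first error: the error value is
    // discarded and the remaining items are never decoded.
    let lossy: Vec<u8> = chunks
        .iter()
        .map_while(|c| decode_chunk(c).ok())
        .collect();
    assert_eq!(lossy, vec![0x0a, 0x1b]); // 2 of 5 items, no error in sight

    // Collecting into Result<Vec<_>, _> also short-circuits at the first
    // error, but returns it to the caller instead of swallowing it.
    let strict: Result<Vec<u8>, String> =
        chunks.iter().map(|c| decode_chunk(c)).collect();
    assert!(strict.is_err());
}
```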
I have a few ideas on how we can make do. The common thread between them is reducing the number of allocations first and foremost, and reducing the size of those allocations second. The size of an allocation does have some effect on latency, I believe, but we can shave more time off by avoiding allocations altogether.

### Zero-Alloc Encoding

[ssz15](https://github.com/karalabe/ssz/tree/main) does a good job of encoding types without holding any intermediate buffers or making extra allocations. I'm not sure if grandine does this; I'm still trying to grok their crate.

### Scratchspace and Memory Arenas

Allocated memory should be recycled as much as possible, so I'd like to design a crate that can reuse existing allocations where possible. [Arenas](https://donsz.nl/blog/arenas/) are a neat way to save on allocation time in performance-critical applications.

### Cheaper Error Handling

Gracefully handling encoding/decoding errors without introducing unnecessary allocations, say by decoding lists with `itertools::process_results`.

### Tail-Call Optimization?

I read an interesting article on how [protobuf parsing was sped up using tail-call optimization](https://blog.reverberate.org/2021/04/21/musttail-efficient-interpreters.html). I want to try it if the improvement is worthwhile.

### SIMD

The changes above are relatively low-hanging fruit compared to SIMD. There are probably gains to be made here, but it's not my first priority given how much it would increase complexity.
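To make the scratchspace idea a bit more concrete, here's a minimal sketch of allocation recycling during encoding. `encode_u64_list` is a hypothetical helper I made up for illustration, not an API from any existing ssz crate: the caller owns a scratch buffer, and the encoder writes into it with `clear()` so the existing capacity is reused across calls instead of allocating a fresh `Vec` every time.

```rust
/// Hypothetical encoder: serializes a slice of u64s into a caller-owned
/// scratch buffer, reusing its allocation across calls.
fn encode_u64_list(values: &[u64], out: &mut Vec<u8>) {
    out.clear(); // drops the old contents but keeps the allocation
    out.reserve(values.len() * 8);
    for v in values {
        // ssz encodes integers as fixed-size little-endian bytes
        out.extend_from_slice(&v.to_le_bytes());
    }
}

fn main() {
    let mut scratch = Vec::new();
    encode_u64_list(&[1, 2, 3], &mut scratch);
    let cap_after_first = scratch.capacity();

    // A second encode of the same size reuses the buffer: no new allocation.
    encode_u64_list(&[4, 5, 6], &mut scratch);
    assert_eq!(scratch.capacity(), cap_after_first);
    assert_eq!(&scratch[0..8], &4u64.to_le_bytes());
}
```

An arena generalizes this pattern from one buffer to many short-lived allocations with a single bulk reset, which is what makes it attractive for per-block decode workloads.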