# Final Dev Update - Benchmarking SSZ & Building a High Performance SSZ Implementation

## Abstract

My project for this cohort had two parts: the first was improving the state of benchmarking for ssz libraries in the rust ecosystem, and the second was shipping my own high performance implementation.

[SSZ](https://epf.wiki/#/wiki/CL/SSZ) is a serialization scheme designed for the Consensus Layer that comes with schemas for more efficient encoding/decoding compared to [RLP](https://epf.wiki/#/wiki/EL/RLP). There are a few rust implementations like [lighthouse's](https://github.com/sigp/ethereum_ssz), [grandine's](https://github.com/grandinetech/grandine/tree/develop/ssz), and [Alex's](https://github.com/ralexstokes/ssz-rs). However, there is no convenient way to measure these crates, and other crates in the ecosystem, against each other. Furthermore, these crates only test against the consensus spec tests, so it's unclear how they perform on real beacon block and beacon state data.

[ssz-arena](https://github.com/ghiliweld/ssz-arena) fixes that. This benchmarking suite compares various rust implementations (including mine) against each other and shows how they stack up in terms of runtime *and* memory allocation. Profiling memory allocation was crucial since allocation is the dominant cost in a serialization library (there are no other expensive operations). For beacon block and state encoding/decoding, we fetch data from [sync-mainnet.beaconcha.in](https://sync-mainnet.beaconcha.in/), a beacon chain checkpoint provider. Users also have the option of bringing their own data locally to save time when running the benchmarks.

I also worked on [sszb](https://github.com/ghiliweld/sszb), a high performance ssz implementation in rust. It's really fast, although still lagging behind grandine's in some instances. It's almost at feature parity with the other crates in the ecosystem, and I'm also working on adding support for checking "well-formedness" of input bytes for early filtering instead of having to decode the entire input.

## Status Report

I'm satisfied with the feature set of ssz-arena, though I still have some usage and contribution docs left to write. I want to get this benchmarking suite to a place where it's *the* place to compare ssz crates against each other, and part of that is making it really simple to submit a PR to add new implementations.

Currently, sszb supports encoding and decoding for basic types and containers. Unions were omitted since they don't seem to be used very much; Péter also omitted them in [his implementation](https://github.com/karalabe/ssz) for the same reason. There are still a few items to check off the list before I publish sszb on crates.io, which I'll talk about in the next section, but for the remainder of the fellowship I wanted to focus on making sszb as fast as possible. I'll also take some time at Devcon to talk to teams that might want to use this, aside from rust client teams of course.

## Future of the Project

I definitely want to continue maintaining this project. I'm very proud of the work I did and hope that this crate becomes the de facto rust ssz implementation in the ecosystem. Here's what I have planned for the future of sszb.

I still need to implement merkleization and merkle proofs (with generalized indices) to reach feature parity with other crates, but I decided to focus on making encoding and decoding as performant as possible for the duration of the fellowship. I believe that will lay the groundwork for faster merkleization and proof generation.
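To make the merkleization piece concrete, here's a minimal sketch of the core idea from the SSZ spec (this is not sszb's implementation, just an illustration using the `sha2` crate): 32-byte chunks get padded to a power of two and hashed pairwise into a root, and every node in that tree has a generalized index that proofs can reference. The real spec also packs basic values into chunks and mixes in list lengths, which I've left out here.

```rust
// Minimal sketch of SSZ merkleization (not sszb's implementation).
use sha2::{Digest, Sha256};

type Chunk = [u8; 32];

fn hash_pair(left: &Chunk, right: &Chunk) -> Chunk {
    let mut hasher = Sha256::new();
    hasher.update(left);
    hasher.update(right);
    let mut out = [0u8; 32];
    out.copy_from_slice(&hasher.finalize());
    out
}

/// Merkleize a list of 32-byte chunks into a single root.
fn merkleize(mut chunks: Vec<Chunk>) -> Chunk {
    // Pad with zero chunks up to the next power of two.
    let width = chunks.len().max(1).next_power_of_two();
    chunks.resize(width, [0u8; 32]);
    // Hash pairwise, level by level, until one root remains.
    while chunks.len() > 1 {
        chunks = chunks
            .chunks(2)
            .map(|pair| hash_pair(&pair[0], &pair[1]))
            .collect();
    }
    chunks[0]
}

/// Generalized index of the i-th leaf at a given depth: the root is 1,
/// and the children of node g are 2g and 2g+1, so a leaf at depth d
/// with index i sits at 2^d + i.
fn generalized_index(depth: u32, leaf_index: u64) -> u64 {
    (1u64 << depth) + leaf_index
}
```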
I'm also interested in implementing a preemptive byte checker for ssz encoded objects, similar to [bytecheck](https://github.com/rkyv/bytecheck). This would allow for early validation that some input bytes conform to how a type `T` should be encoded, essentially a "well-formedness" check (h/t [Daniel](https://github.com/dknopik)). This came up when I was discussing partial decoding (more on that shortly), and Daniel brought up the need to ensure some bytes were well-formed before partially decoding them. This idea has other applications though: in theory an application developer could reject malformed inputs early rather than fully decoding them on the hot path of their application.

After all that is done, I will be cleaning up the repo to gear up for a stable release. That entails adding usage docs as well as doc comments before publishing. It might take more time to integrate into a client like lighthouse, but more nimble projects might be able to adopt sszb sooner. I'm already in touch with the team at [Chainbound](https://chainbound.io/), and I'd be really happy if I got them to integrate sszb in their stuff. I'll make some PRs once the library is stable.

Lastly, after the stable release, I want to support partial decoding, re-encoding, and re-hashing to squeeze out the last bit of performance possible. There are instances where a developer does not need to decode the entire input (say a beacon state), but rather just wants a subset of it. No rust implementation currently supports this. For large objects like the beacon state, full decoding is rather expensive, so partial decoding would be a game changer. Re-encoding and re-hashing work similarly: a developer just wants to change a subset of the input without re-encoding the entire thing. The Sigma Prime team already implemented their beacon state data structure with [sparse updates in mind](https://lighthouse-blog.sigmaprime.io/tree-states-part1.html). Partial encoding is serialization in that same vein. The DevEx and API design around this is tricky to get right, so I want to take the time to talk to more teams before hammering away at this feature.

## My Experience with the EPF

The most interesting part of this project was learning what makes a serialization library performant. In many of the readings I did prior to starting, the boundary between serialization and data structure optimization was blurry. The most powerful optimization you can do is optimizing the underlying *data structure* you're encoding/decoding and making it more amenable to those tasks. This includes optimal memory alignment for zero-copy serialization and deserialization. However, most of the literature on this assumes you have full control over the data types being used. It was unclear what room I had to improve things in the ecosystem, given that data structure design was not something I had control over. For example, I may want to use a simpler list type in Lighthouse since that would have the greatest effect in speeding up encoding/decoding, but the Lighthouse team has their own reasons for using milhouse that are more beneficial overall for the client. In the end, I settled on minimizing memory allocations where I could, since that is the dominant cost in serialization and deserialization.
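As a concrete illustration of that approach (a simplified sketch, not sszb's actual API; the trait and function names here are hypothetical): since an SSZ encoder can know the exact serialized length of a value up front, it can write into a single pre-sized buffer instead of allocating and copying per element.

```rust
// Sketch of the allocation-avoiding pattern (hypothetical API, not sszb's):
// compute the exact encoded length first, then write into one buffer.
trait Encode {
    /// Exact number of bytes the SSZ encoding of `self` takes.
    fn ssz_len(&self) -> usize;
    /// Append the SSZ encoding of `self` to `buf`; no allocation beyond `buf`.
    fn encode_into(&self, buf: &mut Vec<u8>);
}

impl Encode for u64 {
    fn ssz_len(&self) -> usize {
        8
    }
    fn encode_into(&self, buf: &mut Vec<u8>) {
        // SSZ encodes uintN as fixed-size little-endian bytes.
        buf.extend_from_slice(&self.to_le_bytes());
    }
}

/// Encode a slice of values with exactly one heap allocation.
fn encode_all<T: Encode>(values: &[T]) -> Vec<u8> {
    let total: usize = values.iter().map(Encode::ssz_len).sum();
    let mut buf = Vec::with_capacity(total);
    for v in values {
        v.encode_into(&mut buf);
    }
    buf
}
```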
My experience with the EPF was great; I'm really thankful for the opportunity to do this kind of work, especially among peers. Michael Sproul, the main contributor on `ethereum_ssz`, was very helpful and a great mentor to bounce ideas off of. I got more context into the decisions made re: using milhouse and the inevitable effect that had on decoding performance. It helped inform the work I did on sszb. Thank you to Josh and Mario for the opportunity; I had a great time these last 5 months!