# EPF5 Week 2 Updates

When I gave my weekly update on June 17, someone shared some links with me on previous work related to serialization on Ethereum. So I'm going to spend my week going through those and learning about [Simple Streamable Serialize (SSS)](https://github.com/ethereum/EIPs/blob/71098b1c2760f2ae557a7bab91770eb8cf72fed5/EIPS/eip-sss_serialization.md), its [spec](https://github.com/ethereum/bimini/blob/master/spec.md), and the [bimini](https://github.com/ethereum/bimini/tree/master) implementation. This [gist](https://gist.github.com/pipermerriam/ced4ed8f1fbab732120986f57db5069a) is helpful too.

In addition, I went down a rabbit hole on nested (and nested stack) automata. Hopefully it'll be relevant:

- https://www.cis.upenn.edu/~alur/Jacm09.pdf
- https://en.wikipedia.org/wiki/Nested_stack_automaton
- https://en.wikipedia.org/wiki/Nested_word
- https://arxiv.org/pdf/2010.06037
- https://arxiv.org/pdf/2209.10312
- https://homepages.inf.ed.ac.uk/libkin/papers/tocs11.pdf

If there is some link between RLP and nested automata, I wonder if the literature on nested automata could help with faster reads on RLP-encoded strings.

Judging by the [devp2p protocol msg spec](https://github.com/ethereum/devp2p/blob/master/caps/eth.md#protocol-messages) and the [`EthStream`](https://github.com/paradigmxyz/reth/blob/main/crates/net/eth-wire/src/ethstream.rs#L267) type in reth, the message types are known ahead of time and invalid messages get rejected. That being the case, it's not clear to me why a self-describing format like RLP is needed. Is it, say, because the `data` field in a tx can be arbitrary (with arbitrary structure)? That seems to be ABI-encoded though.

RLP can't be indexed or read in faster than linear time, because the contents are unknown at the prefix. The entire object must be decoded first.

Came across an [interesting conversation about SSS](https://github.com/ethereum/EIPs/pull/1805) and this [hackmd post](https://notes.ethereum.org/QF8jgOQbRTWUhK1zoi8D4Q#). These suggest that we could obtain some good savings by encoding numbers more efficiently.

#### Questions

- Do data availability and L2s have different serialization requirements?
- Is encoding/decoding happening in a word-aligned manner?
- Can we benefit from vectorization?
- Would appending an index after the data work for faster access while being backwards compatible and opt-in?

### This Week's Links

- https://won.hashnode.dev/exploring-serialization-methods
- http://www.idryman.org/blog/2017/06/28/opic-a-memory-allocator-for-fast-serialization/
- https://dspace.mit.edu/bitstream/handle/1721.1/152815/limarta-limarta-meng-eecs-2023-thesis.pdf?sequence=1&isAllowed=y
- https://en.wikipedia.org/wiki/Skip_list
- https://en.wikipedia.org/wiki/Skip_graph
- https://mechanical-sympathy.blogspot.com/2012/08/memory-access-patterns-are-important.html
- https://github.com/ethereum/EIPs/blob/71098b1c2760f2ae557a7bab91770eb8cf72fed5/EIPS/eip-sss_serialization.md
- https://github.com/ethereum/bimini/blob/7c26efec585742ef870bf58ea5d96e2deb242775/report.md#sss-vs-rlp-summary
- https://github.com/ethereum/bimini/blob/master/spec.md
- https://news.ycombinator.com/item?id=38684724
- https://news.ycombinator.com/item?id=11263378
- https://arxiv.org/pdf/1910.05109
- https://docs.rs/prefix_uvarint/latest/prefix_uvarint/
- https://news.ycombinator.com/item?id=11263667
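
To make the linear-access point and the trailing-index question concrete for myself, here's a rough Python sketch. It is not taken from any spec or from SSS/bimini: the function names, the 4-byte big-endian offsets, and the trailer layout (offset table followed by an item count) are all assumptions I made up for illustration. `nth_item` has to walk the header of every preceding item, while `nth_item_indexed` jumps straight to the offset stored in the hypothetical trailer.

```python
def _header(length, offset):
    # Short form: one prefix byte. Long form: prefix byte plus big-endian length.
    if length < 56:
        return bytes([offset + length])
    len_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(len_bytes)]) + len_bytes


def rlp_encode(item):
    """Encode bytes, or a (possibly nested) list of bytes, as RLP."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte is its own encoding
        return _header(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return _header(len(payload), 0xC0) + payload


def item_span(buf, pos):
    """Return (payload_start, end) for the RLP item starting at buf[pos]."""
    b = buf[pos]
    if b < 0x80:                      # single byte
        return pos, pos + 1
    if b < 0xB8:                      # short string
        return pos + 1, pos + 1 + (b - 0x80)
    if b < 0xC0:                      # long string
        n = b - 0xB7
    elif b < 0xF8:                    # short list
        return pos + 1, pos + 1 + (b - 0xC0)
    else:                             # long list
        n = b - 0xF7
    length = int.from_bytes(buf[pos + 1:pos + 1 + n], "big")
    return pos + 1 + n, pos + 1 + n + length


def nth_item(buf, n):
    """Plain RLP: reach item n of the top-level list by skipping n items."""
    pos, end = item_span(buf, 0)
    steps = 0
    for _ in range(n):
        if pos >= end:
            raise IndexError(n)
        _, pos = item_span(buf, pos)  # read one header, jump past the item
        steps += 1
    if pos >= end:
        raise IndexError(n)
    _, stop = item_span(buf, pos)
    return buf[pos:stop], steps


def append_index(buf):
    """Hypothetical opt-in trailer: 4-byte offsets per item, then a 4-byte count."""
    pos, end = item_span(buf, 0)
    offsets = []
    while pos < end:
        offsets.append(pos)
        _, pos = item_span(buf, pos)
    table = b"".join(o.to_bytes(4, "big") for o in offsets)
    return buf + table + len(offsets).to_bytes(4, "big")


def nth_item_indexed(blob, n):
    """Indexed variant: one table lookup instead of walking earlier items."""
    count = int.from_bytes(blob[-4:], "big")
    if not 0 <= n < count:
        raise IndexError(n)
    table = len(blob) - 4 - 4 * count
    off = int.from_bytes(blob[table + 4 * n:table + 4 * n + 4], "big")
    _, stop = item_span(blob, off)
    return blob[off:stop]


items = [b"cat", b"dog", b"\x01", b"x" * 60, [b"a", b"b"]]
encoded = rlp_encode(items)
raw, steps = nth_item(encoded, 3)
assert raw == nth_item_indexed(append_index(encoded), 3)
```

The RLP bytes at the front of the blob are unchanged, so a reader that stops at the length given in the outer header could ignore the trailer. A decoder that rejects trailing bytes would not accept the blob as-is, though, so whether this counts as backwards compatible is still an open question for me.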