# Subsrtate next generation storage https://forum.polkadot.network/t/generalized-storage-proofs/1315 https://www.prestonevans.me/nearly-optimal-state-merklization/ https://www.rob.tech/blog/nearly-optimal-polkadot-merklization/ ## Symmetric ### 32 bit vs 64 bit `blake2b_compress` sends 128 bytes to 32 bytes, and is optimized for 64 bit machines. `blake3_compress` sends 64 bytes to 32 bytes, and is optimized for 32 bit machines. Would 64 bit blake3 be faster? Ain't clear.. Jeff> *I suppose 64bit machines' SIMD parallelizes this 32bit arithmetic nicely anyways, so this decision costs nothing on 64bit machines.* Jack O'Connor> *Yes, that comes up in the introduction to our paper, and we go into more detail in section 7.2.* In principle, one might design an analog of `blake3_compress` that sends 64 bytes to 32 bytes, while being optimized for 64 bit machines. Does the blake3 paper argue this looks unecessary? Can we freely mix these, or does this add cache pressure? ### Vec proofs Aside from blake3, almost all hash functions admit O(1) sized append proofs. Append proofs are 256-ish bytes in blake2b for example. Blake3 has both append and edit proofs of size roughly 32 bytes * log(len/?). An edit proof can read or write ytes in a bytes string, but cannot shift them left or right. Afaik only a Bao tree hashes like blake3 support these edit proofs. An edit proof is half the size if its compression sends 64 bytes to 32 bytes, like blake3 does. (See above 32 vs 64 bit question) ### Indexed by bit strings We should've storage proofs indexed by bit strings in which all values appear at leaves, never at internal nodes. These support efficent binary hashing of the form `blake3_compress(left,right)` (See above 32 vs 64 bit question). Ideally, all leaves should be blake3 or blake2b hashes, making all intermediate `left` and `right` values be hashes Also, leaf nodes could use `blake3_compress(prefix_len_byte,prefix,hash(value)` or similar. ### Prefix trie We'll want a prefix trie that compresses internal nodes efficently like `blake2b_compress(left, right, prefix_len_byte, prefix, hash(value))`. Also, leaf nodes could instead use `blake3_compress(prefix_len_byte,prefix,hash(value)` or similar. In this, we enforce `prefix_len < 32` which would not support all current substrate prefixes, but accounts likely become indexed by bit strings instead. A priori, prefix should only be remainder of the prefix, so "y", xa", "xb" results in "y" being a leaf, and "x" being an interal node with leaves "a" and "b". We expect prefixes should be handled ## Asymmetric ### KZG