# A preliminary note on Vitalik's proposal for arithmetic hash-based proto-danksharding by [Kurt Pan](https://twitter.com/kurtpan666) Vitalik recently wrote a great [post](https://ethresear.ch/t/arithmetic-hash-based-alternatives-to-kzg-for-proto-danksharding-eip-4844/13863) on the trade-offs about replacing the *KZG commitment* to *arithmetic hash-based one* in EIP-4844. The article is very instructive and forward-looking, but unless you are a tech-savvy and keep an eye on the state-of-art progress, there will be a lot of prerequisites to be met to understand the article. In this short article I will present to you some of the necessities and learning materials for understanding Vitalik's post. The purpose is not to be mathematically or cryptographically rigorous or complete, it is mainly to serve as a preview of the landscape. I hope this article will make the learning process a little bit easier for you. ## Mathematics *Mathematics* (for most people) is hard, especially the ones involved in blockchain frontier technologies, which to many seem like moon math. But at the same time math is necessary for the clarification of concepts, formal description and analysis, security proofs and many other necessary steps of research. I will briefly describe some of the mathematical concepts below in informal language, for details please refer to: - [The MoonMath Manual](https://leastauthority.com/community-matters/moonmath-manual/) by Least Authority ### Finite fields > A [field](https://en.wikipedia.org/wiki/Field_(mathematics)) is a set on which addition, subtraction, multiplication, and division are defined and behave as the corresponding operations on rational and real numbers do. > > A [finite field](https://en.wikipedia.org/wiki/Finite_field) (or Galois field) is a field that contains a finite number of elements. The most common examples of finite fields are given by the integers mod $p$ when $p$ is a prime number. ### Elliptic curves > **Elliptic curves** are geometric objects in projective planes over a given field, made up of points that satisfy certain equations. > One of their key features from the point of view of cryptography is that, if the underlying field is of positive characteristic, elliptic curves are *finite, cyclic groups*. > Further, it is believed that, in this case, the *Discrete Logarithm Problem* on many elliptic curve groups is hard, given that the underlying characteristic is large enough. Although a bit old, this is still a good introductory article on elliptic curves: - [A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography](https://blog.cloudflare.com/a-relatively-easy-to-understand-primer-on-elliptic-curve-cryptography/) by Nick Sullivan If you are a visual learner, also check out these tools for visualizing elliptic curves to help you understand: - [The Animated Elliptic Curve](https://curves.xargs.org/) - [Elliptic Curves over Finite Fields](https://graui.de/code/elliptic2/) - [Elliptic Curve Explorer](https://samuelj.li/elliptic-curve-explorer/) ### Pairings > An *elliptic curve pairing* is a function that takes a pair of points on an elliptic curve and maps them to an element of some other group, called the target group. > This mapping has a nice property called *bilinearity* that can be very useful in cryptography, which is similar to the multiplication of points in the group of elliptic curves. A *pairing-friendly curves* is a special class of elliptic curves, which have a pairing map attached to them. Pairings have a wide range of applications in cryptography and blockchain technology: both in zk-SNARKs proof systems and KZG polynomial commitment schemes. Here are some excellent learning materials on pairings: - [Exploring Elliptic Curve Pairings](https://vitalik.ca/general/2017/01/14/exploring_ecp.html) by Vitalik Buterin - [BLS12-381 For The Rest Of Us](https://hackmd.io/@benjaminion/bls12-381) by Ben Edgington ### Lagrange Polynomial Interpolation We all learned in high school that a line can be uniquely determined past two points. Similarly, $n$ points can uniquely determine a polynomial of order $n-1$. *Lagrange interpolation* is the method used to find the expression of such polynomial. It is widely used in tons of techniques such as [Reed-Soloman error-correcting code](https://en.wikipedia.org/wiki/Reed%E2%80%93Solomon_error_correction), polynomial commitment schemes ,zk-SNARKs, data availability sampling, even in [secret sharing](https://en.wikipedia.org/wiki/Shamir%27s_Secret_Sharing) , [threshold signature schemes](https://academy.binance.com/en/articles/threshold-signatures-explained), and many others. ## Polynomial Commitment Schemes In a *[commitment scheme](https://en.wikipedia.org/wiki/Commitment_scheme)*, the committer commits to an arbitrary message by outputting some commitment. The committer can then later reveal the message, and a verifier can validate that indeed the commitment corresponds to the message. *Polynomial commitment schemes(PCS)* are commitment scheme to a polynomial. PCS can achieve some nice additional properties: the committer should be able to “open” certain evaluations of the committed polynomial without revealing the entire polynomial. ### KZG *KZG* (which stands for Kate, Zaverucha and Goldberg) is a polynomial commitment scheme introduced in [this paper](https://www.iacr.org/archive/asiacrypt2010/6477178/6477178.pdf) in 2010. It has the advantage that both polynomial commitment size and opening proof size are constant (very small), which makes it very useful on blockchains where storage is quite expensive. However at the same time, it has disadvantages such as requiring elliptic curve support for pairing, requiring trusted setup, and not being resistant to quantum attacks. For details of how KZG works, we recommend reading these materials: * [KZG polynomial commitments](https://dankradfeist.de/ethereum/2020/06/16/kate-polynomial-commitments.html) by Dankrad Feist * [Kate-Zaverucha-Goldberg (KZG) Constant-Sized Polynomial Commitments](https://alinush.github.io/2020/05/06/kzg-polynomial-commitments.html) by Alin Tomescu * [KZG in Practice: Polynomial Commitment Schemes and Their Usage in Scaling Ethereum](https://scroll.io/blog/kzg) by Scroll ### Trusted Setup > A **trusted setup ceremony** is a procedure that is done once to generate a piece of data that must then be used every time some cryptographic protocol is run. Generating this data requires some secret information; the "trust" comes from the fact that some person or some group of people has to generate these secrets, use them to generate the data, and then publish the data and forget the secrets. But once the data is generated, and the secrets are forgotten, no further participation from the creators of the ceremony is required. > -[How do trusted setups work?](https://vitalik.ca/general/2022/03/14/trustedsetup.html) by Vitalik Buterin Requiring a (one-time) trusted setup ceremony is a drawback of using KZG, as it introduces another layer of trust distribution, which adds complexity in design and implementation. Some other PCSs (such as IPA and Merkle Tree described below) and some newer zk-SNARKs proof systems (e.g., STARKs) do not require such trusted setup ceremonies (called *transparency*), but they all have some other trade-offs in terms of size or efficiency. Some other related articles: - [A Guide to Understanding Trusted Setups](https://blog.pantherprotocol.io/a-guide-to-understanding-trusted-setups/) by Panther Team - [Setup Ceremonies](https://zkproof.org/2021/06/30/setup-ceremonies/) by Anthony Matlala - [The Trusted Setup Phase](https://decentralizedthoughts.github.io/2019-07-19-setup-assumptions/) by Ittai Abraham - [Diving into the zk-SNARKs Setup Phase](https://medium.com/qed-it/diving-into-the-snarks-setup-phase-b7660242a0d7) by Daniel Benarroch ### Inner Product Argument There are two alternatives to KZG commitments: - **Discrete-log-based Inner Product Argument (IPA) commitments** - **Merkle roots based on arithmetic-friendly hash functions**. [Vitalik's post](https://ethresear.ch/t/arithmetic-hash-based-alternatives-to-kzg-for-proto-danksharding-eip-4844/13863) focuses on this one. You can learn IPA by searching for keywords [Bulletproofs](https://crypto.stanford.edu/bulletproofs/) , [Halo](https://eprint.iacr.org/2019/1021.pdf) etc. You can also check these: - [Halo and more: exploring incremental verification and SNARKs without pairings](https://vitalik.ca/general/2021/11/05/halo.html) by Vitalik Buterin - [Inner Product Arguments](https://dankradfeist.de/ethereum/2021/07/27/inner-product-arguments.html) by Dankrad Feist - [The halo2 Book](https://zcash.github.io/halo2/index.html) - [Understanding Inner Product Argument](https://suyash67.github.io/homepage/project/2020/06/28/inner-product-argument.html) by Suyash Bagad ### Merkle Tree and Verkle Tree > [Merkle tree](https://en.wikipedia.org/wiki/Merkle_tree) is a hash tree in which every "leaf" (node) is labelled with the cryptographic hash of a data block, and every node that is not a leaf (called a branch, inner node, or inode) is labelled with the cryptographic hash of the labels of its child nodes. A Merkle tree allows efficient and secure verification of the contents of a large data structure. (called a *Merkle proof*) Merkle trees are such a fundamental concept and are the basis for further understanding of many more advanced concepts such as vector commitments, [accumulators](https://en.wikipedia.org/wiki/Accumulator_(cryptography)), [authenticated data structures](https://dl.acm.org/doi/10.1145/2535838.2535851), membership proofs, etc. You can check out these excellent links to learn more. - [What is a Merkle Tree?](https://decentralizedthoughts.github.io/2020-12-22-what-is-a-merkle-tree/) by Alin Tomescu - [Merkle Tree in Blockchain: What is it, How does it work and Benefits](https://www.simplilearn.com/tutorials/blockchain-tutorial/merkle-tree-in-blockchain) by Ravikiran A S These will be useful if you want to learn more about an improved variant called *Verkle Tree*: - [Verkle trees](https://vitalik.ca/general/2021/06/18/verkle.html) by Vitalik Buterin - [Merkle trees vs. Verkle trees, Explained](https://cointelegraph.com/explained/merkle-trees-vs-verkle-trees-explained) by Jagjit Singh - [Take a Deep Dive Into Verkle Tree For Ethereum](https://hackernoon.com/take-a-deep-dive-into-verkle-tree-for-ethereum) by sin7y ## zk-SNARKs Welcome to the wild frontier of cryptography and blockchain technology! *zk-SNARKs* is much more than a buzzword that will quickly fade with the wind; instead, you can learn (almost) the entirety of cryptography by diving down this rabbit hole of zk-SNARKs. There is a wealth of learning material available online, just to name a few. - [ZK Whiteboard Sessions](https://zkhack.dev/whiteboard/) by ZK HACK - [Proofs, Arguments, and Zero-Knowledge](https://people.cs.georgetown.edu/jthaler/ProofsArgsAndZK.html) by Justin Thaler - [CS294: Foundations of Probabilistic Proofs (F2020)](http://people.eecs.berkeley.edu/~alexch/classes/CS294-F2020.html) Alessandro Chiesa - [Anatomy of a STARK](https://aszepieniec.github.io/stark-anatomy/) by Alan Szepieniec Or you can just refer to [Kurt Pan's Awesome Zero-Knowledge Proofs (2022)](https://site.kurtpan.pro/ktpzkp22.html) and I will keep updating this list. ## Arithmetic Hash Functions *Arithmetization* is the process of turning a generic statement or question into a set of equation to be verified or solved, which is a necessary sub-process (usually the first step) in most zk-SNARKs proof systems. **Arithmetic (or ZK-friendly/ SNARK-friendly) hash functions** are those hash functions that are designed to have a "simple" representation after arithmetization process. They can be verified with a simple verifier with a very low number of constraints. The Ethereum Foundation is holding a [ZK Hash Function Cryptanalysis Bounty](https://www.zkhashbounties.info/) in 2021 to design and analyze new ZK-friendly hash functions, one of which is [Poseidon](https://www.poseidon-hash.info/ ). In addition to Poseidon, there are many other designs, like: - [Pedersen Hash](https://iden3-docs.readthedocs.io/en/latest/iden3_repos/research/publications/zkproof-standards-workshop-2/pedersen-hash/pedersen.html) - [Rescue-Prime](https://www.esat.kuleuven.be/cosic/sites/rescue/about-rescue-prime/) - [MiMC](https://byt3bit.github.io/primesym/mimc/) - [Reinforced Concrete](https://eprint.iacr.org/2021/1038) I have to give a caveat that all of the above hash functions are still very young , compared to more mature ones like [SHA256](https://en.wikipedia.org/wiki/SHA-2) and [Keccak](https://en.wikipedia.org/wiki/SHA-3), and need to be tested and analyzed more before used in production. ## Danksharding Recently, Vitalik released [an updated Ethernet roadmap](https://twitter.com/VitalikButerin/status/1588669782471368704), and we can clearly see that EIP-4844 and Danksharding are the next big steps for Ethereum after [The Merge](https://ethereum.org/en/upgrades/merge/). > **Danksharding** combines multiple avenues of cutting edge research to provide the scalable base layer required for Ethereum’s rollup-centric roadmap. We can roughly say that `Full Danksharding = Proto-danksharding + Data Availability Sampling + PBS + 2D KZG scheme + Proof-of-custody + more`. Here are the best materials for learning about Danksharding: - [The Hitchhiker's Guide to Ethereum](https://members.delphidigital.io/reports/the-hitchhikers-guide-to-ethereum) by Jon Charbonneau - [The Complete Guide to Rollups](https://members.delphidigital.io/reports/the-complete-guide-to-rollups) by Jon Charbonneau - [DankSharding – What is it and how does it work?](https://www.rootstrap.com/blog/danksharding-what-is-it-and-how-does-it-work/) by Rafael Fuentes - [What is danksharding?](https://www.alchemy.com/overviews/danksharding) ### Proto-danksharding (EIP-4844) On the road to full Danksharding, some of the goodies are introduced by EIP-4844 : - Data blob-carrying transaction format - KZG commitments to the blobs - etc. You can find almost everything about EIP-4844 on this website: - https://www.eip4844.com/ ### Data Availability Sampling (DAS) > **Data Availability Sampling (DAS)** enables each node only ends up downloading a small portion of the total data. Each node (including client nodes that are not participating in staking) checks every blob, but instead of downloading the whole blob, they privately select N random indices in the blob, and attempt to download the data at just those positions. Finally, the reference list of DAS: * [Data Availability Sampling: From Basics to Open Problems](https://www.paradigm.xyz/2022/08/das) by Paradigm * [An explanation of the sharding + DAS proposal](https://hackmd.io/@vbuterin/sharding_proposal) and [Data availability sampling in practice](https://notes.ethereum.org/@vbuterin/r1v8VCULP) by Vitalik Buterin * [Celestia's Data Availability Layer](https://docs.celestia.org/concepts/how-celestia-works/data-availability-layer) by Celestia ## Conclusion When you keep learning and finally get the feeling that everything is connected, you're not far from the cutting edge of innovation. Keep Building!