History, motivation and introduction of ZKP

# Proof Systems Before drilling into the nitty-gritty of ZKP, we first refer the classic proof system. > Proof System is just a interactive process, where composed of a prover and a verifier. ![Proof System Illustration](https://hackmd.io/_uploads/ryJ7N0Nj3.png) *Proof System Simpified Illustration* - Prover has to **submit the proof** (a string/ binary, etc) and the Verifier has to **calculate for the result** (1 for true/accept; 0 for false/reject). - Prover works hard (calculate proof with time) to compute the proof, yet **Verifier has limited time to calculate**. - In the system, the prover and verifier are actually algorithms. *** ![Proof System Detail](https://hackmd.io/_uploads/S1HS404sh.png) *Proof System in Detail* In Computer Science, we usually use an **Efficient Verifiable Proofs (also NP-proofs)**, which has a trait that proof is short and verifier doesn't have all the time to read it (polynomial time). 1. Firstly, set up a claim $x$. 2. **Prover** , a input $w$ (represent proof, a binary string), which $|w| = \text{polynomial in} \ |x|$, and **Verifier** takes time polynomial in $|x|$. 3. After this, as mentioned above, the prover and verifier are both algorithms. What the **Verifier** does is to put the information into a function `V(x,w)`, which would finally become the result (True or False). *** ## Example 1: **Claim: N is a prodect of 2 large primes.** ![](https://hackmd.io/_uploads/SyWRdRVoh.png) **The example proof system* What does V knows after the interaction? 1. N is a product of 2 primes. 2. The two primes are p and q. 3. in this case V(x, w) might be: ```python def V(x, w): # x = N # w = (p, q) return N == p*q ``` What V has to do is to multiply p and q (polynomial time), and see if the result is N or not. - If N = pq, V accepts - Else, V rejects *** ## Example 2: **Claim: y is a quadratic residue mod N ($i.e \ \exists x \in \mathbb{Z^*_N} \ s.t. \ y = x^2 (mod N)$)** ![](https://hackmd.io/_uploads/Bk0uYm8o2.png) > quadratic residue just means the modulus of x square. **It is hard to find a square root of y mod N.**(just like "factoring N", a classic hard problem) Although you might think that finding a square root is really eazy when N is small, (for example, given a pair $y = 9, N = 11$, $x$ can be eazily 3), however, when N become larger (i.e a composite of 2 large primes), it would be computationally difficult. What does V knows after the interaction 1. y is a quadratic residue mod 2. Square root of y 3. as a result, V are able to use a function to verify y, x and N like this. ```python def verify(y, x, N): # Calculate x squared mod N x_squared_mod_N = (x ** 2) % N # If x squared mod N equals y, then y is a quadratic residue mod N if x_squared_mod_N == y: return True else: return False ``` V calculates the result to check if the $(x^2 \mod N)$ is equal to $y$. *** # NP-Languages > Definition of Language $$ A \ language \ L \ is \ a \ set \ of \ strings \ x. $$ which means it's a set of strings x that satisfies some properties. In the proof system, the thing we care about is the claim $x$ (binary) and the string $w$ from P, and V then computes on x and w in polynomial time to **check that $x$ is in $L$**. > A little bit bonus material of $L$ - Let $\Sigma$ be the underlying set of symbols, $\Sigma^*$ denotes the set of all finite strings. - A formal language is a subset of $\Sigma^*$ - A fact of $L$: **Any decisional problem can be encoded as a language.** Thus, "Determine if x has property Y" can be encoded as the membership of $$ L = \{x:x \ has \ property \ Y \} $$ e.g., quadratic residue of $y=x^2 (mod \ N)$ can be encoded as: $$ QR = \{(N, y): \exists x \ s.t. \ y = x^2 (mod \ N) \} $$ e.g., the graph isomorphic problem can be encoded as: $$ GI = \{(G_1,G_2): G_1 \ and \ G_2 \ are \ isomorphic \} $$ > Definition of NP-Language ![](https://hackmd.io/_uploads/ryDz81Ho2.png) **Terms used in definition of NP-Language* $$ L \ is \ a \ NP \ Language \\ (or \ NP \ decision \ problem), \\ if \ there \ is \ a \ poly(|x|)time \ verifier \ V where $$ - Completeness: (True claims have (short) proofs): - if $x \in L$, there is a $poly(|x|)long$ witness $w \in \{0, 1\}^* \ s.t.\ V(x,w) = 1$ - In other words, if the input is **valid**, then the result should return **True**. That is, if both prover and verifier act **"honestly"**, the underlying statement is true, then the proof should be accepted. - Soundness: (False theoremss have no proofs): - if $x \notin L$, there is no witness. That is, for all $w \in \{0, 1\}^* \ V(x,w) = 0$ - On the other hand, if the input is **invalid**, it's theoretically impossible to fool the verifier to return **True**. Thus, a lying prover cannot convince an honest verifier to believe an invalid statement is valid. **To Summarize**, Completeness allows honest provers to convince the verifier. Soundness ensures that regardless what kind of cheating that prover trying to do, the **prover will not be able to convince the verifier**. And most importantly, what prover tries to do is **to convince the verifier that the claim is true.** *** # What is ZKP? Zero Knowledge Proof stand for the 3 letter in ZKP, which first appeared in the paper - [The Knowledge Complexity of Interactive Proof Systems](http://people.csail.mit.edu/silvio/Selected%20Scientific%20Papers/Proof%20Systems/The_Knowledge_Complexity_Of_Interactive_Proof_Systems.pdf) in 1985 by Shafi Goldwasser, Silvio Micali and Charles Rackoff. During the time from 1982-1985, the 3 scientists wanted to figure out **"Is there other way to prove these type of theorems?"** For example, **can a prover convince that N is product of large primes, p and q, without sending p and q to the verifier?** The main idea of their research is to $$ \text{Prove that I could prove it if I felt like it} $$ To be more simplified, We can explain ZKP as below: > A zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that something is true, **without revealing any information** apart from the fact that this specific statement is true. > > -- from [**zero knowledge proofs page** - ethereum.org](https://ethereum.org/en/zero-knowledge-proofs/) Here I would like to introduce **the 3 main properties of ZKP**. Every element should exist to become a zero-knowledge protocol. 1. Completeness 2. Soundness 3. Zero Knowledge # Interactive Proof (IP) System > here the system changed the way to communicate between prover and verifier. The system changes an interactive way to communicate between P & V, which the messages goes forward and backward in multiple times (just like **ask & answer**, or **server & client model**). ![](https://hackmd.io/_uploads/Sy3QaYPs3.png) **Interactive Proof System* ## Example **Claim: An Interactive Proof for Quadratic Residue =** $\{ (N, y): \exists x \ s.t. y = x^2 (mod \ N) \}$ However, we ❌ **do not want** the prover send the square root of y to prove this claim; we want to do it in **the spirit of "Zero Knowledge."** Here's a solution method: ![](https://hackmd.io/_uploads/BkSQyiwo2.png) 1. P choose a random number r $\{ r \ | \ 1 \le r \le N \ s.t. \gcd(r, N) = 1\}$ 2. P send $s = r^2$ and talk to V: - Giving s or sy would convince V to believe that the claim is true, yet V would reveal $\sqrt{y} \ (mod \ N)$ immediately, which we don't want to happen. - So, P would send either $\sqrt{s} \ (mod \ N)$ or $\sqrt{sy} \ (mod \ N)$ to V and V would have the right to choose. 3. V would flip a coin $c$ to choose which proof will he take. 4. if - c=0: send $z=r$; - c=1: send $z=r\sqrt{y} \ (mod \ N )$; V will only accepts only if $z^2 = sy^c \ (mod \ N)$ - if c=0, $z=r$, $\therefore r^2 = s * y^0 = s \ (mod \ N)$ - else if c=1, $z=r\sqrt{y} \ (mod \ N )$, $\therefore s * y = s * y^1 \ (mod \ N)$ Here comes the properties - **Completeness**: if Claim is true, then V will accept. - **Soundness**: if Claim is false (P can only solve when c=0), thus the probability of V to accept the claim is $1/2$. If the protocol repeat 10000 times or even more, the prob. that V accept would be negligible (${1/2}^{10000}$). *** ## Interactive Proofs for a language $L$ ![](https://hackmd.io/_uploads/B1Yx-ovih.png) > Definition **$(P,V)$ is an IP for $L$, if V is probabilistic poly(|x|)time &** - Completeness: - if $x \in L$, V always accepts. - $\equiv$ if $x \in L$, $Prob.[(P,V)(x) = accept] = 1$ - Soundness: - if $x \notin L$, for all cheating prover strategy, **V will not accept except with negligible probability**. - $\equiv$ if $x \notin L$, for any $P^*$ (means cheating strategy), $Prob.[(P^*)(x) = accept] = negl(|x|)$ if you wanna prove the Completeness and Soundness in an IP system, you can just do the following: ![](https://hackmd.io/_uploads/r111GbIs3.png) **a example to find Completeness and Soundness in an IP system* # Where is Zero-Knowledge? Completeness and Soundness are satisfied above, yet there's still the zero-knowledge part. To fulfill that trait, we should introduce a new tool called simulation paradigm. ## Simulation Paradigm/ Simulator > [simulation paradigm]() is commonly used in crpytography field. The idea is like the figrue below: ![](https://hackmd.io/_uploads/ry5_IaDi3.png) **simulator illustration* Imagine that simulator is a machine that is able to mimick the interaction between P & V, and most importantly, it is able to construct the **exactly the same prob. distribution (may have a little bit difference)** as the real scenario. The same prob. distribution means: there's a poly-time distinguisher X (person on the left). X **cannot tell whether the real views and simulated views are different**. Then, we would say these two views are **computationally indistinguishable**. *** ## Zero-Knowledge > idea The idea of zero knowledge is depends on V. **If the information V gets before the interaction in IP == the information V gets after the interaction**, then we would say this protocol is **zero-knowledge.** In other words, what is in the interaction between P & V only convince V to believe that the claim is true, yet do not leak any other information. > Definition An Interactive Protocol $(P, V)$ is zero-knowledge for a language $L$ if there exists a **polynomial-time** algorithm Sim(the simulator) s.t. for every $x \in L$, $$ view_V(P,V)[x] = \{ (q_1,a_1,q_2,a_2,... coin \ tossed \ by \ V) \} \\ (computationally \ indistinguishable) \approx \\ Sim(x,1^{\lambda}) (1^{\lambda}: we \ want \ Sim() \ to \ run \ in \ polynomial \ time) $$ *** However, this is not done yet. The protocol above only consider the **honest verifier $V$** to do their business. In the real situation, we should also think of a **cheating verifier $V^*$**. Therefore, the true definition of zero-knowledge would be: An Interactive Protocol $(P, V)$ is zero-knowledge for a language $L$ if **for every polynomial-time $V^*$** there exists a **polynomial-time** algorithm Sim(the simulator) s.t. for every $x \in L$, $$ view_V(P,V)[x] \approx Sim(x,1^\lambda) $$ **To sum up**, we finally collect the 3 essential elements that a zero-knowledge interactive protocol should have: 1. **Completeness**: If the statement is true, the honest verifier (that follows the protocol) will be convinced by an honest prover. 2. **Soundness**: If the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability. 3. **Zero-Knowledge**: If the statement is true, no verifier learns anything other than this fact. > note: different types of ZK: > - Computationally indistinguishable distrubutions: CZK > - Perfect identical distrubutions: PZK > - Statistically close distrubutions: SZK *** # Zero Knowledge Proof of knowledge Prover actually have done 2 things: 1. proved that the theorem is correct 2. also proved that she **knows** a square root mod N. (followed by the exmaple of QR). Considering the Language: $$ L_R = \{x: \exists w \ s.t. \ R(x,w) = accept \ \} \\ for \ poly-time \ relation \ R $$ > Definition $$ (P,V) \ is \ a \ proof \ of \ knowledge \ (POK), \\ for \ L_R \ if \exists \ polynomial \ time \ (knowledge) \ extractor \ algo. \ E \\ s.t. \ \forall x \in L, \\ in \ expected \ polynomial \ time \ E^P(x) \ outputs \ a \ witness \ w \\ s.t. \ V(x,w) = accept $$ **About Extractor**: 1. It is a machine/ algorithm that is able to extract (**pull out**) the witness (**the secret knowledge, for example the $r \ or \ r\sqrt{y}$ in QR**), such that $R(x,w) = accept$. 2. It works in **Expected Polynomial Time**: it must be able to do this in a reasonable amount (poly) of time. 3. The existence of the extractor ensures that the P isn't **bluffing** or using some trick to convince V. **If the extractor can derive the witness from the prover's messages, then the prover must truly knows the witness.** 4. $E^P(x)$ means it may run repeatedly P on same randomness (the well-known **rewinding technique**, just imagining **a time machine that can travel back a certain point to start over again.**) - This would also prove that **anything that V can learn from P can be simulated without P's help**. 5. if $Prob[(P,V)(x) = accept] = \alpha$, the $E^P(x)$ runs in expected $poly(|x|,1/\alpha)$ time So, an extractor algorithm would be: 1. Run P & receive $s=r^2 (mod \ N)$ 2. set V message to "head" 3. store $r$ --- > Rewind to the second step: 3. and $2^{nd}$ time set V msg to "tail". receive $rx$ 4. Output $rx/r=x (mod \ N)$ By doing so, the extrator can prove that **P actually knows square root x (mod N).** # Exercise for isomorphic graph ![](https://hackmd.io/_uploads/Sk_PT5uin.png) # Do all NP Languages have Zero Knowledge Interactive Proofs? The answer is **YES**, by the theorem: **If one-way function exist, then every L in NP has computationally zero knowledge interactive proofs** in [this paper](https://link.springer.com/content/pdf/10.1007/3-540-47721-7_11.pdf) There are 2 main point in their research: 1. Show that an NP-Complete Problem has a ZK interactive Proof. 2. One-way functions: hiding and binding commitment protocol/ scheme. ## Show that NP-complete problem has a ZKIP > Because by doing so, it'll be sufficient to establish that every language in an NP has a computational zero-knowledge interactive proof. Namely, if I have a way to have a zero-knowledge interactive proof of "whether the graph $G_x$ is [three-colorable](https://www.geeksforgeeks.org/3-coloring-is-np-complete/)," it will also imply that **I have zero-knowledge of interactive proof of x being in the language, for every language, which is an NP.** So what they showed in their paper is a zero-knowledge interactive proof for a particular NP complete problem, the three coloring problem. ## Commitment Protocol/ Scheme > Here brought an magic tool called commitment scheme commonly used in cryptography field. In brief, there are 2 algorithms and 2 roles (commiter Bob and decommiter Alice) in the commitment scheme: ### Algorithms 1. `commit(msg, r) -> com` - msg: message - r: secret randomness in $R$ (Random space) - com: commitment string 2. `verify(msg, com, r) -> accept/reject` - anyone can verify that commitment was opened correctly ### Roles 1. Bob (commiter/sender) 2. Alicie (decommiter/receiver) In the commitment scheme, Bob sends a **commitment (just imagine it as a string), which should be hidden in some form and should not be altered after generated,** to Alice (decommiter/receiver). Afterward, Bob should also send a **solution (may be a way to decrypt the commitment)** that how he generated this commitment, with some additional information. Since Bob sent a commitment and the recipe how he made it, Alice is able to **verify** if the commitment and the method is correct or not. > Cryptographic Commitments would be introduce more deeply in the [this note](https://hackmd.io/C7e1M3IgSJOVr9YfMRF9uw). ## Proof Process > suppose that there's a graph $G = (V, E)$ & prover input color $pi: V: \{ 0,1,2\}$ (each number represent different color) 1. **Prover** would pick a random permutation $sigma$ of colors $\{0,1,2\}$ and color the graph with coloring $\phi(v) := \sigma(\pi(v))$, and commit to each colot of each vertex v by running Commit(\phi(v)) protocol. 2. **Verifier** would select a random edge $e=(a,b)$ to send to Prover. 3. **Prover** decommit colors $\phi(a)$ & $\phi(b)$ of vertices a and b. Therefore, - if a verifier founds that $\phi(a) = \phi(b)$, he would reject - else, verifier repeats step 1~3 and accepts after $k$ iterations. To become a ZKIP, we should find the 3 crucial properties: - Completeness: if G is 3-colorable, then the **honest prover** uses a proper 3-coloring and the verifier should always accept. - Soundness: If G is not 3-colorable, then for all $P^*$(Bad Prover), $Prob.[Verifier \ accepts]$ $<$ $(1 - (1/{|E|}))^k$ $<$ $1/e^{|E|}$ for $k = {|E|}^2$. - Zero Knowledge: Would be explain in more detail below. **Zero Knowledge Property Proof** Little recap: to prove zero-knowledge, **we should make sure that information that V get before is excaltly the same after communication**. And by proving, we introduce **Simulator, called S in this case.** > Case 1: if Verifier is honest. S in input $G=(V,E)$ would choose at random in advance a challenge $(a,b)$ of the honest V. What S should do is to simulate the commit transcript to $\phi(v)$ for all $v$, make $\phi(a) \ne \phi(b)$, other vertices set $\phi(v) = 2$, and decommit-transcript to color $\phi(a), \phi(b)$. Since the distinguisher could not tell the View of real part and the simulated part (commitments can hide the information but still can be verified at decommiting), this case would be Zero-Knowledge. > Case 2: Simulation for any Veerifier $V^*$ S on input G and verifier $V^*$: $For \ i = 1$ to ${|E|}^2$ ![](https://hackmd.io/_uploads/Sk5obI6sh.png) # what does ZKIP can do? Arthur Merlin # From Interactive to Non-interactive - Fiat-Shamir Paradigm/ Heuristic let H be a cryptographic hash function that is able to generate some random parameters. If H is a random-oracle model, then completeness and soundness hold. We can use $H()$ to convert an interactive Arthur-Merlin model into an non-interactive one. (which implies that **NOT EVERY IP CAN BE MADE NON-INTERACTIVE.**) ## Interactive Proof ## Non-interactive Proof # References - [ZKP MOOC Lecture 1: Introduction and History of ZKP](https://www.youtube.com/watch?v=uchjTIlPzFo&list=PPSV) - [Zero Knowledge Proofs: an illustated primer - Part 1](https://blog.cryptographyengineering.com/2014/11/27/zero-knowledge-proofs-illustrated-primer/) - [Zero Knowledge Proofs: an illustrated primer - Part 2](https://blog.cryptographyengineering.com/2017/01/21/zero-knowledge-proofs-an-illustrated-primer-part-2/)