Halo

In this note we describe Halo, a recent zero-knowledge proof system. The goal is to walk through the scheme step by step and explain the cryptographic primitives at play.

We look at the problem of proving a very long sequence of statements.
For instance, a blockchain-rollup operator needs to accumulate as many transactions as possible before providing a proof of correct execution of all those transactions to the blockchain. The proof has to be succinct, and ideally verified in constant time, no matter the number of operations in the sequence.

Pedersen commitments

Halo relies mainly on the Pedersen commitment scheme, along with an opening technique borrowed from Bulletproofs (with a small modification from its original description).

Let $p, r$ be prime numbers, and $E(\mathbb{F}_p)$ an elliptic curve defined over $\mathbb{F}_p$ such that $E$ contains a group $E[r]$ of $\mathbb{F}_p$-rational points of order $r$. Let $n \ll r$ be an integer and $(G_i)_{i\in[n]}$ a list of points in $E[r]$. Let $P = \sum_{i\in[n]} a_i X^i \in \mathbb{F}_r[X]$ be a polynomial. The Pedersen commitment to $P$ is
$$PC_P = \sum_{i\in[n]} a_i G_i$$

Binding property: Suppose $P = \sum_{i\in[n]} a_i X^i$ and $Q = \sum_{i\in[n]} b_i X^i$ are $2$ different polynomials, with $PC_P = PC_Q$. Since $G_1, \dots, G_n$ are in a cyclic group they can be written as $\lambda_1 G, \dots, \lambda_n G$, where $G$ is a generator and the $\lambda_i$ are in $\mathbb{F}_r$. $PC_P = PC_Q$ implies that $\sum_{i\in[n]} (a_i - b_i)\lambda_i G = O$, so $(a_i - b_i)_{i\in[n]}$ is orthogonal to $(\lambda_i)_{i\in[n]}$, therefore $(a_i - b_i)_{i\in[n]}$ belongs to a hyperplane of $\mathbb{F}_r^n$. The probability that $a - b$ lands in this hyperplane by luck is $r^{n-1}/r^n = 1/r$.

Hiding property: The version that we described is not hiding. To be hiding, $PC_P$ is masked by adding a random component $\alpha R$, where $R$ is a random point on the curve and $\alpha \in \mathbb{F}_r$ is a random number. We stick to the non-hiding version to keep the notations light.

There is a way to open a Pedersen commitment (adapted from a more general technique in Bulletproofs) whose communication complexity between prover and verifier is logarithmic in the degree of the polynomial.

We suppose that the degree of the committed polynomial is a power of $2$, equal to $2^s$.

If $a = (a_1, \dots, a_n) \in \mathbb{F}_r^n$ and $G = (G_1, \dots, G_n) \in E(\mathbb{F}_p)[r]^n$ are the coefficients of a polynomial and the points used for computing Pedersen commitments, we write $\langle a, G \rangle$ for $\sum_{i\in[n]} a_i G_i$.

We also add the subscripts $L$ and $R$ to indicate that we take the left part or the right part (of size $n/2$) of a list of elements.

Opening a Pedersen commitment $PC_P$ at $x$

Let $x = (1, x, \dots, x^{n-1})$.

  • Verifier sends $x$ and a random point $U \in E(\mathbb{F}_p)[r]$ to the prover
  • The prover sends the purported value $P(x) = y$ and the verifier computes $K_0 = PC_P + yU$
  • loop for $i$ from $1$ to $s$:
  • The prover computes $L_i = \langle a_L, G_R \rangle + \langle a_L, x_R \rangle U$ and $R_i = \langle a_R, G_L \rangle + \langle a_R, x_L \rangle U$, and sends those to the verifier
  • the verifier sends a random $u_i \in \mathbb{F}_r$ to the prover
  • The prover sets $a = u_i a_L + u_i^{-1} a_R$, $x = u_i^{-1} x_L + u_i x_R$, $G = u_i^{-1} G_L + u_i G_R$
  • end loop
  • At the end of the loop, $a$ and $x$ are of size $1$; the prover sends them to the verifier.
  • The verifier sets $h = \prod_{i\in[s]} (u_i^{-1} + u_i X^{2^{i-1}})$ and computes $x = h(x)$, $G = PC_h$ and $K = K_0 + \sum_{i\in[s]} (u_i^2 L_i + u_i^{-2} R_i)$
  • The verifier finally checks that $axU + aG = K$. If the check holds, then the verifier concludes that $y = \sum_{i\in[n]} a_i x^i$.
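The loop above can be sketched end to end in Python. As before, the heavy assumption is that integers mod $r$ stand in for curve points, so this only demonstrates completeness, not security; also, for simplicity the verifier here folds $G$ itself instead of receiving $PC_h$.

```python
# Bulletproofs-style opening of the toy Pedersen commitment.
# ASSUMPTION: integers mod the prime r stand in for curve points, so the
# algebra matches the protocol but nothing here is cryptographically secure.
import random

r = 2**61 - 1

def ip(u, v):
    """Inner product <u, v> = sum_i u_i v_i mod r."""
    return sum(a * b for a, b in zip(u, v)) % r

def inv(u):
    """Modular inverse in F_r (r is prime)."""
    return pow(u, r - 2, r)

def open_and_verify(a, G, U, x_scalar, rng):
    n = len(a)
    xs = [pow(x_scalar, i, r) for i in range(n)]   # x = (1, x, ..., x^{n-1})
    y = ip(a, xs)                                  # purported value P(x)
    K = (ip(a, G) + y * U) % r                     # K_0 = PC_P + y U
    a, x, g = a[:], xs[:], G[:]
    while len(a) > 1:
        h = len(a) // 2
        aL, aR, xL, xR, gL, gR = a[:h], a[h:], x[:h], x[h:], g[:h], g[h:]
        L = (ip(aL, gR) + ip(aL, xR) * U) % r      # prover's L_i
        R = (ip(aR, gL) + ip(aR, xL) * U) % r      # prover's R_i
        u = rng.randrange(1, r)                    # verifier's challenge u_i
        ui = inv(u)
        a = [(u * l + ui * q) % r for l, q in zip(aL, aR)]
        x = [(ui * l + u * q) % r for l, q in zip(xL, xR)]
        g = [(ui * l + u * q) % r for l, q in zip(gL, gR)]
        K = (K + u * u * L + ui * ui * R) % r      # verifier folds K
    # final check on size-1 vectors: a x U + a G == K
    return (a[0] * x[0] * U + a[0] * g[0]) % r == K

rng = random.Random(42)
G = [rng.randrange(1, r) for _ in range(8)]        # n = 2^3
U = rng.randrange(1, r)
a = [3, 1, 4, 1, 5, 9, 2, 6]
assert open_and_verify(a, G, U, x_scalar=7, rng=rng)
```

Each round halves the vectors, so the prover sends $2s$ group elements in total, which is where the logarithmic communication comes from.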

Completeness: After receiving $u_1$, the prover will divide the size of the vectors $G, x, a$ by $2$, by folding them like this:

  • $a = u_1 a_L + u_1^{-1} a_R$
  • $G = u_1^{-1} G_L + u_1 G_R$
  • $x = u_1^{-1} x_L + u_1 x_R$

Observe that
$$\langle a, G \rangle + \langle a, x \rangle U = \langle u_1 a_L + u_1^{-1} a_R, u_1^{-1} G_L + u_1 G_R \rangle + \langle u_1 a_L + u_1^{-1} a_R, u_1^{-1} x_L + u_1 x_R \rangle U = u_1^2 (\langle a_L, G_R \rangle + \langle a_L, x_R \rangle U) + u_1^{-2} (\langle a_R, G_L \rangle + \langle a_R, x_L \rangle U) + K_0 = u_1^2 L_1 + u_1^{-2} R_1 + K_0$$

If the procedure stops here, the prover sends $a$, of size $n/2$, to the verifier; the verifier folds $G, x$ as the prover did and checks that
$$K_0 + u_1^2 L_1 + u_1^{-2} R_1 = \langle a, G \rangle + \langle a, x \rangle U$$

The full procedure is a repetition of this step $s$ times. The fully folded versions $K, x, G$ of $K_0$, $x$ and $G$ are respectively $K_0 + \sum_{i\in[s]} (u_i^2 L_i + u_i^{-2} R_i)$, $h(x)$, and $PC_h$, where $h = \prod_{i\in[s]} (u_i^{-1} + u_i X^{2^{i-1}})$.
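We can check numerically that the fully folded $x$ is indeed $h(x)$. One caveat worth flagging as an assumption of this sketch: with the folding order used above (the round-$i$ challenge folds away a top half of size $n/2^i$), the $i$-th factor of $h$ corresponds to the challenge of round $s+1-i$, so the challenge list is read in reverse order here.

```python
# Check that the fully folded x equals h(x) for
# h(X) = prod_{i in [s]} (u_i^{-1} + u_i X^{2^{i-1}}).
# NOTE: the i-th factor takes the challenge of round s+1-i (see lead-in).
r = 2**61 - 1

def inv(u):
    return pow(u, r - 2, r)

s, x = 3, 7
n = 2**s
xs = [pow(x, i, r) for i in range(n)]  # (1, x, ..., x^{n-1})
us = [11, 22, 33]                      # challenges u_1..u_s, in round order

# fold x as the prover and verifier do
v = xs
for u in us:
    half = len(v) // 2
    v = [(inv(u) * l + u * q) % r for l, q in zip(v[:half], v[half:])]

# evaluate h at x, pairing factor i with the challenge of round s+1-i
h_at_x = 1
for i, u in enumerate(reversed(us), start=1):
    h_at_x = h_at_x * (inv(u) + u * pow(x, 2**(i - 1), r)) % r

assert v[0] == h_at_x
```

This is what lets the verifier replace the $n$-sized folded vectors by one succinct polynomial $h$.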

Soundness: The soundness comes from the fact that $L_i, R_i$ are sent prior to receiving the $u_i$.
If the claimed value $y$ is false, call this false value $\tilde{y}$: the verifier starts with $K_0 = \langle a, G \rangle + \tilde{y} U$. To make the proof work, during the first round the prover sends $L_1, R_1$, computed correctly, and after receiving $u_1$, the prover hopes that $K_0 + u_1^2 L_1 + u_1^{-2} R_1 = \langle a, G \rangle + \langle a, x \rangle U$, where $a, G, x$ are the correctly folded versions of $a, G, x$. If this happens, the prover continues to send the correct $L_i, R_i$ and he is saved, since the folded commitments will coincide after this event. The probability of such a collision is $1/r$, from the binding property of the Pedersen commitment. Since the prover can hope for such a collision at each round, he has $s/r$ chances to win, which is close to none.

It is important to note that Pedersen commitments are homomorphic for $+$: we have
$$PC_{P_1} + PC_{P_2} = PC_{P_1 + P_2}$$
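A quick numerical check of this property, and of the random linear combination it enables for batch openings, again with integers mod $r$ as a stand-in group:

```python
# Homomorphic property of the toy Pedersen commitment, and the random
# linear combination used for batch openings.
# ASSUMPTION: integers mod r stand in for curve points (not secure).
import random

r = 2**61 - 1
rng = random.Random(1)
G = [rng.randrange(1, r) for _ in range(4)]

def commit(coeffs):
    return sum(a * g for a, g in zip(coeffs, G)) % r

def evaluate(coeffs, z):
    return sum(a * pow(z, i, r) for i, a in enumerate(coeffs)) % r

P1, P2 = [3, 1, 4, 1], [2, 7, 1, 8]

# PC_{P1} + PC_{P2} == PC_{P1 + P2}
sum_poly = [(a + b) % r for a, b in zip(P1, P2)]
assert (commit(P1) + commit(P2)) % r == commit(sum_poly)

# Batch opening: combining commitments with powers of a random v gives the
# commitment of the combined polynomial, whose value at zeta is the same
# combination of the individual values.
v, zeta = rng.randrange(1, r), 5
combined = [(v * a + v * v * b) % r for a, b in zip(P1, P2)]
assert (v * commit(P1) + v * v * commit(P2)) % r == commit(combined)
assert evaluate(combined, zeta) == (v * evaluate(P1, zeta) + v * v * evaluate(P2, zeta)) % r
```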

This property allows a prover to prove a batch opening of several commitments at one point with one proof:

Let $(PC_{P_i})_{i\in[l]}$ be a list of Pedersen commitments. To prove to a verifier that the commitments open to $(y_i)_{i\in[l]}$ at $\zeta$:

  • The prover sends the commitments $(PC_{P_i})_{i\in[l]}$ and the values $(P_i(\zeta))_{i\in[l]}$ to the verifier
  • The verifier samples a random $v$ and sends it to the prover
  • Prover and verifier engage in the Bulletproofs opening argument to verify that $\sum_{i\in[l]} v^i PC_{P_i}$ opens at $\sum_{i\in[l]} v^i P_i(\zeta)$

The completeness follows from the homomorphic property of the Pedersen commitment scheme. The soundness comes from the fact that the commitments and purported values are sent prior to receiving $v$: if one of the $P_i(\zeta)$ is wrong, the prover has $s/r$ chances to win, from the soundness proof of the Bulletproofs opening argument.

SNARK

A SNARK (Succinct Non-interactive ARgument of Knowledge) allows a prover to convince a verifier that he knows $x$ (private) such that $f(x) = y$, where $y$ is public (think of $f$ as a hash function, for instance). The prover doesn't provide $x$, and verifying a proof is much cheaper than computing $f(x)$.

We will use the terms "circuit" and "constraint system" interchangeably.

We specify some useful notations:

  • The circuits are written in capital letters.
  • $Sat(\_,\_,\_)$ is a boolean function checking satisfiability of a constraint system. It takes as parameters a circuit $C$, a public input $y$ and a private input $x$, and returns $1$ or $0$ according to whether or not $C$ is satisfied when fed with $x, y$.
  • $P(\_)$ is a proving function, which takes a list of $Sat(\_,\_,\_)$ functions. It returns a proof object $\pi$ if $Sat(C_i, y_i, x_i) = 1$ for all $i$, otherwise it fails.
  • $V(\_,\_)$ is a verifying function. It's a boolean function which takes a proof $\pi$ and a public input $y$, and returns true or false according to whether or not the proof is valid.
  • $H$ is a secure hash function.

Example: suppose one has $2$ circuits $C_1, C_2$, and associated public/private inputs $(y_1, x_1), (y_2, x_2)$ such that $Sat(C_i, y_i, x_i) = 1$. We create the proof attesting to this fact like this: $\pi = P(Sat(C_1, y_1, x_1), Sat(C_2, y_2, x_2))$. Verifying both claims is done by running $V(\pi, y_1 || y_2)$.
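To fix the types behind this notation, here is a mock of the $(Sat, P, V)$ interface in Python. This is emphatically not a SNARK: the "proof" is just a hash of the public inputs, so it is neither sound nor zero-knowledge; it only shows how the three functions compose.

```python
# Type-level mock of the (Sat, P, V) interface.
# ASSUMPTION: the 'proof' is a plain hash, with no soundness whatsoever;
# this only illustrates the shapes of the three functions.
import hashlib

def H(*parts):
    m = hashlib.sha256()
    for p in parts:
        m.update(str(p).encode())
    return m.hexdigest()

def Sat(C, y, x):
    """C is any predicate taking (public input y, private input x)."""
    return 1 if C(y, x) else 0

def P(*claims):
    """claims: tuples (C, y, x). Fails unless every Sat(C, y, x) = 1."""
    assert all(Sat(C, y, x) == 1 for C, y, x in claims)
    return H(*(y for _, y, _ in claims))   # stand-in proof object

def V(pi, ys):
    """Check a proof against the concatenated public inputs."""
    return pi == H(*ys)

def C1(y, x):
    return x * x == y      # "I know a square root of y"

def C2(y, x):
    return x + 1 == y

pi = P((C1, 9, 3), (C2, 5, 4))
assert V(pi, [9, 5])
```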

We can now give an informal definition of what is meant by a SNARK in this post:

We suppose that a SNARK is a tuple $(Sat, P, V)$, with the following properties:

  • $P$ outputs a proof which is of fixed short size (say a few kB, to fix ideas)
  • The memory for running $Sat(C, y, x)$ is $O(|y| + |x|)$
  • The memory and CPU time for running $P((Sat(C_i, y_i, x_i))_i)$ is $O(\sum_i (|x_i| + |y_i|))$
  • The memory and CPU time for running $V(\pi, y)$ is $O(|y|)$, with a small constant
  • $P$ is bounded in memory and cannot process arbitrarily large computations

The fourth point means that $V$ is cheap to run as long as $|y|$ is small.

For example, suppose $C$ is a circuit, and we have associated public/private inputs $(y, x)$ such that $Sat(C, y, x) = 1$. If $\pi = P(Sat(C, y, x))$, then checking $Sat(C, y, x) = 1$ and checking $V(\pi, y) = 1$ both attest that $(y, x)$ satisfies the circuit, but running $V(\pi, y)$ takes a few seconds and is very cheap in memory, while checking $Sat(C, y, x)$ may take a long computational time, and more importantly is extremely memory-hungry.

Proof Carrying Data

Given a SNARK as defined in the previous section, say we have a large set of circuits $(C_i)_{i\in[n]}$ with associated inputs $(x_i, y_i)_{i\in[n]}$, and we want to output a proof that $Sat(C_i, y_i, x_i) = 1$ for all $i$.

First attempt

The first solution is to feed $P$ all the circuits at once and compute $P((Sat(C_i, y_i, x_i))_{i\in[n]})$.

The issue is that $P$ is bounded in memory, so brute-force proving all the statements at once will eventually exhaust the memory of $P$. Moreover the time complexity of $P$ grows linearly with the cumulated number of constraints of all the circuits $C_i$, so proving a gigantic circuit needs a lot of computational power (typically, it cannot be run on a laptop).

Second attempt

We know that $V$ is an algorithm that is cheap to run. The idea is to express this algorithm as an R1CS (the constraint language supported by our SNARK), and to see $Sat(V, \_, \_)$ as a particular satisfiability problem (constraining $V$ so that it returns $1$) to be fed to the prover algorithm $P$. If one has a proof $\pi$ and an associated public input $y$ such that $V(\pi, y)$ returns $1$, then $P(Sat(V, \pi||y, \_))$ (there are no private inputs) outputs a proof $\pi'$ attesting that $Sat(V, \pi||y, \_) = 1$; otherwise said, $\pi'$ is a proof attesting that a proof is correct.

An important observation is that $Sat(V, \_, \_)$ does not take a lot of memory when fed to $P$, by assumption: $V$ is cheap, and the memory of $Sat$ grows linearly with the size of $V$.

Having observed this, the idea is to handle the circuits $(C_i)_{i\in[n]}$ sequentially, outputting at each step a proof that $Sat(C_i, y_i, x_i) = 1$ and that the proof of the previous statement is correct. More precisely, if $\pi$ is the proof of the previous statement, with associated public input $y$, then at step $i$ we run
$$P(Sat(C_i, y_i, x_i), Sat(V, \pi||y, \_))$$

The newly generated proof $\pi'$ attests that $Sat(C_i, y_i, x_i) = 1$ and that $V(\pi, y) = 1$. So we have a way to build a proof that not only attests the validity of a circuit, but also attests recursively that all previous proofs are correct. It's an example of a Proof Carrying Data (PCD) infrastructure. There are several different constructions of PCD; in the most general one, instead of having a sequence of circuits, we have an acyclic graph. Here we stick to the sequential version.

There is a last problem that arises when doing that: when accumulating proofs, the list of public inputs grows with the number of proofs. Here is an illustration of that, starting with $C_0$:

  • $\pi_0 = P(Sat(C_0, y_0, x_0))$ (there is no previous proof statement)
  • $\pi_1 = P(Sat(C_1, y_1, x_1), Sat(V, \pi_0||y_0, \_))$
  • $\pi_2 = P(Sat(C_2, y_2, x_2), Sat(V, \pi_1||y_1||\pi_0||y_0, \_))$
  • $\pi_3 = P(Sat(C_3, y_3, x_3), Sat(V, \pi_2||y_2||\pi_1||y_1||\pi_0||y_0, \_))$

We see that the number of public inputs increases at each step; for instance, verifying $\pi_3$ is done by running $V(\pi_3, y_3||\pi_2||y_2||\pi_1||y_1||\pi_0||y_0)$.

Since the CPU time and memory consumption of $V$ grow linearly with the size of the public input, this quickly becomes problematic. The solution is to hash all the public inputs $(\pi, y)$ using $H$, and to add a circuit checking that $H(\pi, y)$ gives the expected hash result $h$. $h$ becomes the public input, and $\pi, y$ become private inputs. We name this circuit $C_H$ for "Correct Hash": $Sat(C_H, y, x)$ returns $1$ if $H(x) = y$.

Here is the final construction:

  • $\pi_0 = P(Sat(C_0, y_0, x_0))$ (there is no previous proof statement)
  • compute $h_0 = H(\pi_0, y_0)$
  • $\pi_1 = P(Sat(C_H, h_0, \pi_0||y_0), Sat(C_1, y_1, x_1), Sat(V, \_, \pi_0||y_0))$
  • Hash the public inputs by computing $h_1 = H(\pi_1, h_0, y_1)$
  • $\pi_2 = P(Sat(C_H, h_1, \pi_1||h_0||y_1), Sat(C_2, y_2, x_2), Sat(V, \_, \pi_1||h_0||y_1))$
  • Hash the public inputs by computing $h_2 = H(\pi_2, h_1, y_2)$
  • $\pi_3 = P(Sat(C_H, h_2, \pi_2||h_1||y_2), Sat(C_3, y_3, x_3), Sat(V, \_, \pi_2||h_1||y_2))$

We see that after $\pi_2$, the number of public inputs does not increase anymore: it only depends on the number of public inputs of the $i$-th circuit.
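This accumulation can be simulated with placeholder proofs to see that the public input carried from step to step stays a fixed-size digest. The `mock_prove` function below is a hypothetical stand-in: it returns an opaque string instead of a real SNARK proof.

```python
# Simulation of the hashing chain h_i = H(pi_i, h_{i-1}, y_i).
# ASSUMPTION: mock_prove is a placeholder, not a real SNARK prover; only
# the bookkeeping of public inputs is demonstrated here.
import hashlib

def H(*parts):
    m = hashlib.sha256()
    for p in parts:
        m.update(str(p).encode())
    return m.hexdigest()

def mock_prove(step, public):
    return f"proof_{step}_{H(public)[:8]}"   # opaque placeholder proof

y = [f"y{i}" for i in range(4)]
pi = mock_prove(0, y[0])
h = H(pi, y[0])                              # h_0 = H(pi_0, y_0)
public_sizes = [len(h)]
for i in range(1, 4):
    pi = mock_prove(i, (h, y[i]))            # proves C_i, C_H and V
    h = H(pi, h, y[i])                       # h_i = H(pi_i, h_{i-1}, y_i)
    public_sizes.append(len(h))

# the public input carried between steps stays a single fixed-size digest
assert len(set(public_sizes)) == 1
```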

Instantiating Halo with PLONK

Halo is the application of the construction described in the previous section to a version of PLONK in which the Kate (KZG) commitment scheme has been replaced by the Pedersen commitment scheme, along with the Bulletproofs opening argument.

Let's recall roughly the workflow of PLONK between a prover and a verifier, where the proof is in fact a list of commitments:

  • Prover sends a PLONK proof $(PC_{P_i^0})_{i\in[l]}$ to the verifier
  • Verifier queries the values $(P_i^0(\zeta_0))_{i\in[l]}$ and performs various checks on them
  • Verifier samples $v$ and checks a batch opening proof for $\sum_{i\in[l]} v^i PC_{P_i^0}$ at $\zeta_0$:
    1. sample a random $u^0 = (u_1^0, \dots, u_s^0)$ and send it to the prover
    2. compute $h_0(X) = \prod_{i\in[s]} ((u_i^0)^{-1} + u_i^0 X^{2^{i-1}})$
    3. evaluate $h_0$ at $\zeta_0$
    4. compute $PC_{h_0}$ (costly operation!)
    5. perform consistency checks between $h_0(\zeta_0)$ and $PC_{h_0}$
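Steps 2 and 3 are cheap: although $h_0$ has $2^s$ coefficients, its product form lets the verifier evaluate it at $\zeta_0$ with one squaring and one multiplication per factor, i.e. $O(s)$ field operations. A sketch (plain modular arithmetic, no curve involved):

```python
# O(s) evaluation of h(zeta) = prod_{i in [s]} (u_i^{-1} + u_i zeta^(2^(i-1))),
# versus the O(2^s) cost of materializing the coefficients for PC_h.
r = 2**61 - 1

def eval_h(us, zeta):
    """Evaluate h at zeta, one factor per challenge u_i."""
    acc, power = 1, zeta % r
    for u in us:
        acc = acc * (pow(u, r - 2, r) + u * power) % r
        power = power * power % r        # zeta^(2^(i-1)) by repeated squaring
    return acc

us = [5, 6, 7, 8]        # s = 4 challenges, so h has 2^4 coefficients
val = eval_h(us, 3)
```

By contrast, computing $PC_{h_0}$ requires one scalar multiplication per coefficient, hence linear cost, which is the step the verifier wants to avoid.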

For the construction described in the previous section to work, the verification procedure needs to be cheap (at most logarithmic in the size of the circuit).

The computation of $PC_h$ does not match this property, since its cost is linear in $n$.

To circumvent that, we outsource the computation of $PC_h$ to the prover:

  • Prover sends a PLONK proof $(PC_{P_i^0})_{i\in[l]}$ to the verifier
  • Verifier queries the values $(P_i^0(\zeta_0))_{i\in[l]}$ and performs various checks on them
  • Verifier samples $v$ and checks a batch opening proof for $\sum_{i\in[l]} v^i PC_{P_i^0}$ at $\zeta_0$:
    1. sample a random $u^0 = (u_1^0, \dots, u_s^0)$ and send it to the prover
    2. compute $h_0(X) = \prod_{i\in[s]} ((u_i^0)^{-1} + u_i^0 X^{2^{i-1}})$
    3. evaluate $h_0$ at $\zeta_0$
  • Prover computes $PC_{h_0}$ and sends it to the verifier
    4. verifier performs consistency checks between $h_0(\zeta_0)$ and $PC_{h_0}$

The verifier now no longer has to compute $PC_{h_0}$, but the trade-off is that for the final checks, he does not know whether $PC_{h_0}$ is correct!

To check that $PC_{h_0}$ is correct, a solution would be to ask the prover to open it, but an opening proof yields yet another Pedersen commitment for the verifier to compute. But remember that we are proving a sequence of circuits, and the prover's job is exactly to provide opening proofs. So this check is forwarded to the next proof, where $PC_{h_0}$ and $u^0$ become public inputs. The workflow between the prover and the verifier in the next proof becomes, at step $1$:

  • Proof at stage $0$ is correct, up to the correctness of $PC_{h_0}$
  • Prover sends a new PLONK proof $(PC_{P_i^1})_{i\in[l]}$ to the verifier
  • Verifier queries the values $(P_i^1(\zeta_1))_{i\in[l]}$ and performs various checks on them
  • Verifier queries $u^0$ from the previous proof and computes $h_0(\zeta_1)$
  • Verifier samples $v$ and checks a batch opening proof for $\sum_{i\in[l]} v^i PC_{P_i^1} + v^{l+1} PC_{h_0}$ at $\zeta_1$:
    1. sample a random $u^1 = (u_1^1, \dots, u_s^1)$ and send it to the prover
    2. compute $h_1(X) = \prod_{i\in[s]} ((u_i^1)^{-1} + u_i^1 X^{2^{i-1}})$
    3. evaluate $h_1$ at $\zeta_1$
  • Prover computes $PC_{h_1}$ and sends it to the verifier
    4. verifier performs consistency checks between $h_1(\zeta_1)$ and $PC_{h_1}$
  • $PC_{h_0}$ is now validated, and the correctness of $\pi_0$ is fully proven
  • $\pi_1$ is correct, up to the correctness of $PC_{h_1}$

The computations of the $PC_{h_i}$ are therefore forwarded from proof to proof. The soundness comes from the fact that the prover does not know $u^i$ in advance, so if he sends a random $PC_{h_i}$, he has very little chance of passing the check of the opening proof.

With the construction from the previous section and this forwarding system, one can obtain a chain of proofs of arbitrary length, in which the verification of the $i$-th step guarantees that all the previous proofs were correct!

If we use Fiat-Shamir, we can write a compact version of the scheme:

  • Proof $\pi_0$ is correct, up to the correctness of $PC_{h_0}$
  • Prover sends a new proof $\pi_1$:
    1. $(PC_{P_i^1}, P_i^1(\zeta_1))_{i\in[l]}$
    2. $PC_{h_1}$
    3. A proof for the opening of $\sum_{i\in[l]} v^i PC_{P_i^1} + v^{l+1} PC_{h_0}$
  • Verifier performs various checks on $(P_i^1(\zeta_1))_{i\in[l]}$
  • Verifier checks the batch opening proof, validating the correctness of $PC_{h_0}$ and therefore of $\pi_0$
  • $\pi_1$ is correct, up to the correctness of $PC_{h_1}$

Caveat

A circuit $C$ in PLONK reasons about variables which live in a field $\mathbb{F}_r$. If one wants to deal with variables in another field $\mathbb{F}_p$, one needs to encode $\mathbb{F}_p$ arithmetic in a language where variables are in $\mathbb{F}_r$, which is extremely inefficient.

However, from the previous section, we saw that a proof $\pi$ consists of:

  • $(PC_{P_i}, P_i(\zeta))_{i\in[l]}$: a set of points on $E(\mathbb{F}_p)$, plus some values in $\mathbb{F}_r$
  • $PC_h$: a point on $E(\mathbb{F}_p)$
  • A proof for the opening of $\sum_{i\in[l]} v^i PC_{P_i} + v^{l+1} PC_h$: a set of points on $E(\mathbb{F}_p)$

So a proof $\pi$ is split into $2$ components $(\pi_p, \pi_r)$, living respectively in $\mathbb{F}_p$ and $\mathbb{F}_r$.

And the verifier's job consists of:

  • performing various checks on $(P_i(\zeta))_{i\in[l]}$ ($\mathbb{F}_r$ operations)
  • computing $h(\zeta)$ ($\mathbb{F}_r$ operations)
  • performing consistency checks between $h(\zeta)$ and a folded commitment ($\mathbb{F}_p$ operations)

So $V$ is split into $2$ components $(V_p, V_r)$, reasoning respectively in $\mathbb{F}_p$ and $\mathbb{F}_r$.

So we see that writing $V$ in a language which does not support $\mathbb{F}_p$ operations is problematic.

The solution is to find another curve $E'$ defined over $\mathbb{F}_r$ which contains a group of order $p$, and to bounce back and forth between the $2$ curves, carefully separating the proofs and verification circuits into their respective fields.

Here is a step-by-step description of a full sequence, where we omit the inputs to not overload notations, and write $P(C)$ for generating the proof of circuit $C$:

  • On $\mathbb{F}_r$:
    1. $P(C_0) \rightarrow \pi_0 = (\pi_0^r, \pi_0^p)$
    2. $\pi_0^r$ is saved
    3. $\pi_0$ attests that $C_0$ is satisfied, up to the correctness of a Pedersen commitment
  • On $\mathbb{F}_p$:
    1. $P(C_1, V_p(\pi_0^p)) \rightarrow \pi_1 = (\pi_1^r, \pi_1^p)$
    2. $\pi_1^p$ is saved
    3. $\pi_1$ attests that $C_1$ is satisfied, up to the correctness of a Pedersen commitment
  • On $\mathbb{F}_r$:
    1. $P(C_2, V_r(\pi_0^r), V_r(\pi_1^r)) \rightarrow \pi_2 = (\pi_2^r, \pi_2^p)$
    2. $\pi_2^r$ is saved
    3. $\pi_2$ attests that $C_2$ is satisfied, up to the correctness of a Pedersen commitment, and that $\pi_0$ is correct
  • On $\mathbb{F}_p$:
    1. $P(C_3, V_p(\pi_1^p), V_p(\pi_2^p)) \rightarrow \pi_3 = (\pi_3^r, \pi_3^p)$
    2. $\pi_3^p$ is saved
    3. $\pi_3$ attests that $C_3$ is satisfied, up to the correctness of a Pedersen commitment, and that $\pi_1$ is correct

Author: Thomas Piellard.

I'm part of a research team at ConsenSys. If you are interested in our work (fast finite field arithmetic, elliptic curve pairings, and zero-knowledge proofs), give us a shout.