
# Halo2-svd: Singular value decomposition in zero-knowledge proofs


## Introduction

We present Halo2-svd, an open source library which provides functionality to verify the singular value decomposition (SVD) of a matrix and to perform basic linear algebraic operations in Halo2. Singular value decomposition is one of the most important tools available in linear algebra to analyze a matrix. Our library brings this powerful tool to zero-knowledge proofs, allowing developers an easy and efficient method for performing the linear algebra required for applications ranging from machine learning to decentralized reputation management.

Using singular value decomposition, every matrix $M$ can be written as the product

$$M = U \Sigma V,$$

where $U$ and $V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix with positive entries. Essentially, this decomposition says that a matrix can be viewed as a rotation of coordinates and a rescaling: the orthogonal matrices determine the rotation and the diagonal matrix $\Sigma$ determines the rescaling. With this interpretation, it can be seen why SVD is an important tool in machine learning: by eliminating the small entries in $\Sigma$, one can compress the matrix $M$ while retaining its most essential features. This is essentially the idea behind Principal Component Analysis (PCA), which is a technique for dimensionality reduction in machine learning. The singular vectors (these are the columns of $U$ and the rows of $V$) also satisfy special properties, which have been used in algorithms such as EigenTrust.

Zero-knowledge (ZK) proofs [4] are a framework that allows a prover to prove that a statement is true without revealing any additional information beyond the validity of the statement. One of the most popular implementations of a ZK proof system is Halo2, which provides basic primitives for writing ZK circuits. Recently, Halo2 has been used by Axiom to trustlessly access historic on-chain data of Ethereum. The availability of this data naturally prompts interest in the application of statistical and machine learning algorithms. To facilitate this, an efficient implementation of fundamental linear algebraic operations such as matrix multiplication, eigenvalue calculations, and singular value decompositions within a ZK circuit becomes imperative.

There are two major challenges to overcome in order to implement these operations in ZK circuits:

  1. Performing matrix multiplication is a very costly operation: for size $n$ square matrices, it naively requires $O(n^3)$ operations, and $O(n^{2.8})$ with Strassen's algorithm [3].
  2. One needs to convert floating point numbers to the prime field $\mathbb{F}_p$, since all numbers in a ZK circuit are represented in such a field. This requires implementing some sort of quantization scheme and a corresponding fixed point arithmetic.

For the first point above, the key observation is that one only needs to verify the result of a matrix multiplication inside the ZK circuit, not compute it. This can be done with only $O(n^2)$ operations with very high probability; see Lemma 1 below. The second obstacle is overcome by using the ZKFixedPointChip [1], a library that implements fixed point arithmetic for Halo2. (See also the blog post [1] for an explanation of the Fixed Point Chip and the quantization method that it uses to convert floating point numbers to elements of the prime field.)

The prime field $\mathbb{F}_p$ we will work with is constructed from a 254-bit prime described in [5]. In the following, we will use $M_{n \times m}(\mathbb{R})$ and $M_{n \times m}(\mathbb{F}_p)$ to denote $n \times m$ matrices valued in $\mathbb{R}$ and $\mathbb{F}_p$, respectively.

## Efficient algorithms for matrix multiplication and SVD

We start by describing the algorithm for the verification of matrix multiplication. Consider three matrices $A \in M_{n \times k}(\mathbb{F}_p)$, $B \in M_{k \times m}(\mathbb{F}_p)$, and $C \in M_{n \times m}(\mathbb{F}_p)$. Suppose that we wish to verify that $C = AB$ inside the ZK circuit. Instead of naively multiplying the matrices $A$ and $B$ inside the circuit and checking the result against $C$, we use the following variant of Freivalds' randomized algorithm [8] to verify matrix multiplication (interestingly, this variant appears in Sec. 2.2 of [9]):

  1. Sample a random field element $r \in \mathbb{F}_p$.
  2. Construct the vector $v = (1, r, \ldots, r^{m-1})$.
  3. Compute the quantities $A(Bv)$ and $Cv$.
  4. If $ABv = Cv$, then the test passes and we have verified that $AB = C$; otherwise, it fails.
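
To make the steps concrete, here is a minimal out-of-circuit Python sketch of the field-level check. The prime, matrices, and helper names are illustrative stand-ins rather than the library's API, and in the actual circuit $r$ comes from the Fiat-Shamir transform rather than a local RNG:

```python
import random

p = 2**61 - 1  # small stand-in prime; the library uses the 254-bit prime of [5]

def matvec(A, v):
    """Multiply a matrix by a vector over F_p."""
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]

def freivalds_check(A, B, C, r):
    """Verify C == A*B over F_p using a single random field element r."""
    m = len(C[0])
    v = [pow(r, i, p) for i in range(m)]  # v = (1, r, ..., r^(m-1))
    ABv = matvec(A, matvec(B, v))         # A(Bv): never forms the full product AB
    Cv = matvec(C, v)
    return ABv == Cv

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]                  # C = AB
r = random.randrange(p)
assert freivalds_check(A, B, C, r)
C[0][0] += 1                              # corrupt one entry of C
assert not freivalds_check(A, B, C, r)    # caught, except with prob <= (m-1)/p
```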

The element $r$ is randomly sampled by the Prover using the Fiat-Shamir transform [6]. Note that it is important that the Prover commits to the matrices $A$, $B$, and $C$ before the Fiat-Shamir transform is used to sample $r$. The computation of the multiplication result $C$ is performed by the Prover outside the circuit, using any algorithm for this purpose. In our code, we simply use the trivial multiplication algorithm.

The most important aspect of the above algorithm is that it only requires $nm + mk + kn$ multiplications (i.e., one multiplication per element of the matrices), compared to $O(nmk)$ multiplications for the trivial matrix multiplication algorithm.

The following lemma implies that if the claimed multiplication result is incorrect, then the test fails with high probability.

**Lemma 1.** Let $A \in M_{n \times k}(\mathbb{F}_p)$, $B \in M_{k \times m}(\mathbb{F}_p)$, and $C \in M_{n \times m}(\mathbb{F}_p)$. Let $r \in \mathbb{F}_p$ be chosen uniformly at random over $\mathbb{F}_p$, and define $v = (1, r, \ldots, r^{m-1})$. If $AB \neq C$ (as field matrices), then one has:

$$\mathbb{P}(Cv = ABv) \leq \frac{m-1}{p}.$$

*Proof.* This result is a simple consequence of the Schwartz-Zippel lemma [2]. If $AB \neq C$, then there exist indices $i, j$ such that $c_{i,j} \neq \sum_{l=1}^{k} a_{i,l} b_{l,j}$. By a direct computation, coefficient $i$ of the vector $Cv - ABv$ is given by:

$$(Cv - ABv)_i = \sum_{s=1}^{m} r^{s-1} \left( c_{i,s} - \sum_{l=1}^{k} a_{i,l} b_{l,s} \right).$$

By our assumption, $(Cv - ABv)_i$ is a non-zero polynomial in $r$ of degree at most $m-1$. Therefore, $(Cv - ABv)_i = 0$ is equivalent to a non-zero polynomial vanishing at the random field element $r$. The Schwartz-Zippel lemma implies this can happen with probability at most $(m-1)/p$. Q.E.D.
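
As a quick sanity check of the lemma (illustrative only, not part of the library), one can exhaustively count, over a toy prime field, how many choices of $r$ let an incorrect product slip through; Lemma 1 bounds this count by $m - 1$:

```python
p = 101                        # toy prime so we can enumerate every r
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 23], [43, 49]]       # wrong on purpose: AB = [[19, 22], [43, 50]]

def passes(r):
    v = [1, r % p]             # v = (1, r) since m = 2
    Bv = [sum(b * x for b, x in zip(row, v)) % p for row in B]
    ABv = [sum(a * x for a, x in zip(row, Bv)) % p for row in A]
    Cv = [sum(c * x for c, x in zip(row, v)) % p for row in C]
    return ABv == Cv

fooled = [r for r in range(p) if passes(r)]
print(len(fooled))             # 1, matching the bound m - 1 = 1 (here only r = 0)
```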

Note that we have so far only discussed matrix multiplication for matrices over the prime field. What we are really interested in are matrices over the real numbers. We will discuss matrix multiplication for such matrices in the next section.

Once we have an efficient algorithm for verifying matrix multiplication, we can use it to verify the SVD $(U, \Sigma, V)$ of a matrix $M$ by simply checking that $M \approx U \Sigma V$, $U^t U \approx \mathrm{Id}$, and $V^t V \approx \mathrm{Id}$.

## Design considerations

### Matrix multiplication

Under fixed point precision, there are multiple methods of implementing Freivalds' algorithm [8] for real matrices, the most straightforward being to trivially encode all operations using the fixed point chip. For example, to check that $C = AB$ (say all matrices have dimension $n \times n$), analogous to the field matrix algorithm described above, we can check that their quantizations satisfy $C_q \approx A_q B_q$ (see the next section for a precise definition of the quantization scheme used by the fixed point chip) by iterating the following simple test $k$ times:

  1. Sample a random vector $v \in \{-1, 1\}^n$.
  2. Compute $u := (C_q - A_q B_q)v$.
  3. Check that $\|u\| < \delta$ for some tolerance $\delta$.

While such an implementation appears to be correct, one needs to be very careful while implementing it using the fixed point chip, especially for the check in Step 3, because the fixed point chip does not check for overflows, and a dishonest prover could exploit this fact to cheat. Secondly, it turns out that for a reasonably small probability of error for this algorithm (say $10^{-9}$), the number of iterations required is $k \approx 30$: each iteration catches an incorrect product with probability at least $1/2$, so roughly $30$ iterations are needed to push the error probability below $2^{-30} \approx 10^{-9}$. This is quite significant and leads to a large overhead (between 5 and 10 times compared to our implementation) in circuit size. Lastly, each iteration would also require a hash to be calculated in circuit, which is also quite expensive.

Instead, in our library we implement Freivalds' algorithm at a lower level: the level of the field elements. In the fixed point arithmetic of [1], a $P$-precision element $x \in \mathbb{R}$ is represented as the field element $x 2^P$ in the field $\mathbb{F}_p$. Real number multiplication under this encoding is simply field multiplication followed by a rescaling of the output by $2^{-P}$. Matrix multiplication of real matrices under this encoding can then be viewed as field matrix multiplication followed by a rescaling of each element of the output. Freivalds' algorithm for field matrices can be implemented as described in the previous section. The implementation and analysis of this algorithm are much simpler and more efficient than for real matrices, particularly because for field matrices we do not need to check whether the difference vector ($u$ in Step 2 above) is close to zero; we can simply check whether it is exactly zero. Further, it is sufficient to use a single random vector for the test, which eliminates the overhead due to multiple iterations. Finally, it turns out that this random vector can be chosen to be of the form $(1, r, r^2, \ldots, r^{n-1})$ for a random field element $r$, as shown above in Lemma 1. As a result, we do not require any hashes to be computed in circuit apart from the commitment for the first phase of the ZK circuit, which generates the element $r$.
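
The following sketch illustrates this encoding for scalars. The function names are ours, not the ZKFixedPointChip API, and for simplicity we only handle non-negative values; in the field, negative values wrap past the midpoint of $\mathbb{F}_p$:

```python
P = 32  # precision bits; the quantization scale is 2^P

def encode(x):
    """Represent a non-negative real x as the integer round(x * 2^P),
    i.e., the fixed point field representative of x."""
    return int(x * 2**P + 0.5)

def mul_and_rescale(fx, fy):
    """Field multiplication followed by rescaling by 2^-P. Outside the
    circuit this is a plain shift; in circuit the rescaling is the
    expensive part, requiring range checks via lookups."""
    return (fx * fy) >> P

x, y = 1.5, 2.25
assert mul_and_rescale(encode(x), encode(y)) == encode(x * y)  # exact for dyadic values
```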

Overall, field matrix multiplication can be checked using about $9n^2$ cells. As we mentioned above, to obtain the real matrix output from the output of the field matrix multiplication, we need to rescale the matrix. This costs between $60n^2$ and $100n^2$ cells (depending on the lookup table size and the precision) and turns out to be by far the more expensive part of the multiplication procedure.

### SVD verification

We design the SVD verification procedure so that it does not require rescaling any matrix product since, as mentioned above, this is a very expensive operation. The bounds which we choose to check for the verification of the SVD can all be viewed as $\|M_1 M_2 - M_3 M_4\|_m < \delta$ (see the next section and also [7]). For such bounds, it is sufficient to instead form the unscaled field products of these matrices and simply verify that the difference of their elements is smaller than an appropriately scaled version of $\delta$. In addition to these bounds, we also verify that the elements of $U_q$ and $V_q$ are smaller than $1$ in absolute value and that the singular values are smaller than the maximum value that can be represented using the fixed point chip. These bounds ensure that the matrix multiplications used do not overflow.

## The singular value decomposition algorithm

We now describe the full algorithm to check the singular value decomposition of a matrix $M \in M_{n \times m}(\mathbb{R})$ inside the ZK circuit. The singular value decomposition (SVD) of a matrix $M$ can be written as

$$M = U \Sigma V,$$

where $U \in O_n(\mathbb{R})$, $V \in O_m(\mathbb{R})$, and $\Sigma \in M_{n \times m}(\mathbb{R})$ is a diagonal matrix with non-negative diagonal entries. Here $O_n(\mathbb{R})$ denotes the set of real orthogonal matrices of size $n \times n$.

There are two sources of numerical error that arise as one attempts to check such a decomposition within a ZK circuit. First, when one numerically computes the SVD of $M$ using, for instance, numpy, one gets matrices $\hat{U}, \hat{\Sigma}, \hat{V}$ which are only approximately equal to their exact counterparts $U, \Sigma, V$. Second, one needs to apply a quantization scheme to these matrices in order to be able to represent them in $\mathbb{F}_p$. The quantization of a matrix $X$ is given by:

$$(X_q)_{i,j} = 2^{-P} \left\lfloor 2^P X_{i,j} + 1/2 \right\rfloor.$$

In other words, $X_q$ is the matrix with the elements of $X$, up to a precision of $P$ bits; $2^P$ is referred to as the quantization scale. Therefore, for any real matrix $X \in M_{n \times m}(\mathbb{R})$, the matrix $2^P X_q$ can be represented as a field matrix in $M_{n \times m}(\mathbb{F}_p)$.
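
A short numpy sketch of this quantization step (the helper names are ours, not the library's API):

```python
import numpy as np

P = 32

def quantize(X):
    """(X_q)_{i,j} = 2^-P * floor(2^P * X_{i,j} + 1/2)."""
    return np.floor(X * 2.0**P + 0.5) / 2.0**P

def to_field(Xq, p):
    """Embed 2^P * X_q into M(F_p); negative entries map past the midpoint."""
    return [[int(round(v * 2**P)) % p for v in row] for row in Xq]

X = np.array([[0.1, -0.2], [0.3, 0.4]])
Xq = quantize(X)
assert np.abs(X - Xq).max() <= 2.0**-(P + 1)  # rounding error of the quantization
```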

We now describe our complete algorithm to verify the SVD of $M$ in Halo2. We use the observation that, given $V V^T = \mathrm{Id}$, we have:

$$M = U \Sigma V \iff M V^T = U \Sigma.$$

Rewriting the SVD condition this way will help us avoid rescaling the matrices while checking the SVD.

SVD computation and verification in our library follows these steps. Suppose the prover wishes to compute the SVD of a fixed point quantized matrix $M$ (i.e., $M_q = M$). Then the prover:

  1. Computes the numerical approximations $\hat{U}, \hat{\Sigma}, \hat{V}$ to the matrices in the singular value decomposition $M = U \Sigma V$ using any library for linear algebra (numpy in the example in our library).
  2. Converts the matrices $\hat{U}, \hat{\Sigma}, \hat{V}$ to prime field valued matrices using the quantization procedure described above, namely the matrices $$U_P := 2^P \hat{U}_q, \quad \Sigma_P := 2^P \hat{\Sigma}_q, \quad V_P := 2^P \hat{V}_q.$$ Similarly, define $M_P := 2^P M_q = 2^P M$, since $M_q = M$.
  3. Computes the matrix product $P_1 := M_P V_P^T$ outside the ZK circuit.
  4. Computes the product $P_2 := U_P \Sigma_P$ inside the ZK circuit. Note this only requires $n^2$ multiplications, since $\Sigma_P$ is a diagonal matrix.
  5. Proves the following claims in the ZK circuit:
     - The matrix multiplication $P_1 = M_P V_P^T$ is correct, using the randomized algorithm of the previous section.
     - That $\|P_1 - P_2\|_m < 2^{2P} \delta_1$. The definition of the norm $\|\cdot\|_m$ and the way $\delta_1$ should be chosen will be explained in the next section.
  6. Computes the two products $P_U := U_P U_P^T$ and $P_V := V_P V_P^T$ outside the ZK circuit, then proves inside the ZK circuit:
     - The matrix multiplications $P_U = U_P U_P^T$ and $P_V = V_P V_P^T$ are correct.
     - That $\|2^{2P} \mathrm{Id} - P_U\|_m < 2^{2P} \delta_2$ and $\|2^{2P} \mathrm{Id} - P_V\|_m < 2^{2P} \delta_2$, where $\delta_2$ is a small error also defined in the next section.

Note that the above algorithm can easily be adapted to check the eigenvalue decomposition of a real symmetric matrix $M$.
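
To tie the steps together, here is an out-of-circuit numpy sketch of the whole workflow. Variable names mirror the text; the in-circuit parts (the Freivalds checks and the norm comparisons) are emulated with exact integer arithmetic, and $\delta_1, \delta_2$ are taken from the error analysis of the next section. This is a sketch under those assumptions, not the library's implementation:

```python
import numpy as np

P = 32
eps = 2.0**-53

def quant_int(X):
    """2^P * X_q as an exact integer matrix (its field representative)."""
    return np.floor(X * 2.0**P + 0.5).astype(np.int64)

rng = np.random.default_rng(0)
n = 8
M = quant_int(rng.standard_normal((n, n))) / 2.0**P  # ensures M_q = M

# Step 1: numerical SVD; numpy's Vh plays the role of the V in the text.
U_hat, S_hat, V_hat = np.linalg.svd(M)

# Step 2: quantize to field representatives.
UP, SP = quant_int(U_hat), quant_int(np.diag(S_hat))
VP, MP = quant_int(V_hat), quant_int(M)

# Steps 3-4: object dtype keeps the products exact (as in F_p, no overflow).
P1 = MP.astype(object).dot(VP.astype(object).T)  # M_P V_P^T (off circuit)
P2 = UP.astype(object).dot(SP.astype(object))    # U_P Sigma_P (cheap in circuit)

# Step 5: ||P1 - P2||_m < 2^(2P) * delta_1.
norm_M = np.linalg.norm(M, 2)
delta_1 = 2.0**-P * n * (1 + norm_M) + 3 * eps * n * norm_M
assert np.abs(P1 - P2).max() < 2.0**(2 * P) * delta_1

# Step 6: ||2^(2P) Id - P_U||_m < 2^(2P) * delta_2, and likewise for P_V.
PU = UP.astype(object).dot(UP.astype(object).T)
PV = VP.astype(object).dot(VP.astype(object).T)
delta_2 = eps * n + 3 * 2.0**(-P - 1) * n
Id = 2**(2 * P) * np.eye(n, dtype=object)
assert np.abs(PU - Id).max() < 2.0**(2 * P) * delta_2
assert np.abs(PV - Id).max() < 2.0**(2 * P) * delta_2
```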

### Error analysis of the SVD algorithm

In this section, we present precise error bounds used in our SVD verification algorithm. For an in-depth error analysis, we refer the reader to [7], where these bounds are proven.

Let us first explain how we define the norm $\|\cdot\|_m$ over field valued matrices. Given an integer $x$ satisfying $0 \leq x < p$, where $p$ is the prime of our prime field $\mathbb{F}_p$, define the quantity:

$$|x|_{\mathbb{F}_p} := \begin{cases} x & 0 \leq x \leq p/2, \\ p - x & p/2 < x < p. \end{cases}$$

The idea behind this definition is that we want to think of prime field elements past the midpoint of the field as representing negative numbers. For a matrix $X \in M_{m \times n}(\mathbb{F}_p)$, we can now define:

$$\|X\|_m := \max_{i,j} |X_{i,j}|_{\mathbb{F}_p}.$$
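
In code, this norm is a direct transcription (helper names are ours):

```python
def abs_fp(x, p):
    """|x|_{F_p}: elements past the field midpoint represent negatives."""
    x %= p
    return x if x <= p // 2 else p - x

def norm_m(X, p):
    """||X||_m = max_{i,j} |X_{i,j}|_{F_p} for a matrix over F_p."""
    return max(abs_fp(v, p) for row in X for v in row)

p = 101
assert abs_fp(3, p) == 3        # small residues are "positive"
assert abs_fp(p - 3, p) == 3    # p - 3 represents -3
assert norm_m([[1, p - 7], [2, 5]], p) == 7
```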

Now we can state the error analysis results. Let $\epsilon = 2^{-53}$. The results of [7] guarantee that:

  - $\|P_1 - P_2\|_m < 2^P \max(n,m) (1 + \|M\|_2) + 3 \epsilon \, 2^{2P} \max(n,m) \|M\|_2$,
  - $\|2^{2P} \mathrm{Id} - P_U\|_m < 2^{2P} \epsilon \max(n,m) + 3 \cdot 2^{P-1} n$.

So we want to choose $\delta_1 = 2^{-P} \max(n,m) (1 + \|M\|_2) + 3 \epsilon \max(n,m) \|M\|_2$ and $\delta_2 = \epsilon \max(n,m) + 3 \cdot 2^{-P-1} n$.

To further get a sense of the values of $\delta_1, \delta_2$, let us assume for instance that $M$ is a square matrix of size $n = m = 1000$. We choose $P = 32$ for the quantization scale and further assume that $\|M\|_2 < 10^2$. Then one gets roughly that $\delta_1 \approx 2^{-32} \cdot 10^5 \approx 2.3 \times 10^{-5}$ and that $\delta_2 \approx 3 \cdot 2^{-33} \cdot 1000 \approx 3.5 \times 10^{-7}$. The first error is larger since it also grows with the matrix norm.
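
The same numbers can be reproduced with a few lines of Python:

```python
n = m = 1000
P = 32
eps = 2.0**-53
norm_M = 100.0                   # assumed bound on ||M||_2

delta_1 = 2.0**-P * max(n, m) * (1 + norm_M) + 3 * eps * max(n, m) * norm_M
delta_2 = eps * max(n, m) + 3 * 2.0**(-P - 1) * n
print(f"delta_1 ~ {delta_1:.2e}, delta_2 ~ {delta_2:.2e}")
# delta_1 ~ 2.35e-05, delta_2 ~ 3.49e-07
```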

### Performance of the algorithm

The total number of advice cells and lookup cells used will depend on the quantization scale parameter $P$, which is denoted by PRECISION_BITS in the code, as well as on the LOOKUP_BITS parameter.

For LOOKUP_BITS = 19 and PRECISION_BITS = 32, the number of advice cells grows as $162N^2 = 135N^2 + 27N^2$ (first phase + second phase) with $26N^2$ lookup cells; for LOOKUP_BITS = 19 and PRECISION_BITS = 63, the number of advice cells grows as $228N^2 = 201N^2 + 27N^2$ (first phase + second phase) with $48N^2$ lookup cells. Note that we use the Halo2 Challenge API to implement the Fiat-Shamir transform [6] and sample the randomness for matrix multiplication verification. This API divides the program into two phases.
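
As a rough sizing aid, the quoted growth rates can be packaged into a small helper (our own convenience function, not part of the library):

```python
def cell_counts(N, precision_bits):
    """Approximate advice/lookup cell counts for an N x N SVD check
    at LOOKUP_BITS = 19, using the growth rates quoted above."""
    if precision_bits == 32:
        return {"advice": 162 * N**2, "lookup": 26 * N**2}
    if precision_bits == 63:
        return {"advice": 228 * N**2, "lookup": 48 * N**2}
    raise ValueError("counts quoted only for PRECISION_BITS in {32, 63}")

print(cell_counts(1000, 32))   # {'advice': 162000000, 'lookup': 26000000}
```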

## Acknowledgments

This project was conducted as part of the Axiom Open Source Program. We are grateful to N. Feng, Y. Sun, and J. Wang for organizing it. The library is built on top of ZK circuit primitives provided by Halo2 and Axiom's Halo2-base. We also used the fixed point chip implementation of [1].

## References