Alice wants to prove that she knows some private information, together with public information, that satisfies given computations or constraints, and Bob should be able to verify the proof provided by Alice without learning any private information. In such a scenario, we need a zero-knowledge proof (ZKP). ZKPs have two key properties. One is that the computation can be hard or expensive while the verification is very cheap; we call this succinctness. For blockchains, this is really useful for verifying an expensive computation in a smart contract. This is what zkEVM does in Layer-2 blockchains: we can run a program in an off-chain zkEVM and also generate the proof off-chain, and then we only need a cheap gas fee to verify it on-chain. The other property of ZKP is zero knowledge: the verifier does not need to know the private information, which is useful in user-privacy preserving scenarios. Take a ZK version of Kaggle as an example: a user can use a ZKP to prove that he has a machine learning model which gets 99% accuracy on an image classification task, but he does not want to make his valuable model public to the verifier. Note also that the public information includes the input, the output, and any intermediate variable in the computation, so it is useful for publishing some important results along with the proof.
With the rapid development of various AI technologies and their amazing capabilities, how to combine them with cutting-edge cryptography and blockchain technologies to improve their security and transparency is a much-needed and promising direction. Leveraging zero knowledge to protect private information while keeping its verifiability for machine learning algorithms is what ZKML does. Daniel Kang's blog [9] provides a good example of verifying the algorithmic integrity of Twitter's recommendation results with ZKML. Although Twitter open-sourced their algorithm, if Twitter generated the "For You" timeline with models other than the public version they guaranteed, they would be able to manipulate bias and content censorship. With ZKP, we can ensure that Twitter is not secretly using a malicious alternative recommendation model.
All operations in machine learning models involve floating-point numbers, but all operations in Halo2's ZK circuits are natively in the BN254 field, which is a finite unsigned integer type. Implementing fixed/floating point arithmetic is therefore an important foundation for any kind of ML or numerical algorithm, because the ZK proof system only accepts ZK circuits consisting of many constraints that support only `+`, `-`, `*`, and `=` operations over BN254. So if we want to use a ZK proof system to prove our computation, we must find a way (arithmetization) to transform the computation into a ZK circuit written in a ZK language. This transformation is what this project does.
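As a small illustrative example (not taken from the project), arithmetizing \(y = x^3 + x + 5\) introduces intermediate witness variables so that every constraint uses only the supported operations:

\[
v_1 = x \cdot x, \qquad v_2 = v_1 \cdot x, \qquad y = v_2 + x + 5 .
\]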
This project provides an easy-to-use fixed point arithmetic chip based on Halo2. Fixed point numbers will be automatically represented with BN254 numbers. It provides basic operations (multiplication, addition, subtraction, division) and also implements a large number of commonly used nonlinear functions (e.g., `log`, `sin`) using the ZK primitives. To show the power of the implemented fixed point arithmetic, this project leverages it to implement the most representative machine learning algorithms, including linear regression, logistic regression, and decision tree. In addition to the inference of these algorithms, this project also implements the more challenging and ambitious training process.
As an overview, this project provides four useful chips (aka., ZK circuits):

- `FixedPointChip`: fixed point arithmetic with configurable precision (from `32.32` to `63.63`) and automatically generated polynomials using the Remez algorithm: `add`, `sub`, `mul`, `div`, `mod`, `sign`, `clip`, `polynomial`, `bit_xor`, `sum`, `neg`, `exp`, `log`, `pow`, `sqrt`, `max`, `sin`, `cos`, `tan`, `sinh`, `cosh`, etc.
- `LinearRegressionChip`: inference and training of linear regression based on `FixedPointChip`
- `LogisticRegressionChip`: inference and training of logistic regression with `sigmoid` (based on `exp`) using `FixedPointChip`
- `DecisionTreeChip`: inference and training of decision tree

There are two demonstrations presenting the training process of ZK linear regression and ZK decision tree. To run the demos and develop your own ZK circuit based on this project, please check out ZKFixedPointChip.
The training of linear regression has a closed-form solution, but the solution involves a matrix inverse operation, which suffers from the ill-conditioned problem and leads to numerical instability. This project therefore implements linear regression training with SGD (minibatch stochastic gradient descent). If we generated the whole training process (e.g., 50 epochs with 32 batches per epoch needs 1600 training steps) with just one proof, the computational complexity of proving would be too high. So I split the whole training process into multiple step proofs, where one step proof stands for one batch or multiple batches. With the help of `AggregationChip`, the whole training process can be represented with one proof aggregating all the step proofs. The above figure shows the loss curve when training the linear regression model. To perform sanity checks of my implementation, I also implemented a non-ZK version and compared the loss values of both implementations at each training step. In the figure, the blue dashed line is the ZK version of linear regression and the red solid line is the non-ZK version. The error is negligible, which shows the correctness of my ZK implementation.
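For reference, a plain (non-ZK) Python sketch of the minibatch SGD loop used as the sanity-check baseline might look like the following; the synthetic data, learning rate, and the 50-epoch / 32-sample-batch setting are illustrative assumptions rather than the exact demo configuration.

```python
import numpy as np

# synthetic data: y = X @ [2, -1, 0.5] + 0.3 plus a little noise (illustrative)
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.3 + 0.01 * rng.normal(size=1024)

w, b = np.zeros(3), 0.0
lr, batch_size = 0.1, 32

for epoch in range(50):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):      # one training step == one batch
        idx = perm[start:start + batch_size]
        xb, yb = X[idx], y[idx]
        err = xb @ w + b - yb                        # prediction error on the batch
        w -= lr * xb.T @ err / len(idx)              # gradient of the mean squared error
        b -= lr * err.mean()

print(w, b)  # should approach [2, -1, 0.5] and 0.3
```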
Training a decision tree is quite different from linear/logistic regression. A decision tree is a binary tree (multiway trees are ignored in this project). This project exposes easy-to-use interfaces for inference and training of a decision tree with a given dataset and hyper-parameters (e.g., max depth). There are different kinds of decision trees and training algorithms; for simplicity, this project chooses to implement the CART training algorithm [8]. It also references the Python implementation provided by Jason Brownlee [2]. To visualize the generated decision tree and compare it with the one generated by the Python version, the decision tree is automatically exported in the `dot` language, and `graphviz` is used to convert it to an `svg` figure. Nodes with a yellow circle are leaf nodes, green nodes are splitting nodes, and blue ones are fake nodes. For more details, please refer to the implementation section.
To explore the computational complexity and performance of this project, I conducted benchmarks for the training of linear regression, logistic regression, and decision tree. The results show that increasing the batch size saves some proving time for the whole training process.
The ZK circuit in `halo2` only supports `+`, `-`, and `*` operations over BN254. That means ZKP development lacks support for float/fixed point operations. Compared with fixed point operations, floating-point operations introduce a lot of unnecessary computational overhead, so for simplicity I chose to implement ZK fixed-point arithmetic using `halo2`. This project provides not only basic fixed-point arithmetic (`add`, `sub`, `div`, `mod`, `mul`), but also complex operations (e.g., `sin`, `exp`, `log`). Fixed point arithmetic is an important foundation for any type of ML or numerical algorithm.
On the other hand, most existing ZKML projects focus on deep learning instead of traditional machine learning algorithms. Transforming original floating-point deep learning models into integer-only models is a widely researched and applied technique called quantization, and many existing tools such as TensorFlow and PyTorch provide model quantization. Most existing ZKML projects only support deep learning models, and the models must be quantized first. For non-linear functions, directly quantizing them is basically impossible. Some projects and research works [5] propose to implement them with polynomial approximation, while many existing ZKML projects instead use `lookup` tables. In addition, for many traditional ML algorithms such as decision tree, it's hard to quantize the whole model. With `FixedPointChip` in this project, ZK developers can easily develop ZK versions of ML algorithms without caring about quantization or the implementations of the most commonly used non-linear functions. For ZKML developers, this project also implements some examples showing how to use `FixedPointChip` to develop ZKML algorithms (see the implementations of `LinearRegressionChip`, `LogisticRegressionChip`, and `DecisionTreeChip`). ZKML users and other ZK developers who want to use out-of-the-box ZKML algorithms directly in downstream applications or ZK circuits can directly pick up this project.
The challenges of this project are several-fold. First, as shown in the motivation, quantization is not easy for non-linear functions. A mainstream solution is using a high-degree polynomial to approximate the non-linear function in a given input range: an input number is first transformed into this input range and then the output of the polynomial is calculated. Second, the ZK circuit only supports BN254 operations, so I need to transform all quantization operations into BN254 operations in a ZK-circuit way. I need to not only finish this transformation correctly, but also find a ZK-friendly way, because many operations that are fast in normal programs, such as logical `and` and `xor`, are slow and hard to implement in a ZK circuit.
Third, variable-length operations and recursion are ZK-unfriendly; they cannot be supported directly. Decision tree relies on them to train and build the whole tree, and inference also requires traversal of paths with different lengths. To solve this challenge, this project proposes path padding and fake nodes. Path padding makes every traversal path have the same length, and fake nodes are added to the original decision tree to make it a complete binary tree. Finally, developing ZK circuits with `halo2` needs careful debugging and sanity checks. `Halo2` is still a very new project, so it's hard to debug when facing errors, and it requires us to write code carefully. For example, if the generated circuits differ between proving and verifying, it will complain with a `ConstraintFailure` error without telling us which line in the source code leads to the error. To verify the correctness of the implementation, it's necessary to write sanity checks comparing the ZK version and the non-ZK version.
The overall workflow of operations in `FixedPointChip` is shown in the above figure. If we want to calculate \(\sin(x)\) using `FixedPointChip`, we first quantize the input decimal number and then feed it to `FixedPointChip` to get the output. The output is also a quantized number stored in a BN254 integer. With the help of `dequantization`, we get the final output in the format of a decimal number. Note that all the operations in green are automatically done by this project; the user only needs to input the original decimal number and gets the decimal output.
To implement `FixedPointChip`, the first question is how to represent decimal numbers with BN254 integer numbers. The transformation from decimal numbers to BN254 numbers is called `quantization`, and `dequantization` is the inverse operation. The formulas of `quantization` and `dequantization` are presented in the following figure:
The decimal number is \(x\) and the quantized BN254 number is \(x_q\). We call \(s\) the scale and \(z\) the zero point. If \(z=0\), we call it symmetric quantization; otherwise it's asymmetric quantization. If we have a computation (e.g., \(\sin(x)\)) which is originally designed for decimal numbers, how can we transform it so that the computation only contains BN254 operations? This is called the quantization of this computation. Take multiplication as an example. Here are the quantizations of `mul` in symmetric quantization and asymmetric quantization:
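As a reference (a hedged reconstruction, assuming for simplicity that \(a\), \(b\), and \(c = a \times b\) share the same scale \(s\) and zero point \(z\), with quantization \(x_q = \mathrm{round}(s \cdot x) + z\)), the two cases are:

\[
\text{symmetric } (z = 0):\; c_q = \frac{a_q\, b_q}{s}, \qquad
\text{asymmetric}:\; c_q = \frac{(a_q - z)(b_q - z)}{s} + z .
\]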
The transformed computation should only contain the quantized numbers, and its output should be the quantization of the decimal output of the original computation. The disadvantage of symmetric quantization is that the range of the quantization results includes negative values, because the range of input decimals includes negatives, so in the design we need to deal with negative numbers in BN254. Because BN254 is a finite cyclic group whose range is \([0,M]\) where \(M \approx 2^{254}\), it does not support negative numbers natively. Since BN254 supports modulo, negative numbers can be mapped into the valid range by modulo (i.e., \(M-x\)). But in the implementation, we need to keep the negative values in mind and carefully process them, which brings more computational overhead. Symmetric quantization has the advantage that the computation is simpler, since the computation for asymmetric quantization takes more terms. Compared with the symmetric version, asymmetric quantization can transform the input into a positive range, so we do not need to worry about how to deal with negative numbers. It's a trade-off between both versions. This project selects symmetric quantization.
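A minimal non-ZK Python sketch of this symmetric scheme (the scale \(2^{32}\), i.e., precision 32, and the field-modulus handling of negatives are illustrative; the real chip enforces these steps as Halo2 constraints):

```python
# BN254 scalar field modulus (negative values wrap around as M - |x|)
M = 21888242871839275222246405745257275088548364400416034343698204186575808495617
S = 2 ** 32  # scale for 32.32 fixed point (symmetric quantization, z = 0)

def quantize(x: float) -> int:
    """Decimal -> quantized field element."""
    return round(x * S) % M

def dequantize(x_q: int) -> float:
    """Quantized field element -> decimal (upper half of the field is negative)."""
    signed = x_q - M if x_q > M // 2 else x_q
    return signed / S

def qmul(a_q: int, b_q: int) -> int:
    """Quantized multiplication: c_q = a_q * b_q / S (with signed division)."""
    a = a_q - M if a_q > M // 2 else a_q
    b = b_q - M if b_q > M // 2 else b_q
    return ((a * b) // S) % M

a_q, b_q = quantize(3.5), quantize(-1.25)
print(dequantize(qmul(a_q, b_q)))  # ~ -4.375
```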
To ensure that the ranges of positive numbers and negative numbers do not overlap, this project supports a maximum precision of \(63\). To keep the error within a small range, the minimum precision is \(32\). \(p\) in the figure denotes the precision.
All the operations in `FixedPointChip` can be split into three categories: basic operations, basic functions, and derived functions. Basic operations include the four basic arithmetic operations: `mul`, `div`, `mod`, `add`, and `sub`. For the detailed implementation of `mul`, the quantization formula is \(c_q = \frac{a_q \times b_q}{s}\). To support negatives, we use the modular multiplication property: \((a \times b)\,\mathrm{mod}\,n = ((a\,\mathrm{mod}\,n) \times (b\,\mathrm{mod}\,n))\,\mathrm{mod}\,n\). But division does not have this nice property, so we must implement it manually as a ZK circuit. I call this signed division \(c = \frac{a}{b}\), where \(a\) might be a negative number and \(b\) is a positive number. Like `div_mod` in Axiom's `halo2-base`, I first decompose \(a\) as \(a = b \times q + r\) where \(0 \le r < b\) and \(q\) could be a negative number. If \(a\) is a negative number, thanks to the modular multiplication property, \(a\,\mathrm{mod}\,M = (b \times (q\,\mathrm{mod}\,M) + r)\,\mathrm{mod}\,M\). With this decomposition as a ZK constraint, we can safely support signed division, and \(q\,\mathrm{mod}\,M\) is the output.
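A non-ZK sketch of this decomposition (variable names and the sanity checks are illustrative; in the circuit, the decomposition itself is the constraint):

```python
M = 21888242871839275222246405745257275088548364400416034343698204186575808495617

def signed_div(a: int, b: int):
    """Decompose a = b*q + r with 0 <= r < b (b > 0); q may be negative and is
    returned as q mod M, mirroring the in-circuit output."""
    q = a // b              # Python floor division keeps the remainder non-negative
    r = a - b * q
    assert 0 <= r < b
    # the relation checked as a ZK constraint: a mod M == (b*(q mod M) + r) mod M
    assert a % M == (b * (q % M) + r) % M
    return q % M, r

print(signed_div(-7, 2))    # q = -4 -> M - 4, r = 1, since -7 = 2*(-4) + 1
```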
With the basic operations in place, the next step is implementing the basic functions:

- `sin`
- `exp2`
- `log2`
To implement these basic functions, since they are non-linear, we cannot directly quantize them. Like the implementation in I-BERT [5], `FixedPointChip` first uses the Remez algorithm to find the optimal polynomial approximation of the basic function. A polynomial only contains (quantized) `+` and `*`, so we can just use the basic operations in `FixedPointChip` to implement it. The coefficients of the optimal polynomial are pre-computed and fixed in the `FixedPointChip`, because the coefficients of the polynomial do not need to change for different precisions and inputs. Meanwhile, the Remez algorithm can only find the optimal approximation in a given input range (e.g., \([2,4]\)), which is smaller than the domain (e.g., \(\mathbb{R}\)) of the function. To make the approximation cover the whole domain of the function, the input also needs some transformation. Take `log2` as an example; the workflow of its implementation is as follows:
1. transform the input into the range \([2, 4]\) (every positive input can be written as \(x = m \times 2^e\) with \(m \in [2, 4)\));
2. evaluate the pre-computed Remez polynomial approximation of `log2` over the range of \([2, 4]\);
3. recover \(\log_2(x) = e + \log_2(m)\) from the exponent and the polynomial output.
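A non-ZK Python sketch of this workflow; `np.polyfit` stands in for the Remez algorithm, and the polynomial degree is an illustrative assumption:

```python
import math
import numpy as np

# fit a polynomial approximation of log2 on [2, 4] (the chip pre-computes Remez coefficients)
xs = np.linspace(2.0, 4.0, 1000)
coeffs = np.polyfit(xs, np.log2(xs), deg=5)

def log2_approx(x: float) -> float:
    # range reduction: write x = m * 2**e with m in [2, 4)
    m0, e0 = math.frexp(x)          # x = m0 * 2**e0 with m0 in [0.5, 1)
    m, e = 4.0 * m0, e0 - 2
    # log2(x) = e + log2(m), where log2(m) is the polynomial approximation
    return e + float(np.polyval(coeffs, m))

print(log2_approx(10.0), math.log2(10.0))  # both ~ 3.3219
```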
Once we have basic operations and basic functions, we can easily use them to implement many kinds of derived functions. A list of the derived functions in `FixedPointChip` is shown below:
The formulas of linear regression and logistic regression are:
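In standard notation, with weight vector \(\mathbf{w}\), bias \(b\), and the sigmoid \(\sigma\):

\[
\hat{y}_{\text{linear}} = \mathbf{w}^\top \mathbf{x} + b, \qquad
\hat{y}_{\text{logistic}} = \sigma\!\left(\mathbf{w}^\top \mathbf{x} + b\right), \quad
\sigma(t) = \frac{1}{1 + e^{-t}} .
\]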
With the power of `FixedPointChip`, it's easy to implement these formulas to support the inference of linear and logistic regression.
As for training, although these are convex problems under a given loss (e.g., mean squared error, log loss), the closed-form solution involves numerically unstable operations such as matrix inversion, which suffers from the ill-conditioned problem. So I decided to implement the training with the SGD algorithm. SGD iteratively updates the model parameters with the gradient of the loss function to approach the global optimal solution. The SGD algorithm is (original source from [7]):
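A compact statement of the minibatch update (standard form, consistent with [7]): for a batch \(\mathcal{B}\) and learning rate \(\eta\),

\[
\mathbf{w} \leftarrow \mathbf{w} - \frac{\eta}{|\mathcal{B}|} \sum_{i \in \mathcal{B}} \nabla_{\mathbf{w}}\, \ell\big(f(\mathbf{x}_i; \mathbf{w}),\, y_i\big) .
\]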
The inference of a decision tree involves traversing the tree from the root node to a leaf node. At each non-leaf node, the traversal selects the left or right branch according to the conditional expression (e.g., \(x_0 < 1.3\)) in the node. For different inputs, the traversal path is different and the length of the path is also different, but variable-length input is unfriendly to ZK circuits. To solve this problem, I propose path padding to force all traversal paths to have the same length. For example, there are two paths in the above figure: Path 1 has length 3 and Path 2 has length 2. Because the maximum length of all possible paths is 3, all traversal paths are padded to length 3. For Path 2, although the second node is a leaf node, to pad the path to length 3, `DecisionTreeChip` adds a self-loop to this leaf node and repeats accessing it once. To represent the decision tree with a ZK-friendly structure, each node in the decision tree is represented with a \(5\)-tuple \(\langle \text{slot, val, LR, RT, CLS} \rangle\), where slot denotes the split feature argument for this node, val is the split value, LR is the index of the left child node, RT is the index of the right child, and CLS is the class of this node. If the current node is a leaf node, then slot and val are both \(0\), and LR and RT are just the index of the current node, which creates the self-loop for leaf nodes. If the current node is a non-leaf node, then CLS is a special fake class \(255\).
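A non-ZK Python sketch of the padded traversal over the \(5\)-tuple representation (the example tree and `MAX_DEPTH` are illustrative assumptions):

```python
FAKE_CLASS = 255
# each node: (slot, val, LR, RT, CLS); leaves self-loop and carry their class
tree = [
    (0, 1.3, 1, 2, FAKE_CLASS),  # node 0: go left if x[slot 0] < 1.3
    (0, 0.0, 1, 1, 0),           # node 1: leaf, class 0, self-loop
    (1, 2.0, 3, 4, FAKE_CLASS),  # node 2: go left if x[slot 1] < 2.0
    (0, 0.0, 3, 3, 1),           # node 3: leaf, class 1, self-loop
    (0, 0.0, 4, 4, 0),           # node 4: leaf, class 0, self-loop
]
MAX_DEPTH = 3                    # every traversal is padded to exactly this many steps

def infer(x):
    node = 0
    for _ in range(MAX_DEPTH):                # fixed-length loop, ZK-friendly
        slot, val, lr, rt, _ = tree[node]
        node = lr if x[slot] < val else rt    # a leaf self-loops, so padding is a no-op
    return tree[node][4]                      # CLS of the final (leaf) node

print(infer([1.0, 5.0]))  # left branch at the root -> class 0
```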
To train the decision tree, we need to recursively generate the decision tree from the top node to the bottom. At each non-leaf node, we calculate Gini impurities for different split parameters (split feature, split value) and find the best split with the minimum Gini impurity. Gini impurity is the probability of incorrectly classifying a randomly chosen element in the dataset if it were randomly labeled according to the class distribution in the dataset. A perfect split results in a Gini score of \(0\), whereas the worst-case split results in a Gini score of \(0.5\) (for a binary class problem). The formula of Gini impurity is \(G = \sum_{i=1}^C p(i) \times (1-p(i))\), where \(p(i)\) is the probability of randomly picking an element of class \(i\). To find the best split at each node, we first calculate Gini impurities for different split parameters over the dataset of this node; the best split has the minimum Gini impurity. Each non-leaf node has a conditional expression to decide whether the next node is the left child or the right child, and the dataset of the current node is split into a left part and a right part according to the conditional expression: the left part is used in the left child node and the right part in the right child node. The conditional expression consists of the split feature and the split value. Because the dataset contains multiple samples and each sample has many features, the conditional expression is something like \(x_0 < 1.3\), where the split feature is the first feature in the dataset and the split value is \(1.3\); it determines whether each sample in the dataset goes to the left part or the right part.

The criterion for deciding whether a node is a leaf node is controlled by the hyper-parameter `min_size`, and the depth of the generated decision tree is controlled by `max_depth`. If the number of samples in the dataset of the current node is less than or equal to `min_size`, then the current node becomes a leaf node and its class is the most common class in its dataset. Because the training process is recursive but recursion is ZK-unfriendly, in the implementation the recursion is transformed into an iterative procedure using a queue. Also, a decision tree is not guaranteed to be a complete binary tree, and a non-complete binary tree leads to variable-length input and computation steps, which are not supported by ZK circuits. This project proposes fake nodes to eliminate this obstacle. As shown in the above figure, the blue dashed circles are fake nodes. During training, we still do the split even for a leaf node, and the added child nodes of the leaf node are called fake nodes. Fake nodes do not influence the result because we have added the self-loop for the real leaf node (yellow circles).
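A non-ZK Python sketch of the Gini-based best-split search (the toy dataset is an illustrative assumption; each row ends with its class label):

```python
def gini(groups, classes):
    """Weighted Gini impurity G = sum_i p(i) * (1 - p(i)) over the split groups."""
    n = sum(len(g) for g in groups)
    score = 0.0
    for g in groups:
        if not g:
            continue
        p = [sum(1 for row in g if row[-1] == c) / len(g) for c in classes]
        score += (1.0 - sum(pi ** 2 for pi in p)) * len(g) / n
    return score

def best_split(dataset):
    classes = sorted({row[-1] for row in dataset})
    best = None
    for slot in range(len(dataset[0]) - 1):           # candidate split feature
        for row in dataset:                           # candidate split value
            left = [r for r in dataset if r[slot] < row[slot]]
            right = [r for r in dataset if r[slot] >= row[slot]]
            g = gini((left, right), classes)
            if best is None or g < best[0]:
                best = (g, slot, row[slot])
    return best                                       # (gini, split feature, split value)

data = [(2.7, 0), (1.4, 0), (3.3, 1), (3.9, 1)]
print(best_split(data))  # -> (0.0, 0, 3.3): x0 < 3.3 separates the two classes perfectly
```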
Replacing `select_from_idx` with `lookup` tables in ZK-LR and ZK-DT

The training of ML algorithms involves a large number of data samples. The current implementation uses `select_from_idx`, provided by Axiom's `halo2-base`, to get a specific sample from a given dataset by index; you can treat it as accessing an element of an array by index. The implementation of `select_from_idx` decomposes the index into a one-hot vector whose length is the same as the target array and then uses an inner product to get the element. Because the dataset is fixed during the whole training and there is a large number of accessing operations, it's sub-optimal to do so many inner products over the same dataset. A `lookup` table is an important feature of `halo2` and a better alternative to `select_from_idx`.
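A non-ZK Python sketch of what `select_from_idx` computes (semantics only; the actual gate works over field elements inside the circuit):

```python
def select_from_idx(array, idx):
    one_hot = [1 if i == idx else 0 for i in range(len(array))]  # indicator decomposition
    return sum(o * a for o, a in zip(one_hot, array))            # inner product

print(select_from_idx([10, 20, 30, 40], 2))  # -> 30
```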
The performance of the current implementation still has a lot of room for improvement. For example, the current `inner_product` just calculates the element-wise product of each pair and then sums them. It is expected to be improved with gate overlapping, like the implementation in Axiom's `halo2-base`. Improving `FixedPointChip` should be a core direction in the future.
As advised by Horace Pan et al.'s ZKML project [10], the extra capacity of BN254 variables is wasted by `FixedPointChip` at low precision. Packing multiple numbers inside one BN254 variable is one solution to improve the efficiency.
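An illustrative sketch of the packing idea, storing two 64-bit fixed point values in one integer via bit shifts (the widths and layout are assumptions, not the project's format):

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def pack(a: int, b: int) -> int:
    return (a << WIDTH) | (b & MASK)

def unpack(packed: int):
    return packed >> WIDTH, packed & MASK

print(unpack(pack(123, 456)))  # -> (123, 456)
```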
In ML scenarios, it's common to chain multiple computation layers (e.g., `Conv2D-BatchNorm-ReLU`). If we quantize them separately, it causes some unnecessary redundant computation. It's better to fuse the quantization of commonly used combinations of multiple ML layers. This is called fused quantization; details are shown in Lei Mao's blog [1].
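A sketch of the layer-fusion step underlying fused quantization: folding BatchNorm parameters into the preceding convolution so that only one (quantized) operation remains. Shapes and `eps` are illustrative assumptions; see Lei Mao's blog [1] for the full treatment.

```python
import numpy as np

def fold_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm(gamma, beta, mean, var) into conv weights w (out_ch, in_ch, kh, kw) and bias b."""
    scale = gamma / np.sqrt(var + eps)            # per-output-channel factor
    w_folded = w * scale[:, None, None, None]
    b_folded = (b - mean) * scale + beta
    return w_folded, b_folded

w, b = np.ones((4, 3, 3, 3)), np.zeros(4)
gamma, beta, mean, var = 2.0 * np.ones(4), np.zeros(4), np.zeros(4), np.ones(4)
print(fold_bn_into_conv(w, b, gamma, beta, mean, var)[1])  # folded bias
```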
This project was conducted in the Axiom Open Source Program. It is also supported by Clique2046. It's built on top of the ZK circuit primitives provided by Halo2 and Axiom's Halo2-base. During design and implementation, I also referred to many open source projects such as [3] and [6]. Without the infrastructure and help of these projects, this project could not have been born.