Quantization and fixed-point. Neural network inference is typically done in floating-point arithmetic, which is extremely expensive to emulate in the prime field of arithmetic circuits. To avoid this overhead, we focus on DNNs quantized in int8 and uint8. For these DNNs, weights and activations are represented as 8-bit integers, though intermediate computations may involve up to 32-bit integers. In these quantized DNNs, each weight, activation, and output is stored as a tuple (wquant, z, s), where wquant and z are the 8-bit integer weight and zero point, and s is a floating point scale factor. z and s are often shared for all weights in a layer, which reduces the number of bits necessary to represent the DNN. In this representation, the weight wquant represents the real number weight:
w = (wquant − z) · s
To more efficiently arithmetize the network, we replace the floating point s by a fixed point approximation a/b for a, b ∈ N and compute w via
w = ((wquant − z) · a)/b
where the intermediate arithmetic is done in standard 32-bit integer arithmetic. Our choice of lower precision values of a and b results in a slight accuracy drop but dramatic improvements in prover and verifier performance. As an example of fixed point arithmetic after this conversion, consider adding y = x1 + x2 with zero points and scale factors zy, z1, z2 and sy, s1, s2, respectively. The floating point computation:
(y − zy ) · sy = (x1 − z1) · s1 + (x2 − z2) · s2
is replaced by the fixed point computation
y ≈ (x1 - z1) * a1 / b1 * by / ay + (x2 - z2) * a2 / b2 * by / ay + zy
The addition and multiplication can be done natively in the finite field, but the division cannot. To address this, we factor the computation of each layer into dot products and create a custom gate to verify division. We further fuse the division and non-linearity gates for efficiency. We describe this process below.
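As a worked illustration of the quantized addition, here is a minimal Python sketch. It assumes each scale factor s_i is approximated by a_i / b_i and solves the floating point identity for y, which introduces a factor 1/s_y, that is b_y / a_y. The function name, the choice to combine everything into a single division, and the round-to-nearest rule are assumptions made for this example and are not taken from the source. Python integers are unbounded, so the 32-bit limit mentioned above is not enforced here.

```python
def fixed_point_add(x1, z1, a1, b1, x2, z2, a2, b2, zy, ay, by):
    """Quantized y = x1 + x2 using integers only.

    Assumes s1 ~ a1/b1, s2 ~ a2/b2, sy ~ ay/by with positive a's and b's.
    (Hypothetical helper for illustration.)
    """
    # Put both terms over the common denominator b1 * b2 * ay so that
    # only one division is needed at the very end.
    num = ((x1 - z1) * a1 * b2 + (x2 - z2) * a2 * b1) * by
    den = b1 * b2 * ay
    # Round-to-nearest integer division (den > 0), then re-add the zero point.
    return (2 * num + den) // (2 * den) + zy


def float_add(x1, z1, s1, x2, z2, s2, zy, sy):
    """Floating point reference: solve (y - zy) * sy = ... for y."""
    return ((x1 - z1) * s1 + (x2 - z2) * s2) / sy + zy


# s1 = 1/2, s2 = 1/4, sy = 1/2
y_fixed = fixed_point_add(20, 0, 1, 2, 30, 10, 1, 4, 5, 1, 2)
y_float = float_add(20, 0, 0.5, 30, 10, 0.25, 5, 0.5)
```

The additions and multiplications in `num` and `den` are the part that maps directly onto field operations; the single `//` is the step that needs separate treatment in a circuit.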
The author is discussing how to run deep neural network (DNN) inference inside a cryptographic proof system called a zk-SNARK. One challenge is dealing with floating-point numbers, which are difficult to work with in this context.
To solve this issue, the author uses a process called quantization, which converts floating-point numbers into a smaller set of integers (8-bit integers in this case). In the quantized deep neural network (DNN), each weight, activation, and output is stored as a tuple (wquant, z, s). The real number weight 'w' is calculated using the formula:
w = (wquant - z) * s
To make the calculations more efficient, the floating-point scale factor 's' is replaced with a fixed-point approximation a/b (a and b are natural numbers). The weight 'w' is then computed as:
w = ((wquant - z) * a) / b
This method uses 32-bit integer arithmetic for the intermediate values. Choosing lower-precision values of 'a' and 'b' causes a slight drop in accuracy, but it greatly improves the performance of the prover and verifier.
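To make the a/b approximation concrete, here is a small Python sketch. It uses the standard library's `Fraction.limit_denominator` to pick a and b; this is one convenient way to find such a pair and is an assumption of the example, not necessarily how the author chooses them. The function names and the default bound of 1024 are hypothetical.

```python
from fractions import Fraction


def approximate_scale(s, max_denominator=1024):
    """Return natural numbers (a, b) with a / b close to the positive scale s."""
    frac = Fraction(s).limit_denominator(max_denominator)
    return frac.numerator, frac.denominator


def dequantize_float(w_quant, z, s):
    """Original representation: w = (w_quant - z) * s."""
    return (w_quant - z) * s


def dequantize_fixed(w_quant, z, a, b):
    """Fixed-point representation: w = ((w_quant - z) * a) / b."""
    return ((w_quant - z) * a) / b


a, b = approximate_scale(0.1)
error = abs(dequantize_fixed(130, 128, a, b) - dequantize_float(130, 128, 0.1))
```

A smaller `max_denominator` gives smaller integers in the circuit at the cost of a larger approximation error, which is the accuracy-versus-performance trade-off described above.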
The author also provides an example of fixed-point arithmetic after the conversion. The floating-point computation for adding y = x1 + x2 with zero points and scale factors is replaced with fixed-point computation. This new computation involves addition and multiplication that can be performed natively in the finite field. However, division cannot be done natively, so a custom gate is created to verify division.
In summary, the author explains a technique to run DNN inference inside a zk-SNARK by converting floating-point numbers to fixed-point numbers. This is done using quantization and fixed-point arithmetic, which reduces computational complexity and improves performance while maintaining acceptable levels of accuracy.
y ≈ (x1 - z1) * a1 / b1 * by / ay + (x2 - z2) * a2 / b2 * by / ay + zy
This formula replaces the original floating-point computation for adding y = x1 + x2 with zero points and scale factors. It involves addition and multiplication that can be performed natively in the finite field. However, division cannot be done natively, so a custom gate is created to verify division, as mentioned in the original text.
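One common way to verify an integer division without performing it in the field is for the prover to supply the quotient q and remainder r, and for the circuit to check n = q * d + r together with a range check 0 <= r < d. The Python sketch below illustrates that generic idea only; it is not the author's custom gate, and the modulus `P` (the Mersenne prime 2^61 - 1) is a small stand-in for a real SNARK field, chosen for this example.

```python
P = 2**61 - 1  # stand-in prime modulus (assumption, not a real SNARK field)


def division_witness(n, d):
    """Prover side: compute quotient and remainder outside the field."""
    return divmod(n, d)


def check_division(n, d, q, r, p=P):
    """Verifier side: one field equation plus a range check on r."""
    field_ok = (q * d + r - n) % p == 0   # n = q * d + r in the field
    range_ok = 0 <= r < d                 # done via a range/lookup check in a circuit
    return field_ok and range_ok


q, r = division_witness(17, 5)
ok = check_division(17, 5, q, r)
```

The range check matters: (q, r) = (2, 7) also satisfies the field equation for 17 / 5, and only the bound on r rules it out. A real gate would also bound q so that q * d + r cannot wrap around the modulus.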
Machine Learning applications in various domains
Throughout all examples, the main issue is that we have no way to tell whether the results of these inferences were really computed by the claimed ML model, or to verify the model's claimed accuracy.
In this model you can protect the privacy of the prover's secret data and the integrity of the computation in the results simultaneously.
You can use ZKPs to address the integrity issue.
ZKPs for ML
What are the challenges of ZKML? (Machine learning inference is just one of the general-purpose computations that can be computed.)
Example: Neural Nets
VGG 16 on CIFAR-10
General purpose ZKP systems
Wat Do?
If you only want to support specific types of special operations, we can design very efficient ZKPs that go beyond the barrier of linear prover time. You can design protocols with sub-linear prover time.
ML is eating the world
Can we execute ML models trustlessly?
Trustless audits
Decentralized prompt marketplaces
Attested cameras can help - attested cameras contain hardware that signs the pixels coming off the sensor immediately upon capture. Once you have the signature, it can attest to a specific photo.
ZK-img: attesting to image edits securely and privately
Attested cameras themselves and ZK-img don't just solve the problem
Trustless Face ID - take a photo of your face with an attested camera to validate that the photo came from the real world. Then perform some image edits or cropping -> input to a face ID model. Combining these steps can produce a proof that your face embedding matches one that you uploaded previously. This can all be done privately, without revealing sensitive biometric info to service providers.
ZKPs for DNN inference - prior work for toy demonstrations of capabilities, we can scale to ImageNet used in production
ZKML: verified DNN inference - need to make many design choices from ML to proving stack
Optimized for DNNs for ZK
Optimized arithmetization
Evaluation
With ONNX compilation, we now
Input Data + Model Weights => ZKP circuit for NN => Output
circomlib-ml - a comprehensive circom library containing circuits that compute common layers in TensorFlow Keras
keras2circom - A user-friendly translator that converts ML models in Python into Circom circuits (easy to onboard new Web2/ML developers)
ZKaggle - a decentralized bounty platform for hosting, verifying, and paying out bounties, with the added benefit of privacy preservation
What if we have everything public?
Challenges to transpile NNs into ZKP circuits:
ZK-ML/linear-regression-demo (2 yrs ago)
0xZKML/zk-mnist (1.5 yrs ago)
soCathie/ ZKML (10 months ago)
zk-ml/uchikoma(5 months)
zkonduit/ezkl (1 month)
ddang/zkml (3 weeks ago)
If you want privacy, need client side ZKML - {Privacy}
Benchmark
ZK-Friendly
Folding
Model Secrecy - True ZKML
ZKML tries to solve an abstract problem with many specific concrete instantiations: produce a ZKP that an ML model ran on some input.
Generative models are trained to fool classifiers; using cryptography to encode human-level judgements. If you are using a hidden input in any ML model, that hidden input may be chosen adversarially and a downstream user can't tell.
Biometric info: a user will run the ML model client side but then send only the result. The social media app wants to know that the user ran the model honestly.
A consumer may not trust the API provider; a social media app may not trust that someone is a real user (dating apps are an example).
To generate a ZKP, you need to transform the computation so that every variable is an integer modulo a large, cryptographically chosen prime. Arithmetic addition or multiplication over a prime field is not close to a differentiable operation. The core premise of deep learning is that your model should be differentiable, or at least a discrete approximation of a differentiable model. Although some non-linearities could have somewhat different costs to implement in ZK, you are always going to have to pay to somehow reconcile this fundamentally non-differentiable prime-field object with deep learning land.
1980s theory of NN assumed smoothness everywhere
The weights are often stored in int 8 or floating point 8 version, but activations are blown up to higher precision in the intermediate step especially for non-linearities. (Ex) if you are doing softmax - numerically unstable, need high precision in the intermediates.
Ways to bridge the gap between differentiable ML and ZKPs. The challenge is that you have to convince practitioners to use a different non-linearity. People only use 5 or 6, so making them switch is a challenge.
Instead of evaluating a function, you store precomputed lookup values and take the nearest neighbor.
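The lookup idea can be sketched in Python as follows: precompute a non-linearity over every possible int8 input, quantize each output to the nearest representable value, and then treat evaluation as a table membership check. The sigmoid function, the scales `1/16` and `1/127`, and all names here are assumptions chosen for illustration; a real proof system would prove membership with a lookup argument rather than a Python dictionary.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def build_table(fn, scale_in, scale_out, lo=-128, hi=127):
    """Map each int8 input to the nearest int8 code of fn's output."""
    table = {}
    for x in range(lo, hi + 1):
        y = fn(x * scale_in)               # evaluate once, offline
        q = round(y / scale_out)           # nearest representable output
        table[x] = max(lo, min(hi, q))     # clamp to the int8 range
    return table


TABLE = build_table(sigmoid, scale_in=1 / 16, scale_out=1 / 127)


def lookup_ok(x, y, table=TABLE):
    """The check a circuit would enforce: (x, y) is a row of the table."""
    return table.get(x) == y
```

With this approach the cost of a non-linearity no longer depends on how complicated the function is, only on the size of the table.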
Do you think we get to a world where ML applications drive library composition behavior, like PyTorch or TensorFlow?
With newer systems the user base is more diverse; we are now seeing many groups develop libraries on top of Halo 2 for their specific application.
more incentive to develop tooling around matrix multiplication
Spartan stuff, like sumcheck-related systems
At Axiom, talking with smart contract application teams about their on-chain needs.
If you want to prove inference, you need to access that inference on chain. We think of models on chain like linear regression, or traditional ML such as PageRank algorithms or PCA. The space is still very early.
Taking the idea of biometric info and literally putting it on chain. Hash it and use it to authenticate to a smart contract.
Maybe you want a data marketplace where you sell private data to customers. In prompt engineering marketplaces you can prove you ran Stable Diffusion on a prompt.
It appears there are 2 camps of ZKML application research. Reminds me of the dichotomy of ZK-SNARKs for privacy vs. SNARKs for scaling.
Letting an ML model run off chain and bringing its input on-chain.
Having algorithms that are biased in different ways. One nice way to fix this is for a social media company to publish a hash of the program they use for spam detection, ranking, emphasis, and deemphasis. They publish the hash and pre-commit to reveal what the code is, except for the weights. In the meantime, they provide ZKPs to you that what you are seeing in your browser is the output of that program.
In real time this maybe isn’t as helpful, but you know that in a couple of years the program will be revealed and then people can analyze it. Realistically, it has to be delayed because algorithms have to be closed source for some time. But delay them long enough.
For this to be possible, you need to prove what you as the user see is the output of running this code. Maybe a use case of validity ML and not so much ZKML.
ZKML gets you trust and accountability. You know that a particular model was agreed upon, and even though you don't know the exact parameters of the model, you know this model is the one being run. For example, the Twitter algorithm. Even though you don’t know the specifics of the model, you know it's the same one which continues to be used. Or something like a language model used as a judge. When you go into a courtroom, you see the person of the judge. With ZKML or validity ML you have some accountability: this is a model that has judged a fair amount of cases and performed very well.
UX - When thinking about accountability for proofs, how does the end user know that the green check mark they received is something they can trust? This is hard to do without a blockchain. We expect on-chain organizations will need this.
ZKML is usually a framework that gives you verifiability. In the case of ML we have not had this before.
Imagine a factory that takes in different kinds of wood and creates different kinds of doors. This factory is basically the ML algorithm. The machines cutting the wood in the factory are the guts of this ML algorithm. This is the thing which the prover and the verifier agree on beforehand. You agree on the structure of the factory. Then the factory owner goes in and changes the knobs and the dials. These are the parameters or settings within the ML algorithm which make the factory tick. On the ZK side you have the agreement on this factory and a commitment to all the settings within the factory. Even though you don’t know exactly what these settings are, these settings are fixed. Once you have fixed the factory and the knobs and dials, there is nothing more to be done. Once you have that encoded, your logs, as long as they are the same, will produce the same type of doors afterward. All of this verification stuff can happen on chain or can be a single-party to single-party thing. In general, we like to think of public verifiability as a key thing.
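The "commitment to all the settings" in the factory analogy can be illustrated with a simple salted hash commitment. In this Python sketch a plain SHA-256 hash stands in for whatever commitment scheme a real proof system would use, and the function names and the int8 weight encoding are assumptions made for the example.

```python
import hashlib
import secrets
import struct


def commit_weights(weights, salt=None):
    """Commit to a list of int8 weights without revealing them."""
    salt = secrets.token_bytes(16) if salt is None else salt
    payload = struct.pack(f"<{len(weights)}b", *weights)  # int8 encoding
    digest = hashlib.sha256(salt + payload).hexdigest()
    return digest, salt


def verify_commitment(weights, salt, digest):
    """Check that revealed weights match a previously published digest."""
    return commit_weights(weights, salt)[0] == digest


weights = [12, -7, 127]
digest, salt = commit_weights(weights)
```

The digest can be published up front; changing any single weight afterwards changes the digest, which is what "the knobs and dials are fixed" means in practice. In a ZK setting the weights are never revealed; instead, the proof shows that the model that was run matches the published commitment.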
What ZKML is not - there are misconceptions with federated learning in which you have different people training the same algorithm on different systems and then sending them to some, but I don’t think it applies to this distributed training.
What it's not is a robust watermark - if you make an output of an LLM and you make a slight change, you can tell it came from that LLM if you were the maker. We can’t do that at all. We do the opposite: if you change any tiny bit of it, the proof fails to verify.
ZKML as a way to recognize things. You can be the wallet, the chain can learn to recognize your biometrics or other patterns of your data exhaust or authentication. If you train a language model to imitate you, and it's executed in a proof then in some sense the wallet is you because you can give yourself an agent which can’t lie about your intent. You have delegated your authority to that agent.
ZKML is not by itself - any verification on the hardware side. This by itself does not solve bugs like if someone takes the channel that information is obtained from and replaces it with something different. If you are trying to verify things that involve the physical world, you need something on the verification side aside from the proof.
ZKML - verifiability and scalability. You don’t have to have all these computers running the same computation over and over. It is something that can bring more verifiability, which is the property that's most important to Worldcoin. ZKML would be helpful for people building decentralized social media who want to hide the weights of their own algorithms.
ZK proofs have these three important properties - privacy, succinctness and verifiability. SNARKs have been used to scale blockchains and provide privacy. When it comes to ZK proofs and ML, it's not that ML doesn’t have scalability or privacy in a sense (centralized). It comes down to this verifiability component - the power for you to check that a particular model was actually run correctly.
The verifiability property can be thought of as being made up of soundness and completeness. Soundness means I can’t produce a false proof. Completeness means that if I have the truth, I can prove it. Completeness is less important for ZKML because the input is fuzzy, and the judgement is funny. If I’m having trouble proving my face, I can take a different picture and that's fine.
Different properties will be important in different use cases. For the Twitter use case mentioned before, you can give yourself the goal of making a Twitter that respects personal privacy. There is value in keeping user inputs local and doing the ML in a way that's private. You can’t make training fully private, but you can try to do something differential, maybe. For some of these other use cases, it all depends on how you use the proofs. Privacy becomes more important with applications that wind up interacting with blockchains. ZKPs give you back scalability and privacy. Privacy will end up just as important with proof of humanity applications: anti-spam, but also not revealing which person was behind an identity. It will often be possible to separate the thing into 2 stages, perhaps where one stage doesn’t need that identity anonymization.
ML is high overhead and ZK is also high overhead, but I’m not convinced the overheads actually stack. ML is fundamentally structured: its matrix-vector multiplication is linear. For many operations which are structured, there are bespoke ways to do proofs of them. Instead of doing a proof of a thing, you do proofs of a random projection of the thing. I wonder to what extent these techniques have been experimented with. Could it be that ZK on top of ML could have much lower overhead than ZK on top of many other things? If so, this may be much more viable.
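A classic instance of "proving a random projection of the thing" is Freivalds' algorithm for checking a matrix product: instead of recomputing A * B and comparing it with a claimed C, the verifier picks a random vector r and checks A * (B * r) = C * r, which needs only matrix-vector products. The Python sketch below is offered as an illustration of the general idea and is not necessarily the technique the speaker had in mind; the modulus `P` and the function names are assumptions.

```python
import random

P = 2**61 - 1  # stand-in prime modulus (assumption)


def matvec(M, v, p=P):
    """Matrix-vector product modulo p."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]


def freivalds_check(A, B, C, rounds=10, p=P, rng=random):
    """Probabilistically check A * B == C (mod p) without computing A * B."""
    n = len(B[0])
    for _ in range(rounds):
        r = [rng.randrange(p) for _ in range(n)]  # random projection
        if matvec(A, matvec(B, r, p), p) != matvec(C, r, p):
            return False                          # definitely not equal
    return True                                   # equal with high probability


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]
```

Each round costs three matrix-vector products rather than a full matrix-matrix product, and a wrong C passes a single round with probability at most 1/p.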
Asymptotically, a matrix-vector proof is O(n), asymptotically less than the creation of the witness. We have a hope that because we understand at a higher level what we are doing with the computation, versus running a low-level VM, we have the ability to get fantastic asymptotic performance.
This is why this first became possible. When we look at ZKVMs and ZK-EVMs we should be able to intuitively do a lot better. We are not concerned with simulating memory, register allocations, or execution of literal instructions. We are only concerned with matrix multiplication, convolution, non-linearities the basic bare-bones components of these modular neural net models.
For ML, you use 32- or 16-bit floating point numbers, whereas in ZK you work over finite fields, which are integers up to some large prime number, after which you cycle back to 0. This conversion is strange but still doable. Bringing the results of ML models to a blockchain allows us to say this: smart contracts on a blockchain are programmatic things that so far only needed simple operations, but ZKML models now allow you to do much more expressive things on chain: language analysis on chain, image or video analysis on chain, all still held at the security of the chain. Everyone is still verifying as if they ran the model themselves. This opens up the ability of smart contracts to process data and do more interesting things.
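The "cycle back to 0" behavior, and one common convention for representing signed integers inside a prime field, can be sketched in a few lines of Python. The modulus `P` is a small stand-in prime chosen for this example, and the convention of reading the upper half of the field as negative numbers is an assumption for illustration rather than something stated in these notes.

```python
P = 2**61 - 1  # stand-in prime modulus (assumption, not a real SNARK field)


def to_field(x, p=P):
    """Encode a signed integer as a field element in [0, p)."""
    return x % p


def from_field(v, p=P):
    """Decode, treating the upper half of the field as negative values."""
    return v - p if v > p // 2 else v


wrapped = (to_field(P - 1) + 1) % P                        # cycles back to 0
product = from_field((to_field(-3) * to_field(4)) % P)     # signed multiply
```

As long as every intermediate value stays well below p / 2 in absolute value, field arithmetic on these encodings agrees with ordinary integer arithmetic, which is why quantized models with small integers fit this setting.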
Right now we are at the 100M parameter scale, so we can do small image recognition and extremely small language models; GPT-2 will probably happen soon. If you compile an off-the-shelf GPT-2 with ONNX, you will need a big machine with 2 TB of RAM to run it. The frontier of what we can do is coming very quickly because of these asymptotic improvements. Much of the improvement comes from changing the arguments. We are going from make it right to make it fast. As you start throwing in engineering details, things move faster. There is a good chance we catch up to the state of the art maybe a year or two from now.
GPT-2 has 1.5 billion parameters. One interesting example is on-chain gaming, in a game called AI Arena. It is a Super Smash Bros. style tournament where, instead of fighting other people online, you train your AI agent to mimic your behavior and then battle another AI. It is difficult for users to trust that their models are the ones being run. You would like to know that the model you spent time training is actually the one competing in this tournament. With ZKML you can point to this verifier contract to confirm that yes, this is your model. These are areas in which ML is quite important, but the trust component is also very important. Only when these two are married does the tech make sense to us.
There are many different verticals: hardware optimized for ZK, theoretical work to create smaller proofs, pure implementation of papers and theory, and better tooling. On the hardware side, ZK is not standardized yet, so there are different proving systems competing for different use cases. In the future, we will have FPGAs and ASICs to bolster the prover's performance and allow for more intensive use cases.
We are lagging in implementing new cutting-edge research. We just have to do the engineering work; all the theoretical stuff needed to improve speed is there.
The data side tends to be lacking in ZKML. If you can't control the data source or what is going into your model, it doesn’t matter that you are running a particular model. Adversarial attacks exist in ML. We need signed or attested data sources, we need APIs where the provider says here is a signature that when I give you this response in your API call it comes from us. You have to start from the very beginning, missing any piece in the entire chain lets anyone attack something.
The signature tells you who says the image is real, the provenance of the proof, it doesn’t tell you that the image is real, however.
ZKML is one component of a complex crypto system. You have to think through all the components of the cryptography and the economics in order for the system to function.
Hardware is interesting. We are in a period in the next three years of total chaos in the proof system. Should we forget all of these ZK-EVMs and do it in NOVA instead? That is a nightmare if you are a hardware manufacturer, and you have a longer planning schedule or if you are a miner what do you need? This will be a big part of the puzzle in getting the economics of the system to improve. Can we make the proof system modular enough that proof miners can start to reduce the cost of doing this by a couple orders of magnitude?
On-chain verification implies that anyone anywhere can verify this in perpetuity. This is really powerful, and even stronger than what we need. There is a company, Gensyn, that is doing decentralized training. You are entrusting a network of GPU nodes to train your model for you. They don’t have the credibility of Amazon, so you can’t naturally trust them. They could generate a ZKP that says they are training your model or executing your model correctly. Likewise, they don’t need public verifiability; they only need to convince you. In these cases you can use weaker forms of proofs like interactive proofs, which can be a lot more concretely efficient.
EZKL - using Halo 2, PSE’s branch of Halo 2 at some point will support other proof systems. Amount of code that touches API of Halo2 is about 10%.
Modulus is focused on older and more theoretical line of work, involves layered circuits which mimic the properties of NN very well. Prover is very fast as long as compute is structured.
Trust is just I know that people who give me an AI service can prove to me what they are using. Accountability is difficult. We are in the early stages of thinking how to make the AIs accountable. You can make people who are building accountable through the legal system.
One of the fun things about ZKML is that it offers this different kind of accountability. Imagine you have a powerful server giving instructions to a weak robot. This is vulnerable to the server being corrupted and giving an erroneous instruction. With ZKML this server can only produce correct results. It is computing some function which has been defined in advance. Even if an adversary were to run some computation, they could still only prove correct executions. Maybe they can also mess with the input. But if the input is signed, then the whole supply chain becomes secure.
Maybe we could audit decisions made by autonomous vehicles. After a crash, you can prove that it has or hasn’t been tampered with or whose algorithm was run. Or if I want to delegate to my agent to act for me after I die to execute my estate now I don’t have to care who runs that agent. I know the agent will only be able to produce self-signed decisions.
The other way it bolsters trust: because most ML models take time and effort to train, folks don't want to just reveal them. You as a purchaser of an ML model may want to know beforehand that it is a good ML model. Now with ZK, you can send a test set which is representative of your inputs to the model vendor, and the vendor can generate a proof that their algorithm achieves a certain metric on this test set, for example that the model gets 98% accuracy. Now you are convinced that their model does well on your test set. We can evaluate models in advance and trust that they do what the vendor claims. This adds a bit of trust to a model marketplace, for example.