ZKML Notes

Sources

Papers

Scaling up Trustless DNN Inference with Zero-Knowledge Proofs

Articles

Presentations

High Level

How can consumers of these services trust that the service has correctly served the predictions?
In order to do so, we use a cryptographic technology called ZK-SNARKs (zero-knowledge succinct non-interactive argument of knowledge), which allow a prover to prove the result of a computation without revealing any information about the inputs or intermediate steps of the computation.
ZK-SNARKs allow an MLaaS provider to prove that the model was executed correctly post-hoc, so model consumers can verify predictions as they wish. Unfortunately, existing work on ZK-SNARKs can require up to two days of computation to verify a single ML model prediction.
In order to address this computational overhead, we have created the first ZK-SNARK circuit of a model on ImageNet (MobileNet v2) achieving 79% accuracy while being verifiable in 10 seconds on commodity hardware. We further construct protocols to use these ZK-SNARKs to verify ML model accuracy, ML model predictions, and trustlessly retrieve documents in cost-efficient ways.
Building on our efficient ZK-SNARKs, we also show that it’s possible to use these ZK-SNARKs for a variety of applications. We show how to use ZK-SNARKs to verify ML model accuracy. In addition, we also show that ZK-SNARKs of ML models can be used to trustlessly retrieve images (or documents) matching an ML model classifier. Importantly, these protocols can be verified by third parties, so can be used for resolving disputes.
The Model Consumer wants to verify the model accuracy to ensure that the Model Provider is not malicious, lazy, or erroneous (i.e., has bugs in the serving code).
To verify model accuracy, the model provider (MP) will commit to a model by hashing its weights. The model consumer (MC) will then send a test set to the MP, on which the MP will provide outputs and ZK-SNARK proofs of correct execution. By verifying ZK-SNARKs on the test set, MC can be confident that MP has executed the model correctly. After the model accuracy is verified, MC can purchase the model or use the MP as an MLaaS provider.
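
The commit-then-evaluate flow above can be sketched in a few lines. This is only an illustration under assumptions: the function names, the random salt, and the use of SHA-256 are mine (a real system would likely use a SNARK-friendly hash and check the commitment inside the proof), and the ZK-SNARK proofs themselves are not modeled.

```python
import hashlib
import struct


def commit_to_weights(weights, salt: bytes) -> str:
    """Hash a flat list of int8 weights together with a salt (hypothetical scheme)."""
    payload = salt + struct.pack(f"{len(weights)}b", *weights)
    return hashlib.sha256(payload).hexdigest()


def opening_matches(commitment: str, weights, salt: bytes) -> bool:
    """Check that (weights, salt) open the previously published commitment."""
    return commit_to_weights(weights, salt) == commitment


def accuracy(predictions, labels) -> float:
    """Fraction of test-set predictions that equal the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

In the protocol from the notes, MP would publish the commitment first, MC would then send the test set, and each returned prediction would come with a proof that it was computed from weights matching the commitment.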

How to Achieve ZKML

The most common methods of doing secure ML are with multi-party computation (MPC), homomorphic encryption (HE), or interactive proofs (IPs). As we describe, these methods are either impractical, do not work in the face of malicious adversaries (Knott et al., 2021; Kumar et al., 2020; Lam et al., 2022; Mishra et al., 2020), or do not hide the weights/inputs (Ghodsi et al., 2017b). In this work, we propose practical methods of doing verified ML execution in the face of malicious adversaries.
- MPC - One of the most common methods of doing secure ML is with MPC, in which the computation is shared across multiple parties (Knott et al., 2021; Kumar et al., 2020; Lam et al., 2022; Mishra et al., 2020; Jha et al., 2021). There are a variety of MPC protocols with different guarantees. However, all MPC protocols have shared properties: they require interaction (i.e., both parties must be simultaneously online) but can perform computation without revealing the computation inputs (i.e., weights and ML model inputs) across parties. There are several security assumptions for different MPC protocols. The most common security assumption is the semi-honest adversary, in which the malicious party participates in the protocol honestly but attempts to steal information. In this work, we focus on potentially malicious adversaries. MPC that is secure against malicious adversaries is impractical: it can cost up to 550 GB of communication and 657 seconds of compute per example on toy datasets (Pentyala et al., 2021).
- HE - Homomorphic encryption allows parties to perform computations on encrypted data without first decrypting the data (Armknecht et al., 2015). HE is deployed to preserve privacy of the inputs, but cannot be used to verify that ML model execution happened correctly. Furthermore, HE is incredibly expensive. Since ML model inference can take up to gigaflops of computation, HE for ML model inference is currently impractical, only working on toy datasets such as MNIST or CIFAR-10 (Lou & Jiang, 2021; Juvekar et al., 2018).
- ZK-SNARKs for secure ML. Some recent work has produced ZK-SNARK protocols for neural network inference on smaller datasets like MNIST and CIFAR-10. Some of these works like (Feng et al., 2021) use older proving systems like (Groth, 2016). Other works (Ghodsi et al., 2017a; Lee et al., 2020; Liu et al., 2021; Weng et al., 2022) use interactive proof or ZK-SNARK protocols based on sumcheck (Thaler, 2013) custom-tailored to DNN operations such as convolutions or matrix multiplications. Compared to these works, our work in the modern Halo2 proving system (zcash, 2022) allows us to use the Plonkish arithmetization to more efficiently represent DNN inference by leveraging lookup arguments and well-defined custom gates. Combined with the efficient software package halo2 and advances in automatic translation, we are able to outperform these methods.
Prior work on SNARKing neural networks using proof systems intended for generic computations started with the more limited R1CS arithmetization (Gennaro et al., 2013) and the Groth16 proof system (Groth, 2016), in which neural network inference is less efficient to express. In Section 4, we describe how to use this more expressive Plonkish arithmetization to efficiently express DNN inference.

Converting floating point to Fixed point is an issue

Quantization and fixed-point. Neural network inference is typically done in floating-point arithmetic, which is extremely expensive to emulate in the prime field of arithmetic circuits. To avoid this overhead, we focus on DNNs quantized in int8 and uint8. For these DNNs, weights and activations are represented as 8-bit integers, though intermediate computations may involve up to 32-bit integers. In these quantized DNNs, each weight, activation, and output is stored as a tuple (wquant, z, s), where wquant and z are the 8-bit integer weight and zero point, and s is a floating point scale factor. z and s are often shared for all weights in a layer, which reduces the number of bits necessary to represent the DNN. In this representation, the weight wquant represents the real number weight:

w = (wquant − z) · s

To more efficiently arithmetize the network, we replace the floating point s by a fixed point approximation a/b for a, b ∈ N and compute w via

w = ((wquant − z) · a)/b

where the intermediate arithmetic is done in standard 32-bit integer arithmetic. Our choice of lower precision values of a and b results in a slight accuracy drop but dramatic improvements in prover and verifier performance. As an example of fixed point arithmetic after this conversion, consider adding y = x1 + x2 with zero points and scale factors zy, z1, z2 and sy, s1, s2, respectively. The floating point computation:

(y − zy ) · sy = (x1 − z1) · s1 + (x2 − z2) · s2

is replaced by the fixed point computation

y ≈ (x1 - z1) * a1 / b1 * by / ay + (x2 - z2) * a2 / b2 * by / ay + zy

The addition and multiplication can be done natively in the finite field, but the division cannot. To address this, we factor the computation of each layer into dot products and create a custom gate to verify division. We further fuse the division and non-linearity gates for efficiency. We describe this process below.
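
As a rough illustration of the fixed-point replacement described in this section, the sketch below approximates each scale factor by a fraction a/b and then performs the quantized addition with integers only. The helper names, the denominator bound, and the round-to-nearest choice are my assumptions, not details taken from the paper.

```python
from fractions import Fraction


def approx_scale(s: float, max_den: int = 1 << 10) -> Fraction:
    """Approximate a floating-point scale s by a/b with a bounded denominator."""
    return Fraction(s).limit_denominator(max_den)


def dequantize(w_quant: int, z: int, scale):
    """w = (w_quant - z) * s, where s may be a float or a Fraction a/b."""
    return (w_quant - z) * scale


def quantized_add(x1, z1, s1, x2, z2, s2, zy, sy) -> int:
    """Solve (y - zy) * sy = (x1 - z1) * s1 + (x2 - z2) * s2 for integer y.

    All scales are Fractions, so everything up to the final division is
    integer arithmetic; a circuit would have to verify that division.
    """
    r1, r2 = s1 / sy, s2 / sy  # rescaling ratios, each a fraction
    num = ((x1 - z1) * r1.numerator * r2.denominator
           + (x2 - z2) * r2.numerator * r1.denominator)
    den = r1.denominator * r2.denominator
    return (2 * num + den) // (2 * den) + zy  # round to nearest, then shift
```

For example, with s1 = 1/2, s2 = 1/4, sy = 1/2 and all zero points at 0, adding x1 = 10 and x2 = 8 represents 5 + 2 = 7, which is stored as y = 14.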

GPT-4 Explanation

The authors discuss how to run a deep neural network (DNN) inside a cryptographic proof system called a zk-SNARK. One challenge is dealing with floating-point numbers, which are difficult to work with in this context.
To solve this issue, the author uses a process called quantization, which converts floating-point numbers into a smaller set of integers (8-bit integers in this case). In the quantized deep neural network (DNN), each weight, activation, and output is stored as a tuple (wquant, z, s). The real number weight 'w' is calculated using the formula:

w = (wquant - z) * s

To make the calculations more efficient, the floating-point scale factor 's' is replaced with a fixed-point approximation a/b (a and b are natural numbers). The weight 'w' is then computed as:

w = ((wquant - z) * a) / b

The intermediate arithmetic uses 32-bit integers. The authors choose lower-precision values of 'a' and 'b'; this causes a slight drop in accuracy, but it improves the performance of the prover and verifier.

The author also provides an example of fixed-point arithmetic after the conversion. The floating-point computation for adding y = x1 + x2 with zero points and scale factors is replaced with fixed-point computation. This new computation involves addition and multiplication that can be performed natively in the finite field. However, division cannot be done natively, so a custom gate is created to verify division.

In summary, the authors explain a technique to run a DNN inside a zk-SNARK by converting floating-point numbers to fixed-point numbers. This is done using quantization and fixed-point arithmetic, which reduces computational complexity and improves performance while maintaining acceptable levels of accuracy.

y ≈ (x1 - z1) * a1 / b1 * by / ay + (x2 - z2) * a2 / b2 * by / ay + zy

This formula replaces the original floating-point computation for adding y = x1 + x2 with zero points and scale factors. It involves addition and multiplication that can be performed natively in the finite field. However, division cannot be done natively, so a custom gate is created to verify division, as mentioned in the original text.
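
One common way to check an integer division inside an arithmetic circuit is to have the prover supply the quotient and remainder as a witness and then constrain them. The sketch below shows that idea in plain Python; it is an assumption about how such a gate could work, not a description of the paper's actual custom gate, and the prime P and the 32-bit bound are illustrative choices.

```python
P = 2**61 - 1  # illustrative prime modulus; real proving systems use much larger fields


def division_witness(a: int, b: int):
    """Prover side: compute quotient and remainder outside the circuit."""
    return divmod(a, b)


def check_division(a: int, b: int, q: int, r: int, p: int = P) -> bool:
    """Verifier-side constraints for the claim q = floor(a / b), with b > 0."""
    # In a circuit these two bounds would be range checks (e.g. via lookups).
    remainder_in_range = 0 <= r < b
    quotient_in_range = -(2**31) <= q < 2**31  # prevents wraparound tricks
    # This is the only relation that needs field arithmetic: a = q * b + r.
    relation_holds = (a - (q * b + r)) % p == 0
    return remainder_in_range and quotient_in_range and relation_holds
```

The point of the pattern is that only additions and multiplications appear in the constraint, so the division itself never has to be computed in the field.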

ZKML step flow Input -> SNARK verification

Input Layer: Begin with the input layer of the LLM, which receives input data (e.g., text or tokens) to be processed within the zk-SNARK.
Quantization: Perform quantization, converting floating-point numbers (weights, biases, and activations) to fixed-point numbers or smaller integer representations to enable more efficient processing within the zk-SNARK.
Encoding: Encode the quantized model parameters, input data, and intermediate values in the prime field used by the zk-SNARK arithmetic circuit.
Arithmetic Circuit: Process the encoded input data through the zk-SNARK arithmetic circuit, which is designed to mimic the computations of the LLM using fixed-point arithmetic and custom gates for division, non-linearity, and other operations that cannot be natively performed in the finite field.
Decoding: Decode the output of the zk-SNARK arithmetic circuit back into a more standard representation (e.g., converting fixed-point numbers to floating-point numbers).
Output Layer: The decoded output is then passed through the output layer of the LLM, which produces the final predictions or classifications.
zk-SNARK Proof Generation: Generate a zk-SNARK proof demonstrating that the LLM computation was performed correctly without revealing any sensitive information about the input data or the model parameters.
zk-SNARK Verification: A verifier checks the validity of the zk-SNARK proof, ensuring that the LLM computation was performed correctly and securely.
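
The encoding and decoding steps in this flow can be illustrated with signed integers mapped into a prime field. This is a minimal sketch under assumptions: the modulus P and the convention that values above P/2 represent negatives are illustrative choices of mine, not details from any specific proving system.

```python
P = 2**61 - 1  # illustrative prime; production systems use ~254-bit fields


def encode(x: int, p: int = P) -> int:
    """Map a signed integer to a field element in [0, p)."""
    return x % p


def decode(v: int, p: int = P) -> int:
    """Map a field element back to a signed integer (upper half = negative)."""
    return v if v <= p // 2 else v - p


def field_mul(a: int, b: int, p: int = P) -> int:
    """Multiplication as the circuit sees it: always reduced modulo p."""
    return (a * b) % p
```

As long as every intermediate value stays well below P/2 in magnitude, field arithmetic on the encoded values agrees with ordinary integer arithmetic after decoding.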

ZKPs & Application to ML (Yupeng Zhang)

Machine Learning applications in various domains

Image Processing
Playing Go
Speech Recognition
Discovering Antibiotics

Integrity issues in ML

Reproducibility - many ML models claim to have high accuracy on some datasets, but later nobody can reproduce the results at all. Ongoing debate in the ML community
Validity - Even if you have a high-quality model, how can you make sure a company is actually using that ML model in its product? (The human-guided burrito bot raises questions about the future of robot delivery)
AI fairness - how can we make sure that machine learning models are not biased? E.g., not looking at attributes such as gender or race, or approved by some authorities or experts

Throughout all examples, the main issue is that we have no way to tell whether the inference results were really computed by the claimed ML model, or whether the claimed accuracy is real

Naive solutions to integrity issues: disclosing the ML model

Difficult to do in practice because people are not willing to share their ML model - it compromises the privacy of ML models
Can we validate ML inferences and accuracy without seeing the underlying ML models?
The cryptographic solution is Zero knowledge proofs
ZKP allows the Prover to make a claim about the secret data.

In this model you can protect the privacy of the prover's secret data and the integrity of the computation in the results simultaneously

You can use ZKPs to address integrity issue.

The secret data is the prover's ML model
The public computation is the ML inference/accuracy on user input/a public testing dataset
The claim the prover is making is that the result of the computation on this model and dataset is, e.g., accuracy = 99.9%
The model remains private but the computation is publicly verified
Reproducibility - attach a proof for the claimed accuracy
Validity and Fairness - Prove that inferences are from a committed/approved ML model

ZKPs for ML

What are the challenges of ZKML? (ML inference is just one of the general-purpose computations that can be proven)

In theory there is no difference between theory and practice but in practice there is
High overhead to apply generic ZKP to ML algorithms
- Efficiency
- Scalability

Example: Neural Nets
VGG 16 on CIFAR-10

15 million parameters in the model
2^32 gates for inference

General purpose ZKP systems

libSNARK: scales to 2^25 gates with 16 GB memory (prover time 30 min)
Virgo (IP-based): scales to 2^26 gates (prover time 1 minute)
If you want to apply these ZKP schemes to VGG 16, it will take hours to days to generate a ZKP for a single inference

Wat Do?

Design special purpose ZKP for common operations in ML computations
- Fully connected layers (matrix multiplications) or convolution layers (2D convolutions) for image processing

If you only want to support each type of special operation, we can design very efficient ZKPs that go beyond the barrier of linear prover time. You can design protocols with sub-linear prover time.

$C(x, y) = \sum_{i=1}^{n} A(x, i) \cdot B(i, y)$

Can apply a sumcheck protocol to check the relationship for matrix multiplication: additional prover time is $O(n^{2})$, proof size is $O(\log n)$
Faster than computing the result in $O(n^{3})$
This is possible because we don't verify each step of the computation; we only need an algorithm that validates the result once it has already been computed. That is why, for these special operations, we can have protocols whose additional prover time is lower than the cost of computing the result
Compatible with GKR protocol
Can Compute Convolution using FFT
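
The idea that checking a matrix product can be cheaper than computing it can be seen with Freivalds' algorithm, a much simpler randomized check than the sumcheck protocol from the talk. It is swapped in here only to illustrate the principle, it is not the zkCNN protocol, and the modulus P is an illustrative assumption.

```python
import random

P = 2**61 - 1  # illustrative prime modulus


def matvec(M, v, p=P):
    """Matrix-vector product modulo p, O(n^2) for an n x n matrix."""
    return [sum(a * b for a, b in zip(row, v)) % p for row in M]


def freivalds(A, B, C, rounds=1, p=P, rng=random):
    """Probabilistically check A @ B == C using only matrix-vector products.

    Each round costs O(n^2) instead of the O(n^3) needed to recompute A @ B.
    A wrong C passes a single round with probability at most 1/p.
    """
    n_cols = len(B[0])
    for _ in range(rounds):
        r = [rng.randrange(p) for _ in range(n_cols)]
        if matvec(A, matvec(B, r, p), p) != matvec(C, r, p):
            return False
    return True
```

The verifier never multiplies A by B; it only compares A(Br) with Cr for a random vector r, which is the same "validate the result, don't redo the work" principle the notes describe.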

Efficient Sumcheck for FFT

FFT: evaluating a polynomial at powers of a root of unity $w$, where $w^{n} = 1 \bmod p$
- An efficient sumcheck protocol with prover time $O(n)$, proof size $O(\log n)$, verifier time $O(\log^{2} n)$
- faster than computing the result in $O(n \log n)$
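
To make "evaluating a polynomial at powers of a root of unity" concrete, the sketch below compares a direct evaluation with a recursive radix-2 FFT over a small prime field. The tiny parameters (p = 17, w = 4, n = 4) are chosen by me purely for illustration, and the sumcheck protocol itself is not shown.

```python
def evaluate_at_powers(coeffs, w, p):
    """Directly evaluate the polynomial at w^0, w^1, ..., w^(n-1): O(n^2)."""
    n = len(coeffs)
    return [sum(c * pow(w, k * j, p) for j, c in enumerate(coeffs)) % p
            for k in range(n)]


def ntt(coeffs, w, p):
    """Radix-2 FFT over F_p: same outputs in O(n log n); n must be a power of 2."""
    n = len(coeffs)
    if n == 1:
        return coeffs[:]
    even = ntt(coeffs[0::2], w * w % p, p)
    odd = ntt(coeffs[1::2], w * w % p, p)
    out = [0] * n
    twiddle = 1
    for k in range(n // 2):
        out[k] = (even[k] + twiddle * odd[k]) % p
        out[k + n // 2] = (even[k] - twiddle * odd[k]) % p
        twiddle = twiddle * w % p
    return out
```

Here 4 is a 4th root of unity modulo 17 because 4^4 = 256 = 15 * 17 + 1, so both functions evaluate the same polynomial at 1, 4, 16, and 13.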

Additional Optimizations

Performance

Prover 176x faster than prior work (vCNN)
2-3 orders of magnitude faster than generic ZKP
Verification faster than computing
Proof size smaller than the model

ZK Decision Trees

Summary

zkCNN efficient ZKP protocols
- Matrix multiplication
- Convolution
- ReLU and max pooling
ZK decision Tree

Scaling Trustless DNN Inference with ZK-SNARKs to GPT, ResNet, and more

ML is eating the world
- ChatGPT
- Google Bard AI
- Bing Search
- Stable Diffusion
- Midjourney
Can we execute ML models trustlessly?
- MPC & HE - can give privacy and validity, no interaction required, high compute overhead
- ZKPs - Prove computation can happen correctly, anyone can verify

Proof can be validated by any verifier after the fact
Proofs can be computed after the computation is done

Applications

Trustless audits
- prove that you ran the FDA regulated model (medical ML)
  Prove that my CT timeline isn't biased
- Proof of Training - commit to training data. Commit to images you are using for ML medical model. Produce ZK-SNARKs of training to prove model was trained correctly
- Test-time trustless audit
  $F (H_{1}, H_{2}, . . ., H_{n}; W)$
  - Compute audit functions over data, weights
  - Want to make sure data isn't biased (twitter algo)
  - Model performs well
  - Training data contains no copyrighted images
Decentralized prompt marketplaces

Prompts are valuable and difficult!
Proof of prompts - take hidden inputs and generate ZK-SNARK that produces an output
- purchase prompt and can modify for particular use cases

Trustless biometric identification/fighting deepfakes
- Deepfakes are images or video that modify what a person looks like for a malicious purpose or to spread misinformation, this is different than harmless uses of generative AI.
- Deepfakes are on the rise
  - used by state actors to spread misinformation
  - trick businesses into miswiring funds
  - trick consumers into being scammed

Attested cameras can help - attested cameras contain hardware that signs the pixels coming off the sensor immediately upon capture. Once you have the signature, it attests to a specific photo.
- Photo takers want to edit image privately - crop out specific information
- If you want to do in a privacy preserving way you can't reveal the original image
- In some sense no way to fully block against physical attacks but want to raise the cost of performing the attacks beyond being able to do attacks entirely in software (eg. plastic surgery)
ZK-img: attesting to image edits securely and privately
- given the signature or commitment to original image ZK-img can take this as a hidden witness and output the edited image that is done using ZK-SNARKs so the consumer of the photo can verify that the edits were done honestly (didn't run an AI method that swapped out the face)
Attested cameras themselves and ZK-img don't just solve the problem
- ZK-img needs to be combined with other technology. Decentralized ways of tracking signatures
  - registry of images and proofs. Whenever someone wants to consume image they can consult registry
  - Attesting to multiple edits - may need to use multiple proofs
Trustless Face ID - take a photo of your face with an attested camera to validate that the photo came from the real world. Then perform some image edits or cropping -> input to a face ID model. Combining these steps can produce a proof that your face embedding matches one that you uploaded previously. This can all be done privately without revealing sensitive biometric info to service providers

Open-Sourcing ZKML

ZKPs for DNN inference - prior work gave toy demonstrations of capabilities; we can scale to ImageNet-scale models used in production
ZKML: verified DNN inference - need to make many design choices from ML to proving stack
- choose correct architecture
- Arithmetize DNN computation efficiently
- leveraging proving system for high efficiency
Optimized DNNs for ZK
- quantized models
- optimize layout for framework
- can achieve high accuracy
Optimized arithmetization
- linear layers via custom gates (polynomial constraints)
- Non-linearities via lookups
- Fused fixed-point arithmetic
Evaluation
- MobileNet v2
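
The "non-linearities via lookups" item can be illustrated with a precomputed table of allowed (input, output) pairs. This sketch is an assumption about the general shape of a lookup argument, written in plain Python; the table contents and the int8 range are illustrative and nothing here is halo2 API.

```python
# Every valid (x, relu(x)) pair for int8 inputs, computed once ahead of time.
RELU_TABLE = frozenset((x, max(x, 0)) for x in range(-128, 128))


def relu_lookup_ok(x: int, y: int) -> bool:
    """Accept a claimed ReLU evaluation only if the pair is in the table.

    In a proving system the membership test is done by a lookup argument,
    so the non-linearity never has to be expressed as polynomial constraints.
    """
    return (x, y) in RELU_TABLE


def check_layer(inputs, outputs) -> bool:
    """Check a whole activation vector, one lookup per element."""
    return len(inputs) == len(outputs) and all(
        relu_lookup_ok(x, y) for x, y in zip(inputs, outputs)
    )
```

A side effect worth noting is that the lookup also acts as a range check: an input outside the int8 range has no row in the table and is rejected.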

Applications described depend on being able to compute these ZK-SNARKs in a reasonable amount of time

Compared to $85,000 for a moderate size dataset

Conclusion

We are increasingly interacting with digital systems
ZKPs provide trust in the face of adversaries
- for ML Models
- Against Deepfakes
Framework is open-source

EZKL - ZKML Devcon talk

Why ZKML?

Gives the blockchain eyes to perceive the physical world - make decisions about physical reality, satisfy human intents
Makes it possible for a human, not a field element, to own digital assets - hey Ethereum please transfer 10 ETH to Vitalik.eth
Lets Smart contract exercise judgements - deal with any kind of ambiguous situation decide if a contract is satisfied

EZKL: Turn an ONNX model into a ZKP

Prove + verify at command-line (or binary, contract, WASM)
Adding layers daily, enough for small production models
Performance improving 2~8x per month
Focused on feature completeness, then optimization
Apache 2.0

Can define a forward model where x, y, z are tensors with shapes determined at runtime

Can define arbitrary functions (matrix multiplications, non-linearities, powers) and compose them; the tool will then determine a quantization strategy, figure out the runtime shape of the tensors, and translate that into something that can be run as a ZKP

def forward(self, x, y, z):
    x = self.sigmoid(self.connc2(x + y @ x**2 - self.relu(z))) + 2
    return x

ZKML allows scalable automated Oracles

Ingesting some kind of signed data - rate limit or signed off (attestation)
Text models or image classification make a decision about what the data was
On-chain verification feeds back into attestation data loop can be used in the next model

Signed Ingestion - Authenticated content is here and verifiable in a ZK-SNARK

HTTP: SXG (signed HTTP standard promulgated by Google), signed AMP, signed endpoints (Cloudflare one-click SXG, nginx)
Email (DKIM)- Domain Keys Identified Mail is a method used to authenticate email messages. It helps ensure that the email was actually sent by the domain it claims to be from, and that its content has not been tampered with during transit
Images at the publisher (NYT signs off on imagery): C2PA, Images at the camera
Third-party notaries (Lit, TLS Notary, Deco) create signatures as semi-trusted third party
All use standard signature schemes (ECDSA, RSA, Ed25519) that are now verifiable in ZK-SNARKs and/or on-chain
We need an https-like push to SIGN YOUR DATA

Test Models Image classification - Ontogeny recapitulates phylogeny roadmap

With ONNX compilation, we now

Download the next model in the history of AI (from MNIST to Stable Diffusion)
Fix any model size or quantization problems, implement any new nodes or gadgets
Repeat
Scale with Optimization, Aggregation, Recursion, Fusions
- Optimization -
- Aggregation - a tool for combining multiple proofs into one proof and checking once
- Recursion - being able to verify the last STARK inside the new one, lets you make a separate proof for each layer. No memory constraint per se but a money constraint
- Fusion - once we have a higher-level understanding of the programmer's intent (the computational circuit they are expressing in Python), it's easy to swap in sophisticated ZK arguments: instead of looking at the level of constraints, look at the level of MSM or convolution arguments, and make this invisible to the dev

On-chain verification

Constrained by precompiles we have on chain in Ethereum
Stage 1 - input easy to make and hard to verify proofs
Stage 2- those proofs are aggregated with strategy requiring hefty machine (450 GB RAM)
Stage 3 - Hard to make easy to verify proof (600k gas on chain)

ZKML will be table stakes for chains for next 5-10 years

Delivering on the promise of blockchain to a mass audience will require more robust but still-decentralized identity solutions
- full identity solutions are years off, but growth will be fast
ZKML Oracles will be simpler, faster, and much more scalable
- put arbitrary off-chain data on chain
- Opens the firehose to get data on chain
A ZKML Model is a 'smart judge' that can interpret ambiguous events…
- ZK KYC
  - Prove the person and id match, and the id is not sanctioned
  - Regulators won't accept as KYC, but
  - Could have prevented tornado sanctions
- Prediction Markets
  - Classifying text into few classes possible with small models
  - Construct a smart contract that pays if a news story classifies to the predicted outcome (election outcome, hurricane intensity, covid variant)
  - Anyone can download signed story, run model, submit proof
- Gut check for smart contracts
  - SC or abstracted account adds a zkml fraud/ spam check for unusual behavior (rate limit with proof of humanity, weak Sybil protection)
- Put the A in DAO
  - Now: humans judge, vote, signatories use multisig
  - Replace with on-chain AI automation, e.g. for contract fulfillment
- MPC + ZK: Genetic screening
  - Patient wants a prediction (eg. chance of developing a genetic disorder), but to check anonymously, choose whether and to whom to reveal
  - Screening models are trained on controlled data, and cannot be publicly shared
  - Model should be private to model owner, data to patient
  - Model owner and patient can prove inference in MPC, revealing only outcome, and patient gets a certified prediction
- Differential Privacy + ZKML: Census
  - Secret real data is committed to publicly (revealing nothing) at regular intervals
  - Server creates differentially-private noisy marginal summary on which clients are free to prototype analysis
    - Client iterates locally on summary, decides on model M and sends to data owner
  - Server runs the model in ZK on the real full-table data, returns the result to the client, and proves to the client:
    - The real data matches the commitment
    - The real data was used to create the noisy marginal summary
    - When Model M was run on the real data, it produced the returned result
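
The census flow above has two non-ZK building blocks that are easy to sketch: a public commitment to the secret table and a differentially private noisy marginal. Everything below is an illustrative assumption (hash-based commitment, Laplace mechanism with sensitivity 1, table stored as a list of dicts); the three ZK proofs listed above are not modeled.

```python
import hashlib
import json
import random


def commit(table, salt: bytes) -> str:
    """Publicly commit to the secret table without revealing it (hypothetical scheme)."""
    encoded = json.dumps(table, sort_keys=True).encode()
    return hashlib.sha256(salt + encoded).hexdigest()


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_marginal(table, column: str, epsilon: float, rng: random.Random):
    """Counts per value of `column`, each perturbed with Laplace(1/epsilon) noise."""
    counts = {}
    for row in table:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return {value: count + laplace(1 / epsilon, rng) for value, count in counts.items()}
```

The client would prototype on the output of `noisy_marginal`, while the server's later proofs would tie both that summary and the final model result back to the table behind the commitment.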

Unraveling ZKML Present Realities and Future Horizons in Privacy-Preserving AI

What is ZKML

Input Data + Model Weights => ZKP circuit for NN => Output

We need ZK but don't want everything to be private
- first use case hold model weights public but keep input data private
  - Face recognition (don't want to put facial features on chain, but might trust the model)
- Public input data, but private model weights (an asset or IP for companies)
  - How to hide model weights, keep model proprietary, but everyone will trust the legitimacy of the model. Since the input data is public can make sure the data is correct. You know company is being consistent running the same model over and over
- Private input public model
  - biometric authentication, e.g. SC wallet?
  - private image/data marketplace (sell an image to you how do I sell you image then?)
- Public input private model
  - algorithm for monetization/marketplace e.g. decentralized Kaggle?

ZKML POC

circomlib-ml - a comprehensive circom library containing circuits that compute common layers in TensorFlow Keras
keras2circom - A user-friendly translator that converts ML models in Python into Circom circuits (easy to onboard new Web2/ML developers)
ZKaggle - a decentralized bounty platform for hosting, verifying, and paying out bounties, with the added benefit of privacy preservation
What if we have everything public?
- We need some form of a rollup even for computation - perform off-chain computation and make everything public showing computation was done correctly

ZKML Timeline

Challenges to transpile NNs into ZKP circuits:

Floating-point weights -> fixed-point arithmetic
1. Scale it up and truncate decimal numbers
2. Quantize it - find min/max of range and slice into bits
Model size depth - difficult to have big or deep model because it increases prover time
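
The two floating-point to fixed-point options listed above can be sketched as follows. The function names, the number of fractional bits, and the rounding choices are my assumptions; real toolchains differ in the details.

```python
def scale_and_truncate(w: float, frac_bits: int = 8) -> int:
    """Option 1: multiply by 2^frac_bits and drop the remaining decimals."""
    return int(w * (1 << frac_bits))  # int() truncates toward zero


def affine_quantize(weights, bits: int = 8):
    """Option 2: map the [min, max] range of the weights onto 2^bits levels."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    quantized = [min(levels, max(0, round(w / scale) + zero_point)) for w in weights]
    return quantized, zero_point, scale


def dequantize(quantized, zero_point: int, scale: float):
    """Recover approximate real values: w = (q - z) * s."""
    return [(q - zero_point) * scale for q in quantized]
```

Option 1 keeps a fixed resolution regardless of the data, while option 2 adapts the resolution to the observed range, which is why the choice affects model accuracy as well as circuit cost.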

What has been done?

ZK-ML/linear-regression-demo (2 yrs ago)
- By Peiyuan Liao @ Linear A
- Written in circom
- LR only
0xZKML/zk-mnist (1.5 yrs ago)
- By 0xZKML @ 0xPARC
- written in circom
- Final dense layer only
soCathie/ ZKML (10 months ago)
- By cathie So @ PSE
- Put full convolutional NN onto ZKP circuit
zk-ml/uchikoma(5 months)
- By Peiyuan Liao @ Linear A
- written in circom
- transpiler for non-fp runtime
- AIGC as NFTs
zkonduit/ezkl (1 month)
- Written in Halo 2
- Major update: model with 100M params
ddang/zkml (3 weeks ago)
- written in halo 2
- In order to prove GPT-2 Model in ZKP you need around 1 TB of RAM on server side
If you want privacy, need client side ZKML - {Privacy}

What can be done?

Benchmark
- Currently ZK oriented (e.g. prover time, verifier time)
- Model accuracy is needed not just ZK benchmarks - should also be part of the benchmark (truncating or quantizing methods impact the benchmarking)
ZK-Friendly
- Weightless/non-floating point ML paradigms - do these exist? Literature in the 1970s have many models using integer weights (identifying fingerprints for example)
- eg. "Zero Gravity" the winning project at ZK Hack Lisbon
Folding
- Nova
- Sangria
- Project "zator" verifies a 512-layer CNN (design a model so each block is the same structure 512-> 256 etc)
- Could be impractical - hard to tell a Web 2 or ML company to change model architecture for folding.
- Batch Inference Folding
  - Applying the same model/circuit over a large dataset
  - Minimum ML modifications
Model Secrecy - True ZKML
- Functional commitments - have ID or hash which represents what the circuit or function does
  - Model architecture is currently public - weights are secret today, for smaller models we need model secrecy
  - How do we hide the circuit?
  - zkVM? a solution?
- MPC
  - For federated learning or verifiable training

Where ZK & ML intersect

ZKML tries to solve one abstract problem with many specific concrete instantiations: produce a ZKP that an ML model ran on some input

inherits the nice properties of ZKPs

Why ZKML?

ML provider may have some model weights they want to keep hidden
API provider can use ZK techniques to prove model ran as expected
Cloud provider/API provider hacked - relevant for medical predictions. If you send proof you know model is run correctly
But there could be bugs in the model
Model provider may be lazy and, to save money, run a smaller model that is not what you expect

Challenges for Humans

Challenging in areas where as a human you can’t evaluate the output
You as a human shouldn’t be overriding the model so you want the gold standard but not a cheap approximation

How?

Compute a hash of the image, verify the hash with the attested sensor, only reveal the hash at the end
weights = parameters on function
whether an image or voice is authentic is becoming more difficult to determine.

Generative models are trained to fool classifiers using cryptography to encode human level of judgements. If you are using a hidden input in any ML model that hidden input may be chosen adversarially a downstream user cant tell

biometric-info; as user will run ml model client side but then send result. Social media wants to know that ran model honestly.
consumer may not trust API provider, social media app may not trust someone is user (dating apps an example)

Linear algebra and non-linearities

Proof systems that are good for matrix multiplications are not good for non-linearities and vice versa
Trading off proving time for model accuracy

Challenges of NN in a ZKP

To generate a ZKP you need to transform the computation so that every variable is an integer modulo a large, cryptographically chosen prime. Addition or multiplication over a prime field is not close to a differentiable operation. The core premise of deep learning is that your model should be differentiable, or at least a discrete approximation of a differentiable model. Although some non-linearities could have somewhat different costs to implement in ZK, you are always going to have to pay to somehow reconcile this fundamentally non-differentiable prime field object with deep learning land
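
A small experiment shows how far prime-field arithmetic is from the smooth real-number arithmetic that deep learning assumes. The prime P is an illustrative choice of mine, and `pow(b, -1, p)` (modular inverse) needs Python 3.8 or newer.

```python
P = 2**61 - 1  # illustrative prime modulus


def field_div(a: int, b: int, p: int = P) -> int:
    """Division in F_p: multiply by the modular inverse of b."""
    return a * pow(b, -1, p) % p


def field_add(a: int, b: int, p: int = P) -> int:
    """Addition in F_p wraps around at p."""
    return (a + b) % p
```

When b divides a, `field_div` agrees with ordinary division (6 / 2 gives 3), but 7 / 2 gives a field element near 2^60 instead of anything close to 3.5, and adding 1 to P - 1 wraps around to 0. Nearby inputs can therefore map to wildly different outputs, which is the non-differentiability the notes refer to.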

1980s theory of NN assumed smoothness everywhere

The weights are often stored in int 8 or floating point 8 version, but activations are blown up to higher precision in the intermediate step especially for non-linearities. (Ex) if you are doing softmax - numerically unstable, need high precision in the intermediates.

Ways to bridge the gap b/w differentiable ML and Zkp. Challenges is u have to convince practitioners to use a different non-linearity. Only 5 or 6 ppl use making them switch is a challenge.

Quantization techniques are the focus of work on ML for edge devices, to save power on cell phones.
Roughly speaking, the difficulty of implementing inference in ZK can be proxied by how much battery power a model takes.
People have been working on quantizing models and reducing compute; leverage that work to pick the best model for ZK.
The proving stack is Halo2, which supports lookup tables; these are helpful for non-linearities.
Using a production system like Halo2 gives access to better tooling and real-world implementations.
The dataset is called ImageNet, one of the largest datasets in ML.
- the input image matrix is 224 x 224

Instead of evaluating a function, you store precomputed values in a lookup table and use the nearest entry.
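A rough sketch of the lookup idea for a non-linearity, in plain Python rather than a Halo2 circuit. The scale, the input range, and the names SIGMOID_TABLE and sigmoid_lookup are assumptions made for illustration. The table is built once, and evaluating the function becomes a clamp plus a dictionary lookup.

```python
import math

SCALE = 16                        # hypothetical: 4 fractional bits
LO, HI = -8 * SCALE, 8 * SCALE    # quantized inputs covering [-8, 8]

# Precompute the quantized sigmoid for every representable input.
SIGMOID_TABLE = {
    q: round(SCALE / (1.0 + math.exp(-q / SCALE)))
    for q in range(LO, HI + 1)
}


def sigmoid_lookup(q: int) -> int:
    """Return quantized sigmoid(q / SCALE) using the nearest table entry."""
    q = max(LO, min(HI, q))       # clamp out-of-range inputs to the table edge
    return SIGMOID_TABLE[q]


print(sigmoid_lookup(0))          # quantized 0.5, i.e. SCALE // 2
```

In a proof system with lookup arguments, the prover would show that each (input, output) pair appears in the committed table instead of arithmetising the exponential directly.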

Do you think we get to a world where ML applications drive library composition behavior, as with PyTorch or TensorFlow?

Axiom is writing fixed-point math libraries.
Taking models in TensorFlow and turning them into ZKPs.
Circom was developed to write the Hermez zk-rollup.

With newer systems the user base is more diverse; we are now seeing many groups develop libraries on top of Halo2 for their specific applications.

more incentive to develop tooling around matrix multiplication

Spartan and similar sumcheck-based systems.

At Axiom, talking with smart contract application teams about their on-chain needs.

If you want to prove inference, you need to access that inference on chain. We think about models on chain like linear regression, or traditional ML such as PageRank-style algorithms or PCA. The space is still very early.

Elements of statistical learning

Many algorithms can be applied on chain today with the scale of data that is available.
Trevor Hastie, Robert Tibshirani
2012: seismic shift in ML

Taking the idea of biometric info and literally putting it on chain. Hash it and use it to authenticate to a smart contract.

Maybe you want a data marketplace where you sell private data to customers. In prompt engineering marketplaces, you can prove you ran Stable Diffusion on a prompt.

Zkpod ai with microphone attestation

Random thoughts

It appears there are 2 camps of ZKML application research. Reminds me of the dichotomy of ZK-SNARKs for privacy vs. SNARKs for scaling.

On the one hand you have the notion of proving model integrity; reproducibility, validity, & fairness. Powerful idea can be used for many things, quantization or other optimization issues aside.
On the other hand, using ZKML to be a preference master that blockchain users interface with, LLM as a rollup etc. Exciting but seems beyond realization even in 18 months.

Notes from Zuzalu Panel on ZKML

Framework for AI primitive stack

Interface - how you interact with the AI (e.g. a chatbot)
Data - many places to decentralize
Models & Compute - training the model, taking preprocessed data through the model and teaching it how you want it to behave.
Inference - taking the model, putting it in the real world and seeing how it behaves (e.g. a model trained on German shepherds sees a Boston terrier and can still say it's a dog).
Governance - decentralized AI

Letting an ML model run off chain and bringing its input on-chain.

Why is this important?

Algorithms are biased in different ways. One nice way to address this: a social media company would publish a hash of the program they use for spam detection, ranking, emphasis, and de-emphasis. They publish the hash and pre-commit to revealing what the code is, except for the weights. In the meantime, they provide ZKPs to you that what you are seeing in your browser is the output of that program.

In real time this maybe isn't as helpful, but you know that in a couple of years the program will be revealed and then people can analyze it. Realistically, the reveal has to be delayed because algorithms have to be closed source for some time. So the reveal is delayed just long enough.

For this to be possible, you need to prove that what you as the user see is the output of running this code. This may be a use case for validity ML and not so much ZKML.
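The publish-a-hash-now, reveal-the-code-later flow can be sketched as a plain hash commitment. This is only the commit and reveal part, assuming SHA-256 with a random nonce; the ZKP that ties each served output to the committed program is out of scope here, and the names commit and verify_reveal are hypothetical.

```python
import hashlib
import secrets


def commit(program_bytes: bytes) -> tuple[str, bytes]:
    """Return (public commitment, secret nonce) for a program.

    The nonce keeps the commitment from being brute-forced when the
    program has low entropy. It is kept private until reveal time.
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + program_bytes).hexdigest()
    return digest, nonce


def verify_reveal(commitment: str, nonce: bytes, program_bytes: bytes) -> bool:
    """Check that a revealed program matches the earlier commitment."""
    return hashlib.sha256(nonce + program_bytes).hexdigest() == commitment


# Today: the company publishes only the digest.
digest, nonce = commit(b"ranking program v1")

# Years later: the company reveals nonce and program, and anyone can check.
print(verify_reveal(digest, nonce, b"ranking program v1"))
```

Anyone who saved the published digest can run the check at reveal time, so the company cannot swap in a different program after the fact.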

ZKML gets you trust and accountability. A particular model was agreed upon, and even though you don't know its exact parameters, you know this model is the one being run. For example, the Twitter algorithm: even though you don't know the specifics of the model, you know it's the same one that continues to be used. Another example is a language model used as a judge. When you go into a courtroom, you see the person of the judge. With ZKML or validity ML you have similar accountability: this is the model that judged a fair amount of cases and performed very well.

UX - When thinking about accountability for proofs, how does the end user know that the green check mark they received is something they can trust? This is hard to do without a blockchain. We expect on-chain organizations will need this.

ZKML is usually a framework that gives you verifiability. In the case of ML we have not had this before.

For ZKML the best analogy

Imagine a factory that takes in different kinds of wood and creates different kinds of doors. This factory is basically the ML algorithm. The machines cutting the wood in the factory are the guts of this ML algorithm. This is the thing which the prover and the verifier agree on beforehand: you agree on the structure of the factory. Then the factory owner goes in and changes the knobs and the dials. These are the parameters or settings within the ML algorithm which make the factory tick. On the ZK side you have the agreement on this factory and a commitment to all the settings within the factory. Even though you don't know exactly what these settings are, they are fixed. Once you have fixed the factory and the knobs and dials, there is nothing more to be done. Once you have that encoded, your logs, as long as they are the same, will produce the same type of doors afterward. All of this verification can happen on chain or can be a single party to single party thing. In general, we like to think of public verifiability as a key thing.

ZKML is not

What ZKML is not - there are misconceptions around federated learning, in which you have different people training the same algorithm on different systems and then sending the results to some central party, but I don't think ZKML applies to this distributed training.

What it's not is a robust watermark - if you take an output of an LLM and make a slight change, you can still tell it came from that LLM if you were the maker. We can't do that at all. We do the opposite: if you change any tiny bit of it, the proof fails to verify.

ZKML as a way to recognize things. You can be the wallet, the chain can learn to recognize your biometrics or other patterns of your data exhaust or authentication. If you train a language model to imitate you, and it's executed in a proof then in some sense the wallet is you because you can give yourself an agent which can’t lie about your intent. You have delegated your authority to that agent.

ZKML is not, by itself, any verification on the hardware side. It does not solve problems like someone taking the channel that information is obtained from and replacing it with something different. If you are trying to verify things that involve the physical world, you need something on the verification side aside from the proof.

ZKML - verifiability and scalability. You don't have to have all these computers running the same computation over and over. It can bring more verifiability, which is the property that's most important to Worldcoin. ZKML would be helpful for people building decentralized social media who want to hide the weights of their own algorithms.

ZK proofs have these three important properties - privacy, succinctness and verifiability. SNARKs have been used to scale blockchains and provide privacy. When it comes to ZK proofs and ML, it's not that ML doesn't have scalability or privacy in a sense (centralized). It comes down to this verifiability component - the power for you to check that a particular model was actually run correctly.

The verifiability property can be thought of as being made up of soundness and completeness. Soundness means I can't produce a false proof. Completeness means that if I have the truth, I can prove it. Completeness is less important for ZKML because the input is fuzzy, and the judgement is fuzzy. If I'm having trouble proving my face, I can take a different picture and that's fine.

Different properties will be important in different use cases. For the Twitter use case mentioned before, we can give ourselves the goal of making a Twitter that respects personal privacy. There is value in keeping user inputs local and doing the ML in a way that's private. You can't make training fully private, but you can try to do something with differential privacy, maybe. For some of these other use cases, it all depends on how you use the proofs. Privacy becomes more important with applications that wind up interacting with blockchains. ZKPs give you back scalability and privacy. Privacy will end up just as important with proof-of-humanity applications: anti-spam, but without revealing which person is behind an identity. It will often be possible to separate the thing into 2 stages, where perhaps one stage doesn't need that identity anonymization.

ML is high overhead and ZK is also high overhead, but I'm not convinced the overheads actually stack. ML is fundamentally structured: it is matrix-vector multiplication, which is linear. For many operations which are structured, there are bespoke ways to do proofs of them. Instead of doing a proof of a thing, you do proofs of a random projection of the thing. I wonder to what extent these techniques have been experimented with. Could it be that ZK on top of ML is much lower overhead than ZK on top of many other things? If so, this may be much more viable.
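One classical instance of "proofs of a random projection of the thing" is Freivalds' check for matrix products: rather than recomputing A·B, the verifier multiplies both sides by a random vector. The sketch below is not zero-knowledge and is not taken from any ZKML system; the modulus P and the function names are assumptions for illustration.

```python
import random

P = 2**61 - 1  # hypothetical prime modulus for the check


def matvec(M, v):
    """Matrix-vector product over the integers mod P."""
    return [sum(m * x for m, x in zip(row, v)) % P for row in M]


def freivalds_check(A, B, C, rounds=10, rng=random):
    """Probabilistically check that A @ B == C (mod P).

    Each round costs three matrix-vector products instead of a full
    matrix-matrix product. A wrong C passes one round with probability
    at most 1/P.
    """
    n = len(B[0])
    for _ in range(rounds):
        r = [rng.randrange(P) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False
    return True


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]

rng = random.Random(0)
print(freivalds_check(A, B, C_good, rng=rng))
print(freivalds_check(A, B, C_bad, rng=rng))
```

Sumcheck-style protocols push the same intuition further, which is one reason structured linear algebra can be cheaper to prove than arbitrary computation.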

Asymptotically, a matrix-vector proof is O(n), which is asymptotically less than creation of the witness. We have hope that, because we understand at a higher level what we are doing with the computation versus running a low-level VM, we can achieve fantastic asymptotic performance.

This is why this first became possible. When we compare with ZKVMs and ZK-EVMs, we should intuitively be able to do a lot better. We are not concerned with simulating memory, register allocations, or execution of literal instructions. We are only concerned with matrix multiplication, convolution, and non-linearities: the basic bare-bones components of these modular neural net models.

For ML, you use 32-bit or 16-bit floating point numbers, whereas in ZK you work over finite fields, which are integers up to some large prime number, after which you cycle back to 0. This conversion is strange but still doable. Bringing the results of ML models to a blockchain changes what smart contracts can do. Smart contracts are programmatic things that used to be limited to simple operations, but ZKML models allow you to do much more expressive things on chain: language analysis, image or video analysis, all still held to the security of the chain. Everyone is still verifying as if they ran the model themselves. This opens up the ability of smart contracts to process data and do more interesting things.

Right now we are at 100M parameter scale, so we can do small image recognition and extremely small language models; GPT-2 will probably happen soon. If you compile an off-the-shelf GPT-2 from ONNX, you will need a big machine to run it, around 2 TB of RAM. The frontier of what we can do is advancing very quickly because of these asymptotic improvements. Much of the improvement comes from changing the arguments. We are going from make it right to make it fast. As you start throwing in engineering details, things move faster. There is a good chance we catch up to today's state of the art maybe a year or two from now.

GPT-2 has 1.5 billion parameters. One interesting example is on-chain gaming. There is a game called AI Arena. It is a Super Smash Bros style tournament where, instead of fighting other people online, you train your AI agent to mimic your behavior and then battle another AI. It is difficult for users to trust that their models are the ones being run. You would like to know that the model you spent time training is actually the one competing in this tournament. With ZKML you can point to the verifier contract to confirm that yes, this is your model. These are areas in which ML is quite important, but the trust component is also very important. Only when these two are married does the tech make sense to us.

There are many different verticals: hardware optimized for ZK, theory to create smaller proofs, pure implementation of papers and theory, and better tooling. On the hardware side, ZK is not standardized yet, so there are different proving systems competing for different use cases. In the future, we will have FPGAs and ASICs to bolster the prover's performance and allow for more intensive use cases.

We are lagging in implementing new cutting-edge research. We just have to do the engineering work; all the theoretical results needed to improve speed are there.

The data side tends to be lacking in ZKML. If you can't control the data source or what is going into your model, it doesn’t matter that you are running a particular model. Adversarial attacks exist in ML. We need signed or attested data sources, we need APIs where the provider says here is a signature that when I give you this response in your API call it comes from us. You have to start from the very beginning, missing any piece in the entire chain lets anyone attack something.

The signature tells you who says the image is real, the provenance of the proof, it doesn’t tell you that the image is real, however.
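A sketch of what an attested API response could look like on the verifier side. Real deployments would use public-key signatures so that anyone can verify; Python's standard library has no asymmetric signature scheme, so this example swaps in HMAC with a shared key purely as a stand-in. The key, payload, and function names are hypothetical.

```python
import hashlib
import hmac


def sign_response(key: bytes, payload: bytes) -> str:
    """Provider side: tag the payload so its origin can be checked later."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_response(key: bytes, payload: bytes, tag: str) -> bool:
    """Consumer side: accept the payload only if the tag matches."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


key = b"shared-demo-key"                 # assumption: pre-shared for the demo
payload = b'{"price": 101.5}'
tag = sign_response(key, payload)

print(verify_response(key, payload, tag))
print(verify_response(key, b'{"price": 999.0}', tag))
```

As the notes say, a valid tag only establishes who vouched for the data. It says nothing about whether the data itself is true.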

ZKML is one component of a complex crypto system. You have to think through all the components of the cryptography and the economics in order for the system to function.

Hardware is interesting. We are entering a period of total chaos in proof systems over the next three years. Should we forget all of these ZK-EVMs and do it in Nova instead? That is a nightmare if you are a hardware manufacturer with a longer planning schedule, or if you are a miner: what do you need? This will be a big part of the puzzle in getting the economics of the system to improve. Can we make the proof system modular enough that proof miners can start to reduce the cost of doing this by a couple of orders of magnitude?

On-chain verification implies that anyone anywhere can verify this in perpetuity. This is really powerful, and sometimes even stronger than what we need. There is a company, Gensyn AI, that is doing decentralized training. You are entrusting a network of GPU nodes to train your model for you. They don't have the credibility of Amazon, so you can't naturally trust them. They could generate a ZKP that says they are training your model or executing your model correctly. They don't need public verifiability; they only need to convince you. In these cases you can use weaker forms of proofs like interactive proofs, which can be a lot more concretely efficient.

EZKL - uses Halo 2 (PSE's branch of Halo 2); at some point it will support other proof systems. The amount of code that touches the Halo 2 API is about 10%.

Modulus is focused on an older and more theoretical line of work that involves layered circuits, which mimic the properties of NNs very well. The prover is very fast as long as the compute is structured.

Trust is just: I know that the people who give me an AI service can prove to me what they are using. Accountability is difficult. We are in the early stages of thinking about how to make AIs accountable. You can make the people who are building them accountable through the legal system.

One of the fun things about ZKML is that it offers a different kind of accountability. Imagine you have a powerful server giving instructions to a weak robot. This is vulnerable to the server being corrupted and giving an erroneous instruction. With ZKML this server can only produce correct results. It is computing some function which has been defined in advance. Even if an adversary were to run some computation, they could still only prove correct executions. Maybe they can also mess with the input. But if the input is signed, then the whole supply chain becomes secure.

Maybe we could audit decisions made by autonomous vehicles. After a crash, you can prove that it has or hasn’t been tampered with or whose algorithm was run. Or if I want to delegate to my agent to act for me after I die to execute my estate now I don’t have to care who runs that agent. I know the agent will only be able to produce self-signed decisions.

The other way it bolsters trust: because it takes time and effort to train most ML models, folks don't want to just reveal them. You as a purchaser of an ML model may want to know it's a good ML model beforehand. Now with ZK you can send a test set which is representative of your inputs to the model vendor, and the vendor can generate a proof that their algorithm achieves a certain metric on this test set, for example that the model gets 98% accuracy. Now you are convinced that their model does well on your test set. We can evaluate models in advance and trust that they do what the vendor claims. This adds a bit of trust to a model marketplace, for example.
