# Honey I SNARKED the GPT

> Or rather how we got [nanoGPT](https://github.com/karpathy/nanoGPT) into Halo2-KZG circuits using [EZKL](https://github.com/zkonduit/ezkl).

> This post is written in collaboration with [Bianca Gănescu](https://twitter.com/BiancaGanescu), a Master's student at Imperial College London (supervised by [Jonathan Passerat-Palmbach](https://jopasser.at/)) who was instrumental in making this happen, and [Jason Morton](https://twitter.com/jasonmorton).

I am the lead developer on EZKL (stylized and pronounced *Ezekiel*), a library which enables users to convert computational graphs represented in the Open Neural Network Exchange (ONNX) format into (**Halo2-KZG**) ZK-SNARK circuits. Since starting the project in August of 2022 we've come a long way in terms of the [breadth of models](https://github.com/zkonduit/ezkl/tree/main/examples) we support, our proving [performance](https://github.com/zkonduit/ezkl/actions/runs/5062656323/jobs/9088409174), and the [community](https://t.me/+CoTsgokvJBQ2NWQx) of folks building and researching on top of our tool!

Bianca, as part of her Master's thesis, wanted to get nanoGPT into a ZK-SNARK using our tooling. This post is a high-level overview of the steps it took to make that happen. The sections below are somewhat technical and assume some knowledge of how the [Halo2 API](https://zcash.github.io/halo2/index.html) is structured.

### Supporting arbitrary model sizes

![](https://hackmd.io/_uploads/HkYSN8iHh.png)

A question we often get is how we manage to fit large models into a **Halo2-KZG** circuit for (almost) any arbitrary trusted-setup size. The way we've been able to deal with this is by having operations "wrap" around advice columns.
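As a plain-Rust analogy of this wrapping (hypothetical illustrative code, not the Halo2 or EZKL layouter; the function and variable names are made up), picture a running-sum dot product that spills into a fresh pair of advice columns whenever it runs out of rows, carrying the accumulator over:

```rust
// Simulate laying out a dot product as a running sum, "wrapping" into a
// new pair of advice columns every `m_rows` rows. In-circuit, the carry
// of the accumulator across the wrap is enforced by a copy constraint.

/// Returns the final accumulator value and the number of column pairs
/// (a0/a1, then a2/a3, ...) the layout would consume.
fn layout_dot(xs: &[i64], ys: &[i64], m_rows: usize) -> (i64, usize) {
    assert_eq!(xs.len(), ys.len());
    let mut acc = 0i64;
    let mut column_pairs = 1usize;
    for (i, (x, y)) in xs.iter().zip(ys.iter()).enumerate() {
        // the per-row constraint: m_i = x_i * y_i + m_{i-1}
        acc = x * y + acc;
        // out of rows: "copy" acc to the top of a new column pair
        if (i + 1) % m_rows == 0 && i + 1 < xs.len() {
            column_pairs += 1;
        }
    }
    (acc, column_pairs)
}

fn main() {
    // a 6-element dot product with only 4 rows per column wraps once
    let (dot, pairs) = layout_dot(&[1, 2, 3, 4, 5, 6], &[1, 1, 1, 1, 1, 1], 4);
    println!("dot = {dot}, column pairs used = {pairs}");
}
```

The in-circuit version of this running sum is the dot product argument described next.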
To illustrate this, consider our dot product argument:

```
| a0  | a1  |    m    | s_dot |
|-----|-----|---------|-------|
| x_i | y_i | m_{i-1} | s_dot |
|     |     |   m_i   |       |
```

where the constraints are such that $x_i * y_i + m_{i-1} = m_i$, and $i$ indexes over the elements of a vector or some other iterable structure.

Let's say $i$ exceeds the number of rows available in a single column (call that $M$). We start allocating the $x_i, y_i$ where $i > M$ to a new set of advice columns $a_2, a_3$. This requires some trickery, in that we also need to map the result at the bottom of $a_0, a_1$ to the top of $a_2, a_3$ (by way of copy constraints), such that the constraint with a negative rotation of $-1$ still holds. In this manner we can fit a relatively large number of constraints by increasing the number of columns, rather than increasing the number of rows.

The way we get around the region shuffling that may break this is (drum roll) ... we have a single region for the entire circuit and make sure to keep track of the current offset throughout. Folks are often surprised that our code looks quite different from other Halo2 examples, and this single-region strategy is why. As a side effect, it also has the nice property of leaking very little information about the architecture of the model, providing a kind of functional commitment.

### Supporting (almost) all ONNX operations

The ONNX format can represent over [100 operations](https://github.com/onnx/onnx/blob/main/docs/Operators.md). Our first approach was to painstakingly implement circuit-constraint equivalents one operation at a time, dealing with high-dimensional tensor shape casting and creating custom constraints and gates for each one. That is, until we discovered [tract](https://github.com/sonos/tract).
Tract is a library developed by the Sonos team (and mainly the work of one savant, [@kali](https://github.com/sonos/tract/commits?author=kali)), which reduces and prunes complex ONNX graphs into compositions of a small set of operations (a set that shrinks with each new tract release). Instead of implementing 100+ operations we only need to implement 20 or so as equivalent circuit constraints.

In particular, many linear operations, such as matrix multiplication, column sum, outer product, dot product, transposition, transposed matrix multiplication, matrix-vector multiplication, Hadamard products, batched matrix multiplication, tensor contraction, even bilinear transformations over multiple tensors, are represented as a single operation: the Einstein summation operation.

For those unfamiliar, the Einstein summation convention is a powerful notation for expressing operations over multi-dimensional tensors. For a primer on it, I recommend this [article](https://towardsdatascience.com/einstein-index-notation-d62d48795378); a brief summary is that by way of a succinct notational convention we can express any arbitrary summation or product of elements between multidimensional tensors. For instance, to express matrix multiplication we might write:

```
ik,kj->ij
```

which in effect expresses:

> we have two 2D objects, one with dimensionality `ik`, the other with dimensionality `kj`, and our operation should produce an output of dimensionality `ij` by reducing along the common axis of dimensionality `k`.

For fun, here's another relationship this sort of notation can express:

```
bn,anm,bm->ba
```

We leave it as an exercise for the reader to untangle what this represents :kissing_heart:.

Luckily the constraint equivalent of einsum is just a repeated application of the (generalized) dot product argument detailed above over the appropriate input axes. In follow-up posts we'll release the exact construction of these constraints and arguments.
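To make the "einsum as repeated dot products" point concrete, here is a minimal sketch (plain Rust, not EZKL's constraint code; all names are hypothetical) of `ik,kj->ij`, where each output cell is produced by one dot product reducing along the shared axis `k`:

```rust
// Each output cell of the einsum "ik,kj->ij" is a single dot product;
// in-circuit, each would instantiate the dot product argument above.

fn dot(xs: &[i64], ys: &[i64]) -> i64 {
    xs.iter().zip(ys).map(|(x, y)| x * y).sum()
}

/// `a` is i-by-k (row major), `b` is k-by-j (row major); returns i-by-j.
fn einsum_ik_kj_ij(a: &[i64], b: &[i64], i: usize, k: usize, j: usize) -> Vec<i64> {
    let mut out = vec![0i64; i * j];
    for row in 0..i {
        for col in 0..j {
            // gather the row of `a` and the column of `b` to reduce over k
            let lhs: Vec<i64> = (0..k).map(|kk| a[row * k + kk]).collect();
            let rhs: Vec<i64> = (0..k).map(|kk| b[kk * j + col]).collect();
            out[row * j + col] = dot(&lhs, &rhs);
        }
    }
    out
}

fn main() {
    // [[1, 2], [3, 4]] x [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
    let c = einsum_ik_kj_ij(&[1, 2, 3, 4], &[5, 6, 7, 8], 2, 2, 2);
    println!("{c:?}");
}
```

Other einsum strings differ only in which axes are gathered and reduced; the inner reduction is always the same dot product argument.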
As such, when we started to use tract in the backend and implemented einsum, we were able to support a whole world of models out of the box, including LSTMs and the self-attention layers in transformer blocks (one of the constitutive elements of nanoGPT).

### Allowing for parallel region layouts in Halo2

Our "single region" strategy for Halo2, detailed above, means we are particularly well positioned to lay out circuits in parallel. When dealing with 10 million+ constraints, even generating circuit constraints or the witness can take some time, and for the sake of user sanity (read: lack of patience) we wanted to speed this process up. Sadly the original implementation of Halo2 does not allow for parallel region layouts; in particular, the creation of the copy constraints in the circuit is not deterministic in a multi-threaded setting.

These copy constraints are expressed using "cycles": effectively directed graphs whose constitutive elements are all the Halo2 cells that should have equal witness values (i.e. are copy-constrained). Quoting the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/permutation.html) directly:

> For example, suppose that we have a circuit that defines the following equality constraints: $a≡b, a≡c, d≡e$. From this we have the equality-constraint sets $\{a,b,c\}$ and $\{d,e\}$. We want to construct the permutation: $(a\ b\ c)(d\ e)$. This is expressed as two disjoint cycles $(a\ b\ c)$ and $(d\ e)$.

When two cells are copy-constrained, the cycles they are currently part of are merged.
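The merge itself can be sketched in a few lines of Rust (illustrative only; this is the textbook next-pointer splice, not the data structures used in Halo2 or our PR): each cell stores a pointer to the next cell in its cycle, and constraining two cells equal simply swaps their next-pointers, joining the two cycles into one.

```rust
// Cells are indices; `next[c]` is the next cell in c's cycle.
// Copy-constraining cells `left` and `right` splices their cycles
// together by exchanging their next-pointers.

fn merge(next: &mut [usize], left: usize, right: usize) {
    next.swap(left, right);
}

/// Walk the cycle containing `start`, returning its cells in order.
fn cycle_from(next: &[usize], start: usize) -> Vec<usize> {
    let mut out = vec![start];
    let mut cur = next[start];
    while cur != start {
        out.push(cur);
        cur = next[cur];
    }
    out
}

fn demo() -> Vec<usize> {
    // cells 0..8 stand for A, B, C, D, E, F, G, H:
    // two disjoint cycles (A B C D) and (E F G H)
    let mut next = vec![1, 2, 3, 0, 5, 6, 7, 4];
    // add the constraint B ≡ E
    merge(&mut next, 1, 4);
    // one cycle remains: A -> B -> F -> G -> H -> E -> C -> D -> A
    cycle_from(&next, 0)
}

fn main() {
    println!("{:?}", demo());
}
```

The diagrams below show the same merge pictorially.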
For example, given two disjoint cycles $(A B C D)$ and $(E F G H)$ (diagrams taken from the Halo2 book):

```
A +---> B       E +---> F
^       +       ^       +
|       |       |       |
+       v       +       v
D <---+ C       H <---+ G
```

After adding the constraint $B≡E$:

```
A +---> B +-------------+
^                       |
|                       |
+                       v
D <---+ C <---+ E       F
                ^       +
                |       |
                +       v
                H <---+ G
```

In particular, referring to step 2 of the algorithm in the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/permutation.html) for the permutation arguments used to build copy constraints:

> 2. Otherwise, left and right belong to different cycles. Make left the larger cycle and right the smaller one, by swapping them iff sizes(aux(left)) < sizes(aux(right)).

The smaller "cycle" in a multi-threaded setting can be non-deterministic, and as such the constructed copy-constraint labels can differ between the key generation and proving stages. For an illustration of how the order in which the copy constraints are generated can create different cycle directions, see the diagram below (creds to me, the artist):

![Untitled-2022-01-06-1700(6)](https://github.com/privacy-scaling-explorations/halo2/assets/45801863/e47c1a5d-3e78-4a5d-982d-087e793e7a40)

A way to mitigate this is to make this label deterministic, which we do by using different data structures to represent copy-constraint cycles in [this PR to PSE Halo2](https://github.com/privacy-scaling-explorations/halo2/pull/180). Once we had this implemented, we could generate the witness and lay out circuit constraints in parallel.

### Rescaling fixed-point arithmetic

As Halo2 (and most ZK frameworks) operate over [fields](https://en.wikipedia.org/wiki/Field_(mathematics)), we are somewhat constrained in the types of arithmetic we can easily (or cheaply) represent. For speed, convenience, and codebase simplicity, we currently perform operations using [fixed-point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) arithmetic.
Given a base $b$, a scale $s$, and a shift $k$, we represent a given decimal number $x$ as $b^{s}x + k$, which we then map to an element of the field $\mathbb{F}$ we are operating over. By default we choose $b=2, k=0$ and leave $s$ as a tunable parameter that can be set by the user.

The issue is that when composing lots of chained multiplicative operations, such as in deep networks like multi-head self-attention layers, the underlying scale of the fixed-point representation can _explode_: the scale of a product is the _sum_ of the input scales. This poses a particular problem when used in conjunction with lookup tables. In particular, we leverage fixed-column lookups to represent non-linear functions such as $ReLU$ or $e^x$.

Ignoring cryptographic details, a lookup table is basically two advice columns and two fixed columns of the Halo2 "grid". The fixed columns are pre-filled and committed at circuit-definition time with all the allowed input-output pairs. The advice columns are filled at proving time, and an argument is made that all the input-output pairs in the witness are valid pairs that appear in the pre-filled fixed columns. Using random-linear-combination ideas, the argument conceptually reduces to looking up one advice column in one fixed column (image taken from [here](https://www.google.com/search?q=lookup+arguments+plonk&client=firefox-b-d&tbm=isch&sxsrf=APwXEdeFhRLzIsT2aIimkJjQwybX65wQtg:1684923063628&source=lnms&sa=X&ved=2ahUKEwjpsrnF243_AhWMQUEAHdngBNMQ_AUoAnoECAoQBA&biw=1512&bih=791&dpr=2#imgrc=K0HRIqcDMXVtFM)):

![](https://hackmd.io/_uploads/SyL3MwiS3.png)

In the diagram above that corresponds to, say, the blue column and the orange one. When we say "$ReLU(x)$", what this means in a circuit is that the provided input-output witness values $(x, ReLU(x))$ should form a row of the pre-committed lookup table. When the fixed-point scale explodes in value, witness pairs can quickly fall out of range of our lookups.
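A small numerical sketch of the problem (hypothetical Rust, not EZKL's quantization code; the scale and range values are made up for illustration), using the default $b=2, k=0$ encoding:

```rust
// Encode x as round(2^s * x). Multiplying two encoded values sums their
// scales, so chained products quickly exceed any fixed lookup range.

fn quantize(x: f64, s: u32) -> i64 {
    ((1i64 << s) as f64 * x).round() as i64
}

fn main() {
    let s = 7; // values carry a factor of 2^7 = 128
    let a = quantize(0.5, s);  // 64, at scale 2^7
    let b = quantize(0.25, s); // 32, at scale 2^7
    // the product's scale is the SUM of the input scales: 2^14
    let prod = a * b; // 2048 = 2^14 * 0.125
    // one more multiplication and we are at scale 2^28
    let chained = prod * prod; // 4_194_304 = 2^28 * 0.015625
    println!("prod = {prod}, chained = {chained}");
    // a lookup table committed over, say, [-2^15, 2^15) can no longer
    // contain this witness value
    assert!(chained >= 1i64 << 15);
}
```

Even though the value being represented shrank (0.5 to 0.015625), its field-element encoding grew past the lookup range, which is exactly the failure mode the rescaling below addresses.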
To keep this scale under control we introduce a heuristic rescaling method, where we periodically "divide down" elements whose fixed-point scale has grown too large. This division is also performed using a lookup.

### nanoGPT in a SNARK

This combination of improvements means we were able to get nanoGPT, exported as an ONNX file, [into a Halo2 circuit](https://github.com/zkonduit/ezkl/pull/255). Our latest benchmarks with public outputs and private model parameters and input data can be tracked [here](https://github.com/zkonduit/ezkl/actions/runs/5062656323/jobs/9088409174). In this setting we are proving a statement akin to:

> I ran this private nanoGPT-like model on some private data and it produced an output which matches this publicly known value.

We ran the benchmarks on the following spec:

```
CPU         AMD Epyc 7413 - 24c/48t - 2.65 GHz/3.6 GHz
RAM         512 GB ECC 3200 MHz
Data disks  2×960 GB SSD NVMe Soft RAID
```

**Results:** All tests were done with $K$ for the SRS generation such that "columns overflow", as detailed in the first section.

##### Tiny model (4 transformer layers ~ 250k parameters ~ 16 million constraints ~ $K=24$):

|                          | Time  |
| ------------------------ | ----- |
| verifying key generation | 382s  |
| proving key generation   | 850s  |
| proof time               | 1828s |
| verify time              | 0s    |

##### Small model (20 transformer layers ~ 1 mil parameters ~ 64 million constraints ~ $K=24$):

|                          | Time  |
| ------------------------ | ----- |
| verifying key generation | 937s  |
| proving key generation   | 2106s |
| proof time               | 5169s |
| verify time              | 0s    |

### Future work

We're excited to keep improving on these proving times. In particular, we're excited to see where hardware acceleration can take us in terms of performance (creds to [@weikengchen](https://twitter.com/weikengchen) for a great discussion on this).
### Shoutouts to other similar work

- Daniel Kang's [work](https://medium.com/@danieldkang/verified-execution-of-gpt-bert-clip-and-more-6acb693fd55f) on getting transformer models into Halo2.
- [Orion](https://github.com/gizatechxyz/orion), a new ONNX runtime built with Cairo, whose purpose is to provide a runtime implementation for verifiable ML model inference using STARKs.
- [keras2circom](https://github.com/socathie/keras2circom), a library for converting Keras models into circom circuits.
- [awesome-zkml](https://github.com/worldcoin/awesome-zkml), a repository of ZKML projects and resources.
