# Honey I SNARKED the GPT

> Or rather, how we got [nanoGPT](https://github.com/karpathy/nanoGPT) into Halo2-KZG circuits using [EZKL](https://github.com/zkonduit/ezkl).

> This post was written in collaboration with [Bianca Gănescu](https://twitter.com/BiancaGanescu), a Master's student at Imperial College London (supervised by [Jonathan Passerat-Palmbach](https://jopasser.at/)) who was instrumental in making this happen, and [Jason Morton](https://twitter.com/jasonmorton).

I am the lead developer on EZKL (stylized and pronounced *Ezekiel*), a library which enables users to convert computational graphs represented in the Open Neural Network Exchange (ONNX) format into a (**Halo2-KZG**) ZK-SNARK circuit. Since starting the project in August 2022 we've come a long way in terms of the [breadth of models](https://github.com/zkonduit/ezkl/tree/main/examples) we support, our proving [performance](https://github.com/zkonduit/ezkl/actions/runs/5062656323/jobs/9088409174), and the [community](https://t.me/+CoTsgokvJBQ2NWQx) of folks building and researching on top of our tool!

Bianca, as part of her Master's thesis, wanted to get nanoGPT into a ZK-SNARK using our tooling. This post is a high-level overview of the steps it took to make that happen. The sections below are somewhat technical and assume some knowledge of how the [Halo2 API](https://zcash.github.io/halo2/index.html) is structured.

### Supporting arbitrary model sizes

![](https://hackmd.io/_uploads/HkYSN8iHh.png)

A question we often get is how we manage to get large models into a **Halo2-KZG** circuit for (almost) any arbitrary trusted setup size. The way we've been able to deal with this is by having operations "wrap" around advice columns. To illustrate this, consider our dot product argument:

```rust
| a0  | a1  |   m     | s_dot |
|-----|-----|---------|-------|
| x_i | y_i | m_{i-1} | s_dot |
|     |     | m_i     |       |
```

where the constraint is $x_i \cdot y_i + m_{i-1} = m_i$, and $i$ indexes over the elements of a vector or some other iterable structure. Let's say $i$ exceeds the number of rows available in a single column (call that $M$). We start allocating the $x_i, y_i$ where $i > M$ to a new set of advice columns $a_2, a_3$. This requires some trickery, in that we also need to map the result at the bottom of $a_0, a_1$ to the top of $a_2, a_3$ (by way of copy constraints), such that the constraint with a negative rotation of $-1$ still holds. In this manner we can fit a relatively large number of constraints by increasing the number of columns, rather than increasing the number of rows.

The way we get around the region shuffling that might break this is (drum roll)... we use a single region for the entire circuit and make sure to keep track of the current offset throughout. Folks are often surprised that our code looks quite different from other Halo2 examples, and this single-region strategy is why. As a side effect, it also has the nice property of leaking very little information about the architecture of the model, providing a kind of functional commitment.
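To make the wrap-around concrete, here is a minimal, dependency-free Rust sketch of the bookkeeping (illustrative only; the `ColumnPair` type and function names are invented for this example, and this is not EZKL's actual layouter or the Halo2 API). It lays a dot product out down column pairs of fixed height and, when the rows run out, carries the running accumulator to the top of a fresh column pair, standing in for the copy constraint described above.

```rust
/// A toy "column pair" of height `m_rows`: each row holds (x_i, y_i, m_i),
/// where m_i = x_i * y_i + m_{i-1} is the running dot-product accumulator.
#[derive(Debug, Default)]
struct ColumnPair {
    rows: Vec<(i64, i64, i64)>,
}

/// Lay out the dot product of `x` and `y` across column pairs of height `m_rows`.
/// When a pair fills up, the accumulator is carried to the top of the next pair,
/// standing in for the copy constraint that links the bottom of (a0, a1) to the
/// top of (a2, a3) in the real circuit.
fn layout_dot_product(x: &[i64], y: &[i64], m_rows: usize) -> (Vec<ColumnPair>, i64) {
    assert_eq!(x.len(), y.len());
    let mut pairs = vec![ColumnPair::default()];
    let mut acc = 0i64; // m_{-1} = 0

    for (&xi, &yi) in x.iter().zip(y) {
        if pairs.last().unwrap().rows.len() == m_rows {
            // Column pair is full: wrap to a fresh pair, carrying `acc` across.
            pairs.push(ColumnPair::default());
        }
        acc += xi * yi; // the gate x_i * y_i + m_{i-1} = m_i
        pairs.last_mut().unwrap().rows.push((xi, yi, acc));
    }
    (pairs, acc)
}

fn main() {
    let x = [1, 2, 3, 4, 5];
    let y = [5, 4, 3, 2, 1];
    // Only 2 rows per column pair, so a length-5 dot product wraps twice.
    let (pairs, result) = layout_dot_product(&x, &y, 2);
    println!("used {} column pairs, dot product = {}", pairs.len(), result);
    assert_eq!(result, 35);
}
```

In the real circuit the carried value is enforced with an equality constraint rather than simply reused, but the bookkeeping is the same: a single global offset, wrapped across column pairs.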
### Supporting (almost) all ONNX operations

The ONNX format can represent over [100 operations](https://github.com/onnx/onnx/blob/main/docs/Operators.md). Our first approach was to painstakingly implement circuit-constraint equivalents one operation at a time, dealing with high-dimensional tensor shape casting and creating custom constraints and gates for each one. That is, until we discovered [tract](https://github.com/sonos/tract).

Tract is a library developed by the Sonos team (and mainly the work of one savant, [@kali](https://github.com/sonos/tract/commits?author=kali)), which reduces and prunes complex ONNX graphs into compositions of a small set of operations (a set that shrinks with each new tract release). Instead of implementing 100+ operations, we only need to implement 20 or so as equivalent circuit constraints.

In particular, many linear operations, such as matrix multiplication, column sum, outer product, dot product, transposition, transposed matrix multiplication, matrix-to-vector multiplication, Hadamard products, batched matrix multiplication, tensor contraction, and even bilinear transformations over multiple tensors, are represented as a single operation: the Einstein summation (einsum) operation. For those unfamiliar, the Einstein summation convention is a powerful notation for expressing operations over multi-dimensional tensors. For a primer on it, I recommend this [article](https://towardsdatascience.com/einstein-index-notation-d62d48795378); a brief summary is that, by way of a succinct notational convention, we can express any arbitrary summation or product of elements between multidimensional tensors. For instance, to express matrix multiplication we might write:

```
ik,kj->ij
```

which in effect expresses:

> we have two 2D objects, one with dimensionality `ik`, the other with dimensionality `kj`, and our operation should produce an output of dimensionality `ij` by reducing along the common axis of dimensionality `k`.

For fun, here's another relationship this sort of notation can express:

```
bn,anm,bm->ba
```

We leave it as an exercise for the reader to untangle what this represents :kissing_heart:.

Luckily, the constraint equivalent of einsum is just a repeated application of the (generalized) dot product argument detailed above over the appropriate input axes. In follow-up posts we'll release the exact construction of these constraints and arguments. As such, once we started to use tract in the backend and implemented einsum, we were able to support a whole world of models out of the box, including LSTMs and the self-attention layers in transformer blocks (one of the constitutive elements of nanoGPT).
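As a quick illustration of the claim that einsum reduces to repeated dot products, here is a plain-Rust sketch of the semantics only (not of EZKL's constraint layout; the function names are made up for the example), evaluating the matrix-multiplication einsum `ik,kj->ij` with one dot product per output cell:

```rust
/// Dot product of two equal-length slices; in-circuit this corresponds to the
/// accumulator argument from the first section.
fn dot(x: &[i64], y: &[i64]) -> i64 {
    x.iter().zip(y).map(|(a, b)| a * b).sum()
}

/// Evaluate the einsum "ik,kj->ij" (matrix multiplication) as one dot product
/// per output cell, reducing along the shared axis k.
fn einsum_ik_kj_ij(a: &[Vec<i64>], b: &[Vec<i64>]) -> Vec<Vec<i64>> {
    let (i_dim, k_dim) = (a.len(), a[0].len());
    let j_dim = b[0].len();
    assert_eq!(b.len(), k_dim);

    (0..i_dim)
        .map(|i| {
            (0..j_dim)
                .map(|j| {
                    // Gather column j of b, then reduce along k with a dot product.
                    let col: Vec<i64> = (0..k_dim).map(|k| b[k][j]).collect();
                    dot(&a[i], &col)
                })
                .collect()
        })
        .collect()
}

fn main() {
    let a = vec![vec![1, 2], vec![3, 4]]; // shape ik = 2x2
    let b = vec![vec![5, 6], vec![7, 8]]; // shape kj = 2x2
    let c = einsum_ik_kj_ij(&a, &b);      // shape ij = 2x2
    assert_eq!(c, vec![vec![19, 22], vec![43, 50]]);
    println!("{:?}", c);
}
```

In the circuit, each `dot` call becomes an instance of the accumulator argument from the first section, applied over the appropriate input axes.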
### Allowing for parallel region layouts in Halo2

Our "single region" strategy for Halo2, detailed above, means we are particularly well positioned to lay out circuits in parallel. When dealing with 10 million+ constraints, even generating the circuit constraints or the witness can take some time, and for the sake of user sanity / lack of patience we wanted to speed this process up. Sadly, the original implementation of Halo2 does not allow for parallel region layouts; in particular, the creation of the copy constraints in the circuit is not deterministic in a multi-threaded setting.

These copy constraints are expressed using "cycles", effectively directed graphs whose constitutive elements are all the Halo2 cells that should have equal witness values (i.e. are copy-constrained). Quoting the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/permutation.html) directly:

> For example, suppose that we have a circuit that defines the following equality constraints: $a \equiv b$, $a \equiv c$, $d \equiv e$. From this we have the equality-constraint sets $\{a, b, c\}$ and $\{d, e\}$. We want to construct the permutation: $(a\ b\ c)(d\ e)$. This is expressed as two disjoint cycles $(a\ b\ c)$ and $(d\ e)$.

When two cells are copy-constrained, the cycles they are currently a part of are merged. For example, given two disjoint cycles $(A\ B\ C\ D)$ and $(E\ F\ G\ H)$ (diagrams taken from the Halo2 book):

```
A +---> B
^       +
|       |
+       v
D <---+ C

E +---> F
^       +
|       |
+       v
H <---+ G
```

After adding the constraint $B \equiv E$:

```
A +---> B +-------------+
^                       |
|                       |
+                       v
D <---+ C <---+ E       F
                ^       +
                |       |
                +       v
                H <---+ G
```

In particular, referring to step 2 of the algorithm in the [Halo2 book](https://zcash.github.io/halo2/design/proving-system/permutation.html) for the permutation argument used to build copy constraints:

> 2. Otherwise, left and right belong to different cycles. Make left the larger cycle and right the smaller one, by swapping them iff sizes(aux(left)) < sizes(aux(right)).

The smaller "cycle" in a multi-threaded setting can be non-deterministic, and as such the constructed copy-constraint labels can differ between the key generation and proving stages. For an illustration of how the order in which the copy constraints are generated can create different cycle directions, see the diagram below (creds to me, the artist):

![Untitled-2022-01-06-1700(6)](https://github.com/privacy-scaling-explorations/halo2/assets/45801863/e47c1a5d-3e78-4a5d-982d-087e793e7a40)

A way to mitigate this is to make the labelling deterministic, which we do by using different data structures to represent copy-constraint cycles in [this PR to PSE Halo2](https://github.com/privacy-scaling-explorations/halo2/pull/180). Once we had this implemented, we could generate the witness and lay out circuit constraints in parallel.

### Rescaling fixed-point arithmetic

As Halo2 (and most ZK frameworks) operates over [fields](https://en.wikipedia.org/wiki/Field_(mathematics)), we are somewhat constrained in the types of arithmetic we can easily (or cheaply) represent. For speed, convenience, and codebase simplicity, we currently perform operations using [fixed-point](https://en.wikipedia.org/wiki/Fixed-point_arithmetic) arithmetic. Given a base $b$, a scale $s$ and a shift $k$, we represent a given decimal number $x$ as $b^{s}x + k$, which we then map to a member of the field $\mathbb{F}$ we are operating over. As a default we choose $b=2$, $k=0$ and leave $s$ as a tunable parameter that can be set by the user.

The issue is that when composing lots of chained multiplicative operations, as in deep networks with multi-head self-attention layers, the underlying scale of the fixed-point representation can _explode_: the scale of a product is the _sum_ of the scales of its inputs. This poses a particular problem when used in conjunction with lookup tables.

In particular, we leverage fixed-column lookups to represent non-linear functions such as $ReLU$ or $e^x$. Ignoring cryptographic details, a lookup table is basically two advice columns and two fixed columns of the Halo2 "grid". The fixed columns are pre-filled and committed at circuit-definition time with all the allowed input-output pairs. The advice columns are filled at proving time, and an argument is made that every input-output pair in the witness is a valid pair appearing in the pre-filled fixed columns. Using random-linear-combination ideas, the argument conceptually reduces to looking up one advice column in one fixed column (image taken from [here](https://www.google.com/search?q=lookup+arguments+plonk&client=firefox-b-d&tbm=isch&sxsrf=APwXEdeFhRLzIsT2aIimkJjQwybX65wQtg:1684923063628&source=lnms&sa=X&ved=2ahUKEwjpsrnF243_AhWMQUEAHdngBNMQ_AUoAnoECAoQBA&biw=1512&bih=791&dpr=2#imgrc=K0HRIqcDMXVtFM)):

![](https://hackmd.io/_uploads/SyL3MwiS3.png)

In the diagram above, that corresponds to, say, the blue column and the orange one. When we say "$ReLU(x)$", what this means in a circuit is that the provided input-output witness values $(x, ReLU(x))$ should be a row of the pre-committed lookup table. When the fixed-point scale explodes in value, witness pairs can quickly fall out of range of our lookups. To keep the scale under control, we introduce a heuristic rescaling method where we periodically "divide down" elements whose fixed-point scale has grown too large. This division is also performed using a lookup.
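To see the scale bookkeeping in numbers, here is a small Rust sketch of the fixed-point convention above with $b=2$, $k=0$ (plain integers rather than field elements; the constant and helper names are invented for the example): multiplying two scale-$s$ values yields a scale-$2s$ value, and the divide-down rescale brings it back into the range the lookup tables are committed to.

```rust
/// Fixed-point parameters: base b = 2, shift k = 0, tunable scale s.
/// A decimal number x is represented as round(x * 2^s).
const SCALE: u32 = 7;

fn quantize(x: f64, scale: u32) -> i64 {
    (x * f64::from(2u32.pow(scale))).round() as i64
}

fn dequantize(q: i64, scale: u32) -> f64 {
    q as f64 / f64::from(2u32.pow(scale))
}

fn main() {
    let a = quantize(1.5, SCALE);  // 192 at scale 7
    let b = quantize(0.25, SCALE); // 32 at scale 7

    // Multiplying fixed-point values ADDS their scales: the product lives at scale 14.
    let prod = a * b;
    assert_eq!(dequantize(prod, 2 * SCALE), 0.375);

    // Left unchecked, chained multiplications keep growing the scale until
    // (x, f(x)) witness pairs fall outside the pre-committed lookup-table range.
    // The "divide-down" rescale maps the product back to scale 7.
    let rescaled = prod >> SCALE; // divide by 2^7
    assert_eq!(dequantize(rescaled, SCALE), 0.375);

    println!(
        "prod (scale {}): {}, rescaled (scale {}): {}",
        2 * SCALE, prod, SCALE, rescaled
    );
}
```

The sketch uses a right shift for the divide-down; in the circuit, that division is enforced with a lookup argument, as noted above.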
### nanoGPT in a SNARK

This combination of improvements means we were able to get nanoGPT, exported as an ONNX file, [into a Halo2 circuit](https://github.com/zkonduit/ezkl/pull/255). Our latest benchmarks, with public outputs and private model parameters and input data, can be tracked [here](https://github.com/zkonduit/ezkl/actions/runs/5062656323/jobs/9088409174). In this setting we are proving a statement akin to:

> I ran this private nanoGPT-like model on some private data and it produced an output which matches this publicly known value.

We ran the benchmarks on the following spec:

```
CPU        AMD Epyc 7413 - 24c/48t - 2.65 GHz/3.6 GHz
RAM        512 GB ECC 3200 MHz
Data disks 2×960 GB SSD NVMe Soft RAID
```

**Results:** All tests were run with an SRS size $K$ chosen such that the advice columns "overflow" (wrap), as detailed in the first section.

##### Tiny model (4 transformer layers ~ 250k parameters ~ 16 million constraints ~ $K=24$)

| Stage                    | Time  |
| ------------------------ | ----- |
| verifying key generation | 382s  |
| proving key generation   | 850s  |
| proof time               | 1828s |
| verify time              | 0s    |

##### Small model (20 transformer layers ~ 1M parameters ~ 64 million constraints ~ $K=24$)

| Stage                    | Time  |
| ------------------------ | ----- |
| verifying key generation | 937s  |
| proving key generation   | 2106s |
| proof time               | 5169s |
| verify time              | 0s    |

### Future work

We're excited to keep improving on these proving times. In particular, we're excited to see where hardware acceleration can take us in terms of performance (creds to [@weikengchen](https://twitter.com/weikengchen) for a great discussion on this).

### Shoutouts to other similar work

- Daniel Kang's [work](https://medium.com/@danieldkang/verified-execution-of-gpt-bert-clip-and-more-6acb693fd55f) on getting transformer models into Halo2.
- [Orion](https://github.com/gizatechxyz/orion), a new ONNX runtime built with Cairo, whose purpose is to provide a runtime implementation for verifiable ML model inference using STARKs.
- [keras2circom](https://github.com/socathie/keras2circom), a library for converting Keras models into circom circuits.
- [awesome-zkml](https://github.com/worldcoin/awesome-zkml), a repository of ZKML projects and resources.
