plafer

@plafer

I currently work for Polygon in the Miden team, where I focus primarily on the Miden VM. This platform is where I post my exploration of zero-knowledge tech.

Joined on Mar 25, 2024

  • This note attempts to be a summary of the Basefold paper and talk by Binyi Chen. The paper makes three major contributions that are all interlinked. It introduces a linear code (which we call "foldable code") that generalizes the Reed-Solomon code such that it no longer requires FFT-friendly fields, an Interactive Oracle of Proximity (IOPP), which uses the foldable code, and can be thought of as FRI but for multilinear polynomials, and a polynomial commitment scheme for multilinear polynomials, which mixes the IOPP and sum-check together. Let's dive in!
     Like  Bookmark
  • This note is about LogUp-GKR, and is the culmination of many notes that built up to this. Make sure to take a look at the book for links to all other articles. Namely, the previous article deep dove into the original GKR. Also make sure to familiarize yourself with our math conventions. Recall the LogUp equation from the LogUp note: $$ \sum_{i=0}^{n-2} \sum_{j=0}^{k-1} \frac{m_{a_{ij}}}{(\alpha - a_{ij})} = \sum_{i=0}^{n-2} \sum_{j=0}^{k-1} \frac{m_{b_{ij}}}{(\alpha - b_{ij})} $$ LogUp-GKR is a technique for a prover to prove the correct evaluation of this equation to a verifier using a modified version of GKR. We're going to be specifically interested in the case where the $m_{a_{ij}}$, $m_{b_{ij}}$, $a_{ij}$ and $b_{ij}$ are values derived from a STARK trace. As discussed in the LogUp note, we know how to prove this relationship using AIR constraints directly. This note is interested in how to offload this task to GKR instead.
     Like  Bookmark
  • This note picks up where the previous one on multiset checks left off. The main motivation for this note is to support a variation on multiset checks where we can encode how many times we expect an element to show up in the set (i.e. the multiplicity) without having that element showing up that many times. Specifically, the multiset check with multiplicities equation is $$ \prod_{i=1}^{n-1} \prod_{j=1}^k (\alpha - a_{ij})^{m_{a_{ij}}} = \prod_{i=1}^{n-1} \prod_{j=1}^k (\alpha - b_{ij})^{m_{b_{ij}}} $$ For example, if $m_{a_{23}} = 2$, then the term $(\alpha - a_{23})$ in the equation is repeated twice: $(\alpha - a_{23})\cdot(\alpha - a_{23})$. Hence, the value $a_{23}$ would appear once in the ${a_{ij}}$ multiset, but would need to appear twice in the ${b_{ij}}$ multiset (or equivalently, only once, but with an associated multiplicity of $2$). Note also that setting a multiplicity value to $m_{a_{ij}} = 0$ has the effect of "turning off" the associated $(\alpha - a_{ij})$ term. This is especially useful for range checks, which the Miden VM docs have a good introduction about.
     Like  Bookmark
  • In this note, we will explore two versions of the Sum-Check protocol: the vanilla protocol, as well as a useful generalization that we'll refer to as "Specialized Sum-Check". The Sum-Check problem statement Sum-check is an algorithm where the following statement is proven: $$ \sum_{x_1 \in {0, 1}} \dots \sum_{x_m \in {0, 1}} g(x_1, \dots, x_m) = C_1, $$ where $g: \mathbb{F}^m \rightarrow \mathbb{F}$ is any m-variate polynomial, and $C_1 \in \mathbb{F}$ is the result of computing the sum. The polynomial $g$ is public (i.e. known to both the prover and verifier). Hence, the verifier could compute that sum itself. However, we often don't want the verifier to have to do all this work. The main benefit of sum-check is that the verifier can be convinced that the statement is true with very little work, at the cost of more work for the prover.
     Like  Bookmark
  • Recall from the previous article that running GKR on the LogUp circuit leaves us with the claims $\tilde{f}0(\rho), \dots, \tilde{f}{m-1}(\rho)$, for some randomly sampled $\rho \in \mathbb{F}^{\log n}$. We will be proving these claims in Air. Rather than proving $m$ separate claims, we will instead prove the random linear combination of these claims: $$ \sigma = \sum_{j = 0}^{m-1} \alpha_j \tilde{f}_j(\rho) $$ for randomly sampled $\alpha_j \in \mathbb{F}$, $0 \leq j < m$.
     Like  Bookmark
  • Welcome Math Preliminaries Representations of univariate polynomials Representations of multilinear polynomials and the multilinear extension Motivation Multiset checks Multiset checks with multiplicities, and LogUp
     Like  Bookmark
  • This book is dedicated to providing the necessary background for understanding LogUp-GKR, a technique to accelerate proving LogUp invented by Shahar Papini and Ulrich Habock in 2023. We will also explore the practical ramifications of using the technique in a real-world STARK-based zero-knowledge virtual machine (zk-VM). I wrote this book as a result of my experience in implementing LogUp-GKR for the Miden VM, alongside Bobbin Threadbare and Al. It is my hope that it will provide the necessary tools to help onboard future developers to that part of the Miden VM codebase, as well as help maintainers of other VMs decide whether or not they should integrate LogUp-GKR in their zk-VM. Practical knowledge of STARKs is assumed. If you have any suggestions and/or feedback, drop a comment, or DM me on Twitter/X.
     Like 1 Bookmark
  • Remember to familiarize yourself with our math conventions. A "multiset check" checks the equality of two multisets (i.e. sets where multiple copies of the same element are allowed). Multiset checks are also called "permutation checks", because equivalently, they check whether a list is a permutation of another list. In the context of STARK-based zk-VMs, these multisets are derived from the trace. For example, column 1 and column 2 could be the two multisets of interest, in which case running a multiset check would ensure that the values in column 1 are the same values as in column 2, potentially rearranged. Or, $\mathrm{column_1} \cup \mathrm{column}_2$ could be the first multiset, while $\mathrm{column_3} \cup \mathrm{column}_4$ could be the second multiset. However, as we'll see in the next section, the elements in the multisets don't actually need to live explicitly in the trace. Definition of a multiset in STARKs Before we discuss how to implement a multiset check, we should first give a proper definition of how our multisets are defined. There are multiple possible definitions, and each different definition will influence how the multiset check is implemented. In this note, we will focus on the definition used in the Miden VM.
     Like  Bookmark
  • In this short note, we establish the conventions that will be used throughout the book. The field used will be called $\mathbb{F}$. We will call our trace domain $\mathbb{H} = {1, \omega, \omega^2, \dots, \omega^{n-1}}$, where $\omega \in \mathbb{F}$ is the group generator, and $n$ is the trace length. We will model the trace $\mathbb{T}$ as an $n \times m$ matrix. Recall that in STARKs, for each column, a polynomial over domain $\mathbb{H}$ is interpolated from the values in that column. We will name each column polynomial explicitly: $c_0, \dots, c_{m-1}$, where $c_i: \mathbb{F} \rightarrow \mathbb{F}$, such that $c_j(i) = \mathbb{T}_{ij} \quad \forall i \in \mathbb{H}, \forall j \in {1, \dots, m}$.
     Like  Bookmark
  • Setup Recall from our previous note that the Specialized Sum-Check protocol allows to prove a statement of the form $$ \sum_{(x_0, \cdots, x_v) \in {0, 1}^{v+1}} g(f_0(x_0, \cdots, x_v), \cdots, f_m(x_0, \cdots, x_v)) = C, $$ where $f_0, \cdots, f_m$ are multi-linear polynomials defined over some field $\mathbb{F}$, $g$ is any multivariate polynomial $\mathbb{F}^{v+1} \rightarrow \mathbb{F}$, and $C$ is some arbitrary value. During each round of the protocol, the prover is asked to build the following univariate polynomial: $$
     Like  Bookmark
  • In this note, we will go over the two main representations of univariate polynomials: the "monomial" and "evaluation" representations. The monomial representation is the traditional representation that most people are familiar with. Specifically, a polynomial in monomial representation is written as: $$ p(x) = a_{n-1}x^{n-1} + \dots + a_2 x^2 + a_1 x + a_0 = \sum_{i=0}^{n-1} a_i x^i $$ We call it the "monomial representation" because the coefficients $a_0, \dots, a_{n-1}$ multiply their respective monomial $x^0, \dots, x^{n-1}$. There is also another very useful representation that we'll call the "evaluation representation". However, before we can talk more deeply about it, and how both representations are two sides of the same coin, we need to first brush up on vector spaces. Brief review of vector spaces linalg_basis
     Like 1 Bookmark
  • This note is intended to be a condensed description of the GKR protocol. For a more gentle introduction to the subject, it may be useful to first read my GKR introduction note. I will also assume familiarity with the sum-check protocol, as well as the multilinear extension. GKR is a recursively defined protocol where a prover convinces a verifier of a correct evaluation of a predetermined layered circuit. Circuit format Below is an example of a layered circuit: gkr In general, GKR supports layered circuits of any number $d$ of layers, where each node in layer $i$ is connected to 2 nodes in layer $i+1$, and where those 2 nodes are either added or multiplied together.
     Like  Bookmark
  • GKR is a protocol for circuit evaluation, specifically for layered circuits. We will be covering the protocol as presented in Thaler's Proofs, Arguments and Zero-Knowledge, which is slightly simpler than the one presented in the original GKR paper. In this note, we will present the GKR protocol "by re-discovery". That is, we will present 2 intermediary protocols that build on each other, and culminate into GKR. Specifically, we will start with the most naive version, GKR**, improve on it to get GKR*, and once again improve on it to get GKR. The hope is that first learning GKR** and GKR* makes understanding why and how GKR works much easier. The circuit All 3 protocols will be proving the evaluation of the following circuit. gkr The circuit has 3 layers, where layer 0 is the output layer, and layer 2 is the input layer. $S_i$ denotes the number of nodes in layer $i$: $S_0 = 2$, $S_1 = 4$, $S_2 = 8$. It will also be useful to have a variable $k_i$ to hold the logarithm of $S_i$: $k_0 = 1$, $k_1 = 2$, $k_2 = 3$. Finally, the values that each layer takes is represented by a function $W_i: {0, 1}^{k_i} \rightarrow \mathbb{F}$. That is, each layer's $j^{\mathrm{th}}$ value is mapped to from the bit decomposition of $j$. For example, consider how $W_2$ is defined:
     Like  Bookmark
  • In the previous note, we discussed two representations for univariate polynomials, as well as a precise view on what it means to "interpolate" a univariate polynomial from a set of points. This note is the continuation of the previous one, but for multilinear polynomials instead of univariate. That is, mulvariate polynomials where the degree in each variable is at most $1$. Of particular interest to us will be the multilinear extension (or, equivalently the "multivariate Lagrange interpolating polynomial"), due to its heavy use in LogUp-GKR. Multilinear polynomials The most common way to define multilinear polynomial is: $$ p(x_0, \dots, x_{m - 1}) = \sum_{i_0 = 0}^1 \dots \sum_{i_{m-1} = 0}^1 a_{i_0, \dots, i_{m-1}} x_0^{i_0} \dots x_{m-1}^{i_{m-1}} $$
     Like 1 Bookmark