# Jagged Polynomial Commitment Scheme

## Why does Succinct need another polynomial commitment?

Modern SNARKs almost always rely on a *polynomial commitment scheme* (PCS). A PCS lets a prover commit to a huge polynomial $P$ in one short hash-like value and later prove statements of the form "$P(x)=y$" without revealing $P$ itself. Inside a zkVM, the usual workflow is:

1. Represent the whole CPU trace as many rectangular tables, one per instruction.
2. Encode every column of every table into its own low-degree polynomial.
3. Commit to those polynomials individually with a PCS and prove all the required row-wise constraints.

Committing column-by-column is easy conceptually but hurts performance:

* The prover must build many Merkle trees (or similar) instead of one.
* The verifier must check an evaluation proof for *each* column.
* Recursion circuits in rollups balloon because verifier logic grows with the number of columns.

The goal of the *jagged PCS* is to keep the familiar "one table = many columns" arithmetization while paying the verifier cost of a single polynomial.

## From tables to a single polynomial

### The jagged picture

Imagine stacking every instruction table on top of the next, aligning their columns. The *top* panel of the figure below shows the outcome: coloured rectangles mark live data, while the white areas under each rectangle are virtual zero-padding that the prover never needs to store.

![image](https://hackmd.io/_uploads/BkqHZms-eg.png)

* Each instruction type $y$ has its own table $T_y$.
* The table spans $2^{c_y}$ columns, so the integer $c_y$ is the log of the table's **width** (widths are always rounded up to a power of two).
* Every column in that table is filled for exactly $h_y$ rows; anything beneath is treated as zero.
* If you scan the coloured cells in row-major order (left-to-right across a row, then top-to-bottom), you touch every real field element exactly once. The total count is

\begin{equation}
M \;=\; \sum_{y} \underbrace{2^{c_y}}_{\text{table width}} \cdot \underbrace{h_y}_{\text{table height}} \;=\; \underbrace{2^m}_{\substack{\text{dense size} \\ \text{(after padding)}}}
\end{equation}

where the right-hand equality holds after padding $M$ up to the next power of two if needed.

The *bottom* panel shows the same values packed consecutively into a dense block of length $M$. Indexing that block by $i\in\{0,1\}^{m}$ turns it into a single multilinear polynomial $\widehat q:\mathbb F^{m}\to\mathbb F$.

During a jagged-PCS proof, the verifier never manipulates the jagged skyline itself. Instead, a small selector polynomial $\widehat f_{t,c}$, computed only from the publicly known cumulative heights $t_y$ and widths $c_y$, redirects any query about a table cell to the matching coordinate of $\widehat q$. As a result, the entire trace needs just one commitment and one opening proof, no matter how many tables or columns the program used.

### Dense versus sparse views

* **Sparse view:** a function $p:\{0,1\}^{n}\times\{0,1\}^{k}\times\{0,1\}^{c}\to\mathbb F$ that is non-zero only inside the jagged silhouette.
* **Dense view:** a flattened function $q:\{0,1\}^{m}\to\mathbb F$ obtained by a simple bijection
  $$
  i\;\mapsto\;(\text{tab}(i),\text{row}(i),\text{col}(i)).
  $$

Committing to $q$ is cheap (just one polynomial), but the verifier still needs to *query* $p$ at structured points when running the usual sumcheck constraints. The jagged PCS provides exactly that translation.
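To make the flattening concrete, here is a minimal Python sketch (illustrative only, not taken from any implementation) of the row-major bijection $i \mapsto (\text{tab}(i), \text{row}(i), \text{col}(i))$, driven purely by the public log-widths $c_y$ and heights $h_y$. It reproduces the mapping used in the worked example below.

```python
# Illustrative sketch: recover the sparse coordinates (tab, row, col) of a
# dense index i from the public shape data (log-widths c_y and heights h_y).

def dense_to_sparse(i, log_widths, heights):
    """Map a dense index i to (table, row, column) under row-major flattening."""
    offset = i
    for tab, (c, h) in enumerate(zip(log_widths, heights)):
        width = 1 << c                   # the table spans 2**c_y columns
        size = width * h                 # number of live cells in this table
        if offset < size:
            return tab, offset // width, offset % width
        offset -= size                   # skip past this table's block
    raise IndexError("index lies in the zero padding beyond the live cells")

# The two-table trace from the example below: table 0 is 2 x 2, table 1 is 1 x 1.
log_widths, heights = [1, 0], [2, 1]
for i in range(5):
    print(i, dense_to_sparse(i, log_widths, heights))
# expected mapping: (0,0,0), (0,0,1), (0,1,0), (0,1,1), (1,0,0), matching the table below
```

The selector polynomial $\widehat f_{t,c}$ is nothing more than a multilinear encoding of this lookup, which is why publishing the heights and widths is enough for the verifier.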
:::info
### Example:
Suppose we have two tables:

- Table 0 ($y=0$) has width $2^{c_0} = 2$ and height $h_0 = 2$.
- Table 1 ($y=1$) has width $2^{c_1} = 1$ and height $h_1 = 1$.

Then the total number of non-zero cells is:

\begin{equation}
M = 2^{c_0} \cdot h_0 + 2^{c_1} \cdot h_1 = 2 \cdot 2 + 1 \cdot 1 = 5,
\end{equation}

so $m = \lceil \log_2 5 \rceil = 3$.

In the dense view, we define $q: \{0,1\}^3 \to \mathbb F$ where each index $i \in \{0,1,2,3,4\}$ maps to a structured triple:

| $i$ | tab(i) | row(i) | col(i) |
|:---:|:------:|:------:|:------:|
| 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 1 |
| 2 | 0 | 1 | 0 |
| 3 | 0 | 1 | 1 |
| 4 | 1 | 0 | 0 |

Now $q(3)$, for example, corresponds to the sparse-view cell $(\text{tab}, \text{row}, \text{col}) = (0, 1, 1)$, that is, table 0, row 1, column 1.
:::

## Commit–Open protocol in a nutshell

### Commit phase

1. The prover flattens the trace into a dense function $q : \{0,1\}^{m} \to \mathbb{F}$ by assigning each non-zero cell a unique index $i$, following the row-major traversal of the jagged matrix.
2. The prover sends:
   * a commitment to the multilinear extension $\widehat q : \mathbb{F}^{m} \to \mathbb{F}$ using any standard dense PCS (e.g., Basefold or Brakedown),
   * the **cumulative heights** $t_y = h_0 + \dots + h_y$ and the **column widths** $c_y$, both of which define how to reconstruct the jagged shape from the flat view.

This allows the verifier to reconstruct the sparse structure symbolically and simulate queries to the original per-table layout, all from a single committed polynomial.

### Proving an evaluation

Suppose the verifier wants to check a single point of the original jagged function:

\begin{equation}
\widehat p(\mathbf z_{\mathrm{row}},\mathbf z_{\mathrm{tab}},\mathbf z_{\mathrm{col}}) = v.
\end{equation}

1. Both prover and verifier rewrite this as a sum over the dense representation (a brute-force check of this identity is sketched after this section):
   \begin{equation}
   \underbrace{\widehat p(\mathbf z_{\mathrm{row}},\mathbf z_{\mathrm{tab}},\mathbf z_{\mathrm{col}})}_{\text{sparse MLE at a jagged coordinate}} \;=\; \sum_{i \in \{0,1\}^{m}} \underbrace{\widehat q(i)}_{\text{dense MLE at flat index } i} \cdot \underbrace{\widehat f_{t,c}(\mathbf z_{\mathrm{tab}}, \mathbf z_{\mathrm{row}}, \mathbf z_{\mathrm{col}}, i)}_{\text{selector: 1 if } i \mapsto (\text{tab}, \text{row}, \text{col})}
   \end{equation}
   where $\widehat f_{t,c}$ is a multilinear *selector polynomial* that equals 1 only when index $i$ maps to the given $(\text{tab}, \text{row}, \text{col})$ tuple, and equals 0 elsewhere.
2. They now run the standard **sumcheck protocol** on this sum:
   - The **prover** goes through every index $i \in \{0,1\}^{m}$, evaluates $\widehat q(i)$ (from the dense commitment), and computes the selector value $\widehat f_{t,c}(\mathbf z_{\mathrm{tab}}, \mathbf z_{\mathrm{row}}, \mathbf z_{\mathrm{col}}, i)$. This step costs about $5M$ field multiplications, plus some small overhead for handling the table coordinates ($2^n + 2^k$ terms).
   - The **verifier** does much less work. She never touches the whole sum; she only needs to evaluate the selector $\widehat f_{t,c}$ at a few points during the protocol. Since $\widehat f_{t,c}$ is built from a compact **read-once branching program (ROBP)** with just 4 memory states, the verifier's total work is only $O(m \cdot 2^k)$ field operations.
3. At the end of the sumcheck protocol, the verifier is left with a **single claim**: that the dense polynomial $\widehat q$ evaluates to some value $v'$ at a random point $z' \in \mathbb F^{m}$. This final check is easy: it uses the original dense commitment (e.g., via Basefold), and completes the proof that $\widehat p(\mathbf z_{\mathrm{row}}, \mathbf z_{\mathrm{tab}}, \mathbf z_{\mathrm{col}}) = v$. The overall soundness error is at most $2m/|\mathbb{F}|$.
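As a sanity check of the flattening identity in step 1, here is a brute-force Python sketch over a toy prime modulus. It rebuilds the sparse and dense views of the two-table example from earlier, tabulates the boolean selector $f_{t,c}$ explicitly, and verifies that $\widehat p(\mathbf z) = \sum_i \widehat q(i)\,\widehat f_{t,c}(\mathbf z, i)$ at a random point. The modulus, helper names, and sample cell values are assumptions made for illustration; a real prover never materialises the selector as a table.

```python
# Brute-force check of the flattening identity on the toy 5-cell trace.
import random

P = 2**61 - 1  # a Mersenne prime used as a toy field modulus

def eq(xs, bs):
    """Multilinear equality polynomial: prod_j (x_j*b_j + (1-x_j)*(1-b_j)) mod P."""
    acc = 1
    for x, b in zip(xs, bs):
        acc = acc * (x * b + (1 - x) * (1 - b)) % P
    return acc

def bits(value, width):
    """Big-endian bit decomposition of `value` into `width` bits."""
    return [(value >> (width - 1 - j)) & 1 for j in range(width)]

def mle(table, num_vars, point):
    """Evaluate the multilinear extension of a sparse value table at `point`."""
    return sum(v * eq(point, bits(idx, num_vars)) for idx, v in table.items()) % P

# Dense view q over {0,1}^3 (m = 3): five live cells with arbitrary sample values.
q = {0: 7, 1: 11, 2: 13, 3: 17, 4: 19}
# Row-major bijection from the example: dense index -> (tab, row, col).
index_map = {0: (0, 0, 0), 1: (0, 0, 1), 2: (0, 1, 0), 3: (0, 1, 1), 4: (1, 0, 0)}

# Sparse view p over (tab, row, col), one bit each, packed big-endian.
p = {(t << 2) | (r << 1) | c: q[i] for i, (t, r, c) in index_map.items()}
# Boolean selector f_{t,c}(tab, row, col, i): 1 iff dense index i maps to (tab, row, col).
f = {((((t << 2) | (r << 1) | c) << 3) | i): 1 for i, (t, r, c) in index_map.items()}

# Flattening identity: p_hat(z) == sum_i q(i) * f_hat(z, i) for any z in F^3.
z = [random.randrange(P) for _ in range(3)]
lhs = mle(p, 3, z)
rhs = sum(q[i] * mle(f, 6, z + bits(i, 3)) for i in range(5)) % P  # i = 5, 6, 7 contribute 0
assert lhs == rhs
print("flattening identity holds at a random point")
```

On the hypercube the identity is immediate, because exactly one selector term fires per live cell and none fires outside the silhouette; multilinearity in $\mathbf z$ then extends it to all of $\mathbb F^{m}$.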
## Making many columns cheap

When the column count $2^{k}$ is large, the factor $2^{k}$ in verifier time is undesirable. Two complementary tricks remove it.

### Jagged assist (batch evaluation proof)

When the verifier needs to compute $\widehat f_{t,c}$ at $2^{k}$ different points (one for each column), doing it herself would be expensive. Instead, she delegates this task to the prover using an interactive protocol called the **jagged assist** in the Succinct paper. This protocol is based on the classic **$2$-to-$1$ trick**, which reduces multiple evaluation claims about a multilinear polynomial to just **one**. Here's how it helps:

- The **verifier** asks the prover to prove the values of $\widehat f_{t,c}$ at all $2^k$ inputs in **one batch**.
- The **verifier** then only has to check **a single evaluation** of $\widehat f_{t,c}$, which is cheap.
- The **prover** does a little more work, about $O(m \cdot 2^k)$ operations, but this is still very efficient, especially since each evaluation of $\widehat f_{t,c}$ only costs $O(m)$.

This trick dramatically reduces verifier cost without adding much overhead for the prover, and it is particularly helpful when the number of columns $2^k$ is large (a toy illustration of the batched check follows the example below).

:::info
### Example:
Suppose the verifier wants to compute $\widehat f_{t,c}(z_{\text{tab}}, z_{\text{row}}, z_{\text{col}}, i)$ at all $2^k = 8$ column points, where $k = 3$. Instead of computing all 8 values directly, the verifier asks the prover to produce a batch proof for:

$$
\left\{ \widehat f_{t,c}(z_{\text{tab}}, z_{\text{row}}, y, i) \mid y \in \{0,1\}^3 \right\}.
$$

The prover evaluates $\widehat f_{t,c}$ at each of the 8 values of $y$ and runs a sumcheck-like protocol to convince the verifier that the answers are correct. In the end, the verifier checks just **one** random linear combination of those evaluations, e.g.:

\begin{equation}
\sum_{y \in \{0,1\}^3} \alpha_y \cdot \widehat f_{t,c}(z_{\text{tab}}, z_{\text{row}}, y, i)
\end{equation}

for some random coefficients $\alpha_y$. This reduces verifier cost from $8 \cdot O(m)$ down to $O(m)$.
:::
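The soundness of this batching rests on a standard fact: if even one of the prover's claimed values is wrong, a random linear combination of the claims differs from the true combination except with probability about $1/|\mathbb F|$. The Python toy below (made-up values standing in for real selector evaluations; in the actual protocol the verifier obtains the combined value from a single delegated evaluation, never from the full list) shows a cheating prover being caught by one combined check.

```python
# Toy illustration of batch verification via a random linear combination.
import random

P = 2**61 - 1  # toy prime modulus

true_vals = [random.randrange(P) for _ in range(8)]  # honest evaluations at the 8 column points
claimed = list(true_vals)
claimed[3] = (claimed[3] + 1) % P                    # a cheating prover fudges one claim

alphas = [random.randrange(P) for _ in range(8)]     # verifier's random coefficients

combined_claim = sum(a * v for a, v in zip(alphas, claimed)) % P
combined_truth = sum(a * v for a, v in zip(alphas, true_vals)) % P  # what the single delegated
                                                                    # evaluation would certify

print("cheater caught:", combined_claim != combined_truth)  # True except with probability ~1/P
```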
### Fancy jagged (group columns into tables)

Sometimes many columns in the trace have the same height, for example when a whole set of columns comes from the same CPU instruction. Instead of treating each column separately, we can **group them into a single table**. This idea leads to the **fancy jagged** optimization:

- Let $2^{k}$ now count **tables** rather than individual columns, and let $c = \max_y c_y$ be the largest column log-width among them.
- Each table $T_y$ is a block of $2^{c_y}$ columns that all share the same height $h_y$.
- Instead of computing a selector polynomial $\widehat f_{t,c}$ that tracks **individual columns**, we use one that targets **entire tables**.
- This selector is still computed by a **read-once branching program (ROBP)**, but now of width $6$ (slightly wider than before, because it must also account for the column index inside a table).

**Why this helps:** the verifier now needs to process only $2^{k_{\text{tables}}}$ possibilities, where $2^{k_{\text{tables}}}$ is the **number of tables**, not the number of columns. This means:

\begin{equation}
\text{Verifier cost} = O(m \cdot 2^{k_{\text{tables}}}),
\end{equation}

which is significantly smaller when there are far fewer tables than columns. This is often the case in real-world zkVM traces, where each instruction maps to a fixed-size table.

:::info
**Example:**
Suppose a zkVM trace has **24 columns**, grouped as follows:

- Columns 0–7 come from **Instruction A** (height = 128),
- Columns 8–15 come from **Instruction B** (height = 64),
- Columns 16–23 come from **Instruction C** (height = 128).

Instead of treating the trace as 24 separate columns, we treat it as 3 tables (one per instruction). The ROBP selector now only distinguishes between these **three tables**, so the verifier's branching scales with the 3 tables rather than with all 24 columns. The internal structure of each table (e.g. the column index within the group) is handled locally, and the selector $\widehat f_{t,c}$ remains efficient, just slightly wider to account for intra-table addressing.
:::

## Why not use a general sparse PCS?

Schemes like Spark, Surge, and Spark++ support arbitrary sparsity but pay for it:

| Scheme | Extra polynomials | Mults per entry | Needs explicit location list? |
| ------------- | ----------------- | ------------------ | ----------------------------- |
| Spark / Surge | ≥ 7 | ≈ 7 | yes |
| Spark++ | ≥ $d^2+3d+2$ | ≥ 12 (for $d=2$) | yes |
| Jagged PCS | 0 | 5 | no |

Because jagged sparsity is highly structured, the cumulative heights $(t_y)$ encode all locations compactly; the prover never commits to extra oracles.

## An intuitive proof sketch

Here's a high-level summary of how the **jagged PCS** proof works, step by step:

1. **Flattening identity:** We start by rewriting the original jagged function $\widehat p(\mathbf z_{\mathrm{row}}, \mathbf z_{\mathrm{tab}}, \mathbf z_{\mathrm{col}})$ using a selector polynomial $\widehat f_{t,c}$. This identity expresses $\widehat p$ as a weighted sum over the flat array $\widehat q(i)$:
   $$
   \widehat p(\mathbf z_{\mathrm{row}}, \mathbf z_{\mathrm{tab}}, \mathbf z_{\mathrm{col}}) \;=\; \sum_{i \in \{0,1\}^m} \underbrace{\widehat q(i)}_{\text{flattened view}} \cdot \underbrace{\widehat f_{t,c}(\mathbf z_{\mathrm{tab}}, \mathbf z_{\mathrm{row}}, \mathbf z_{\mathrm{col}}, i)}_{\text{selector: picks matching index}}
   $$
2. **Sumcheck protocol:** Apply the standard sumcheck protocol to the sum above (a generic sketch of this step appears after the list). This turns a large claim about $M = 2^m$ values into just two smaller claims on multilinear polynomials.
3. **First claim (on $\widehat q$):** One of these claims involves evaluating the committed polynomial $\widehat q$ at a random point $z' \in \mathbb F^m$. This is efficiently handled by the dense PCS (e.g., Basefold), since $\widehat q$ was committed up front.
4. **Second claim (on $\widehat f_{t,c}$):** The other claim involves evaluating $\widehat f_{t,c}$ at that same $z'$. Here's the key trick: because the cumulative heights $t_y$ and widths $c_y$ are known, the selector polynomial $\widehat f_{t,c}$ can be computed by a very compact circuit, specifically a **width-4 or width-6 read-once branching program (ROBP)**. This ROBP:
   * keeps track of a running carry (for table/column decoding),
   * and a flag that marks whether we have gone beyond the table's height.
5. **Efficient evaluation of $\widehat f_{t,c}$:** The ROBP selector is a multilinear function and can be evaluated at any point using dynamic programming across its layers. This takes only $O(m)$ field operations, which is fast even for large $m$.
6. **Soundness guarantee:** The entire argument is a sequence of honest applications of the sumcheck protocol, each with fresh field randomness, so the soundness error is tightly bounded by $2m / |\mathbb F|$, just like any standard sumcheck.
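To ground steps 2–6, here is a self-contained Python sketch of the product sumcheck the protocol runs: $m$ rounds over a claimed sum $\sum_i A(i)\,B(i)$, where $A$ stands in for $\widehat q$ on the hypercube and $B$ for the selector restricted to the verifier's query point. The toy modulus, random stand-in values, and table-based folding are assumptions for illustration; in the real protocol the verifier gets $A$'s final value from the dense commitment and $B$'s from the ROBP, rather than from stored tables.

```python
# Generic sumcheck for the product of two multilinear polynomials, given as
# value tables over {0,1}^m. Illustrative stand-ins for q_hat and f_{t,c}(z, .).
import random

P = 2**61 - 1  # toy prime modulus

def fold(vals, r):
    """Fix the current variable to r: new_table = (1-r) * low_half + r * high_half."""
    half = len(vals) // 2
    return [((1 - r) * vals[i] + r * vals[half + i]) % P for i in range(half)]

def round_poly(A, B):
    """Prover message: evaluations at X = 0, 1, 2 of the degree-2 polynomial
    g(X) = sum over the remaining hypercube of A(X, rest) * B(X, rest)."""
    half = len(A) // 2
    evals = []
    for x in (0, 1, 2):
        s = 0
        for i in range(half):
            a = ((1 - x) * A[i] + x * A[half + i]) % P
            b = ((1 - x) * B[i] + x * B[half + i]) % P
            s = (s + a * b) % P
        evals.append(s)
    return evals

def eval_round_poly(g, r):
    """Evaluate the degree-2 polynomial interpolating (0, g[0]), (1, g[1]), (2, g[2]) at r."""
    inv2 = pow(2, P - 2, P)
    return (g[0] * (r - 1) * (r - 2) * inv2
            - g[1] * r * (r - 2)
            + g[2] * r * (r - 1) * inv2) % P

m = 3
A = [random.randrange(P) for _ in range(2**m)]  # stand-in for q_hat on the cube
B = [random.randrange(P) for _ in range(2**m)]  # stand-in for f_hat(z_tab, z_row, z_col, .)

claim = sum(a * b for a, b in zip(A, B)) % P    # the sum the prover asserts

for _ in range(m):
    g = round_poly(A, B)                        # prover's round message
    assert (g[0] + g[1]) % P == claim           # verifier: consistency with the current claim
    r = random.randrange(P)                     # verifier's fresh randomness
    claim = eval_round_poly(g, r)               # reduced claim for the next round
    A, B = fold(A, r), fold(B, r)               # both parties fix this variable to r

# One point remains. In the jagged PCS the verifier now queries the dense PCS for
# q_hat at the random point and evaluates the selector via the ROBP; the toy just
# reads the folded tables directly.
assert claim == A[0] * B[0] % P
print("sumcheck transcript accepted")
```

Each round message has degree at most 2 in the bound variable, which is where the $2m/|\mathbb F|$ bound in step 6 comes from.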
## Open variants and engineering notes

This section highlights practical tweaks and engineering tricks that improve efficiency when deploying the jagged PCS.

* **Commit to heights, don't send them:** Instead of transmitting the column heights $(h_y)$ directly, you can embed them into the trace itself and include them in the polynomial commitment. A small GKR-style subprotocol can then verify that these heights were used correctly when computing the selector $\widehat f_{t,c}$. This removes the verifier's $2^k$ cost entirely, since she no longer needs to process each column individually.
* **Works well with binary fields and 2-adic tricks:** The jagged PCS is fully compatible with low-degree optimizations such as:
  * **binary field packing** (for compact data representation),
  * and **$2$-adic sumcheck** (which exploits the power-of-two structure for speed).

  These techniques can be layered on top without modification.
* **Tables with non-power-of-two widths:** If a table has a width $w$ that is not a power of two, you can still use the jagged PCS by **splitting it** into multiple sub-tables:
  * Decompose it into at most $\lceil \log w \rceil$ parts, each with a power-of-two width.
  * This keeps the verifier's cost **logarithmic in $w$**, preserving efficiency.

## Conclusion

The jagged PCS bridges two worlds: the natural "many columns per table" structure of zkVM traces, and the efficiency of committing to just one polynomial. By carefully flattening sparse data into a dense view and using a lightweight selector polynomial to translate structured queries, the protocol achieves:

* **One commitment** and **one opening**, no matter how many tables or columns exist,
* **Verifier costs** that scale only with the number of *tables*, not columns,
* **Prover costs** that remain competitive with general-purpose sparse PCS schemes.

Advanced tricks like the jagged assist, grouped tables, and ROBP-based selectors further compress the verifier's effort, often down to a handful of field operations. In practice, the jagged PCS is especially well suited to zkVMs and rollups where traces are naturally columnar and instruction-structured. It offers a principled, modular way to turn familiar table-based arithmetizations into succinct proofs with near-optimal overhead, all while playing nicely with existing SNARK tools like sumcheck, GKR, and 2-adic optimizations.