# Albi Protostar tmp notes
$$
\newcommand{\NP}{\mathsf{NP}}
\newcommand{\Rel}{\mathcal{R}}
\newcommand{\npx}{\mathbf{x}}
\newcommand{\npw}{\mathbf{w}}
\newcommand{\npX}{\mathbf{X}}
\newcommand{\npW}{\mathbf{W}}
\newcommand{\acc}{\mathsf{acc}}
\newcommand{\bin}{\{ 0,1 \}}
\newcommand{\l}{\ell}
\newcommand{\abs}[1]{\left| {#1} \right|}
\newcommand{\set}[1]{\left\{ #1 \right\}}
\newcommand{\Angle}[1]{\left \langle {#1} \right \rangle}
\newcommand{\concat}{\|}
\newcommand{\O}{\mathcal{O}}
\newcommand{\zk}[1]{\, {\color{red}#1}}
\newcommand{\Adversary}[1]{\mathcal{#1}}
\newcommand{\P}{\Adversary{P}}
\newcommand{\V}{\Adversary{V}}
\newcommand{\Adv}{\Adversary{A}}
\newcommand{\Ext}{\Adversary{E}}
\newcommand{\Sim}{\Adversary{S}}
\newcommand{\Algo}[1]{\mathsf{#1}}
\newcommand{\D}{\Algo{D}}
\newcommand{\C}{\mathcal{C}}
\newcommand{\sample}{\stackrel{\$}{\gets}}
\newcommand{\Commit}{\Algo{Commit}}
\newcommand{\Open}{\Algo{Open}}
\newcommand{\Poly}{\Algo{Poly}}
\newcommand{\Pairing}{\Algo{Pairing}}
\newcommand{\state}{\Algo{st}}
% Keys
\newcommand{\Key}[1]{\mathsf{#1}}
\newcommand{\ck}{\Key{ck}}
\newcommand{\pp}{\Key{pp}}
\newcommand{\vp}{\Key{vp}}
\newcommand{\dk}{\Key{dk}}
\newcommand{\vk}{\Key{vk}}
\newcommand{\pk}{\Key{pk}}
% ###
% # Sets with blackboard style
% ###
\newcommand{\FF}{\mathbb{F}}
\newcommand{\GG}{\mathbb{G}}
\newcommand{\NN}{\mathbb{N}}
\newcommand{\BB}{\mathbb{B}}
% 'Left' Group
\newcommand{\GL}{\GG_{1}}
% 'Right' Group
\newcommand{\GR}{\GG_{2}}
% 'Target' Group
\newcommand{\GT}{\GG_{T}}
% Probabilities
\newcommand{\Pr}[1]{\mathbb{P}\left[ {#1} \right]}
\newcommand{\PrCond}[2]{\mathbb{P}\left[ {#1} \ \middle|\ {#2} \right]}
% Pairing Group
\newcommand{\GroupElementNormal}[2]{[#1]_#2}
\newcommand{\GroupElementBig}[2]{\big[{#1}\big]_#2}
\newcommand{\GroupElementVariable}[2]{\left[{#1}\right]_#2}
\newcommand{\gl}[1]{\GroupElementNormal{#1}{1}}
\newcommand{\glb}[1]{\GroupElementBig{#1}{1}}
\newcommand{\glv}[1]{\GroupElementVariable{#1}{1}}
\newcommand{\gr}[1]{\GroupElementNormal{#1}{2}}
\newcommand{\grb}[1]{\GroupElementBig{#1}{2}}
\newcommand{\grv}[1]{\GroupElementVariable{#1}{2}}
\newcommand{\gt}[1]{\GroupElementNormal{#1}{T}}
\newcommand{\gtb}[1]{\GroupElementBig{#1}{T}}
\newcommand{\gtv}[1]{\GroupElementVariable{#1}{T}}
% Multi-Variate polynomials
\newcommand{\pow}{\mathsf{pow}}
\newcommand{\shift}[1]{{#1}^{\circlearrowleft}}
\newcommand{\mv}[1]{\underline{#1}}
\newcommand{\u}{\mv{u}}
\newcommand{\i}{\mv{i}}
\newcommand{\X}{\mv{X}}
\newcommand{\even}{\mathsf{even}}
\newcommand{\odd}{\mathsf{odd}}
\newcommand{\col}[1]{\mathbf{#1}}
$$
This document contains notes and ideas as a result of an implementation of Protostar in halo2. Originally, the plan was to introduce folding as an optional "prover", and implement a decider using the existing PlonK-ish prover. Due to time constraints, only the folding prover and verifier were implemented.
## When is folding useful?
It is important to remember that folding is only a performance improvement over existing SNARK recursion/accumulation techniques, and is limited to the case where the prover is iteratively proving the **same** circuit. The two main improvements are
- **Efficient Prover**: The prover performs fewer MSMs, and no FFTs are required
- **Smaller Recursive Verifier**: The circuit verifying the folding iteration has fewer MSMs and hashes, leading to fewer constraints for an IVC circuit.
It is still unclear what the best strategy is for designing circuits and provers so that proving is asymptotically optimal. What is clear, though, is that folding offers a much larger tradeoff space than existed with PlonK-ish circuit design. Concepts such as circuit size, constraint degrees, and columns all take on different interpretations, and taking these differences into account can lead to much better performance.
## Protostar Implementation
In what follows, we explain how certain concepts in halo2 were adapted to Protostar while detailing some ideas of how they could be done more efficiently.
We also assume familiarity with the architecture of halo2 and use some of its terminology.
### Program trace and IOP
A chip will reference several columns that are part of the entire execution trace. By trace, we refer to the witness values stored in the columns, which are committed to by the prover and queryable by the verifier.
When referring to all columns or challenges of a specific type, we use the vector notation to indicate so. For example, the set of all fixed columns of a circuit is $\vec{\col{f}} = [\col{f}_0, \ldots, \col{f}_{n_\col{f}-1}]$, where $n_\col{f}$ is the number of fixed columns.
#### Constant Columns
Constant columns contain values which are the same for every execution of the circuit. They are invariant to the inputs of the circuit, and can be pre-processed and included as part of the proving key. We denote a generic constant column by $\col{q} \in \FF^n$.
Most constant columns will be of the `Fixed` type which we denote as $\col{f} \in \FF^n$. For example, they could contain coordinates of fixed base points of an ECC scalar multiplication operation, or the constants of a hash function.
We also consider `Selector` columns which contain only binary values, used to indicate whether a constraint should be active at a given row. We denote them as $\col{s} \in \bin^n$.
#### Witness Columns
Witness columns contain data which depends on the circuit's inputs. We denote these columns by $\col{w} \in \FF^n$. They must be committed to during proof generation.
Public inputs/outputs are referred to as `Instance` values which are contained in columns $\col{i} \in \FF^n$. They can serve two purposes:
- They contain values that must be known by the verifier, such as the actual outputs of the computation verified by the circuit. In this case, they are padded with zeros, and hashed in the clear as part of the transcript. This counts as committing to the column, and the verifier can interpolate the values to compute evaluations of the polynomial they define.
- The columns are large but contain values that the verifier knows are valid. For example, it could contain a list of public keys of a set of validators to be looked up during the verification of a signature.
Most circuit values will be contained in `Advice` columns, denoted by $\col{a} \in \FF^n$. These can be considered as all intermediary variables that were constructed as part of the circuit's evaluation. They do not need to be known by the verifier, and are committed to as part of the proof generation.
##### Instance handling
Halo2 currently supports both short instances that can be added in the clear to the transcript, or longer ones that need to be committed and subsequently queried. Committed instances seem to be designed for the case where they correspond to lookup tables. They can be created and verified by another circuit, and then treated as "trusted" by the verifier. For example, it would be used to delegate certain expensive constraints to specialized circuits, such as hashes. Usually, one would use these columns to lookup rows back into the circuit.
For the case where instances are simply values to be used inside the circuit, treating them as separate columns can be somewhat wasteful.
- When the number of instances is small, it would be beneficial to prove their usage in the circuit by leveraging the permutation argument.
- The idea is to virtually extend the permutation argument's domain to include the new instance values, and copy the values into advice columns.
- Instance columns then no longer need to be padded to the size of the circuit.
- If instances need to be committed, they should be treated as committed columns.
#### Challenges
Newer versions of halo2 enable a chip to access randomness that is independent of the inputs. These challenges are sampled from the random oracle during proof generation, and crucially this must happen after the relevant advice columns have been committed to. We denote these as $c \in \FF$.
One of the subtleties with Protostar is that challenges inside expressions are treated as variables rather than constants as is usually the case in PlonK-ish arithmetizations. If we take a RLC like $\sum_{j=0}^{m-1} c^j \cdot w_j$, both $c$ and the witnesses $w_j$ have degree 1, so the total degree of the expression would be $m$.
In order to prevent the degree of expressions from blowing up, we use the following trick: whenever the verifier sends a challenge $c$ to the prover, we assume that the prover responds with $c_2, c_3, \ldots, c_{m-1}$ such that $c_j = c^j$. Since these prover messages are "untrusted", the verifier would need to check the constraints $c_j - c\cdot c_{j-1} = 0$. Fortunately, since $m$ is usually small, the verifier can compute these powers directly, ensuring they are correct by construction. Note that these new challenge powers must be folded in the same way as any other prover message.
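As a sketch of this computation (the helper name is hypothetical, and plain `u64` arithmetic stands in for field arithmetic), the verifier can derive the powers locally:

```rust
/// Derive the challenge powers [c^1, c^2, ..., c^{m-1}] that stand in for
/// the higher-degree challenge terms. Computing them locally means the
/// constraints c_j - c * c_{j-1} = 0 hold by construction.
/// Illustrative sketch, not the halo2 API; u64 stands in for a field element.
fn challenge_powers(c: u64, m: usize) -> Vec<u64> {
    let mut powers = Vec::with_capacity(m.saturating_sub(1));
    let mut acc = c;
    for _ in 1..m {
        powers.push(acc);
        acc = acc.wrapping_mul(c);
    }
    powers
}
```

When the powers instead arrive as untrusted prover messages, the consistency checks amount to verifying `powers[j] == c * powers[j-1]` for each `j`.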
#### Sending fewer columns
The recursion overhead of an IVC circuit for Protostar depends on the size of the folding verifier circuit. This is mainly impacted by the number of witness columns committed (scalar multiplications) and the number of rounds in the witness generation phase (hashes).
It is always possible to lay out a halo2 circuit in such a way that it only contains a single advice column, by zipping the values of multiple columns together and querying the single column at different relative offsets. Unfortunately, this results in worse FFT performance, since the number of field operations scales super-linearly with the number of rows.
Protostar does not require any FFTs, so this is not a concern. Moreover, the performance of the MSM during the commitment phase improves thanks to the asymptotics of Pippenger's algorithm. Therefore, one could expect Protostar to perform much better during the folding phase when only a single advice column is used, as this yields better MSM performance for the prover, and fewer hashes and scalar multiplications for the recursive folding verifier.
This does have an impact on the decider prover, as they perform work which is linear in the maximum column size. In the original PlonK-ish prover, the prover can split a column into multiple ones and batch their evaluations together to minimize the maximum column size. We could argue, though, that in the case of folding, the main circuit would be rather small. Therefore, the amount of work done by the decider should not be too demanding compared to the amount of work done during all folding iterations.
Having a single column also ensures that values which are constrained together are close together in memory, enabling better cache locality during proving.
### Gates and Constraints
A `Circuit` in halo2 is defined by a set of `Chips` which provide fully constrained functions that a user can use to describe the logic of the circuit.
The base halo2 library already ships chips for ECC operations, Poseidon and SHA256 hashes, and more. The goal of a chip is to isolate and abstract the details of writing polynomial constraints.
When a chip is initialized, it calls into the circuit's `ConstraintSystem` to tell it how it is going to generate witness values, and what polynomial constraints must be enforced. More precisely, it tells the constraint system how many columns (`Selector`, `Fixed`, `Instance`, or `Advice`) it requires, whether certain columns can be used to copy values from other parts of the trace (equality-enabled), whether it needs lookups and how to fill in those values, and crucially, the constraints that are applied to the values in the columns.
> The PSE fork also allows chip designers to write constraints which depend on random challenges which are sampled during the proof generation. This is useful for writing a RLC of different values directly into the constraint. The most direct application of this is a batch zero-test of values.
>
> Another recent update introduced a permutation gadget for ensuring that two sets of values across the whole trace are equal. We can for example consider all range-constrained values of a circuit as one set, and copy them to another section of the trace where we apply constraints to ensure they are all in the desired range.
Given access to all the columns and challenges of the types described above, a `Gate` defines the polynomial constraints that are applied to these values to ensure they are valid. In general, a gate will consider a bounded region of the entire circuit trace, and is mostly isolated from the other components of the circuit.
A gate is defined by a set of multivariate polynomials $G_0, \ldots, G_{m-1}$, where their variables reference different columns. Each polynomial is called a _constraint_, and can be written in a general way as
$$
G_j\big( \vec{\col{s}}, \vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big).
$$
There are several subtleties to note.
- A constraint usually will only reference a subset of all the circuit's columns, so some variables are simply unused.
- Constraints can refer to a column at different _rotations_. That is, when we evaluate some $G$ at row $i \in [n]$, we can refer to values of a column $\col{f}$ at row $i + r$, where $r \in \mathbb{Z}$. For simplicity, we can assume that rotated columns are duplicated with the desired cyclic rotation applied. That is, for each rotation $r$ of a column $\col{f}$, we include a new column $\col{f}' \in \FF^n$ such that $\col{f}'_i = \col{f}_{i+r \bmod n}$ for all $i \in [n]$.
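The duplication of rotated columns described above can be sketched as follows (hypothetical helper; `u64` stands in for a field element):

```rust
/// Materialize the rotated copy f' of a column f, with
/// f'[i] = f[(i + r) mod n]. The rotation r may be negative
/// (a rotation "up" in halo2 terms).
fn rotated_column(f: &[u64], r: i64) -> Vec<u64> {
    let n = f.len() as i64;
    (0..n)
        .map(|i| f[(i + r).rem_euclid(n) as usize])
        .collect()
}
```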
From here, we can mostly ignore the categorization of constraints within a specific gate. We will think of the union of all the circuit's gate constraints as the set of polynomials
$$G_0, \ldots, G_{m-1}$$
A circuit is considered valid for a given set of witness columns if
$$
G_j\big( \vec{\col{s}}_i, \vec{\col{f}}_i; \vec{\col{i}}_i, \vec{\col{a}}_i, \vec{c} \big) =0, \quad \forall j \in [m], \forall i \in [n].
$$
That is, when we evaluate each constraint $G_j$ over the values of the columns at row $i$, the polynomial should evaluate to 0.
In order to handle multiple constraints efficiently, we can batch all constraints using a random challenge $y \in \FF$ sampled by the verifier. We would then write
$$
G\big( \vec{\col{s}}, \vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big) = \sum_{j\in[m]} y^j\cdot G_j\big( \vec{\col{s}}, \vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big).
$$
With high probability, the circuit is considered valid if
$$
G\big( \vec{\col{s}}_i, \vec{\col{f}}_i; \vec{\col{i}}_i, \vec{\col{a}}_i, \vec{c} \big) =0, \quad \forall i \in [n].
$$
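At a single row, this batching is just a polynomial evaluation in $y$, which can be computed with Horner's rule (hypothetical helper; `u64` stands in for a field element):

```rust
/// Fold the per-constraint evaluations [G_0, ..., G_{m-1}] at one row into
/// a single value G = sum_j y^j * G_j, using Horner's rule. With high
/// probability over the challenge y, G = 0 iff every G_j = 0 at this row.
/// Sketch only: u64 arithmetic stands in for field arithmetic.
fn batch_constraints(evals: &[u64], y: u64) -> u64 {
    evals
        .iter()
        .rev()
        .fold(0u64, |acc, &g| acc.wrapping_mul(y).wrapping_add(g))
}
```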
#### Selectors
We take a bit of time to discuss the use of _selectors_ and how they are used to describe constraints.
- There is usually only a single selector per constraint, since it is used to toggle whether the corresponding constraint should be active. We can therefore often write it as $$G_j\big( \vec{\col{s}}, \vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big) = \col{s}_j \cdot G_j'\big(\vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big),$$ where $G_j'$ no longer depends on the selectors. Halo2 refers to these top-level selectors as _simple selectors_.
- Gates can use _complex selectors_ to define more complex constraint selection logic. For example, take the following constraint $$\col{s}_j \cdot\Big( (1-\col{s}'_j)\cdot G^{(0)}_j\big(\vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big) + \col{s'}_j\cdot G^{(1)}_j\big(\vec{\col{f}}; \vec{\col{i}}, \vec{\col{a}}, \vec{c} \big) \Big).$$ Here, $\col{s}_j$ is a simple selector, which activates the entirety of the $j$-th constraint, while $\col{s}'_j$ is a complex selector which selects either $G^{(0)}_j$ or $G^{(1)}_j$ at a given row.
- Within a specific gate, it is common that all constraints share the same selector. This happens for instance when the circuit developer uses `Constraints::with_selector` to group several constraints together.
The use of selectors in halo2 seems to be mainly motivated by the [observation](https://zcash.github.io/halo2/design/implementation/selector-combining.html) that multiple independent binary selectors can be combined into a single constant column which selects exactly one constraint at each row. This is an optimization that helps reduce the number of scalar multiplications performed by the verifier. After this optimization takes place, all selector columns are considered as fixed ones.
This does not apply to the folding stage of Protostar, since the number of scalar multiplications performed by the verifier depends only on the number of committed witness columns. In fact, it makes it harder to optimize the folding prover, since it benefits greatly from knowing when constraints will be inactive.
In the folding phase of Protostar, the prover operates on the values in the columns directly, rather than their extension over an FFT domain. This means that when the prover must evaluate a constraint at some row $i\in[n]$, the prover is able to optimize the evaluation of the constraint in that row.
In particular, we can consider the partially evaluated constraint
$$
G_i\big( \vec{\col{i}}_i, \vec{\col{a}}_i, \vec{c} \big) = G\big( \vec{\col{s}}_i, \vec{\col{f}}_i; \vec{\col{i}}_i, \vec{\col{a}}_i, \vec{c} \big), \quad \forall i \in [n].
$$
The idea is that we can "bake in" the values from the constant columns to obtain a specific constraint for the row $i$. If for example, only the first constraint $G_0$ is active (that is, $\col{s}_{0,i} =1$), then we would have
$$
G_i\big( \vec{\col{i}}_i, \vec{\col{a}}_i, \vec{c} \big) = G_0'\big( \vec{\col{f}}_i; \vec{\col{i}}_i, \vec{\col{a}}_i, \vec{c} \big).
$$
When evaluating the constraint at row $i$, we could entirely skip the evaluations of $G_1,\ldots, G_{m-1}$ since we know their evaluation would be cancelled by their respective selectors.
There are many more observations that can be made surrounding selectors. For instance, they also enforce the linear independence of different constraints so that they do not need to be multiplied by the challenge $y$. This is the case for the example of complex selectors, where $G^{(0)}_j$ and $G^{(1)}_j$ are guaranteed to be independent, since only one of them is active at any given row.
Looking at the actual constraint in each row (after partially evaluating in the constant columns), each polynomial $G_i$ will have a specific degree $d_i$, which will likely be lower than $\deg G$. As we will see during the _ErrorCheck_, $G_i$ will need to be evaluated $\approx d_i$ times. Therefore, knowing exactly what to evaluate and how many times will make the folding prover more efficient.
#### Expressions
In halo2, a constraint is described as an `Expression`, a binary tree where
- Leaves correspond to variables such as
- `Constant(F)`
- `Challenge(idx)`
- `Selector(col, is_simple)`
- `Fixed(col, rot)`
- `Instance(col, rot)`
- `Advice(col, rot)`
- Nodes correspond to mathematical operations such as
- `Mul(e1, e2)`
- `Add(e1, e2)`
- `Neg(e)`
- `Scale(s, e)`
Users can easily build an `Expression` by using the overloaded `+,*,-` operators. Unfortunately, this also means that they can only be built at run-time, since child nodes must be allocated on the heap.
An `Expression` can be evaluated by recursively traversing the tree. This is done many times during a proof generation (approximately $d\cdot n$ times, where $d$ is the maximum degree of the overall expression). Since nodes contain references to their children, this incurs pointer-chasing at every single evaluation, which can lead to worse performance than an inlined evaluation function.
A useful property of `Expressions` is that they can be easily manipulated in code. For example, we can map from one type of expression to another while preserving the tree structure. The optimized evaluation sub-routine of PSE works along these lines to create a tree that can be evaluated more efficiently, using better variable caching.
#### Challenge Substitution
In Protostar, the degree of expressions depends on the degree of challenge leaves, since these need to be interpolated from two different transcripts.
For example, the expression for checking whether $w_0,w_1,w_2$ are all 0
$$
G(c,w_0, w_1, w_2) = w_0 + c\cdot (w_1 + c \cdot w_2)
$$
has degree 3. If we introduce the symbolic variables $c_1 = c, c_2 = c^2$, we can write it instead as
$$
G'(c_1, c_2, w_0, w_1, w_2) = w_0 + c_1 \cdot w_1 + c_2 \cdot w_2
$$
Essentially, we replace products of a challenge by a variable representing its power, such that $G'$ now has degree 2.
To perform the above substitution, we can apply the following recursive algorithm
```
// Multiplies G by c^p, such that only leaf nodes are multiplied by challenges.
fn multiply(G, c, p = 0):
    match G:
        // Replace a "multiplication by c" node with its child e,
        // and multiply e by c^{p+1}
        c * e => multiply(e, c, p+1)
        // addition and negation simply recurse over children
        e1 + e2 => multiply(e1, c, p) + multiply(e2, c, p)
        -e => -multiply(e, c, p)
        // replace the product c * c^p
        // with the symbolic variable c_{p+1} = c^{p+1}
        c => c_{p+1}
        // replace the product v * c^p of a non-challenge leaf
        // with a multiplication by the symbolic variable c_p
        v => v * c_p
```
The intuition is that we let the multiplication by $c^0$ trickle down from the root. When we encounter a challenge, we accumulate it into $c^p$, and continue recursing until we encounter a leaf. If this leaf is a challenge, we replace it with the symbolic power $c_{p+1}$; otherwise, we multiply it by $c_p$.
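The algorithm above can be made runnable with a toy expression type (all names here are illustrative stand-ins for halo2's `Expression`, not its actual API):

```rust
/// Toy expression tree mirroring the pseudocode above.
#[derive(Debug, PartialEq)]
enum Expr {
    Var(usize),               // witness leaf w_i
    Challenge,                // the challenge c
    ChallengePow(usize),      // symbolic variable c_p = c^p
    Scaled(Box<Expr>, usize), // leaf multiplied by the symbolic c_p
    Add(Box<Expr>, Box<Expr>),
    Neg(Box<Expr>),
    MulByC(Box<Expr>),        // a "c * e" node
}

/// Push multiplications by the challenge down to the leaves, replacing
/// accumulated products c^p with symbolic variables c_p. Call with p = 0.
fn substitute(e: Expr, p: usize) -> Expr {
    match e {
        // absorb the "multiplication by c" node into the exponent
        Expr::MulByC(inner) => substitute(*inner, p + 1),
        // addition and negation recurse over children
        Expr::Add(a, b) => Expr::Add(
            Box::new(substitute(*a, p)),
            Box::new(substitute(*b, p)),
        ),
        Expr::Neg(a) => Expr::Neg(Box::new(substitute(*a, p))),
        // a challenge leaf times c^p becomes the symbolic c_{p+1}
        Expr::Challenge => Expr::ChallengePow(p + 1),
        // any other leaf is scaled by the symbolic c_p (c_0 = 1)
        leaf => {
            if p == 0 {
                leaf
            } else {
                Expr::Scaled(Box::new(leaf), p)
            }
        }
    }
}
```

Applied to $w_0 + c\cdot(w_1 + c\cdot w_2)$, this produces $w_0 + c_1 \cdot w_1 + c_2 \cdot w_2$, matching the degree-2 form $G'$ above.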
#### Compile-time expressions
Rather than encoding constraints as run-time trees, we can use Rust's type system to define them as actual types. This would ensure that the compiler can emit optimized code for evaluating the polynomial. Unfortunately, we would lose the ability to create constraints as easily as in halo2, since it is not possible to use the `*,+,-` operators in this way.
Since constraints could no longer be described directly as polynomials, we would have to use specialized functions to build expressions. This is done in [plonky3](https://github.com/Plonky3/Plonky3/blob/main/air/src/air.rs) and leads to constraints that are easier to parse in code, since their intent is clearer.
### Permutations
Permutations for PlonK-ish circuits are enforced via linear constraints of the form $w_i - w_{\sigma(i)} = 0$, which results in a trivial error term and can therefore be ignored by the folding prover.
While they must still be proved during the decider phase, this should correspond to a marginal cost compared to the many folding iterations. This means that circuits could be designed with many more columns participating in the permutation argument.
### Lookups
Once all instance, advice and challenges have been produced, the prover and verifier engage in the lookup argument to prove that certain elements of the trace are contained in another set of (usually) preprocessed columns. This is useful for storing all precomputed inputs and outputs of functions which are generally not friendly to polynomial constraints. Examples include binary operations like XOR, AND, etc, or S-boxes for hash functions.
There are several candidate constructions for handling lookups, including Plookup, halo2's, and more recently logUp. The first two are built on top of the original permutation argument, wherein the prover commits to permutations of the input and table columns. This is undesirable since it incurs an MSM of the size of the table at every folding step.
The logUp technique is more efficient since the prover only commits to columns of the same size as the number of entries looked up. While some of these vectors are of length $T$, they only contain $L$ non-zero elements, which corresponds approximately to the number of scalar multiplications required for the commitment. Moreover, the Protostar paper details a technique for ensuring that the amount of work performed by the prover during the folding phase stays linear in the number of lookups $L$.
#### Efficient multiplicities computation
Geometry has [implemented](https://github.com/geometryresearch/halo2) logUp for halo2. Unfortunately, the API exposed for looking up a value from a table does not have an efficient mechanism for obtaining the index of the table row that the value was queried at. This index is required during proving to efficiently generate the commitment to the multiplicities column $m$, which maps each entry in the table to the number of times it was looked up. Instead, the prover has to recompute this mapping at each folding iteration, which incurs a cost linear in the table size.
For a pre-processed function evaluation table, the desired behavior would be for the user to provide the inputs of the memoized function, and let the circuit builder compute the corresponding index of this evaluation in the table, so it can fetch the output from the table and store it as part of the witness. Under the hood, the circuit builder would store the index of the lookup to facilitate the creation of the $m$ column.
For unindexed lookups, tables could be stored as maps which associate entries with their index.
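This bookkeeping can be sketched as follows (hypothetical helpers; `u64` entries stand in for field elements):

```rust
use std::collections::HashMap;

/// Map each table entry to its row index, so a lookup can record in O(1)
/// which row it hit. Sketch of the "tables stored as maps" idea above.
fn index_table(entries: &[u64]) -> HashMap<u64, usize> {
    entries.iter().enumerate().map(|(i, &e)| (e, i)).collect()
}

/// Build the multiplicities column m from the recorded lookup indices,
/// in time linear in the number of lookups rather than the table size.
fn multiplicities(table_size: usize, lookup_indices: &[usize]) -> Vec<u64> {
    let mut m = vec![0u64; table_size];
    for &i in lookup_indices {
        m[i] += 1;
    }
    m
}
```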
#### Combining fixed tables
halo2 allows circuit designers to provide several distinct lookup arguments, where each lookup defines a separate table. Unfortunately, this means that each lookup argument must be proved separately, and greatly increases the number of commitments sent by the prover during folding. This makes sense when some tables are fixed, and others are "online" since both sets are committed to at different times, but this comes at a large cost for the recursive verifier as it must process more commitments.
For fixed tables, it is easy to combine all tables into a single one by concatenating them and including an additional fixed "table selector" column to indicate which table is actually looked up (preventing collisions between tables). Padding each entry with zeros would not affect the prover during proving, since it knows how many entries are actually in each row.
The end result would be a reduction in the number of commitments sent by the prover, and would therefore lead to a smaller recursive verifier circuit.
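The concatenation with a table-selector tag can be sketched as follows (hypothetical helper; `u64` stands in for a field element):

```rust
/// Concatenate several fixed tables into one, prepending a "table selector"
/// tag to each row so that entries from different tables cannot collide.
/// Rows are zero-padded to the width of the widest table.
fn combine_tables(tables: &[Vec<Vec<u64>>]) -> Vec<Vec<u64>> {
    let width = tables
        .iter()
        .flat_map(|t| t.iter().map(|row| row.len()))
        .max()
        .unwrap_or(0);
    let mut combined = Vec::new();
    for (tag, table) in tables.iter().enumerate() {
        for row in table {
            let mut out = vec![tag as u64 + 1]; // tag 0 reserved for padding
            out.extend_from_slice(row);
            out.resize(width + 1, 0);
            combined.push(out);
        }
    }
    combined
}
```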
#### Online lookups
In the STARK literature, lookup arguments are used as "communication busses" between different parts of the execution trace. Entire rows can be copied from one section to another, where the goal is to apply heavier constraints only to the rows where it is necessary. For example, Miden has a separate STARK trace for hashes, so that the hash constraints do not need to be applied to the main part. This is not necessary in Protostar, since we can already conditionally apply constraints to certain rows by looking at selectors.
Another use case for them is implementing RAM inside a circuit. The main idea is that RAM can be simulated by checking all timestamps and accesses to ensure consistency across all rows. Every time the RAM is read or written to, the row is copied to another section of the trace, where these entries are sorted by their index and timestamp. Constraints ensure, for example, that consecutive reads have the same value, and that they are preceded by a write operation.
In the context of Protostar, we do not actually need to store these copies in different tables, but can simply store them at the end of the main trace. Selectors ensure the constraints are only applied to the relevant sections. Since RAM can be a fundamental part of circuit design, it would make sense to integrate this support directly.
#### Compressed Verifier
### ErrorCheck
### Decider
