# STARK Engine
Now that we are familiar with virtual machines, the Brainfuck ISA, and STARKs, we can bring it all together to see how our STARK engine works. This Turing-complete zk-STARK engine consists of a virtual machine, a prover, and a verifier. The instruction set it uses is Brainfuck, which was discussed earlier. Brainfuck was chosen as the target language for its well-known and simple instruction set, but the engine can be generalized to arbitrary instruction set architectures as well.
---
## Interaction between the VM, Prover, and Verifier
Given a Brainfuck program and a standard input, both of which are public to the prover and verifier:
1. The VM takes this program and runs it with corresponding inputs to generate a trace and output.
2. The prover takes the program, input, output, and trace and gives a proof of computational integrity.
3. The verifier verifies the integrity of this proof with access to the program, input, output, and the proof itself.
The diagram below shows how these parties interact with each other and what they have access to.

---
## Register Set of the VM
To understand the trace that our VM generates, we need to look at the register set of our VM, which consists of:
- `clk`: Represents the number of state-change cycles up to that point. Its value increases by one every cycle.
- `ip`: Represents the index of the current instruction in the Brainfuck program.
- `ci`: Represents the value of the current instruction (i.e., one of the 8 values in the BF ISA).
- `ni`: Represents the value of the next instruction.
- `mp`: Represents an address in RAM.
- `mv`: Represents the value of the RAM at the address.
- `inv`: Represents the multiplicative inverse of `mv` in the prime field when `mv` is nonzero, and zero otherwise.
The entire RAM is initialized to zero, i.e. for all `mp`, the initial value of `mv` is zero. At the start of the computation, `mp` is set to zero as well.
Now, our VM, following the von Neumann architecture, can:
1. read the next instruction from the program and decode it;
2. (possibly) read a word from memory or standard input;
3. update the register set in accordance with the instruction;
4. (possibly) write a word to memory or standard output.
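To make this cycle concrete, below is a minimal sketch (in Python, not the engine's actual code) of a Brainfuck VM that records one trace row per cycle. The `inv` column is omitted since it is derivable from `mv`.

```python
def run_bf(program: str, inputs: list, p: int = 2**64 - 2**32 + 1):
    """Run a Brainfuck program, recording one (clk, ip, ci, ni, mp, mv)
    trace row per cycle. Illustrative sketch only."""
    inputs = list(inputs)          # don't mutate the caller's list
    # precompute matching brackets for [ and ]
    match, stack = {}, []
    for i, ch in enumerate(program):
        if ch == '[':
            stack.append(i)
        elif ch == ']':
            j = stack.pop()
            match[i], match[j] = j, i

    ram = {}                       # RAM initialized to zero everywhere
    ip = mp = clk = 0
    trace, output = [], []
    while ip < len(program):
        ci = program[ip]
        ni = program[ip + 1] if ip + 1 < len(program) else ''
        trace.append((clk, ip, ci, ni, mp, ram.get(mp, 0)))
        if ci == '>':
            mp += 1
        elif ci == '<':
            mp -= 1
        elif ci == '+':
            ram[mp] = (ram.get(mp, 0) + 1) % p
        elif ci == '-':
            ram[mp] = (ram.get(mp, 0) - 1) % p
        elif ci == '.':
            output.append(ram.get(mp, 0))
        elif ci == ',':
            ram[mp] = inputs.pop(0)
        elif ci == '[' and ram.get(mp, 0) == 0:
            ip = match[ip]         # skip the loop body
        elif ci == ']' and ram.get(mp, 0) != 0:
            ip = match[ip]         # jump back to the loop start
        ip += 1
        clk += 1
    return trace, output
```

Running `run_bf("++>+.", [])`, for example, yields the output `[1]` and a five-row trace in which `clk` increases by one every cycle.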

---
## Tables
The trace that the VM returns is a set of tables which record the state changes of the register set. We will first look at each of these tables briefly.
### Processor Table
The processor table represents the state changes of all 7 registers discussed above during the entire computation. Its rows are sorted according to `clk`, or the cycle number.
### Memory Table
The memory table is derived from the processor table by taking the state changes (or corresponding columns) of 3 registers, `clk`, `mp` and `mv`, and then sorting the rows first by `mp` and secondly by `clk`.
#### What is the need for a memory table?
There are two problems we wish to address:
1. The initial value of all memory cells is zero, but these zero-valued memory cells can be accessed in the middle of the computation. In the trace or the rows of the processor table, there is no boundary constraint that can be enforced at a fixed row or memory pointer.
2. When the `mp` is reset to a previous value in the processor table, `mv` changes. Its new value now must be consistent with the value it had the last time the memory pointer pointed to it. This constraint pertains to two rows, but the problem is that these two rows are generally not consecutive and not even spaced apart by a static amount in the processor table.
After we have a table, whose columns represent the state changes of certain registers during the computation, we can apply boundary constraints on the first row to enforce the initial state of the registers. We can apply transition constraints on two consecutive rows to enforce computational integrity of the registers’ change of state.
We can also enforce terminal constraints, which we’ll talk about later on.
Now, since transition constraints can only apply to **consecutive pairs of rows**, we need a table in which the rows relevant to the issues above become consecutive, so that constraints on `mv` for a change in `mp` can be enforced. This would be possible if only the table’s rows were sorted not by `clk` but by `mp` first and then by `clk`. This purpose is served by the memory table.
The transition constraints that address the issues raised above can now be articulated:
- Whenever `mp` changes, the new `mv` is zero (since RAM is initialized to zero).
- Whenever `mp` is unchanged and the `clk` jumps by more than one, the `mv` cannot change.
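These two constraints can be sketched directly on the sorted rows (illustrative Python over `(clk, mp, mv)` triples, not the engine's code):

```python
def memory_table(processor_rows):
    """Re-sort the (clk, mp, mv) rows by memory pointer first, then cycle."""
    return sorted(processor_rows, key=lambda r: (r[1], r[0]))

def check_memory_constraints(mem):
    """Enforce the two transition constraints on consecutive row pairs."""
    for (clk, mp, mv), (clk2, mp2, mv2) in zip(mem, mem[1:]):
        if mp2 != mp:
            # a freshly visited cell must hold zero
            assert mv2 == 0
        elif clk2 - clk > 1:
            # same cell revisited after a clk gap: value must be unchanged
            assert mv2 == mv
    return True

# (clk, mp, mv): cell 0 is incremented, cell 1 visited, cell 0 revisited
rows = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 0, 1)]
assert check_memory_constraints(memory_table(rows))
```

Note how the revisit of cell 0 at `clk = 3` becomes adjacent to `clk = 1` after sorting, so a consecutive-row check suffices.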
Therefore the processor table constraints enforce the correct evolution of the register set through time. And the memory table constraints enforce the consistency of RAM accesses. Since it is important to have both to prove computational integrity, the prover commits to both tables. The verifier also verifies two sets of AIR constraints, one for the processor table and one for the memory table.
But one question remains: How does the verifier verify that these tables pertain to the same execution trace or the same computation for the same Brainfuck program and standard input?
#### Consistency of memory and processor table
Here is where [**permutation arguments**](https://hackmd.io/@Muskan05/r1Q8_I0Uke#Permutation-Argument) come into the picture. A permutation argument establishes that the sets of rows (only considering `clk`, `mp` and `mv`) of the processor and memory tables are identical.
Without loss of generality, let’s assume that the prover has access to random scalars $a$, $b$, $c$ and $\alpha$ (we will later see where we get these random scalars from) supplied by the verifier. The prover then uses these scalars to compress all three columns into one (here `clk`, `mp`, and `mv`), simply by taking a weighted vectorial sum:
$$ a \cdot clk + b \cdot mp + c \cdot mv $$
The verifier uses the same scalars to compute the same weighted sum in select coordinates but obviously never accesses the entire vector. This construction of taking weighted sums allows us to reduce a claim about the identical sets of **rows of two tables**, to an equivalent but easier to prove claim about the identical sets of **elements of two columns**.
We now apply a permutation argument on these two compressed columns, $c_i$ (of the processor table) and $k_i$ (of the memory table), by extending each with a new column that computes the running product of the permutation argument, accumulating one element of the column at each step using the random scalar $\alpha$. We then compare the terminal value of the extension column, i.e. the final product, to check if both tables have the same state changes in permuted order.
| $c_i$ | $e$ |
|-------|-----|
| $w$ | $\alpha - w$ |
| $x$ | $(\alpha - w)(\alpha - x)$ |
| $y$ | $(\alpha - w)(\alpha - x)(\alpha - y)$ |
| $z$ | $(\alpha - w)(\alpha - x)(\alpha - y)(\alpha - z)$ |

| $k_i$ | $e$ |
|-------|-----|
| $x$ | $\alpha - x$ |
| $y$ | $(\alpha - x)(\alpha - y)$ |
| $w$ | $(\alpha - x)(\alpha - y)(\alpha - w)$ |
| $z$ | $(\alpha - x)(\alpha - y)(\alpha - w)(\alpha - z)$ |
If the two columns really do contain the same values but in a different order, then the products should be the same: $\prod_i (\alpha - c_i) = \prod_i (\alpha - k_i)$.
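The whole argument fits in a few lines (Python; the scalar values stand in for the verifier's randomness and are arbitrary placeholders):

```python
p = 2**64 - 2**32 + 1          # Goldilocks prime, as used later in the text
a, b, c, alpha = 3, 5, 7, 13   # stand-ins for verifier-supplied random scalars

def running_product(column, alpha):
    """Extension column: e_i = prod_{j<=i} (alpha - c_j) mod p."""
    acc, out = 1, []
    for v in column:
        acc = acc * (alpha - v) % p
        out.append(acc)
    return out

proc_rows = [(0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 0, 1)]   # (clk, mp, mv)
mem_rows = sorted(proc_rows, key=lambda r: (r[1], r[0]))   # sorted by (mp, clk)

compress = lambda r: (a * r[0] + b * r[1] + c * r[2]) % p  # weighted sum
t_proc = running_product([compress(r) for r in proc_rows], alpha)[-1]
t_mem = running_product([compress(r) for r in mem_rows], alpha)[-1]
assert t_proc == t_mem   # same multiset of rows => equal terminals
```

Changing any single row of either table changes its terminal with overwhelming probability, since $\alpha$ is random in a large field.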
### Instruction Table
The instruction table is made by combining the rows of the processor table and of the program itself, for three registers `ip`, `ci` and `ni`, and then sorting them according to `ip`.
#### What is the need for an instruction table?
We need to ensure that the order and indexing of instructions in the program (i.e. `ip`, the corresponding `ci`, and `ni`) are consistent with the order and indexing of instructions in the processor table. But because the processor table is sorted by `clk`, not `ip`, it is not possible to check this consistency between the program and the computation over consecutive pairs of rows (recall from the memory table how transition constraints can only be applied to consecutive rows). Therefore we need an instruction table to check this consistency.
#### How does the instruction table do this?
Now, since transition constraints can only apply to consecutive pairs of rows, we combine the rows of the program and the processor table, sorted by `ip`. To ensure consistency of order and of the relation between `ip`, `ci` and `ni`, we again use arguments. The argument used here is an [**evaluation argument**](https://hackmd.io/@Muskan05/r1Q8_I0Uke#Evaluation-Arguments). An evaluation argument, as discussed before, ensures that the instructions of the program occur in the same order in the processor table as well.
Again, without loss of generality, let’s assume that the prover has access to random scalars $d$, $e$, $f$ and $\beta$ (we will later see where we get these random scalars from) supplied by the verifier. The prover then uses these scalars to compress multiple columns into one (here, `ip`, `ci`, and `ni`), simply by taking a weighted vectorial sum, just like in the memory table’s case.
$$d \cdot ip + e \cdot ci + f \cdot ni$$
Again, this construction of taking weighted sums allows us to reduce a claim about the identical sets of **rows of tables**, to an equivalent but easier to prove claim about the identical sets of **elements of a column**.
We now apply an evaluation argument on this compressed column (of the instruction table) by extending it with a new column that computes the running sum of the evaluation argument, using the random scalar $\beta$ for the accumulation step.
The very important thing to note here is that, unlike the permutation argument discussed for the memory table, each element or row is **not accumulated at every step**, because the processor table can have many occurrences of the same `ip`, `ci` and `ni` (due to the looping nature of the instructions `[` and `]`). Since we want to prove the consistency of the registers with the program, we only need to accumulate one instance of each repeating row. This is achieved by applying a condition: whenever the next row’s `ip` is the same as the current one, the current row is not accumulated.
Once we compute the running-sum terminal under this condition, we can compare it with the program evaluation terminal supplied by the verifier. If the two values are the same, we can be convinced that the program appears, in the same order, as a sublist of the processor table sorted by `ip`.
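This conditional accumulation can be sketched as follows (Python; the compressed row values and the challenge are made up, and a Horner-style accumulation is used as one possible instantiation of the running evaluation):

```python
p, beta = 2**64 - 2**32 + 1, 13   # beta stands in for the verifier's challenge

def eval_terminal(values, skip):
    """Horner-style running evaluation, skipping flagged rows."""
    acc = 0
    for v, s in zip(values, skip):
        if not s:
            acc = (acc * beta + v) % p
    return acc

# instruction-table column: ip values repeat while a loop re-executes
ips = [0, 0, 1, 2, 2]
vals = [10, 10, 20, 30, 30]       # compressed (ip, ci, ni) rows (made up)
# skip a row whenever the NEXT row has the same ip
skip = [i + 1 < len(ips) and ips[i + 1] == ips[i] for i in range(len(ips))]

# evaluating the program's three unique rows directly gives the same terminal
direct = 0
for v in [10, 20, 30]:
    direct = (direct * beta + v) % p
assert eval_terminal(vals, skip) == direct
```

Each program row enters the terminal exactly once, however many times the loop re-executes it.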
Now that this is out of the way, how do we ensure that the repeating rows of the instruction table are consistent with those of the processor table, i.e. just sorted by `ip` and not tampered with?
#### Consistency of instruction and processor table
Here we use a [**permutation argument**](https://hackmd.io/@Muskan05/r1Q8_I0Uke#Permutation-Argument) again, which establishes that the sets of rows (only considering `ip`, `ci` and `ni`) of the processor and instruction tables are identical (after removing the rows that came from the program, of course). This is again ensured by extending the compressed (weighted-sum) column, with a condition for accumulating: the current row is accumulated into the running product only when the next row’s `ip` is the same as that of the current row. This ensures the stated purpose.
The terminal of this running product is then compared with the terminal of the running product obtained by applying a permutation argument over the combined element of `ip`, `ci`, and `ni`, in the processor table. If the values are the same, then we are assured that instruction and processor tables are consistent with each other and that the former is correctly derived from the latter by combining with the program's rows.
**Note:** Observe how conditional accumulation is applied to the instruction table’s arguments and column extensions. The permutation argument on the processor table accumulates every row, just as it does in the memory table case above. These conditionals are required for the instruction table because it combines rows from two different sources and checks two separate consistencies.
### Input Table
The input table has those rows of the processor table where `ci`, the current instruction, is `,` (the Brainfuck instruction for input). It only has the register `ci`, which is extended to apply an evaluation argument on the `ci` column, whose terminal value is then compared with the input evaluation argument of the processor table. The consistency of the terminal values, i.e. the last row’s running sum, ensures that the input order of the processor table (visible as a trace only to the prover) is consistent with the inputs of the input table (which is public to both prover and verifier).
### Output Table
The output table has those rows of the processor table where `ci`, the current instruction, is `.` (the Brainfuck instruction for output). It only has the register `ci`, which is extended to apply an evaluation argument on the `ci` column, whose terminal value is then compared with the output evaluation argument of the processor table. As in the case of the input table, the consistency of the terminal values, i.e. the last row’s running sum, ensures that the output order of the processor table (visible as a trace only to the prover) is consistent with the outputs of the output table (which is public to both prover and verifier).
---
## All Tables, Registers, and Arguments summarized:
- **Processor table** (`clk`, `ip`, `ci`, `ni`, `mp`, `mv`, `inv`; sorted by `clk`)
  - Memory permutation argument (**mpa**: `clk`, `mp`, `mv`)
  - Instruction permutation argument (**ipa**: `ip`, `ci`, `ni`)
  - Input evaluation argument (**iea**: `ci`, where `ci = ,`)
  - Output evaluation argument (**oea**: `ci`, where `ci = .`)
- **Memory table** (`clk`, `mp`, `mv`; sorted by `mp`, then `clk`)
  - Processor permutation argument (**ppa**: `clk`, `mp`, `mv`)
- **Instruction table** (`ip`, `ci`, `ni`; sorted by `ip`)
  - Processor permutation argument (**ppa**: `ip`, `ci`, `ni`)
  - Program evaluation argument (**pea**: `ip`, `ci`, `ni`)
- **Input table** (`ci`, where `ci = ,`)
  - Evaluation argument (**ea**: `ci`, where `ci = ,`)
- **Output table** (`ci`, where `ci = .`)
  - Evaluation argument (**ea**: `ci`, where `ci = .`)
### Terminal Comparisons for Consistency
We compare the terminals of the mentioned arguments to ensure consistency as follows:
- `Tmpa` (of processor) = `Tppa` (of memory)
- `Tipa` (of processor) = `Tppa` (of instruction)
- `Tiea` (of processor) = `Tea` (of input, public to both prover and verifier)
- `Toea` (of processor) = `Tea` (of output, public to both prover and verifier)
- `Tpea` (of instruction) = program evaluation terminal given by verifier
The diagram given below summarizes how the tables are related to each other, and to the input, output, and program.

---
## Table Interface and Interpolation
Every table discussed above comes with AIR constraints that establish the correct state changes of the registers. Moreover, the terminals of the arguments assert the correctness of interactions between them, as discussed above.
We want to translate this notion into the source code, so it makes sense to create a Table struct that captures the important data to prove the VM computation.
Below is a list of objects contained in the Table struct:
* `field`: In our case, we are using the Goldilocks field.
* `matrix`: Represents the part of the algebraic execution trace that this table captures.
* `base_width`: The width of the matrix before extension.
* `full_width`: The width of the matrix after extension.
* `length`: The number of rows in the table before padding.
* `height`: The table's length rounded up to the next power of two. The table is padded to reach this height.
* `generator`: The generator of the multiplicative group of the finite field, common to all tables, which is used to interpolate the columns.
* `order`: The multiplicative order of the generator.
* `omicron`: The generator of the subgroup of order `height` over which the columns are interpolated.
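As a rough sketch, the struct might look like this (a Python dataclass used for illustration; the field names follow the list above, while the `generator` and `order` defaults are placeholders, not the engine's actual values):

```python
from dataclasses import dataclass

@dataclass
class Table:
    field: int        # prime modulus of the field (Goldilocks in our case)
    matrix: list      # this table's slice of the algebraic execution trace
    base_width: int   # number of columns before extension
    full_width: int   # number of columns after extension
    length: int       # number of rows before padding
    generator: int = 7              # placeholder generator of the field's group
    order: int = 2**64 - 2**32      # placeholder multiplicative order
    height: int = 0   # computed below: next power of two >= length

    def __post_init__(self):
        h = 1
        while h < max(self.length, 1):
            h *= 2
        self.height = h
        # omicron (a generator of the order-`height` subgroup) would be
        # derived from `generator` and `order` here
```

For example, a table with `length = 5` gets `height = 8` and is padded with three dummy rows.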
Having discussed all the objects of the Table struct, let us now talk about the interpolation of each table.
There are a total of five tables and their respective registers as stated in the summary. We establish three types of AIR constraints:
- boundary
- transition
- terminal
(The details of these constraints will be discussed in the [next blog](https://hackmd.io/@Muskan05/B1AGebo8Jx))
To form the polynomial equations of these AIR constraints for all the tables, we interpolate their columns using [Lagrange interpolation](https://hackmd.io/@Muskan05/rJb7hejU1x#Lagrange-Interpolation) to form polynomials for each of the register columns. We also form three kinds of zerofier polynomials:
- boundary zerofiers
- transition zerofiers
- terminal zerofiers
For each table, their respective zerofiers divide the corresponding AIR constraints (formed by the interpolated polynomials of registers), to give quotient polynomials which enforce the given constraints for the rows they are applied to.
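A toy illustration of this division over a small prime field (illustrative Python; the engine works over Goldilocks, and the real constraint polynomials come from the interpolated register columns):

```python
p = 97  # toy prime standing in for the Goldilocks field

def poly_mul(a, b):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_divmod(num, den):
    """Polynomial long division mod p; returns (quotient, remainder)."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    inv_lead = pow(den[-1], p - 2, p)
    for k in range(len(q) - 1, -1, -1):
        q[k] = num[k + len(den) - 1] * inv_lead % p
        for j, d in enumerate(den):
            num[k + j] = (num[k + j] - q[k] * d) % p
    return q, num   # remainder is all zero iff den divides num

# zerofier for the domain points {1, 2, 3}: Z(x) = (x-1)(x-2)(x-3)
Z = [1]
for pt in (1, 2, 3):
    Z = poly_mul(Z, [(-pt) % p, 1])

constraint = poly_mul(Z, [5, 1])   # a constraint that vanishes on the domain
quotient, rem = poly_divmod(constraint, Z)
assert all(r == 0 for r in rem)    # divisible: the constraint holds everywhere
```

A constraint polynomial that does *not* vanish on every domain point leaves a nonzero remainder, so the quotient fails to be a polynomial, which is exactly what the verifier's low-degree check catches.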
## Codewords
These quotient polynomials are evaluated on an [expanded domain](https://hackmd.io/@Muskan05/rJb7hejU1x#Expanded-domain) and converted into Reed-Solomon codewords.
## Height and Padding of Tables
Since we are going to apply the FRI protocol to the codewords later (similar to what was discussed in the STARKs example), we need the interpolation to happen over a number of rows that is a power of 2. For this, we pad all the tables by adding dummy or padding rows.
Now the question arises: how many padding rows do we add, or what should the `height` in the Table struct be for each table (the power of 2 the table is padded to)?
We need a common height for all tables, or a common domain for all polynomials. This is because we are going to add all quotient polynomials later to form a combination polynomial, and this requires a common domain. It is intuitive that the height we should go for is the maximum of all tables' lengths, rounded up to a power of 2.
Recall that the instruction table has the maximum length (since it contains both the rows of the processor table and the program), so its height, or padded length, will also be the maximum. Therefore this is the height that all the tables are padded to.
Once we have added the number of padding rows satisfying this height for all tables, we interpolate the columns.
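The choice of a common height can be sketched as follows (Python; the table lengths are made-up examples):

```python
def common_height(lengths):
    """Next power of two >= the longest table, shared by every table."""
    h = 1
    while h < max(lengths):
        h *= 2
    return h

# e.g. processor and memory: 40 rows each, program: 12 rows,
# so the instruction table has 40 + 12 = 52 rows and dominates
assert common_height([40, 40, 52, 3, 2]) == 64
```

Every table is then padded with dummy rows up to this shared height (64 in the example) before its columns are interpolated.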
## Base Codewords
Once we've padded all tables received in the trace given by the VM to the instruction table's height, we apply Lagrange interpolation to all the base register columns. We evaluate all these polynomials on the [expanded domain](https://hackmd.io/@Muskan05/rJb7hejU1x#Expanded-domain) to form base codewords for all the tables. These base codewords are zipped together to form a single zipped base codeword.
**How does zipping occur?**
We convert each field element in the codeword to a byte array. These byte arrays are then concatenated, in zipping order, at each respective index of the codeword.
### Need for a zipped codeword
We discussed earlier how the prover receives deterministic random scalars from the verifier to extend the base columns and form extension columns, which implement the evaluation and permutation arguments for the tables. If we were in an interactive setting, the verifier would have sent these random scalars (or challenges, as we'll refer to them in this context) to the prover, but since our zkVM needs to be non-interactive, we will use Fiat Shamir.
For Fiat Shamir, the commitment we send at the start, the Merkle root, is computed over the zipped codeword. From it we derive a random scalar, as discussed in Fiat Shamir earlier. For subsequent challenges, or random scalars, we concatenate this data with the codeword and hash it again to get another random scalar, and so on for as many challenges as we require. Since this is deterministic, the prover cannot cheat, and it allows the verifier to later verify the constraints using these scalars as required.
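A minimal sketch of this challenge derivation (Python with SHA-256; the engine's actual hash function and transcript format may differ):

```python
import hashlib

def fiat_shamir_challenges(commitment: bytes, n: int, p: int) -> list:
    """Derive n field-element challenges deterministically from a commitment.

    Each challenge re-hashes the previous state, so the prover and verifier
    recompute identical scalars from the public commitment alone.
    """
    out, state = [], commitment
    for _ in range(n):
        state = hashlib.sha256(state).digest()
        out.append(int.from_bytes(state, "big") % p)
    return out

p = 2**64 - 2**32 + 1
challenges = fiat_shamir_challenges(b"merkle-root-of-zipped-codeword", 4, p)
assert challenges == fiat_shamir_challenges(b"merkle-root-of-zipped-codeword", 4, p)
```

Because the scalars are a deterministic function of the commitment, the prover cannot grind for favorable challenges after the fact without changing the commitment itself.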
### Extension of Tables
To extend the base columns of the tables into extension columns, we use the random scalars received by applying Fiat Shamir to the zipped base codeword.
The extension columns are then also padded to the instruction table's height and evaluated on the expanded offset domain for all the tables. These extension codewords are zipped similarly, and Fiat Shamir is applied to the zipped extension codeword to get another set of challenges, which are used to form the combination polynomial.
## Combination Polynomial
For each table, once we've formed the AIR constraint equations and divided them by their respective zerofiers, we receive quotient polynomials for every table. Now, we have to verify that these quotient polynomials are of the correct degree bound, or that the prover has not acted maliciously. To verify this by applying the FRI protocol to a single combined evaluation, rather than to the multiple quotient polynomials' evaluations of the five tables, we form a combination polynomial.
**How do we form the combination polynomial?**
We combine the quotient polynomials into one polynomial using random weights, or challenges, received from the verifier. For every quotient polynomial $q_i(X)$, there is a degree bound $b_i$ originating from the known height $H$ (of the instruction table), the AIR constraints, and the zerofier degree. The prover computes the nonlinear combination:
$$
\sum_{i} \left( \alpha_i \cdot q_i(X) + \beta_i \cdot X^{d-b_i} \cdot q_i(X) \right),
$$
where the weights $\alpha_i$ and $\beta_i$ are provided by the verifier, and where $d$ is the maximum degree bound provable by FRI. The codeword associated with this polynomial is the input to FRI.
In other words, the input to FRI is the sum of all codewords, each multiplied by its weight $\alpha_i$ and again by $\beta_i \cdot X^{d-b_i}$. The second part is necessary to adequately bound the individual degrees of the $q_i(X)$.
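Pointwise over the evaluation domain, the combination can be sketched as (Python over a toy prime; the weights, bounds, and domain are arbitrary examples):

```python
p = 97  # toy prime standing in for the Goldilocks field

def combine_codewords(quotients, bounds, alphas, betas, d, domain):
    """comb[j] = sum_i (alpha_i + beta_i * x_j^(d - b_i)) * q_i(x_j) mod p."""
    comb = [0] * len(domain)
    for q, b, al, be in zip(quotients, bounds, alphas, betas):
        for j, x in enumerate(domain):
            # the x^(d - b_i) shift lifts every term to the same degree d
            comb[j] = (comb[j] + (al + be * pow(x, d - b, p)) * q[j]) % p
    return comb

domain = [1, 2, 3, 4]
q1 = [x % p for x in domain]        # evaluations of q1(X) = X,   bound 1
q2 = [x * x % p for x in domain]    # evaluations of q2(X) = X^2, bound 2
comb = combine_codewords([q1, q2], [1, 2], [2, 3], [5, 7], 3, domain)
# at x = 2: 2*2 + 5*2^2*2 + 3*4 + 7*2^1*4 = 4 + 40 + 12 + 56 = 112 = 15 mod 97
assert comb[1] == 15
```

A real implementation would evaluate over the expanded offset domain rather than small integers, but the weighted pointwise sum is the same.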
The FRI protocol is applied to the evaluation of this combination polynomial over the evaluation domain, or the combination codeword.
## FRI Protocol
FRI is a completely standalone protocol that establishes that a given Merkle root decommits to a codeword corresponding to a low-degree polynomial (the combination codeword). The nonlinear combination is checked separately (verifying that the combination polynomial has been formed correctly from the quotient polynomials) at a list of uniformly random indices sampled independently of FRI. Checking the consistency of FRI and the consistency of the nonlinear combination separately increases the proof size as well as the prover and verifier work, but it makes the interface between the FRI and trace/AIR parts cleaner. This helps if we want to apply other modifications to our STARK engine, such as Halo-style recursion, whereby the recursive verifier verifies only the nonlinear combination, and the FRI part is postponed until the last possible step.
---
We have seen an overview of how our STARK Engine works. In the [next blogpost](https://hackmd.io/@Muskan05/B1AGebo8Jx) we will discuss the AIR constraints and Arithmetization in detail.