To do: - consider speccing a "Plonkish IR" encompassing Halo2, Plonk-3, Plonk-4, ACIR with different configs - spec out ACIR, Evaluated, Partially Evaluated, CCS - sketch out transformation algorithms (flatten/unflatten, evaluate/unevaluate, others?) and their difficulties ## Diagram https://excalidraw.com/#json=yJaqzS5aODMD2aJCyYrun,9YgROu_OmfnX2b8_wUfWXg ## VAMP-IR Modules Every IR in vamp-ir can be expressed as a *module*. A module is a tuple $\text{(pubs, definitions, expressions, auxiliary data)}$. What differentiates one IR from another is the shape of the expressions used in the definitions and constraints, and the presence and type of auxiliary data. ### pubs This is a list of variables which will be treated as *public*. These are normal variables in an arithmetic circuit but may have designated locations in an IR. ### definitions A *definition* defines a function by binding a function body to a name, with designated inputs and, optionally, an output. Functions have scope and can contain other functions and constraints. Definitions also contain the instructions for *witness generation* and may include the `fresh` keyword and operations other than $+, -, \times$. Each variable ought to have a definition and the definitions should be ordered such that each definition ought to refer only to previously defined variables. ### expressions An *expression* is a tree of infix operations $(+, -, \times)$ whose leaves are constants, variables, tuples, lists, or function calls. ### constraints A *constraint* is an equality between two expressions. ### auxiliary data Auxiliary data may be derived from an IR and included in the module to assist in efficient transformation to another IR. An example of auxiliary data that may be helpful is *cycle counts* for transformations into STARK-like representations e.g. AIR and AIRScript. ## Example VAMP-IR circuit ``` // This is a circuit for a game in which a player earns points for providing // numbers a and b which satisfy the Pythagorean relation with a public // hypotenuse length c. The player's score and their triangle leg lengths // a and b remain private. // The player submits // - private witnesses (score, new_score, a, b) // - and public witness c def is_pyth x y z = { // return 0 if x^2 + y^2 = z^2 // return 1 otherwise def r = x^2 + y^2 - z^2; // r = 0 => r_inv = 0 // r != 0 => r_inv = 1/r def r_inv = fresh (1|r); (r*r_inv-1)*r = 0; r*r_inv }; pub c; new_score = score + is_pyth a b c; ``` ## Initial IR After the AST is type-checked we receive a module that retains function calls in its representation and has been only minimally flattened or processed. This IR is close to the initial AST from the parser. This IR has no auxiliary data. The expression form is a tree whose nodes are labeled with nodes labeled with $\text{+, -,}$ or $\times$ and whose leaves are constants, variables, or function calls. ``` Constraint = Equal(Variable | Constant, Expr) | Equal(Variable | Constant, Function) | Equal((Variable | Constant, Variable | Constant, ...), Function) Expr = Add(Expr, Expr) | Subtract(Expr, Expr) | Multiply(Expr, Expr) | Negate(Expr) | Variable | Constant ``` ### Example ``` pubs = [ c ]; definitions = [ def is_pyth x y z = { def r = (((x^2)+(y^2))-(z^2)); def r_inv = fresh (1|r); (((r*r_inv)-1)*r) = 0; r*r_inv } ]; constraints = [ new_score = score + is_pyth a b c; ]; auxiliary_data = []; Constraints: 1 Variables: 5 ``` ## Evaluated *Evaluated* is an IR with no remaining function calls, but its expressions may be in any state of flattening. ``` Constraint = Equal(Variable | Constant, Expr) Expr = Add(Expr, Variable | Constant) | Subtract(Variable | Constant, Variable | Constant) | Multiply(Variable | Constant, Variable | Constant) | Negate(Variable | Constant) | Variable | Constant ``` ### Example ``` pubs = [ c ]; definitions = [ def v1 = (((x^2)+(y^2))-(z^2)), def v2 = fresh (1|v1), def v3 = (v1*v2-1)*v1, def v4 = v1*v2, ]; constraints = [ v1 = (((a^2)+(b^2))-(c^2)); v3 = (v1*v2-1)*v1; v4 = v1*v2; new_score = score + v4; v3 = 0; ]; auxiliary_data = []; Constraints: 5 Variables: 6 ``` ## Three-address code (3AC) 3AC is totally flattened and totally evaluated. The expression form is either a single node of a tree labeled with $\text{+, -,}$ or $\times$, or is a leaf (i.e. a variable or a constant). Equations are a single, unique, variable on the left and an expression on the right. This IR has no auxiliary data. ### Description Allowed equations: $\quad z = x + y$ $\quad z = x - y$ $\quad z = x\times y$ $\quad y = -x$ $\quad y = x$ ... where $x$, $y$, and $z$ may each be a variable or a constant. ``` Constraint = Equal(Variable | Constant, Expr) Expr = Add(Expr, Variable | Constant) | Subtract(Variable | Constant, Variable | Constant) | Multiply(Variable | Constant, Variable | Constant) | Negate(Variable | Constant) | Variable | Constant ``` ### Example ``` pubs = [ c, ]; definitions = [ v1 = a * a, v2 = b * b, v3 = v1 + v2, v4 = c * c, v5 = v3 - v4, v7 = 1|v5, v8 = v7, v9 = v5 * v8, v10 = v9 - 1, v11 = v10 * v5, v12 = v5 * v8, v13 = score + v12, ]; constraints = [ v1 = a * a, v2 = b * b, v3 = v1 + v2, v4 = c * c, v5 = v3 - v4, v8 = v7, v9 = v8 * v5, v10 = v9 - 1, v11 = v10 * v5, v12 = v5 * v8, v11 = 0, new_score = v13, ]; auxiliary_data = []; Constraints: 12 Variables: 17 ``` ## Eliminated *Eliminated* form is a maximally unflattened representation with no function calls remaining. All function calls have been evaluated and all unnecessary variables have been eliminated via substitution. The expression form is simply a multivariate polynomial with no restrictions on degree or number of variables. This IR has no auxiliary data. Eliminated form is perhaps most useful for human inspection of circuits, say, in LaTeX. ### Description Allowed equations: $\displaystyle \quad y = \sum_{i=0}^n\left(c_i\prod_{j=0}^m xj^{e_{i,j}}\right)$ ... where the $c_i$ are constants, and $y$ and the $xj$ are variables. ``` Constraint = Equal(Variable | Constant, Expr) Expr = Add(Expr, Expr) | Subtract(Expr, Expr) | Multiply(Expr, Expr) | Negate(Expr) | Variable | Constant ``` ### Example ``` pubs = [ c, ]; definitions = [ v1 = fresh (1|((a*a + b*b) - (c*c))), ]; constraints = [ 0 = ((a*a + b*b) - (c*c))*v1 - 1) * ((a*a + b*b) - (c*c)), new_score = score + ((a*a + b*b) - (c*c))*v1, ]; Constraints: 2 Variables: 6 ``` ## Partially Evaluated and Flattened (PEF) This is similar to the Initial IR, except that calls to functions not on a "whitelist" have been evaluated and flattened. The whitelist forms the auxiliary data in this IR. Expressions retaining white-listed functions have the form: `(tuple of outputs) = white_listed_fn(inputs)` An expression not containing a white-listed function will be purely arithmetic with no function calls. During synthesis, Vamp-IR will replace expressions retaining white-listed functions with an optimized gadget available in the backend. ### Description Allowed expressions: $\quad z = x + y$ $\quad z = x - y$ $\quad z = x\times y$ $\quad y = -x$ $\quad y = x$ $\quad y = f(x1, x2, ...)$ $\quad (y_1, y_2, ...) = f(x1, x2, ...)$ ``` Constraint = Equal((Variable | Constant)^n, Expr) Expr = Add(Expr, Expr) | Subtract(Expr, Expr) | Multiply(Expr, Expr) | Negate(Expr) | Variable | Constant | Function((Variable | Constant)^k) ``` ### Example ``` pubs = [ c ]; definitions = [ def is_pyth x y z = { def r = (((x^2)+(y^2))-(z^2)); def r_inv = fresh (1|r); (((r*r_inv)-1)*r) = 0; r*r_inv }, ]; constraints = [ v1 = is_pyth a b c new_score = score + v1; ]; auxiliary_data = [ "is_pyth" ]; Constraints: 2 Variables: 6 ``` ## Plonkish (Halo2, zk-garage/plonk, ACIR, others) *Plonkish* is a family of IRs parameterized by a configuration in its auxiliary data. The configuration in the auxiliary data consists of: - a list of $k$ fixed columns (also called selectors) - a list of $n$ advice columns (also called witnesses) - a list of $m$ instance columns (also called public inputs) - $D$, the maximum constraint degree - a list of multivariate polynomial constraints (standard and custom gates) - a list of allowed lookup queries (partitions of advice columns) See more at: https://zcash.github.io/halo2/concepts/arithmetization.html ### Description Allowed expressions: $\quad z = x + y$ $\quad z = x - y$ $\quad z = x\times y$ $\quad y = -x$ $\quad y = x$ $\quad custom_{out} = custom(custom_{in})$ $\quad lookup_{out} = lookup(lookup_{in})$ ...where the $n+m$ advice and instance columns are partitioned into a list of outputs ($custom_{out}$) and a list of inputs ($custom_{in})$ for each custom gate. For lookups, $lookup_{out}$ and $lookup_{in}$ only partition the $n$ advice columns. Custom gates are multivariate polynomials that must be satisfied by the $k+n+m$ available advice and instance columns, along with a partition of the advice and instance columns into "inputs" and "outputs". Lookup queries are a partition of the $n$ advice columns into "inputs" and "outputs", much like a function signature. The actual logic of the function is defined elsewhere. ### Example An example of a circuit using the configuration defined in the auxiliary data. The auxiliary data is given in JSON and defines a configuration with width-5 constraints with a custom gate (`fifth_power`) and a lookup gate (`matrix`). Notice that some definitions have been written to correspond to the available custom gates and lookups in the configuration. A programmer desiring efficient circuits would write their Vamp-IR code in this manner, but this is not enforced. This is a 16-bit hash circuit inspired (extremely loosely) by the Poseidon SNARK friendly hash function which uses matrix multiplication and fifth powers. ``` pubs = []; definitions = [ def fifth_power x1 = { def x2 = x1^5; x2 }, def matrix x1 x2 = { def new_1 = 3*x1 + 7*x2; def new_2 = -2*x1 + 5*x2; def x3 = range_8 (fresh (new_1 % 2^8)); def x4 = range_8 (fresh (new_2 % 2^8)); (x3, x4) }, def bool x = { x^2 = x; x }, def range_2 x = { def lo = bool (fresh (x % 2)); def hi = bool (fresh (x \ 2)); x = 2*hi + lo; x }, def range_4 x = { def lo = range_2 (fresh (x % 2^2)); def hi = range_2 (fresh (x \ 2^2)); x = 2^2*hi + lo; x }, def range_8 x = { def lo = range_4 (fresh (x % 2^4)); def hi = range_4 (fresh (x \ 2^4)); x = 2^4*hi + lo; x }, def hash_round lo hi = { matrix (fifth_power lo) (fifth_power hi) }, def hash x = { def lo = range_8 (fresh (x % 2^8)); def hi = range_8 (fresh (x \ 2^8)); def (new_lo, new_hi) = iter 10 hash_round lo hi; 2^8*hi_mod + lo_mod }, ]; constraints = [ y = hash x, ]; auxiliary_data = { "fixed_columns": ["a", "b", "c", "d", "e"], "advice_columns": ["x1", "x2", "x3", "x4"], "instance_columns": [], "D": 5, "custom_gates": [ { "standard": { "constraint": a*x1*x2 + b*x1 + c*x2 + d*x3 + e*x4", "inputs": ["x1", "x2", "x4"], "outputs": ["x3"] }, "fifth_power": { "constraint": "x2 - x1^5", "inputs": ["x1"], "outputs" ["x2"] }, } ], "lookups": [ { "matrix": { "inputs": ["x1", "x2"], "outputs": ["x3", "x4"] } } ] } ``` ## R1CS R1CS uses constraints of the form $C = A\times B$ where $A$, $B$, and $C$ are *linear combinations* of variables and constants. For convenience, we designate one variable in $C$ to be the "out" variable and write only it on the left. The rest of the terms in $C$ are moved to the right. This IR needs no auxiliary data. ### Formal definition Allowed equations: $\displaystyle z = \left(\sum_{i=0}^n a_i xi\right)\left(\sum_{i=0}^n b_i xi\right) - \left(\sum_{i=0}^n c_i xi\right)$ ### Example ``` pubs = [ c, ]; definitions = [ v1 = (2*a)*b, v2 = (a + b - c)*(a + b + c) - v1, v3 = 1|v2, v4 = v2 * v3, ]; constraints = [ v1 = (2*a)*b, v2 = (a + b - c)*(a + b + c) - v1, v4 = v2 * v3, 0 = (v4 - 1)*v2, new_score = v4 + score, ]; auxiliary_data = []; Constraints: 5 Variables: 9 ``` ## AirScript AirScript is an intermediate representation used by PolygonMiden which compiles to their Winterfell STARK proof system. STARKs are very inefficient unless the circuit conforms to certain structure: namely, that the entire circuit can be written as a bounded loop of a relatively small subcircuit. If a Vamp-IR circuit has this structure it is fairly simple to compile it into AirScript by computing some auxiliary data. ### Auxiliary Data Format In order to compile Vamp-IR to AirScript we need: - a list of private main register names - (optional) a list of private auxiliary register names - at least one public register with a name and a length - a hashmap from Vamp-IR variable names to AirScript register names - every variable in the Vamp-IR document needs to be mapped uniquely to a private or public register name with the appropriate suffix if needed (`.first`, `.last`, `'`) - a partition of the constraints into boundary constraints and integrity constraints ### Generating the Auxiliary Data Consider the simple case of an iterated function in Vamp-IR. The function below has the same number of inputs as outputs and so it works a lot like a register machine with registers `a`, `b`, `c`, and `d`. The entire circuit is a single function iterated 8 times, along with some constraints on the initial inputs and final outputs of the function. ``` def my_function (a, b, c, d) = { def a_next = a + c; def b_next = b + d; def c_next = c - 1; def d_next = 2*d; (a_next, b_next, c_next, d_next) }; (a_out, b_out, c_out, d_out) = iter 8 my_function(a_in, b_in, c_in, d_in); c_out = 0; b_in = a_in; ``` 1. Evaluate all function calls within `my_function`, replacing them with the appropriate constraints and definitions. Do *not* evaluate the intrinsic `iter`. 2. Condense the constraints within `my_function` by eliminating all intermediate variables. This will reduce the number of registers we need. 3. Create a register name as a string for each non-public parameter of `my_function` and each internal non-public variable in `my_function`, *excluding the output variable(s)*. Add an entry for each variable to the variable name map. 4. Create a register name for each public input accessed by `my_function`. Pubs are declared globally by name in Vamp-IR so these names should be reused here. A list of these names and the length of each register needed becomes the data in `public_inputs` above. Add an entry for each public variable to the variable name map. AirScript requires at least one public variable, so if the list of pubs is empty a dummy public variable should be created. 5. Loop through the list of expressions and add an entry to the partition hashmap for each expression. Boundary constraints will only reference variables mapped to `(register name).first` or `(register name).last`. Integrity constraints will only reference variables mapped to `(register name)` or `(register name)'`. A constraint that references variables from both classes is invalid. ### Example ``` pubs = []; definitions = [ def my_function (a, b, c, d) = { def a_next = a + c; def b_next = b + d; def c_next = c - 1; def d_next = 2*d; (a_next, b_next, c_next, d_next) }, ]; constraints = [ a_next = a + c, b_next = b + d, c_next = c - 1, d_next = 2*d, c_out = 0, b_in = a_in, ]; auxiliary_data = { "main_registers": ["a", "b", "c", "d"], "auxregisters": [], "public_inputs": ["dummy", 1], "variable_mapping": { "a": "a", "b": "b", "c": "c", "d": "d", "a_in": "a.first", "b_in": "b.first", "c_in": "c.first", "d_in": "d.first", "a_out": "a.last", "b_out": "b.last", "c_out": "c.last", "d_out": "d.last", "a_next": "a'", "b_next": "b'", "c_next": "c'", "d_next": "d'", }, "periodic_columns": [], "constraints": { "integrity": [ "a_next = a + c", "b_next = b + d", "c_next = c - 1", "d_next = 2*d", ], "boundary": [ "c_out = 0", "b_in = a_in", ] } } ``` ### Compiling more complex Vamp-IR circuits to AirScript The scheme above does not make use of *periodic columns*. Periodic columns allow different constraints to be active on different cycles. We can use the periodic columns to allow for more complex Vamp-IR circuits which have some constraints on computation done before or after the iterated function. (Hash functions come to mind as an example of this.) The scheme above also does not use any *auxiliary trace columns*. Auxiliary trace columns can access new randomness from the Verifier. (Author note: I am not sure what auxiliary trace columns are used for. I suspect they are needed for folding or recursion.) ## CCS *Custom constraint system* or CCS is a new constraint system that generalizes Plonkish, R1CS, and AIR constraints. Mappings from other constraint systems into CCS are given in this paper: https://eprint.iacr.org/2023/552.pdf ## Transformations ### Flattening *Flattening* is the process by which constraints are broken down into a list of smaller constraints by adding auxiliary variables. Flattening increases the total number of constraints. For example, the constraint $d = a\times(b+c)$ can be flattened to $$v_1 = b + c\\ d = a\times v_1$$ by adding the auxiliary variable $v_1$. ### Unflattening (elimination) *Unflattening* is the reverse of flattening, where auxiliary variables are eliminated and the total number of constraints decreases. ### Partial flattening according to a list of allowed constraint forms Total flattening and total unflattening are not difficult. However, most proof systems can only handle constraints of a certain size and shape, so totally unflattened constraints may not "fit" with a particular proof system. On the other hand, a totally flattened constraint system will not be very efficient, due to the large number of constraints which are only partially filled. In "vanilla" Plonk, a maximally filled constraint will have the form $$Q_m x y + Q_l x + Q_r y + Q_o z + Q_c + w = 0$$ where $Q_m$, $Q_l$, $Q_r$, $Q_o$, and $Q_c$ are constants; $x$, $y$, and $z$ are private variables; and $w$ is a public input. This same constraint expressed in 3AC, which is totally flattened, would be this list of 11 constraints: $$\begin{align} v_1 &= x\times y\\ v_2 &= Q_m\times v_1\\ v_3 &= Q_l\times x\\ v_4 &= Q_r\times y\\ v_5 &= Q_o\times z\\ v_6 &= v_2 + v_3\\ v_7 &= v_4 + v_5\\ v_8 &= v_6 + v_7\\ v_9 &= Q_c + w\\ v_{10} &= v_8 + v_9\\ v_{10} &= 0 \end{align}$$ Likewise, a single constraint in fully eliminated form may not fit into a single vanilla Plonk constraint, and will need to partially flattened into pieces that are small enough to fit in a single constraint. In order to represent circuits efficiently we will need to use either a bottom-up or top-down process: 1. Partially unflatten 3AC constraints by eliminating variables until no more variables can be eliminated while remaining in the target form, or 2. Partially flatten totally eliminated constraints by adding auxililary variables until each constraint fits into the target form. ### Algorithms for partial unflattening Partial unflattening requires a function that can check if an expression is allowed in the target constraint system. Call this function `check`. ``` fn check (to_be_checked: Expression, reference_list: [Expression]) -> {0, 1} ``` 1. Beginning in 3AC form, order the equations so that variables in the expressions on the right side refer occur on the left side of equations earlier in the list. 2. Create a hashmap with an entry for each equation, associating a variable to a 3AC expression. Ignore any simple equalities that use an already defined variable. ``` H: Variable -> Expression ``` 3. Loop through the list of equations $eq_i$ 1. For `eq[i] = Equal(v1, InfixOp(v2, v3))`: 1. If `check(Subtract(InfixOp(H(v2), v3), v1, reference_list) = 1` 1. Mutate the equation to `Equal(v1, InfixOp(H(v2), v3))`. 1. Then if `check(Subtract(InfixOp(H(v2), H(v3)), v1, reference_list) = 1`, mutate `eq[i]` to `Equal(v1, InfixOp(H(v2), H(v3))`. 2. Else, if `check(Subtract(InfixOp(v2, H(v3)), v1, reference_list) = 1` 1. Mutate `eq[i]` to `Equal(v1, InfixOp(v2, H(v3)))`. The function `check()` can be performed with subtree matching, with eqvuialence relations for commutativity, associativity, the distributive property, etc.