u32 operations in Miden VM

# u32 operations in Miden VM ###### tags: `notes` In this note we assume that all values are elements in a 64-bit field defined by modulus $2^{64} - 2^{32} + 1$. One of the nice properties of this field is that multiplying two elements in this field with values less than $2^{32}$ does not overflow field modulus. In the sections below, we describe how we can implement common operations on u32 values (i.e., 32-bit unsigned integers) in this field very efficiently. This includes arithmetic operations, comparison operations, and bitwise operations. Vast majority of operations rely only on 16-bit range checks which can be implemented efficiently using a methodology described in [this note](https://hackmd.io/D-vjBYtHQB2BuOB-HMUG5Q). Some operations, such as bitwise `AND`, `OR`, and `XOR` require lookups in additional auxiliary tables, which are described [here](https://hackmd.io/vejHasuZSVWZOafBLNS_YQ). ### Checking element validity Another nice property of this field is that checking whether four 16-bit values form a valid field element can be done relatively cheaply. Assume $t_0$, $t_1$, $t_2$, and $t_3$ are known to be 16-bit values (e.g., less than $2^{16}$), and we want to verify that $2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0$ is a valid field element. For simplicity, let's denote: $$ v_{hi} = 2^{16} \cdot t_3 + t_2 \\ v_{lo} = 2^{16} \cdot t_1 + t_0 $$ We can then impose the following constraint to verify element validity: $$ \left(1 - m \cdot (2^{32} - 1 - v_{hi})\right) \cdot v_{lo} = 0 $$ The above constraint holds only if either of the following hold: * $v_{lo} = 0$ * $v_{hi} \ne 2^{32} - 1$ To satisfy the latter constraint, the prover would need to set $m = (2^{32} - 1 - v_{hi})^{-1}$, which is possible only when $v_{hi} \ne 2^{32} - 1$. This constraint is sufficient because modulus $2^{64} - 2^{32} + 1$ in binary representation is 32 ones, followed by 31 zeros, followed by a single one: $$ 1111111111111111111111111111111100000000000000000000000000000001 $$ This implies that the largest possible 64-bit value encoding a valid field element would be 32 ones, followed by 32 zeros: $$ 1111111111111111111111111111111100000000000000000000000000000000 $$ Thus, for a 64-bit value to encode a valid field element, either the lower 32 bits must be all zeros, or the upper 32 bits must not be all ones (which is $2^{32} - 1$). ## Conversions Converting field elements into u32 values may be accomplished via two operations `U32SPLIT` and `U32CAST`. Supporting these operations requires 5 helper registers. ### Splitting Assume $a$ is a field element. We define `U32SPLIT` operation as $a \rightarrow (b, c)$, where $b$ contains lower 32-bits of $a$, and $c$ contains the upper 32 bits of $a$. The diagram below illustrates this graphically. ![](https://i.imgur.com/6V2l0nI.png) To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$, and $m$ such that: $$ a = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ b = 2^{16} \cdot t_1 + t_0 \\ c = 2^{16} \cdot t_3 + t_2 \\ \left(1 - m \cdot (2^{32} - 1 - c)\right) \cdot b = 0 $$ The last constraint ensures that the decomposition of $a$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ### Casting Assume $a$ is a field element. We define `U32CAST` operation as $a \rightarrow b$, where $b$ contains lower 32-bits of $a$. The diagram below illustrates this graphically. ![](https://i.imgur.com/izR2lXy.png) To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$, and $m$ such that: $$ a = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ b = 2^{16} \cdot t_1 + t_0 \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot b = 0 $$ The last constraint ensures that the decomposition of $a$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ## Addition Assume $a$ and $b$ are known to be 32-bit values. `U32ADD` operation computes $a + b \rightarrow (c, d)$, where $c$ contains the low 32-bits of the result and $d$ is the carry bit. The diagram below illustrates this graphically. ![](https://i.imgur.com/7QSjSCW.png) To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values $c_0$ and $c_1$ such that: $$ a + b = c + 2^{32} \cdot d \\ d^2 - d = 0 \\ c = 2^{16} \cdot c_1 + c_0 $$ We also need to enforce constraints against $c_0$ and $c_1$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ### Addition with carry In implementing efficient big integer arithmetic, it may be convenient to have an "addition with carry" operation which has the following semantics. Assume $a$ and $b$ are known to be 32-bit values, and $c$ is known to be a binary value (either $1$ or $0$). `U32ADDC` operation computes $a + b + c \rightarrow (d, e)$, where $d$ contains the low 32-bits of the result and $e$ is the carry bit. The diagram below illustrates this graphically. ![](https://i.imgur.com/5mCZlF7.png) To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values $d_0$ and $d_1$ such that: $$ a + b + c = d + 2^{32} \cdot e \\ e^2 - e = 0 \\ d = 2^{16} \cdot d_1 + d_0 $$ We also need to enforce constraints against $d_0$ and $d_1$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ## Subtraction Assume $a$ and $b$ are known to be 32-bit values. `U32SUB` operation computes $a - b \rightarrow (c, d)$, where $c$ contains the 32-bit result in two's complement and $d$ is the borrow bit. The diagram below illustrates this graphically. ![](https://i.imgur.com/HfcOM6J.png) To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values $c_0$ and $c_1$ such that: $$ a = b + c - 2^{32} \cdot d \\ d^2 - d = 0 \\ c = 2^{16} \cdot c_1 + c_0 $$ We also need to enforce constraints against $c_0$ and $c_1$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ## Multiplication Assume $a$ and $b$ are known to be 32-bit values. `U32MUL` operation computes $a \cdot b \rightarrow (c, d)$, where $c$ contains the low 32 bits of the result and $d$ contains the upper 32 bits of the result. The diagram below illustrates this graphically. ![](https://i.imgur.com/jolgNBN.png) To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values $c_0$, $c_1$, $d_0$, $d_1$, and $m$ such that: $$ a \cdot b = 2^{48} \cdot d_1 + 2^{32} \cdot d_0 + 2^{16} \cdot c_1 + c_0 \\ c = 2^{16} \cdot c_1 + c_0 \\ d = 2^{16} \cdot d_1 + d_0 \\ \left(1 - m \cdot (2^{32} - 1 - d)\right) \cdot c = 0 $$ The last constraint ensures that the decomposition of $a \cdot b$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $c_0$, $c_1$, $d_0$, $d_1$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ### Multiply-add In implementing efficient big integer arithmetic, it may be convenient to have an "multiply-add" operation which has the following semantics. Assume $a$, $b$, and $c$ are known to be 32-bit values. `U32MADD` operation computes $a \cdot b + c \rightarrow (d, e)$, where $d$ contains the low 32 bits of the result and $e$ contains the upper 32 bits of the result. The diagram below illustrates this graphically. ![](https://i.imgur.com/OAugJ6D.png) To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values $d_0$, $d_1$, $e_0$, $e_1$, and $m$ such that: $$ a \cdot b + c = 2^{48} \cdot e_1 + 2^{32} \cdot e_0 + 2^{16} \cdot d_1 + d_0 \\ d = 2^{16} \cdot d_1 + d_0 \\ e = 2^{16} \cdot e_1 + e_0 \\ \left(1 - m \cdot (2^{32} - 1 - e)\right) \cdot d = 0 $$ The last constraint ensures that the decomposition of $a \cdot b + c$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $d_0$, $d_1$, $e_0$, $e_1$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. **Note**: that the above constraints guarantee the correctness of the operation iff $a \cdot b + c$ cannot overflow field modules (which is the case for the field with modulus $2^{64} - 2^{32} + 1$). ## Division Assume $a$ and $b$ are known to be 32-bit values. `U32DIV` operation computes $a / b \rightarrow (c, d)$, where $c$ contains the quotient and $d$ contains the remainder. The diagram below illustrates this graphically. ![](https://i.imgur.com/o3KZtzM.png) To facilitate this operation, the prover needs to provide four non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$ such that: $$ a = b \cdot c + d \\ a - c = 2^{16} \cdot t_1 + t_0 \\ b - d - 1= 2^{16} \cdot t_2 + t_3 $$ The second constraint enforces that $c \leq a$, while the third constraint enforces that $d < b$. We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range-checks. ## Comparisons Assume $a$ and $b$ are known to be 32-bit values. We define `U32LT` and `U32GT` operations as follows: * $lt(a, b) \rightarrow c$, where $c = 1$ iff $a < b$, and $0$ otherwise. * $gt(a, b) \rightarrow c$, where $c = 1$ iff $a > b$, and $0$ otherwise. Both of these operations can be emulated using `U32SUB` operation. For example, `U32SUB SWAP DROP` would be equivalent to `U32LT`, and `U32SUB SWAP DROP NOT` would be equivalent to `U32GT`. But, if we wanted to have these operations execute in just a single VM cycle, we could do it as described below. We do not cover equality comparisons in this note because equality comparisons for field elements and u32 values are identical. ### Less than Stack transition for `U32LT` operation would look as follows: ![](https://i.imgur.com/n7q4ccS.png) To facilitate this operation, the prover needs to provide four non-deterministic 'hint' values $t_0$ and $t_1$ such that: $$ a = b + (t_1 \cdot 2^{16} + t_0) - 2^{32} \cdot c \\ c^2 - c = 0 $$ Here, $t_0$ and $t_1$ hold the lower and upper 16 bits of $a - b$ in two's complement. Thus, we also need to enforce constraints against $t_0$ and $t_1$ to make sure that they are 16-bit values, and this can be done via permutation-based range-checks. ### Greater than Stack transition for `U32GT` operation looks identical to the stack transition of the `U32LT` operation, but the constraint we need to enforce are slightly different. Specifically: $$ b = a + (t_1 \cdot 2^{16} + t_0) - 2^{32} \cdot c \\ c^2 - c = 0 $$ Here, $t_0$ and $t_1$ hold the lower and upper 16 bits of $b - a$ in two's complement (rather than $a - b$). And we also need to enforce additional constraints to make sure that $t_0$ and $t_1$ are 16-bit values. ## Bit shifts To perform u32 bit shift operations in a single VM cycle, we need to introduce an additional (6th) helper register. We also need to rely on a lookup table which maps integers in the range between $0$ and $32$ to the corresponding powers of two (i.e., $0 \rightarrow 1$, $1 \rightarrow 2$, $2 \rightarrow 4$ etc.). AIR for such a table can be defined relatively easily. ### Left shift Assume $a$ is known to be a 32-bit value and $b$ is an integer in the range $[0, 32)$. `U32SHL` operation computes $shl(a, b) \rightarrow c$, where $c$ is the result of performing a bitwise shift of $a$ by $b$ bits to the left. The diagram below illustrates this graphically. ![](https://i.imgur.com/Lq0YGZo.png) To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$, $m$, and $p$ such that: $$ p = 2^b \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = 2^{16} \cdot t_1 + t_0 \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot c = 0 $$ The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of $a \cdot p$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ### Right shift Assume $a$ is known to be a 32-bit value and $b$ is an integer in the range $[0, 32)$. `U32SHR` operation computes $lsh(a, b) \rightarrow c$, where $c$ is the result of performing a bitwise shift of $a$ by $b$ bits to the right. The diagram below illustrates this graphically. ![](https://i.imgur.com/QYQ8Mjs.png) To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$, $m$, and $p$ such that: $$ p = 2^{32-b} \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = 2^{16} \cdot t_3 + t_2 \\ \left(1 - m \cdot (2^{32} - 1 - c)\right) \cdot (2^{16} \cdot t_1 + t_0) = 0 $$ The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of $a \cdot p$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ## Bit rotations Similarly to bit shift operations, performing bit rotation operations in a single VM cycle requires a 6th helper register and also relies on a lookup table which maps integers in the range between $0$ and $32$ to the corresponding powers of two. ### Rotate left Assume $a$ is known to be a 32-bit value and $b$ is an integer in the range $[0, 32)$. `U32ROTL` operation computes $rotl(a, b) \rightarrow c$, where $c$ is the result of performing a bitwise rotation of $a$ by $b$ bits to the left. The diagram below illustrates this graphically. ![](https://i.imgur.com/mULvf5F.png) To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$, $m$, and $p$ such that: $$ p = 2^b \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = (2^{16} \cdot t_3 + t_2) + (2^{16} \cdot t_1 + t_0) \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot (2^{16} \cdot t_1 + t_0) = 0 $$ The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of $a \cdot p$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ### Rotate right Assume $a$ is known to be a 32-bit value and $b$ is an integer in the range $[0, 32)$. `U32ROTR` operation computes $rotr(a, b) \rightarrow c$, where $c$ is the result of performing a bitwise rotation of $a$ by $b$ bits to the right. The diagram below illustrates this graphically. ![](https://i.imgur.com/TbN5Zfi.png) To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values $t_0$, $t_1$, $t_2$, $t_3$, and $p$ such that: $$ p = 2^{32 - b} \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = (2^{16} \cdot t_3 + t_2) + (2^{16} \cdot t_1 + t_0) \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot (2^{16} \cdot t_1 + t_0) = 0 $$ The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of $a \cdot p$ into four 16-bit values is done in such a way that these values encode a valid field element (as described [here](#Checking-element-validity)). We also need to enforce constraints against $t_0$, $t_1$, $t_2$, $t_3$ to make sure that they are 16-bit values, and this can be done via permutation-based range checks. ## Bitwise AND Assume $a$ and $b$ are known to be a 32-bit values. `U32AND` operation computes $and(a, b) \rightarrow c$, where $c$ is the result of performing a bitwise AND of $a$ and $b$. The diagram below illustrates this graphically. ![](https://i.imgur.com/lqOYO5l.png) To facilitate this operation, we will need to perform a lookup in a table described [here](https://hackmd.io/vejHasuZSVWZOafBLNS_YQ#Bitwise-operations). The prover will also need to provide a non-deterministic 'hint' for the lookup (row address $r$). The lookup in the table can be accomplished by including the following values into the lookup product: $$ \beta + \alpha \cdot r + \alpha^2 \cdot a + \alpha^3 \cdot b $$ $$ \beta + \alpha \cdot (r + 7) + \alpha^2 \cdot c $$ ## Bitwise OR Bitwise OR operation `U32OR` can be performed in the same way as the bitwise AND operation, but we'll need to use a different lookup table - the one specialized for computing bitwise OR operations. ## Bitwise XOR Bitwise XOR operation `U32XOR` can be performed in the same way as the bitwise AND operation, but we'll need to use a different lookup table - the one specialized for computing bitwise XOR operations. ## Bitwise NOT Assume $a$ is known to be a 32-bit value. `U32NOT` operation computes $not(a) \rightarrow b$, where $b$ is a bitwise negation of $a$. The diagram below illustrates this graphically. ![](https://i.imgur.com/TZAcnd2.png) To facilitate this operation we need to enforce the following constraints: $$ b = 2^{32} - a - 1 $$