u32 operations in Miden VM

In this note we assume that all values are elements of a 64-bit field defined by the modulus \(2^{64} - 2^{32} + 1\). One nice property of this field is that the product of two elements with values less than \(2^{32}\) does not overflow the field modulus.
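As a quick sanity check of this property, the sketch below compares the largest possible product of two u32 values against the modulus. The names `P`, `U32_MAX`, and `mul_fits_in_field` are ours and are used for illustration only.

```python
# Field modulus used throughout this note.
P = 2**64 - 2**32 + 1
# Largest u32 value (illustrative name, not part of the VM).
U32_MAX = 2**32 - 1

def mul_fits_in_field(a: int, b: int) -> bool:
    """Return True if the integer product a * b stays below the modulus."""
    return a * b < P

# The worst case is the product of the two largest u32 values.
assert mul_fits_in_field(U32_MAX, U32_MAX)
```

The same bound leaves room for one more u32 addend, since \((2^{32} - 1)^2 + (2^{32} - 1) = 2^{64} - 2^{32}\), which is exactly one less than the modulus.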

In the sections below, we describe how we can implement common operations on u32 values (i.e., 32-bit unsigned integers) in this field very efficiently. This includes arithmetic operations, comparison operations, and bitwise operations.

The vast majority of operations rely only on 16-bit range checks, which can be implemented efficiently using a methodology described in this note. Some operations, such as bitwise AND, OR, and XOR, require lookups in additional auxiliary tables, which are described here.

Checking element validity

Another nice property of this field is that checking whether four 16-bit values form a valid field element can be done relatively cheaply. Assume \(t_0\), \(t_1\), \(t_2\), and \(t_3\) are known to be 16-bit values (i.e., less than \(2^{16}\)), and we want to verify that \(2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0\) is a valid field element.

For simplicity, let's denote:

\[ v_{hi} = 2^{16} \cdot t_3 + t_2 \\ v_{lo} = 2^{16} \cdot t_1 + t_0 \]

We can then impose the following constraint to verify element validity:

\[ \left(1 - m \cdot (2^{32} - 1 - v_{hi})\right) \cdot v_{lo} = 0 \]

The above constraint can be satisfied only if at least one of the following holds:

  • \(v_{lo} = 0\)
  • \(v_{hi} \ne 2^{32} - 1\)

To satisfy the constraint when \(v_{lo} \ne 0\), the prover needs to set \(m = (2^{32} - 1 - v_{hi})^{-1}\), which is possible only when \(v_{hi} \ne 2^{32} - 1\).

This constraint is sufficient because modulus \(2^{64} - 2^{32} + 1\) in binary representation is 32 ones, followed by 31 zeros, followed by a single one:

\[ 1111111111111111111111111111111100000000000000000000000000000001 \]

This implies that the largest possible 64-bit value encoding a valid field element would be 32 ones, followed by 32 zeros:

\[ 1111111111111111111111111111111100000000000000000000000000000000 \]

Thus, for a 64-bit value to encode a valid field element, either the lower 32 bits must be all zeros, or the upper 32 bits must not be all ones (which is \(2^{32} - 1\)).
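The argument above can be checked numerically. The following is a minimal sketch, assuming the prover always picks the best possible \(m\) (the inverse when it exists). The function names are hypothetical and the inversion uses Fermat's little theorem, which is one of several ways to compute it.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def validity_constraint_holds(t0, t1, t2, t3):
    """Check (1 - m * (2^32 - 1 - v_hi)) * v_lo = 0 (mod P) for the best m."""
    v_hi = 2**16 * t3 + t2
    v_lo = 2**16 * t1 + t0
    x = (2**32 - 1 - v_hi) % P
    # Fermat inversion: x^(P-2) is the inverse of x when x != 0, and 0 when x == 0.
    m = pow(x, P - 2, P)
    return ((1 - m * x) * v_lo) % P == 0

def encodes_field_element(t0, t1, t2, t3):
    """Reference check: the 64-bit integer built from the limbs is below P."""
    return 2**48 * t3 + 2**32 * t2 + 2**16 * t1 + t0 < P

# P - 1 (upper 32 bits all ones, lower 32 bits zero) is accepted ...
assert validity_constraint_holds(0, 0, 0xFFFF, 0xFFFF)
# ... while the limbs of P itself are rejected.
assert not validity_constraint_holds(1, 0, 0xFFFF, 0xFFFF)
```

When \(v_{hi} = 2^{32} - 1\), the factor multiplying \(m\) is zero, so no choice of \(m\) can help and the constraint reduces to \(v_{lo} = 0\).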

Conversions

Converting field elements into u32 values may be accomplished via two operations: U32SPLIT and U32CAST. Supporting these operations requires 5 helper registers.

Splitting

Assume \(a\) is a field element. We define the U32SPLIT operation as \(a \rightarrow (b, c)\), where \(b\) contains the lower 32 bits of \(a\), and \(c\) contains the upper 32 bits of \(a\). The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\), and \(m\) such that:

\[ a = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ b = 2^{16} \cdot t_1 + t_0 \\ c = 2^{16} \cdot t_3 + t_2 \\ \left(1 - m \cdot (2^{32} - 1 - c)\right) \cdot b = 0 \]

The last constraint ensures that the decomposition of \(a\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
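To make the role of the hints concrete, here is a minimal sketch of how a prover might compute them and how the constraints could be evaluated. The function names are hypothetical, and the 16-bit range checks are modeled as plain comparisons rather than permutation-based checks.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def u32split(a):
    """Return (b, c, hints) for a field element a, with hints = (t0, t1, t2, t3, m)."""
    t0, t1, t2, t3 = [(a >> s) & 0xFFFF for s in (0, 16, 32, 48)]
    b = 2**16 * t1 + t0
    c = 2**16 * t3 + t2
    # m is the inverse of (2^32 - 1 - c), or 0 when that value is 0.
    m = pow((2**32 - 1 - c) % P, P - 2, P)
    return b, c, (t0, t1, t2, t3, m)

def u32split_check(a, b, c, hints):
    """Evaluate the four U32SPLIT constraints plus the 16-bit range checks."""
    t0, t1, t2, t3, m = hints
    limbs_ok = all(0 <= t < 2**16 for t in (t0, t1, t2, t3))
    return (limbs_ok
            and (a - (2**48 * t3 + 2**32 * t2 + 2**16 * t1 + t0)) % P == 0
            and b == 2**16 * t1 + t0
            and c == 2**16 * t3 + t2
            and ((1 - m * (2**32 - 1 - c)) * b) % P == 0)

b, c, hints = u32split(2**32 + 5)
assert (b, c) == (5, 1) and u32split_check(2**32 + 5, b, c, hints)
```

Without the last constraint, a malicious prover could decompose \(a = 0\) using the limbs of the modulus itself, because the first constraint holds only modulo the field modulus. The validity constraint rules this out.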

Casting

Assume \(a\) is a field element. We define the U32CAST operation as \(a \rightarrow b\), where \(b\) contains the lower 32 bits of \(a\). The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\), and \(m\) such that:

\[ a = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ b = 2^{16} \cdot t_1 + t_0 \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot b = 0 \]

The last constraint ensures that the decomposition of \(a\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.

Addition

Assume \(a\) and \(b\) are known to be 32-bit values. The U32ADD operation computes \(a + b \rightarrow (c, d)\), where \(c\) contains the low 32 bits of the result and \(d\) is the carry bit. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values \(c_0\) and \(c_1\) such that:

\[ a + b = c + 2^{32} \cdot d \\ d^2 - d = 0 \\ c = 2^{16} \cdot c_1 + c_0 \]

We also need to enforce constraints against \(c_0\) and \(c_1\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
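The constraints for U32ADD can be sketched as follows. This is an illustrative model only: the function names are ours, and the range checks on \(c_0\) and \(c_1\) are written as direct comparisons.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def u32add(a, b):
    """Return (c, d, hints) where c is the low 32 bits of a + b and d is the carry."""
    total = a + b
    c, d = total & 0xFFFFFFFF, total >> 32
    return c, d, (c & 0xFFFF, c >> 16)  # hints = (c0, c1)

def u32add_check(a, b, c, d, hints):
    """Evaluate the three U32ADD constraints plus the 16-bit range checks."""
    c0, c1 = hints
    return (all(0 <= t < 2**16 for t in hints)
            and (a + b - c - 2**32 * d) % P == 0
            and (d * d - d) % P == 0
            and (c - (2**16 * c1 + c0)) % P == 0)

c, d, hints = u32add(0xFFFFFFFF, 1)
assert (c, d) == (0, 1) and u32add_check(0xFFFFFFFF, 1, c, d, hints)
```

Range-checking the two limbs of \(c\) is what forces \(c\) to be a 32-bit value, which in turn makes the pair \((c, d)\) unique.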

Addition with carry

In implementing efficient big integer arithmetic, it may be convenient to have an "addition with carry" operation which has the following semantics. Assume \(a\) and \(b\) are known to be 32-bit values, and \(c\) is known to be a binary value (either \(1\) or \(0\)). The U32ADDC operation computes \(a + b + c \rightarrow (d, e)\), where \(d\) contains the low 32 bits of the result and \(e\) is the carry bit. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values \(d_0\) and \(d_1\) such that:

\[ a + b + c = d + 2^{32} \cdot e \\ e^2 - e = 0 \\ d = 2^{16} \cdot d_1 + d_0 \]

We also need to enforce constraints against \(d_0\) and \(d_1\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
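To illustrate why this operation is convenient for big integer arithmetic, the sketch below chains two U32ADDC steps to add 64-bit integers represented as pairs of u32 limbs. The limb layout `(lo, hi)` and the helper names are assumptions made for this example.

```python
def u32addc(a, b, c):
    """Return (d, e): the low 32 bits of a + b + c and the carry bit."""
    total = a + b + c
    return total & 0xFFFFFFFF, total >> 32

def add_u64_limbs(x, y):
    """Add two 64-bit integers given as (lo, hi) pairs of u32 limbs.

    Returns the (lo, hi) limbs of the sum and the final carry bit.
    """
    lo, carry = u32addc(x[0], y[0], 0)
    hi, carry = u32addc(x[1], y[1], carry)
    return (lo, hi), carry

# 0x00000000_FFFFFFFF + 1 carries into the high limb.
assert add_u64_limbs((0xFFFFFFFF, 0), (1, 0)) == ((0, 1), 0)
```

The carry produced by one limb feeds directly into the next, so an n-limb addition needs n such steps.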

Subtraction

Assume \(a\) and \(b\) are known to be 32-bit values. U32SUB operation computes \(a - b \rightarrow (c, d)\), where \(c\) contains the 32-bit result in two's complement and \(d\) is the borrow bit. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values \(c_0\) and \(c_1\) such that:

\[ a = b + c - 2^{32} \cdot d \\ d^2 - d = 0 \\ c = 2^{16} \cdot c_1 + c_0 \]

We also need to enforce constraints against \(c_0\) and \(c_1\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
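A minimal sketch of U32SUB under the constraints above is shown below. The function names are hypothetical, and the range checks are modeled as plain comparisons.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def u32sub(a, b):
    """Return (c, d, hints): c = (a - b) mod 2^32 and d is the borrow bit."""
    d = 1 if a < b else 0
    c = (a - b) % 2**32
    return c, d, (c & 0xFFFF, c >> 16)  # hints = (c0, c1)

def u32sub_check(a, b, c, d, hints):
    """Evaluate the three U32SUB constraints plus the 16-bit range checks."""
    c0, c1 = hints
    return (all(0 <= t < 2**16 for t in hints)
            and (a - (b + c - 2**32 * d)) % P == 0
            and (d * d - d) % P == 0
            and (c - (2**16 * c1 + c0)) % P == 0)

c, d, hints = u32sub(3, 5)
assert (c, d) == (0xFFFFFFFE, 1) and u32sub_check(3, 5, c, d, hints)
```

When \(a < b\), the only way to satisfy the first constraint with a 32-bit \(c\) is to set \(d = 1\), which is why \(d\) acts as a borrow flag.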

Multiplication

Assume \(a\) and \(b\) are known to be 32-bit values. U32MUL operation computes \(a \cdot b \rightarrow (c, d)\), where \(c\) contains the low 32 bits of the result and \(d\) contains the upper 32 bits of the result. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values \(c_0\), \(c_1\), \(d_0\), \(d_1\), and \(m\) such that:

\[ a \cdot b = 2^{48} \cdot d_1 + 2^{32} \cdot d_0 + 2^{16} \cdot c_1 + c_0 \\ c = 2^{16} \cdot c_1 + c_0 \\ d = 2^{16} \cdot d_1 + d_0 \\ \left(1 - m \cdot (2^{32} - 1 - d)\right) \cdot c = 0 \]

The last constraint ensures that the decomposition of \(a \cdot b\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(c_0\), \(c_1\), \(d_0\), \(d_1\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
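The sketch below models U32MUL, including the hint \(m\) used by the validity constraint. The function names are ours, and the modular inverse is computed via Fermat's little theorem as one possible approach.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def u32mul(a, b):
    """Return (c, d, hints) with c the low and d the high 32 bits of a * b."""
    prod = a * b
    c0, c1, d0, d1 = [(prod >> s) & 0xFFFF for s in (0, 16, 32, 48)]
    c = 2**16 * c1 + c0
    d = 2**16 * d1 + d0
    # m is the inverse of (2^32 - 1 - d), or 0 when that value is 0.
    m = pow((2**32 - 1 - d) % P, P - 2, P)
    return c, d, (c0, c1, d0, d1, m)

def u32mul_check(a, b, c, d, hints):
    """Evaluate the four U32MUL constraints plus the 16-bit range checks."""
    c0, c1, d0, d1, m = hints
    return (all(0 <= t < 2**16 for t in (c0, c1, d0, d1))
            and (a * b - (2**48 * d1 + 2**32 * d0 + 2**16 * c1 + c0)) % P == 0
            and c == 2**16 * c1 + c0
            and d == 2**16 * d1 + d0
            and ((1 - m * (2**32 - 1 - d)) * c) % P == 0)

c, d, hints = u32mul(0xFFFFFFFF, 0xFFFFFFFF)
assert (c, d) == (1, 0xFFFFFFFE) and u32mul_check(0xFFFFFFFF, 0xFFFFFFFF, c, d, hints)
```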

Multiply-add

In implementing efficient big integer arithmetic, it may be convenient to have a "multiply-add" operation which has the following semantics. Assume \(a\), \(b\), and \(c\) are known to be 32-bit values. The U32MADD operation computes \(a \cdot b + c \rightarrow (d, e)\), where \(d\) contains the low 32 bits of the result and \(e\) contains the upper 32 bits of the result. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide five non-deterministic 'hint' values \(d_0\), \(d_1\), \(e_0\), \(e_1\), and \(m\) such that:

\[ a \cdot b + c = 2^{48} \cdot e_1 + 2^{32} \cdot e_0 + 2^{16} \cdot d_1 + d_0 \\ d = 2^{16} \cdot d_1 + d_0 \\ e = 2^{16} \cdot e_1 + e_0 \\ \left(1 - m \cdot (2^{32} - 1 - e)\right) \cdot d = 0 \]

The last constraint ensures that the decomposition of \(a \cdot b + c\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(d_0\), \(d_1\), \(e_0\), \(e_1\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.

Note that the above constraints guarantee the correctness of the operation iff \(a \cdot b + c\) cannot overflow the field modulus (which is the case for the field with modulus \(2^{64} - 2^{32} + 1\)).
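The boundary case is worth checking explicitly. In the sketch below (names are illustrative), the largest possible inputs produce a result exactly one less than the modulus, so the result is still a valid field element.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note
U32_MAX = 2**32 - 1

def u32madd(a, b, c):
    """Return (d, e): the low and high 32 bits of a * b + c."""
    total = a * b + c
    # For u32 inputs the integer result never reaches the modulus.
    assert total < P
    return total & U32_MAX, total >> 32

# Worst case: the result is exactly P - 1, i.e. high limb all ones, low limb zero.
assert u32madd(U32_MAX, U32_MAX, U32_MAX) == (0, U32_MAX)
```

In this worst case the validity constraint is satisfied through \(d = 0\) rather than through an inverse, since \(2^{32} - 1 - e = 0\) has no inverse.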

Division

Assume \(a\) and \(b\) are known to be 32-bit values. U32DIV operation computes \(a / b \rightarrow (c, d)\), where \(c\) contains the quotient and \(d\) contains the remainder. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide four non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\) such that:

\[ a = b \cdot c + d \\ a - c = 2^{16} \cdot t_1 + t_0 \\ b - d - 1= 2^{16} \cdot t_2 + t_3 \]

The second constraint enforces that \(c \leq a\), while the third constraint enforces that \(d < b\). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range-checks.
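The following sketch shows how the hints for U32DIV could be derived and how the constraints reject an incorrect quotient and remainder. The function names are hypothetical, the hint layout follows the equations above, and \(b \ne 0\) is assumed.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def u32div(a, b):
    """Return (c, d, hints) with quotient c, remainder d, hints = (t0, t1, t2, t3)."""
    assert b != 0  # assumption of this sketch: division by zero is not modeled
    c, d = divmod(a, b)
    t1, t0 = divmod(a - c, 2**16)      # a - c = 2^16 * t1 + t0
    t2, t3 = divmod(b - d - 1, 2**16)  # b - d - 1 = 2^16 * t2 + t3
    return c, d, (t0, t1, t2, t3)

def u32div_check(a, b, c, d, hints):
    """Evaluate the three U32DIV constraints plus the 16-bit range checks."""
    t0, t1, t2, t3 = hints
    return (all(0 <= t < 2**16 for t in hints)
            and (a - (b * c + d)) % P == 0
            and (a - c - (2**16 * t1 + t0)) % P == 0
            and (b - d - 1 - (2**16 * t2 + t3)) % P == 0)

c, d, hints = u32div(17, 5)
assert (c, d) == (3, 2) and u32div_check(17, 5, c, d, hints)
# 17 = 5 * 2 + 7 satisfies the first constraint, but d < b fails.
assert not u32div_check(17, 5, 2, 7, (15, 0, 0, 0))
```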

Comparisons

Assume \(a\) and \(b\) are known to be 32-bit values. We define U32LT and U32GT operations as follows:

  • \(lt(a, b) \rightarrow c\), where \(c = 1\) iff \(a < b\), and \(0\) otherwise.
  • \(gt(a, b) \rightarrow c\), where \(c = 1\) iff \(a > b\), and \(0\) otherwise.

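Both of these operations can be emulated using the U32SUB operation. For example, U32SUB SWAP DROP would be equivalent to U32LT, and the same sequence applied with the two operands swapped (e.g., SWAP U32SUB SWAP DROP) would be equivalent to U32GT. Note that negating the borrow bit with NOT would yield \(a \geq b\) rather than \(a > b\). But, if we wanted to have these operations execute in just a single VM cycle, we could do it as described below.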

We do not cover equality comparisons in this note because equality comparisons for field elements and u32 values are identical.

Less than

Stack transition for U32LT operation would look as follows:

To facilitate this operation, the prover needs to provide two non-deterministic 'hint' values \(t_0\) and \(t_1\) such that:

\[ a = b + (t_1 \cdot 2^{16} + t_0) - 2^{32} \cdot c \\ c^2 - c = 0 \]

Here, \(t_0\) and \(t_1\) hold the lower and upper 16 bits of \(a - b\) in two's complement. Thus, we also need to enforce constraints against \(t_0\) and \(t_1\) to make sure that they are 16-bit values, and this can be done via permutation-based range-checks.
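As a rough illustration, the sketch below computes the result and hints for U32LT and evaluates the two constraints. The function names are ours, and the range checks are modeled as plain comparisons.

```python
P = 2**64 - 2**32 + 1  # field modulus from this note

def u32lt(a, b):
    """Return (c, hints): c = 1 iff a < b, hints = (t0, t1) limbs of (a - b) mod 2^32."""
    c = 1 if a < b else 0
    diff = (a - b) % 2**32
    return c, (diff & 0xFFFF, diff >> 16)

def u32lt_check(a, b, c, hints):
    """Evaluate the two U32LT constraints plus the 16-bit range checks."""
    t0, t1 = hints
    return (all(0 <= t < 2**16 for t in hints)
            and (a - (b + (t1 * 2**16 + t0) - 2**32 * c)) % P == 0
            and (c * c - c) % P == 0)

c, hints = u32lt(3, 5)
assert c == 1 and u32lt_check(3, 5, c, hints)
# Claiming c = 0 for the same inputs and hints violates the first constraint.
assert not u32lt_check(3, 5, 0, hints)
```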

Greater than

The stack transition for the U32GT operation looks identical to the stack transition of the U32LT operation, but the constraints we need to enforce are slightly different. Specifically:

\[ b = a + (t_1 \cdot 2^{16} + t_0) - 2^{32} \cdot c \\ c^2 - c = 0 \]

Here, \(t_0\) and \(t_1\) hold the lower and upper 16 bits of \(b - a\) in two's complement (rather than \(a - b\)). And we also need to enforce additional constraints to make sure that \(t_0\) and \(t_1\) are 16-bit values.

Bit shifts

To perform u32 bit shift operations in a single VM cycle, we need to introduce an additional (6th) helper register. We also need to rely on a lookup table which maps integers in the range between \(0\) and \(32\) to the corresponding powers of two (i.e., \(0 \rightarrow 1\), \(1 \rightarrow 2\), \(2 \rightarrow 4\) etc.). AIR for such a table can be defined relatively easily.

Left shift

Assume \(a\) is known to be a 32-bit value and \(b\) is an integer in the range \([0, 32)\). U32SHL operation computes \(shl(a, b) \rightarrow c\), where \(c\) is the result of performing a bitwise shift of \(a\) by \(b\) bits to the left. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\), \(m\), and \(p\) such that:

\[ p = 2^b \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = 2^{16} \cdot t_1 + t_0 \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot c = 0 \]

The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of \(a \cdot p\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
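The computation behind U32SHL can be sketched as follows. The dictionary `POW2` is a stand-in for the powers-of-two lookup table and, like the function name, is an assumption made for this example.

```python
# Stand-in for the lookup table mapping 0..32 to the corresponding powers of two.
POW2 = {b: 2**b for b in range(33)}

def u32shl(a, b):
    """Shift the u32 value a left by b bits (0 <= b < 32), discarding overflow."""
    p = POW2[b]
    prod = a * p  # at most 63 bits wide, so it never overflows the field
    t0, t1, t2, t3 = [(prod >> s) & 0xFFFF for s in (0, 16, 32, 48)]
    # The result is the low 32 bits of the product; t2 and t3 are discarded.
    return 2**16 * t1 + t0

assert u32shl(0x80000001, 1) == 2
```

Multiplying by \(2^b\) moves the bits of \(a\) up by \(b\) positions, and keeping only the two low limbs drops the bits shifted past position 31.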

Right shift

Assume \(a\) is known to be a 32-bit value and \(b\) is an integer in the range \([0, 32)\). The U32SHR operation computes \(shr(a, b) \rightarrow c\), where \(c\) is the result of performing a bitwise shift of \(a\) by \(b\) bits to the right. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\), \(m\), and \(p\) such that:

\[ p = 2^{32-b} \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = 2^{16} \cdot t_3 + t_2 \\ \left(1 - m \cdot (2^{32} - 1 - c)\right) \cdot (2^{16} \cdot t_1 + t_0) = 0 \]

The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of \(a \cdot p\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
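A right shift can be modeled in the same way, except that the result is read from the upper limbs. In this sketch, `POW2` again stands in for the lookup table, and the names are illustrative.

```python
# Stand-in for the lookup table mapping 0..32 to the corresponding powers of two.
POW2 = {b: 2**b for b in range(33)}

def u32shr(a, b):
    """Shift the u32 value a right by b bits (0 <= b < 32)."""
    p = POW2[32 - b]  # for b = 0 this needs the table entry 32 -> 2^32
    prod = a * p
    t0, t1, t2, t3 = [(prod >> s) & 0xFFFF for s in (0, 16, 32, 48)]
    # The result is the high 32 bits of the product; t0 and t1 are discarded.
    return 2**16 * t3 + t2

assert u32shr(0x80000001, 1) == 0x40000000
```

Multiplying by \(2^{32-b}\) and then taking the upper 32 bits is equivalent to dividing by \(2^b\) and discarding the remainder.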

Bit rotations

Similarly to bit shift operations, performing bit rotation operations in a single VM cycle requires a 6th helper register and also relies on a lookup table which maps integers in the range between \(0\) and \(32\) to the corresponding powers of two.

Rotate left

Assume \(a\) is known to be a 32-bit value and \(b\) is an integer in the range \([0, 32)\). U32ROTL operation computes \(rotl(a, b) \rightarrow c\), where \(c\) is the result of performing a bitwise rotation of \(a\) by \(b\) bits to the left. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\), \(m\), and \(p\) such that:

\[ p = 2^b \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = (2^{16} \cdot t_3 + t_2) + (2^{16} \cdot t_1 + t_0) \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot (2^{16} \cdot t_1 + t_0) = 0 \]

The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of \(a \cdot p\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.
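The rotation can be sketched by adding the two halves of the product back together. As before, `POW2` stands in for the lookup table, and the function names are assumptions of this example.

```python
# Stand-in for the lookup table mapping 0..32 to the corresponding powers of two.
POW2 = {b: 2**b for b in range(33)}

def u32rotl(a, b):
    """Rotate the u32 value a left by b bits (0 <= b < 32)."""
    prod = a * POW2[b]
    t0, t1, t2, t3 = [(prod >> s) & 0xFFFF for s in (0, 16, 32, 48)]
    lo = 2**16 * t1 + t0  # bits that stayed within the low 32 bits
    hi = 2**16 * t3 + t2  # bits shifted out, which wrap around to the bottom
    return hi + lo

def rotl_reference(a, b):
    """Conventional shift-and-or rotation, used only for comparison."""
    return ((a << b) | (a >> (32 - b))) & 0xFFFFFFFF

assert u32rotl(0x80000001, 1) == 3
```

The addition never carries because the low \(b\) bits of `lo` are always zero while `hi` occupies only those \(b\) bits.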

Rotate right

Assume \(a\) is known to be a 32-bit value and \(b\) is an integer in the range \([0, 32)\). U32ROTR operation computes \(rotr(a, b) \rightarrow c\), where \(c\) is the result of performing a bitwise rotation of \(a\) by \(b\) bits to the right. The diagram below illustrates this graphically.

To facilitate this operation, the prover needs to provide six non-deterministic 'hint' values \(t_0\), \(t_1\), \(t_2\), \(t_3\), \(m\), and \(p\) such that:

\[ p = 2^{32 - b} \\ a \cdot p = 2^{48} \cdot t_3 + 2^{32} \cdot t_2 + 2^{16} \cdot t_1 + t_0 \\ c = (2^{16} \cdot t_3 + t_2) + (2^{16} \cdot t_1 + t_0) \\ \left(1 - m \cdot (2^{32} - 2^{16} \cdot t_3 - t_2 - 1)\right) \cdot (2^{16} \cdot t_1 + t_0) = 0 \]

The first constraint needs to be enforced via a permutation-based lookup table. The last constraint ensures that the decomposition of \(a \cdot p\) into four 16-bit values is done in such a way that these values encode a valid field element (as described here). We also need to enforce constraints against \(t_0\), \(t_1\), \(t_2\), \(t_3\) to make sure that they are 16-bit values, and this can be done via permutation-based range checks.

Bitwise AND

Assume \(a\) and \(b\) are known to be 32-bit values. The U32AND operation computes \(and(a, b) \rightarrow c\), where \(c\) is the result of performing a bitwise AND of \(a\) and \(b\). The diagram below illustrates this graphically.

To facilitate this operation, we will need to perform a lookup in a table described here. The prover will also need to provide a non-deterministic 'hint' for the lookup (row address \(r\)). The lookup in the table can be accomplished by including the following values into the lookup product:

\[ \beta + \alpha \cdot r + \alpha^2 \cdot a + \alpha^3 \cdot b \]

\[ \beta + \alpha \cdot (r + 7) + \alpha^2 \cdot c \]

Bitwise OR

Bitwise OR operation U32OR can be performed in the same way as the bitwise AND operation, but we'll need to use a different lookup table - the one specialized for computing bitwise OR operations.

Bitwise XOR

Bitwise XOR operation U32XOR can be performed in the same way as the bitwise AND operation, but we'll need to use a different lookup table - the one specialized for computing bitwise XOR operations.

Bitwise NOT

Assume \(a\) is known to be a 32-bit value. U32NOT operation computes \(not(a) \rightarrow b\), where \(b\) is a bitwise negation of \(a\). The diagram below illustrates this graphically.

To facilitate this operation, we need to enforce the following constraint:

\[ b = 2^{32} - a - 1 \]
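As a small check, the sketch below compares this arithmetic form against a conventional bitwise negation. The function name is ours.

```python
def u32not(a):
    """Bitwise negation of a u32 value, computed arithmetically."""
    return 2**32 - a - 1

# The arithmetic form agrees with flipping all 32 bits via XOR.
assert u32not(0x0F0F0F0F) == 0x0F0F0F0F ^ 0xFFFFFFFF
```

No range checks are needed here: if \(a\) is a 32-bit value, then \(2^{32} - a - 1\) is also a 32-bit value.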
