# 30-bit Limbs of Multi-Precision Integers Multiplication of two 32-bit integers in WASM is implemented by the `i64.mul` instruction. In particular, we represent each 32-bit integer as an `i64` type integer and store the result as a `i64` type integer. We cannot use `i32.mul` because it only stores the lower half of the 64-bit multiplication result in a `i32` type value, unlike in x86 where the `mul` instruction can store both the upper half and the lower half 32-bit values in two registers respectively. The iterative Montgomery product algorithm: ``` S[j] = 0 for j in 0 to n-1 for i in 0 to n-1 t = S[0] + x[i] * y[0] (_, t') = t (_, q[i]) = u[0] * t' (c, _) = t + q[i] * p[0] for j in 1 to n-1 (c, S[j-1]) = S[j] + x[i]*y[i] + q[i]*p[j] + c S[n-1] = c ``` BigInt `S[n]` is width $w$ limbs stored in 32-bit integers. We'd like to avoid handling the carry `c` in the above algorithm. According to [1], doing the carrying on the terms `S[j] + x[i]*y[i] + q[i]*p[j] + c`takes up almost half of the runtime. Presumbly because to calculate `(c, val) = a * b`, we need one bitwise right-shift `i64.shr_u` and a bitwise bitwise and `i64.and` to retrieve `c` and `val` from the product of `a * b`. When $w=32$, `S[j] + x[i]*y[i] + q[i]*p[j]` would always have a carry. How many product term (e.g. x[i]*y[i]) can we add together without overflow a `i64` integer? For a w-by-w product term, we have $xy<2^{2w}$. We can add $k$ such terms without overflow if $k2^{2w} \le 2^{64}$. Solving for $k$ gives, ``` k = 1 if w = 32 k = 4 if w = 31 k = 16 if w = 30 k = 64 if w = 29 k = 256 if w = 28 ``` How many terms needed to be added in our algorithm? ## References [1] https://github.com/mitschabaude/montgomery#13-x-30-bit-multiplication ###### tags: `msm` `zkp` `public`