# 30-bit Limbs of Multi-Precision Integers
Multiplication of two 32-bit integers in WASM is implemented by the `i64.mul` instruction. In particular, we represent each 32-bit integer as an `i64` type integer and store the result as a `i64` type integer. We cannot use `i32.mul` because it only stores the lower half of the 64-bit multiplication result in a `i32` type value, unlike in x86 where the `mul` instruction can store both the upper half and the lower half 32-bit values in two registers respectively.
The iterative Montgomery product algorithm:
```
S[j] = 0 for j in 0 to n-1
for i in 0 to n-1
t = S[0] + x[i] * y[0]
(_, t') = t
(_, q[i]) = u[0] * t'
(c, _) = t + q[i] * p[0]
for j in 1 to n-1
(c, S[j-1]) = S[j] + x[i]*y[i] + q[i]*p[j] + c
S[n-1] = c
```
BigInt `S[n]` is width $w$ limbs stored in 32-bit integers.
We'd like to avoid handling the carry `c` in the above algorithm. According to [1], doing the carrying on the terms `S[j] + x[i]*y[i] + q[i]*p[j] + c`takes up almost half of the runtime. Presumbly because to calculate `(c, val) = a * b`, we need one bitwise right-shift `i64.shr_u` and a bitwise bitwise and `i64.and` to retrieve `c` and `val` from the product of `a * b`. When $w=32$, `S[j] + x[i]*y[i] + q[i]*p[j]` would always have a carry.
How many product term (e.g. x[i]*y[i]) can we add together without overflow a `i64` integer? For a w-by-w product term, we have $xy<2^{2w}$. We can add $k$ such terms without overflow if $k2^{2w} \le 2^{64}$. Solving for $k$ gives,
```
k = 1 if w = 32
k = 4 if w = 31
k = 16 if w = 30
k = 64 if w = 29
k = 256 if w = 28
```
How many terms needed to be added in our algorithm?
## References
[1] https://github.com/mitschabaude/montgomery#13-x-30-bit-multiplication
###### tags: `msm` `zkp` `public`