# [BeH22] EdMSM: MSM for SNARKs and Faster Montgomery multiplication > IACR ePrint: https://eprint.iacr.org/2022/1400 ## Improve approach > Overall, the impl in Go outperforms the Rust baseline (Arkworks) by 40-47% 1. Optimize Coarsely Integrated Operand Scanning (CIOS) modular multiplication. - reduce from $4N^2+4N+2$ to $4N^2-N$, where N = 6 should be defined in a field of 128-bit secure elliptic curve. ![CIOS opt](https://hackmd.io/_uploads/r1f6cDN0a.png) 2. Reduce the mixed addition cost, which is the dominating cost for large MSM instance, by using the extended Jacobian coordinates with a twisted Edwards form. - this trick can apply to any elliptic curve that admits a twisted Edwards form. e.g. BLS24 and BN families (remarks: mopro is currently using BN254 curve for Groth16 proof) ![tEd curve with custom coordinates](https://hackmd.io/_uploads/BJfRjDNRp.png) ## Other optimizatin aspects 1. Parallelism: Step 2 of the Pippenger algorithm can be computed independently ![parallel on the Pippenger algo](https://hackmd.io/_uploads/H1XETP4RT.png) 2. Precomputation: precompute the base point on elliptic curve - Currently infeasible. For example, need 58GB for storing enough BLS12-377 curve points to produce a Groth16 proof when $n = 10^8$ 3. Fine-tune the architecture-level instruction - keep the carry in the CPU/GPU flag to avoid moving it to a register - e.g. x86, they use `ADCX`, `ADOX` for unsigned addition and `MULX` for multiplication <u>**Result of [BeH22] implementation of EdMSM**</u>: ![BeH22 impl result](https://hackmd.io/_uploads/Sk3N2wE0T.png)