# mopro 22 milestone 4 report

[TOC]

↖️ [backlink for milestone 3](/v8WPAG8-RsCANrbv_seSXw)

<!-- Backlog for now since the conclusion of the discussion with Vivian is that we will integrate msm into either zkmopro/circom-compat or halo2. Then, import the modified version of those in mopro. https://github.com/zkmopro/mopro/pull/221#issuecomment-2247714099 -->

## Summary

This milestone benchmarks the previously built MSM algorithms on real iOS devices, showing trends similar to those observed on laptops and simulators. Some larger MSM instances could not be benchmarked due to a [GPU hang error](https://hackmd.io/v8WPAG8-RsCANrbv_seSXw#GPU-Hang-Error-on-mobile-device), which is being addressed by us and the Metal developer community. Additionally, the MSM code has been isolated into an importable crate so it can be adapted to the refactored mopro.

Future work includes (1) integrating the GPU-acceleration crate into various proving systems (e.g. [zkmopro/circom-compat](https://github.com/zkmopro/circom-compat), [halo2-fibonacci](https://github.com/ElusAegis/halo2-fibonacci-sample)), (2) supporting curve-specific MSM algorithms, and (3) continuing research and development on accelerating MSM on mobile GPUs.

## Goals of this milestone

The following are the 3 main targets of this milestone.

1. Refactor the MSM implementations from [zkmopro/mopro](https://github.com/zkmopro/mopro) into [zkmopro/gpu-acceleration](https://github.com/zkmopro/gpu-acceleration) to make them a standalone crate.
2. Organize the interface of the `gpu-acceleration` crate so that it can be integrated into mopro.
3. Benchmark the MSM implementations on an iOS device.

> We focus on integrating all the algorithms we have built into `zkmopro/gpu-acceleration/gpu-exploration-app`.

Algorithms:

* Bucket-wise MSM (on CPU)
* Window-wise MSM with Precomputed Points (on CPU)
* Window-wise Metal MSM (on GPU)
* Bucket-wise Metal MSM (on GPU)

## Benchmarking MSM Algorithms on iOS Devices

Detailed information for the testing device:

![telegram-cloud-photo-size-5-6140902175367347766-y](https://hackmd.io/_uploads/ryqXX-OKC.jpg)

* Phone type: iPhone 14 Pro
* Chipset: Apple A16 Bionic (4 nm)
* CPU: Hexa-core (2x 3.46 GHz Everest + 4x 2.02 GHz Sawtooth)
* GPU: Apple GPU (5-core graphics)
* Memory: 6 GB RAM

[hardware reference](https://www.gsmarena.com/apple_iphone_14_pro-11860.php)

### Methodology

<!-- e.g. Device Setup, Benchmarking Process, Metrics -->

All timing columns are in milliseconds (ms).
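The timings in the results table below were collected by running the benchmark functions inside `gpu-exploration-app` on the device (the linked screenshots show the in-app output). As a rough illustration of what the arkworks baseline column measures, here is a minimal Rust sketch that generates a random BN254 instance of size $2^k$ and times the variable-base MSM. It assumes arkworks 0.4-style APIs; the function name is illustrative and not part of the actual crate.

```rust
use std::time::{Duration, Instant};

use ark_bn254::{Fr, G1Affine, G1Projective};
use ark_ec::{CurveGroup, VariableBaseMSM};
use ark_std::UniformRand;

/// Illustrative only: build a random MSM instance of size 2^k and time the
/// arkworks variable-base MSM, which is what the baseline column reports.
fn bench_arkworks_baseline(k: u32) -> Duration {
    let mut rng = ark_std::test_rng();
    let n = 1usize << k;

    // Random affine bases and random scalars for a size-2^k instance.
    let bases: Vec<G1Affine> = (0..n)
        .map(|_| G1Projective::rand(&mut rng).into_affine())
        .collect();
    let scalars: Vec<Fr> = (0..n).map(|_| Fr::rand(&mut rng)).collect();

    let start = Instant::now();
    let _sum: G1Projective = VariableBaseMSM::msm(&bases, &scalars).unwrap();
    start.elapsed()
}
```

The same instance sizes ($2^{10}$ through $2^{20}$) are fed to each of the four algorithms through the mopro FFI bindings, and the resulting times are compared against this baseline.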
### Results

Percentages indicate the change relative to the arkworks baseline (negative means slower); all times are in ms.

| Instance Size | arkworks (baseline) | Metal MSM (GPU) | Bucket-wise MSM | Precompute MSM | Result (screenshot) |
| ------------- | ------------------- | ---------------- | ------------------ | ------------------ | ------------------- |
| 2^10 | 11.29 | 21.31 (-88.66%) | 80.54 (-613.07%) | 25.57 (-126.41%) | [![](https://i.imgur.com/tzjjEgQ.png)](https://i.imgur.com/tzjjEgQ.png) |
| 2^12 | 19.34 | 62.85 (-225.00%) | 331.15 (-1612.43%) | 82.93 (-328.85%) | [![](https://i.imgur.com/ShdhJaS.png)](https://i.imgur.com/ShdhJaS.png) |
| 2^14 | 53.11 | - | 221.17 (-316.43%) | 265.68 (-400.24%) | [![](https://i.imgur.com/6cx0LwH.png)](https://i.imgur.com/6cx0LwH.png) |
| 2^16 | 198.11 | - | 759.87 (-283.57%) | 888.23 (-348.36%) | [![](https://i.imgur.com/LRARN41.png)](https://i.imgur.com/LRARN41.png) |
| 2^18 | 707.95 | - | 2938.61 (-315.09%) | 1429.06 (-101.86%) | [![](https://i.imgur.com/XE3BVFV.png)](https://i.imgur.com/XE3BVFV.png) |
| 2^20 | 1902.76 | - | 7341.82 (-285.85%) | - | [![](https://i.imgur.com/oJY6O0T.png)](https://i.imgur.com/oJY6O0T.png) |

---

### Footnote

At instance sizes of $2^{14}$ and above, `MetalMsm` encounters a GPU hang error.

For sizes above $2^{18}$, `precomputeMsm` spends an excessive amount of time generating extended points, which can make generating the proving key and verification key too slow, as indicated in the log below. Additionally, deserialization takes even longer than with the arkworks baseline. Therefore, `precomputeMsm` is not compared for instance sizes $2^{20}$ and $2^{22}$. (A simplified sketch of this precomputation step is given at the end of this report.)

```sh
Precomputation time for 5x(262144 points) with precompute_factor=19 is: 2064.716182334s
```

We still need to optimize `precomputeMsm` and make it more compatible with Groth16.

## Future Works

This section outlines the future directions for our project, focusing on the integration of the GPU-acceleration crate, support for curve-specific MSM algorithms, and ongoing research on mobile GPU acceleration.

1. **Integration of GPU-Acceleration Crate:** We plan to integrate the GPU-acceleration crate into projects where proof computations are performed, such as zkmopro/circom-compat and halo2-fibonacci, and then import the modified crates back into mopro for the actual acceleration.
2. **Supporting Curve Features:** We aim to support MSM algorithms tailored to different curves, ensuring optimal performance for each specific use case.
3. **R&D on Mobile GPU Acceleration:** We will continue our research into optimizing MSM algorithms for execution on mobile GPUs, seeking to overcome the limitations currently faced.

## Worklog

* [x] Isolation of MSM.
* [x] Modifying CI actions.
* [x] Integrate the rest of the algorithms inside `gpu-acceleration/mopro-msm`.
* [x] Integrate the MSMs back into `mopro-core` & `mopro-ffi` so that `gpu-exploration-app` can execute these FFI functions.
* [x] Benchmark all the MSMs we currently have and generate reports.
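### Appendix: Sketch of the precomputation step

As referenced in the footnote above, window-wise MSM with precomputed points extends each base point into several shifted copies ($P, 2^{w}P, 2^{2w}P, \dots$) before the main MSM loop runs. The sketch below illustrates that general idea only; it is not the actual `mopro-msm` implementation, and the window size `w`, the `factor` parameter, and the function name are illustrative. In this sketch the work is roughly $n \cdot (\text{factor} - 1) \cdot w$ point doublings, i.e. linear in the instance size, which is consistent with the long precomputation times observed around $2^{18}$ points on device.

```rust
use ark_bn254::G1Projective;
use ark_ec::Group; // `double_in_place` comes from the `Group` trait (arkworks 0.4)

/// Illustrative only: extend each base P into `factor` shifted copies
/// P, 2^w * P, 2^(2w) * P, ... so the main MSM loop can skip windows.
/// The real `precomputeMsm` code may use a different layout and window size.
fn precompute_bases(bases: &[G1Projective], w: usize, factor: usize) -> Vec<G1Projective> {
    let mut extended = Vec::with_capacity(bases.len() * factor);
    for &base in bases {
        let mut acc = base;
        extended.push(acc);
        for _ in 1..factor {
            // Each additional copy costs `w` doublings, so the total work is
            // roughly n * (factor - 1) * w doublings -- linear in the instance size.
            for _ in 0..w {
                acc.double_in_place();
            }
            extended.push(acc);
        }
    }
    extended
}
```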