# mopro-21 milestone_1 report

:::info
Last updated: Jan. 17th, 2024
:::

[TOC]

> This page describes what we did in milestone 1 for mopro issue #21. Current work can be seen in this [branch](https://github.com/FoodChain1028/mopro-21/tree/main)

## Things we have done

### Implement msm benchmarking

> path: `mopro-core/src/middleware/gpu_explorations/mod.rs`

Code related to benchmarking is under the `gpu-benchmarks` feature flag.

We use the vanilla arkworks MSM and implement the benchmark as a module that can be called from `mopro-ffi`. The benchmarking function returns a struct containing the statistics.

<u>**Crates used in the file**</u>

- Timing: `std::time`
- Memory usage: `jemallocator` and `jemalloc_ctl`

<u>**Reproducing steps (from root directory):**</u>

1. `cd ./mopro-core`
2. `cargo test --release --features gpu-benchmarks --lib -- middleware::gpu_explorations::tests --nocapture`
3. Result:

   ![image](https://hackmd.io/_uploads/r12403NYT.png)

<u>**The structure of BenchmarkResult**</u>

```rust
pub struct BenchmarkResult {
    pub num_msm: u32,
    pub avg_processing_time: f64,
    pub total_processing_time: f64,
}
```

<u>**Generate the benchmarking report**</u>

> path: `mopro-core/src/middleware/gpu_explorations/bin/generate_benchmark_report.rs`

1. `cd ./mopro-core`
2. `cargo run --release --features gpu-benchmarks --package mopro-core --bin generate_benchmark_report`
3. The report will be written to `mopro-core/benchmarks/gpu_explorations/msm_bench_rust_laptop.csv`

The statistics are attached in [the appendix](#Appendix-Current-benchmarking-result-on-laptop).

### Extend UDL file to support msm benchmarking

> path: `mopro-ffi/src/lib.rs`

An MSM benchmarking function was added to `mopro-ffi` and to the UDL file in `mopro-ffi/src`.

<u>**Reproducing steps (from root directory):**</u>

1. `cd ./mopro-ffi`
2. `cargo test --release --features gpu-benchmarks --package mopro-ffi --lib -- tests::test_run_msm_benchmark --exact --nocapture`
3. Result:

   ![image](https://hackmd.io/_uploads/rypd02NKa.png)

## Some Issues

1. We do not yet know whether the GPU benchmarks need to run on multiple threads. If they do, we will have to find a way to implement multithreading on the iOS GPU.
2. The memory-usage figures in the current benchmarking results suggest that `single_msm` is being executed sequentially. We still need to determine the actual execution model (concurrent or multithreaded) and make the necessary adjustments.

## Appendix: Current benchmarking result on laptop

![msm benchmarks of time and memory usage](https://hackmd.io/_uploads/BJMR1GEYa.png)

The new benchmark results differ substantially from the previous ones, mainly because the previous run did not use the `--release` flag.

- Average time for a single MSM: $\approx{}$ 0.3 ms
- Time for 10_000 MSMs: $\approx{}$ 3_000 ms
- Average allocated memory for a single MSM: $\approx{}$ 50_000 bytes

## Reference

- [arkworks groth16](https://github.com/arkworks-rs/groth16/blob/42b38f1675fd45aa4429a2d335653e37507ec95c/src/prover.rs#L66)
- [arkworks vanilla msm](https://github.com/arkworks-rs/algebra/blob/master/ec/src/scalar_mul/fixed_base.rs#L85)
- [jemalloc-ctl rust doc](https://docs.rs/jemalloc-ctl/latest/jemalloc_ctl/)
- [jemallocator rust doc](https://docs.rs/jemallocator/latest/jemallocator/)
- [std::time rust doc](https://doc.rust-lang.org/std/time/struct.Duration.html)
- [Chart](https://docs.google.com/spreadsheets/d/1cl96S-Cdh6YRIbwofVp_nyDngzsNXNhCy0X_yvhtcuw/edit?usp=sharing)
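
## Appendix: Sketch of filling `BenchmarkResult`

To make the timing part of the benchmark concrete, here is a minimal sketch of how the three `BenchmarkResult` fields could be filled using only `std::time`. It is not the code in the branch. `run_benchmark` and `placeholder_workload` are hypothetical names introduced for illustration, the workload is a dummy loop rather than an arkworks MSM, and the millisecond unit is an assumption based on the figures reported in the laptop appendix. Memory tracking with `jemalloc_ctl` is left out.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Same fields as the `BenchmarkResult` struct shown in the report.
/// Assumption: both time fields are in milliseconds.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchmarkResult {
    pub num_msm: u32,
    pub avg_processing_time: f64,
    pub total_processing_time: f64,
}

/// Hypothetical helper (not from the repository): runs `op` `num_msm` times,
/// times each call separately, and summarizes the timings.
pub fn run_benchmark<F: FnMut()>(num_msm: u32, mut op: F) -> BenchmarkResult {
    let mut total_ms = 0.0_f64;
    for _ in 0..num_msm {
        let start = Instant::now();
        op();
        // Convert the elapsed time of this single run to milliseconds.
        total_ms += start.elapsed().as_secs_f64() * 1_000.0;
    }
    // Avoid dividing by zero when no runs were requested.
    let avg_ms = if num_msm == 0 {
        0.0
    } else {
        total_ms / f64::from(num_msm)
    };
    BenchmarkResult {
        num_msm,
        avg_processing_time: avg_ms,
        total_processing_time: total_ms,
    }
}

/// Placeholder workload standing in for one MSM; it is NOT an MSM.
pub fn placeholder_workload() {
    let sum: u64 = (0..10_000u64).map(|x| x.wrapping_mul(x)).sum();
    // Keep the compiler from optimizing the loop away in release builds.
    black_box(sum);
}
```

Calling `run_benchmark(10_000, placeholder_workload)` returns a result with `num_msm == 10_000`. In the real module the closure would presumably wrap the arkworks MSM call instead. Timing each run separately keeps setup work outside `op` from being counted, at the cost of a small per-iteration `Instant::now()` overhead.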