# mopro-21 milestone_1 report
:::info
Last Updated: Jan. 17th, 2024
:::
[TOC]
> This page describes what we did in milestone 1 for mopro issue #21.
Current work can be seen in this [branch](https://github.com/FoodChain1028/mopro-21/tree/main)
## Things we have done
### Implement msm benchmarking
> path: `mopro-core/src/middleware/gpu_explorations/mod.rs`
Code related to benchmarking is behind the `gpu-benchmarks` feature flag. It uses the vanilla arkworks MSM and implements the benchmark as a module that can be called from `mopro-ffi`. The benchmarking function returns a struct containing the statistics.
<u>**crates used in the file**</u>
- Timing: `std::time`
- Memory Usage: `jemallocator` and `jemalloc_ctl`
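
As a rough illustration of measuring time and allocated bytes around one piece of work, here is a minimal sketch. It deliberately swaps `jemallocator`/`jemalloc_ctl` for a std-only counting allocator so that it compiles without external crates; it is not the code in the branch. The names `CountingAllocator`, `ALLOCATED`, and `measure` are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::time::Instant;

/// Total bytes requested from the allocator since program start.
static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for jemalloc statistics: wraps the system
/// allocator and counts every byte that is requested.
struct CountingAllocator;

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

/// Runs `work` once and returns (elapsed milliseconds, bytes allocated
/// process-wide while it ran).
fn measure<T>(work: impl FnOnce() -> T) -> (f64, usize) {
    let bytes_before = ALLOCATED.load(Ordering::Relaxed);
    let start = Instant::now();
    // `black_box` keeps the optimizer from eliding the work or its allocations.
    let output = black_box(work());
    let elapsed_ms = start.elapsed().as_secs_f64() * 1000.0;
    let bytes_after = ALLOCATED.load(Ordering::Relaxed);
    drop(output);
    (elapsed_ms, bytes_after - bytes_before)
}
```

Calling `measure(|| vec![0u8; 4096])` reports at least 4096 allocated bytes. The counter is process-wide, so allocations made by other threads during the measurement are included as well.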
<u>**reproducing steps (from root directory):**</u>
1. `cd ./mopro-core`
2. `cargo test --release --features gpu-benchmarks --lib -- middleware::gpu_explorations::tests --nocapture`
3. result

<u>**The structure of BenchmarkResult**</u>
```rust
pub struct BenchmarkResult {
    pub num_msm: u32,
    pub avg_processing_time: f64,
    pub total_processing_time: f64,
}
```
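
To show how these fields could be filled in, here is a minimal sketch using `std::time::Instant`. It assumes the times are in milliseconds (the appendix reports ms) and uses a placeholder workload instead of the real arkworks MSM; `run_benchmark` and `placeholder_workload` are hypothetical names, not functions from the branch.

```rust
use std::time::Instant;

/// Same fields as the `BenchmarkResult` shown above.
#[derive(Debug, Clone, PartialEq)]
pub struct BenchmarkResult {
    pub num_msm: u32,
    pub avg_processing_time: f64,
    pub total_processing_time: f64,
}

/// Runs `single_run` `num_msm` times and aggregates the wall-clock time.
/// Assumption: both time fields are expressed in milliseconds.
pub fn run_benchmark(num_msm: u32, mut single_run: impl FnMut()) -> BenchmarkResult {
    let mut total_processing_time = 0.0;
    for _ in 0..num_msm {
        let start = Instant::now();
        single_run();
        total_processing_time += start.elapsed().as_secs_f64() * 1000.0;
    }
    // Avoid dividing by zero when no runs were requested.
    let avg_processing_time = if num_msm == 0 {
        0.0
    } else {
        total_processing_time / f64::from(num_msm)
    };
    BenchmarkResult {
        num_msm,
        avg_processing_time,
        total_processing_time,
    }
}

/// Placeholder standing in for one MSM; the real benchmark would call
/// the arkworks MSM here instead.
pub fn placeholder_workload() -> u64 {
    (0..1_000u64).map(|x| x.wrapping_mul(x)).sum()
}
```

A call such as `run_benchmark(10_000, || { std::hint::black_box(placeholder_workload()); })` returns one `BenchmarkResult` for the whole batch.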
<u>**Generate the benchmarking report**</u>
> path: `mopro-core/src/middleware/gpu_explorations/bin/generate_benchmark_report.rs`
1. `cd ./mopro-core`
2. `cargo run --release --features gpu-benchmarks --package mopro-core --bin generate_benchmark_report`
3. Report will be in `mopro-core/benchmarks/gpu_explorations/msm_bench_rust_laptop.csv`
The statistics are attached in [the appendix](#Appendix-Current-benchmarking-result-on-laptop)
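
For readers who want a feel for what such a report could contain, here is a minimal sketch that serializes results to CSV text. The column layout is an assumption (one column per `BenchmarkResult` field); the actual layout of `msm_bench_rust_laptop.csv` may differ. `to_csv` is a hypothetical helper.

```rust
use std::fmt::Write as _;

/// Same fields as the `BenchmarkResult` struct in this report.
pub struct BenchmarkResult {
    pub num_msm: u32,
    pub avg_processing_time: f64,
    pub total_processing_time: f64,
}

/// Formats the results as CSV text with one header line.
/// Assumption: the column names simply mirror the struct fields.
pub fn to_csv(results: &[BenchmarkResult]) -> String {
    let mut csv = String::from("num_msm,avg_processing_time,total_processing_time\n");
    for r in results {
        writeln!(
            csv,
            "{},{},{}",
            r.num_msm, r.avg_processing_time, r.total_processing_time
        )
        .expect("writing to a String cannot fail");
    }
    csv
}
```

The resulting string can be written to disk with `std::fs::write(path, to_csv(&results))`.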
### Extend UDL file to support msm benchmarking
> path: `mopro-ffi/src/lib.rs`
The MSM benchmarking function was added to `mopro-ffi` and exposed in the UDL file in `mopro-ffi/src`.
<u>**reproducing steps (from root directory):**</u>
1. `cd ./mopro-ffi`
2. `cargo test --release --features gpu-benchmarks --package mopro-ffi --lib -- tests::test_run_msm_benchmark --exact --nocapture`
3. result

## Some Issues
1. We do not yet know whether the GPU benchmark needs to handle multithreading. If it does, we will have to find a way to implement multithreading on the iOS GPU.
2. The memory usage in the current benchmarking results suggests that `single_msm` is executed sequentially. We still need to determine the actual execution model (concurrent or multithreaded) and make the necessary adjustments.
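
One way to explore the second point on the CPU side is to time the same workload sequentially and split across threads, then compare the two. The sketch below only covers CPU threads via `std::thread::scope`; it says nothing about the iOS GPU. `time_sequential` and `time_threaded` are hypothetical helpers, not part of the branch.

```rust
use std::thread;
use std::time::Instant;

/// Runs `work` `runs` times on the current thread; returns elapsed ms.
pub fn time_sequential(runs: u32, work: impl Fn()) -> f64 {
    let start = Instant::now();
    for _ in 0..runs {
        work();
    }
    start.elapsed().as_secs_f64() * 1000.0
}

/// Splits `runs` across `num_threads` scoped threads; returns elapsed ms.
pub fn time_threaded(runs: u32, num_threads: u32, work: impl Fn() + Sync) -> f64 {
    let num_threads = num_threads.max(1);
    let start = Instant::now();
    thread::scope(|scope| {
        for t in 0..num_threads {
            // Spread the remainder so every run is executed exactly once.
            let share = runs / num_threads + u32::from(t < runs % num_threads);
            let work = &work;
            scope.spawn(move || {
                for _ in 0..share {
                    work();
                }
            });
        }
        // All spawned threads are joined when the scope ends.
    });
    start.elapsed().as_secs_f64() * 1000.0
}
```

If the threaded time for a batch is clearly lower than the sequential time, the workload benefits from running in parallel; similar timings suggest it is effectively sequential or too small to amortize thread startup.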
## Appendix: Current benchmarking result on laptop

The new benchmark results differ greatly from the previous ones, mainly because the previous run did not use the `--release` flag.
- Average time for a single MSM: $\approx{}$ 0.3 ms
- Total time for 10_000 MSMs: $\approx{}$ 3_000 ms
- Average allocated memory for a single MSM: $\approx{}$ 50_000 bytes
## Reference
- [arkworks groth16](https://github.com/arkworks-rs/groth16/blob/42b38f1675fd45aa4429a2d335653e37507ec95c/src/prover.rs#L66)
- [arkworks vanilla msm](https://github.com/arkworks-rs/algebra/blob/master/ec/src/scalar_mul/fixed_base.rs#L85)
- [jemalloc-ctl rust doc](https://docs.rs/jemalloc-ctl/latest/jemalloc_ctl/)
- [jemallocator rust doc](https://docs.rs/jemallocator/latest/jemallocator/)
- [std::time rust doc](https://doc.rust-lang.org/std/time/struct.Duration.html)
- [Chart](https://docs.google.com/spreadsheets/d/1cl96S-Cdh6YRIbwofVp_nyDngzsNXNhCy0X_yvhtcuw/edit?usp=sharing)