# mopro 22 milestone 1 report
[TOC]
## Links
- [link to milestone 2 report](https://hackmd.io/YtTVJUArRmWrdvaTza2LGg)
## Things we have done
1. Researched zprize works and papers related to MSM acceleration.
2. Integrate `zprize2022/TrapdoorTech` works on msm into current mopro and get the benchmark of it.
3. Adapt the benchmarking method arkworks pippenger msm into zprize convention, which using 10 instances of $2^{16}$ points and scalars on BLS12_377 curve as benchmark data.
## Benchmark Result
We set up with the following parameters:
- Instance size: $2^{16} = 65536$
- Number of instance: 10
Overall, Arkworks Pippenger MSM algorithm (in arkworks 0.4 version) runs faster than TrapdoorTech Zprize work in 2022. Specifically, arkworks runs almost **300% faster on iOS simulator and 130% faster on iPhone 14 Pro**. A takeaway is that arkworks MSM runs nearly 50% slower on mobile device. We suspect that some optimizations by arkworks pippenger implementation might work less efficiently on the mobile device.
Since the other zprize work, [`nistath`](https://github.com/nistath/zprize-mobile-msm/tree/main), has the same benchmark results according to the [zprize blog](https://www.zprize.io/blog/zprize-i-spotlight-open-division-prize-7-msm-on-mobile), we decided to skip the integration of this work and proceed to the next milestone.
In conclusion, we will **focus on optimization of arkworks pippenger msm** as the next step.
To reproduce the following benchmarks on laptop with CPU, use the commands on root path i.e. `mopro/`. Replace $ALGO_NAME with one of {halo2curve_msm, arkworks_pippenger}.
- For $2^{16}$ size of instances
```
cargo test --release --features=gpu-benchmarks --package mopro-core --lib -- middleware::gpu_explorations::$ALGO_NAME::tests::test_run_benchmark --exact --nocapture
```
- For benchmark results for all different instance sizes
```
cargo test --release --features=gpu-benchmarks --package mopro-core --lib -- middleware::gpu_explorations::$ALGO_NAME::tests::test_run_multi_benchmarks --exact --nocapture
```
1. Result running of iOS simulator:
- logging:
```bash
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 135.93175ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 122.234958ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 130.396541ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 123.629666ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 145.457667ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 159.236792ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 134.852375ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 128.190292ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 138.075042ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 132.762041ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 128.630458ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 126.423125ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 128.428083ms
Done running benchmark. Check the result at: "../mopro-core/benchmarks/gpu_explorations"
Result of Arkwork (Baseline): 133.40375307692307 ms (diff: 0.0 %)
Running MSM in algorithm: TrapdoorTech Zprize...
Vectors already generated
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 559.812209ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 533.640042ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 525.178417ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 527.08525ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 525.088292ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 518.739ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 514.366458ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 519.871917ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 520.039125ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 528.623416ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 524.508458ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 525.372292ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 515.997166ms
Done running benchmark. Check the result at: mopro-core/"../mopro-core/benchmarks/gpu_explorations"
Result of TrapdoorTech Zprize: 526.0247724615385 ms (diff: -294.3103251062381 %)
```
- result screenshot
<img src="https://hackmd.io/_uploads/B1aNfCYlR.png" alt="MSM result on simulator 15 pro Max" height="500">
2. Result running on FoodChain's Iphone 14 pro
- logging
```bash
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 195.635083ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 206.639791ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 205.1675ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 197.742167ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 207.147166ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 199.729459ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 203.080416ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 198.15875ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 201.636916ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 210.273792ms
Done running benchmark. Check the result at: "../mopro-core/benchmarks/gpu_explorations"
Result of Arkwork (Baseline): 202.52110399999998 ms (diff: 0.0 %)
Running MSM in algorithm: TrapdoorTech Zprize...
Vectors already generated
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 470.619458ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 470.25775ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 471.028792ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 470.972375ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 471.092292ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 470.932375ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 471.304875ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 471.343375ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 470.9955ms
Average time to execute MSM with 65536 points and scalars in 1 iterations is: 470.738208ms
Done running benchmark. Check the result at: mopro-core/"../mopro-core/benchmarks/gpu_explorations"
Result of TrapdoorTech Zprize: 470.9284999999999 ms (diff: -132.53304998771878 %)
```
- result screenshot
<img src="https://hackmd.io/_uploads/BkdTW0FeC.png" alt="MSM result on iphone 14 pro" height="500">
3. Benchmark comparison on different size of points and scalars on Rust/Laptop
**<u>[BLS12_377] Arkworks Pippenger vs TrapdoorTech</u>**
- The benchmarking data is the same for all algorithms.
- Each scalar has 32 Bytes, which can represent 0 to 2^256 - 1 unsigned integer.
- The average msm time is calculated by averaging the msm time of 10 different random instance with the same size
| msm_size | Arkworks Pippenger <br> arkworks 0.4 (ms) | TrapdoorTech <br> zprize2022 (ms) |
|:--------:| ---------------------------------------------------------------- | -------------------------------------------------------- |
| 2^16 | 134.915549 | 473.977274 |
| 2^18 | 491.054058 | 1675.766808 |
| 2^20 | 1922.294874 | 5787.312958 |
| 2^22 | 6661.680833 | 20958.7608 |
| 2^24 | 27812.06526 | 86147.87218 |
**<u>[BN254/Apple M2 Chips] Arkworks Pippenger vs Halo2curve</u>**
- We have tested halo2curve's msm implementation of both single-core version (i.e. `multiexp_serial()`) and multi-core one (i.e. `best_multiexp()`)
| msm_size | Arkworks Pippenger <br> arkworks 0.4 (ms) | Halo2curve <br> multi-core (ms) | Halo2curve <br> single-core (ms) |
|:--------:| ----------------------------------------- | ------------------------------- | -------------------------------- |
| 2^16 | 82.19 | 116.79 | 464.35 |
| 2^18 | 307.24 | 410.98 | 1649.23 |
| 2^20 | 1140.88 | 1544.34 | 6154.86 |
| 2^22 | 4160.43 | 5254.70 | 20962.72 |
**<u>[BN254/Intel i7-1185G7] Arkworks Pippenger vs Halo2curve</u>**
- The "asm" feature is enabled in x86_64 architecture
| msm_size | Arkworks Pippenger <br> arkworks 0.4 (ms) | Halo2curve <br> multi-core (ms) |
|:--------:| ----------------------------------------- | ------------------------------- |
| 2^16 | 140.90 | 154.78 |
| 2^18 | 579.82 | 569.52 |
| 2^20 | 2974.86 | 2547.60 |
| 2^22 | 10732.08 | 9866.36 |
## Optimization methods comparison
| MSM <br> algorithms | Tuned window size | Operations on Twisted Edwards | Batch Addition | Signed Bucket Indexes | GLV Decomposition |
|:-----------------------------------------------------------------------------------------------------------------:| ----------------- | ----------------------------- | -------------- | --------------------- | ----------------- |
| [Arkworks 0.4](https://github.com/arkworks-rs/algebra/blob/master/ec/src/scalar_mul/variable_base/mod.rs) | ✅ | Optional | | ✅ | ✅ |
| [TrapdoorTech <br> zprizez 2022](https://github.com/Trapdoor-Tech/TrapdoorTech-zprize-mobile-harness/tree/master) | ✅ | ✅ | | | |
| [Halo2curve <br> multi-core](https://github.com/privacy-scaling-explorations/halo2curves/blob/main/src/msm.rs) | ✅ | | ✅ | ✅ | ✅ |
| [Arkmsm <br> by Snarkify](https://github.com/snarkify/arkmsm/tree/main) | ✅ | | ✅ | ✅ | ✅ |
## Working logs
1. Get benchmark of zprize works
- [x] create utils function to generate benchmarking points on BLS12_377 (**Moven**)
- [X] deserialize the points serialized in arkworks 0.3 version to arkworks 0.4 version (**Moven**)
- [X] add `arkworks-pippenger` msm to mopro-ffi's uniffi-binding (**Moven**)
- [X] benchmarking arkworks-pippenger msm in Swift.
- [X] add `Trapdoor` msm to mopro-ffi's uniffi-binding (**FoodChain**)
- [X] adding test in `mopro-ffi/src/lib/trapdoortech_zprize_msm`
- [x] adding test in `mopro-ffi/test/bindings` (yes, but not passed yet)
- [X] changing the result and error type to zprize for consistency.
`benchmark_msm` function format:
```rust
// the function running msm
pub fn benchmark_msm<I>(
benchmark_dir: &str,
instances: I,
iterations: u32,
) -> Result<Vec<Duration>, preprocess::HarnessError>
where I: Iterator<Item = preprocess::Instance>;
// the function running on IOS
pub fn run_benchmark(
instance_size: u32,
num_instance: u32,
utils_dir: &str,
benchmark_dir: &str,
) -> Result<BenchmarkResult, preprocess::MoproError>
```
- [X] Set up Download url for points and scalars in mobile device (**FoodChain**)
- [X] leveraging AWS S3 to store the files and provide the download url
- [X] benchmarking Trapdoor's msm in IOS. (**FoodChain**)
2. Select msm we want to use, supported by benchmarking result
- [X] (if zprize works is better) upgrade overall arkworks from 0.3 to 0.4 version to fit mopro. (`arkworks_pippenger` currently uses MSM in 0.4)
3. Backlog (the implementation had the same improvement of Trapdoor Tech's work and we have issues running their code, so it's not worthy to integrate)
- [ ] ~~add `nistath` msm to mopro-ffi's uniffi-binding~~
- [ ] ~~benchmarking `nistath` msm in Swift~~
Next Step (milestone 3):
- Metal API: For now we might use `metal-rs` to make the algorithm more compatible with `arkworks`.
- Optimizing ideas: please see this [page](https://hackmd.io/@moven0831/BeH22)
## Blockers
### ✅ Issue on Arkworks version
Both of the zprize works in Rust, [nistath](https://github.com/nistath/zprize-mobile-msm/tree/main) and [TrapdoorTech](https://github.com/Trapdoor-Tech/TrapdoorTech-zprize-mobile-harness/tree/master), are based on Arkworks version "0.3".
Therefore, some important components in `ark-ff` and `ark-ec` are deprecated on Arkworks "0.4", which is used in mopro.
For instance,
- Trait `AffineCurve` is not in ark_ec anymore.
- Method `serialize_unchecked()` is not in ark-serialize anymore.
- Method `serialized_size()` in not in `ark_ff::field` so that the `ark_bls12_377::fields::fr` cannot implement this.
- `ToBytes` and `FromBytes` in ark-ff have been removed by [ark-algebra#417](https://github.com/arkworks-rs/algebra/pull/417)
#### 0319
- [ChangeLog](https://github.com/arkworks-rs/algebra/blob/master/CHANGELOG.md) about Arkworks "0.3" to "0.4"
- [Commit](https://github.com/arkworks-rs/circom-compat/commit/b892c62597687c23341cda1e8e89d58bb6428f36) on "Update ark-circom for arkworks 0.4.0"
#### (Closed with [commit](https://github.com/oskarth/mopro/commit/48de8c170c1b1d6d1aab0cd72a8ed645371c2825)) 0324
- Solved by allowing multiple version with patch feature in Rust
```toml
# GPU explorations
ark-bls12-377 = { version = "0.3", optional = true }
ark-bls12-381 = { version = "0.3", optional = true }
ark-ed-on-bls12-377 = { version = "0.3", optional = true }
ark-ed-on-bls12-381 = { version = "0.3", optional = true }
ark-poly = { version = "0.3", optional = true }
ark-poly-commit = { version = "0.3", optional = true }
ark-sponge = { version = "0.3", optional = true }
duration-string = { version = "0.0.6", optional = true }
rand = { version = "0.8.0", optional = true }
rand_chacha = { version = "0.3.1", optional = true }
lazy_static = { version = "1.4.0", optional = true }
# GPU explorations from mopro/Cargo.toml patch
ark-ec-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-ec', tag = 'v0.3.0', features = ["parallel"], optional = true}
ark-ff-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-ff', tag = 'v0.3.0', features = ["parallel"], optional = true }
ark-serialize-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-serialize', tag = 'v0.3.0', optional = true }
ark-std-3 = { git = 'https://github.com/arkworks-rs/std.git', package = 'ark-std', tag = 'v0.3.0', optional = true }
```
- try the benchmarking on `mopro/mopro-core`
```bash
cargo test --release --features=gpu-benchmarks --package mopro-core --lib -- middleware::gpu_explorations::trapdoortech_zprize_msm::tests::test_benchmark_msm --exact --nocapture
```
**<u>Result:</u>**
```
running 1 test
Running benchmark for baseline result
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 461.931291ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 463.537083ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 462.997125ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 464.71475ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 462.676667ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 462.52125ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 464.768708ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 463.627459ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 464.488875ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 463.715ms
Done running benchmark: Ok([461.931291ms, 463.537083ms, 462.997125ms, 464.71475ms, 462.676667ms, 462.52125ms, 464.768708ms, 463.627459ms, 464.488875ms, 463.715ms])
test middleware::gpu_explorations::trapdoortech_zprize_msm::tests::test_benchmark_msm ... ok
```
### ✅ Issue on benchmarking method
#### 0330
- Benchmarking points and scalar incompatible: zprize works are benchmarks in BLS12_377; current arkworks baseline are in BN254
- proposal: move the benchmarking point and scalar generation into `gpu_explorations/utils` and make current arkworks baseline in bls12_377
#### 0402
- Found that `ark_bls12_377::G1Affine` type is different from version 0.3 and 0.4, specifically one is using `short_weierstrass_jacobian` and another only `short_weierstrass`
**<u>Error msg:</u>**
```
error[E0117]: only traits defined in the current crate can be implemented for types defined outside of the crate
--> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:23:1
|
23 | impl From<ark_bls12_377_3::G1Affine> for G1Affine {
| ^^^^^-------------------------------^^^^^--------
| | | |
| | | `ark_ec::short_weierstrass::Affine` is not defined in the current crate
| | `ark_ec::short_weierstrass_jacobian::GroupAffine` is not defined in the current crate
| impl doesn't use only types from inside the current crate
|
= note: define and implement a trait or new type instead
```
- Solution: try to impl from() to convert the types
- find source code [`ark_ec::short_weierstrass::Affine`](https://docs.rs/ark-ec/0.4.0/src/ark_ec/models/short_weierstrass/affine.rs.html#68) and [`ark_ec::short_weierstrass_jacobian::GroupAffine`](https://docs.rs/ark-ec/0.3.0/src/ark_ec/models/short_weierstrass_jacobian.rs.html#74-150) then implement the transform method
#### 0409
- Found critical problem that the point repr method can be different in arkworks 0.3 and 0.4. Specifically, one is using Jacobian form to repr point and others use SW form.
- Failed code
```rust
for instance in instances {
let points = &instance.0;
for p in points {
let mut bytes = Vec::new();
let k = BigInt::new(p.x.0.0);
k.serialize_compressed(&mut bytes).unwrap();
println!("k: {:?}", k);
println!("len: {:?} bytes: {:?}", bytes.len(), bytes);
let new_p = G1Affine::deserialize_compressed(&*bytes).unwrap();
println!("new_p: {:?}", new_p);
}
}
```
```
running 1 test
dir: "src/middleware/gpu_explorations/utils/vectors/10x16"
k: BigInt([9731552286689537486, 12127634018691867890, 9776079508585867596, 14025949321989632482, 3173994719657650652, 1061268229925581])
len: 48 bytes: [206, 45, 69, 169, 62, 107, 13, 135, 242, 100, 18, 146, 226, 2, 78, 168, 76, 105, 60, 101, 132, 156, 171, 135, 226, 177, 210, 126, 152, 46, 166, 194, 220, 233, 40, 136, 97, 75, 12, 44, 205, 42, 219, 195, 55, 197, 3, 0]
thread 'middleware::gpu_explorations::arkworks_pippenger::tests::test_run_benchmark' panicked at mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:54:67:
called `Result::unwrap()` on an `Err` value: InvalidData
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
test middleware::gpu_explorations::arkworks_pippenger::tests::test_run_benchmark ... FAILED
```
- Success code (sample bls points in 0.4 version)
```rust
for instance in instances {
let points = &instance.0;
for p in points {
let mut bytes = Vec::new();
let p_a = G1Affine::rand(&mut rng);
p_a.x.serialize_compressed(&mut bytes).unwrap();
println!("p: {:?}", p_a.x);
println!("len: {:?} bytes: {:?}", bytes.len(), bytes);
let new_p = G1Affine::deserialize_compressed(&*bytes).unwrap();
println!("new_p: {:?}", new_p);
}
}
```
```
p: BigInt([5512034307592596933, 99440614254125635, 9312191808998893853, 13158212786987652246, 9267201699218040409, 55789045323959188])
len: 48 bytes: [197, 173, 210, 173, 255, 173, 126, 76, 67, 34, 221, 56, 182, 72, 97, 1, 29, 105, 174, 183, 28, 141, 59, 129, 150, 212, 102, 135, 208, 92, 155, 182, 89, 166, 201, 127, 216, 182, 155, 128, 148, 139, 228, 84, 215, 51, 198, 0]
new_p: (119164677558380895836223082391237862732875551641716767440243712604200065663798289487670039744221328193177263451589, 122683500655082031675443015824881521317135573936647759675201525980281219854269715687554971804151299033112627198155)
```
#### (Closed with [commit](https://github.com/oskarth/mopro/commit/e025f08d7f1a9b96dbfe4ed2179d56aa2160b8c0?diff=unified&w=0)) 0411
- Found a way to successfully parsed points and scalars. Then, get the benchmark result of arkworks pippenger
```rust=
let points = &instance.0;
let scalars = &instance.1;
let mut parsed_points = Vec::<G1Affine>::new();
let mut parsed_scalars = Vec::<ScalarField>::new();
// parse points and scalars from arkworks 0.3 compatible format to 0.4 compatible
for p in points {
let new_p =
G1Affine::new_unchecked(BigInt::new(p.x.0 .0).into(), BigInt::new(p.y.0 .0).into());
parsed_points.push(new_p);
}
for s in scalars {
let new_s = ScalarField::new(BigInt::new(s.0));
parsed_scalars.push(new_s);
}
```
```
running 1 test
dir: "src/middleware/gpu_explorations/utils/vectors/10x16"
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 127.120917ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 131.391917ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 129.130125ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 128.530083ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 140.565792ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 129.133667ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 128.416542ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 133.244625ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 129.896042ms
Average time to execute MSM with 65536 points and 65536 scalars and 1 iterations is: 128.161ms
Done running benchmark: Ok([127.120917ms, 131.391917ms, 129.130125ms, 128.530083ms, 140.565792ms, 129.133667ms, 128.416542ms, 133.244625ms, 129.896042ms, 128.161ms])
test middleware::gpu_explorations::arkworks_pippenger::tests::test_run_benchmark ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 12 filtered out; finished in 3.00s
```
### ✅ File not detected on swift
> The precalculated files were not readable by the target due to the configuration problem.
#### 0414
We uses `AWS S3` to store the pre-calculated points and scalars. With `Example/MoproKit/FileDownloader`, we are able to download the files from `S3` and store inside our target (whether simulator or mobile device).
Current points and scalar 10 instances and each of them has size of 2^16:
- points: `https://mopro-msm.s3.eu-north-1.amazonaws.com/vectors/16x10/points`
- scalars: `https://mopro-msm.s3.eu-north-1.amazonaws.com/vectors/16x10/scalars`
## References
- [zprize2022-mobile-nistath](https://github.com/nistath/zprize-mobile-msm/tree/main)
- [zprize2022-mobile-TrapdoorTech](https://github.com/Trapdoor-Tech/TrapdoorTech-zprize-mobile-harness/tree/master)
- [zprize2022-mobile-gnark](https://github.com/gbotrel/zprize-mobile-harness/tree/main)
- [zprize2022-gpu-yrrid](https://github.com/z-prize/2022-entries/tree/main/open-division/prize1-msm/prize1a-msm-gpu/yrrid)
- [Optimization strategies](https://hackmd.io/@drouyang/msm)