# Multi-Scalar Multiplication

$a_1G_1+a_2G_2+\dots+a_nG_n$

$n \rightarrow$ size of the MSM
$b \rightarrow$ bit-size of the scalars $a_i$

## Bucket method

$S_1+2S_2+3S_3+\dots+(2^c-1)S_{2^c-1}$

- **Original**
  *Computation cost (+)* $\frac{b}{c}\cdot(n+2^c)+(b-c+\frac{b}{c}-1)$
  *e.g.* $(n,b,c)=(10^6, 256, 16)$: $16n+(2^{20}+255) = 17\,048\,831$
  *Memory cost* $2^c$
- :+1: **Negative $G$**
  *Computation cost (+)* $\frac{b}{c}\cdot(n+2^{c-1})+(b-c+\frac{b}{c}-1)$ (the $G\mapsto-G$ cost is neglected)
  *e.g.* $(n,b,c)=(10^6, 256, 16)$: $16n+(2^{19}+255) = 16\,524\,543$
  *Memory cost* $2^{c}$

----

### Endomorphism use

- :bulb: **endomorphism: 2-dimensional decomposition**
  *Computation cost (+)* $\frac{b}{2c}\cdot(2n+2^{c-1})+cte$ (the $G\mapsto\lambda G$ cost is neglected)
  (:warning: the scalar-decomposition cost shouldn't be neglected)
  *e.g.* $(n,b,c)=(10^6, 256, 16)$: $16n+(2^{18}+105) = 16\,262\,249$
  $\implies$ unless $c$ is chosen differently, there seems to be no benefit in using endomorphisms this way.
- :bulb: **endomorphism: 4-dimensional decomposition (on $G_2$)**
  Number of group ops: $\frac{b}{4c}\cdot(4n+2^{c-1})+cte$ (the $G\mapsto\lambda G$ cost is neglected)
  (:warning: the scalar-decomposition cost shouldn't be neglected)
  *e.g.* $(n,b,c)=(10^6, 256, 16)$: $16n+(2^{17}+43) = 16\,131\,115$
  $\implies$ unless $c$ is chosen differently, there seems to be no benefit in using endomorphisms this way.
- :bulb: **a different idea for using endomorphisms**
  $(\mu_1,\dots,\mu_6)=(1,-1,\lambda,-\lambda,\lambda^2,-\lambda^2)$
  Convert each scalar $a_i$ to a digit set containing $(\mu_1,\dots,\mu_6)$.
  $\implies$ :warning: the digits would likely overflow, as $\lambda$ and $\lambda^2$ are $\approx b/2$-bit.
  *:bulb: Ideas:*
  - The eigenvalues can be expressed in polynomial form evaluated at the seed $u$. *e.g.* for `BLS12-381`, $\lambda=u^2-1$ and $\lambda^2=-u^2$. Maybe a polynomial algorithm might help with the digit-set conversion?
  - The eigenvalues can be expressed algebraically for CM curves. *e.g.* for $j=0$ curves (`BLS12-381`), $\lambda=(-1+\sqrt{-3})/2$.
    Maybe an algebraic algorithm might help, thus delaying the reduction step that causes the overflow.

-----

### General thoughts :thought_balloon:

- :+1: When random scalars are required ($r,s$ in the prove algorithm), one can instead sample $r_1, r_2, s_1, s_2$ directly and use a GLV multiplication without the lattice decomposition.
- :shrug: I think that in the verify algorithm (a small-size MSM, \# public inputs), `multiexp` is slower than a naive `mul-and-add`; maybe a different `multiexp` algorithm (Straus, Bos-Coster?) would do better there.
- :+1: `multiexp` is needed in the setup; since it is a fixed-base MSM there, one can use precomputation.
- :shrug::-1: Can we use a 4-dimensional endo-based MSM on $G_1$ via some twist? **NO**
- :shrug::+1: Can (*surjective*) isogenies help with the MSM? Map a fraction of the points to a "better" curve and do 2 MSMs, one on the "better" curve and one on the "less good" curve.
- :+1::-1: Are extended Jacobian coordinates the fastest for **add-only** workloads on Weierstrass curves? Can we *map* (via an isogeny) to a curve form with faster addition formulae?
- :-1: Use affine coordinates and batch inversion with Montgomery's trick (as in Barretenberg)? :warning: **but additions are sequential in the bucket-list method...** What if we don't accumulate points in the buckets *on the fly* and instead wait for, say, 10 points?
- :shrug: Can we re-use $\max S_i$ or recycle *part* of the buckets?

-----

## Breaking down the complexity :muscle:

$T=a_1G_1+a_2G_2+\cdots+a_nG_n$ (1)

The points $G_i$ are in affine coordinates.

$T=S_1+2S_2+3S_3+\dots+(2^c-1)S_{2^c-1}$ (2)

The points $S_i$ are **not** in affine coordinates.

`gurvy` uses the bucket-list method. The (+) cost is $\frac{b}{c}\cdot(n+2^{c})+(b-c+\frac{b}{c}-1)$.

This (+) cost can be broken down into `madd`, `add` and `double`:
- `madd` is used to accumulate the $G_i$ into the $c$-MSM buckets, with cost $\frac{b}{c}\cdot(n-2^c+1)$
- `add` is used to sum the $c$-MSM buckets, with cost $\frac{b}{c}\cdot(2^{c+1}-3)$
- `add`/`double` are used to combine the $c$-MSMs into the $b$-MSM, with cost $b-c+\frac{b}{c}-1$:
  - `add` is used $\frac{b}{c}-1$ times
  - `double` is used $b-c$ times

Note that `mdouble` would be needed when `madd`-ing a point to itself, but this is statistically unlikely. Also, `add` falls back to `double` on equal inputs, but this doesn't happen when summing the $c$-MSM buckets ($S_1+2S_2+\cdots$).

| system | madd ($z_2=1$) | add | double |
|--------|----------------|-----|--------|
| Jacobian | 7**M**+4**S** | 11**M**+5**S** | 2**M**+5**S** |
| extended Jacobian | 8**M**+2**S** | 12**M**+2**S** | 6**M**+4**S** |
| extJac to Jac conversion | 2**M**+2**S** | | |

:gear: `gurvy` currently uses `madd` in extended Jacobian coordinates (8**M**+2**S**) and `add`/`double` in Jacobian coordinates. The conversion `unsafeFromJacExtended()` happens for all $\frac{b}{c}(2^c-1)$ buckets and costs 2**M**+2**S** per bucket. The final step is done in Jacobian coordinates and costs 11**M**+5**S** per `add` and 2**M**+5**S** per `double`.

Assuming 1**S**=1**M**, the overall cost is:

$\frac{b}{c}(n-2^c+1)\cdot10$**M** $+$ $\frac{b}{c}(2^{c+1}-3)\cdot16$**M** $+$ $(\frac{b}{c}-1)\cdot16$**M** $+$ $(b-c)\cdot7$**M** $+$ $\frac{b}{c}(2^c-1)\cdot4$**M**

:bulb: By using the extended Jacobian `add` (12**M**+2**S**) to sum the buckets and then converting only each window's output to Jacobian, we save $\frac{b}{c}(2^{c+1}-3)\cdot2$**M** $+$ $\frac{b}{c}(2^c-2)\cdot4$**M** $=\frac{b}{c}(2^{c+3}-14)\cdot$**M**. For $(n,b,c)=(10^6,256,16)$, this should save $8\,388\,384$**M** (neglecting the extra memory cost of keeping things in extended Jacobian).
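As a sanity check, the counts above can be evaluated directly. A quick Python sketch, assuming 1**S** = 1**M** as in the text; `msm_cost` and the variable names are mine, not `gurvy`'s:

```python
def msm_cost(n, b, c):
    """Operation counts for the bucket-list MSM, per the breakdown above.
    Costs in M (with 1 S = 1 M): extJac madd = 10, Jac add = 16,
    Jac double = 7, extJac -> Jac conversion = 4."""
    w = b // c  # number of c-bit windows
    counts = {
        "madd": w * (n - 2**c + 1),          # accumulate the G_i into buckets
        "bucket_add": w * (2**(c + 1) - 3),  # sum the buckets of each window
        "combine_add": w - 1,                # combine windows (double-and-add)
        "double": b - c,
        "conv": w * (2**c - 1),              # unsafeFromJacExtended() per bucket
    }
    cost_m = {"madd": 10, "bucket_add": 16, "combine_add": 16, "double": 7, "conv": 4}
    counts["total_m"] = sum(counts[k] * cost_m[k] for k in cost_m)
    return counts

c16 = msm_cost(10**6, 256, 16)

# Saving from summing buckets with the extJac add (14 M instead of 16 M) and
# converting one point per window instead of 2^c - 1 buckets: 2 M per bucket
# add plus 4 M per skipped conversion (the text's two terms).
saving_m = 2 * c16["bucket_add"] + 4 * ((256 // 16) * (2**16 - 2))
```

Running this for $(n,b,c)=(10^6,256,16)$ gives a total of about $1.87\cdot10^8$**M**, dominated by the `madd` term, and a saving of roughly $8.4\cdot10^6$**M** for the extJac bucket summation.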
:watch: ::: spoiler Benchmarking on my macbook Air M1 (darwin/arm64) *(msm/extJac branch)* > MultiExp on `BN254` $\mathbb{G}_1$ ``` benchmark old ns/op new ns/op delta BenchmarkMultiExpg1/32_points-8 255522 248711 -2.67% BenchmarkMultiExpg1/64_points-8 343396 329379 -4.08% BenchmarkMultiExpg1/128_points-8 487555 470011 -3.60% BenchmarkMultiExpg1/256_points-8 683237 653802 -4.31% BenchmarkMultiExpg1/512_points-8 1022324 967510 -5.36% BenchmarkMultiExpg1/1024_points-8 1654419 1574284 -4.84% BenchmarkMultiExpg1/2048_points-8 2846562 2698358 -5.21% BenchmarkMultiExpg1/4096_points-8 4841740 4704300 -2.84% BenchmarkMultiExpg1/8192_points-8 8440007 8245725 -2.30% BenchmarkMultiExpg1/16384_points-8 15842990 15400970 -2.79% BenchmarkMultiExpg1/32768_points-8 31599238 30608169 -3.14% BenchmarkMultiExpg1/65536_points-8 52787454 51724312 -2.01% BenchmarkMultiExpg1/131072_points-8 102621792 100288368 -2.27% BenchmarkMultiExpg1/262144_points-8 201552992 197356833 -2.08% BenchmarkMultiExpg1/524288_points-8 430887097 409126139 -5.05% BenchmarkMultiExpg1/1048576_points-8 811442312 827994916 +2.04% BenchmarkMultiExpg1/2097152_points-8 1658519209 1608798500 -3.00% BenchmarkMultiExpg1/4194304_points-8 3387097000 3284217833 -3.04% BenchmarkMultiExpg1/8388608_points-8 6520269792 6350829500 -2.60% BenchmarkMultiExpg1/16777216_points-8 12498788292 12269033250 -1.84% ``` > MultiExp on `BN254` $\mathbb{G}_2$ ``` BenchmarkMultiExpg2/32_points-8 701438 705559 +0.59% BenchmarkMultiExpg2/64_points-8 900327 907202 +0.76% BenchmarkMultiExpg2/128_points-8 1237129 1259441 +1.80% BenchmarkMultiExpg2/256_points-8 1859797 1841193 -1.00% BenchmarkMultiExpg2/512_points-8 3101968 2971313 -4.21% BenchmarkMultiExpg2/1024_points-8 5201898 5063088 -2.67% BenchmarkMultiExpg2/2048_points-8 9216457 8806857 -4.44% BenchmarkMultiExpg2/4096_points-8 16151247 15383952 -4.75% BenchmarkMultiExpg2/8192_points-8 28317443 27172224 -4.04% BenchmarkMultiExpg2/16384_points-8 52957750 50718538 -4.23% 
BenchmarkMultiExpg2/32768_points-8            106693288      103207783      -3.27%
BenchmarkMultiExpg2/65536_points-8            182290528      170549917      -6.44%
BenchmarkMultiExpg2/131072_points-8           353747861      339250417      -4.10%
BenchmarkMultiExpg2/262144_points-8           678960958      652002562      -3.97%
BenchmarkMultiExpg2/524288_points-8           1390392667     1280709750     -7.89%
BenchmarkMultiExpg2/1048576_points-8          2581350250     2444061000     -5.32%
BenchmarkMultiExpg2/2097152_points-8          5209850083     4953646875     -4.92%
BenchmarkMultiExpg2/4194304_points-8          10652438417    10126838375    -4.93%
BenchmarkMultiExpg2/8388608_points-8          18627952208    18262826792    -1.96%
BenchmarkMultiExpg2/16777216_points-8         36243274666    35248729833    -2.74%
```
:::

As a point of reference, this optimization yields an average delta of about -3% on `BN254` $\mathbb{G}_1$ and -3.5% on `BN254` $\mathbb{G}_2$.

--

:face_with_monocle: Projective/Jacobian mixed coordinates might be a better cost-memory tradeoff. *WIP* in the msm/Proj branch.
**[update]** :-1: it seems that they are **slower** on my MacBook Air M1.

----

## :stopwatch: Benchmarking different coordinate systems on AWS

On an **AWS z1d.large** (2 Intel(R) Xeon(R) Platinum 8151 CPUs @ 3.40GHz), with hyperthreading disabled. The curves benchmarked are `bn256` and `bls12-381`, on both $\mathbb{G}_1$ and $\mathbb{G}_2$.

:::spoiler old vs.
new extJac/Jac mix > `bn256` $\mathbb{G}_1$ ```` benchmark old ns/op new ns/op delta BenchmarkMultiExpg1/32_points-2 521287 508333 -2.49% BenchmarkMultiExpg1/64_points-2 746741 719370 -3.67% BenchmarkMultiExpg1/128_points-2 1142163 1131175 -0.96% BenchmarkMultiExpg1/256_points-2 1842109 1820274 -1.19% BenchmarkMultiExpg1/512_points-2 3126018 3064150 -1.98% BenchmarkMultiExpg1/1024_points-2 5294729 5205683 -1.68% BenchmarkMultiExpg1/2048_points-2 9296090 9137673 -1.70% BenchmarkMultiExpg1/4096_points-2 16668307 16561631 -0.64% BenchmarkMultiExpg1/8192_points-2 29825495 29631754 -0.65% BenchmarkMultiExpg1/16384_points-2 54576319 54392382 -0.34% BenchmarkMultiExpg1/32768_points-2 109159567 108854958 -0.28% BenchmarkMultiExpg1/65536_points-2 188171430 186831015 -0.71% BenchmarkMultiExpg1/131072_points-2 387075768 378966747 -2.09% BenchmarkMultiExpg1/262144_points-2 755280398 747000697 -1.10% BenchmarkMultiExpg1/524288_points-2 1628169040 1527633996 -6.17% BenchmarkMultiExpg1/1048576_points-2 3140597435 3069504753 -2.26% BenchmarkMultiExpg1/2097152_points-2 6389197235 6118134792 -4.24% BenchmarkMultiExpg1/4194304_points-2 13455489291 13217081599 -1.77% BenchmarkMultiExpg1/8388608_points-2 25562515521 25368417686 -0.76% BenchmarkMultiExpg1/16777216_points-2 50404305053 49881827904 -1.04% ```` > `bn256` $\mathbb{G}_2$ ```` BenchmarkMultiExpg2/32_points-2 1232296 1219927 -1.00% BenchmarkMultiExpg2/64_points-2 1966166 1961351 -0.24% BenchmarkMultiExpg2/128_points-2 3159590 3181318 +0.69% BenchmarkMultiExpg2/256_points-2 5224148 5274185 +0.96% BenchmarkMultiExpg2/512_points-2 8955463 8992370 +0.41% BenchmarkMultiExpg2/1024_points-2 15319723 15397745 +0.51% BenchmarkMultiExpg2/2048_points-2 27013055 27266400 +0.94% BenchmarkMultiExpg2/4096_points-2 48443705 48773110 +0.68% BenchmarkMultiExpg2/8192_points-2 86247781 86710775 +0.54% BenchmarkMultiExpg2/16384_points-2 158282131 159584661 +0.82% BenchmarkMultiExpg2/32768_points-2 319112962 321946192 +0.89% 
BenchmarkMultiExpg2/65536_points-2 527998026 529085494 +0.21% BenchmarkMultiExpg2/131072_points-2 1040535731 1046170701 +0.54% BenchmarkMultiExpg2/262144_points-2 1974330895 1981523097 +0.36% BenchmarkMultiExpg2/524288_points-2 4085596111 4067140114 -0.45% BenchmarkMultiExpg2/1048576_points-2 7979044565 7916374093 -0.79% BenchmarkMultiExpg2/2097152_points-2 15897717360 15709507282 -1.18% BenchmarkMultiExpg2/4194304_points-2 30557102743 31065406100 +1.66% BenchmarkMultiExpg2/8388608_points-2 55755268337 56151052033 +0.71% BenchmarkMultiExpg2/16777216_points-2 108201091198 108363861615 +0.15% ```` > `bls12-381` $\mathbb{G}_1$ ```` benchmark old ns/op new ns/op delta BenchmarkMultiExpg1/32_points-2 852569 826290 -3.08% BenchmarkMultiExpg1/64_points-2 1232968 1192016 -3.32% BenchmarkMultiExpg1/128_points-2 1873297 1847131 -1.40% BenchmarkMultiExpg1/256_points-2 3135669 3103259 -1.03% BenchmarkMultiExpg1/512_points-2 5512026 5418419 -1.70% BenchmarkMultiExpg1/1024_points-2 9647502 9534631 -1.17% BenchmarkMultiExpg1/2048_points-2 16713951 16529521 -1.10% BenchmarkMultiExpg1/4096_points-2 29799702 29687489 -0.38% BenchmarkMultiExpg1/8192_points-2 54869562 54639714 -0.42% BenchmarkMultiExpg1/16384_points-2 97234072 96712642 -0.54% BenchmarkMultiExpg1/32768_points-2 173870773 172688195 -0.68% BenchmarkMultiExpg1/65536_points-2 329913407 330971677 +0.32% BenchmarkMultiExpg1/131072_points-2 664793641 672117913 +1.10% BenchmarkMultiExpg1/262144_points-2 1215578861 1223581607 +0.66% BenchmarkMultiExpg1/524288_points-2 2399145599 2494885108 +3.99% BenchmarkMultiExpg1/1048576_points-2 4773107729 4919274245 +3.06% BenchmarkMultiExpg1/2097152_points-2 9312957452 9645666810 +3.57% BenchmarkMultiExpg1/4194304_points-2 20163248513 20044303434 -0.59% BenchmarkMultiExpg1/8388608_points-2 37759568154 37823067724 +0.17% BenchmarkMultiExpg1/16777216_points-2 74314681897 74254425555 -0.08% ```` > `bls12-381` $\mathbb{G}_2$ ```` BenchmarkMultiExpg2/32_points-2 2329363 2332239 +0.12% 
BenchmarkMultiExpg2/64_points-2               3580014        3609963        +0.84%
BenchmarkMultiExpg2/128_points-2              5545405        5652144        +1.92%
BenchmarkMultiExpg2/256_points-2              9561587        9835672        +2.87%
BenchmarkMultiExpg2/512_points-2              16950495       17188199       +1.40%
BenchmarkMultiExpg2/1024_points-2             29834844       30447527       +2.05%
BenchmarkMultiExpg2/2048_points-2             51581666       52632859       +2.04%
BenchmarkMultiExpg2/4096_points-2             91989539       93505308       +1.65%
BenchmarkMultiExpg2/8192_points-2             169590089      172246403      +1.57%
BenchmarkMultiExpg2/16384_points-2            300826103      305571853      +1.58%
BenchmarkMultiExpg2/32768_points-2            535131120      544599354      +1.77%
BenchmarkMultiExpg2/65536_points-2            1007360361     1031914947     +2.44%
BenchmarkMultiExpg2/131072_points-2           1992632700     2064772883     +3.62%
BenchmarkMultiExpg2/262144_points-2           3554205257     3739934611     +5.23%
BenchmarkMultiExpg2/524288_points-2           7139699361     7487118132     +4.87%
BenchmarkMultiExpg2/1048576_points-2          13719677013    14180237227    +3.36%
BenchmarkMultiExpg2/2097152_points-2          27400802029    28353498217    +3.48%
BenchmarkMultiExpg2/4194304_points-2          55124802240    56211299429    +1.97%
BenchmarkMultiExpg2/8388608_points-2          101207103156   102828613245   +1.60%
BenchmarkMultiExpg2/16777216_points-2         196243971093   198313487848   +1.05%
````
:::

:::spoiler old extJac/Jac mix vs.
full extJac > `bn256` $\mathbb{G}_1$ ````` benchmark old ns/op new ns/op delta BenchmarkMultiExpg1/32_points-2 521287 505743 -2.98% BenchmarkMultiExpg1/64_points-2 746741 715711 -4.16% BenchmarkMultiExpg1/128_points-2 1142163 1122852 -1.69% BenchmarkMultiExpg1/256_points-2 1842109 1814809 -1.48% BenchmarkMultiExpg1/512_points-2 3126018 3053075 -2.33% BenchmarkMultiExpg1/1024_points-2 5294729 5180156 -2.16% BenchmarkMultiExpg1/2048_points-2 9296090 9144958 -1.63% BenchmarkMultiExpg1/4096_points-2 16668307 16517425 -0.91% BenchmarkMultiExpg1/8192_points-2 29825495 29504432 -1.08% BenchmarkMultiExpg1/16384_points-2 54576319 54100444 -0.87% BenchmarkMultiExpg1/32768_points-2 109159567 108410216 -0.69% BenchmarkMultiExpg1/65536_points-2 188171430 184376422 -2.02% BenchmarkMultiExpg1/131072_points-2 387075768 373543696 -3.50% BenchmarkMultiExpg1/262144_points-2 755280398 715367175 -5.28% BenchmarkMultiExpg1/524288_points-2 1628169040 1471526746 -9.62% BenchmarkMultiExpg1/1048576_points-2 3140597435 2914223970 -7.21% BenchmarkMultiExpg1/2097152_points-2 6389197235 5719922389 -10.48% BenchmarkMultiExpg1/4194304_points-2 13455489291 12748529204 -5.25% BenchmarkMultiExpg1/8388608_points-2 25562515521 23565207643 -7.81% BenchmarkMultiExpg1/16777216_points-2 50404305053 47302479106 -6.15% ````` > `bn256` $\mathbb{G}_2$ ````` BenchmarkMultiExpg2/32_points-2 1232296 1249745 +1.42% BenchmarkMultiExpg2/64_points-2 1966166 1997378 +1.59% BenchmarkMultiExpg2/128_points-2 3159590 3219511 +1.90% BenchmarkMultiExpg2/256_points-2 5224148 5323031 +1.89% BenchmarkMultiExpg2/512_points-2 8955463 9062418 +1.19% BenchmarkMultiExpg2/1024_points-2 15319723 15502242 +1.19% BenchmarkMultiExpg2/2048_points-2 27013055 27432473 +1.55% BenchmarkMultiExpg2/4096_points-2 48443705 48917456 +0.98% BenchmarkMultiExpg2/8192_points-2 86247781 86933369 +0.79% BenchmarkMultiExpg2/16384_points-2 158282131 159936677 +1.05% BenchmarkMultiExpg2/32768_points-2 319112962 321651126 +0.80% 
BenchmarkMultiExpg2/65536_points-2 527998026 529662601 +0.32% BenchmarkMultiExpg2/131072_points-2 1040535731 1041968048 +0.14% BenchmarkMultiExpg2/262144_points-2 1974330895 1968151271 -0.31% BenchmarkMultiExpg2/524288_points-2 4085596111 4082436474 -0.08% BenchmarkMultiExpg2/1048576_points-2 7979044565 7929108100 -0.63% BenchmarkMultiExpg2/2097152_points-2 15897717360 15681778432 -1.36% BenchmarkMultiExpg2/4194304_points-2 30557102743 30792360857 +0.77% BenchmarkMultiExpg2/8388608_points-2 55755268337 55394701677 -0.65% BenchmarkMultiExpg2/16777216_points-2 108201091198 107124629457 -0.99% ````` > `bls12-381` $\mathbb{G}_1$ ``` benchmark old ns/op new ns/op delta BenchmarkMultiExpg1/32_points-2 852569 829411 -2.72% BenchmarkMultiExpg1/64_points-2 1232968 1205226 -2.25% BenchmarkMultiExpg1/128_points-2 1873297 1853342 -1.07% BenchmarkMultiExpg1/256_points-2 3135669 3098810 -1.18% BenchmarkMultiExpg1/512_points-2 5512026 5413164 -1.79% BenchmarkMultiExpg1/1024_points-2 9647502 9511648 -1.41% BenchmarkMultiExpg1/2048_points-2 16713951 16530675 -1.10% BenchmarkMultiExpg1/4096_points-2 29799702 29730510 -0.23% BenchmarkMultiExpg1/8192_points-2 54869562 54480986 -0.71% BenchmarkMultiExpg1/16384_points-2 97234072 96521896 -0.73% BenchmarkMultiExpg1/32768_points-2 173870773 171850351 -1.16% BenchmarkMultiExpg1/65536_points-2 329913407 326537114 -1.02% BenchmarkMultiExpg1/131072_points-2 664793641 659427818 -0.81% BenchmarkMultiExpg1/262144_points-2 1215578861 1188254587 -2.25% BenchmarkMultiExpg1/524288_points-2 2399145599 2369601364 -1.23% BenchmarkMultiExpg1/1048576_points-2 4773107729 4513294717 -5.44% BenchmarkMultiExpg1/2097152_points-2 9312957452 9109505322 -2.18% BenchmarkMultiExpg1/4194304_points-2 20163248513 19529947129 -3.14% BenchmarkMultiExpg1/8388608_points-2 37759568154 36694205577 -2.82% BenchmarkMultiExpg1/16777216_points-2 74314681897 71905093486 -3.24% ``` > `bls12-381` $\mathbb{G}_2$ ``` BenchmarkMultiExpg2/32_points-2 2329363 2379804 +2.17% 
BenchmarkMultiExpg2/64_points-2               3580014        3657124        +2.15%
BenchmarkMultiExpg2/128_points-2              5545405        5700218        +2.79%
BenchmarkMultiExpg2/256_points-2              9561587        9835235        +2.86%
BenchmarkMultiExpg2/512_points-2              16950495       17231132       +1.66%
BenchmarkMultiExpg2/1024_points-2             29834844       30474849       +2.15%
BenchmarkMultiExpg2/2048_points-2             51581666       52662943       +2.10%
BenchmarkMultiExpg2/4096_points-2             91989539       93575238       +1.72%
BenchmarkMultiExpg2/8192_points-2             169590089      172433579      +1.68%
BenchmarkMultiExpg2/16384_points-2            300826103      305326552      +1.50%
BenchmarkMultiExpg2/32768_points-2            535131120      543053653      +1.48%
BenchmarkMultiExpg2/65536_points-2            1007360361     1026305380     +1.88%
BenchmarkMultiExpg2/131072_points-2           1992632700     2033939368     +2.07%
BenchmarkMultiExpg2/262144_points-2           3554205257     3678478188     +3.50%
BenchmarkMultiExpg2/524288_points-2           7139699361     7394030804     +3.56%
BenchmarkMultiExpg2/1048576_points-2          13719677013    14094188831    +2.73%
BenchmarkMultiExpg2/2097152_points-2          27400802029    28299739984    +3.28%
BenchmarkMultiExpg2/4194304_points-2          55124802240    56356443198    +2.23%
BenchmarkMultiExpg2/8388608_points-2          101207103156   103271315353   +2.04%
BenchmarkMultiExpg2/16777216_points-2         196243971093   198943078573   +1.38%
```
:::

:::spoiler old extJac/Jac mix vs. Proj/Jac mix
:-1: too slow --> too lazy to copy-paste the numbers
:::

...

:end: It seems that the best solution is **full extended Jacobian on $\mathbb{G}_1$** and the **old extended Jacobian/Jacobian mix on $\mathbb{G}_2$**.

-----

## Map to a faster curve shape? :running:

The only shape/coordinate-system combinations with a faster `madd` are Jacobi quartics in XXYZZ/XXYZZR (9**M**), Edwards in projective/inverted (9**M**), twisted Edwards in extended (8**M**), and twisted Edwards in extended with $a=-1$ (7**M**). To convert a short Weierstrass curve to a twisted Edwards or Jacobi quartic curve, we need a point of order 2 to map to a Montgomery curve, which is then birationally equivalent to these shapes.
- [ ] BN256 (prime-order curve) :pensive:
  - [ ] G1
  - [ ] G2
- [ ] BLS12-381 :pensive:
  - [ ] G1
  - [ ] G2
- [ ] BLS12-377
  - [x] G1 :star-struck:
  - [ ] G2
- [ ] BW6-761
  - [x] G1 :star-struck:
  - [ ] G2

### Hessian curves?

- :+1: `madd` on projective Hessian coordinates is as fast as on Weierstrass extJac, `add` is faster than both Jac and extJac ($\Delta$=4**M** and 2**M** resp.), and `double` is faster than extJac ($\Delta$=2**M**) and slightly slower than Jac ($\Delta$=-1**M**).
- :+1: It seems that there is a birational equivalence between Hessian and Weierstrass forms.
- :-1: For not-too-big $n$, it might be interesting to map the curve to its Hessian form (cost: ?) before running the MSM, as the overall cost is not dominated by `madd` alone. However, this means implementing the arithmetic of a new curve form from scratch...
  **[update]** The birational equivalence is defined over the base field *if and only if* $q\equiv 2 \mod 3$ and $t\equiv0\mod3$.

----

## :spiral_note_pad: Notes on endomorphism use

- :-1: On an additional endomorphism on $E(\mathbb{F}_p)$: the twisted $p$-power Frobenius acts like a 6-th root of unity on the subgroup of the sextic twist, and we already have that coming from the curve, so I don't think we get anything new. In fact, for curves of discriminant $D=-3$ with sextic twists, where there is an endomorphism of characteristic polynomial $X^2+X+1$, Frobenius+twist does not bring a new endomorphism.
- :-1: On endo-MSM: generally, using endomorphism decompositions on $a_1, \dots, a_n$ roughly halves their bit-length but doubles the number of points. So it might not be particularly worthwhile until we've found an algorithm that scales very nicely in $n$. However, if we can find $G_i$ such that $\zeta(G_i) = G_j$ for some $i \neq j$, then we can use this to eliminate $G_i$ or $G_j$ before calling the MSM algorithm. In general this should never really happen, though.
- :+1: For GLV use, `gurvy` precomputes a fixed basis (`utils.PrecomputeLattice()`) and then writes a scalar $s$ as $s_1+\lambda s_2$ (`utils.SplitScalar()`). The precomputation can be omitted since for $j=0$ curves $E(\mathbb{F}_q)$ (all curves in `gurvy`), a short basis can be written down explicitly as a function of the Frobenius trace [[Smith13](https://hal.inria.fr/hal-00874925/document)]:
  $b_1=\{(t+c)/2-1, c\},\quad b_2=\{1-(t-c)/2, 1-(t+c)/2\}$
  with $t=q+1-\#E(\mathbb{F}_q)$ the Frobenius trace and $c^2=(4q-t^2)/3$.
  For the lattice-reduction step (the 'online' split of the scalars), the state of the art seems to be [this 2020 paper of Thomas Pornin](https://eprint.iacr.org/2020/454.pdf), which was [recently implemented in `dalek`](https://github.com/dalek-cryptography/curve25519-dalek/pull/323): **it seems that no division is used and only two plain integer multiplications are involved, the inner loop consisting only of efficient linear operations (additions, subtractions, and left shifts).**
  **[update]** Well, that (Pornin) works for Schnorr signature aggregation (as in Atkin's trick), but for GLV one can just get away with a Babai rounding.
- :+1: When using the endomorphism, half of the points are $\zeta(G_i)$. So, when accumulating the points inside the buckets, if $G_i+\zeta(G_i)$ occurs it costs less than a plain addition, *i.e.* $G_i+\zeta(G_i)=-\zeta^2(G_i)=(\omega^2x,-y)$, where $\omega^2$ is a third root of unity, *i.e.* $\omega^2+\omega+1=0\mod q$.

----

## :spiral_note_pad: Notes on MSM and arithmetic circuits

- :thinking_face: The scalars $(a_1, a_2, \cdots, a_n)$ in the $\texttt{prove}$ algorithm are actually the values of the SNARK *witness*. This witness basically encodes all the values in the arithmetic circuit, so some of these values are linear combinations of other values. One could exploit this correlation among the scalars $a_i$ to optimize the MSM algorithm.
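The Babai-rounding GLV split mentioned in the endomorphism notes above can be sketched as follows. This is a minimal Python sketch, not `gurvy`'s `utils.SplitScalar()`: it takes BLS12-381's seed $u$, derives $r=u^4-u^2+1$ and $\lambda=u^2-1$ as in the notes, and obtains a short basis by Lagrange-Gauss reduction rather than from the closed-form $b_1,b_2$; all function and variable names are mine:

```python
u = -0xd201000000010000  # BLS12-381 seed
r = u**4 - u**2 + 1      # scalar-field order
lam = u**2 - 1           # eigenvalue: lam^2 + lam + 1 = 0 mod r

def iround(a, b):
    """Nearest integer to a/b, in exact integer arithmetic (any sign of a, b)."""
    if b < 0:
        a, b = -a, -b
    return (2 * a + b) // (2 * b)

def lagrange_reduce(b1, b2):
    """Lagrange-Gauss reduction: returns a shortest basis of a rank-2 lattice."""
    n2 = lambda v: v[0] * v[0] + v[1] * v[1]
    while True:
        if n2(b1) > n2(b2):
            b1, b2 = b2, b1
        m = iround(b1[0] * b2[0] + b1[1] * b2[1], n2(b1))
        if m == 0:
            return b1, b2
        b2 = (b2[0] - m * b1[0], b2[1] - m * b1[1])

# Short basis of the lattice {(x, y) : x + lam*y = 0 mod r}.
v1, v2 = lagrange_reduce((r, 0), (-lam, 1))

def split(s):
    """Return (s1, s2) with s = s1 + lam*s2 mod r and s1, s2 about sqrt(r)-sized."""
    det = v1[0] * v2[1] - v1[1] * v2[0]  # = +-r
    # Babai rounding: round the rational coordinates of (s, 0) in basis (v1, v2),
    # then subtract the resulting lattice vector.
    c1 = iround(s * v2[1], det)
    c2 = iround(-s * v1[1], det)
    s1 = s - (c1 * v1[0] + c2 * v2[0])
    s2 = -(c1 * v1[1] + c2 * v2[1])
    return s1, s2
```

Since the subtracted vector lies in the lattice, $s_1+\lambda s_2\equiv s \pmod r$ always holds, and the rounding error keeps $|s_1|,|s_2|$ near $\sqrt{r}$, i.e. roughly half-size scalars.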