Signed Bucket Indexes for Multi-Scalar Multiplication (MSM)

# Signed Bucket Indexes for Multi-Scalar Multiplication (MSM) If we treat bucket indexes as signed integers, we are able to saves one bit in bucket index encoding, and therefore reduce number of buckets and bucket memory usage by half. The method is denoted as the signed-bucket-index method, and also known as NAF method. This method was reported in [1] and implemented by top-2 performers of the [zPrize MSM-WASM track](https://www.zprize.io/blog/announcing-zprize-results). NAF stands for Non-Adjcent Form. The classic definition of NAF is documented in [Textbook Definition of Non-Adjacent Form (NAF)](/HkVWGwsRTM2HeBL1VN0lAw). ## Intuition In bucket method, we slice each scalar $s$ into $c$-bit slices $s_i$, and use $s_i$ as bucket indexes. If we ignore bucket $0$ (since $0*P=0$), then we have $$s_i \in \{1, ..., 2^c-1\}$$ Therefore, we need $2^c - 1$ buckets in total. However, if there is a way (explained later) to map $s_i$ to the following range $$\{-2^{c-1},...,-1, 1,...,2^{c-1}-1 \}$$ We can use the following trick, ``` if slice s_i > 0, add P to bucket if slice s_i < 0, add -P to bucket ``` Then, we only need bucket $\{1, ..., 2^{c-1}\}$, which is $2^{c-1}$ buckets in total. For example, for slice $s_i \in \{1,2,3,4,5,6,7\}$ that needs $2^3-1 = 7$ buckets in total, if we map $s_i$ to $\{-4,-3,-2,-1,1,2,3\}$, only $2^{(3-1)} = 4$ buckets are needed. ## Solution _The section was modified from Gregor Baude's description in a slack channel_ Given window size $c$, denote $L = 2^c$. First, splitting the scalar $s$ into $K$ slices $s_i$ mathematically means that $$s = s_0 + s_1 L + ... + s_{K-1} L^{K-1} = \sum_i s_i L^i $$ Note that, between two slices, we have the following identity: $$ s_i L^i + s_{i+1} L^{i+1} = (s_i - L) L^i + (s_{i+1} + 1) L^{i+1} $$ In other words, we can reduce a slice by $L$ by carrying $1$ over to the next higher slice. Originally, $s_i$ is in the range [0, L), so we need $L-1$ buckets (ignoring the $0$ bucket). We want to get rid of the upper half. So whenever $s_i >= L/2$, we replace it with $s_i - L$ and add 1 to the next slice $s_{i+1}$. After this transformation, $s_i$ is in the range $[-L/2, L/2)$. If $s_i$ is negative, we can negate both the slice and the point, since $(-s_i) * G = s_i * (-G)$. Doing this brings our $s_i$ non-sign bits to the range $[0, L/2]$. So get a range of half the size. Indeed we have reduced $2^c - 1$ buckets to $2^{c-1}$ buckets. So the algorithm in pseudo code is: ``` let carry = 0; for (i = 0,...,K-1) { s[i] += carry; // we have to add the carry bit from last iteration! if (s[i] >= L/2) { s[i] = L - s[i]; // note: we subtract L and negate at once carry = 1; addToBucket(-point, s[i]); // use negated point } else { carry = 0; addToBucket(point, s[i]); } } ``` Caveat: how to deal with the case where highest slice $s_{K-1} >= L/2$? ### Handling overflow of the highest slice The implementation above ignored the final carry result (we don't add it anywhere). So, there is an implicit assumption: that the final carry is $0$. If the final carry is $1$, the sum is not correct. When can the final carry be $1$? Let's assume first that the window size unevenly divides the scalar bit length. Concretely, with the scalar length $128$ bits (that's what you get after GLV decomposition), let's say your window width is $c=13$. Since $13*9 = 117$, you get $10$ slices, but the final slices just have $11$ bits set $(128 - 117)$, not the full $13$ bits. That means that in the final step of the algorithm above, we will never be in the case $s[i] >= L/2$, so the final carry will never be $1$. We're good! However, if c evenly divides the scalar bit length, there are final carries of $1$ for about half the scalars. So, that would make the algorithm incorrect. This concretely happens with $c=16$, which is the best window width for MSM size $2^{18}$. The naive solution was to set the scalar bit length to one more than the actual value, $129$ instead of $128$. Then, you're guaranteed to never have a final carry. However, it's stupid because in the $c=16$ case, it means your final slice is just $1$ bit, and you run a whole sub-MSM just for this $1$ extra bit. An improved trick is to transform scalars such that they never have their MSB set. (Note: we're talking about the MSB of the whole scalars here, not about the MSB of individual slices. It's all about avoiding that final carry.) The following trick was implemented in Yrrid's solution. > If the MSB of the scalar is set, we negate the scalar and point, and continue processing normally. This works since: (M - s) (-P) = -s (-P) = s P Without GLV decomposition, scalars have $255$ bits, which is $15 * 17$, so the bad cases are $c=15, 17$. We could just avoid those window sizes, or do the naive thing of pretending the scalar size is $256$. After GLV decomposition, we need to deal with two scalar halves, **both of which can't have their MSB set**. ## References [1] Multi-scalar multiplication: state of the art & new ideas with Gus Gutoski [\[youtube 49:00\]](https://youtu.be/Bl5mQA7UL2I) ###### tags: `msm` `zkp` `public`