Try   HackMD

Signed Bucket Indexes for Multi-Scalar Multiplication (MSM)

If we treat bucket indexes as signed integers, we are able to saves one bit in bucket index encoding, and therefore reduce number of buckets and bucket memory usage by half. The method is denoted as the signed-bucket-index method, and also known as NAF method. This method was reported in [1] and implemented by top-2 performers of the zPrize MSM-WASM track.

NAF stands for Non-Adjcent Form. The classic definition of NAF is documented in Textbook Definition of Non-Adjacent Form (NAF).

Intuition

In bucket method, we slice each scalar

s into
c
-bit slices
si
, and use
si
as bucket indexes. If we ignore bucket
0
(since
0P=0
), then we have

si{1,...,2c1}

Therefore, we need

2c1 buckets in total.

However, if there is a way (explained later) to map

si to the following range
{2c1,...,1,1,...,2c11}

We can use the following trick,

if slice s_i > 0, add P to bucket
if slice s_i < 0, add -P to bucket

Then, we only need bucket

{1,...,2c1}, which is
2c1
buckets in total.

For example, for slice

si{1,2,3,4,5,6,7} that needs
231=7
buckets in total, if we map
si
to
{4,3,2,1,1,2,3}
, only
2(31)=4
buckets are needed.

Solution

The section was modified from Gregor Baude's description in a slack channel

Given window size

c, denote
L=2c
.

First, splitting the scalar

s into
K
slices
si
mathematically means that
s=s0+s1L+...+sK1LK1=isiLi

Note that, between two slices, we have the following identity:

siLi+si+1Li+1=(siL)Li+(si+1+1)Li+1

In other words, we can reduce a slice by

L by carrying
1
over to the next higher slice.

Originally,

si is in the range [0, L), so we need
L1
buckets (ignoring the
0
bucket). We want to get rid of the upper half. So whenever
si>=L/2
, we replace it with
siL
and add 1 to the next slice
si+1
.

After this transformation,

si is in the range
[L/2,L/2)
. If
si
is negative, we can negate both the slice and the point, since
(si)G=si(G)
. Doing this brings our
si
non-sign bits to the range
[0,L/2]
. So get a range of half the size. Indeed we have reduced
2c1
buckets to
2c1
buckets.

So the algorithm in pseudo code is:

let carry = 0;

for (i = 0,...,K-1) {
  s[i] += carry; // we have to add the carry bit from last iteration!
  if (s[i] >= L/2) {
    s[i] = L - s[i]; // note: we subtract L and negate at once
    carry = 1;
    addToBucket(-point, s[i]); // use negated point
  } else {
    carry = 0;
    addToBucket(point, s[i]);
  }
}

Caveat: how to deal with the case where highest slice

sK1>=L/2?

Handling overflow of the highest slice

The implementation above ignored the final carry result (we don't add it anywhere). So, there is an implicit assumption: that the final carry is

0. If the final carry is
1
, the sum is not correct.

When can the final carry be

1?

Let's assume first that the window size unevenly divides the scalar bit length. Concretely, with the scalar length

128 bits (that's what you get after GLV decomposition), let's say your window width is
c=13
. Since
139=117
, you get
10
slices, but the final slices just have
11
bits set
(128117)
, not the full
13
bits. That means that in the final step of the algorithm above, we will never be in the case
s[i]>=L/2
, so the final carry will never be
1
. We're good!

However, if c evenly divides the scalar bit length, there are final carries of

1 for about half the scalars. So, that would make the algorithm incorrect. This concretely happens with
c=16
, which is the best window width for MSM size
218
.

The naive solution was to set the scalar bit length to one more than the actual value,

129 instead of
128
. Then, you're guaranteed to never have a final carry. However, it's stupid because in the
c=16
case, it means your final slice is just
1
bit, and you run a whole sub-MSM just for this
1
extra bit.

An improved trick is to transform scalars such that they never have their MSB set. (Note: we're talking about the MSB of the whole scalars here, not about the MSB of individual slices. It's all about avoiding that final carry.) The following trick was implemented in Yrrid's solution.

If the MSB of the scalar is set, we negate the scalar and point, and continue processing normally. This works since: (M - s) (-P) = -s (-P) = s P

Without GLV decomposition, scalars have

255 bits, which is
1517
, so the bad cases are
c=15,17
. We could just avoid those window sizes, or do the naive thing of pretending the scalar size is
256
. After GLV decomposition, we need to deal with two scalar halves, both of which can't have their MSB set.

References

[1] Multi-scalar multiplication: state of the art & new ideas with Gus Gutoski [youtube 49:00]

tags: msm zkp public