Sparse Base

Each operand of keccak_f is a 64-bit value, initially in base 2. When we do a base conversion from $b_1$ to $b_2$, we are computing:

$$
x = \sum_{i=0}^{63} x_i\, b_1^i \qquad y = \sum_{i=0}^{63} f(x_i)\, b_2^i
$$

The base can be up to 15 without wrapping around the field size ($15^{64} \approx 2^{250} < 2^{253}$).

When doing the conversion in practice, we don't look up the whole 64-digit value at once, since the table would be far too large. Instead we split it into chunks that fit the system's table size limit, then use the running-sum trick to convert it chunk by chunk with lookups.

In franklin, the table size limit seems to be around $2^{16}$, so the conversion from base 13 uses chunk size 4 and the conversion from base 9 uses chunk size 5, both giving a table size of roughly $2^{16}$ ($4 \log_2 13 \approx 14.8$ and $5 \log_2 9 \approx 15.85$). The numbers can be found here.
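For concreteness, here is a minimal out-of-circuit sketch of the chunk-and-running-sum idea for the base-13 to base-9 direction, assuming (for illustration) that each digit is mapped to its parity, which is the mapping used after theta as described in the next sections. The names convert_chunk and convert_base13_to_base9 are illustrative, not franklin's actual API.

# A toy model of chunked base conversion from base 13 to base 9.
# Each output digit is the parity of the corresponding input digit.

CHUNK = 4  # 4 base-13 digits per lookup; 13**4 = 28561 ~ 2**14.8 rows

def parity(digit):
    return digit % 2  # XOR of all the bits accumulated in one base-13 digit

# lookup table: 4-digit base-13 chunk -> 4-digit base-9 chunk
convert_chunk = {}
for d0 in range(13):
    for d1 in range(13):
        for d2 in range(13):
            for d3 in range(13):
                key = d0 + 13 * d1 + 13**2 * d2 + 13**3 * d3
                convert_chunk[key] = (parity(d0) + 9 * parity(d1)
                                      + 9**2 * parity(d2) + 9**3 * parity(d3))

def convert_base13_to_base9(digits):
    """digits: 64 base-13 digits, little-endian."""
    acc13, acc9 = 0, 0
    for i in range(0, 64, CHUNK):
        chunk13 = sum(d * 13**j for j, d in enumerate(digits[i:i + CHUNK]))
        chunk9 = convert_chunk[chunk13]   # one lookup per 4-digit chunk
        acc13 += chunk13 * 13**i          # running sum in base 13
        acc9 += chunk9 * 9**i             # running sum in base 9
    # in a circuit, acc13 would be constrained to equal the original operand,
    # and acc9 becomes the converted operand
    return acc9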

First Sparse Base - 13

The first sparse base is for the theta step. Theta XORs up to 12 values per bit position, so if we pick the base to be 13, we can just add them together without any digit wrapping around, and take each digit's parity as the XOR result when building the lookup table.

Where the 12 comes from is hard to explain in words, so here is pseudocode to show it:

# state is already in sparse base 13,
# where state[0][0] may already have had a round constant added,
# so its coefficients in sparse base 13 range from 0 to 2.
def theta(state: List[List[int]]):
    c = 5 * [0]
    new_state = [5 * [0] for i in range(5)]

    for i in range(5):
        c[i] = sum(state[i])
        
    # because state[0][0]'s coefficients range from 0 to 2,
    # each coefficient of c[0] ranges from 0 to 6 (2 + 1 + 1 + 1 + 1)

    for i in range(5):
        for j in range(5):
            # 13 * c[...] does the rotate-left-by-1 as a digit shift;
            # the wrap-around of the overflowed top digit is left for rho
            new_state[i][j] = state[i][j] + c[(i - 1) % 5] + 13 * c[(i + 1) % 5]

    # observe that state[0][0] is never added together with c[0],
    # so the maximum coefficient after the linear combination is one of
    #   2 + 5 + 5 = 12   (the state[0][0] digit plus two ordinary columns)
    #   1 + 6 + 5 = 12   (an ordinary digit plus c[0] and another column)
    #   1 + 5 + 6 = 12
    # i.e. it never exceeds 12, which is why base 13 is enough.

    return new_state

Second Sparse Base - 9

The second sparse base is for xi, which computes a ^ (~b & c) ^ d. The result turns out to be uniquely determined by the linear combination 2a + b + 3c + 2d, which ranges over $[0, 8]$. So if we convert the inputs to base 9, we can just combine them with those coefficients without any digit wrapping around. When the result needs to be converted to another base, a single table lookup per digit recovers the result bit. The truth table can be found here.
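
A short out-of-circuit check of that claim (this is not the actual table construction, just a verification of the truth table):

# Verify that a ^ (~b & c) ^ d is uniquely determined by 2a + b + 3c + 2d,
# so one base-9 digit lookup recovers the xi result bit.
from itertools import product

table = {}
for a, b, c, d in product([0, 1], repeat=4):
    key = 2 * a + b + 3 * c + 2 * d           # one base-9 digit, in [0, 8]
    out = a ^ ((1 - b) & c) ^ d
    assert table.setdefault(key, out) == out  # no key maps to two different bits
print(sorted(table.items()))
# -> [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0), (5, 0), (6, 1), (7, 1), (8, 0)]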

The d can be a round constant or a block of data to digest; see here for details.

Rotation

In franklin, they don't use the 64th-root-of-unity trick mentioned in the note. Instead they unroll the 25 rotations, carefully split the values into chunks, and check that the rotation is correct while converting from the first base to the second base at the same time.

We could do such a check more efficiently in base 2, because there the chunk size can be up to 16 bits (roughly 3-4x more bits per lookup). But the base conversion saves a lot of bitwise lookups in theta and xi, so the sparse base conversion trick still benefits us.

Take rho(T, offset=1) as an example: initially we have $T$ in base 13 from theta, and we want the rotated version $X$ in base 9 for xi:

$$
T = \sum_{i=0}^{64} t_i\, 13^i \qquad X = \sum_{i=0}^{63} x_i\, 9^i
$$

$$
\left\{
\begin{array}{lcl}
T_0 = t_0 + t_{64} \cdot 13^{64} & \xrightarrow{?} & X_0 = x_1 \\
T_1 = \sum_{i=0}^{3} t_{i+1}\, 13^i & \xrightarrow{?} & X_1 = \sum_{i=0}^{3} x_{i+2}\, 9^i \\
T_2 = \sum_{i=0}^{3} t_{i+5}\, 13^i & \xrightarrow{?} & X_2 = \sum_{i=0}^{3} x_{i+6}\, 9^i \\
\quad\vdots & & \quad\vdots \\
T_{15} = \sum_{i=0}^{3} t_{i+57}\, 13^i & \xrightarrow{?} & X_{15} = \sum_{i=0}^{3} x_{i+58}\, 9^i \\
T_{16} = \sum_{i=0}^{1} t_{i+61}\, 13^i & \xrightarrow{?} & X_{16} = \sum_{i=0}^{1} x_{i+62}\, 9^i \\
T_{17} = t_{63} & \xrightarrow{?} & X_{17} = x_0
\end{array}
\right.
$$

$$
\left\{
\begin{array}{l}
T \stackrel{?}{=} T_0 + \sum_{i=1}^{16} T_i \cdot 13^{4i-3} + T_{17} \cdot 13^{63} \\
X \stackrel{?}{=} X_0 \cdot 9 + \sum_{i=1}^{16} X_i \cdot 9^{4i-2} + X_{17}
\end{array}
\right.
$$

The arrow $\xrightarrow{?}$ denotes a base conversion lookup, and $\stackrel{?}{=}$ denotes an equality check.

Note that $T$ has 65 digits because theta does the rotate-left-by-1 as a linear combination (multiplying by 13), so the top digit overflows past position 63 instead of wrapping around. That is why the $T_0 \xrightarrow{?} X_0$ lookup uses the OverflowCognizantConverterTable.
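
Out of circuit, the same decomposition can be exercised with a short sketch; the in-circuit version replaces each per-chunk conversion with a table lookup and enforces the two running sums shown above. Helper names here are illustrative.

# A plain-Python walkthrough of the chunk layout above for rho(T, offset=1),
# assuming the parity-based conversion from base 13 to base 9.
import random

def parity(digit):
    return digit % 2

def rho1(t):
    """t: the 65 base-13 digits of T (little-endian), with t[0] + t[64] <= 12."""
    x = 64 * [0]
    # overflow-cognizant chunk T_0: digit 0 of T is split across t[0] and t[64],
    # and after rotating left by 1 it lands at x[1]
    x[1] = parity(t[0] + t[64])
    # fifteen 4-digit chunks T_1..T_15 plus the truncated 2-digit chunk T_16
    for start, width in [(1 + 4 * k, 4) for k in range(15)] + [(61, 2)]:
        for i in range(width):
            x[start + i + 1] = parity(t[start + i])
    # T_17: t[63] wraps around to x[0]
    x[0] = parity(t[63])
    return x

# quick check against a direct rotate-left-by-1 of the underlying bits
bits = [random.randint(0, 1) for _ in range(64)]
t = bits + [0]
t[0], t[64] = 0, bits[0]            # pretend bit 0 overflowed into position 64
assert rho1(t) == bits[-1:] + bits[:-1]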

We don't need to explicitly range-check each full chunk, because the conversion lookup does that for us. However, we need to carefully check a truncated chunk like $T_{16}$: it must contain exactly 2 digits, otherwise one could move the value of $t_{63}$ into $T_{16}$ and still pass the running-sum check on $T$.

franklin's approach to check this is also tricky:

  1. Enumerate all 25 rotations and count how many truncated chunks of each size we can have in total. Ref
  2. Encode each item of the counts as a number chosen so that multiplying it by its count won't wrap around. Ref
  3. Assign each encoded value to the first_to_second_base_converter_table for lookup, so we learn how many bits a chunk has from the lookup itself. Ref
  4. During conversion, we accumulate the chunks' encoded bit counts across the 25 rounds and finally check that their sum doesn't wrap around, which means every chunk has the correct bit count. Ref

This again reduces 37 (12 + 12 + 13) range lookups to 3, so brilliant!

Algorithm

Simplified pseudocode for a quick overview (squeeze is simplified for a fixed 256-bit digest):

# input state_base_13: 5x5 matrix of keccak state in base 13
# input block_base_9: size-17 list of u64 data to digest in base 9 (None when squeezing)
# output state_base_13: updated state in base 13
# output digest: size-4 list of u64 in base 2 when squeezing, otherwise None
def keccak_f(state_base_13: List[List[int]], block_base_9: List[int]) -> (List[List[int]], List[int]):
    digest = None

    for round in range(25):
        state_base_13 = theta(state_base_13)
        state_base_9 = rho(state_base_13)
        state_base_9 = pi(state_base_9)
        
        if round < 24:
            # xi adds round constant internally
            state_base_13, _ = xi(state_base_9, None, round)
            continue


        # when block_base_9 is None (i.e. squeezing),
        # digest is the first 4 words of the updated state for keccak256, roughly:
        # digest = convert_base(updated_state_base_9[0][0:4], from_base=9, to_base=2)
        state_base_13, digest = xi(state_base_9, block_base_9, round)

        # xi adds round constant internally when block_base_9 is None,
        # so we need to add round constant here when we are still absorbing.
        # this is also the reason we have first base to be 13 instead of 12.
        if block_base_9 is not None:
            state_base_13[0][0] += round_constant_base_13[24]

    return state_base_13, digest

# input data: arbitrary size list of u64 in base 2
# output digest: size 4 list of u64 in base 2
def keccak256(data: List[int]) -> List[int]:
    # pad data to multiple of RATE (RATE is 17 u64 words per keccak_f)
    padding = [0] * (0 if len(data) % RATE == 0 else RATE - len(data) % RATE)
    data.extend(padding)

    # absorb
    state_base_13 = [5 * [0] for i in range(5)]
    for offset in range(0, len(data), RATE):
        block = data[offset:offset+RATE]

        if offset == 0:
            state_base_13 = convert_base(block, from_base=2, to_base=13)
            continue

        block_base_9 = convert_base(block, from_base=2, to_base=9)
        state_base_13, _ = keccak_f(state_base_13, block_base_9)

    # squeeze
    _, digest = keccak_f(state_base_13, None)

    return digest

Targeting a 2**25 table

Our table row budget is larger than franklin's.

- Base 13 and 9
- Chunk size: 6 bits
- Running sum: https://github.com/zcash/orchard/blob/main/src/circuit/gadget/utilities/lookup_range_check.rs
- Reuse the sponge function: https://github.com/zcash/orchard/blob/main/src/circuit/gadget/poseidon.rs
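
A quick back-of-the-envelope check that a 6-bit chunk fits the 2**25 budget, assuming one table row per chunk value:

# Row-count check for chunk size 6 against a 2**25 table.
import math

for base in (13, 9):
    rows = base ** 6
    print(f"base {base}: {rows} rows ~ 2^{math.log2(rows):.1f}")
# base 13: 4826809 rows ~ 2^22.2
# base 9:   531441 rows ~ 2^19.0   (both below 2^25)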

Question

Q. Why is OverflowCognizantConverterTable's size $\frac{13(13+1)}{2}$ instead of $13^2$?

Not sure why it's not for i in 0..13 { for j in 0..13 { ... } }.

Because we query OverflowCognizantConverterTable with the last chunk's high value and low value. The two values are supposed to together form chunk 0 of the base-13 value after the rotation, so they satisfy low + high <= 12. That implies only $\frac{13(13+1)}{2} = 91$ cases are valid.
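
A one-liner confirming the count (a sketch, not the actual table builder):

# Only (low, high) pairs with low + high <= 12 can occur for the split digit.
rows = [(low, high) for low in range(13) for high in range(13) if low + high <= 12]
assert len(rows) == 13 * (13 + 1) // 2   # 91 rows, not 13**2 = 169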

Reference