Each operand of keccak_f is a 64-bit value, initially in base 2, so we do base conversions between bases as the rounds proceed. The base can be up to 15 without wrapping around the field size (a 64-coefficient base-15 value fits in about 251 bits, below a ~254-bit scalar field, while base 16 would already need 256 bits).
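To sanity-check that bound, here is a quick out-of-circuit calculation. The field size of 254 bits is an assumption here (a BN254-like scalar field); adjust it for the actual curve in use.

```python
# Check how many bits a 64-coefficient sparse value needs in each base.
# FIELD_BITS is an assumption: a BN254-like ~254-bit scalar field.
FIELD_BITS = 254

for base in (9, 13, 15, 16):
    bits = (base ** 64).bit_length()
    print(f"base {base}: {bits} bits, fits in field: {bits <= FIELD_BITS}")
```

Base 15 is the largest base whose 64-coefficient values still fit; base 16 overflows.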
When doing the conversion in practice, we don't look up the whole 64-bit value, since the table would be far too large. Instead we split it into chunks that fit the system's table size limit, then use the running-sum trick to convert it chunk by chunk via lookups.
In franklin, they seem to have a table size limit of around …
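The chunked conversion can be sketched out of circuit as follows. The 16-bit chunk size and table layout here are illustrative assumptions of mine, not franklin's actual parameters.

```python
# Sketch of chunked base conversion with a running sum.
# Assumption (mine): 16-bit chunks, converting a u64 from base 2 to sparse base 13.
CHUNK_BITS = 16
NUM_CHUNKS = 64 // CHUNK_BITS

def encode_chunk(chunk: int) -> int:
    # spread each bit of the chunk into a base-13 coefficient
    return sum(((chunk >> i) & 1) * 13 ** i for i in range(CHUNK_BITS))

# the table a circuit would look chunks up in (2**16 rows)
TABLE = [encode_chunk(c) for c in range(1 << CHUNK_BITS)]

def convert_u64(x: int) -> int:
    # accumulate the looked-up chunks with a running sum,
    # shifting each chunk to its coefficient position
    acc = 0
    for k in range(NUM_CHUNKS):
        chunk = (x >> (k * CHUNK_BITS)) & ((1 << CHUNK_BITS) - 1)
        acc += TABLE[chunk] * 13 ** (k * CHUNK_BITS)
    return acc
```

In a circuit the running sum becomes a column of accumulators, with one lookup per chunk instead of one giant lookup per word.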
The first sparse base is for the theta step. In theta, up to 12 values are xored together, so if we pick the base to be 13 we can simply add them without any coefficient wrapping around, and take the parity of each coefficient as the xor result when building the lookup table. Where the 12 comes from is hard to explain in words, so here is pseudocode:
# state is already in sparse base 13,
# where state[0][0] could have had the round constant added already,
# so its coefficients in sparse base 13 range from 0 to 2.
def theta(state: List[List[int]]):
    c = 5 * [0]
    new_state = [5 * [0] for i in range(5)]
    for i in range(5):
        c[i] = sum(state[i])
        # because state[0][0]'s coefficients range from 0 to 2,
        # c[0]'s range from 0 to 6
    for i in range(5):
        for j in range(5):
            # note theta doesn't handle the rotation here and leaves it for rho
            new_state[i][j] = state[i][j] + c[(i - 1) % 5] + 13 * c[(i + 1) % 5]
            # observe that state[0][0] is never added with c[0],
            # so the max coefficient after the linear combination could be
            # 2 + 5 + 5 = 12
            # 1 + 6 + 5 = 12
            # 1 + 5 + 6 = 12
            # which are all at most 12.
    return new_state
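To see the parity trick concretely, here is a small out-of-circuit sketch (plain Python, not circuit code) showing that adding base-13 encodings and reducing each coefficient mod 2 recovers the xor, as long as no coefficient exceeds 12:

```python
# Out-of-circuit sketch: xor via sparse base 13.
# Adding up to 12 base-13 encodings never wraps a coefficient past 12,
# so the parity of each coefficient of the sum is the xor of the inputs.
def to_base13(x: int) -> int:
    return sum(((x >> i) & 1) * 13 ** i for i in range(64))

def parity_decode(v: int) -> int:
    # take each base-13 coefficient mod 2 and pack it back into a u64
    out, i = 0, 0
    while v:
        out |= (v % 13 % 2) << i
        v //= 13
        i += 1
    return out

words = [0xDEADBEEF, 0x12345678, 0xCAFEBABE]
acc = sum(to_base13(w) for w in words)
assert parity_decode(acc) == words[0] ^ words[1] ^ words[2]
```

In the circuit, `parity_decode` is what the conversion lookup table implements coefficient by coefficient.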
The second sparse base is for the xi step, which calculates a ^ (~b & c) ^ d. The result turns out to be uniquely determined by the linear combination 2a + b + 3c + 2d, which ranges from 0 to 8, hence base 9. The d could be the round constant or a block of data to digest; see here for details.
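A quick exhaustive check (plain Python, outside the circuit) that the linear combination really determines the output:

```python
# Verify that a ^ (~b & c) ^ d is uniquely determined by 2a + b + 3c + 2d
# over all 16 bit combinations, so a single base-9 lookup per coefficient suffices.
from itertools import product

mapping = {}
for a, b, c, d in product((0, 1), repeat=4):
    key = 2 * a + b + 3 * c + 2 * d
    out = a ^ ((1 - b) & c) ^ d
    # a collision with a different output would break the lookup
    assert mapping.setdefault(key, out) == out
print(sorted(mapping))  # the keys cover 0..8, hence the second base is 9
```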
In franklin they don't use the 64th-root-of-unity trick mentioned in the note. Instead they unroll the 25 rotations, carefully split values into chunks, and check that each rotation is correct while converting from the first base to the second base at the same time.
We could do such a check in base 2 more efficiently, because there the chunk size can be up to 16 bits (roughly 3-4x more efficient). But the base conversion saves a bunch of bitwise lookups in theta and xi, so the sparse-base conversion trick is still a net win for us.
Take rho(T, offset=1) as an example: initially we have T in the first base (13), and we want the rotated version in the second base (9). Note that theta already rotates left by 1 via its linear combination (the multiplication by 13), so the offset must account for that. The chunk that wraps around the 64-coefficient boundary is handled by the OverflowCognizantConverterTable.
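An out-of-circuit sketch of what a rotation check verifies (the helper names are mine, and this only illustrates the arithmetic, not the chunked lookups): rotating a sparse-base value left by `offset` amounts to splitting it at the wrap boundary and recombining the two parts.

```python
# Sketch: rotate a 64-coefficient sparse-base value left by `offset`
# by splitting it at the wrap boundary and swapping the parts.
BASE = 13

def to_base13(x: int) -> int:
    return sum(((x >> i) & 1) * BASE ** i for i in range(64))

def rot_left_sparse(v: int, offset: int) -> int:
    # high holds the top `offset` coefficients, low holds the rest
    high, low = divmod(v, BASE ** (64 - offset))
    return low * BASE ** offset + high

x = 0x8000000000000001
rotated = ((x << 1) | (x >> 63)) & (2 ** 64 - 1)
assert rot_left_sparse(to_base13(x), 1) == to_base13(rotated)
```

In the circuit the split points fall on chunk boundaries, which is why the unrolled rotations need careful chunking and the wrapped chunk needs its own table.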
We don't need to explicitly range-check each full chunk, because the conversion lookup does that for us. However, we need to carefully check truncated chunks, such as the ones at the rotation boundary.
franklin's approach to checking this is also tricky: it tracks a count for each chunk such that a number multiplied by its count won't wrap around, and it uses the first_to_second_base_converter_table for the lookup, so the number of bits in a chunk is known from the lookup itself. This again reduces 37 (12 + 12 + 13) range lookups to 3, so brilliant!
A simplified pseudocode for a quick view (squeeze is simplified for the fixed 256-bit digest size):
# input state_base_13: 5x5 matrix keccak state in base 13
# input block_base_9: size-17 list of u64 data to digest in base 9 (None when squeezing)
# output state_base_13: updated state in base 13
# output digest: size-4 list of u64 in base 2 if squeezing
def keccak_f(state_base_13: List[List[int]], block_base_9: List[int]) -> (List[List[int]], List[int]):
    digest = None
    for round in range(24):
        state_base_13 = theta(state_base_13)
        state_base_9 = rho(state_base_13)
        state_base_9 = pi(state_base_9)
        if round < 23:
            # xi adds the round constant internally
            state_base_13, _ = xi(state_base_9, None, round)
            continue
        # when block_base_9 is None (squeezing),
        # digest is the first 4 words of the updated state for keccak256:
        # digest = convert_base(updated_state_base_9[0][0:4], from=9, to=2)
        state_base_13, digest = xi(state_base_9, block_base_9, round)
        # xi adds the round constant internally only when block_base_9 is None,
        # so we need to add the last round constant here while we are still absorbing.
        # this is also the reason the first base is 13 instead of 12.
        if block_base_9 is not None:
            state_base_13[0][0] += round_constant_base_13[23]
    return state_base_13, digest
# input data: arbitrary-size list of u64 in base 2
# output digest: size-4 list of u64 in base 2
def keccak256(data: List[int]) -> List[int]:
    # pad data to a multiple of RATE (RATE is 17 u64 words per keccak_f)
    # (simplified; real keccak uses pad10*1 padding)
    padding = [0] * (0 if len(data) % RATE == 0 else RATE - len(data) % RATE)
    data.extend(padding)
    # absorb
    state_base_13 = [5 * [0] for i in range(5)]
    for offset in range(0, len(data), RATE):
        block = data[offset:offset+RATE]
        if offset == 0:
            # the initial state is all zeros, so absorbing the first block
            # is just converting it into the state directly
            state_base_13 = convert_base(block, from=2, to=13)
            continue
        block_base_9 = convert_base(block, from=2, to=9)
        state_base_13, _ = keccak_f(state_base_13, block_base_9)
    # squeeze
    _, digest = keccak_f(state_base_13, None)
    return digest
Our table row size is larger than franklin's.
- Bases 13 and 9
- Chunk size: 6 bits
- Running sum: https://github.com/zcash/orchard/blob/main/src/circuit/gadget/utilities/lookup_range_check.rs
- Reuse the sponge function: https://github.com/zcash/orchard/blob/main/src/circuit/gadget/poseidon.rs
OverflowCognizantConverterTable's size: not sure why it's not for i in 0..13 { for j in 0..13 { ... } }? Because we query OverflowCognizantConverterTable for the last chunk's high value and low value. The two values are supposed to together form chunk 0 of the base-13 value after the rotation, so they satisfy low + high <= 12. That implies only 91 of the 169 cases can occur.
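Enumerating the valid (low, high) pairs confirms the count (plain Python):

```python
# Only pairs whose sum stays within a single base-13 coefficient are valid,
# since low and high recombine into chunk 0 of the rotated base-13 value.
pairs = [(lo, hi) for lo in range(13) for hi in range(13) if lo + hi <= 12]
print(len(pairs))  # 91 valid pairs out of the full 13 * 13 = 169
```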