or
or
By clicking below, you agree to our terms of service.
New to HackMD? Sign up
Syntax | Example | Reference | |
---|---|---|---|
# Header | Header | 基本排版 | |
- Unordered List |
|
||
1. Ordered List |
|
||
- [ ] Todo List |
|
||
> Blockquote | Blockquote |
||
**Bold font** | Bold font | ||
*Italics font* | Italics font | ||
~~Strikethrough~~ | |||
19^th^ | 19th | ||
H~2~O | H2O | ||
++Inserted text++ | Inserted text | ||
==Marked text== | Marked text | ||
[link text](https:// "title") | Link | ||
 | Image | ||
`Code` | Code |
在筆記中貼入程式碼 | |
```javascript var i = 0; ``` |
|
||
:smile: | ![]() |
Emoji list | |
{%youtube youtube_id %} | Externals | ||
$L^aT_eX$ | LaTeX | ||
:::info This is a alert area. ::: |
This is a alert area. |
On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?
Please give us some advice and help us improve HackMD.
Do you want to remove this version name and description?
Syncing
xxxxxxxxxx
Acknowledgement
We port Konstantin Panarin's idea of sparse representation and the code examples to Halo2.
Terminology
Keccak
Sparse representation
The lane size is 64 bits. We split it into slices. In the binary to 13 base mapping, we work with 16 bits or 16 chunks per slice. A lane has 4 slices.
In the 13 base to 9 base mapping, we work with 4 bits or 4 chunks per slice.
Introduction
Overview
The original keccak
How we do keccak in the circuit
The first base and the second base
Why the first sparse base is 13? because in theta step we have
Note that we don't do full rotate_left(, 1), we do shift_left(, 1) instead and save the wrap over in the later step. After shift_left the sparse representation is at most 13^65 and is about 241 bits.
Why the second sparse base is 9? because in the xi and iota steps combined we have at most 8 0~1 variables adding together.
Break down of the theta step
new_state[x][y]
is a base 13 number with 65 chunks.The full roation is supposed to give us a base 13 value
t
with 64 chunks. Let's denote itt0
~t63
.The
t0
is supposed to bes0 + c4_0 + c1_63
. But innew_state[x][y]
,s0 + c4_0
is at thens0
andc1_63
is at thens64
.We call the
ns0
"last chunk high value" andns64
"last chunk low value."Break down the rotate_and_convert
In the rotate_and_convert, we do the rotation and the conversion from base 13 to base 9 simultaniously.
For example, we want to rotate 5. Let's first observe the ground truth
t
we assume
first base converter
has been applied ont
, so that each chunkti
is either 0 or 1.The sum-product of t and the base gives us a 64 chunks base 9 number.
But the
new_state[x][y]
from the theta step has the special chunks "chunk 0" and "chunk 64"We can handle the rest of the chunks as usual, but the
ns0
andns64
needs special care.add the sum-product with
keccak_u64_first_converter(ns0 + ns64)*9**5
and we get the correctly rotated base 9 number.Example
Note that
ns64 + ns0
must be less than or equal to 12keccak_u64_first_converter(ns0 + ns64)
evalutes to 0Break down the rho step validation
How do we implement Keccak steps in the circuit?
Theta
Original
In circuit
The argument that each output lane's coefficients is at most 12
Note that in the middle of the rounds we might enter the theta step from previous iota step, which is a step that xors a round constant to
state[0][0]
. So thatstate[0][0]
might start with 2.Since
state[0][0]
might start with 2 andc[0]
contains astate[0][0]
,c[0]
could be at most 6.Taking a closer look to the number
new_state[x][y]
in base 13 system at line 9, we observe the largest possible value for coefficient at any given position.For
x not in (0, 1, 4)
, we havestate[x][y]
at most 1,c[(x+4)% 5]
at most 5, and13 * c[(x+1)%5]
from the previous position that contributs at most 5. The total is at most 1+5+5 = 11.For
x == 0
, we havestate[x][y]
at most 2, which happens atstate[0][0]
,c[(x+4)% 5]
at most 5 sincestate[0][0]
is not in it, and13 * c[(x+1)%5]
at most 5 for the same reason. The total is at most 2 + 5 + 5 = 12.For
x in (1, 4)
,state[x][y]
is at most 1. One ofc[(x+4)% 5]
orc[(x+1)%5]
, but not both, could be at most 6 sincestate[0][0]
is in exact one of them. The total is at most 1 + 5 + 6 = 12.Cost analysis
5 linear combination for c
25 linear combination for new_state
Pi
Nothing too interesting here
Cost analysis
No cost in circuit???
Rho
Original
In circuit
Cost analysis
lastly we need 3 linear combination check for offset_map for each non-4 step
total:
Xi + Iota step
Original
Original: Merging Iota step
We have the
keccak_u64_second_converter
function to map2*a + b + 3*c + 2*d
toa ^ (~b & c) ^ d
.Depending if we are at the last round of round_b or last round of keccak_f, we'll use
d
to do- iota step to add a round constant in base 9, or
- absorb next input to step, then add round constant in base 13 later.
In the circuit
Cost analysis
25 linear combination check for
new_state[x][y]
Helpers
get_step_size (check_offset_helper)
get_steps
keccak_u64_first_converter
keccak_u64_second_converter
The idx is from 0~15
The a, b, c, d are binary decomposition
If we sort by the f_alg column and select f_logic column together, we get the following table. Don't worry about the duplicated rows, they are consistent since f_algebratic -> f_logic is well-defined
If we implement that table it would look just like
keccak_u64_second_converter
rotate_and_convert
normalizer
Tables
from_binary_converter_table
Purpose: Convert 64 bits value from binary to base 13 in 16 bits chunks
This table has 2**16 rows
first_to_second_base_converter_table
Purpose: convert the middle slices, where the chunk size is 4
This table has 13**4 rows
of_first_to_second_base_converter_table
Purpose: convert the t0 and the t64 chunks from the arithmetic result of the sparse form to the bitwise operation result of the binary form.
The table size should be (13 + 1) * 13 / 2 = 91 rows
from_second_base_converter_table
Purpose: Convert from base 9 to base 13 or binary
This table has 9**5 rows
Sponges
Keccak use simple sponge
For Keccak256, the squeeze phase is a no-op
First we absorb the input bytes. Convert them to base 13 sparse representation using from_binary_converter_table.
We absorb at a rate of 16 bits. (could enlarge the table size to optimize this)
state is initialized 1600 bits.
We absorb 1088 bits of input at a time.
We run keccak_f1600 for 1600 bits state to get the next state.
When we finish the absorb, we xored 0x01 and 0x80 at specific place. Run keccak_f1600 again.
Return first 256 bits of the state.
Cost analysis
(WIP) Working with multiple inputs
Appendix
(WIP) Decyphering block counting function
What's known
g = |x| { of_transformed[x as usize] };
counts
is[12, 12, 13]
. The first_base_num_of_chunks is 4, useually we lookup the 4 bit chunks. But there are exceptional cases, thecounts
meansWhat's unknown
https://github.com/matter-labs/franklin-crypto/blob/a78e17674dd1c99788259986cae11239953b50af/src/plonk/circuit/hashes_with_tables/keccak/gadgets.rs#L176
Chips
expose keccak256 as a gadget
impl keccak256 for keccak-f
keccak-f a trait,
gadget
rho, theta,
gadget is a group of traits
chips implements the traits
keccak-f as the unit interface
try learn from the poseidon example https://github.com/zcash/orchard/blob/main/src/circuit/gadget/poseidon.rs
(WIP) Circuit as a Lookup Table
Because most
keccak256
in evm doesn't have fixed length input. If we want to handle all cases including the worst case by defining some max input length with a completekeccak256
, we might have to fine-tune the circuit to not waste too much.Another approach is to not have a complete
keccak256
, we instead have repeatingkeccak_f
with some beginning and end condition, which seems more friendly for other circuit to use.Naively each
keccak_f
could look like:if not is_final
then we have to bump the number of offset (either 1 or 17)where
i_*
is input and the region inside waves we performkeccak_f
.Other circuits call it like
lookup(i_1, i_2, i_3, ..., i_17, hash, offset, size, is_final)
to ensurekeccak256
is correct. (would call multiple times if the input is more than 17 words)Then keccak256 circuit will set next input to all 0 and check if
hash
is matching next output(o_1, o_2, o_3, o_4)
whenis_final
is set to1
.The reason for next input and next output is because current algorithm kind of defer the
keccak_f
to next round with no input. If we are changing the order, the base 13 would not work anymore.When the
is_final
=1, we do the padding.