I started contributing to the EF PSE ZK-EVM effort in Q3 2021
Several teammates came from Loopring
Split because Loopring is an app-specific ZKR, while Taiko is general purpose
Around 20 people now
Taiko
Type 1 ZK-EVM
Prioritize Ethereum equivalence, give up ZK-friendly optimizations
Main benefit is smooth migration and reusability of all L1 dapps, tools, and infrastructure
Main downside is slower proof generation
Decentralized
Only relies on Ethereum public data, anyone can run a node
Permissionless
Any proposer or prover can participate
Architecture
ZK-EVM (community effort from PSE ZK-EVM circuits)
L2 node
Protocol
Taiko Protocol
Block proposers: submit a list of transactions to the rollup contract, appended to the chain
Block provers: generate proofs for blocks
Key properties:
Block execution is deterministic after the block is proposed
Instant finality
Parallel proof generation
Block on-chain finalized if block and all its predecessors proven (L2 -> Lx)
Extra transaction injected at block proposing time
Used to bring the L1 block hash into an L2 smart contract (for bridging)
Store all L2 block hashes
Also used to verify some L2 data (no specialized circuit code is required)
KECCAK-256/SHA-256
Repetition of the same operations in N rounds on some state
After N rounds:
More data to process: carry state over
All data processed: load initial state
KECCAK-256/SHA-256
Not really efficient:
Work on binary fields (vs native field)
Large state
Lots of binary operations
Two different approaches:
Bit approach
Packed approach (example: 2 bits/data bit)
KECCAK-256: s = a ^ ((~b) & c)
AND
b0*b1 (degree(b0) + degree(b1))
OR
1-(1-b0)*(1-b1) (degree(b0) + degree(b1))
XOR
b0+b1 - 2*b0*b1 (degree(b0) + degree(b1))
NOT
1 - b0 (degree(b0))
Bit approach:
s = a + (1-b)*c - 2*a*(1-b)*c (degree 3)
But, s = a ^ ((~b) & c) ⇔ s ^ a = ((~b) & c) => s*a-2*s*a = (1-b)*c (degree 2)
Packed approach:
Could lookup (a, b, c, s), but needs 2^3=8 entries per bit
Instead lookup (3 - 2*a + b - c, s) with lookup table [0, 1, 1, 0, 0], only 5 entries/bit
Quite easy to find best solution with brute force
Have to make sure there’s no overflow/underflow per bit
Difference adds up, minimum circuit height for 8 bits
log2(8^8)=24 vs log2(5^8)=18.6
XOR
Bit approach:
Custom gates
Chaining many of these results in a high degree
Packed approach
Just add them together, do & 1 using a lookup
Mixed approach
Do custom gates up to some target degree
Then do lookup on more efficient table
Bit rotation/shift
Bit approach:
Free: just substitute directly in the constraint expression
Packed approach
Split up packed representation in parts at the rotation point, merge back together
Can delay merging/resplitting if the resulting parts are again needed in lookups, so split smart
Example: 12345678abcd as the 12-bit word
rot 1: 123|4567|8abc|d -> d|123 | 4567 | 8abc
rot 2: 12|3456|78ab|cd -> cd|12 | 3456 | 78ab
rot 5: 123|4567|8abc|d -> 8abc | d|123 | 4567
Bit approach
Binary representation
Use custom gates
Great because:
Simple
Pretty good performance
Many custom gates, but custom gates are cheap
MSM with binary coefficients cheap (reduces to a conditional sum)
But:
Requires a lot of cells (1 cell/bit)
Have to be very careful about the expression degree (HyperPlonk?)
Efficient high degree circuits useful to minimize verification costs (fewer columns)
Generally less flexible circuit layout wise
Packed approach
Packs multiple bits together (for keccak: 3 bits/bit)
More limited use of custom gates
ADD/SUB/MUL(with constant)
PACK/UNPACK/SPLIT/MERGE
Have to use lookups for most calculations
Sometimes “free” when range check already needed
Great because:
Efficient if lookups are efficient
Generally uses fewer cells
Very flexible circuit layout wise
But:
Lookups currently not very efficient (Caulk?)
Circuits generally more complex for these types of calculations
Circuit layout
Extremely important for performance
Bit approach:
Most efficient generally to just do a round/row
This requires a lot of custom gates per row and many columns
Can split a round over multiple rows by disabling custom gates, but loses some efficiency
Packed approach:
Easy to spread out lookups over multiple rows, using relative offsets to read results on every nth row
Not custom gate heavy, only used as the glew, so no problem disabling on many rows without any significant loss of efficiency
Example: a + b + c
SHA-256
for i from0 to 63
S1 := (e rightrotate 6) xor (e rightrotate 11) xor (e rightrotate 25)
ch := (e and f) xor ((not e) and g)
temp1 := h + S1 + ch + k[i] + w[i]
S0 := (a rightrotate 2) xor (a rightrotate 13) xor (a rightrotate 22)
maj := (a and b) xor (a and c) xor (b and c)
temp2 := S0 + maj
h := g
g := f
f := e
e := d + temp1
d := c
c := b
b := a
a := temp1 + temp2
Additions are modulo 2^32
Use optimized versions of ch and maj to lower the degree:
select(s, t, f) := s*t + (1-s)*f
ch := select(e, f, g)
maj := select(b^c, a, b);
SHA-256
SHA-256 circuit layout
Some extra bits to handle additions modulo 2^32
w pretty similar to the other state
Only w[i] required on each row
# copy chunk into first 16 words w[0..15] of the message schedule array# Extend the first 16 words into the remaining 48 words w[16..63] of the array:for i from 16 to 63s0 := (w[i-15] rightrotate 7) xor (w[i-15] rightrotate 18) xor (w[i-15] rightshift 3)
s1 := (w[i-2] rightrotate 17) xor (w[i-2] rightrotate 19) xor (w[i-2] rightshift 10)
w[i] := w[i-16] + s0 + w[i-7] + s1
Results
Hardware: c5.18xlarge
Software: PSE halo2 + different FFT
But
CPU pretty old
Circuit size has an impact on performance
KECCAK-256 (bit) (2^16) (16 cores): 14.8 hashes/s
Number of lookups required decreases the higher the circuit
Not all optimizations enabled/implemented (not using asm field operations, not the best MSM, …)
Circuit expression degree higher than needed
Scaling pretty badly over multiple cores currently
Take into account recursion? May want to prefer higher expression degree to reduce columns
Taiko ZK-EVM: Overview and Optimizations KECCAK-256 and SHA-256 Brecht Devos brecht@taiko.xyz
{"metaMigratedAt":"2023-06-17T14:49:44.623Z","metaMigratedFrom":"YAML","title":"Taiko ZK-EVM: Overview and Optimizations","breaks":true,"description":"Taiko talk on overview and optimizations -- KECCAK-256 and SHA-256","contributors":"[{\"id\":\"2435332b-f6a2-41d6-a459-c558af9bc6e3\",\"add\":14548,\"del\":6410}]"}