ZKEVM pi circuit design

# ZKEVM pi circuit design ## Addressing Problems Existing design rely on verifier to compute 2 hirarchy of random linear combination to prepare instances. The 1st hirarchy happened on convert block/transaction information which bytes up to U256/Word level. For example, take `block -> difficulty` as example. To fit into BN254 field, we need below convertion ```rust let difficulty = rlc(block_values.difficulty.to_le_bytes(), randomness); // Word to Little ending byte first, then rlc. region.assign_advice( || "difficulty", self.block_table.value, offset, || Value::known(difficulty), )?; ``` The 2nd rlc happend on union all the fields (block, transaction id, transaction index, transaction value) into a single BN254 field. The final value is so called `rpi_rlc`, which is ultimate a single BN254 field. In the end, verifier computed `rpi_rlc`, along with other fields (rand_rpi, chain_ID, state_root, pre_state_root) and commit as instance polymonial. One problem for above design is the gas consumption for L2 rollup, since the L2 verifier happened on chain. The rlc is very costly, which need verifier to use bunch of `AddMod,MulMod,DUP,SWAP` opcode to compute 2 hirarchy rlc. This consume a lot of gas. There is no EVM pre-compile to compute rlc in lower cost way. For L1 zkevm, although the verification is out-side EVM, we still can explore more efficient way to compute it. ## Solution Idea Overall idea is, - flattern & hash. verifier just need to hash all raw bytes into a single digest and put digest as final input. Keccak is a pre-compile therefore the digest computation is cheaper. - witness rlc computation and constraints them to raw bytes. Intuitively, we can layout all of the raw public into a single column vertically (here majorly based on Halo2 circuit perspective). There will be separate column to accumulate raw public input bytes and derive input_rlc for Keccak lookup. Tricky part, since right now single value is split into bytes and layout vertically, we turn to use permutation (copy) constraints to assure cell equality, instead of rely on rotation constaints. We still can know the exactly cell of each accumulated value positions due to bytes is known in advance, therefore permutation constraints fits this scenarios. In Halo2, permutation constraints is happened via `AssignedCell` + `require_equal`. For the digest part, it's a Keccak digest which is U256. To fit into BN254, here need to split into 2xU128, named with `Hi`, `Lo` respectively. There is also a witness column for bytes + 2^8 accumulated to constriant with `Hi`, `Lo`. And a witness column for bytes accumulated rlc to used for output_rlc. ### Design considerations - Keccak Lookup need to happend at same rotation for input_rlc/output_rlc. ### Columns Definition Layout #### Raw Public Input layout raw public input as witness to bytes vertically | raw_bytes | | -------- | | byte 1 | | byte 2 | | ... | With 1 extra rlc to aggregate all bytes | raw_bytes | rlc | | -------- | --------- | | byte 1 | byte1 = acc 0 | | byte 2 | acc 0 x rand + byte 2| | ... | .... | | byte n | acc n-1 x rand + byte n And another extra rlc to aggregate by value-based. | ... | value_start |rlc by value| | -------- |---| --------- | | ... | 1 | byte1 = acc 0 | | ... | 0 | acc 0 x rand + byte 2| | ... | 0 | .... | | ... | ... | ... | ... | 0 | acc i - 1 x rand + byte i | ... | 1 | byte i+1 = acc i+1 | ... | 0 | acc i+1 x rand + byte i+2 | ... | 0 | ... `Value` logically can be any number of bytes. For example, coinbase is 20 bytes (160 bits), so the `rlc by value` will aggregate & accumulated those 20 bytes. For block `difficulty` it's EVM word 32 bytes (256 bits), then `rlc by value` is on 32 bytes. There will be a `value_start` boolean column to mark the boundary. `rand` will be choosed depends on the Field size. If final value can be fit into Field, then rand will be choose as `2^8`. At the time of writings (10 April 2023), zkevm field is BN254. Therefore the value <= 254 bits can be rlc by `2^8`. #### Keccak Digest witness digest is 32 bytes and layout verticlely. | digest_bytes | | -------- | | byte 0 | | byte 2 | | ... | | byte 31 | And same logic applied, 2 rlc column will be derived based on digest_bytes. | digest_bytes |value_start| rlc | lc | | -------- | ---- | ---- | --- | | byte 0 | 1 | byte 0 = acc 0 | byte 0 = acc_hi 0 | | byte 1 | 0 | acc 0 x rand + byte 1 | acc_hi 0 x `2^8` + byte 1 | | ... | 0 | ... | ... | | byte 15 | 0 | ... | acc_hi 14 x `2^8` + byte 15 | | byte 16 | 1 | acc 15 x rand + byte 16 | byte 16 = acc_lo | | ... | ... | acc 16 x rand + byte 17 | acc_lo 1 x `2^8` + byte 17 | | byte 31 | 0 | ... | ... | derived `rlc` column will be used for keccak output lookup, while `lc` column will be used separate into 2 x 16 bytes aggregation. The first `[0, 16)` digest bytes is named `hi`, while `[16, 32)` digest is named `lo` #### instance | lc | ... | instance | | -------- | -------- | -------- | | ... | ... | Hi digest | | ... | ... | Lo digest | |... | ... | ... | |acc_hi 14 x `2^8` + byte 15 | ... | ... | |... | ... | ... | |... | ... | ... | |acc_lo 14 x `2^8` + byte 31 | ... | ... | Copy constraints to enforce - `hi digest = acc_hi 14 x 2^8 + byte 15` - `lo digest = acc_lo 14 x 2^8 + byte 31` #### Keccak Lookup Input bytes/output digest rlc lookup is happened on last row. Since output digest is only at 31th cell, there is extra copy constraint to assure 31th cell equals to the cell in last row. ### Thoughts & TODO - further eliminate validator effort by witness TxSignHash. - refactor TxTable assignments & constraints to tx_circuits to isolate responsibility - might need to use cell-manager to reduce row consumption