owned this note
owned this note
Published
Linked with GitHub
## copy circuit
### copy circuit constrant
- lookup tx table(change to word late), bytecode table, rw table.
- constraint items: id, address( + slot_addr), rw counter, mask(front_mask, back_mask), pad, value_rlc, word_index etc.
### copy cases
- memory --> memory
- bytecode --> memory
- memory --> bytecode
- tx/calldata --> memory
- memory --> rlc
- rlc --> memory
- memory --> log
### new design
#### memory --> memory
- assume copy event:
`src_start` = 10, `src_end` = 50, `copy_length` = 45, because copy destination exceeds acutal `src_end`, pad to zero at extra positions. it is said byte value is zero of address[51, 55].
`dest_start` = 30, `dest_end` = 75,
sencondly, after mask 32 times, copy event needs to collect bytes in address [0, 96],
for read steps, actual read address is in range [10, 50], and mask bytes are in range address[0, 9] & address[56, 96] as mask flag, which don't involve in calculating `value_acc`.
for write steps, actual write address is in range[30, 75], and mask bytes are in range address[10, 30] & address[75, 96] as mask flag. `value_acc` don't count these bytes as well.
| q_step | type | address | byte0 | byte[1..14] | byte15 | value_word | value_word_prev | slot_addr | src_addr_end | addr_copy_start | addr_copy_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Memory | 0 | byte0 | ... | byte15 | r_word0_lo | r_prev_word0_lo | 0 | 50 | 10 | 50 |
| 0 | Memory | 0 | byte0 | ... | byte15 | w_word0_lo | w_prev_word0_lo | 0 | 0 | 30 | 75 |
| 1 | Memory | 16 | byte16| ... | byte31| r_word0_hi | r_prev_word0_hi | 0 | 50| 10 |50 |
| 0 | Memory | 16 | byte16| ... | byte31| w_word0_hi | w_prev_word0_hi | 0 | 0 | 30 |75 |
| 1 | Memory | 32 | byte32| ... |byte47| r_word1_lo |r_prev_word1_lo |32 | 50 | 10 |50 |
| 0 | Memory |32 |byte32| ... |byte47| w_word1_lo |w_prev_word1_lo |32 | 0 | 30 |75 |
| 1 | Memory |48 |byte48| ... |byte63| r_word1_hi |r_prev_word1_hi |32 | 50 | 10 |50 |
| .. | Memory | ... | ... | ... | .. | ... | ... | ...| ... | ...|... | ... |
| 0 | Memory | 95 | byte64| ... | byte95 | w_word2_hi |w_prev_word2_hi | 64 | 0| 30 | 75|
Notes:
`value_word`: keep raw 16 bytes per row. each two r/w consist of whole word.
`value_word_prev`: no longer store raw bytes.
`addr_copy_start` & `addr_copy_end`: new columns for calculating mask information(alternative is keep 16 bit with 16 columns).
`address` : address begin from first byte, which can be masked.
#### bytecode --> memory
assume copy event:
`src_start` = 10, `src_end` = 50, `copy_length` = 45, because copy destination exceeds acutal `src_end`, pad to zero at extra positions. it is said byte value is zero of address[51, 55].
`dest_start` = 30, `dest_end` = 75.
| q_step | type | address | byte0 | byte[1..14] | byte15 | value_word | value_word_prev | slot_addr | src_addr_end | addr_copy_start | addr_copy_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | bytecode| 0 | mem_byte0 | ... | bytecode_byte15 | 0 | 0 | 0 | 50 | 10 | 50 |
| 0 |Memory | 0 | byte0 | ... | byte15 | w_word0_lo | w_prev_word0_lo | 0 | 0 | 30 | 75 |
| 1 |bytecode| 16 | byte16 | ... | byte15 | 0 | 0 | 0 | 50 | 10 | 50 |
| 0 |Memory | 16 | byte16| ... | byte31| r_word0_hi | w_prev_word0_hi | 0 | 50| 10 |50 |
| 1 |bytecode| 32 | byte32| ... | byte31| 0(w_word0_hi) | 0(r_prev_word0_hi) | 0 | 50 | 30 |75 |
| 0 | Memory| 32 | byte32 | ... | byte15 | w_word1_lo | w_prev_word1_lo | 0 | 0 | 10 | 50 |
| 1 |bytecode | 48 | byte48| ... |byte47| 0(r_word1_lo)|0(r_prev_word1_lo)|32 | 50 | 10 |50 |
| 0 |Memory|48 |byte48| ... |byte47| w_word1_lo |w_prev_word1_lo |32 | 0 | 30 |75 |
| 1 |bytecode | 64 | byte64 | ... | byte15 | 0 | 0 | 0 | 50 | 10 | 50 |
| 0 |Memory |64 |byte64| ... |byte63| w_word1_hi |w_prev_word1_hi |32 | 50 | 10 |50 |
| .. | ... | ... | ... | ... | .. | ... | ... |0 | 0 | 50| 0 | 50 |
| 0 | Memory | 95 | byte64| ... | byte95 | w_word2_hi |w_prev_word2_hi | 64 | 0| 30 | 75|
### constraint looks like
1. is_pad:
- loop id in [0, 15]
- is_pad = addr > src_end
- assert is_pad && byte[id] == 0
2. rw_counter
- each two read/write steps rwc += 1
- each two read/write steps rwc_left += 1
3. mask
- front mask: address < addr_copy_start
- loop i in 0..16 {
if address + i (mask) LTGadget
do rlc
}
- back mask: address > addr_copy_end
4. address
- each read/write step address += 16
- each two steps slot_addr += 32
5. addr_copy_start & addr_copy_end
- // if is_first_step: addr_copy_start = address
6. value_rlc
- rlc per row by mask
7. word_hi, word_lo
- r_word[i]_lo(hi) = byte0 + byte1*2^8 + ... + byte15 * 2^120
- w_word[i]_lo(hi) = byte0 + byte1*2^8 + ... + byte15 * 2^120
8. real_bytes_left
- real_bytes_left[1] = real_bytes_left[0] - num_non_masked_byte
9. lookups
- each two r/w steps lookup [with r_word[i]_lo(hi) or w_word[i]_lo(hi) ] + slot_addr
- for other non memory lookups, do 16 bytes lookup per 16 columns(same lookup count as before)
### costs analyze
- 16x (32 --> 2) rows reduction
- number of coulumns increases
- number of lookups remains
### todos:
1. other cases' difference, not list above
### Dev
Circuits involved
Copy circuit (with bus mapping)
1. don't impl the 16 lookups for bytecode/tx/... now. It is possible that we also add some hi lo columns to bytecode circuit. Will decide later.