## copy circuit ### copy circuit constrant - lookup tx table(change to word late), bytecode table, rw table. - constraint items: id, address( + slot_addr), rw counter, mask(front_mask, back_mask), pad, value_rlc, word_index etc. ### copy cases - memory --> memory - bytecode --> memory - memory --> bytecode - tx/calldata --> memory - memory --> rlc - rlc --> memory - memory --> log ### new design #### memory --> memory - assume copy event: `src_start` = 10, `src_end` = 50, `copy_length` = 45, because copy destination exceeds acutal `src_end`, pad to zero at extra positions. it is said byte value is zero of address[51, 55]. `dest_start` = 30, `dest_end` = 75, sencondly, after mask 32 times, copy event needs to collect bytes in address [0, 96], for read steps, actual read address is in range [10, 50], and mask bytes are in range address[0, 9] & address[56, 96] as mask flag, which don't involve in calculating `value_acc`. for write steps, actual write address is in range[30, 75], and mask bytes are in range address[10, 30] & address[75, 96] as mask flag. `value_acc` don't count these bytes as well. | q_step | type | address | byte0 | byte[1..14] | byte15 | value_word | value_word_prev | slot_addr | src_addr_end | addr_copy_start | addr_copy_end | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | 1 | Memory | 0 | byte0 | ... | byte15 | r_word0_lo | r_prev_word0_lo | 0 | 50 | 10 | 50 | | 0 | Memory | 0 | byte0 | ... | byte15 | w_word0_lo | w_prev_word0_lo | 0 | 0 | 30 | 75 | | 1 | Memory | 16 | byte16| ... | byte31| r_word0_hi | r_prev_word0_hi | 0 | 50| 10 |50 | | 0 | Memory | 16 | byte16| ... | byte31| w_word0_hi | w_prev_word0_hi | 0 | 0 | 30 |75 | | 1 | Memory | 32 | byte32| ... |byte47| r_word1_lo |r_prev_word1_lo |32 | 50 | 10 |50 | | 0 | Memory |32 |byte32| ... |byte47| w_word1_lo |w_prev_word1_lo |32 | 0 | 30 |75 | | 1 | Memory |48 |byte48| ... |byte63| r_word1_hi |r_prev_word1_hi |32 | 50 | 10 |50 | | .. | Memory | ... | ... | ... | .. | ... | ... | ...| ... | ...|... | ... | | 0 | Memory | 95 | byte64| ... | byte95 | w_word2_hi |w_prev_word2_hi | 64 | 0| 30 | 75| Notes: `value_word`: keep raw 16 bytes per row. each two r/w consist of whole word. `value_word_prev`: no longer store raw bytes. `addr_copy_start` & `addr_copy_end`: new columns for calculating mask information(alternative is keep 16 bit with 16 columns). `address` : address begin from first byte, which can be masked. #### bytecode --> memory assume copy event: `src_start` = 10, `src_end` = 50, `copy_length` = 45, because copy destination exceeds acutal `src_end`, pad to zero at extra positions. it is said byte value is zero of address[51, 55]. `dest_start` = 30, `dest_end` = 75. | q_step | type | address | byte0 | byte[1..14] | byte15 | value_word | value_word_prev | slot_addr | src_addr_end | addr_copy_start | addr_copy_end | | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | | 1 | bytecode| 0 | mem_byte0 | ... | bytecode_byte15 | 0 | 0 | 0 | 50 | 10 | 50 | | 0 |Memory | 0 | byte0 | ... | byte15 | w_word0_lo | w_prev_word0_lo | 0 | 0 | 30 | 75 | | 1 |bytecode| 16 | byte16 | ... | byte15 | 0 | 0 | 0 | 50 | 10 | 50 | | 0 |Memory | 16 | byte16| ... | byte31| r_word0_hi | w_prev_word0_hi | 0 | 50| 10 |50 | | 1 |bytecode| 32 | byte32| ... | byte31| 0(w_word0_hi) | 0(r_prev_word0_hi) | 0 | 50 | 30 |75 | | 0 | Memory| 32 | byte32 | ... | byte15 | w_word1_lo | w_prev_word1_lo | 0 | 0 | 10 | 50 | | 1 |bytecode | 48 | byte48| ... |byte47| 0(r_word1_lo)|0(r_prev_word1_lo)|32 | 50 | 10 |50 | | 0 |Memory|48 |byte48| ... |byte47| w_word1_lo |w_prev_word1_lo |32 | 0 | 30 |75 | | 1 |bytecode | 64 | byte64 | ... | byte15 | 0 | 0 | 0 | 50 | 10 | 50 | | 0 |Memory |64 |byte64| ... |byte63| w_word1_hi |w_prev_word1_hi |32 | 50 | 10 |50 | | .. | ... | ... | ... | ... | .. | ... | ... |0 | 0 | 50| 0 | 50 | | 0 | Memory | 95 | byte64| ... | byte95 | w_word2_hi |w_prev_word2_hi | 64 | 0| 30 | 75| ### constraint looks like 1. is_pad: - loop id in [0, 15] - is_pad = addr > src_end - assert is_pad && byte[id] == 0 2. rw_counter - each two read/write steps rwc += 1 - each two read/write steps rwc_left += 1 3. mask - front mask: address < addr_copy_start - loop i in 0..16 { if address + i (mask) LTGadget do rlc } - back mask: address > addr_copy_end 4. address - each read/write step address += 16 - each two steps slot_addr += 32 5. addr_copy_start & addr_copy_end - // if is_first_step: addr_copy_start = address 6. value_rlc - rlc per row by mask 7. word_hi, word_lo - r_word[i]_lo(hi) = byte0 + byte1*2^8 + ... + byte15 * 2^120 - w_word[i]_lo(hi) = byte0 + byte1*2^8 + ... + byte15 * 2^120 8. real_bytes_left - real_bytes_left[1] = real_bytes_left[0] - num_non_masked_byte 9. lookups - each two r/w steps lookup [with r_word[i]_lo(hi) or w_word[i]_lo(hi) ] + slot_addr - for other non memory lookups, do 16 bytes lookup per 16 columns(same lookup count as before) ### costs analyze - 16x (32 --> 2) rows reduction - number of coulumns increases - number of lookups remains ### todos: 1. other cases' difference, not list above ### Dev Circuits involved Copy circuit (with bus mapping) 1. don't impl the 16 lookups for bytecode/tx/... now. It is possible that we also add some hi lo columns to bytecode circuit. Will decide later.