# RLP Encoding Circuit The goal of the circuit is to verify that for a given RLP-encodable data, the encoded data is valid. ## RLP Encoding An item is defined as: - A string, i.e. byte array is an item - A list of items is an item ### Code ```python def rlp_encode(input): if isinstance(input,str): if len(input) == 1 and ord(input) < 0x80: return input else: return encode_length(len(input), 0x80) + input elif isinstance(input,list): output = '' for item in input: output += rlp_encode(item) return encode_length(len(output), 0xc0) + output def encode_length(L,offset): if L < 56: return chr(L + offset) elif L < 256**8: BL = to_binary(L) return chr(len(BL) + offset + 55) + BL else: raise Exception("input too long") def to_binary(x): if x == 0: return '' else: return to_binary(int(x / 256)) + chr(x % 256) ``` ### Possible Ranges The first bytes for the below cases end-up in the following ranges: - For an item - if `len(item) == 1` and `item < 0x80`: **[0x00, 0x7f]** - if `len(item) < 56`: **[0x80, 0xb7]** - if `len(item) > 55`: **[0xb8, 0xbf]** - For a list of items - if `len(items) < 56`: **[0xc0, 0xf7]** - if `len(items) > 55`: **[0xf8, 0xff]** Hence, given the first byte: 1. **val ∈ [0x00, 0x7f]** - This byte is the value itself - The following byte will be a new start of item/items 2. **val ∈ [0x80, 0xb7]** - The following `length` bytes represent an item with `length = val - 0x80` - The byte following `length` bytes will be a new start of item/items 3. **val ∈ [0xb8, 0xbf]** - The following `length_of_length` bytes, where `length_of_length = val - 0xb7` are the big-endian representation for the length of the actual item - We construct the length of the actual item `length` from the next `length_of_length` bytes - After `length_of_length` bytes, the following `length` bytes represent the item - The byte following `length_of_length + length` bytes (from the first byte) will be a new start of item/items 4. **val ∈ [0xc0, 0xf7]** - The following `length` bytes, where `length = val - 0xc0` are the RLP-encoding for all the items contained in this list of items - We recursively follow the same points from 1, 2, 3 and 4 - **val ∈ [0xf8, 0xff]** - The following `length_of_length` bytes, where `length_of_length = val - 0xf7` are the big-endian representation for the length of the RLP-encoding for all the items contained in this list of items - We construct the length from the next `length_of_length` bytes - After `length_of_length` bytes, the following `length` bytes are those items - We recursively follow the same points from 1, 2, 3 and 4 ### Circuit Layout #### Columns Every byte in the RLP-encoded output takes up a single row in the circuit layout. We use the following columns for every row: 1. `q_enable`: Selector that denotes whether this row is enabled in the layout. 2. `q_first`: Fixed column that denotes whether this is the first row in the layout. 3. `q_last`: Selector that denotes whether this is the last row in the layout. 4. `is_final`: Denotes whether the row is the last byte in the RLP-encoded data. 5. `index`: Denotes the index of this row, starting from `0` and ending at `n-1` where `n` is the byte length of the RLP-encoded data. 6. `tag`: Denotes the [row tag](https://hackmd.io/WSdo-JmdTgyiwk5B945LcQ#Tags). 7. `value`: Denotes the byte value in the RLP-encoded data. 8. `length_rindex`: - Denotes the number of rows, including the current row, that represent the big-endian form of the `length`. - Since the `length` can only take up to 8 bytes (`len < 256**8`), we have `1 <= length_rindex <= 8`. - `length_rindex` decrements until `length_rindex == 1` for the last row representing `Tag::Length`. 9. `length_acc`: Denotes an accumulator for the length of data, in the case where `len > 55` and the length is represented in its big-endian form. 10. `block_length`: Denotes the byte length of the current `Length` or `LengthOfLength` block. 11. `depth`: Denotes the depth of nested structure from the root. Starts from `0`. 12. `parent_rindex`: Denotes the appropriate `data_rindex` from the immediate parent block. We use the `parent_rindex` to calculate the correct `data_rindex` once we step out of a nested block. 13. `data_rindex`: Denotes the number of rows, including the current row, that are left in the current data block. 14. `value_rlc`: - Denotes an accumulator for the value field over all rows. - `value_rlc = value[0] + r * value[1] + r^2 * value[2] + ... + r^(n-1) * value[n-1]`. 15. `hash`: Denotes the keccak-256 hash of the RLP-encoded data. 16. `keccak_tuple`: - Denotes a tuple `(keccak_tuple[0], keccak_tuple[1], keccak_tuple[2])` where - `keccak_tuple[0] -> value_rlc` - `keccak_tuple[1] -> n` - `keccak_tuple[2] -> keccak256(rlp_encode(input))` 17. `padding`: Denotes whether the row appears after `is_final`, hence the purpose is padding. #### Tags We make use of the following tags: 1. `Tag::Data`: Represents the data bytes of an item. 2. `Tag::Length`: Represents the length of an item(s), when `len < 56`. 3. `Tag::LengthOfLength`: Represents the length in bytes of the length of the item(s) in binary form, when `len > 55`. #### Constraints We have the following constraints for transitioning between the above tag states: - `Tag::LengthOfLength`: - Current row's `value`: `value ∈ [0xb8, 0xbf] or value ∈ [0xf8, 0xff]`. - Current row's `length_acc`: `length_acc == 0`. - Current row's `data_rindex` and `block_length`: `data_rindex == block_length`. - Current row's `parent_rindex`: - *(First row, i.e. start of layout)* If `q_first` then `parent_rindex == 0`. - *(Stay at the same depth)* If `data_rindex_prev == 1` then `parent_rindex == parent_rindex_prev - block_length_prev`. - *(Go down one level deeper)* Otherwise `parent_rindex == data_rindex_prev`. - Current row's `depth`: - *(First row, i.e. start of layout)* If `q_first` then `depth == 0`. - *(Stay at the same depth)* If `data_rindex_prev == 1` then `depth == depth_prev`. - *(Go down one level deeper)* Otherwise `depth == depth_prev + 1`. - Next row's `tag`: `tag_next == Tag::Length`. - Next row's `length_rindex`: `length_rindex_next == lvalue`, where - `lvalue == value - 0xb7` if `value ∈ [0xb8, 0xbf]`. - `lvalue == value - 0xf7` if `value ∈ [0xf8, 0xff]`. - `Tag::Length`: - Current row's `length_acc`: `length_acc == (length_acc_prev * 256) + lvalue`, where - `lvalue = value - 0x80` for `value ∈ [0x80, 0xb7]`. - `lvalue = value - 0xc0` for `value ∈ [0xc0, 0xf7]`. - `lvalue = value` otherwise. - Current row's `data_rindex` and `block_length`: - *(First row, i.e. start of layout)* If `q_first` then `data_rindex == block_length`. - *(Stay at the same depth)* If `depth == depth_prev`: - If `data_rindex_prev == 1` then `data_rindex == block_length`. - Otherwise `data_rindex == data_rindex_prev - 1` and `block_length == block_length_prev`. - *(Go down one level deeper)* Otherwise `data_rindex == block_length`. - Current row's `parent_rindex`: - *(First row, i.e. start of layout)* If `q_first` then `parent_rindex == 0`. - *(Stay at the same depth)* If `data_rindex_prev == 1` then `parent_rindex == parent_rindex_prev - block_length_prev`. - *(Go down one level deeper)* Otherwise `parent_rindex == data_rindex_prev`. - Current row's `depth`: - *(First row, i.e. start of layout)* If `q_first` then `depth == 0`. - *(Stay at the same depth)* If `data_rindex_prev == 1` then `depth == depth_prev`. - *(Go down one level deeper)* Otherwise `depth == depth_prev + 1`. - Next row's `tag`: - *(First row, i.e. start of layout)* If `q_first`: - `tag_next == Tag::Data`. - `length_rindex == 1`. - *(Stay at the same depth)* If `depth == depth_prev`: - If `length_rindex > 1`: - `tag_next == Tag::Length`. - `length_rindex == length_rindex_next + 1`. - If `length_rindex == 1` and `depth_next == depth`: - `tag_next == Tag::Data`. - `length_acc == data_rindex_next`. - If `length_rindex == 1` and `depth_next == depth + 1`: - If `(0x80 <= value_next <= 0xb7) or (0xc0 <= value_next <= 0xf7)` then `tag_next == Tag::Length`. - If `(0xb8 <= value_next <= 0xbf) or (0xf8 <= value_next <= 0xff)` then `tag_next == Tag::LengthOfLength`. - *(Go down one level deeper)* If `depth == depth_prev + 1`: - `tag_next == Tag::Data`. - `length_rindex == 1`. - `Tag::Data`: - Current row's `length_rindex`: `length_rindex == 0`. - Current row's `length_acc`: `length_acc == 0`. - Current row's `block_length`: - *(Stay at the same depth)* If `depth == depth_prev` then `block_length == block_length_prev`. - *(Go up one level above)* If `depth == depth_prev - 1` then *TODO* - If `data_rindex > 1`: *TODO* - If `data_rindex == 1`: *TODO* Additional constraints: - For every row: - `is_final` is boolean, i.e. `is_final ∈ [0, 1]`. - `padding` is boolean, i.e. `padding ∈ [0, 1]`. - For the first row in the circuit, i.e. `q_first == 1`: - `value_rlc == value`. - `index == 0`. - For the subsequent rows: - `index == index_prev + 1`. - `hash == hash_prev`. - `value_rlc == (value_rlc_prev * r) + value`. - For the last row of the RLP-encoded data, i.e. `(is_final == 1) and (padding == 0)`: - `(value_rlc, index + 1, hash) == (keccak_tuple[0], keccak_tuple[1], keccak_tuple[2])`. - For all rows except the first row, i.e. `q_first != 1`: - Padding can go 0 -> 1 only once, i.e. `padding - padding_prev ∈ [0, 1]`. - For the last row of the layout, i.e. `q_last == 1`: - The row is either the end of the RLP-encoded data or its a padding row, i.e. `(is_final == 1) or (padding == 1)`. #### Examples ###### Example #1 - input: `"dog"` - output: `[ 0x83, ‘d’, ‘o’, ‘g’ ]` | tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex | |--------------|-----------|---------------|------------|--------------|-------|---------------|-------------| | Length | 131 | 1 | 3 | 4 | 0 | 0 | 0 | | Data | 100 | 0 | 0 | 4 | 0 | 0 | 3 | | Data | 111 | 0 | 0 | 4 | 0 | 0 | 2 | | Data | 103 | 0 | 0 | 4 | 0 | 0 | 1 | ###### Example #2 - input: `[ “cat”, “dog” ]` - output: `[ 0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g' ]` | tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex | |----------------|-----------|---------------|------------|--------------|-------|---------------|-------------| | LengthOfLength | 200 | 0 | 0 | 9 | 0 | 0 | 9 | | Length | 131 | 1 | 3 | 4 | 1 | 9 | 4 | | Data | 99 | 0 | 0 | 4 | 1 | 9 | 3 | | Data | 97 | 0 | 0 | 4 | 1 | 9 | 2 | | Data | 116 | 0 | 0 | 4 | 1 | 9 | 1 | | Length | 131 | 1 | 3 | 4 | 1 | 5 | 4 | | Data | 100 | 0 | 0 | 4 | 1 | 5 | 3 | | Data | 111 | 0 | 0 | 4 | 1 | 5 | 2 | | Data | 103 | 0 | 0 | 4 | 1 | 5 | 1 | ###### Example #3 - input: `[ "abcdabcd....abcd", "wxyzwxyz...wxyz" ]` for each string, `len(string) == 1024` - output: `[ 0xf9, 0x08, 0x06, 185, 0x04, 0x00, 'a', 'b', 'c', 'd', ..., 185, 0x04, 0x00, 'w', 'x', 'y', 'z', ... ]` | tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex | |----------------|-----------|---------------|------------|--------------|-------|---------------|-------------| | LengthOfLength | 249 | 0 | 0 | 2057 | 0 | 0 | 2057 | | Length | 8 | 2 | 8 | 2057 | 0 | 0 | 2056 | | Length | 6 | 1 | 2054 | 2057 | 0 | 0 | 2055 | | LengthOfLength | 185 | 0 | 0 | 1027 | 1 | 2055 | 1027 | | Length | 4 | 2 | 4 | 1027 | 1 | 2055 | 1026 | | Length | 0 | 1 | 1024 | 1027 | 1 | 2055 | 1025 | | Data | 97 | 0 | 0 | 1027 | 1 | 2055 | 1024 | | Data | 98 | 0 | 0 | 1027 | 1 | 2055 | 1023 | | Data | 99 | 0 | 0 | 1027 | 1 | 2055 | 1022 | | Data | 100 | 0 | 0 | 1027 | 1 | 2055 | 1021 | | ... | ... | ... | ... | ... | ... | ... | ... | | Data | 100 | 0 | 0 | 1027 | 1 | 2055 | 1 | | LengthOfLength | 185 | 0 | 0 | 1027 | 1 | 1028 | 1027 | | Length | 4 | 2 | 4 | 1027 | 1 | 1028 | 1026 | | Length | 0 | 1 | 1024 | 1027 | 1 | 1028 | 1025 | | Data | 119 | 0 | 0 | 1027 | 1 | 1028 | 1024 | | Data | 120 | 0 | 0 | 1027 | 1 | 1028 | 1023 | | Data | 121 | 0 | 0 | 1027 | 1 | 1028 | 1022 | | Data | 122 | 0 | 0 | 1027 | 1 | 1028 | 1021 | | ... | ... | ... | ... | ... | ... | ... | ... | | Data | 122 | 0 | 0 | 1027 | 1 | 1028 | 1 | ###### Example #4 Legacy transaction with: - Nonce = 1 - Gas Price = 2 - Gas = 3 - To = [4; 20] - Value = 5 - Data = [6; 66] Output: `[248, 93, 1, 2, 3, 148, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 184, 66, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]` | tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex | |----------------|-----------|---------------|------------|--------------|-------|---------------|-------------| | LengthOfLength | 248 | 0 | 0 | 95 | 0 | 0 | 95 | | Length | 93 | 1 | 93 | 95 | 0 | 0 | 94 | | Data | 1 | 0 | 0 | 95 | 0 | 0 | 93 | | Data | 2 | 0 | 0 | 95 | 0 | 0 | 92 | | Data | 3 | 0 | 0 | 95 | 0 | 0 | 91 | | Length | 148 | 1 | 20 | 21 | 1 | 91 | 21 | | Data | 4 | 0 | 0 | 21 | 1 | 91 | 20 | | Data | 4 | 0 | 0 | 21 | 1 | 91 | 19 | | ... | ... | ... | ... | ... | ... | ... | ... | | Data | 4 | 0 | 0 | 21 | 1 | 91 | 2 | | Data | 4 | 0 | 0 | 21 | 1 | 91 | 1 | | Data | 5 | 0 | 0 | 95 | 0 | 0 | 69 | | LengthOfLength | 184 | 0 | 0 | 68 | 1 | 69 | 68 | | Length | 66 | 1 | 66 | 68 | 1 | 69 | 67 | | Data | 6 | 0 | 0 | 68 | 1 | 69 | 66 | | Data | 6 | 0 | 0 | 68 | 1 | 69 | 65 | | ... | ... | ... | ... | ... | ... | ... | ... | | Data | 6 | 0 | 0 | 68 | 1 | 69 | 2 | | Data | 6 | 0 | 0 | 68 | 1 | 69 | 1 | <!--- 4. `is_item_start` (NOT NEEDED?) - To denote whether this byte is the start of the encoding for an item. - Value is boolean, i.e. `[0, 1]`. - If enabled, the previous row must have either `is_item_end` or both `is_item_end` and `is_list_end` enabled. - If enabled, and `is_item_end` is enabled as well, the byte value must lie in the range `[0x00, 0x7f]`. 5. `is_item_end` (NOT NEEDED?) - To denote whether this byte is the end of the encoding for an item. - Value is boolean, i.e. `[0, 1]`. - If enabled, the next row must have either `is_item_start` or `is_list_start` enabled, or the current row is the last row. - If enabled, and `is_item_start` is enabled as well, the byte value must lie in the range `[0x00, 0x7f]`. 6. `is_list_start` (NOT NEEDED?) - To denote whether this byte is the start of the encoding for a list of items. - Value is boolean, i.e. `[0, 1]`. - If enabled, the current row must have `is_item_start`, `is_item_end` and `is_list_end` disabled. - If enabled, the previous row must have either `is_item_end` or both `is_item_end` and `is_list_end` enabled. 7. `is_list_end` (NOT NEEDED?) - To denote whether this byte is the end of the encoding for a list of items. - Value is boolean, i.e. `[0, 1]`. - If enabled, the current row also has `is_item_end` enabled. - If enabled, the current row must have `is_list_start` disabled. - If enabled, the next row must have either `is_item_start` or `is_list_start` enabled, or the current row is the last row. 8. `nested_level` (NOT NEEDED?) - To denote how deep the list nesting is at any position. - Values between any two consecutive bytes can either be unchanged, increment by `1` or decrement by `1`. - If `is_list_start` is enabled, and the current row is the first row, then `nested_level` is `1`. - If `is_list_start` is enabled, and the previous row has `is_list_end` disabled, then `nested_level` must increment by `1`. - If `is_list_end` is enabled, and the next row has `is_list_start` enabled, then `nested_level` remains unchanged. - If `is_list_end` is enabled, and the next row has `is_list_start` disabled, then `nested_level` must decrement by `1`. --->