# RLP Encoding Circuit
The goal of the circuit is to verify that for a given RLP-encodable data, the encoded data is valid.
## RLP Encoding
An item is defined as:
- A string, i.e. byte array is an item
- A list of items is an item
### Code
```python
def rlp_encode(input):
if isinstance(input,str):
if len(input) == 1 and ord(input) < 0x80: return input
else: return encode_length(len(input), 0x80) + input
elif isinstance(input,list):
output = ''
for item in input: output += rlp_encode(item)
return encode_length(len(output), 0xc0) + output
def encode_length(L,offset):
if L < 56:
return chr(L + offset)
elif L < 256**8:
BL = to_binary(L)
return chr(len(BL) + offset + 55) + BL
else:
raise Exception("input too long")
def to_binary(x):
if x == 0:
return ''
else:
return to_binary(int(x / 256)) + chr(x % 256)
```
### Possible Ranges
The first bytes for the below cases end-up in the following ranges:
- For an item
- if `len(item) == 1` and `item < 0x80`: **[0x00, 0x7f]**
- if `len(item) < 56`: **[0x80, 0xb7]**
- if `len(item) > 55`: **[0xb8, 0xbf]**
- For a list of items
- if `len(items) < 56`: **[0xc0, 0xf7]**
- if `len(items) > 55`: **[0xf8, 0xff]**
Hence, given the first byte:
1. **val ∈ [0x00, 0x7f]**
- This byte is the value itself
- The following byte will be a new start of item/items
2. **val ∈ [0x80, 0xb7]**
- The following `length` bytes represent an item with `length = val - 0x80`
- The byte following `length` bytes will be a new start of item/items
3. **val ∈ [0xb8, 0xbf]**
- The following `length_of_length` bytes, where `length_of_length = val - 0xb7` are the big-endian representation for the length of the actual item
- We construct the length of the actual item `length` from the next `length_of_length` bytes
- After `length_of_length` bytes, the following `length` bytes represent the item
- The byte following `length_of_length + length` bytes (from the first byte) will be a new start of item/items
4. **val ∈ [0xc0, 0xf7]**
- The following `length` bytes, where `length = val - 0xc0` are the RLP-encoding for all the items contained in this list of items
- We recursively follow the same points from 1, 2, 3 and 4
- **val ∈ [0xf8, 0xff]**
- The following `length_of_length` bytes, where `length_of_length = val - 0xf7` are the big-endian representation for the length of the RLP-encoding for all the items contained in this list of items
- We construct the length from the next `length_of_length` bytes
- After `length_of_length` bytes, the following `length` bytes are those items
- We recursively follow the same points from 1, 2, 3 and 4
### Circuit Layout
#### Columns
Every byte in the RLP-encoded output takes up a single row in the circuit layout.
We use the following columns for every row:
1. `q_enable`: Selector that denotes whether this row is enabled in the layout.
2. `q_first`: Fixed column that denotes whether this is the first row in the layout.
3. `q_last`: Selector that denotes whether this is the last row in the layout.
4. `is_final`: Denotes whether the row is the last byte in the RLP-encoded data.
5. `index`: Denotes the index of this row, starting from `0` and ending at `n-1` where `n` is the byte length of the RLP-encoded data.
6. `tag`: Denotes the [row tag](https://hackmd.io/WSdo-JmdTgyiwk5B945LcQ#Tags).
7. `value`: Denotes the byte value in the RLP-encoded data.
8. `length_rindex`:
- Denotes the number of rows, including the current row, that represent the big-endian form of the `length`.
- Since the `length` can only take up to 8 bytes (`len < 256**8`), we have `1 <= length_rindex <= 8`.
- `length_rindex` decrements until `length_rindex == 1` for the last row representing `Tag::Length`.
9. `length_acc`: Denotes an accumulator for the length of data, in the case where `len > 55` and the length is represented in its big-endian form.
10. `block_length`: Denotes the byte length of the current `Length` or `LengthOfLength` block.
11. `depth`: Denotes the depth of nested structure from the root. Starts from `0`.
12. `parent_rindex`: Denotes the appropriate `data_rindex` from the immediate parent block. We use the `parent_rindex` to calculate the correct `data_rindex` once we step out of a nested block.
13. `data_rindex`: Denotes the number of rows, including the current row, that are left in the current data block.
14. `value_rlc`:
- Denotes an accumulator for the value field over all rows.
- `value_rlc = value[0] + r * value[1] + r^2 * value[2] + ... + r^(n-1) * value[n-1]`.
15. `hash`: Denotes the keccak-256 hash of the RLP-encoded data.
16. `keccak_tuple`:
- Denotes a tuple `(keccak_tuple[0], keccak_tuple[1], keccak_tuple[2])` where
- `keccak_tuple[0] -> value_rlc`
- `keccak_tuple[1] -> n`
- `keccak_tuple[2] -> keccak256(rlp_encode(input))`
17. `padding`: Denotes whether the row appears after `is_final`, hence the purpose is padding.
#### Tags
We make use of the following tags:
1. `Tag::Data`: Represents the data bytes of an item.
2. `Tag::Length`: Represents the length of an item(s), when `len < 56`.
3. `Tag::LengthOfLength`: Represents the length in bytes of the length of the item(s) in binary form, when `len > 55`.
#### Constraints
We have the following constraints for transitioning between the above tag states:
- `Tag::LengthOfLength`:
- Current row's `value`: `value ∈ [0xb8, 0xbf] or value ∈ [0xf8, 0xff]`.
- Current row's `length_acc`: `length_acc == 0`.
- Current row's `data_rindex` and `block_length`: `data_rindex == block_length`.
- Current row's `parent_rindex`:
- *(First row, i.e. start of layout)* If `q_first` then `parent_rindex == 0`.
- *(Stay at the same depth)* If `data_rindex_prev == 1` then `parent_rindex == parent_rindex_prev - block_length_prev`.
- *(Go down one level deeper)* Otherwise `parent_rindex == data_rindex_prev`.
- Current row's `depth`:
- *(First row, i.e. start of layout)* If `q_first` then `depth == 0`.
- *(Stay at the same depth)* If `data_rindex_prev == 1` then `depth == depth_prev`.
- *(Go down one level deeper)* Otherwise `depth == depth_prev + 1`.
- Next row's `tag`: `tag_next == Tag::Length`.
- Next row's `length_rindex`: `length_rindex_next == lvalue`, where
- `lvalue == value - 0xb7` if `value ∈ [0xb8, 0xbf]`.
- `lvalue == value - 0xf7` if `value ∈ [0xf8, 0xff]`.
- `Tag::Length`:
- Current row's `length_acc`: `length_acc == (length_acc_prev * 256) + lvalue`, where
- `lvalue = value - 0x80` for `value ∈ [0x80, 0xb7]`.
- `lvalue = value - 0xc0` for `value ∈ [0xc0, 0xf7]`.
- `lvalue = value` otherwise.
- Current row's `data_rindex` and `block_length`:
- *(First row, i.e. start of layout)* If `q_first` then `data_rindex == block_length`.
- *(Stay at the same depth)* If `depth == depth_prev`:
- If `data_rindex_prev == 1` then `data_rindex == block_length`.
- Otherwise `data_rindex == data_rindex_prev - 1` and `block_length == block_length_prev`.
- *(Go down one level deeper)* Otherwise `data_rindex == block_length`.
- Current row's `parent_rindex`:
- *(First row, i.e. start of layout)* If `q_first` then `parent_rindex == 0`.
- *(Stay at the same depth)* If `data_rindex_prev == 1` then `parent_rindex == parent_rindex_prev - block_length_prev`.
- *(Go down one level deeper)* Otherwise `parent_rindex == data_rindex_prev`.
- Current row's `depth`:
- *(First row, i.e. start of layout)* If `q_first` then `depth == 0`.
- *(Stay at the same depth)* If `data_rindex_prev == 1` then `depth == depth_prev`.
- *(Go down one level deeper)* Otherwise `depth == depth_prev + 1`.
- Next row's `tag`:
- *(First row, i.e. start of layout)* If `q_first`:
- `tag_next == Tag::Data`.
- `length_rindex == 1`.
- *(Stay at the same depth)* If `depth == depth_prev`:
- If `length_rindex > 1`:
- `tag_next == Tag::Length`.
- `length_rindex == length_rindex_next + 1`.
- If `length_rindex == 1` and `depth_next == depth`:
- `tag_next == Tag::Data`.
- `length_acc == data_rindex_next`.
- If `length_rindex == 1` and `depth_next == depth + 1`:
- If `(0x80 <= value_next <= 0xb7) or (0xc0 <= value_next <= 0xf7)` then `tag_next == Tag::Length`.
- If `(0xb8 <= value_next <= 0xbf) or (0xf8 <= value_next <= 0xff)` then `tag_next == Tag::LengthOfLength`.
- *(Go down one level deeper)* If `depth == depth_prev + 1`:
- `tag_next == Tag::Data`.
- `length_rindex == 1`.
- `Tag::Data`:
- Current row's `length_rindex`: `length_rindex == 0`.
- Current row's `length_acc`: `length_acc == 0`.
- Current row's `block_length`:
- *(Stay at the same depth)* If `depth == depth_prev` then `block_length == block_length_prev`.
- *(Go up one level above)* If `depth == depth_prev - 1` then *TODO*
- If `data_rindex > 1`: *TODO*
- If `data_rindex == 1`: *TODO*
Additional constraints:
- For every row:
- `is_final` is boolean, i.e. `is_final ∈ [0, 1]`.
- `padding` is boolean, i.e. `padding ∈ [0, 1]`.
- For the first row in the circuit, i.e. `q_first == 1`:
- `value_rlc == value`.
- `index == 0`.
- For the subsequent rows:
- `index == index_prev + 1`.
- `hash == hash_prev`.
- `value_rlc == (value_rlc_prev * r) + value`.
- For the last row of the RLP-encoded data, i.e. `(is_final == 1) and (padding == 0)`:
- `(value_rlc, index + 1, hash) == (keccak_tuple[0], keccak_tuple[1], keccak_tuple[2])`.
- For all rows except the first row, i.e. `q_first != 1`:
- Padding can go 0 -> 1 only once, i.e. `padding - padding_prev ∈ [0, 1]`.
- For the last row of the layout, i.e. `q_last == 1`:
- The row is either the end of the RLP-encoded data or its a padding row, i.e. `(is_final == 1) or (padding == 1)`.
#### Examples
###### Example #1
- input: `"dog"`
- output: `[ 0x83, ‘d’, ‘o’, ‘g’ ]`
| tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex |
|--------------|-----------|---------------|------------|--------------|-------|---------------|-------------|
| Length | 131 | 1 | 3 | 4 | 0 | 0 | 0 |
| Data | 100 | 0 | 0 | 4 | 0 | 0 | 3 |
| Data | 111 | 0 | 0 | 4 | 0 | 0 | 2 |
| Data | 103 | 0 | 0 | 4 | 0 | 0 | 1 |
###### Example #2
- input: `[ “cat”, “dog” ]`
- output: `[ 0xc8, 0x83, 'c', 'a', 't', 0x83, 'd', 'o', 'g' ]`
| tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex |
|----------------|-----------|---------------|------------|--------------|-------|---------------|-------------|
| LengthOfLength | 200 | 0 | 0 | 9 | 0 | 0 | 9 |
| Length | 131 | 1 | 3 | 4 | 1 | 9 | 4 |
| Data | 99 | 0 | 0 | 4 | 1 | 9 | 3 |
| Data | 97 | 0 | 0 | 4 | 1 | 9 | 2 |
| Data | 116 | 0 | 0 | 4 | 1 | 9 | 1 |
| Length | 131 | 1 | 3 | 4 | 1 | 5 | 4 |
| Data | 100 | 0 | 0 | 4 | 1 | 5 | 3 |
| Data | 111 | 0 | 0 | 4 | 1 | 5 | 2 |
| Data | 103 | 0 | 0 | 4 | 1 | 5 | 1 |
###### Example #3
- input: `[ "abcdabcd....abcd", "wxyzwxyz...wxyz" ]` for each string, `len(string) == 1024`
- output: `[ 0xf9, 0x08, 0x06, 185, 0x04, 0x00, 'a', 'b', 'c', 'd', ..., 185, 0x04, 0x00, 'w', 'x', 'y', 'z', ... ]`
| tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex |
|----------------|-----------|---------------|------------|--------------|-------|---------------|-------------|
| LengthOfLength | 249 | 0 | 0 | 2057 | 0 | 0 | 2057 |
| Length | 8 | 2 | 8 | 2057 | 0 | 0 | 2056 |
| Length | 6 | 1 | 2054 | 2057 | 0 | 0 | 2055 |
| LengthOfLength | 185 | 0 | 0 | 1027 | 1 | 2055 | 1027 |
| Length | 4 | 2 | 4 | 1027 | 1 | 2055 | 1026 |
| Length | 0 | 1 | 1024 | 1027 | 1 | 2055 | 1025 |
| Data | 97 | 0 | 0 | 1027 | 1 | 2055 | 1024 |
| Data | 98 | 0 | 0 | 1027 | 1 | 2055 | 1023 |
| Data | 99 | 0 | 0 | 1027 | 1 | 2055 | 1022 |
| Data | 100 | 0 | 0 | 1027 | 1 | 2055 | 1021 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Data | 100 | 0 | 0 | 1027 | 1 | 2055 | 1 |
| LengthOfLength | 185 | 0 | 0 | 1027 | 1 | 1028 | 1027 |
| Length | 4 | 2 | 4 | 1027 | 1 | 1028 | 1026 |
| Length | 0 | 1 | 1024 | 1027 | 1 | 1028 | 1025 |
| Data | 119 | 0 | 0 | 1027 | 1 | 1028 | 1024 |
| Data | 120 | 0 | 0 | 1027 | 1 | 1028 | 1023 |
| Data | 121 | 0 | 0 | 1027 | 1 | 1028 | 1022 |
| Data | 122 | 0 | 0 | 1027 | 1 | 1028 | 1021 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Data | 122 | 0 | 0 | 1027 | 1 | 1028 | 1 |
###### Example #4
Legacy transaction with:
- Nonce = 1
- Gas Price = 2
- Gas = 3
- To = [4; 20]
- Value = 5
- Data = [6; 66]
Output: `[248, 93, 1, 2, 3, 148, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 184, 66, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6]`
| tag | value | length_rindex | length_acc | block_length | depth | parent_rindex | data_rindex |
|----------------|-----------|---------------|------------|--------------|-------|---------------|-------------|
| LengthOfLength | 248 | 0 | 0 | 95 | 0 | 0 | 95 |
| Length | 93 | 1 | 93 | 95 | 0 | 0 | 94 |
| Data | 1 | 0 | 0 | 95 | 0 | 0 | 93 |
| Data | 2 | 0 | 0 | 95 | 0 | 0 | 92 |
| Data | 3 | 0 | 0 | 95 | 0 | 0 | 91 |
| Length | 148 | 1 | 20 | 21 | 1 | 91 | 21 |
| Data | 4 | 0 | 0 | 21 | 1 | 91 | 20 |
| Data | 4 | 0 | 0 | 21 | 1 | 91 | 19 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Data | 4 | 0 | 0 | 21 | 1 | 91 | 2 |
| Data | 4 | 0 | 0 | 21 | 1 | 91 | 1 |
| Data | 5 | 0 | 0 | 95 | 0 | 0 | 69 |
| LengthOfLength | 184 | 0 | 0 | 68 | 1 | 69 | 68 |
| Length | 66 | 1 | 66 | 68 | 1 | 69 | 67 |
| Data | 6 | 0 | 0 | 68 | 1 | 69 | 66 |
| Data | 6 | 0 | 0 | 68 | 1 | 69 | 65 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| Data | 6 | 0 | 0 | 68 | 1 | 69 | 2 |
| Data | 6 | 0 | 0 | 68 | 1 | 69 | 1 |
<!---
4. `is_item_start` (NOT NEEDED?)
- To denote whether this byte is the start of the encoding for an item.
- Value is boolean, i.e. `[0, 1]`.
- If enabled, the previous row must have either `is_item_end` or both `is_item_end` and `is_list_end` enabled.
- If enabled, and `is_item_end` is enabled as well, the byte value must lie in the range `[0x00, 0x7f]`.
5. `is_item_end` (NOT NEEDED?)
- To denote whether this byte is the end of the encoding for an item.
- Value is boolean, i.e. `[0, 1]`.
- If enabled, the next row must have either `is_item_start` or `is_list_start` enabled, or the current row is the last row.
- If enabled, and `is_item_start` is enabled as well, the byte value must lie in the range `[0x00, 0x7f]`.
6. `is_list_start` (NOT NEEDED?)
- To denote whether this byte is the start of the encoding for a list of items.
- Value is boolean, i.e. `[0, 1]`.
- If enabled, the current row must have `is_item_start`, `is_item_end` and `is_list_end` disabled.
- If enabled, the previous row must have either `is_item_end` or both `is_item_end` and `is_list_end` enabled.
7. `is_list_end` (NOT NEEDED?)
- To denote whether this byte is the end of the encoding for a list of items.
- Value is boolean, i.e. `[0, 1]`.
- If enabled, the current row also has `is_item_end` enabled.
- If enabled, the current row must have `is_list_start` disabled.
- If enabled, the next row must have either `is_item_start` or `is_list_start` enabled, or the current row is the last row.
8. `nested_level` (NOT NEEDED?)
- To denote how deep the list nesting is at any position.
- Values between any two consecutive bytes can either be unchanged, increment by `1` or decrement by `1`.
- If `is_list_start` is enabled, and the current row is the first row, then `nested_level` is `1`.
- If `is_list_start` is enabled, and the previous row has `is_list_end` disabled, then `nested_level` must increment by `1`.
- If `is_list_end` is enabled, and the next row has `is_list_start` enabled, then `nested_level` remains unchanged.
- If `is_list_end` is enabled, and the next row has `is_list_start` disabled, then `nested_level` must decrement by `1`.
--->