Update: We should just follow traditional Fiat-Shamir, to always commit and generate challenge. Assuming EVM state transition is deterministic is not working for malicious prover.
In circuit, in order to reduce constraints, we try to use random linear combination as a cheap hash function on range-checked bytes for:
For the first point, it allows us to store an EVM word in a single witness, without worrying about the fact that it doesn't fit in the field. Most time we move these random linear combination word directly from here to there, only when we need to perform arithmetic or bitwise operation we will decompress/decomposite/decode the word into bytes (with range check on each byte) and then do the actual operation.
We can also store an EVM word in 2 witnesses representing hi-part and lo-part, but it makes us need to move 2 witnesses around for each word. It's not very clear how the random linear combination saves the overall constraints, we might need to explore more opcodes and know the actual benefit.
han
For the second point, it allows us to easily do RLP encoding for transaction or merkle (hexary) patricia trie node in fixed amount of witnesses, without worrying about the fact that RLP encoded bytes could have arbitrary and unlimited length (for MPT node it has a max length, but for tx it doesn't). Each accumulated witness will be further decompress/decomposite/decode into serveral bytes and fed to keccak256
as input.
It would be really nice if we can further ask
keccak256
to accept a accumulated witness and the amount of bytes it contains as inputs.
han
The randomness for random linear combination is important, otherwise the malicious prover could try to find a collision to mint its ether from the thin air.
Here are 2 approaches trying to derive a reasonable randomness to mitigate the risk.
Assuming we could separate all random linear combined witnesses to different polynomials in our constraint system, we can first commit polynomials except those for random linear combined witnesses, and derive the randomness from commitments as public input, and continue the proving process.
The public inputs of circuit at least contains:
Regardless of the fact that the new state trie root could be a bad one (it should be a bad one, otherwise the attack affect nothing), since the state trie root implies all the bytes it contains, with transaction raw data, if we derive the randomness from all of them, the malicious prover needs to decide what's the new bad state trie root, then find the collisions with input and output. It somehow limits the possible collision pairs because the input and output are also fixed.
Suppose a deterministic virtual machine consists of 2 opcodes PUSH32
and ADD
, and the VM runs as a pure function run
as described:
def randomness_approach_2(bytecode: list, claimed_output: list) -> int:
return int.from_bytes(keccak256(bytecode + claimed_output), 'little') % FP
def run(bytecode: list, claimed_output: list):
"""
run takes bytecode to execute and treat the top of stack as output in the end.
"""
# Derive randomness
r = randomness_approach_2(bytecode, claimed_output)
# Despite an EVM word is 256-bit which is larger then field size, we store it
# as random linear combination in stack. Top value is in the end of the list.
stack = []
program_counter = 0
while program_counter < len(bytecode):
opcode = bytecode[program_counter]
# PUSH32
if opcode == 0x00:
# Read next 32 bytes as an EVM word from bytecode
bytes = bytecode[program_counter+1:program_counter+33]
# Push random linear combination of the EVM word to stack
stack.append(random_linear_combine(bytes, r))
program_counter += 33
# ADD
elif opcode == 0x01:
# Pop 2 random linear combination EVM word from stack
a, b = stack.pop(), stack.pop()
# Decompress them into little-endian bytes
bytes_a, bytes_b = rlc_to_bytes[a], rlc_to_bytes[b]
# Add them together
bytes_c = add_bytes(bytes_a, bytes_b)
# Push result as random linear combination to stack
stack.append(random_linear_combine(bytes_c, r))
program_counter += 1
else:
raise ValueError("invalid opcode")
assert rlc_to_bytes[stack.pop()] == claimed_output, "unsatisfied"
from Crypto.Hash import keccak
from Crypto.Random.random import randrange
def keccak256(data: list) -> list:
return list(keccak.new(digest_bits=256).update(bytes(data)).digest())
# BN254 scalar field size.
FP = 21888242871839275222246405745257275088548364400416034343698204186575808495617
def fp_add(a: int, b: int) -> int: return (a + b) % FP
def fp_mul(a: int, b: int) -> int: return (a * b) % FP
# rlc_to_bytes records the original bytes of a random linear combination.
# In circuit we ask prover the provide bytes and verify all bytes are in range
# and the random linear combination matches.
rlc_to_bytes = dict()
def random_linear_combine(bytes: list, r: int) -> int:
"""
random_linear_combine returns bytes[0] + r*bytes[1] + ... + (r**31)*bytes[31].
"""
rlc = 0
for byte in reversed(bytes):
assert 0 <= byte < 256
rlc = fp_add(fp_mul(rlc, r), byte)
rlc_to_bytes[rlc] = bytes
return rlc
def add_bytes(lhs: list, rhs: list) -> list:
"""
add_bytes adds 2 little-endian bytes value modulus 2**256 and returns result
as bytes also in little-endian.
"""
result = (
int.from_bytes(lhs, 'little') +
int.from_bytes(rhs, 'little')
) % 2**256
return list(result.to_bytes(32, 'little'))
def randomness_approach_2(bytecode: list, claimed_output: list) -> int:
return int.from_bytes(keccak256(bytecode + claimed_output), 'little') % FP
def run(bytecode: list, claimed_output: list):
"""
run takes bytecode to execute and treat the top of stack as output in the end.
"""
# Derive randomness
r = randomness_approach_2(bytecode, claimed_output)
# Despite an EVM word is 256-bit which is larger then field size, we store it
# as random linear combination in stack. Top value is in the end of the list.
stack = []
program_counter = 0
while program_counter < len(bytecode):
opcode = bytecode[program_counter]
# PUSH32
if opcode == 0x00:
# Read next 32 bytes as an EVM word from bytecode
bytes = bytecode[program_counter+1:program_counter+33]
# Push random linear combination of the EVM word to stack
stack.append(random_linear_combine(bytes, r))
program_counter += 33
# ADD
elif opcode == 0x01:
# Pop 2 random linear combination EVM word from stack
a, b = stack.pop(), stack.pop()
# Decompress them into little-endian bytes
bytes_a, bytes_b = rlc_to_bytes[a], rlc_to_bytes[b]
# Add them together
bytes_c = add_bytes(bytes_a, bytes_b)
# Push result as random linear combination to stack
stack.append(random_linear_combine(bytes_c, r))
program_counter += 1
else:
raise ValueError("invalid opcode")
assert rlc_to_bytes[stack.pop()] == claimed_output, "unsatisfied"
def test_run():
a, b = randrange(0, 2**256), randrange(0, 2**256)
c = (a + b) % 2**256
run(
bytecode=[
0x00, *a.to_bytes(32, 'little'), # PUSH32 a
0x00, *b.to_bytes(32, 'little'), # PUSH32 b
0x01 # ADD
],
claimed_output=list(c.to_bytes(32, 'little')),
)
test_run()
All the random linear combination or decompression will be constraint in PLONK constraint system, where the randomness is fed as public input.
It derives the randomness from both input and output (feed them into keccak256), which is the approach 2. Although it uses raw value in bytes instead of hashed value, but assuming the keacck256 and the merkle (hexary) patricia trie in Ethereum is collision resistent, it should be no big differece between the two cases.
The issue at least reduces to: Whether a malicious prover can find collisions between stack push and pop, after it decides the input and output.