# Plonk Intermediate Representation The goal of this document is to describe the semantics of a low-level intermediate language for Plonk circuits. Compiling a high-level program into an IR form gives us the flexibility to upgrade the back-end compiler over time, to support upgrades to the core cryptosystem (e.g. lookup tables, range proofs and who knows what else). It also increases portability, as multiple languages can target the same IR. ## IR semantics An IR program is a sequence of opcodes, that act on a Plonk circuit's 'program memory' - also referred to as 'wires' in the original Plonk paper. A Plonk program transcript can be viewed as a grid of memory cells, whose width is (probably) 4 and whose depth is equal to the number of circuit constraints. ## Data locations of a Plonk program It might be helpful to distinguish between typical circuit 'public inputs' and 'private inputs', and traditional program inputs (e.g. the data you feed into a C program's main()). When carving out explicit data regions the IR operates on, it might make sense to have a smart-contract style concept of 'calldata': information that's fed into a program, that can be used to derive the circuit witnesses (e.g. a Merkle root and a hash path) We can further split this into public and private calldata regions. Extending this logic, we can also have 'public outputs', the data that's returned by the program (which will form part of the circuit's traditional 'public inputs') ## Data regions * public calldata * private calldata * public outputs * program memory: reads/write locations must be known at compile time * RAM memory: Plookups enables O(1) RAM access, so IR programs can have memory where read/write locations are known at run time ## Types The IR could also be strongly typed. Either its strongly typed or we duplicate arithmetic opcodes for operands of different lengths Possible type list: * bool * uint8/16/32/64 * int8/16/32/64 * float? technically could be done, would naturally be expensive in terms of gates * field (native prime field element) ## Opcodes ### Opcode Nomenclature * variables in square brackets (e.g. [a]) represent locations in program memory * an ellipsis represents a variable-length list of variables (e.g. [...a] represents a list of locations in program memory) ### Opcode list | Opcode | Opcode Bytes | Types | Description | Notes | | -------- | -------- | -------- | -------- | -------- | | add | [a], [b], [o] | | o = a + b | | | sub | [a], [b], [o] | | o = a - b | | | mul | [a], [b], [o] | | o = a * b | | | div | [a], [b], [o] | | o = a / b | | | mod | [a], [b], [o] | | o = a % b | | | accumulate | x, [...a], ...q, [o] | | $o = \sum_i^xa_iq_i$ | | | setconst | x, [o] | | o = x | | | setpub | [p], [o] | | o = p | from public calldata | | | setpriv | [p], [o] | | o = p |from private calldata | | | dup | [a], [o] | | o = a | | | gt | [a], [b], [o] | | o = (a > b) | | | gte | [a], [b], [o] | | o = (a >= b) | | | lt | [a], [b], [o] | | o = (a < b) | | | lte | [a], [b], [o] | | o = (a <= b) | | | eq | [a], [b], [o] | | o = (a == b) | | | iszero | [a], [o] | | o = (a == 0) | | | and | [a], [b], [o] | | o = a & b | | | or | [a], [b], [o] | | o = a | b | | | xor | [a], [b], [o] | | o = a ^ b | | | not | [a], [o] | | o = !a | | | shl | [a], x, [o] | | o = a << x | x is const | | shr | [a], x, [o] | | o = a >> x | | | rotl | [a], x, [o] | | o = a <<< x | | | rotr | [a], x, [o] | | o = a >>> x | | | asserteq | [a], [o] | | throw if a != 0 | | | assertlt | [a], x | | throw if a >= 2^x | | | gadd | [x1], [y1], [x2], [y2], [x3], [y3] | | (x3, y3) = (x1, y1) $\bigoplus$ (x2, y2) | elliptic curve point addition | | | mload | [x], [o] | | read o from RAM location x | | | mstore | [x], [a] | | write a to RAM location x | |