A Guide to AppliedZKP zkEVM Circuit Code

We’ve studied several zkEVM projects in the previously published article Exploring Popular zkEVM Solutions, among which the representative of native-based zkEVM is what is being researched by the AppliedZKP team of the Ethereum Foundation. The text below will guide you through the circuit code of the AppliedZKP zkEVM.

The code is still in the initial implementation, and many crates have not yet stabilized. Thus, all discussions in this article are subject to change in the future. Please always refer to the latest code if you are to do code development and code study. This text aims to provide an easy guide to the basic framework and logic of the AppliedZKP zkEVM code base. Hopefully, more developers will join us in the further research of zkEVM.

Code Base	Path	Branch	Commit_ID	Date
zkEVM-circuits	https://github.com/appliedzkp/zkEVM-circuits	main	39a7bf7b30e25faa7db26a3cb94640240164542d	2021-10-18

Code Structure

In the current code, there are three crates, namely bus-mapping, keccak256, and zkevm-circuits. The relationship among the three is shown in the figure below:

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

Also in Exploring Popular zkEVM Solutions, we pointed out that Bus-mapping is the state channel of evmcircuit and statecircuit in zkEVM. The devs organize the code in the way shown in the figure above. Let’s look at each crate in turn.

1. bus-mapping

The objectives of bus-mapping are:

Analyzes EVM execution path
Constructs witness data following complete execution path analysis
Provides a simple interface to gather and put witness data in the circuit, and witness it with a small number of functions

So the basic function is to process witness data, regardless of whether it is the path data executed by the EVM or the state data that changes before and after.

Let's look at each crate in bus-mapping specifically.

1.1 EVM

Basically, EVM defines ProgramCounter, GlobalCounter, EVMWord, GasInfo, and GasCost include and contains data infrustractures like stack, memory, and opcodes. It can be seen that pc and gc are encapsulation to usize; EVMWord is the encapsulation of the u8 array with a length of 32.

StackAddress is usize, but Stack is a dynamic array of EVMWord. MemoryAddress is also usize, which should be between 0 and 1023 while Memory is an array of u8. Developers use Rust macro cleverly to extract key memory information, such as traits of index and range, and implement them in a unified way. Storage is a HashMap and its key values are all EVMWord.

The opcodes crate defines the opcode traits of stack, memory, and storage, as well as the implementation of parts of the opcode traits.












pub trait Opcode: Debug {
    /// Generate the associated [`MemoryOp`](crate::operation::MemoryOp)s,
    /// [`StackOp`](crate::operation::StackOp)s, and
    /// [`StorageOp`](crate::operation::StorageOp)s associated to the Opcode
    /// is implemented for.
    fn gen_associated_ops(
        &self,
        exec_step: &mut ExecutionStep,
        container: &mut OperationContainer,
        next_steps: &[ExecutionStep],
    ) -> Result<usize, Error>;
}

Let’s take mload as an example to see how the trait is implemented. Here is the implementation of go-ethereum.

// core/vm/instructions.go
func opMload(pc *uint64, interpreter *EVMInterpreter, scope *ScopeContext) ([]byte, error) {
    v := scope.Stack.peek()
    offset := int64(v.Uint64())
    v.SetBytes(scope.Memory.GetPtr(offset, 32))
    return nil, nil
}

There are three steps of mload:
1. Take the value of offset from the last position of the stack;
2. Take the value of [offset, offset+32) bytes from memory;
3. Write bytes in the last position of the stack.
gen_associated_ops of mload is to do these steps. For detailed code, please refer to
bus-mapping/src/EVM/opcodes/mload.rs.

1.2 exec_trace

This is the most important crate in bus-mapping, including BlockConstants, ExecutionTrace, and OperationRef. BlockConstants contains some constants of EVM, such as coinbase, timestamp, etc.

ExecutionTrace is the core of bus-mapping, which is the interface form of witness. ExecutionStep contains content used in every execution such as memory, stack, current instruction, and Operation-related bus_mapping_instance. ExecutionStep implements its own gen_associated_ops function, which can call the corresponding gen_associated_ops according to the current instruction.

pub struct ExecutionStep {
    pub(crate) memory: Memory,
    pub(crate) stack: Stack,
    pub(crate) instruction: OpcodeId,
    pub(crate) gas_info: GasInfo,
    pub(crate) depth: u8,
    pub(crate) pc: ProgramCounter,
    pub(crate) gc: GlobalCounter,
    // Holds refs to the container with the related mem ops.
    pub(crate) bus_mapping_instance: Vec<OperationRef>,
}

ExecutionTrace include all steps executed, and container includes each operation used in Trace. The most critical function in this structure is build, which is used to set gc and generate a corresponding Operation for each step in the trace.

pub struct ExecutionTrace<F: FieldExt> {
    steps: Vec<ExecutionStep>,
    block_ctants: BlockConstants<F>,
    container: OperationContainer,
}

1.3 operation

The operation crate is relatively simple, defining RW, Target, MemoryOp, StackOp, StorageOp, and Operation, where RW is of two types, i.e., read and write, and Target includes Memory, Stack, and Storage. The encapsulation of Operation will specify what kind of Operation it is.

2. zkevm-circuits

This crate requires the use of halo2 to implement the zkEVM circuits, which is also a more complex part of the implementation of zkEVM. It temporarily includes the following subcrates: evm_circuit, gadget, state_circuit, and util.

2.1 util and gadget

Util is used for tool implementations, such as converting Rust's basic data types, such as bool, u8, etc., to the elements of a finite field.

Gadget defines the basic tool chip. For example, evm_word is a constraint check for 256 bit (or 32 bytes), is_zero is to determine whether it is 0, and monotone checks whether the advice column increases monotonically within a range. The implementation of these chip circuits follows the general architectural design. For more details, please refer to our previous publication Develop Circuits Using Halo2.

2.2 state_circuit

In zkEVM design specs, there are state proof and evm proof. State proof is used to help evm proof check whether all random read, write, and access records are effective. The records of random read and write are generated by state_circuit. State_circuit defines BusMapping and Config. From Develop Circuits Using Halo2, we can know that config is the most critical part to write the circuit of Halo2. It contains all the information on the lookup table needed for evm proof.

Take the code comments in state_circuit as an example. q_target is the selector, where 1 represents the first line of each target, 2 represents memory, 3 represents stack, and 4 represents storage. As we can see, state is grouped by address first, and then sorted by global_counter. Note that the address here is not the Ethereum address. For memory and stack, the address is the StackAddress and MemoryAddress mentioned in the previous section, namely usize, so the values in the example are 0 and 1. global_counter is the counter, which will count different addresses separately. Value is the operation value, and flag marks read and write, where 1 is write, and 0 is read. padding is used to fill the gap between each target and has no practical meaning. storage_key is exclusive to the storage operation, which is the key, and value_pre is the value of the previous value in this target, which is 0 at the beginning.

/*
Example state table:

| q_target | address | global_counter | value | flag | padding | storage_key | value_prev |
-------------------------------------------------------------------------------------------
|    1     |    0    |       0        |  0    |   1  |    0    |             |            |   // init row (write value 0)
|    2     |    0    |       12       |  12   |   1  |    0    |             |            |
|    2     |    0    |       24       |  12   |   0  |    0    |             |            |
|    2     |    1    |       0        |  0    |   1  |    0    |             |            |   // init row (write value 0)
|    2     |    1    |       2        |  12   |   0  |    0    |             |            |
|    2     |         |                |       |      |    1    |             |            |   // padding
|    2     |         |                |       |      |    1    |             |            |   // padding
|    1     |    0    |       3        |  4    |   1  |    0    |             |            |
|    3     |    0    |       17       |  32   |   1  |    0    |             |            |
|    3     |    0    |       89       |  32   |   0  |    0    |             |            |
|    3     |    1    |       48       |  32   |   1  |    0    |             |            |
|    3     |    1    |       49       |  32   |   0  |    0    |             |            |
|    3     |         |                |       |      |    1    |             |            |   // padding
|    1     |    1    |       55       |  32   |   1  |    0    |      5      |     0      |   // first storage op at the new address has to be write
|    4     |    1    |       56       |  33   |   1  |    0    |      8      |     32     |
|    4     |         |                |       |      |    1    |             |            |   // padding
*/

2.3 evm_circuit

The evm_circuit is the core of the entire projec, including the implementation of opcode, the definition of Constraint, the definition of EVM execution results, the element of each execution step (ExecutionStep), and the most important EVMCircuit.

For example, in the definition of Constraint, the name serves as an annotation. When the selector is a success, the gadget's selector - polys are the Constraint’s polynomials, and lookups are the lookup tables used by the Constraint. In the Halo2 circuit instruction manual, the constraints interface function in the OpGadget trait is an array that returns Constraint.

struct Constraint<F> {
    name: &'static str,
    selector: Expression<F>,
    polys: Vec<Expression<F>>,
    lookups: Vec<Lookup<F>>,
}

Lookup is divided into two categories. First, FixedLookup, which an array of 3 Expressions; second, BusMappingLookup, which is an enum containing OpExecutionState, Stack, Memory, AccountStorage, etc.

pub(crate) enum Lookup<F> {
    FixedLookup(FixedLookup, [Expression<F>; 3]),
    BusMappingLookup(BusMappingLookup<F>),
}

The difference between CallField and CallStateField is that the former is used in the call of BusMappingLookup, and the latter is used in the OpExecutionState of BusMappingLookup.

The Cell structure includes constraint expressions, columns, and the offset used by the selector.

pub(crate) struct Cell<F> {
    // expression for constraint
    expression: Expression<F>,
    column: Column<Advice>,
    // relative position to selector for synthesis
    rotation: usize,
}

A word is represented by one expression and is positioned by an array of 32 cells.

struct Word<F> {
    // random linear combination expression of cells
    expression: Expression<F>,
    // inner cells for synthesis
    cells: [Cell<F>; 32],
}

The structure, CoreStateInstance, is often used when writing opcode circuits and is used for context caching in the assign implementation in evm_circuit.

pub(crate) struct CoreStateInstance {
    is_executing: bool,
    global_counter: usize,
    call_id: usize,
    program_counter: usize,
    stack_pointer: usize,
    gas_counter: usize,
}

ExecutionStep contains the opcode executed in this step, the execution result, and the value used. Although the name is the same as in the bus-mapping crate, its purpose and structure are different.

pub(crate) struct ExecutionStep {
    opcode: OpcodeId,
    case: Case,
    values: Vec<BigUint>,
}

EVMCircuit contains the selector, lookup table, read-write table, fixed table, and op execution gadget for each step.

struct EVMCircuit<F> {
    q_step: Selector,
    qs_byte_lookup: Column<Advice>,
    fixed_table: [Column<Fixed>; 4],
    rw_table: [Column<Advice>; 7],
    op_execution_gadget: OpExecutionGadget<F>,
}

Currently, opcodes of arithmetic, byte, comparator, dup, pop, and push have been implemented. We take add/sub as an example to explain how to write an op circuit.

First, define the data structure required by op (when the operation is successful), which is also the operation data of the op in most cases. When it fails, no other data is needed, we just need to handle the corresponding case. For the AddSuccessAllocation structure, the selector is used for activation in different cases. Because ADD and SUB are implemented together, swap is used to distinguish add or sub. a, b, c are used to represent the three values of the op operation, satisfying a + b == c. When sub, a + c == b.

struct AddSuccessAllocation<F> {
    selector: Cell<F>,
    swap: Cell<F>,
    a: Word<F>,
    b: Word<F>,
    c: Word<F>,
    carry: [Cell<F>; 32],
}

Second, define the corresponding Gadget structure, mainly the results of the op in different cases. For add/sub, there are three states: success, stackunderflow, and outofgas.

pub struct AddGadget<F> {
    success: AddSuccessAllocation<F>,
    stack_underflow: Cell<F>, // case selector
    out_of_gas: (
        Cell<F>, // case selector
        Cell<F>, // gas available
    ),
}

The final step is to implement the OpGadget trait for the op. To begin with, define several case situations covered by this op, as well as the number of values, the number of cells, and whether or not to halt.

const RESPONSIBLE_OPCODES: &'static [OpcodeId] =
    	&[OpcodeId::ADD, OpcodeId::SUB]; // This operation implements opcodes ADD and SUB.  
const CASE_CONFIGS: &'static [CaseConfig] = &[
	CaseConfig {
    	case: Case::Success,
    	num_word: 3,  // a + b + c
    	num_cell: 33, // 32 carry + swap
    	will_halt: false,
  },
	CaseConfig {
    	case: Case::StackUnderflow,
    	num_word: 0,
    	num_cell: 0,
    	will_halt: true,
	},
	CaseConfig {
    	case: Case::OutOfGas,
    	num_word: 0,
    	num_cell: 0,
    	will_halt: true,
	},
]; // Several case situations by ADD and SUB

The construct() function is relatively simple, which is to initialize our Gadget with case_allocations. The constraints function is a relatively critical interface, which needs the process of the current state and the next state to construct constraints.

fn constraints(
&self,
state_curr: &OpExecutionState<F>,
state_next: &OpExecutionState<F>,
) -> Vec<Constraint<F>> {
    let OpExecutionState { opcode, .. } = &state_curr;

    let common_polys = vec![
        (opcode.expr() - OpcodeId::ADD.expr())
            * (opcode.expr() - OpcodeId::SUB.expr()),
    ];

    let success = {
        // success constraint
    };

    let stack_underflow = {
        // stack underflow constraint
    };

    let out_of_gas = {
        // out_of_gas constraint
    };

    array::IntoIter::new([success, stack_underflow, out_of_gas])
        .map(move |mut constraint| {
            constraint.polys =
                [common_polys.clone(), constraint.polys].concat();
            constraint
        })
           .collect()
}

If you want to know how to use halo2 to develop circuits, please refer to Develop Circuits Using Halo2.

3. keccak256

While we are studying the code, the crate of keccak256 is still in the construction stage. We will analyze it in the follow-up study.

How to participate

As written in Sin7Y’s early articles, zkEVM is the brightest jewel in the crown of Ethereum scaling, and we look forward to more developers participating in this great project. Of course, as an open-source project, you may make your contribution in whatever way you like. Fixing a typo, improving documentation, or implementing a certain function are all helpful. If you are more familiar with the development of Halo2 and want to implement a certain opcode, you can take the above ADD/SUB as an example. The implementation process complies with writing the design specifications first. After the design specifications are merged by the core developers, you can write the opcode circuit code. Finally, don't forget to write the unit test code for the newly added opcode implementation.

References

ZkEVM Specifications https://github.com/appliedzkp/zkEVM-specs
Circuits for zkEVM https://github.com/appliedzkp/zkEVM-circuits
Develop Circuits Using Halo2 https://hackernoon.com/sin7y-tech-review-develop-circuits-using-halo-2
Exploring Popular zkEVM Solutions https://hackernoon.com/exploring-popular-zkEVM-solutions-appliedzkp-matter-labs-hermez-and-sin7y-quick-publication-ltq37ah

📎 A little about us

As a team of both vigorous and professional blockchain devs, Sin7Y has been doing constant research on zkEVM, layer 2, privacy computing, and cross-chain technology, delivering high-quality tech reviews on a regular basis. You can find us on Twitter or write to us at sin7y@web3.com. Feel free to contact us by DM or Email if you have any queries.