Try   HackMD

Sin7Y Tech Review (22): Cairo - Instruction

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Cairo is a practically-efficient Turing-complete STARK-friendly CPU architecture. In this article, we introduce the CPU architecture of Cairo in terms of instruction structure and state transition, and provide some examples of instruction.

Instruction structure

The word natively supported by Cairo CPU is a field element, where the field is some fixed finite field of characteristic

P>263. Each instruction will occupy 1 or 2 words. If an immediate value([ap] = "12345678")follows the instruction, the instruction will occupy 2 words, and the value will be stored in the second word. The first word of each instruction consists of the following elements:

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

  • offdst
    [ bit0..15]: destination address offset, representative value
    215+i=015bi2i[215,215)
    ;
  • offop0
    [ bit16..31]: op0 address offset, representative value
    215+i=015bi2i[215,215)
    ;
  • offop1
    [ bit32..47]: op1 address offset, representative value
    215+i=015bi2i[215,215)
    ;
  • dst reg
    [ bit48]: Base register of destination address offset, ap or fp;
  • op0 reg
    [ bit49]: Base register of op0 address offset, ap or fp;
  • op1_src
    [ bit50..52]: address of op1,
    • Case: 000
      op1=m(op0+offop1)
    • Case: 001
      op1=m(pc+offop1)
    • Case: 010
      op1=m(fp+offop1)
    • Case: 100
      op1=m(ap+offop1)
  • res_logic
    [ bit53..54]: computational logic
    • Case: 00
      res=op1
    • Case: 01
      res=op0+op1
    • Case: 10
      res=op0op1
  • pc_update
    [ bit55..57]: the update logic of pc
    • Case: 000 // common
      next_pc=pc+instruction_size
    • Case: 001 // absolute jump
      next_pc=res
    • Case: 010 // relative jump
      next_pc=pc+res
    • Case: 100 // conditional relative jump
      next_pc=pc+op1(or instruction_size if dst=0)
  • ap_update
    [ bit58..59]: the update logic of ap
    • Case: 00
      next_ap=ap(or ap+2 if opcode=1)
    • Case: 01
      next_ap=ap+res
    • Case: 10
      next_ap=ap+1
  • opcode
    [ bit60..62]: opcode types
    • Case: 000 // jmp
    • Case: 001 // call
    • Case: 010 // ret
    • Case: 100 // assert

State transition

The state transition function represents a general state transition unit (because it contains the processing logic of all instruction types), and a calculation is usually decomposed into multiple continuously executed instructions. Therefore, we need to:

  1. ensure the content of the instruction and the validity of the states before and after the instruction is executed (e.g., passing a certain range check and state consistency check) and
  2. ensure that the executed instruction is valid.

Transition logic

If the state before and after the instruction execution is consistent, the update of the state must be executed according to the following logic:

    #Context: m(.).
    #Input state: (pc, ap, and fp).
    #Output state: (next_pc, next_ap, and next_fp).
    #Compute op0. 
If op0_reg == 0: // Judge the basic register of op0 according to the flag value, 0 is ap,1 is fp, and obtain the value of op0.
        op0 = m(ap + off_{op0})
    else:
        op0 = m(fp + off_{op0})
    #Compute op1 and instruction_size.
    switch op1_src: // Obtain the value of op1
        case 0:
            instruction_size = 1 // op1 = m[[ap/fp + off_op0] +off_op1], with two-level imbedding at most.
            op1 = m(op0 + off_{op1})
        case 1:
            instruction_size = 2 // There exist(s) immediate number(s). The instruction size is 2 words.
            op1 = m(pc + off_{op1})// 
            #If off_{op1} = 1, we have op1 = immediate_value. // For example,  [ap] = "12345678", op1 = "12345678"
        case 2:
            instruction_size = 1 // op1 = [fp + off_op1]
            op1 = m(fp + off_{op1})
        case 4:
            instruction_size = 1 // op1 = [ap +off_op1]
            op1 = m(ap + off_{op1})
        default:
            Undefined Behavior
    #Compute res.
    if pc_update == 4:  // jnz
        if res_logic == 0 && opcode == 0 && ap_update != 1: // Assign && jump && advanced ap
            res = Unused // Under conditional jump, the values of res_logic, opcode and ap_update flags can only be as shown above.The res variable will not be used on this occasion.
        else:
            Undefined Behavior
    else if pc_update = 0, 1 or 2:
        switch res_logic: // The computational logic of res is:
            case 0: res = op1
            case 1: res = op0 + op1
            case 2: res = op0 * op1
            default: Undefined Behavior
    else: Undefined Behavior
    # Compute dst.
    if dst_reg == 0:
        dst = m(ap + offdst)
    else:
        dst = m(fp + offdst)
    # Compute the new value of pc.
    switch pc_update:
        case 0: # The common case: [ap] = 1
            next_pc = pc + instruction_size
        case 1: # Absolute jump: jmp abs 123
            next_pc = res
        case 2: # Relative jump: jmp rel [ap - 1]
            next_pc = pc + res
        case 4: # Conditional relative jump (jnz): jmp rel [ap - 1] if [fp - 7] = 0
            next_pc =
                if dst == 0: pc + instruction_size // If dst = 0, then pc conducts ordinary updates; otherwise, it is similar to Case 2.
                else: pc + op1 // Under this circumstance, res is Unused, so pc directly conducts + op1, instead of + res.
        default: Undefined Behavior
    # Compute new value of ap and fp based on the opcode.
    if opcode == 1:
        # "Call" instruction.
        assert op0 == pc + instruction_size
        assert dst == fp
        # Update fp.
        next_fp = ap + 2 // [ap] saves the current fp value, [ap + 1] saves the pc after the call instruction; when the ret instruction is executed, it is convenient to reset fp to [ap], and then continue to execute subsequent instructions.
        # Update ap.
        switch ap_update:
            case 0: next_ap = ap + 2 // [ap] saves the value of fp, and [ap+1] saves the instruction address after the call instruction; thus, ap increases by 2.
        default: Undefined Behavior
    else if opcode is one of 0, 2, 4:
        # Update ap.
        switch ap_update:
            case 0: next_ap = ap // [fp + 1] = 5
            case 1: next_ap = ap + res // advanced ap  [ap] += 123
            case 2: next_ap = ap + 1 // normal  [fp + 1] = [ap]; ap++
            default: Undefined Behavior
        switch opcode: // Within the same function, the fp value remains unchanged.
            case 0:
                next_fp = fp
            case 2:
                # "ret" instruction.
                next_fp = dst // The ret instruction goes with the call instruction, and assert dst == fp.
            case 4:
                # "assert equal" instruction.
                assert res = dst
                next_fp = fp
    else: Undefined Behavior

Instruction verification

As shown in Figure 1, one instruction consists of the following elements:

structure instruction := 
(off_dst : bitvec 16) 
(off_op0 : bitvec 16)
(off_op1 : bitvec 16)    
(flags : bitvec 15) 

Where:

offdst[ bit0..15],
offop0
[ bit16..31],
offop1
[ bit32..47] are in range
215+i=015bi2i[215,215)

But in AIR, we convert them into the following form:
off~=off+215
(Where * represents one of dst, op0 and op1)

So the value range of

off~should be
[0,216)

Therefore, for the coding of an instruction, we have:

inst=off~dst+216off~op0+232off~op1+248i=014(2ifi)1
Note: Instead of allocating 15 virtual columns with a length of NN to the 15-bit flags of the instruction, instead, we use one virtual column
f~i=j=i142jifj
with a length of 16N, which meets the following requirements:

  1. f~0=j=0142j0fj
    is a15-bit value
  2. f~15=0
  3. f~i2f~i+1=j=i14(2jifj)2j=i+114(2ji1fj)=j=i14(2jifj)j=i+114(2jifj)=fi

Therefore, equation (1) can be written as:

inst=off~dst+216off~op0+232off~op1+248f~0[0,263)2

Because the finite field's characteristic

P>263, one instruction can only correspond to one valid field element.

Therefore, for the instruction itself, the following constraints should be met:

Instruction:

inst=off~dst+216off~op0+232off~op1+248f~0
Bit:
(f~i2f~i+1)(f~i2f~i+11)=0 for all i[0,15)

Last value is zero:
f~15=0

Offset are in range: virtual column
off~[0,216)
$, so then
off[215,215)

Instruction examples

Assert equal

The assert equal instruction can be expressed in the following syntax:

<left_handle_op> = <right_handle_op>

It ensures that both sides of the formula are equal; otherwise the execution of the program will be rejected.
The left side of the equation often comes from

[fp+offdst] or
[ap+offdst]
, and the right side has some possible forms(
reg0
and
reg1
can be
fp
or
ap
,
can be addition or multiplication, and
imm
can be any fixed field elements):

  • imm
  • [reg1+offop1]
  • [reg0+offop0][reg1+offop1]
  • [reg0+offop0]imm
  • [[reg0+offop0+offop1]

Note2: Division and subtraction can be expressed as multiplication and addition with different operand orders, respectively.
An

assert instruction can be considered as an assignment instruction, in which the value of one side is known, and the other side is unknown. For example,
[ap]=4
can be considered as an assertion that the value of [ap] is 4, or as an assignment setting [ap] to 4, according to the context.

Figure 4 shows some examples of "assert equal" instructions and the flag values for each instruction:

Figure 4. Examples of assert equal instructions

Interpret instruction [fp + 1] = 5:

  • assert instruction => opcode = 4
  • next_ap = ap => ap_update = 00 = 0
  • next_pc = pc + instruction_size => pc_update = 000 = 0
  • For op0 and op1, there is no add or mul => res_logic(res) = 00 = 0
  • There exists an immediate value => op1_src(op1) = 001 = 1
  • The immediate value address instructs that the addresses are adjacent => off_op1 = 1
  • The left side of equation [fp + 1] => dst_reg(dst) = 1
  • The left side of equation [fp + 1] => off_dst = 1
  • op0_reg/ off_op0 => inital value(1/-1) //Because these flags are not used in this instruction, the default value is filled in.

Conditional and unconditional jump

The

jmp instruction allows changing the value of the program counter
pc
.

Cairo supports relative jump (its operand represents the offset from the current

pc)and absolute jump - represented by keywords
rel
and
abs
respectively. The
jmp
instruction may be conditional. For example, when the value of a memory unit is not 0, the
jmp
instruction will be triggered.

The syntax of the instruction is as follows:

#Unconditional jumps.
jmp abs <address>
jmp rel <offset>
#Conditional jumps.
jmp rel <offset> if <op> !

Figure 5 shows some examples of the

jmp instructions and the corresponding flag values of each instruction:


Figure 5 Examples of jump instructions

Interpret instruction jmp rel [ap +1] + [fp - 7]:

  • jmp instruction => opcode = 0
  • next_ap = ap => ap_update = b00 = 0
  • next_pc = pc + res=> pc_update = b010 = 2
  • res = op0 + op1 => res_logic(res) = b01 = 1
  • op1: [fp - 7] => op1_src(op1) = b010 = 2
  • op1: [fp - 7] => off_op1 = -7
  • op0: [ap + 1] => op0_src(op0) = 0
  • op0: [ap + 1] => off_op0 = 1
  • dst_reg/ off_dst => inital value(1/-1) ///Because these flags are not used in this instruction, the default value is filled in.

call
and
ret

The

call and
ret
instructions allow the implementation of the function stack. The
call
instruction updates the program counter(
pc
)and frame pointer(
fp
)registers. The update of the program counter is similar to the
jmp
instruction. The previous value of
fp
is written to
[ap]
, to allow the
ret
instruction to reset the value of
fp
to the value before the call; similarly, the returned
pc
(the address of the instruction after the call instruction) is written to
[ap+1]
, to allow the
ret
instruction to jump back and continue the execution of the code after the call instruction. As two memory units were written,
ap
advanced by 2 and
fp
is set to the new
ap
.

The syntax of the instructions are as follows:

call abs <address>
call rel <offset>
ret

Figure 6 shows some examples of

call and
ret
instructions and flag values corresponding to each instruction:

Figure 6 Examples of call instructions and ret instructions

Interpret instruction call abs [fp + 4]:

  • call instruction => opcode = 0
  • next_ap = ap => ap_update = b00 = 0
  • next_pc = res => pc_update = b001 = 1
  • res = op1 => res_logic(res) = b00 = 0
  • op1: [fp + 4] => op1_src(op1) = b010 = 2
  • op1: [fp + 4] => off_op1 = 4
  • op0_reg/ off_op0 => inital value(0/1) ///Because these flags are not used in this instruction, the default value is filled in.
  • dst_reg/ off_dst => inital value(0/0) ///Because these flags are not used in this instruction, the default value is filled in.

Advancing
ap

The instruction ap += <op> increments the value of

ap by the given operand.

Figure 7 shows some examples of the advancing

ap instructions and the corresponding flag values of each instruction:

Figure 7 Examples of advancing ap instructions

Interpret instruction ap += 123:

  • advancing ap instruction => opcode = 0
  • next_ap = ap + res => ap_update = b01 = 1
  • next_pc = pc + instruction_size => pc_update = b000 = 0
  • res = op1 => res_logic(res) = b00 = 0
  • op1 = 123 => op1_src(op1) = b001 = 1
  • op1 = 123 => off_op1 = 1
  • op0_reg/ off_op0 => inital value(1/-1) ///Because these flags are not used in this instruction, the default value is filled in.
  • dst_reg/ off_dst => inital value(1/-1) ///Because these flags are not used in this instruction, the default value is filled in.