# Sin7Y Tech Review (22): Cairo - Instruction ![](https://i.imgur.com/WyvNOje.jpg) Cairo is a practically-efficient Turing-complete STARK-friendly CPU architecture. In this article, we introduce the CPU architecture of Cairo in terms of instruction structure and state transition, and provide some examples of instruction. # Instruction structure The word natively supported by Cairo CPU is a field element, where the field is some fixed finite field of characteristic $P > 2^{63}$. Each instruction will occupy 1 or 2 words. If an immediate value([ap] = "12345678")follows the instruction, the instruction will occupy 2 words, and the value will be stored in the second word. The first word of each instruction consists of the following elements: ![](https://i.imgur.com/k74WZvY.png) * $off_{dst}$[ bit0..15]: destination address offset, representative value $-2^{15} + \sum_{i = 0}^{15} b_i \cdot 2^i \in [-2^{15}, 2^{15})$; * $off_{op0}$[ bit16..31]: op0 address offset, representative value $-2^{15} + \sum_{i = 0}^{15} b_i \cdot 2^i \in [-2^{15}, 2^{15})$; * $off_{op1}$[ bit32..47]: op1 address offset, representative value $-2^{15} + \sum_{i = 0}^{15} b_i \cdot 2^i \in [-2^{15}, 2^{15})$; * $dst \ reg$[ bit48]: Base register of destination address offset, ap or fp; * $op0 \ reg$[ bit49]: Base register of op0 address offset, ap or fp; * $op1\_src$[ bit50..52]: address of op1, - Case: 000 $$op1 = m(op0 + off_{op1})$$ - Case: 001 $$op1 = m(pc + off_{op1})$$ - Case: 010 $$op1 = m(fp + off_{op1})$$ - Case: 100 $$op1 = m(ap + off_{op1})$$ - $res\_logic$ [ bit53..54]: computational logic - Case: 00 $$res = op1$$ - Case: 01 $$res = op0 + op1$$ - Case: 10 $$res = op0 * op1$$ - $pc\_update$ [ bit55..57]: the update logic of pc - Case: 000 // common $$next\_pc = pc + instruction\_size$$ - Case: 001 // absolute jump $$next\_pc = res$$ - Case: 010 // relative jump $$next\_pc = pc + res$$ - Case: 100 // conditional relative jump $$next\_pc = pc + op1(or \ instruction\_size \ if \ dst = 0)$$ - $ap\_update$[ bit58..59]: the update logic of ap - Case: 00 $$next\_ap = ap(or \ ap + 2 \ if \ opcode = 1)$$ - Case: 01 $$next\_ap = ap + res$$ - Case: 10 $$next\_ap = ap + 1$$ - $opcode$[ bit60..62]: opcode types - Case: 000 // jmp - Case: 001 // call - Case: 010 // ret - Case: 100 // assert # State transition The state transition function represents a general state transition unit (because it contains the processing logic of all instruction types), and a calculation is usually decomposed into multiple continuously executed instructions. Therefore, we need to: 1. ensure the content of the instruction and the validity of the states before and after the instruction is executed (e.g., passing a certain range check and state consistency check) and 2. ensure that the executed instruction is valid. ## Transition logic If the state before and after the instruction execution is consistent, the update of the state must be executed according to the following logic: ``` #Context: m(.). #Input state: (pc, ap, and fp). #Output state: (next_pc, next_ap, and next_fp). #Compute op0. If op0_reg == 0: // Judge the basic register of op0 according to the flag value, 0 is ap,1 is fp, and obtain the value of op0. op0 = m(ap + off_{op0}) else: op0 = m(fp + off_{op0}) #Compute op1 and instruction_size. switch op1_src: // Obtain the value of op1 case 0: instruction_size = 1 // op1 = m[[ap/fp + off_op0] +off_op1], with two-level imbedding at most. op1 = m(op0 + off_{op1}) case 1: instruction_size = 2 // There exist(s) immediate number(s). The instruction size is 2 words. op1 = m(pc + off_{op1})// #If off_{op1} = 1, we have op1 = immediate_value. // For example, [ap] = "12345678", op1 = "12345678" case 2: instruction_size = 1 // op1 = [fp + off_op1] op1 = m(fp + off_{op1}) case 4: instruction_size = 1 // op1 = [ap +off_op1] op1 = m(ap + off_{op1}) default: Undefined Behavior #Compute res. if pc_update == 4: // jnz if res_logic == 0 && opcode == 0 && ap_update != 1: // Assign && jump && advanced ap res = Unused // Under conditional jump, the values of res_logic, opcode and ap_update flags can only be as shown above.The res variable will not be used on this occasion. else: Undefined Behavior else if pc_update = 0, 1 or 2: switch res_logic: // The computational logic of res is: case 0: res = op1 case 1: res = op0 + op1 case 2: res = op0 * op1 default: Undefined Behavior else: Undefined Behavior # Compute dst. if dst_reg == 0: dst = m(ap + offdst) else: dst = m(fp + offdst) # Compute the new value of pc. switch pc_update: case 0: # The common case: [ap] = 1 next_pc = pc + instruction_size case 1: # Absolute jump: jmp abs 123 next_pc = res case 2: # Relative jump: jmp rel [ap - 1] next_pc = pc + res case 4: # Conditional relative jump (jnz): jmp rel [ap - 1] if [fp - 7] = 0 next_pc = if dst == 0: pc + instruction_size // If dst = 0, then pc conducts ordinary updates; otherwise, it is similar to Case 2. else: pc + op1 // Under this circumstance, res is Unused, so pc directly conducts + op1, instead of + res. default: Undefined Behavior # Compute new value of ap and fp based on the opcode. if opcode == 1: # "Call" instruction. assert op0 == pc + instruction_size assert dst == fp # Update fp. next_fp = ap + 2 // [ap] saves the current fp value, [ap + 1] saves the pc after the call instruction; when the ret instruction is executed, it is convenient to reset fp to [ap], and then continue to execute subsequent instructions. # Update ap. switch ap_update: case 0: next_ap = ap + 2 // [ap] saves the value of fp, and [ap+1] saves the instruction address after the call instruction; thus, ap increases by 2. default: Undefined Behavior else if opcode is one of 0, 2, 4: # Update ap. switch ap_update: case 0: next_ap = ap // [fp + 1] = 5 case 1: next_ap = ap + res // advanced ap [ap] += 123 case 2: next_ap = ap + 1 // normal [fp + 1] = [ap]; ap++ default: Undefined Behavior switch opcode: // Within the same function, the fp value remains unchanged. case 0: next_fp = fp case 2: # "ret" instruction. next_fp = dst // The ret instruction goes with the call instruction, and assert dst == fp. case 4: # "assert equal" instruction. assert res = dst next_fp = fp else: Undefined Behavior ``` ## Instruction verification As shown in Figure 1, one instruction consists of the following elements: ``` structure instruction := (off_dst : bitvec 16) (off_op0 : bitvec 16) (off_op1 : bitvec 16) (flags : bitvec 15) ``` Where: $off_{dst}$[ bit0..15], $off_{op0}$[ bit16..31], $off_{op1}$[ bit32..47] are in range $$-2^{15} + \sum_{i = 0}^{15} b_i \cdot 2^i \in [-2^{15}, 2^{15})$$ But in AIR, we convert them into the following form: $$\widetilde{off}_* = off_* +2^{15}$$(Where * represents one of dst, op0 and op1) So the value range of $\widetilde{off}_*$should be $[0,2^{16})$ Therefore, for the coding of an instruction, we have: $$inst = \widetilde{off}_{dst} + 2^{16} \cdot \widetilde{off}_{op0} + 2^{32} \cdot \widetilde{off}_{op1} + 2^{48} \cdot \displaystyle\sum_{i=0}^{14} (2^i \cdot f_i) (1)$$ **Note:** Instead of allocating 15 virtual columns with a length of NN to the 15-bit flags of the instruction, instead, we use one virtual column$$ \tilde f_i = \textstyle\sum_{j=i}^{14} 2^{j-i} \cdot f_j$$with a length of 16N, which meets the following requirements: 1. $\tilde f_0 = \textstyle\sum_{j=0}^{14} 2^{j-0} \cdot f_j$ is a15-bit value 2. $\widetilde f_{15} = 0$ 3. $$\widetilde f_i - 2\widetilde f_{i+1} = \displaystyle\sum_{j=i}^{14} (2^{j-i} \cdot f_j) - 2 \cdot \displaystyle\sum_{j=i + 1}^{14} (2^{j-i-1} \cdot f_j) = \displaystyle\sum_{j=i}^{14} (2^{j-i} \cdot f_j) - \displaystyle\sum_{j=i + 1}^{14} (2^{j-i} \cdot f_j) = f_i$$ Therefore, equation (1) can be written as: $$inst = \widetilde{off}_{dst} + 2^{16} \cdot \widetilde{off}_{op0} + 2^{32} \cdot \widetilde{off}_{op1} + 2^{48} \cdot \tilde f_0 \in [0,2^{63}) (2)$$ Because the finite field's characteristic $P > 2^{63}$, one instruction can only correspond to one valid field element. Therefore, for the instruction itself, the following constraints should be met: **Instruction:** $inst = \widetilde{off}_{dst} + 2^{16} \cdot \widetilde{off}_{op0} + 2^{32} \cdot \widetilde{off}_{op1} + 2^{48} \cdot \tilde f_0$ **Bit:** $(\widetilde f_i - 2\widetilde f_{i+1} )(\widetilde f_i - 2\widetilde f_{i+1} - 1) = 0 \ for \ all \ i \in [0,15)$ **Last value is zero:** $\widetilde f_{15} = 0$ **Offset are in range:** virtual column $\widetilde{off}_* \in [0,2^{16})$$, so then ${off}_* \in [2^{-15}, 2^{15})$ # Instruction examples ## Assert equal The assert equal instruction can be expressed in the following syntax: *<left_handle_op> = <right_handle_op>* It ensures that both sides of the formula are equal; otherwise the execution of the program will be rejected. The left side of the equation often comes from $[fp+off_{dst}]$ or $[ap+off_{dst}]$, and the right side has some possible forms($reg_0$ and $reg_1$ can be $fp$ or $ap$, $\circ$ can be addition or multiplication, and $imm$ can be any fixed field elements): - $imm$ - $[reg_1+off_{op1}]$ - $[reg_0+off_{op0}] \circ [reg_1 + off_{op1}]$ - $[reg_0+off_{op0}] \circ imm$ - $[[reg_0+off_{op0} + off_{op1}]$ **Note2:** Division and subtraction can be expressed as multiplication and addition with different operand orders, respectively. An $assert$ instruction can be considered as an assignment instruction, in which the value of one side is known, and the other side is unknown. For example, $[ap] = 4$ can be considered as an assertion that the value of [ap] is 4, or as an assignment setting [ap] to 4, according to the context. Figure 4 shows some examples of "assert equal" instructions and the flag values for each instruction: ![](https://i.imgur.com/71rAspp.png) Figure 4. Examples of assert equal instructions **Interpret** instruction [fp + 1] = 5: - assert instruction => opcode = 4 - next_ap = ap => ap_update = 00 = 0 - next_pc = pc + instruction_size => pc_update = 000 = 0 - For op0 and op1, there is no add or mul => res_logic(res) = 00 = 0 - There exists an immediate value => op1_src(op1) = 001 = 1 - The immediate value address instructs that the addresses are adjacent => off_op1 = 1 - The left side of equation [fp + 1] => dst_reg(dst) = 1 - The left side of equation [fp + 1] => off_dst = 1 - op0_reg/ off_op0 => inital value(1/-1) //Because these flags are not used in this instruction, the default value is filled in. ## Conditional and unconditional jump The $jmp$ instruction allows changing the value of the program counter $pc$. Cairo supports relative jump (its operand represents the offset from the current $pc$)and absolute jump - represented by keywords $rel$ and $abs$ respectively. The $jmp$ instruction may be conditional. For example, when the value of a memory unit is not 0, the $jmp$ instruction will be triggered. The syntax of the instruction is as follows: ``` #Unconditional jumps. jmp abs <address> jmp rel <offset> #Conditional jumps. jmp rel <offset> if <op> ! ``` Figure 5 shows some examples of the $jmp$ instructions and the corresponding flag values of each instruction: ![](https://i.imgur.com/Gry2T8x.png) Figure 5 Examples of jump instructions **Interpret** instruction jmp rel [ap +1] + [fp - 7]: - jmp instruction => opcode = 0 - next_ap = ap => ap_update = b00 = 0 - next_pc = pc + res=> pc_update = b010 = 2 - res = op0 + op1 => res_logic(res) = b01 = 1 - op1: [fp - 7] => op1_src(op1) = b010 = 2 - op1: [fp - 7] => off_op1 = -7 - op0: [ap + 1] => op0_src(op0) = 0 - op0: [ap + 1] => off_op0 = 1 - dst_reg/ off_dst => inital value(1/-1) ///Because these flags are not used in this instruction, the default value is filled in. ## $call$ and $ret$ The $call$ and $ret$ instructions allow the implementation of the function stack. The $call$ instruction updates the program counter( $pc$ )and frame pointer( $fp$ )registers. The update of the program counter is similar to the $jmp$ instruction. The previous value of $fp$ is written to $[ap]$, to allow the $ret$ instruction to reset the value of $fp$ to the value before the call; similarly, the returned $pc$ (the address of the instruction after the call instruction) is written to $[ap+1]$, to allow the $ret$ instruction to jump back and continue the execution of the code after the call instruction. As two memory units were written, $ap$ advanced by 2 and $fp$ is set to the new $ap$. The syntax of the instructions are as follows: ``` call abs <address> call rel <offset> ret ``` Figure 6 shows some examples of $call$ and $ret$ instructions and flag values corresponding to each instruction: ![](https://i.imgur.com/7y3jsI2.png) Figure 6 Examples of call instructions and ret instructions **Interpret** instruction call abs [fp + 4]: - call instruction => opcode = 0 - next_ap = ap => ap_update = b00 = 0 - next_pc = res => pc_update = b001 = 1 - res = op1 => res_logic(res) = b00 = 0 - op1: [fp + 4] => op1_src(op1) = b010 = 2 - op1: [fp + 4] => off_op1 = 4 - op0_reg/ off_op0 => inital value(0/1) ///Because these flags are not used in this instruction, the default value is filled in. - dst_reg/ off_dst => inital value(0/0) ///Because these flags are not used in this instruction, the default value is filled in. ## Advancing $ap$ The instruction *ap += <op>* increments the value of $ap$ by the given operand. Figure 7 shows some examples of the advancing $ap$ instructions and the corresponding flag values of each instruction: ![](https://i.imgur.com/3SMqjHx.png) Figure 7 Examples of advancing ap instructions **Interpret** instruction ap += 123: - advancing ap instruction => opcode = 0 - next_ap = ap + res => ap_update = b01 = 1 - next_pc = pc + instruction_size => pc_update = b000 = 0 - res = op1 => res_logic(res) = b00 = 0 - op1 = 123 => op1_src(op1) = b001 = 1 - op1 = 123 => off_op1 = 1 - op0_reg/ off_op0 => inital value(1/-1) ///Because these flags are not used in this instruction, the default value is filled in. - dst_reg/ off_dst => inital value(1/-1) ///Because these flags are not used in this instruction, the default value is filled in.