**2.4 Connecting ZKWASM Virtual Machine with Arithmetic Circuits**
Now we are ready to go one step further. Instead of constructing a ZKSNARK scheme for simple programs, we would like to construct a ZKSNARK for a WASM virtual machine. ZKWASM, as a large program of its own, needs to emulate the execution of $I$ starting with $E$ under $IO$.stdin to generate $IO$.stdout and provide a ZKSNARK proof which proves that $IO$.stdout is valid. Similar to what needs to be done for simple programs, we need to construct a huge arithmetic circuit with carefully designed constraints $C$ such that the following two statements are equivalent:
(1) $IO$.stdout is the unique valid output if the execution of $I$ starting with $E$ under $IO$.stdin satisfies the WASM specification.
(2) There exists a list of witnesses $s_i$ such that $C(I, E, IO, s_0, s_1, \cdots, s_e) = 0$.
We noticed that a valid execution trace will always produce a valid output respecting the WASM specification. So to construct a ZKSNARK for the WASM virtual machine, it is sufficient to construct an arithmetic circuit $C$ of the two states before and after an instruction so that the following two are equivalent.
(1) Given $(I, E, IO)$ and $s_0$ the initial state, $[t_0, t_1, t_2, \cdots]$ is a valid execution trace satisfying Definition 2.1.
(2) Given an execution trace $[t_0, t_1, \cdots]$ of $(I, E, IO, s_0)$ and $s_k = t_{k-1} \circ \cdots \circ t_1 \circ t_0(s_0)$, $C(I, E, IO, s_k, s_{k+1}) = 0$ implies that $t_k$ enforces the semantics of $op(s_k.iaddr)$.
We do not construct such a circuit $C$ from scratch. Instead, we construct it from small building blocks in Section 3, create the architecture of $C$ in Section 4, and present all the details in Section 5.
**3 BASIC BUILDING BLOCKS OF ZAWA CIRCUITS**
As described in Section 2.4, the arithmetic circuit of the execution trace is crucial in constructing SNARKs of the WASM virtual machine. In this section we give a brief overview of the basic techniques and elementary circuits used to construct our final arithmetic circuits in ZAWA.
**3.1 Representing Basic Types in Halo2 Constraint System**
Recall that to prove that an arithmetic circuit matrix with constraint $C$ holds, the Plonkish proof system interpolates each column $c_i$ into a polynomial $c_i(x)$ such that $c_i(j) = c_{ij}$ and then uses the KZG commitment scheme to prove that $C(c_i(x)) = 0$ holds for all $x = 1, 2, 3, \cdots$.
However, to use the KZG commitment scheme on the polynomials $c_i$, we require $c_i(x) \in F$ where $F$ is the scalar field of some elliptic curve $C$. Therefore, each $c_i(j)$ is in the scalar field $F$ of the elliptic curve $C$ in Halo2's arithmetic circuit system. Since the basic types in WASM are i64 and i32, which do not match the number field $F$ in Halo2, we need to add a constraint $x < 2^{32}$ (or $x < 2^{64}$) to represent a variable $x$ with type i32 or i64. In ZAWA, we use $T_N$ to denote a table containing the elements from 0 to $2^N - 1$ and then use a polynomial lookup to prove that the values of some column $c_i$ are less than $2^N - 1$ by $plookup(T_N, c_i(j)) = 0$. When $N$ is large (e.g. 64) and $T_N$ becomes too big, we decompose an i64 into several parts and prove that each part is less than $2^8 - 1$. Below we use the notation $x \in T_N$ to denote $x < 2^N$ and omit the details of decomposing $x$ into small pieces when necessary.
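To make the decomposition concrete, here is a minimal sketch (plain Python, not ZAWA's actual Halo2 gates) of how a u64 witness can be split into 8-bit limbs so that each limb is range-checked against a small table $T_8$, following the description above:

```python
# Sketch only: emulate the range-check-by-decomposition idea outside of Halo2.
T8 = set(range(1 << 8))  # the lookup table T_8 containing 0 .. 2^8 - 1

def assign_limbs(x: int) -> list[int]:
    """Witness assignment: split x into eight 8-bit limbs, least significant first."""
    assert 0 <= x < 1 << 64
    return [(x >> (8 * i)) & 0xFF for i in range(8)]

def check_limbs(x: int, limbs: list[int]) -> bool:
    """Circuit-side checks: every limb is in T_8 and the limbs recompose to x."""
    in_range = all(limb in T8 for limb in limbs)        # plookup(T_8, limb) = 0
    recompose = x == sum(limb << (8 * i) for i, limb in enumerate(limbs))
    return in_range and recompose

assert check_limbs(2**64 - 1, assign_limbs(2**64 - 1))
```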
**3.2 Representing Map Using Polynomial Lookup of Tables**
Other than specifying a range for cells, another usage of polynomial lookup is that we can encode the state of a key-value map into tables and use a polynomial lookup to specify the semantics of getting the value of a certain key in the map.
Here is an example. Recall that we represent the state of ZAWA by $(iaddr, F, M, G, SP, I, IO)$ where $C$ and $H$ are fixed by the WASM image. We encode the state of $C$ and $H$ in tables $T_C : Address \times Opcode$ and $T_H : Address \times U64$. By doing this, we can use the polynomial lookup to specify the semantics of getting opcode $op$ at address $iaddr$ in $C$ by $(iaddr, op) \in T_C$, and the semantics of getting the initial byte data $v$ at address $addr$ in $H$ by $(addr, v) \in T_H$.
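As a small illustration (with made-up addresses and opcodes rather than a real image), the key-value view of $T_C$ and $T_H$ can be modelled as sets of tuples, with the polynomial lookup playing the role of set membership:

```python
# Toy encoding of the code and initial-memory maps as tuple tables.
T_C = {(0x00, "add"), (0x01, "sub")}   # iaddr -> opcode
T_H = {(0x00, 0x2A), (0x08, 0x00)}     # addr  -> initial u64 word

def opcode_ok(iaddr: int, op: str) -> bool:
    return (iaddr, op) in T_C          # models plookup(T_C, (iaddr, op)) = 0

def init_value_ok(addr: int, v: int) -> bool:
    return (addr, v) in T_H            # models plookup(T_H, (addr, v)) = 0

assert opcode_ok(0x00, "add") and init_value_ok(0x00, 0x2A)
```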
**3.3 Representing Math Semantic as Arithmetic Circuits**
According to the WASM specification, the semantics of opcodes are usually defined as mathematical
equations and state transformation. Thus we need to construct arithmetic circuits to enforce the
semantics of the opcodes. For example, suppose that the opcode $divu$ (a division of unsigned int)
has the following semantics:
$$divu(a, b) = (a - a \bmod b) \div b$$
It follows that to turn the above mathematical definition into polynomial constraints we need to introduce an intermediate witness $r$ such that the semantics can be rewritten as follows:
$$a = divu(a, b) \times b + r \;\wedge\; r < b$$
However, since $r$ and $b$ are in $F$, it needs more work to express $r < b$ as polynomial constraints. Fortunately, in ZAWA, we use a range check to constrain $r$ and $b$ within 64 bits. The above constraints can then be rewritten into the following polynomial constraints with one more extra witness $k$:
$$a = divu(a, b) \times b + r \;\wedge\; r + k + 1 = b$$
When dealing with opcodes that have more complicated mathematical semantics, we need a way to formally prove that the derived constraints represent the same semantics. In ZAWA, we use Z3 to formally check that the mathematical definition is correctly refined to the arithmetic circuits.
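This kind of refinement check can be reproduced with the Z3 Python bindings. The sketch below assumes the rewritten divu constraints take the form $a = d \cdot b + r$ and $r + k + 1 = b$ with all witnesses range-checked to 64 bits (our reading of the construction above, not necessarily ZAWA's exact encoding), and asks Z3 to confirm that no assignment satisfies the constraints while violating the unsigned division semantics:

```python
from z3 import BitVec, BitVecVal, UDiv, URem, ULT, And, Not, Solver, unsat

W = 130                                   # wide enough that 64-bit products never wrap
a, b, d, r, k = (BitVec(n, W) for n in "abdrk")
u64 = BitVecVal(1, W) << 64               # 2^64

ranges = And(*[ULT(v, u64) for v in (a, b, d, r, k)])   # the T_64 range checks
circuit = And(a == d * b + r,             # a = divu(a, b) * b + r
              r + k + 1 == b)             # encodes r < b with the extra witness k
spec = And(d == UDiv(a, b), r == URem(a, b))

s = Solver()
s.add(ranges, b != 0, circuit, Not(spec))
assert s.check() == unsat                 # no counterexample: the constraints refine divu
```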
**3.4 Enforcing Valid Dynamic State Accessing using Polynomial Lookup Tables**
Given a sequence of state transition functions $\{t_i\}$ such that each transition might read or write finitely many key-value pairs (e.g. access memory $M$, stack $SP$ or global $G$) in the state $S$, we label each read or write of $\{t_i\}$ in a subsequence $\{t_{k_i}\}$ and use the tuple $(tid = i, mid = k, accessType, address, value)$ to denote the access log of $\{t_{k_i}\}$ such that each access log has the following semantics:
* Init memory: $(tid, mid, init, addr, v) := \{s.addr = v;\}$
* Write value $v$ to memory: $(tid, mid, write, addr, v) := \{s.addr = v;\}$
* Read from memory: $(tid, mid, read, addr, v) := \{assert(s.addr \equiv v);\}$
Since the addresses in $t_{k_i}$ can be randomly distributed, it is hard to reason about the fact that a read from an address $addr$ should get the value $v$ related to the latest write or init of that $addr$. To solve this, we do the following. First, we create a lookup table $T$ by rearranging the logs by their access address and ordering them by $(tid, mid)$ within each address block (see Table 2).
Second, we enforce the semantics of init, read and write by equipping Table 2 with constraints on each row using Equation 5.
For each pair of adjacent rows $(prev, curr)$ of $T$:
$$
\begin{aligned}
& curr.addr = prev.addr \rightarrow (curr.tid, curr.mid) > (prev.tid, prev.mid) \\
& curr.addr = prev.addr \wedge curr.accessType = read \rightarrow curr.value = prev.value \\
& curr.accessType = init \leftrightarrow curr.addr \neq prev.addr
\end{aligned} \qquad (5)
$$
Table 2. Memory access table
| address | id = (tid, mid) | accessType | value |
| ------------------------------ |:---------------------------:|:--------------------------:|:--------------------:|
| addr1 <br /> addr1<br /> addr1 | tid1 <br /> tid2<br /> tid3 | acc1 <br /> acc2<br />acc3 | v1 <br /> v2<br />v3 |
| addr2 <br /> addr2 | tid4 <br /> tid5 | acc3 <br /> acc4 | v4 <br /> v5 |
| ... | tidk | acck | $v_k$ |
Remark 3. Rearranging means the map from access logs to $T$ is a one-to-one map. The first constraint enforces that access logs visiting the same address are sorted by their access order. The second constraint enforces that each read access gets the correct value, and the third constraint enforces that init happens once and only once at the beginning of each address block.
Theorem 3.1. Given a memory access log $\{L_i\}$, the log is valid if and only if there exists a table $T$ such that $T$ satisfies the above constraints and each $L_i$ is in $T$.
Proof. First, for init access logs, the only constraint we need to enforce is that each address can only be initialized once. Suppose there are two init access logs $L_a$ and $L_b$ that initialize the same address in the access log $\{L_i\}$; then they must both exist in $T$ in the same address block, which contradicts the third constraint of Equation 5. Second, for a read access log $L_r = (tid, mid, read, addr, v)$, we need to prove that the latest write or init access log $L_{latest}$ to $addr$ has put value $v$ into $addr$. Considering the second constraint of Equation 5, it is sufficient to prove that $L_{latest}$ is the closest entry to $L_r$ in $T$. Suppose that $L_{latest}$ is not the closest write (init) entry to $L_r$ in $T$; then there exists another write access log $L_o$ between $L_{latest}$ and $L_r$ such that $L_o.id > L_{latest}.id$ (by the first constraint of Equation 5), which means that $L_o$ is the latest update of $addr$ and contradicts the assumption. In the end, all the write accesses are valid because all their parameters are explicit.
In the end, as a consequence of Theorem 3.1, given any read access at address $addr$ with fixed $tid$, $mid$ and $accessType$, we can check the validity of the returned value $v$ by checking $(tid, mid, accessType, addr, v) \in T$.
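The adjacent-row checks summarized in Remark 3 can be restated in a few lines of Python; this is only a sketch of the intended semantics of Equation 5, not the exact gate shapes used in ZAWA:

```python
from dataclasses import dataclass

@dataclass
class Row:
    addr: int
    tid: int
    mid: int
    access: str   # "init" | "read" | "write"
    value: int

def check_access_table(rows: list[Row]) -> bool:
    """Check the three adjacent-row constraints described in Remark 3."""
    if rows and rows[0].access != "init":          # the first address block starts with init
        return False
    for prev, curr in zip(rows, rows[1:]):
        if prev.addr == curr.addr:
            if (curr.tid, curr.mid) <= (prev.tid, prev.mid):        # sorted by access order
                return False
            if curr.access == "read" and curr.value != prev.value:  # reads see the latest value
                return False
            if curr.access == "init":              # init only at the start of a block
                return False
        elif curr.access != "init":                # a new address block must begin with init
            return False
    return True
```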
**4 ZAWA ARCHITECTURE CIRCUITS**
As we have prepared our circuit building blocks in Section 3, we now start constructing the main circuits involved in ZAWA. We first describe the workflow of ZAWA by splitting it into four stages to give a big picture of how the different circuits (see Figure 1) interplay with each other, and then we present the details of each circuit.
Step 1: Image Setup. As defined by the WASM specification, a WASM image $I$ is divided into sections. Among them, there are sections that do not affect the execution of WASM (custom section, type section, export section, data count section) and sections that decide the execution semantics (initial memory section, code section, global data section). At the image setup stage, we encode the code section into the lookup table $T_C$ and the data section into the lookup table $T_H$. These two tables are used to enforce that each instruction in the execution trace is a valid instruction and that all the init entries of the memory access log table comply with the initial data section of image $I$.
Fig. 1. Architecture circuits

Step 2: Execution Trace Generation. Recall that a valid execution trace is a sequence of transition functions $[t_0, t_1, \cdots]$ such that each $t_i$ is related to the $i$th instruction during the execution of $(I, E, IO)$. We use the standard WASM run-time interpreter to generate each $t_i$, which is valid as defined in Definition 2.1.
Remark 4. We do not require the WASM run-time interpreter to be a trusted component: if it generates an invalid sequence, the constraints of the Execution Circuit fail, because our Execution Circuit enforces the semantics of each instruction.
Step 3: Synthesis Circuits. Once a valid execution trace is generated, it can be used to fill our main execution circuit $T_E$, together with the other lookup tables $T_F$ (calling frame table), $T_M$ (memory access log table), $T_G$ (global access log table) and $T_{SP}$ (stack access log table).
Step 4: Proof Generation. After all the circuits are synthesised, we can generate a ZKSNARK proof via Halo2's proof system. The proof can be used to prove that the execution trace and its output are valid.
**4.1 Setup Circuits**
Setup circuits are filled by the ZAWA compiler component, and their purpose is to provide the lookup tables $T_C$, $T_H$, $T_G$ that encode the code section, initial memory section and global data section.
Code Section. The elementary items in the code section are opcodes of instructions that are grouped in a tree-like hierarchy. Each instruction can be indexed by $moid$ (module id), $mmid$ (memory block instance id), $fid$ (function id) and $iid$ (offset of the instruction in a particular function). We denote by $iaddr$ the tuple $(moid, mmid, fid, iid)$ and represent the code section as a map from $iaddr$ to $opcode$. Using the technique in Section 3.2, this is equivalent to encoding the code section into $T_C$ (see Table 3). The code table $T_C$ is later used to constrain entries in the execution table $T_E$ (see Section 4.2) such that if $e \in T_E$ then $(e.iaddr, e.opcode)$ must also be in $T_C$.
Table 3. Code table
| moid | mmid | fid | iid | opcode |
| ---- | ---- | ---- |:----:| ------ |
| 0x00 | 0x01 | 0x01 | 0x00 | add |
| 0x00 | 0x01 | 0x01 | 0x01 | sub |
| ... | ... | ... | ... | sub |
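One way to realize the map from $iaddr$ to $opcode$ as a single-column lookup is to pack the tuple and the opcode into one field element. The 16-bit limb width below is only an assumption made for illustration; ZAWA's actual bit layout is not specified here.

```python
# Hypothetical packing of a code-table row into one integer (field element).
LIMB = 16  # assumed width per component

def encode_code_row(moid: int, mmid: int, fid: int, iid: int, opcode: int) -> int:
    packed = 0
    for part in (moid, mmid, fid, iid, opcode):
        assert 0 <= part < 1 << LIMB
        packed = (packed << LIMB) | part
    return packed

# Rows of Table 3 become members of the lookup set T_C.
T_C = {
    encode_code_row(0x00, 0x01, 0x01, 0x00, 0x01),   # add at iaddr (0x00, 0x01, 0x01, 0x00)
    encode_code_row(0x00, 0x01, 0x01, 0x01, 0x02),   # sub at iaddr (0x00, 0x01, 0x01, 0x01)
}
```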
Initial Memory & Global Data. The elementary items in the memory section of a WASM image are unsigned 64-bit words (u64). The address of each u64 word can be indexed by $mmid$ and $offset$. Besides the value, memory can have types that are either mutable or immutable. Thus the memory section can be represented as a map from $(mmid, offset)$ to $(value, isMutable)$. Similarly, using the technique in Section 3.2, we can encode the initial memory section into $T_H$. Like the initial memory section, the global data section contains variable instances that can be shared between different modules, which can also be represented as a map from $(mmid, offset)$ to $(value, isMutable)$. Thus we merge the two tables into one and use $ltype = Memory\,|\,Global$ to distinguish them (see Table 4 for an example of $T_H$).
Table 4. Initial memory and global data table

| ltype | mmid | offset | value | isMutable |
|:------:|:-----:|:------:|:-----:|:---------:|
| heap | mmid0 | 1 | 0x01 | true |
| heap | mmid1 | 1 | 0x01 | true |
| Global | mmid2 | 1 | 0x01 | true |
| Global | mmid3 | 1 | 0x01 | false |
We use $T_H$ to constrain entries in the memory access log table $T_M$ (see Table 2) so that $\forall e, e \in T_M \wedge e.accessType = Init \rightarrow (e.iaddr, e.value) \in T_H$. The meaning of this constraint is that each init access log in $T_M$ must be defined in the initial memory section or global data section.
**4.2 Execution Trace Circuits**
Execution Trace Circuits are used to constrain the execution trace $[t_0, t_1, t_2, \cdots]$ (see Section 2.1) emulated by WASMI (a WASM interpreter). Each trace element is related to an instruction in the code table $T_C$ and has a predefined semantics based on its opcode. The semantics of a WASM opcode is defined based on its parameters derived from the stack and its micro operations. First, since WASM is a stack machine, we define the operands of an opcode $op$ to be
$operands(op) = p_0, p_1, p_2, \cdots, p_k$
where the $p_i$ are values on the stack and $p_i = stack[sp + i]$. Second, we define the semantics of $op$ by a sequence of micro operations

When filling the execution trace into the execution circuit, we arrange the instructions into small blocks (see Table 5) of the execution circuit such that each block represents an instruction. Within each block, we use the $start$ column to indicate whether a row is the start of a new instruction block and put $op$ and $mop$ in the opcode column. In the address column we push all used addresses, where the first row holds the instruction address of this instruction in $T_C$, and in the $sp$ column we record all the changes of the stack pointer.
Table 5. Execution table
| start | opcode | bit cell | state | aux | address ∈ T1 | sp | u64 cell |
| ----- |:------:| -------- |:-----:|:----:|:------------:|:---:|:--------:|
| true | op | b0 | tid0 | aux | iaddr0 | sp | w0 |
| 0 | mop0 | b1 | frame | aux0 | addr0 | ... | w1 |
| 0 | mop1 | b2 | ... | aux1 | addr1 | ... | w2 |
| 0 | mop2 | b3 | s3 | aux2 | addr2 | ... | w3 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| true | op1 | b | tid1 | aux | iaddr1 | sp' | w |
| ... | ... | ... | ... | ... | ... | ... | ... |
Although different opcodes might have different semantics and thus different $mop_k$, $addr_i$, etc., there are some common constraints that we need to enforce in the execution circuit. First, we need to enforce that each instruction exists in the code section, thus $(iaddr, opcode) \in T_C$. Second, suppose that operand $p_i$ is read from stack pointer $sp$ as a result of $mop_k(sp)$; then $(sp, read, iaddr, k, p_i) \in T_M$, which means the result $p_i$ is enforced by a valid memory access log table. Similarly, suppose that witness $w_i$ is obtained from a memory access of $addr_j$ with access type $ltype$ as a result of $mop_k$; then $(mem, addr_j, ltype, k, w_i) \in T_M$. Third, we enforce that all the cells in the bit column are either zero or one and that all the cells in the u64 witness column and operand column are in $T_{64}$ (less than $2^{64}$).
**4.3 Frame Circuit**
The Frame Circuit is a table (see Figure 2) that helps us find the next $iaddr$ of a return instruction (see Section 5.2). Each entry of $T_F$ is a tuple $(prevFrame, currentFrame, iaddr)$ where $currentFrame$ is the $tid$ of the call instruction that starts this call frame, $prevFrame$ is the $tid$ of the call instruction of the previous call frame, and $iaddr$ is the call instruction address of the current call frame. Suppose that $t_i$ is a return instruction at state $s_i$ with $(currentFrame, prevFrame)$ and the state $s_{i+1} = t_i \circ t_{i-1} \circ \cdots \circ t_0(s_0)$; then we constrain that
$$plookup(T_F, (prevFrame, currentFrame, s_{i+1}.(iaddr - 1))) = 0$$
to make sure the return address is correct (see Figure 2).
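The following toy emulator sketches how frame-table entries of the form $(prevFrame, currentFrame, iaddr)$ could be collected and how the return-time lookup is checked. It assumes, purely for illustration, a linear program in which an instruction's address equals its trace index $tid$, so that the return target minus one equals the call address.

```python
def trace_frames(instrs: list[tuple]) -> set[tuple]:
    """instrs: list of ("call",), ("return",) or ("other",) trace elements."""
    T_F: set[tuple] = set()
    frames: list[int] = []      # stack of previous frames
    current = 0                 # tid of the call that opened the active frame
    for tid, ins in enumerate(instrs):
        if ins[0] == "call":
            T_F.add((current, tid, tid))   # (prevFrame, currentFrame, call iaddr = tid)
            frames.append(current)
            current = tid
        elif ins[0] == "return":
            prev = frames.pop()
            # return constraint: (prevFrame, currentFrame, next_iaddr - 1) must be in T_F;
            # in this toy setting next_iaddr - 1 is the call address, i.e. `current`
            assert (prev, current, current) in T_F
            current = prev
    return T_F

trace_frames([("other",), ("call",), ("other",), ("return",)])
```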
**4.4 Access Log Circuit**
Recall that the access log circuit is a unique table corresponding to a valid memory access log sequence and satisfies Equation 5. In the WASM specification, an access log is used for three different types of access: memory access, stack access and global access. Each access log has a type field that is either Init, Read or Write, and all logs are sorted by $(address, (tid, tmid))$ where $address$ is indexed by $(mmid, offset)$, $tid$ is the transition index of the execution log that contains the access, and $tmid$ is the index of the access micro-op in that instruction (see Table 2).

Fig. 2. Frame circuit
**4.5 IO Circuits that Support Zero-knowledge**
Zero-knowledge of inputs is not supported in the WASM specification. Thus, to support private inputs which we do not want to leak, we need to add special instructions to ZAWA to distinguish between private and public inputs. We represent public inputs in a separate column and use a polynomial lookup to link input values with the result of $get\text{-}public\text{-}input(inputCursor)$ (see Figure 3). Similarly, we use a separate column to hold output data and use a polynomial lookup to enforce that the value we output in the execution circuit $aux$ cell lies in the output column. When dealing with private inputs from $get\_private\_input(inputCursor)$, we put them into the related cell with no constraints, as the proof system will hide the value for us.

Fig. 3. Public input circuit
**5 INSTRUCTION CIRCUITS**
With the architecture circuits prepared in Section 4, the remaining work is constructing the circuits $C_{op}$ of the various opcodes $op$ for the instructions supported by the WASM specification. Since the constraints defined on the execution trace circuit are applied on a row basis, and the cells of the constraints of each $op$ span multiple rows, we use the notation $c.(curr + k)$ to denote the $k$th cell in column $c$ after the current row.
For example, suppose that we want to define the constraints of the add instruction (see Figure 4) using the circuit layout in Table 6, where $w_1$, $w_2$ are read from the stack and $w_0$ is equal to the result of the add instruction, which is pushed back onto the stack. First, we know that by the definition of $add$, $w_0 = (w_1 + w_2) \bmod 2^{64}$; thus, by introducing a new witness $overflow$, we encode this mod semantics into the arithmetic constraint $w_0 + overflow \times 2^{64} = w_1 + w_2$.
Table 6. Add circuit within execution trace circuit
| start | opcode | bit cell | state | aux | address ∈ T1 | sp ∈ TF | u64 cell | extra |
| ----- |:----------:|:--------:|:-------:|:---:|:------------:|:-------:|:--------:|:-----:|
| true | add | overflow | tid | nil | iaddr0 | sp | w0 | nil |
| 0 | readStack | nil | nil | nil | nil | - | w1 | nil |
| 0 | readStack | nil | nil | nil | nil | - | w2 | nil |
| 0 | writeStack | nil | nil | nil | nil | - | w3 | nil |
| true | otherop | - | tid + 1 | nil | iaddr1 | sp' | w0 | nil |
def add :=
w1 = read(stack sp);
w2 = read(stack sp-1);
sp' = sp-1;
w0 = (w1 + w2) mod 2^64;
write(stack, sp-1, w0);
FALLTHROUGH
Fig. 4. Add instruction definition
Second, we enforce that the stack operations are valid, that is $(stack, read, sp, tid, 0, w_1) \in T_M$, $(stack, read, sp - 1, tid, 1, w_2) \in T_M$ and $(stack, write, sp - 1, tid, 2, w_0) \in T_M$. Third, we need to constrain that the next instruction must follow $iaddr_0$ in address, therefore $iaddr_1 = iaddr_0 + 1$. In the end, we constrain the $sp$ column by $sp' + 1 = sp$. Putting it all together, and replacing variables using the notation $columnName.(curr + k)$, we have Equation 6.

Since constraints are applied on a row basis of a circuit, we need to make sure that $C_{add}$ does not apply on rows that are not the starting row of an instruction block or that belong to a block with another opcode. A natural way to apply $C_{add}$ only on the necessary rows is to multiply $C_{add}(curr)$ with $curr.start \times (curr.opcode == op)$, so the final constraint related to opcode add is $curr.start \times (curr.opcode == op) \times C_{add}(curr) = 0$.
Remark 5. For better readability, from this point we will simply use the name of the cell instead of the notation $columnName.(curr + k)$ if no confusion is introduced by doing so.
The rest of this section is arranged in subsections describing the circuits of instructions in different categories. After we have constructed all the constraints $C_{op_i}$ for all opcodes $op_i$, we simply sum them up and get the final constraint $\mathcal{C}_{op}(curr) := \sum_i curr.start \times (curr.opcode == op_i) \times C_{op_i}(curr) = 0$ for $T_E$.
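A minimal sketch of this gating-and-summation pattern, using the add constraint from Table 6 as the only entry (toy row dictionaries rather than Halo2 cells): the point is only that the selector product $start \times (opcode == op_i)$ switches each per-opcode constraint on or off per row.

```python
# Each row is a dict of cell values; blocks follow the fixed layout of Table 6,
# and the table is assumed to be padded so that rows i+1 and i+2 always exist.
def c_add(rows: list[dict], i: int) -> int:
    """C_add on the block starting at row i: w0 + overflow*2^64 - (w1 + w2)."""
    w0, w1, w2 = rows[i]["u64"], rows[i + 1]["u64"], rows[i + 2]["u64"]
    return w0 + rows[i]["bit"] * 2**64 - (w1 + w2)     # zero iff the constraint holds

CONSTRAINTS = {"add": c_add}                           # one entry per opcode op_i

def c_op(rows: list[dict], i: int) -> int:
    """The summed, selector-gated constraint applied to row i."""
    start = 1 if rows[i]["start"] else 0
    return sum(start * (1 if rows[i]["opcode"] == op else 0) * c(rows, i)
               for op, c in CONSTRAINTS.items())
```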
**5.1 Numeric Instructions**
Numeric instructions are the majority of instructions in WASM. In general, the semantics of a numeric instruction contain five parts: parameter preparation, arithmetic calculation, writing the result back to the stack, updating the stack pointer, and FALLTHROUGH, as in Figure 5.
def arithop :=
param1 = read(stack sp); \\ parameters preparation
param2 = read(stack (sp-1)); \\ parameters preparation
...
paramN = read(stack (sp-N+1)); \\ parameters preparation
result = arith(param1, param2, param3, ..., paramN); \\ calculation
write(stack, (sp-N+1), result); \\ result write back
sp = sp-N+1;
FALLTHROUGH;
Fig. 5. Arithmetic instruction
Based on the arithmetic definition, we assign the cells in the execution trace circuit $T_E$ as in Table 7, and the constraints of the arithmetic opcodes are defined in Equation 7.
Table 7. Arithmetic instruction circuit within execution trace circuit
| start | opcode | bit cell | state | aux | address ∈ T1 | sp ∈ TF | u64 cell | extra |
|:-----:|:-------:|:--------:|:-------:|:---:|:------------:|:-------:|:--------:|:-----:|
| true | arithOp | nil | tid | nil | iaddr0 | sp | param0 | nil |
| 0 | nil | nil | nil | nil | nil | nil | ... | nil |
| 0 | nil | nil | nil | nil | nil | nil | paramN | nil |
| 0 | nil | nil | nil | nil | nil | nil | result | nil |
| true | otherop | - | tid + 1 | nil | iaddr1 | sp' | w'0 | nil |

**5.2 Control Flow Instructions**
In the WASM specification, there are three different categories of control flow: FALLTHROUGH, branch, and call (return). Implementation of FALLTHROUGH is already covered in Section 5.1, so it is sufficient to implement call (return) and branch.
Call (Return) Circuit. A call instruction first adds a new frame table entry $(tid, preFrameId, iaddr_0)$ into the frame circuit $T_F$, then loads the calling parameters onto the stack and goes to $iaddr_1$ for the next instruction (see Table 8 for the circuit layout of call). The circuit constraint for the call instruction is Equation 8.
Table 8. Circuit layout of call
| start | opcode | bit cell | state | aux | address ∈ T1 | sp ∈ TF | u64 cell | extra |
|:-----:|:-----------------:|:--------:|:----------:|:---:|:------------:|:-------:|:--------:|:-----:|
| true | call(targetIaddr) | nil | tid | nil | iaddr0 | sp | param0 | nil |
| 0 | nil | nil | preFrameId | nil | nil | nil | ... | nil |
| 0 | nil | nil | nil | nil | nil | nil | paramN | nil |
| true | otherop | - | tid + 1 | nil | iaddr1 | sp' | nil | nil |
| 0 | nil | nil | tid | nil | nil | nil | nil | nil |

To define the constraint for the return instruction, we need to find the correct return address of the current frame and set the frame state to the previous frame. Recall that, as described in Section 4.3, the entries in $T_F$ are used to help the return instruction find the correct calling frame and previous frame. Thus we can define the semantics of return by finding the correct return $iaddr$ from $T_F$, enforcing that the next instruction address is equal to that $iaddr$, and updating the frame state accordingly (see Table 9 for the circuit layout and Equation 9 for the circuit constraint).
Table 9. Circuit layout of return
| start | opcode | bit cell | state | aux | address ∈ T1 | sp ∈ TF | u64 cell | extra |
|:-----:|:-------:|:--------:|:-----------:|:---:|:------------:|:-------:|:--------:|:-----:|
| true | return | nil | tid | nil | iaddr0 | sp | nil | nil |
| 0 | nil | nil | prevFrameId | nil | nil | nil | nil | nil |
| true | otherop | - | tid + 1 | nil | iaddr1 | sp' | nil | nil |
| 0 | nil | nil | nFrameId | nil | iaddr0 | nil | nil | nil |

Branch Circuit. Branch instructions in WASM include br, br_if, if * then * else *, etc. The semantics of branch instructions can be uniformly abstracted as three steps (see Figure 6). The circuit layout
def branchop :=
param1 = read(stack sp); \\ parameters preparation
param2 = read(stack (sp-1)); \\ parameters preparation
...
paramN = read(stack (sp-N+1)); \\ parameters preparation
iaddr1 = select(param1, param2, ..., paramN); \\ calculate branch address
GOTO iaddr1; \\ branch to target address
Fig. 6. Semantic of branch instruction
of the branch instruction is sketched in Table 10 and its circuit constraint is defined in Equation 10.
Table 10. Circuit layout of branch
| start | opcode | bit cell | state | aux | address ∈ T1 | sp ∈ TF | u64 cell | extra |
| ----- |:--------:|:--------:|:-------:|:---:|:------------:|:-------:|:--------:|:-----:|
| true | branchop | nil | tid | nil | iaddr0 | sp | param0 | nil |
| 0 | nil | nil | frameId | nil | nil | nil | ... | nil |
| 0 | nil | nil | nil | nil | nil | nil | paramN | nil |
| true | otherop | - | tid+1 | nil | iaddr1 | sp' | nil | nil |
| 0 | nil | nil | frameId | nil | nil | nil | nil | nil |

**5.3 Memory (Stack, Global) Instructions**
Memory, Stack and Global instructions can be abstracted as a tuple $(category, type, address, size = 8|16|32|64, value)$ where $category$ can be Memory, Stack or Global and $type$ can be Init, Read or Write. The layout of the circuit is defined in Table 11. By using the access log circuit defined in Section 4.4, the constraint for the memory circuit is simply $(category, ltype, tid, address, value') \in T_M \wedge trunc(value', size) = value$.
Remark 6. For read, this constraint ensures the result read from $address$ is valid. For write, $T_M$ ensures that the next read of $address$ will return the previously written value correctly.
Table 11. Memory access circuit within execution trace circuit
| start | opcode | bit cell | state | aux | address ∈ T1 | sp ∈ TF | u64 cell | extra |
|:-----:|:-------------:|:--------:|:-----------:|:---:|:------------:|:-------:|:--------:|:-----:|
| true | op(type,size) | nil | tid | nil | iaddr0 | sp | address | nil |
| 0 | nil | nil | frameId | nil | nil | nil | value | nil |
| true | otherop | - | tid+1 | nil | iaddr1 | sp' | w'0 | nil |
| 0 | nil | nil | frameId=tid | nil | nil | nil | w3 | nil |
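For concreteness, the $trunc$ used in the constraint above can be read as keeping the lowest $size$ bits of the looked-up u64 word; a one-line sketch of that reading (an interpretation, not ZAWA's exact gate):

```python
def trunc(value: int, size: int) -> int:
    """Keep the lowest `size` bits of a u64 word (size in {8, 16, 32, 64})."""
    assert size in (8, 16, 32, 64)
    return value & ((1 << size) - 1)

assert trunc(0xFFFF_FFFF_0000_00AB, 8) == 0xAB
```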
**5.4 Customized Instruction Extension**
Given a fixed image, an entry function and an array of input arguments, the execution trace is determined, which means the number of instructions is fixed. As described in Section 4.2, each instruction occupies $n$ (a fixed number of) rows in the execution circuit $T_E$, so the total number of rows of $T_E$ is fixed. When generating a proof in Halo2 using the KZG commitment, each column is interpolated into a polynomial using the FFT (fast Fourier transform). Because the FFT is an algorithm of $N \log N$ complexity, the total number of rows of $T_E$ affects the overall performance in a nonlinear way. Thus, to reduce the number of rows, a good approach is to compress multiple instructions into one.
ZAWA supports two ways of customizing foreign instructions for the purpose of compression. One is implementing customized inline opcodes and the other is using external proofs of specific foreign functions.
Compress using customized inline instructions. When the semantics of the instructions we would like to compress are simple and can fit into one instruction block, we can use the inline extension. For
example, suppose that we want to sum the lowest 4 bits of $x : u64$ by a function $sumLowest(x)$. If we use a standard loop to implement the algorithm, it will take 4 instructions to extract the 4 bits and another 2 to do the sum. However, if we inline this function into a customized inline instruction, then we can encode the arithmetic constraint within one instruction block. As a case study, we compare the SHA256 execution trace with and without inline instructions in Table 12.
Table 12. Row reduction by using customized instructions
| original | original rows | customized | optimized rows |
|:-----------------------------------------------------:| ------------- |:------------:|:--------------:|
| λx, y, z. (x & y) ⊕ (complement(x) & z) | 4 | ch(x, y, z) | 1 |
| λx, y, z. z ⊕ (x & (y ⊕ z)) | 2 | maj(x, y, z) | 1 |
| λx. rotr32(x, 2) ⊕ rotr32(x, 13) ⊕ rotr32(x, 22) | 5 | lsigma0(x) | 1 |
| λx. rotr32(x, 6) ⊕ rotr32(x, 11) ⊕ rotr32(x, 25) | 5 | lsigma1(x) | 1 |
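A sketch of the hypothetical $sumLowest$ example from the paragraph above: inside a single instruction block, the circuit can hold four bit cells, constrain each to be boolean, tie them to the low bits of $x$, and constrain the result to be their sum. The cell names are invented for illustration.

```python
def sum_lowest_constraints(x: int, bits: list[int], result: int) -> bool:
    """Constraints a customized inline instruction could impose for sumLowest(x)."""
    assert len(bits) == 4
    booleans = all(b * (b - 1) == 0 for b in bits)                   # each bit cell is 0 or 1
    decompose = (x % 16) == sum(b << i for i, b in enumerate(bits))  # bits are the low 4 bits of x
    summed = result == sum(bits)                                     # result is their sum
    return booleans and decompose and summed

assert sum_lowest_constraints(0b1011, [1, 1, 0, 1], 3)
```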
Compress using customized foreign functions. When the semantics of the instructions we would like to compress are too complex to fit into one instruction block but can be abstracted into a pure function, we can use foreign functions to compress the execution trace. A foreign function $f$ in ZAWA is a special purpose circuit that can be used to constrain that the input and output of $f$ are valid. Although foreign calls save rows in the execution trace, they introduce extra costs when more circuits are added.
**6 PROGRAM PARTITION AND PROOF BATCHING**
As discussed in Section 5.4, when encoding the execution trace in $T_E$, each instruction takes a constant number of rows. In the Halo2 proof system, there is a limit to the total number of rows of arithmetic circuits [25]. Therefore, for large WASM images, we probably cannot fit the whole execution trace into $T_E$. To solve the problem of long execution traces, ZAWA uses the technique of program partition and proof batching. The idea is that we split the execution trace $[t_0, t_1, \cdots]$ into a group of sub-sequences, generate an execution proof for each group and batch all the proofs in the end. Here we first give a rough sketch of the technique and then present the extra constraints we need to provide when batching sub-proofs.
Given an execution sequence $\{t_i\}$, we split it into small execution chunks $t_{[a,b]} = t_a, t_{a+1}, \cdots, t_{b-1}$ and denote by $M_{[a,b]}$, $SP_{[a,b]}$, $G_{[a,b]}$ the memory, stack and global access logs related to $t_{[a,b]}$. We notice that given an execution trace $t_{[a,b]}$, by using the arithmetic circuits constructed in Section 4, we can prove that $t_{[a,b]}$ is valid under the context $(F, M_{[a,b]}, SP_{[a,b]}, I(C, H), IO)$. We denote by $P_{[a,b]}$ the proof of the valid execution of $t_{[a,b]}$, and by $P$ the proof of the valid execution of $\{t_i\}$ under the full access log $(iaddr, F, M, G, SP, I, IO)$. Now it remains to find out what conditions we need to enforce so that
$$(P_{[0,k-1]} \wedge P_{[k,2k-1]} \wedge \cdots \wedge P_{[0,end]}) \rightarrow P.$$
Thus it is sufficient to make sure that for each $t_i \in t_{[a,b]}$, the constraints applied on it in $P_{[a,b]}$ are equivalent to the constraints applied on it in $P$. As we have presented in Section 4, the constraints applied on each instruction block in $T_E$ contain two parts: polynomial constraints about the cells of the current and next instruction block, and polynomial lookup constraints of the state (memory, stack, global) access logs.
Equivalence of Polynomial Constraints. Regarding the polynomial constraints of cells, it is easy to check that if $t_i \in t_{[a,b]}$ and $i < b$ then all polynomial constraints on the cells of the instruction block of $t_i$ in $P_{[a,b]}$ are equal to those in $P$. So it remains to constrain that the last instruction of $t_{[a,b]}$ has the same polynomial constraints in both $P_{[a,b]}$ and $P$. However, this is not true in general: if we split the execution sequence into disjoint blocks, the connection between two consecutive sub-sequences is lost. Therefore, to solve this problem, we pad a glue instruction at the end of each sub-sequence and enforce that the address of the glue instruction equals the address of the first instruction of the next block (see Figure 7). By doing so we can check that the polynomial constraints of each $t_i$ in $P_{[a,b]}$ are equivalent to those in $P$.

Fig. 7. Split execution sequence into sub sequence
Equivalence of Polynomial Lookup. Given a constraint of polynomial lookup for a cell in $t_i$, we need to show that $c \in T_M$ if and only if $c \in \cup T_{M_k}$. By the definition of Equation 5 we know that this property holds if and only if the concatenation of the $T_{M_k}$ satisfies Equation 5. Noticing that Equation 5 only constrains adjacent rows, we extract a glue table $T_{GM}$ for the $T_{M_k}$ $(k = 1, 2, \cdots)$ as in Figure 8, and then it follows that $T_M = \cup T_{M_k}$ if $T_M$, $T_{M_k}$ and $T_{GM}$ all satisfy Equation 5.

Fig. 8. Split memory access log into sub log
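Because Equation 5 only relates adjacent rows, checking every chunk $T_{M_k}$ plus the boundary pairs collected in the glue table $T_{GM}$ covers every adjacent pair of the concatenated log. The sketch below states that argument in Python, parameterized by a per-pair predicate standing in for Equation 5:

```python
def check_glued_log(chunks: list[list], adjacent_ok) -> bool:
    """adjacent_ok(prev, curr) stands for the adjacent-row constraints of Equation 5."""
    for chunk in chunks:                                       # each T_Mk satisfies Equation 5
        if not all(adjacent_ok(p, c) for p, c in zip(chunk, chunk[1:])):
            return False
    boundaries = [(left[-1], right[0])                         # the glue table T_GM
                  for left, right in zip(chunks, chunks[1:]) if left and right]
    return all(adjacent_ok(p, c) for p, c in boundaries)       # every remaining adjacent pair
```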
As a conclusion, to solve the long execution trace problem, we split the execution trace into $t_{[a_0,b_0]}, t_{[a_1,b_1]}, t_{[a_2,b_2]}, \cdots$ and construct $T_{M_k}$, $T_{SP_k}$, $T_{G_k}$ for each execution block $t_{[a_k,b_k]}$. Suppose that $P_k$ proves the valid execution of $t_{[a_k,b_k]}$ and $T_{GM}$ is the gluing map constructed as in Figure 8; then we claim that a proof $P_{batch}$ can prove that the execution sequence $\{t_i\}$ is a valid execution if and only if $P_{batch}$ is the batched proof of all the constraints in Equation 11.

In ZAWA, we write the verifying algorithm of $P_k$ into arithmetic circuits $V_k$, and the total batch circuit of Equation 11 is constructed by putting the verifying circuits together with the circuits that perform the other simple checks.
Remark 7. Proof batching is an active research topic. Instead of writing the verification function into arithmetic circuits, there are other methods [5, 13, 24, 31] that are worth trying as well. Since we focus more on the consistency of program partition and memory access logs in this paper, we leave the analysis of different batching methods as future work.
**7 PERFORMANCE BENCHMARK**
All the benchmark test suites are run on a machine with an AMD Ryzen 7 5800X3D 8-Core Processor, one GeForce RTX 3090 graphics card and 4 × 32 GB of DDR4-2133 RAM.
**7.1 Performance without Program Partition and Proof Batching**
Among programs whose valid execution trace fits into the maximum sequence size of ZAWA, we measure the performance of ZAWA on two special programs: the Fibonacci function, which has a deep call stack, and the binary search function, which has frequent memory access. As shown in Table 13 and Table 14, circuit size (CS) denotes the total number of rows of circuits in ZAWA as a power of 2, and trace size denotes the total number of instructions included in the execution trace. Proof time is the time ZAWA used to create the proof for the valid execution, and verify time is the time for a verifier to check the proof. The memory swap column indicates whether the overall memory consumption of ZAWA is larger than 128G.
Table 13. Benchmark for fibonacci in ZAWA
| circuit size | trace size | call depth | proof time | verify time | memory swap |
|:------------:| ----------- |:----------:| ---------- |:-----------:|:-----------:|
| 18 | 9037 | 13 | 44s | 22ms | false |
| 19 | 23677 | 15 | 88s | 24ms | false |
| 20 | 38317 | 16 | 178s | 22ms | false |
| 21 | 100333 | 18 | 358s | 22ms | false |
| 22 | 162349 | 19 | 828s | 29ms | true |
From Table 13 and Table 14 we see that the verify time is $O(1)$ and the proof time grows linearly as the size of the circuit grows. When the memory consumption of the simulated image (Fibonacci) is small, the instruction volume grows linearly as the circuit size grows (see Table 13). When the memory consumption of the simulated image (binary search) grows linearly with the circuit size, the instruction volume grows slowly, as shown in Table 14.
**7.2 Performance for Large Program with Proof Batching**
For a program with a large execution sequence, we split the execution trace into partitions and generate a sub-proof for each partition. To batch these sub-proofs, we use the batching circuit
Table 14. Benchmark for binary search in ZAWA
| circuit | trace size | search buf size(64k per page) | proof time | verify time | memory swap |
| ------- | ---------- |:-----------------------------:|:----------:| ----------- |:-----------:|
| 18 | 585 | 26 | 44.200s | 22ms | false |
| 19 | 616 | 63 | 87.200s | 24ms | false |
| 20 | 647 | 124 | 173.200s | 22ms | false |
| 21 | 678 | 246 | 342.200s | 24ms | false |
| 22 | 809 | 490 | 803.200s | 25ms | true |
(see Section 6) to generate a batched proof. Therefore, the time to create the final proof of a large program is the sum of the sub-proof generation times plus the proof batching time.
Table 15. Benchmark for proof batching in ZAWA
| partition CS | partition proof | total pieces | batching CS | batching proof | verify time | memory swap |
|:------------:| --------------- |:------------:| ----------- | -------------- |:-----------:|:-----------:|
| 18 | 2 | 1 | 22 | 168s | 4.7ms | false |
| 18 | 2 | 2 | 22 | 326s | 4.63ms | false |
| 20 | 2 | 1 | 22 | 168s | 4.77ms | false |
| 20 | 2 | 2 | 22 | 324s | 5.05ms | false |
| 22 | 2 | 1 | 22 | 168s | 5.07ms | false |
| 22 | 2 | 2 | 22 | 325s | 4.93ms | true |
It can be seen in Table 15 that the proving time for batching sub-proofs with different partition sizes is constant, and the only factor that affects the batching proof time is the number of sub-proof pieces.
In conclusion, if we have a large program for ZAWA and we want to optimize the total time of generating the final proof, the factors we need to consider are the following: first, what partition size we use to split the whole execution sequence; second, how many pieces we should put together in one proof batch; third, how we generate the proofs in parallel.
**8 CONCLUSION & FURTHER WORK**
We presented ZAWA, a WASM virtual machine that leverages the technology of ZKSNARK. As far as we know, we are the first to present a way to implement the semantics of a WASM virtual machine in arithmetic circuits. Using ZAWA, we are able to deploy serverless services in an untrusted cloud, since ZAWA not only emulates the execution but also gives a correctness proof of the execution result. Hence any vulnerability in the cloud will not affect the correctness of the verifiable result (a faulty result cannot have a correct ZKSNARK proof).
For large applications that produce long execution traces, we successfully applied the execution partition and proof batching technique so that ZAWA scales as the program size grows. Regarding performance, various optimizations can be applied in the future, including improving the commitment scheme [12], adopting a better parallel computing strategy [44], and using SNARK-specific hardware [38, 45].
**REFERENCES**
[1] Farag Azzedin and Muthucumaru Maheswaran. Evolving and managing trust in grid computing systems. In IEEE CCECE 2002. Canadian Conference on Electrical and Computer Engineering. Conference Proceedings (Cat. No. 02CH37373), volume 3, pages 1424–1429. IEEE, 2002.
[2] Olivier Bégassat, Alexandre Belling, Théodore Chapuis-Chkaiban, and Nicolas Liochon. A specification for a zk-evm, 2021.
[3] Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, Eran Tromer, and Madars Virza. Snarks for c: Verifying program executions succinctly and in zero knowledge. In Annual Cryptology Conference, pages 90–108. Springer, 2013.
[4] Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, Eran Tromer, and Madars Virza. Snarks for c: Verifying program executions succinctly and in zero knowledge. In Annual Cryptology Conference, pages 90–108. Springer, 2013.
[5] Eli Ben-Sasson, Alessandro Chiesa, Eran Tromer, and Madars Virza. Scalable zero knowledge via cycles of elliptic curves. Algorithmica, 79(4):1102–1160, 2017.
[6] Dan Boneh, Justin Drake, Ben Fisch, and Ariel Gabizon. Efficient polynomial commitment schemes for multiple points and polynomials. Cryptology ePrint Archive, 2020.
[7] Dan Boneh, Justin Drake, Ben Fisch, and Ariel Gabizon. Halo infinite: Recursive zk-snarks from any additive polynomial commitment scheme. Cryptology ePrint Archive, 2020.
[8] Jonathan Bootle, Andrea Cerulli, Pyrros Chaidos, Jens Groth, and Christophe Petit. Efficient zero-knowledge arguments for arithmetic circuits in the discrete log setting. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 327–357. Springer, 2016.
[9] Benedikt Bünz, Jonathan Bootle, Dan Boneh, Andrew Poelstra, Pieter Wuille, and Greg Maxwell. Bulletproofs: Short proofs for confidential transactions and more. In 2018 IEEE Symposium on Security and Privacy (SP), pages 315–334. IEEE, 2018.
[10] Benedikt Bünz, Ben Fisch, and Alan Szepieniec. Transparent snarks from dark compilers. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 677–706. Springer, 2020.
[11] Bor-Yuh Evan Chang, Karl Crary, Margaret DeLap, Robert Harper, Jason Liszka, Tom Murphy VII, and Frank Pfenning. Trustless grid computing in concert. In International Workshop on Grid Computing, pages 112–125. Springer, 2002.
[12] Binyi Chen, Benedikt Bünz, Dan Boneh, and Zhenfei Zhang. Hyperplonk: Plonk with linear-time prover and high-degree custom gates. Cryptology ePrint Archive, 2022.
[13] Alessandro Chiesa, Lynn Chua, and Matthew Weidner. On cycles of pairing-friendly elliptic curves. SIAM Journal on Applied Algebra and Geometry, 3(2):175–192, 2019.
[14] Alessandro Chiesa, Yuncong Hu, Mary Maller, Pratyush Mishra, Noah Vesely, and Nicholas Ward. Marlin: Preprocessing zksnarks with universal and updatable srs. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 738–768. Springer, 2020.
[15] Stephen Chong, Eran Tromer, and Jeffrey A Vaughan. Enforcing language semantics using proof-carrying data. Cryptology ePrint Archive, 2013.
[16] Craig Costello, Cédric Fournet, Jon Howell, Markulf Kohlweiss, Benjamin Kreuter, Michael Naehrig, Bryan Parno, and Samee Zahur. Geppetto: Versatile verifiable computation. In 2015 IEEE Symposium on Security and Privacy, pages 253–270. IEEE, 2015.
[17] Karl Crary and Susmit Sarkar. Foundational certified code in a metalogical framework. In International Conference on Automated Deduction, pages 106–120. Springer, 2003.
[18] Jacob Eberhardt and Stefan Tai. Zokrates - scalable privacy-preserving off-chain computations. In 2018 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), pages 1084–1091. IEEE, 2018.
[19] Ariel Gabizon, Zachary J Williamson, and Oana Ciobotaru. Plonk: Permutations over lagrange-bases for oecumenical noninteractive arguments of knowledge. Cryptology ePrint Archive, 2019.
[20] Phani Kishore Gadepalli, Sean McBride, Gregor Peach, Ludmila Cherkasova, and Gabriel Parmer. Sledge: A serverless-first, light-weight wasm runtime for the edge. In Proceedings of the 21st International Middleware Conference, pages 265–279, 2020.
[21] Jens Groth. Efficient zero-knowledge arguments from two-tiered homomorphic commitments. In International Conference on the Theory and Application of Cryptology and Information Security, pages 431–448. Springer, 2011.
[22] Jens Groth. On the size of pairing-based non-interactive arguments. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 305–326. Springer, 2016.
[23] Jens Groth and Mary Maller. Snarky signatures: Minimal signatures of knowledge from simulation-extractable snarks. In Annual International Cryptology Conference, pages 581–612. Springer, 2017.
[24] Ulrich Haböck, Alberto Garoffolo, and Daniele Di Benedetto. Darlin: Recursive proofs using marlin. arXiv preprint arXiv:2107.04315, 2021.
[25] Halo2. The halo2 book, 2020. Accessed on 2022-9-3.
[26] Max Hoffmann, Michael Klooß, and Andy Rupp. Efficient zero-knowledge arguments in the discrete log setting, revisited. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 2093–2110, 2019.
[27] Marek Jawurek, Florian Kerschbaum, and Claudio Orlandi. Zero-knowledge using garbled circuits: how to prove non-algebraic statements efficiently. In 2013 ACM SIGSAC Conference on Computer and Communications Security (CCS'13), pages 955–966. ACM, 2013.
[28] Yier Jin and Yiorgos Makris. Proof carrying-based information flow tracking for data secrecy protection and hardware trust. In 2012 IEEE 30th VLSI Test Symposium (VTS), pages 252–257. IEEE, 2012.
[29] Aniket Kate, Gregory M Zaverucha, and Ian Goldberg. Polynomial commitments. Tech. Rep, 2010.
[30] Ahmed Kosba, Charalampos Papamanthou, and Elaine Shi. xjsnark: A framework for efficient verifiable computation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 944–961. IEEE, 2018.
[31] Abhiram Kothapalli, Srinath Setty, and Ioanna Tzialla. Nova: Recursive zero-knowledge arguments from folding schemes. In Annual International Cryptology Conference, pages 359–388. Springer, 2022.
[32] Mary Maller, Sean Bowe, Markulf Kohlweiss, and Sarah Meiklejohn. Sonic: Zero-knowledge snarks from linear-size universal and updatable structured reference strings. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, pages 2111–2128, 2019.
[33] Brian McFadden, Tyler Lukasiewicz, Jeff Dileo, and Justin Engler. Security chasms of wasm. NCC Group Whitepaper, 2018.
[34] Bryan Parno, Jon Howell, Craig Gentry, and Mariana Raykova. Pinocchio: Nearly practical verifiable computation. Communications of the ACM, 59(2):103–112, 2016.
[35] Juha Partala, Tri Hong Nguyen, and Susanna Pirttikangas. Non-interactive zero-knowledge for blockchain: A survey. IEEE Access, 8:227945–227961, 2020.
[36] Luke Pearson, Joshua Fitzgerald, Héctor Masip, Marta Bellés-Muñoz, and Jose Luis Muñoz-Tapia. Plonkup: Reconciling plonk with plookup. Cryptology ePrint Archive, 2022.
[37] Siani Pearson. Taking account of privacy when designing cloud computing services. In 2009 ICSE Workshop on Software Engineering Challenges of Cloud Computing, pages 44–52. IEEE, 2009.
[38] Bo Peng, Yongxin Zhu, Naifeng Jing, Xiaoying Zheng, and Yueying Zhou. Design of a hardware accelerator for zero-knowledge proof in blockchains. In International Conference on Smart Computing and Communication, pages 136–145. Springer, 2020.
[39] RISC Zero. RISC Zero: Overview of the zkvm, 2022. Accessed on 2022-9-3.
[40] Hassan Takabi, James BD Joshi, and Gail-Joon Ahn. Security and privacy challenges in cloud computing environments. IEEE Security & Privacy, 8(6):24–31, 2010.
[41] Vitalik. The different types of zk-evms, 2022. Accessed on 2022-9-3.
[42] Riad S Wahby, Srinath Setty, Max Howald, Zuocheng Ren, Andrew J Blumberg, and Michael Walfish. Efficient ram and control flow in verifiable outsourced computation. Cryptology ePrint Archive, 2014.
[43] Gavin Wood and Jutta Steiner. Trustless computing—the what not the how. In Banking Beyond Banks and Money, pages 133–144. Springer, 2016.
[44] Howard Wu, Wenting Zheng, Alessandro Chiesa, Raluca Ada Popa, and Ion Stoica. DIZK: A distributed zero knowledge proof system. In 27th USENIX Security Symposium (USENIX Security 18), pages 675–692, 2018.
[45] Charles F Xavier. Pipemsm: Hardware acceleration for multi-scalar multiplication. Cryptology ePrint Archive, 2022.
[46] Zhifeng Xiao and Yang Xiao. Security and privacy in cloud computing. IEEE Communications Surveys & Tutorials, 15(2):843–859, 2012.
Received 10 November 2022