Untrusted Program & Mprotect Document

# Untrusted Program & Mprotect Document This documentation may be edited at any time for clarification, typos, and added details. ## Requirements **Must** - Have a master toggle that is defined within a program’s ELF’s NOTE segment. When set to “1”, it will allow for executing untrusted instructions and enabling page protections (e.g. for every memory access check for permissions and allow `mprotect` to be called). That toggle must be read from the ELF when parsing it. - Read from the ELF file the initial page permissions from the LOAD segments. If the master toggle is on, it’ll initialize the executor’s page prot hashmap with those values and contribute those values to the program’s global initial cumulative sum. - When the master toggle is on, be able to verifiably read encoded rv64 instruction from memory and verifiably decode them before executing. - When the master toggle is on, assign/update read, write, and/or execute permissions for memory pages (4096 bytes) via the `mprotect` syscall. - When the master toggle is on, respects the page prot permissions when accessing memory for reading, writing, or executing. (e.g. a valid SP1 proof can’t be generated if memory operations are rejected b/c of permissions). **Should** - Have negligible performance impact for programs that don’t have the master toggle set to “1". **Should Not** - Develop a full-fledged OS model that include multi-tenants, processes etc. ## Instruction Decoding & Untrusted Program An additional feature of SP1 Hypercube is the ability to run untrusted programs. Basically, this allows an instruction that was written to memory, instead of being a preprocessed instruction derived from the ELF, to be ran as a part of the execution. This requires the following primitives, which this document will go over. - Fetching the instruction from `pc` can now be also from an untrusted program, rather than directly from the `ProgramTable`. This should be accounted for in all the adapters. - The 32-bit instruction should be fetched from the corresponding memory address. - The 32-bit instruction should be decoded into the form used inside the `ProgramChip`. - A page protection rule must be implemented for the VM's safety. **The Adapters** Recall the following code in the `ALUTypeReader`. ```rust= builder.assert_bool(is_real.clone()); let is_untrusted = is_real.clone() - cols.is_trusted; builder.assert_bool(is_untrusted.clone()); builder.assert_bool(cols.is_trusted); // A real row must be executing either a trusted program or untrusted program. builder.assert_eq(is_untrusted.clone() + cols.is_trusted, is_real.clone()); // If the row is running an untrusted program, the page protection checks must be on. let public_values = builder.extract_public_values(); builder.when(is_untrusted.clone()).assert_one(public_values.is_untrusted_programs_enabled); // ... builder.send_program(pc, instruction.clone(), cols.is_trusted); builder.send_instruction_fetch( pc, instruction, instr_field_consts, [clk_high.clone(), clk_low.clone()], is_untrusted.clone(), ); ``` Here, `is_trusted` is whether the instruction is read from the ELF itself, and `is_untrusted` is whether the instruction is read as an untrusted program from memory. In a padding row, both must be false, and in a real row, exactly one must be true. In the case of `is_trusted == 1`, the `instruction` is fetched via `send_program`: handled through the preprocessed `ProgramChip`. In the case of `is_untrusted == 1`, its fetched via `send_instruction_fetch`. Note that `is_untrusted == 1` is possible even when reading the instruction from the ELF: since the memory at `pc` is initialized with the appropriate instruction, using `is_untrusted == 1` and fetching the instruction from the memory will lead to the same result as `is_trusted == 1`, but less efficient as the decoding process will be done. The only case where this may cause an issue is where a memory address `pc` which belongs to the trusted ELF program instruction becomes overwritten with a new instruction, then ran as an untrusted program. To defend against this possibility, we initialize the page permissions based on the ELF's permission flags: this will prevent the said `pc` being writable. We also note the `public_values.is_untrusted_programs_enabled`. This is a public value that denotes whether or not the untrusted programs are possible. This value is checked to be boolean in `eval_public_values`, and checked to be equal to the flag inside the `vk`. If this value is turned off, then no page protection rules will be checked, but the AIR will constrain that only trusted instructions from the ELF will be ran. Therefore, we check that if `is_untrusted` is true, then this public values is toggled on. **Fetching the Encoded Instruction** This is done in the `InstructionFetchChip`. The columns are ```rust= pub struct InstructionFetchCols<T> { pub clk_high: T, pub clk_low: T, pub pc: [T; 3], pub top_two_limb_inv: T, pub instruction: InstructionCols<T>, pub instr_type: T, pub base_opcode: T, pub funct3: T, pub funct7: T, pub memory_access: MemoryAccessCols<T>, /// The selected 32 bits of read memory, in this case the 32 bit encoded instruction. pub selected_word: [T; 2], pub offset: T, pub is_real: T, } ``` Here, the columns have the following meaning - `clk_high`, `clk_low`: current CPU timestamp, required to read the memory - `pc`: the program counter (memory address) to access - These `pc` are also range checked to be three u16 limbs - `memory_access`: the memory access to the `pc` using the Blum argument - `offset`: equal to `0` if `pc == 0 (mod 8)` , or `1` if `pc == 4 (mod 8)`. - Here, the `pc` is enforced to be a multiple of `4`. Note that this is important to ensure that the `HALT_PC` related logic explained above is still in tact. - `selected_word`: the u32 instruction selected from the u64 value read from memory. - `top_two_limb_inv`: inverse to check that `pc >= 2^16` is true. - `instruction`, `instr_type`, `base_opcode`, `funct3` , and `funct7` values are forwarded off to the `InstructionDecode` table to ensure the correct decoding. The chip handles the following interactions. - It has a `receive_instruction_fetch`, which receives the instruction request from the adapters. This is similar to what happens in `ProgramChip`. The information sent & received is the current pc, current clk, the decoded instruction, and constants related to the opcode executed (used for proving correct decoding and more details below) - It has a `send_instruction_decode`, which sends the selected word and asks the `InstructionDecodeChip` to verify its decoding. The selected word, the decoded instruction, and the constants related to the opcode are sent. - Finally, it has a `send_page_prot`, which asks to verify that the page related to the `pc` address has the executable permissions toggled on. This is handled in `PageProtChip`. **Decoding the Encoded Instruction** As mentioned, this is done in the `InstructionDecodeChip`. ```rust= pub struct InstructionDecodeCols<T> { pub multiplicity: T, pub instruction: InstructionCols<T>, pub instr_type: T, pub funct3: T, pub funct7: T, pub is_r_type: T, pub is_i_type: T, pub is_i_type_shamt: T, pub is_i_type_shamt_32: T, pub is_j_type: T, pub is_b_type: T, pub is_s_type: T, pub is_u_type: T, pub is_a_0: IsZeroOperation<T>, pub encoded_instruction: [T; 2], pub encoded_instruction_bits: [T; 32], } ``` The columns have the following meaning - `instruction` is the decoded instruction. - `instr_type` is the type of instruction for this instruction (e.g. RType, IType, etc…). Each RISC-V opcode table will send this based on the instruction. This column should have values of the enum in `crates/core/executor/src/instruction.rs`). It’s basically a bitmap for the instruction type where all the bits are mutually exclusive. - `funct3` is funct3 for the instruction, and is sent by the RISC-V opcode tables. - `funct7` is funct7 for the instruction, and is sent by the RISC-V opcode tables. This is only relevant for some of the instruction types. For others, it's constrained to be zero. - The type selectors determine the type. A non-padding row has exactly one selector turned on, and the said selector will be determined based on the `instr_type` value. - `is_a_0` is a gadget used to check if `op_a` is equal to 0. - `encoded_instruction` is the encoded instruction stored in two 16 bit limbs. - `encoded_instruction_bits` is the bitwise decomposition of `encoded_instruction`. Once the `encoded_instruction` is bit-decomposed, the circuit aims to do the instruction decoding according to the RISC-V specification. Note that the `InstructionCols` format is - `opcode`, the Opcode as defined in SP1 executor - `op_a, op_b, op_c`: register or immediate values - `op_a_0`: whether or not `op_a == 0` - `imm_b, imm_c`: whether or not `op_b, op_c` are immediates Here, note that `opcode` itself is not constrained: the guaranteed use of a correct opcode is done through the `base_opcode, instr_type, funct3, funct7` values. However, the `op_a, op_b, op_c, op_a_0, imm_b, imm_c` values are constrained at all times. **One part to look out for in the audit is to verify that the values `base_opcode, instr_type, funct3, funct7` are correct according to the RISC-V64 specification, and that these values guarantee a unique decoding of the 32-bit instruction value.** **Note that `ECALL` is not one of the possibilities in the instruction decoding.** This is because untrusted programs are not allowed to run an `ECALL`. This is checked in `SyscallInstrsChip`, as shown below. We'll discuss this further in the next section. ```rust= // Assert that only the trusted program can call ecall. builder.when(local.is_real).assert_one(local.adapter.is_trusted); ``` Note also that there's a `multiplicity` column, so that the `receive_instruction_decode` can happen with any multiplicity. This is to make it possible that the same encoded instruction do not have to be decoded multiple times. Note that this is fine, as the entire message within that `receive_instruction_decode` is guaranteed to be correct. This multiplicity is also checked to be zero when the row is a padding row, as shown below. ```rust= // Constrain the interaction with instruction decode table let untrusted_instruction_const_fields = [ local.instr_type.into(), decoded_base_opcode, local.funct3.into(), local.funct7.into(), ]; builder.when_not(is_real).assert_zero(local.multiplicity); builder.receive_instruction_decode( [local.encoded_instruction[0].into(), local.encoded_instruction[1].into()], local.instruction, untrusted_instruction_const_fields, local.multiplicity.into(), ); ``` ## Memory Protection Checks The handling of memory permissions works in a same way that the memory argument does. Each page of size `2^12` bytes hold its current permissions, and their status is proved by the same Blum argument. Similar to `MemoryGlobalInit`, `MemoryGlobalFinalize`, `MemoryLocal` chips used to handle the permutation argument, there are `PageProtGlobalInit`, `PageProtGlobalFinalize`, `PageProtLocal` chips corresponding. **Page Protection on RISC-V instructions** All normal RISC-V instructions cannot change the page protection status. In fact, most RISC-V instructions do not require an access to the page protection status: only memory instructions do. The LOAD instructions require a `READ` permissions for the page, while the STORE instructions require a `WRITE` permission. Of course, in the case of an untrusted program, the page corresponding to `pc` must have the `EXEC` permission. The only way to modify the page permission is through a `MPROTECT` syscall, which allows a modification of a page's permission. As `SyscallInstrsChip` only allows syscalls to be handled by a trusted program of the ELF, an untrusted program cannot call `MPROTECT`. The `PageProtChip` checks a request on a page's permissions. First, it receives the page protection check request via an interaction, as shown below. ```rust= builder.receive_page_prot( local.clk_high, local.clk_low, &local.addr.map(Into::into), local.permissions, local.is_real, ); ``` Here, `clk_high, clk_low` are timestamps, `addr` is the address to check the page permission on, and `permissions` is the bit flag of permissions to check. The `PageProtOperation` aims to read the permissions of a given address at a timestamp. ```rust= PageProtOperation::<AB::F>::eval( builder, local.clk_high.into(), local.clk_low.into(), &local.addr.map(Into::into), local.page_prot_op, local.is_real.into(), ); ``` The Blum argument part of the AIR is identical as the one for memory. The harder part is getting the page index from the address. Since each page has `2^12` bytes, and the addresses are formatted as three `u16` limbs, we use the page index for `[addr0, addr1, addr2]` as simply `[addr0 >> 12, addr1, addr2]` with `4, 16, 16` bits each. This can be derived by breaking `addr0` into a 4-bit limb and 12-bit limb. Now, the AIR needs to validate that the actual permissions of the page includes the `permissions` that was received in `receive_page_prot`. To do this, we use the fact that bitmap `b` being included in `a` is equivalent to `a & b == b`. ```rust= builder.send_byte( AB::Expr::from_canonical_u8(ByteOpcode::AND as u8), local.permissions, local.permissions, local.page_prot_op.page_prot_access.prev_prot_bitmap, local.is_real, ); ``` Note that `PageProtChip` only reads the page permissions, and cannot write it. **MPROTECT Syscall** The `MPROTECT` syscall sends over an page address and a permission as input. It can only be sent when the `is_untrusted_programs_enabled` flag is turned on in the public values. ```rust= let syscall_id = prev_a_byte[0].clone(); // Compute whether this ecall is mprotect. let is_mprotect = { IsZeroOperation::<AB::F>::eval( builder, IsZeroOperationInput::new( syscall_id.clone() - AB::Expr::from_canonical_u32(SyscallCode::MPROTECT.syscall_id()), local.is_page_protect, local.is_real.into(), ), ); local.is_page_protect.result }; builder.when(is_mprotect).assert_one(is_page_protect_active); ``` The actual syscall AIR does a similar bit decomposition to get the page index, while also checking that the lowest 12 bits of the address is zero to ensure page alignment of the input. The written permission is also bitwise decomposed to ensure that the permission value is a valid one, a value with 3 bits (for READ, WRITE, EXEC). **Page Protection on Precompiles** The page protection also needs to be done on precompiles themselves. This leads to two different points of discussion. - Many syscalls access a slice of memory, which may not be contained entirely within a page. It may span over two consecutive pages. Note that because the length of the slice is pre-determined, it's guaranteed that the slice cannot span over three or more pages, as the slice cannot be larger than a single page of memory. - Unlike LOAD/STORE or execution of an untrusted program which have very clear page protection rules corresponding to them, for precompiles they may not be so apparent. **First, the slice issue.** We have `AddressSlicePageProtOperation`. The inputs are - `clk_high, clk_low`: the timestamp values - `start_addr`: the starting address of the slice - `end_addr`: the ending address of the slice - The exact memory region this covers is `[start_addr, end_addr + 8)`. - `start_addr` and `end_addr` are assumed to be multiples of `8`. This is usually done through the `SyscallAddrOperation` in the syscall itself. - `start_addr` and `end_addr` are assumed to be valid addresses of three `u16` limbs. - `permissions`: the permissions to check in the slice - `is_real`: if the row is a real row, not a padding row and the operation itself has the following columns. ```rust= pub struct AddressSlicePageProtOperation<T> { pub page_is_equal_or_adjacent: PageIsEqualOrAdjacentOperation<T>, pub page_operations: [PageOperation<T>; 2], pub page_prot_accesses: [PageProtAccessCols<T>; 2], pub is_page_protect_active: T, } ``` As all page protection goes, it only applies when `public_values.is_untrusted_programs_enabled` is on. So `is_page_protect_active` is only turned on if and only if both `is_real` and `is_untrusted_programs_enabled` is on. ```rust= // Check page protect active is set correctly based on public value and is_real builder.assert_bool(is_real.clone()); let public_values = builder.extract_public_values(); let expected_page_protect_active = public_values.is_untrusted_programs_enabled.into() * is_real.clone(); builder.assert_eq(cols.is_page_protect_active, expected_page_protect_active); ``` As the slice may touch at most two pages, with the two pages being consecutive, it suffices to get the first page index and whether or not the slice touches two pages. Therefore, the circuit does the following. - gets the `start_page_idx`, the page index of `start_addr` using `PageOperation` - gets the `end_page_idx`, the page index of `end_addr` using `PageOperation` - verifies the `start_page_idx`'s permissions are as expected, similar to `PageProtChip` - check if `start_page_idx` is equal to `end_page_idx` or not - if not, (and only if not) verifies the `end_page_idx`'s permissions are as expected The `PageIsEqualOrAdjacentOperation` checks if `start_page_idx == end_page_idx`. The boolean result for `end_page_idx == start_page_idx + 1` is `cols.page_is_equal_or_adjacent.is_adjacent`, which is the boolean flag used for accessing the `end_page_idx`. Because this value is only constrained when `cols.is_page_protect_active` is turned on, we additionally verify that the `is_adjacent` flag is zero when `cols.is_page_protect_active` is zero. ```rust= builder .when(cols.page_is_equal_or_adjacent.is_adjacent) .assert_one(cols.is_page_protect_active); ``` In `PageIsEqualOrAdjacentOperation`, the page indices are of `4, 16, 16` bits as previously mentioned. So, the strategy is to combine the bottom two limbs into a 20-bit limb, then use that to compare the two page indices. ```rust= pub struct PageIsEqualOrAdjacentOperation<T> { pub is_overflow: T, // Bool flag that is set to 0 if equal, 1 if adjacent. pub is_adjacent: T, } ``` Here, `is_overflow` is whether or not the 20-bit limb of the `end_page_idx` becomes zero, so that there is a carry onto the top 16-bit limb. The exact constraints placed here are - assert `is_real, is_adjacent, is_overflow` are boolean. - combine each page indices into 20-bit and 16-bit limbs - If `is_adjacent` is false, then the page indices are equal - If `is_adjacent` is true, then assert `is_real == 1`. - If `is_adjacent` is true and `is_overflow` is false - the 20-bit limb increments by one - the 16-bit limb stays equal - If `is_adjacent` is true and `is_overflow` is true - the starting page's 20-bit limb is `2^20 - 1` - the ending page's 20-bit limb is equal to `0` - the 16-bit limb increments by one **Now we discuss each precompile's required permissions**. There are operations that act like `x <- x op y` where each `x, y` are memory slices. For example, elliptic curve addition and `Fp` operations act in such a way. In these cases, `x` is enforced to have `READ, WRITE` permissions, and `y` is enforced to have `READ` permissions. These operations include - `ED_ADD`, as the syscall does `p <- p + q` - `WeierstrassAdd`, as the syscall does `p <- p + q` - All `Fp` add, sub, mul, and `Fp2` add, sub, mul opcodes - `UINT256_MUL`, as the syscall does `x <- x * y (mod M)` There are operations that act like `x <- f(y)`, where `x, y` are one or more memory slices. In this case, `x` is enforced to have `WRITE` permissions, and `y` is enforced to have `READ` permissions. These operations include the following. - `ED_DECOMPRESS`, as the syscall does `x <- decompress(y)` - `WeierstrassDecompress`, as the syscall does `y <- decompress(x)` - `U256x2048Mul`, as the syscall does `(lo, hi) <- a x b` - `Uint256Ops`, as the syscall does `(d, e) <- (a op b) + c` Some operations do `x <- f(x)`, in which `x` is enforced to have both `READ, WRITE` permissions. These operations include the following. - `KECCAK_PERMUTE`, as the syscall does `state <- permute(state)` - `POSEIDON2`, as the syscall does `state <- permute(state)` - `WeierstrassDouble`, as the syscall does `x <- 2 * x` Finally, `SHA` has some more complex conditions. For `ShaExtend`, the slice `[w_ptr, w_ptr + 16 * 8)` is checked to be readable, where as the slice `[w_ptr + 16 * 8, w_ptr + 64 * 8)` is checked to be readable and writable. Note that the entire `w_ptr` slice will be read immediately after the `ShaExtend` precompile takes place, if the user program uses the precompiles through the SHA patch, as shown [here](https://github.com/sp1-patches/RustCrypto-hashes/commit/e48b656ebc806117554bb33c2f8687e4637e37ff). For `ShaCompress`, the `w_ptr` slice is checked to be readable, while the `h_ptr` slice is checked to be readable and writeable. **Initialization & Finalization of Page Protection** The fundamental logic of `PageProtGlobalChip` is the same as the memory one. The `page_idx` are range checked to be `4, 16, 16` bits, and the page protection values are set to be `3` bits. The initialization and finalizations are ensured to be unique, similar to the logic for memory, by checking that the indices are strictly increasing. We shift the bottom `4` bits by `12` bits, to use the usual comparison between four u16 limbs. There are two differences. First, unlike `MemoryGlobalInit`, where the initial values can be any arbitrary `Word` of four valid u16 limbs, in `PageProtGlobalInit` the only initial page permissions is the default `READ | WRITE` permissions. This is shown below. ```rust= // If it's the initialize chip, then assert that the page prot is the default page prot. if self.kind == MemoryChipType::Initialize { builder .when(local.is_real) .assert_eq(local.page_prot, AB::Expr::from_canonical_u8(DEFAULT_PAGE_PROT)); } ``` The second difference is that in memory, `x0` must be enforced to be initialized to zero, whereas there's no such restrictions for page protection. In fact, in `MemoryGlobalInit`, it's actually impossible to initialize just `x0`, as there's no way to get the validity state to be `1`. This is fine, as one can just initialize and finalize all registers, and initialization of `x5, x10, x11` happens naturally due to the `HALT` ecall. Therefore, we modify the state machine. The state is still `(index, prev_page_idx, prev_valid)`, with the start being `(0, 0, 1)`, but the transition is little different. - In all cases, transitioning to state `(index + 1, page_idx, 1)` with `prev_page_idx < page_idx`, with `page_idx` newly initialized or finalized is possible. - If `index == 0, prev_page_idx == 0`, it's possible to transition to state `(1, 0, 0)`. The state must end with `validity = 1` for each shard, and the `page_idx` must be connected throughout the shards. This allows the initialization to not initialize the page with `page_idx == 0`. However, it's still impossible to initialize a single page with `page_idx == 0`. This is actually fine, as the page protection is for memory, and as previously discussed we have a guard to only allow memory addresses `>= 2^16`. Therefore, the case where `page_idx == 0` (addresses `[0, 2^12)`) is the only page to initialize/finalize will never happen in actual proving, so there are no completeness issues. Similar to memory, parts of page protection that are enforced from the ELF itself are embedded within the `initial_global_cumulative_sum`. These will ensure that - either `pc` in the `ProgramChip` is never declared to be executable - or, the page corresponding to `pc` in `ProgramChip` is never writeable, and initializes with the appropriate instruction inside the ProgramChip (i.e. regardless of whether `is_trusted` is on or `is_untrusted` is on, the instruction itself is the same)