Multi-threaded Cannon (Experimental) author: @mbaxter # Overview This document discusses the proposed approach to implementing [multithreading](https://en.wikipedia.org/wiki/Thread_(computing)) support in the Cannon VM. The main goal here is to reduce memory requirements of the `op-program` by enabling [Go](https://en.wikipedia.org/wiki/Go_(programming_language)) garbage collection, which [requires threading support](https://github.com/golang/go/blob/e6504ce671afcf0a165be02f9e04f6a99ca08020/src/runtime/mgc.go#L7). Secondarily, this will allow us to remove some [patching we do](https://github.com/ethereum-optimism/optimism/blob/e81a35a4b864256f15df6c9a064c1dfadb30483b/cannon/mipsevm/patch.go#L66) to overwrite unsupported operations from the loaded [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) program. Thread management and scheduling are typically handled by the operating system via `syscall` operations that give program control over to the OS [kernel](https://en.wikipedia.org/wiki/Kernel_%28operating_system%29). To support a minimal implementation of multithreading in the Cannon VM that specifically targets the `go1.21` runtime, a few new Linux-specific `syscall` operations must be implemented. Additionally, the state of the threads in the system must be managed deterministically and in a provable way. # Resources - [Multithreaded Cannon VM](https://github.com/ethereum-optimism/optimism/blob/d9dcc280e23dd188eecb53fe7eef4871258673dc/packages/contracts-bedrock/src/cannon/MIPS2.sol) - https://specs.optimism.io/experimental/fault-proof/cannon-fault-proof-vm.html#syscalls - [Linux syscall numbers](https://gpages.juszkiewicz.com.pl/syscalls-table/syscalls.html) (see `mipso32` column) with links to linux documentation, for example: - [Futex](https://www.man7.org/linux/man-pages/man2/futex.2.html) - Go runtime implementation targeting Linux OS / MIPS architecture: - [Futex operations](https://github.com/golang/go/blob/d8392e69973a64d96534d544d1f8ac2defc1bc64/src/runtime/os_linux.go#L48-L95) - [Some syscall constants](https://github.com/golang/go/blob/d8392e69973a64d96534d544d1f8ac2defc1bc64/src/runtime/sys_linux_mipsx.s#L15-L44) - [Explainer on Go thread management](https://medium.com/@sanilkhurana7/understanding-the-go-scheduler-and-looking-at-how-it-works-e431a6daacf) # Threading Logic In this implementation, the new VM rotates between threads to provide [concurrent](https://en.wikipedia.org/wiki/Concurrent_computing) execution rather than true [parallel](https://en.wikipedia.org/wiki/Parallel_computing) processing. The VM state holds an ordered list of `ThreadState` objects representing the set of all threads. Once a thread has exited, it can be removed from the set of threads to avoid the overhead of maintaining an ever-growing collection of threads. ## Thread Traversal Threads are traversed deterministically by moving from the first thread to the last thread, then back to the first thread repeatedly. For example, given the set of threads: {0,1,2,3}, we would traverse to each like: 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, …. This sequencing allows us to implement the set of threads as 2 stacks, whose states can be succinctly proven. For details, see the “Witness Updates” section. ## Context Switching ### Sleep, Yield, and Wait Operations to sleep, yield, or wait on some condition cause the VM to move on to the next thread in the sequence of threads. ### Wake When a wake call is made, the state’s `Wakeup` field is set to the target memory address specified in this call. The VM will iterate through the list of existing threads (visiting one thread at each individual step / state transition) looking for threads waiting on this address that can be woken up. First, the VM sets up for thread iteration by walking back to the first thread (iff `Wakeup` is set and we are traversing left, keep going until we have an empty left stack). Once we get there, switch directions and walk right, checking each thread as we go. Iteration stops when either: - A thread is found that is waiting on the target address and the value stored here has changed or else the timeout (if set) has elapsed. - If such a thread is found, the thread is woken up. - All threads have been checked (we have emptied the right stack) After stopping, the `Wakeup` signal is cleared from the state and normal execution resumes. ### Forced Thread Preemption To avoid thread starvation (for example where a busy loop never executes a sleep, yield, or wait), the VM will force a context switch if a given thread has been executing too long. For each step executed on a particular thread, the state variable `StepsSinceLastContextSwitch` is incremented. When we switch contexts to a new thread, `StepsSinceLastContextSwitch` is reset to 0. If `StepsSinceLastContextSwitch` reaches some maximum value `MAX_STEPS_PER_THREAD`(concrete value TBD), the vm forces a switch to the next thread in the sequence. # State Management ## Modified Data Structures ### State - Remove the following fields related to the CPU state which are now managed by the `ThreadState` structure: - `PC` - `NextPC` - `LO` - `HI` - `Registers` - Add thread-related fields: - `StepsSinceLastContextSwitch` - Tracks the number of steps devoted to the current thread. Once this hits some max value, we force a context switch to the next thread and reset this value to 0. - `Wakeup` - a `uint32` memory address representing a signal to wakeup threads that are waiting on the word at this address via [FUTEX_WAIT](https://www.man7.org/linux/man-pages/man2/futex.2.html). Set to -1 if there is no current wakeup signal. - `TraverseRight` - a `bool` representing the current direction of traversal through the threads - `ThreadsLeft` - a stack of `ThreadState` objects - `ThreadsRight` - a stack of `ThreadState` objects - `NextThreadID` - Tracks the next thread ID to be assigned to newly created threads. This is monotonically increasing. ## New Data Structures ### ThreadState - `ThreadID` - A `uint32` id for the thread - `ExitCode`- A `uint8` exit code - `Exited` - A bool determining whether the current thread has exited - `FutexAddr` - A `uint32` address that the thread is waiting on. Equal to -1 if there is no current wait. - `FutexVal` - A `uint32` value representing the expected value at `FutexAddr` when this thread is waiting. The thread can wake up when it reaches `FutureTimeoutStep` (below) or else when the value at this address differs from `FutexVal` . Set to 0 when there is no current wait. - `FutexTimeoutStep` - Represents a future VM step when this thread’s current wait is over and the thread can be woken up. Set to 0 when there is no wait or no timeout. Currently set to a fixed number of steps `FUTEX_TIMEOUT_STEPS` (10,000).* - The general idea is that we expect most timeouts to be small (5ms, 10ms etc). So a fixed timeout should work well in those contexts. - `Cpu` - `PC` - `NextPC` - `LO` - `HI` - `Registers` <aside> 💡 *TODO - figure out a good fixed value to use for `FUTEX_TIMEOUT_STEPS` Here’s one calculation: - Assume a 4.0GHz cpu, this translates to 4 billion operations (or steps) per second - That means we get 4 steps per nanosecond - Which means we get 4,000,000 steps per millisecond - So 10 milliseconds (a good assumed standard wait) would be 40,000,000 steps We could also do some empirical testing: - For example, we can adjust the fixed timeout up and down and see how this affects the number of steps required to complete program execution and how many waits are called. The idea being that if steps are too short, a thread might go back to sleep immediately when woken up, increasing the number of wait calls. If the steps are too long, other threads might be blocked waiting for the sleeping threads, which would increase the number of steps required before the program finishes executing. </aside> ## Witness Updates ### Proving the state of all threads In order to represent the current state of all threads, we can maintain 2 thread stacks: `ThreadsLeft` and `ThreadsRight` plus a `bool` telling us which way we are currently traversing the threads. These stacks can be represented in a way that is succinctly provable. **Representing the set of threads with 2 stacks** To traverse through the set of threads moving right, we pop a thread off the right stack, and push it onto the left stack. And vice versa for traversing leftward - we pop from the left stack and push to the right stack: ![image](https://hackmd.io/_uploads/rkBt5RnqR.png) When traversing right, if the right stack is empty, we switch directions. Similarly when traversing left if the left stack is empty we switch directions. To operate on the current thread without switching to a new thread context, we pop from the appropriate stack, update the current thread and push back to the same stack. To drop an exited thread, we just don’t push it back onto either stack. When adding a new thread, we can (arbitrarily) push to the stack we’re traversing away from. **Representing a stack as a “hash onion”:** Stacks are a nice data structure because they can be represented using a “hash onion” style witness / proof construction. A stack’s structure is represented and proven as follows[^1]: - A stack with no elements can be represented as some constant value like: - `w0 = hash(bytes32(0) ++ bytes32(0))`. - To add an element to the stack, the witness is updated to: - `w1 = hash(w0 ++ hash(el0))`. - To add another element, we get: - `w2 = hash(w1 ++ hash(el1))`. - To pop an element from this stack, peel back the last hash operation: - `w3 = w1` - To prove the top value `elTop` on the stack, given some witness `w`, you just need to reveal the `bytes32` witness `w'`for the stack without `elTop` and verify: - `w = hash(w' ++ hash(elTop))` > [^1]: :bulb: * the `++` operation represents concatenation and arguments on either side are byte strings # Syscalls Go runtime sourcecode syscall references: - https://github.com/golang/go/blob/d8392e69973a64d96534d544d1f8ac2defc1bc64/src/runtime/sys_linux_mipsx.s#L15-L44 ## New Syscalls | Syscall(s) | Operation | Description | | --- | --- | --- | | https://www.man7.org/linux/man-pages/man2/exit.2.html | | Exit / terminate the current thread | | https://www.man7.org/linux/man-pages/man2/gettid.2.html | | Return the id of the current thread | | https://www.man7.org/linux/man-pages/man2/futex.2.html | `FUTEX_WAIT_PRIVATE` | Given a memory address and an expected value, check if the actual value at this address matches the expected value. If these values do not match, return an error. Otherwise put the thread to sleep by setting `FutexAddr` and FutexVal on the thread state. If a timeout is set, once the timeout expires the thread can be awakened. Otherwise, when the value at `FutexAddr` differs from `FutexVal`, the thread can be woken up. | | https://www.man7.org/linux/man-pages/man2/futex.2.html | `FUTEX_WAKE_PRIVATE` | Set the `State.Wakeup` to the target address, and traverse through the threads (one step / thread at a time) until a thread waiting on the value at this address that can be awakened is found. Once such a thread is found and woken up, or else all of the threads are checked, clear the Wakeup value and resume normal operations. | | https://www.man7.org/linux/man-pages/man2/clone.2.html | | Clones the thread with the provided threadID and adds it to the list of threads. | | https://www.man7.org/linux/man-pages/man2/sched_yield.2.html | | Yield by moving to the next thread. | | https://www.man7.org/linux/man-pages/man2/open.2.html | | Return error [EBADF](https://www.man7.org/linux/man-pages/man3/errno.3.html) ["Bad File Descriptor"](https://www.man7.org/linux/man-pages/man3/errno.3.html) | | https://www.man7.org/linux/man-pages/man2/nanosleep.2.html | | “Sleep” by moving to the next thread. | | https://www.man7.org/linux/man-pages/man2/clock_gettime.2.html | | Use the current step and a Hz parameter to calculate an elapsed time value. | ## Noop Syscalls | Syscall(s) | Operation | Description | | --- | --- | --- | | https://www.man7.org/linux/man-pages/man2/sched_getaffinity.2.html | | Used by linux targets to get the number of CPUs available to the runtime. https://github.com/golang/go/blob/2496653d0a5c6c26b879bb5bdd135e1f7504e051/src/runtime/os_linux.go#L107 . This sets the `runtime.NumCPU()` . The method getproccount will [return a value of 1](https://github.com/golang/go/blob/2496653d0a5c6c26b879bb5bdd135e1f7504e051/src/runtime/os_linux.go#L118-L120) if an empty bitmask is returned from this syscall. So we can safely treat this as a noop - returning 0. | | https://www.man7.org/linux/man-pages/man2/madvise.2.html | | Can be ignored. This is merely a hint to the VM regarding memory usage. Should return success. | | https://www.man7.org/linux/man-pages/man2/rt_sigprocmask.2.html https://www.man7.org/linux/man-pages/man2/sigaltstack.2.html https://www.man7.org/linux/man-pages/man2/rt_sigaction.2.html | | sigprocmask, sigaltstack, sigaction related syscalls for signals. sigaction (see sysSigaction in [os_linux.go in the go runtime](https://github.com/golang/go/blob/go1.21.10/src/runtime/os_linux.go#L544) seems related to cgo. Shouldn’t be used for pure Go programs. | | https://www.man7.org/linux/man-pages/man2/prlimit64.2.html | | Used to get and set resource limits (such as cpu time, memory, etc) of a process. This shouldn’t affect the VM state and can be handled with a default successful return value. | | sysClose | | sysOpen always returns an error, so we can ignore sysClose | | sysPread64 | | | | sysFstat64 | | | | https://www.man7.org/linux/man-pages/man2/openat.2.html | | Used to read /proc fs data | | sysReadLink https://www.man7.org/linux/man-pages/man2/readlinkat.2.html | | Same as open. Used to read /proc files (and possibly others). One unusual callsite is in the http://google.golang.org/protobuf module. | | sysIoctl | | | | sysEpollCreate1 | | | | sysPipe2 | | | | sysEpollCtl | | | | sysEpollPwait | | | | sysGetRandom | | | | sysUname | | | | sysStat64 | | | | sysGetuid | | | | sysGetgid | | | | https://www.man7.org/linux/man-pages/man2/_llseek.2.html | | Used to load TZ (timezone) data. Go runtime handles bad tz info well. | | https://www.man7.org/linux/man-pages/man2/mincore.2.html (4217) | | | | https://www.man7.org/linux/man-pages/man2/tgkill.2.html 4266 | | Used by go runtime to [force syscall execution](https://github.com/golang/go/blob/d8392e69973a64d96534d544d1f8ac2defc1bc64/src/runtime/os_linux.go#L857) across all threads. TGKill is also [used for preemption](https://github.com/golang/go/blob/792a26130347c9b9db344ba56f86645679a1a9d9/src/runtime/signal_unix.go#L385), but this is codepath is only hit when the runtime signal handler is invoked, which shouldn't happen in the op-program | ## Syscall TODOs Still need to investigate whether and how the following should be implemented. | Syscall(s) | Operation | Description | | --- | --- | --- | | https://www.man7.org/linux/man-pages/man2/setitimer.2.html (4104) https://www.man7.org/linux/man-pages/man2/timer_create.2.html https://www.man7.org/linux/man-pages/man2/timer_settime.2.html https://www.man7.org/linux/man-pages/man2/timer_delete.2.html | | Timers are used for profiling, which can be stubbed out via binary instrumentation. TODO: where else are they being used. | | | | |