Multi-threaded Cannon (Experimental)
author: @mbaxter
This document discusses the proposed approach to implementing multithreading support in the Cannon VM. The main goal here is to reduce memory requirements of the op-program
by enabling Go garbage collection, which requires threading support. Secondarily, this will allow us to remove some patching we do to overwrite unsupported operations from the loaded ELF program.
Thread management and scheduling are typically handled by the operating system via syscall
operations that give program control over to the OS kernel. To support a minimal implementation of multithreading in the Cannon VM that specifically targets the go1.21
runtime, a few new Linux-specific syscall
operations must be implemented. Additionally, the state of the threads in the system must be managed deterministically and in a provable way.
mipso32
column) with links to linux documentation, for example:
In this implementation, the new VM rotates between threads to provide concurrent execution rather than true parallel processing. The VM state holds an ordered list of ThreadState
objects representing the set of all threads. Once a thread has exited, it can be removed from the set of threads to avoid the overhead of maintaining an ever-growing collection of threads.
Threads are traversed deterministically by moving from the first thread to the last thread, then back to the first thread repeatedly. For example, given the set of threads: {0,1,2,3}, we would traverse to each like: 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, ….
This sequencing allows us to implement the set of threads as 2 stacks, whose states can be succinctly proven. For details, see the “Witness Updates” section.
Operations to sleep, yield, or wait on some condition cause the VM to move on to the next thread in the sequence of threads.
When a wake call is made, the state’s Wakeup
field is set to the target memory address specified in this call. The VM will iterate through the list of existing threads (visiting one thread at each individual step / state transition) looking for threads waiting on this address that can be woken up. First, the VM sets up for thread iteration by walking back to the first thread (iff Wakeup
is set and we are traversing left, keep going until we have an empty left stack). Once we get there, switch directions and walk right, checking each thread as we go. Iteration stops when either:
After stopping, the Wakeup
signal is cleared from the state and normal execution resumes.
To avoid thread starvation (for example where a busy loop never executes a sleep, yield, or wait), the VM will force a context switch if a given thread has been executing too long.
For each step executed on a particular thread, the state variable StepsSinceLastContextSwitch
is incremented. When we switch contexts to a new thread, StepsSinceLastContextSwitch
is reset to 0. If StepsSinceLastContextSwitch
reaches some maximum value MAX_STEPS_PER_THREAD
(concrete value TBD), the vm forces a switch to the next thread in the sequence.
ThreadState
structure:
PC
NextPC
LO
HI
Registers
StepsSinceLastContextSwitch
- Tracks the number of steps devoted to the current thread. Once this hits some max value, we force a context switch to the next thread and reset this value to 0.Wakeup
- a uint32
memory address representing a signal to wakeup threads that are waiting on the word at this address via FUTEX_WAIT. Set to -1 if there is no current wakeup signal.TraverseRight
- a bool
representing the current direction of traversal through the threadsThreadsLeft
- a stack of ThreadState
objectsThreadsRight
- a stack of ThreadState
objectsNextThreadID
- Tracks the next thread ID to be assigned to newly created threads. This is monotonically increasing.ThreadID
- A uint32
id for the threadExitCode
- A uint8
exit codeExited
- A bool determining whether the current thread has exitedFutexAddr
- A uint32
address that the thread is waiting on. Equal to -1 if there is no current wait.FutexVal
- A uint32
value representing the expected value at FutexAddr
when this thread is waiting. The thread can wake up when it reaches FutureTimeoutStep
(below) or else when the value at this address differs from FutexVal
. Set to 0 when there is no current wait.FutexTimeoutStep
- Represents a future VM step when this thread’s current wait is over and the thread can be woken up. Set to 0 when there is no wait or no timeout. Currently set to a fixed number of steps FUTEX_TIMEOUT_STEPS
(10,000).*
Cpu
PC
NextPC
LO
HI
Registers
In order to represent the current state of all threads, we can maintain 2 thread stacks: ThreadsLeft
and ThreadsRight
plus a bool
telling us which way we are currently traversing the threads. These stacks can be represented in a way that is succinctly provable.
Representing the set of threads with 2 stacks
To traverse through the set of threads moving right, we pop a thread off the right stack, and push it onto the left stack. And vice versa for traversing leftward - we pop from the left stack and push to the right stack:
When traversing right, if the right stack is empty, we switch directions. Similarly when traversing left if the left stack is empty we switch directions. To operate on the current thread without switching to a new thread context, we pop from the appropriate stack, update the current thread and push back to the same stack. To drop an exited thread, we just don’t push it back onto either stack. When adding a new thread, we can (arbitrarily) push to the stack we’re traversing away from.
Representing a stack as a “hash onion”:
Stacks are a nice data structure because they can be represented using a “hash onion” style witness / proof construction. A stack’s structure is represented and proven as follows[1]:
w0 = hash(bytes32(0) ++ bytes32(0))
.w1 = hash(w0 ++ hash(el0))
.w2 = hash(w1 ++ hash(el1))
.w3 = w1
elTop
on the stack, given some witness w
, you just need to reveal the bytes32
witness w'
for the stack without elTop
and verify:
w = hash(w' ++ hash(elTop))
Go runtime sourcecode syscall references:
Syscall(s) | Operation | Description |
---|---|---|
https://www.man7.org/linux/man-pages/man2/exit.2.html | Exit / terminate the current thread | |
https://www.man7.org/linux/man-pages/man2/gettid.2.html | Return the id of the current thread | |
https://www.man7.org/linux/man-pages/man2/futex.2.html | FUTEX_WAIT_PRIVATE |
Given a memory address and an expected value, check if the actual value at this address matches the expected value. If these values do not match, return an error. Otherwise put the thread to sleep by setting FutexAddr and FutexVal on the thread state. If a timeout is set, once the timeout expires the thread can be awakened. Otherwise, when the value at FutexAddr differs from FutexVal , the thread can be woken up. |
https://www.man7.org/linux/man-pages/man2/futex.2.html | FUTEX_WAKE_PRIVATE |
Set the State.Wakeup to the target address, and traverse through the threads (one step / thread at a time) until a thread waiting on the value at this address that can be awakened is found. Once such a thread is found and woken up, or else all of the threads are checked, clear the Wakeup value and resume normal operations. |
https://www.man7.org/linux/man-pages/man2/clone.2.html | Clones the thread with the provided threadID and adds it to the list of threads. | |
https://www.man7.org/linux/man-pages/man2/sched_yield.2.html | Yield by moving to the next thread. | |
https://www.man7.org/linux/man-pages/man2/open.2.html | Return error EBADF "Bad File Descriptor" | |
https://www.man7.org/linux/man-pages/man2/nanosleep.2.html | “Sleep” by moving to the next thread. | |
https://www.man7.org/linux/man-pages/man2/clock_gettime.2.html | Use the current step and a Hz parameter to calculate an elapsed time value. |
Syscall(s) | Operation | Description |
---|---|---|
https://www.man7.org/linux/man-pages/man2/sched_getaffinity.2.html | Used by linux targets to get the number of CPUs available to the runtime. https://github.com/golang/go/blob/2496653d0a5c6c26b879bb5bdd135e1f7504e051/src/runtime/os_linux.go#L107 . This sets the runtime.NumCPU() . |
|
The method getproccount will return a value of 1 if an empty bitmask is returned from this syscall. So we can safely treat this as a noop - returning 0. | ||
https://www.man7.org/linux/man-pages/man2/madvise.2.html | Can be ignored. This is merely a hint to the VM regarding memory usage. Should return success. | |
https://www.man7.org/linux/man-pages/man2/rt_sigprocmask.2.html | ||
https://www.man7.org/linux/man-pages/man2/sigaltstack.2.html | ||
https://www.man7.org/linux/man-pages/man2/rt_sigaction.2.html | sigprocmask, sigaltstack, sigaction related syscalls for signals. | |
sigaction (see sysSigaction in os_linux.go in the go runtime seems related to cgo. Shouldn’t be used for pure Go programs. | ||
https://www.man7.org/linux/man-pages/man2/prlimit64.2.html | Used to get and set resource limits (such as cpu time, memory, etc) of a process. This shouldn’t affect the VM state and can be handled with a default successful return value. | |
sysClose | sysOpen always returns an error, so we can ignore sysClose | |
sysPread64 | ||
sysFstat64 | ||
https://www.man7.org/linux/man-pages/man2/openat.2.html | Used to read /proc fs data | |
sysReadLink | ||
https://www.man7.org/linux/man-pages/man2/readlinkat.2.html | Same as open. Used to read /proc files (and possibly others). One unusual callsite is in the http://google.golang.org/protobuf module. | |
sysIoctl | ||
sysEpollCreate1 | ||
sysPipe2 | ||
sysEpollCtl | ||
sysEpollPwait | ||
sysGetRandom | ||
sysUname | ||
sysStat64 | ||
sysGetuid | ||
sysGetgid | ||
https://www.man7.org/linux/man-pages/man2/_llseek.2.html | Used to load TZ (timezone) data. Go runtime handles bad tz info well. | |
https://www.man7.org/linux/man-pages/man2/mincore.2.html (4217) | ||
https://www.man7.org/linux/man-pages/man2/tgkill.2.html 4266 | Used by go runtime to force syscall execution across all threads. TGKill is also used for preemption, but this is codepath is only hit when the runtime signal handler is invoked, which shouldn't happen in the op-program |
Still need to investigate whether and how the following should be implemented.
Syscall(s) | Operation | Description |
---|---|---|
https://www.man7.org/linux/man-pages/man2/setitimer.2.html (4104) | ||
https://www.man7.org/linux/man-pages/man2/timer_create.2.html | ||
https://www.man7.org/linux/man-pages/man2/timer_settime.2.html | ||
https://www.man7.org/linux/man-pages/man2/timer_delete.2.html | Timers are used for profiling, which can be stubbed out via binary instrumentation. TODO: where else are they being used. | |
++
operation represents concatenation and arguments on either side are byte strings ↩︎