Try   HackMD

Multi-threaded Cannon (Experimental)

author: @mbaxter

Overview

This document discusses the proposed approach to implementing multithreading support in the Cannon VM. The main goal here is to reduce memory requirements of the op-program by enabling Go garbage collection, which requires threading support. Secondarily, this will allow us to remove some patching we do to overwrite unsupported operations from the loaded ELF program.

Thread management and scheduling are typically handled by the operating system via syscall operations that give program control over to the OS kernel. To support a minimal implementation of multithreading in the Cannon VM that specifically targets the go1.21 runtime, a few new Linux-specific syscall operations must be implemented. Additionally, the state of the threads in the system must be managed deterministically and in a provable way.

Resources

Threading Logic

In this implementation, the new VM rotates between threads to provide concurrent execution rather than true parallel processing. The VM state holds an ordered list of ThreadState objects representing the set of all threads. Once a thread has exited, it can be removed from the set of threads to avoid the overhead of maintaining an ever-growing collection of threads.

Thread Traversal

Threads are traversed deterministically by moving from the first thread to the last thread, then back to the first thread repeatedly. For example, given the set of threads: {0,1,2,3}, we would traverse to each like: 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, ….

This sequencing allows us to implement the set of threads as 2 stacks, whose states can be succinctly proven. For details, see the “Witness Updates” section.

Context Switching

Sleep, Yield, and Wait

Operations to sleep, yield, or wait on some condition cause the VM to move on to the next thread in the sequence of threads.

Wake

When a wake call is made, the state’s Wakeup field is set to the target memory address specified in this call. The VM will iterate through the list of existing threads (visiting one thread at each individual step / state transition) looking for threads waiting on this address that can be woken up. First, the VM sets up for thread iteration by walking back to the first thread (iff Wakeup is set and we are traversing left, keep going until we have an empty left stack). Once we get there, switch directions and walk right, checking each thread as we go. Iteration stops when either:

  • A thread is found that is waiting on the target address and the value stored here has changed or else the timeout (if set) has elapsed.
    • If such a thread is found, the thread is woken up.
  • All threads have been checked (we have emptied the right stack)

After stopping, the Wakeup signal is cleared from the state and normal execution resumes.

Forced Thread Preemption

To avoid thread starvation (for example where a busy loop never executes a sleep, yield, or wait), the VM will force a context switch if a given thread has been executing too long.

For each step executed on a particular thread, the state variable StepsSinceLastContextSwitch is incremented. When we switch contexts to a new thread, StepsSinceLastContextSwitch is reset to 0. If StepsSinceLastContextSwitch reaches some maximum value MAX_STEPS_PER_THREAD(concrete value TBD), the vm forces a switch to the next thread in the sequence.

State Management

Modified Data Structures

State

  • Remove the following fields related to the CPU state which are now managed by the ThreadState structure:
    • PC
    • NextPC
    • LO
    • HI
    • Registers
  • Add thread-related fields:
    • StepsSinceLastContextSwitch - Tracks the number of steps devoted to the current thread. Once this hits some max value, we force a context switch to the next thread and reset this value to 0.
    • Wakeup - a uint32 memory address representing a signal to wakeup threads that are waiting on the word at this address via FUTEX_WAIT. Set to -1 if there is no current wakeup signal.
    • TraverseRight - a bool representing the current direction of traversal through the threads
    • ThreadsLeft - a stack of ThreadState objects
    • ThreadsRight - a stack of ThreadState objects
    • NextThreadID - Tracks the next thread ID to be assigned to newly created threads. This is monotonically increasing.

New Data Structures

ThreadState

  • ThreadID - A uint32 id for the thread
  • ExitCode- A uint8 exit code
  • Exited - A bool determining whether the current thread has exited
  • FutexAddr - A uint32 address that the thread is waiting on. Equal to -1 if there is no current wait.
  • FutexVal - A uint32 value representing the expected value at FutexAddr when this thread is waiting. The thread can wake up when it reaches FutureTimeoutStep (below) or else when the value at this address differs from FutexVal . Set to 0 when there is no current wait.
  • FutexTimeoutStep - Represents a future VM step when this thread’s current wait is over and the thread can be woken up. Set to 0 when there is no wait or no timeout. Currently set to a fixed number of steps FUTEX_TIMEOUT_STEPS (10,000).*
    • The general idea is that we expect most timeouts to be small (5ms, 10ms etc). So a fixed timeout should work well in those contexts.
  • Cpu
    • PC
    • NextPC
    • LO
    • HI
  • Registers

Witness Updates

Proving the state of all threads

In order to represent the current state of all threads, we can maintain 2 thread stacks: ThreadsLeft and ThreadsRight plus a bool telling us which way we are currently traversing the threads. These stacks can be represented in a way that is succinctly provable.

Representing the set of threads with 2 stacks

To traverse through the set of threads moving right, we pop a thread off the right stack, and push it onto the left stack. And vice versa for traversing leftward - we pop from the left stack and push to the right stack:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

When traversing right, if the right stack is empty, we switch directions. Similarly when traversing left if the left stack is empty we switch directions. To operate on the current thread without switching to a new thread context, we pop from the appropriate stack, update the current thread and push back to the same stack. To drop an exited thread, we just don’t push it back onto either stack. When adding a new thread, we can (arbitrarily) push to the stack we’re traversing away from.

Representing a stack as a “hash onion”:

Stacks are a nice data structure because they can be represented using a “hash onion” style witness / proof construction. A stack’s structure is represented and proven as follows[1]:

  • A stack with no elements can be represented as some constant value like:
    • w0 = hash(bytes32(0) ++ bytes32(0)).
  • To add an element to the stack, the witness is updated to:
    • w1 = hash(w0 ++ hash(el0)).
  • To add another element, we get:
    • w2 = hash(w1 ++ hash(el1)).
  • To pop an element from this stack, peel back the last hash operation:
    • w3 = w1
  • To prove the top value elTop on the stack, given some witness w, you just need to reveal the bytes32 witness w'for the stack without elTop and verify:
    • w = hash(w' ++ hash(elTop))

Syscalls

Go runtime sourcecode syscall references:

New Syscalls

Syscall(s) Operation Description
https://www.man7.org/linux/man-pages/man2/exit.2.html Exit / terminate the current thread
https://www.man7.org/linux/man-pages/man2/gettid.2.html Return the id of the current thread
https://www.man7.org/linux/man-pages/man2/futex.2.html FUTEX_WAIT_PRIVATE Given a memory address and an expected value, check if the actual value at this address matches the expected value. If these values do not match, return an error. Otherwise put the thread to sleep by setting FutexAddr and FutexVal on the thread state. If a timeout is set, once the timeout expires the thread can be awakened. Otherwise, when the value at FutexAddr differs from FutexVal, the thread can be woken up.
https://www.man7.org/linux/man-pages/man2/futex.2.html FUTEX_WAKE_PRIVATE Set the State.Wakeup to the target address, and traverse through the threads (one step / thread at a time) until a thread waiting on the value at this address that can be awakened is found. Once such a thread is found and woken up, or else all of the threads are checked, clear the Wakeup value and resume normal operations.
https://www.man7.org/linux/man-pages/man2/clone.2.html Clones the thread with the provided threadID and adds it to the list of threads.
https://www.man7.org/linux/man-pages/man2/sched_yield.2.html Yield by moving to the next thread.
https://www.man7.org/linux/man-pages/man2/open.2.html Return error EBADF "Bad File Descriptor"
https://www.man7.org/linux/man-pages/man2/nanosleep.2.html “Sleep” by moving to the next thread.
https://www.man7.org/linux/man-pages/man2/clock_gettime.2.html Use the current step and a Hz parameter to calculate an elapsed time value.

Noop Syscalls

Syscall(s) Operation Description
https://www.man7.org/linux/man-pages/man2/sched_getaffinity.2.html Used by linux targets to get the number of CPUs available to the runtime. https://github.com/golang/go/blob/2496653d0a5c6c26b879bb5bdd135e1f7504e051/src/runtime/os_linux.go#L107 . This sets the runtime.NumCPU() .
The method getproccount will return a value of 1 if an empty bitmask is returned from this syscall. So we can safely treat this as a noop - returning 0.
https://www.man7.org/linux/man-pages/man2/madvise.2.html Can be ignored. This is merely a hint to the VM regarding memory usage. Should return success.
https://www.man7.org/linux/man-pages/man2/rt_sigprocmask.2.html
https://www.man7.org/linux/man-pages/man2/sigaltstack.2.html
https://www.man7.org/linux/man-pages/man2/rt_sigaction.2.html sigprocmask, sigaltstack, sigaction related syscalls for signals.
sigaction (see sysSigaction in os_linux.go in the go runtime seems related to cgo. Shouldn’t be used for pure Go programs.
https://www.man7.org/linux/man-pages/man2/prlimit64.2.html Used to get and set resource limits (such as cpu time, memory, etc) of a process. This shouldn’t affect the VM state and can be handled with a default successful return value.
sysClose sysOpen always returns an error, so we can ignore sysClose
sysPread64
sysFstat64
https://www.man7.org/linux/man-pages/man2/openat.2.html Used to read /proc fs data
sysReadLink
https://www.man7.org/linux/man-pages/man2/readlinkat.2.html Same as open. Used to read /proc files (and possibly others). One unusual callsite is in the http://google.golang.org/protobuf module.
sysIoctl
sysEpollCreate1
sysPipe2
sysEpollCtl
sysEpollPwait
sysGetRandom
sysUname
sysStat64
sysGetuid
sysGetgid
https://www.man7.org/linux/man-pages/man2/_llseek.2.html Used to load TZ (timezone) data. Go runtime handles bad tz info well.
https://www.man7.org/linux/man-pages/man2/mincore.2.html (4217)
https://www.man7.org/linux/man-pages/man2/tgkill.2.html 4266 Used by go runtime to force syscall execution across all threads. TGKill is also used for preemption, but this is codepath is only hit when the runtime signal handler is invoked, which shouldn't happen in the op-program

Syscall TODOs

Still need to investigate whether and how the following should be implemented.

Syscall(s) Operation Description
https://www.man7.org/linux/man-pages/man2/setitimer.2.html (4104)
https://www.man7.org/linux/man-pages/man2/timer_create.2.html
https://www.man7.org/linux/man-pages/man2/timer_settime.2.html
https://www.man7.org/linux/man-pages/man2/timer_delete.2.html Timers are used for profiling, which can be stubbed out via binary instrumentation. TODO: where else are they being used.

  1. Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →
    * the ++ operation represents concatenation and arguments on either side are byte strings ↩︎