# CSCI 0300 Fundamentals of Computer Systems
## Topic Overviews for Midterm Exam
Computer organization
* A **CPU**, which contains one or more processors. These are the pieces of silicon that actually execute programs.
* **Memory**, which is a form of storage, and the only form of storage that processors can access directly.
* **Devices**, which allow the computer to store data across reboots and power failures (hard disks and SSDs) and interact with the outside world (screens, keyboards, speakers, WiFi controllers).
Memory
* A **byte** is 8 bits, so it takes on values between 0 and 2^8 − 1 (i.e., 0 to 255)
* A "**post-office box**" in computer memory is identified by an address. On a computer with M bytes of memory, there are M such boxes, each having as its address a number between 0 and M−1.
C
* argc = total number of arguments, including the program name
* argv = the actual arguments
* Signed integers are stored in two's complement representation
* Arithmetic overflow on signed integers is **undefined behavior**
* Arithmetic overflow on unsigned integers is **not** undefined behavior
Pointers
* `&*local` is the same as `local`
* `int arr[1]` declares an array of one `int`; in expressions, `arr` decays to an `int*` pointing at its first element (i.e., `&arr[0]`). Note that the valid initializer syntax is `int arr[] = {1}`; `int[] arr` is not legal C
* `malloc(n)` returns a `void*`, which in C converts implicitly to any object pointer type (an explicit cast is only required in C++)
* Pointer arithmetic is scaled by the size of the pointed-to type: `p + 1` advances the stored address by `sizeof(*p)` **bytes**
Memory Segments
* Text (code) segment: static lifetime memory that contains the program code along with constant global variables
* Data segment: static lifetime memory that contains (non constant) global variables
* Stack segment: automatic lifetime memory that contains the local variables for a given function
* Heap segment: dynamic lifetime memory controlled by the user ex: malloc, new etc
* Note: reading uninitialized memory is **undefined behavior**
`C` collection rules
1. The **first member** rule says that the address of a collection is the same as the address of its first member
2. The **array** rule says that all elements ("members") of the array are laid out consecutively in memory
3. The **struct** rule says that the members of a struct are laid out in the order they're declared in, without overlap, and subject only to alignment constraints.
4. The **union** rule says that the address of all members of a union is the same as the address of the union.
5. The **minimum** rule says that the memory used for a collection shall be the minimum possible without violating any of the other rules.
6. The **malloc** rule says that any call to malloc that succeeds returns a pointer that is aligned for any type. This rule has some important consequences: it means that malloc() must return pointers aligned for the maximum alignment, which on x86-64 Linux is 16 bytes. In other words, any pointer returned from malloc points to an address that is a multiple of 16.
`C` structs
* If a struct is defined as `struct x_t {...}` then we can initialize a struct of type `struct x_t` as follows: `struct x_t a = {...}` and `a.i1 = ...`. We can also have `struct x_t* a; a->i1 = ...`
* The arrow syntax dereferences and then accesses a member: `a->i1` is the same as `(*a).i1`
* When a struct is defined as `typedef struct {...} x_t;`, there is no need to use `struct x_t a`. Instead we can do `x_t a`
* Compilers may **not** reorder struct members, regardless of optimization level: the struct rule guarantees they are laid out in declaration order
`C` `sizeof` and `alignof`
* The `sizeof` operator returns the size in bytes (!) of its argument, which can be either a type or an object.
* The alignment of a type T is a number a ≥ 1 such that the address of every object of type T is a multiple of a.
x86-64 Assembly
```
->Overall register layout
63                             31              15       7       0
+-------------------------------+---------------+-------+-------+
|                               |               |       |       |
+---------------------------------------------------------------+
|--------------------%rax (64 bits/8 bytes)---------------------|
                                |----%eax (32 bits/4 bytes)-----|
                                                |-%ax (16b/2B)--|
                                                |--%ah--|--%al--| <-- 8 bits/1 byte each
```
* Suffix `q` denotes an assembly quadword (8 bytes), the size of a C `long`
* Suffix `l` denotes an assembly longword (4 bytes), the size of a C `int`
* Thus, `%eax` holds C `int`s (assembly longwords) and `%rax` holds C `long`s (assembly quadwords)
* Computational instructions operate on one or more registers. The convention is `op source, destination`. For example, `addq %rax, %rbx` performs the computation `%rbx := %rbx + %rax`.
* Data movement instructions move data between registers and memory – so they can move values from one register to another, from memory into a register, and from a register back to memory. For example, `movq %rax, %rbx` copies the contents of `%rax` into `%rbx`, so it performs the assignment `%rbx = %rax`.
* Control flow instructions change the next instruction the processor executes (something called the "instruction pointer", and stored in special register `%rip`)
* For assembly flags and common conditional expressions see https://cs.brown.edu/courses/csci0300/2023/notes/l10.html
x86-64 Stack
* `%rsp` always points to the lowest (leftmost) stack address that is currently used
* When a function declares a new variable, `%rsp` needs to be shifted down to accommodate space for that new variable
* When a function returns, `%rsp` has to move up (right) and back to where it was when the function was originally called
* Explicit modification of `%rsp` involves arithmetic like `subq $0x10, %rsp`
* Implicit modification of `%rsp` happens through operations such as `pushq %rax`, which is equivalent to `movq %rax, -8(%rsp)` followed by `subq $8, %rsp` (or, equivalently, `subq $8, %rsp; movq %rax, (%rsp)`)
* The first 6 arguments are passed in specific registers as part of the calling convention; any remaining arguments are stored in the caller's stack frame and can be accessed at positive offsets from the callee's entry `%rsp`
* When a function returns, the `%rip` register is set to the return address stored at the callee's entry `%rsp` (the address pushed by `call`)
* `%rbp` points to the highest (rightmost) address of the current stack frame and stores the saved `%rbp` of the caller
* `call <addr>` is the assembly instruction for calling a function: it pushes the address of the instruction *after* the `call` (the return address) onto the stack, then sets `%rip` to `addr` to begin executing that part of the text section
* Because `callq <func>` pushes the return address onto the caller's stack, it decreases `%rsp` by 8
* `leaveq` has the same effect as `mov %rbp,%rsp` and then `pop %rbp`.
* `mov %rbp,%rsp` collapses the stack frame by moving `%rsp` up to where `%rbp` points
* `pop %rbp` loads the caller's saved base pointer from the top of the stack into `%rbp` (and moves `%rsp` up by 8)
Midterm Practice Notes 2022
* A null byte corresponds to `0x00`
* Dereferencing a null pointer results in undefined behavior
* A global/data buffer overflow occurs when an access goes past the bounds of a global array in the data segment
* Accessing uninitialized memory results in undefined behavior
* Undefined behavior (e.g., signed integer overflow) may produce different results on different machines
* Note that a struct's alignment is the alignment of its largest member
* If a compiler were to reorder struct members, you could not safely cast a struct pointer to the type of its first declared member
* Dmalloc can detect heap use-after-free and wild writes that go only one byte beyond the allocated region; it cannot detect writes far beyond the allocated heap region
Midterm Practice Notes 2021
* The arguments to the function are stored at (`%rsp`) when the caller executes the callq instruction, but this callq instruction will push a return address onto the stack (basically, executing push `%rip`) and therefore when the callee begins executing instructions, the arguments will be stored starting at `8(%rsp)`, since %rsp changes during the callq instruction.
* Statically-sized arrays (i.e. `T[n]`) follow the alignment of their element type T; thus, `char c[17]` has an alignment requirement of `alignof(char) == 1`.
* **A struct's size must be a multiple of its alignment (the alignment of its largest member), so its end is padded out accordingly**
* To make a struct with x bytes of data and y bytes of padding, force 8-byte alignment with a `long` and then add a `char` array of x − 8 bytes; the struct's size is then rounded up to the next multiple of 8, producing the y bytes of tail padding
* The primitive types an architecture computes on must fit in a register (≤ 8 bytes on x86-64)
* boundary write error / heap buffer overflow is when you write past the allocated heap
* internal metadata is more efficient as the metadata might be in the same cache block
* For structs, the malloc rule still only guarantees that the returned pointer is aligned for the largest primitive type; see 4D on the 2021 midterm for an example
* The instruction pair
```
cmpq $100, -8(%rsp)
jle .L2
```
corresponds to `if (value at -8(%rsp) <= 100) goto .L2`
* `%rsp` is not itself stored on the stack; it is a register holding the address of the top of the stack
## Topic Overviews for Final Exam
Operating Systems
* Mediates access to hardware (safety and convenience)
* Isolation between processes (safety and performance)
* Fair sharing of resources (performance and safety)
* Kernel: operating system software that runs with full privilege over all resources on the computer
* Userspace: software that runs without elevated machine privilege
* Process: a program in active execution
* Program: compiled code that is not in active execution
Weensy OS
* Written in C++ but has some limitations in that:
* if kernel code crashes or does something bad (e.g., accessing memory outside a valid segment), there are no safeguards
* The standard library isn't available in the kernel because a lot of the standard library functions rely on the kernel for memory allocation etc
* Debuggers like GDB don't work as well with kernel code, since GDB itself relies on services the kernel provides
* `sys_yield()` is a system call that yields control back to the kernel and lets the kernel run another process (processes take turns on the processor)
* `console_printf()` is a library function provided by WeensyOS's userspace standard library that allows programs to write to the console
* Upon every timer interrupt, the kernel gets control and then schedules a program to run
* `asm volatile ("cli");` is the "clear interrupt flag" C/C++ statement that prevents interrupts from interrupting the currently running code (such as the kernel)
Privilege
* The `%cs` register encodes whether the processor is executing in privileged or unprivileged mode: `cs = 0` represents privileged (kernel) mode, `cs = 3` represents unprivileged (user) mode
* When the kernel lets a program execute, it changes the value of the `cs` register from 0 to 3, and back from 3 to 0 when sys_yield is called or an exception is triggered
* A segfault occurs if a userspace process tries to access a higher privileged memory area. Similar faults trigger kernel code to execute that handles the exception
Memory Protection
* Timer interrupts, restricted process privilege, and kernel fault/interrupt handlers help the kernel protect the computer from processes that attack the CPU time resource. But there are other important resources that it also needs to protect, most importantly the memory (RAM), which is a pool of fast storage that all processes (including the kernel) share.
* Pagetables determine which addresses each userspace process and the kernel may access
* PTE_P: present. Indicates that the page is in use.
* PTE_W: writable. Indicates that the page can be read and written.
* PTE_U: user-space. Indicates that the page can be accessed by user-space processes (i.e., if %cs is set to 3).
* To provide isolation between userspace processes, virtual memory is needed
Virtual Memory
* The idea behind virtual memory is to give each process the illusion that it is the only process running on the processor and has access to all available memory. This is the power of virtualization: faking an equivalent abstraction over some hardware without changing the interface
* Each process gets its own pagetable which is what enables address translation from virtual address to physical memory
Pagetables
* Each level of a pagetable contains 512 entries of 8 bytes each, where each entry holds the address of a pagetable one level down (or, at L1, the address of a physical page)
* This structure means that there is 1 L4 page table, up to 512 L3 page tables (each of the 512 L4 PT entries can point to a single L3 PT page), up to 512^2 L2 page tables, and up to 512^3 L1 page tables. Using all 512^3 L1 page tables would allow addressing 512^3 × 512 = 512^4 ≈ 68 billion pages. 68 billion pages of 4096 bytes each would cover 256 TB of addressable memory; the page tables themselves would be 512 GB in size.
* All addresses we have seen are virtual addresses
* The x86-64 architecture uses four levels of page table. This is reflected in the structure of a virtual address:
```
63        47     38     29     20     11          0
+---------+------+------+------+------+-----------+
| UNUSED  |  L4  |  L3  |  L2  |  L1  |  offset   |
+---------+------+------+------+------+-----------+
          |9 bits|9 bits|9 bits|9 bits|  12 bits  |
```
* Each 9-bit field is an index into the corresponding level's pagetable: bits 47–39 index the L4 table, whose entry gives the address of an L3 table; bits 38–30 index that L3 table, and so on down through L2 (bits 29–21) and L1 (bits 20–12); the final 12 bits are the byte offset within the 4096-byte page
* Processes should never be able to write to pagetables. Reading pagetables is OK.
* Generally, processes are never able to write directly to physical addresses. When a pointer is dereferenced, all of this pagetable translation happens under the hood in hardware
System Calls
* First the syscall instruction transfers control to the kernel's syscall handler, `cs` is changed to 0 to indicate that the kernel is currently executing on the processor, and the process's register values are saved
* `fork()` creates a clone (duplicate) of the current process: it copies the process description, assigns a new PID, copies every register value except `%rax`, and copies the pagetable with new physical mappings for the same virtual addresses. The stack and heap are cloned to separate physical memory (we don't want races on read/write segments); the code can be shared because it is read-only. `fork` returns differently in each process: a return value of 0 means we are in the child, otherwise we are in the parent.
* `execv(pathname, argv)` replaces the active program with the program at `pathname`, where `argv[0]` is the program name, `argv[1:]` are the arguments, and the array ends with a nullptr. It replaces the code, stack, and heap segments, resets the registers, and installs fresh pagetable mappings. The PID does not change because a new process is not created!
* `waitpid(pid_of_child_process, pointer_to_wait_status, 0)` blocks until the specified child exits; `wait_status` then holds that child's exit status. Since `fork()` returns the child's PID to the parent, the parent always knows which children it can wait on. When `pid_of_child_process == -1`, the call waits for any child process to terminate. Note: waitpid only controls the order in which parent and child processes exit.
* `exit(status)` exits the process and communicates this number "status" to the parent process that is waiting on the process.
* Zombie process: child process exits but the parent has not called wait/waitpid on the pid of the child process yet so the PID exists in the kernel memory along with the exit status
* A process cannot wait/waitpid on a grandchild process; waiting only works on direct children
* If the parent process exits before a child process, the child process will be reparented (on Linux, typically to `init`)
Pipes
* pipes are unidirectional with 1 writer and 1 reader for each pipe.
* `pipe(pipefds, [optional flags])` where `pipefds[0]` is the read end of the pipe and `pipefds[1]` is the write end of the pipe.
* In a process if you create a pipe and then fork, the file descriptor table contains both ends of the pipe.
* Convention says to `close(pipefds[0])` if the current process is not reading from the pipe and only writing.
* Pipes are only shared between parent and child processes (or at least between processes with some sort of relationship)
* In the kernel, file descriptor table entries are pointers to file structs; in userspace, a file descriptor is just an integer index into the kernel's per-process file descriptor table
* Pipes have a limited capacity and in linux the cap is 64 KB
Multiprocessing
* Using more than one process to achieve a task
* Examples: the Apache HTTP server's worker processes and shell pipelines
Parallelism
* Truly running one process on each processor at the same time
* Requires hardware support (i.e. multiple CPUs or cores)
Concurrency
* Time slicing performed by the operating system: processes exist at the same time but run one at a time on the CPU, in intervals short enough to be indistinguishable from simultaneous execution
Processes
* Different virtual address space with different PA mappings
* Independent memory
* Each process has >= 1 thread (initially a process has only one thread that runs main)
* Because processes have different mappings for the same virtual addresses, each process has its own state for all the variables
Threads
* Same virtual address space and same memory
* Different stacks, same code, heap, etc.
* Because threads share the same code, heap, virtual address space, there can be race conditions
* The kernel sees threads as separate processes that use the same pagetable
* Make sure that the threads are not accessing the same stack
* We can create threads using the `clone` system call
* `thread.join()` is the same as waiting on a process to finish execution
* We can also create threads using `pthread_create` which is also available in C
* If `main()` exits while threads have been neither detached nor joined, the program aborts (destroying a joinable `std::thread` calls `std::terminate`)
Race Conditions
* Race conditions occur when 2 or more threads observe shared state in memory without synchronization and act upon it (the order of operations is undefined)
* Because threads share memory, unsynchronized access and modification of shared memory causes unpredictable behavior in its contents
* Because of how interrupts work, a thread might not be able to store their update to a variable into RAM before another thread loads the same memory from RAM
* Combining multiple ASM instructions into one does not by itself solve race conditions, because each core keeps its own cache and values from memory are not always up to date
Synchronization
* When 2 or more threads simultaneously access an object in memory, then either all accesses must be reads or even if one of these operations is a write, the programmer must enforce an order of operations
Atomics
* `std::atomic<T> atom`, where T is a primitive type ≤ 8 bytes
* `atom.fetch_add(x)` atomically loads `atom` from memory and adds `x`
* Compilers use the `lock` instruction prefix to implement `fetch_add` etc.
* Atomic instructions are expensive because they must lock the cache line and force cache-coherence traffic between cores
* When an atomic variable is altered, all copies of it (e.g. in other processors' caches) are updated too
Synchronization Objects
* Mutex: mutual exclusion. `std::mutex lck`; `lck.lock()` to lock a segment of code, `lck.unlock()` to unlock it. Only one thread may succeed in locking the mutex; unsuccessful threads wait until it has been unlocked. `lock()` has no return value because it blocks: it does not return until the calling thread holds the mutex.
* Critical sections refer to segments of code that execute one at a time
* `std::shared_mutex`: either multiple readers or exactly one writer may hold it at a time
* `std::condition_variable cv`: `cv.wait(locked_mtx)` (where `locked_mtx` is a `std::unique_lock<std::mutex>` holding a locked mutex) unlocks the mutex and parks (sleeps) the thread, atomically, until another thread calls `cv.notify_one()` or `cv.notify_all()`; it then relocks the mutex and returns
* Because `lock` and `unlock` are expensive operations, condition variables help by letting threads sleep instead of repeatedly locking and unlocking to poll for a condition
* We must wrap a `wait` call in a while loop because of spurious wakeups
* `std::unique_lock<std::mutex> guard(mtx)` is a scoped lock: it locks `mtx` on construction and unlocks it when `guard` goes out of scope (automatic lifetime)
Deadlock
* If two threads lock mutex 1 and mutex 2 in opposing orders, both can get stuck in deadlock. Suppose thread 1 locks mutex 1 and thread 2 locks mutex 2. Thread 1 then tries to lock mutex 2 but blocks, since thread 2 holds it; meanwhile thread 2 tries to lock mutex 1 but blocks, since thread 1 holds it. Neither can proceed: this is deadlock.
* To AVOID deadlock, we need to maintain a consistent lock acquisition order
Bounded Buffer
* `bpos_` is the position at which the bbuf will be read from
* `blen_` is the amount of data in the bbuf
* `bindex = (bpos_ + blen_) % bcapacity` is the position at which the bbuf will be written to
* `bpos_` needs synchronization because one thread is reading and another is writing
* `blen_` needs synchronization because one thread is reading and another is writing
* Multiple lock/unlock pairs in one function can cause issues: shared variables may change between the two critical sections