--- tags: computer-arch --- # Quiz2 of Computer Architecture (2022 Fall) > Solutions ## Problem `A` Given the head of a linked list and a value `x`, partition it such that all nodes less than `x` come before nodes greater than or equal to `x`. You should preserve the original relative order of the nodes in each of the two partitions. Example 1: ![](https://hackmd.io/_uploads/S17X_WYGj.png) Input: `head = [1,4,3,2,5,2]`, `x = 3` Output: `[1,2,2,4,3,5]` Example 2: Input: `head = [2,1]`, `x = 2` Output: `[1,2]` C definition for the above singly-linked list: ```c typedef struct _node { int val; struct _node *next; } node_t; ``` The function prototype: ```c node_t *partition(node_t *head, int x); ``` The corresponding RISC-V implementation is listed as following: ```c .text .align 2 .globl partition partition: addi a5, sp, 12 sw a5, 24(sp) addi a5, sp, 20 sw a0, 12(sp) sw zero, 20(sp) sw a5, 28(sp) L1: bne a0, zero, L4 lw a5, 24(sp) lw a4, 20(sp) sw a4, 0(a5) lw a5, 28(sp) lw a0, 12(sp) sw zero, 0(a5) addi sp, sp, 32 jr ra L4: lw a5, 0(a0) bge a5, a1, L2 lw a4, 24(sp) addi a5, sp, 24 L3: sw a0, 0(a4) addi a4, a0, 4 sw a4, 0(a5) lw a0, 4(a0) j L1 L2: lw a4, 28(sp) addi a5, sp, 28 j L3 ``` Let's [reverse-engineer](https://en.wikipedia.org/wiki/Reverse_engineering) to get the RISC-V assembly back to the below C code: ```c node_t *partition(node_t *head, int x) { node_t *L = NULL, **p1 = &head, **p2 = &L; for (node_t *node = head; node; node = node->next) { node_t ***indir = node->val >= x ? (A01 /* fill your code */) : (A02 /* fill your code */); **indir = node; *indir = A03 /* fill your code */; } *p1 = L, *p2 = NULL; return head; } ``` Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. `A01`, `A02`, and `A03` are C expressions. You must obey the following rules when filling them: * Write shorter code as possible as you can. * Follow the consistent coding style. That is, we prefer **`node_t *L`** to `node_t* L`. Be aware of the **spaces**! Details determine success or failure. > * A01 = ? > ==`&p2`== > * A02 = ? > ==`&p1`== > * A03 = ? > ==`&(**indir)->next`== > `*indir = &node->next;` is another feasible answer aside from `*indir = &(**indr)->next;`. > > This is an example. At this moment, the program is at the beginning of iteration body. > > Green square means pointer to pointer to pointer to `node_t`. Blue square means pointer to pointer to `node_t`. Yellow square means pointer to `node_t`. > ![](https://i.imgur.com/adbw7uB.png) > > After ` node_t ***indir = node->val >= x ? &p2 : &p1; ` is executed, the indir will point to p1, because `node->value` is less than x.(4<6) > ![](https://i.imgur.com/mQMsMvc.png) > > Since `**indir = node;` has the same effect as `*p = node` in this scenario, after it is executed, the `next` pointed by `p1` will point to node. ![](https://i.imgur.com/3qqa4Cr.png) > > In the solutions, next instruction is `*indir = &(**indir)->next;`, which has the same effect as `p1 = &node->next` in this scenario. It makes `p1` points to `next` field of `node`. ![](https://i.imgur.com/NqKBwIB.png) > > After `*p1 = L, *p2 = NULL;` is executed, two list will be connected. ![](https://i.imgur.com/sKuSDUR.png) --- ## Problem `B` Consider the following Python class: ```python class Vector: def __init__(self, x, y): self.x = x self.y = y def transform(self, f): return Vector(f(self.x), f(self.y)) ``` This code has to be converted to C code below. ```c #include <stdlib.h> typedef struct vector { int x, y; } vector_t; vector_t *transform(vector_t *self, int (*f)(int)) { vector_t *nv = malloc(sizeof(vector_t)); nv->x = f(self->x), nv->y = f(self->y); return nv; } ``` Translate the `transform` function to RISC-V. The function takes inputs `self` in `a0` and `f` in `a1`, and returns output in `a0`. You may assume that `vector_t` is as defined in the C code. You may also assume that you have access to `malloc`, and that `malloc` and f each receive their argument in `a0`, and return their result in `a0`. Your solution MUST fit within the lines provided. ```c transform: addi sp, sp, B01 B02 B03 B04 B05 mv s0, a0 mv s1, a1 li a0, 8 jal ra, malloc mv s2, a0 lw a0, 0(s0) B06 sw a0, 0(s2) lw a0, 4(s0) B07 sw a0, 4(s2) mv a0, s2 B08 B09 B10 B11 addi sp, sp, B12 ret ``` Both `B06` and `B07` MUST use `jalr` instruction. You MUST use ABI name for accessing registers defined in [RISC-V Calling Convention](https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf). i.e, use `ra` instead of `x1` while it appears in `jalr` instruction. > * B01 = ? > ==`-16`== > * B02 = ? > ==`sw ra, 0(sp)`== > * B03 = ? > ==`sw s0, 4(sp)`== > * B04 = ? > ==`sw s1, 8(sp)`== > * B05 = ? > ==`sw s2, 12(sp)`== > * B06 = ? > ==`jalr ra, s1, 0`== > * B07 = ? > ==`jalr ra, s1, 0`== > * B08 = ? > ==`lw ra, 0(sp)`== > * B09 = ? > ==`lw s0, 4(sp)`== > * B10 = ? > ==`lw s1, 8(sp)`== > * B11 = ? > ==`lw s2, 12(sp)`== > * B12 = ? > ==`16`== --- ## Problem `C` The stack can hold local variables in addition to the `ra` register and the `s` registers. The following labels that were externally specified are available to you: * `password`: a pointer to a statically-stored string "secretpass" * `get_chars`: A function defined as follows: * Input: `a0` is a pointer to a buffer * Effect: Reads characters from stdin, and fills the buffer pointed to by `a0` with the read data, null-terminating the string. Your code may assume that the input is at most `19` characters, not including the null-terminator. * Output: None The function `verify_password` is defined as follows: * Input: No register input; however, the function receives a string input from stdin. * Output: `a0` returns `1` if the input from stdin is exactly "secretpass", and `0` otherwise. Complete the function `verify_password`. Each line contains exactly one instruction or pseudoinstruction. ```c= verify_password: addi sp, sp, -24 # Make space for a 20-byte buffer C01 # Your code here mv a0, sp jal ra, get_chars la t0, password mv t1, sp Loop: lb t2, 0(t0) lb t3, 0(t1) C02 # Your code here C03 # Your code here addi t0, t0, C04 # Your code here addi t1, t1, C05 # Your code here j Loop Pass: li a0, C06 # Your code here j End Fail: li a0, C07 # Your code here End: C08 # Your code here C09 # Your code here jr ra ``` You MUST use ABI name for accessing registers defined in [RISC-V Calling Convention](https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf). i.e, use `ra` instead of `x1` while it appears in `jalr` instruction. > * C01 = ? > ==`sw ra, 20(sp)`== > * C02 = ? > ==`bne t2, t3, Fail`== OR ==`bne t3, t2, Fail`== > * C03 = ? > ==`beq t2, x0, Pass`== OR ==`beq t2, zero, Pass`== OR ==`beq t3, x0, Pass`== OR ==`beq t3, zero, Pass`== > * C04 = ? > ==`1`== > * C05 = ? > ==`1`== > * C06 = ? > ==`1`== > * C07 = ? > ==`0`== > * C08 = ? > ==`lw ra, 20(sp)`== > * C09 = ? > ==`addi sp, sp, 24`== > **Line 3**: First, we note that Line 2 allocated 24 bytes of space on the stack. Line 3 is storing a 4-byte register value to the stack space that was just allocated. This looks a lot like the function prologues that we've seen before in projects and labs! > > A quick refresher on function prologues and epilogues: according to calling convention, there are some callee-saved registers whose original values must be preserved after the function returns. > To achieve this, we usually store the original register values on the stack at the beginning of the function (the prologue), and we restore the original register values at the end of the function (the epilogue). > > In the provided code, there is only one callee-saved register already being used: `ra` at line 5. All the other registers being used are `t` registers which do not need to be saved. > > Thus we should save the original value of `ra` on the stack so that we can restore it later. > > **Line 4**: First, we note that Line 5 is calling the `get_chars` function. Before we call a function, we need to make sure that its arguments are properly loaded into the argument (`a`) registers. > According to the question, the `get_chars` function takes in one argument, in the `a0` register. > Thus, this line is probably going to involve putting something into the `a0` register. > > According to the question, the argument in `a0` is a pointer to a buffer in memory, which the function will fill up with (at most `19`) input characters. In other words, the `get_chars` function requires 20 bytes of space in memory to store the input characters, so `a0` needs to store the address of some 20-byte space in memory where we can store a string. Where do we have 20 bytes of space in memory right now? We allocated 24 bytes on the stack in line 2, and we used 4 of those bytes in line 3, so we have 20 bytes on the stack that we can use! > > What is the address of this space on the stack? Remember that the stack pointer `sp` always contains the address of the bottom of the stack. Since the stack grows down to make space (see line 2), the 20 bytes of space is located at the bottom of the stack, and the address of this space is the address stored in `sp`. Note that in Line 3, we stored the saved `ra` register value at the top of the 24-byte space on the stack, so the 20-byte buffer starts at the very bottom of the stack, where `sp` is pointing. > > **Line 6**: Lines 6 and 7 initialize t0 and t1, which we can see will be used repeatedly in the loop. We initialize `t0` to be the address of the start of the Password string, and we initialize `t1` to be the address of the start of the user’s input string. This will let us compare each character of the strings in a loop later. > > One way to reason this out: this line is putting a value into `t0`. We see at line 9 that `t0` will be used as an address (because it's in parentheses). `la` is an instruction that loads addresses into a register. > > Another way to reason this out is to note that the Password label is involved in this instruction, so we must use an instruction that uses a label. Branches and jumps don't make sense here because we aren’t implementing any loops or if/else conditions yet, so by process of elimination, we should use `la` here. > > **Line 7**: One way to reason this out is to note that only two registers are provided, so we probably want a pseudoinstruction that uses only two registers. `mv` is the only logical pseudoinstruction to use here. > > **Lines 9-10**: We want to use the loop to compare each character of the input string with each character of the provided password string. In lines 6-7, we initialized `t0` and `t1` with the addresses of the strings. Now we need to dereference those addresses to load each character of the string from memory. Since each character is one byte long, `lb` is the appropriate load instruction to use here. > > One way to reason this out is to note that the 0(t0) syntax is used only in load and store instructions. The addresses in `t0` and `t1` already contain data (the hardcoded password string and the input string, respectively), so we don't want to store to those addresses. This means we must be loading from those addresses. > > **Line 11**: Now that we’ve loaded a character from each string from memory, we need to check if those characters match. If they don't match, we can stop checking characters in a loop and jump to the fail case. > > **Line 12**: If we've reached the end of the string without branching to the Fail case, then we've checked that every character is equal, and we should enter the Pass case. > > Remember that strings end with a null terminator, so if either loaded character is `0` (null byte), then we know that we reached the end of both strings. Note that Line 11 will check for us that both strings have reached the null terminator, since it checks that every character is equal. This is why the Pass branch must come after the Fail branch–if they were swapped, then this function would jump to Pass whenever the shorter string terminates, even if the longer string still has extra characters left. > > **Lines 13-14**: Once we finish checking a pair of characters, we need to increment the addresses to the next character. The next time the loop runs, the load instructions will now load the next character in the string. > > Recall that characters are 1 byte, and RISC-V indexes memory by bytes, so we should increment the addresses in t0 and t1 by 1. > > **Line 15**: If we didn't jump to the Fail or Pass case, after we increment the addresses, we need to keep executing the loop and check the next character. To do this, we should jump back to the start of the loop. This is not a function call, and we don’t care about saving our return value, so a simple jump instruction is sufficient. > > There is an alternate solution here that combines Line 12 (pass check) and Line 15 (jumping to loop). As in the standard Line 12, we check if t2 (or t3, which must be equal at this point) is equal to `0` (the null byte). If so, we’ve reached the end of the string, so we should continue execution normally to the Pass case immediately following the loop. If not, then we need to loop again, so we jump back to the start of the loop. This logic is captured in `bne t2, x0, Loop`. With this alternate solution, Line 12 can be blank (or any innocuous instruction that doesn’t affect program execution, e.g. `add t6, x0, x0`). > > **Line 17**: If the check passes, we should load 1 into the return value register `a0`. > > One way to reason this out: This line of code only executes if we jump to the Pass label. The only line of code that is unique to the case where the check passes is putting `1` in the return value register, indicating success. Other cleanup lines of code (e.g. function epilogue, return instruction) are common to both the Pass and Fail cases, so they are more suited to be in the End label. > > **Line 18**: If the check passes, we don't want to execute the code under the Fail case, so we should jump to the End label. Like in Line 15, this is not a function call, and we don’t care about saving our return value, so a simple jump instruction is sufficient. > > **Line 20**: If the check fails, we should load 0 into the return value register `a0`. The reasoning from Line 17 also applies to solving this blank. > > **Line 22**: At the end of the function, we need to undo any setup we did at the beginning of the function. In particular, we should write a function epilogue to restore all the saved register values on the stack. > > Since we stored the original value of `ra` 20 bytes above the stack pointer in the prologue, we should load from that same location in memory to restore `ra`. > > **Line 23**: As part of the function epilogue, we should also increment the stack pointer back up to delete the space we allocated earlier in the function. In Line 2, we decremented the stack pointer by `24`, so we should now increment the stack pointer by `24`. > > **Line 24**: Finally, we need to return from the function. --- ## Problem `D` In order to enhance their models in [machine learning](https://en.wikipedia.org/wiki/Machine_learning), some data scientists add random noise to a training dataset. Here, we will take a dataset and turn it into RISC-V data that can be used. The function `jitter` is defined as follows: * Inputs: – `a0` holds a pointer to an integer array. – `a1` holds a buffer large enough for n integers (which does not overlap with the array in `a0`). – `a2` holds `n`, the length of the arrays. * Output: – `a0` holds a pointer to the buffer originally in a1. The buffer is filled with the result of calling `noisen` on each element in the `a0` array. The function `noisen` is defined as follows: * Input: `a0` holds an integer. * Output: `a0` returns the integer with noise added. The implementation of `jitter` is provided below, following calling convention: ```c= jitter: # BEGIN PROLOGUE addi sp, sp, -20 # We need to make space for 5 registers on the stack, # and each register is 4 bytes long. # (multiple lines omitted) # END PROLOGUE mv s0, a0 mv s1, a1 # Hold beginning of output arr mv s2, a1 mv s3, a2 # Hold counter loop: beq s3, x0, end lw a0, 0(s0) jal ra, noisen sw a0, 0(s1) addi s0, s0, 4 addi s1, s1, 4 addi s3, s3, -1 j loop end: mv a0, s2 # BEGIN EPILOGUE # (multiple lines omitted) addi sp, sp, 20 # END EPILOGUE ret ``` Part 1: List all registers that we need to save on the stack in order to follow calling convention. __ D01 __ > * D01 = ? > ==`s0`, `s1`, `s2`, `s3`, `ra`== (in any order) > > The function modifies s0, s1, s2, s3. According to calling convention, the function must leave saved registers unchanged, so we have to save their values at the start of the function, and restore their values at the end of the function. > > Also, line 15 of the function (`jal ra, noisen`) modifies `ra`. According to calling convention, `ra` must stay unchanged throughout the function (so that we know what address/instruction to return to after the end of the function), so we also need to save its value at the start of the function, and restore its value at the end of the function. Part 2: Now, we want to implement `noisen` to add some random offset to an integer `a0`. Unfortunately, we only have access to the `randomBool` function, which takes in no inputs and randomly returns either `1` or `0` in `a0`. If we implemented `noisen` to return `a0 + randomBool()`, the integer would always get shifted in the positive direction. Instead, we suggest implementing `noisen` so that noisen alternates between returning `a0 + randomBool()` and returning `a0 - randomBool()`. Fill in the blanks to complete the above suggested implementation of `noisen`. Assume that **you can read from and write to any memory addresses, in any section of memory**. ```c= noisen: addi sp, sp, -8 # Prologue sw ra, 0(sp) sw s0, 4(sp) mv s0, a0 jal ra, randomBool add s0, s0, a0 D02 # Your code here xori t0, t0, 64 D03 # Your code here mv a0, s0 # Epilogue lw ra, 0(sp) lw s0, 4(sp) addi sp, sp, 8 ret ``` > * D02 = ? > ==`lb t0, 3(ra)`== > * D03 = ? > ==`sb t0, 3(ra)`== > > This was an extremely difficult question, so don't worry if you weren't sure how to approach this question. This question relies on the concept of modifying unprotected instruction memory, to create [self-modifying code](https://en.wikipedia.org/wiki/Self-modifying_code). The first thing to notice is that line 7 is currently set to add the random bool to the data; to alternate to a subtraction, we would need to change this instruction to a sub instruction. The second thing to notice is that `add` and `sub` in RV32 is one bit off from each other, given usage of the same registers. This occurs in the funct7 with add using `000 0000` and sub using `010 0000` (this was by design, as add and subtract are so close in function). > > Thus, our general approach can be described as follows: > * Load the add/sub instruction into a temporary register > * Flip the correct bit in the func7 to change the instruction into a `sub`/`add` instruction > * Store the instruction back in its original spot > > Two major challenges arise from this approach: Finding a pointer to that instruction, and flipping the correct bit. > > In order to load the `add`/`sub` instruction, we need to find a pointer to the code; further, we do not have sufficient lines to run a la instruction, so this address must already be stored in a current register. One insight here is that the ra register is currently pointing to a location in the code; more precisely, we called `jal ra randomBool` on line 6. This guarantees that `ra` was set to point to line 7, which is incidentally exactly the line we need to load. > > Note that this could also work even if there were extra lines between the function call and the add/sub, using a nontrivial offset. > > The second concern is to determine how to flip the correct bit. Note that add/sub differ in the bit at the 30th position of our instruction. If we attempt to load the entire instruction and flip the bit with a single `xori`, we would require an immediate of `0x4000 0000`, which is greater than our maximum immediate of `2047`. Thus, we instead need to load part of the instruction, using a `lh` or `lb`. If we use `lh` to load the top half of the instruction, we would have to modify the 14th bit position, yielding an immediate of `0x4000` = 16384; thus, we must use a `lb` instruction to load the top byte of the instruction. This would allow us to modify the 6th bit position, which corresponds to an immediate of `0x40` = 64. > > Thus, the final solution is as follows: > * Load the top byte of the add/sub instruction into a temporary register, using the ra as a code pointer: `lb t0, 3(ra)` > * Flip the correct bit in the func7 to change the instruction into a sub/add instruction: `xori t0, t0, 64` > * Store the instruction back in its original spot: `sb t0, 3(ra)` --- ## Problem `E` The following operations should be implemented in RISC-V 32-bit integer assembly. Any extensions (such as `mul` and the floating point or vector extensions) are NOT permitted. Per blank, just one line of RISC-V code may be entered. - [ ] `float_lt` * Inputs: `a0` and `a1` are positive non-NaN IEEE-754 32-bit floating point numbers. * Output: `a0` returns` 1` if `a0 < a1` and `0` otherwise. * Example: If the inputs `a0` and `a1` were `1.5` and `1.75`, the expected output would be `1`. If `a0` and `a1` were `1.5` and `1.5` respectively, the expected output would be `0`. ```c float_lt: E01 # Your code here ret ``` > * E01 = ? > ==`slt a0, a0, a1`== OR ==`sltu a0, a0, a1`== - [ ] `skipline` * Inputs: None * Output: None * Effect: Skip the next assembly instruction in the caller function. We are only using the 32-bit RISC-V ISA (no 16 bit extension) You may assume that the next line of code exists, and is not a pseudoinstruction. * Example: Assume that the following code was run: ``` addi t0, x0, 5 jal skipline addi t0, t0, 300 addi t0, t0, 10 ``` Then at the conclusion of this code, `t0` would be equal to `15`. ```c skipline: E02 # Your code here ret ``` > * E02 = ? > ==`addi ra, ra, 4`==