Try   HackMD

Quiz2 of Computer Architecture (2022 Fall)

Solutions

Problem A

Given the head of a linked list and a value x, partition it such that all nodes less than x come before nodes greater than or equal to x.

You should preserve the original relative order of the nodes in each of the two partitions.

Example 1:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Input: head = [1,4,3,2,5,2], x = 3
Output: [1,2,2,4,3,5]

Example 2:
Input: head = [2,1], x = 2
Output: [1,2]

C definition for the above singly-linked list:

typedef struct _node {
    int val;
    struct _node *next;
} node_t;

The function prototype:

node_t *partition(node_t *head, int x);

The corresponding RISC-V implementation is listed as following:

    .text
    .align 2
    .globl partition
partition:
    addi    a5, sp, 12
    sw      a5, 24(sp)
    addi    a5, sp, 20
    sw      a0, 12(sp)
    sw      zero, 20(sp)
    sw      a5, 28(sp)
L1:
    bne     a0, zero, L4
    lw      a5, 24(sp)
    lw      a4, 20(sp)
    sw      a4, 0(a5)
    lw      a5, 28(sp)
    lw      a0, 12(sp)
    sw      zero, 0(a5)
    addi    sp, sp, 32
    jr      ra
L4:
    lw      a5, 0(a0)
    bge     a5, a1, L2
    lw      a4, 24(sp)
    addi    a5, sp, 24
L3:
    sw      a0, 0(a4)
    addi    a4, a0, 4
    sw      a4, 0(a5)
    lw      a0, 4(a0)
    j       L1
L2:
    lw      a4, 28(sp)
    addi    a5, sp, 28
    j       L3

Let's reverse-engineer to get the RISC-V assembly back to the below C code:

node_t *partition(node_t *head, int x)
{
    node_t *L = NULL, **p1 = &head, **p2 = &L;

    for (node_t *node = head; node; node = node->next) {
        node_t ***indir = node->val >= x ?
            (A01 /* fill your code */) : (A02 /* fill your code */);
        **indir = node;
        *indir = A03 /* fill your code */;
    }

    *p1 = L, *p2 = NULL;
    return head;
}

Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. A01, A02, and A03 are C expressions. You must obey the following rules when filling them:

  • Write shorter code as possible as you can.
  • Follow the consistent coding style. That is, we prefer node_t *L to node_t* L. Be aware of the spaces! Details determine success or failure.
  • A01 = ?
    &p2
  • A02 = ?
    &p1
  • A03 = ?
    &(**indir)->next
    *indir = &node->next; is another feasible answer aside from *indir = &(**indr)->next;.

This is an example. At this moment, the program is at the beginning of iteration body.

Green square means pointer to pointer to pointer to node_t. Blue square means pointer to pointer to node_t. Yellow square means pointer to node_t.

After node_t ***indir = node->val >= x ? &p2 : &p1; is executed, the indir will point to p1, because node->value is less than x.(4<6)

Since **indir = node; has the same effect as *p = node in this scenario, after it is executed, the next pointed by p1 will point to node.

In the solutions, next instruction is *indir = &(**indir)->next;, which has the same effect as p1 = &node->next in this scenario. It makes p1 points to next field of node.

After *p1 = L, *p2 = NULL; is executed, two list will be connected.


Problem B

Consider the following Python class:

class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def transform(self, f):
        return Vector(f(self.x), f(self.y))

This code has to be converted to C code below.

#include <stdlib.h>

typedef struct vector { int x, y; } vector_t;

vector_t *transform(vector_t *self, int (*f)(int))
{
    vector_t *nv = malloc(sizeof(vector_t));
    nv->x = f(self->x), nv->y = f(self->y);
    return nv;
}

Translate the transform function to RISC-V. The function takes inputs self in a0 and f in a1, and returns output in a0. You may assume that vector_t is as defined in the C code. You may also assume that you have access to malloc, and that malloc and f each receive their argument in a0, and return their result in a0. Your solution MUST fit within the lines provided.

transform:
    addi sp, sp, B01
    B02
    B03
    B04
    B05
    mv s0, a0
    mv s1, a1
    li a0, 8
    jal ra, malloc
    mv s2, a0
    lw a0, 0(s0)
    B06
    sw a0, 0(s2)
    lw a0, 4(s0)
    B07
    sw a0, 4(s2)
    mv a0, s2
    B08
    B09
    B10
    B11
    addi sp, sp, B12
    ret

Both B06 and B07 MUST use jalr instruction. You MUST use ABI name for accessing registers defined in RISC-V Calling Convention. i.e, use ra instead of x1 while it appears in jalr instruction.

  • B01 = ?
    -16
  • B02 = ?
    sw ra, 0(sp)
  • B03 = ?
    sw s0, 4(sp)
  • B04 = ?
    sw s1, 8(sp)
  • B05 = ?
    sw s2, 12(sp)
  • B06 = ?
    jalr ra, s1, 0
  • B07 = ?
    jalr ra, s1, 0
  • B08 = ?
    lw ra, 0(sp)
  • B09 = ?
    lw s0, 4(sp)
  • B10 = ?
    lw s1, 8(sp)
  • B11 = ?
    lw s2, 12(sp)
  • B12 = ?
    16

Problem C

The stack can hold local variables in addition to the ra register and the s registers. The following labels that were externally specified are available to you:

  • password: a pointer to a statically-stored string "secretpass"
  • get_chars: A function defined as follows:
    • Input: a0 is a pointer to a buffer
    • Effect: Reads characters from stdin, and fills the buffer pointed to by a0 with the read data, null-terminating the string. Your code may assume that the input is at most 19 characters, not including the null-terminator.
    • Output: None

The function verify_password is defined as follows:

  • Input: No register input; however, the function receives a string input from stdin.
  • Output: a0 returns 1 if the input from stdin is exactly "secretpass", and 0 otherwise.

Complete the function verify_password. Each line contains exactly one instruction or pseudoinstruction.

verify_password: addi sp, sp, -24 # Make space for a 20-byte buffer C01 # Your code here mv a0, sp jal ra, get_chars la t0, password mv t1, sp Loop: lb t2, 0(t0) lb t3, 0(t1) C02 # Your code here C03 # Your code here addi t0, t0, C04 # Your code here addi t1, t1, C05 # Your code here j Loop Pass: li a0, C06 # Your code here j End Fail: li a0, C07 # Your code here End: C08 # Your code here C09 # Your code here jr ra

You MUST use ABI name for accessing registers defined in RISC-V Calling Convention. i.e, use ra instead of x1 while it appears in jalr instruction.

  • C01 = ?
    sw ra, 20(sp)
  • C02 = ?
    bne t2, t3, Fail OR bne t3, t2, Fail
  • C03 = ?
    beq t2, x0, Pass OR beq t2, zero, Pass OR beq t3, x0, Pass OR beq t3, zero, Pass
  • C04 = ?
    1
  • C05 = ?
    1
  • C06 = ?
    1
  • C07 = ?
    0
  • C08 = ?
    lw ra, 20(sp)
  • C09 = ?
    addi sp, sp, 24

Line 3: First, we note that Line 2 allocated 24 bytes of space on the stack. Line 3 is storing a 4-byte register value to the stack space that was just allocated. This looks a lot like the function prologues that we've seen before in projects and labs!

A quick refresher on function prologues and epilogues: according to calling convention, there are some callee-saved registers whose original values must be preserved after the function returns.
To achieve this, we usually store the original register values on the stack at the beginning of the function (the prologue), and we restore the original register values at the end of the function (the epilogue).

In the provided code, there is only one callee-saved register already being used: ra at line 5. All the other registers being used are t registers which do not need to be saved.

Thus we should save the original value of ra on the stack so that we can restore it later.

Line 4: First, we note that Line 5 is calling the get_chars function. Before we call a function, we need to make sure that its arguments are properly loaded into the argument (a) registers.
According to the question, the get_chars function takes in one argument, in the a0 register.
Thus, this line is probably going to involve putting something into the a0 register.

According to the question, the argument in a0 is a pointer to a buffer in memory, which the function will fill up with (at most 19) input characters. In other words, the get_chars function requires 20 bytes of space in memory to store the input characters, so a0 needs to store the address of some 20-byte space in memory where we can store a string. Where do we have 20 bytes of space in memory right now? We allocated 24 bytes on the stack in line 2, and we used 4 of those bytes in line 3, so we have 20 bytes on the stack that we can use!

What is the address of this space on the stack? Remember that the stack pointer sp always contains the address of the bottom of the stack. Since the stack grows down to make space (see line 2), the 20 bytes of space is located at the bottom of the stack, and the address of this space is the address stored in sp. Note that in Line 3, we stored the saved ra register value at the top of the 24-byte space on the stack, so the 20-byte buffer starts at the very bottom of the stack, where sp is pointing.

Line 6: Lines 6 and 7 initialize t0 and t1, which we can see will be used repeatedly in the loop.
We initialize t0 to be the address of the start of the Password string, and we initialize t1 to be the address of the start of the user’s input string. This will let us compare each character of the strings in a loop later.

One way to reason this out: this line is putting a value into t0. We see at line 9 that t0 will be used as an address (because it's in parentheses). la is an instruction that loads addresses into a register.

Another way to reason this out is to note that the Password label is involved in this instruction, so we must use an instruction that uses a label. Branches and jumps don't make sense here because we aren’t implementing any loops or if/else conditions yet, so by process of elimination, we should
use la here.

Line 7: One way to reason this out is to note that only two registers are provided, so we probably want a pseudoinstruction that uses only two registers. mv is the only logical pseudoinstruction to use here.

Lines 9-10: We want to use the loop to compare each character of the input string with each
character of the provided password string. In lines 6-7, we initialized t0 and t1 with the addresses
of the strings. Now we need to dereference those addresses to load each character of the string
from memory. Since each character is one byte long, lb is the appropriate load instruction to use here.

One way to reason this out is to note that the 0(t0) syntax is used only in load and store instructions.
The addresses in t0 and t1 already contain data (the hardcoded password string and the input
string, respectively), so we don't want to store to those addresses. This means we must be loading
from those addresses.

Line 11: Now that we’ve loaded a character from each string from memory, we need to check if those characters match. If they don't match, we can stop checking characters in a loop and jump to the fail case.

Line 12: If we've reached the end of the string without branching to the Fail case, then we've checked that every character is equal, and we should enter the Pass case.

Remember that strings end with a null terminator, so if either loaded character is 0 (null byte), then we know that we reached the end of both strings. Note that Line 11 will check for us that both strings have reached the null terminator, since it checks that every character is equal. This is why the Pass branch must come after the Fail branch–if they were swapped, then this function would jump to Pass whenever the shorter string terminates, even if the longer string still has extra characters left.

Lines 13-14: Once we finish checking a pair of characters, we need to increment the addresses to the next character. The next time the loop runs, the load instructions will now load the next character in the string.

Recall that characters are 1 byte, and RISC-V indexes memory by bytes, so we should increment the addresses in t0 and t1 by 1.

Line 15: If we didn't jump to the Fail or Pass case, after we increment the addresses, we need to
keep executing the loop and check the next character. To do this, we should jump back to the start of the loop. This is not a function call, and we don’t care about saving our return value, so a simple jump instruction is sufficient.

There is an alternate solution here that combines Line 12 (pass check) and Line 15 (jumping to loop). As in the standard Line 12, we check if t2 (or t3, which must be equal at this point) is equal to 0 (the null byte). If so, we’ve reached the end of the string, so we should continue execution normally to the Pass case immediately following the loop. If not, then we need to loop again, so we jump back to the start of the loop. This logic is captured in bne t2, x0, Loop. With this alternate solution, Line 12 can be blank (or any innocuous instruction that doesn’t affect program execution, e.g. add t6, x0, x0).

Line 17: If the check passes, we should load 1 into the return value register a0.

One way to reason this out: This line of code only executes if we jump to the Pass label. The only line of code that is unique to the case where the check passes is putting 1 in the return value register, indicating success. Other cleanup lines of code (e.g. function epilogue, return instruction) are common to both the Pass and Fail cases, so they are more suited to be in the End label.

Line 18: If the check passes, we don't want to execute the code under the Fail case, so we should jump to the End label. Like in Line 15, this is not a function call, and we don’t care about saving our return value, so a simple jump instruction is sufficient.

Line 20: If the check fails, we should load 0 into the return value register a0. The reasoning from Line 17 also applies to solving this blank.

Line 22: At the end of the function, we need to undo any setup we did at the beginning of the function. In particular, we should write a function epilogue to restore all the saved register values on the stack.

Since we stored the original value of ra 20 bytes above the stack pointer in the prologue, we should load from that same location in memory to restore ra.

Line 23: As part of the function epilogue, we should also increment the stack pointer back up to delete the space we allocated earlier in the function.
In Line 2, we decremented the stack pointer by 24, so we should now increment the stack pointer by 24.

Line 24: Finally, we need to return from the function.


Problem D

In order to enhance their models in machine learning, some data scientists add random noise to a training dataset. Here, we will take a dataset and turn it into RISC-V data that can be used.

The function jitter is defined as follows:

  • Inputs:
    a0 holds a pointer to an integer array.
    a1 holds a buffer large enough for n integers (which does not overlap with the array in a0).
    a2 holds n, the length of the arrays.
  • Output:
    a0 holds a pointer to the buffer originally in a1. The buffer is filled with the result of calling noisen on each element in the a0 array.

The function noisen is defined as follows:

  • Input: a0 holds an integer.
  • Output: a0 returns the integer with noise added.

The implementation of jitter is provided below, following calling convention:

jitter: # BEGIN PROLOGUE addi sp, sp, -20 # We need to make space for 5 registers on the stack, # and each register is 4 bytes long. # (multiple lines omitted) # END PROLOGUE mv s0, a0 mv s1, a1 # Hold beginning of output arr mv s2, a1 mv s3, a2 # Hold counter loop: beq s3, x0, end lw a0, 0(s0) jal ra, noisen sw a0, 0(s1) addi s0, s0, 4 addi s1, s1, 4 addi s3, s3, -1 j loop end: mv a0, s2 # BEGIN EPILOGUE # (multiple lines omitted) addi sp, sp, 20 # END EPILOGUE ret

Part 1: List all registers that we need to save on the stack in order to follow calling convention. __ D01 __

  • D01 = ?
    s0, s1, s2, s3, ra (in any order)

The function modifies s0, s1, s2, s3. According to calling convention, the function must leave saved registers unchanged, so we have to save their values at the start of the function, and restore their values at the end of the function.

Also, line 15 of the function (jal ra, noisen) modifies ra. According to calling convention, ra must stay unchanged throughout the function (so that we know what address/instruction to return to after the end of the function), so we also need to save its value at the start of the function, and restore its value at the end of the function.

Part 2: Now, we want to implement noisen to add some random offset to an integer a0. Unfortunately, we only have access to the randomBool function, which takes in no inputs and randomly returns either 1 or 0 in a0. If we implemented noisen to return a0 + randomBool(), the integer would always get shifted in the positive direction. Instead, we suggest implementing noisen so that noisen alternates between returning a0 + randomBool() and returning a0 - randomBool().
Fill in the blanks to complete the above suggested implementation of noisen. Assume that you can read from and write to any memory addresses, in any section of memory.

noisen: addi sp, sp, -8 # Prologue sw ra, 0(sp) sw s0, 4(sp) mv s0, a0 jal ra, randomBool add s0, s0, a0 D02 # Your code here xori t0, t0, 64 D03 # Your code here mv a0, s0 # Epilogue lw ra, 0(sp) lw s0, 4(sp) addi sp, sp, 8 ret
  • D02 = ?
    lb t0, 3(ra)
  • D03 = ?
    sb t0, 3(ra)

This was an extremely difficult question, so don't worry if you weren't sure how to approach this question. This question relies on the concept of modifying unprotected instruction memory, to create self-modifying code. The first thing to notice is that line 7 is currently set to add the random bool to the data; to alternate to a subtraction, we would need to change this instruction to a sub instruction. The second thing to notice is that add and sub in RV32 is one bit off from each other, given usage of the same registers. This occurs in the funct7 with add using 000 0000 and sub using 010 0000 (this was by design, as add and subtract are so close in function).

Thus, our general approach can be described as follows:

  • Load the add/sub instruction into a temporary register
  • Flip the correct bit in the func7 to change the instruction into a sub/add instruction
  • Store the instruction back in its original spot

Two major challenges arise from this approach: Finding a pointer to that instruction, and flipping the correct bit.

In order to load the add/sub instruction, we need to find a pointer to the code; further, we do not have sufficient lines to run a la instruction, so this address must already be stored in a current register. One insight here is that the ra register is currently pointing to a location in the code; more precisely, we called jal ra randomBool on line 6. This guarantees that ra was set to point to line 7, which is incidentally exactly the line we need to load.

Note that this could also work even if there were extra lines between the function call and the add/sub, using a nontrivial offset.

The second concern is to determine how to flip the correct bit. Note that add/sub differ in the bit at the 30th position of our instruction. If we attempt to load the entire instruction and flip the bit with a single xori, we would require an immediate of 0x4000 0000, which is greater than our maximum immediate of 2047. Thus, we instead need to load part of the instruction, using a lh or lb. If we use lh to load the top half of the instruction, we would have to modify the 14th bit position, yielding an immediate of 0x4000 = 16384; thus, we must use a lb instruction to load the top byte of the instruction. This would allow us to modify the 6th bit position, which corresponds to an immediate of 0x40 = 64.

Thus, the final solution is as follows:

  • Load the top byte of the add/sub instruction into a temporary register, using the ra as a code pointer: lb t0, 3(ra)
  • Flip the correct bit in the func7 to change the instruction into a sub/add instruction:
    xori t0, t0, 64
  • Store the instruction back in its original spot: sb t0, 3(ra)

Problem E

The following operations should be implemented in RISC-V 32-bit integer assembly. Any extensions (such as mul and the floating point or vector extensions) are NOT permitted. Per blank, just one line of RISC-V code may be entered.

  • float_lt
  • Inputs: a0 and a1 are positive non-NaN IEEE-754 32-bit floating point numbers.
  • Output: a0 returns 1 if a0 < a1 and 0 otherwise.
  • Example: If the inputs a0 and a1 were 1.5 and 1.75, the expected output would be 1. If a0 and a1 were 1.5 and 1.5 respectively, the expected output would be 0.
float_lt:
    E01 # Your code here
    ret
  • E01 = ?
    slt a0, a0, a1 OR sltu a0, a0, a1
  • skipline
  • Inputs: None
  • Output: None
  • Effect: Skip the next assembly instruction in the caller function. We are only using the 32-bit RISC-V ISA (no 16 bit extension) You may assume that the next line of code exists, and is not a pseudoinstruction.
  • Example: Assume that the following code was run:
    ​​​​addi t0, x0, 5
    ​​​​jal skipline
    ​​​​addi t0, t0, 300
    ​​​​addi t0, t0, 10
    

Then at the conclusion of this code, t0 would be equal to 15.

skipline:
    E02 # Your code here
    ret
  • E02 = ?
    addi ra, ra, 4