--- tags: computer-arch --- # Quiz2 of Computer Architecture (2021 Fall) > Solutions :::info :information_source: General Information * You are allowed to read **[lecture materials](http://wiki.csie.ncku.edu.tw/arch/schedule)**. * That is, an open book exam. * You shall not disclose your answer during the quiz. * Each answer has `3` points. * For RISC-V assembly, you CANNOT write any pseudo instructions. * Of course, you should answer everything **in English** except your formal name. * :timer_clock: 09:10 ~ 10:20AM on Oct 26, 2021 ::: ## Problem `A` We would like to implement an operation to reverse a [linked list](https://en.wikipedia.org/wiki/Linked_list) in memory, where the operand to this operation is the memory address of a pointer to the first node in the linked list (a pointer to a pointer). This operation has no destination register, but the operation zeroes the register specified by the operand upon completion (it does not preserve the operand). Every node in the linked list has the following structure. ```cpp struct node { struct node *next; void *value; } ``` Assume that pointers are 32 bits wide in this architecture. The next pointer is either the memory address of the next node in the list or is equal to 0 (`NULL`) to indicate the end of the linked list. The equivalent C code for this operation is provided below. ```cpp void reverse(struct node **head) { struct node *prev = NULL, *curr = *head; while (curr) { struct node *next = curr->next; curr->next = prev; prev = curr; curr = next; } } ``` The RISC-V assembly implementation is shown as following: ```cpp # head is passed in a0 # t0 holds prev # t1 holds next lw a0, 0(a0) beqz __A01__ addi t0, t0, 0 loop: lw t1, 0(a0) sw t0, 0(a0) addi t0, a0, 0 addi a0, t1, 0 __A02__ done: ``` Please complete the unfinished assembly to make the above work as the C counterpart does. You don't have to manipulate the RISC-V function calls for this operation. > * A01 = ? ==a0, done== > * A02 = ? ==bnez t1, loop== --- ## Problem `B` Consider the following C code: ```cpp unsigned multiply(unsigned x, unsigned y) { unsigned ret = 0; for (; y != 0; y--) ret += x; return ret; } ``` We can translate into the optimized RISC-V assembly, so that fewer dynamic instructions are executed for large values of `x` and `y`. A simple optimization is [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling). With some careful bookkeeping, this can reduce the number of additions. This implementation keeps `x + x` as a temporary value to cut the number of additions approximately in half. The following is more efficient than simply translating from C. ```cpp addi x3, x0, 0 # result = 0 andi __B01__ # x? = y – (y % 2) add x7, x1, x1 # x7 = x + x beq x3, x0, done # y is 0 beq __B02__ # skip ahead if y is even add x3, x0, x1 # set result to x if y is odd loop: add __B03__ # result = result + 2x addi x6, x6, -2 # x6 = x6 - 2 bge x6, x0, loop # repeat if x6 != 0 done: ``` > * B01 = ? ==x6, x2, -1== :::spoiler # x6 = y – (y % 2) ::: > * B02 = ? ==x6, x2, loop== > * B03 = ? ==x3, x3, x7== --- ## Problem `C` The function `strtriple` receives as input a null-terminated string, and writes a string where every letter is repeated three times. For example, if we called stringtriple on the input string ``"Hello World!"``, we would get the output string ``"HHHeeellllllooo WWWooorrrlllddd!!!"``. More specifically: `strtriple` has the following inputs: `a0` stores a null-terminated string. It is guaranteed that the string is well-formatted. Let the string have strlen of `n`. `a1` contains a pointer to a buffer of length at least $3n + 1$ characters, potentially containing garbage data. You may assume that the buffer does not overlap with the string stored in `a0`. `strtriple` outputs the following: `a0` should return the pointer to the buffer we initially received in `a1`. The buffer should contain the string originally provided in `a0`, with every character tripled. You must properly null-terminate the string, and for full credit must NOT replicate the null terminator at the end. For full credit, you must follow proper calling convention. ```cpp strtriple: mv t2, a1 loop: lbu __C01__ beq t0, x0, end slli __C02__ add t1, t1, t0 sh t1, 0(a1) sb __C03__ addi a0, a0, 1 __C04__ j loop end: sb x0, 0(a1) __C05__ jr ra ``` Please complete the unfinished assembly to make the above work as expected. > * C01 = ? ==t0, 0(a0)== > * C02 = ? ==t1, t0, 8== > * C03 = ? ==t0, 2(a1)== > * C04 = ? ==addi a1, a1, 3== > * C05 = ? ==addi a0, t2, 0== OR ==xori a0, t2, 0== --- ## Problem `D` Consider the following [Skip List](https://en.wikipedia.org/wiki/Skip_list) implementation: ```cpp struct SkipListNode { void *data; struct SkipListNode **next; } ``` Its `find` operation is shown below: ```cpp int SL_find(void *find, int level, struct SkipListNode *sl, (int)(*cmp)(void *, void *) { if (cmp(find, sl->data) == 0) return 1; if (level == 0 && sl->next[level] == NULL) return 0; if (sl->next[level] == NULL) return SL_find(find, level - 1, sl, cmp); if (cmp(find, sl->next[level]->data)) > 0)) return SL_find(find, level - 1, sl, cmp); return SL_find(find, level, sl->next[level], cmp); } ``` In translating the code `cmp(find, sl->next[level]->data))` into RISC-V, we need to store arguments on the stack. So we will have `find` at `sp(0)`, `level` at `sp(4)`, `sl` at `sp(8)` and `cmp` at `sp(12)`. We are going to break up the translation into pieces. Please write down the RISC-V assembly accordingly. 1. Load find into `a0` > * D01 = ? ==lw a0, sp(0)== 2. The next thing we need to do is get `sl->next[level]->data` into `a1`. The RISC-V code for that would be: (Hint: Load `sl` into `t0`) > * D02 = ? ==lw t0, sp(8)== 3. Load `level` into `t1` > * D03 = ? ==lw t1, sp(4)== 4. load `sl->next` into `t0` > * D04 = ? ==lw t0, t0(4)== 5. load `sl->next[level]` into `t0` ```cpp slli __D05__ add t0, t0, t1 lw t0, t0(4) ``` > * D05 = ? ==t1, t1, 4== 6. load data into `a0` > * D06 = ? ==lw a0, t0(0)== 7. Finally, how do we call `cmp` (using `t0` as a temporary): ```cpp lw __D07__ __D08__ ``` > * D07 = ? ==t0, 12(sp)== > * D08 = ? ==jalr ra, t0== --- ## Problem `E` You are supposed to implement RISC-V code per requested as less instructions as possible. Follow calling convention, use register mnemonic names (e.g., refer to `t0` rather than `x6`), and add commas and a single space between registers/arguments (e.g. `addi a0, a1, 2`). If you do not follow this, you may be misgraded. 1. Implement less-than for floating-point numbers. * Inputs: `a0` and `a1` are positive non-NaN IEEE-754 32-bit floating point numbers. * Output: `a0` returns `1` if `a0 < a1` and `0` otherwise. * Example: If the inputs `a0` and `a1` were `1.2` and `1.25`, the expected output would be `1`. If `a0` and `a1` were `1.1` and `1.1` respectively, the expected output would be `0`. ```cpp fp_less_than: __E01__ ret ``` > * E01 = ? ==slt a0, a0, a1== OR ==sltu a0, a0, a1== 2. Find the end position of [C-style string](https://en.wikipedia.org/wiki/C_string_handling). * Inputs: `a0` is a pointer to a nonempty string * Output: `a0` returns a pointer immediately after the end of the string. * Example: Let `a0` be the pointer `0x10000000` to string s, which is the string `"Hello"`. Then the expected output would be `0x10000006`. * NOTE: You may NOTE use `strlen` or any other library function. ```cpp end_of_str: lb t0, 0(a0) __E02__ __E03__ ``` > * E02 = ? ==addi a0, a0, 1== > * E03 = ? ==bneq t0, x0, end_of_str== OR ==bneq x0, t0, end_of_str== OR ==bnez t0, end_of_str== 3. Find the length of a null-terminated string in bytes. The function should accept a pointer to a null-terminated string and return an integer. Your solution must be recursive. ```cpp strlen: __E04__ beq t0, zero, basecase addi sp, sp, -4 __E05__ __E06__ jal strlen addi a0, a0, 1 __E07__ __E08__ ret basecase: addi a0, zero, 0 ret ``` > * E04 = ? ==lb t0, 0(a0)== > * E05 = ? ==sw ra, 0(sp)== > * E06 = ? ==addi a0, a0, 1== > * E07 = ? ==lw ra, 0(sp)== > * E08 = ? ==addi sp, sp, 4== 4. Arithmetically negate a [Two's Complement](https://en.wikipedia.org/wiki/Two%27s_complement) 32-bit integer without using the `sub`, `mul` or pseudo instructions. ```cpp negate: __E09__ addi a0, a0, 1 ret ``` > * E09 = ? ==xori a0, a0, -1== 5. In this question, you will implement a simple recursive function in RISC-V. The function takes a decimal number as input, then outputs its binary representation encoded in the decimal digits. ```cpp int find_binary(unsigned int decimal) { if (decimal == 0) return 0; return decimal % 2 + 10 * findBinary(decimal / 2); } ``` For example, if the input to this function is `10`, then the output would be `1010`. The equivalent RISC-V code: ```cpp find_binary: addi sp, sp, -8 # a0 will have arg and be where we return sw ra, 4(sp) # saving return address on the stack sw s0, 0(sp) # saving "decimal % 2" on the stack __E10__ # we will just return 0 andi s0, a0, 1 # set s0 to a0 % 2 __E11__ # set a0 to a0 / 2 jal ra, find_binary # recursive call li t0, 10 __E12__ add a0, a0, s0 # accumulating the stored return value "decimal % 2" postamble: lw ra, 4(sp) # Restore ra __E13__ # restore s0 __E14__ # restore sp end: jr ra ``` Hint: * The recursive part of the function stores all of the first part of the return value “decimal % 2” on the stack * The second part of the function and postamble are combing all the return values by `+` and `10 *` > * E10 = ? ==beq a0, x0, postamble== > * E11 = ? ==srli a0, a0, 1== > * E12 = ? ==mul a0, t0, a0== > * E13 = ? ==lw s0, 0(sp)== > * E14 = ? ==addi sp, sp, 8== 6. Implement the function [strncpy](https://man7.org/linux/man-pages/man3/strncpy.3p.html) which takes in two `char *` arguments and copies up to the first `n` characters from source into destination. If it reaches a null terminator, then it copies that value into destination and stops copying in characters. If there is no null terminator among the first n characters of source, the string placed in destination will not be null-terminated. `strncpy` returns a pointer to the destination string. You may not store anything on the stack for this problem. > `char* strncpy(char *destination, char *source, unsigned int n);` ```cpp strncpy: add t0, x0, x0 # Current length loop: __E15__ # Check if we have iterated over the length of the string add t1, a1, t0 # calculate the address of the t0-th (count-th) char in the string lb t2, 0(t1) # load this char __E16__ # calculate the same address, but off of the dst string sb t2, 0(t1) # store into the dst string addi t0, t0, 1 # increment our counter bne t1, x0, loop # check if the loaded char was null end: jr ra ``` > * E15 = ? ==beq t0, a2, end== > * E16 = ? ==add t1, a0, t0== --- ## Problem `F` Evaluate the RISC-V code. Write down the specific value in HEX. Reminder: always include the prefix in your answer. - [ ] Case 1 ```cpp auipc t0, 0xABCDE # Assume this instruction is at 0x100 addi t0, t0, 0xABC ``` Write down the value of `t0` in HEX. > * F01 = ? ==0xABCDDBBC== - [ ] Case 2 ```cpp li t0, 0xABCDEFAD sw t0, 0(s0) lb t0, 0(s0) ``` Write down the value of `t0` in hex. Assume big-endianness. > * F02 = ? ==0xFFFFFFAB== --- ## Problem `G` [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is a very clever datastructure for efficiently and probabilistically storing a set. Typically, it has two functions: check and insert. The basic idea for checking is that you hash what you are looking for multiple times. Each hash tells you a particular bit you need to set or check. So for checking you see if the bit is set. You repeat this for multiple iteration, with the hash including the iteration count (so each hash is different). If not all bits are set then the element does not exist in the bloom filter. If all bits are set then the element PROBABLY exists in the bloom filter. Similarly, for setting an element as present in a bloom filter you just set all those bits to `1`. We define the following structure. ```cpp #include <stdint.h> struct BloomFilter { uint32_t size; /* Size is # of bits, NOT BYTES, in the bloom filter */ uint16_t itercount; uint64_t (*)(void *data, uint16_t iter) hash; uint8_t *data; }; ``` For the insertion function, we need to set the appropriate bit for each iteration. ```cpp void insert(struct BloomFilter *b, void *element) { uint64_t bitnum; /* which bit we need to set */ for (int i = 0; i < b->itercount; ++i) { bitnum = b->hash(element, (uint16_t) i) % b->size; b->data[bitnum >> 3] = b->data[bitnum >> 3] | (1 << (bitnum & 0x7)); } } ``` We also have the following function to allocate a new bloom filter ```cpp struct BloomFilter *alloc(uint64_t (*)(void *data, uint16_t iter) hash, uint32_t size, uint16_t itercount) { struct BloomFilter *ret = malloc(64); ret->size = size; ret->data = calloc(size >> 3, 1); ret->hash = hash; ret->itercount = itercount; } ``` Complete the RISC-V translation necessary to allocate this: We will put `ret` in `s0`. ```cpp alloc: # Prolog addi sp, sp, __G01__ sw ra, 0(sp) sw s0, 4(sp) sw a0, 8(sp) sw a1, 12(sp) sw a2, 16(sp) # body addi a0, x0, 64 jal malloc mv s0, a0 # put ret in s0 __G02__ # load size into t0 __G03__ # store it __G04__ # div size by 8 with a shift li a1, 1 jal calloc sw a0, 12(s0) # store data __G05__ # load hash to t0 __G06__ # store it: Use the right type __G07__ # load itercount to t0 sh t0, 4(s0) # store it mv a0, s0 # epilog lw ra, 0(sp) lw s0, 4(sp) addi sp, sp, 20 jr ra ``` > * G01 = ? ==-20== > * G02 = ? ==lw t0, 12(sp)== > * G03 = ? ==sw t0, 0(s0)== > * G04 = ? ==srli a0, t0, 3== > * G05 = ? ==lw t0, 8(sp)== > * G06 = ? ==sw t0, 8(s0)== > * G07 = ? ==lw t0, 16(sp)== --- ## Problem `H` Please complete the translation by filling in the lines below with RISC-V assembly. The prologue and epilogue have been written correctly but are not shown. `strlen()`, both as a C function and RISC-V procedure, takes in one string as an argument and returns the length of the string (not including the null terminator). ```cpp /* Returns 1 if s2 is a substring of s1, and 0 otherwise. */ int is_substr(char *s1, char *s2) { int len1 = strlen(s1), len2 = strlen(s2); int offset = len1 - len2; while (offset >= 0) { int i = 0; while (s1[i + offset] == s2[i]) { i += 1 if (s2[i] == '\0') return 1; } offset -= 1; } return 0; } ``` The translated RISC-V Assembly: ```cpp is_substr: mv s1, a0 mv s2, a1 jal ra, strlen mv s3, a0 mv a0, s2 jal ra, strlen sub s3, s3, a0 Outer_Loop: blt s3, x0, False add t0, x0, x0 Inner_Loop: add t1, t0, s3 add t1, s1, t1 lbu t1, 0(t1) __H01__ __H02__ bne t1, t2, __H03__ addi t0, t0, 1 add t2, t0, s2 lbu t2, 0(t2) beq t2, x0, True jal x0, Inner_Loop Update_Offset: addi s3, s3, -1 __H04__ False: xor a0, a0, a0 jal x0, End True: addi a0, x0, 1 End: ``` > * H01 = ? ==add t2, s2, t0== > * H02 = ? ==lbu t2, 0(t2)== > * H03 = ? ==Update_Offset== > * H04 = ? ==jal x0, Outer_Loop== ---