Try   HackMD

Computer Architecture Final project: Annotate and explain Quiz2

Problem A

Consider that we plan to deploy a custom RISC-V processor for a space mission, which requires additional data protection in memory. We have decided to implement a Hamming code with even parity to safeguard our data. For this question, please refer to the parity table shown below.

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Please note that in this context, bit 0 is considered the least significant bit (LSB).

What Are Bit Flips And How Are Spacecraft Protected From Them?

  • Part 1

Implement a RISC-V function called calc_parity that takes a 4-byte input in a0 and calculates the parity of its bits. It should return 1 for odd parity or 0 for even parity. For example:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

calc_parity must adhere to the RISC-V calling convention.

calc_parity:
    li t1, 0         # t1 holds current parity value in LSB
loop:
    xor t1, t1, a0   # adds the current least significant bit
                     # in a0 to the parity value in t1
    __A01__          # shift to the next bit in a0
    bne a0, x0, loop # loop if there are more than 1 bit left in a0
                     # 0 bits will never affect parity
    __A02__          # we only want the one parity bit
                     # in bit 0 of t1
    jr ra
  • A01 =?
    srli a0, a0, 1
    It is important to note that this is a logical shift, as using an arithmetic shift could potentially lead to an infinite loop if bit 31 is set.
  • A02 =?
    andi a0, t1, 1

📝 Explain and Annotate

  • Hamming code:Hamming code is a set of error-correction codes that can be used to detect and correct the errors that can occur when the data is moved or stored from the sender to the receiver.
    Hamming code formula :
    2k>=N+K+1

If we have an 8-bit data: 10101111 and need to encrypt it using Hamming code

1 2 3 4 5 6 7 8 9 10 11 12
1 0 1 0 1 1 1 1

By performing XOR operations using the following table, we obtain:

8 4 2 1
3 0 0 1 1
6 0 1 1 0
9 1 0 0 1
10 1 0 1 0
11 1 0 1 1
12 1 1 0 0
0 0 0 0

The encoded result after encryption is

1 2 3 4 5 6 7 8 9 10 11 12
1 0 1 0 0 1 0 0 1 1 1 1

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

For A01 # shift to the next bit in a0, we need to ensure that each right shift operation fills with 0 instead of 1. Therefore, we use a logical shift (srli a0, a0, 1) to shift the value stored in register a0 to the right. If we were to use an arithmetic shift (srai a0, a0, 1), it would continuously fill with 1, causing an infinite loop。

__A02__ # we only want the one parity bit
We only need to retain one bit for identifying the parity, so we directly perform andi a0, t1, 1 and store it in register a0 as the function return value。

easy test for calc_parity

.data 
    num1: .word 0x01
    result: .word 0x00   
    
.text
main:
    la s0, num1        
    lw a0, 0(s0)       
    jal calc_parity     
    la s0, result      
    sw a0, 0(s0)        
    jal end            

calc_parity:
    li t1, 0           
loop:
    xor t1, t1, a0    
    srli a0, a0, 1     
    bnez a0, loop       
    andi a0, t1, 1     
    jr ra             
    
end:
    li a7, 10          
    ecall               

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

  • Part 2
    Implement store_nibble, a RISC-V function that takes four bits of data in a0, encodes them into seven bits (as a byte with 0 in the most significant bit [MSB]), and stores the result at the memory address specified in a1. This function does not return any value. You may assume that calc_parity has been correctly implemented and adheres to the calling convention, but do not assume any specific implementation details of calc_parity. Also, you may assume that the most significant 28 bits of the argument in a0 are set to 0.
store_nibble:
    # prologue omitted
    mv s0, a0
    mv s1, a1
    srli s7, s0, 1      # set d3 through d1
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1110 # parity of d1, d2, and d3
    jal ra, calc_parity # compute p2
    add s7, s7, a0
    slli s7, s7, 1      # make space for next bit
    andi t0, s0, 1      # extract d0
    add s7, s7, t0
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1101 # parity of d0, d2, and d3
    jal ra, calc_parity # compute p1
    add s7, s7, a0
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1011 # parity of d0, d1, and d3
    jal ra, calc_parity # compute p0
    add s7, s7, __A03__
    sb __A04__, __A05__ # s1 contains the memory address passed in as a1
    # epilogue omitted
    jr ra
  • A03=?
    a0
  • A04=?
    s7
  • A05=?
    0(s1)

📝 Explain and Annotate

The a0 register contains 4 bits of input data (d3, d2, d1, d0), and the highest 28 bits are set to 0. The a1 register is the memory address where the result will be stored. The calc_parity function is used to calculate the parity bit for the specified position.

add s7, s7, __A03__
A03 corresponds to the a0 register because we intend to store the result obtained through calc_parity in s7. This is because s7 is a callee-saved register, which means it is used for saving and restoring values during function calls

Modify Assembly for easy test

    ....
main:
    la s0, num1        
    lw a0, 0(s0)        
    la a1, result    
    jal store_nibble 
   
    la a1, end         
    jalr a1   
    store_nibble:
    # Store the original value of a0
    mv t2, a0
    # Calculate p2
    andi a0, t2, 0b0100 
    slli a0, a0, 1     
    jal calc_parity  
    mv t3, a0       
    # Calculate p1
    andi a0, t2, 0b0010 # Mask for bit positions d0, d2
    slli a0, a0, 2      # Shift left to position the bits for parity calculation
    jal calc_parity     # Call calc_parity function
    slli a0, a0, 1      # Shift left to make space for p2
    ....

Observing its pipelineS

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

I suspect that this will cause an infinite loop. I think it might be due to an issue with my test program's registers. I will find some time to retest this and observe its pipeline。

  • Part 3
    List all registers that need to be saved in the prologue and restored in the epilogue in order for store_nibble to follow the calling convention. If there are none, write N/A. __ A06 __
  • A06=?
    s0, s1, s7, ra
    All of these registers are used in the question template and must, therefore, be saved and restored to follow the calling convention. (While ra is not strictly a saved register, store_nibble is calling functions and, therefore, needs to save ra as well.) Any additional saved registers used also need to be saved and restored.
    sp is a special case; even though it is callee-saved, it is not usually saved and restored from the stack. Thus, it was ignored.

📝 Explain and Annotate

  • Prologue/Epilogue

For RISC-V functions, the prologue/epilogue is used to allocate stack space for temporary registers during the execution of a RISC-V function, as illustrated in the following diagram
1705060025680
Establishing the stack address through the prologue/epilogue is crucial. Not only is this a fundamental aspect of constructing functions, but it also ensures that the stack's position and values are correctly preserved, making it easier to debug

  • Callee and Caller Saved

Callee Saved:The callee function can use them freely
(if needed, the caller had to save them before invoking and will restore them afterwards)

Caller Saved:The callee function must save them before modifying them, and restore them before returning (avoid using them at all, and no need to save)

for store_nibble function

store_nibble:
    # prologue omitted
    addi sp,sp ,-16
    sw s0,0(sp)
    sw s1,4(sp)
    sw s7,8(sp)
    sw ra,12(sp)
    mv s0, a0
    mv s1, a1
    srli s7, s0, 1      # set d3 through d1
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1110 # parity of d1, d2, and d3
    jal ra, calc_parity # compute p2
    add s7, s7, a0
    slli s7, s7, 1      # make space for next bit
    andi t0, s0, 1      # extract d0
    add s7, s7, t0
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1101 # parity of d0, d2, and d3
    jal ra, calc_parity # compute p1
    add s7, s7, a0
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1011 # parity of d0, d1, and d3
    jal ra, calc_parity # compute p0
    add s7, s7, a0
    sb s7, 0(s1) # s1 contains the memory address passed in as a1
    #  epilogue omitted
    addi sp,sp ,16
    lw s0,0(sp)
    lw s1,4(sp)
    lw s7,8(sp)
    lw ra,12(sp)
    jr ra

We can determine that the store_nibble function operates with the need to save and restore the following registers: ra, s0, s1, s7. Among these, ra is considered Caller Saved, while the rest, s0, s1, and s7, are Callee Saved. However, sp is a special case, as it serves as a pointer to the stack, and the changes to the stack are sequential. Therefore, there is no need to specifically store and restore memory addresses in the prologue/epilogue since the sp address will be restored upon the function's completion。

.data 
 num1: .word 3
.text

main:   
    la s5,num1
    lw a0,0(s5)       
    jal store_nibble
    jal print_result
    jal exit
    
print_result:
    mv a0,s0 
    li a7,1
    ecall
    jr ra
        
calc_parity:
    li t1, 0         # t1 holds current parity value in LSB
loop:
    xor t1, t1, a0   # adds the current least significant bit
                     # in a0 to the parity value in t1
    srli a0,a0,1         # shift to the next bit in a0
    bne a0, x0, loop # loop if there are more than 1 bit left in a0
                     # 0 bits will never affect parity
    andi a0,t1,1      # we only want the one parity bit
                     # in bit 0 of t1
    jr ra
store_nibble:
    addi sp,sp,-16
    sw   ra,0(sp)
    sw   s0,4(sp)
    sw   s1,8(sp)
    sw   s7,12(sp)
    # prologue omitted
   
    mv s0, a0
    mv s1, a1
    srli s7, s0, 1      # set d3 through d1
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1110 # parity of d1, d2, and d3
    jal  ra,calc_parity # compute p2
    add s7, s7, a0
    slli s7, s7, 1      # make space for next bit
    andi t0, s0, 1      # extract d0
    add s7, s7, t0
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1101 # parity of d0, d2, and d3
    jal  ra,calc_parity # compute p1
    add s7, s7, a0
    slli s7, s7, 1      # make space for next bit
    andi a0, s0, 0b1011 # parity of d0, d1, and d3
    jal ra, calc_parity # compute p0
    add s7, s7, a0
    sb s7, 0(s1) # s1 contains the memory address passed in as a1
    #  epilogue omitted
    lw ra,0(sp)
    lw s0,4(sp)
    lw s1,8(sp)
    lw s7,12(sp)
    addi sp,sp,16
    jr ra
   
exit:
    li a7,10
    ecall
    

我們可以觀察其pipeline以及記憶體位置螢幕擷取畫面 2024-01-22 214813

Input Output
3(0b11) 0

Problem B

Suppose you are tasked with writing a file system in C. The struct file_item represents either a file or a directory. The data union can hold either the contents of a file (as a string) or an array of pointers to child file_items. For the purposes of this question, assume that pointers are 4 bytes in length.

#include <stdbool.h>
typedef struct file_item {
    char *name;
    bool is_directory;
    file_item_data data;
} file_item;

typedef union file_item_data {
    char contents[X];
    struct file_item *children[16];
} file_item_data;

/* Copies all characters from src to dest including the NULL terminator */
char *strcpy(char *dest, const char *src);
  • Part 1
    We set X to the maximum value that does not increase the union size. What is the maximum strlen of a string we can store in a single file? __ B01 __
  • B01 =?
    63
    The size of a union is determined by its largest element, disregarding alignment rules, which we will not consider here. In a 32-bit system, pointers are 32 bits (4 bytes). children is an array of 16 pointers, occupying
    bytes. Therefore, we must ensure that char contents[X] does not exceed 64 bytes. Given that, since 1 byte is defined as the size of a char by default, setting X = 64 fulfills our goal. Consequently, strlen(contents) would return 63 because it does not count the null terminator in the string length.

📝 Explain and Annotate

In the C language, the length of a string is measured using strlen by counting characters until it encounters the null terminator

\0. Therefore, when using this function to measure the length of a character array, you must provide a null terminator at the end of the array to correctly indicate the end of the string.

In the file_item_data union, you can see an array of pointers to children of the contents array. There are a total of 16 pointers, so the value of X in contents[X] ranges from

4×16=64. However, since we need to leave 1 byte for the correct null-terminated char, the maximum value of X is
641=63

  • Part 2
    Write code to create files and directories. Ensure that your code still works even if the input strings are freed later on. Assume that the input strings will fit inside a file_item_data union.
/* Creates a file with the given name and contents,
 * and returns a pointer to that file.
 */
file_item *create_file(char *contents, const char *name)
{
    file_item *new_file = calloc(1, sizeof(file_item));
    new_file->name = name;
    B02 /* Fill your code here */ ;
    return new_file;
}

/* Creates a directory with the given name and no children,
 * and returns a pointer to that directory.
 */
file_item *create_directory(const char *name)
{
    file_item *new_dir = calloc(1, sizeof(file_item));
    new_dir->name = name;
    B03 /* Fill your code here */ ;
    return new_dir;
}

Your solution should be fully correct, without any memory leaks, and should not include any additional lines.

  • B02 =?
    strcpy(new_file->data.contents, contents)
  • B03 =?
    new_dir->is_directory = true

📝 Explain and Annotate

The use of calloc and malloc functions for dynamic memory allocation is different. With malloc, it only allocates a dynamic memory location without initializing the space's values. On the other hand, calloc not only allocates dynamic memory but also initializes its values. For the functions used in B02 and B03, this is advantageous when dealing with pointers within structures

B02 /* Fill your code here */; is a placeholder for filling in the create_file function. In this function, the Contents are allocated into the file_item structure, and data can store the content of a file or a pointer array to file_item. Here, we need to copy the contents to data.contents, so we use the standard C library function strcpy to achieve this

Create a simple test case to observe its memory Address

...
int main() {
    const char *file_name = "testfile.txt";
    const char *file_contents = "Hello, file!";
    const char *dir_name = "testdir";

    file_item *file = create_file(file_contents, file_name);
    file_item *dir = create_directory(dir_name);

    printf("File name: %s, Address: %p\n", file->name, (void*)file->name);
    printf("File contents: %s, Address: %p\n", file->data.contents, (void*)file->data.contents);
    printf("Directory name: %s, Address: %p\n", dir->name, (void*)dir->name);

    // Clean up
    free(file->name);
    free(file);
    free(dir->name);
    free(dir);

    return 0;
}
...

Output:

File_name File_contents Directory name
Address 00000000001D98E0 00000000001D9850 00000000001D99A0

From the output, it can be observed that the name strings are randomly allocated, and their addresses differ from the other parts of the File_item struct. The file contents are directly stored in the data union

Problem C

Consider a 16-bit Fixed-point arithmetic
with 1 sign bit, 5 integer bits, and 10 fraction bits. The 10 fraction bits continue where the integer bits left off, representing

21,
22
,,
210
For example, the bit representation 0b0 01101 1010000000represents
(23+22+20)+(21+23)=13.625

In this question, we will compare this fixed-point system to a 16-bit floating-point system that adheres to all IEEE-754 conventions, including denorms, NaNs, etc., with a 5-bit exponent and an exponent bias of -15.

  • Part1
    Write
    22.375
    in hexadecimal using the 16-bit fixed-point system described above. __ C01 __
  • C01 =?
    0xD980
    The sign bit is 0b1, as the number is negative.
    The integer bits are obtained by converting 22 to unsigned binary, giving us 0b10110.
    The fractional bits are obtained by converting 0.375 to unsigned binary, resulting in 0b0.0110000000, with the fractional part being 0b0110000000.
    In total, we have 0b1101100110000000, which is equivalent to 0xD980 in hexadecimal.

Write

22.375 in hexadecimal using the 16-bit floating point system system described above. __ C02 __

  • C02 =?
    0xCD98
    The floating point system in this question has 16 bits total, including 1 sign bit and 5 exponent
    bits. That leaves 10 bits for the mantissa.
    First, write 22.375 in binary: 0b1 0110.011
    Then, move the binary point to create a number of the form 0b1.MM MMMM MMMM ×
    2E
    This gives us 0b1.01 1001 1000 ×
    24
    The mantissa bits can be read directly from this form: 0b01 1001 1000. Remember to drop the implicit 1 to the left of the binary point.
    The exponent is −4, which we can convert to bias notation by subtracting the bias:
    4(15)=19
    In unsigned 5-bit binary, this is 0b10011.
    The sign bit is 0b1, since the number is negative.
    In total, we have 0b1 10011 01 1001 1000, which is 0xCD98 in hex.

📝 Explain and Annotate

  • Fixed-points arithmetic: we can define a fixed point number type simply by implicitly fixing the binary point to be at some position of a numeral. We will then simply adhere to this implicit convention when we represent numbers
    For example, with an 8-bit value, fixed<8,3> indicates that the last 3 bits represent the fractional part
0 0 0 1 0 1 1 0

This represents the number 0b00010.110 =

21+21+22=2.75

For C01, the value of

22.375 is represented as follows: the sign bit is 1 since it's a negative number (0b1), the integer part is
22
, represented as 0b10110 in binary, and the fractional part is
0.375
, which in binary is 0b0110000000. Therefore, in binary, it is 0b1101100110000000.

For C02, to convert -22.375 to a 16-bit floating-point representation following IEEE 754 rules, which includes 1 sign bit, 5 exponent bits, and 10 mantissa bits, you can first break it down.

The integer part,

22, is represented as 0b10110 in binary, and the fractional part,
0.375
, is 0b011. So, the binary representation is 0b10110.011.
S
In normalized form, it becomes 0b1.0110 0110 0110 0110 ×
24
.

The mantissa bits are 0110011000, and the exponent bits are 4. Adding the bias of -15, we get an exponent of

4(15)=19.

Converting 19 to binary gives 0b10011. Finally, the sign bit is 0b1, so when combined, it becomes 0b1100110110011000.

Converting each nibble to hexadecimal, we get 0xCD98

  • Part2
    How many numbers in the range
    [16,64)
    (including 16, excluding 64) can be represented by the fixed-point system described above? __ C03 __
  • C03 =?
    214

    16 in the fixed point system is 0b10000.0000000000. The largest representable number in the fixed point system is 0b11111.1111111111, which is
    25210
    which is just under 32.
    In between these two numbers, we can choose every bit except the most-significant integer bit to be 0 or 1. That gives us 4 integer bits and 10 fractional bits that could all be 0 or 1. Each bit pattern represents a different number, so we have
    214
    representable numbers in total.

📝 Explain and Annotate

For a 16-bit fixed-point system, when the binary representation is 0b10000.0000000000, it means that the maximum representable value is 0b11111.1111111111, which is slightly less than 32, as it represents

25210.

Within this range of numbers, we can choose different values for each bit except the first sign bit. This gives us a total of 14 bits to use. For each of these 14 bits, we have the option to fill it with either 0 or 1. Therefore, there are a total of

214 possible combinations.

Problem D

  • Part 1
    Suppose we have the following program.
int remainder(int a, int b)
{
    while (a >= b)
        a -= b;
    return a;
}

For a RISC architecture (RV32 in particular), our RISC-V compiler generates the following assembly code.

loop:
    blt x1, x2, done
    sub x1, x1, x2
    j loop
done:
    ...

Assume that RISC-V instructions are 4 bytes each. Assume data values are 4 bytes each.

How many bytes is the RISC-V program? __ D01 __

  • D01 =?
    12

Assume a=7 and b=3. How many data bytes are loaded/stored in the RISC program? __ D02 __

  • D02 =?
    0

📝 Explain and Annotate

Assume that RISC-V instructions are 4 bytes each. Assume data values are 4 bytes each

loop:
    blt x1, x2, done
    sub x1, x1, x2
    j loop
done:
    ...

There are only three instructions in this RISC-V program: blt, sub, and j. Each instruction occupies 4 bytes in memory, so the total size of the program is

4×3=12 bytes. Therefore, the answer for D01 is 12 bytes.

For this RISC-V code snippet, there are no explicit "loaded" or "stored" actions observed.

# for (i = 0; i < N; i++)
#     c[i] = a[i] + b[i]
Loop:
    lw x2, 0(x1)     # load a[i]
    lw x4, 0(x3)     # load b[i]
    add x5, x2, x4   # c[i] = a[i] + b[i]
    __D03__          # store c[i]
    addi x1, x1, 4   # bump pointer
    addi x3, x3, 4   # bump pointer
    addi x6, x6, 4   # bump pointer
    addi x7, x7, 1   # i++
    __D04__          # x8 holds N

Complete the above code to make the RISC-V assembly function as described in the comment.

  • D03 =?
    sw x5, 0(x6)
  • D04 =?
    bne x7, x8, Loop

📝 Explain and Annotate

__D03__ # store c[i] For D03, S-type instructions will be selected, and their general format is as follows sw src, offset(base)
S-type instrction:
Core instrction format

Imm[11:5] rs1 rs2 funct3 Imm[4:0] opcode

Based on the context provided, where it's observed that register x5 already holds the result of an addition, and it's clear that the subsequent pointer is pointing to another location represented by register x6, it is appropriate to store the computation result from x5 into register x6. Therefore, the instructionsw x5, x6(0)

__D04__ # x8 holds N
According to the adjacent comments on D04, when performing a loop operation, it is necessary to observe the values of registers x7 and x8. x7 is used to store the result of the i++ calculation, while x8 stores the current vector length. When x7 equals x8, the loop terminates. Therefore, the instruction used for branching and conditional checking is bne x7, x8, Loop

Next, consider a compiler that Loop unrolling and reschedules instructions, assuming the loop operates on an even number of elements in the vectors. Shown below.

Loop:
    lw x2, 0(x1)      # load a[i]
    addi x1, x1, 8    # bump pointer a
    lw x4, 0(x3)      # load b[i]
    addi x3, x3, 8    # bump pointer b
    __D05__           # load a[i+1]
    add x5, x2, x4    # c[i] = a[i] + b[i]
    __D06__           # load b[i+1]
    addi x7, x7, 2    # i+=2
    __D07__           # store c[i]
    add x10, x8, x9  # c[i+1] = a[i+1] + b[i+1]
    __D08__           # store c[i+1]
    __D09__           # bump pointer c
    __D10__           # x11 holds N
  • D05 =?
    lw x8, -4(x1)
  • D06 =?
    lw x9, -4(x3)
  • D07 =?
    sw x5, 0(x6)
  • D08 =?
    sw x10, 4(x6)
  • D09 =?
    add x6, x6, 8
  • D10 =?
    bne x7, x11, Loop

📝 Explain and Annotate

I-type lw instructions have a general format of "lw src, offset(base)."
Core instruction format

Imm[11:0] rs1 funct3 rd opcode

For D05 and D06, both are loading the values of a[i+1] and b[i+1]. Therefore, x1 and x3 have already been previously assigned to a[i] and b[i], respectively. To obtain the values of a[i+1] and b[i+1], a 4-bit offset is required.

For D07 and D08, they are storing the values of c[i] and c[i+1] into x5 and x10 registers. However, in D08, x6 is already pointing to c[i], which is why there is an offset used to store c[i+1].

For D09 and D10, one is related to a loop that has already started and is processing two elements per iteration. The other is checking whether x7 is equal to x11, and if not, it continues the loop

The mentioned instructions like bne x7, x11, Loop, sw x5, 0(x6)," and others are all part of the design to prevent data hazards by using forwarding

Reference