Consider that we plan to deploy a custom RISC-V processor for a space mission, which requires additional data protection in memory. We have decided to implement a Hamming code with even parity to safeguard our data. For this question, please refer to the parity table shown below.
What Are Bit Flips And How Are Spacecraft Protected From Them?
Implement a RISC-V function called calc_parity
that takes a 4-byte input in a0
and calculates the parity of its bits. It should return 1 for odd parity or 0 for even parity. For example:
calc_parity:
li t1, 0 # t1 holds current parity value in LSB
loop:
xor t1, t1, a0 # adds the current least significant bit
# in a0 to the parity value in t1
__A01__ # shift to the next bit in a0
bne a0, x0, loop # loop if there are more than 1 bit left in a0
# 0 bits will never affect parity
__A02__ # we only want the one parity bit
# in bit 0 of t1
jr ra
- A01 =?
srli a0, a0, 1
It is important to note that this is a logical shift, as using an arithmetic shift could potentially lead to an infinite loop if bit 31 is set.- A02 =?
andi a0, t1, 1
If we have an 8-bit data: 10101111 and need to encrypt it using Hamming code
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
By performing XOR operations using the following table, we obtain:
8 | 4 | 2 | 1 | |
---|---|---|---|---|
3 | 0 | 0 | 1 | 1 |
6 | 0 | 1 | 1 | 0 |
9 | 1 | 0 | 0 | 1 |
10 | 1 | 0 | 1 | 0 |
11 | 1 | 0 | 1 | 1 |
12 | 1 | 1 | 0 | 0 |
0 | 0 | 0 | 0 |
The encoded result after encryption is
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 |
a0
, we need to ensure that each right shift operation fills with 0 instead of 1
. Therefore, we use a logical shift (srli a0, a0, 1
) to shift the value stored in register a0 to the right. If we were to use an arithmetic shift (srai a0, a0, 1
), it would continuously fill with 1, causing an infinite loop。
__A02__ # we only want the one parity bit
We only need to retain one bit for identifying the parity, so we directly perform andi a0, t1, 1
and store it in register a0
as the function return value。
easy test for calc_parity
.data
num1: .word 0x01
result: .word 0x00
.text
main:
la s0, num1
lw a0, 0(s0)
jal calc_parity
la s0, result
sw a0, 0(s0)
jal end
calc_parity:
li t1, 0
loop:
xor t1, t1, a0
srli a0, a0, 1
bnez a0, loop
andi a0, t1, 1
jr ra
end:
li a7, 10
ecall
store_nibble
, a RISC-V function that takes four bits of data in a0
, encodes them into seven bits (as a byte with 0
in the most significant bit [MSB]), and stores the result at the memory address specified in a1. This function does not return any value. You may assume that calc_parity
has been correctly implemented and adheres to the calling convention, but do not assume any specific implementation details of calc_parity. Also, you may assume that the most significant 28
bits of the argument in a0
are set to 0
.store_nibble:
# prologue omitted
mv s0, a0
mv s1, a1
srli s7, s0, 1 # set d3 through d1
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1110 # parity of d1, d2, and d3
jal ra, calc_parity # compute p2
add s7, s7, a0
slli s7, s7, 1 # make space for next bit
andi t0, s0, 1 # extract d0
add s7, s7, t0
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1101 # parity of d0, d2, and d3
jal ra, calc_parity # compute p1
add s7, s7, a0
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1011 # parity of d0, d1, and d3
jal ra, calc_parity # compute p0
add s7, s7, __A03__
sb __A04__, __A05__ # s1 contains the memory address passed in as a1
# epilogue omitted
jr ra
- A03=?
a0
- A04=?
s7
- A05=?
0(s1)
The a0
register contains 4 bits of input data (d3, d2, d1, d0), and the highest 28 bits are set to 0
. The a1 register is the memory address where the result will be stored. The calc_parity
function is used to calculate the parity bit for the specified position.
add s7, s7, __A03__
A03
corresponds to the a0 register because we intend to store the result obtained through calc_parity
in s7
. This is because s7
is a callee-saved register, which means it is used for saving and restoring values during function calls
Modify Assembly for easy test
....
main:
la s0, num1
lw a0, 0(s0)
la a1, result
jal store_nibble
la a1, end
jalr a1
store_nibble:
# Store the original value of a0
mv t2, a0
# Calculate p2
andi a0, t2, 0b0100
slli a0, a0, 1
jal calc_parity
mv t3, a0
# Calculate p1
andi a0, t2, 0b0010 # Mask for bit positions d0, d2
slli a0, a0, 2 # Shift left to position the bits for parity calculation
jal calc_parity # Call calc_parity function
slli a0, a0, 1 # Shift left to make space for p2
....
Observing its pipelineS
I suspect that this will cause an infinite loop. I think it might be due to an issue with my test program's registers. I will find some time to retest this and observe its pipeline。
store_nibble
to follow the calling convention. If there are none, write N/A
. __ A06 __
- A06=?
s0, s1, s7, ra
All of these registers are used in the question template and must, therefore, be saved and restored to follow the calling convention. (Whilera
is not strictly a saved register,store_nibble
is calling functions and, therefore, needs to savera
as well.) Any additional saved registers used also need to be saved and restored.
sp
is a special case; even though it is callee-saved, it is not usually saved and restored from the stack. Thus, it was ignored.
For RISC-V functions, the prologue/epilogue is used to allocate stack space for temporary registers during the execution of a RISC-V function, as illustrated in the following diagram
Establishing the stack address through the prologue/epilogue is crucial. Not only is this a fundamental aspect of constructing functions, but it also ensures that the stack's position and values are correctly preserved, making it easier to debug
Callee Saved:The callee function can use them freely
(if needed, the caller had to save them before invoking and will restore them afterwards)
Caller Saved:The callee function must save them before modifying them, and restore them before returning (avoid using them at all, and no need to save)
for store_nibble function
store_nibble:
# prologue omitted
addi sp,sp ,-16
sw s0,0(sp)
sw s1,4(sp)
sw s7,8(sp)
sw ra,12(sp)
mv s0, a0
mv s1, a1
srli s7, s0, 1 # set d3 through d1
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1110 # parity of d1, d2, and d3
jal ra, calc_parity # compute p2
add s7, s7, a0
slli s7, s7, 1 # make space for next bit
andi t0, s0, 1 # extract d0
add s7, s7, t0
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1101 # parity of d0, d2, and d3
jal ra, calc_parity # compute p1
add s7, s7, a0
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1011 # parity of d0, d1, and d3
jal ra, calc_parity # compute p0
add s7, s7, a0
sb s7, 0(s1) # s1 contains the memory address passed in as a1
# epilogue omitted
addi sp,sp ,16
lw s0,0(sp)
lw s1,4(sp)
lw s7,8(sp)
lw ra,12(sp)
jr ra
We can determine that the store_nibble
function operates with the need to save and restore the following registers: ra, s0, s1, s7
. Among these, ra
is considered Caller Saved, while the rest, s0, s1
, and s7
, are Callee Saved. However, sp is a special case, as it serves as a pointer to the stack, and the changes to the stack are sequential. Therefore, there is no need to specifically store and restore memory addresses in the prologue/epilogue since the sp
address will be restored upon the function's completion。
.data
num1: .word 3
.text
main:
la s5,num1
lw a0,0(s5)
jal store_nibble
jal print_result
jal exit
print_result:
mv a0,s0
li a7,1
ecall
jr ra
calc_parity:
li t1, 0 # t1 holds current parity value in LSB
loop:
xor t1, t1, a0 # adds the current least significant bit
# in a0 to the parity value in t1
srli a0,a0,1 # shift to the next bit in a0
bne a0, x0, loop # loop if there are more than 1 bit left in a0
# 0 bits will never affect parity
andi a0,t1,1 # we only want the one parity bit
# in bit 0 of t1
jr ra
store_nibble:
addi sp,sp,-16
sw ra,0(sp)
sw s0,4(sp)
sw s1,8(sp)
sw s7,12(sp)
# prologue omitted
mv s0, a0
mv s1, a1
srli s7, s0, 1 # set d3 through d1
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1110 # parity of d1, d2, and d3
jal ra,calc_parity # compute p2
add s7, s7, a0
slli s7, s7, 1 # make space for next bit
andi t0, s0, 1 # extract d0
add s7, s7, t0
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1101 # parity of d0, d2, and d3
jal ra,calc_parity # compute p1
add s7, s7, a0
slli s7, s7, 1 # make space for next bit
andi a0, s0, 0b1011 # parity of d0, d1, and d3
jal ra, calc_parity # compute p0
add s7, s7, a0
sb s7, 0(s1) # s1 contains the memory address passed in as a1
# epilogue omitted
lw ra,0(sp)
lw s0,4(sp)
lw s1,8(sp)
lw s7,12(sp)
addi sp,sp,16
jr ra
exit:
li a7,10
ecall
我們可以觀察其pipeline以及記憶體位置
Input | Output |
---|---|
3(0b11) |
0 |
Suppose you are tasked with writing a file system in C. The struct file_item
represents either a file or a directory. The data
union can hold either the contents of a file (as a string) or an array of pointers to child file_items
. For the purposes of this question, assume that pointers are 4 bytes in length.
#include <stdbool.h>
typedef struct file_item {
char *name;
bool is_directory;
file_item_data data;
} file_item;
typedef union file_item_data {
char contents[X];
struct file_item *children[16];
} file_item_data;
/* Copies all characters from src to dest including the NULL terminator */
char *strcpy(char *dest, const char *src);
X
to the maximum value that does not increase the union size. What is the maximum strlen of a string we can store in a single file? __ B01 __
- B01 =?
63
The size of a union is determined by its largest element, disregarding alignment rules, which we will not consider here. In a 32-bit system, pointers are 32 bits (4 bytes). children is an array of 16 pointers, occupying
bytes. Therefore, we must ensure that char contents[X] does not exceed 64 bytes. Given that, since 1 byte is defined as the size of a char by default, setting X = 64 fulfills our goal. Consequently, strlen(contents) would return 63 because it does not count the null terminator in the string length.
In the C language, the length of a string is measured using strlen by counting characters until it encounters the null terminator
In the file_item_data union, you can see an array of pointers to children of the contents array. There are a total of 16 pointers, so the value of X in contents[X] ranges from
file_item_data
union./* Creates a file with the given name and contents,
* and returns a pointer to that file.
*/
file_item *create_file(char *contents, const char *name)
{
file_item *new_file = calloc(1, sizeof(file_item));
new_file->name = name;
B02 /* Fill your code here */ ;
return new_file;
}
/* Creates a directory with the given name and no children,
* and returns a pointer to that directory.
*/
file_item *create_directory(const char *name)
{
file_item *new_dir = calloc(1, sizeof(file_item));
new_dir->name = name;
B03 /* Fill your code here */ ;
return new_dir;
}
Your solution should be fully correct, without any memory leaks, and should not include any additional lines.
- B02 =?
strcpy(new_file->data.contents, contents)
- B03 =?
new_dir->is_directory = true
The use of calloc
and malloc
functions for dynamic memory allocation is different. With malloc, it only allocates a dynamic memory location without initializing the space's values. On the other hand, calloc not only allocates dynamic memory but also initializes its values. For the functions used in B02
and B03
, this is advantageous when dealing with pointers within structures
B02 /* Fill your code here */
; is a placeholder for filling in the create_file
function. In this function, the Contents are allocated into the file_item
structure, and data can store the content of a file or a pointer array to file_item
. Here, we need to copy the contents to data.contents, so we use the standard C library function strcpy to achieve this
Create a simple test case to observe its memory Address
...
int main() {
const char *file_name = "testfile.txt";
const char *file_contents = "Hello, file!";
const char *dir_name = "testdir";
file_item *file = create_file(file_contents, file_name);
file_item *dir = create_directory(dir_name);
printf("File name: %s, Address: %p\n", file->name, (void*)file->name);
printf("File contents: %s, Address: %p\n", file->data.contents, (void*)file->data.contents);
printf("Directory name: %s, Address: %p\n", dir->name, (void*)dir->name);
// Clean up
free(file->name);
free(file);
free(dir->name);
free(dir);
return 0;
}
...
Output:
File_name | File_contents | Directory name | |
---|---|---|---|
Address | 00000000001D98E0 |
00000000001D9850 |
00000000001D99A0 |
From the output, it can be observed that the name strings are randomly allocated, and their addresses differ from the other parts of the File_item
struct. The file contents are directly stored in the data union
Consider a 16-bit Fixed-point arithmetic
with 1 sign bit, 5 integer bits, and 10 fraction bits. The 10 fraction bits continue where the integer bits left off, representing 0b0 01101 1010000000
represents
In this question, we will compare this fixed-point system to a 16-bit floating-point system that adheres to all IEEE-754 conventions, including denorms, NaNs, etc., with a 5-bit exponent and an exponent bias of -15.
- C01 =?
0xD980
The sign bit is 0b1, as the number is negative.
The integer bits are obtained by converting 22 to unsigned binary, giving us 0b10110.
The fractional bits are obtained by converting 0.375 to unsigned binary, resulting in 0b0.0110000000, with the fractional part being 0b0110000000.
In total, we have 0b1101100110000000, which is equivalent to 0xD980 in hexadecimal.
Write
- C02 =?
0xCD98
The floating point system in this question has 16 bits total, including 1 sign bit and 5 exponent
bits. That leaves 10 bits for the mantissa.
First, write 22.375 in binary: 0b1 0110.011
Then, move the binary point to create a number of the form 0b1.MM MMMM MMMM ×This gives us 0b1.01 1001 1000 × The mantissa bits can be read directly from this form: 0b01 1001 1000. Remember to drop the implicit 1 to the left of the binary point.
The exponent is −4, which we can convert to bias notation by subtracting the bias:In unsigned 5-bit binary, this is 0b10011.
The sign bit is 0b1, since the number is negative.
In total, we have 0b1 10011 01 1001 1000, which is 0xCD98 in hex.
0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
This represents the number 0b00010.110
=
For C01
, the value of 0b1
), the integer part is 0b10110
in binary, and the fractional part is 0b0110000000
. Therefore, in binary, it is 0b1101100110000000
.
For C02
, to convert -22.375 to a 16-bit floating-point representation following IEEE 754 rules, which includes 1 sign bit, 5 exponent bits, and 10 mantissa bits, you can first break it down.
The integer part, 0b10110
in binary, and the fractional part, 0b011
. So, the binary representation is 0b10110.011.
S
In normalized form, it becomes 0b1.0110 0110 0110 0110 ×
The mantissa bits are 0110011000
, and the exponent bits are 4. Adding the bias of -15, we get an exponent of
Converting 19 to binary gives 0b10011
. Finally, the sign bit is 0b1, so when combined, it becomes 0b1100110110011000
.
Converting each nibble to hexadecimal, we get 0xCD98
- C03 =?
16 in the fixed point system is 0b10000.0000000000. The largest representable number in the fixed point system is 0b11111.1111111111, which iswhich is just under 32.
In between these two numbers, we can choose every bit except the most-significant integer bit to be 0 or 1. That gives us 4 integer bits and 10 fractional bits that could all be 0 or 1. Each bit pattern represents a different number, so we haverepresentable numbers in total.
For a 16-bit fixed-point system, when the binary representation is 0b10000.0000000000, it means that the maximum representable value is 0b11111.1111111111, which is slightly less than 32, as it represents
Within this range of numbers, we can choose different values for each bit except the first sign bit. This gives us a total of 14 bits to use. For each of these 14 bits, we have the option to fill it with either 0 or 1. Therefore, there are a total of
int remainder(int a, int b)
{
while (a >= b)
a -= b;
return a;
}
For a RISC architecture (RV32 in particular), our RISC-V compiler generates the following assembly code.
loop:
blt x1, x2, done
sub x1, x1, x2
j loop
done:
...
Assume that RISC-V instructions are 4 bytes each. Assume data values are 4 bytes each.
How many bytes is the RISC-V program? __ D01 __
- D01 =?
12
Assume a=7 and b=3. How many data bytes are loaded/stored in the RISC program? __ D02 __
- D02 =?
0
Assume that RISC-V instructions are 4 bytes each. Assume data values are 4 bytes each
loop:
blt x1, x2, done
sub x1, x1, x2
j loop
done:
...
There are only three instructions in this RISC-V program: blt
, sub
, and j
. Each instruction occupies 4 bytes in memory, so the total size of the program is
For this RISC-V code snippet, there are no explicit "loaded" or "stored" actions observed.
# for (i = 0; i < N; i++)
# c[i] = a[i] + b[i]
Loop:
lw x2, 0(x1) # load a[i]
lw x4, 0(x3) # load b[i]
add x5, x2, x4 # c[i] = a[i] + b[i]
__D03__ # store c[i]
addi x1, x1, 4 # bump pointer
addi x3, x3, 4 # bump pointer
addi x6, x6, 4 # bump pointer
addi x7, x7, 1 # i++
__D04__ # x8 holds N
Complete the above code to make the RISC-V assembly function as described in the comment.
- D03 =?
sw x5, 0(x6)
- D04 =?
bne x7, x8, Loop
__D03__ # store c[i]
For D03, S-type instructions will be selected, and their general format is as follows sw src, offset(base)
S-type instrction:
Core instrction format
Imm[11:5] | rs1 | rs2 | funct3 | Imm[4:0] | opcode |
---|
Based on the context provided, where it's observed that register x5
already holds the result of an addition, and it's clear that the subsequent pointer is pointing to another location represented by register x6
, it is appropriate to store the computation result from x5
into register x6
. Therefore, the instructionsw x5, x6(0)
__D04__ # x8 holds N
According to the adjacent comments on D04, when performing a loop operation, it is necessary to observe the values of registers x7
and x8
. x7
is used to store the result of the i++ calculation, while x8
stores the current vector length. When x7 equals x8, the loop terminates. Therefore, the instruction used for branching and conditional checking is bne x7, x8, Loop
Next, consider a compiler that Loop unrolling and reschedules instructions, assuming the loop operates on an even number of elements in the vectors. Shown below.
Loop:
lw x2, 0(x1) # load a[i]
addi x1, x1, 8 # bump pointer a
lw x4, 0(x3) # load b[i]
addi x3, x3, 8 # bump pointer b
__D05__ # load a[i+1]
add x5, x2, x4 # c[i] = a[i] + b[i]
__D06__ # load b[i+1]
addi x7, x7, 2 # i+=2
__D07__ # store c[i]
add x10, x8, x9 # c[i+1] = a[i+1] + b[i+1]
__D08__ # store c[i+1]
__D09__ # bump pointer c
__D10__ # x11 holds N
- D05 =?
lw x8, -4(x1)
- D06 =?
lw x9, -4(x3)
- D07 =?
sw x5, 0(x6)
- D08 =?
sw x10, 4(x6)
- D09 =?
add x6, x6, 8
- D10 =?
bne x7, x11, Loop
I-type lw instructions have a general format of "lw src, offset(base)."
Core instruction format
Imm[11:0] | rs1 | funct3 | rd | opcode |
---|
For D05 and D06, both are loading the values of a[i+1] and b[i+1]. Therefore, x1 and x3 have already been previously assigned to a[i] and b[i], respectively. To obtain the values of a[i+1] and b[i+1], a 4-bit offset is required.
For D07 and D08, they are storing the values of c[i] and c[i+1] into x5 and x10 registers. However, in D08, x6 is already pointing to c[i], which is why there is an offset used to store c[i+1].
For D09 and D10, one is related to a loop that has already started and is processing two elements per iteration. The other is checking whether x7 is equal to x11, and if not, it continues the loop
The mentioned instructions like bne x7, x11, Loop, sw x5, 0(x6),
" and others are all part of the design to prevent data hazards by using forwarding