# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by <[`amos0256`](https://github.com/amos0256/Computer-Aritecture-2024-Fall)>
###### tags: `Computer Architecture 2024 Fall`
## [Problem `C`](https://hackmd.io/@sysprog/arch2024-quiz1-sol#Problem-C)
### `fabsf`
The `fabsf` function is used to compute the absolute value of a single-precision floating-point number by performing a bitwise AND operation with the mask `0x7FFFFFFF` to clear the sign bit.
```c
static inline float fabsf(float x) {
uint32_t i = *(uint32_t *)&x; // Read the bits of the float into an integer
i &= 0x7FFFFFFF; // Clear the sign bit to get the absolute value
x = *(float *)&i; // Write the modified bits back into the float
return x;
}
```
```
fabsf:
li t0, 0x7FFFFFFF
and a0, a0, t0
ret
```
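The pointer casts in the C version above read a `float` through a `uint32_t *`, which compilers are allowed to treat as undefined behavior under strict aliasing. A minimal sketch of the same mask-the-sign-bit idea using `memcpy` is shown below; the name `fabsf_bits` is hypothetical and was chosen only to avoid clashing with `fabsf` from `<math.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical name; same technique as fabsf above, without pointer casts. */
static inline float fabsf_bits(float x) {
    uint32_t i;
    memcpy(&i, &x, sizeof i);    /* copy the float's bit pattern into an integer */
    i &= UINT32_C(0x7FFFFFFF);   /* clear bit 31, the sign bit */
    memcpy(&x, &i, sizeof x);    /* copy the modified bits back into the float */
    return x;
}
```

For example, `fabsf_bits(-1.5f)` yields `1.5f`, and `fabsf_bits(-0.0f)` yields `+0.0f` because only the sign bit differs.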
### `my_clz`
The `my_clz` function counts the number of leading zeros in a 32-bit unsigned integer. It iterates through the bits of the input `x` from the MSB to the LSB, incrementing a counter until it encounters the first set bit, and then returns the count of leading zeros.
```c
static inline int my_clz(uint32_t x) {
int count = 0;
for (int i = 31; i >= 0; --i) {
if (x & (1U << i))
break;
count++;
}
return count;
}
```
```
my_clz:
li t0, 0 # counter = 0
li t1, 31 # i counter
li t2, 1
my_clz_loop:
bltz t1, end_my_clz # if t1 < 0, goto end_my_clz
sll t3, t2, t1 # t3 = 1 << i
and t4, a0, t3 # t4 = x & (1 << i)
bnez t4, end_my_clz # if t4 != 0 (bit i is set), goto end_my_clz
addi t0, t0, 1 # count++
addi t1, t1, -1 # i--
j my_clz_loop
end_my_clz:
mv a0, t0 # return the count
ret
```
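The loop above can take up to 32 iterations. As a point of comparison, the sketch below counts leading zeros with a binary search over the word, which needs at most five tests. This is a swapped-in technique, not the method used in this assignment, and the name `clz_binary` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical alternative to my_clz: halve the search range at each step. */
static inline int clz_binary(uint32_t x) {
    if (x == 0)
        return 32;                                      /* no set bit at all */
    int n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }  /* top 16 bits empty */
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }  /* top 8 bits empty */
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }  /* top 4 bits empty */
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }  /* top 2 bits empty */
    if ((x & 0x80000000u) == 0) { n += 1; }             /* top bit empty */
    return n;
}
```

It returns the same values as `my_clz`, including 32 for an input of 0.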
### `fp16_to_fp32`
The `fp16_to_fp32` function converts a 16-bit half-precision floating-point number into the bit pattern of a 32-bit single-precision number. It first shifts the half-precision value into the upper half of a 32-bit word, so that the sign bit lands in bit 31, and then separates the sign from the exponent and mantissa. Denormalized inputs are normalized by shifting the mantissa left by an amount derived from the leading-zero count, while infinity, NaN, and zero are handled with bit masks rather than branches. Finally, the sign, the rebiased exponent, and the mantissa are combined into a single 32-bit result.
```c
static inline uint32_t fp16_to_fp32(uint16_t h) {
/*
* Extends the 16-bit half-precision floating-point number to 32 bits
* by shifting it to the upper half of a 32-bit word:
* +---+-----+------------+-------------------+
* | S |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
* +---+-----+------------+-------------------+
* Bits 31 26-30 16-25 0-15
*
* S - sign bit, E - exponent bits, M - mantissa bits, 0 - zero bits.
*/
const uint32_t w = (uint32_t) h << 16;
/*
* Isolates the sign bit from the input number, placing it in the most
* significant bit of a 32-bit word:
*
* +---+----------------------------------+
* | S |0000000 00000000 00000000 00000000|
* +---+----------------------------------+
* Bits 31 0-30
*/
const uint32_t sign = w & UINT32_C(0x80000000);
/*
* Extracts the mantissa and exponent from the input number, placing
* them in bits 0-30 of the 32-bit word:
*
* +---+-----+------------+-------------------+
* | 0 |EEEEE|MM MMMM MMMM|0000 0000 0000 0000|
* +---+-----+------------+-------------------+
* Bits 31 26-30 16-25 0-15
*/
const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
/*
* The renorm_shift variable indicates how many bits the mantissa
* needs to be shifted to normalize the half-precision number.
* For normalized numbers, renorm_shift will be 0. For denormalized
* numbers, renorm_shift will be greater than 0. Shifting a
* denormalized number will move the mantissa into the exponent,
* normalizing it.
*/
uint32_t renorm_shift = my_clz(nonsign);
renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;
/*
* If the half-precision number has an exponent field of all ones (31),
* adding 0x04000000 overflows into bit 31, and the arithmetic right
* shift by 8 then converts the upper 9 bits into ones. Thus:
* inf_nan_mask ==
* 0x7F800000 if the half-precision number is
* NaN or infinity (exponent field of 31)
* 0x00000000 otherwise
*/
const int32_t inf_nan_mask = ((int32_t)(nonsign + 0x04000000) >> 8) &
INT32_C(0x7F800000);
/*
* If nonsign equals 0, subtracting 1 will cause overflow, setting
* bit 31 to 1. Otherwise, bit 31 will be 0. Shifting this result
* propagates bit 31 across all bits in zero_mask. Thus:
* zero_mask ==
* 0xFFFFFFFF if the half-precision number is
* zero (+0.0h or -0.0h)
* 0x00000000 otherwise
*/
const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
/*
* 1. Shifts nonsign left by renorm_shift to normalize it (for denormal
* inputs).
* 2. Shifts nonsign right by 3, adjusting the exponent to fit in the
* 8-bit exponent field and moving the mantissa into the correct
* position within the 23-bit mantissa field of the single-precision
* format.
* 3. Adds 0x70 to the exponent to account for the difference in bias
* between half-precision and single-precision.
* 4. Subtracts renorm_shift from the exponent to account for any
* renormalization that occurred.
* 5. ORs with inf_nan_mask to set the exponent to 0xFF if the input
* was NaN or infinity.
* 6. ANDs with the inverted zero_mask to set the mantissa and exponent
* to zero if the input was zero.
* 7. Combines everything with the sign bit of the input number.
*/
return sign | ((((nonsign << renorm_shift >> 3) +
((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask);
}
```
The RISC-V assembly code below includes three fp16 test cases:
* `0x7C00` represents $+ \infty$, which is `0x7F800000` in fp32
* `0x0001` represents the smallest positive denormal, about $5.96 \times 10^{-8}$, which is `0x33800000` in fp32
* `0x3C00` represents `1.0`, which is `0x3F800000` in fp32
```
.data
examples:
.word 0x7C00
.word 0x0001
.word 0x3C00
expected_res:
.word 0x7F800000
.word 0x33800000
.word 0x3F800000
correct_msg:
.asciz "Correct"
wrong_msg:
.asciz "Wrong"
newline:
.asciz ".\n"
.text
.global main
main:
la t0, examples # t0 = address of examples
la t1, expected_res # t1 = address of expected_res
li t2, 3 # t2 = number of examples
li t3, 0 # t3 = for loop counter
loop_example:
beq t2, t3, end_main # if counter == 3, goto end_main
lw a0, 0(t0) # a0 = example
jal ra, fp16_to_fp32
lw a1, 0(t1) # a1 = expected result
jal ra, print_result
addi t0, t0, 4 # move to the next example
addi t1, t1, 4 # move to the next expected result
addi t3, t3, 1 # increment loop counter
j loop_example
end_main:
# exit program
li a7, 10 # syscall exit
ecall
fp16_to_fp32:
# preserve ra register
addi sp, sp, -4
sw ra, 0(sp) # store return address
# preserve t0-t3 registers
addi sp, sp, -16
sw t0, 0(sp)
sw t1, 4(sp)
sw t2, 8(sp)
sw t3, 12(sp)
slli a0, a0, 16 # shift left by 16 bits to extend to 32-bit
# sign bit
li t0, 0x80000000 # load sign mask
and t1, a0, t0 # t1 is sign = w & 0x80000000
# nonsign bit
li t0, 0x7FFFFFFF # load nonsign mask
and t2, a0, t0 # t2 is nonsign = w & 0x7FFFFFFF
# normalize the number, use my_clz to count leading zeros
mv a0, t2
jal ra, my_clz
li t0, 5
blt t0, a0, renorm # if renorm_shift > 5, goto renorm
li a0, 0 # renorm_shift = 0
j continue
renorm:
sub a0, a0, t0 # renorm_shift -= 5
continue:
# handle inf_nan_mask
li t0, 0x04000000
add t3, t2, t0 # inf_nan_mask = nonsign + 0x04000000
srai t3, t3, 8 # inf_nan_mask >>= 8
li t0, 0x7F800000
and t3, t3, t0 # inf_nan_mask
# handle zero_mask
addi t4, t2, -1 # zero_mask = nonsign - 1
srai t4, t4, 31 # zero_mask >> 31
# normalize and adjust exponent
sll t5, t2, a0 # shift nonsign left by renorm_shift
srli t5, t5, 3 # shift nonsign right by 3 (logical shift, nonsign is unsigned)
li t0, 0x70
sub t0, t0, a0 # 0x70 - renorm_shift
slli t0, t0, 23 # adjust exponent and place in proper position
add t5, t5, t0 # add the adjust exponent to the result
# combine inf_nan_mask and handle zero case
or t5, t5, t3 # or with inf_nan_mask
not t4, t4 # invert zero_mask
and t5, t5, t4 # zero out result if zero_mask is set
# combine sign and result
or a0, t1, t5 # combine with sign bit
# restore t0-t3 registers
lw t0, 0(sp)
lw t1, 4(sp)
lw t2, 8(sp)
lw t3, 12(sp)
addi sp, sp, 16
# restore ra register
lw ra, 0(sp)
addi sp, sp, 4
ret
# my_clz function
my_clz:
# preserve t1 and t2 registers
addi sp, sp, -8
sw t1, 0(sp)
sw t2, 4(sp)
li t0, 0 # counter = 0
li t1, 31 # i counter
li t2, 1
my_clz_loop:
bltz t1, end_my_clz # if t1 < 0, goto end_my_clz
sll t3, t2, t1 # t3 = 1 << i
and t4, a0, t3 # t4 = x & (1 << i)
bnez t4, end_my_clz # if t4 != 0 (bit i is set), goto end_my_clz
addi t0, t0, 1 # count++
addi t1, t1, -1 # i--
j my_clz_loop
end_my_clz:
mv a0, t0 # return the count
# restore t1 and t2 registers
lw t1, 0(sp)
lw t2, 4(sp)
addi sp, sp, 8
ret
# print result function
print_result:
bne a0, a1, wrong_case # if a0 != a1, goto wrong_case
la a0, correct_msg
li a7, 4
ecall
j end_print
wrong_case:
la a0, wrong_msg
li a7, 4
ecall
end_print:
la a0, newline
li a7, 4
ecall
ret
```
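The same three test cases can be checked on the host in C before debugging the assembly. The sketch below repeats the conversion routine without its comments and adds a hypothetical helper, `count_fp16_mismatches`, that mirrors the `examples` / `expected_res` loop above. Like the original code, it assumes that right-shifting a negative `int32_t` is an arithmetic shift, which holds on common compilers but is implementation-defined in C.

```c
#include <stdint.h>

static inline int my_clz(uint32_t x) {
    int count = 0;
    for (int i = 31; i >= 0; --i) {
        if (x & (1U << i))
            break;
        count++;
    }
    return count;
}

static inline uint32_t fp16_to_fp32(uint16_t h) {
    const uint32_t w = (uint32_t) h << 16;
    const uint32_t sign = w & UINT32_C(0x80000000);
    const uint32_t nonsign = w & UINT32_C(0x7FFFFFFF);
    uint32_t renorm_shift = my_clz(nonsign);
    renorm_shift = renorm_shift > 5 ? renorm_shift - 5 : 0;
    const int32_t inf_nan_mask =
        ((int32_t)(nonsign + 0x04000000) >> 8) & INT32_C(0x7F800000);
    const int32_t zero_mask = (int32_t)(nonsign - 1) >> 31;
    return sign | ((((nonsign << renorm_shift >> 3) +
                     ((0x70 - renorm_shift) << 23)) | inf_nan_mask) & ~zero_mask);
}

/* Same data as the .data section of the assembly program. */
static const uint16_t examples[3] = { 0x7C00, 0x0001, 0x3C00 };
static const uint32_t expected_res[3] = { 0x7F800000, 0x33800000, 0x3F800000 };

/* Hypothetical helper: returns how many examples do not match. */
static int count_fp16_mismatches(void) {
    int wrong = 0;
    for (int i = 0; i < 3; i++) {
        if (fp16_to_fp32(examples[i]) != expected_res[i])
            wrong++;
    }
    return wrong;
}
```

A return value of 0 from `count_fp16_mismatches()` corresponds to the assembly program printing `Correct.` three times.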
## Minimum Flips to Make a OR b Equal to c
> [LeetCode 1318](https://leetcode.com/problems/minimum-flips-to-make-a-or-b-equal-to-c/description/)
Description:
Given three positive numbers `a`, `b`, and `c`, return the minimum number of bit flips required in `a` and `b` to make ( `a` OR `b` == `c` ), where OR is the bitwise OR operation.
A flip operation changes any single bit from 1 to 0 or from 0 to 1 in the binary representation.
Constraints:
* `1 <= a <= 10^9`
* `1 <= b <= 10^9`
* `1 <= c <= 10^9`
## Implementation
### Idea for solving the problem
1. The result of bitwise OR (`a | b`) is 1 if either bit is 1, and 0 if both bits are 0.
2. For each bit position, there are two cases:
* If the corresponding bit in `c` is 0, both `a` and `b` must have 0 at that bit position. The number of required flips equals the number of 1s at that position in `a` and `b`, which is the sum of the two bit values.
$$
\begin{array}{cccccc}
{} & {a} & {1} & {\underrightarrow{flip}} & {0} & {} \\
{} & {b} & {1} & {\underrightarrow{flip}} & {0} & {} \\
\hline
{} & {c} & {} & {} & {0} & {}
\end{array}
\\
\Rightarrow total \ flips \ times = \# \ of \ 1s
$$
* If the corresponding bit in `c` is 1, at least one of `a` or `b` needs to be 1. A flip is needed only when both `a` and `b` are 0, and the number of required flips is 1.
$$
\begin{array}{cccccc}
{} & {a} & {0} & {\underrightarrow{flip}} & {1} & {} \\
{} & {b} & {0} & {} & {0} & {} \\
\hline
{} & {c} & {} & {} & {1} & {}
\end{array}
\\
\Rightarrow total \ flips \ times = 1
$$
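The two cases above can also be evaluated for all 32 bit positions at once with bitwise operations. The sketch below is an alternative formulation, not the loop used in this assignment, and the names `popcount32` and `min_flips_popcount` are hypothetical. Every position where `a | b` disagrees with `c` costs one flip, and a position where both `a` and `b` are 1 while `c` is 0 costs one extra flip.

```c
#include <stdint.h>

/* Count set bits by clearing the lowest set bit until none remain. */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

/* Hypothetical branch-free variant of minFlips. */
static int min_flips_popcount(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t diff = (a | b) ^ c;   /* positions where a|b disagrees with c */
    uint32_t both = a & b & diff;  /* disagreeing positions with a = b = 1, so c = 0 */
    return popcount32(diff) + popcount32(both);
}
```

For `a = 2`, `b = 6`, `c = 5`, `diff` is `0b011` and `both` is `0b010`, giving 2 + 1 = 3 flips.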
### C program
#### First version
In the first version, I used an intuitive approach: examine each of the 32 bit positions in turn and compare the corresponding bits of `a`, `b`, and `c`. At every position, the loop adds the number of flips required by the two cases described above. The C code walks from bit 0 up to bit 31, whereas the assembly version walks from bit 31 down to bit 0; the order does not affect the total.
```c
int minFlips(int a, int b, int c){
int flips = 0;
for (int i = 0; i < 32; i++) {
int bitA = (a >> i) & 1;
int bitB = (b >> i) & 1;
int bitC = (c >> i) & 1;
if (bitC == 0) {
flips += bitA + bitB;
}
else {
if (bitA == 0 && bitB == 0) {
flips += 1;
}
}
}
return flips;
}
```
#### Second version
When the numbers `a`, `b`, and `c` are small, the loop spends most of its iterations on high-order bit positions where all three bits are 0 and no flip is ever needed. We can therefore locate the leading 1 in the binary representation of the largest of `a`, `b`, and `c`, count the leading zeros above it, and skip those positions entirely.
* Without counting leading zeros
$$
\begin{array}{cccc}
{} & {a} & {\overbrace{000...0}^{30 \ bits} 01} & {} \\
{} & {b} & {\overbrace{000...0}^{30 \ bits} 10} & {} \\
\hline
{} & {c} & {\overbrace{000...0}^{30 \ bits} 11} & {}
\end{array}
$$
Only the last two bits actually need to be compared.
* With counting leading zeros
$$
\begin{array}{cccc}
{} & {a} & {01} & {} \\
{} & {b} & {10} & {} \\
\hline
{} & {c} & {11} & {}
\end{array}
$$
To obtain the leading-zero count for the three numbers, we do not need to compare their sizes to find the maximum. Since we only care about the position of the highest set bit, we can OR the three numbers together: the highest set bit of `a | b | c` is at the same position as the highest set bit of the largest number.
```c
static inline int my_clz(uint32_t x) {
int count = 0;
for (int i = 31; i >= 0; --i) {
if (x & (1U << i))
break;
count++;
}
return count;
}
int minFlips(int a, int b, int c) {
int flips = 0;
int maxBit = 31 - my_clz(a | b | c);
for (int i = 0; i <= maxBit; i++) {
int bitA = (a >> i) & 1;
int bitB = (b >> i) & 1;
int bitC = (c >> i) & 1;
if (bitC == 0) {
flips += bitA + bitB;
}
else {
if (bitA == 0 && bitB == 0) {
flips += 1;
}
}
}
return flips;
}
```
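The claim that OR-ing the inputs preserves the position of the highest set bit can be checked directly. In the sketch below, `min3`, `or_clz_matches_min`, and `bits_to_scan` are hypothetical helpers: the first two compare `my_clz(a | b | c)` with the smallest individual leading-zero count, and the last reports how many loop iterations the second version of `minFlips` performs.

```c
#include <stdint.h>

static inline int my_clz(uint32_t x) {
    int count = 0;
    for (int i = 31; i >= 0; --i) {
        if (x & (1U << i))
            break;
        count++;
    }
    return count;
}

/* Hypothetical helper: smallest of three ints. */
static int min3(int x, int y, int z) {
    int m = x < y ? x : y;
    return m < z ? m : z;
}

/* Returns 1 when clz(a | b | c) equals the minimum of the three clz values. */
static int or_clz_matches_min(uint32_t a, uint32_t b, uint32_t c) {
    return my_clz(a | b | c) == min3(my_clz(a), my_clz(b), my_clz(c));
}

/* Number of bit positions the second version of minFlips visits. */
static int bits_to_scan(uint32_t a, uint32_t b, uint32_t c) {
    return 32 - my_clz(a | b | c);
}
```

For the first test case (`a = 1`, `b = 2`, `c = 3`), `bits_to_scan` returns 2, so the loop runs twice instead of 32 times. For the other two test cases it returns 30, so the saving there is small.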
:::danger
Replace `else if (b >= a && b >= c)` with `if (b >= a && b >= c)`.
Shorten the above.
:::
### RISC-V Assembly
#### First version
```
.data
examples:
.word 1, 2, 3
.word 51041, 65280, 716177407
.word 143165576, 715827882, 1
expected_res:
.word 0
.word 14
.word 23
correct_msg:
.asciz "Correct"
wrong_msg:
.asciz "Wrong"
newline:
.asciz ".\n"
.text
.global main
main:
la t0, examples # t0 = address of examples
la t1, expected_res # t1 = address of expected_res
li t2, 3 # t2 = number of examples
li t3, 0 # t3 = for loop counter
loop_example:
beq t2, t3, end_main # if counter == 3, goto end_main
# load the current example's value
lw a0, 0(t0)
lw a1, 4(t0)
lw a2, 8(t0)
jal ra, min_flips # call min_flips
lw a1, 0(t1) # load the current expected result
jal ra, print_result # call print_result
addi t0, t0, 12 # move to the next example
addi t1, t1, 4 # move to the next expected result
addi t3, t3, 1 # increment loop counter
j loop_example
end_main:
# exit program
li a7, 10 # syscall exit
ecall
# min flips function
min_flips:
# preserve t0-t3 registers
addi sp, sp, -16
sw t0, 0(sp)
sw t1, 4(sp)
sw t2, 8(sp)
sw t3, 12(sp)
li t0, 0 # flips = 0
li t1, 31 # i = 31, start from the most significant bit
loop:
bltz t1, end_loop # if i < 0, goto end_loop
srl t2, a0, t1 # bitA = (a >> i) & 1
andi t2, t2, 1
srl t3, a1, t1 # bitB = (b >> i) & 1
andi t3, t3, 1
srl t4, a2, t1 # bitC = (c >> i) & 1
andi t4, t4, 1
beqz t4, add_flip # if bitC == 0, goto add_flip
beqz t2, check_b # if bitA == 0, goto check_b
j continue_loop
add_flip:
add t0, t0, t2 # flips += bitA
add t0, t0, t3 # flips += bitB
j continue_loop
check_b:
bnez t3, continue_loop # if bitB != 0, goto continue_loop
addi t0, t0, 1 # flips += 1
continue_loop:
addi t1, t1, -1 # i--
j loop
end_loop:
mv a0, t0 # set return value to flips
# restore t0-t3 registers
lw t0, 0(sp)
lw t1, 4(sp)
lw t2, 8(sp)
lw t3, 12(sp)
addi sp, sp, 16
ret
# print result function
print_result:
bne a0, a1, wrong_case # if a0 != a1, goto wrong_case
la a0, correct_msg
li a7, 4
ecall
j end_print
wrong_case:
la a0, wrong_msg
li a7, 4
ecall
end_print:
la a0, newline
li a7, 4
ecall
ret
```
#### Second version
```
.data
examples:
.word 1, 2, 3
.word 51041, 65280, 716177407
.word 143165576, 715827882, 1
expected_res:
.word 0
.word 14
.word 23
correct_msg:
.asciz "Correct"
wrong_msg:
.asciz "Wrong"
newline:
.asciz ".\n"
.text
.global main
main:
la t0, examples # t0 = address of examples
la t1, expected_res # t1 = address of expected_res
li t2, 3 # t2 = number of examples
li t3, 0 # t3 = for loop counter
loop_example:
beq t2, t3, end_main # if counter == 3, goto end_main
# load the current example's value
lw a0, 0(t0)
lw a1, 4(t0)
lw a2, 8(t0)
jal ra, min_flips # call min_flips
lw a1, 0(t1) # load the current expected result
jal ra, print_result # call print_result
addi t0, t0, 12 # move to the next example
addi t1, t1, 4 # move to the next expected result
addi t3, t3, 1 # increment loop counter
j loop_example
end_main:
# exit program
li a7, 10 # syscall exit
ecall
# my_clz function
my_clz:
li t0, 0 # counter = 0
li t1, 31 # i counter
li t2, 1
my_clz_loop:
bltz t1, end_my_clz # if t1 < 0, goto end_my_clz
sll t3, t2, t1 # t3 = 1 << i
and t4, a0, t3 # t4 = x & (1 << i)
bnez t4, end_my_clz # if t4 != 0 (bit i is set), goto end_my_clz
addi t0, t0, 1 # count++
addi t1, t1, -1 # i--
j my_clz_loop
end_my_clz:
mv a0, t0 # return the count
ret
# min flips function
min_flips:
# preserve t0-t3 registers
addi sp, sp, -16
sw t0, 0(sp)
sw t1, 4(sp)
sw t2, 8(sp)
sw t3, 12(sp)
# push the input parameters into the stack
addi sp, sp, -16 # allocate space on the stack
sw ra, 12(sp)
sw a0, 0(sp)
sw a1, 4(sp)
sw a2, 8(sp)
or a1, a1, a2 # b OR c
or a0, a0, a1 # a OR b OR c
jal ra, my_clz # compute the leading zero
li t0, 0 # flips = 0
li t1, 31
sub t1, t1, a0 # i = 31 - number of leading zero
# restore the input parameters from the stack
lw a0, 0(sp)
lw a1, 4(sp)
lw a2, 8(sp)
loop:
bltz t1, end_loop # if i < 0, goto end_loop
srl t2, a0, t1 # bitA = (a >> i) & 1
andi t2, t2, 1
srl t3, a1, t1 # bitB = (b >> i) & 1
andi t3, t3, 1
srl t4, a2, t1 # bitC = (c >> i) & 1
andi t4, t4, 1
beqz t4, add_flip # if bitC == 0, goto add_flip
beqz t2, check_b # if bitA == 0, goto check_b
j continue_loop
add_flip:
add t0, t0, t2 # flips += bitA
add t0, t0, t3 # flips += bitB
j continue_loop
check_b:
bnez t3, continue_loop # if bitB != 0, goto continue_loop
addi t0, t0, 1 # flips += 1
continue_loop:
addi t1, t1, -1 # i--
j loop
end_loop:
mv a0, t0 # set return value to flips
lw ra, 12(sp)
addi sp, sp, 16
# restore t0-t3 registers
lw t0, 0(sp)
lw t1, 4(sp)
lw t2, 8(sp)
lw t3, 12(sp)
addi sp, sp, 16
ret
# print result function
print_result:
bne a0, a1, wrong_case # if a0 != a1, goto wrong_case
la a0, correct_msg
li a7, 4
ecall
j end_print
wrong_case:
la a0, wrong_msg
li a7, 4
ecall
end_print:
la a0, newline
li a7, 4
ecall
ret
```
:::danger
Use fewer instructions.
:::
## Analysis
### Execution info.
| | Cycles | instrs. retired | CPI | IPC |
| -------- | -------- | -------- | -------- | -------- |
| minFlips without `clz` | 1945 | 1355 | 1.44 | 0.697 |
| minFlips with `clz` | 1692 | 1220 | 1.39 | 0.721 |
Counting leading zeros (`clz`) first noticeably speeds up test cases whose operands have many leading zeros. The first test case, with `a = 1`, `b = 2`, and `c = 3`, shows how many clock cycles the optimization saves:
| | Cycles |
| -------- | -------- |
| without `clz` | 676 |
| with `clz` | 405 |
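The CPI and IPC columns follow directly from the cycle and instruction counts. The sketch below shows the arithmetic; the function names `cpi`, `ipc`, and `speedup` are hypothetical, and the numbers in the usage note are the ones reported in the tables above.

```c
/* Cycles per instruction. */
static double cpi(unsigned cycles, unsigned instrs) {
    return (double) cycles / (double) instrs;
}

/* Instructions per cycle, the reciprocal of CPI. */
static double ipc(unsigned cycles, unsigned instrs) {
    return (double) instrs / (double) cycles;
}

/* Ratio of cycle counts before and after an optimization. */
static double speedup(unsigned cycles_before, unsigned cycles_after) {
    return (double) cycles_before / (double) cycles_after;
}
```

With these definitions, `cpi(1945, 1355)` is about 1.44 and `ipc(1692, 1220)` is about 0.721, matching the table. For the first test case, `speedup(676, 405)` is about 1.67, while the whole program improves by `speedup(1945, 1692)`, about 1.15.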
### Single-cycle processor

A single-cycle processor completes every phase of an instruction (IF, ID, EX, MEM, and WB) within one clock cycle. Every instruction, regardless of complexity, takes exactly one cycle, so N instructions need N cycles, which is fewer cycles than a 5-stage pipelined processor needs. However, the clock period must be long enough for the slowest instruction, such as a load that accesses memory, so each cycle is longer and the processor is slower overall despite the lower cycle count.
| | Cycles | instrs. retired | CPI | IPC |
| -------- | -------- | -------- | -------- | -------- |
| minFlips with `clz` | 1220 | 1220 | 1 | 1 |
The output from Ripes shows that the number of cycles is equal to the instructions retired, indicating that each instruction is completed in a single cycle.
### 5-stage processor

A pipelined processor divides instruction execution into stages, IF, ID, EX, MEM, and WB, with each stage taking one clock cycle. This allows different instructions to be processed simultaneously at various stages, a process known as pipelining. Although a 5-stage pipeline processor may require more total clock cycles than a single-cycle processor due to potential stalls from data or control hazards, its clock cycle time is shorter. This is because each stage only performs part of the work, minimizing the wait for the longest operation. Consequently, despite using more clock cycles, the shorter cycle time and instruction overlap enhance throughput and performance.
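As a rough model, an ideal k-stage pipeline with no stalls finishes N instructions in about N + (k - 1) cycles, because the first instruction needs k cycles and each later one completes one cycle after its predecessor. The sketch below applies this textbook approximation; the function names are hypothetical, and the simulator's exact cycle accounting may differ by a few cycles.

```c
/* Cycles for an ideal pipeline with no stalls or flushes. */
static unsigned ideal_pipeline_cycles(unsigned instrs, unsigned stages) {
    return instrs + stages - 1;
}

/* Cycles beyond the ideal count, attributed to stalls and flushes. */
static unsigned overhead_cycles(unsigned measured, unsigned instrs,
                                unsigned stages) {
    return measured - ideal_pipeline_cycles(instrs, stages);
}
```

For the `clz` version, `ideal_pipeline_cycles(1220, 5)` is 1224, so under this model roughly `overhead_cycles(1692, 1220, 5)` = 468 of the 1692 measured cycles are spent on hazards rather than on useful work.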
### Pipeline
The second instruction in `my_clz_loop`, `sll t3, t2, t1` (shown by Ripes as `sll x28, x7, x6`), is used below to walk through the five pipeline stages.
#### IF, Instruction Fetch
The image illustrates the pipeline for the `sll x28, x7, x6` instruction, showing the flow from the Program Counter (PC) holding the instruction address `0x00000060` to the Instruction Memory, which fetches the instruction `0x00639e33`.
<div style="text-align: center;">
<img src="https://hackmd.io/_uploads/BJt1cYD1Jg.png" height=500px>
</div>
#### ID, Instruction Decode
The image illustrates the decode stage for the `sll x28, x7, x6` instruction, showing the decoding of the instruction `0x00639e33` and its opcode, along with the relevant register indices for `x7`, `x6`, and `x28`.
<div style="text-align: center;">
<img src="https://hackmd.io/_uploads/r17IjKPyyg.png" height=500px>
</div>
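The fields that the decode stage extracts can be reproduced by slicing the instruction word according to the RISC-V R-type format. The sketch below is only an illustration; the struct `rtype_fields` and the function `decode_rtype` are hypothetical names.

```c
#include <stdint.h>

/* Fields of an R-type instruction, from least to most significant bits. */
typedef struct {
    uint32_t opcode;  /* bits 0-6 */
    uint32_t rd;      /* bits 7-11 */
    uint32_t funct3;  /* bits 12-14 */
    uint32_t rs1;     /* bits 15-19 */
    uint32_t rs2;     /* bits 20-24 */
    uint32_t funct7;  /* bits 25-31 */
} rtype_fields;

static rtype_fields decode_rtype(uint32_t insn) {
    rtype_fields f;
    f.opcode = insn & 0x7F;
    f.rd     = (insn >> 7) & 0x1F;
    f.funct3 = (insn >> 12) & 0x7;
    f.rs1    = (insn >> 15) & 0x1F;
    f.rs2    = (insn >> 20) & 0x1F;
    f.funct7 = (insn >> 25) & 0x7F;
    return f;
}
```

Decoding `0x00639e33` gives opcode `0x33` (register-register ALU operation), `funct3 = 1` and `funct7 = 0` (shift left logical), `rs1 = 7`, `rs2 = 6`, and `rd = 28`, which is `sll x28, x7, x6`.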
#### EX, Execution
The image illustrates the execution stage of the `sll x28, x7, x6` instruction, where the ALU shifts the value in register `x7` left by the number of bits specified in `x6`. The result is destined for `x28` but is not written to the register file until the write-back stage.
<div style="text-align: center;">
<img src="https://hackmd.io/_uploads/BJHthtDk1g.png" height=500px>
</div>
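What the ALU computes for `sll` on RV32 can be written as a one-line C function. In the sketch below, `rv32_sll` is a hypothetical name; the mask reflects that only the low 5 bits of the second operand are used as the shift amount.

```c
#include <stdint.h>

/* Shift left logical as defined for RV32: shift amount is rs2[4:0]. */
static uint32_t rv32_sll(uint32_t rs1, uint32_t rs2) {
    return rs1 << (rs2 & 0x1F);
}
```

In `my_clz_loop`, `x7` holds 1 and `x6` holds the loop index `i`, so the first iteration computes `rv32_sll(1, 31)`, which is `0x80000000`.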
#### MEM, Memory Access
The image illustrates the memory access stage of the `sll x28, x7, x6` instruction. Because `sll` is neither a load nor a store, it does not read or write data memory; the ALU result passes through this stage unchanged on its way to write back.
<div style="text-align: center;">
<img src="https://hackmd.io/_uploads/BJNMk5vJJl.png" height=500px>
</div>
#### WB, Write Back
The image illustrates the write-back stage of the `sll x28, x7, x6` instruction, in which the shifted result from the ALU is written to register `x28`.
<div style="text-align: center;">
<img src="https://hackmd.io/_uploads/ByDT-qDkJx.png" height=500px>
</div>
## Reference
* [Quiz1 of Computer Architecture (2024 Fall)](https://hackmd.io/@sysprog/arch2024-quiz1-sol)
* [1318. Minimum Flips to Make a OR b Equal to c](https://leetcode.com/problems/minimum-flips-to-make-a-or-b-equal-to-c/description/)
* [RISC-V Instruction Set Specifications](https://msyksphinz-self.github.io/riscv-isadoc/html/index.html)