# Assignment1: RISC-V Assembly and Instruction Pipeline
contributed by: < `DebugMyLife` (劉孟璋) >
## Problem C in Quiz1
### C Code
```c
static inline int my_clz(uint32_t x)
{
int count = 0;
for (int i = 31; i >= 0; --i)
{
if (x & (1U << i))
break;
count++;
}
return count;
}
// test
int main(void)
{
uint32_t data1 = 0x00000000;
int result1 = my_clz(data1);
if (result1 == 32)
printf("Test 1: Result correct.\n");
else
printf("Test 1: Result wrong.\n");
uint32_t data2 = 0x114514;
int result2 = my_clz(data2);
if (result2 == 11)
printf("Test 2: Result correct.\n");
else
printf("Test 2: Result wrong.\n");
uint32_t data3 = 0x12345678;
int result3 = my_clz(data3);
if (result3 == 3)
printf("Test 3: Result correct.\n");
else
printf("Test 3: Result wrong.\n");
return 0
}
// test
int main(void)
{
uint32_t data1 = 0x00000000;
int result1 = my_clz(data1);
if (result1 == 32)
printf("Result correct. \n");
else
printf("Result wrong. (Expected result: 32) \n");
uint32_t data2 = 0x114514;
int result2 = my_clz(data2);
if (result2 == 11)
printf("Result correct.\n");
else
printf("Result wrong. (Expected result: 11) \n");
uint32_t data3 = 0x12345678;
int result3 = my_clz(data3);
if (result3 == 3)
printf("Result correct. \n");
else
printf("Result wrong. (Expected result: 3) \n");
return 0;
}
```
:::danger
Don't paste code snip without comprehensive discussions.
:::
### Assembly Code
```c
.data
data1: .word 0x00000000
data2: .word 0x114514
data3: .word 0x12345678
ans1: .word 32
ans2: .word 11
ans3: .word 3
msg1: .string "Test 1: "
msg2: .string "Test 2: "
msg3: .string "Test 3: "
msg4: .string "Result correct!\n"
msg5: .string "Result wrong!\n"
.text
main:
li a7, 4
test1:
la a0, msg1
ecall
la t0, data1
lw s0, 0(t0)
la t0, ans1
lw s1, 0(t0)
jal ra, my_clz
test2:
la a0, msg2
ecall
la t0, data2
lw s0, 0(t0)
la t0, ans2
lw s1, 0(t0)
jal ra, my_clz
test3:
la a0, msg3
ecall
la t0, data3
lw s0, 0(t0)
la t0, ans3
lw s1, 0(t0)
jal ra, my_clz
li a7, 10
ecall
my_clz:
li s2, 31 # i
li s3, 1 # 1
li s4, 0 # count
my_clz_loop:
blt s2, zero, my_clz_end
sll t0, s3, s2
and t0, s0, t0
beq t0, zero, result_0
j my_clz_end
result_0:
addi s4, s4, 1 # count++
addi s2, s2, -1 # i--
j my_clz_loop
my_clz_end:
bne s1, s4, wrong
la a0, msg4
ecall
ret
wrong:
la a0, msg5
ecall
ret
```
<br><hr>
### Test Data
:::danger
Don't use HTML tags. Use Markdown synatx always!
:::
<table>
<tr>
<td></td>
<td>test data</td>
<td>expected result</td>
</tr>
<tr>
<td>1</td>
<td>0x00000000</td>
<td>32</td>
</tr>
<tr>
<td>2</td>
<td>0x114514</td>
<td>11</td>
</tr>
<tr>
<td>3</td>
<td>0x12345678</td>
<td>3</td>
</tr>
</table>
<br><hr>
### Output Result

<br><hr>
### Performance
<table>
<tr>
<td>Cycls</td>
<td>610</td>
</tr>
<tr>
<td>Instrs. retired</td>
<td>390</td>
</tr>
<tr>
<td>CPI</td>
<td>1.56</td>
</tr>
<tr>
<td>IPC</td>
<td>0.639</td>
</tr>
<tr>
<td>Clock rate</td>
<td>0 Hz</td>
</tr>
</table>
<br><hr><br>
## [LeetCode 1404.](https://leetcode.com/problems/number-of-steps-to-reduce-a-number-in-binary-representation-to-one/) <br>Number of Steps to Reduce a Number in Binary Representation to One
### Question
<p> Given the binary representation of an integer as a string s, return the number of steps to reduce it to 1 under the following rules:
</p>
<ul>
<li>If the current number is even, you have to divide it by 2.</li>
<li>If the current number is odd, you have to add 1 to it.</li>
</ul>
<p>It is guaranteed that you can always reach one for all test cases.</p>
</div>
<br>
##### Example 1:
<blockquote>
<b>Input:</b> s = "1101"<br>
<b>Output:</b> 6<br>
<b>Explanation:</b><br>
"1101" corressponds to number 13 in their decimal representation.<br>
Step 1) 13 is odd, add 1 and obtain 14. <br>
Step 2) 14 is even, divide by 2 and obtain 7.<br>
Step 3) 7 is odd, add 1 and obtain 8.<br>
Step 4) 8 is even, divide by 2 and obtain 4.<br>
Step 5) 4 is even, divide by 2 and obtain 2.<br>
Step 6) 2 is even, divide by 2 and obtain 1.<br>
</blockquote>
##### Example 2:
<blockquote>
<b>Input:</b> s = "10"<br>
<b>Output:</b> 1<br>
<b>Explanation:</b><br>
"10" corressponds to number 2 in their decimal representation.<br>
Step 1) 2 is even, divide by 2 and obtain 1.<br>
</blockquote>
##### Example 3:
<blockquote>
<b>Input:</b> s = "1"<br>
<b>Output:</b> 0<br>
</blockquote><br>
##### Constraints:
<ol>
<li><code>1 <= s.length <= 500</code></li>
<li><code>s</code> consists of characters '0' or '1'</li>
<li><code>s[0] == '1'</code></li>
</ol>
<br><hr>
### Solution
You can find the source code [here](https://)
1. Use `my_clz` ( counting leading zero ) function from <b><em>Problem C in Quiz 1</em></b> to calculate the number of leading zeros to check how many bits of target number - `num` can be skipped.
2. When encountering a `0`, increment `steps` by 1.
3. When encountering a `1`, increment `steps` by 2 and increment `num` by 1.
4. No matter `step 2` or `step 3`, left shift `num` by 1 bit
5. Use `my_clz` function again for checking carry.
<blockquote>
Carry may occur when incrementing <code>num</code><br>
Ex: ( 111 )<sub>2</sub> ⭢ ( 1000 )<sub>2</sub><br>
</blockquote>
6. return `steps`
<br><hr>
### C Code
Function `my_clz` calculates how many leading zeros are in a number.
```c
static inline int my_clz(uint32_t x)
{
int count = 0;
for (int i = 31; i >= 0; --i)
{
if (x & (1U << i))
break;
count++;
}
return count;
}
```
<br><br>
Function `bin2num` converts numbers <b><em>in binary representation</em></b> to `uint32_t` format
<blockquote>
Binary representation numbers are stored in an array of the type <code>char</code> , so we can't implement <code>my_clz</code> to <code>num</code> directly.
</blockquote>
```c
static inline uint32_t bin2num(int num)
{
int i = 0;
uint32_t bin = 0;
while (num != 0)
{
if (num % 10 == 1)
{
bin += pow(2, i);
}
i++;
num /= 10;
}
return bin;
}
```
<br><br>
Function `reduce2one` compute the total steps of reducing `num` to `1`
```c
static inline int reduce2one(uint32_t num)
{
int steps = 0;
int zeros = my_clz(num);
if (zeros == 32)
return -1;
if (zeros == 31)
return 0;
int times = 32 - zeros;
for(int i=0; i < times; i++)
{
if (num == 1)
continue;
else if ((num & 0x1))
{
num++;
steps += 2;
}
else if (!(num & 0x1))
steps++;
num = num >> 1;
if (i+1==times && num != 1)
{
i = 0;
times = 32 - my_clz(num);
}
}
return steps;
}
```
<br><br>
Function `main` tests the data and shows the result.
```c
int main(void)
{
char s1[] = "10111001";
uint32_t num1 = bin2num(atoi(s1));
if (reduce2one(num1) == 12)
printf("Test 1: Result correct.\n");
else
printf("Test 1: Result wrong.\n");
char s2[] = "1101";
uint32_t num2 = bin2num(atoi(s2));
if (reduce2one(num2) == 6)
printf("Test 2: Result correct.\n");
else
printf("Test 2: Result wrong.\n");
char s3[] = "10";
uint32_t num3 = bin2num(atoi(s3));
if (reduce2one(num3) == 1)
printf("Test 3: Result correct.\n");
else
printf("Test 3: Result wrong.\n");
char s4[] = "1";
uint32_t num4 = bin2num(atoi(s4));
if (reduce2one(num4) == 0)
printf("Test 4: Result correct.\n");
else
printf("Test 4: Result wrong.\n");
return 0;
}
```
<br><hr>
### Assembly Code
```c
.data
data1: .byte 0b10111001
data2: .byte 0b1101
data3: .byte 0b10
data4: .byte 0b1
ans1: .word 12
ans2: .word 6
ans3: .word 1
ans4: .word 0
msg1: .string "Test 1: "
msg2: .string "Test 2: "
msg3: .string "Test 3: "
msg4: .string "Test 4: "
msg5: .string "Result correct!\n"
msg6: .string "Result wrong!\n"
.text
main:
li a7, 4
test1:
la a0, msg1
ecall
la t0, data1
lbu s0, 0(t0)
la t0, ans1
lw s1, 0(t0)
jal ra, reduce2one
bne s1, s5, wrong
la a0, msg5
ecall
test2:
la a0, msg2
ecall
la t0, data2
lbu s0, 0(t0)
la t0, ans2
lw s1, 0(t0)
jal ra, reduce2one
bne s1, s5, wrong
la a0, msg5
ecall
test3:
la a0, msg3
ecall
la t0, data3
lbu s0, 0(t0)
la t0, ans3
lw s1, 0(t0)
jal ra, reduce2one
bne s1, s5, wrong
la a0, msg5
ecall
test4:
la a0, msg4
ecall
la t0, data4
lbu s0, 0(t0)
la t0, ans4
lw s1, 0(t0)
jal ra, reduce2one
bne s1, s5, wrong
la a0, msg5
ecall
li a7, 10
ecall
ret
wrong:
la a0, msg6
ecall
li a7, 10
ecall
ret
reduce2one:
li s5, 0 # steps
li s6, 0 # times
li s7, 0 # i
li t1, 32 # 32
li t2, 31 # 31
li t3, 1 # 1
mv s8, s0 # s8 = num
addi sp, sp, -4
sw ra, 0(sp)
jal ra, my_clz
beq s4, t1, reduce2one_0
beq s4, t2, reduce2one_1
xori s4, s4, -1
addi s4, s4, 1
addi s6, s4, 32 # times = 32 - count;
reduce2one_Loop:
bge s7, s6, reduce2one_end # times <= i -> break;
beq t3, s8, reduce2one_end
andi t4, s8, 0x1 # check bits
beq t4, zero, reduce2one_bit0
reduce2one_bit1:
addi s5, s5, 2 # step+=2
addi s8, s8, 1 # num++
j reduce2one_branch1
reduce2one_bit0:
addi s5, s5, 1 # step++
reduce2one_branch1:
srli s8, s8, 1 # num >> 1
mv t5, s7
addi t5, t5, 1
bne t5, s6, reduce2one_branch2
beq s8, t3, reduce2one_branch2
li s7, 0 # i = 0
jal ra, my_clz
xori s4, s4, 0xFFFFFFFF
addi s4, s4, 1
addi s6, s4, 32 # times = 32 - count;
reduce2one_branch2:
addi s7, s7, 1
j reduce2one_Loop
reduce2one_0:
li s5, -1
j reduce2one_end
reduce2one_1:
li s5, 0
j reduce2one_end
reduce2one_end:
lw ra, 0(sp)
addi sp, sp, 4
ret
my_clz:
li s2, 31 # i
li s3, 1 # 1
li s4, 0 # count
my_clz_loop:
blt s2, zero, my_clz_end
sll t0, s3, s2
and t0, s0, t0
beq t0, zero, result_0
j my_clz_end
result_0:
addi s4, s4, 1 # count++
addi s2, s2, -1 # i--
j my_clz_loop
my_clz_end:
ret
```
<br><hr>
### Disassembled executable code
```c
00000000 <main>:
0: 00400893 addi x17 x0 4
00000004 <test1>:
4: 10000517 auipc x10 0x10000
8: 01050513 addi x10 x10 16
c: 00000073 ecall
10: 10000297 auipc x5 0x10000
14: ff028293 addi x5 x5 -16
18: 0002c403 lbu x8 0 x5
1c: 10000297 auipc x5 0x10000
20: fe828293 addi x5 x5 -24
24: 0002a483 lw x9 0 x5
28: 0e0000ef jal x1 224 <reduce2one>
2c: 0d549263 bne x9 x21 196 <wrong>
30: 10000517 auipc x10 0x10000
34: 00850513 addi x10 x10 8
38: 00000073 ecall
0000003c <test2>:
3c: 10000517 auipc x10 0x10000
40: fe150513 addi x10 x10 -31
44: 00000073 ecall
48: 10000297 auipc x5 0x10000
4c: fb928293 addi x5 x5 -71
50: 0002c403 lbu x8 0 x5
54: 10000297 auipc x5 0x10000
58: fb428293 addi x5 x5 -76
5c: 0002a483 lw x9 0 x5
60: 0a8000ef jal x1 168 <reduce2one>
64: 09549663 bne x9 x21 140 <wrong>
68: 10000517 auipc x10 0x10000
6c: fd050513 addi x10 x10 -48
70: 00000073 ecall
00000074 <test3>:
74: 10000517 auipc x10 0x10000
78: fb250513 addi x10 x10 -78
7c: 00000073 ecall
80: 10000297 auipc x5 0x10000
84: f8228293 addi x5 x5 -126
88: 0002c403 lbu x8 0 x5
8c: 10000297 auipc x5 0x10000
90: f8028293 addi x5 x5 -128
94: 0002a483 lw x9 0 x5
98: 070000ef jal x1 112 <reduce2one>
9c: 05549a63 bne x9 x21 84 <wrong>
a0: 10000517 auipc x10 0x10000
a4: f9850513 addi x10 x10 -104
a8: 00000073 ecall
000000ac <test4>:
ac: 10000517 auipc x10 0x10000
b0: f8350513 addi x10 x10 -125
b4: 00000073 ecall
b8: 10000297 auipc x5 0x10000
bc: f4b28293 addi x5 x5 -181
c0: 0002c403 lbu x8 0 x5
c4: 10000297 auipc x5 0x10000
c8: f4c28293 addi x5 x5 -180
cc: 0002a483 lw x9 0 x5
d0: 038000ef jal x1 56 <reduce2one>
d4: 01549e63 bne x9 x21 28 <wrong>
d8: 10000517 auipc x10 0x10000
dc: f6050513 addi x10 x10 -160
e0: 00000073 ecall
e4: 00a00893 addi x17 x0 10
e8: 00000073 ecall
ec: 00008067 jalr x0 x1 0
000000f0 <wrong>:
f0: 10000517 auipc x10 0x10000
f4: f5950513 addi x10 x10 -167
f8: 00000073 ecall
fc: 00a00893 addi x17 x0 10
100: 00000073 ecall
104: 00008067 jalr x0 x1 0
00000108 <reduce2one>:
108: 00000a93 addi x21 x0 0
10c: 00000b13 addi x22 x0 0
110: 00000b93 addi x23 x0 0
114: 02000313 addi x6 x0 32
118: 01f00393 addi x7 x0 31
11c: 00100e13 addi x28 x0 1
120: 00040c13 addi x24 x8 0
124: ffc10113 addi x2 x2 -4
128: 00112023 sw x1 0 x2
12c: 084000ef jal x1 132 <my_clz>
130: 066a0263 beq x20 x6 100 <reduce2one_0>
134: 067a0463 beq x20 x7 104 <reduce2one_1>
138: fffa4a13 xori x20 x20 -1
13c: 001a0a13 addi x20 x20 1
140: 020a0b13 addi x22 x20 32
00000144 <reduce2one_Loop>:
144: 076bd063 bge x23 x22 96 <reduce2one_end>
148: 058e0e63 beq x28 x24 92 <reduce2one_end>
14c: 001c7e93 andi x29 x24 1
150: 000e8863 beq x29 x0 16 <reduce2one_bit0>
00000154 <reduce2one_bit1>:
154: 002a8a93 addi x21 x21 2
158: 001c0c13 addi x24 x24 1
15c: 0080006f jal x0 8 <reduce2one_branch1>
00000160 <reduce2one_bit0>:
160: 001a8a93 addi x21 x21 1
00000164 <reduce2one_branch1>:
164: 001c5c13 srli x24 x24 1
168: 000b8f13 addi x30 x23 0
16c: 001f0f13 addi x30 x30 1
170: 016f1e63 bne x30 x22 28 <reduce2one_branch2>
174: 01cc0c63 beq x24 x28 24 <reduce2one_branch2>
178: 00000b93 addi x23 x0 0
17c: 034000ef jal x1 52 <my_clz>
180: fffa4a13 xori x20 x20 -1
184: 001a0a13 addi x20 x20 1
188: 020a0b13 addi x22 x20 32
0000018c <reduce2one_branch2>:
18c: 001b8b93 addi x23 x23 1
190: fb5ff06f jal x0 -76 <reduce2one_Loop>
00000194 <reduce2one_0>:
194: fff00a93 addi x21 x0 -1
198: 00c0006f jal x0 12 <reduce2one_end>
0000019c <reduce2one_1>:
19c: 00000a93 addi x21 x0 0
1a0: 0040006f jal x0 4 <reduce2one_end>
000001a4 <reduce2one_end>:
1a4: 00012083 lw x1 0 x2
1a8: 00410113 addi x2 x2 4
1ac: 00008067 jalr x0 x1 0
000001b0 <my_clz>:
1b0: 01f00913 addi x18 x0 31
1b4: 00100993 addi x19 x0 1
1b8: 00000a13 addi x20 x0 0
000001bc <my_clz_loop>:
1bc: 02094063 blt x18 x0 32 <my_clz_end>
1c0: 012992b3 sll x5 x19 x18
1c4: 005472b3 and x5 x8 x5
1c8: 00028463 beq x5 x0 8 <result_0>
1cc: 0100006f jal x0 16 <my_clz_end>
000001d0 <result_0>:
1d0: 001a0a13 addi x20 x20 1
1d4: fff90913 addi x18 x18 -1
1d8: fe5ff06f jal x0 -28 <my_clz_loop>
000001dc <my_clz_end>:
1dc: 00008067 jalr x0 x1 0
```
<br><hr>
### Test Data
<table>
<tr>
<td></td>
<td colspan="2" style="text-align:center">test data</td>
<td>expected result</td>
</tr>
<tr>
<td>1</td>
<td>"10111001"<sub>[c]</sub></td>
<td>0b10111001<sub>[ass]</sub></td>
<td style="text-align:center">12</td>
</tr>
<tr>
<td>2</td>
<td>"1101"<sub>[c]</sub></td>
<td>0b1101<sub>[ass]</sub></td>
<td style="text-align:center">6</td>
</tr>
<tr>
<td>3</td>
<td>"10"<sub>[c]</sub></td>
<td>0b10<sub>[ass]</sub></td>
<td style="text-align:center">1</td>
</tr>
<tr>
<td>4</td>
<td>"1"<sub>[c]</sub></td>
<td>0b1<sub>[ass]</sub></td>
<td style="text-align:center">0</td>
</tr>
</table>
<br><hr>
### Output Result

<br><hr>
### Performance
<table>
<tr>
<td>Cycls</td>
<td>1720</td>
</tr>
<tr>
<td>Instrs. retired</td>
<td>1118</td>
</tr>
<tr>
<td>CPI</td>
<td>1.54</td>
</tr>
<tr>
<td>IPC</td>
<td>0.65</td>
</tr>
<tr>
<td>Clock rate</td>
<td>369.18 Hz</td>
</tr>
</table>
<br><hr>
### Analysis
<p>Test the code with Ripes simulator.</p>
#### 5-stage pipelined processor
<p>There are 5 stages in this processor. Using five-stage pipeline to parallelize instructions.</p>

<p>The 5 stages are:</p>
<ol>
<li><code>IF</code> ( Instruction fetch )</li>
<ul>
<li>
Read the next expected instruction into the buffer
</li>
</ul>
<li><code>ID</code> ( Instruction decode )</li>
<ul>
<li>
Determine opcode and operand specifiers
</li>
<li>
Calculate effective address of each source operand
</li>
<li>
Fetch each operand from memory
</li>
</ul>
<li><code>EXE</code> ( Execute )</li>
<ul>
<li>
Perform indicated operation
</li>
</ul>
<li><code>MEM</code> ( Memory access ) </li>
<ul>
<li>
Access memory
</li>
</ul>
<li><code>WB</code> ( write back )</li>
<ul>
<li>
Store result
</li>
</ul>
</ol>
<p>Each stage is separated by pipeline registers in the block diagram.
<br><br>
#### I-type format
<p>Detailed Analysis of <code>srli x24 x24 1</code> in RISC-V Pipeline:</p>
<blockquote>
Instruction Format: srli rd, rs1, imm12
</blockquote>
<ol>
<li><code>IF</code> ( Instruction fetch )</li>
<img src="https://hackmd.io/_uploads/ryGHM2cy1e.png">
<li><code>ID</code> ( Instruction decode )</li>
<img src="https://hackmd.io/_uploads/HylFGh51kx.png">
<li><code>EXE</code> ( Execute )</li>
<img src="https://hackmd.io/_uploads/HJgnMh9Jyl.png">
<li><code>MEM</code> ( Memory access )</li>
<img src="https://hackmd.io/_uploads/SJixXnc11e.png">
<li><code>WB</code> ( write back )</li>
<img src="https://hackmd.io/_uploads/Hy-mm25kyx.png">
</ol>
<br><hr><br>
## Reference
[Quiz1 of Computer Architecture (2024 Fall)](https://hackmd.io/@sysprog/arch2024-quiz1-sol#Problem-C)
[Lab1: RV32I Simulator](https://hackmd.io/@sysprog/H1TpVYMdB)
[LeetCode 1404 - Number of Steps to Reduce a Number in Binary Representation to One ](https://leetcode.com/problems/number-of-steps-to-reduce-a-number-in-binary-representation-to-one/)