# Assignment1: RISC-V Assembly and Instruction Pipeline contributed by: < `DebugMyLife` (劉孟璋) > ## Problem C in Quiz1 ### C Code ```c static inline int my_clz(uint32_t x) { int count = 0; for (int i = 31; i >= 0; --i) { if (x & (1U << i)) break; count++; } return count; } // test int main(void) { uint32_t data1 = 0x00000000; int result1 = my_clz(data1); if (result1 == 32) printf("Test 1: Result correct.\n"); else printf("Test 1: Result wrong.\n"); uint32_t data2 = 0x114514; int result2 = my_clz(data2); if (result2 == 11) printf("Test 2: Result correct.\n"); else printf("Test 2: Result wrong.\n"); uint32_t data3 = 0x12345678; int result3 = my_clz(data3); if (result3 == 3) printf("Test 3: Result correct.\n"); else printf("Test 3: Result wrong.\n"); return 0 } // test int main(void) { uint32_t data1 = 0x00000000; int result1 = my_clz(data1); if (result1 == 32) printf("Result correct. \n"); else printf("Result wrong. (Expected result: 32) \n"); uint32_t data2 = 0x114514; int result2 = my_clz(data2); if (result2 == 11) printf("Result correct.\n"); else printf("Result wrong. (Expected result: 11) \n"); uint32_t data3 = 0x12345678; int result3 = my_clz(data3); if (result3 == 3) printf("Result correct. \n"); else printf("Result wrong. (Expected result: 3) \n"); return 0; } ``` :::danger Don't paste code snip without comprehensive discussions. ::: ### Assembly Code ```c .data data1: .word 0x00000000 data2: .word 0x114514 data3: .word 0x12345678 ans1: .word 32 ans2: .word 11 ans3: .word 3 msg1: .string "Test 1: " msg2: .string "Test 2: " msg3: .string "Test 3: " msg4: .string "Result correct!\n" msg5: .string "Result wrong!\n" .text main: li a7, 4 test1: la a0, msg1 ecall la t0, data1 lw s0, 0(t0) la t0, ans1 lw s1, 0(t0) jal ra, my_clz test2: la a0, msg2 ecall la t0, data2 lw s0, 0(t0) la t0, ans2 lw s1, 0(t0) jal ra, my_clz test3: la a0, msg3 ecall la t0, data3 lw s0, 0(t0) la t0, ans3 lw s1, 0(t0) jal ra, my_clz li a7, 10 ecall my_clz: li s2, 31 # i li s3, 1 # 1 li s4, 0 # count my_clz_loop: blt s2, zero, my_clz_end sll t0, s3, s2 and t0, s0, t0 beq t0, zero, result_0 j my_clz_end result_0: addi s4, s4, 1 # count++ addi s2, s2, -1 # i-- j my_clz_loop my_clz_end: bne s1, s4, wrong la a0, msg4 ecall ret wrong: la a0, msg5 ecall ret ``` <br><hr> ### Test Data :::danger Don't use HTML tags. Use Markdown synatx always! ::: <table> <tr> <td></td> <td>test data</td> <td>expected result</td> </tr> <tr> <td>1</td> <td>0x00000000</td> <td>32</td> </tr> <tr> <td>2</td> <td>0x114514</td> <td>11</td> </tr> <tr> <td>3</td> <td>0x12345678</td> <td>3</td> </tr> </table> <br><hr> ### Output Result ![1](https://hackmd.io/_uploads/SJvOY6tkkg.png) <br><hr> ### Performance <table> <tr> <td>Cycls</td> <td>610</td> </tr> <tr> <td>Instrs. retired</td> <td>390</td> </tr> <tr> <td>CPI</td> <td>1.56</td> </tr> <tr> <td>IPC</td> <td>0.639</td> </tr> <tr> <td>Clock rate</td> <td>0 Hz</td> </tr> </table> <br><hr><br> ## [LeetCode 1404.](https://leetcode.com/problems/number-of-steps-to-reduce-a-number-in-binary-representation-to-one/) <br>Number of Steps to Reduce a Number in Binary Representation to One ### Question <p> Given the binary representation of an integer as a string s, return the number of steps to reduce it to 1 under the following rules: </p> <ul> <li>If the current number is even, you have to divide it by 2.</li> <li>If the current number is odd, you have to add 1 to it.</li> </ul> <p>It is guaranteed that you can always reach one for all test cases.</p> </div> <br> ##### Example 1: <blockquote> <b>Input:</b> s = "1101"<br> <b>Output:</b> 6<br> <b>Explanation:</b><br> "1101" corressponds to number 13 in their decimal representation.<br> Step 1) 13 is odd, add 1 and obtain 14. <br> Step 2) 14 is even, divide by 2 and obtain 7.<br> Step 3) 7 is odd, add 1 and obtain 8.<br> Step 4) 8 is even, divide by 2 and obtain 4.<br> Step 5) 4 is even, divide by 2 and obtain 2.<br> Step 6) 2 is even, divide by 2 and obtain 1.<br> </blockquote> ##### Example 2: <blockquote> <b>Input:</b> s = "10"<br> <b>Output:</b> 1<br> <b>Explanation:</b><br> "10" corressponds to number 2 in their decimal representation.<br> Step 1) 2 is even, divide by 2 and obtain 1.<br> </blockquote> ##### Example 3: <blockquote> <b>Input:</b> s = "1"<br> <b>Output:</b> 0<br> </blockquote><br> ##### Constraints: <ol> <li><code>1 <= s.length <= 500</code></li> <li><code>s</code> consists of characters '0' or '1'</li> <li><code>s[0] == '1'</code></li> </ol> <br><hr> ### Solution You can find the source code [here](https://) 1. Use `my_clz` ( counting leading zero ) function from <b><em>Problem C in Quiz 1</em></b> &nbsp;to calculate the number of leading zeros to check how many bits of target number - `num` can be skipped. 2. When encountering a `0`, increment `steps` by 1. 3. When encountering a `1`, increment `steps` by 2 and increment `num` by 1. 4. No matter `step 2` or `step 3`, left shift `num` by 1 bit 5. Use `my_clz` function again for checking carry. <blockquote> Carry may occur when incrementing <code>num</code><br> Ex: ( 111 )<sub>2</sub> ⭢ ( 1000 )<sub>2</sub><br> </blockquote> 6. return `steps` <br><hr> ### C Code Function `my_clz` calculates how many leading zeros are in a number. ```c static inline int my_clz(uint32_t x) { int count = 0; for (int i = 31; i >= 0; --i) { if (x & (1U << i)) break; count++; } return count; } ``` <br><br> Function `bin2num` converts numbers <b><em>in binary representation</em></b>&nbsp; to `uint32_t` format <blockquote> Binary representation numbers are stored in an array of the type <code>char</code> , so we can't implement <code>my_clz</code> to <code>num</code> directly. </blockquote> ```c static inline uint32_t bin2num(int num) { int i = 0; uint32_t bin = 0; while (num != 0) { if (num % 10 == 1) { bin += pow(2, i); } i++; num /= 10; } return bin; } ``` <br><br> Function `reduce2one` compute the total steps of reducing `num` to `1` ```c static inline int reduce2one(uint32_t num) { int steps = 0; int zeros = my_clz(num); if (zeros == 32) return -1; if (zeros == 31) return 0; int times = 32 - zeros; for(int i=0; i < times; i++) { if (num == 1) continue; else if ((num & 0x1)) { num++; steps += 2; } else if (!(num & 0x1)) steps++; num = num >> 1; if (i+1==times && num != 1) { i = 0; times = 32 - my_clz(num); } } return steps; } ``` <br><br> Function `main` tests the data and shows the result. ```c int main(void) { char s1[] = "10111001"; uint32_t num1 = bin2num(atoi(s1)); if (reduce2one(num1) == 12) printf("Test 1: Result correct.\n"); else printf("Test 1: Result wrong.\n"); char s2[] = "1101"; uint32_t num2 = bin2num(atoi(s2)); if (reduce2one(num2) == 6) printf("Test 2: Result correct.\n"); else printf("Test 2: Result wrong.\n"); char s3[] = "10"; uint32_t num3 = bin2num(atoi(s3)); if (reduce2one(num3) == 1) printf("Test 3: Result correct.\n"); else printf("Test 3: Result wrong.\n"); char s4[] = "1"; uint32_t num4 = bin2num(atoi(s4)); if (reduce2one(num4) == 0) printf("Test 4: Result correct.\n"); else printf("Test 4: Result wrong.\n"); return 0; } ``` <br><hr> ### Assembly Code ```c .data data1: .byte 0b10111001 data2: .byte 0b1101 data3: .byte 0b10 data4: .byte 0b1 ans1: .word 12 ans2: .word 6 ans3: .word 1 ans4: .word 0 msg1: .string "Test 1: " msg2: .string "Test 2: " msg3: .string "Test 3: " msg4: .string "Test 4: " msg5: .string "Result correct!\n" msg6: .string "Result wrong!\n" .text main: li a7, 4 test1: la a0, msg1 ecall la t0, data1 lbu s0, 0(t0) la t0, ans1 lw s1, 0(t0) jal ra, reduce2one bne s1, s5, wrong la a0, msg5 ecall test2: la a0, msg2 ecall la t0, data2 lbu s0, 0(t0) la t0, ans2 lw s1, 0(t0) jal ra, reduce2one bne s1, s5, wrong la a0, msg5 ecall test3: la a0, msg3 ecall la t0, data3 lbu s0, 0(t0) la t0, ans3 lw s1, 0(t0) jal ra, reduce2one bne s1, s5, wrong la a0, msg5 ecall test4: la a0, msg4 ecall la t0, data4 lbu s0, 0(t0) la t0, ans4 lw s1, 0(t0) jal ra, reduce2one bne s1, s5, wrong la a0, msg5 ecall li a7, 10 ecall ret wrong: la a0, msg6 ecall li a7, 10 ecall ret reduce2one: li s5, 0 # steps li s6, 0 # times li s7, 0 # i li t1, 32 # 32 li t2, 31 # 31 li t3, 1 # 1 mv s8, s0 # s8 = num addi sp, sp, -4 sw ra, 0(sp) jal ra, my_clz beq s4, t1, reduce2one_0 beq s4, t2, reduce2one_1 xori s4, s4, -1 addi s4, s4, 1 addi s6, s4, 32 # times = 32 - count; reduce2one_Loop: bge s7, s6, reduce2one_end # times <= i -> break; beq t3, s8, reduce2one_end andi t4, s8, 0x1 # check bits beq t4, zero, reduce2one_bit0 reduce2one_bit1: addi s5, s5, 2 # step+=2 addi s8, s8, 1 # num++ j reduce2one_branch1 reduce2one_bit0: addi s5, s5, 1 # step++ reduce2one_branch1: srli s8, s8, 1 # num >> 1 mv t5, s7 addi t5, t5, 1 bne t5, s6, reduce2one_branch2 beq s8, t3, reduce2one_branch2 li s7, 0 # i = 0 jal ra, my_clz xori s4, s4, 0xFFFFFFFF addi s4, s4, 1 addi s6, s4, 32 # times = 32 - count; reduce2one_branch2: addi s7, s7, 1 j reduce2one_Loop reduce2one_0: li s5, -1 j reduce2one_end reduce2one_1: li s5, 0 j reduce2one_end reduce2one_end: lw ra, 0(sp) addi sp, sp, 4 ret my_clz: li s2, 31 # i li s3, 1 # 1 li s4, 0 # count my_clz_loop: blt s2, zero, my_clz_end sll t0, s3, s2 and t0, s0, t0 beq t0, zero, result_0 j my_clz_end result_0: addi s4, s4, 1 # count++ addi s2, s2, -1 # i-- j my_clz_loop my_clz_end: ret ``` <br><hr> ### Disassembled executable code ```c 00000000 <main>: 0: 00400893 addi x17 x0 4 00000004 <test1>: 4: 10000517 auipc x10 0x10000 8: 01050513 addi x10 x10 16 c: 00000073 ecall 10: 10000297 auipc x5 0x10000 14: ff028293 addi x5 x5 -16 18: 0002c403 lbu x8 0 x5 1c: 10000297 auipc x5 0x10000 20: fe828293 addi x5 x5 -24 24: 0002a483 lw x9 0 x5 28: 0e0000ef jal x1 224 <reduce2one> 2c: 0d549263 bne x9 x21 196 <wrong> 30: 10000517 auipc x10 0x10000 34: 00850513 addi x10 x10 8 38: 00000073 ecall 0000003c <test2>: 3c: 10000517 auipc x10 0x10000 40: fe150513 addi x10 x10 -31 44: 00000073 ecall 48: 10000297 auipc x5 0x10000 4c: fb928293 addi x5 x5 -71 50: 0002c403 lbu x8 0 x5 54: 10000297 auipc x5 0x10000 58: fb428293 addi x5 x5 -76 5c: 0002a483 lw x9 0 x5 60: 0a8000ef jal x1 168 <reduce2one> 64: 09549663 bne x9 x21 140 <wrong> 68: 10000517 auipc x10 0x10000 6c: fd050513 addi x10 x10 -48 70: 00000073 ecall 00000074 <test3>: 74: 10000517 auipc x10 0x10000 78: fb250513 addi x10 x10 -78 7c: 00000073 ecall 80: 10000297 auipc x5 0x10000 84: f8228293 addi x5 x5 -126 88: 0002c403 lbu x8 0 x5 8c: 10000297 auipc x5 0x10000 90: f8028293 addi x5 x5 -128 94: 0002a483 lw x9 0 x5 98: 070000ef jal x1 112 <reduce2one> 9c: 05549a63 bne x9 x21 84 <wrong> a0: 10000517 auipc x10 0x10000 a4: f9850513 addi x10 x10 -104 a8: 00000073 ecall 000000ac <test4>: ac: 10000517 auipc x10 0x10000 b0: f8350513 addi x10 x10 -125 b4: 00000073 ecall b8: 10000297 auipc x5 0x10000 bc: f4b28293 addi x5 x5 -181 c0: 0002c403 lbu x8 0 x5 c4: 10000297 auipc x5 0x10000 c8: f4c28293 addi x5 x5 -180 cc: 0002a483 lw x9 0 x5 d0: 038000ef jal x1 56 <reduce2one> d4: 01549e63 bne x9 x21 28 <wrong> d8: 10000517 auipc x10 0x10000 dc: f6050513 addi x10 x10 -160 e0: 00000073 ecall e4: 00a00893 addi x17 x0 10 e8: 00000073 ecall ec: 00008067 jalr x0 x1 0 000000f0 <wrong>: f0: 10000517 auipc x10 0x10000 f4: f5950513 addi x10 x10 -167 f8: 00000073 ecall fc: 00a00893 addi x17 x0 10 100: 00000073 ecall 104: 00008067 jalr x0 x1 0 00000108 <reduce2one>: 108: 00000a93 addi x21 x0 0 10c: 00000b13 addi x22 x0 0 110: 00000b93 addi x23 x0 0 114: 02000313 addi x6 x0 32 118: 01f00393 addi x7 x0 31 11c: 00100e13 addi x28 x0 1 120: 00040c13 addi x24 x8 0 124: ffc10113 addi x2 x2 -4 128: 00112023 sw x1 0 x2 12c: 084000ef jal x1 132 <my_clz> 130: 066a0263 beq x20 x6 100 <reduce2one_0> 134: 067a0463 beq x20 x7 104 <reduce2one_1> 138: fffa4a13 xori x20 x20 -1 13c: 001a0a13 addi x20 x20 1 140: 020a0b13 addi x22 x20 32 00000144 <reduce2one_Loop>: 144: 076bd063 bge x23 x22 96 <reduce2one_end> 148: 058e0e63 beq x28 x24 92 <reduce2one_end> 14c: 001c7e93 andi x29 x24 1 150: 000e8863 beq x29 x0 16 <reduce2one_bit0> 00000154 <reduce2one_bit1>: 154: 002a8a93 addi x21 x21 2 158: 001c0c13 addi x24 x24 1 15c: 0080006f jal x0 8 <reduce2one_branch1> 00000160 <reduce2one_bit0>: 160: 001a8a93 addi x21 x21 1 00000164 <reduce2one_branch1>: 164: 001c5c13 srli x24 x24 1 168: 000b8f13 addi x30 x23 0 16c: 001f0f13 addi x30 x30 1 170: 016f1e63 bne x30 x22 28 <reduce2one_branch2> 174: 01cc0c63 beq x24 x28 24 <reduce2one_branch2> 178: 00000b93 addi x23 x0 0 17c: 034000ef jal x1 52 <my_clz> 180: fffa4a13 xori x20 x20 -1 184: 001a0a13 addi x20 x20 1 188: 020a0b13 addi x22 x20 32 0000018c <reduce2one_branch2>: 18c: 001b8b93 addi x23 x23 1 190: fb5ff06f jal x0 -76 <reduce2one_Loop> 00000194 <reduce2one_0>: 194: fff00a93 addi x21 x0 -1 198: 00c0006f jal x0 12 <reduce2one_end> 0000019c <reduce2one_1>: 19c: 00000a93 addi x21 x0 0 1a0: 0040006f jal x0 4 <reduce2one_end> 000001a4 <reduce2one_end>: 1a4: 00012083 lw x1 0 x2 1a8: 00410113 addi x2 x2 4 1ac: 00008067 jalr x0 x1 0 000001b0 <my_clz>: 1b0: 01f00913 addi x18 x0 31 1b4: 00100993 addi x19 x0 1 1b8: 00000a13 addi x20 x0 0 000001bc <my_clz_loop>: 1bc: 02094063 blt x18 x0 32 <my_clz_end> 1c0: 012992b3 sll x5 x19 x18 1c4: 005472b3 and x5 x8 x5 1c8: 00028463 beq x5 x0 8 <result_0> 1cc: 0100006f jal x0 16 <my_clz_end> 000001d0 <result_0>: 1d0: 001a0a13 addi x20 x20 1 1d4: fff90913 addi x18 x18 -1 1d8: fe5ff06f jal x0 -28 <my_clz_loop> 000001dc <my_clz_end>: 1dc: 00008067 jalr x0 x1 0 ``` <br><hr> ### Test Data <table> <tr> <td></td> <td colspan="2" style="text-align:center">test data</td> <td>expected result</td> </tr> <tr> <td>1</td> <td>"10111001"<sub>[c]</sub></td> <td>0b10111001<sub>[ass]</sub></td> <td style="text-align:center">12</td> </tr> <tr> <td>2</td> <td>"1101"<sub>[c]</sub></td> <td>0b1101<sub>[ass]</sub></td> <td style="text-align:center">6</td> </tr> <tr> <td>3</td> <td>"10"<sub>[c]</sub></td> <td>0b10<sub>[ass]</sub></td> <td style="text-align:center">1</td> </tr> <tr> <td>4</td> <td>"1"<sub>[c]</sub></td> <td>0b1<sub>[ass]</sub></td> <td style="text-align:center">0</td> </tr> </table> <br><hr> ### Output Result ![1](https://hackmd.io/_uploads/BydoX5cJkg.png) <br><hr> ### Performance <table> <tr> <td>Cycls</td> <td>1720</td> </tr> <tr> <td>Instrs. retired</td> <td>1118</td> </tr> <tr> <td>CPI</td> <td>1.54</td> </tr> <tr> <td>IPC</td> <td>0.65</td> </tr> <tr> <td>Clock rate</td> <td>369.18 Hz</td> </tr> </table> <br><hr> ### Analysis <p>Test the code with Ripes simulator.</p> #### 5-stage pipelined processor <p>There are 5 stages in this processor. Using five-stage pipeline to parallelize instructions.</p> ![image](https://hackmd.io/_uploads/Sk6QU5cJyl.png) <p>The 5 stages are:</p> <ol> <li><code>IF</code> ( Instruction fetch )</li> <ul> <li> Read the next expected instruction into the buffer </li> </ul> <li><code>ID</code> ( Instruction decode )</li> <ul> <li> Determine opcode and operand specifiers </li> <li> Calculate effective address of each source operand </li> <li> Fetch each operand from memory </li> </ul> <li><code>EXE</code> ( Execute )</li> <ul> <li> Perform indicated operation </li> </ul> <li><code>MEM</code> ( Memory access ) </li> <ul> <li> Access memory </li> </ul> <li><code>WB</code> ( write back )</li> <ul> <li> Store result </li> </ul> </ol> <p>Each stage is separated by pipeline registers in the block diagram. <br><br> #### I-type format <p>Detailed Analysis of <code>srli x24 x24 1</code> in RISC-V Pipeline:</p> <blockquote> Instruction Format: srli rd, rs1, imm12 </blockquote> <ol> <li><code>IF</code> ( Instruction fetch )</li> <img src="https://hackmd.io/_uploads/ryGHM2cy1e.png"> <li><code>ID</code> ( Instruction decode )</li> <img src="https://hackmd.io/_uploads/HylFGh51kx.png"> <li><code>EXE</code> ( Execute )</li> <img src="https://hackmd.io/_uploads/HJgnMh9Jyl.png"> <li><code>MEM</code> ( Memory access )</li> <img src="https://hackmd.io/_uploads/SJixXnc11e.png"> <li><code>WB</code> ( write back )</li> <img src="https://hackmd.io/_uploads/Hy-mm25kyx.png"> </ol> <br><hr><br> ## Reference [Quiz1 of Computer Architecture (2024 Fall)](https://hackmd.io/@sysprog/arch2024-quiz1-sol#Problem-C) [Lab1: RV32I Simulator](https://hackmd.io/@sysprog/H1TpVYMdB) [LeetCode 1404 - Number of Steps to Reduce a Number in Binary Representation to One ](https://leetcode.com/problems/number-of-steps-to-reduce-a-number-in-binary-representation-to-one/)