Assignment1: RISC-V Assembly and Instruction Pipeline

# Explore and analysis different approaches of CLZ method contributed by ??? ## Environment :::info Iscpu ``` Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 4 ``` ::: ## Implement by iteration 1. This code efficiently counts the number of leading zeros in a 64-bit unsigned integer by iteratively shifting the input and adjusting the count until no leading zeros remain. #### C Code: ```javascript= #include <stdint.h> unsigned clz(uint64_t x) { int n = 64, c = 32; do { uint32_t y = (uint32_t)(x >> c); if (y) { n -= c; x = (uint64_t)y; } c >>= 1; } while (c); return (n - (int)x); } ``` #### Assembly Language: ```javascript= count_leading_zero: li a0, 0 # Clear the result register (a0) beqz a1, zero_result # Check if x (a1) is zero, branch to zero_result li t0, 32 # Set up a loop counter (t0) to 32 count_loop: srl a1, a1, 1 # Shift the input right by 1 bit (a1) bnez a1, not_carry # Branch to not_carry if the input is not zero addi a0, a0, 1 # Increment the result if input is zero not_carry: subi t0, t0, 1 # Decrement the loop counter (t0) bnez t0, count_loop # Loop until t0 becomes zero zero_result: li a0, 32 # Set the result to 32 if the input was zero ret ``` #### Description In the **do/while** section, 1. **uint32_t y = x >> c;** : Inside the loop, the 64-bit input value x is right-shifted by c positions, and the result is stored in a 32-bit unsigned integer y. This effectively extracts the upper 32 bits of x. 2. **if (y) { ... }:** This condition checks whether the value of y is non-zero. If y is non-zero, it means that there are non-zero bits in the upper 32 bits of x, and we need to adjust our counting. 3. **n -= c;** : If y is non-zero, we subtract c from the n variable. This step updates our count of leading zeros. 4. **x = y;** : We then update the value of x to be y, effectively discarding the leading zeros that we've already counted. 5. **c >>= 1;** : Finally, we right-shift c by 1, effectively halving its value. This prepares it for the next iteration of the loop, where we'll shift x by half as many bits. #### Result ## Implement by hash table 1. Filling all the bits after the highest non-zero bit with 1 can be regarded as a first classification, classifying numbers with the same clz value. 2. Then use some operations to create a one-to-one correspondence between each number, and then make a table -> apply the concept of hash function. #### Assembly Language: ```javascript= count_leading_zeros: # Save temporary register s1 push s1 # Initialize s0 with t0 mv s0, t0 sw x1, -4(s0) lw s1, -4(s0) # Perform right shift (shr) by 1 srli s1, s1, 1 or s1, s1, s1 sw s1, -4(s0) srli s1, s1, 2 or s1, s1, s1 sw s1, -4(s0) srli s1, s1, 4 or s1, s1, s1 sw s1, -4(s0) srli s1, s1, 8 or s1, s1, s1 sw s1, -4(s0) srli s1, s1, 16 or s1, s1, s1 sw s1, -4(s0) # Perform left shift (sal) by 3 slli s1, s1, 3 sub s1, s1, s1 sw s1, -4(s0) # Repeat the above left shift and subtraction operations with different shift amounts (8) slli s1, s1, 8 sub s1, s1, s1 sw s1, -4(s0) slli s1, s1, 8 sub s1, s1, s1 sw s1, -4(s0) slli s1, s1, 8 sub s1, s1, s1 sw s1, -4(s0) # Perform right shift (shr) by 26 srli s1, s1, 26 # Move the result to s1 mv s1, s1 # Load the table address into s2 la s2, table lbu s1, 0(s2) # Move the result to s1 mv s1, s1 # Restore temporary register s1 pop s1 ret # Define the table .section .rodata table: .byte 0b00111111 .byte 0b00010000 .byte 0b00110011 .byte 0b00010001 .byte 0x00 .byte 0x00 .byte 0b00110101 .byte 0x00 .byte 0x00 .byte 0b00001100 .byte 0b00010111 .byte 0x00 .byte 0b00110100 .byte 0b00110001 .byte 0b00000010 .byte 0b00100001 .byte 0b00000100 .byte 0x00 .byte 0x00 .byte 0b00001101 .byte 0b00001100 .byte 0b00110010 .byte 0x00 .byte 0b00010110 .byte 0x00 .byte 0b00100111 .byte 0b00110011 .byte 0b00000110 .byte 0b00110000 .byte 0b00001000 .byte 0x00 .byte 0x00 ``` #### Description ## Implement by binary search #### Assembly Language: ```javascript= count_leading_zeros: li a0, 0 # Clear the result register (a0) beqz a1, zero_result # Check if x (a1) is zero, branch to zero_result li t0, 32 # Set up a loop counter (t0) to 32 count_loop: srl a1, a1, 1 # Shift the input right by 1 bit (a1) bnc not_carry # Branch if the carry flag (CF) is not set (there's no direct equivalent in RISC-V) addi a0, a0, 1 # Increment the result if carry is set not_carry: subi t0, t0, 1 # Decrement the loop counter (t0) bnez t0, count_loop # Loop until t0 becomes zero zero_result: li a0, 32 # Set the result to 32 if the input was zero ret ``` #### Descpition ## Analysis & Comparison: 1. Algorithm and approach * Case **iteration** uses a loop and bitwise operations to count leading zeros by shifting and checking bits iteratively. * Case **hash table** utilizes a lookup table and bitwise operations to calculate the count of leading zeros. * Case **binary search** uses a series of conditional checks and bit shifting to find the count of leading zeros. 2. Complexity * Case **iteration** uses a loop, which results in multiple iterations based on the number of leading zeros. The loop complexity depends on the input value. * Case **hash table** uses a lookup table, which provides a constant-time lookup for leading zero counts. The complexity is independent of the input value. * Case **binary search** uses a series of conditional checks and bit shifts. The number of checks and shifts is fixed and does not depend on the input value. 3. Performance and efficiency * Case **iteration** may have variable performance based on the input value because it uses a loop. It could be less efficient for large input values with many leading zeros. * Case **hash table** has good performance characteristics due to the use of a lookup table. It provides a consistent and efficient way to count leading zeros. * Case **binary search** also has good performance characteristics, as it uses a fixed number of operations regardless of the input value. However, it may not be as efficient as the second code snippet in some cases. * ### Summarize **Hash table** generally offers the best performance and is the ==most efficient for counting leading zeros==, especially for larger input values. **binary search** is also efficient but may have slightly worse performance compared to the second snippet. **iteration** is ==less efficient== for large inputs but is still a valid method for counting leading zeros.

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.