# Computer Architecture Homework2
contributed by < [OllieNi](https://github.com/ollieni) >
## Program selection
### Program
I chose the program by [洪佑杭](https://github.com/hungyuhang), titled [Multiplication Overflow Prediction](https://hackmd.io/@hungyuhang/risc-v-hw1). It applies the CLZ (count leading zeros) routine from [quiz 1](https://hackmd.io/@sysprog/arch2023-quiz1) to predict whether a multiplication will overflow.
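
For orientation, here is a minimal C sketch of the two routines. It is reconstructed from the compiler-generated assembly shown later in this note, not copied from the original source, so the parameter names and exact types are assumptions; only the function names come from the listings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count leading zeros of a 64-bit value: smear the highest set bit into
 * every lower position, count the set bits, and subtract from 64. */
uint16_t count_leading_zeros(uint64_t x)
{
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x |= x >> 32;

    /* Population count of the smeared value. */
    x -= (x >> 1) & 0x5555555555555555ULL;
    x = ((x >> 2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0fULL;
    x += x >> 8;
    x += x >> 16;
    x += x >> 32;
    return (uint16_t)(64 - (x & 0x7f));
}

/* Predict overflow when the bit lengths of the two operands add up to 64
 * or more. The bit length is (63 - clz) + 1. */
bool predict_if_mul_overflow(const uint64_t *x0, const uint64_t *x1)
{
    int32_t exp_x0 = 63 - (int32_t)count_leading_zeros(*x0);
    int32_t exp_x1 = 63 - (int32_t)count_leading_zeros(*x1);
    return (exp_x0 + 1) + (exp_x1 + 1) >= 64;
}
```

With the four operand pairs stored in the data section of the listings, (0, 0), (1, 16), (2, 0x4000000000000000) and (3, 0x7FFFFFFFFFFFFFFF), this sketch yields 0, 0, 1, 1, which matches the program output recorded in this note.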
### Motivation
I picked his program because I also chose CLZ for assignment 1, and I think his code has many aspects worth learning from.
First, I did not handle 64-bit values in my program, but he did.
Second, his code is well structured and highly readable.
I would like to **learn from the strengths of other people's code**.
:::warning
Improve your English writing!
:notes: jserv
:::
## Rewrite code
The original implementation is available at this [link](https://hackmd.io/@hungyuhang/risc-v-hw1).
The C code can be compiled with the GCC cross-compiler and run directly in rv32emu.
The RISC-V assembly needs to be modified because rv32emu defines the system call numbers differently from the environment the original code targeted.
The **newlib system call** numbers are as follows.
|# | System call | Current support |
|------|-----------------|-----------------|
| 57 | `close` | Deletes a descriptor from the per-process object reference table. |
| 62 | `lseek` | Repositions the file offset of the open file description to the argument offset according to the directive whence. |
| 63 | `read` | Reads specific bytes of data from the object referenced by the descriptor fildes into the buffer. |
| 64 | `write` | Prints the buffer as a string to the specified file descriptor. |
| 80 | `fstat` | No effect. |
| 93 | `exit` | Terminates with a status code. |
| 169 | `gettimeofday` | Gets date and time. Current time zone is NOT obtained. |
| 214 | `brk` | Supports updating the program break and returning the current program break. |
| 403 | `clock_gettime` | Retrieves the value used by a clock which is specified by clock-id. |
| 1024 | `open` | Opens or creates a file for reading or writing. |
We only need **64 (`write`, to print strings) and 93 (`exit`)** in this program.
I ran into a problem at first.
:::warning
Don't put the screenshots which contain plain text only.
:notes: jserv
:::
After I replaced the original `exit` routine (a single `nop`, shown below) with the newlib `exit` system call, the problem was solved.
```c
exit:
nop
```
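
To make the calling convention behind system calls 64 and 93 concrete, here is a hedged C sketch. It assumes the usual RISC-V convention (system call number in `a7`, arguments in `a0` to `a2`, result in `a0`, triggered by `ecall`). The names `emu_write`, `emu_exit`, and `emu_syscall3` are hypothetical helpers, not part of rv32emu or newlib, and the non-RISC-V branch exists only so the sketch can be compiled on a host machine.

```c
#include <stddef.h>

/* System call numbers taken from the newlib table above. */
enum { NEWLIB_SYS_WRITE = 64, NEWLIB_SYS_EXIT = 93 };

#if defined(__riscv)
/* Hypothetical helper: issue a system call with up to three arguments.
 * Assumes the number goes in a7 and the result comes back in a0. */
static long emu_syscall3(long number, long arg0, long arg1, long arg2)
{
    register long a0 __asm__("a0") = arg0;
    register long a1 __asm__("a1") = arg1;
    register long a2 __asm__("a2") = arg2;
    register long a7 __asm__("a7") = number;
    __asm__ volatile("ecall"
                     : "+r"(a0)
                     : "r"(a1), "r"(a2), "r"(a7)
                     : "memory");
    return a0;
}

long emu_write(int fd, const void *buf, size_t len)
{
    return emu_syscall3(NEWLIB_SYS_WRITE, fd, (long)buf, (long)len);
}

void emu_exit(int status)
{
    emu_syscall3(NEWLIB_SYS_EXIT, status, 0, 0);
    for (;;) {
        /* not reached if the emulator terminates the program */
    }
}
#else
/* Host fallback so the sketch also builds outside the emulator. */
#include <stdlib.h>
#include <unistd.h>

long emu_write(int fd, const void *buf, size_t len)
{
    return (long)write(fd, buf, len);
}

void emu_exit(int status)
{
    exit(status);
}
#endif
```

In handwritten assembly, the same idea amounts to loading 93 into `a7` and the status code into `a0` before `ecall`, instead of ending the program with a bare `nop`.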
I use the following Makefile to compile the C code.
It produces the corresponding `.S`, `.o`, and `.elf` files.
```makefile
.PHONY: clean

include ../../../mk/toolchain.mk

ASFLAGS = -march=rv32i_zicsr_zifencei -mabi=ilp32
LDFLAGS = --oformat=elf32-littleriscv

# Generate assembly from C source
%.S: %.c
	$(CROSS_COMPILE)gcc $(ASFLAGS) -S -o $@ $<

# Assemble into an object file
%.o: %.S
	$(CROSS_COMPILE)as $(ASFLAGS) -o $@ $<

all: hw2yh.elf

hw2yh.S: hw2yh.c
	$(CROSS_COMPILE)gcc $(ASFLAGS) -S -o $@ $<

# Link the program with the two performance counter helpers
hw2yh.elf: hw2yh.o getcycle.o getinstret.o
	$(CROSS_COMPILE)gcc -o $@ $(ASFLAGS) $^

# Remove every generated file, including both helper objects
clean:
	$(RM) hw2yh.elf hw2yh.o hw2yh.S getcycle.o getinstret.o
```
## Contrast the handwritten and compiler-optimized assembly
I use `getcycles.S` and `getinstret.S` in perfcounter to retrieve the cycle count and the number of retired instructions of the program.
By inspecting the assembly code in both files, I learned that RISC-V provides the `cycle` and `instret` counters, which are read as CSRs, for measuring code.
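
On RV32 these counters are 64 bits wide but can only be read 32 bits at a time, so the RISC-V specification describes reading the high half, then the low half, then the high half again, and retrying if the high half changed. The sketch below illustrates that pattern in C, assuming `getcycles.S` follows it. The `sim_*` names are hypothetical stand-ins for the CSR reads so that the logic can be exercised on any machine.

```c
#include <stdint.h>

/* Hypothetical reader type: returns one 32-bit half of a 64-bit counter. */
typedef uint32_t (*read_half_fn)(void);

/* Read a 64-bit counter exposed as two 32-bit halves. If the high half
 * changes between the two high reads, the low half wrapped in between,
 * so the whole read is retried. */
uint64_t read_counter64(read_half_fn read_hi, read_half_fn read_lo)
{
    uint32_t hi, lo, hi_again;
    do {
        hi = read_hi();
        lo = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter (illustration only): it advances by sim_step after
 * every half read, the way a real counter keeps ticking between reads. */
uint64_t sim_counter;
uint64_t sim_step;

uint32_t sim_read_hi(void)
{
    uint32_t value = (uint32_t)(sim_counter >> 32);
    sim_counter += sim_step;
    return value;
}

uint32_t sim_read_lo(void)
{
    uint32_t value = (uint32_t)sim_counter;
    sim_counter += sim_step;
    return value;
}
```

Starting the simulated counter at `0xFFFFFFFF` with a step of 1 shows why the retry matters: a single high-then-low read would pair the old high half with the wrapped low half, while `read_counter64` retries and returns a consistent value.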
**Original**
RISC-V assembly:
* cycle count
```
0 0 1 1 cycle count :0858inferior exit code 0
```
* size
```
text data bss dec hex filename
1020 0 0 1020 3fc hw2yh.elf
```
C code:
* cycle count
```
0
0
1
1
cycle: 6085
instret: 51539607564
inferior exit code 21
```
* size
```c
text data bss dec hex filename
76876 2372 1548 80796 13b9c hw2yh.elf
```
The C build takes significantly more cycles than the handwritten assembly (6085 versus what appears to be 858).
My hypothesis is that most of the C build's cycles are spent in `printf`, which would also explain its much larger `text` size (76876 bytes versus 1020 bytes).
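
The `instret` value printed by the C build deserves a sanity check before it is used for any comparison. The sketch below splits a 64-bit value into its 32-bit halves; the helper names are hypothetical. Applied to 51539607564, both halves come out as 12, which suggests that the number is not a real instruction count. A likely cause, judging from the unoptimized `main` listing later in this note, is that only a 32-bit value in `a1` is passed to a `printf` call whose format string uses `%llu`, but that interpretation is an assumption.

```c
#include <stdint.h>

/* Upper 32 bits of a 64-bit value. */
uint32_t high_half(uint64_t value)
{
    return (uint32_t)(value >> 32);
}

/* Lower 32 bits of a 64-bit value. */
uint32_t low_half(uint64_t value)
{
    return (uint32_t)(value & 0xFFFFFFFFu);
}
```

If that reading is right, casting the counter difference to `unsigned long long` before printing it with `%llu` should give a meaningful number, and the difference between two `get_instret` calls (rather than a single reading) is what corresponds to the measured region.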
#### **Ofast**
* cycle count and instruction number
```
0
0
1
1
cycle: 4336
instret: 51539607564
inferior exit code 21
```
* size
```c
text data bss dec hex filename
76592 2372 1548 80512 13a80 hw2yh.elf
```
* elf header
```c
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x101b0
Start of program headers: 52 (bytes into file)
Start of section headers: 94984 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### **O3**
* cycle count and instruction number
```
0
0
1
1
cycle: 4336
instret: 51539607564
inferior exit code 21
```
* size
```c
text data bss dec hex filename
76592 2372 1548 80512 13a80 hw2yh.elf
```
* elf header
```
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x101b0
Start of program headers: 52 (bytes into file)
Start of section headers: 94984 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### **O2**
* cycle count and instruction number
```
0
0
1
1
cycle: 4420
instret: 51539607564
inferior exit code 21
```
* size
```c
text data bss dec hex filename
75940 2372 1548 79860 137f4 hw2yh.elf
```
* elf header
```c
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x101b0
Start of program headers: 52 (bytes into file)
Start of section headers: 94984 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### **O1**
* cycle count and instruction number
```
0
0
1
1
cycle: 4420
instret: 51539607564
inferior exit code 21
```
* size
```c
text data bss dec hex filename
75944 2372 1548 79864 137f8 hw2yh.elf
```
* elf header
```c
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x100d8
Start of program headers: 52 (bytes into file)
Start of section headers: 94968 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
#### **Os**
* cycle count and instruction number
```
0
0
1
1
cycle: 4420
instret: 51539607564
inferior exit code 21
```
* size
```c
text data bss dec hex filename
75940 2372 1548 79860 137f4 hw2yh.elf
```
* elf header
```c
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: RISC-V
Version: 0x1
Entry point address: 0x101b0
Start of program headers: 52 (bytes into file)
Start of section headers: 94984 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 3
Size of section headers: 40 (bytes)
Number of section headers: 15
Section header string table index: 14
```
## **Disassemble the ELF files produced by the C compiler**
**Original C**
:::spoiler click to reveal the code
```c
.file "hw2yh.c"
.option nopic
.attribute arch, "rv32i2p1_zicsr2p0_zifencei2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.globl a_x0
.section .sbss,"aw",@nobits
.align 3
.type a_x0, @object
.size a_x0, 8
a_x0:
.zero 8
.globl a_x1
.align 3
.type a_x1, @object
.size a_x1, 8
a_x1:
.zero 8
.globl b_x0
.section .sdata,"aw"
.align 3
.type b_x0, @object
.size b_x0, 8
b_x0:
.word 1
.word 0
.globl b_x1
.align 3
.type b_x1, @object
.size b_x1, 8
b_x1:
.word 16
.word 0
.globl c_x0
.align 3
.type c_x0, @object
.size c_x0, 8
c_x0:
.word 2
.word 0
.globl c_x1
.align 3
.type c_x1, @object
.size c_x1, 8
c_x1:
.word 0
.word 1073741824
.globl d_x0
.align 3
.type d_x0, @object
.size d_x0, 8
d_x0:
.word 3
.word 0
.globl d_x1
.align 3
.type d_x1, @object
.size d_x1, 8
d_x1:
.word -1
.word 2147483647
.text
.align 2
.globl count_leading_zeros
.type count_leading_zeros, @function
count_leading_zeros:
addi sp,sp,-144
sw s0,140(sp)
sw s2,136(sp)
sw s3,132(sp)
sw s4,128(sp)
sw s5,124(sp)
sw s6,120(sp)
sw s7,116(sp)
sw s8,112(sp)
sw s9,108(sp)
sw s10,104(sp)
sw s11,100(sp)
addi s0,sp,144
sw a0,-56(s0)
sw a1,-52(s0)
lw a5,-52(s0)
slli a4,a5,31
lw a5,-56(s0)
srli a2,a5,1
or a2,a4,a2
lw a5,-52(s0)
srli a3,a5,1
lw a5,-56(s0)
or t3,a5,a2
lw a5,-52(s0)
or t4,a5,a3
sw t3,-56(s0)
sw t4,-52(s0)
lw a5,-52(s0)
slli a4,a5,30
lw a5,-56(s0)
srli a6,a5,2
or a6,a4,a6
lw a5,-52(s0)
srli a7,a5,2
lw a5,-56(s0)
or a5,a5,a6
sw a5,-112(s0)
lw a5,-52(s0)
or a5,a5,a7
sw a5,-108(s0)
lw a5,-112(s0)
lw a6,-108(s0)
sw a5,-56(s0)
sw a6,-52(s0)
lw a5,-52(s0)
slli a4,a5,28
lw a5,-56(s0)
srli t1,a5,4
or t1,a4,t1
lw a5,-52(s0)
srli t2,a5,4
lw a5,-56(s0)
or a5,a5,t1
sw a5,-120(s0)
lw a5,-52(s0)
or a5,a5,t2
sw a5,-116(s0)
lw a5,-120(s0)
lw a6,-116(s0)
sw a5,-56(s0)
sw a6,-52(s0)
lw a5,-52(s0)
slli a4,a5,24
lw a5,-56(s0)
srli s2,a5,8
or s2,a4,s2
lw a5,-52(s0)
srli s3,a5,8
lw a5,-56(s0)
or a5,a5,s2
sw a5,-128(s0)
lw a5,-52(s0)
or a5,a5,s3
sw a5,-124(s0)
lw a5,-128(s0)
lw a6,-124(s0)
sw a5,-56(s0)
sw a6,-52(s0)
lw a5,-52(s0)
slli a4,a5,16
lw a5,-56(s0)
srli a5,a5,16
sw a5,-64(s0)
lw a5,-64(s0)
or a5,a4,a5
sw a5,-64(s0)
lw a5,-52(s0)
srli a5,a5,16
sw a5,-60(s0)
lw a5,-56(s0)
lw a2,-64(s0)
lw a3,-60(s0)
mv a4,a2
or a5,a5,a4
sw a5,-136(s0)
lw a5,-52(s0)
mv a4,a3
or a5,a5,a4
sw a5,-132(s0)
lw a5,-136(s0)
lw a6,-132(s0)
sw a5,-56(s0)
sw a6,-52(s0)
lw a5,-52(s0)
srli a5,a5,0
sw a5,-72(s0)
sw zero,-68(s0)
lw a5,-56(s0)
lw a2,-72(s0)
lw a3,-68(s0)
mv a4,a2
or a5,a5,a4
sw a5,-144(s0)
lw a5,-52(s0)
mv a4,a3
or a5,a5,a4
sw a5,-140(s0)
lw a5,-144(s0)
lw a6,-140(s0)
sw a5,-56(s0)
sw a6,-52(s0)
lw a5,-52(s0)
slli a5,a5,31
lw a4,-56(s0)
srli s10,a4,1
or s10,a5,s10
lw a5,-52(s0)
srli s11,a5,1
li a5,1431654400
addi a5,a5,1365
and a5,s10,a5
sw a5,-80(s0)
li a5,1431654400
addi a5,a5,1365
and a5,s11,a5
sw a5,-76(s0)
lw a2,-56(s0)
lw a3,-52(s0)
lw a6,-80(s0)
lw a7,-76(s0)
mv a1,a6
sub a4,a2,a1
mv a1,a4
sgtu a1,a1,a2
mv a0,a7
sub a5,a3,a0
sub a3,a5,a1
mv a5,a3
sw a4,-56(s0)
sw a5,-52(s0)
lw a5,-52(s0)
slli a5,a5,30
lw a4,-56(s0)
srli s8,a4,2
or s8,a5,s8
lw a5,-52(s0)
srli s9,a5,2
li a5,858992640
addi a5,a5,819
and a5,s8,a5
sw a5,-88(s0)
li a5,858992640
addi a5,a5,819
and a5,s9,a5
sw a5,-84(s0)
lw a4,-56(s0)
li a5,858992640
addi a5,a5,819
and a5,a4,a5
sw a5,-96(s0)
lw a4,-52(s0)
li a5,858992640
addi a5,a5,819
and a5,a4,a5
sw a5,-92(s0)
lw a0,-88(s0)
lw a1,-84(s0)
mv a3,a0
lw a6,-96(s0)
lw a7,-92(s0)
mv a2,a6
add a4,a3,a2
mv a3,a4
mv a2,a0
sltu a3,a3,a2
mv a2,a1
mv a1,a7
add a5,a2,a1
add a3,a3,a5
mv a5,a3
sw a4,-56(s0)
sw a5,-52(s0)
lw a5,-52(s0)
slli a5,a5,28
lw a4,-56(s0)
srli t5,a4,4
or t5,a5,t5
lw a5,-52(s0)
srli t6,a5,4
lw a2,-56(s0)
lw a3,-52(s0)
add a4,t5,a2
mv a1,a4
sltu a1,a1,t5
add a5,t6,a3
add a3,a1,a5
mv a5,a3
li a3,252645376
addi a3,a3,-241
and a3,a4,a3
sw a3,-56(s0)
li a3,252645376
addi a3,a3,-241
and a5,a5,a3
sw a5,-52(s0)
lw a5,-52(s0)
slli a5,a5,24
lw a4,-56(s0)
srli s6,a4,8
or s6,a5,s6
lw a5,-52(s0)
srli s7,a5,8
lw a2,-56(s0)
lw a3,-52(s0)
add a4,a2,s6
mv a1,a4
sltu a1,a1,a2
add a5,a3,s7
add a3,a1,a5
mv a5,a3
sw a4,-56(s0)
sw a5,-52(s0)
lw a5,-52(s0)
slli a5,a5,16
lw a4,-56(s0)
srli s4,a4,16
or s4,a5,s4
lw a5,-52(s0)
srli s5,a5,16
lw a2,-56(s0)
lw a3,-52(s0)
add a4,a2,s4
mv a1,a4
sltu a1,a1,a2
add a5,a3,s5
add a3,a1,a5
mv a5,a3
sw a4,-56(s0)
sw a5,-52(s0)
lw a5,-52(s0)
srli a5,a5,0
sw a5,-104(s0)
sw zero,-100(s0)
lw a2,-56(s0)
lw a3,-52(s0)
lw a6,-104(s0)
lw a7,-100(s0)
mv a1,a6
add a4,a2,a1
mv a1,a4
sltu a1,a1,a2
mv a0,a7
add a5,a3,a0
add a3,a1,a5
mv a5,a3
sw a4,-56(s0)
sw a5,-52(s0)
lhu a5,-56(s0)
andi a5,a5,127
slli a5,a5,16
srli a5,a5,16
li a4,64
sub a5,a4,a5
slli a5,a5,16
srli a5,a5,16
mv a0,a5
lw s0,140(sp)
lw s2,136(sp)
lw s3,132(sp)
lw s4,128(sp)
lw s5,124(sp)
lw s6,120(sp)
lw s7,116(sp)
lw s8,112(sp)
lw s9,108(sp)
lw s10,104(sp)
lw s11,100(sp)
addi sp,sp,144
jr ra
.size count_leading_zeros, .-count_leading_zeros
.align 2
.globl predict_if_mul_overflow
.type predict_if_mul_overflow, @function
predict_if_mul_overflow:
addi sp,sp,-48
sw ra,44(sp)
sw s0,40(sp)
addi s0,sp,48
sw a0,-36(s0)
sw a1,-40(s0)
lw a5,-36(s0)
lw a4,0(a5)
lw a5,4(a5)
mv a0,a4
mv a1,a5
call count_leading_zeros
mv a5,a0
mv a4,a5
li a5,63
sub a5,a5,a4
sw a5,-20(s0)
lw a5,-40(s0)
lw a4,0(a5)
lw a5,4(a5)
mv a0,a4
mv a1,a5
call count_leading_zeros
mv a5,a0
mv a4,a5
li a5,63
sub a5,a5,a4
sw a5,-24(s0)
lw a5,-20(s0)
addi a4,a5,1
lw a5,-24(s0)
addi a5,a5,1
add a4,a4,a5
li a5,63
ble a4,a5,.L4
li a5,1
j .L5
.L4:
li a5,0
.L5:
mv a0,a5
lw ra,44(sp)
lw s0,40(sp)
addi sp,sp,48
jr ra
.size predict_if_mul_overflow, .-predict_if_mul_overflow
.section .rodata
.align 2
.LC0:
.string "%d\n"
.align 2
.LC1:
.string "cycle: %llu\n"
.align 2
.LC2:
.string "instret: %llu\n"
.text
.align 2
.globl main
.type main, @function
main:
addi sp,sp,-48
sw ra,44(sp)
sw s0,40(sp)
addi s0,sp,48
call get_instret
sw a0,-24(s0)
sw a1,-20(s0)
call get_cycles
sw a0,-32(s0)
sw a1,-28(s0)
lui a5,%hi(a_x1)
addi a1,a5,%lo(a_x1)
lui a5,%hi(a_x0)
addi a0,a5,%lo(a_x0)
call predict_if_mul_overflow
mv a5,a0
mv a1,a5
lui a5,%hi(.LC0)
addi a0,a5,%lo(.LC0)
call printf
lui a5,%hi(b_x1)
addi a1,a5,%lo(b_x1)
lui a5,%hi(b_x0)
addi a0,a5,%lo(b_x0)
call predict_if_mul_overflow
mv a5,a0
mv a1,a5
lui a5,%hi(.LC0)
addi a0,a5,%lo(.LC0)
call printf
lui a5,%hi(c_x1)
addi a1,a5,%lo(c_x1)
lui a5,%hi(c_x0)
addi a0,a5,%lo(c_x0)
call predict_if_mul_overflow
mv a5,a0
mv a1,a5
lui a5,%hi(.LC0)
addi a0,a5,%lo(.LC0)
call printf
lui a5,%hi(d_x1)
addi a1,a5,%lo(d_x1)
lui a5,%hi(d_x0)
addi a0,a5,%lo(d_x0)
call predict_if_mul_overflow
mv a5,a0
mv a1,a5
lui a5,%hi(.LC0)
addi a0,a5,%lo(.LC0)
call printf
call get_cycles
mv a2,a0
mv a3,a1
lw a0,-32(s0)
lw a1,-28(s0)
sub a4,a2,a0
mv a6,a4
sgtu a6,a6,a2
sub a5,a3,a1
sub a3,a5,a6
mv a5,a3
sw a4,-40(s0)
sw a5,-36(s0)
lw a2,-40(s0)
lw a3,-36(s0)
lui a5,%hi(.LC1)
addi a0,a5,%lo(.LC1)
call printf
lw a5,-24(s0)
mv a1,a5
lui a5,%hi(.LC2)
addi a0,a5,%lo(.LC2)
call printf
nop
lw ra,44(sp)
lw s0,40(sp)
addi sp,sp,48
jr ra
.size main, .-main
.ident "GCC: (xPack GNU RISC-V Embedded GCC x86_64) 13.2.0"
```
:::
---
**Ofast**
:::spoiler click to reveal the code
```c
.file "hw2yh.c"
.option nopic
.attribute arch, "rv32i2p1_zicsr2p0_zifencei2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align 2
.globl count_leading_zeros
.type count_leading_zeros, @function
count_leading_zeros:
slli a4,a1,31
srli a5,a0,1
or a5,a4,a5
srli a4,a1,1
or a1,a4,a1
or a0,a5,a0
slli a4,a1,30
srli a5,a0,2
or a5,a4,a5
srli a4,a1,2
or a4,a4,a1
or a0,a5,a0
slli a3,a4,28
srli a5,a0,4
or a5,a3,a5
srli a3,a4,4
or a3,a3,a4
or a4,a5,a0
slli a2,a3,24
srli a5,a4,8
or a5,a2,a5
srli a2,a3,8
or a2,a2,a3
or a5,a5,a4
srli a3,a5,16
slli a4,a2,16
or a3,a4,a3
srli a4,a2,16
or a4,a4,a2
or a3,a3,a5
or a3,a4,a3
slli a1,a4,31
srli a5,a3,1
li a2,1431654400
addi a2,a2,1365
or a5,a1,a5
and a5,a5,a2
srli a1,a4,1
sub a5,a3,a5
and a2,a1,a2
sgtu a3,a5,a3
sub a4,a4,a2
sub a4,a4,a3
slli a1,a4,30
srli a3,a5,2
li a2,858992640
addi a2,a2,819
or a3,a1,a3
and a3,a3,a2
srli a1,a4,2
and a5,a5,a2
add a5,a3,a5
and a1,a1,a2
and a4,a4,a2
add a4,a1,a4
sltu a3,a5,a3
add a3,a3,a4
slli a2,a3,28
srli a4,a5,4
or a4,a2,a4
add a5,a4,a5
srli a2,a3,4
add a2,a2,a3
sltu a4,a5,a4
li a3,252645376
addi a3,a3,-241
add a4,a4,a2
and a4,a4,a3
and a5,a5,a3
slli a2,a4,24
srli a3,a5,8
or a3,a2,a3
add a5,a3,a5
srli a2,a4,8
add a4,a2,a4
sltu a3,a5,a3
add a3,a3,a4
slli a2,a3,16
srli a4,a5,16
or a4,a2,a4
add a5,a4,a5
srli a2,a3,16
sltu a4,a5,a4
add a3,a2,a3
add a4,a4,a3
add a4,a4,a5
andi a4,a4,127
li a0,64
sub a0,a0,a4
slli a0,a0,16
srli a0,a0,16
ret
.size count_leading_zeros, .-count_leading_zeros
.align 2
.globl predict_if_mul_overflow
.type predict_if_mul_overflow, @function
predict_if_mul_overflow:
lw a7,4(a0)
lw a6,4(a1)
lw a0,0(a0)
lw a1,0(a1)
slli a2,a7,31
slli a3,a6,31
srli a4,a0,1
srli a5,a1,1
or a4,a2,a4
or a5,a3,a5
srli a2,a7,1
srli a3,a6,1
or a2,a2,a7
or a3,a3,a6
or a0,a4,a0
or a1,a5,a1
slli a7,a2,30
slli a6,a3,30
srli a4,a0,2
srli a5,a1,2
or a4,a7,a4
or a5,a6,a5
srli a7,a2,2
srli a6,a3,2
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
slli a7,a2,28
slli a6,a3,28
srli a4,a0,4
srli a5,a1,4
or a4,a7,a4
or a5,a6,a5
srli a7,a2,4
srli a6,a3,4
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
slli a7,a2,24
slli a6,a3,24
srli a4,a0,8
srli a5,a1,8
or a4,a7,a4
or a5,a6,a5
srli a7,a2,8
srli a6,a3,8
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
slli a7,a2,16
slli a6,a3,16
srli a4,a0,16
srli a5,a1,16
or a4,a7,a4
or a5,a6,a5
srli a7,a2,16
srli a6,a3,16
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
or a0,a0,a2
or a1,a1,a3
slli t1,a2,31
slli a7,a3,31
srli a4,a0,1
srli a5,a1,1
li a6,1431654400
addi a6,a6,1365
or a4,t1,a4
or a5,a7,a5
srli t1,a2,1
srli a7,a3,1
and a4,a4,a6
and a5,a5,a6
sub a4,a0,a4
sub a5,a1,a5
and t1,t1,a6
and a6,a7,a6
sgtu a0,a4,a0
sgtu a1,a5,a1
sub a2,a2,t1
sub a3,a3,a6
sub a2,a2,a0
sub a3,a3,a1
slli t1,a2,30
slli a7,a3,30
srli a0,a4,2
srli a1,a5,2
li a6,858992640
addi a6,a6,819
or a0,t1,a0
or a1,a7,a1
and a0,a0,a6
and a1,a1,a6
srli t1,a2,2
and a4,a4,a6
srli a7,a3,2
and a5,a5,a6
add a4,a0,a4
add a5,a1,a5
and t1,t1,a6
and a7,a7,a6
and a2,a2,a6
and a3,a3,a6
add a2,t1,a2
add a3,a7,a3
sltu a0,a4,a0
sltu a1,a5,a1
add a0,a0,a2
add a1,a1,a3
slli a7,a0,28
slli a6,a1,28
srli a2,a4,4
srli a3,a5,4
or a2,a7,a2
or a3,a6,a3
add a4,a2,a4
add a5,a3,a5
srli a7,a0,4
srli a6,a1,4
add a7,a7,a0
add a1,a6,a1
sltu a2,a4,a2
sltu a3,a5,a3
li a0,252645376
addi a0,a0,-241
add a2,a2,a7
add a3,a3,a1
and a2,a2,a0
and a5,a5,a0
and a3,a3,a0
and a4,a4,a0
slli a7,a2,24
slli a6,a3,24
srli a0,a4,8
srli a1,a5,8
or a0,a7,a0
or a1,a6,a1
add a0,a4,a0
add a1,a5,a1
srli a7,a2,8
srli a6,a3,8
add a2,a2,a7
add a3,a3,a6
sltu a4,a0,a4
sltu a5,a1,a5
add a4,a4,a2
add a5,a5,a3
slli a7,a4,16
slli a6,a5,16
srli a3,a0,16
srli a2,a1,16
or a3,a7,a3
or a2,a6,a2
add a3,a0,a3
add a2,a1,a2
srli a7,a4,16
srli a6,a5,16
add a4,a4,a7
add a5,a5,a6
sltu a0,a3,a0
sltu a1,a2,a1
add a1,a1,a5
add a0,a0,a4
add a0,a3,a0
add a2,a2,a1
li a5,64
andi a2,a2,127
andi a0,a0,127
sub a0,a5,a0
sub a5,a5,a2
slli a0,a0,16
slli a5,a5,16
li a4,64
srli a5,a5,16
srli a0,a0,16
sub a0,a4,a0
sub a4,a4,a5
add a0,a0,a4
slti a0,a0,64
xori a0,a0,1
ret
.size predict_if_mul_overflow, .-predict_if_mul_overflow
.section .rodata.str1.4,"aMS",@progbits,1
.align 2
.LC0:
.string "%d\n"
.align 2
.LC1:
.string "cycle: %llu\n"
.align 2
.LC2:
.string "instret: %llu\n"
.section .text.startup,"ax",@progbits
.align 2
.globl main
.type main, @function
main:
addi sp,sp,-32
sw ra,28(sp)
sw s0,24(sp)
sw s1,20(sp)
sw s2,16(sp)
sw s3,12(sp)
call get_instret
mv s2,a0
call get_cycles
mv s0,a0
mv s3,a1
lui a0,%hi(a_x0)
lui a1,%hi(a_x1)
addi a1,a1,%lo(a_x1)
addi a0,a0,%lo(a_x0)
call predict_if_mul_overflow
lui s1,%hi(.LC0)
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(b_x1)
lui a0,%hi(b_x0)
addi a1,a1,%lo(b_x1)
addi a0,a0,%lo(b_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(c_x1)
lui a0,%hi(c_x0)
addi a1,a1,%lo(c_x1)
addi a0,a0,%lo(c_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(d_x1)
lui a0,%hi(d_x0)
addi a1,a1,%lo(d_x1)
addi a0,a0,%lo(d_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
call get_cycles
sub a2,a0,s0
sub a1,a1,s3
sgtu a3,a2,a0
lui a0,%hi(.LC1)
sub a3,a1,a3
addi a0,a0,%lo(.LC1)
call printf
lw s0,24(sp)
lw ra,28(sp)
lw s1,20(sp)
lw s3,12(sp)
mv a1,s2
lw s2,16(sp)
lui a0,%hi(.LC2)
addi a0,a0,%lo(.LC2)
addi sp,sp,32
tail printf
.size main, .-main
.globl d_x1
.globl d_x0
.globl c_x1
.globl c_x0
.globl b_x1
.globl b_x0
.globl a_x1
.globl a_x0
.section .sbss,"aw",@nobits
.align 3
.type a_x1, @object
.size a_x1, 8
a_x1:
.zero 8
.type a_x0, @object
.size a_x0, 8
a_x0:
.zero 8
.section .sdata,"aw"
.align 3
.type d_x1, @object
.size d_x1, 8
d_x1:
.word -1
.word 2147483647
.type d_x0, @object
.size d_x0, 8
d_x0:
.word 3
.word 0
.type c_x1, @object
.size c_x1, 8
c_x1:
.word 0
.word 1073741824
.type c_x0, @object
.size c_x0, 8
c_x0:
.word 2
.word 0
.type b_x1, @object
.size b_x1, 8
b_x1:
.word 16
.word 0
.type b_x0, @object
.size b_x0, 8
b_x0:
.word 1
.word 0
.ident "GCC: (xPack GNU RISC-V Embedded GCC x86_64) 13.2.0"
```
:::
---
**O3**
:::spoiler click to reveal the code
```c
.file "hw2yh.c"
.option nopic
.attribute arch, "rv32i2p1_zicsr2p0_zifencei2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align 2
.globl count_leading_zeros
.type count_leading_zeros, @function
count_leading_zeros:
slli a4,a1,31
srli a5,a0,1
or a5,a4,a5
srli a4,a1,1
or a1,a4,a1
or a0,a5,a0
slli a4,a1,30
srli a5,a0,2
or a5,a4,a5
srli a4,a1,2
or a4,a4,a1
or a0,a5,a0
slli a3,a4,28
srli a5,a0,4
or a5,a3,a5
srli a3,a4,4
or a3,a3,a4
or a4,a5,a0
slli a2,a3,24
srli a5,a4,8
or a5,a2,a5
srli a2,a3,8
or a2,a2,a3
or a5,a5,a4
srli a3,a5,16
slli a4,a2,16
or a3,a4,a3
srli a4,a2,16
or a4,a4,a2
or a3,a3,a5
or a3,a4,a3
slli a1,a4,31
srli a5,a3,1
li a2,1431654400
addi a2,a2,1365
or a5,a1,a5
and a5,a5,a2
srli a1,a4,1
sub a5,a3,a5
and a2,a1,a2
sgtu a3,a5,a3
sub a4,a4,a2
sub a4,a4,a3
slli a1,a4,30
srli a3,a5,2
li a2,858992640
addi a2,a2,819
or a3,a1,a3
and a3,a3,a2
srli a1,a4,2
and a5,a5,a2
add a5,a3,a5
and a1,a1,a2
and a4,a4,a2
add a4,a1,a4
sltu a3,a5,a3
add a3,a3,a4
slli a2,a3,28
srli a4,a5,4
or a4,a2,a4
add a5,a4,a5
srli a2,a3,4
add a2,a2,a3
sltu a4,a5,a4
li a3,252645376
addi a3,a3,-241
add a4,a4,a2
and a4,a4,a3
and a5,a5,a3
slli a2,a4,24
srli a3,a5,8
or a3,a2,a3
add a5,a3,a5
srli a2,a4,8
add a4,a2,a4
sltu a3,a5,a3
add a3,a3,a4
slli a2,a3,16
srli a4,a5,16
or a4,a2,a4
add a5,a4,a5
srli a2,a3,16
sltu a4,a5,a4
add a3,a2,a3
add a4,a4,a3
add a4,a4,a5
andi a4,a4,127
li a0,64
sub a0,a0,a4
slli a0,a0,16
srli a0,a0,16
ret
.size count_leading_zeros, .-count_leading_zeros
.align 2
.globl predict_if_mul_overflow
.type predict_if_mul_overflow, @function
predict_if_mul_overflow:
lw a7,4(a0)
lw a6,4(a1)
lw a0,0(a0)
lw a1,0(a1)
slli a2,a7,31
slli a3,a6,31
srli a4,a0,1
srli a5,a1,1
or a4,a2,a4
or a5,a3,a5
srli a2,a7,1
srli a3,a6,1
or a2,a2,a7
or a3,a3,a6
or a0,a4,a0
or a1,a5,a1
slli a7,a2,30
slli a6,a3,30
srli a4,a0,2
srli a5,a1,2
or a4,a7,a4
or a5,a6,a5
srli a7,a2,2
srli a6,a3,2
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
slli a7,a2,28
slli a6,a3,28
srli a4,a0,4
srli a5,a1,4
or a4,a7,a4
or a5,a6,a5
srli a7,a2,4
srli a6,a3,4
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
slli a7,a2,24
slli a6,a3,24
srli a4,a0,8
srli a5,a1,8
or a4,a7,a4
or a5,a6,a5
srli a7,a2,8
srli a6,a3,8
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
slli a7,a2,16
slli a6,a3,16
srli a4,a0,16
srli a5,a1,16
or a4,a7,a4
or a5,a6,a5
srli a7,a2,16
srli a6,a3,16
or a2,a2,a7
or a3,a3,a6
or a0,a0,a4
or a1,a1,a5
or a0,a0,a2
or a1,a1,a3
slli t1,a2,31
slli a7,a3,31
srli a4,a0,1
srli a5,a1,1
li a6,1431654400
addi a6,a6,1365
or a4,t1,a4
or a5,a7,a5
srli t1,a2,1
srli a7,a3,1
and a4,a4,a6
and a5,a5,a6
sub a4,a0,a4
sub a5,a1,a5
and t1,t1,a6
and a6,a7,a6
sgtu a0,a4,a0
sgtu a1,a5,a1
sub a2,a2,t1
sub a3,a3,a6
sub a2,a2,a0
sub a3,a3,a1
slli t1,a2,30
slli a7,a3,30
srli a0,a4,2
srli a1,a5,2
li a6,858992640
addi a6,a6,819
or a0,t1,a0
or a1,a7,a1
and a0,a0,a6
and a1,a1,a6
srli t1,a2,2
and a4,a4,a6
srli a7,a3,2
and a5,a5,a6
add a4,a0,a4
add a5,a1,a5
and t1,t1,a6
and a7,a7,a6
and a2,a2,a6
and a3,a3,a6
add a2,t1,a2
add a3,a7,a3
sltu a0,a4,a0
sltu a1,a5,a1
add a0,a0,a2
add a1,a1,a3
slli a7,a0,28
slli a6,a1,28
srli a2,a4,4
srli a3,a5,4
or a2,a7,a2
or a3,a6,a3
add a4,a2,a4
add a5,a3,a5
srli a7,a0,4
srli a6,a1,4
add a7,a7,a0
add a1,a6,a1
sltu a2,a4,a2
sltu a3,a5,a3
li a0,252645376
addi a0,a0,-241
add a2,a2,a7
add a3,a3,a1
and a2,a2,a0
and a5,a5,a0
and a3,a3,a0
and a4,a4,a0
slli a7,a2,24
slli a6,a3,24
srli a0,a4,8
srli a1,a5,8
or a0,a7,a0
or a1,a6,a1
add a0,a4,a0
add a1,a5,a1
srli a7,a2,8
srli a6,a3,8
add a2,a2,a7
add a3,a3,a6
sltu a4,a0,a4
sltu a5,a1,a5
add a4,a4,a2
add a5,a5,a3
slli a7,a4,16
slli a6,a5,16
srli a3,a0,16
srli a2,a1,16
or a3,a7,a3
or a2,a6,a2
add a3,a0,a3
add a2,a1,a2
srli a7,a4,16
srli a6,a5,16
add a4,a4,a7
add a5,a5,a6
sltu a0,a3,a0
sltu a1,a2,a1
add a1,a1,a5
add a0,a0,a4
add a0,a3,a0
add a2,a2,a1
li a5,64
andi a2,a2,127
andi a0,a0,127
sub a0,a5,a0
sub a5,a5,a2
slli a0,a0,16
slli a5,a5,16
li a4,64
srli a5,a5,16
srli a0,a0,16
sub a0,a4,a0
sub a4,a4,a5
add a0,a0,a4
slti a0,a0,64
xori a0,a0,1
ret
.size predict_if_mul_overflow, .-predict_if_mul_overflow
.section .rodata.str1.4,"aMS",@progbits,1
.align 2
.LC0:
.string "%d\n"
.align 2
.LC1:
.string "cycle: %llu\n"
.align 2
.LC2:
.string "instret: %llu\n"
.section .text.startup,"ax",@progbits
.align 2
.globl main
.type main, @function
main:
addi sp,sp,-32
sw ra,28(sp)
sw s0,24(sp)
sw s1,20(sp)
sw s2,16(sp)
sw s3,12(sp)
call get_instret
mv s2,a0
call get_cycles
mv s0,a0
mv s3,a1
lui a0,%hi(a_x0)
lui a1,%hi(a_x1)
addi a1,a1,%lo(a_x1)
addi a0,a0,%lo(a_x0)
call predict_if_mul_overflow
lui s1,%hi(.LC0)
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(b_x1)
lui a0,%hi(b_x0)
addi a1,a1,%lo(b_x1)
addi a0,a0,%lo(b_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(c_x1)
lui a0,%hi(c_x0)
addi a1,a1,%lo(c_x1)
addi a0,a0,%lo(c_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(d_x1)
lui a0,%hi(d_x0)
addi a1,a1,%lo(d_x1)
addi a0,a0,%lo(d_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
call get_cycles
sub a2,a0,s0
sub a1,a1,s3
sgtu a3,a2,a0
lui a0,%hi(.LC1)
sub a3,a1,a3
addi a0,a0,%lo(.LC1)
call printf
lw s0,24(sp)
lw ra,28(sp)
lw s1,20(sp)
lw s3,12(sp)
mv a1,s2
lw s2,16(sp)
lui a0,%hi(.LC2)
addi a0,a0,%lo(.LC2)
addi sp,sp,32
tail printf
.size main, .-main
.globl d_x1
.globl d_x0
.globl c_x1
.globl c_x0
.globl b_x1
.globl b_x0
.globl a_x1
.globl a_x0
.section .sbss,"aw",@nobits
.align 3
.type a_x1, @object
.size a_x1, 8
a_x1:
.zero 8
.type a_x0, @object
.size a_x0, 8
a_x0:
.zero 8
.section .sdata,"aw"
.align 3
.type d_x1, @object
.size d_x1, 8
d_x1:
.word -1
.word 2147483647
.type d_x0, @object
.size d_x0, 8
d_x0:
.word 3
.word 0
.type c_x1, @object
.size c_x1, 8
c_x1:
.word 0
.word 1073741824
.type c_x0, @object
.size c_x0, 8
c_x0:
.word 2
.word 0
.type b_x1, @object
.size b_x1, 8
b_x1:
.word 16
.word 0
.type b_x0, @object
.size b_x0, 8
b_x0:
.word 1
.word 0
.ident "GCC: (xPack GNU RISC-V Embedded GCC x86_64) 13.2.0"
```
:::
---
**Os**
:::spoiler click to reveal the code
```c
.file "hw2yh.c"
.option nopic
.attribute arch, "rv32i2p1_zicsr2p0_zifencei2p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align 2
.globl count_leading_zeros
.type count_leading_zeros, @function
count_leading_zeros:
slli a4,a1,31
srli a5,a0,1
or a5,a4,a5
srli a4,a1,1
or a1,a4,a1
or a0,a5,a0
slli a4,a1,30
srli a5,a0,2
or a5,a4,a5
srli a2,a1,2
or a2,a2,a1
or a0,a5,a0
slli a4,a2,28
srli a5,a0,4
or a5,a4,a5
srli a3,a2,4
or a4,a5,a0
or a3,a3,a2
slli a2,a3,24
srli a5,a4,8
or a5,a2,a5
srli a2,a3,8
or a2,a2,a3
or a5,a5,a4
srli a3,a5,16
slli a4,a2,16
or a3,a4,a3
srli a4,a2,16
or a4,a4,a2
or a3,a3,a5
or a3,a4,a3
slli a2,a4,31
srli a5,a3,1
or a5,a2,a5
li a2,1431654400
addi a2,a2,1365
srli a1,a4,1
and a5,a5,a2
sub a5,a3,a5
and a2,a1,a2
sgtu a3,a5,a3
sub a4,a4,a2
sub a4,a4,a3
slli a2,a4,30
srli a3,a5,2
or a3,a2,a3
li a2,858992640
addi a2,a2,819
and a3,a3,a2
srli a1,a4,2
and a5,a5,a2
and a1,a1,a2
add a5,a3,a5
and a4,a4,a2
add a4,a1,a4
sltu a3,a5,a3
add a3,a3,a4
slli a2,a3,28
srli a4,a5,4
or a4,a2,a4
add a5,a4,a5
srli a2,a3,4
add a3,a2,a3
sltu a4,a5,a4
add a4,a4,a3
li a3,252645376
addi a3,a3,-241
and a4,a4,a3
and a5,a5,a3
slli a2,a4,24
srli a3,a5,8
or a3,a2,a3
add a5,a3,a5
srli a2,a4,8
add a4,a2,a4
sltu a3,a5,a3
add a3,a3,a4
slli a2,a3,16
srli a4,a5,16
or a4,a2,a4
add a5,a4,a5
srli a2,a3,16
sltu a4,a5,a4
add a3,a2,a3
add a4,a4,a3
add a4,a4,a5
andi a4,a4,127
li a0,64
sub a0,a0,a4
slli a0,a0,16
srli a0,a0,16
ret
.size count_leading_zeros, .-count_leading_zeros
.align 2
.globl predict_if_mul_overflow
.type predict_if_mul_overflow, @function
predict_if_mul_overflow:
addi sp,sp,-16
sw s1,4(sp)
mv s1,a1
lw a1,4(a0)
lw a0,0(a0)
sw ra,12(sp)
sw s0,8(sp)
call count_leading_zeros
lw a1,4(s1)
mv s0,a0
lw a0,0(s1)
call count_leading_zeros
li a5,64
sub s0,a5,s0
sub a5,a5,a0
add s0,s0,a5
lw ra,12(sp)
slti a0,s0,64
lw s0,8(sp)
lw s1,4(sp)
xori a0,a0,1
addi sp,sp,16
jr ra
.size predict_if_mul_overflow, .-predict_if_mul_overflow
.section .rodata.str1.4,"aMS",@progbits,1
.align 2
.LC0:
.string "%d\n"
.align 2
.LC1:
.string "cycle: %llu\n"
.align 2
.LC2:
.string "instret: %llu\n"
.section .text.startup,"ax",@progbits
.align 2
.globl main
.type main, @function
main:
addi sp,sp,-32
sw ra,28(sp)
sw s0,24(sp)
sw s1,20(sp)
sw s2,16(sp)
sw s3,12(sp)
call get_instret
mv s2,a0
call get_cycles
mv s0,a0
mv s3,a1
lui a0,%hi(a_x0)
lui a1,%hi(a_x1)
addi a1,a1,%lo(a_x1)
addi a0,a0,%lo(a_x0)
call predict_if_mul_overflow
lui s1,%hi(.LC0)
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(b_x1)
lui a0,%hi(b_x0)
addi a1,a1,%lo(b_x1)
addi a0,a0,%lo(b_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(c_x1)
lui a0,%hi(c_x0)
addi a1,a1,%lo(c_x1)
addi a0,a0,%lo(c_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
lui a1,%hi(d_x1)
lui a0,%hi(d_x0)
addi a1,a1,%lo(d_x1)
addi a0,a0,%lo(d_x0)
call predict_if_mul_overflow
mv a1,a0
addi a0,s1,%lo(.LC0)
call printf
call get_cycles
sub a2,a0,s0
sub a1,a1,s3
sgtu a3,a2,a0
lui a0,%hi(.LC1)
sub a3,a1,a3
addi a0,a0,%lo(.LC1)
call printf
lw s0,24(sp)
lw ra,28(sp)
lw s1,20(sp)
lw s3,12(sp)
mv a1,s2
lw s2,16(sp)
lui a0,%hi(.LC2)
addi a0,a0,%lo(.LC2)
addi sp,sp,32
tail printf
.size main, .-main
.globl d_x1
.globl d_x0
.globl c_x1
.globl c_x0
.globl b_x1
.globl b_x0
.globl a_x1
.globl a_x0
.section .sbss,"aw",@nobits
.align 3
.type a_x1, @object
.size a_x1, 8
a_x1:
.zero 8
.type a_x0, @object
.size a_x0, 8
a_x0:
.zero 8
.section .sdata,"aw"
.align 3
.type d_x1, @object
.size d_x1, 8
d_x1:
.word -1
.word 2147483647
.type d_x0, @object
.size d_x0, 8
d_x0:
.word 3
.word 0
.type c_x1, @object
.size c_x1, 8
c_x1:
.word 0
.word 1073741824
.type c_x0, @object
.size c_x0, 8
c_x0:
.word 2
.word 0
.type b_x1, @object
.size b_x1, 8
b_x1:
.word 16
.word 0
.type b_x0, @object
.size b_x0, 8
b_x0:
.word 1
.word 0
.ident "GCC: (xPack GNU RISC-V Embedded GCC x86_64) 13.2.0"
```
:::
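### Observations
The assembly generated without optimization comprises 503 lines.
With `-O3` and `-Ofast`, the generated code is 423 lines in both cases.
Since `-Os` optimizes for size, its output contains only 260 lines.
Counting instructions rather than lines, `count_leading_zeros` shrinks from 290 instructions in the unoptimized listing to 92 at `-Ofast`, `-O3`, and `-Os`.
`predict_if_mul_overflow` takes 186 instructions at `-Ofast` and `-O3`, where both calls to `count_leading_zeros` are inlined, but only 23 at `-Os`, which keeps the two calls.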
:::warning
You should discuss the instruction statistics instead.
:notes: jserv
:::
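
One way to move from line counts toward instruction statistics is to tally mnemonics in each listing. The following is a minimal sketch, not a full assembler parser: it assumes one statement per line, treats lines starting with `.` as directives and lines ending with `:` as labels, and ignores comments. The function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the line [s, s + len) holds an instruction, meaning it is
 * not blank, not a directive (starts with '.'), and not a label (ends
 * with ':'). On success the mnemonic is copied into out. */
static int parse_mnemonic(const char *s, size_t len, char *out, size_t cap)
{
    while (len > 0 && isspace((unsigned char)*s)) {
        s++;
        len--;
    }
    while (len > 0 && isspace((unsigned char)s[len - 1]))
        len--;
    if (len == 0 || s[0] == '.' || s[len - 1] == ':')
        return 0;

    size_t n = 0;
    while (n < len && !isspace((unsigned char)s[n]) && n + 1 < cap) {
        out[n] = s[n];
        n++;
    }
    out[n] = '\0';
    return 1;
}

/* Count instruction lines in a listing. When mnemonic is not NULL, only
 * lines whose mnemonic matches it exactly are counted. */
size_t count_instructions(const char *listing, const char *mnemonic)
{
    size_t count = 0;
    const char *p = listing;

    while (*p != '\0') {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        char found[32];

        if (parse_mnemonic(p, len, found, sizeof found) &&
            (mnemonic == NULL || strcmp(found, mnemonic) == 0))
            count++;

        p += len;
        if (*p == '\n')
            p++;
    }
    return count;
}
```

Calling `count_instructions(text, NULL)` gives the total number of instructions in a listing, and `count_instructions(text, "lw")` or `count_instructions(text, "sw")` shows how many loads and stores each optimization level leaves behind. These are static counts; the dynamic count of executed instructions is what `instret` measures.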