# Assingment1: Approximating a bfloat number using binary search
###### contributed by < [Timothy Liu](https://github.com/timothyliu0912) >
###### tags: `RISC-V` `Binary Search` `bfloat16` `float32`
## 1. Backgrond
### 1.1 Binary Search introduction
Binay seach is a efficient search algorithm that locate the position of a target value within a sorted array (or interval). Binary search process involves comparing the target value within the middle element of the array. If they do not match, it elimates the half if the element of the array where the target cannot where the target cannot possibly be found and continues the search in the remaining half. This process repeats, with each iteration narrowing down the search range by comparing the middle element to the target value until the desired target value is located.
### 1.2 Binary Search's Procedure
Binary search is operated on a sorted array (or interval). The process of binary search starts by comparing an element located at the midpoint of the array to the target value. If the target value matches the element, the algorithm returns the position of that element in the array. If the target value is less than the element, the search narrows down to the lower half of the array. On the other hand, if the target value is greater than the element, the search continues on the upper half of the array. This iterative approach effectively eliminates the half in which the target value cannot be located during each step of the search, ultimately leading to the discovery of the target value's position in the array.
Given an interval between $L$ and $R$ (i.e., $\left[L, R \right]$) and the target value $T$, the following subroutine uses binary search to find the index of $T$ in $\left[ L, R \right]$
1. If $L > R$, the search terminates as unsuccessful.
3. Set $m$ (the positioon if the middle element) to $\frac{L+R}{2}$.
4. If $m < T$, set $R$ to $m$ and got to step 2.
5. If $m > T$, set $L$ to $m-1$ and got to step 2.
6. If $m = T$, the search is done; return m.
### 1.3 Bfloat16 floating-point format
The `bfloat16` is a custom 16-bit floating point format for machine learning that is composed of
* Sign bit: 1 bit
* Exponent width: 8 bits
* Significand precision: 8 bits (7 explicitly stored, with an implicit leading bit), as opposed to 24 bits in a classical single-precision floating-point format
#### Composed of bfloat16, float32 and float16
|floating point format|Sign bit | Exponent bits | Mantissa bits |
|:--------:|:--------:|:--------:|:--------:|
|`bfloat16` |1-bit | 8-bits | 7-bits |
|`float32` |1-bit | 8-bits | 23-bits |
|`float16` |1-bit | 5-bits | 10-bits |
Compared with `float32`, `bfloat16` has the same exponent size as `float32`, which allows for fast conversion to and from `float32`. The exponent bits are preserved in conversion to the `bfloat16` , but handles denormals are flushed to zero.
(wiki, google:The bfloat16 numerical format )
## 2. Implemtation
The source code of C Program and Assembly code can be found [here]()
### 2.1 C Program
```c=
#include <stdio.h>
float fp32_to_bf16(float x)
{
float y = x;
int *p = (int *) &y;
unsigned int exp = *p & 0x7F800000;
unsigned int man = *p & 0x007FFFFF;
if (exp == 0 && man == 0)
return x;
if (exp == 0x7F800000)
return x;
float r = x;
int *pr = (int *) &r;
*pr &= 0xFF800000;
r /= 0x100;
y = x + r;
*p &= 0xFFFF0000;
return y;
}
float binary_search(float low, float high, float target) {
low = fp32_to_bf16(low);
high = fp32_to_bf16(high);
target = fp32_to_bf16(target);
while (low <= high) {
float mid = low + (high - low) / 2;
if (mid == target) {
return mid;
}
if (mid < target) {
low = mid;
}
else {
high = mid;
}
}
return -1;
}
int main(){
float test_case1_x= 0.1;
float test_case1_ub = 10.0;
float test_case1_lb = 0.001;
float test_case2_x= 100.0;
float test_case2_ub = 256.0;
float test_case2_lb = 10.0;
float test_case3_x= 0.563;
float test_case3_ub = 1.0;
float test_case3_lb = 0.01;
printf("%f\n",binary_search(test_case1_lb, test_case1_ub, test_case1_x));
printf("%f\n",binary_search(test_case2_lb, test_case2_ub, test_case2_x));
printf("%f\n",binary_search(test_case3_lb, test_case3_ub, test_case3_x));
return 0;
}
```
### 2.2 Assembly Code
The complete source code can be found [here](). The following code only shows the part of assembly code functions.
```s=
float32tobfloat16: #convert float32 to bfloat16
li s0, 0x7F800000
li s1, 0x007FFFFF
li s2, 0x00800000
and a0, t1, s0 #a0 input exp
and a1, t1, s1 #a1 input man
# or a1, a1, s2
beq a0, x0, zerocheck
beq a0, s0 print_float
jal s10 float32_div
jal s10 float32_add
li s0, 0xFFFF0000
and t1, t2, s0
ret
zerocheck:
beq a1, x0, print_float
float32_div:
li s0, 0xFF800000
li s1, 0x04000000 #256
# and s2, t1, s0 #pr=s2 input exp
sub a2, a0, s1 #a2 = a0 / 256
jr s10
float32_add:
li s0, 0x7F800000
li s1, 0x007FFFFF
li s2, 0x00800000
and a3, a2, s0 #r exp
and a4, a2, s1 #r man
or a4, a4, s2
sub a5, a0, a3 #a5 = shift
srli a5, a5, 23
srl a4, a4, a5 # shift r man
add a6, a4, a1 #a6 res_man
or t2, a0, a6
jr s10
float_add:
li s0, 0x7F800000
li s1, 0x007FFFFF
li s2, 0x00800000
and a0, t4, s0 #a0 input exp
and a1, t4, s1 #a1 input man
srli a0, a0, 23 # shift exp
or a1, a1, s2
and a3, t5, s0 #r exp
and a4, t5, s1 #r man
srli a3, a3, 23
or a4, a4, s2
sub a5, a0, a3 #a5 = shift
srl a4, a4, a5 # shift r man
add a6, a4, a1 #a6 res_man
li s7, 0x01000000
and a7, a6, s7
bnez a7, overflow #check man overflow
#combine exp and man
and a6, a6,s1
andi a0, a0, 0xff
slli a0, a0, 23
or t2, a0, a6
jr s10
overflow:
srli a6, a6 1
addi a0, a0, 1
and a6, a6,s1
andi a0, a0, 0xff
slli a0, a0, 23
or t2, a0, a6
jr s10
binary_search:
jal s10 float_add #add ub and lb
# div 2
li s0, 0xFF800000
li s1, 0x007FFFFF
li s2 0x01000000 #2
and a0, t2, s0 #a0 t2 exp
and a1, t2, s1 #a0 t2 man
sub a2, a0, s1 #a2 = div 2
or a2, a2, a1
li s0, 0xFFFF0000
and a2, a2, s0
add a0, t4, x0
li a7, 2
ecall
la a0, spacestr
li a7, 4
ecall
add a0, a2, x0
li a7, 2
ecall
la a0, spacestr
li a7, 4
ecall
add a0, t5, x0
li a7, 2
ecall
la, a0, linestr
li a7, 4
ecall
beq a2, t3 print_result
blt t3, a2, mid_to_ub
blt a2, t3, mid_to_lb
mid_to_lb:
mv t5, a2
j binary_search
mid_to_ub:
mv t4, a2
j binary_search
```
### 2.3 Result
#### test data 1
```
0x40A00000 0x41200000 0x40400000
```
#### test data 2
```
0x42C80000 0x43800000 0x3F800000
```
#### test data 3
```
0x3DCCCCCD 0x3F800000 0x3C23D70A
```
#### C program output
```txt
0.100098
100.000000
0.562500
```
#### Assembly output
```txt=
The target, upper bound and lower bound numbers are :
0.100098 10 0.000999451
-----------------
10 5 0.000999451
5 2.5 0.000999451
2.5 1.25 0.000999451
1.25 0.625 0.000999451
0.625 0.3125 0.000999451
0.3125 0.15625 0.000999451
0.15625 0.0786133 0.000999451
0.15625 0.117188 0.0786133
0.117188 0.0976563 0.0786133
0.117188 0.107422 0.0976563
0.107422 0.102539 0.0976563
0.102539 0.100098 0.0976563
-----------------
The binary search result number is 0.100098
The target, upper bound and lower bound numbers are :
100 256 10
-----------------
256 133 10
133 71.5 10
133 102 71.5
102 86.5 71.5
102 94 86.5
102 98 94
102 100 98
-----------------
The binary search result number is 100
The target, upper bound and lower bound numbers are :
0.5625 1 0.0100098
-----------------
1 0.503906 0.0100098
1 0.75 0.503906
0.75 0.625 0.503906
0.625 0.5625 0.503906
-----------------
The binary search result number is 0.5625
```
## 3. Analysis
In the assignment, I use [Ripes](https://github.com/mortbopet/Ripes) to test and analysis the assembly code.
:::info
The Ripes is a visual computer architecture simulator and assembly code editor built for the RISC-V instruction set architecture.
:::
### 3.1 Disassembled Executable Code
Disassembled executable code here only shows a part of the code. The entire code can be found [here]().
```s = 000000a4 <float32tobfloat16>:
a4: 7f800437 lui x8 0x7f800
a8: 008004b7 lui x9 0x800
ac: fff48493 addi x9 x9 -1
b0: 00800937 lui x18 0x800
b4: 00837533 and x10 x6 x8
b8: 009375b3 and x11 x6 x9
bc: 00050e63 beq x10 x0 28 <zerocheck>
c0: 16850a63 beq x10 x8 372 <print_float>
c4: 01800d6f jal x26 24 <float32_div>
c8: 02400d6f jal x26 36 <float32_add>
cc: ffff0437 lui x8 0xffff0
d0: 0083f333 and x6 x7 x8
d4: 00008067 jalr x0 x1 0
000000d8 <zerocheck>:
d8: 14058e63 beq x11 x0 348 <print_float>
000000dc <float32_div>:
dc: ff800437 lui x8 0xff800
e0: 040004b7 lui x9 0x4000
e4: 40950633 sub x12 x10 x9
e8: 000d0067 jalr x0 x26 0
000000ec <float32_add>:
ec: 7f800437 lui x8 0x7f800
f0: 008004b7 lui x9 0x800
f4: fff48493 addi x9 x9 -1
f8: 00800937 lui x18 0x800
fc: 008676b3 and x13 x12 x8
100: 00967733 and x14 x12 x9
104: 01276733 or x14 x14 x18
108: 40d507b3 sub x15 x10 x13
10c: 0177d793 srli x15 x15 23
110: 00f75733 srl x14 x14 x15
114: 00b70833 add x16 x14 x11
118: 010563b3 or x7 x10 x16
11c: 000d0067 jalr x0 x26 0
00000120 <float_add>:
120: 7f800437 lui x8 0x7f800
124: 008004b7 lui x9 0x800
128: fff48493 addi x9 x9 -1
12c: 00800937 lui x18 0x800
130: 008ef533 and x10 x29 x8
134: 009ef5b3 and x11 x29 x9
138: 01755513 srli x10 x10 23
13c: 0125e5b3 or x11 x11 x18
140: 008f76b3 and x13 x30 x8
144: 009f7733 and x14 x30 x9
148: 0176d693 srli x13 x13 23
14c: 01276733 or x14 x14 x18
150: 40d507b3 sub x15 x10 x13
154: 00f75733 srl x14 x14 x15
158: 00b70833 add x16 x14 x11
15c: 01000bb7 lui x23 0x1000
160: 017878b3 and x17 x16 x23
164: 00089c63 bne x17 x0 24 <overflow>
168: 00987833 and x16 x16 x9
16c: 0ff57513 andi x10 x10 255
170: 01751513 slli x10 x10 23
174: 010563b3 or x7 x10 x16
178: 000d0067 jalr x0 x26 0
0000017c <overflow>:
17c: 00185813 srli x16 x16 1
180: 00150513 addi x10 x10 1
184: 00987833 and x16 x16 x9
188: 0ff57513 andi x10 x10 255
18c: 01751513 slli x10 x10 23
190: 010563b3 or x7 x10 x16
194: 000d0067 jalr x0 x26 0
00000198 <binary_search>:
198: f89ffd6f jal x26 -120 <float_add>
19c: ff800437 lui x8 0xff800
1a0: 008004b7 lui x9 0x800
1a4: fff48493 addi x9 x9 -1
1a8: 01000937 lui x18 0x1000
1ac: 0083f533 and x10 x7 x8
1b0: 0093f5b3 and x11 x7 x9
1b4: 40950633 sub x12 x10 x9
1b8: 00b66633 or x12 x12 x11
1bc: ffff0437 lui x8 0xffff0
1c0: 00867633 and x12 x12 x8
1c4: 000e8533 add x10 x29 x0
1c8: 00200893 addi x17 x0 2
1cc: 00000073 ecall
1d0: 10000517 auipc x10 0x10000
1d4: eb050513 addi x10 x10 -336
1d8: 00400893 addi x17 x0 4
1dc: 00000073 ecall
1e0: 00060533 add x10 x12 x0
1e4: 00200893 addi x17 x0 2
1e8: 00000073 ecall
1ec: 10000517 auipc x10 0x10000
1f0: e9450513 addi x10 x10 -364
1f4: 00400893 addi x17 x0 4
1f8: 00000073 ecall
1fc: 000f0533 add x10 x30 x0
200: 00200893 addi x17 x0 2
204: 00000073 ecall
208: 10000517 auipc x10 0x10000
20c: e1c50513 addi x10 x10 -484
210: 00400893 addi x17 x0 4
214: 00000073 ecall
218: 0bc60263 beq x12 x28 164 <print_result>
21c: 00ce4863 blt x28 x12 16 <mid_to_ub>
220: 01c64263 blt x12 x28 4 <mid_to_lb>
00000224 <mid_to_lb>:
224: 00060f13 addi x30 x12 0
228: f71ff06f jal x0 -144 <binary_search>
0000022c <mid_to_ub>:
22c: 00060e93 addi x29 x12 0
230: f69ff06f jal x0 -152 <binary_search>
```
### 3.2 Five-stage pipelined processor
In this assignment, 5-stage pipelined processor is chosen to execute the program. The block diagram is shown below:

1. **Instruction Fetch (IF)**
* instruction is being fetched from the memory.
2. **Instruction Decode (ID)**
* decode the instruction and fetch the source operands
3. **Execute (EX)**
* the computer performs the operation specified by the instruction
4. **Memory (MEM)**
* If there is any data that needs to be accessed, it is done in the memory stage
5. **Write (WB)**
* If we need to store the result in the destination location, it is done during the writeback stage.
### 3.3 Instruction Analysis
Take the instruction ` beq x10, x8 372<print_float>`, for example, and analyze how the instruction is operated in the processor at different stages. The sub instruction is
The following image is the format of B-type instruction format.

#### IF Stage

At this stage, the instruction address of` beq x10, x8 372<print_float> ` is `0x000000c0`, and the PC will translate the address to 32-bit instruction.
The opcode of `beq` instruction is `1100011` and can be formatted as the machine code below.
| imm[12] |imm[10:5] | rs2 | rs1 | funct3 | imm[4:1] | imm[11]| opcode|
|:-------:|:--------:|:---:|:---:|:------:|:-------:|:-------:|:-------:|
| 0 | 001011 | 01000 | 01010 | 000 | 1010 | 0 |1100011|
#### ID Stage

The instruction will be decoded in to several parts:
* `opcode` : `0x05` `beq`
* `R1 idx` : `0x0a` `x10`
* `R2 idx` : `0x08` `x8`
* `Wr reg idx` : `0x14` `x14`
`R1 idx` and `R2 idx` will be sent into the registers and get the corresponding value `0x7f800000`. With the opcode `beq` and instruction `0x16850a63`, The `imm` field output `0x00000174` also is `372` in decimal.
#### EX Stage

* `Op1` of `ALU` is set to `PC` : `0x000000c0`
* `Op2` of `ALU` is set to the `imm` field : `0x0000174`
* In `beq` instruction, inputs of the branch taken will be the same as `x10 == x8`; otherwise, inputs of the branch taken will be different.
* The example of `x10 != x8`

the next `PC` will be `PC+4` `0x00000c8`

* The example of `x10 == x8`

the next `PC` will be the specified address `0x0000234`

#### MEM Stage

There is nothing will be written to memory so `WrEn` is set to 0.
#### WB Stage

The `beq` instruction does not need to write back to the register so `WrEn` is set to 0.