# Computer Architecture Final project: Annotate and explain Quiz1
# Problem A
Consider a C implementation of the [count leading zero](https://en.wikipedia.org/wiki/Find_first_set#CLZ) function for 64-bit integers. A leading zero is defined as any β0β digit that appears before the first non-zero digit in the binary representation of a number.
Examples:Input :
* N = 16 Output : 59
> Explanation: As Binary = (00000000000000000000000000000000 00000000000000000000000000010000)
Input :
* N = 33 Output : 58
> Explanation: As Binary =(00000000000000000000000000000000 00000000000000000000000000100001)
The implementation utilizes [population count](https://en.wikichip.org/wiki/population_count),
which counts the number of set bits in the given value.
The source code is listed below, and you shall fill in parts A01 and A02.
```cpp
#include <stdint.h>
uint16_t count_leading_zeros(uint64_t x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);
/* count ones (population count) */
x -= ((x >> 1) & A01 /* Fill this! */ );
x = ((x >> 2) & 0x3333333333333333) + (x & A02 /* Fill this! */);
x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
x += (x >> 8);
x += (x >> 16);
return (64 - (x & 0x7f));
}
```
Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. A01 and A02 are hexadecimal literals in C. You must obey the following rules when filling them:
* Write shorter code as possible as you can.
* No space character
You can read the content of Population count and find the section titled βThe PopCount routine,β then select the constants.
A01 ?=
==`0x5555555555555555`==
A02 ?=
==`0x3333333333333333`==
:::info
### π Explain and Annotate
#### Count leading zeros:
The canonical algorithm examines one bit at a time starting from the MSB until a non-zero bit is foundγ
#### Population count:
The divide and conquer SWAR-approach deals with counting bits of duos, to aggregate the duo-counts to nibbles and bytes inside one 64-bit register in parallel, to finally sum all bytes together.
The code `x -= ((x >> 1) & A01)` uses a mask A01 to retain only one bit out of every pair of adjacent bits in the variable x. For example, if we have an 8-bit binary number `10101010`, right-shifting it by one position results in `01010101`, which serves as our mask. When extended to 64 bits, it becomes `0x5555555555555555` .
### Ripes simulation for Assembly
CLZ Assembly simulation for Ripes RV32I
```cpp
...
#s0 is x
add s0,x0,a0
#x|=(x>>1)
srli t0,s0,1
or s0,s0,t0
#x|=(x>>2)
srli t0,s0,2
or s0,s0,t0
#x|=(x>>4)
srli t0,s0,4
or s0,s0,t0
#x|=(x>>8)
srli t0,s0,8
or s0,s0,t0
#x|=(x>>16)
srli t0,s0,16
or s0,s0,t0
#x -= ((x>>1) & 0x55555555)
li t1,0x55555555
srli t0,s0,1
and t0,t0,t1
sub s0,s0,t0
#x = ((x>>2) & 0x33333333)+(x &0x33333333)
li t1,0x33333333
srli t0,s0,2
and t0,t0,t1
and t1,s0,t1
add s0,t0,t1
....
```
Turning it into a pseudocode function using a simple number to calculate its leading zeros: When the `PC` holds the next instruction memory address `0x0000005c` preparing to execute the next instruction li t1, `0x55555555`, it can be observed through Ripes that the pipeline is as follows:

It was observed that this instruction is executed in the `23rd` cycle and does not encounter Structural Hazards or Control Hazards. However, in the case of Data Hazards, using Ripes is insufficient for resolving the issue since it cannot be resolved through bypassing.
:::
# Problem B
Consider a C program which converts single precision floating point values to the corresponding [bfloat16](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format) floating-point format.
| Column 1 | Column 2 |
| -------- | -------- |
| 1.200000 |0x3f99999a |
|1.203125 |0x3f9a0000 |
| 2.310000 |0x4013d70a |
| 2.312500 |0x40140000 |
| 3.460000 |0x405d70a4 |
| 3.453125 |0x405d0000 |
| 5.630000 |0x40b428f6 |
The source code is listed below, and you shall fill in parts B01 and B02.
```cpp
float fp32_to_bf16(float x)
{
float y = x;
int *p = (int *) &y;
unsigned int exp = *p & 0x7F800000;
unsigned int man = *p & 0x007FFFFF;
if (exp == 0 && man == 0) /* zero */
return x;
if (exp == B01 /* Fill this! */) /* infinity or NaN */
return x;
/* Normalized number */
/* round to nearest */
float r = x;
int *pr = (int *) &r;
*pr &= 0xFF800000; /* r has the same exp as x */
r /= B02 /* Fill this! */;
y = x + r;
*p &= 0xFFFF0000;
return y;
}
```
Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. B01 and B02 are hexadecimal literals in C. You must obey the following rules when filling them:
* Write shorter code as possible as you can.
* No space character
B01 ?=
==`0x7F800000`==
B02 ?=
==`0x100`==
:::info
### π Explain and Annotate
**FP32** versus **BF16**
**FP32**:32 bits floating point for `IEEE754 ` format the single-precision floating-point follow the picture:
* 1 sign bits
* 8 exponent bits
* 23 fraction bits

**BF16**:The original IEEE **FP16** was not designed with deep learning applications in mind, its dynamic range is too narrow. **BFLOAT16** solves this, providing dynamic range identical to that of **FP32**.
So **BF16** has:
* 1 sign bits
* 8 exponent bits
* 7 fraction bits

The **bfloat16** format, being a truncated IEEE 754 **FP32**, allows for fast conversion to and from an IEEE 754 **FP32**
The code `(if exp == B01)// infinity or NaN` is used to determine whether a floating-point number y is either infinity or NaN in the **IEEE 754** standard. When the exponent bits are set to` 111111111 `(binary) or `0x7F800000` (hexadecimal), it is used to check for the conditions of infinity or NaN. In this context, the binary value B01 corresponds to `0x7F800000`, where the exponent bits are all ones, and the fraction bits are all zeros.
The statement `r /= B02` is intended to facilitate the conversion from **FP32** (Single Precision Floating-Point) to **BF16** (Brain Floating-Point 16-bit) in order to preserve the precision of the floating-point number as much as possible. It involves right-shifting the **FP32** value by 8 bits to perform a rounding operation.
In RISC-V assembly language, we can write a simple assembly code to observe its pipeline stages and memory addresses
```cpp
.....
# Check if it is infinity or NaN
li a2, 0x7F800000
beq a3, a2, is_inf_or_nan
# Perform rounding
li a2, 0xFF800000
li t0, 0xFFFF0000
and a1, a1, a2
srli a1, a1, 8 # Divide by 0x100
add a1, a1, a0
and a0, a1, t0
j end
is_inf_or_nan:
la a0, nanmsg
li a7, 4 # system call for printing string
ecall
j end
.....
```
When we look at the section of code for checking infinity or NaN, we can observe the instruction `beq a3, a2, is_inf_or_nan`. Since `a3` is a register used to store the exponent, we can directly use beq (branch if equal) to check if `a3` is equal to `a2`. If they are equal, it indicates that the value is either NaN or infinity. This can be confirmed using Ripes.

However, this branch instruction does not take place
:::
# Problem C
Assuming that special values such as NaN and INF do not appear during calculations, the following C code attempts to implement single-precision floating-point multiplication in a minimal way. There is also no overflow.
```cpp
#include <stdio.h>
#include <stdint.h>
uint64_t mask_lowest_zero(uint64_t x)
{
uint64_t mask = x;
mask &= (mask << 1) | 0x1;
mask &= (mask << 2) | 0x3;
mask &= (mask << 4) | 0xF;
mask &= (mask << 8) | 0xFF;
mask &= (mask << 16) | 0xFFFF;
mask &= (mask << 32) | 0xFFFFFFFF;
return mask;
}
int64_t inc(int64_t x)
{
if (~x == 0)
return 0;
/* TODO: Carry flag */
int64_t mask = mask_lowest_zero(x);
int64_t z1 = mask ^ ((mask << 1) | 1);
return (x & ~mask) | z1;
}
static inline int64_t getbit(int64_t value, int n)
{
return (value >> n) & 1;
}
/* int32 multiply */
int64_t imul32(int32_t a, int32_t b)
{
int64_t r = 0, a64 = (int64_t) a, b64 = (int64_t) b;
for (int i = 0; i < 32; i++) {
if (getbit(b64, i))
r += a64 << i;
}
return r;
}
/* float32 multiply */
float fmul32(float a, float b)
{
/* TODO: Special values like NaN and INF */
int32_t ia = *(int32_t *) &a, ib = *(int32_t *) &b;
/* sign */
int sa = ia >> 31;
int sb = ib >> 31;
/* mantissa */
int32_t ma = (ia & 0x7FFFFF) | 0x800000;
int32_t mb = (ib & 0x7FFFFF) | 0x800000;
/* exponent */
int32_t ea = ((ia >> 23) & 0xFF);
int32_t eb = ((ib >> 23) & 0xFF);
/* 'r' = result */
int64_t mrtmp = imul32(ma, mb) >> 23;
int mshift = getbit(mrtmp, C01);
int64_t mr = mrtmp >> mshift;
int32_t ertmp = ea + eb - C02;
int32_t er = mshift ? inc(ertmp) : ertmp;
/* TODO: Overflow ^ */
int sr = sa ^ sb;
int32_t r = (sr << C03) | ((er & 0xFF) << C04) | (mr & 0x7FFFFF);
return *(float *) &r;
}
```
Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. C01, C02, C03, and C04 are decimal integer literals in C. You must obey the following rules when filling them:
* Write shorter code as possible as you can.
* No space character
C01 ?=
==`24`==
C02 ?=
==`127`==
C03 ?=
==`31`==
C04 ?=
==`23`==
:::info
### π Explain and Annotate
`C01` is configured to adapt to the **FP32** floating-point format. This is because when multiplying two floating-point numbers with `23*2`, it involves the multiplication of two significands. Therefore, it is set to `24` (fraction bits + sign bits) to facilitate subsequent calculations.
`C02` is configured because it represents the offset of the exponent bits in the floating-point number.
$$
biased\ exponent = real\ exponent + 127
$$
If we simply add the two exponent bits `ea` and `eb` together, the bias will be added twice, and we will get
$$
real\ exponent_a + 127 + real\ exponent_b + 127 = real\ exponent_a + real\ exponent_b + 254
$$
Therefore, C02 is set to 127
In summary, to assemble the final result, we need to combine the sign bit (`sr`), the biased exponent (`ertmp`, which has already been adjusted if necessary), and the normalized mantissa (`mrtmp)`. Therefore, `C03` and `C04` should be set to `31` (for the most significant bit, representing the sign bit) and `23` (for the exponent part, which includes the `8` bits following the most significant bit), respectively.
:::
# Problem D
Let us endeavor to ascertain [endianness](https://en.wikipedia.org/wiki/Endianness) at compile time. When the need arises for compile-time endianness determination, it typically falls into one of two distinct use cases:
* In select, infrequent scenarios, one must discern endianness to properly structure data layouts, such as when dealing with a union like [RGB](https://en.wikipedia.org/wiki/RGB_color_spaces), or when endianness considerations are integral to the code logic.
* In the majority of cases, the primary requirement is to facilitate the conversion between different endiannesses, such as when a communication protocol dictates that a value must be stored in a specific endianness.
In most prevalent use cases, the objective is to facilitate the conversion between little-endian and big-endian formats, as well as potentially to and from the host endianness. For this purpose, we shall introduce endian conversion functions, which shall be denoted by the `end_ prefix`.
```cpp
/* Return a value in which the order of the bytes in 4-byte arguments is reversed. */
static inline uint32_t end_bswap32(uint32_t x)
{
return (x >> 24) | (x >> 8 & D01) | (x << 8 & D02) |
(x << 24);
}
/* Host to Big Endian 32-bit */
static inline uint32_t end_htobe32(uint32_t n)
{
union {
int i;
char c;
} u = {1};
return u.c ? D03 : D04;
}
/* Host to Little Endian 32-bit */
static inline uint32_t end_htole32(uint32_t n)
{
union {
int i;
char c;
} u = {1};
return u.c ? D05 : D06;
}
```
You shall provide the functioned implementations. Both `D01` and `D02` are hexadecimal integer literals, meaning that they should start with 0x.` D03`, `D04`, `D05`, and `D06` are C expressions. You might consider to call end_bswap32 function when it is necessary. You must obey the following rules when filling them:
* Write shorter code as possible as you can.
* Do necessary casting to eliminate compilation warnings.
* Follow the consistent coding style. That is, we prefer `end_bswap32(n)` to `end_bswap32( n )`. Be aware of the **spaces**! Details determine success or failure.
D01 ? = ==`0xff00`==
D02 ? = ==`0xff0000`==
D03 ? = ==`end_bswap32(n)`==
D04 ? = ==`n`==
D05 ? = ==`n`==
D06 ? = ==`end_bswap32(n)`==
:::info
### π Explain and Annotate
* Endianness

Little endian: In little endian machines, last byte of binary representation of the multibyte data-type is stored first.
Big endian: In big endian machines, first byte of binary representation of the multibyte data-type is stored first.
In the `end_bswap32(n)` function, we want to achieve byte swapping. So, when we perform the code `(x >> 8 & D01)`, it means shifting the original value to the right by eight bits. If the original value is` 0x00AABBCC`, after this operation, we will get` 0x00BB0000`.
Similarly, `(x << 8 & D02)` is a mask that shifts the original value to the left by eight bits and performs a bitwise AND operation. This operation results in `0x00CC0000`.
`D03` and `D04` are used for converting from little-endian to big-endian . Therefore, D03 involves using the byte reversal function` end_bswap32(n)`, while D04 is not needed because it's already in big-endian .
`D05` and `D06` are used for converting to little-endian . `D05` represents the value` n`, while `D06` represents the result of the function `end_bswap32(n)`.
By the way, it's worth noting that RISC-V is little-endian.
:::
# Problem E
Arithmetic overflow manifests when the outcome of a computation cannot be adequately represented within the constraints of the prevailing encoding scheme, consequently falling beyond the range of representable values, thus yielding an erroneous result.
Distinct categories of overflow can be delineated as follows:
* Unsigned Overflow: Occurs when the resultant value extends beyond the interval . An observable indication of unsigned overflow emerges when the addition of two numbers yields a result smaller than either of the operands
* Signed Overflow: Takes place when the outcome extends outside the span of A discernible indicator of signed overflow arises when two numbers of the same sign are added, yet the resultant sum bears the opposite sign.

1.Find the largest 8-bit unsigned numeral `c` (answer in HEX) such that `c + 0x80` causes NEITHER signed nor unsigned overflow in 8 bits
* E01 = ?
==`0x7F`==
2.Find the smallest 8-bit numeral c (answer in HEX) such that c + 0x71 causes signed overflow, but NOT unsigned overflow in 8 bits.
* E02 = ?
==`0xF`==
:::info
### π Explain and Annotate
The image above explains the principles of Unsigned and Signed Overflow.
E01: We need to find the largest integer that, when added to `0x80`, will not result in an overflow.
In an 8-bit unsigned integer, the maximum value is `11111111`, where `0x80` represents `10000000`. Therefore, subtracting` 10000000` from `11111111` gives us `01111111`, which is the largest integer without causing an overflow, and it is equal to `0x7f`.
E02:We need to find the smallest 8-bit integer that, when added to `0x71`, will result in a signed overflow without causing an unsigned overflow.
To achieve this, we must ensure that there is no carry-out from the most significant bit (MSB) to trigger the signed overflow. Starting from `0x71`, we look for the first integer that leads to a signed overflow. The integer `0x80` is the first one that causes a signed overflow. Calculating their difference: `0x80` - `0x71` = `0xF`.
Therefore, the answer is `0xF`, which is an 8-bit integer. When added to `0x71`, it causes a signed overflow but does not cause an unsigned overflow.
:::
# ProblemF
The subsequent function, denoted as absf, yields the absolute value of a single-precision floating-point number. What is the hexadecimal literal representation of the value denoted as `F01`?
```cpp
#include <stdint.h>
float absf(float x)
{
uint32_t mask = F01;
union {
uint32_t i;
float f;
} u = {.f = x};
u.i &= mask;
return u.f;
}
```
* F01 =?
==`0x7fffffff`==
:::info
### π Explain and Annotate
* IEEE754 floating point single-precision

The first bit in a floating-point representation typically represents the sign bit. To obtain the absolute value of a floating-point number, you should set the first bit to `0`. Therefore, for a 32-bit representation, it would be `01111111111111111111111111111111`, which is `0x7FFFFFFF`.
To ensure that the sign bit is set to `0` and achieve the absolute value, you can use a bitwise AND operation (`&`). This operation guarantees that the sign bit becomes 0 since any bit AND with `0` results in `0`. This ensures that the result is the absolute value of the original number (`+`).
:::
## Reference
* [Population Count](https://www.chessprogramming.org/Population_Count)
* [Little and Big Endian Mystery](https://www.geeksforgeeks.org/little-and-big-endian-mystery/)
* [bfloat16 floating-point format](https://en.wikipedia.org/wiki/Bfloat16_floating-point_format)
* [Quiz1 of Computer Architecture (2023 Fall)
Solutions](https://hackmd.io/@sysprog/arch2023-quiz1-sol#Problem-E)
* [Converting float to int in C](https://onestepcode.com/float-to-int-c/)
* [Population Count](https://en.wikichip.org/wiki/population_count)
[Assignment1: RISC-V Assembly and Instruction Pipeline](https://hackmd.io/@sysprog/2023-arch-homework1)