Consider a C implementation of the count leading zero function for 64-bit integers. A leading zero is defined as any ‘0’ digit that appears before the first non-zero digit in the binary representation of a number.
Examples:Input :
Explanation: As Binary = (00000000000000000000000000000000 00000000000000000000000000010000)
Input :
Explanation: As Binary =(00000000000000000000000000000000 00000000000000000000000000100001)
The implementation utilizes population count,
which counts the number of set bits in the given value.
The source code is listed below, and you shall fill in parts A01 and A02.
#include <stdint.h>
uint16_t count_leading_zeros(uint64_t x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
x |= (x >> 32);
/* count ones (population count) */
x -= ((x >> 1) & A01 /* Fill this! */ );
x = ((x >> 2) & 0x3333333333333333) + (x & A02 /* Fill this! */);
x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
x += (x >> 8);
x += (x >> 16);
return (64 - (x & 0x7f));
}
Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. A01 and A02 are hexadecimal literals in C. You must obey the following rules when filling them:
You can read the content of Population count and find the section titled “The PopCount routine,” then select the constants.
A01 ?=
0x5555555555555555
A02 ?=
0x3333333333333333
The canonical algorithm examines one bit at a time starting from the MSB until a non-zero bit is found。
The divide and conquer SWAR-approach deals with counting bits of duos, to aggregate the duo-counts to nibbles and bytes inside one 64-bit register in parallel, to finally sum all bytes together.
The code x -= ((x >> 1) & A01)
uses a mask A01 to retain only one bit out of every pair of adjacent bits in the variable x. For example, if we have an 8-bit binary number 10101010
, right-shifting it by one position results in 01010101
, which serves as our mask. When extended to 64 bits, it becomes 0x5555555555555555
.
CLZ Assembly simulation for Ripes RV32I
...
#s0 is x
add s0,x0,a0
#x|=(x>>1)
srli t0,s0,1
or s0,s0,t0
#x|=(x>>2)
srli t0,s0,2
or s0,s0,t0
#x|=(x>>4)
srli t0,s0,4
or s0,s0,t0
#x|=(x>>8)
srli t0,s0,8
or s0,s0,t0
#x|=(x>>16)
srli t0,s0,16
or s0,s0,t0
#x -= ((x>>1) & 0x55555555)
li t1,0x55555555
srli t0,s0,1
and t0,t0,t1
sub s0,s0,t0
#x = ((x>>2) & 0x33333333)+(x &0x33333333)
li t1,0x33333333
srli t0,s0,2
and t0,t0,t1
and t1,s0,t1
add s0,t0,t1
....
Turning it into a pseudocode function using a simple number to calculate its leading zeros: When the PC
holds the next instruction memory address 0x0000005c
preparing to execute the next instruction li t1, 0x55555555
, it can be observed through Ripes that the pipeline is as follows:
It was observed that this instruction is executed in the 23rd
cycle and does not encounter Structural Hazards or Control Hazards. However, in the case of Data Hazards, using Ripes is insufficient for resolving the issue since it cannot be resolved through bypassing.
Consider a C program which converts single precision floating point values to the corresponding bfloat16 floating-point format.
Column 1 | Column 2 |
---|---|
1.200000 | 0x3f99999a |
1.203125 | 0x3f9a0000 |
2.310000 | 0x4013d70a |
2.312500 | 0x40140000 |
3.460000 | 0x405d70a4 |
3.453125 | 0x405d0000 |
5.630000 | 0x40b428f6 |
The source code is listed below, and you shall fill in parts B01 and B02.
float fp32_to_bf16(float x)
{
float y = x;
int *p = (int *) &y;
unsigned int exp = *p & 0x7F800000;
unsigned int man = *p & 0x007FFFFF;
if (exp == 0 && man == 0) /* zero */
return x;
if (exp == B01 /* Fill this! */) /* infinity or NaN */
return x;
/* Normalized number */
/* round to nearest */
float r = x;
int *pr = (int *) &r;
*pr &= 0xFF800000; /* r has the same exp as x */
r /= B02 /* Fill this! */;
y = x + r;
*p &= 0xFFFF0000;
return y;
}
Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. B01 and B02 are hexadecimal literals in C. You must obey the following rules when filling them:
B01 ?=
0x7F800000
B02 ?=
0x100
FP32 versus BF16
FP32:32 bits floating point for IEEE754
format the single-precision floating-point follow the picture:
BF16:The original IEEE FP16 was not designed with deep learning applications in mind, its dynamic range is too narrow. BFLOAT16 solves this, providing dynamic range identical to that of FP32.
So BF16 has:
The bfloat16 format, being a truncated IEEE 754 FP32, allows for fast conversion to and from an IEEE 754 FP32
The code (if exp == B01)// infinity or NaN
is used to determine whether a floating-point number y is either infinity or NaN in the IEEE 754 standard. When the exponent bits are set to111111111
(binary) or 0x7F800000
(hexadecimal), it is used to check for the conditions of infinity or NaN. In this context, the binary value B01 corresponds to 0x7F800000
, where the exponent bits are all ones, and the fraction bits are all zeros.
The statement r /= B02
is intended to facilitate the conversion from FP32 (Single Precision Floating-Point) to BF16 (Brain Floating-Point 16-bit) in order to preserve the precision of the floating-point number as much as possible. It involves right-shifting the FP32 value by 8 bits to perform a rounding operation.
In RISC-V assembly language, we can write a simple assembly code to observe its pipeline stages and memory addresses
.....
# Check if it is infinity or NaN
li a2, 0x7F800000
beq a3, a2, is_inf_or_nan
# Perform rounding
li a2, 0xFF800000
li t0, 0xFFFF0000
and a1, a1, a2
srli a1, a1, 8 # Divide by 0x100
add a1, a1, a0
and a0, a1, t0
j end
is_inf_or_nan:
la a0, nanmsg
li a7, 4 # system call for printing string
ecall
j end
.....
When we look at the section of code for checking infinity or NaN, we can observe the instruction beq a3, a2, is_inf_or_nan
. Since a3
is a register used to store the exponent, we can directly use beq (branch if equal) to check if a3
is equal to a2
. If they are equal, it indicates that the value is either NaN or infinity. This can be confirmed using Ripes.
However, this branch instruction does not take place
Assuming that special values such as NaN and INF do not appear during calculations, the following C code attempts to implement single-precision floating-point multiplication in a minimal way. There is also no overflow.
#include <stdio.h>
#include <stdint.h>
uint64_t mask_lowest_zero(uint64_t x)
{
uint64_t mask = x;
mask &= (mask << 1) | 0x1;
mask &= (mask << 2) | 0x3;
mask &= (mask << 4) | 0xF;
mask &= (mask << 8) | 0xFF;
mask &= (mask << 16) | 0xFFFF;
mask &= (mask << 32) | 0xFFFFFFFF;
return mask;
}
int64_t inc(int64_t x)
{
if (~x == 0)
return 0;
/* TODO: Carry flag */
int64_t mask = mask_lowest_zero(x);
int64_t z1 = mask ^ ((mask << 1) | 1);
return (x & ~mask) | z1;
}
static inline int64_t getbit(int64_t value, int n)
{
return (value >> n) & 1;
}
/* int32 multiply */
int64_t imul32(int32_t a, int32_t b)
{
int64_t r = 0, a64 = (int64_t) a, b64 = (int64_t) b;
for (int i = 0; i < 32; i++) {
if (getbit(b64, i))
r += a64 << i;
}
return r;
}
/* float32 multiply */
float fmul32(float a, float b)
{
/* TODO: Special values like NaN and INF */
int32_t ia = *(int32_t *) &a, ib = *(int32_t *) &b;
/* sign */
int sa = ia >> 31;
int sb = ib >> 31;
/* mantissa */
int32_t ma = (ia & 0x7FFFFF) | 0x800000;
int32_t mb = (ib & 0x7FFFFF) | 0x800000;
/* exponent */
int32_t ea = ((ia >> 23) & 0xFF);
int32_t eb = ((ib >> 23) & 0xFF);
/* 'r' = result */
int64_t mrtmp = imul32(ma, mb) >> 23;
int mshift = getbit(mrtmp, C01);
int64_t mr = mrtmp >> mshift;
int32_t ertmp = ea + eb - C02;
int32_t er = mshift ? inc(ertmp) : ertmp;
/* TODO: Overflow ^ */
int sr = sa ^ sb;
int32_t r = (sr << C03) | ((er & 0xFF) << C04) | (mr & 0x7FFFFF);
return *(float *) &r;
}
Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. C01, C02, C03, and C04 are decimal integer literals in C. You must obey the following rules when filling them:
C01 ?=
24
C02 ?=
127
C03 ?=
31
C04 ?=
23
C01
is configured to adapt to the FP32 floating-point format. This is because when multiplying two floating-point numbers with 23*2
, it involves the multiplication of two significands. Therefore, it is set to 24
(fraction bits + sign bits) to facilitate subsequent calculations.
C02
is configured because it represents the offset of the exponent bits in the floating-point number.
If we simply add the two exponent bits ea
and eb
together, the bias will be added twice, and we will get
Therefore, C02 is set to 127
In summary, to assemble the final result, we need to combine the sign bit (sr
), the biased exponent (ertmp
, which has already been adjusted if necessary), and the normalized mantissa (mrtmp)
. Therefore, C03
and C04
should be set to 31
(for the most significant bit, representing the sign bit) and 23
(for the exponent part, which includes the 8
bits following the most significant bit), respectively.
Let us endeavor to ascertain endianness at compile time. When the need arises for compile-time endianness determination, it typically falls into one of two distinct use cases:
In most prevalent use cases, the objective is to facilitate the conversion between little-endian and big-endian formats, as well as potentially to and from the host endianness. For this purpose, we shall introduce endian conversion functions, which shall be denoted by the end_ prefix
.
/* Return a value in which the order of the bytes in 4-byte arguments is reversed. */
static inline uint32_t end_bswap32(uint32_t x)
{
return (x >> 24) | (x >> 8 & D01) | (x << 8 & D02) |
(x << 24);
}
/* Host to Big Endian 32-bit */
static inline uint32_t end_htobe32(uint32_t n)
{
union {
int i;
char c;
} u = {1};
return u.c ? D03 : D04;
}
/* Host to Little Endian 32-bit */
static inline uint32_t end_htole32(uint32_t n)
{
union {
int i;
char c;
} u = {1};
return u.c ? D05 : D06;
}
You shall provide the functioned implementations. Both D01
and D02
are hexadecimal integer literals, meaning that they should start with 0x. D03
, D04
, D05
, and D06
are C expressions. You might consider to call end_bswap32 function when it is necessary. You must obey the following rules when filling them:
end_bswap32(n)
to end_bswap32( n )
. Be aware of the spaces! Details determine success or failure.D01 ? = 0xff00
D02 ? = 0xff0000
D03 ? = end_bswap32(n)
D04 ? = n
D05 ? = n
D06 ? = end_bswap32(n)
Endianness
Little endian: In little endian machines, last byte of binary representation of the multibyte data-type is stored first.
Big endian: In big endian machines, first byte of binary representation of the multibyte data-type is stored first.
In the end_bswap32(n)
function, we want to achieve byte swapping. So, when we perform the code (x >> 8 & D01)
, it means shifting the original value to the right by eight bits. If the original value is 0x00AABBCC
, after this operation, we will get 0x00BB0000
.
Similarly, (x << 8 & D02)
is a mask that shifts the original value to the left by eight bits and performs a bitwise AND operation. This operation results in 0x00CC0000
.
D03
and D04
are used for converting from little-endian to big-endian . Therefore, D03 involves using the byte reversal function end_bswap32(n)
, while D04 is not needed because it's already in big-endian .
D05
and D06
are used for converting to little-endian . D05
represents the value n
, while D06
represents the result of the function end_bswap32(n)
.
By the way, it's worth noting that RISC-V is little-endian.
Arithmetic overflow manifests when the outcome of a computation cannot be adequately represented within the constraints of the prevailing encoding scheme, consequently falling beyond the range of representable values, thus yielding an erroneous result.
Distinct categories of overflow can be delineated as follows:
1.Find the largest 8-bit unsigned numeral c
(answer in HEX) such that c + 0x80
causes NEITHER signed nor unsigned overflow in 8 bits
0x7F
2.Find the smallest 8-bit numeral c (answer in HEX) such that c + 0x71 causes signed overflow, but NOT unsigned overflow in 8 bits.
0xF
The image above explains the principles of Unsigned and Signed Overflow.
E01: We need to find the largest integer that, when added to 0x80
, will not result in an overflow.
In an 8-bit unsigned integer, the maximum value is 11111111
, where 0x80
represents 10000000
. Therefore, subtracting 10000000
from 11111111
gives us 01111111
, which is the largest integer without causing an overflow, and it is equal to 0x7f
.
E02:We need to find the smallest 8-bit integer that, when added to 0x71
, will result in a signed overflow without causing an unsigned overflow.
To achieve this, we must ensure that there is no carry-out from the most significant bit (MSB) to trigger the signed overflow. Starting from 0x71
, we look for the first integer that leads to a signed overflow. The integer 0x80
is the first one that causes a signed overflow. Calculating their difference: 0x80
- 0x71
= 0xF
.
Therefore, the answer is 0xF
, which is an 8-bit integer. When added to 0x71
, it causes a signed overflow but does not cause an unsigned overflow.
The subsequent function, denoted as absf, yields the absolute value of a single-precision floating-point number. What is the hexadecimal literal representation of the value denoted as F01
?
#include <stdint.h>
float absf(float x)
{
uint32_t mask = F01;
union {
uint32_t i;
float f;
} u = {.f = x};
u.i &= mask;
return u.f;
}
0x7fffffff
The first bit in a floating-point representation typically represents the sign bit. To obtain the absolute value of a floating-point number, you should set the first bit to 0
. Therefore, for a 32-bit representation, it would be 01111111111111111111111111111111
, which is 0x7FFFFFFF
.
To ensure that the sign bit is set to 0
and achieve the absolute value, you can use a bitwise AND operation (&
). This operation guarantees that the sign bit becomes 0 since any bit AND with 0
results in 0
. This ensures that the result is the absolute value of the original number (+
).