Try   HackMD

Computer Architecture Final project: Annotate and explain Quiz1

Problem A

Consider a C implementation of the count leading zero function for 64-bit integers. A leading zero is defined as any ‘0’ digit that appears before the first non-zero digit in the binary representation of a number.

Examples:Input :

  • N = 16 Output : 59

Explanation: As Binary = (00000000000000000000000000000000 00000000000000000000000000010000)

Input :

  • N = 33 Output : 58

Explanation: As Binary =(00000000000000000000000000000000 00000000000000000000000000100001)

The implementation utilizes population count,
which counts the number of set bits in the given value.
The source code is listed below, and you shall fill in parts A01 and A02.

#include <stdint.h>
uint16_t count_leading_zeros(uint64_t x)
{
    x |= (x >> 1);
    x |= (x >> 2);
    x |= (x >> 4);
    x |= (x >> 8);
    x |= (x >> 16);
    x |= (x >> 32);

    /* count ones (population count) */
    x -= ((x >> 1) & A01 /* Fill this! */ );
    x = ((x >> 2) & 0x3333333333333333) + (x & A02 /* Fill this! */);
    x = ((x >> 4) + x) & 0x0f0f0f0f0f0f0f0f;
    x += (x >> 8);
    x += (x >> 16);

    return (64 - (x & 0x7f));
}

Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. A01 and A02 are hexadecimal literals in C. You must obey the following rules when filling them:

  • Write shorter code as possible as you can.
  • No space character

You can read the content of Population count and find the section titled “The PopCount routine,” then select the constants.

A01 ?=
0x5555555555555555
A02 ?=
0x3333333333333333

📝 Explain and Annotate

Count leading zeros:

The canonical algorithm examines one bit at a time starting from the MSB until a non-zero bit is found。

Population count:

The divide and conquer SWAR-approach deals with counting bits of duos, to aggregate the duo-counts to nibbles and bytes inside one 64-bit register in parallel, to finally sum all bytes together.

The code x -= ((x >> 1) & A01) uses a mask A01 to retain only one bit out of every pair of adjacent bits in the variable x. For example, if we have an 8-bit binary number 10101010, right-shifting it by one position results in 01010101, which serves as our mask. When extended to 64 bits, it becomes 0x5555555555555555 .

Ripes simulation for Assembly

CLZ Assembly simulation for Ripes RV32I

...
    
#s0 is x
add s0,x0,a0

#x|=(x>>1)
srli t0,s0,1
or s0,s0,t0

#x|=(x>>2)
srli t0,s0,2
or s0,s0,t0

#x|=(x>>4)
srli t0,s0,4
or s0,s0,t0

#x|=(x>>8)
srli t0,s0,8
or s0,s0,t0

#x|=(x>>16)
srli t0,s0,16
or s0,s0,t0

#x -= ((x>>1) & 0x55555555)
li t1,0x55555555
srli t0,s0,1
and t0,t0,t1
sub s0,s0,t0

#x = ((x>>2) & 0x33333333)+(x &0x33333333)
li t1,0x33333333
srli t0,s0,2
and t0,t0,t1
and t1,s0,t1
add s0,t0,t1

....

Turning it into a pseudocode function using a simple number to calculate its leading zeros: When the PC holds the next instruction memory address 0x0000005c preparing to execute the next instruction li t1, 0x55555555, it can be observed through Ripes that the pipeline is as follows:

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

It was observed that this instruction is executed in the 23rd cycle and does not encounter Structural Hazards or Control Hazards. However, in the case of Data Hazards, using Ripes is insufficient for resolving the issue since it cannot be resolved through bypassing.

Problem B

Consider a C program which converts single precision floating point values to the corresponding bfloat16 floating-point format.

Column 1 Column 2
1.200000 0x3f99999a
1.203125 0x3f9a0000
2.310000 0x4013d70a
2.312500 0x40140000
3.460000 0x405d70a4
3.453125 0x405d0000
5.630000 0x40b428f6

The source code is listed below, and you shall fill in parts B01 and B02.

float fp32_to_bf16(float x)                 
{
    float y = x;
    int *p = (int *) &y;
    unsigned int exp = *p & 0x7F800000;
    unsigned int man = *p & 0x007FFFFF;
    if (exp == 0 && man == 0) /* zero */
        return x;
    if (exp == B01 /* Fill this! */) /* infinity or NaN */
        return x;

    /* Normalized number */
    /* round to nearest */
    float r = x;
    int *pr = (int *) &r;
    *pr &= 0xFF800000;  /* r has the same exp as x */
    r /= B02 /* Fill this! */;
    y = x + r;

    *p &= 0xFFFF0000;

    return y;
}

Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. B01 and B02 are hexadecimal literals in C. You must obey the following rules when filling them:

  • Write shorter code as possible as you can.
  • No space character

B01 ?=
0x7F800000
B02 ?=
0x100

📝 Explain and Annotate

FP32 versus BF16
FP32:32 bits floating point for IEEE754 format the single-precision floating-point follow the picture:

  • 1 sign bits
  • 8 exponent bits
  • 23 fraction bits

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

BF16:The original IEEE FP16 was not designed with deep learning applications in mind, its dynamic range is too narrow. BFLOAT16 solves this, providing dynamic range identical to that of FP32.
So BF16 has:

  • 1 sign bits
  • 8 exponent bits
  • 7 fraction bits

Image Not Showing Possible Reasons
  • The image was uploaded to a note which you don't have access to
  • The note which the image was originally uploaded to has been deleted
Learn More →

The bfloat16 format, being a truncated IEEE 754 FP32, allows for fast conversion to and from an IEEE 754 FP32

The code (if exp == B01)// infinity or NaN is used to determine whether a floating-point number y is either infinity or NaN in the IEEE 754 standard. When the exponent bits are set to111111111(binary) or 0x7F800000 (hexadecimal), it is used to check for the conditions of infinity or NaN. In this context, the binary value B01 corresponds to 0x7F800000, where the exponent bits are all ones, and the fraction bits are all zeros.

The statement r /= B02 is intended to facilitate the conversion from FP32 (Single Precision Floating-Point) to BF16 (Brain Floating-Point 16-bit) in order to preserve the precision of the floating-point number as much as possible. It involves right-shifting the FP32 value by 8 bits to perform a rounding operation.

In RISC-V assembly language, we can write a simple assembly code to observe its pipeline stages and memory addresses

    .....
    # Check if it is infinity or NaN
    li a2, 0x7F800000
    beq a3, a2, is_inf_or_nan

    # Perform rounding
    li a2, 0xFF800000
    li t0, 0xFFFF0000
    and a1, a1, a2
    srli a1, a1, 8     # Divide by 0x100
    add a1, a1, a0
    and a0, a1, t0
    j end
    
    is_inf_or_nan:
    la a0, nanmsg
    li a7, 4  # system call for printing string
    ecall
    j end
    .....

When we look at the section of code for checking infinity or NaN, we can observe the instruction beq a3, a2, is_inf_or_nan. Since a3 is a register used to store the exponent, we can directly use beq (branch if equal) to check if a3 is equal to a2. If they are equal, it indicates that the value is either NaN or infinity. This can be confirmed using Ripes.
螢幕擷取畫面 2024-01-10 214856
However, this branch instruction does not take place

Problem C

Assuming that special values such as NaN and INF do not appear during calculations, the following C code attempts to implement single-precision floating-point multiplication in a minimal way. There is also no overflow.

#include <stdio.h>
#include <stdint.h>
uint64_t mask_lowest_zero(uint64_t x)
{
    uint64_t mask = x;
    mask &= (mask << 1) | 0x1;
    mask &= (mask << 2) | 0x3;
    mask &= (mask << 4) | 0xF;
    mask &= (mask << 8) | 0xFF;
    mask &= (mask << 16) | 0xFFFF;
    mask &= (mask << 32) | 0xFFFFFFFF;
    return mask;
}
int64_t inc(int64_t x)
{
    if (~x == 0)
        return 0;
    /* TODO: Carry flag */
    int64_t mask = mask_lowest_zero(x);
    int64_t z1 = mask ^ ((mask << 1) | 1);
    return (x & ~mask) | z1;
}
static inline int64_t getbit(int64_t value, int n)
{
    return (value >> n) & 1;
}
/* int32 multiply */
int64_t imul32(int32_t a, int32_t b)
{
    int64_t r = 0, a64 = (int64_t) a, b64 = (int64_t) b;
    for (int i = 0; i < 32; i++) {
        if (getbit(b64, i))
            r += a64 << i;
    }
    return r;
}
/* float32 multiply */
float fmul32(float a, float b)
{
    /* TODO: Special values like NaN and INF */
    int32_t ia = *(int32_t *) &a, ib = *(int32_t *) &b;

    /* sign */
    int sa = ia >> 31;
    int sb = ib >> 31;

    /* mantissa */
    int32_t ma = (ia & 0x7FFFFF) | 0x800000;
    int32_t mb = (ib & 0x7FFFFF) | 0x800000;

    /* exponent */
    int32_t ea = ((ia >> 23) & 0xFF);
    int32_t eb = ((ib >> 23) & 0xFF);

    /* 'r' = result */
    int64_t mrtmp = imul32(ma, mb) >> 23;
    int mshift = getbit(mrtmp, C01);

    int64_t mr = mrtmp >> mshift;
    int32_t ertmp = ea + eb - C02;
    int32_t er = mshift ? inc(ertmp) : ertmp;
    /* TODO: Overflow ^ */
    int sr = sa ^ sb;
    int32_t r = (sr << C03) | ((er & 0xFF) << C04) | (mr & 0x7FFFFF);
    return *(float *) &r;
}

Obviously, the above C code listing was incomplete, you shall provide the functioned implementations. C01, C02, C03, and C04 are decimal integer literals in C. You must obey the following rules when filling them:

  • Write shorter code as possible as you can.
  • No space character

C01 ?=
24
C02 ?=
127
C03 ?=
31
C04 ?=
23

📝 Explain and Annotate

C01 is configured to adapt to the FP32 floating-point format. This is because when multiplying two floating-point numbers with 23*2, it involves the multiplication of two significands. Therefore, it is set to 24 (fraction bits + sign bits) to facilitate subsequent calculations.

C02 is configured because it represents the offset of the exponent bits in the floating-point number.

biased exponent=real exponent+127
If we simply add the two exponent bits ea and eb together, the bias will be added twice, and we will get

real exponenta+127+real exponentb+127=real exponenta+real exponentb+254

Therefore, C02 is set to 127

In summary, to assemble the final result, we need to combine the sign bit (sr), the biased exponent (ertmp, which has already been adjusted if necessary), and the normalized mantissa (mrtmp). Therefore, C03 and C04 should be set to 31 (for the most significant bit, representing the sign bit) and 23 (for the exponent part, which includes the 8 bits following the most significant bit), respectively.

Problem D

Let us endeavor to ascertain endianness at compile time. When the need arises for compile-time endianness determination, it typically falls into one of two distinct use cases:

  • In select, infrequent scenarios, one must discern endianness to properly structure data layouts, such as when dealing with a union like RGB, or when endianness considerations are integral to the code logic.
  • In the majority of cases, the primary requirement is to facilitate the conversion between different endiannesses, such as when a communication protocol dictates that a value must be stored in a specific endianness.

In most prevalent use cases, the objective is to facilitate the conversion between little-endian and big-endian formats, as well as potentially to and from the host endianness. For this purpose, we shall introduce endian conversion functions, which shall be denoted by the end_ prefix.

/* Return a value in which the order of the bytes in 4-byte arguments is reversed. */
static inline uint32_t end_bswap32(uint32_t x)
{
    return (x >> 24) | (x >> 8 & D01) | (x << 8 & D02) |
           (x << 24);
}

/* Host to Big Endian 32-bit */
static inline uint32_t end_htobe32(uint32_t n)
{
    union {
        int i;
        char c;
    } u = {1};
    return u.c ? D03 : D04;
}

/* Host to Little Endian 32-bit */
static inline uint32_t end_htole32(uint32_t n)
{
    union {
        int i;
        char c;
    } u = {1};
    return u.c ? D05 : D06;
}

You shall provide the functioned implementations. Both D01 and D02 are hexadecimal integer literals, meaning that they should start with 0x. D03, D04, D05, and D06 are C expressions. You might consider to call end_bswap32 function when it is necessary. You must obey the following rules when filling them:

  • Write shorter code as possible as you can.
  • Do necessary casting to eliminate compilation warnings.
  • Follow the consistent coding style. That is, we prefer end_bswap32(n) to end_bswap32( n ). Be aware of the spaces! Details determine success or failure.

D01 ? = 0xff00
D02 ? = 0xff0000
D03 ? = end_bswap32(n)
D04 ? = n
D05 ? = n
D06 ? = end_bswap32(n)

📝 Explain and Annotate

  • Endianness

    lb

Little endian: In little endian machines, last byte of binary representation of the multibyte data-type is stored first.

Big endian: In big endian machines, first byte of binary representation of the multibyte data-type is stored first.

In the end_bswap32(n) function, we want to achieve byte swapping. So, when we perform the code (x >> 8 & D01), it means shifting the original value to the right by eight bits. If the original value is 0x00AABBCC, after this operation, we will get 0x00BB0000.

Similarly, (x << 8 & D02) is a mask that shifts the original value to the left by eight bits and performs a bitwise AND operation. This operation results in 0x00CC0000.

D03 and D04 are used for converting from little-endian to big-endian . Therefore, D03 involves using the byte reversal function end_bswap32(n), while D04 is not needed because it's already in big-endian .

D05 and D06 are used for converting to little-endian . D05 represents the value n, while D06 represents the result of the function end_bswap32(n).

By the way, it's worth noting that RISC-V is little-endian.

Problem E

Arithmetic overflow manifests when the outcome of a computation cannot be adequately represented within the constraints of the prevailing encoding scheme, consequently falling beyond the range of representable values, thus yielding an erroneous result.

Distinct categories of overflow can be delineated as follows:

  • Unsigned Overflow: Occurs when the resultant value extends beyond the interval image. An observable indication of unsigned overflow emerges when the addition of two numbers yields a result smaller than either of the operands
  • Signed Overflow: Takes place when the outcome extends outside the span of imageA discernible indicator of signed overflow arises when two numbers of the same sign are added, yet the resultant sum bears the opposite sign.

螢幕擷取畫面 2024-01-08 162136

1.Find the largest 8-bit unsigned numeral c (answer in HEX) such that c + 0x80 causes NEITHER signed nor unsigned overflow in 8 bits

  • E01 = ?
    0x7F

2.Find the smallest 8-bit numeral c (answer in HEX) such that c + 0x71 causes signed overflow, but NOT unsigned overflow in 8 bits.

  • E02 = ?
    0xF

📝 Explain and Annotate

The image above explains the principles of Unsigned and Signed Overflow.

E01: We need to find the largest integer that, when added to 0x80, will not result in an overflow.
In an 8-bit unsigned integer, the maximum value is 11111111, where 0x80 represents 10000000. Therefore, subtracting 10000000 from 11111111 gives us 01111111, which is the largest integer without causing an overflow, and it is equal to 0x7f.

E02:We need to find the smallest 8-bit integer that, when added to 0x71, will result in a signed overflow without causing an unsigned overflow.

To achieve this, we must ensure that there is no carry-out from the most significant bit (MSB) to trigger the signed overflow. Starting from 0x71, we look for the first integer that leads to a signed overflow. The integer 0x80 is the first one that causes a signed overflow. Calculating their difference: 0x80 - 0x71 = 0xF.

Therefore, the answer is 0xF, which is an 8-bit integer. When added to 0x71, it causes a signed overflow but does not cause an unsigned overflow.

ProblemF

The subsequent function, denoted as absf, yields the absolute value of a single-precision floating-point number. What is the hexadecimal literal representation of the value denoted as F01?

#include <stdint.h>
float absf(float x)
{
    uint32_t mask = F01;
    union {
        uint32_t i;
        float f;
    } u = {.f = x};
    u.i &= mask;
    return u.f;
}
  • F01 =?
    0x7fffffff

📝 Explain and Annotate

  • IEEE754 floating point single-precision

image
The first bit in a floating-point representation typically represents the sign bit. To obtain the absolute value of a floating-point number, you should set the first bit to 0. Therefore, for a 32-bit representation, it would be 01111111111111111111111111111111, which is 0x7FFFFFFF.
To ensure that the sign bit is set to 0 and achieve the absolute value, you can use a bitwise AND operation (&). This operation guarantees that the sign bit becomes 0 since any bit AND with 0 results in 0. This ensures that the result is the absolute value of the original number (+).

Reference