contributed by < vestata >
fp32_to_bf16
This code is a function written in C that converts a 32-bit floating-point number (FP32) into a 16-bit floating-point format called bfloat16 (BF16).
Recall that:
Handling the special case: In the bfloat16 format, a value is NaN (Not a Number) when the exponent is all 1s (`11111111` in binary) and the mantissa (fraction) is not entirely zero. This mirrors how NaNs are represented in the IEEE 754 floating-point formats.
If the exponent is all 1s but the mantissa is completely zero, the value represents infinity (positive or negative, depending on the sign bit). To distinguish NaN from infinity, at least one bit in the mantissa must be set to 1.
In particular, setting the most significant bit of the 7-bit mantissa to 1 with `| 64` ensures that the value is treated as NaN instead of infinity. Reference: Single-precision floating-point format (Wikipedia).
In the normal case: The expression `(u.i >> 0x10) & 1` checks whether the truncated bfloat16 result would be even or odd. It does this by examining bit 16 of `u.i`, which is the least significant bit of the upper 16 bits that are kept (the lower 16 bits are the ones discarded during the conversion). If the truncated value is odd (i.e., the result of the check is 1), the code adds an extra 1 to `u.i` as part of the rounding step. After this adjustment, the value is right-shifted by 16 bits (`>> 0x10`), converting the 32-bit float to bfloat16 by keeping the upper 16 bits.
This ensures that the conversion handles rounding by following the "round-to-nearest-even" rule.
The bfloat16 is stored in the lower 16 bits of a 32-bit word.
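The two cases described above can be put together in a short C sketch. It is an illustration under assumptions, not necessarily the exact code being discussed: the `bf16_t` struct, the helper name `fp32_bits_to_bf16_bits`, and the rounding bias `0x7fff` (to which the parity bit is added) are not spelled out in the text.

```c
#include <stdint.h>

/* Assumed container type: the bfloat16 bits live in a 16-bit field. */
typedef struct {
    uint16_t bits;
} bf16_t;

/* Hypothetical helper that works on raw bit patterns, which makes it easy to test. */
static inline uint16_t fp32_bits_to_bf16_bits(uint32_t i)
{
    /* NaN: exponent all 1s and a non-zero mantissa. */
    if ((i & 0x7fffffffu) > 0x7f800000u)
        return (uint16_t) ((i >> 16) | 64); /* set the top mantissa bit so it stays NaN */

    /* Round to nearest even: bias 0x7fff (assumed) plus the LSB of the kept upper half. */
    uint32_t lsb = (i >> 0x10) & 1;
    return (uint16_t) ((i + 0x7fffu + lsb) >> 0x10);
}

static inline bf16_t fp32_to_bf16(float s)
{
    union {
        float f;
        uint32_t i;
    } u = {.f = s};
    bf16_t h = {.bits = fp32_bits_to_bf16_bits(u.i)};
    return h;
}
```

With this sketch, the bit pattern `0x3f808000` lies exactly halfway between two bfloat16 values and rounds down to the even `0x3f80`, whereas `0x3f818000` rounds up to the even `0x3f82`.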
Handling the union in RISC-V
Info | Value |
---|---|
Cycles | 267 |
Instrs. retired | 201 |
CPI | 1.33 |
IPC | 0.753 |
Clock rate | 0 Hz |
Using bitmasks to implement clz is an easy way to speed up the code.
Info | Value |
---|---|
Cycles | 46 |
Instrs. retired | 32 |
CPI | 1.44 |
IPC | 0.696 |
Clock rate | 0 Hz |
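As a rough illustration of the bitmask idea, clz can be computed with a fixed sequence of mask tests instead of a bit-by-bit loop. The function name `clz32` and the exact masks are assumptions for this sketch; the measured RISC-V code may be organized differently.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value with a binary search over bitmasks. */
static inline int clz32(uint32_t x)
{
    if (x == 0)
        return 32;
    int n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; } /* top 16 bits empty */
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  } /* top 8 bits empty */
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  } /* top 4 bits empty */
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  } /* top 2 bits empty */
    if ((x & 0x80000000u) == 0) { n += 1; }            /* top bit empty */
    return n;
}
```

Each test halves the remaining search range, so the work is bounded by five comparisons regardless of the input.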
Description:
Alice and Bob are playing a game. Initially, Alice has a string `word = "a"`.
You are given a positive integer `k`. You are also given an integer array `operations`, where `operations[i]` represents the type of the `i`th operation.
Now Bob will ask Alice to perform all operations in sequence:
If `operations[i] == 0`, append a copy of `word` to itself.
If `operations[i] == 1`, generate a new string by changing each character in `word` to its next character in the English alphabet, and append it to the original `word`. For example, performing the operation on `"c"` generates `"cd"` and performing the operation on `"zb"` generates `"zbac"`.
Return the value of the `k`th character in `word` after performing all the operations.
Note that the character `'z'` can be changed to `'a'` in the second type of operation.
Solution:
We need to find how many times `a` changes before it reaches the input position `k`. The first step is to calculate how long the word has to grow to contain that position. Since the word initially is `a`, its length is two after the first operation and four after the second: the length doubles with every operation. Therefore, the smallest word that contains position `k` has length $2^{\lceil \log_2 k \rceil}$, which corresponds to $\lceil \log_2 k \rceil$ operations.
The next step is to determine, for each of these operations starting from the last one, which half of the string position `k` falls in. If `k` is in the right half of the word, we increase the change count by one when `operations[i] == 1`, then subtract the length of the left half from `k` to find the position of its parent character. A for loop goes through all the operations.
In the final step, apply the accumulated changes to the character `a`.
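The steps above can be sketched in C as follows. This is a minimal illustration, not the author's full code: the function name `kth_character` and its signature are assumptions, and a plain loop stands in for the `ilog2`-based length calculation.

```c
/* Hypothetical signature: k is 1-based, operations has operations_size entries. */
static char kth_character(long long k, const int *operations, int operations_size)
{
    /* Number of operations needed so that the word length 2^n reaches k. */
    int n = 0;
    while (n < operations_size && (1LL << n) < k)
        n++;

    int changes = 0;
    /* Walk back from the last relevant operation to the first. */
    for (int i = n - 1; i >= 0; i--) {
        long long half = 1LL << i; /* word length before operation i */
        if (k > half) {            /* k lies in the right (appended) half */
            k -= half;             /* position of the parent character */
            changes += operations[i];
        }
    }
    /* Apply the accumulated changes to 'a', wrapping from 'z' back to 'a'. */
    return (char) ('a' + changes % 26);
}
```

For `k = 10` and `operations = [0, 1, 0, 1]`, position 10 falls in the right half of operation 3 (type 1) and, after reduction to position 2, in the right half of operation 0 (type 0), giving one change and the result `'b'`.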
The relationship between `ilog2` (integer log base 2) and `clz` (count leading zeros) stems from the fact that the integer logarithm base 2 of a number is the position of the highest set bit in its binary representation.
The `log2()` function can be defined in terms of a custom function, `my_clz()`.
Notice that the input of this question is a `long long`, so we have to modify `my_clz()` and `ilog2()` to support the 64-bit format.
The full code is right here.
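Separately from the full code, the link between the two functions can be shown in a few lines. The names `my_clz64` and `ilog2_64` and their signatures are assumptions made for this sketch; the idea is only that the highest set bit of a 64-bit value sits at index `63 - clz`.

```c
#include <stdint.h>

/* Count leading zeros of a 64-bit value; returns 64 for an input of 0. */
static inline int my_clz64(uint64_t x)
{
    if (x == 0)
        return 64;
    int n = 0;
    /* Test the top w bits for w = 32, 16, 8, 4, 2, 1 and shift them out if empty. */
    for (int w = 32; w > 0; w >>= 1) {
        if ((x >> (64 - w)) == 0) {
            n += w;
            x <<= w;
        }
    }
    return n;
}

/* Integer log base 2 (index of the highest set bit); x must be non-zero. */
static inline int ilog2_64(uint64_t x)
{
    return 63 - my_clz64(x);
}
```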
Since we have to handle `long long`, which is 64-bit data, we need to modify the `rv32i` code to use two registers for each `long long` value. Following the C code, we have defined three `long long` operations: `shift_left_64`, `blt_64`, and `sub_64`.
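To make the two-register idea concrete, the three operations can be modeled in C on a pair of 32-bit halves, which is roughly what the assembly has to do with a low and a high register. The `pair64_t` type and the exact signatures are assumptions for this sketch; only the operation names come from the text.

```c
#include <stdint.h>

/* A 64-bit value split into two 32-bit "registers". */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair64_t;

/* Shift left by s bits (0..63), carrying bits from lo into hi. */
static pair64_t shift_left_64(pair64_t a, unsigned s)
{
    pair64_t r = a;
    s &= 63;
    if (s == 0)
        return r;
    if (s >= 32) {
        r.hi = a.lo << (s - 32);
        r.lo = 0;
    } else {
        r.hi = (a.hi << s) | (a.lo >> (32 - s));
        r.lo = a.lo << s;
    }
    return r;
}

/* Signed "less than": compare the high halves as signed, the low halves as unsigned. */
static int blt_64(pair64_t a, pair64_t b)
{
    if (a.hi != b.hi)
        return (int32_t) a.hi < (int32_t) b.hi;
    return a.lo < b.lo;
}

/* Subtract b from a, propagating the borrow from lo to hi. */
static pair64_t sub_64(pair64_t a, pair64_t b)
{
    pair64_t r;
    uint32_t borrow = a.lo < b.lo;
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;
    return r;
}
```

The borrow in `sub_64` and the cross-half carry in `shift_left_64` are the parts that a single 32-bit instruction cannot express, which is why each operation expands into several instructions on `rv32i`.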
Test case 1:
Input: k = 12145134613, operations = [0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]
Output: "i" (0x69)
Test case 2:
Input: k = 10, operations = [0, 1, 0, 1]
Output: "b" (0x62)
Test case 3:
Input: k = 25, operations = [1, 1, 1, 0, 1, 0, 1, 0, 1, 0]
Output: "b" (0x62)
The test results are shown below:
The execution info is shown below:
Info | Value |
---|---|
Cycles | 2687 |
Instrs. retired | 1709 |
CPI | 1.57 |
IPC | 0.636 |
Clock rate | 0 Hz |
Now we can choose a processor to run this code. Ripes provides four kinds of processors to choose from: a single-cycle processor, a 5-stage processor, a 5-stage processor with hazard detection, and a 5-stage processor with forwarding and hazard detection. Here we choose the 5-stage processor. Its block diagram looks like this:
The "5-stage" means this processor uses a five-stage pipeline to overlap the execution of instructions. The stages are instruction fetch (IF), instruction decode (ID), execute (EX), memory access (MEM), and write-back (WB).
In the block diagram, you can see that the stages are separated by pipeline registers (the rectangular blocks with stage names on each side).
Detailed analysis of `lw t2, 0(t1)` in the RISC-V pipeline:
Instruction format: `lw t2, 0(t1)`
The register `t1` stores the address of the input operation array plus the offset of the variable `op`. The load reads that element of the operation array, which is either `0` or `1`, and stores it in `t2` so that we can check it.
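What the load does can be modeled in a few lines of C: add the sign-extended 12-bit immediate to the base register and read a little-endian word from memory. The toy memory, the data-section base `0x10000000`, and the function names are assumptions for this sketch; the address `0x10000084` and the loaded value `1` are the ones observed in this walkthrough.

```c
#include <stdint.h>

#define DATA_BASE 0x10000000u /* assumed start of the data section */

static uint8_t data_mem[256]; /* toy data memory, indexed from DATA_BASE */

/* Sign-extend a 12-bit immediate to 32 bits. */
static int32_t sign_extend_12(uint32_t imm12)
{
    return (int32_t) ((imm12 & 0xFFFu) ^ 0x800u) - 0x800;
}

/* Model of lw rd, imm(rs1): returns the word at rs1_value + sext(imm). */
static uint32_t load_word(uint32_t rs1_value, uint32_t imm12)
{
    uint32_t addr = rs1_value + (uint32_t) sign_extend_12(imm12);
    uint32_t off = addr - DATA_BASE;
    return (uint32_t) data_mem[off]
         | (uint32_t) data_mem[off + 1] << 8
         | (uint32_t) data_mem[off + 2] << 16
         | (uint32_t) data_mem[off + 3] << 24;
}
```

With `data_mem[0x84]` set to `1`, calling `load_word(0x10000084, 0)` models `lw t2, 0(t1)` when `t1` holds `0x10000084`.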
The machine code of `lw t2, 0(t1)` is `0x00032383`.
imm[11:0] (12 bits) | rs1 (5 bits) | funct3 (3 bits) | rd (5 bits) | opcode (7 bits) | Instruction |
---|---|---|---|---|---|
imm[11:0] | rs1 | 010 | rd | 0000011 | lw |
000000000000 | 00110 | 010 | 00111 | 0000011 | lw (in binary) |
0x000 | 0x06 | 0x02 | 0x07 | 0x03 | lw (in hex) |
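The field layout in the table can be checked by packing the fields back into a 32-bit word. The helper name `encode_i_type` is hypothetical; the bit positions follow the RISC-V I-type format (imm at bits 31:20, rs1 at 19:15, funct3 at 14:12, rd at 11:7, opcode at 6:0).

```c
#include <stdint.h>

/* Pack the fields of an I-type instruction into its 32-bit encoding. */
static inline uint32_t encode_i_type(int32_t imm, uint32_t rs1, uint32_t funct3,
                                     uint32_t rd, uint32_t opcode)
{
    return (((uint32_t) imm & 0xFFFu) << 20) /* imm[11:0] */
         | ((rs1 & 0x1Fu) << 15)             /* base register index */
         | ((funct3 & 0x7u) << 12)           /* 010 selects lw */
         | ((rd & 0x1Fu) << 7)               /* destination register index */
         | (opcode & 0x7Fu);                 /* 0000011 is the load opcode */
}
```

Calling `encode_i_type(0, 6, 2, 7, 0x03)`, that is `t1 = x6` as base and `t2 = x7` as destination, yields `0x00032383`.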
In the ID stage, the decoder splits `lw t2, 0(t1)` into five parts, which are shown above. `R2 idx` is not used by `lw`, so it is `0x00`.
The current PC value (`0x00000298`) and the next PC value (`0x0000029c`) are just sent through this stage; we don't use them.
In the EX stage, `Reg 1` and `Reg 2` are sent to the branch block, but no branch is taken.
The multiplexers select the value of `Reg 1` and the immediate (`0`) to be used as the operands for the ALU.
The ALU adds the value read from register `0x06` and the immediate value (`0`). The result is the effective memory address, which is then passed to the memory access stage (EX/MEM).
In the MEM stage, the memory address is `0x10000084`, so `Read out` is equal to `0x00000001`. The table below denotes the data section of memory.
The next PC value (`0x0000029c`) and `Wr idx` (`0x07`) are just sent through this stage; we don't use them.
`Reg 2` is sent to `Data in`, but memory writing is not enabled.
In the WB stage, the multiplexer chooses `Read out` from data memory as the final output, so the output value is `0x00000001`.
The output value and `Wr idx` are sent back to the registers block. Finally, the value `0x00000001` is written into the `x7` register, whose ABI name is `t2`.
After all these stages are done, the register is updated like this:
blt_64
, shift_left_64
, sub_64
Quiz1
Leetcode 3307. Find the K-th Character in String Game II
online tool for RISC-V Instruction Encoder/Decoder