contributed by scones525
In a complete binary tree, there is a certain relationship between the number of nodes and the depth.
For example, a tree with a depth of 2 has 3 to 4 nodes, and a tree with a depth of 3 has 7 to 8 nodes, and so on.
Assuming we use a 64-bit binary representation for identifying each node, the number of leading zeros in the binary form of this integer can represent the level or depth of the node.
We can notice that the relationship between depth and the number of leading zero is : 64 - the number of leading zeros.
Therefore, if we know the node's ID, we can quickly calculate its depth in a complete binary tree using the count_leading_zeros function.
For example if we have a complete binary tree below:
How can we locate tree node "7" by clz?
Firstly, the clz of that tree node represents its depth.
Secondly, we can noticed that the leftmost node of each depth is exactly a multiple of 2.
We can use this characteristic to calculate its position from left to right.
For example, the binary representation of the number 7 is 111.
In a 32-bit system, we can quickly calculate its depth using clz(clz of 7 is 25, its depth is 32 - 25 = 5).
Then, subtracting the starting position of that level from each number gives its position(7-4=3, so 3 is the position of 7 in that level).
Here are some tests…
You can check the code in complete_binary_tree_location.s
This part include c code and translate it into assembly code.
Besides, program's functionality and explanations will be discussed in this subsection.
In RV32i, register can only store 32-bit data.
However, uint_64_t is a long long integer data type which is guaranteed to be exactly 8 bytes in size.
So I store a 64 bits data in two registers in RV32i.
For example we can check lines 3 to 5 in Assembly code part above:
Represented in 64 bits:
Similar to the previous question, there may be issues when shifting data.
Therefore, the bits shifted out from the high-order register must be stored and loaded into the low-order register.
For example:
The code was tested using the Ripes simulator.
And in this part, I will explain an example for each stage in Instruction pipeline( including IF, ID, IE, MEM, and WB.)
Here are the line numbers used for the CLZ function of the disassenbled code.
This loop is the lower section of CLZ function.
And the instruction slli x23 x8 31 will be taken as an example.
Before we begin, let us check the The RISC-V Instruction Set Manual carefully.
Shifts by a constant are encoded as a specialization of the I-type format.The operand to be shifted
is in rs1, and the shift amount is encoded in the lower 5 bits of the I-immediate field.SLLI is a logical left shift (zeros are shiftedinto the lower bits);
The address of instruction slli x23 x8 31 is 0x00000088. So it input 0x00000088 into the Instruction memory.
By the way, 0x0000008c refer to the next instruction address which is add by 4 from its above instruction pipeline.
You may notice that this instruction after passing through the instruction memory becomes 0x1f41b93.
But why?
As mention before in "Integer Register-Immediate Instructions" part, the "slli" instruction is divided into some parts…
To summarize, 0x01f41b93 will be fed into the IF/ID stage.
After decoding, x8 and x31 are put into Register unit. Register Unit will read what value in there register, we can check them in 'GPR'.
x8 :
x31:
Immediate value '31' will be sent into Imm. unit.
On the same time, processor need to store write back register information to ID/EX.
We can divide this part into two sections, ALU and write back information part.
For ALU op1, it stores register x8 value. In my risc-v code, I consider it as two stages:
This part can use loop to simplify it.
However, this part cannot not be done using a loop.
So, instruction slli x23 x8 31 is the first instruction of c code x -= ((x >> 1) & 0x5555555555555555 );
We can learn that 0x000000ff is the value after the loop dealing with it.
For op2, it stores immdeiately value '31' here which means how many bits x8 needs to shift.
After calaulating, we can conduct Res '0x80000000' to the next stage.
We can find an insteresting thing that is : Data Memory output is '0x00000000',
which means nothing is done here.
This is because slli is an integer logical instruction, the entire operation takes place entirely within the CPU's register file and the Arithmetic Logic Unit (ALU), with no involvement of the data memory.
Here, we can discuss two values.
One is '0x80000000', the value that need to write back to 'x31' register.
Another is '0x17', as we mention in decode part before it shares the same means with 'x23' register.
In the same cycle, notice that thay have written back to Register Unit.
Wr data:
Wr idx:
In the previous 64 bits version, we have used a lot of shift and or to implement how to count 64bits number zero.
However, counting a 64 bits number and counting a 32 bits number are actually the same thing.
Why?
Because it doesn't matter which register you count, all we need to do is count the leading zeros in the upper 32 bits or the lower 32 bits.
For example, if we have a test case:
If we use this method, it can reduce a significant amount of execution time and decrease code size theoretically.
You can check this file from this latest_hw1.s.
Before make this optimization:
After:
The biggest difference between the two can be divided into two points.