contributed by < YangYeh-PD
>
1
: Computing the Square RootSuppose that
then
Therefore, the problem can be reduced to determine each term . If setting the m-th bit to 1 would result in , then set it to 0 and continue to examine the next bit.
Thus, the simplest way to calculate the square root of a number N
is to first compute the most significant bit (MSB) of N
and clear all bits smaller than the MSB. In the binary system, the simplest way to obtain the MSB of a number N
is to directly take the logarithm base 2 and then explicitly cast it to an integer.
Then, use a loop to sequentially check whether adding these bits will result in a square greater than N
, and output the result accordingly.
If we want to avoid using the log2()
function, we can iterate through N
and perform right-shift operations successively, counting the number of right-shifts until N
becomes less than or equal to 1.
Due to the significant computational complexity of directly calculating whether is greater than 0, we simplify as follows. Setting
By setting , and ,
and
Next, we attempt to write an algorithm for finding the square root. Since n
is MSB, the initial conditions are as follows:
But I still don't know why I should add
& ~1UL
at the end of the expression.
ChenYang YehSun, Mar 17, 2024 06:31 AM
Check C standard for integer promotion.jservImage Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
and in each iteration,
Therefore, we can express it in the following code
Since __builtin_clz(x)
is a built-in function in GCC, and the result is undefined if x == 0
, we can implement ffs function in <string.h>
to avoid that special case.
which returns 0
when the input has no set bit.
To get rid of the branch, we can make the best use of bitwise operations.
Since when x == b
, the result of the right shift operation (b - x) >> 31
would be 0
. To ensure the correct behavior, we need to subtract 1
from the expression inside the parentheses.
The Linux kernel's lib/math/int_sqrt.c file utilizes a similar method for computing square roots.
When I opened it, I was surprised to find that it actually uses a branch… I initially thought about submitting a patch to modify this branch. However, I found it strange that this commit aa6159a has been there for 4 years, and no one seemed to have noticed it before? So, I decided to conduct my own testing and only then did I realize that the performance improvement was not significantly noticeable.
After compiling these two code snippets into assembly, it was found that the version using bit-masks has 5 more instructions than the version using branches. Therefore, although the bit-masks version does not have branches, the overall performance cannot be improved because the number of instructions increases.
In addition, since the kernel uses unsigned long
instead of integer, performing a right-shift operation results in a logical shift. If an arithmetic shift is desired, it would be necessary to declare another int mask
or perform an explicit type conversion.
Last but not least, the behavior of arithmetic shift is subject to the compiler being used. This is outlined in Section 5 of the ISO/IEC 9899 §6.5.7 5.
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of . If E1 has a signed type and a negative value, the resulting value is implementation-defined.
3
: Calculate the logarithm with base 2 IThe technique of taking logarithm base 2 has actually been used in problem 1, which involves calculating the Most Significant Bit (MSB) of number N
, and then subtracting 1
from it.
We can also utilize the technique of binary search to expedite the process.
Or using built-in count leading zero function.
Noting that since gcc clz()
built-in function leads to the undefined result when input is 0
, we should at least give it a number . So we choose v | 1
as input.
There are two log2()
implementations in linux/log2.h.
The first one is simple, just return MSB - 1
.
The second one is bittle bit tricky. It lists out all the possibilities of every bit of n
and then uses a bunch of ? :
ternary operators to perform nested if-condition checks. The reason for this approach might be that in most cases where logarithms base 2 are required, the numbers tend to be large. Therefore, by first evaluating the cases of the higher-order bits, the computation can effectively terminate sooner.
TODO: check the lecture material about compiler optimizations to determine why optimizing compilers like GCC can generate the corresponding constant result.
jservImage Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
5
: Calculate the logarithm with base 2 IIThe following program again calculate the logarithm with base 2. It use r | shift
to store the logarith value. Since the logarithm value will not exceed 32 (included) in uint32_t
, we can only use 5 bits to represent the value.
So the problem is to find whether each bit of r | shift
should be set or not. To know we should set the bit of r | shift
or not, we again use binary search.
However, there has a special case. When the input value is 0
, the result will be 32
, which is wrong.
We can simply alter the condition of x--
.
When the input is 0 or 1, there are still issues because according to mathematical definitions, , and \log{0} is undefined.
ChenYang YehSun, Mar 17, 2024 20:34 PM
I cannot find it…
ChenYang YehSun, Mar 18, 2024 17:46 PM
Searchlog_2
ilog2
andint_log2
instead.jservImage Not Showing Possible ReasonsLearn More →
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported