Computer Architecture
Hacks
在電腦的世界裡,可以做任何數目系統而且複雜的演算,但是大多數的演算都藉由軟體(程式)來解決,而非用硬體(電路)直接進行各種演算,電腦的硬體或其他數位電路在做算術運算時,最基本的電路往往只有二進位加法器而已,至於減法可藉由補數的加法解決,乘法等於連續的加法,除法則是連續的減法,可見加法器在運算數位系統中的重要性 ,但它也不過是幾個邏輯閘就解決的電路。
Rule:
^
the left most digit
Rule:
Rule:
e.g.
4 & 3 == 0
3 & 2 != 0
Sometimes we need to extend the sign of a number but we don't know a priori the number of bits, b, in which it is represented.
If you need to negate only when a flag is false, then use the following to avoid branching.
If you need to negate only when a flag is true, then use this.
The following finds the the rank of a bit, meaning it returns the sum of bits that are set to 1 from the most-signficant bit downto the bit at the given position.
The following 64-bit code selects the position of the rth 1 bit when counting from the left. In other words if we start at the most significant bit and proceed to the right, counting the number of bits set to 1 until we reach the desired rank, r, then the position where we stop is returned. If the rank requested exceeds the count of bits set, then 64 is returned. The code may be modified for 32-bit or counting from the right. If branching is fast on your target CPU, consider uncommenting the if-statements and commenting the lines that follow them.
A parity bit, or check bit, is a bit added to a string of binary code to ensure that the total number of 1-bits in the string is even or odd. Parity bits are used as the simplest form of error detecting code.
This swaps the values of a and b without using a temporary variable. Don't use this with floating-point numbers (unless you operate on their raw integer representations).
This is an old trick to exchange the values of the variables a and b without using extra space for a temporary variable.
As an example of swapping ranges of bits suppose we have have b = 00101111 (expressed in binary) and we want to swap the n = 3 consecutive bits starting at i = 1 (the second bit from the right) with the 3 consecutive bits starting at j = 5; the result would be r = 11100011 (binary).
The log base 2 of an integer is the same as the position of the highest bit set (or most significant bit set, MSB).
The integer log base 10 is computed by first using one of the techniques above for finding the log base 2. By the relationship log10(v) = log2(v) / log2(10), we need to multiply it by 1/log2(10), which is approximately 1233/4096, or 1233 followed by a right shift of 12. Adding one is needed because the IntegerLogBase2 rounds down. Finally, since the value t is only an approximation that may be off by one, the exact value is found by subtracting the result of v < PowersOf10[t].
This method works well when the input is uniformly distributed over 32-bit values because 76% of the inputs are caught by the first compare, 21% are caught by the second compare, 2% are caught by the third, and so on (chopping the remaining down by 90% with each comparision). As a result, less than 2.6 operations are needed on average.
It makes use of the fact that the first 32 bit position values are relatively prime with 37, so performing a modulus division with 37 gives a unique number from 0 to 36 for each. These numbers may then be mapped to the number of zeros using a small lookup table.
Converting bit vectors to indices of set bits is an example use for this. It requires one more operation than the earlier one involving modulus division, but the multiply may be faster. The expression (v & -v) extracts the least significant 1 bit from v. The constant 0x077CB531UL is a de Bruijn sequence, which produces a unique pattern of bits into the high 5 bits for each possible bit position that it is multiplied against. When there are no bits set, it returns 0.
The time to convert an integer to a float can be high on some machines. The exponent of the 32-bit IEEE floating point representation is shifted down, and the bias is subtracted to give the position of the least significant 1 bit set in v. If v is zero, then the result is -127.
e.g.
Scientific notation is
IEEE 754 floating-point standard is
32-bit IEEE floating point representation is
Sign | Exponent | Mantissa |
---|---|---|
+ | 2 + 127 | 1 |
0 | 10000001 | 1000000 00000000 00000000 |
The result may be expressed by the formula 1U << (lg(v - 1) + 1).
(It would be faster by 2 operations to use the formula and the log base 2 method that uses a lookup table, but in some situations, lookup tables are not suitable.)
It works by copying the highest set bit to all of the lower bits, and then adding one, which results in carries that set all of the lower bits to 0 and one bit beyond the highest set bit to 1. If the original number was a power of 2, then the decrement will reduce it to one less, so that we round up to the same original value.
Interleaved bits (aka Morton numbers) are useful for linearizing 2D integer coordinates, so x and y are combined into a single number that can be compared easily and has the property that a number is usually close to another if their x and y values are close.
e.g.
If you want to convert a certain set of integer coordinates to a Morton code, you have to convert the decimal values to binary and interleave the bits of each coordinate:
(x,y) = (5,9) = (0101,1001)
Interleaving the bits results in: 10010011 = 147
Suppose we have a pattern of N bits set to 1 in an integer and we want the next permutation of N 1 bits in a lexicographical sense.
For example, if N is 3 and the bit pattern is 00010011, the next patterns would be 00010101, 00010110, 00011001,00011010, 00011100, 00100011, and so forth. The following is a fast way to compute the next permutation.
The __builtin_ctz(v)
GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros.
There is yet a faster method — use hasless(v, 1), which is defined below.
We may want to know if any byte in a word has a specific value. To do so, we can XOR the value to test with a word that has been filled with the byte values in which we're interested. Because XORing a value with itself results in a zero byte and nonzero otherwise, we can pass the result to haszero.
Requirements: x>=0; 0<=n<=128
To count the number of bytes in x
Requirements: x>=0; 0<=n<=127
To count the number of bytes in x
When m < n, this technique tests if a word x contains an unsigned byte value, such that m < value < n.
Requirements: x>=0; 0<=m<=127; 0<=n<=128
To count the number of bytes in x that are between m and n (exclusive)