彙整學員們的作業成果

contributed by <gyes00205>

tags: `linux2021`

題目

考慮函式 func 接受一個 16 位元無號整數

N

，並回傳小於或等於

N

的 power-of-2

uint16_t func(uint16_t N) {
    /* change all right side bits to 1 */
    N |= N >> 1;
    N |= N >> 2;
    N |= N >> 4;
    N |= N >> 8;

    return (N + 1) >> 1;
}

解釋運作原理：

首先將遇到的第一個值為 1 的 bit 的右邊所有 bit 都設為 1，最後 +1 並往右移一個 bit 。
以 N =

32768_{10}

1000000000000000_{2}

為例

N |= N >> 1;

$N$ =
$1100000000000000_{2}$
N |= N >> 2;

$N$ =
$1111000000000000_{2}$
N |= N >> 4;

$N$ =
$1111111100000000_{2}$
N |= N >> 8;

$N$ =
$1111111111111111_{2}$
(N + 1) >> 1;

$N$ =
$1000000000000000_{2}$

不會 overflow 的原因：

在 C99 規格書 6.3.1.1 Boolean,characters, and integers

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

此為 integer promotion ，因為 (N + 1) >> 1 在 int 的表示範圍內，所以會將 N 轉型成 32-bit 的 int ，回傳時再以 usigned int 表示。

改進方法

因為上面所提到的 integer promotion 的緣故，所以在 (N + 1) >> 1 並不會有 overflow 的問題。
那麼如果是在 uint32_t N 的情況下呢？
改進方法: 參考你所不知道的 C 語言：數值系統

比方說 (x + y) / 2 這樣的運算，有個致命問題在於 (x + y) 可能會導致 overflow (考慮到 x 和 y 都接近 UINT32_MAX，亦即 32-bit 表示範圍的上限之際)
* Binary search 的演算法提出之後十年才被驗證
* 於是我們可以改寫為以下:
(x & y) + ((x ^ y) >> 1)

用加法器來思考: x & y 是進位, x ^ y 是位元和, / 2 是向右移一位

位元相加不進位的總和: x ^ y; 位元相加產生的進位值: (x & y) << 1

x + y = x ^ y + ( x & y ) << 1

所以 (x + y) / 2
= (x + y) >> 1
= (x ^ y + (x & y) << 1) >> 1
= (x & y) + ((x ^ y) >> 1)

從上述的方法我們可以將原本的方式改成下面，在 N 為 uint32_t 的情況下也不會 overflow 。


- (N + 1) >> 1
+ (N & 1) + ((N ^ 1) >> 1)

The Linux Kernel API:

在 The Linux Kernel API 頁面搜尋 “power of 2”，可見 is_power_of_2 ，查閱 Linux 核心原始程式碼，舉例說明其作用和考量，特別是 round up/down power of 2 。特別留意 __roundup_pow_of_two 和 __rounddown_pow_of_two 的實作機制。
linux/include/linux/log2.h

`is_power_of_2`

接受一個無號整數

n

，如果該值為 2 的冪，則回傳 true ; 反之回傳 false 。 0 不是 2 的冪。

/*
 *  Determine whether some value is a power of two, where zero is
 * *not* considered a power of two.
 */

static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
	return (n != 0 && ((n & (n - 1)) == 0));
}

首先判斷

n

是否為 0 。如果 (n & (n - 1)) 為 0 ，則

n

為 2 的冪。
以

n = 1000_{2}

為例，則

n - 1 = 0111_{2}

，

n

(n - 1)

= 0

，最後回傳 true 。

`__roundup_pow_of_two`

接受一個無號整數

n

，並回傳大於等於

n

且離

n

最近的 2 的冪。
以

n = 9

為例，則

n - 1 = 8 = 1000_{2}

， fls_long(n - 1) = 4 最後回傳 1UL << 4 等於 16。如果

n

為 2 的冪，則剛好回傳

n

。

/*
 * round up to nearest power of two
 */
static inline __attribute__((const))
unsigned long __roundup_pow_of_two(unsigned long n)
{
	return 1UL << fls_long(n - 1);
}

其中有提到 fls_long 用來找從 MSB 開始出現第一個 1 的位置。linux/include/linux/bitops.h

static inline unsigned fls_long(unsigned long l)
{
	if (sizeof(l) == 4)
		return fls(l);
	return fls64(l);
}

fls 用於 unsigned long 為 4 bytes ，而 fls64 用於 unsigned long 為 8 bytes 。
fls 和 fls64 在 Linux kernel 中會根據不同硬體而有不同實作。以下為 fls 在 arch/arc/include/asm/bitops.h 的實作。

/*
 * fls = Find Last Set in word
 * @result: [1-32]
 * fls(1) = 1, fls(0x80000000) = 32, fls(0) = 0
 */
static inline __attribute__ ((const)) int fls(unsigned int x)
{
	if (__builtin_constant_p(x))
	       return constant_fls(x);

	return 32 - clz(x);
}

其中 clz 和 constant_fls 實作如下

/*
 * Count the number of zeros, starting from MSB
 * Helper for fls( ) friends
 * This is a pure count, so (1-32) or (0-31) doesn't apply
 * It could be 0 to 32, based on num of 0's in there
 * clz(0x8000_0000) = 0, clz(0xFFFF_FFFF)=0, clz(0) = 32, clz(1) = 31
 */
static inline __attribute__ ((const)) int clz(unsigned int x)
{
	unsigned int res;

	__asm__ __volatile__(
	"	norm.f  %0, %1		\n"
	"	mov.n   %0, 0		\n"
	"	add.p   %0, %0, 1	\n"
	: "=r"(res)
	: "r"(x)
	: "cc");

	return res;
}

static inline int constant_fls(unsigned int x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u))
		r -= 1;
	return r;
}

`__rounddown_pow_of_two`

接受一個無號整數

n

，並回傳小於等於

n

且離

n

最近的 2 的冪。以

n = 9 = 1001_{2}

為例， fls_long(n) - 1 = 3 最後回傳 1UL << 3 等於 8。如果

n

為 2 的冪，則剛好回傳

n

。

/*
 * round down to nearest power of two
 */
static inline __attribute__((const))
unsigned long __rounddown_pow_of_two(unsigned long n)
{
	return 1UL << (fls_long(n) - 1);
}

slab allocator

研讀 slab allocator，探索 Linux 核心的 slab 實作，找出運用 power-of-2 的程式碼和解讀

彙整學員們的作業成果

tags: linux2021

題目

解釋運作原理：

不會 overflow 的原因：

改進方法

The Linux Kernel API:

is_power_of_2

__roundup_pow_of_two

__rounddown_pow_of_two

slab allocator

tags: `linux2021`

`is_power_of_2`

`__roundup_pow_of_two`

`__rounddown_pow_of_two`