2024q1 Homework4 (quiz3+4)

contributed by < mesohandsome >

quiz3 測驗 1

程式碼運作原理

c_{m} = P_{m + 1} 2^{m + 1}

d_{m} = (2^{m})^{2}

Y_{m} = {\begin{cases} c_{m} + d_{m} & if a_{m} = 2^{m} \\ 0 & if a_{m} = 0 \end{cases}

藉由位元運算推算出下一輪

c_{m - 1}

，

d_{m - 1}

c_{m - 1} = P_{m} 2^{m} = (P_{m + 1} + a_{m}) 2^{m} = P_{m + 1} 2^{m} + a_{m} 2^{m} = \frac{1}{2} P_{m + 1} 2^{m + 1} + (2^{m})^{2} = {\begin{cases} c_{m} / 2 + d_{m} & if a_{m} = 2^{m} \\ 0 & if a_{m} = 0 \end{cases}

d_{m - 1} = (2^{m - 1})^{2} = \frac{1}{4} (2^{m})^{2} = \frac{1}{4} d_{m}

在程式碼中，int m 對應到

d_{m}

，所以每回合都除以 4，也就是右移 2 位。

int z 則是

c_{m}

，除以 2 也就是右移 1 位。

for (int m = 1UL << ((31 - __builtin_clz(x)) & ~1UL); m; m >>= 2) {
    int b = z + m;
    z >>= 1;
    if (x >= b)
        x -= b, z += m;               
}

使用第 2 週測驗中的 __ffs(x) 取代 __builtin_clz。

- for (int m = 1UL << ((31 - __builtin_clz(x)) & ~1UL); m; m >>= 2) {
+ for (int m = 1UL << __ffs(x); m; m>>= 2) {
      int b = z + m;
      z >>= 1;
      if (x >= b)
          x -= b, z += m;               
  }

在第 3 週測驗 1 中有一部分的敘述標示不清楚，

c_{m}

d_{m}

Y_{m}

之間應有間格標示，但在 hackmd 上用 \\ 換行有些情況沒辦法顯示，所以黏在一起，閱讀時在這部份產生誤解並觀察了非常久才發現問題。

Image Not Showing Possible Reasons

The image was uploaded to a note which you don't have access to
The note which the image was originally uploaded to has been deleted

Learn More →

quiz3 測驗 2

程式碼運作原理

為了減少乘除法運算，需要以 bitwise 運算去逼近

1 / 10

，而算式必定為

\frac{a n}{2^{N}}

，

N

為非負整數，

a

為正整數，所以找出

(\frac{1}{8} + \frac{1}{2} + 1) \cdot \frac{8}{128} = \frac{13}{128} \approx 0.1

。

    (tmp >> 3) + (tmp >> 1) + tmp

上面的運算可以看成

\frac{t m p}{8} + \frac{t m p}{2} + t m p = \frac{13}{8} t m p

再把

\frac{13}{8} t m p

乘以

\frac{8}{128}

得到接近

\frac{1}{10} t m p

，也就是 div 的結果。

也因為

i n = d i v * 10 + m o d

，所以將剛才整理好的

\frac{1}{10} t m p

以 bitwise 操作乘以 10 倍後，與原始值相減就會得到 mod 的結果。

unsigned d2, d1, d0, q, r;

d0 = q & 0b1;
d1 = q & 0b11;
d2 = q & 0b111;
q = ((((tmp >> 3) + (tmp >> 1) + tmp) << 3) + d0 + d1 + d2) >> 7;
r = tmp - (((q << 2) + q) << 1);

q 計算完會逼近 0.8 * in，在計算 div 時要讓他接近 0.1 * in，所以要右移 3 位。

而 mod 會等於 div 10 的結果乘 10，所以原本的 q 為 0.8 * in 並將前 3 位多餘的過濾掉，再加上 0.2 * in ( 將值為 0.1 * in 的 div 左移兩位 ) 使值變為 1.0 * in 也就是 div 的 10 倍。

#include <stdint.h>
void divmod_10(uint32_t in, uint32_t *div, uint32_t *mod)
{
    uint32_t x = (in | 1) - (in >> 2); /* div = in/10 ==> div = 0.75*in/8 */
    uint32_t q = (x >> 4) + x;
    x = q;
    q = (q >> 8) + x;
    q = (q >> 8) + x;
    q = (q >> 8) + x;
    q = (q >> 8) + x;

    *div = (q >> 3);
    *mod = in - ((q & ~0x7) + (*div << 1));   
}

quiz3 測驗 3

程式碼運作原理

使用二分搜尋的技巧，找出

\log_{2} i

為多少，因此依序填入 65536 , 256 , 16 , 2。

static size_t ilog2(size_t i)
{
    size_t result = 0;
    while (i >= 65536) {
        result += 16;
        i >>= 16;
    }
    while (i >= 256) {
        result += 8;
        i >>= 8;
    }
    while (i >= 16) {
        result += 4;
        i >>= 4;
    }
    while (i >= 2) {
        result += 1;
        i >>= 1;
    }
    return result;
}

__builtin_clz 的輸入不能為 0，因此先將輸入 v 和 1 做 OR 運算，避免為 0 的狀況發生。

int ilog32(uint32_t v)
{
    return (31 - __builtin_clz(v | 1));
}

quiz3 測驗 4

程式碼運作原理

avg->internal 表示

S_{t}

，val << avg->factor 表示

Y_{t}

，avg->weight 表示

α

。

factor 用來調整可以顯示到多少的精確度。

struct ewma *ewma_add(struct ewma *avg, unsigned long val)
{
    avg->internal = avg->internal
                        ? (((avg->internal << avg->weight) - avg->internal) +
                           (val << avg->factor)) >> avg->weight
                        : (val << avg->factor);
    return avg;
}

將上面的算式化簡，就會得當與原本公式相符的結果：

S_{t} = 2^{- w e i g h t} Y_{t} + (1 - 2^{- w e i g h t}) S_{t - 1}

quiz3 測驗 5

int ceil_ilog2(uint32_t x)
{
    uint32_t r, shift;

    x--;
    r = (x > 0xFFFF) << 4;                                                                                                                                    
    x >>= r;
    shift = (x > 0xFF) << 3;
    x >>= shift;
    r |= shift;
    shift = (x > 0xF) << 2;
    x >>= shift;
    r |= shift;
    shift = (x > 0x3) << 1;
    x >>= shift;
    return (r | shift | x > 1) + 1;
}

改進程式碼，使其得以處理 x = 0 的狀況，並仍是 branchless

讓程式一開始依據 x 是否為 0 做減法，避免 unsinged integer 減到小於 0 後出現錯誤。

最後回傳時也判斷 x 是否為 0，決定是否加 1。

- x--;
+ x -= (x > 0);

...

- return (r | shift | x > 1) + 1;
+ return (r | shift | x > 1) + (x > 0);

quiz4 測驗 1

程式碼運作原理

以每 4 個位元為一個單位計算 1 的個數，利用以下公式計算：

x - ⌊ \frac{x}{2} ⌋ - ⌊ \frac{x}{4} ⌋ - ⌊ \frac{x}{8} ⌋

將輸入 v 右移 1 位 ( 除以 2 )，然後 & 0x77777777 的用意是，如果較高位的 4 位在位移之後，右移完要移除的位元會變成低位的第 4 個位元，因此以 0x77777777 ( 01110111 ... 0111 ) 將這些不需要的位元清除，就可以得到每 4 個位元完整的

⌊ \frac{x}{2} ⌋

。

而這個操作總共用了 3 次，取出

⌊ \frac{x}{2} ⌋

⌊ \frac{x}{4} ⌋

⌊ \frac{x}{8} ⌋

。

n = (v >> 1) & 0x77777777;
v -= n;

使用 v = (v + (v >> 4)) 會得到：

B7 (B7+B6) (B6+B5) (B5+B4) (B4+B3) (B3+B2) (B2+B1) (B1+B0)

再使用 0x0F0F0F0F 做 mask 得到：

0 (B7+B6) 0 (B5+B4) 0 (B3+B2) 0 (B1+B0)

與 0x01010101 相乘完後會在第 24 個位元後得到 B0 + B1 + … + B7，因此將結果右移 24 個位元即為結果。

v *= 0x01010101;                                     
return v >> 24;

以下程式用來找出每個數之間的 hamming distance，透過 xor 將不同的位元標示成 1，再以 popcount 找出總共有幾個 1，程式中使用了兩層迴圈來計算，因此每一組配對會被重複計算到，所以最後應該要將 total 除以 2，也就是右移 1 位。

int totalHammingDistance(int* nums, int numsSize)
{
    int total = 0;;
    for (int i = 0;i < numsSize;i++)
        for (int j = 0; j < numsSize;j++)
            total += __builtin_popcount(nums[i] ^ nums[j]); 
    return total >> 1;
}

撰寫更高效的程式碼

第 0 個 bit 觀察 :
全部都是 1 -> Hamming Distance = 0
第 1 個 bit 觀察 :
1 有一個，其餘皆為 0
可以直接用單個 bit 判斷 Hamming Distance (1)* (numsize - 1)
第 2 個 bit 觀察 :
1 有兩個，其餘皆為 0
Hamming Distance (2)* (numsize - 2)

根據以上規則可以整理成以下程式，比起原本的程式碼更有效率，可通過 LeetCode 477. Total Hamming Distance：

int total = 0;

for (int i = 0; i < 32; i++) {
    int count = 0;

    for (int j = 0; j < numsSize; j++) 
        count += (nums[j] >> i) & 1;
    total += count * (numsSize - count);
}

return total;

quiz4 測驗 2

程式碼運作原理

p o p c o u n t (x & \overset{―}{m}) - p o p c o u n t (x & m) = p o p c o u n t (x \oplus m) - p o p c o u n t (m)

popcount(0xAAAAAAAA) = 16 所以可以得出 popcount(n ^ 0xAAAAAAAA) - 16，但為了避免 n < 0 所以加上一個 3 的倍數變為 popcount(n ^ 0xAAAAAAAA) + 23。

再做下一回合的運算以得到一個小於 3 的數：

n = popcount(n ^ 0x2A) - 3;

經過上一回合後，n 目前的值介於 -3 ~ 2，最後需要針對小於零的狀況加 3，且在原本程式中並沒有將 n 從 unsigned 轉型成 int，由於沒有標示正負的符號，導致 & 3 的結果不如預期：

- return n + ((n >> 31) & 3);
+ return n + (((int)n >> 31) & 3);

觀察可移動的方式，move_masks[0] , move_masks[1] , move_masks[2] 為第一排，如果 board 與這三個移動方式做 OR 運算，得到 0x70044444，可以發現有連成一直線時，會有一個位元被設定成 7，為了檢察是否為 7，將 board + 0x11111111 在和 0x88888888 做 AND 運算就可以發現是否有人獲勝。

static const uint32_t move_masks[9] = {
    0x40040040, 0x20004000, 0x10000404, 0x04020000, 0x02002022,
    0x01000200, 0x00410001, 0x00201000, 0x00100110,
};

/* Determine if the tic-tac-toe board is in a winning state. */
static inline uint32_t is_win(uint32_t player_board)
{
    return (player_board + 0x11111111) & 0x88888888;
}

先將 x 的範圍縮減，x >> 15 移除較低的 15 位，(x & UINT32_C(0x7FFF)) 只保留最低的 15 位。

static inline int mod7(uint32_t x)
{
    x = (x >> 15) + (x & UINT32_C(0x7FFF));
    /* Take reminder as (mod 8) by mul/shift. Since the multiplier
     * was calculated using ceil() instead of floor(), it skips the
     * value '7' properly.
     *    M <- ceil(ldexp(8/7, 29))
     */
    return (int) ((x * UINT32_C(0x24924925)) >> 29);
}

quiz4 測驗 3

/* The process of deletion in this tree structure is relatively more intricate,
 * although it shares similarities with deletion methods employed in other BST.
 * When removing a node, if the node to be deleted has a right child, the
 * deletion process entails replacing the node to be removed with the first node
 * encountered in the right subtree. Following this replacement, an update
 * operation is invoked on the right child of the newly inserted node.
 *
 * Similarly, if the node to be deleted does not have a right child, the
 * replacement process involves utilizing the first node found in the left
 * subtree. Subsequently, an update operation is called on the left child of th
 * replacement node.
 *
 * In scenarios where the node to be deleted has no children (neither left nor
 * right), it can be directly removed from the tree, and an update operation is
 * invoked on the parent node of the deleted node.
 */

根據函式註解，可以將缺漏的部分補上。

struct xt_node *least = xt_first(xt_right(del));
if (del == *root)
    *root = least;

xt_replace_right(del, least);
xt_update(root, xt_right(least));
return;

struct xt_node *most = xt_last(xt_left(del));
if (del == *root)
    *root = most;

xt_replace_left(del, most);
xt_update(root, xt_left(most));
return;

xt_update(root, parent);

2024q1 Homework4 (quiz3+4)

quiz3 測驗 1

程式碼運作原理

quiz3 測驗 2

程式碼運作原理

quiz3 測驗 3

程式碼運作原理

quiz3 測驗 4

程式碼運作原理

quiz3 測驗 5

改進程式碼，使其得以處理 x = 0 的狀況，並仍是 branchless

quiz4 測驗 1

程式碼運作原理

撰寫更高效的程式碼

quiz4 測驗 2

程式碼運作原理

quiz4 測驗 3

Read more

2024q1 Homework5 (assessment)

2024q1 Homework6 (integration)

linux2024-homework1

2024q1 Homework2 (quiz1+2)