2023q1 Homework2 (quiz2)

contributed by < slipet >

測驗 1

Details

#include <stdint.h>
static inline uint64_t pow2(uint8_t e) { return ((uint64_t)1) << e; }

uint64_t next_pow2(uint64_t x)
{
    uint8_t lo = 0, hi = 63;
    while (lo < hi) {
        uint8_t test = (lo + hi)/2;
        if (x < pow2(test)) { hi = test; }
        else if (pow2(test) < x) { lo = test+1; }
        else { return pow2(test); }
    }
    return pow2(lo);
}

首先這段程式碼主要是利用 binary search 在 0~63 位元之間搜索，尋找最靠近 x 且"大於等於"2的冪。

接著依題目提示對 x 做一些操作後可以將 x最高位元之後的 bit 都設為1。
x = 0010000000000000 => do operations => x = 0011111111111111
可以觀察到在向右位移 AAAA 之前已經位移了7個 bit ，將最高位元之後的7位都設為1，如下所示:
x = 0010000000000000 => do operations => x = 0011111111000000
若是要盡可能的利用已經設為1的 bit，那就必須繼續往右 8 個位元也就是 AAAA = 8，將後面的位元設為1。
而依照同樣的邏輯 BBBB = 32，

這時如果再+1的話可以得到最靠近 x 且"大於"2的冪。可以發現這個結果跟一開始題目的要求不一樣。以題目為例 x = 0010000000000000 為 2^14 預期得到的結果是 2^14 ，但是做了一連串操作後我們會得到 2^15。

因此若是把 (x>>1)+1 則可以剛好得到所需要最靠近 x 且"大於等於"2的冪。
我們可以測試幾個極值看是否符合預測

when x = 0 => do operations => 1
when x = 1 => do operations => 1
when x = 2 => do operations => 2
when x = 0xffffffffffffffff => do operations => 0x8000000000000000

可以知道 CCCC = (x>>1)+1
x = 0xffffffffffffffff 這個例子會得到 0x8000000000000000，因為題目是要求我們在 "uint64_t" 的範圍內找出最接近且大於等於 2 的冪的值，若是 2^64 則會因為 overflow 而導致結果為 0 ，因此在 uint64_t 範圍內最靠近 x 為 0x8000000000000000 。

uint64_t next_pow2(uint64_t x)
{
    //x = 0010000000000000
    x |= x >> 1;//->x = 0011000000000000
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;//->x = 0011110000000000
    x |= x >> 1;
    x |= x >> 1;
    x |= x >> 1;//->x = 0011111111000000
    x |= x >> 8;//->x = 0011111111111111
    x |= x >> 16;
    x |= x >> 32;
    return (x>>1)+1;
}

接著使用 __bulitin_clzl 改寫前面的方法，這裡有一個小陷阱，如果直接對 1 做位移的話得到的結果是 unsigned 或 signed 會是 implementation-defined 的，所以必須要轉型成 uint64_t 使用 1UL。

Details

uint64_t next_pow2_v3(uint64_t x)
{
    return 1UL<<(63-__builtin_clzl(x));
}

測驗 2

Details

int concatenatedBinary(int n)
{
    const int M = 1e9 + 7;
    int len = 0; /* the bit length to be shifted */
    /* use long here as it potentially could overflow for int */
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        /* removing the rightmost set bit
         * e.g. 100100 -> 100000
         *      000001 -> 000000
         *      000000 -> 000000
         * after removal, if it is 0, then it means it is power of 2
         * as all power of 2 only contains 1 set bit
         * if it is power of 2, we increase the bit length
         */
        if (!(DDDD))
            len++;
        ans = (i | (EEEE)) % M;
    }
    return ans;
}

首先我們先觀察 i = 1,2,3,4,......,n，分別可以對應 0, 1, 10, 11, 100,..... 的二進位表示方式，我們可以發現只有在 i 為2的冪次方時二進位表示的所需長度才會加一，因此對應道題目的 for-loop 裡的判斷式只有在 i 為2的冪次方時 len 才需要 +1 ，因此 DDDD = i&i-1 。

ans = (i | (EEEE)) % M;
呼應道題目所說將 1 到 n 的二進位表示法依序串接在一起，將 ans 向左位移 len 並且與 i 做 OR 運算，因此 EEEE = ans<<len。

Details

int concatenatedBinary(int n){
    const int M = 1e9 + 7;
    int len = 0; 
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        if (!(i&(i-1)))
            len++;
        ans = (i | (ans<<len)) % M;
    }
    return ans;
}

在 leetcode 上提交 4 次得到的平均時間為 26ms。

接著使用 builtin_clz() 嘗試優化，提交4次平均時間仍然為 26ms。
因為 leetcode 的執行時間會因為線上人數的不同以及其他因素，導致伺服器的執行時間不會每次都差不多，所以可能會需要在自己的環境實驗。

Details

int concatenatedBinary(int n){
    const int M = 1e9 + 7;
    int len = 0; 
    long ans = 0;
    for (int i = 1; i <= n; i++) {
        len = 32 - __builtin_clz(i);
        ans = (i | (ans<<len)) % M;
    }
    return ans;
}

接著嘗試優化 module operation，在網路上搜索之後發現大部分都是討論關於 module 2冪次方可以轉化成 &(2^n-1) 。
參考了這篇 link 中有提到 Chandler Carruth's benchmarks at CppCon 2015 加速 module operation 的方式，因為我們的運算中大部分會比 M 要來的小，因此只有在超過 M 的時候才需要 module operation 。

Details

int concatenatedBinary(int n){
    const int M = 1e9 + 7;
    int len = 0; 
    long ret = 0;
    for (int i = 1; i <= n; i++) {
        len = 32 - __builtin_clz(i);
        ret = (ret << len) | i;
        ret = (ret<M)?ret : ret % M;
    }
    return ret;
}

2023q1 Homework2 (quiz2)

測驗 1

測驗 2

Read more

2024q1 Homework1 (lab0)

Fixed Point Arithmetic

2023q1 Homework3 (fibdrv)

2024q1 Homework2 (quiz1+2)