2020q3 Homework3 (quiz3)

contributed by <RusselCK >

tags: `RusselCK`

數值系統＆ Bitwise 練習題
 作業繳交區

Q1.在二補數系統實作跨平台有號整數的 arithmetic right shift ( ASR )

在 C 語言中，對有號型態的物件進行算術右移 (arithmetic right shift，以下簡稱 ASR 或 asr) 歸屬於 unspecified behavior







int asr_i(signed int m, unsigned int n)
{
    const int logical = (((int) -1) OP1) > 0; // OP1 = >>1
    unsigned int fixu = -(logical & (OP2)); // OP2 = m<0
    int fix = *(int *) &fixu;
    return (m >> n) | (fix ^ (fix >> n));
}

提示: 透過 gcc/clang 編譯程式碼時，可加上 -fsanitize=undefined 編譯參數，在執行時期若遇到未定義行為，會得到以下錯誤訊息:

runtime error: load of misaligned address 0x7ffd8a89be8f for type ‘int’, which requires 4 byte alignment

以 8 bits 為例：
-5 >> 1 = ( 1111 1011 ) >>1 = ( 1111 1101 ) = -3

程式碼解析：
- 我們並不知道電腦進行負整數 ARS 之後，究竟會在空位補 0 或補 1
- 我們想要負整數 ARS 後，電腦在空位補 1
根據上述推測
- 如果電腦照我們的想要的方式補 0 或 1 ， (fix ^ (fix >> n)) 可為 0 或正確的 mask
- 反之，(fix ^ (fix >> n)) 必須為正確的 mask ，將補位換成我們想要的值
  
  正確的 mask : ( n = 1 )
  fix = ( 1111 1111 )
  (fix ^ (fix >> n)) = ( ( 1111 1111 ) ^ ( 0111 1111 ) ) = ( 1000 0000 )

#3    const int logical = (((int) -1) OP1) > 0;

-1 = (1111 1111)

想知道電腦在負數右移會如何補位，可以利用 #3 來得知
( 只須右移 1 位便可得知，故 OP1 = >>1 )
- 若 ((int) -1) >>1) 為 ( 0111 1111 ) ，代表電腦在負整數 ARS 會補 0，logical 為 true = ( 0000 0001 )
  ( i.e. 此電腦的右移為邏輯右移 )
- 若 ((int) -1) >>1) 為 ( 1111 1111 ) ，代表電腦在負整數 ARS 會補 1，logical 為 false = ( 0000 0000 )
  ( i.e. 此電腦的右移為算術右移 )

§ 6.5 .7 B i t w i s e s h i f t o p e r a t o r s

( p. 85 )

The result of E1 >> E2 is E1 right-shifted E2 bit positions.

If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 /

2^{E 2}

. ( i.e. 正整數 ARS 不會出錯）

If E1 has a signed type and a negative value, the resulting value is implementation-defined.

#4    unsigned int fixu = -(logical & (OP2));

unsigned int a = -9;
printd("%d\n", a);

output :

-9

若 m < 0 且 logical = true ( i.e. 電腦在負數 ARS 補位的方式不正確）
- 我們想讓 fixu 為 ( 1111 1111 )
- -(logical & (OP2)) = ( 1111 1111 )
- (0000 0001) & OP2 = ( 0000 0001 ) // 2's complement
- OP2 = ( 0000 0001 ) = true
若 m >= 0 且 logical = false
- 我們想讓 fixu 為 ( 0000 0000 )
- -(logical & (OP2)) = ( 0000 0000 )
- (0000 0001) & OP2 = ( 0000 0000 ) // 2's complement and overflow
- OP2 = ( 0000 0000 ) = false
根據上面推導，可假設OP2 = m < 0
若 logical = false，OP2 = m < 0 仍成立

問題：( Form RinHizakura 同學 )

unsigned int fixu = -(logical & (m < 0));
int fix = *(int *) &fixu;

和

int fix = -(logical & (m < 0));

兩個寫法的差異性？

前者可避免編譯器進行過度 (aggressive) 最佳化

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

Q1. 進階題

實作其他資料寬度的 ASR

參照 C 語言：前置處理器應用篇撰寫通用的巨集以強化程式碼的共用程度
























#define asr_i(m,n) \
    _Generic((m), \
             int8_t: asr_i8, \
             int16_t: asr_i16, \
             int32_t: asr_i32, \
             int64_t: asr_i64 \
    )(m,n)

#define asr(type) \
    const type logical = (((type) -1) >> 1) > 0; \
    u##type fixu = -(logical & (m < 0)); \
    type fix = *(type *) &fixu; \
    return (m >> n) | (fix ^ (fix >> n))

int8_t asr_i8(int8_t m, unsigned int n){ asr(int8_t);}
int16_t asr_i16(int16_t m, unsigned int n){ asr(int16_t); }
int32_t asr_i32(int32_t m, unsigned int n){ asr(int32_t); }
int64_t asr_i64(int64_t m, unsigned int n){ asr(int64_t); }

int main(){
    ...
    asr_i(m, n);
    ...
}

#1~#7 : 根據不同的 data model，asr_i(m, n) 會使用相對應的 function
#15~#18中的asr(type) 會轉換成相對應的內容 ( i.e. #10~#13 )

問題：( Form RinHizakura 同學 )
乍想之下，無論型別都直接轉型成 64 bits 的版本處理，計算出結果後再轉回原本的型別，似乎正確性上是沒有問題的？這樣想是否完全正確呢？排除正確性的話，這樣方式的實作可能有甚麼缺點？

不是每款 C 語言編譯器都能正確處理 64 位元，且 data model 會涉及 ABI 議題

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

jserv

Q2. LeetCode 342. Power of Four (`__builtin_ctz` 實作)

假設 int 為 32 位元寬度





bool isPowerOfFour(int num)
{
    return num > 0 && (num & (num - 1))==0 &&
           !(__builtin_ctz(num) OPQ);  // OPQ = & 0x1
}

GCC extension 其中兩個是 ctz 和 clz:

Built-in Function: int builtin_ctz (unsigned int x)
- Returns the number of trailing 0-bits in x, starting at the least significant bit position. ( 返回右起第一個 1 之後的 0 的個數 )
- If x is 0, the result is undefined.
Built-in Function: int builtin_clz (unsigned int x)
- Returns the number of leading 0-bits in x, starting at the most significant bit position. ( 返回左起第一個 1 之前 0 的個數 )
- If x is 0, the result is undefined.

觀察
$4^{n}$ ：

$4^{n}$	$4^{n}$ in binary	__builtin_ctz( $4^{n}$ )
$4^{0}$	0000 0001	0
$4^{1}$	0000 0100	2
$4^{2}$	0001 0000	4
$4^{3}$	0100 0000	6
$4^{k}$	…	2 $k$

程式碼解析：

num > 0

若
$4^{n}$ 為整數，則
$4^{n}$ 必大於 0

(num & (num - 1))==0

我們可以發現，
$4^{n}$ 在二進制表示中，只會有 1 個 1

以
$4^{2}$ 為例：
(num & (num - 1)) = (( 0001 0000 ) & ( 0000 1111 ) ) = ( 0000 0000 )

!(__builtin_ctz(num) OPQ)

我們可以發現，__builtin_ctz(
$4^{n}$ ) 的值必為偶數
- 必為偶數 = 不為奇數 = 第 1 個 bit 不為 1
- 因此，OPQ = & 0x1
效能分析：

Q2. 進階題

改寫上述程式碼，提供等價功能的不同實作，儘量降低 branch 的數量





bool isPowerOfFour(int num)
{
    return  (__builtin_popcount(num) == 1) &&
           !(__builtin_ctz(num) & 0x1);
}

觀察
$4^{n}$ ，我們可以發現，只要二進制形式符合下列兩個條件，該數便是 4 的冪
- 只有 1 個 1 ，其餘全為 0
- 1 後面 0 的數量必為偶數
效能分析：
其他的改寫：
- RinHizakura 同學

int builtin_ffs (unsigned int x)
- Returns one plus the index of the least significant 1-bit of x
  ( 返回右起第一個 1 的位置 )
- if x is zero, returns zero.

LeetCode 1009. Complement of Base 10 Integer ( 利用 clz )

For a given number N in base-10, return the complement of it's binary representation as a base-10 integer.
Note that except for N = 0, there are no leading zeroes in any binary representation.







int bitwiseComplement(int N){
    if (N == 0)
        return 1;

    int mask = (1 << (32 - __builtin_clz(N))) - 1;
    return ~N & mask;
}

Input: 5
Output: 2
Explanation: 5 is "101" in binary, with complement "010" in binary, which is 2 in base-10.

程式碼解析：


#2    if (N == 0)
#3        return 1;

根據題目提示：passing zero to clz(), which is not a valid argument

#5    int mask = (1 << ( 32 - __builtin_clz(N) )) -1;
#6    return ~N & mask;

以 8 bits 為例
N = 5 = ( 0000 0101 )₂

依據題目要求，應該會需要 ~N= ~5 = ( 1111 1010 )₂ // 1s' complement
- 黃色標記的部份應該要為 0
- 需要更改的 bits 數 = m = __builtin_clz(N) = 5
因此，我們還需要一個 mask = ( 0000 0111 )₂
= ( 0000 1000 ) - 1 = ( 1 << 3 ) - 1 = ( 1 << ( 8 - m ) ) - 1
綜合上述，並推廣到 32 bits，可得：
- mask = (1 << ( 32 - __builtin_clz(N) )) - 1
- 求解即為 ~N & mask
效能分析：

LeetCode 41. First Missing Positive ( 利用 clz )

Given an unsorted integer array, find the smallest missing positive integer.
Your algorithm should run in O(n) time and uses constant extra space.

失敗版


















int firstMissingPositive(int* nums, int numsSize){
    
    int note = 0;
    for (int i = 0; i < numsSize ; i++){
        if (nums[i] > 0 && nums[i] <= numsSize){
            note |= (1 << (nums[i] -1));
        }
    }
    
    if (note==0)
        return 1;
    
    for(int j = 0 ; j < (33 - __builtin_clz(note)); j++){
        if (!((note >> j) & 0x1))
            return j+1;
    }
    return 0;
}

利用 note 的各 bit 紀錄，有哪些正整數出現過
- 若 nums 中有 4，則 note |= (0000 1000)
最後利用 !((note >> j) & 0x1) 找出最早出現 0 的位置，即為所求

runtime error: shift exponent 53 is too large for 32-bit type 'int'

顯然 bit 數根本不夠用

成功版






















#define XORSWAP(a, b) \
  ((a) ^= (b), (b) ^= (a),(a) ^= (b)) \

int firstMissingPositive(int* nums, int numsSize){

    for(int i=0;i<numsSize;i++){
        if(nums[i] > 0 && nums[i] < numsSize){
            int pos = nums[i] - 1;
            if(pos != i && nums[i] != nums[pos]){
                XORSWAP(nums[i],nums[pos]);
                i--;
            }
        }
    }

   for(int i = 0;i < numsSize;i++){
       if(nums[i] != i + 1)
           return i + 1;
   }

   return numsSize + 1;
}

Input: [1,2,0]
Output: 3

Input: [3,4,-1,1]
Output: 2

Input: [7,8,9,11,12]
Output: 1

在 Q5 中，陣列 out 有記錄 bit 為 1 所在位置的功能
- 在這題，我們讓陣列num 也擁有記錄的功能
第一次走訪的過程中 ( #6~#14 )
- 若 0 < num[i] < numsSize，則將 num[i] 和 num[num[i]-1] 互換
  - 前提：位置不同 且 內容不同 ( i.e.(pos != i && nums[i] != nums[pos]))
  - #11: 交換完之後，num[i] 還要在檢查一次 ( 因為內容有更動）
第二次走訪 ( #16~#19 )
- 若位置編號和內容對不上 ( i.e. nums[i] != i + 1 )，代表應該出現的內容為 Missing Positive
  - 對得上的情形：num[0] == 1、 num[1] == 2 etc.
- 第一個沒出現的正確數值，即為 First Missing Positive

研讀 2017 年修課學生報告，理解 clz 的實作方式，並舉出 Exponential Golomb coding 的案例說明

Q3. 利用 `__builtin_popcount` 來實作 LeetCode 1342. Number of Steps to Reduce a Number to Zero

Given a non-negative integer num, return the number of steps to reduce it to zero.
- If the current number is even, you have to divide it by 2,
- otherwise, you have to subtract 1 from it.
假設 int 為 32 位元寬度






int numberOfSteps (int num)
{
    return num ? AAA + 31 - BBB : 0;
    // AAA = __builtin_popcount(num)
    // BBB = __builtin_clz(num)
}

population count 簡稱 popcount 或叫 sideways sum，
- 計算數值的二進位表示中，有多少位元是 1
- 在一些場合下很有用，例如計算 0-1 稀疏矩陣 (sparse matrix)或 bit array 中非 0 元素個數、計算兩個字串的 Hamming distance。
GCC 提供對應的內建函式: __builtin_popcount(x)

以 14 為例：

14 = ( 0000 1110 ) is even; divide by 2 and obtain 7.

7 = ( 0000 0111 ) is odd; subtract 1 and obtain 6.

6 = ( 0000 0110 ) is even; divide by 2 and obtain 3.

3 = ( 0000 0011 ) is odd; subtract 1 and obtain 2.

2 = ( 0000 0010 ) is even; divide by 2 and obtain 1.

1 = ( 0000 0001 ) is odd; subtract 1 and obtain 0.

由上述例子，可以有以下推論：
- 1 所在的最高位元是第 4 個bit，代表需要 divide by 2 ( i.e. 右移 ) 3 次
  - 最高位元的 1 只須 subtract 1 ，不用 divide by 2
- 總共有 3 個 1 ，代表需要 subtract 1 3次
- 上面兩項加總，3 + 3 = 6 即為所求
更廣泛的推論：
- 1 所在的最高位元是第 m 個bit，代表需要 divide by 2 ( i.e. 右移 ) ( m - 1 ) 次
  - 最高位元的 1 只須 subtract 1 ，不用 divide by 2
- 總共有 k 個 1 ，代表需要 subtract 1 k次
- 上面兩項加總，( m - 1 ) + k 即為所求
程式碼：
- ( m - 1 ) = ( 32 - __builtin_clz(num) ) - 1
- k = __builtin_popcount(num)
- ( m - 1 ) + k = [ ( 32 - __builtin_clz(num) ) - 1 ] + __builtin_popcount(num)
  = __builtin_popcount(num) + 31 - __builtin_clz(num)
故 AAA = builtin_popcount(num)、BBB = builtin_clz(num)

Q3. 進階題

改寫程式碼 ( 利用 Bit Twiddling Hacks )

提示: Bit Twiddling Hacks 中提及許多 bitwise operation 的使用，如 bit inverse、abs 等等，是極好的參考材料。

改寫程式碼 ( 避免用到編譯器的擴充功能，只用 C99 特徵及 bitwise 運算 )

Q4. 64-bit GCD (greatest common divisor, 最大公因數)

Greatest Common Divisor 特性和實作考量

有使用 `％` 的版本

#include <stdint.h>
__uint64_t gcd64(__uint64_t u, __uint64_t v) {
    /* case 1 : gcd(a, 0) = a */
    if (!u || !v) return u | v; 
    
    while (v) {                               
        __uint64_t t = v;
        v = u % v;
        u = t;
    }
    return u;
}

$gcd (a, 0) = | a |$
$gcd (a, b) = gcd (b, a)$
$gcd (a, b) = gcd (a, b % a)$

不使用 `％` 的版本

Binary GCD

Binary GCD Algorithm
























#include <stdint.h>
__uint64_t gcd64(__uint64_t u, __uint64_t v) {
    /* case 1 : gcd(a, 0) = a */
    if (!u || !v) return u | v;
    
    int shift;
    for (shift = 0; !((u | v) & 1); shift++) {
        u /= 2, v /= 2;
    }
    while (!(u & 1))
        u /= 2;
    do {
        while (!(v & 1))
            v /= 2;
        if (u < v) {
            v -= u;
        } else {
            __uint64_t t = u - v;
            u = v;
            v = t;
        }
    } while (XXX); // XXX = v
    return YYY; // YYY = u << shift
}

程式碼解析：

#6    int shift;
#7    for (shift = 0; !((u | v) & 1); shift++) {
#8        u /= 2, v /= 2;
#9    }

binary gcd 的精神就是當兩數為偶數時，必有一
$2$ 因子。

$gcd (u, v) = 2 \cdot gcd (\frac{u}{2}, \frac{v}{2})$
shift 紀錄有幾個
$2$ 因子

#10    while (!(u & 1))
#11        u /= 2;

且一數為奇數另一數為偶數，則無
$2$ 因子。

$gcd (u, v) = gcd (u, \frac{v}{2})$

#12    do {
#13        while (!(v & 1))
#14            v /= 2;
#15        if (u < v) {
#16            v -= u;
#17        } else {
#18            __uint64_t t = u - v;
#19            u = v;
#20            v = t;
#21        }
#22    } while (XXX);

假設
$u >= v$ ，總結一些定理：

$u$	$v$	$g c d (u, v)$
$u$	$0$	$u$
$e v e n$	$e v e n$	$2 * g c d (\frac{u}{2}, \frac{v}{2})$
$e v e n$	$o d d$	$g c d (\frac{u}{2}, v)$
$o d d$	$e v e n$	$g c d (u, \frac{v}{2})$
$o d d$	$o d d$	$g c d (\frac{u - v}{2}, v)$

#12~#22 便是不斷操作
$g c d (u, v) = g c d (\frac{u - v}{2}, v)$ 的部份，直到 v = 0 為止
因此， XXX = v

#23     return YYY;

#12~#22 結束後，v = 0
$g c d (u, 0) = u$ ，因此 YYY = u << shift
- 別忘記 #6~#9 得到 shift 個 2 因子

Q4. 進階題

在 x86_64 上透過 `__builtin_ctz` 改寫 GCD




























#include <stdint.h>

#define min(x, y) \
    ((x) < (y) ? (x) : (y)) \

__uint64_t gcd64(__uint64_t u, __uint64_t v) {
    /* case 1 : gcd(a, 0) = a */
    if (!u || !v) return u | v;
    
    int shift = min(__builtin_ctz(u), __builtin_ctz(v));;
    u >>= shift; 
    v >>= shift;
    
    while (!(u & 1))
        u >>= 1;
    do {
        while (!(v & 1))
            v >>= 1;
        if (u < v) {
            v -= u;
        } else {
            __uint64_t t = u - v;
            u = v;
            v = t;
        }
    } while (v); 
    return u << shift ;
}

我們可以改良為： ( #10~#12 )

$even_factor = min (ctz (x), ctz (y))$

$gcd (x, y) = even_factor \cdot gcd ((x ≫ even_factor), (y ≫ even_factor))$

分析對效能的提升

Q5. bit array (也稱 bitset) 在影像處理的使用

考慮以下程式碼 :














#include <stddef.h>
size_t naive(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
    size_t pos = 0;
    for (size_t k = 0; k < bitmapsize; ++k) {
        uint64_t bitset = bitmap[k];
        size_t p = k * 64;
        for (int i = 0; i < 64; i++) {
            if ((bitset >> i) & 0x1)
                out[pos++] = p + i;
        }
    }
    return pos;
}

程式碼解析：
由 uint64_t *bitmap、bitmap[k]，可以判斷 bitmap 為 uint64_t Array
由 uint32_t *out、out[pos++]，可以判斷 out 為 uint32_t Array

§ 6.4 .4 .4 C h a r a c t e r c o n s t a n t s

( p. 61 )

A wide character constant has type wchar_t, an integer type defined in the <stddef.h> header.
The value of a wide character constant containing a single multibyte character that maps to a member of the extended execution character set is the wide character corresponding to that multibyte character, as defined by the mbtowc function, with an implementation-defined current locale.
The value of a wide character constant containing more than one multibyte character, or containing a multibyte character or escape sequence not represented in the extended execution character set, is implementation-defined.

#04    size_t pos = 0;
#05    for (size_t k = 0; k < bitmapsize; ++k) {
#06        uint64_t bitset = bitmap[k];

每 64 個 bits 一組進行迴圈

#07        size_t p = k * 64;
#08        for (int i = 0; i < 64; i++) {
#09            if ((bitset >> i) & 0x1)
#10                out[pos++] = p + i;
                // out[pos] = p + i;
                // pos +=1 ;
#11        }

#09 : 逐一檢查 bitset 的 64 個 bits ，只要發現該 bit 為 1 (i.e. ((bitset >> i) & 0x1) 為 true )，便進入 #10
#10 : 將 p + i 寫入 out[pos++]
- #07 : bitset 的第 1 個 bit = bitmap 中的第 p 個 bit
- 1 出現在第p + i個 bit 中 ( 以 bitmap 的角度來看 )
- 將出現 1 的編號 ( i.e. p + i ) 紀錄於陣列out
- 可推理出 pos 代表 bitmap 中，出現 1 的個數
  
  array 由 0 起算；人類由 1 起算
  因此，需要 pos++

`__builtin_ctzll` 改進版 :
















#include <stddef.h>
size_t improved(uint64_t *bitmap, size_t bitmapsize, uint32_t *out)
{
    size_t pos = 0;
    uint64_t bitset;
    for (size_t k = 0; k < bitmapsize; ++k) {
        bitset = bitmap[k];
        while (bitset != 0) {
            uint64_t t = KKK; // KKK = bitset & -bitset
            int r = __builtin_ctzll(bitset);
            out[pos++] = k * 64 + r;
            bitset ^= t;                           
        }
    }
    return pos;
}

GCC 提供對應的內建函式:

builtin_ctzll (unsigned long long)
- 返回右起第一個 1 之後的 0 的個數

程式碼解析：

#08        while (bitset != 0){

第一個改良：若 bitset 之中沒有 1 ( i.e. 全為 0 )，則不用進行檢查，可直接進入下一組

#10            int r = __builtin_ctzll(bitset);
#11            out[pos++] = k * 64 + r;

#10、#11 可將 bitmap 中出現 1 的 bit 所在 index 紀錄於陣列 out 中

#09            uint64_t t = KKK; 

#12            bitset ^= t;

觀察程式碼，我們猜測 #12 可將 bitset 中最低位的 1 換成 0，如此一來，下個 while 迴圈才能正確執行

以 8 bits 為例
- 假設 bitset = ( **** *100 )
- t 應為 ( 0000 0100 )
- -bitset = (
  $\bar{*}$
  $\bar{*}$
  $\bar{*}$
  $\bar{*}$
  $\bar{*}$ 100 ) // 2's complement
- 故，t = KKK = bitset & -bitset

2020q3 Homework3 (quiz3)

tags: `RusselCK`

Q1.在二補數系統實作跨平台有號整數的 arithmetic right shift ( ASR )

Q1. 進階題

實作其他資料寬度的 ASR

Q2. LeetCode 342. Power of Four (`__builtin_ctz` 實作)

Q2. 進階題

改寫上述程式碼，提供等價功能的不同實作，儘量降低 branch 的數量

LeetCode 1009. Complement of Base 10 Integer ( 利用 clz )

LeetCode 41. First Missing Positive ( 利用 clz )

失敗版

成功版

研讀 2017 年修課學生報告，理解 clz 的實作方式，並舉出 Exponential Golomb coding 的案例說明

Q3. 利用 `__builtin_popcount` 來實作 LeetCode 1342. Number of Steps to Reduce a Number to Zero

Q3. 進階題

改寫程式碼 ( 利用 Bit Twiddling Hacks )

改寫程式碼 ( 避免用到編譯器的擴充功能，只用 C99 特徵及 bitwise 運算 )

Q4. 64-bit GCD (greatest common divisor, 最大公因數)

有使用 `％` 的版本

不使用 `％` 的版本

Q4. 進階題

在 x86_64 上透過 `__builtin_ctz` 改寫 GCD

分析對效能的提升

Q5. bit array (也稱 bitset) 在影像處理的使用

考慮以下程式碼 :

`__builtin_ctzll` 改進版 :

Q5. 進階題

舉出 Q5 用在哪些真實案例中

檢驗 clz 改寫的程式碼相較原本的實作有多少改進

思考進一步的改進空間

2020q3 Homework3 (quiz3)

tags: RusselCK

Q1.在 二補數系統 實作跨平台 有號整數的 arithmetic right shift ( ASR )

Q1. 進階題

實作其他資料寬度的 ASR

Q2. LeetCode 342. Power of Four (__builtin_ctz 實作)

Q2. 進階題

改寫上述程式碼，提供等價功能的不同實作，儘量降低 branch 的數量

LeetCode 1009. Complement of Base 10 Integer ( 利用 clz )

LeetCode 41. First Missing Positive ( 利用 clz )

失敗版

成功版

研讀 2017 年修課學生報告，理解 clz 的實作方式，並舉出 Exponential Golomb coding 的案例說明

Q3. 利用 __builtin_popcount 來實作 LeetCode 1342. Number of Steps to Reduce a Number to Zero

Q3. 進階題

改寫程式碼 ( 利用 Bit Twiddling Hacks )

改寫程式碼 ( 避免用到編譯器的擴充功能，只用 C99 特徵及 bitwise 運算 )

Q4. 64-bit GCD (greatest common divisor, 最大公因數)

有使用 ％ 的版本

不使用 ％ 的版本

Q4. 進階題

在 x86_64 上透過 __builtin_ctz 改寫 GCD

分析對效能的提升

Q5. bit array (也稱 bitset) 在影像處理的使用

考慮以下程式碼 :

__builtin_ctzll 改進版 :

Q5. 進階題

舉出 Q5 用在哪些真實案例中

檢驗 clz 改寫的程式碼相較原本的實作有多少改進

思考進一步的改進空間

Read more

VSCode 套件安裝

ARGB To BW

深入淺出物件導向分析與設計 筆記

TCL 讀書會(?

tags: `RusselCK`

Q1.在二補數系統實作跨平台有號整數的 arithmetic right shift ( ASR )

Q2. LeetCode 342. Power of Four (`__builtin_ctz` 實作)

Q3. 利用 `__builtin_popcount` 來實作 LeetCode 1342. Number of Steps to Reduce a Number to Zero

有使用 `％` 的版本

不使用 `％` 的版本

在 x86_64 上透過 `__builtin_ctz` 改寫 GCD

`__builtin_ctzll` 改進版 :

深入淺出物件導向分析與設計筆記