2020q3 Homework (quiz8)

contributed by < shauming1020 >

tags: `homework`

待釐清的議題 quiz8

測驗 1

狀態壓縮、Dynamic Programming
理解遞迴版本 recursive-nos1.c
說明 interative-nos1.c 和 interative-nos2.c 之間的差異，為何 interative-nos2.c 執行時間較短呢 ?
嘗試實作不同上述的程式碼 (限制為 C99/C11 + GNU extensions)，應比較遞迴和非遞迴的形式在效能的落差，並分析時間與空間複雜度
練習 LeetCode 1595. Minimum Cost to Connect Two Groups of Points

測驗 2

解釋上述程式碼運作原理
探討用 hash 替換 Skip List 裡頭 random 的效益和潛在的議題
實作 delete 操作
研讀 Cache-Sensitive Skip List: Efficient Range Queries on modern CPUs，指出考慮到現代處理器架構的 Skip List 實作，有哪些考量和取捨
練習 LeetCode 1206. Design Skiplist，除了提交 C 語言的實作並通過自動評分系統，也該提供正確性測試和效能分析

測驗 1

LeetCode 1494. Parallel Courses II 給定一整數

n

表示某所大學裡課程的數目，編號為

1

到

n

，給定陣列 dependencies，使得

d e p e n d e n c i e s [i] = [x_{i}, y_{i}]

表示先修課的關係，亦即課程

x_{i}

必須在課程

y_{i}

之前選修並通過。另外，給予一整數

k

，表示在一個學期裡頭，你最多可同時選修

k

門課，前提是這些課的先修課在之前的學期裡已修過。程式應該回傳上完所有課最少需要多少個學期。題目保證一定存有一種修完所有課的方式。

範例 1.

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

給定輸入:
n = 4 (即 4 門課程)
dependencies =
$[[2, 1], [3, 1], [1, 4]]$

即選修 1 這門課，應當先修過 2 和 3 這二門課，又，選修 4 之前應當先修過 1 這門課
k = 2 (一個學期中，你至多可同時選修 2 門課程)

預期輸出 : 3
上圖展現可能的修課方式: 第一個學期選修課程 2 和課程 3，然後第二個學期選修課程 1 ，第三個學期選修課程 4。

範例 2.

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

給定輸入:
n = 5 (即 5 門課程)
dependencies =
$[[2, 1], [3, 1], [4, 1], [1, 5]]$

即選修 1 這門課，應當先修過 2, 3 和 4 這三門課，又，選修 5 之前應當先修過 1 這門課
k = 2 (一個學期中，你至多可同時選修 2 門課程)

預期輸出: 4
上圖展現可能的修課方式: 第一學期選修課程 2 和 3 (注意一學期僅能同時選修 2 門課程)，第二學期選修課程 4，第三學期選修課程 1，第四學期選修課程 5

本題限制:

$1 \leq n \leq 15$
$1 \leq k \leq n$
$0 \leq d e p e n d e n c i e s . l e n g t h \leq \frac{n \times (n - 1)}{2}$
$d e p e n d e n c i e s [i] . l e n g t h = 2$
$1 \leq x_{i}, y_{i} \leq n$
$x_{i} \neq y_{i}$
所有先修關係都不同，也就是說
$d e p e n d e n c i e s [i] \neq d e p e n d e n c i e s [j]$
題目輸入的圖是個有向無環圖

Dynamic Programming(DP) 本質是暴力法 (brute force; complete search)，但因它採用表格欄位去紀錄答案 (memorization)，所以每個 subproblem 僅運算一次即可。

狀態壓縮是利用二進位來表達中間的狀態，也就是說，狀態只能是 0 或 1，若用集合來思考，就是「在」或「不在」集合中。此手法適用於資料量不大的案例。先用陣列儲存之前的狀態，再由狀態及其對應的數值，推導出狀態轉移方程式，最終可得適當的解法。

以本題來說，

n

至多為

15

，而且一門課「選」或者「不選」即可運用二進位的位元表示，符合上述的狀態壓縮技巧。演算法如下:

計算每個課程的直接相依課程的 bit mask (位元遮罩)，記為 pre
令狀態
$f (S)$ 表示完成位元遮罩
$S$ 的課程所需要的最少學期數
在初始階段，
$f (0) = 0$ ，其餘為位元遮罩有效範圍以外的數值
狀態移轉時，從某個已修過的課程位元遮罩
$S_{0}$ ，嘗試求出
$S_{1}$ ，後者表示目前可選擇的新課程（新課程也該滿足相依條件)，再從
$S_{1}$ 逐次找出小於等於
$k$ 門課程，其位元遮罩記為
$S$ ，移轉關係為
$f (S_{0} \lor S) = m i n (f (S_{0} \lor S), f (S_{0}) + 1)$ 。
最終解為
$f ((1 ≪ n) - 1)$

而 Dynamic Programming(DP) 在本題的使用時機為 …

延伸問題 1

解釋上述程式碼運作原理，為何 interative-nos2.c 會比 interative-nos1.c 有更短的執行時間呢？interative-nos2.c 可如何改進？

我們先從遞迴版本 recursive-nos1.c 來開始理解
















































#include <string.h>      
#include <limits.h>

#define min(a, b) (((a) < (b)) ? (a) : (b))
static void solve(int i, int s, int k, int n, int s0, int s1, int *dp)
{
    if ((k == 0) || (i == n)) {
        dp[s0 | s] = min(dp[s0 | s], dp[s0] + 1);
        return;
    }

    solve(i + 1, s, k, n, s0, s1, dp);

    if ((s1 >> i) & 1)
        solve(i + 1, s | 1 << i, k - 1, n, s0, s1, dp);
}

int minNumberOfSemesters(int n,
                         int **dependencies,
                         int dependenciesSize,
                         int *dependenciesColSize,
                         int k)
{
    const size_t cap = 1 << n;
    int pre[cap];
    memset(pre, 0, sizeof(pre));
    for (int i = 0; i < dependenciesSize; i++)
        pre[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);

    int dp[cap];
    for (int i = 1; i < cap; i++)
        dp[i] = INT_MAX;
    dp[0] = 0;

    for (int s0 = 0; s0 < cap; s0++) {
        if (dp[s0] == INT_MAX)
            continue;

        int s1 = 0;
        for (int i = 0; i < n; i++)
            if (!((s0 >> i) & 1) && ((pre[i] & s0) == pre[i]))
                s1 |= 1 << i;

        solve(0, 0, k, n, s0, s1, dp);
    }

    return dp[cap - 1];
}

範例 1

Image Not Showing Possible Reasons
The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported
Learn More →

n = 4，k = 2，dependencies =
$[[2, 1], [3, 1], [1, 4]]$

狀態 cap = 0010 表示已修完課程 2 ，cap = 1111 表示已修完所有課程。
```
 const size_t cap = 1 << n;
 int pre[cap];
 
 for (int i = 0; i < dependenciesSize; i++)
     pre[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);
```
宣告 pre[cap] 來記錄所有狀態 cap 的先修課程，其中 cap 大小為
$2^{n} = 16, w h e r e n = 4$ ，
用 OR 運算來選課，例如 pre[0] = 0110，表課程 1 的先修課為 2 和 3，
$0000 | 0100 | 0010 = 0110$ ，
pre[3] = 0001，表課程 4 的先修課為 1，而其他狀態( 如沒有先修課的課程) 皆被設成 0。
Note: 共有 n 門課，因此將 pre 的大小設為 n 即可。
```
 int dp[cap];
 for (int i = 1; i < cap; i++)
     dp[i] = INT_MAX;
 dp[0] = 0;
```
使用陣列 dp 來存放已計算過的某狀態所需最少修課學期，
例如 dp[2] = dp[0010] = 1，表修完課程 2 最少需要 1 個學期，而當修完課程 1 時，也已完成先修課 2 和 3，
因此 dp[7] = dp[0111] = 2 = min(dp[0111], dp[0110] + 1)，dp[1] = dp[0001] = NULL，
故 dp[15] = dp[1111] = dp[cap - 1] 則是我們欲求的修完所有課程所需最少學期數。
由於我們要找出最小值，因此初始化 dp 的每個值為極大值 INT_MAX。

 for (int s0 = 0; s0 < cap; s0++) {
     if (dp[s0] == INT_MAX)
         continue;

     int s1 = 0;
     for (int i = 0; i < n; i++)
         if (!((s0 >> i) & 1) && ((pre[i] & s0) == pre[i]))
             s1 |= 1 << i;

     solve(0, 0, k, n, s0, s1, dp);
 }

遍歷所有修課狀態 s0，0000 -> 0001 -> 0010 -> 0011 -> … -> 1111

```
     if (dp[s0] == INT_MAX)
         continue;
```
若 s0 為合法的狀態，其 dp[s0] 不會是初值 INT_MAX，因為已經透過 solve 函式解出其最少所需的學期數並存於 dp 中。

     int s1 = 0;
     for (int i = 0; i < n; i++)
         if (!((s0 >> i) & 1) && ((pre[i] & s0) == pre[i]))
             s1 |= 1 << i;

s0 表當前已修完課程的狀態，s1 表 s0 可選的新課程，例如 s0 = 0000, s1 = 0110、s0 = 0100, s1 = 0010、s0 = 0110, s1 = 0001
…

 static void solve(int i, int s, int k, int n, int s0, int s1, int *dp)
 {
     if ((k == 0) || (i == n)) {
         dp[s0 | s] = min(dp[s0 | s], dp[s0] + 1);
         return;
     }

     solve(i + 1, s, k, n, s0, s1, dp);

     if ((s1 >> i) & 1)
         solve(i + 1, s | 1 << i, k - 1, n, s0, s1, dp);
 }

接下來從狀態 s0 開始遍歷，從可選的新課程 s1 中選至多 k 門課得到欲選課程 s，而修了課程後的狀態 [s0|s] ，其最少修課學期數為：比較修課狀態 s0 的最少修課學期數 +1 與原 [s0|s] 的最少修課學期數。

引數說明 :
1. 位移 i 位元來檢查 s1 還有哪些課可選 (s1 >> i) & 1 ，有的話則選這些課 s | 1 << i 來繼續遞迴求解
2. 從可選的新課程 s1 中選了s 課 (也可不選，即 s = 0000)
3. 尚可修 k 門課
4. 共 n 門課，因此至多位移 n 位元
5. 從修課狀態 s0 開始遍歷
6. s1 為 s0 可選的新課程

e.g.

// 從 s0 = 0000 開始遍歷, 可選新課程 s1 = 0110
solve s0=(nil), s1=0x6  
...
// 選了課程 s = 2, 剩下 1 門課可選, 
i=2, s=2, k=1, n=4, s0=(nil), s1=0x6, dp=(nil) 
i=3, s=2, k=1, n=4, s0=(nil), s1=0x6, dp=(nil) 
i=4, s=2, k=1, n=4, s0=(nil), s1=0x6, dp=(nil) 

// 終止條件為 1. 可位移的值為 n 時, i.e, i == n 
// 則去看修了課程 s 的最小學期數需不需要 update
set dp[0|0x2] = dp[0x2] = (nil) 

// 選了兩門課 s = 0110
i=3, s=6, k=0, n=4, s0=(nil), s1=0x6, dp=(nil) 

// 終止條件為 或是 2. 沒課可選時, i.e. k == 0
// 去看修了課程 s 的最小學期數需不需要 update
set dp[0|0x6] = dp[0x6] = (nil) 
...

接下來探討非遞迴版本 iterative-nos1.c

































#define min(a, b) (((a) < (b)) ? (a) : (b))

int minNumberOfSemesters(int n,
                         int **dependencies,
                         int dependenciesSize,
                         int *dependenciesColSize,
                         int k)
{
    uint16_t prerequisite[n];
    memset(prerequisite, 0, sizeof(prerequisite));
    for (int i = 0; i < dependenciesSize; ++i)
        prerequisite[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);

    const int cap = 1 << n;                   
    uint8_t dp[cap];
    memset(dp, 1 << 6, cap);
    dp[0] = 0;

    for (uint32_t i = 0; i < cap; ++i) {
        uint32_t ready_for_take = 0;
        for (uint32_t j = 0; j < n; ++j) {
            if ((prerequisite[j] & i) == prerequisite[j])
                ready_for_take |= 1 << j;
        }

        ready_for_take &= ~i;
        for (uint32_t j = ready_for_take; j; j = ready_for_take & (j - 1)) {
            if (__builtin_popcount(j) <= k)
                dp[i | j] = min(dp[i | j], dp[i] + 1);
        }
    }
    return dp[cap - 1];
}

範例 2

n = 5，k = 2，dependencies =
$[[2, 1], [3, 1], [4, 1], [1, 5]]$

與遞迴版本一樣， dp 紀錄狀態 cap 所需最少學期個數，用 1<<6 = 64 作為 dp 的初始值， prerequisite 紀錄狀態 cap 的先修課程。

prerequisite[0] = 01110
prerequisite[1] = 00000
prerequisite[2] = 00000
prerequisite[3] = 00000
prerequisite[4] = 00001

```
 for (uint32_t i = 0; i < cap; ++i) {
     uint32_t ready_for_take = 0;
     for (uint32_t j = 0; j < n; ++j) {
         if ((prerequisite[j] & i) == prerequisite[j])
             ready_for_take |= 1 << j;
     }

     ready_for_take &= ~i;
     for (uint32_t j = ready_for_take; j; j = ready_for_take & (j - 1)) {
         if (__builtin_popcount(j) <= k)
             dp[i | j] = min(dp[i | j], dp[i] + 1);
     }
 }
```
遍歷所有狀態，根據當前狀態 i 下學期可選的課程 ready_for_take，去求出選了其中至多 k 門課的最少學期數。
- ```
 for (uint32_t j = 0; j < n; ++j) {
     if ((prerequisite[j] & i) == prerequisite[j])
         ready_for_take |= 1 << j;
 }
```
  逐一比對狀態 i 是否已包含課程 j 的先修課程條件，滿足的話則將 j 課納入候選課名單 ready_for_take，否則，將那些沒有先修課的課程納入候選名單，
  因此執行完 ready_for_take &= ~i 會得到狀態 i 的下學期可選課程，
  e.g.
  修課狀態 i = 01110 符合課程 j = 0 的先修課條件 prerequisite[0] = 01110，因此 ready_for_take = 01111，ready_for_take&= ~i => 00001，
  修課狀態 i = 00110 皆不符合課程 j = 0, 4 的先修課條件，因此將沒有先修限制的課程納入 ready_for_take = 01110，ready_for_take&= ~i => 01000。
- ```
    for (uint32_t j = ready_for_take; j; j = ready_for_take & (j - 1)) {
        if (__builtin_popcount(j) <= k)
            dp[i | j] = min(dp[i | j], dp[i] + 1);
    }
```
  for (j = ready_for_take; j; j = ready_for_take & (j - 1))
  列出 ready_for_take 所有位元 1 的位置排列組合
  e.g.
  j = 11110 = ready_for_take
  j = 11100
  j = 11010
  …
  j = 00100
  j = 00010
  
  從可選課程 ready_for_take 中挑出滿足至多同時選課數 k 的課程，計算選了這些課的修課狀態 [i | j] 之最少所需學期數。
- 不合法修課的狀態如 i = 00001, i = 11000, i = 10010, ..., etc，其 dp 值始終為初始值 64，這是因為唯有滿足先修課條件的課程狀態才有機會更新 dp 值，而一切都從初始課程狀態 i = 00000 開始，其 dp 值為 0。
參數說明：
1. 修課狀態 i
2. 下學期可選課程 ready_for_take

比較另一個非遞迴版本 iterative-nos2.c



















































#define min(a, b) (((a) < (b)) ? (a) : (b))
int minNumberOfSemesters(int n,
                         int **dependencies,
                         int dependenciesSize,
                         int *dependenciesColSize,
                         int k)
{
    const size_t cap = 1 << n;

    /* pre[i]: the bit representation of all dependencies of course i */
    uint16_t pre[cap];
    memset(pre, 0, sizeof(pre));
    for (int i = 0; i < dependenciesSize; i++)
        pre[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);

    /* i is the bit representation of a combination of courses.
     * dp[i] is the minimum days to complete all the courses.
     */
    uint8_t dp[cap];
    memset(dp, INIT, sizeof(dp));
    dp[0] = 0;

    uint8_t count[cap];
    memset(count, 0, sizeof(count));
    for (uint32_t i = 0; i < cap; i++) {
        for (uint32_t j = 0; j < n; j++)
            if (i & (1 << j))
                count[i]++;
    }

    for (uint32_t i = 0; i < cap; i++) {
        uint32_t b = 0;
        /* figure out "b", the bit representation of the all courses
         * that we can study now. Since we have i and pre[j], we know
         * course j can be studied if i contains all its prerequisites.
         */
        for (uint32_t j = 0; j < n; j++) {
            if (COND && (pre[j] & i) == pre[j])
                b |= 1 << j;
        }
        if (count[b] <= k) {
            dp[i | b] = min(dp[i | b], dp[i] + 1);
        } else {
            for (uint32_t j = b; j; j = (j - 1) & b) {
                if (count[j] == k)
                    dp[i | j] = min(dp[i | j], dp[i] + 1);
            }
        }
    }
    return dp[cap - 1];
}

pre 紀錄狀態 cap 的先修課程，用 INIT 作為 dp 的初始值，count 紀錄每個狀態有幾個位元 1。
e.g.
```
count[(nil)]=0 
count[0x1]=1 
count[0x2]=1 
count[0x3]=2 
...
```
Note: 題目有給予課程數量的限制：至多 15 門課，因此最壞情況為每門課恰只有一門先修課，至多需要花費 15 學期，如此一來 INIT 只需大於等於 0xF 即可，選項中只有 0x5F 滿足此條件。

 for (uint32_t i = 0; i < cap; i++) {
     uint32_t b = 0;
     
     for (uint32_t j = 0; j < n; j++) {
         if (!(i & (1 << j)) && (pre[j] & i) == pre[j])
             b |= 1 << j;
     }
     if (count[b] <= k) {
         dp[i | b] = min(dp[i | b], dp[i] + 1);
     } else {
         for (uint32_t j = b; j; j = (j - 1) & b) {
             if (count[j] == k)
                 dp[i | j] = min(dp[i | j], dp[i] + 1);
         }
     }
 }

```
   for (uint32_t j = 0; j < n; j++) {
       if (!(i & (1 << j)) && (pre[j] & i) == pre[j])
           b |= 1 << j;
   }
```
對修課狀態 i 計算下學期可選課程 b，即非遞迴版本 1 的 ready_for_take，
若滿足下列條件
1. (pre[j] & i) == pre[j] : 修課狀態 i 已完成課程 j 的先修課程條件。
2. !(i & (1 << j)) : 且課程 j 還沒選，即 COND。
則將課程 j 納入候選課名單 b |= 1 << j。

 if (count[b] <= k) {
         dp[i | b] = min(dp[i | b], dp[i] + 1);
     } else {
         for (uint32_t j = b; j; j = (j - 1) & b) {
             if (count[j] == k)
                 dp[i | j] = min(dp[i | j], dp[i] + 1);
         }
     }

與版本 1 不同，透過預先計算好所有狀態的課程數 count，即位元 1 的個數，利用查表的方式來避免多次 __builtin_popcount 的運算，因此會比 iterative-nos1.c 快。

改進 iterative-nos2.c

在 LeetCode 提交後的執行結果
Runtime: 4 ms
Memory Usage: 5.9 MB














































#define min(a, b) (((a) < (b)) ? (a) : (b))
int minNumberOfSemesters(int n,
                         int **dependencies,
                         int dependenciesSize,
                         int *dependenciesColSize,
                         int k)
{
    if (k == 1) return n;
    
    const size_t cap = 1 << n;

    uint16_t pre[cap];
    memset(pre, 0, sizeof(pre));
    for (int i = 0; i < dependenciesSize; i++)
        pre[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);


    uint8_t dp[cap];
    memset(dp, 0xf, sizeof(dp));
    dp[0] = 0;

    uint8_t count[cap];
    memset(count, 0, sizeof(count));
    for (uint32_t i = 0; i < cap; i++) {
        count[i] = __builtin_popcount(i);
    }

    for (uint32_t i = 0; i < cap; i++) {
        if (dp[i] == 0xf) continue;
        
        uint32_t b = 0;
        for (uint32_t j = 0; j < n; j++) {
            if (!(i & (1 << j)) && (pre[j] & i) == pre[j])
                b |= 1 << j;
        }
        if (count[b] <= k) {
            dp[i | b] = min(dp[i | b], dp[i] + 1);
        } else {
            for (uint32_t j = b; j; j = (j - 1) & b) {
                if (count[j] == k)
                    dp[i | j] = min(dp[i | j], dp[i] + 1);
            }
        }
    }
    return dp[cap - 1];
}

用 __builtin_popcount() 取代 for 運算來初始 count

    uint8_t count[cap];
    for (uint32_t i = 0; i < cap; i++) {
        count[i] = __builtin_popcount(i);
    }

對於非法狀態 i ，其 dp[i] 為 0xf，直接忽略來省下不必要的計算，if (dp[i] == 0xf) continue;。
當每學期只能選一門課時，即 k = 1，因此修完所有課需要 n 學期， if (k == 1) return n;。

延伸問題 2

嘗試實作不同上述的程式碼 (限制為 C99/C11 + GNU extensions)，應比較遞迴和非遞迴的形式在效能的落差

不使用 DP 的話，可以用比較直覺的深度或廣度優先搜尋演算，去搜尋最少完成修課學期數。

範例 2

n = 5，k = 2，dependencies =
$[[2, 1], [3, 1], [4, 1], [1, 5]]$

BFS (非遞迴)































































#define min(a, b) (((a) < (b)) ? (a) : (b))

int minNumberOfSemesters(int n,
                         int **dependencies,
                         int dependenciesSize,
                         int *dependenciesColSize,
                         int k)
{
    if (k == 1) return n;

    uint16_t pre[n];
    memset(pre, 0, sizeof(pre));
    for (int i = 0; i < dependenciesSize; ++i)
        pre[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);

    const int cap = 1 << n;                   

    int Q[cap], T[cap];
    memset(Q, 0, cap);
    memset(T, 0, cap);
    Q[0] = 0;
    T[0] = 0;

    int s, t_size = 0, q_size = 1, done =  cap - 1;

    // bfs with queue Q
    for(int c = 1; c <= n; c++) {
        memset(T, 0, cap); // T = set()
        t_size = 0;
        for (int a = 0;  a < q_size; a++) { // for s in S :
            s = Q[a];

            /* ready for take course */
            uint32_t ready_for_take = 0;
            for (int i = 0; i < n; i++) {
                if (pre[i] == (s & ((1 << i) | pre[i])))
                    ready_for_take |= 1 << i;
            }

            /* take the most k courses */
            int m = min(__builtin_popcount(ready_for_take), k);
             for (uint32_t r = ready_for_take; r; r = ready_for_take & (r - 1)) {
                 if (__builtin_popcount(r) == m) { 
                     /* check set Q, T.push(t=r|s)*/
                     int z = 1;
                     for (int i = 0; i < t_size && z; i++)
                        if (T[i] == (r|s)) z = 0;
                    if (z) T[t_size++] = r | s;
                 } 
             }           
        }

        /* if done in T */
        for (int i = 0; i < t_size; i++) {
            Q[i] = T[i];
            if (T[i] == done) 
                return c;
        }
        /* Q = T */
        q_size = t_size;
    }
    return -1;
}

從學期數 1 開始搜尋完成所有課程的狀態 (1<<n) - 1，直到學期數 n 為止。
從佇列 Q 中 pop 一個待走訪的狀態 s，根據 s 的下學期可選課程 ready_for_take 去選擇至多 k 門課 r，得到下學期的修課狀態 t = r | s，並 push 進下學期的修課狀態佇列 T 中。
計算完所有的下學期修課狀態後 (存於佇列 T)，去查看完成所有課程的狀態 (1<<n) - 1 有無在 T 之內，有的話則回傳當前修課學期，否則，佇列 T 成為待走訪的佇列 Q，接續往下學期搜尋。

Pseudo Code


















queue Q = new queue;
Q.push(0);
for semester = 1 to n
{
    queue T = new queue;
    do {
        s = Q.pop();
        ready_for_take = The courses could be taken at the next semester for state s.
        r = Take the most k courses from ready_for_take.
        t = r|s, chooses these courses. 
        T.push(t);
    } while(!Q.empty())

    if (1<<n)-1 in T:
        return semester;
    else:
      Q = T;
}

BFS Tree

在 LeetCode 提交後的執行結果
Runtime: 936 ms
Memory Usage: 6.1 MB

DFS (遞迴)


















































#define min(a, b) (((a) < (b)) ? (a) : (b))

int minNumberOfSemesters(int n,
                         int **dependencies,
                         int dependenciesSize,
                         int *dependenciesColSize,
                         int k)
{
    if (k == 1) return n;

    uint16_t pre[n];
    memset(pre, 0, sizeof(pre));
    for (int i = 0; i < dependenciesSize; ++i)
        pre[dependencies[i][1] - 1] |= 1 << (dependencies[i][0] - 1);

    const int cap = 1 << n;                   

    int S[cap];
    memset(S, 0, cap);
    S[0] = 0;
    return dfs(n, k, 0, pre, S, 1, n);
}

int dfs(int n, int k, int dep, uint16_t *pre, int *S, int s_size, int min_dep)
{
    int s = S[--s_size];

    if (s == (1 << n) - 1) {
        min_dep = min(min_dep, dep);
        return min_dep;
    }
    /* ready for take course */
    uint32_t ready_for_take = 0;
    for (int i = 0; i < n; i++) {
        if (pre[i] == (s & ((1 << i) | pre[i])))
            ready_for_take |= 1 << i;
    }

     /* take the most k courses */
    int m = min(__builtin_popcount(ready_for_take), k);
    for (uint32_t r = ready_for_take; r; r = ready_for_take & (r - 1)) {
         if (__builtin_popcount(r) == m) { 
             /*  S.push(t=r|s) */
             S[s_size++] = r | s;
             min_dep = dfs(n, k, dep+1, pre, S, s_size, min_dep);
         } 
     } 

    return min_dep;
}

與 BFS 不同的是採用堆疊 S 來紀錄待走訪的節點。
每走訪一個修課狀態 s 就將其下學期所有的可能修課狀態 push 進堆疊 S 中，接著就往最後進堆疊的狀態繼續走訪，並檢查該狀態是否為 (1<<n) - 1 ，是的話則更新最少修課學期數 min_dep。
直到走訪完所有可修課狀態，回傳 min_dep。

Pseudo Code




















stack S = new stack;
S.push(0);
global min_deep = n;
dfs(S, 0);
return min_deep;

dfs(stack *S, int deep)
{
    s = S.pop();

    if (s == (1<<n)-1): 
        min_deep = min(min_deep, deep);

    ready_for_take = The courses could be taken at the next semester for state s.
    r = Take the most k courses from ready_for_take.
    t = r|s, chooses these courses. 

    S.push(t);
    dfs(S, deep+1);
}

DFS Tree

在 LeetCode 提交後的執行結果
Runtime: 716 ms
Memory Usage: 6.3 MB

延伸問題 3

分析上述 (包含你的改作練習) 程式碼的時間和空間複雜度

bfs
dfs
recursive-nos1.c
iterative-nos2.c

延伸問題 4

練習 LeetCode 1595. Minimum Cost to Connect Two Groups of Points

給予兩群的節點，第一群有 size1 個節點，第二群有 size2 個節點，其中 size1 >= size2。

在這兩群的節點之間有權重邊相連，cost[i][j] 為第一群的第 i 節點與第二群的第 j 節點相連之權重，如果兩群中的每個節點都與另一群的一個或多個節點連接，則稱這兩群是連通的，換言之，第一群中的每個節點必須至少與第二群中的一個節點連接，且第二群中的每個節點必須至少與第一群中的一個節點連接。

求出連接兩群的最低成本。

範例 1

Input: cost = [[15, 96], [36, 2]]
Output: 17
Explanation: The optimal way of connecting the groups is:
1–A
2–B
This results in a total cost of 17.

範例 2

Input: cost = [[1, 3, 5], [4, 1, 1], [1, 5, 3]]
Output: 4
Explanation: The optimal way of connecting the groups is:
1–A
2–B
2–C
3–A
This results in a total cost of 4.
Note that there are multiple points connected to point 2 in the first group and point A in the second group. This does not matter as there is no limit to the number of points that can be connected. We only care about the minimum total cost.

範例 3

Input: cost = [[2, 5, 1], [3, 4, 7], [8, 1, 2], [6, 2, 4], [3, 8, 8]]
Output: 10

限制
- size1 == cost.length
- size2 == cost[i].length
- 1 <= size1, size2 <= 12
- size1 >= size2
- 0 <= cost[i][j] <= 100

解題
1. 最小生成樹 MST
  將兩群視為一張無向圖，初始取的邊集合 S = {}，每次從圖中取一最小邊並檢驗取的邊加到 S 中是否會形成 cycle，若會形成 cycle 則不取此邊，直到圖中的每個邊皆被檢驗過，得到一個邊的總和為最小的樹。
  而在此題最優解未必為一個樹，反例如下
  
  若以 MST 的解法會取 2-B 使得取出的所有邊形成一個樹，但最佳解不需取 2-B ，且最後會形成一個森林 (T1: 1-A-2-C, T2: 3-B)。
2. Dynamic Programming (DP)
  定義 dp[i][j] 表在第一群的前 i 個節點與第二群的前 j 個節點之最低連接成本，但這樣無法透過解子問題來求，反例如下
  
  最佳解為 dp[3][3]: {1-A, 2-C, 3-B}，但解子問題 dp[2][3]: {1-A, 2-C, B} 時，B 必須與 3 相連，否則孤立的 B 會使兩群不連通。
3. 狀態壓縮 + Dynamic Programming (DP)
  定義 dp[i][j] 表在第一群的前 i 個節點與第二群所取的節點集合狀態 s 之最低連接成本，例如 s=101 表只取 A 和 C 節點，因此問題則是從 dp[0][0] = 0 開始遍歷所有 i, s ，而最後 dp[m][(1<<n)-1] 表第一群的所有 m 個節點與第二群所有的節點之最低連接成本，即為所求。
  
  狀態轉移如下:
```
dp[i][s | (1<<j)] = min(dp[i][s | (1<<j)],
                        dp[i][s] + cost[i][j],
                        dp[i-1][s] + cost[i][j])
```
  - dp[i][s | (1<<j)] 表在第二群當前所取節點狀態 s 中加入第 j 個節點，與第一群前 i 個節點之最低成本
  - dp[i-1][s] + cost[i][j] 表第 i 個節點與第 j 個節點都沒被取過，因此取完看看是否會為最小。
  - dp[i][s] + cost[i][j] 表已取完第 i 個節點但第 j 個節點還沒被取。
  以下圖為例
```
dp[3][CBA] 
= dp[3][C.A] + cost[3][B]
= dp[2][CB.] + cost[3][A]
= dp[2][.BA] + cost[3][C]
...
= dp[2][C.A] + cost[3][B] # opt
```
參考素材: 花花酱 LeetCode 1595. Minimum Cost to Connect Two Groups of Points - 刷题找工作 EP358

程式碼






















int min(int a, int b, int c) {
    int m = a < b ? a : b;
    m = m < c ? m : c;
    return m;
}

int connectTwoGroups(int** cost, int costSize, int* costColSize){
    int m = costSize, n = costColSize[0], cap = 1 << n;
    uint16_t dp[m+1][cap];
    memset(dp, 10000000, sizeof(dp)); 
    dp[0][0] = 0;

    for (int i = 0; i < m; ++i) 
        for (int j = 0; j < n; ++j) 
            for (int s = 0; s < cap; ++s) 
                dp[i + 1][s | (1 << j)] = min(dp[i + 1][ s | (1 <<  j )],
                                                                    dp[i + 1][s] + cost[i][j],
                                                                    dp[i][s] + cost[i][j]);


    return dp[m][cap-1];
}

Runtime: 20 ms, faster than 100.00% of C online submissions for Minimum Cost to Connect Two Groups of Points.
Memory Usage: 6.2 MB, less than 100.00% of C online submissions for Minimum Cost to Connect Two Groups of Points.

測驗 2

Skip List 新增節點:

將 “coin flip” 改為插入節點前，事先建立
“coin flip” 原本透過機率，現在取代為特定的雜湊 (hash) 函數，此處選用 djb2
Skip List 的高度原本是隨機結果，但透過雜湊函數，得到一個相依於資料分布的實作

list.h














































































































#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define LIST_MAX_HEIGHT 32

typedef unsigned long custom_t; /* specify user-defined type */

typedef struct __list_node {
    uint8_t *key;
    size_t key_len;
    uint8_t height;
    custom_t val;
    struct __list_node *next[1];
} list_node_t;

typedef struct {
    uint8_t height;
    uint32_t bits, reset;
    list_node_t *head;
} skiplist;

#define LIST_ALLOC(e, s)               \
    do {                               \
        e = malloc(s);                 \
        assert(e && "out of memory."); \
        memset(e, 0, s);               \
    } while (0)

#define LIST_HEIGHT(d, str, l, z)                                  \
    do {                                                           \
        bool flip = false;                                         \
        for (z = 0; !flip; z++) {                                  \
            if (!d->reset) {                                       \
                d->bits = 5381; /* djb2 hash */                    \
                for (size_t i = 0; i < l; i++)                     \
                    d->bits = ((d->bits << 5) + d->bits) + str[i]; \
                d->reset = LIST_MAX_HEIGHT;                        \
            }                                                      \
            flip = (bool) d->bits & 1;                             \
            d->bits >>= 1;                                         \
            --(d->reset);                                          \
        }                                                          \
    } while (0)

#define LIST_INIT(d)                                                          \
    do {                                                                      \
        LIST_ALLOC(d, sizeof(skiplist));                                      \
        LIST_ALLOC(d->head, (sizeof(list_node_t) + ((sizeof(list_node_t *)) * \
                                                    (LIST_MAX_HEIGHT - 1)))); \
        d->height = 0;                                                        \
    } while (0)

#define LIST_GET(d, k, l, v) LIST_FIND(d, k, l, v, NULL)

#define LIST_FIND(d, k, l, v, u)                                          \
    do {                                                                  \
        list_node_t *iterator = d->head, **ud = u;                        \
        for (int i = d->height - 1; i >= 0; --i) {                        \
            while (iterator->next[i] &&                                   \
                   memcmp(k, iterator->next[i]->key, l) > 0)              \
                iterator = iterator->next[i];                             \
            if (ud)                                                       \
                ud[i] = iterator;                                         \
        }                                                                 \
        list_node_t *n = iterator->next[0];                               \
        v = (n && l == n->key_len && !memcmp(k, n->key, l)) ? n->val : 0; \
    } while (0)

#define LIST_NODE_INIT(n, k, l, v, h)                                        \
    do {                                                                     \
        LIST_ALLOC(                                                          \
            n, (sizeof(list_node_t) + ((sizeof(list_node_t *)) * (h - 1)))); \
        LIST_ALLOC(n->key, l + 1);                                           \
        memcpy(n->key, k, l);                                                \
        n->val = (custom_t) v, n->height = h, n->key_len = l;                \
    } while (0)

#define LIST_INSERT(d, k, l, v)          \
    do {                                 \
        custom_t vl;                     \
        list_node_t *u[LIST_MAX_HEIGHT]; \
        LIST_FIND(d, k, l, vl, u);       \
        if (vl)                          \
            break;                       \
        int h;                           \
        LIST_HEIGHT(d, k, l, h);         \
        if (h > d->height) {             \
            h = ++(d->height);           \
            u[h - 1] = d->head;          \
        }                                \
        list_node_t *n;                  \
        LIST_NODE_INIT(n, k, l, v, h);   \
        while (--h >= 0) {               \
            n->next[h] = u[h]->next[h];  \
            u[h]->next[h] = n;           \
        }                                \
    } while (0)

#define LIST_ITERATE(d, callback)         \
    do {                                  \
        assert(callback);                 \
        list_node_t *iterator = d->head;  \
        while (iterator->next[0]) {       \
            iterator = iterator->next[0]; callback(iterator);                       \
        }                                 \
    } while (0)

測試程式





































include <stdio.h>
#include "list.h"

static void print_key(list_node_t *n) {
    puts((const char *) n->key);
}

int main()
{
    /* initialize the list */
    skiplist *d;
    LIST_INIT(d);

    /* create a key */
    char *key = malloc(6);
    strncpy(key, "hello", 5);
    key[5] = 0;

    /* create a value */
    char *val = malloc(6);
    strncpy(val, "world", 5);
    val[5] = 0;

    /* insert into the list */
    int l = 5;
    LIST_INSERT(d, key, l, (custom_t) val);

    /* iterate through the list and print */
    LIST_ITERATE(d, print_key);

    /* retrieve from a list */
    custom_t v;
    LIST_GET(d, key, l, v);
    printf("Hello, %s!\n", (char *) v);

    return 0;
}

延伸問題 1

解釋上述程式碼運作原理，探討用 hash 替換 Skip List 裡頭 random 的效益和潛在的議題，並實作 delete 操作

探討測試程式
1. 鍵值對 <key, val> = <"hello", "world">。
2. LIST_INSERT(d, key, l, (custom_t) val) 對 skip list : d 插入鍵值對 <key, val>，其中鍵 : key 的長度 : l。
3. LIST_ITERATE(d, print_key) 遍歷 skip list : d 並 print 出每個 node 的 key。
4. LIST_GET(d, key, l, v) 輸入 d, key, l 並賦予 v 為對應 key 的值。

結構體

skiplist





typedef struct {
uint8_t height;
uint32_t bits, reset;
list_node_t *head;
} skiplist;

__list_node









typedef unsigned long custom_t; /* specify user-defined type */

typedef struct __list_node {
    uint8_t *key;
    size_t key_len;
    uint8_t height;
    custom_t val;
    struct __list_node *next[1];
} list_node_t;

初始化

LIST_INIT









#define LIST_MAX_HEIGHT 32

#define LIST_INIT(d)                                                          \
    do {                                                                      \
        LIST_ALLOC(d, sizeof(skiplist));                                      \
        LIST_ALLOC(d->head, (sizeof(list_node_t) + ((sizeof(list_node_t *)) * \
                                                    (LIST_MAX_HEIGHT - 1)))); \
        d->height = 0;                                                        \
    } while (0)

配置空間給 skip list : d 以及 d 的 head node，並初始 d 的高度為 0。

LIST_ALLOC






#define LIST_ALLOC(e, s)               \
    do {                               \
        e = malloc(s);                 \
        assert(e && "out of memory."); \
        memset(e, 0, s);               \
    } while (0)

配置 size : s 大小的空間，若配置成功則回傳指向其空間的 pointer : e，並將值設為 0。

LIST_NODE_INIT








#define LIST_NODE_INIT(n, k, l, v, h)                                        \
    do {                                                                     \
        LIST_ALLOC(                                                          \
            n, (sizeof(list_node_t) + ((sizeof(list_node_t *)) * (h - 1)))); \
        LIST_ALLOC(n->key, l + 1);                                           \
        memcpy(n->key, k, l);                                                \
        n->val = (custom_t) v, n->height = h, n->key_len = l;                \
    } while (0)

對 list_node_t : n 初始化，配置高度 : h - 1 個 list_node_t 指標來指向目標節點，以及 1 個 list_node_t 空間給 n 。
配置大小為 l + 1 (+ 1 for null terminator) 給 n->key。

搜尋

LIST_FIND













#define LIST_FIND(d, k, l, v, u)                                          \
    do {                                                                  \
        list_node_t *iterator = d->head, **ud = u;                        \
        for (int i = d->height - 1; i >= 0; --i) {                        \
            while (iterator->next[i] &&                                   \
                   memcmp(k, iterator->next[i]->key, l) > 0)              \
                iterator = iterator->next[i];                             \
            if (ud)                                                       \
                ud[i] = iterator;                                         \
        }                                                                 \
        list_node_t *n = iterator->next[0];                               \
        v = (n && l == n->key_len && !memcmp(k, n->key, l)) ? n->val : 0; \
    } while (0)

在 skip list : d 中搜尋 key : k。
從 d 的最高層開始往下搜尋 for (int i = d->height - 1; i >= 0; --i)，利用 memcmp 來比對節點 : iterator next 的 key 值，如果指向的下一個節點非空 iterator->next[i] 且 k 大於 iterator->next[i]->key ，則將 iterator 指向下一個節點 iterator = iterator->next[i]。
Note: memcmp 回傳值 > 0 表 key: k 大於 iterator->next[i]->key 。
若 list_node_t 陣列 u 非空，則讓 u[i] 用當前 iterator 節點指派 if (ud) ud[i] = iterator，其中 u[i] 用來存放高度 i 指向目標節點的節點，而 ud 為指向 u 的 pointer to pointer 。
e.g. (例 1. 此為示意圖，並非透過 hash function 建立出來的實例圖)

  Find key : 3 之後， u =

當找到目標 key 值時， iterator 為指向此目標節點的節點且 level 在最底層，因此 list_node_t *n = iterator->next[0] ， n 即為目標節點，若 n 為合法的節點時，將其 val 指派給 v ，否則 v 被指派為 0 。

LIST_GET


#define LIST_GET(d, k, l, v) LIST_FIND(d, k, l, v, NULL)

將找到的 key 值 val 指派給引數 v。

插入

LIST_HEIGHT















#define LIST_HEIGHT(d, str, l, z)                                  \
    do {                                                           \
        bool flip = false;                                         \
        for (z = 0; !flip; z++) {                                  \
            if (!d->reset) {                                       \
                d->bits = 5381; /* djb2 hash */                    \
                for (size_t i = 0; i < l; i++)                     \
                    d->bits = ((d->bits << 5) + d->bits) + str[i]; \
                d->reset = LIST_MAX_HEIGHT;                        \
            }                                                      \
            flip = (bool) d->bits & 1;                             \
            d->bits >>= 1;                                         \
            --(d->reset);                                          \
        }                                                          \
    } while (0)

當 d->reset 為空 (skip list : d 為初始狀態或 d->reset 被減為 0 )，計算 key : str 的 hash 值作為 d->bits 初始值，而 d->reset 用 LIST_MAX_HEIGHT 設為初始值。
key 的高度 : z 只有在 d->bits 全為 0 時才會增加 for (z = 0; !flip; z++); flip = (bool) d->bits & 1;，因此增加的次數(即高度) 為剩於的 d->reset 值。
每回右移 d->bits 1 位元 d->bits >>= 1; 並將 d->reset 減 1 --(d->reset);，再回到 1. 和 2. 判斷。
((d->bits << 5) + d->bits) 為 d->bits
$\times 33$ 的運算，其中 d->bits 採用質數 5381 作為初始值。
```
 for (size_t i = 0; i < l; i++)                     \
             d->bits = ((d->bits << 5) + d->bits) + str[i]; \
```
key : str 的 Hash Function 為

$x = 33^{l} x_{0} + 33^{l - 1} s t r [0] + 33^{l - 2} s t r [1] + 33^{l - 3} s t r [2] + . . . 33^{0} s t r [l - 1]$ ，其中 x = d->bits,
$x_{0}$ = 5381。
e.g.
str = "hello", l = 5
d->bits =
$33^{5} \times 5381 + 33^{4}^{'} o^{'} + 33^{3}^{'} l^{'} + 33^{2}^{'} l^{'} + 33^{1}^{'} e^{'} +^{'} h^{'}$

LIST_INSERT




















#define LIST_INSERT(d, k, l, v)          \
    do {                                 \
        custom_t vl;                     \
        list_node_t *u[LIST_MAX_HEIGHT]; \
        LIST_FIND(d, k, l, vl, u);       \
        if (vl)                          \
            break;                       \
        int h;                           \
        LIST_HEIGHT(d, k, l, h);         \
        if (h > d->height) {             \
            h = ++(d->height);           \
            u[h - 1] = d->head;          \
        }                                \
        list_node_t *n;                  \
        LIST_NODE_INIT(n, k, l, v, h);   \
        while (--h >= 0) {               \
            n->next[h] = u[h]->next[h];  \
            u[h]->next[h] = n;           \
        }                                \
    } while (0)

先看目標 key : k 有無存在 skip list : d 中，若存在，則 vl 不為 0，表不需要新增節點，離開迴圈。
不存在時，先透過 hash function (LIST_HEIGHT) 決定一個高度 : h，當此高度 h 比 skip list 當前高度還高時 if (h > d->height)，h 與 d->height 皆被設為 d->height + 1 (HHH 為 h = ++(d->height);)，並將 u[最高層] 指派為 head node，而此時 list_node_t 陣列 u 為存放指向 key : k 合法位置的節點。
e.g. (以例 1 為例，找不存在於 skip list 中的節點後的 u ，假設插入 key : 3.5，而 h 經計算後需 +1 )

     while (--h >= 0) {               \
         n->next[h] = u[h]->next[h];  \
         u[h]->next[h] = n;           \
     }

初始化節點 n 後，將 n 插入到合法的位置。

e.g. (例 1.)

h = 4

h = 3

h = 2

h = 1

h = 0，完成插入

head 每一層指向的節點都是恰使 skip list 高度 : d->height + 1 的節點。

Iterator

LIST_ITERATE








#define LIST_ITERATE(d, callback)         \
    do {                                  \
        assert(callback);                 \
        list_node_t *iterator = d->head;  \
        while (iterator->next[0]) {       \
             iterator = iterator->next[0]; callback(iterator);                \
        }                                 \
    } while (0)

走訪 skip list : d 內的每一個節點去執行函式 : callback，而節點中的最底層指向最鄰近的節點，因此 III 為 iterator = iterator->next[0]; callback(iterator); 。

實作刪除
索引中對應的節點需一併刪除。

LIST_DELETE
















#define LIST_DELETE(d, k, l)             \
    do {                                 \
        custom_t vl;                     \
        list_node_t *u[LIST_MAX_HEIGHT], *target; \
        LIST_FIND(d, k, l, vl, u);       \
        if (!vl)                         \
            break;                       \
        int h = d->height;               \
        while(--h >= 0) {                \
            if (u[h]->next[h] && memcmp(k, u[h]->next[h]->key, l) == 0) {   \
                target = u[h]->next[h];  \
                u[h]->next[h] = u[h]->next[h]->next[h]; \
            }   \
        }                                \
        free(target);   \
    } while (0)

先透過 LIST_FIND 搜尋目標鍵值 : k 有無存在 skip list : d 中，若不存在 vl 為 0，表不需要刪除，因此跳出迴圈即離開函式。
否則從 skip list 當前的高度開始往下走訪 u， u 為指向目標節點的 list_node_t 陣列，並將其 next 指向目標節點的 next 來達到刪除的效果，而在之前先用一個 list_node_t 指標 target 指向目標節點，並在最後釋放所指向的記憶體空間。
e.g. (以上圖為例，Find key : 9 之後， u 為 )

探討用 hash 替換 Skip List 裡頭 random 的效益和潛在的議題
本題將 hash function 作為 Pseudo Random Number Generator (PRNG) 來使用，實作的 github-skip list

Random level (coin flip)






#define LIST_RANDLEVEL(d, z)                \
    do {                                                                 \
        z = 1;                                                      \
        while (rand() < RAND_MAX/2 && z < LIST_MAX_HEIGHT)          \
            z++;                                                   \
    } while (0)

Sample 出高度 z 的機率為

{0.5}^{z}

，其中 z = 1, 2, 3, …, LIST_MAX_HEIGHT - 1 。
於 LIST_INSERT 內替換掉 LIST_HEIGHT，並修改如下

先透過 LIST_RANDLEVEL 決定此新節點的高度 : h，如超出當前 skip list : d 的最大高度，則最大高度改為 h。
於 LIST_INIT(d) 內加入 srand(time(NULL)); 使得每次建 list 的節點高度都不相同。




















#define LIST_INSERT_RANDLEVEL(d, k, l, v)          \
    do {                                 \
        custom_t vl;                     \
        list_node_t *u[LIST_MAX_HEIGHT]; \
        int h;                           \
        LIST_RANDLEVEL(d, h);            \
        if (h > d->height) {             \
            d->height = h;               \
            u[h - 1] = d->head;          \
        }   \
        LIST_FIND(d, k, l, vl, u);       \
        if (vl)                          \
            break;                       \
        list_node_t *n;                  \
        LIST_NODE_INIT(n, k, l, v, h);   \
        while (--h >= 0) {               \
            n->next[h] = u[h]->next[h];  \
            u[h]->next[h] = n;           \
        }                                \
    } while (0)

Xorshift (LevelDB)
參考 grb72t3yde 同學對 LevelDB 研究的專題報告，嘗試實作 LevelDB 所使用到的 xorshift PRNG

xorshift in LevelDB












































--- a/db/skiplist.h
+++ b/db/skiplist.h
@@ -32,10 +32,22 @@
 #include <cstdlib>

 #include "util/arena.h"
-#include "util/random.h"

 namespace leveldb {

+struct rand_internal {
+  uint16_t s;
+};
+
+/* 16-bit xorshift PRNG */
+static inline uint16_t Rand(struct rand_internal* x) {
+  uint16_t xs = x->s;
+  xs ^= xs << 7;
+  xs ^= xs >> 9;
+  xs ^= xs << 8;
+  return (x->s = xs);
+}
+
 class Arena;

 template <typename Key, class Comparator>
@@ -138,7 +150,7 @@ class SkipList {
   std::atomic<int> max_height_;  // Height of the entire list

   // Read/written only by Insert().
-  Random rnd_;
+  struct rand_internal rnd_;
 };

 // Implementation details follow
@@ -243,7 +255,7 @@ int SkipList<Key, Comparator>::RandomHeight() {
   // Increase height with probability 1 in kBranching
   static const unsigned int kBranching = 4;
   int height = 1;
-  while (height < kMaxHeight && ((rnd_.Next() % kBranching) == 0)) {
+  while (height < kMaxHeight && (Rand(&rnd_) % kBranching) == 0) {
     height++;
   }
   assert(height > 0);

Xorshift






#define LIST_XORSHIFT(d)            \
    do {                            \
        d->state ^= d->state << 7;  \
        d->state ^= d->state >> 9;  \
        d->state ^= d->state << 8;  \
    } while (0)                     \

於 skip list 的結構體中新增 uint16_t state; 的成員，並在初始時給予 rand() 當作初值。
在插入新節點時對 state 取 mod kBranching，若能整除 kBranching 則增加節點高度，故 kBranching 的用意是來當作增加節點高度的機率值，如 kBranching = 4 表示要使節點高度加一的機率為 1/4 。























#define LIST_INSERT_XORSHIFT(d, k, l, v) \
    do {                                 \
        custom_t vl;                     \
        list_node_t *u[LIST_MAX_HEIGHT]; \
        int h = 1;                       \
        do {                             \
            LIST_XORSHIFT(d);            \
            h++;                         \
        } while(h < LIST_MAX_HEIGHT && (d->state % kBranching) == 0);   \
        if (h > d->height) {             \
            d->height = h;               \
            u[h - 1] = d->head;          \
        }   \
        LIST_FIND(d, k, l, vl, u);       \
        if (vl)                          \
            break;                       \
        list_node_t *n;                  \
        LIST_NODE_INIT(n, k, l, v, h);   \
        while (--h >= 0) {               \
            n->next[h] = u[h]->next[h];  \
            u[h]->next[h] = n;           \
        }                                \
    } while (0)

效能比較

bench.c(測試程式碼)










































































































































































































#include <stdio.h>
#include <time.h>
#include "list.h"

int file_exists(const char *fname) {
    FILE *file;
    if ((file = fopen(fname, "r"))) {
        printf("test file is exist ... \n");
        fclose(file);
        return 1;
    }
    return 0;
}

double tvgetf()
{
    struct timespec ts;
    double sec;

    clock_gettime(CLOCK_REALTIME, &ts);
    sec = ts.tv_nsec;
    sec /= 1e9;
    sec += ts.tv_sec;
    return sec;
}

static void print_key(list_node_t *n) {
    printf("key, val = %s, %s \n", n->key, n->val);
}

static void rand_key(char *str, size_t size) {
    const char charset[] = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    while(size-- > 0) {
        size_t index = (double) rand() / RAND_MAX * (sizeof charset - 1);
        *str++ = charset[index];
    }
    *str = '\0';
}

void generate_sparse_test_file(int sampletimes, int MAX_LEN, char *IN_FILE) {
    FILE *fp;
    fp = fopen(IN_FILE, "w");
    int l;
    char *key = malloc(MAX_LEN + 1);

    while(sampletimes--) {
        l = (double) rand() / RAND_MAX * MAX_LEN + 1;
        rand_key(key, l);
        fprintf(fp, "%s\n", key);
    }
    printf("test file %s is generated ... \n", IN_FILE);
}

void test_insert(skiplist *d, int mode, FILE *rep_file, int MAX_LEN, char *IN_FILE) {
    FILE *fp;
    fp = fopen(IN_FILE, "r");

    double t1, t2;
    char buf[MAX_LEN + 2];

    if (mode == 0) {
        printf("Insert with hash function... \n");
        t1 = tvgetf();
        while(fgets(buf, MAX_LEN + 2, fp)) {
            char *key = malloc(MAX_LEN + 1), *val = malloc(MAX_LEN + 1);

            for (int i = 0; i < MAX_LEN + 1; i++) {
                key[i] = (buf[i] == '\n') ? '\0' : buf[i];
                val[i] = (buf[i] == '\n') ? '\0' : buf[i];
            }

            LIST_INSERT_HASHFUNC(d, key, MAX_LEN, val);
        }
        t2 = tvgetf();

    } else if (mode == 1) {
        printf("Insert with rand level... \n");
        t1 = tvgetf();
        while(fgets(buf, MAX_LEN + 2, fp)) {
            char *key = malloc(MAX_LEN + 1), *val = malloc(MAX_LEN + 1);

            for (int i = 0; i < MAX_LEN + 1; i++) {
                key[i] = (buf[i] == '\n') ? '\0' : buf[i];
                val[i] = (buf[i] == '\n') ? '\0' : buf[i];
            }

            LIST_INSERT_RANDLEVEL(d, key, MAX_LEN, val);
        }
        t2 = tvgetf();
    } else if (mode == 2) {
        printf("Insert with xorshift ... \n");
        t1 = tvgetf();
        while(fgets(buf, MAX_LEN + 2, fp)) {
            char *key = malloc(MAX_LEN + 1), *val = malloc(MAX_LEN + 1);

            for (int i = 0; i < MAX_LEN + 1; i++) {
                key[i] = (buf[i] == '\n') ? '\0' : buf[i];
                val[i] = (buf[i] == '\n') ? '\0' : buf[i];
            }
            LIST_INSERT_XORSHIFT(d, key, MAX_LEN, val);
        }
        t2 = tvgetf();
    }

    fprintf(rep_file, "Insert total cost %.6f msec \n", (t2 - t1) * 1000);

}

void test_find(skiplist *d, FILE *rep_file, int MAX_LEN, char *IN_FILE) {
    FILE *fp;
    fp = fopen(IN_FILE, "r");

    char buf[MAX_LEN + 2];
    double t1, t2;

    t1 = tvgetf();
    while (fgets(buf, MAX_LEN + 2, fp)) {
        custom_t v;
        char *key = malloc(MAX_LEN + 1);

        for (int i = 0; i < MAX_LEN + 1; i++)
            key[i] = (buf[i] == '\n') ? '\0' : buf[i];

        LIST_GET(d, key, MAX_LEN, v);

    }
    t2 = tvgetf();
    fprintf(rep_file, "Find all total cost %f msec\n", (t2 - t1) * 1000);

}

void test_delete(skiplist *d, FILE *rep_file, int MAX_LEN, char *IN_FILE) {
    FILE *fp;
    fp = fopen(IN_FILE, "r");

    double t1, t2;
    char buf[MAX_LEN + 2];

    t1 = tvgetf();
    while (fgets(buf, MAX_LEN + 2, fp)) {
        char *key = malloc(MAX_LEN + 1);

        for (int i = 0; i < MAX_LEN + 1; i++)
            key[i] = (buf[i] == '\n') ? '\0' : buf[i];

        LIST_DELETE(d, key, MAX_LEN);

    }
    t2 = tvgetf();

    fprintf(rep_file, "Delete all total cost %f msec\n", (t2 - t1) * 1000);
}

int main(int argc, const char* argv[])
{
    /* */
    if (argc != 6) {
    printf("Usage: num elements, MAX_LEN, insert mode : 0 : HASHFUNC, 1 : RANDLEVEL, 2 : XORSHIFT, test file name, report file name \n");
    return 1;
    }

    int num_elements = atoi(argv[1]);
    int MAX_LEN = atoi(argv[2]);
    int insert_mode = atoi(argv[3]);
    char *IN_FILE = (char *) argv[4];
    const char *REPORT_FILE = argv[5];

    /* test file and report file*/
    if (!file_exists(IN_FILE)) {
        /* sample and output the test data*/
        generate_sparse_test_file(num_elements, MAX_LEN, IN_FILE);
    }

    FILE *rep_file;
    rep_file = fopen(REPORT_FILE, "w");

    /* initialize the list */
    skiplist *d;
    LIST_INIT(d);

    /* build and test skip list d */
    test_insert(d, insert_mode, rep_file, MAX_LEN, IN_FILE); // 0 : HASHFUNC, 1 : RANDLEVEL, 2 : XORSHIFT
    test_find(d, rep_file, MAX_LEN, IN_FILE);
    test_delete(d, rep_file, MAX_LEN, IN_FILE);

    /* iterate through the list and print */
    /*
    double t1, t2;
    t1 = tvgetf();
    LIST_ITERATE(d, print_key);
    t2 = tvgetf();
    fprintf(rep_file, "Iterate all node and print total cost %f msec\n", (t2 - t1) * 1000);

    test_delete(d, rep_file, MAX_LEN, IN_FILE);

    printf("After delete, iterator \n");
    LIST_ITERATE(d, print_key);
    */

    printf("done \n");
    return 0;
}

Makefile

TARGET = bench
CC = gcc
CFLAGS = -O3 -g

default: $(TARGET)
all: default

OBJS = $(patsubst %.c, %.o, $(wildcard *.c))
HEADERS = $(wildcard *.h)

%.o:%.c $(HEADERS)
    $(CC) $(CFLAGS) -c $<-o$@

.PRECIOUS: $(TARGET) $(OBJECTS)

test1: 
    ./$(TARGET) 1000000 8 0 "test1_data.txt" "test1_rep_djb2.txt"
    ./$(TARGET) 1000000 8 1 "test1_data.txt" "test1_rep_rand.txt"
    ./$(TARGET) 1000000 8 2 "test1_data.txt" "test1_rep_xorshift.txt"

test2:
    ./$(TARGET) 10000000 8 0 "test2_data.txt" "test2_rep_djb2.txt"
    ./$(TARGET) 10000000 8 1 "test2_data.txt" "test2_rep_rand.txt"
    ./$(TARGET) 10000000 8 2 "test2_data.txt" "test2_rep_xorshift.txt"

plot:
    echo 3 | sudo tee /proc/sys/vm/drop_caches;
    sudo perf stat --repeat 10 \
        -e cache-misses,cache-references,instructions,cycles \
        ./$(TARGET) 1000000 8 0 "test1_data.txt" "test1_rep_djb2.txt"
    sudo perf stat --repeat 10 \
        -e cache-misses,cache-references,instructions,cycles \
        ./$(TARGET) 1000000 8 1 "test1_data.txt" "test1_rep_rand.txt"
    sudo perf stat --repeat 10 \
        -e cache-misses,cache-references,instructions,cycles \
        ./$(TARGET) 1000000 8 2 "test1_data.txt" "test1_rep_xorshift.txt"
        
valgrind:
        valgrind --tool=massif --stacks=yes --time-unit=i ./$(TARGET) 1000000 8 0 "test1_data.txt" "test1_rep_djb2.txt"
        valgrind --tool=massif --stacks=yes --time-unit=i ./$(TARGET) 1000000 8 1 "test1_data.txt" "test1_rep_rand.txt"
        valgrind --tool=massif --stacks=yes --time-unit=i ./$(TARGET) 1000000 8 2 "test1_data.txt" "test1_rep_xorshift.txt"

testall: test1 test2

clean:
    $(RM) *.o
    $(RM) $(TARGET)
    $(RM) *.txt

使用

make
./bench num_elements MAX_LEN insert_mode "TEST_DATA.txt" "TEST_REPORT.txt"
make testall

字母集為 10 個數字 + 26 個英文大小寫，共 62 個字母。

測試資料共一千萬筆，為隨機生成長度 MAX_LEN = 8 之內的字串 (其中可能含重複的字串)。

範例測資

oMNUckLhy
CmvXU
I8B1f8N
o8
Zd
QBiDwu
iLwLoThlOU
...

測試結果
分別測試 INSERT(INS)、FIND(F) 和 DELETE(DEL) 三種函式，插入、查詢和刪除所有節點。
- Insert
一千萬筆測資單位 msec

djb2 61875.333786 msec

rand() 16907.362938 msec

xorshift 19749.794960 msec
- Find
一千萬筆測資單位 msec

djb2 63355.950594 msec

rand() 18052.890301 msec

xorshift 24746.976852 msec
- Delete
一千萬筆測資單位 msec

djb2 58632.202387 msec

rand() 17096.779823 msec

xorshift 19624.391794 msec

一千萬筆測資	單位 msec
`djb2`	61875.333786 msec
`rand()`	16907.362938 msec
`xorshift`	19749.794960 msec

一千萬筆測資	單位 msec
`djb2`	63355.950594 msec
`rand()`	18052.890301 msec
`xorshift`	24746.976852 msec

一千萬筆測資	單位 msec
`djb2`	58632.202387 msec
`rand()`	17096.779823 msec
`xorshift`	19624.391794 msec

Cache 表現

 Performance counter stats for './bench 1000000 8 0 test1_data.txt test1_rep_djb2.txt' (10 runs):

     1,233,191,770      cache-misses              #   49.749 % of all cache refs      ( +-  0.86% )
     2,478,829,780      cache-references                                              ( +-  0.15% )
    12,395,460,050      instructions              #    0.19  insn per cycle           ( +-  0.02% )
    63,829,551,406      cycles                                                        ( +-  0.60% )

           13.6402 +- 0.0970 seconds time elapsed  ( +-  0.71% )

 Performance counter stats for './bench 1000000 8 1 test1_data.txt test1_rep_rand.txt' (10 runs):

       283,392,758      cache-misses              #   29.899 % of all cache refs      ( +-  0.80% )
       947,841,268      cache-references                                              ( +-  0.53% )
     8,026,118,893      instructions              #    0.50  insn per cycle           ( +-  0.97% )
    15,958,569,297      cycles                                                        ( +-  0.37% )

            3.4067 +- 0.0147 seconds time elapsed  ( +-  0.43% )

 Performance counter stats for './bench 1000000 8 2 test1_data.txt test1_rep_xorshift.txt' (10 runs):

       299,773,923      cache-misses              #   31.409 % of all cache refs      ( +-  0.63% )
       954,432,333      cache-references                                              ( +-  0.15% )
     8,088,534,816      instructions              #    0.50  insn per cycle           ( +-  0.31% )
    16,138,669,308      cycles                                                        ( +-  0.39% )

            3.4421 +- 0.0157 seconds time elapsed  ( +-  0.46% )

Valgrind 記憶體快照

djb2
rand()
xorshift

分析

Hash function 決定當前節點高度的方式為，如果 skip list 的 hash 值 : d->bits 為 0 時才會增加，而增加的次數為剩餘的 d->reset 值，當前 hash 值 d->bits 則當 d->reset 被減為 0 時才會重設，透過計算當前插入的節點鍵值之 hash 值來作為重設值。
因此，當 hash 值 : d->bits 之最左側位元 1 的位置 (即 LIST_MAX_HEIGHT - 1 -__builtin_clz(d->bits)) 一直都大於 d->reset 時，會造成 d->bits 還來不及右移成 0 讓高度 : z 增加，而 d->reset 早已被減為 0，使得 d->bits 又被重設，因此大多節點的高度仍維持在低層，建出來的 skip list 偏向單向鍊結而失去 skip 的功能，所以插入/搜尋/刪除 n 筆資料的成本會退化為

O (n^{2})

。

改進 Hash function
盡量讓 d->bits 最左側位元 1 的位置少於 d->reset 值，因此將 d->bits 初始值 5381 改為 2，並每次右移 d->bits 2 個位元。















#define LIST_HEIGHT(d, str, l, z)                                  \
    do {                                                           \
        bool flip = false;                                         \
        for (z = 0; !flip; z++) {                                  \
            if (!d->reset) {                                       \
                d->bits = 2; /* djb2 hash */                    \
                for (size_t i = 0; i < l; i++)                     \
                    d->bits = ((d->bits << 5) + d->bits) + str[i]; \
                d->reset = LIST_MAX_HEIGHT;                        \
            }                                                      \
            flip = (bool) d->bits & 1;                             \
            d->bits >>= 2;                                         \
            --(d->reset);                                          \
        }                                                          \
    } while (0)

一千萬筆測資	單位 msec
Insert	29932.145596 msec
Find	34056.064129 msec
Delete	29762.346745 msec

增加右移量使 d->bits 比 d->reset 提早為 0，增加節點高度。

c 語言的 rand() 為 Linear congruential generator，優點在於速度快但週期短缺乏安全性，只要取得亂數種子即可預測接下來的亂數值。

a 16-bit LCG with high 8-bit output
from Visualizing the Heart of Some PRNGs

[ISO/IEC 9899:1999] 7.20.2 Pseudo-random sequence generation functions






static unsigned long int next = 1;
int rand(void) // RAND_MAX assumed to be 32767
{
next = next * 1103515245 + 12345;
return (unsigned int)(next/65536) % 32768;
}

Xorshift 則是 Cryptographically secure pseudorandom number generator 當中執行最快速的，具有 "隨機性：看起來夠亂，沒有規律，所有數字分佈平均以及不可預測性：無法從之前的亂數數列猜出下一個亂數的值" 的特性，因此相較於 rand() 更具安全性。

Example random distribution of Xorshift128
from wiki-Xorshift

採用 hash function 的效益與潛在議題
Pseudo random number generator (PRNG) 是一種生成近似隨機序列的演算法，但生成結果會相依於初始值 (亂數種子, seed)，只要能取得亂數種子就有可能預測接下來的亂數，且參數若沒有設定好則可能導致數值出現的週期過短，常常輸出重複值;
而 hash function 則是可以根據某次輸入的資料來輸出亂數值，因此當輸入資料範圍大且輸入順序不可預測時，輸出的亂數值則可以夠亂，但潛在的風險是過於依賴資料分佈，若資料彼此之間太相近則會輸出相同值，造成節點高度大都一樣，使 skip list 退化成一般的 linked list;
Skip list 是一種概率的資料結構，根據 PRNG 來決定整體結構，因此當生成的亂數值不夠亂的時候則很容易使節點分佈過於一致進而降低效能。

延伸問題 2

研讀 Cache-Sensitive Skip List: Efficient Range Queries on modern CPUs，指出考慮到現代處理器架構的 Skip List 實作，有哪些考量和取捨

另外撰寫 CSSL，摘要如下

採用上述的 fat keys (節點內包含許多資料，如指向每個通道下個節點的指針以及數組資料) 來實作 skip list ，由於節點皆一致因此具有方便實作的優點，但缺乏 locality 的性質故不利於 cache 的加速，例如一個具有 5 個通道且每個節點存放一個 32-bit 整數鍵值的 skip list，則每個節點需佔用 4 + (5+1) * 8 = 52 bytes 的空間大小，且假設 cache line 為 64 bytes ，因此遍歷一個節點幾乎佔用了整個 cache line 的空間，且指針指向的位址大多皆不為連續的，從而導致大量的 cache miss 。
Cache-Sensitive Skip List (CSSL) 基於 balanced skip list 作為骨架，將每層的通道採用陣列來實作，由於 balanced skip list 的第 i + 1 level 通道為 level i 通道的每 1/p 位的元素所組成，例如下圖 p = 0.5 ， Level 2 的元素為 Level 1 的每 1/0.5 = 2 位的元素 (1, 5, 7, …) 所組成，因此通道的元素數量是已知的，由此也可簡單的計算出後續元素位置，而採用陣列來實作具有 spatial locality 的特性，減少 cache miss 的發生，同時增加更多的空間利用性，上述/傳統的實作使得大部分節點的指針指向 NULL，需要
$n * (m * r + r + k)$ 的空間，而 CSSL 只須
$n * (r + k) + \sum_{i = 1}^{m} p^{i} * n * k$ ，其中 n 為資料量、m 為通道數。
陣列實作的通道可支援 SIMD 暫存器使 skip list 的操作可以並行處理。

雖然 CSSL 可以較有效地利用 cache ，但仍需考慮一些情況使用，如陣列要求配置連續的記憶體空間，因此當可用的記憶體空間過於分散時，則可能會配置失敗或是需要減少通道數，導致效能降低，而插入的鍵值比當前最大鍵值還大時，則需要重新配置通道陣列，因此若在不確定範圍數值的情況下使用則可能花費大量的重新配置通道時間成本。

延伸問題 3

練習 LeetCode 1206. Design Skiplist，除了提交 C 語言的實作並通過自動評分系統，也該提供正確性測試和效能分析

2020q3 Homework (quiz8)

tags: homework

測驗 1

延伸問題 1

延伸問題 2

延伸問題 3

延伸問題 4

測驗 2

延伸問題 1

延伸問題 2

延伸問題 3

Read more

Assignment3: SoftCPU

2020q3 Homework (quiz5)

Assignment2: RISC-V Toolchain

Assignment2: RISC-V Toolchain

tags: `homework`