2020q3 Homework9 (quiz9)

--- tags: 進階電腦系統理論與實作 --- # 2020q3 Homework9 (quiz9) contributed by < `RinHizakura` > > [第 9 週測驗題](https://hackmd.io/@sysprog/2020-quiz9) > [GitHub](https://github.com/RinHizakura/NCKU-System-Software-Quiz/tree/master/quiz9) ## 測驗 `1` ### 程式原理 ```cpp int main(int argc, char *argv[]) { splitmix_init(); /* Input and expected out data for the 2-input Multiplexer */ const double input[8][3] = { {0, 0, 0}, {0, 0, 1}, {0, 1, 0}, {0, 1, 1}, {1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1}, }; const double output[8] = { 0, 0, 1, 1, 0, 1, 0, 1, }; /* New network with 3 inputs, 1 hidden layer of 2 neurons, * and 1 output. */ ann_t *ctx = ann_init(3, 1, 2, 1); ... ``` 逐行思考 main 函式的運行，其中呼叫函式的細節列於下方 * 透過 `ann_init`，根據目標的輸入/隱藏/輸出層與神經元數量初始化 `ann_t` 結構 ```cpp ... double err, last_err = LOOP_MAX; int count = 0; do { ++count; if (count % LOOP_MAX == 0) { ann_randomize(ctx); last_err = LOOP_MAX; } ann_t *save = ann_copy(ctx); /* Take a random guess at the ANN weights. */ for (int i = 0; i < ctx->total_weights; ++i) ctx->weight[i] += RRR; err = 0; for (int k = 0; k < (1 << 3); k++) err += pow(*ann_run(ctx, input[k]) - output[k], 2.0); /* Keep these weights if they're an improvement. */ if (err < last_err) { ann_free(save); last_err = err; } else { ann_free(ctx); ctx = save; } } while (err > 0.01); printf("Finished in %d loops.\n", count); ``` * `ann_copy` 複製 `ann_t` 結構 * 隨機調整 ANN 的權重，由於 `ANN_RAND()` 的範圍為 0 到 1 之間，因此 `RRR = (c) ANN_RAND() - 0.5` 可以使調整往正負的方向都可以前進 * 根據新的權重計算的結果與預期的結果(label)計算 error，可以看到使用的 error 為 [residual sum of squares](https://en.wikipedia.org/wiki/Residual_sum_of_squares) * 如果 error 變小，則保留此更新，否則使用原始的模型 * 持續進行直到 err 小於 threshold ```cpp /* Run the network and see what it predicts. */ for (int k = 0; k < (1 << 3); k++) printf("[%1.f, %1.f, %1.f] = %1.f.\n", input[k][0], input[k][1], input[k][2], *ann_run(ctx, input[k])); ann_free(ctx); return 0; ``` * 最後則輸出模性根據不同輸入的結果，並且程式結束時釋放借來的空間 #### `splitmix_init` ```cpp static void splitmix_init() { /* Hopefully, ASLR makes our function address random */ uintptr_t x = (uintptr_t)((void *) &splitmix_init); struct timespec time; clock_gettime(CLOCK_MONOTONIC, &time); x ^= (uintptr_t) time.tv_sec; x ^= (uintptr_t) time.tv_nsec; splitmix_x = x; /* do a few randomization steps */ uintptr_t max = ((x ^ (x >> 17)) & 0x0F) + 1; for (uintptr_t i = 0; i < max; i++) splitmix(); } ``` * 取得 `splitmix_init()` 得函式位址，`uintptr_t` 使得可以根據機器的位元數得到完整的位址 * 再進一步根據執行致該段時的時間 / splitmix 調整隨機值 :::info TODO: 研究 [splitmix](https://dl.acm.org/doi/10.1145/2660193.2660195) 為何並詳細整理其中細節 ::: #### `ann_init` ```cpp /* Create and return a new network context */ ann_t *ann_init(int inputs, int hidden_layers, int hidden, int outputs) { if (hidden_layers < 0) return 0; if (inputs < 1) return 0; if (outputs < 1) return 0; if (hidden_layers > 0 && hidden < 1) return 0; ... ``` * 對於輸入/隱藏/輸出層，先確認是否有錯誤的設定 ```cpp ... const int hidden_weights = hidden_layers ? (inputs + 1) * hidden + (hidden_layers - 1) * (hidden + 1) * hidden : 0; const int output_weights = (hidden_layers ? (WWW) : (inputs + 1)) * outputs; const int total_weights = (hidden_weights + output_weights); const int total_neurons = (inputs + hidden * hidden_layers + outputs); /* Allocate extra size for weights, outputs, and deltas. */ const int size = sizeof(ann_t) + sizeof(double) * (total_weights + total_neurons + (total_neurons - inputs)); ann_t *ret = malloc(size); assert(ret); ret->inputs = inputs; ret->hidden_layers = hidden_layers; ret->hidden = hidden; ret->outputs = outputs; ret->total_weights = total_weights; ret->total_neurons = total_neurons; ret->weight = (double *) ((char *) ret + sizeof(ann_t)); ret->output = ret->weight + ret->total_weights; ret->delta = ret->output + ret->total_neurons; ... ``` * `hidden_weights` 為輸出層與隱藏層，以及隱藏層與隱藏層間的權重數量，考慮到 bias 的加入，因此如果有隱藏層，數量為 `(inputs + 1) * hidden`(輸入與隱藏層的 neuron 連線的總數量) 加上 `(hidden_layers - 1) * (hidden + 1) * hidden`(隱藏層與下個隱藏層的 neuron 連線的總數量)，否則為零 * `output_weights` 則是輸出層與其前一層的 weight 數量，可知如果沒有隱藏層就是 `(inputs + 1) * outputs`，如果有就是 `(hidden + 1) * outputs`，因此 `WWW = (b) hidden + 1` (注意括號位置) ![](https://i.imgur.com/WAmRb9z.png) > 圖源: https://stackoverflow.com/questions/28288489/neural-networks-does-the-input-layer-consist-of-neurons * 借用上面這張圖來解說更為清楚： * input layer 與 hidden layer #1 間的權重數量是 $(4 + 1) \times 3$，hidden layer #1 與 hidden layer #2 間權重數量是 $(3 + 1) \times 3$，兩者之和就是 `hidden_weights` * output layer 與前一層的權重數量是 $(3 + 1) \times 1$ 兩者之和是 `output_weights` * 則總 neuron 數量與總 weight 數的計算應該很容易理解，根據前面計算的資訊可以算出 `ann_t` 結構需要的空間，並設定相關的 struct member 內容(可以看到最後的 3 個 `double *` 透過計算調整到正確的記憶體位址) ```cpp ... ann_randomize(ret); ret->activation_hidden = ann_act_sigmoid_cached; ret->activation_output = ann_act_sigmoid_cached; ann_init_sigmoid_lookup(ret); return ret; } ``` * `ann_randomize` 將權重以隨機的數字初始化 * 設定激勵函數的 function pointer * `ann_init_sigmoid_lookup` 會建立 sigmoid 對應的 lookup table，透過查表減少運算量 #### `ann_run` ```cpp /* Run the feedforward algorithm to calculate the output of the network. */ double const *ann_run(ann_t const *ctx, double const *inputs) { double const *w = ctx->weight; double *o = ctx->output + ctx->inputs; double const *i = ctx->output; /* Copy the inputs to the scratch area, where we also store each neuron's * output, for consistency. This way the first layer isn't a special case. */ memcpy(ctx->output, inputs, sizeof(double) * ctx->inputs); ``` * `memcpy` 將輸入內容複製到輸入層的 neuron(注意到雖然命名為 `ctx->output`，但其實是用以儲存 neuron 數值的，因此其中包含輸入與輸出) ```cpp if (!ctx->hidden_layers) { double *ret = o; for (int j = 0; j < ctx->outputs; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->inputs; ++k) sum += *w++ * i[k]; *o++ = ann_act_output_indirect(ctx, sum); } return ret; } ``` * 如果沒有 hidden layer 的話，output 來自權重和輸入層的總和 * `*w++ * (-1.0)` 計算 bias 項乘上權重，後續則為每個 neuron 乘上對應權重 * `ann_act_output_indirect` 透過激勵函數處理 output 值 ```cpp /* Figure input layer */ for (int j = 0; j < ctx->hidden; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->inputs; ++k) sum += *w++ * i[k]; *o++ = ann_act_hidden_indirect(ctx, sum); } i += ctx->inputs; /* Figure hidden layers, if any. */ for (int h = 1; h < ctx->hidden_layers; ++h) { for (int j = 0; j < ctx->hidden; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->hidden; ++k) sum += *w++ * i[k]; *o++ = ann_act_hidden_indirect(ctx, sum); } i += ctx->hidden; } double const *ret = o; /* Figure output layer. */ for (int j = 0; j < ctx->outputs; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->hidden; ++k) sum += *w++ * i[k]; *o++ = ann_act_output_indirect(ctx, sum); } assert(w - ctx->weight == ctx->total_weights); assert(o - ctx->output == ctx->total_neurons); return ret; } ``` * 有 hidden layer 的話其實也類似，只是把 neuron 根據權重不斷向下更新而已 * 值得注意的是，程式中巧妙的安排 `i` 和 `o` 的位置，則當用第 k 層對 k + 1 層的 neuron 做更新時，下一個迴圈中對位址的計算會剛好變成用第 k + 1 層對 k + 2 層做更新 ## 測驗 `2` ### 程式原理逐行研讀程式的行為。 ```cpp typedef struct { uint16_t row[9], col[9], box[9]; } masks_t; const uint16_t ALL_NUMS = 0x1FF; void solveSudoku(char **board, int boardSize, int *boardColSize) { masks_t m; for (int i = 0; i < 9; ++i) m.row[i] = m.col[i] = m.box[i] = ALL_NUMS; for (int y = 0; y < 9; ++y) { for (int x = 0; x < 9; ++x) { if (board[y][x] != '.') { int n = board[y][x] - '1'; uint16_t k = ~(1U << n); m.row[x] &= k, m.col[y] &= k, m.box[bi(x, y)] &= k; } } } ``` * 由題目可知，需要檢查的其實就是 9 個 row / 9 個 column / 9 個 3 $\times$ 3 方格中是否有重複的出現，因此對這些位置先初始化一個包含 9 個 1 的 mask * 先根據現有的數字情況，將其從字元轉成對應整數後調整 mask * 注意到 box 透過 `bi` 去計算座標 `(x,y)` 對應到的 3 $\times$ 3 方格 index ```cpp for (int y = 0; y < 9; ++y) { for (int x = 0; x < 9; ++x) { if (board[y][x] != '.') continue; uint16_t mask = m.row[x] & m.col[y] & m.box[bi(x, y)]; if ((mask & (MMM)) == 0) { size_t n = __builtin_ctz(mask); uint16_t k = ~(1U << n); board[y][x] = '1' + n; m.row[x] &= k, m.col[y] &= k, m.box[bi(x, y)] &= k; } } } solve(board, &m); } ``` * 取回 mask，先將棋盤中僅有唯一解(如果只考慮該格只有一種可行答案)的部份填入，判斷該 `(x,y)` 座標中可以先放入的數字，並且更新棋盤與 mask * 事實上，直接省略此迴圈部份並透過 `solve` 求解出也並無不可，但是先將有唯一解的答案填入可以減少遞迴數量 * 因此，另一種作法是重複執行此迴圈，直到剩下的空格都沒有唯一解時，再透過 `solve` 求解 * 根據上個敘述可知 `MMM = (e) mask - 1`，這是一個常見到幾乎可以背起來的 bit operation 技巧(參閱 [x & (x - 1) == 0](https://hackmd.io/9tGfpJtLTwyyOzDvlyJS2w?view#x-amp-x---1--0)) * 後續透過 `solve` 繼續解出未完成的部份 ```cpp static bool solve(char **board, masks_t *m) { for (int y = 0; y < 9; ++y) { for (int x = 0; x < 9; ++x) { if (board[y][x] != '.') continue; int b = bi(x, y); uint16_t posmoves = m->row[x] & m->col[y] & m->box[b]; while (posmoves) { size_t n = __builtin_ctz(posmoves); uint16_t k = ~(1U << n); board[y][x] = '1' + n; m->row[x] &= k, m->col[y] &= k, m->box[b] &= k; if (solve(board, m)) return true; m->row[x] |= ~k, m->col[y] |= ~k, m->box[b] |= ~k; posmoves &= MOVE; } board[y][x] = '.'; return false; } } return true; } ``` * 剩下的就只是 backtracking 而已，嘗試每一個可行的解然後遞迴往下解，遞迴返回時回復狀態 * 因此 MOVE 顯然是把嘗試過的數字標記上，因此 `MOVE = (b) posmoves - 1` 消除 mask 最右邊的 1 提交並得到之執行時間如下： ![](https://i.imgur.com/bEDtem8.png)