2020q3 第 9 週測驗題

# [2020q3](http://wiki.csie.ncku.edu.tw/sysprog/schedule) 第 9 週測驗題 ###### tags: `sysprog2020` :::info 目的: 檢驗學員對 [dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming), [bitwise 操作](https://hackmd.io/@sysprog/c-bitwise), 亂數行為的認知 ::: ==[作答表單](https://docs.google.com/forms/d/e/1FAIpQLScBSC1-2-F-nCLHlCK90KnnW271OvvO8HBLb9vYF5eA0Uvdcg/viewform)== ### 測驗 `1` :::spoiler 「進階電腦系統理論與實作」課程是資工所 / 醫資所 / AI 學位學程合班，僅管授課教師是 AI 白痴 (沒有實質用 AI 做出賣座的產品或者具備高度學術影響力的成果)，配合學程需求，嘗試和 AI 沾上邊 ::: 無論 Neural Network (神經網路，簡稱 NN) 抑或 Artificial Neural Networks (簡稱 ANN)，都指人工神經網路，後者也是構成 Deep learning (深度學習) 的要素。ANN 的發展可追溯到 1943 年，示意如下: ![](https://i.imgur.com/646CAGi.png) > 出處: [Jump Start to Artificial Intelligence](https://hackernoon.com/jump-start-to-artificial-intelligence-f6eb30d624ec) ANN 的關鍵思想是模擬生物==神經傳導==的機制，人類大腦中的神經元 (neuron) 高達 1000 億個，每個神經元之間會與另外 10,000 個神經元連接，形成一個錯縱複雜的腦神經網路。先來複習生物學中的神經元構造: ![](https://i.imgur.com/1oYfFr4.png) 每個神經元的樹突 (dendrite) 與突觸 (synapse) 會與其它神經元的突觸與樹突相連，接收並發送訊息: ![](https://i.imgur.com/Bbknesq.png) 這些訊息其實是一些來自不同神經元的突觸所發出的微小電流訊號，經由某神經元的樹突接收後，再「決定」是否再發送訊號出去，該「決定」是種簡單的「是」(On) 或「否」(Off) 的二元判斷：當這個神經元收到的電流刺激 (訊號 On) 總和，超過它所能忍受的門檻範圍 (threshold)，它就會透過突觸發出 On 訊息。換言之，一個神經元僅在其超過忍受範圍的刺激時，才會有所動作。作為模擬神經網路的 ANN (也譯為類神經網路)，Perceptron (感知器) 是類神經網路最基本的單位，它要模擬的對象正是人類腦神經中最基本的單元 —— 神經元。每層的神經元擁有輸入 (input) 和輸出 (output)，藉由激勵函數 (activation function) 來衡量對神經元輸出的重要性。可將激勵函數想像成神經元的閘門，後者代表神經元的忍受門檻，若超過/觸發神經元時則發送值出去。最單純的形式是 step function (步階函數)：當函數收到的值大於 0 則回傳 1，否則皆回傳 0。 ![](https://i.imgur.com/muc2e3E.png) step function 只單純的回傳 `0` 或 `1`，該函數無法微分，所以無法進行 gradient descent 等運算，因此 ANN 通常不採用該函數。資料從神經元進入，經過非線性的激勵函數下輸出，傳入下一層神經元，如此往下傳遞直到最後的輸出層。正是這些激勵函數的作用，讓神經網絡有足夠的能力去抓取複雜的特徵 (features)，從而提高模型的效能。 ANN 的組成: * 輸入層 (Input layer): 接受訊息的神經元，稱為輸入向量。 * 隱藏層 (Hidden layer): 輸入層和輸出層之間眾多神經元和連結組成的各個層面，可以有多層。 * 輸出層 (Output layer): 訊息在神經元連結中傳輸、分析、權衡後所形成的輸出結果（稱為輸出向量）。 Sigmoid function 是個激勵函數，定義如下: $$ s(t) = \frac{1}{1 + e^{-t}} $$ 基於下列因素，Sigmoid 較 step function 更適於類神經網路: 1. Sigmoid 的回傳值是連續且可微分的 2. 與 Y 軸對稱 3. 與 step function 不同，它的值是漸近用 Python 製圖，程式碼如下: (`sigmoid.py`) ```python import numpy as np import matplotlib.pyplot as plt def sigmoid(x): return 1.0 / (1.0 + np.exp(-x)) if __name__ == '__main__': x = np.linspace(-15, 15) y = sigmoid(x) plt.plot(x, y, label="sigmoid", color="red") plt.legend() plt.grid() plt.show() ``` :::info :information_source: 確認事先透過 [pip](https://pypi.org/project/pip/) 安裝 `matplotlib` 套件: ```shell $ pip3 install matplotlib ``` 執行命令: `$ python3 sigmoid.py` ::: ![](https://i.imgur.com/lNHgVzt.png) 不難看出 Sigmoid 的曲線形似 `S` 字母。 Sigmoid 仍有以下問題： 1. 輸出值其 Y 軸並非以 0 為中心 2. 其輸出值過度飽和 (兩端過於貼近輸出值)，不便於梯度計算於是後續又發展出 tanh 和 ReLU 等激勵函數。 > 延伸閱讀: [類神經網路的 ReLU 及其常數時間複雜度實作](https://hackmd.io/@sysprog/constant-time-relu) 後續我們採用 Sigmoid 作為激勵函數，給定 $N1$, $N2$, $N3$ 及個權重 $W$ 如下圖: ![](https://i.imgur.com/nPmc3Iw.png) 我們可計算 $N_4$ 及 $N_5$: $\begin{split} N_4 &= f(w_{1,4} \times N_1 + w_{2,4} \times N_2 + w_{3,4} \times N_3) \\ &= f(0.71 \times 0.94 + 0.73 \times 0.22 + 0.32 \times 0.49) \\ &= f(0.9848) \\ &= 0.72806 \\ \end{split}$ $\begin{split} N_5 &= f(w_{1,5} \times N_1 + w_{2,5} \times N_2 + w_{3,5} \times N_3) \\ &= f(0.91*0.94+0.37*0.22+0.23*0.49) \\ &= f(1.05) \\ &= 0.741 \\ \end{split}$ 以此法可計算 $N_6$: $\begin{split} N_6 &= f(w_{4,6} \times N_4 + w_{5,6} \times N_5) \\ &= f(0.44967) \\ &= 0.61056 \\ \end{split}$ 由於單獨的神經元只能處理線性的二元分類，若要學習並處理非線性問題，則要透過多層的感知器架構加上 back-propagation (反向傳播)，透過梯度下降方法針對 W 權重值進行逐步修正的方式來處理。 > * [類神經網路 Neural network](https://mropengate.blogspot.com/2015/06/ch15-4-neural-network.html) > * [梯度下降法: 隱藏在深度學習背後的演算法](https://www.slideshare.net/ccckmit/ss-238680742) ![](https://i.imgur.com/L5UMMnN.png) 上圖是標準的多層式類神經網路架構，訊息從左側往右依次傳遞給下一層，每個感知器僅能與上下層的感知器連接，而不會連到同一層的其它感知器。我們可用 3-2-3-2 NN 來稱呼上圖類神經網路架構: * 3 inputs * 2 hidden layers * 3 neurons * 2 outputs Output layer 的感知器數量會與我們的標記 (label) 數或分類數 (classification) 一樣多。例如，若我們要辨識手寫阿拉伯數字，則 Output layer 的節點數應為 10 個，代表 0 到 9 等 10 個數字。在腦神經研究中，科學家發現當兩個神經元接近到可互通訊息時，它們會透過樹突及突觸的生長，伴隨著代謝變化，來增加或減少彼此間的傳輸效率。同樣的模式應用到人工神經元，扮演這個化學變化要素的就是權重 $W$，它影響著最終激勵函數的計算結果，因此當我們在知道標準答案後，可進行修正並調整這個錯誤的偏差值，方法就是倒過來使用非線性的激勵函數，由最後往前推導，調整兩個感知器之間訊息的權重 $W$，來取得最佳的修正。這個方式稱為 back-propagation (反向傳播)。進行反向傳播時，權重 $w$ 計算式如下： $$ w = w + Learning\_rate \times (expected – predicted) \times x $$ 其中: * $w$: 已知的 $w$ 權重值，後方的計算為針對其所作的調整 * Learning rate: 此值愈大，代表其偏移的程度愈大。注意在進行梯度下降計算時，過大的 Learning rate 會因來回的偏移過大無法找出正確的值，一般來說，Learning rate 大部份介於 ${0.01, 0.1, 1.0}$ 等數值 * $expected\ – \ predicted$：即正確值與預測值的差異，若預測正確則此值為 $0$，差異愈多此值愈大類神經網路主要分為 4 種架構： * Feedforward Neural Networks * Convolutional Neural Networks * Recursive Neural Networks * Recurrent Neural Networks Feedforward Neural Networks (前饋類神經網路) 是最基本的類神經網路，其工作方式很單純，Input layer 的神經元取得 features value，乘上 $W$ 權重加上 bias 偏差後，透過激勵函數計算出結果並傳遞給下一層，因此 Feedforward Neural Networks 是一種單向的流動，不會把值從輸出端傳回輸入端。在類神經網路訓練過程中，類神經網路的架構可以進行動態調整。倘若模型複雜度越高，該模型對未來的適應性就更低，因為可能會過度地匹配我們給予的訓練樣本，因此，我們可把類神經網路中代表模型複雜度的項，作為配適函數中的負項。例如，目標函數可能設定為: $$ Fitness = -MSE - 0.01 \times NeuralNumber $$ 換言之，我們可設定最低和最高的模型複雜度，然後透過隨機事件上調或降低你的模型複雜度，注意要保留多一點的機率，以維持現在的模型複雜度，因為每回合改變的過程中，我們需要參考多一點同樣模型複雜度，從而進行取捨，有較高的機會訓練出更好的模型。這樣的手法，稱為隨機搜尋 (random search)。既然提到機率，我們來探討適用於不同平台的亂數產生器: * [你的程式夠「亂」嗎？](https://www.ithome.com.tw/voice/110007) * [亂數](https://blog.danielchen.cc/2019/01/31/random-number/) * [The fastest conventional random number generator that can pass Big Crush?](https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush/) * [ARM and Intel have different performance characteristics: a case study in random number generation](https://lemire.me/blog/2019/03/20/arm-and-intel-have-different-performance-characteristics-a-case-study-in-random-number-generation/) 在若干作業系統 (如 Linux, OpenBSD, MS-Windows, macOS) 中，[Address space layout randomization](https://en.wikipedia.org/wiki/Address_space_layout_randomization) (簡稱 ASLR) 是種防範記憶體損壞漏洞被利用的電腦安全技術，透過隨機放置行程關鍵資料區域的定址空間，防止攻擊者能可靠地跳躍到記憶體的特定位置。我們可利用 ASLR 來設定亂數種子，例如: ```cpp static uint64_t splitmix_x; static void splitmix_init() { /* Hopefully, ASLR makes our function address random */ uintptr_t x = (uintptr_t)((void *) &splitmix_init); struct timespec time; clock_gettime(CLOCK_MONOTONIC, &time); x ^= (uintptr_t) time.tv_sec; x ^= (uintptr_t) time.tv_nsec; splitmix_x = x; } ``` 由於函式 `splitmix_init` 的地址在 ASLR 生效後，可能會隨機得到不同的數值 (注意: 僅是「可能」，實際要看作業系統的表現)，我們可利用這特性來設定亂數種子 `splitmix_x`。以下是一個類神經網路的 C 語言實作: ```cpp #include <assert.h> #include <math.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <time.h> struct ann; typedef double (*ann_act_func)(const struct ann *ctx, double a); typedef struct ann { int inputs, hidden_layers, hidden, outputs; ann_act_func activation_hidden; /* used for hidden neurons */ ann_act_func activation_output; /* used for output */ int total_weights; /* weights and size of weights buffer */ int total_neurons; /* neurons + inputs and size of output buffer */ double *weight; /* All weights */ double *output; /* store input array and output of each neuron */ double *delta; /* total_neurons - inputs */ } ann_t; #define LOOKUP_SIZE 4096 static inline double ann_act_hidden_indirect(const struct ann *ctx, double a) { return ctx->activation_hidden(ctx, a); } static inline double ann_act_output_indirect(const struct ann *ctx, double a) { return ctx->activation_output(ctx, a); } static const double sigmoid_dom_min = -15.0, sigmoid_dom_max = 15.0; static double interval, lookup[LOOKUP_SIZE]; #define unlikely(x) __builtin_expect(!!(x), 0) #define UNUSED __attribute__((unused)) static double ann_act_sigmoid(const ann_t *ctx UNUSED, double a) { if (a < -45.0) return 0; if (a > 45.0) return 1; return 1.0 / (1 + exp(-a)); } static void ann_init_sigmoid_lookup(const ann_t *ctx) { const double f = (sigmoid_dom_max - sigmoid_dom_min) / LOOKUP_SIZE; interval = LOOKUP_SIZE / (sigmoid_dom_max - sigmoid_dom_min); for (int i = 0; i < LOOKUP_SIZE; ++i) lookup[i] = ann_act_sigmoid(ctx, sigmoid_dom_min + f * i); } static double ann_act_sigmoid_cached(const ann_t *ctx UNUSED, double a) { assert(!isnan(a)); if (a < sigmoid_dom_min) return lookup[0]; if (a >= sigmoid_dom_max) return lookup[LOOKUP_SIZE - 1]; size_t j = (a - sigmoid_dom_min) * interval + 0.5; if (unlikely(j >= LOOKUP_SIZE)) return lookup[LOOKUP_SIZE - 1]; return lookup[j]; } /* conventional generator (splitmix) developed by Steele et al. */ static uint64_t splitmix_x; static inline uint64_t splitmix() { splitmix_x += 0x9E3779B97F4A7C15; uint64_t z = splitmix_x; z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9; z = (z ^ (z >> 27)) * 0x94D049BB133111EB; return z ^ (z >> 31); } static void splitmix_init() { /* Hopefully, ASLR makes our function address random */ uintptr_t x = (uintptr_t)((void *) &splitmix_init); struct timespec time; clock_gettime(CLOCK_MONOTONIC, &time); x ^= (uintptr_t) time.tv_sec; x ^= (uintptr_t) time.tv_nsec; splitmix_x = x; /* do a few randomization steps */ uintptr_t max = ((x ^ (x >> 17)) & 0x0F) + 1; for (uintptr_t i = 0; i < max; i++) splitmix(); } /* uniform random numbers between 0 and 1 */ #define ANN_RAND() (((double) splitmix()) / UINT64_MAX) /* Set weights randomly */ void ann_randomize(ann_t *ctx) { for (int i = 0; i < ctx->total_weights; ++i) ctx->weight[i] = ANN_RAND() - 0.5; /* weights from -0.5 to 0.5 */ } /* Create and return a new network context */ ann_t *ann_init(int inputs, int hidden_layers, int hidden, int outputs) { if (hidden_layers < 0) return 0; if (inputs < 1) return 0; if (outputs < 1) return 0; if (hidden_layers > 0 && hidden < 1) return 0; const int hidden_weights = hidden_layers ? (inputs + 1) * hidden + (hidden_layers - 1) * (hidden + 1) * hidden : 0; const int output_weights = (hidden_layers ? (WWW) : (inputs + 1)) * outputs; const int total_weights = (hidden_weights + output_weights); const int total_neurons = (inputs + hidden * hidden_layers + outputs); /* Allocate extra size for weights, outputs, and deltas. */ const int size = sizeof(ann_t) + sizeof(double) * (total_weights + total_neurons + (total_neurons - inputs)); ann_t *ret = malloc(size); assert(ret); ret->inputs = inputs; ret->hidden_layers = hidden_layers; ret->hidden = hidden; ret->outputs = outputs; ret->total_weights = total_weights; ret->total_neurons = total_neurons; ret->weight = (double *) ((char *) ret + sizeof(ann_t)); ret->output = ret->weight + ret->total_weights; ret->delta = ret->output + ret->total_neurons; ann_randomize(ret); ret->activation_hidden = ann_act_sigmoid_cached; ret->activation_output = ann_act_sigmoid_cached; ann_init_sigmoid_lookup(ret); return ret; } /* Return a new copy of network context */ ann_t *ann_copy(ann_t const *ctx) { const int size = sizeof(ann_t) + sizeof(double) * (ctx->total_weights + ctx->total_neurons + (ctx->total_neurons - ctx->inputs)); ann_t *ret = malloc(size); assert(ret); memcpy(ret, ctx, size); ret->weight = (double *) ((char *) ret + sizeof(ann_t)); ret->output = ret->weight + ret->total_weights; ret->delta = ret->output + ret->total_neurons; return ret; } /* Free the memory used by a network context */ void ann_free(ann_t *ctx) { free(ctx); } /* Run the feedforward algorithm to calculate the output of the network. */ double const *ann_run(ann_t const *ctx, double const *inputs) { double const *w = ctx->weight; double *o = ctx->output + ctx->inputs; double const *i = ctx->output; /* Copy the inputs to the scratch area, where we also store each neuron's * output, for consistency. This way the first layer isn't a special case. */ memcpy(ctx->output, inputs, sizeof(double) * ctx->inputs); if (!ctx->hidden_layers) { double *ret = o; for (int j = 0; j < ctx->outputs; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->inputs; ++k) sum += *w++ * i[k]; *o++ = ann_act_output_indirect(ctx, sum); } return ret; } /* Figure input layer */ for (int j = 0; j < ctx->hidden; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->inputs; ++k) sum += *w++ * i[k]; *o++ = ann_act_hidden_indirect(ctx, sum); } i += ctx->inputs; /* Figure hidden layers, if any. */ for (int h = 1; h < ctx->hidden_layers; ++h) { for (int j = 0; j < ctx->hidden; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->hidden; ++k) sum += *w++ * i[k]; *o++ = ann_act_hidden_indirect(ctx, sum); } i += ctx->hidden; } double const *ret = o; /* Figure output layer. */ for (int j = 0; j < ctx->outputs; ++j) { double sum = *w++ * (-1.0); for (int k = 0; k < ctx->hidden; ++k) sum += *w++ * i[k]; *o++ = ann_act_output_indirect(ctx, sum); } assert(w - ctx->weight == ctx->total_weights); assert(o - ctx->output == ctx->total_neurons); return ret; } #define LOOP_MAX 1024 int main(int argc, char *argv[]) { splitmix_init(); /* Input and expected out data for the 2-input Multiplexer */ const double input[8][3] = { {0, 0, 0}, {0, 0, 1}, {0, 1, 0}, {0, 1, 1}, {1, 0, 0}, {1, 0, 1}, {1, 1, 0}, {1, 1, 1}, }; const double output[8] = { 0, 0, 1, 1, 0, 1, 0, 1, }; /* New network with 3 inputs, 1 hidden layer of 2 neurons, * and 1 output. */ ann_t *ctx = ann_init(3, 1, 2, 1); double err, last_err = LOOP_MAX; int count = 0; do { ++count; if (count % LOOP_MAX == 0) { ann_randomize(ctx); last_err = LOOP_MAX; } ann_t *save = ann_copy(ctx); /* Take a random guess at the ANN weights. */ for (int i = 0; i < ctx->total_weights; ++i) ctx->weight[i] += RRR; err = 0; for (int k = 0; k < (1 << 3); k++) err += pow(*ann_run(ctx, input[k]) - output[k], 2.0); /* Keep these weights if they're an improvement. */ if (err < last_err) { ann_free(save); last_err = err; } else { ann_free(ctx); ctx = save; } } while (err > 0.01); printf("Finished in %d loops.\n", count); /* Run the network and see what it predicts. */ for (int k = 0; k < (1 << 3); k++) printf("[%1.f, %1.f, %1.f] = %1.f.\n", input[k][0], input[k][1], input[k][2], *ann_run(ctx, input[k])); ann_free(ctx); return 0; } ``` 上方的程式碼嘗試由 Multiplexer 的輸入和輸出，推理出其中邏輯。Multiplexing switch 的作用: ![](https://i.imgur.com/qmtNlGS.png) 一個具備 $2^n$ 輸入端的 Multiplexer (簡稱 MUX) 有 $n$ 個可選擇的輸入－輸出線路，可經由控制端來選擇 (select) 其中一個訊號作為輸出。 ![](https://i.imgur.com/wEnn6kW.png) > 2-input Multiplexer Design > 出處: [The Multiplexer](https://www.electronics-tutorials.ws/combination/comb_2.html) 該程式預期的執行輸出如下: ``` Finished in 359 loops. [0, 0, 0] = 0. [0, 0, 1] = 0. [0, 1, 0] = 1. [0, 1, 1] = 1. [1, 0, 0] = 0. [1, 0, 1] = 1. [1, 1, 0] = 0. [1, 1, 1] = 1. ``` 其中 `359` 可能會替換為其他數值。請研讀程式碼，並補完程式碼，使得類神經網路得以運作。 ==作答區== WWW = ? * `(a)` `hidden + 2` * `(b)` `hidden + 1` * `(c)` `hidden` * `(d)` `hidden - 1` * `(e)` `hidden - 2` RRR = ? * `(a)` `ANN_RAND() + 0.5` * `(a)` `ANN_RAND()` * `(c)` `ANN_RAND() - 0.5` * `(d)` `1 - ANN_RAND()` :::success 延伸問題: 1. 解釋程式碼運作原理，應補充相關背景知識 * 探討 `ann_act_sigmoid_cached` 的實作和加速運算的機制 2. 使用 back-propagation (反向傳遞) 改寫上述程式碼，並探討激勵函數的特性 3. 引入其他激勵函數，例如 ReLU，並確保針對其特性進行適度初始化 4. 比照 [Tinn](https://github.com/glouw/tinn)，搭配給定的 data set，實作出能夠辨識手寫印度—阿拉伯數字的類神經網路 * 參考 [nanonn](https://github.com/zserge/nanonn) 5. 研究前述亂數相關文獻，並探討偽亂數產生器 (PRNG) 的實作 6. 研讀 [Vectorizing random number generators for greater speed: PCG and xorshift128+](https://lemire.me/blog/2018/06/07/vectorizing-random-number-generators-for-greater-speed-pcg-and-xorshift128-avx-512-edition/)，重現實驗並思考 SIMD 加速 PRNG 的機會 7. 研讀 [Bitwise Neural Networks](https://arxiv.org/abs/1601.06071) 和 [Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1](https://arxiv.org/abs/1602.02830)，摘錄論文重點，並試著在上述程式碼的基礎上予以實作 * [tinier-nn](https://github.com/codekansas/tinier-nn) ::: --- ### 測驗 `2` LeetCode [37. Sudoku Solver](https://leetcode.com/problems/sudoku-solver/) 要求解[數獨 (Soduku)](https://en.wikipedia.org/wiki/Sudoku_solving_algorithms) 問題。數獨規則如下： 1. 每列 (row) 必須出現 1 到 9 的數字，且只能出現一次 2. 每行 (column) 必須出現 1 到 9 的數字，且只能出現一次 3. 每個 $3 \times 3$ 構成的「宮」(即子格子，以粗框標記) 內，數字 1 到 9 必須出現，且只能出現一次 :::info ``` column ┌ ┐ ┌ ┐ row │ │ │ │ └ ┘ └ ┘ ``` row, column 的中文到底怎麼翻譯呢？哪個是直的？哪個是橫的？台灣的「列」(row) 是橫向；中國的「列」(column) 是直向，==完全相反==！為何有這樣的演化？因為新中國政府頒訂的稿紙格子是橫式，每一行是橫的，而台灣的稿紙是直式，每一行是直的。在英文用法，row 是「橫」的一條，column 則是像監獄欄杆「直」的一根。 * 台灣把 row 翻成==列==，column 翻成 ==行== :o: 想成台灣人愛排隊，一行接著一「行」，攜伴並排就是「列」 * 中國把 row 翻成==行==，column 翻成 ==列== :o: 想成六四天安門廣場上，一「行」人橫著對抗進犯的坦克車，每次損耗就豎直變「烈士」(列) 另外，在計數 row, column 時，因為 row 是橫的由上往下排，所以數橫的 row 要==豎著數==，又因 column 是直的由左往右排，所以 column 要==橫著數== ::: 示範: ![](https://i.imgur.com/P35wUo0.png) > 參考輸入 ![](https://i.imgur.com/kk9fqW6.png) > 數獨解法若每次挑出一空格，並展開完成該盤面的可能數字，嘗試展開所有解法路徑，可畫成一個樹 (tree) 來思考： ![](https://i.imgur.com/FnAzfBl.png) 其中必定存在一條從 root 到 leaf 的路徑，能將其盤面填滿，並且盤面上的所有數字，能滿足數獨的所有條件規則 —— 題目假設存在唯一解。在決定空格能填什麼數字時，已將不符合規則的數字排除，以減少後續的計算量。 > [參考解題思路](https://hackmd.io/@kenjin/BkTcYzRLH) 以下是個可能的程式碼實作: ```cpp #include <stdbool.h> #include <stdint.h> #include <stddef.h> typedef struct { uint16_t row[9], col[9], box[9]; } masks_t; const uint16_t ALL_NUMS = 0x1FF; static inline int bi(int x, int y) { return (y / 3) * 3 + (x / 3); } static bool solve(char **board, masks_t *m) { for (int y = 0; y < 9; ++y) { for (int x = 0; x < 9; ++x) { if (board[y][x] != '.') continue; int b = bi(x, y); uint16_t posmoves = m->row[x] & m->col[y] & m->box[b]; while (posmoves) { size_t n = __builtin_ctz(posmoves); uint16_t k = ~(1U << n); board[y][x] = '1' + n; m->row[x] &= k, m->col[y] &= k, m->box[b] &= k; if (solve(board, m)) return true; m->row[x] |= ~k, m->col[y] |= ~k, m->box[b] |= ~k; posmoves &= MOVE; } board[y][x] = '.'; return false; } } return true; } void solveSudoku(char **board, int boardSize, int *boardColSize) { masks_t m; for (int i = 0; i < 9; ++i) m.row[i] = m.col[i] = m.box[i] = ALL_NUMS; for (int y = 0; y < 9; ++y) { for (int x = 0; x < 9; ++x) { if (board[y][x] != '.') { int n = board[y][x] - '1'; uint16_t k = ~(1U << n); m.row[x] &= k, m.col[y] &= k, m.box[bi(x, y)] &= k; } } } for (int y = 0; y < 9; ++y) { for (int x = 0; x < 9; ++x) { if (board[y][x] != '.') continue; uint16_t mask = m.row[x] & m.col[y] & m.box[bi(x, y)]; if ((mask & (MMM)) == 0) { size_t n = __builtin_ctz(mask); uint16_t k = ~(1U << n); board[y][x] = '1' + n; m.row[x] &= k, m.col[y] &= k, m.box[bi(x, y)] &= k; } } } solve(board, &m); } ``` 請補完程式碼，使得程式符合 LeetCode 題目要求。 ==作答區== MOVE = ? * `(a)` `posmoves + 1` * `(b)` `posmoves - 1` * `(c)` `~posmoves` * `(d)` `~posmoves - 1` MMM = ? * `(a)` `~mask` * `(b)` `(~mask - 1)` * `(c)` `mask + 1` * `(d)` `mask` * `(e)` `mask - 1` :::success 延伸問題: 1. 解釋程式碼運作原理，指出上述實作的缺失並改進 2. 重寫不同於上述的程式碼，效能表現應優於上述 3. 嘗試運用 SIMD 指令加速數獨求解 * 參考 [Nerd Sniped: A Sudoku Story](https://t-dillon.github.io/tdoku/) 及 [simdoku](https://github.com/yotarok/simdoku) ::: ---