Club Session 5 - HackMD

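# An AI Archaeology Diary: Starting from Drilling 🪵 to Make 🔥

---

# [Pre-test form](https://forms.gle/Cr4PFwWUGKFMpBHT8)

---

### OUTLINE

* Quick intro
* Supervised learning
* Reinforcement learning
* Game tree algorithms

---

## Quick Intro

----

### Why start from rubbing sticks?

* Today's AI is extremely capable
* The tech stack is deep and changes very fast
* So let's start learning from the "just discovered fire" stage!

----

### Today's goal

* Try out some of the simpler parts of AI
* Get interested in AI

---

## Supervised Learning

----

### What is deep learning

* A family of machine learning algorithms based on artificial neural networks
* It finds the rules in the data by itself
* It does not guarantee the optimal answer, but it usually gets close

----

![image](https://hackmd.io/_uploads/r1o1kJbZR.png)

----

### What is neuron

![image](https://hackmd.io/_uploads/BJjZZybZC.png)

----

### What is neuron

* $\left\{\begin{matrix} & y=0 \ \text{if} \ x_1w_1+x_2w_2 < \theta \\ & y=1 \ \text{if} \ x_1w_1+x_2w_2 \ge \theta\end{matrix}\right.$
* $x_i\in\mathbb{R}, w_i\in\mathbb{R}$
* There can be more inputs $x_i$, but the output $y$ is always a single value

----

### What is neural network

* A single neuron cannot do much
* What about a whole bunch of them?

----

### What is neural network

![image](https://hackmd.io/_uploads/HJkvmkb-R.png)

----

### What is neural network

* By adjusting the weights, a network can solve complex problems
* A network with many layers (the rule of thumb used here: more than four) is called a **deep neural network (DNN)**

----

### Training a neural network model

* A DNN needs a suitable combination of weights before it can solve a problem
* That combination is found through **training**

----

### Training workflow

1. Prepare the data
2. Design the model
3. Train the model
4. Evaluate + predict

----

### Prepare the data

* Collect data
* Preprocess the data = convert it into a form the neural network can handle

----

### Design the model

* How many layers should the network have?
* How many neurons in each layer?
* What **type** is each layer?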
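----

### Design the model: a toy sketch

To make "how many layers, how many neurons" concrete, here is a minimal sketch of the threshold neuron from the earlier formula, stacked into two layers with NumPy. The weights and thresholds are hand-picked for illustration (an assumption, nothing is trained here), and `step_neuron`, `step_layer` and `xor_net` are hypothetical helper names, not part of any library.

```python
import numpy as np

def step_neuron(x, w, theta):
    """Threshold neuron: outputs 1 if the weighted sum reaches theta, else 0."""
    return 1 if np.dot(x, w) >= theta else 0

def step_layer(x, W, thetas):
    """A layer is several neurons reading the same inputs.
    W has shape (n_inputs, n_neurons); thetas holds one threshold per neuron."""
    return (x @ W >= thetas).astype(int)

# Hand-picked (not learned) weights: with theta = 1.5 one neuron behaves like AND.
and_outputs = [step_neuron(np.array([a, b]), np.array([1.0, 1.0]), 1.5)
               for a in (0, 1) for b in (0, 1)]
print(and_outputs)  # [0, 0, 0, 1]

# Two layers (2 inputs -> 2 neurons -> 1 neuron) compute XOR.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
theta1 = np.array([0.5, 1.5])    # hidden neuron 0 acts as OR, neuron 1 as AND
W2 = np.array([[1.0], [-1.0]])   # output neuron: OR minus AND
theta2 = np.array([0.5])

def xor_net(a, b):
    hidden = step_layer(np.array([a, b]), W1, theta1)
    return int(step_layer(hidden, W2, theta2)[0])

xor_outputs = [xor_net(a, b) for a in (0, 1) for b in (0, 1)]
print(xor_outputs)  # [0, 1, 1, 0]
```

A single threshold neuron can only separate its inputs with one straight boundary, so it can do AND but not XOR. Adding one more layer fixes that, which is why layer count and width are design decisions.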
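----

### Train the model

* The main purpose is to adjust the internal weights
* First, compute layer by layer starting from the input side
    * Forward pass
* This produces a prediction
* The prediction differs from the answer by some error
* Correct the error
    * Back propagation
    * Corrections are applied layer by layer, from the output side back

----

![upload_e19dcab6ace7e440a1666380a7764b26](https://hackmd.io/_uploads/B1BnAJZWC.png =120%x)

----

### Evaluate + predict

* The model should work for every input of the same kind
* Check how well the model does on **data it has never seen**

----

### Supervised learning

* A method that learns the relationship between inputs and outputs
* Training requires both the data and the labels (answers)
* Main problem types: **classification** and **regression**

----

### Exercise: handwritten digit recognition

* We practice on the MNIST handwritten digit dataset that ships with the tensorflow package
* A classic classification problem

![image](https://hackmd.io/_uploads/B1khxg-WC.png)

----

### MNIST: Prepare the data

* Later we need to predict on data the model has never seen
* So tensorflow already splits the dataset into
    * training data + test data

```python=
from tensorflow.keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
```

----

### MNIST: Prepare the data

* The data are numpy arrays (ndarray)
* You can check the number of samples and the image size like this

```python=
print(train_images.shape)
# (60000, 28, 28)
# 60000 images, each a 28 * 28 grayscale picture
print(train_labels.shape)
# (60000,)
# 60000 answers
```

----

### MNIST: Prepare the data

* If you want to see what the images in the dataset look like
* you can display them with matplotlib

```python=
import matplotlib.pyplot as plt
# Jupyter/Colab magic; remove this line when running as a plain script
%matplotlib inline

# Show the first ten training images side by side
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(train_images[i], 'gray')
plt.show()
```

----

### MNIST: Prepare the data

* The network in this example only accepts one-dimensional vectors
* So the images have to be preprocessed

```python=
import numpy as np

print(train_images.shape)
# (60000, 28, 28) -> (60000, 784)
train_images = train_images.reshape((train_images.shape[0], 784))
# (10000, 28, 28) -> (10000, 784)
test_images = test_images.reshape((test_images.shape[0], 784))
print(train_images.shape)
```

----

### MNIST: Prepare the data

* The model's output will be an array of length 10; each element is the predicted probability of one digit
    * e.g. [0.001, 0.0002, 0.9232, ...]
* So the labels have to be preprocessed into the same shape: a length-10 vector with 1 at the correct digit and 0 everywhere else
* This label format is called one-hot encoding

```python=
from tensorflow.keras.utils import to_categorical

print(train_labels[0])
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
print(train_labels[0])
# Output:
# 5
# [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```

----

### MNIST: Design the model

![image](https://hackmd.io/_uploads/SkgpKgZW0.png)

----

### MNIST: Design the model

* Are so few layers really enough?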
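* When there is not much training data, an overly complex network actually backfires
* It extracts too many features specific to the training data and ends up working only on the training data
* This is called overfitting

----

### MNIST: Design the model

* As mentioned earlier, each layer of a network has a **type**
* Dense layer: every neuron is connected to every neuron in the neighbouring layers, as in the earlier network diagram. Also called a **fully connected layer**
* Dropout: randomly **drops** outputs of the previous layer's neurons. It helps prevent overfitting because it forces the network to learn more general rules

----

### MNIST: Design the model

* sigmoid, softmax?
* Activation functions
* Without activation functions, a stack of layers is just a linear function and cannot model non-linear features, yet many problems are non-linear
* Activation functions introduce non-linearity into the network

----

### MNIST: Design the model

* sigmoid
    * squashes each output into the range 0~1
* softmax
    * scales the outputs of all neurons in the layer so that they sum to 1

----

### MNIST: Design the model

* With tensorflow's Keras API, the network can be built like this

```python=
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
# Two hidden dense layers with 256 neurons each; the input is a 784-dim vector
model.add(Dense(256, activation='sigmoid', input_shape=(784,)))
model.add(Dense(256, activation='sigmoid'))
# rate is the fraction of outputs to drop
model.add(Dropout(rate=0.5))
# Output layer: one probability per digit
model.add(Dense(10, activation='softmax'))
```

----

### MNIST: Design the model

* Finally, the model must be compiled before it can be trained
* Compiling requires a **loss function, an optimizer, and evaluation metrics**

----

### MNIST: Design the model

![image](https://hackmd.io/_uploads/BJ5O7Wbb0.png)

----

### MNIST: Design the model

* Loss function: computes the loss from the error between the prediction and the label
* Optimizer: adjusts the weights based on the loss
* Metrics: measure how well training is going, for reference only

----

### MNIST: Design the model

```python=
from tensorflow.keras.optimizers import SGD

# learning_rate is the step size of each weight update
model.compile(loss='categorical_crossentropy', optimizer=SGD(learning_rate=0.1), metrics=['acc'])
```

----

### MNIST: Train the model

* During training, deep learning frameworks usually support splitting the training data further into (training data + validation data)
* This lets you see how training is going in real time and keep the process under control

----

### MNIST: Train the model

```python=
"""
Training repeats several times; each time the training data is processed in batches:
batch_size=500 means each batch takes 500 images from the samples, until every sample has been used once
epochs=5 means the whole training set is passed through five times
validation_split=0.2 holds out the last 20% of the training data for validation
"""
history = model.fit(train_images, train_labels, batch_size=500,
                    epochs=5, validation_split=0.2)
```

----

### MNIST: Train the model

* During training, what you usually watch is how the metrics on the **validation** data **relate** to those on the **training** data
* For example, training accuracy that keeps rising while validation accuracy stalls or drops is a sign of overfitting

```
Epoch 1/5
96/96 [==============================] - 7s 59ms/step - loss: 1.7189 - acc: 0.4305 - val_loss: 0.9473 - val_acc: 0.8368
Epoch 2/5
96/96 [==============================] - 3s 35ms/step - loss: 0.8779 - acc: 0.7511 - val_loss: 0.5461 - val_acc: 0.8845
Epoch 3/5
96/96 [==============================] - 2s 17ms/step - loss: 0.6141 - acc: 0.8279 - val_loss: 0.4094 - val_acc: 0.9018
Epoch 4/5
96/96 [==============================] - 2s 16ms/step - loss: 0.4973 - acc: 0.8606 - val_loss: 0.3449 - val_acc: 0.9095
Epoch 5/5
96/96 [==============================] - 2s 17ms/step - loss: 0.4388 - acc: 0.8757 - val_loss: 0.3119 - val_acc: 0.9156
```

----

### MNIST: Train the model

* Let's look at the metric curves with matplotlib

```python=
# history.history maps each metric name to a list with one value per epoch
plt.plot(history.history['acc'], label='acc')
plt.plot(history.history['val_acc'], label='val_acc')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(loc='best')
plt.show()
```

----

### MNIST: Train the model

* A result like the figure below is normal

![image](https://hackmd.io/_uploads/Hy93Ob--C.png)

----

### MNIST: Train the model

* For comparison, the figure below is not normal

![image](https://hackmd.io/_uploads/Sy6LoWZbA.png)

----

### MNIST: Evaluate + predict

* The model can be evaluated on the test data with the following function

```python=
test_loss, test_acc = model.evaluate(test_images, test_labels)
print('loss: {:.3f}, acc: {:.3f}'.format(test_loss, test_acc))
# Output:
# 313/313 [==============================] - 2s 5ms/step - loss: 0.3206 - acc: 0.9131
# loss: 0.321, acc: 0.913
```

----

### MNIST: Evaluate + predict

* If nothing looks seriously wrong, the model can be used to predict real data. The prediction code looks like this

```python=
# Predict the first ten test samples
model_predictions = model.predict(test_images[:10])
print(model_predictions[0])
# [5.7922537e-04 3.5977940e-04 8.7750499e-04 6.1351364e-03 1.6562057e-04
# 4.3002155e-04 8.3471705e-06 9.8046595e-01 5.3637120e-04 1.0442023e-02]

# Probabilities are hard to read, so use numpy to turn each row back into a digit
# (the index of the largest probability)
result = np.argmax(model_predictions, axis=1).tolist()
print(result)
# [7, 2, 1, 0, 4, 1, 4, 9, 6, 9]
```

----

### MNIST: Evaluate + predict

* You can also check the predictions against the actual images, displayed with matplotlib

```python=
# The test images were flattened to 784 values, so reshape them back to 28 * 28
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(test_images[i].reshape((28, 28)), 'gray')
plt.show()

# result comes from the previous slide
print(result)
```

----

### MNIST: Evaluate + predict

![image](https://hackmd.io/_uploads/rJuQjr-Z0.png)

---

## Reinforcement Learning

----

### Reinforcement learning

* Reinforcement learning is also a kind of machine learning
* Unlike supervised learning, it does not learn from a prepared dataset

----

### Reinforcement learning

* Release the AI into an **environment**
* The program finds the "best strategy" by interacting with the environment
* e.g.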
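playing board games, playing video games

![image](https://hackmd.io/_uploads/SkSalzW-A.png =50%x)

----

### Basic terms

* Agent, a.k.a. the bot
* Action
* Environment
* Reward
* State

```graphviz
digraph {
    compound=true
    rankdir=LR
    graph [ fontname="DFKai-SB", fontsize=37 ];
    node [ fontname="DFKai-SB", fontsize=35];
    edge [ fontname="DFKai-SB", fontsize=29 ];
    agent [label="Agent"] [shape=box]
    environment [label="Environment"] [shape=box]
    agent -> environment [label="Action"]
    environment -> agent [label="Reward, State"]
}
```

----

### Basic terms

* Return
    * The sum of the rewards received for each action. How the sum is computed is defined on the agent's side, so different definitions give different numbers
    * e.g. the discounted return discounts future rewards, the further away the heavier the discount, and adds them up
* Value
    * The return includes an uncertain future
    * So we need a metric that estimates how good the rewards we can expect are
* The goal is to maximize the return

----

### Exercise: multi-armed bandit

* A classic reinforcement learning problem

![image](https://hackmd.io/_uploads/rJ322dsda.png)

----

### Multi-Armed Bandit: The problem

* Imagine a slot machine with a huge number of levers, each with a different probability of winning
* The agent knows nothing about those probabilities in advance
* The agent gets a fixed number of plays (e.g. 1000) and tries to win as much money as possible within that budget

----

### Multi-Armed Bandit: The problem

* If we knew the winning probability (expected payout) of every lever, always pulling the one with the highest expected value would earn the most!
* But the player knows nothing. What now?
* Play many times and estimate the expected values yourself!

----

### Multi-Armed Bandit: Choosing a strategy

* When should we go all-in on the choice with the highest estimated value?
    * Exploitation (greedy)
* When should we explore the value of other levers?
    * Exploration (pick at random)

----

### Multi-Armed Bandit: $\epsilon$-greedy

* Fix $\epsilon$ in advance: "explore" with probability $\epsilon$, "exploit" with probability $1-\epsilon$
* e.g.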
$\epsilon=0.1$: explore 10% of the time, exploit 90% of the time

----

### Multi-Armed Bandit: $\epsilon$-greedy

![image](https://hackmd.io/_uploads/ryvTTGZZA.png)

----

### Multi-Armed Bandit: $\epsilon$-greedy

* Value update: $V_i=\frac{n-1}{n}\cdot V_{i-1}+\frac{1}{n}\cdot R_i$
* $V_i$ is the updated value of the lever chosen this time; $V_{i-1}$ is the same lever's value before this update
* $n$ is the number of times that lever has been chosen, including this pull
* $R_i$ is the reward received from this pull
* In other words, the value is the running average of the rewards the lever has paid so far

----

### Multi-Armed Bandit: Building the environment

```python=
import numpy as np
import random

# A single lever
class SlotArm():
    def __init__(self, p):
        """ Set the winning probability at creation; the agent never sees it """
        self.p = p

    def draw(self):
        """ Reward 1 on a win, 0 otherwise """
        if self.p > random.random():
            return 1.0
        else:
            return 0.0

def play(algo, arms, num_sims, num_time):
    """
    algo: the class implementing the agent's strategy
    arms: the machine, a list of SlotArm objects
    num_sims: number of simulated rounds
    num_time: how many pulls the agent gets per round
    """
    times = np.zeros(num_sims * num_time)
    rewards = np.zeros(num_sims * num_time)

    for sim in range(num_sims):
        # Every round starts with a fresh, untrained agent
        algo.initialize(len(arms))

        for time in range(num_time):
            index = sim * num_time + time

            times[index] = time + 1
            chosen_arm = algo.select_arm()
            reward = arms[chosen_arm].draw()
            rewards[index] = reward

            algo.update(chosen_arm, reward)

    return [times, rewards]
```

----

### Multi-Armed Bandit: Implementing $\epsilon$-greedy

```python=
class EpsilonGreedy():
    def __init__(self, epsilon):
        """ Set the exploration probability epsilon """
        self.epsilon = epsilon

    def initialize(self, n_arms):
        """
        n: how many times each lever has been pulled
        v: current value estimate of each lever
        Both start at 0
        """
        self.n = np.zeros(n_arms)
        self.v = np.zeros(n_arms)

    def select_arm(self):
        """ Explore with probability epsilon, otherwise exploit the best estimate """
        if self.epsilon >= random.random():
            return np.random.randint(0, len(self.n))
        else:
            return np.argmax(self.v)

    def update(self, chosen_arm, reward):
        """ Update the value with the formula from the earlier slide """
        self.n[chosen_arm] += 1
        n = self.n[chosen_arm]
        v = self.v[chosen_arm]
        self.v[chosen_arm] = ((n - 1) / float(n)) * v + (1 / float(n)) * reward

    def label(self):
        return 'ε-greedy(' + str(self.epsilon) + ')'
```

----

### Multi-Armed Bandit: Running the experiment

```python=
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter/Colab magic; remove this line when running as a plain script
%matplotlib inline

# A machine with 5 levers, each with a random winning probability
arms = [SlotArm(random.random()) for _ in range(5)]
# Exploration probability epsilon = 0.1
algo = EpsilonGreedy(0.1)
# 2000 rounds, 250 pulls per round
results = play(algo, arms, 2000, 250)

df = pd.DataFrame({'times': results[0], 'rewards': results[1]})
# Average the reward at each time step over the 2000 rounds
mean = df['rewards'].groupby(df['times']).mean()
plt.plot(mean, label=algo.label())

plt.xlabel('Step')
plt.ylabel('Average Reward')
plt.legend(loc='best')
plt.show()
```

----

### Multi-Armed Bandit: Running the experiment

* With the settings above you should get a plot similar to this one
* Try changing the parameters and play around~

![image](https://hackmd.io/_uploads/ryIzxQZb0.png)

---

## Game Tree Algorithms

----

### What is a game tree

* A game tree is not machine learning
* It is a model that describes the moves of both sides in a game
* Nodes represent positions, edges represent moves
* Square nodes mean it is our turn, round nodes mean it is the opponent's turn

----

### What is a game tree

* The numbers on the leaf nodes are the values of those positions

![image](https://hackmd.io/_uploads/Bymy7XZbR.png =80%x)

----

### Minimax algorithm

* An algorithm that searches the complete game tree
* On our turn, pick the move that is best for us (Max)
* Assume the opponent picks the move that is worst for us (Min)
* Compute the position value of **every** move

----

### Minimax algorithm

* Position values are computed bottom-up
* Because the result is only known once the game has been played to the end
* So we work backwards from the results to judge each move

----

### Minimax algorithm

![image](https://hackmd.io/_uploads/rycbBmWWC.png)

----

### Exercise: tic-tac-toe

* Also known as noughts and crosses
* Everyone has played it.. right?
* [Play it here](https://g.co/kgs/ag1Aqko)

![image](https://hackmd.io/_uploads/BynjBmZ-C.png =50%x)

----

### tic-tac-toe: Building the environment

```python=
import random

class State:
    # pieces: the stones of the player to move; enemy_pieces: the other player's stones
    def __init__(self, pieces=None, enemy_pieces=None):
        self.pieces = pieces if pieces is not None else [0] * 9
        self.enemy_pieces = enemy_pieces if enemy_pieces is not None else [0] * 9

    def piece_count(self, pieces):
        count = 0
        for i in pieces:
            count += 1 if i == 1 else 0
        return count

    def is_lose(self):
        # The player to move has lost if the enemy owns three in a row
        def is_comp(x, y, dx, dy):
            for k in range(3):
                if y < 0 or y > 2 or x < 0 or x > 2 or self.enemy_pieces[x + y * 3] == 0:
                    return False
                x, y = x+dx, y+dy
            return True

        # Two diagonals, then every row and column
        if is_comp(0, 0, 1, 1) or is_comp(0, 2, 1, -1):
            return True
        for i in range(3):
            if is_comp(0, i, 1, 0) or is_comp(i, 0, 0, 1):
                return True
        return False

    def is_draw(self):
        return self.piece_count(self.pieces) + self.piece_count(self.enemy_pieces) == 9

    def is_done(self):
        return self.is_lose() or self.is_draw()

    def next(self, action):
        # Place a stone, then swap sides so the new state is seen by the next player
        pieces = self.pieces.copy()
        pieces[action] = 1
        return State(self.enemy_pieces, pieces)

    def legal_actions(self):
        actions = []
        for i in range(9):
            if self.pieces[i] == 0 and self.enemy_pieces[i] == 0:
                actions.append(i)
        return actions

    def is_first_player(self):
        return self.piece_count(self.pieces) == self.piece_count(self.enemy_pieces)

    def __str__(self):
        ox = ('o', 'x') if self.is_first_player() else ('x', 'o')
        output = ''
        for i in range(9):
            if self.pieces[i] == 1:
                output += ox[0]
            elif self.enemy_pieces[i] == 1:
                output += ox[1]
            else:
                output += '-'
            if i % 3 == 2:
                output += '\n'
        return output
```

----

### tic-tac-toe: Random strategy + manual mode

```python=
def random_action(state):
    legal_actions = state.legal_actions()
    print("Method: random move\n")
    return legal_actions[random.randint(0, len(legal_actions) - 1)]

def human_action(state):
    legal_actions = state.legal_actions()
    print("Method: human player\n")
    print("Legal moves now: ", legal_actions)
    action = int(input("Enter your next move: "))
    # Keep asking until the move is legal
    while action not in legal_actions:
        action = int(input("Enter your next move: "))
    print()
    return action
```

----

### tic-tac-toe: Minimax implementation

```python=
def mini_max(state):
    # A lost position is worth -1 (for the player to move)
    if state.is_lose():
        return -1

    # A draw is worth 0
    if state.is_draw():
        return 0

    best_score = -float('inf')
    for action in state.legal_actions():
        # Wait, why the minus sign? (explained on the next slide)
        score = -mini_max(state.next(action))
        if score > best_score:
            best_score = score

    return best_score
```

----

### tic-tac-toe: Minimax implementation

* Switching between max and min at every level is tedious
* `state.next()` swaps the two players, so every score is from the point of view of the player to move
* So we negate the child's score each time, and then we only ever need to take the max!
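
The sign-flip trick can be checked on a small hand-made tree. This is a minimal sketch, not the deck's tic-tac-toe code: the tree, the leaf values, and the names `minimax` and `negamax` are made up for illustration. Leaves hold the position value from our point of view, so the sign-flip version carries a `sign` argument to convert them to the point of view of the player to move.

```python
def minimax(node, maximizing):
    """Classic form: every value is from our point of view."""
    if isinstance(node, int):           # leaf: position value for us
        return node
    children = [minimax(child, not maximizing) for child in node]
    return max(children) if maximizing else min(children)

def negamax(node, sign):
    """Sign-flip form: each call returns the value for the player to move."""
    if isinstance(node, int):
        return sign * node              # convert "value for us" to "value for the mover"
    return max(-negamax(child, -sign) for child in node)

# Hypothetical tree: we move at the root, the opponent moves one level down.
tree = [[3, 5], [2, 9], [0, 7]]
print(minimax(tree, True))   # 3
print(negamax(tree, 1))      # 3
```

Both functions pick the same value; the second one just never needs a `min`. In the deck's `mini_max`, the terminal values are already seen from the player to move, so no `sign` argument is needed there.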
![image](https://hackmd.io/_uploads/rJ7xuX-bA.png)

----

### tic-tac-toe: Minimax implementation

```python=
def mini_max_action(state):
    best_action = 0
    best_score = -float('inf')
    output = ['', '']

    for action in state.legal_actions():
        # Score of the move for us = minus the score of the resulting position for the opponent
        score = -mini_max(state.next(action))
        if score > best_score:
            best_action = action
            best_score = score

        output[0] = '{}{:2d},'.format(output[0], action)
        output[1] = '{}{:2d},'.format(output[1], score)

    print("Method: Minimax algorithm")
    print("Legal moves: ", output[0], "Position values: ", output[1], '\n')

    return best_action
```

----

### tic-tac-toe: Game loop

```python=
state = State()
round = 0

while True:
    if state.is_done():
        break

    round += 1
    print("-Round " + str(round) + "-\n")

    if state.is_first_player():
        print("Player: O")
        action = random_action(state)
        # action = human_action(state)
    else:
        print("Player: X")
        action = mini_max_action(state)

    state = state.next(action)

    print(state)
    print()

# The final state is seen from the player who would move next,
# so is_lose() means that player has lost
if state.is_lose():
    if state.is_first_player():
        print("player1 lose..")
    else:
        print("player1 win!!")
else:
    print("draw")
```

---

# [Post-test form](https://forms.gle/x9wL3L6h66eJvkM3A)