# 〈 Reinforcement Learning: Showdown at Othello, Part 1 〉 Computer-vs-Human Game Algorithms: Minimax and Alpha-Beta Pruning --- Tu 2023/5/26

*"...there was a cut that quite shocked me, because it was a move that would never happen in a human-to-human Go match. But, afterwards I analyzed the move and I found that it was very good. It is one move with two or even more purposes. We call it one stone, two birds."* **- Ke Jie**

![](https://hackmd.io/_uploads/BJjF_m6Bn.jpg)

## 1. Introduction

The title should tell you what this series is about. Reinforcement learning is something I have long wanted to learn but never had much motivation for. Honestly, I haven't touched deep learning much lately: with the hardware I own, even if I had a model, I doubt I could train anything worthwhile. But a general-education course assigned an Othello game as its final project, so I am taking the chance to study reinforcement learning.

Minimax and Alpha-Beta pruning, the focus of this post, are not machine learning at all; they are classical algorithms. I chose them as the opener partly because I find them genuinely cool (there are many more algorithms in this family; it feels like a whole new continent), and partly because, once I do train a model, I want it to beat these algorithms.

Finals are bearing down on me, but I hope to finish writing this.

## 2. Minimax and Alpha-Beta Pruning in Brief

When you play a board game, you naturally try to predict your opponent's next move, and ideally the outcome that every move leads to. Minimax is essentially that idea made systematic: for each possible move, compute the quality of the resulting position, such as how many discs it captures or how large an advantage it secures.

Minimax therefore amounts to searching a multiway game tree. However, computing every line of play all the way to the end of the game is usually infeasible (it works for tic-tac-toe, but not for complex games), so a depth limit is set: the search only looks a fixed number of the opponent's future moves ahead.

Alpha-Beta pruning is a refinement of Minimax that skips computation which cannot change the result, i.e., it prunes nodes that never need to be evaluated. You can think of it as a technique for speeding Minimax up.

![](https://hackmd.io/_uploads/HJ6campB3.png)

The gray nodes are pruned, meaning they never need to be evaluated. For more detail, watch [the video the figure comes from](https://www.youtube.com/watch?v=l-hh51ncgDI); it is very thorough and highly recommended.
## 3. The Game Environment

As described above, we are building an Othello game, and on top of it we add Minimax with Alpha-Beta pruning as the computer opponent.

Keeping up with the times, I simply asked ChatGPT to build it for us. Below is the code after my revisions; it follows the standard Othello rules (the human player is always Black because of the course project requirements; feel free to change that yourself).

```python=0
import itertools
import random
import math

EMPTY = 0
BLACK = 1
WHITE = 2

# Initialize the board
board = [[EMPTY] * 8 for _ in range(8)]
board[3][3] = WHITE
board[4][4] = WHITE
board[3][4] = BLACK
board[4][3] = BLACK

# Function to print the board
def print_board(board):
    print("   1 2 3 4 5 6 7 8")
    print("  -----------------------")
    for i in range(8):
        print(f"{i + 1} |", end="")
        for j in range(8):
            if board[i][j] == EMPTY:
                print("  ", end="")
            elif board[i][j] == BLACK:
                print("B ", end="")
            elif board[i][j] == WHITE:
                print("W ", end="")
        print("|")
    print("  -----------------------")

# Function to check if a move is valid
def is_valid_move(board, row, col, color):
    if board[row][col] != EMPTY:
        return False
    for dr, dc in itertools.product([-1, 0, 1], repeat=2):
        if dr == 0 and dc == 0:
            continue
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] != EMPTY and board[r][c] != color:
            r += dr
            c += dc
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color and (r, c) != (row + dr, col + dc):
            return True
    return False

# Function to make a move
def make_move(board, row, col, color):
    if not is_valid_move(board, row, col, color):
        return None
    board[row][col] = color
    for dr, dc in itertools.product([-1, 0, 1], repeat=2):
        if dr == 0 and dc == 0:
            continue
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] != EMPTY and board[r][c] != color:
            r += dr
            c += dc
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color and (r, c) != (row + dr, col + dc):
            # r, c = row + dr, col + dc  <- this line was buggy, so it stays commented out
            while (r, c) != (row, col):
                board[r][c] = color
                r -= dr
                c -= dc
    return board

# Function to get a list of valid moves for a given color
def get_valid_moves(board, color):
    moves = []
    for row in range(8):
        for col in range(8):
            if is_valid_move(board, row, col, color):
                moves.append((row, col))
    return moves if len(moves) > 0 else False

# Function to evaluate the current board state for the AI player
def evaluate_board(board):
    black_score = 0
    white_score = 0
    for row in range(8):
        for col in range(8):
            if board[row][col] == BLACK:
                black_score += 1
            elif board[row][col] == WHITE:
                white_score += 1
    return white_score - black_score

# Function for the AI player's move using the Alpha-Beta Pruning algorithm
def alphabeta(board, depth, alpha, beta, maximizing_player):
    if depth == 0 or not get_valid_moves(board, BLACK) or not get_valid_moves(board, WHITE):
        return evaluate_board(board)

    if maximizing_player:
        max_eval = float('-inf')
        valid_moves = get_valid_moves(board, WHITE)
        for move in valid_moves:
            row, col = move
            temp_board = [row[:] for row in board]
            make_move(temp_board, row, col, WHITE)
            eval = alphabeta(temp_board, depth - 1, alpha, beta, False)
            max_eval = max(max_eval, eval)
            alpha = max(alpha, eval)
            if beta <= alpha:
                break
        return max_eval
    else:
        min_eval = float('inf')
        valid_moves = get_valid_moves(board, BLACK)
        for move in valid_moves:
            row, col = move
            temp_board = [row[:] for row in board]
            make_move(temp_board, row, col, BLACK)
            eval = alphabeta(temp_board, depth - 1, alpha, beta, True)
            min_eval = min(min_eval, eval)
            beta = min(beta, eval)
            if beta <= alpha:
                break
        return min_eval

# Function for the AI player's move
def ai_move(board):
    valid_moves = get_valid_moves(board, WHITE)
    best_score = float('-inf')
    best_move = None
    for move in valid_moves:
        row, col = move
        temp_board = [row[:] for row in board]
        make_move(temp_board, row, col, WHITE)
        score = alphabeta(temp_board, 3, float('-inf'), float('inf'), False)
        if score > best_score:
            best_score = score
            best_move = move
    if best_move:
        row, col = best_move
        make_move(board, row, col, WHITE)

# Main game loop
current_player = BLACK
while True:
    print_board(board)
    if current_player == BLACK and get_valid_moves(board, BLACK):
        print("Black's turn (B)")
        move = input("Enter your move (row col): ")
        try:
            row, col = map(int, move.strip().split())
            row -= 1  # Convert to 0-indexed
            col -= 1
            if 0 <= row < 8 and 0 <= col < 8 and make_move(board, row, col, current_player) is not None:
                current_player = WHITE
            else:
                print("Invalid move. Try again.")
        except ValueError:
            print("Invalid input. Try again.")
    elif get_valid_moves(board, WHITE):
        print("White's turn (W) - AI")
        ai_move(board)
        current_player = BLACK
    if not get_valid_moves(board, BLACK) and not get_valid_moves(board, WHITE):
        break

print("Game Over")
print_board(board)
black_score = sum(row.count(BLACK) for row in board)
white_score = sum(row.count(WHITE) for row in board)
print(f"Black Score: {black_score}")
print(f"White Score: {white_score}")
if black_score > white_score:
    print("Black Wins!")
elif white_score > black_score:
    print("White Wins!")
else:
    print("It's a Tie!")
```

Sample output:

```
Game Over
   1 2 3 4 5 6 7 8
  -----------------------
1 |W W W W W B B B |
2 |W W W W W W B B |
3 |W W W W W W B B |
4 |W W W W W B B W |
5 |W W W W W B W W |
6 |W W W W W W W W |
7 |W W B B W W W W |
8 |W W W W W W W W |
  -----------------------
Black Score: 12
White Score: 52
White Wins!
```

**Obviously, I got crushed.** But the winner doesn't have to be me; it's enough if the AI I build later can win (probably).

A few key points about the code above:

1. **The `alphabeta` function (line 88)**: As I said, the heart of Minimax is the search tree, so the function is recursive. You can think of `maximizing_player` as picking one of the two ends of a number line: the closer the score gets to an end, the more favorable the position is for that player. Here the score is the difference between the white and black disc counts.
2. **The `ai_move` function (line 120)**: Plays the move with the best score found by `alphabeta`.
3. **The tricks in `is_valid_move` (line 32) and `make_move` (line 47)**: Both use `itertools`, an extremely powerful library worth browsing whenever you face a fiddly situation; `product` alone makes this code far more concise.

Also, although the code ChatGPT produced runs, in a later post I will still reorganize it into classes to make it easier to train our model against.

## 4. Conclusion

This is a relatively short post (though with the code it comes to roughly 8,000 characters). Future posts may also go up on GitHub, since a commenter pointed out that my habit of editing articles in place for convenience sometimes introduces errors :p

This implementation was essentially done by ChatGPT, and frankly its capability stunned me. There were a few bugs and shortcomings, but all within my ability to fix. The whole exercise rekindled my interest in GPT-style models; barring surprises, the next series will be my notes on the Attention paper, but that is a story for another day.

p.s. GPT is terrifyingly strong (shudder)

###### tags: `AI` `Deep Learning` `Reinforcement Learning`