# 〈 Reinforcement Learning: Showdown at Othello, Part 1 〉 Computer-vs-Human Game Algorithms: Minimax and Alpha-Beta Pruning --- Tu 2023/5/26

*"...there was a cut that quite shocked me, because it was a move that would never happen in a human-to-human Go match. But, afterwards I analyzed the move and I found that it was very good. It is one move with two or even more purposes. We call it one stone, two birds."* **- Ke Jie**

![](https://hackmd.io/_uploads/BJjF_m6Bn.jpg)

## 1. Introduction

The title should tell you what this series is about. Reinforcement learning is something I have long wanted to learn but never had much motivation for. Honestly, I haven't touched deep learning much lately: with the hardware I own, even if I had a model, I doubt I could train anything worthwhile. But a general-education course assigned an Othello game as its final project, so I am taking the chance to study reinforcement learning.

Minimax and Alpha-Beta pruning, the focus of this post, are not machine learning at all; they are classical algorithms. I chose them as the opener partly because I find them genuinely cool (there are many more algorithms in this family; it feels like a whole new continent), and partly because, once I do train a model, I want it to beat these algorithms.

Finals are bearing down on me, but I hope to finish writing this.

## 2. Minimax and Alpha-Beta Pruning in Brief

When you play a board game, you naturally try to predict your opponent's next move, and ideally the outcome that every move leads to. Minimax is essentially that idea made systematic: for each possible move, compute the quality of the resulting position, such as how many discs it captures or how large an advantage it secures.

Minimax therefore amounts to searching a multiway game tree. However, computing every line of play all the way to the end of the game is usually infeasible (it works for tic-tac-toe, but not for complex games), so a depth limit is set: the search only looks a fixed number of the opponent's future moves ahead.

Alpha-Beta pruning is a refinement of Minimax that skips computation which cannot change the result, i.e., it prunes nodes that never need to be evaluated. You can think of it as a technique for speeding Minimax up.

![](https://hackmd.io/_uploads/HJ6campB3.png)

The gray nodes are pruned, meaning they never need to be evaluated. For more detail, watch [the video the figure comes from](https://www.youtube.com/watch?v=l-hh51ncgDI); it is very thorough and highly recommended.
## 3. The Game Environment

As described above, we are building an Othello game, and on top of it we add Minimax with Alpha-Beta pruning as the computer opponent.

Keeping up with the times, I simply asked ChatGPT to build it for us. Below is the code after my revisions; it follows the standard Othello rules (the human player is always Black because of the course project requirements; feel free to change that yourself).

```python=0
import itertools
import random
import math

EMPTY = 0
BLACK = 1
WHITE = 2

# Initialize the board
board = [[EMPTY] * 8 for _ in range(8)]
board[3][3] = WHITE
board[4][4] = WHITE
board[3][4] = BLACK
board[4][3] = BLACK

# Function to print the board
def print_board(board):
    print("   1 2 3 4 5 6 7 8")
    print("  -----------------------")
    for i in range(8):
        print(f"{i + 1} |", end="")
        for j in range(8):
            if board[i][j] == EMPTY:
                print("  ", end="")
            elif board[i][j] == BLACK:
                print("B ", end="")
            elif board[i][j] == WHITE:
                print("W ", end="")
        print("|")
    print("  -----------------------")

# Function to check if a move is valid
def is_valid_move(board, row, col, color):
    if board[row][col] != EMPTY:
        return False
    for dr, dc in itertools.product([-1, 0, 1], repeat=2):
        if dr == 0 and dc == 0:
            continue
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] != EMPTY and board[r][c] != color:
            r += dr
            c += dc
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color and (r, c) != (row + dr, col + dc):
            return True
    return False

# Function to make a move
def make_move(board, row, col, color):
    if not is_valid_move(board, row, col, color):
        return None
    board[row][col] = color
    for dr, dc in itertools.product([-1, 0, 1], repeat=2):
        if dr == 0 and dc == 0:
            continue
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] != EMPTY and board[r][c] != color:
            r += dr
            c += dc
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color and (r, c) != (row + dr, col + dc):
            # r, c = row + dr, col + dc  <- this line was buggy, so it stays commented out
            while (r, c) != (row, col):
                board[r][c] = color
                r -= dr
                c -= dc
    return board

# Function to get a list of valid moves for a given color
def get_valid_moves(board, color):
    moves = []
    for row in range(8):
        for col in range(8):
            if is_valid_move(board, row, col, color):
                moves.append((row, col))
    return moves if len(moves) > 0 else False

# Function to evaluate the current board state for the AI player
def evaluate_board(board):
    black_score = 0
    white_score = 0
    for row in range(8):
        for col in range(8):
            if board[row][col] == BLACK:
                black_score += 1
            elif board[row][col] == WHITE:
                white_score += 1
    return white_score - black_score

# Function for the AI player's move using the Alpha-Beta Pruning algorithm
def alphabeta(board, depth, alpha, beta, maximizing_player):
    if depth == 0 or not get_valid_moves(board, BLACK) or not get_valid_moves(board, WHITE):
        return evaluate_board(board)

    if maximizing_player:
        max_eval = float('-inf')
        valid_moves = get_valid_moves(board, WHITE)
        for move in valid_moves:
            row, col = move
            temp_board = [row[:] for row in board]
            make_move(temp_board, row, col, WHITE)
            eval = alphabeta(temp_board, depth - 1, alpha, beta, False)
            max_eval = max(max_eval, eval)
            alpha = max(alpha, eval)
            if beta <= alpha:
                break
        return max_eval
    else:
        min_eval = float('inf')
        valid_moves = get_valid_moves(board, BLACK)
        for move in valid_moves:
            row, col = move
            temp_board = [row[:] for row in board]
            make_move(temp_board, row, col, BLACK)
            eval = alphabeta(temp_board, depth - 1, alpha, beta, True)
            min_eval = min(min_eval, eval)
            beta = min(beta, eval)
            if beta <= alpha:
                break
        return min_eval

# Function for the AI player's move
def ai_move(board):
    valid_moves = get_valid_moves(board, WHITE)
    best_score = float('-inf')
    best_move = None
    for move in valid_moves:
        row, col = move
        temp_board = [row[:] for row in board]
        make_move(temp_board, row, col, WHITE)
        score = alphabeta(temp_board, 3, float('-inf'), float('inf'), False)
        if score > best_score:
            best_score = score
            best_move = move
    if best_move:
        row, col = best_move
        make_move(board, row, col, WHITE)

# Main game loop
current_player = BLACK
while True:
    print_board(board)
    if current_player == BLACK and get_valid_moves(board, BLACK):
        print("Black's turn (B)")
        move = input("Enter your move (row col): ")
        try:
            row, col = map(int, move.strip().split())
            row -= 1  # Convert to 0-indexed
            col -= 1
            if 0 <= row < 8 and 0 <= col < 8 and make_move(board, row, col, current_player) is not None:
                current_player = WHITE
            else:
                print("Invalid move. Try again.")
        except ValueError:
            print("Invalid input. Try again.")
    elif get_valid_moves(board, WHITE):
        print("White's turn (W) - AI")
        ai_move(board)
        current_player = BLACK
    if not get_valid_moves(board, BLACK) and not get_valid_moves(board, WHITE):
        break

print("Game Over")
print_board(board)
black_score = sum(row.count(BLACK) for row in board)
white_score = sum(row.count(WHITE) for row in board)
print(f"Black Score: {black_score}")
print(f"White Score: {white_score}")
if black_score > white_score:
    print("Black Wins!")
elif white_score > black_score:
    print("White Wins!")
else:
    print("It's a Tie!")
```

Sample output:

```
Game Over
   1 2 3 4 5 6 7 8
  -----------------------
1 |W W W W W B B B |
2 |W W W W W W B B |
3 |W W W W W W B B |
4 |W W W W W B B W |
5 |W W W W W B W W |
6 |W W W W W W W W |
7 |W W B B W W W W |
8 |W W W W W W W W |
  -----------------------
Black Score: 12
White Score: 52
White Wins!
```

**Obviously, I got crushed.** But the winner doesn't have to be me; it's enough if the AI I build later can win (probably).

A few key points about the code above:

1. **The `alphabeta` function (line 88)**: As I said, the heart of Minimax is the search tree, so the function is recursive. You can think of `maximizing_player` as picking one of the two ends of a number line: the closer the score gets to an end, the more favorable the position is for that player. Here the score is the difference between the white and black disc counts.
2. **The `ai_move` function (line 120)**: Plays the move with the best score found by `alphabeta`.
3. **The tricks in `is_valid_move` (line 32) and `make_move` (line 47)**: Both use `itertools`, an extremely powerful library worth browsing whenever you face a fiddly situation; `product` alone makes this code far more concise.

Also, although the code ChatGPT produced runs, in a later post I will still reorganize it into classes to make it easier to train our model against.

## 4. Conclusion

This is a relatively short post (though with the code it comes to roughly 8,000 characters). Future posts may also go up on GitHub, since a commenter pointed out that my habit of editing articles in place for convenience sometimes introduces errors :p

This implementation was essentially done by ChatGPT, and frankly its capability stunned me. There were a few bugs and shortcomings, but all within my ability to fix. The whole exercise rekindled my interest in GPT-style models; barring surprises, the next series will be my notes on the Attention paper, but that is a story for another day.

p.s. GPT is terrifyingly strong (shudder)

###### tags: `AI` `Deep Learning` `Reinforcement Learning`