# 〈 Reinforcement Learning Othello Showdown, Part 1 〉 Computer-vs-Human Game Algorithms: Minimax and Alpha-Beta Pruning
---
Tu 2023/5/26
***“...there was a cut that quite shocked me, because it was a move that would never happen in a human-to-human Go match. But, afterwards I analyzed the move and I found that it was very good. It is one move with two or even more purposes. We call it one stone, two birds.”* - Ke Jie**
![](https://hackmd.io/_uploads/BJjF_m6Bn.jpg)
## 1. Introduction
The title probably gives away what this series is about. Reinforcement learning is something I have long wanted to learn but never had the motivation for. Honestly, I haven't touched much deep learning lately: with the hardware I own, even if I had a model, I couldn't train anything worthwhile. But the final project for one of my general-education courses is to build an Othello game, so I'm taking the opportunity to learn reinforcement learning.
The Minimax and Alpha-Beta Pruning covered in this post are not machine learning at all; they are classic algorithms. I chose them as the opener partly because I find them cool (there are plenty more algorithms like them; it feels like a whole new continent), and partly because once I actually produce a model, I want it to beat these algorithms.
Finals are bearing down on me, but I hope to finish writing this.
## 2. A Brief Look at Minimax and Alpha-Beta Pruning
When you play a board game, you naturally try to predict your opponent's next move, and ideally you would trace out where every possible move leads. Minimax is essentially that idea: for each possible move, compute a corresponding quality, such as how many pieces it captures or how large an advantage it gains.
Minimax is therefore basically a multiway search tree. Computing every possibility all the way to the end of the game is generally infeasible (for tic-tac-toe you certainly can; I mean complex games), so one usually sets a depth, i.e., how many moves ahead to look, to bound the computation.
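To make this concrete, below is a minimal sketch of a depth-limited Minimax for our setting, where White is the maximizing side. It borrows the `get_valid_moves`, `make_move`, and `evaluate_board` helpers from the full program in section 3, and it simply treats a position where the side to move is stuck as a leaf; take it as the bare skeleton of the recursion, not a finished engine.
```python
# Bare depth-limited Minimax skeleton; get_valid_moves, make_move and
# evaluate_board are the Othello helpers defined in the full program below.
def minimax(board, depth, maximizing_player):
    color = WHITE if maximizing_player else BLACK
    moves = get_valid_moves(board, color)
    if depth == 0 or not moves:
        return evaluate_board(board)  # the "quality" of this position
    # Each child is the board after one candidate move, played on a copy
    children = (make_move([r[:] for r in board], row, col, color)
                for row, col in moves)
    if maximizing_player:
        # White assumes Black answers optimally and takes the maximum
        return max(minimax(child, depth - 1, False) for child in children)
    else:
        # Black picks whatever minimizes White's best outcome
        return min(minimax(child, depth - 1, True) for child in children)
```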
Alpha-Beta Pruning is a refinement of Minimax that cuts away unnecessary computation, i.e., it reduces the number of nodes that have to be evaluated (the "pruning"). You can think of it as a technique for speeding Minimax up.
![](https://hackmd.io/_uploads/HJ6campB3.png)
The gray nodes are the pruned ones, meaning there is no need to evaluate them. For more detail, watch [the video the figure comes from](https://www.youtube.com/watch?v=l-hh51ncgDI); it is very thorough and I highly recommend it.
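In code, the pruning is only a few extra lines on the Minimax skeleton above: `alpha` tracks the best score the maximizer can already guarantee on the current path, `beta` the best (lowest) score the minimizer can, and the moment `beta <= alpha` the remaining siblings are skipped, because the opponent would never let the game reach them. The `alphabeta` function in the full program below has exactly this shape.
```python
# The Minimax skeleton from above with alpha/beta bounds threaded through
def minimax_ab(board, depth, alpha, beta, maximizing_player):
    color = WHITE if maximizing_player else BLACK
    moves = get_valid_moves(board, color)
    if depth == 0 or not moves:
        return evaluate_board(board)
    if maximizing_player:
        value = float('-inf')
        for row, col in moves:
            child = make_move([r[:] for r in board], row, col, color)
            value = max(value, minimax_ab(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)  # best score White can force so far
            if beta <= alpha:
                break                  # prune: Black would avoid this branch
        return value
    else:
        value = float('inf')
        for row, col in moves:
            child = make_move([r[:] for r in board], row, col, color)
            value = min(value, minimax_ab(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)    # best score Black can force so far
            if beta <= alpha:
                break                  # prune: White would avoid this branch
        return value
```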
## 3. The Game Environment
As described above, we are going to build an Othello game, and on top of it add Minimax with Alpha-Beta Pruning as the opponent.
To keep up with the times, we'll simply ask GPT to build one for us. Below is the code after my fixes; it follows the standard Othello rules (the human player is always Black because my course project requires it; feel free to change that).
```python=0
import itertools

EMPTY = 0
BLACK = 1
WHITE = 2

# Initialize the board with the standard four center pieces
board = [[EMPTY] * 8 for _ in range(8)]
board[3][3] = WHITE
board[4][4] = WHITE
board[3][4] = BLACK
board[4][3] = BLACK

# Function to print the board
def print_board(board):
    print("   1 2 3 4 5 6 7 8")
    print("  -----------------------")
    for i in range(8):
        print(f"{i + 1} |", end="")
        for j in range(8):
            if board[i][j] == EMPTY:
                print("  ", end="")
            elif board[i][j] == BLACK:
                print("B ", end="")
            elif board[i][j] == WHITE:
                print("W ", end="")
        print("|")
    print("  -----------------------")

# Function to check if a move is valid
def is_valid_move(board, row, col, color):
    if board[row][col] != EMPTY:
        return False
    # Walk outward in each of the 8 directions; the move is valid if it
    # brackets at least one opponent piece against one of our own pieces
    for dr, dc in itertools.product([-1, 0, 1], repeat=2):
        if dr == 0 and dc == 0:
            continue
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] != EMPTY and board[r][c] != color:
            r += dr
            c += dc
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color and (r, c) != (row + dr, col + dc):
            return True
    return False

# Function to make a move
def make_move(board, row, col, color):
    if not is_valid_move(board, row, col, color):
        return None
    board[row][col] = color
    for dr, dc in itertools.product([-1, 0, 1], repeat=2):
        if dr == 0 and dc == 0:
            continue
        r, c = row + dr, col + dc
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] != EMPTY and board[r][c] != color:
            r += dr
            c += dc
        if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == color and (r, c) != (row + dr, col + dc):
            # r, c = row + dr, col + dc  <- this line was buggy, so it stays commented out
            # Walk back toward the new piece, flipping everything in between
            while (r, c) != (row, col):
                board[r][c] = color
                r -= dr
                c -= dc
    return board

# Function to get a list of valid moves for a given color
def get_valid_moves(board, color):
    moves = []
    for row in range(8):
        for col in range(8):
            if is_valid_move(board, row, col, color):
                moves.append((row, col))
    return moves  # an empty list is falsy, so callers can test it directly

# Function to evaluate the current board state for the AI player
# (positive favors White, negative favors Black)
def evaluate_board(board):
    black_score = 0
    white_score = 0
    for row in range(8):
        for col in range(8):
            if board[row][col] == BLACK:
                black_score += 1
            elif board[row][col] == WHITE:
                white_score += 1
    return white_score - black_score

# Function for the AI player's move using the Alpha-Beta Pruning algorithm
def alphabeta(board, depth, alpha, beta, maximizing_player):
    if depth == 0 or not get_valid_moves(board, BLACK) or not get_valid_moves(board, WHITE):
        return evaluate_board(board)
    if maximizing_player:
        max_eval = float('-inf')
        valid_moves = get_valid_moves(board, WHITE)
        for move in valid_moves:
            row, col = move
            temp_board = [r[:] for r in board]  # search on a copy, not the real board
            make_move(temp_board, row, col, WHITE)
            val = alphabeta(temp_board, depth - 1, alpha, beta, False)
            max_eval = max(max_eval, val)
            alpha = max(alpha, val)
            if beta <= alpha:
                break  # prune: Black already has a better option elsewhere
        return max_eval
    else:
        min_eval = float('inf')
        valid_moves = get_valid_moves(board, BLACK)
        for move in valid_moves:
            row, col = move
            temp_board = [r[:] for r in board]
            make_move(temp_board, row, col, BLACK)
            val = alphabeta(temp_board, depth - 1, alpha, beta, True)
            min_eval = min(min_eval, val)
            beta = min(beta, val)
            if beta <= alpha:
                break  # prune: White already has a better option elsewhere
        return min_eval

# Function for the AI player's move
def ai_move(board):
    valid_moves = get_valid_moves(board, WHITE)
    best_score = float('-inf')
    best_move = None
    for move in valid_moves:
        row, col = move
        temp_board = [r[:] for r in board]
        make_move(temp_board, row, col, WHITE)
        score = alphabeta(temp_board, 3, float('-inf'), float('inf'), False)
        if score > best_score:
            best_score = score
            best_move = move
    if best_move:
        row, col = best_move
        make_move(board, row, col, WHITE)

# Main game loop
current_player = BLACK
while True:
    print_board(board)
    if current_player == BLACK and get_valid_moves(board, BLACK):
        print("Black's turn (B)")
        move = input("Enter your move (row col): ")
        try:
            row, col = map(int, move.strip().split())
            row -= 1  # Convert to 0-indexed
            col -= 1
            if 0 <= row < 8 and 0 <= col < 8 and make_move(board, row, col, current_player) is not None:
                current_player = WHITE
            else:
                print("Invalid move. Try again.")
        except ValueError:
            print("Invalid input. Try again.")
    elif get_valid_moves(board, WHITE):
        print("White's turn (W) - AI")
        ai_move(board)
        current_player = BLACK
    else:
        # The side to move has no legal move: pass the turn
        current_player = BLACK if current_player == WHITE else WHITE
    if not get_valid_moves(board, BLACK) and not get_valid_moves(board, WHITE):
        break

print("Game Over")
print_board(board)
black_score = sum(row.count(BLACK) for row in board)
white_score = sum(row.count(WHITE) for row in board)
print(f"Black Score: {black_score}")
print(f"White Score: {white_score}")
if black_score > white_score:
    print("Black Wins!")
elif white_score > black_score:
    print("White Wins!")
else:
    print("It's a Tie!")
```
Output:
```
Game Over
   1 2 3 4 5 6 7 8
  -----------------------
1 |W W W W W B B B |
2 |W W W W W W B B |
3 |W W W W W W B B |
4 |W W W W W B B W |
5 |W W W W W B W W |
6 |W W W W W W W W |
7 |W W B B W W W W |
8 |W W W W W W W W |
  -----------------------
Black Score: 12
White Score: 52
White Wins!
```
**Obviously, I got crushed.** But the winner doesn't have to be me; it's enough if the AI I build later can win (probably).
A few important points about the code above:
1. **The `alphabeta` function**: as I said, the heart of Minimax is the search tree, so it uses a recursive algorithm. The `maximizing_player` flag can be pictured as the two ends of a number line: the closer the score moves toward one end, the better for that player. Here I use the difference between the white and black piece counts as the score.
2. **The `ai_move` function**: plays the move that `alphabeta` identifies as best.
3. **The trick used in `is_valid_move` and `make_move`**: they rely on `itertools`, a very powerful library; whenever things get fiddly, it is worth checking what it offers. Here `product` makes the code dramatically more concise; see the short demo after this list.
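For reference, here is what that `product` call expands to: it enumerates all nine pairs in {-1, 0, 1} × {-1, 0, 1}, and skipping `(0, 0)` leaves exactly the eight directions you can walk from a square.
```python
import itertools

# All nine (dr, dc) pairs; dropping (0, 0) leaves the 8 compass directions
directions = [(dr, dc)
              for dr, dc in itertools.product([-1, 0, 1], repeat=2)
              if (dr, dc) != (0, 0)]
print(directions)
# [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
```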
Also, although the code ChatGPT produced runs, in a later post I will still reorganize it into classes to make training our model easier.
## 4. Closing Thoughts
This is a relatively short post (though with the code it comes to roughly 8,000 characters). I may mirror future posts on GitHub, since a commenter pointed out that my habit of editing posts in place for convenience sometimes introduces mistakes :p
This implementation was essentially done by ChatGPT, and I have to say its capability genuinely stunned me. There were some errors and flaws, but all within my ability to fix. The whole exercise rekindled my interest in GPT-style models; barring surprises, the next series will be my notes on the Attention paper, but that is a story for another day.
p.s. GPT is way too strong (shudders)
###### tags: `AI` `Deep Learning` `Reinforcement Learning`