Linux 核心專題: 井字遊戲改進

執行人: cheezad
專題解說錄影

Reviewed by `SuNsHiNe-75`

關於定點數的實作，可考慮透過 stdint.h 中定義的資料類型去取代 double。

cheezad
了解謝謝！

Reviewed by `aa860630`

對於 MCTS 原理的解釋可以附上圖片解說

cheezad
了解！目前已更新在 Lab0-c 中

Reviewed by `hugo0406`

the highest Upper Confidence Bound (UCB) 具體來說是什麼評估標準?

cheezad
我在 Lab0-c 中有更詳細的提到 UCB 的公式，他單純就是數值大小的比較

任務簡介

重做第三次作業，並彙整其他學員的成果

TODO: 解釋 MCTS 的原理

search tree

Monte Carlo Tree Search (MCTS) uses a search tree to identify the next best move. At the root, it selects the node with the highest Upper Confidence Bound (UCB) in the selection step. Upon reaching a leaf node, it expands the next layer of nodes. During the rollout phase, it simulates random moves until a win condition is met. The results are then tracked and backpropagated through the nodes to update their values.
For an example, please visit Lab0-c at the bottom of the paragraph, I provided an example of how the algorithm runs.

TODO: 探討 PRNG 品質對 MCTS 的影響

If the Pseudo Random Number Generator (PRNG) is of poor quality, meaning the numbers it generates are not uniformly distributed, it can negatively impact the Monte Carlo Tree Search (MCTS) during the rollout stage. This may result in certain positions being consistently overlooked, leading to blind spots on the board. Consequently, the search tree may be skewed, preventing it from finding the optimal solution.

量化分析，搭配對應的理論依據。

TODO: 探討 fixed point arithmetic 和其實作

Parts in the code that includes float:

mcts.c

struct node {
    int move;
    char player;
    int n_visits;
    double score;
    struct node *parent;
    struct node *children[N_GRIDS];
};

The score is kept using double.

static inline double uct_score(int n_total, int n_visits, double score)
{
    if (n_visits == 0)
        return DBL_MAX;
    return score / n_visits +
           EXPLORATION_FACTOR * sqrt(log(n_total) / n_visits);
}

Uct score is calculated using the score, and also the EXPLORATION_FACTOR is defined as sqrt(2), sqrt in c is also double. (log is also double)

static struct node *select_move(struct node *node)
{
    struct node *best_node = NULL;
    double best_score = -1;
    for (int i = 0; i < N_GRIDS; i++) {
        if (!node->children[i])
            continue;
        double score = uct_score(node->n_visits, node->children[i]->n_visits,
                                 node->children[i]->score);
        if (score > best_score) {
            best_score = score;
            best_node = node->children[i];
        }
    }
    return best_node;
}

Selection of the next move in selection phase.

static double simulate(char *table, char player)
{
    char current_player = player;
    char temp_table[N_GRIDS];
    memcpy(temp_table, table, N_GRIDS);
    while (1) {
        int *moves = available_moves(temp_table);
        if (moves[0] == -1) {
            free(moves);
            break;
        }
        int n_moves = 0;
        while (n_moves < N_GRIDS && moves[n_moves] != -1)
            ++n_moves;
        int move = moves[rand() % n_moves];
        free(moves);
        temp_table[move] = current_player;
        char win;
        if ((win = check_win(temp_table)) != ' ')
            return calculate_win_value(win, player);
        current_player ^= 'O' ^ 'X';
    }
    return 0.5;
}

Simulation of the game involves

TODO: 藉由工具定位出程式碼的效能瓶頸並改進

善用 perf, valgrind