# Main Run 2024-3-1 (Discarded)

* 2024-3-1
  * Start the training.
  * The version is v0.7.0.
  * learning rate = 0.005
  * batch size = 256
  * The network size is 6bx96c.

<br/>

* 2024-3-5
  * Played 320k games.
  * Accumulated around $5.412 \times 10^{7}$ 20bx256c eval queries.
  * The strength is better than Leela Zero with the [071](https://zero.sjeng.org/networks/257aeeb863dc51bfc598838361225459257377a4b2c9abd3e1ac6cdba1fcc88f.gz) weights. The Elo difference is +106.
  * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
  * Leela Zero: ```-w 071.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
  * Game result (played 400 games with Leela Zero):
    ```
    Name              black won   white won   total (win-rate)
    Sayuri v0.7.0     141         118         259 (64.75%)
    Leela Zero 0.17   82          59          141 (35.25%)
    ```

<br/>

* 2024-3-7
  * Played 465k games.
  * Accumulated around $1.124 \times 10^{8}$ 20bx256c eval queries.
  * The strength is better than Leela Zero with the [091](https://zero.sjeng.org/networks/b3b00c6d75b4e74946a97b88949307c9eae2355a88f518ebf770c7758f90e357.gz) weights. The Elo difference is +51.
  * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
  * Leela Zero: ```-w 091.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
  * Game result (played 400 games with Leela Zero):
    ```
    Name              black won   white won   total (win-rate)
    Sayuri v0.7.0     113         116         229 (57.25%)
    Leela Zero 0.17   84          87          171 (42.75%)
    ```

<br/>

* 2024-3-7
  * Played 495k games.
  * The loss looks unstable. Drop the learning rate to 0.0025 (from 0.005).

<br/>

* 2024-3-9
  * Played 630k games.
  * Accumulated around $1.781 \times 10^{8}$ 20bx256c eval queries.
  * The strength is about the same as Leela Zero with the [092](https://zero.sjeng.org/networks/ae205d8b957e560c19cc2bc935a8ea76d08dd9f88110ea783d50829bdca45329.gz) weights. The Elo difference is -18.
  * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
  * Leela Zero: ```-w 092.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
  * Game result (played 400 games with Leela Zero):
    ```
    Name              black won   white won   total (win-rate)
    Sayuri v0.7.0     91          99          190 (47.5%)
    Leela Zero 0.17   101         109         210 (52.5%)
    ```

<br/>

* 2024-3-9
  * Played 640k games.
  * Accumulated around $1.820 \times 10^{8}$ 20bx256c eval queries.
  * Halt the 6bx96c training.

<br/>

* 2024-3-10
  * Start the 10bx128c training.
  * learning rate = 0.005
  * batch size = 256
  * The current replay buffer is 175000 games.

<br/>

* 2024-3-13
  * Played 800k games (10bx128c played 160k games).
  * Accumulated around $3.846 \times 10^{8}$ 20bx256c eval queries.
  * The strength is better than Leela Zero with the [116](https://zero.sjeng.org/networks/39d465076ed1bdeaf4f85b35c2b569f604daa60076cbee9bbaab359f92a7c1c4.gz) weights. The Elo difference is +82.
  * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
  * Leela Zero: ```-w 116.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
  * Game result (played 400 games with Leela Zero):
    ```
    Name              black won   white won   total (win-rate)
    Sayuri v0.7.0     132         113         245 (61.25%)
    Leela Zero 0.17   87          68          155 (38.75%)
    ```

<br/>

* 2024-3-15
  * Played 915k games (10bx128c played 275k games).
  * The loss looks unstable. Drop the learning rate to 0.0025 (from 0.005).

<br/>

* 2024-3-16
  * The performance of the last 10bx128c network looks bad. The network cannot predict the win-rate well under area scoring.
  * Here is an example: the last weights think both sides are winning the game under area scoring, so the MCTS win-rate is 50%.
  * ![Screenshot from 2024-03-16 14-58-14](https://hackmd.io/_uploads/H19MQaz0p.png)
  * I should check what happened. Halt this run.

<br/>

* 2024-3-21
  * Fixed some bugs, and it looks better than before on 7x7. However, I do not think I can get a significant benefit on scoring territory, so I will forbid the scoring-territory rule in the next run. That does not mean I am giving up on scoring territory.
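The Elo differences quoted in the match results above follow from the win rates via the standard logistic Elo model. A minimal sketch of the conversion (the `elo_diff` helper is mine for illustration, not part of Sayuri):

```python
import math

def elo_diff(wins: float, total: float) -> float:
    """Elo difference implied by a match score, from the logistic
    model p = 1 / (1 + 10^(-d/400)) solved for d."""
    p = wins / total
    return 400.0 * math.log10(p / (1.0 - p))

# The 2024-3-5 match: Sayuri won 259 of 400 games (64.75%).
print(round(elo_diff(259, 400)))  # -> 106, matching the reported +106
```

The same formula reproduces the other figures: 229/400 gives about +51 and 190/400 gives about -18.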