# Main Run 2024-3-21

* 2024-3-21
    * Start the training.
    * The version is v0.7.0.
    * learning rate = 0.005
    * batch size = 256
    * The network size is 6bx96c.
    * **Area Scoring** only (T^T). Maybe the next run will support **Territory Scoring** as well.

<br/>

* 2024-3-25
    * Played 475k games.
    * The loss is unstable. Drop the learning rate to 0.0025 (from 0.005).

<br/>

* 2024-3-27
    * Played 590k games.
    * Accumulated around $1.707 \times 10^{8}$ 20bx256c eval queries.
    * The strength is between LZ091 and LZ092. I will update the match games later.
    * Halt the 6bx96c training.
    * The loss is still unstable after playing another 100k games. Maybe 6bx96c cannot understand the 19x19 board well? Here is the policy loss over the whole run. You can see the loss oscillates at around 450,000 steps and again at around 680,000 steps.
    * ![policy-loss-6bx96c](https://hackmd.io/_uploads/SynfJBZkA.png)

<br/>

* 2024-3-28
    * Start the 10bx128c training.
    * learning rate = 0.005
    * batch size = 256
    * The current replay buffer is 150k games.

<br/>

* 2024-3-30
    * Played 695k games (10bx128c played 105k games).
    * Accumulated around $3.061 \times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with the [116](https://zero.sjeng.org/networks/39d465076ed1bdeaf4f85b35c2b569f604daa60076cbee9bbaab359f92a7c1c4.gz) weights. The Elo difference is +53.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 116.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 400 games with Leela Zero):
      ```
      Name               black won   white won   total (win-rate)
      Sayuri v0.7.0      124         106         230 (57.50%)
      Leela Zero 0.17    94          76          170 (42.50%)
      ```

<br/>

* 2024-4-3
    * Played 905k games (10bx128c played 315k games).
    * Drop the learning rate to 0.0025 (from 0.005).

<br/>

* 2024-4-5
    * Played 1040k games (10bx128c played 450k games).
    * Accumulated around $7.509 \times 10^{8}$ 20bx256c eval queries.
    * Halt the 10bx128c training.
    * The strength is between LZ116 and LZ117.

<br/>

* 2024-4-8
    * Start the 15bx192c training.
    * learning rate = 0.0025
    * batch size = 256
    * The current replay buffer is 200k games.

<br/>

* 2024-4-9
    * Played 1095k games (15bx192c played 55k games).
    * Accumulated around $9.905 \times 10^{8}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [127](https://zero.sjeng.org/networks/3f6c8dd85e888bec8b0bcc3006c33954e4be5df8a24660b03fcf3e128fd54338.gz) weights. The Elo difference is -6.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 127.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 596 games with Leela Zero):
      ```
      Name               black won   white won   total (win-rate)
      Sayuri v0.7.0      143         150         293 (49.16%)
      Leela Zero 0.17    148         155         303 (50.84%)
      ```

<br/>

* 2024-4-12
    * Test a new network structure, mixer convolution, a transformer-like architecture without the attention block. The mixer's policy loss and WDL loss seem significantly better than the ResNet's. Each network is 6bx128c and trained on 150k games.
    * ![ploss](https://hackmd.io/_uploads/SJ8A47Ix0.png)
    * ![wloss](https://hackmd.io/_uploads/B1qTEmUlA.png)
    * [shengkelong](https://github.com/shengkelong) said he did get good results with a transformer, but its performance is bad and the relations in the attention map are messy.
    * One possible solution proposed by him is Global Pooling or [Patch Embedding](https://arxiv.org/abs/2010.11929), which could compress the spatial size. But it is high risk. Think about why AlphaGo never uses Global Pooling: it may lose the local information needed by the policy head.
    * Another solution is to remove the attention mechanism, like [MLP-Mixer](https://arxiv.org/abs/2105.01601). My current version uses depthwise convolution instead of token-mixing (a rough sketch follows this entry). You may see related works [here](https://arxiv.org/abs/2203.06717) and [here](https://arxiv.org/abs/2201.09792). Note that we don't use any Patch Embedding.
    * Performance: ResNet (Winograd) > Mixer > ResNet (Im2col).

<br/>
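To make the depthwise-convolution idea above concrete, here is a minimal PyTorch-style sketch of such a block: the spatial "token mixing" of MLP-Mixer is replaced by a depthwise convolution, followed by a 1x1 channel-mixing MLP. The block name, kernel size, and expansion ratio are illustrative assumptions and not Sayuri's actual implementation.

```python
import torch
import torch.nn as nn

class DepthwiseMixerBlock(nn.Module):
    """Illustrative mixer-style block: a depthwise conv does the spatial
    "token mixing", and pointwise (1x1) convs do the channel mixing.
    Channel count, kernel size, and expansion are assumptions, not
    Sayuri's real settings."""
    def __init__(self, channels: int = 128, kernel_size: int = 7, expansion: int = 4):
        super().__init__()
        # Spatial mixing: one k x k filter per channel, no attention, no patch embedding.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size,
                      padding=kernel_size // 2, groups=channels),
            nn.BatchNorm2d(channels),
        )
        # Channel mixing: a per-position MLP implemented with 1x1 convolutions.
        self.channel = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels * expansion, channels, 1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.spatial(x)   # residual over the spatial-mixing step
        x = x + self.channel(x)   # residual over the channel-mixing step
        return x

if __name__ == "__main__":
    block = DepthwiseMixerBlock(channels=128)
    board = torch.randn(1, 128, 19, 19)  # (batch, channels, 19x19 Go board)
    print(block(board).shape)            # torch.Size([1, 128, 19, 19])
```

A depthwise convolution with a fairly large kernel gives each channel a wide spatial view at low cost, which is why it can stand in for token mixing on the full 19x19 board without any patch embedding.

<br/>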
* 2024-4-19
    * Played 1380k games (15bx192c played 285k games). Accumulated around $2.826 \times 10^{9}$ 20bx256c eval queries. The current weights' strength should be between LZ130 and LZ135. We are short of GPUs now, so we only release the weights and do not provide a full match result.
    * Drop the learning rate to 0.002 (from 0.0025) because the last 80k games did not give a significant improvement. I am not really sure, so I only changed it a little.
    * Test the 10b mixer network performance. After playing around 400 games, the mixer's win rate is around 54%. I think we get some benefit from the new structure and will add mixer blocks in the next training step. However, there are still some problems:
        * Although the value head performance of the SWA weights is good, the value head is unstable during the training process.
        * The evals per second are slower than conv3x3 (Winograd).

<br/>

* 2024-5-6
    * Played 1640k games (15bx192c played 600k games).
    * Accumulated around $5.9118 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [151](https://zero.sjeng.org/networks/672342b58e62910f461cce138b8186b349539dbe98807bf202ab91a72b19d0c7.gz) weights. The Elo difference is +28 (see the note on Elo after this entry).
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 151.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 686 games with Leela Zero):
      ```
      Name               black won   white won   total (win-rate)
      Sayuri v0.7.0      175         196         371 (54.08%)
      Leela Zero 0.17    147         168         315 (45.92%)
      ```
    * The strength is about the same as Leela Zero with the [154](https://zero.sjeng.org/networks/7ff174e6ecc146c30ad1d6fe60a7089eacee65dfe4cce051bf24e076abfc0b68.gz) weights. The Elo difference is -2.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 154.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 773 games with Leela Zero):
      ```
      Name               black won   white won   total (win-rate)
      Sayuri v0.7.0      174         210         384 (49.68%)
      Leela Zero 0.17    176         213         389 (50.32%)
      ```
    * Timing of the learning rate changes:
        * Drop the learning rate to 0.0016 (from 0.002) when playing 1435k games. (2024-4-25)
        * Drop the learning rate to 0.00125 (from 0.0016) when playing 1545k games. (2024-5-1)
    * After the longer training, the pure mixer block performs worse than the residual block. Although the mixer block has a larger receptive field, counter-intuitively, it is not good at the life and death of dragons. We tried some hybrid structures and found that residual blocks can help with this, as shown by the ownership maps below.

<div id="sayuri-art" align="center">
    <br/>
    <h3>ownership of mixer-block</h3>
    <img src="https://hackmd.io/_uploads/rJPVVHLfA.png" alt="mixer-view" width="384"/>
    <h3>ownership of residual-block</h3>
    <img src="https://hackmd.io/_uploads/SygxSrIGC.png" alt="residual-view" width="384"/>
    <h3>ownership of hybrid-block</h3>
    <img src="https://hackmd.io/_uploads/HkgAHH8GC.png" alt="hybrid-view" width="384"/>
</div>

<br/>
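For reference, the Elo differences quoted in these match results are consistent with the standard logistic relation between win rate $p$ and rating difference; this is my reading of how the numbers line up, not a statement from the log. A worked check against the +28 result above:

$$\Delta\text{Elo} = 400 \log_{10}\frac{p}{1-p}, \qquad p = 0.5408 \;\Rightarrow\; \Delta\text{Elo} = 400 \log_{10}\frac{0.5408}{0.4592} \approx +28.$$

The same relation reproduces the other quoted values, e.g. 49.68% gives about -2 and 57.50% gives about +53.

<br/>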
* 2024-5-10
    * Played 1710k games (15bx192c played 670k games).
    * Remove the KLD weighting for the optimistic policy.

<br/>

* 2024-5-11
    * Played 1720k games (15bx192c played 680k games).
    * Accumulated around $6.6862 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with the [157](https://zero.sjeng.org/networks/d351f06e446ba10697bfd2977b4be52c3de148032865eaaf9efc9796aea95a0c.gz) weights. The Elo difference is +10.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5 --use-optimistic-policy --random-moves-factor 0.1 --random-moves-temp 0.8```
    * Leela Zero: ```-w 157.gz --noponder -v 400 -g -t 1 -b 1 --timemanage off --randomcnt 30 --randomtemp 0.8```
    * Game result (played 914 games with Leela Zero):
      ```
      Name               black won   white won   total (win-rate)
      Sayuri v0.7.0      213         257         470 (51.42%)
      Leela Zero 0.17    200         244         444 (48.58%)
      ```

<br/>

* 2024-5-17
    * Played 1840k games (15bx192c played 800k games).
    * Accumulated around $7.8433 \times 10^{9}$ 20bx256c eval queries.
    * Now use the SWA weights for the self-play games (a rough sketch follows this entry).
    * Drop the learning rate to 0.001 (from 0.00125) when playing 1825k games. (2024-5-16)

<br/>
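For readers unfamiliar with SWA (stochastic weight averaging), here is a minimal PyTorch sketch of keeping an averaged copy of the weights during training and exporting it for self-play. The network shape, averaging interval, and file name are hypothetical; this is not Sayuri's actual training code.

```python
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

# Hypothetical stand-in for the training network (18 input feature planes assumed).
model = nn.Sequential(nn.Conv2d(18, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 2, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
swa_model = AveragedModel(model)  # keeps a running average of the weights

for step in range(1000):
    x = torch.randn(16, 18, 19, 19)   # dummy batch of 19x19 board features
    loss = model(x).pow(2).mean()     # dummy loss for illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 100 == 0:               # averaging interval is an assumption
        swa_model.update_parameters(model)

# The averaged copy (swa_model.module) is what would be exported and,
# per the 2024-5-17 entry, used to play the self-play games.
torch.save(swa_model.module.state_dict(), "swa_weights.pt")
```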