# Main Run 2023-8-11

* 2023-8-11
    * Start the training.
    * The version is v0.6.0.
    * learning rate = 0.005
    * batch size = 256
    * The network size is 6bx96c.

</br>

* 2023-8-14
    * Played 405k games.
    * Accumulated around $1.38 \times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with [071](https://zero.sjeng.org/networks/257aeeb863dc51bfc598838361225459257377a4b2c9abd3e1ac6cdba1fcc88f.gz) weights. The Elo difference is +51.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 071.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        104         125        229 (57.25%)
        Leela Zero 0.17       75          96        171 (42.75%)
        ```

* 2023-8-15
    * Played 490k games.
    * Accumulated around $1.91 \times 10^{8}$ 20bx256c eval queries. Leela Zero needed around $4.6 \times 10^{10}$ 20bx256c eval queries to reach its 081 weights, and KataGo g65 (v1.0.0) needed around $5.62 \times 10^{8}$ 20bx256c eval queries for weights of the same strength. Sayuri is therefore about 240 times more efficient than Leela Zero and 2.94 times more efficient than KataGo g65. (A rough sketch of this eval-query normalization is given further below.)
    * The strength is about the same as Leela Zero with [081](https://zero.sjeng.org/networks/5e8f3a94dd83e11a6590e893e47eb48780d8682fbd3cb9da92ffe7f8f4853f84.gz) weights. The Elo difference is +9.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 081.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         93         112        205 (51.25%)
        Leela Zero 0.17       88         107        195 (48.75%)
        ```

</br>

* 2023-8-16
    * Played 555k games.
    * The 6bx96c training will be halted soon. Try a lower learning rate.
    * learning rate = 0.0025 (from 0.005)

</br>

* 2023-8-16
    * Played 575k games.
    * The strength is about the same as Leela Zero with [091](https://zero.sjeng.org/networks/b3b00c6d75b4e74946a97b88949307c9eae2355a88f518ebf770c7758f90e357.gz) weights. The Elo difference is -5.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 091.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        103          94        197 (49.25%)
        Leela Zero 0.17      106          97        203 (50.75%)
        ```

</br>

* 2023-8-16
    * KataGo only counts actual NN evaluations; its figure does not include queries served by the NN cache or by tree reuse, while Sayuri counts all eval queries. I guess KataGo saved at least 30% of its eval queries through the NN cache and tree reuse. After discounting that effect, the current network (at 575k games) is roughly +300 to +500 Elo stronger than the KataGo g104 network trained with the same number of eval queries.

</br>

* 2023-8-17
    * Played 645k games.
    * Accumulated around $3 \times 10^{8}$ 20bx256c eval queries.
    * After accounting for the NN cache and tree reuse, it may be around $2.06 \times 10^{8}$ 20bx256c eval queries. I will use this value in future comparisons.
    * Halt the 6bx96c training.

</br>

* 2023-8-18
    * Start the 10bx128c training.
    * learning rate = 0.005

</br>

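The "20bx256c eval queries" figures used throughout this log normalize evaluations from differently sized networks to the cost of a single 20-block, 256-channel evaluation. The sketch below shows one plausible way to do that, assuming the forward-pass cost scales roughly with blocks × channels²; the exact weighting Sayuri uses may differ.

```python
# Rough sketch of eval-query normalization (an assumption, not Sayuri's exact code).
# It treats the cost of one network evaluation as proportional to blocks * channels^2
# and rescales raw eval counts into "20bx256c-equivalent" queries.

REF_BLOCKS, REF_CHANNELS = 20, 256  # the reference 20bx256c network


def relative_cost(blocks: int, channels: int) -> float:
    """Approximate cost of one eval relative to a 20bx256c eval."""
    return (blocks * channels ** 2) / (REF_BLOCKS * REF_CHANNELS ** 2)


def equivalent_queries(raw_queries: float, blocks: int, channels: int) -> float:
    """Convert a raw eval count into 20bx256c-equivalent eval queries."""
    return raw_queries * relative_cost(blocks, channels)


if __name__ == "__main__":
    # Example: 1e9 raw evals from the 6bx96c net count as roughly 4.2e7 equivalents.
    print(equivalent_queries(1e9, blocks=6, channels=96))
```
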
* 2023-8-18
    * Played 660k games (10bx128c played 15k games).
    * Accumulated around $2.27 \times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with [092](https://zero.sjeng.org/networks/ae205d8b957e560c19cc2bc935a8ea76d08dd9f88110ea783d50829bdca45329.gz) weights. The Elo difference is +58.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 092.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        109         124        233 (58.25%)
        Leela Zero 0.17       76          91        167 (41.75%)
        ```

</br>

* 2023-8-19
    * Played 710k games (10bx128c played 65k games).
    * Accumulated around $2.96 \times 10^{8}$ 20bx256c eval queries.
    * The strength is better than Leela Zero with [095](https://zero.sjeng.org/networks/5b90bd32ccc835d8e08d41970d39753e5732413d75e8f4035bebb5f1da69fb87.gz) weights. The Elo difference is +45.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 095.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        109         124        233 (58.25%)
        Leela Zero 0.17       76          91        167 (41.75%)
        ```

</br>

* 2023-8-20
    * Played 800k games (10bx128c played 155k games).
    * learning rate = 0.0025 (from 0.005)

</br>

* 2023-8-21
    * Played 810k games (10bx128c played 165k games).
    * Accumulated around $4.36 \times 10^{8}$ 20bx256c eval queries.
    * KataGo g65 (v1.0.0) needed around $2.47 \times 10^{9}$ 20bx256c eval queries for weights of the same strength, so Sayuri is about 5.67 times more efficient than KataGo g65.
    * The strength is better than Leela Zero with [102](https://zero.sjeng.org/networks/ed26f634a3420ced1cef437f4cddd7e35edafcf793b96d4cbaf2e7da376ccfec.gz) weights. The Elo difference is +63.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 102.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        108         128        236 (59.00%)
        Leela Zero 0.17       72          92        164 (41.00%)
        ```

</br>

* 2023-8-21
    * Played 845k games (10bx128c played 200k games).
    * Accumulated around $4.83 \times 10^{8}$ 20bx256c eval queries.
    * The strength is slightly better than Leela Zero with [105](https://zero.sjeng.org/networks/ed26f634a3420ced1cef437f4cddd7e35edafcf793b96d4cbaf2e7da376ccfec.gz) weights. The Elo difference is +35.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 105.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        108         112        220 (55.00%)
        Leela Zero 0.17       88          92        180 (45.00%)
        ```

</br>

* 2023-8-22
    * Played 930k games (10bx128c played 285k games).
    * Accumulated around $5.98 \times 10^{8}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [111](https://zero.sjeng.org/networks/e9c2c70b9346f18d9cbaa2bc2a6000556723d9b42ef4c8461d4b909e63d38f67.gz) weights. The Elo difference is +5.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 111.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        100         103        203 (50.75%)
        Leela Zero 0.17       97         100        197 (49.25%)
        ```

</br>

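The Elo differences quoted in these match entries are consistent with the standard logistic conversion from win rate $p$ to rating difference:

$$\Delta\mathrm{Elo} = 400 \log_{10}\frac{p}{1-p}$$

For example, the 57.25% win rate from 2023-8-14 gives $400 \log_{10}(0.5725/0.4275) \approx +51$, and the 50.75% result from 2023-8-22 gives about +5, matching the numbers above.
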
* 2023-8-23
    * Played 945k games (10bx128c played 300k games).
    * Accumulated around $6.18 \times 10^{8}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [116](https://zero.sjeng.org/networks/39d465076ed1bdeaf4f85b35c2b569f604daa60076cbee9bbaab359f92a7c1c4.gz) weights. The Elo difference is +3.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 116.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         91         111        202 (50.50%)
        Leela Zero 0.17       89         109        198 (49.50%)
        ```

</br>

* 2023-8-24
    * Played 1065k games. The 6bx96c played 645k games and the 10bx128c played 420k games.
    * Accumulated around $7.79 \times 10^{8}$ 20bx256c eval queries.
    * Halt the 10bx128c training.

</br>

* 2023-9-2
    * I have moved my computer to my lab. My arms are worn out... Now I can start the 15bx192c training!!!
    * learning rate = 0.0025
    * batch size = 256

</br>

* 2023-9-8
    * Played 1195k games (15bx192c played 130k games).
    * Accumulated around $1.948 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [117](https://zero.sjeng.org/networks/ba748d402af1bdd101212cbd217025dee866b3fcc996bd16d1a3134d5591a501.gz) weights. The Elo difference is +3.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 117.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         85         117        202 (50.50%)
        Leela Zero 0.17       83         115        198 (49.50%)
        ```

</br>

* 2023-9-11
    * Tested the optimistic policy. It seems the optimistic policy is about +35 Elo (55% win rate) better than the normal policy.

</br>

* 2023-9-12
    * Played 1275k games (15bx192c played 210k games).
    * Accumulated around $2.653 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [122](https://zero.sjeng.org/networks/c0cb605b3dfb366eeec805841495379efc7d206bcd04e95ef4566614100074c5.gz) weights. The Elo difference is +16.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 122.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         83         126        209 (52.25%)
        Leela Zero 0.17       74         117        191 (47.75%)
        ```

    * The strength is about the same as at the end of the last run.

</br>

* 2023-9-14
    * Tested the SWA weights. It seems the SWA weights are around +296 Elo (85% win rate) better than the normal weights. I also noticed that KataGo uses SWA weights for match games, whereas I had been using the normal weights. I will use the SWA weights instead of the normal weights from now on. (A minimal averaging sketch is given below, after the 2023-9-16 entry.)

</br>

* 2023-9-15
    * Fixed a data race bug in the NN cache. ~~A serious bug... does that mean the old run and the current one are all bad results?~~ I think it did not affect the training process much.

</br>

* 2023-9-16
    * Played 1375k games (15bx192c played 310k games).
    * Accumulated around $3.532 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [135](https://zero.sjeng.org/networks/05b723d0c9e1c10abf18e9ba3e55c378de42e364cfe8a506d425f2af8ca204e8.gz) weights. The Elo difference is +14.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 135.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         78         130        208 (52.00%)
        Leela Zero 0.17       70         122        192 (48.00%)
        ```

</br>

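For context on the SWA result in the 2023-9-14 entry: stochastic weight averaging keeps an average of the parameters from several recent checkpoints and uses the averaged network for play. The sketch below is only a minimal illustration of that idea (uniform averaging of checkpoint tensors); Sayuri's actual averaging window, decay, and batch-norm handling may differ.

```python
# Minimal sketch of stochastic weight averaging (SWA) over checkpoints.
# Each checkpoint is modeled as a dict mapping parameter names to NumPy arrays.
# This is illustrative only; the real training code may use a moving average
# with decay and must also re-estimate batch-norm statistics afterwards.
import numpy as np


def average_checkpoints(checkpoints):
    """Return the element-wise mean of a list of parameter dicts."""
    averaged = {}
    for name in checkpoints[0]:
        stacked = np.stack([ckpt[name] for ckpt in checkpoints])
        averaged[name] = stacked.mean(axis=0)
    return averaged


if __name__ == "__main__":
    # Toy example with two tiny "checkpoints".
    ckpt_a = {"conv1": np.ones((3, 3)), "bias1": np.zeros(3)}
    ckpt_b = {"conv1": np.full((3, 3), 3.0), "bias1": np.ones(3)}
    swa = average_checkpoints([ckpt_a, ckpt_b])
    print(swa["conv1"])  # every entry is 2.0
```
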
* 2023-9-20
    * Played 1470k games (15bx192c played 405k games).
    * Dropped the learning rate to ```0.00125``` (from ```0.0025```). We will not drop the learning rate again for the 15bx192c. A low learning rate works well for the value predictions and also improves the strength; however, a higher learning rate keeps the network plastic. I think the current learning rate is low enough.

</br>

* 2023-9-25
    * Played 1570k games (15bx192c played 505k games).
    * Accumulated around $5.209 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [143](https://zero.sjeng.org/networks/057adf2c59b6f884ca3c21b2ea67dd7b69e157e616bc9bed140b11b4e0ab8d76.gz) weights. The Elo difference is -9.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 143.gz --noponder -v 400 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         88         107        195 (48.75%)
        Leela Zero 0.17       93         112        205 (51.25%)
        ```

</br>

* 2023-10-4
    * Played 1785k games (15bx192c played 720k games).
    * Accumulated around $7.038 \times 10^{9}$ 20bx256c eval queries.
    * The strength is about the same as Leela Zero with [151](https://zero.sjeng.org/networks/672342b58e62910f461cce138b8186b349539dbe98807bf202ab91a72b19d0c7.gz) weights. The Elo difference is -12.
    * Sayuri: ```-w current_weights -t 4 -b 2 -p 1600 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 151.gz --noponder -v 1600 -g -t 4 -b 2```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0         93         100        193 (48.25%)
        Leela Zero 0.17      100         107        207 (51.75%)
        ```

    * I use 4 threads with batch size 2 instead of 1 thread with batch size 1 because more search threads improve the diversity. I find that Sayuri and Leela Zero tend to play the same openings past the LZ130 level. This method also seems to make the Elo curve across different weights smoother.

</br>

* 2023-10-18 ~ 10-21
    * Played 2035k games (15bx192c played 970k games).
    * Accumulated around $9.324 \times 10^{9}$ 20bx256c eval queries.
    * Improved the playout cap randomization: we no longer always play the best move in the fast-search phase. This seems to improve diversity and lets the network learn 9x9 games more quickly. It is one of the v0.6.1 features.
    * The most recent 250k games show no obvious progress against Leela Zero, so we abort the current 15b self-play. I then try dropping the learning rate to ```0.0005``` (from ```0.00125```); however, these networks are not used for self-play. We call them the special 15b weights.
    * The special 15b weights are about the same strength as Leela Zero with [157](https://zero.sjeng.org/networks/d351f06e446ba10697bfd2977b4be52c3de148032865eaaf9efc9796aea95a0c.gz) weights. The Elo difference is +2.
    * Sayuri: ```-w special_weights -t 4 -b 2 -p 1600 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 157.gz --noponder -v 1600 -g -t 4 -b 2```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.0        102          99        201 (50.25%)
        Leela Zero 0.17      101          98        199 (49.75%)
        ```

</br>

* 2023-10-24
    * The Ray author's reply about the current status:

        ```
        Hi Hung-Tse Lin

        I think there are several possible reasons why Sayuri doesn't have enough
        growth of her strength.

        1. Strength measurements
        Once the go engine reaches a certain strength, it's hard to compare exact
        strength because games tend to have the same progression. The solution for
        this problem is simple: provide an opening book to be used for measurement
        and start games from specified positions of an opening book.

        2. The number of visits for self-play is small.
        The more times the go engine is searched, the stronger it becomes, and this
        strength for self-play games is a factor that determines the limit of the
        accuracy of value network predictions. The solution is to increase the number
        of visits for self-play games. But I recommend you to try other solutions,
        because this solution slows down the RL progress considerably.

        3. Learning rate is too small.
        From your RL notes, this is probably not the cause.

        4. Limitation of the neural network.
        The fewer the number of parameters in the neural network, the faster it can
        reach a plateau. FYI, I changed the neural network structure from 15 blocks
        with 192 channels to 20 blocks with 256 channels when I generated 2,000,000
        self-play games.

        In self-play games the difference in ELO rating is usually twice as large
        then using other go engines. I'd like you to change a way of measuring
        strength and see if there is a difference. Then next try is to change the
        neural network structure. Maybe it's not a bug and the RL process should
        continue I think.

        Best regards,
        Yuki Kobayashi
        ```

    * I prefer updating the network size first, from 15bx192c to 20bx256c. It seems the 20b net can reach the same strength with roughly twice the per-sample learning rate: the 20b setting is batch size 128 with learning rate 0.000625 (about $4.9 \times 10^{-6}$ per sample), while the 15b setting is batch size 256 with learning rate 0.0005 (about $2.0 \times 10^{-6}$ per sample), so the 15b rate is lower by a factor of roughly 2.5, yet their strengths are equal. Leela Zero and KataGo updated their network sizes when they reached around the LZ150 level, so I think updating the network size now is a reasonable choice.
    * KataGo used (1000, 200) visits for its 15b~20b networks in g104, an average of about 400 visits; Sayuri uses 175. But according to Kobayashi's results, the low visit count seems to be fine.
    * I should write a personal match tool for strength measurement, because the usual tools cannot start games from an opening book.

</br>

* 2023-10-30
    * Start the 20bx256c training!!!
    * learning rate = 0.000625
    * batch size = 128

</br>

* 2023-11-4
    * Suspend the training for a few days. I need to prepare the Othello engine.

</br>

* 2023-11-7
    * Found a bug that dates back to v0.6.0. It may affect the match game results and make Sayuri weaker (maybe?). But the bug did not affect the self-play games.

</br>

* 2023-11-18
    * Played 2200k games (20bx256c played 165k games).
    * Accumulated around $1.2907 \times 10^{10}$ 20bx256c eval queries.
    * The match tool is coming soon. I may use my personal match tool instead of twogtp next time. It will be fairer.
    * The strength is about the same as Leela Zero with [173](https://zero.sjeng.org/networks/33986b7f9456660c0877b1fc9b310fc2d4e9ba6aa9cee5e5d242bd7b2fb1b166.gz) weights. The Elo difference is +28.
    * Sayuri: ```-w current_weights -t 4 -b 2 -p 1600 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 173.gz --noponder -v 1600 -g -t 4 -b 2```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.1        106         110        216 (54.00%)
        Leela Zero 0.17       90          94        184 (46.00%)
        ```

</br>

* 2023-12-13
    * Played 2430k games (20bx256c played 395k games).
    * There looks to be no progress over the last 100k~150k games, so I decided to reduce the learning rate.
    * learning rate = 0.0003 (from 0.000625)
    * batch size = 128

</br>

* 2023-12-17
    * Played 2480k games (20bx256c played 445k games).
    * Accumulated around $1.898 \times 10^{10}$ 20bx256c eval queries.
    * Should I double the visits?
    * The strength is finally close to LZ-ELFv0!
    * I use my [match tool](https://github.com/CGLemon/gtp-tool) for the match games. It randomly samples an opening SGF from a directory, which keeps both engines from playing the same openings too often, and a GNU Go judge helps confirm the final score. (A small illustrative sketch of the opening sampling follows this entry.)
    * The strength is about the same as Leela Zero with [174](https://zero.sjeng.org/networks/c9d70c413e589d338743bfa83783e23f378dc0b9aa98940a2fbd1d852dab8781.gz) weights. The Elo difference is +10.
    * Sayuri: ```-w current_weights -t 1 -b 1 -p 800 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5```
    * Leela Zero: ```-w 174.gz --noponder -v 800 -g -t 1 -b 1```
    * Game result (played 400 games with Leela Zero):

        ```
        Name              black won   white won   total (win-rate)
        Sayuri v0.6.1         92         114        206 (51.50%)
        Leela Zero 0.17       92         102        194 (48.50%)
        ```

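To illustrate the opening-sampling idea mentioned in the 2023-12-17 entry: each game picks a random SGF from a directory of openings so the paired engines do not keep replaying the same opening. The sketch below is purely hypothetical and is not the actual gtp-tool code; the directory name and the GTP plumbing are placeholders.

```python
# Illustrative sketch only: sample a random opening SGF for each match game.
# The directory layout and the way moves are then fed to both engines over GTP
# are placeholders, not the actual gtp-tool implementation.
import os
import random


def sample_opening(sgf_dir: str) -> str:
    """Return the path of a randomly chosen SGF file in sgf_dir."""
    sgf_files = [f for f in os.listdir(sgf_dir) if f.endswith(".sgf")]
    if not sgf_files:
        raise FileNotFoundError(f"no SGF openings found in {sgf_dir}")
    return os.path.join(sgf_dir, random.choice(sgf_files))


if __name__ == "__main__":
    # Hypothetical usage: replay the sampled opening's moves to both engines,
    # then let them continue the game from that position.
    print(sample_opening("./openings"))
```
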
</br>

* 2023-12-22
    * I finally have some free time!
    * I have some concerns about the current training state. The latest match shows the newest weights beating LZ-ELFv0, but I doubt this result. It may be because...
        * The ELF weights are brittle when playing unusual openings.
        * The SGF opening set is unfair.
    * Some more issues: the network is unstable.
        * The win rate is unstable. In some sequences of positions the network is overconfident, but after searching it finds that it is actually losing the game.
        * The policy stays very sharp even across weights tens of thousands of games apart. The network always prefers to play the same openings. The sharp policy may mean the network cannot distinguish some similar positions well.
    * I think these two issues are stability problems. I tried dropping the learning rate to handle them, but the current status shows that stability is not related to the learning rate. I think the trajectories in the self-play training set are too simple, so the network performs badly in unusual positions. Maybe I can try other methods.
        * Improve the diversity: Leela Zero always plays the best move in self-play except for the opening moves, yet giving it more randomness improves strength noticeably. The disadvantage is that randomized moves hurt the tree-reuse rate, so we should balance the two.
        * Double the visits: in my view, we do not need to care about the broadness of the target distribution; even a one-hot policy can give good results (e.g. the simple policy in the Gumbel paper). The main point of the target distribution is whether we can find the surprising move. Doubling the visits should improve that.
    * Hiroshi Yamashita suggested that I follow ```floodgate``` and ```pairing``` to design a multi-player match system. Mm... it is written in Ruby, which I have never used.

</br>

* 2023-12-28
    * Played 2645k games (20bx256c played 610k games).
    * Abort the 2nd run. My disk broke and I lost all the training data.