Run 2023-8-11

  • 2023-8-11
    • Start the training.
    • The version is v0.6.0.
    • learning rate = 0.005
    • batch size = 256
    • The network size is 6bx96c.
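    • Note on notation: a size tag like 6bx96c means 6 residual blocks with 96 channels (compare the "15 blocks with 192 channels" wording in Kobayashi's reply further below). A tiny, purely illustrative parser for the tag:

      import re

      def parse_net_size(tag):
          """Parse a size tag such as '6bx96c' into (residual blocks, channels)."""
          blocks, channels = re.match(r"(\d+)bx(\d+)c", tag).groups()
          return int(blocks), int(channels)

      print(parse_net_size("6bx96c"))    # -> (6, 96)
      print(parse_net_size("20bx256c"))  # -> (20, 256)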

  • 2023-8-14

    • Played 405k games.
    • Accumulated around 1.38×10^8 20bx256c eval queries.
    • The strength is better than Leela Zero with the 071 weights. The Elo difference is +51 (the win-rate-to-Elo conversion is sketched after the result table).
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 071.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          104          125       229  (57.25%)
        Leela Zero 0.17         75           96       171  (42.75%)
      
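    • The Elo differences quoted alongside these result tables can be reproduced from the match win rate with the usual logistic conversion; for example, the +51 above:

      import math

      def elo_diff(win_rate):
          """Approximate Elo difference implied by a match win rate."""
          return 400.0 * math.log10(win_rate / (1.0 - win_rate))

      print(round(elo_diff(229 / 400)))  # 57.25% win rate -> about +51 Elo
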
  • 2023-8-15

    • Played 490k games.
    • Accumulated around 1.91×10^8 20bx256c eval queries. Leela Zero with the 081 weights used around 4.6×10^10, and the KataGo g65 (v1.0.0) weights of the same strength used around 5.62×10^8. Sayuri is therefore about 240 times faster than Leela Zero and 2.94 times faster than KataGo g65.
    • The strength is about the same as Leela Zero with the 081 weights. The Elo difference is +9.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 081.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           93          112       205  (51.25%)
        Leela Zero 0.17         88          107       195  (48.75%)
      

  • 2023-8-16
    • Played 555k games.
    • The 6bx96c run will be halted soon. Try a lower learning rate.
    • learning rate = 0.0025 (from 0.005)

  • 2023-8-16
    • Played 575k games.
    • The strength is about the same as Leela Zero with the 091 weights. The Elo difference is -5.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 091.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          103           94       197  (49.25%)
        Leela Zero 0.17        106           97       203  (50.75%)
      

  • 2023-8-16
    • KataGo only counted actual NN evaluations; queries served by the NN cache or by tree reuse are not included, while Sayuri counts every eval query. I guess KataGo saved at least 30% of its eval queries through the NN cache and tree reuse. After discounting that effect, the current network (at 575k games) is roughly +300 to +500 Elo against the KataGo g104 network trained with the same number of eval queries.

  • 2023-8-17
    • Played 645k games.
    • Accumulated around 3×10^8 20bx256c eval queries.
    • After discounting the NN-cache and reuse-tree effect, it may be around 2.06×10^8 20bx256c eval queries. I will use this value in future testing (a small bookkeeping sketch follows this entry).
    • Halt the 6bx96c training.
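    • A minimal sketch of the discount bookkeeping above, assuming a flat saving fraction; the ~30% figure is the guess from the 2023-8-16 note, and the exact factor behind the 2.06×10^8 value is slightly larger:

      raw_queries = 3.0e8          # Sayuri's raw count (includes cache / reused-tree hits)
      saving = 0.3                 # guessed fraction served by the NN cache and tree reuse
      comparable = raw_queries * (1.0 - saving)
      print(f"{comparable:.3g}")   # ~2.1e8, close to the 2.06e8 figure used above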

  • 2023-8-18
    • Start the 10bx128c training.
    • learning rate = 0.005

  • 2023-8-18
    • Played 660k games (10bx128c played 15k games).
    • Accumulated around 2.27×10^8 20bx256c eval queries.
    • The strength is better than Leela Zero with the 092 weights. The Elo difference is +58.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 092.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          109          124       233  (58.25%)
        Leela Zero 0.17         76           91       167  (41.75%)
      

  • 2023-8-19
    • Played 710k games (10bx128c played 65k games).
    • Accumulated around 2.96×10^8 20bx256c eval queries.
    • The strength is better than Leela Zero with the 095 weights. The Elo difference is +45.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 095.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          109          124       233  (58.25%)
        Leela Zero 0.17         76           91       167  (41.75%)
      

  • 2023-8-20
    • Played 800k games (10bx128c played 155k games).
    • learning rate = 0.0025 (from 0.005)

  • 2023-8-21
    • Played 810k games (10bx128c played 165k games).
    • Accumulated around 4.36×10^8 20bx256c eval queries.
    • The KataGo g65 (v1.0.0) weights of the same strength used around 2.47×10^9 20bx256c eval queries, so Sayuri is about 5.67 times faster than KataGo g65.
    • The strength is better than Leela Zero with the 102 weights. The Elo difference is +63.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 102.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          108          128       236  (59.00%)
        Leela Zero 0.17         72           92       164  (41.00%)
      

  • 2023-8-21
    • Played 845k games (10bx128c played 200k games).
    • Accumulated around 4.83×10^8 20bx256c eval queries.
    • The strength is roughly the same as Leela Zero with the 105 weights. The Elo difference is +35.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 105.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          108          112       220  (55.00%)
        Leela Zero 0.17         88           92       180  (45.00%)
      

  • 2023-8-22
    • Played 930k games (10bx128c played 285k games).
    • Accumulated around 5.98×10^8 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 111 weights. The Elo difference is +5.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 111.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          100          103       203  (50.75%)
        Leela Zero 0.17         97          100       197  (49.25%)
      

  • 2023-8-23
    • Played 945k games (10bx128c played 300k games).
    • Accumulated around 6.18×10^8 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 116 weights. The Elo difference is +3.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 116.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           91          111       202  (50.50%)
        Leela Zero 0.17         89          109       198  (49.50%)
      

  • 2023-8-24
    • Played 1065k games. The 6bx96c played 645k games. The 10bx128c played 420k games.
    • Accumulated around 7.79×10^8 20bx256c eval queries.
    • Halt the 10bx128c training.

  • 2023-9-2
    • I already moved my computer to my lab; my arms nearly broke. Now I can start the 15bx192c training!!!
    • learning rate = 0.0025
    • batch size = 256

  • 2023-9-8
    • Played 1195k games (15bx192c played 130k games).
    • Accumulated around 1.948×10^9 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 117 weights. The Elo difference is +3.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 117.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           85          117       202  (50.50%)
        Leela Zero 0.17         83          115       198  (49.50%)
      

  • 2023-9-11
    • Tested the optimistic policy. It seems the optimistic policy is around +35 Elo (55% win rate) better than the normal policy.

  • 2023-9-12
    • Played 1275k games (15bx192c played 210k games).
    • Accumulated around 2.653×10^9 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 122 weights. The Elo difference is +16.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 122.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           83          126       209  (52.25%)
        Leela Zero 0.17         74          117       191  (47.75%)
      
    • The strength is about the same as at the end of the last run.

  • 2023-9-14
    • Tested the SWA weights. It seems the SWA weights are around +296 Elo (85% win rate) better than the normal weights. I also noticed that KataGo uses SWA weights for match games, whereas I had been using the normal weights. I will use the SWA weights instead of the normal weights from now on.
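    • For reference, a minimal sketch of the SWA idea (a running average of the training weights), assuming the weights are exposed as a dict of numpy arrays; Sayuri's actual implementation may differ:

      def update_swa(swa_weights, weights, n_averaged):
          """Fold the current weights into the running average, per tensor:
          swa <- swa + (w - swa) / (n_averaged + 1). Both args are dicts of
          numpy arrays keyed by parameter name."""
          for name, w in weights.items():
              swa_weights[name] += (w - swa_weights[name]) / (n_averaged + 1)
          return n_averaged + 1

      # Usage: initialize swa_weights as a copy of the current weights, then
      # every k training steps call update_swa(swa_weights, weights, n); the
      # averaged weights are the ones exported for match games.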

  • 2023-9-15
    • Fixed a data race bug in the NN cache. It is a serious bug; does that mean the old run and the current one are all bad results? I think it did not affect the training process much.
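    • Not Sayuri's actual code, but a generic illustration of this class of bug: a shared NN cache read and written by several search threads needs synchronization. A minimal thread-safe sketch:

      import threading

      class NNCache:
          """Toy evaluation cache guarded by a lock."""
          def __init__(self):
              self._lock = threading.Lock()
              self._table = {}

          def lookup(self, key):
              with self._lock:          # reads must be guarded too
                  return self._table.get(key)

          def insert(self, key, value):
              with self._lock:
                  self._table[key] = value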

  • 2023-9-16
    • Played 1375k games (15bx192c played 310k games).
    • Accumulated around 3.532×10^9 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 135 weights. The Elo difference is +14.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 135.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           78          130       208  (52.00%)
        Leela Zero 0.17         70          122       192  (48.00%)
      

  • 2023-9-20
    • Played 1470k games (15bx192c played 405k games).
    • Dropped the learning rate to 0.00125 (from 0.0025). We will not drop the learning rate again for the 15bx192c. A low learning rate performs well for the value predictions and also improves the strength, but a higher learning rate keeps the network plastic. I think the current learning rate is low enough.

  • 2023-9-25
    • Played 1570k games (15bx192c played 505k games).
    • Accumulated around 5.209×10^9 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 143 weights. The Elo difference is -9.
      • Sayuri: -w current_weights -t 1 -b 1 -p 400 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 143.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           88          107       195  (48.75%)
        Leela Zero 0.17         93          112       205  (51.25%)
      

  • 2023-10-4
    • Played 1785k games (15bx192c played 720k games).
    • Accumulated around 7.038×10^9 20bx256c eval queries.
    • The strength is about the same as Leela Zero with the 151 weights. The Elo difference is -12.
      • Sayuri: -w current_weights -t 4 -b 2 -p 1600 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 151.gz --noponder -v 1600 -g -t 4 -b 2
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0           93          100       193  (48.25%)
        Leela Zero 0.17        100          107       207  (51.75%)
      
    • I use 4 threads with batch size 2 instead of 1 thread with batch size 1 because increasing the number of search threads can improve diversity. I find that Sayuri and Leela Zero like to play the same openings after around the LZ130 level. It seems this method makes the Elo across different weights smoother?

  • 2023-10-18 ~ 10-21
    • Played 2035k games (15bx192c played 970k games).
    • Accumulated around 9.324×10^9 20bx256c eval queries.
    • Improved the playout cap randomization: we no longer always play the best move in the fast-search phase (a sketch of the idea follows this entry). It seems to improve diversity and let the network learn 9x9 games more quickly. It is one of the v0.6.1 features.
    • The recent 250k games show no obvious progress against Leela Zero, so we abort the current 15b self-play. I then try dropping the learning rate to 0.0005 (from 0.00125), but we do not use these networks for self-play; we call them the special 15b weights.
    • The special 15b weights are about the same strength as Leela Zero with the 157 weights. The Elo difference is +2.
      • Sayuri: -w special_weights -t 4 -b 2 -p 1600 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 157.gz --noponder -v 1600 -g -t 4 -b 2
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.0          102           99       201  (50.25%)
        Leela Zero 0.17        101           98       199  (49.75%)
      
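    • A sketch of the fast-search idea mentioned above: sample the move from the root visit counts instead of always taking the argmax. This illustrates the concept only; it is not Sayuri's actual selection code.

      import random

      def pick_fast_search_move(moves, visits, temperature=1.0):
          """Sample a move in proportion to (softened) root visit counts,
          used only in the cheap fast-search phase to add diversity."""
          weights = [v ** (1.0 / temperature) for v in visits]
          return random.choices(moves, weights=weights, k=1)[0]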

  • 2023-10-24
    • The Ray author's reply about the current status:

      Hi Hung-Tse Lin

      I think there are several possible reasons why Sayuri doesn't have
      enough growth of her strength.

      1. Strength measurements
         Once the go engine reaches a certain strength, it's hard to
         compare exact strength because games tend to have the same
         progression. The solution for this problem is simple: provide
         an opening book to be used for measurement and start games from
         specified positions of an opening book.

      2. The number of visits for self-play is small.
         The more times the go engine is searched, the stronger it becomes,
         and this strength for self-play games is a factor that determines
         the limit of the accuracy of value network predictions. The
         solution is to increase the number of visits for self-play games.
         But I recommend you to try other solutions, because this solution
         slows down the RL progress considerably.

      3. Learning rate is too small.
         From your RL notes, this is probably not the cause.

      4. Limitation of the neural network.
         The fewer the number of parameters in the neural network, the
         faster it can reach a plateau. FYI, I changed the neural network
         structure from 15 blocks with 192 channels to 20 blocks with 256
         channels when I generated 2,000,000 self-play games.

      In self-play games the difference in ELO rating is usually twice as
      large as when using other go engines. I'd like you to change the way
      of measuring strength and see if there is a difference. Then the next
      try is to change the neural network structure.
      Maybe it's not a bug and the RL process should continue, I think.

      Best regards,
      Yuki Kobayashi
    
    • I prefer updating the network size first, from 15bx192c to 20bx256c. It seems the 20b net can reach the same strength with roughly 2 times the per-sample learning rate: the 20b's per-sample rate is 0.000625 / 128, while the 15b's is 0.0005 / 256, so the 15b's is lower by a factor of about 2.5, yet their strengths are equal (the arithmetic is sketched after this list). Leela Zero and KataGo updated their network sizes when they reached around the LZ150 level, so I think updating the network size now is a reasonable choice.
    • KataGo used (1000, 200) visits for the 15b~20b networks in g104, so the average is 400 visits; Sayuri's is 175. But according to Kobayashi's results, it seems the low visit count is OK.
    • I should write a personal match tool for strength measurement because the normal tools cannot start games from an opening book.
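    • The per-sample learning-rate arithmetic referenced above, assuming per-sample rate = learning rate / batch size:

      lr_20b = 0.000625 / 128   # ~4.9e-6 per sample
      lr_15b = 0.0005 / 256     # ~2.0e-6 per sample
      print(lr_20b / lr_15b)    # -> 2.5, the 15b per-sample rate is 2.5x lower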

  • 2023-10-30
    • Start the 20bx256c training!!!
    • learning rate = 0.000625
    • batch size = 128

  • 2023-11-4
    • Suspend the training for some days. I need to prepare the Othello engine.

  • 2023-11-7
    • Found a bug that was in v0.6.0. It may affect the match-game results and make Sayuri weaker (maybe?). But the bug did not affect the self-play games.

  • 2023-11-18

    • Played 2200k games (20bx256c played 165k games).
    • Accumulated around 1.2907×10^10 20bx256c eval queries.
    • The match tool is coming soon. I may use my personal match tool instead of twogtp next time; it will be fairer.
    • The strength is about the same as Leela Zero with the 173 weights. The Elo difference is +28.
      • Sayuri: -w current_weights -t 4 -b 2 -p 1600 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 173.gz --noponder -v 1600 -g -t 4 -b 2
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.1          106          110       216  (54.00%)
        Leela Zero 0.17         90           94       184  (46.00%)
      
  • 2023-12-13

    • Played 2430k games (20bx256c played 395k games).
    • It looks like there has been no progress over the last 100k~150k games. I decided to reduce the learning rate.
    • learning rate = 0.0003 (from 0.000625)
    • batch size = 128

  • 2023-12-17
    • Played 2480k games (20bx256c played 445k games).
    • Accumulated around 1.898×10^10 20bx256c eval queries.
    • Should I double the visits?
    • The strength is finally close to LZ-ELFv0!
    • I used my match tool for these games. It randomly samples an SGF opening from a directory, which keeps both engines from playing the same openings too often (a sketch of the sampling follows this entry), and GNU Go acts as the judge to help confirm the final score.
    • The strength is about the same as Leela Zero with the 174 weights. The Elo difference is +10.
      • Sayuri: -w current_weights -t 1 -b 1 -p 800 --lcb-reduction 0.02 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 174.gz --noponder -v 800 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name               black won    white won    total (win-rate)
        Sayuri v0.6.1           92          114       206  (51.50%)
        Leela Zero 0.17         92          102       194  (48.50%)
      
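    • A rough sketch of the opening-sampling idea in the match tool above; the directory layout and loading details are assumptions for illustration, not the tool's real interface:

      import os, random

      def sample_opening(sgf_dir):
          """Pick a random SGF opening so consecutive games start from
          different positions."""
          files = [f for f in os.listdir(sgf_dir) if f.endswith(".sgf")]
          return os.path.join(sgf_dir, random.choice(files))

      # opening = sample_opening("openings/")  # load before each match game
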

  • 2023-12-22
    • I finally have some free time!
    • I have some problems with the current training state. The current match shows the latest weights surpass LZ-ELFv0, but I doubt this result. It may be because
      • the ELF weights are brittle when it plays strange openings, or
      • the SGF opening set is unfair.
    • Some more issues: the network is unstable.
      • The win rate is unstable. In some sequences of positions the network is over-confident, but after searching it finds that it is losing the game.
      • The policy stays very sharp even across weights tens of thousands of games apart. The network always prefers to play the same openings. The sharp policy may mean the network cannot distinguish some similar positions well.
    • I think these two issues are stability problems. I tried dropping the learning rate to handle them, but the current status shows the stability is not related to the learning rate. I think the trajectories in the self-play training set are too simple, so the network performs badly in strange positions. Maybe I can try other methods:
      • Improve the diversity: Leela Zero always plays the best move in self-play except for the opening moves, but giving it more randomness improves strength well. The disadvantage is that randomized moves hurt the tree-reuse rate, so we should balance them.
      • Double the visits: in my view, we do not need to care about the broadness of the target distribution. Even a one-hot policy can give good results (e.g., the simple policy in the Gumbel paper). The main point of the target distribution is "can we find the surprising move?" Doubling the visits should improve that quality (a small sketch of the visit-count policy target follows this list).
    • Hiroshi Yamashita suggested that I follow floodgate and its pairing scheme to design a multi-player match system. Hmm, it is written in Ruby, which I have never used.
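    • A minimal sketch of the target-distribution point above: the standard AlphaZero-style policy target is just the normalized root visit counts, so it is naturally sharp; I assume Sayuri's target is built the same way.

      import numpy as np

      def policy_target_from_visits(visit_counts):
          """Normalized root visit counts as the policy training target.
          Extra visits mainly help by giving the search a chance to find
          the 'surprising' move at all, not by smoothing the target."""
          v = np.asarray(visit_counts, dtype=np.float64)
          return v / v.sum()

      print(policy_target_from_visits([350, 40, 10]))  # a sharp target is fine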

  • 2023-12-28
    • Played 2645k games (20bx256c played 610k games).
    • Abort the 2nd run. My disk broke and I lost all the training data.