Run 2023-1-6

  • 2023-1-6:
    • Played 0 games.
    • Version is v0.4.0.
    • Network size is 6 blocks with 128 channels.
    • learning rate = 0.005
    • batch size = 256
    • cpuct-init = 1.25
    • utility-factor (all) = 0
    • playouts = 100
    • reduce-playouts = 80 (75%)
    • gumbel-playouts = 50
    • resign-playouts = 75 (winrate below 2%)
    • Train on the last 500k games.

  • 2023-1-14:
    • Played 130k games.
    • The strength is the same as GNU Go 3.8 on 19x19.

  • 2023-1-18:
    • Played 180k games.
    • The strength is better than GNU Go 3.8 on 19x19.
      • Sayuri: -w zero-180k.bin.txt -t 1 -b 1 -p 400 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • GNU Go: --mode gtp --level 10 --chinese-rules
      • Game result (played 30 games with GNU Go):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0        11           11           22 (73.3%)
        GNU Go 3.8            4            4            8 (26.7%)
      

  • 2023-1-22:
    • Played 250k games.
    • After playing against her, I confirmed the strength is at least 3 or 4 dan in the opening and middle game, but she is very bad at yose and life-and-death. I guess this will be fixed after 500k games: there are too many random moves in the first 100k games, which hurt the results, and those random positions will drop out of the replay buffer after 500k games.

  • 2023-1-24:
    • Played 280k games.
    • The strength is better than GNU Go 3.8 on 19x19.
      • Sayuri: -w zero-280k.bin.txt -t 1 -b 1 -p 400 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • GNU Go: --mode gtp --level 10 --chinese-rules
      • Game result (played 30 games with GNU Go):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0        13           14           27 (90.0%)
        GNU Go 3.8            1            2            3 (10.0%)
      

  • 2023-1-27:
    • Played 340k games.
    • The strength is equal to Pachi 12.60 on 19x19.
      • Sayuri: -w zero-340k.bin.txt -t 1 -b 1 -p 400 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Pachi: threads=1 -t =6400
      • Game result (played 30 games with Pachi):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0         9            8           17 (56.67%)
        Pachi 12.60           7            6           13 (43.33%)
      

  • 2023-1-30:
    • Played 390k games.
    • Drop the learning rate from 0.005 to 0.0025.

  • 2023-2-3:
    • Played 470k games.
    • The strength is equal to Pachi 12.60 on 19x19.
      • Sayuri: -w zero-470k.bin.txt -t 1 -b 1 -p 400 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Pachi: threads=1 -t =6400
      • Game result (played 30 games with Pachi):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0        10           10           20 (66.67%)
        Pachi 12.60           5            5           10 (33.33%)
      

  • 2023-2-4:
    • Played 480k games.
    • utility-factor (all) = 0.05 (from 0)

  • 2023-2-8:
    • Played 540k games.
    • The strength is better than Pachi 12.60 on 19x19.
      • Sayuri: -w zero-540k.bin.txt -t 1 -b 1 -p 400 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Pachi: threads=1 -t =6400
      • Game result (played 30 games with Pachi):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0        11           12           23 (76.67%)
        Pachi 12.60           4            3            7 (23.33%)
      

  • 2023-2-9:
    • Played 560k games.
    • I played against her; Sayuri took White. Here are the SGF and the engine settings.
      • Sayuri: -w zero-560k.bin.txt -t 1 -b 1 -p 3200 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
    ​​​​(;GM[1]FF[4]CA[UTF-8]AP[Sabaki:0.52.0]KM[7.5]SZ[19]DT[2023-02-09]PW[Sayuri-zero];B[dd];W[pd];B[dp];W[pp];B[qq];W[pq];B[qp];W[qo];B[ro];W[qn];B[rn];W[qm];B[rm];W[ql];B[jq];W[cf];B[fc];W[ci];B[ce];W[cl];B[nc];W[ld];B[qc];W[pc];B[qd];W[pe];B[pb];W[qr];B[rr];W[ob];B[qb];W[oc];B[qf];W[fq];B[cn];W[cr];B[cq];W[dr];B[hq];W[fo];B[dl];W[dk];B[dm];W[ek];B[fm];W[bq];B[bp];W[br];B[bf];W[lq];B[lp];W[mp];B[kp];W[mo];B[ic];W[ho];B[fr];W[eq];B[gq];W[en];B[fl];W[fk];B[hl];W[im];B[gk];W[il];B[gi];W[ik];B[lr];W[mr];B[kr];W[ii];B[fj];W[bm];B[lb];W[pf];B[qg];W[pg];B[qh];W[cg];B[bg];W[bh];B[dh];W[ch];B[fg];W[de];B[cd];W[kc];B[kb];W[jc];B[jb];W[id];B[hc];W[hg];B[ee];W[oi];B[qj];W[bn];B[kn];W[lm];B[km];W[kl];B[ln];W[mn];B[ie];W[jf];B[jd];W[je];B[hd];W[pj];B[qk];W[pk];B[rl];W[co];B[kd];W[ke];B[lc];W[md];B[pr];W[or];B[qs];W[ip];B[iq];W[mc];B[mb];W[nb];B[df];W[dg];B[eg];W[ei];B[ej];W[dj];B[fi];W[if];B[ge];W[er];B[gs];W[qe];B[re];W[rd];B[rc];W[hj];B[gj];W[rf];B[sd];W[eh];B[fh];W[di];B[dn];W[do];B[])
    

  • 2023-2-9:
    • Played 570k games.
    • Merged the batch normalization layers into the preceding convolutional layers. May be a little bit faster (0.4.1 feature). See the folding sketch below.
    • Use the Gumbel random opening instead of the old one (0.4.1 feature).
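
    The merge above is the standard batch-norm folding identity: scale each output filter by gamma / sqrt(var + eps) and adjust the bias to absorb the running mean and beta. A minimal NumPy sketch of that identity (an illustration, not Sayuri's actual code), assuming an (out, in, kh, kw) weight layout:

    ```python
    import numpy as np

    def fold_batchnorm(conv_w, conv_b, gamma, beta, mean, var, eps=1e-5):
        # Fold y = BN(conv(x)) into a single convolution with new weights.
        scale = gamma / np.sqrt(var + eps)             # per-output-channel factor
        folded_w = conv_w * scale[:, None, None, None] # scale each output filter
        folded_b = (conv_b - mean) * scale + beta      # absorb mean/beta in bias
        return folded_w, folded_b
    ```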

  • 2023-2-14:
    • Played 660k games.
    • The strength is better than Pachi 12.60 on 19x19.
      • Sayuri: -w zero-660k.bin.txt -t 1 -b 1 -p 400 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Pachi: threads=1 -t =6400
      • Game result (played 30 games with Pachi):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0        13           13           26 (86.67%)
        Pachi 12.60           2            2            4 (13.33%)
      

  • 2023-2-18:
    • Played 710k games.
    • The strength is the same as Leela Zero with the 062 weights.
      • Sayuri: -w zero-710k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 062.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 40 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.0        12           11           23 (57.5%)
        Leela Zero 0.17       9            8           17 (42.5%)
      
    • Consider the total playout count across the self-play games. Sayuri's 710k weights were produced with about 1.8105 × 10^10 playouts, while Leela Zero's 062 weights had already used about 2.310304 × 10^12 playouts. Counting only total playouts, Sayuri's learning is at least 127 times as fast as Leela Zero's; see the check below.
    • (Something is wrong here; it may only be 40 ~ 50 times as fast as Leela Zero based on playouts.)
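
    The 127x figure is just the ratio of the two totals quoted above:

    ```python
    sayuri_playouts = 1.8105e10    # Sayuri, 710k games
    lz_playouts     = 2.310304e12  # Leela Zero, 062 weights
    print(lz_playouts / sayuri_playouts)  # ~127.6
    ```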

  • 2023-2-22:
    • Played 780k games.
    • Use FP16.
    • Updated the engine to 0.4.1.
    • Drop the learning rate from 0.0025 to 0.001.
    • I noticed that the replay buffer is too big; maybe the buffer size should be 250k games. A small buffer can improve training in the beginning, but I will keep the buffer size unchanged for this run.

  • 2023-2-22:
    • Played 850k games.
    • Version is 0.4.1.
    • I found a strange bug: Sayuri could not judge life and death correctly in some cases. For example:
      (image unavailable)

  • (continue)
    • The left black dragon is inside White's pass-alive area. Sayuri thinks the lower part is alive and the upper part is dead; in other words, she treats the two parts as disconnected, with the lower part alive because it contains a real eye. There is another interesting result: Sayuri thinks the dragon is completely dead if it is not inside White's pass-alive area. See the image below.
      (image unavailable)

  • (continue)
    • So Sayuri understands life and death in general, but she may make mistakes when a group sits inside a pass-alive area. The current version only computes the combined pass-alive area (black and white together), so she may fail to confirm the owner of a pass-alive area when the dragon is too long. A simple fix is to change the feature encoder so that Black's pass-alive area and White's pass-alive area become separate input planes, as in the sketch below.
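
    A minimal sketch of that encoder change, with hypothetical inputs (binary planes; the coordinate sets are placeholders, not the real encoder API):

    ```python
    import numpy as np

    def pass_alive_planes(board_size, black_pass_alive, white_pass_alive):
        # Encode Black's and White's pass-alive areas as two separate input
        # planes; the old encoder merged both colors into one plane, so the
        # network could not tell whose pass-alive area a dragon sits in.
        planes = np.zeros((2, board_size, board_size), dtype=np.float32)
        for x, y in black_pass_alive:
            planes[0, y, x] = 1.0
        for x, y in white_pass_alive:
            planes[1, y, x] = 1.0
        return planes
    ```
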
  • (continue, updated on 2023-4-7)
    • Finally, Sayuri can understand the life and death of this position with the 10x128 weights. Maybe the small network (6x128) sees the black dragon as two disconnected blocks, while the larger network sees it as one block.
      (image unavailable)


  • 2023-2-27:
    • Played 860k games.
    • The strength is better than Leela Zero with the 066 weights.
      • Sayuri: -w zero-860k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 066.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 40 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        13           17           30 (75%)
        Leela Zero 0.17       3            7           10 (25%)
      

  • 2023-3-6:
    • Played 990k games.
    • Drop the learning rate from 0.001 to 0.0003.
    • The strength is the same as Leela Zero with the 071 weights.
      • Sayuri: -w zero-990k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --lcb-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 071.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 40 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        12           13           25 (62.5%)
        Leela Zero 0.17       7            8           15 (37.5%)
      

  • 2023-3-7:
    • Played 1010k games.
    • Add support for more komi options (0.5.0 feature); a sketch of how these options might combine follows this list.
    • komi-stddev = 2.5
    • komi-big-stddev-prob = 0.06
    • komi-big-stddev = 12
    • handicap-fair-komi-prob = 0.5
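
    My reading of how these options might combine, as a hypothetical sketch (the option names match the list above, but the sampling logic is an assumption, not verified against Sayuri's source):

    ```python
    import random

    def sample_selfplay_komi(base_komi=7.5, komi_stddev=2.5,
                             komi_big_stddev_prob=0.06, komi_big_stddev=12.0):
        # Usually jitter the komi with a small Gaussian; occasionally (6% of
        # games) use a much wider one so the value head also sees extreme komi.
        if random.random() < komi_big_stddev_prob:
            stddev = komi_big_stddev
        else:
            stddev = komi_stddev
        komi = random.gauss(base_komi, stddev)
        return round(komi * 2) / 2.0  # snap to a half-integer komi
    ```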

  • 2023-3-8:
    • Played 1030k games.
    • Remove some of the utility options.
    • score-utility-factor = 0.1 (from 0.05)

  • 2023-3-10:
    • Played 1080k games.
    • The strength is the same as Leela Zero with the 076 weights.
      • Sayuri: -w zero-1080k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 076.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 40 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        13           11           24 (60%)
        Leela Zero 0.17       9            7           16 (40%)
      
      

  • 2023-3-13:
    • Played 1120k games.
    • The strength is the same as Leela Zero with the 080 weights.
      • Sayuri: -w zero-1120k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 080.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 100 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        22           27           49 (49%)
        Leela Zero 0.17      23           28           51 (51%)
      
    • Sayuri used 2.142 × 10^9 evals (20x256 size). Leela Zero with LZ-80 used 4.6 × 10^10 evals (20x256 size) [ref]. Sayuri is about 21.48 times as fast as Leela Zero, but around 4 times slower than KataGo v1.0.

  • 2023-3-14:
    • Played 1140k games.
    • I stopped the training; will restart it in a few days.

  • 2023-3-29:
    • I am back. Promoted the network to 10x128 (from 6x128).
    • Current learning rate is 0.001.
    • Add support for bottleneck networks (0.5.0 feature), but it is not effective at the current network size; I will try it again in the future. A generic sketch of a bottleneck block follows.
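
    For illustration, a generic ResNet-style bottleneck residual block in PyTorch (not Sayuri's exact design): the 1x1 convolutions shrink and re-expand the channels so the 3x3 work happens in a narrow space, which mainly pays off when the trunk is wide.

    ```python
    import torch.nn as nn

    class BottleneckBlock(nn.Module):
        """Generic bottleneck residual block: 1x1 reduce, 3x3, 1x1 expand."""
        def __init__(self, channels, bottleneck_channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(channels, bottleneck_channels, 1, bias=False),
                nn.BatchNorm2d(bottleneck_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(bottleneck_channels, bottleneck_channels, 3,
                          padding=1, bias=False),
                nn.BatchNorm2d(bottleneck_channels),
                nn.ReLU(inplace=True),
                nn.Conv2d(bottleneck_channels, channels, 1, bias=False),
                nn.BatchNorm2d(channels),
            )
            self.relu = nn.ReLU(inplace=True)

        def forward(self, x):
            # Residual connection around the bottleneck body.
            return self.relu(self.body(x) + x)
    ```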

  • 2023-3-31:
    • Played 1245k games (10x128 played 105k games).
    • Drop the learning rate from 0.001 to 0.0003.
    • The strength is still worse than Leela Zero with the 092 weights.

  • 2023-4-1:
    • Played 1315k games (10x128 played 175k games).
    • The strength is the same as Leela Zero with the 092 weights.
      • Sayuri: -w zero-1315k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 092.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 40 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        10            8           18 (45%)
        Leela Zero 0.17      12           10           22 (55%)
      
      

  • 2023-4-5:
    • Played 1525k games (10x128 played 385k games).
    • The strength is the same as Leela Zero with the 095 weights.
      • Sayuri: -w zero-1525k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 095.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 40 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        11            9           20 (50%)
        Leela Zero 0.17      11            9           20 (50%)
      
      

  • 2023-4-7:
    • Played 1640k games (10x128 played 500k games).
    • Drop the learning rate to 1e-4 (from 3e-4).
    • Set the down-sample rate to 32 (from 16).
    • Looks like the progress has slowed down. Why?
    • Actually, Sayuri's policy head is worse than Leela Zero's in most positions, but she has fewer blind spots than Leela Zero. Why?

  • 2023-4-8
    • Played 1710k games (10x128 played 570k games).
    • The strength is the same as Leela Zero with the 098 weights.
      • Sayuri: -w zero-1710k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 098.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 80 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        16           21           37 (46.25%)
        Leela Zero 0.17      19           24           43 (53.75%)
      
      

  • 2023-4-11
    • Played 1850k games (10x128 played 710k games).
    • The strength is the same as Leela Zero with the 102 weights.
      • Sayuri: -w zero-1850k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 102.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 100 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        26           28           54 (54.00%)
        Leela Zero 0.17      22           24           46 (46.00%)
      
    • Sayuri has used 2.25 × 10^9 evals (20x256 size) since 1140k games. Leela Zero with LZ-102 used around 3.9 × 10^10 evals (20x256 size) since LZ-80 [ref]. KataGo g65 used around 1.904 × 10^9 evals (20x256 size) since its last 6x96 weights [ref]. I guess Sayuri is at almost the same level as KataGo v1.0 in this stretch; the ratios are checked below.
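
    Checking the ratios behind that guess, with the eval counts quoted above:

    ```python
    sayuri_evals = 2.25e9    # Sayuri since 1140k games
    lz_evals     = 3.9e10    # Leela Zero since LZ-80
    katago_evals = 1.904e9   # KataGo g65 since its last 6x96 weights
    print(lz_evals / sayuri_evals)      # ~17.3x fewer evals than Leela Zero
    print(sayuri_evals / katago_evals)  # ~1.18x more evals than KataGo g65
    ```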

  • 2023-4-13
    • Played 1975k games (10x128 played 835k games).
    • The strength is the same as Leela Zero with the 105 weights.
      • Sayuri: -w zero-1975k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 111.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 100 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        24           25           49 (49.00%)
        Leela Zero 0.17      25           26           51 (51.00%)
      
  • 2023-4-16
    • Played 2145k games (10x128 played 1005k games).
    • The strength is the same as Leela Zero with the 111 weights.
      • Sayuri: -w zero-2145k.bin.txt -t 1 -b 1 -p 100 --friendly-pass --reuse-tree --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 111.gz --noponder -p 100 -g -t 1 -b 1
      • Game result (played 100 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        18           30           48 (48.00%)
        Leela Zero 0.17      20           32           52 (52.00%)
      
    • The value head sometimes performs badly; I found that KataGo has the same problem. Because the playout count is so low (below 100), the quality of the self-play games is also low, which hurts the value head. I will use more playouts in the next step.
    • Stopped the self-play; waiting to promote the network (to 15x192).

  • 2023-4-22
    • Start the next stage of training.
    • The 6x128 network played 1140k games and the 10x128 network played 1005k games, 2145k games in total.
    • Network size is 15 blocks with 192 channels.
    • The learning rate is 1e-4.
    • The batch size is 256.
    • Set the playouts to 400 (from 100).
    • Set the reduce-playouts to 150 (from 85).
    • Set the gumbel-playouts to 85 (from 50).
    • Set the resign-playouts to 125 (from 75).
    • Set the root-policy-temp to 1.2 (from 1.0).

  • 2023-5-6
    • Played 2355k games (15x192 played 210k games).
    • Elo has gained 50~100 since the first 15x192 network. I think the progress is too slow, so I will try a lower root softmax temperature.
    • Drop the root-policy-temp to 1.1 (from 1.2).

  • 2023-6-4
    • Played 2645k games (15x192 played 500k games).
    • The lower root softmax temperature makes no big difference.
    • The strength is close to the weights used at UEC14 (Elo diff -20). I plan to release v0.5.0 once the strength is better than that.
    • The strength is weaker than Leela Zero with the 117 weights; the Elo difference is -83.
      • Sayuri: -w zero-2645k.bin.txt -t 1 -b 1 -p 100 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 117.gz --noponder -v 100 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.4.1        69           84          153 (38.25%)
        Leela Zero 0.17     121          116          237 (61.75%)
      

  • 2023-6-21
    • Played 2910k games (15x192 played 765k games).
    • Update to version 0.5.0.
    • The current weights are weaker than I expected, so I want to try some other approaches. The original MCTX library rescales the Q values by default, mapping the lowest Q value to 0 and the highest to 1, which may make the target policy too sharp. I disabled the rescaling; hope this will be useful. A sketch of that rescaling follows.
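
    For reference, the rescaling being disabled is a per-node min-max normalization of the children's Q values, roughly like this paraphrase (not MCTX's exact code):

    ```python
    import numpy as np

    def rescale_q(q_values, eps=1e-8):
        # Map the lowest child Q to 0 and the highest to 1. When the children
        # are nearly equal, this stretches tiny value differences across the
        # whole [0, 1] range, which can make the policy target too sharp.
        lo, hi = q_values.min(), q_values.max()
        return (q_values - lo) / (hi - lo + eps)
    ```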

  • 2023-6-28
    • Played 3045k games (15x192 played 900k games).
    • The strength is equal to the UEC14 weights.
      • Sayuri: -w zero-3045k.bin.txt -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • UEC14: -w uce14-swa-1700k-v2.bin.txt -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Game result (played 400 games with UEC14 weights, both with 400 visits):
        Name                      black won    white won    total (win-rate)
        Sayuri v0.5.0                100           98          198 (49.5%)
        Sayuri v0.5.0 (uec14)        102          100          202 (50.5%)
      
    • Now I only train on the last 200k games.
    • I misunderstood Playout Cap Randomization: I was recording all training data from self-play, but I should discard the data from the capped (low-playout) searches.

  • 2023-6-30
    • Decreasing the window size does not really hurt the performance. The new network gained +45 Elo (against itself, 400 playouts) over these two days.
    • Fixed Playout Cap Randomization; see the sketch after this list.
    • Set the reduce-playouts to 75 (from 150).
    • Set the resign-playouts to 150 (from 125).
    • Because Playout Cap Randomization only produces training data for about 1/4 of the moves, I will increase the window size (in games) over the coming days.
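
    A hedged sketch of the fixed scheme; `search` and `record_example` are placeholder callbacks, and `full_prob=0.25` matches the "1/4 data" observation above:

    ```python
    import random

    def selfplay_move(state, search, record_example,
                      full_playouts=400, reduced_playouts=75, full_prob=0.25):
        # Playout Cap Randomization: only a fraction of moves get a
        # full-strength search, and only those searches are recorded as
        # training examples; the cheap searches just keep the game moving.
        if random.random() < full_prob:
            visit_dist, move = search(state, playouts=full_playouts)
            record_example(state, visit_dist)  # keep only high-quality targets
        else:
            _, move = search(state, playouts=reduced_playouts)  # no data kept
        return move
    ```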

  • 2023-7-5
    • Now train on the last 250k games.
    • Looks like c-scale = 0.1 is too small: most of the KLD values are below 0.1, while KataGo's are around 0.5~1.2.
    • Set the --gumbel-c-scale to 1 (from 0.1); see the note below on where c-scale enters the Gumbel search.
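
    For context, in the Gumbel AlphaZero/MuZero formulation the completed Q values enter the improved-policy logits through a monotone transform; I assume --gumbel-c-scale corresponds to the $c_{\text{scale}}$ there:

    $$\sigma\big(\hat{q}(a)\big) = \big(c_{\text{visit}} + \max_{b} N(b)\big)\, c_{\text{scale}}\, \hat{q}(a)$$

    A larger $c_{\text{scale}}$ weights the Q term more heavily against the prior logits, so the target policy becomes sharper relative to the raw policy, which should push the KLD up toward KataGo's range.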

  • 2023-7-20
    • Played 3385k games (15x192 played 1240k games).
    • Now train on roughly the last 300k games.
    • Set the learning rate to 1e-3.

  • 2023-7-23
    • Played 3435k games (15x192 played 1290k games).
    • The strength is the same as Leela Zero with the 117 weights; the Elo difference is -12.
      • Sayuri: -w zero-3435k.bin.txt -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 117.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.5.0        85          108          193 (48.25%)
        Leela Zero 0.17      92          115          207 (51.75%)
      

  • 2023-8-1
    • Played 3650k games (15x192 played 1505k games).
    • Now train on roughly the last 375k games.
    • I plan to add support for the v2 data and the v3 network this month. The v2 data will include more information, like this. The first step is generating enough v2 data, around 500k ~ 1000k games; then I will train the v3 network.
    • After fixing Playout Cap Randomization, the progress looks good. I hope the network can reach Leela Zero with the 130 weights before next month.
    • The strength is the same as Leela Zero with the 122 weights; the Elo difference is -21.
      • Sayuri: -w zero-3650k.bin.txt -t 1 -b 1 -p 400 --lcb-reduction 0 --score-utility-factor 0.1 --cpuct-init 0.5
      • Leela Zero: -w 122.gz --noponder -v 400 -g -t 1 -b 1
      • Game result (played 400 games with Leela Zero):
        Name             black won    white won    total (win-rate)
        Sayuri v0.5.0        84          104          188 (47.00%)
        Leela Zero 0.17      96          116          212 (53.00%)
      

  • 2023-8-7
    • I think I will finish this run and start another one once the v2 data is complete. The next run will also fix some hyper-parameters. Hope the next run can be better.

  • 2023-8-11
    • Stop this run.