# Manual
## 1. Environment
---
**Server**
```
IP: 100.73.42.5 (server1)
100.73.42.6 (server2)
User: ubuntu
Password: p@ssw0rd
```
**Network**
```
server1(100.73.42.5)
br0: 100.73.42.5 -> 1G Net
enp129s0f0: 192.168.105.1 -> 10G Net
server2(100.73.42.6)
br0: 100.73.42.6 -> 1G Net
enp129s0f0: 192.168.105.2 -> 10G Net
```
## 2. Cuju Execution
---
## Cuju
### 1. NFS
- Add this line to `/etc/exports` to export your NFS folder:
```
/home/[your username]/nfsfolder *(rw,no_root_squash,no_subtree_check)
```
- After editing `/etc/exports`, run `/etc/init.d/nfs-kernel-server restart` or `reboot`
- Mount the NFS folder
```
$ sudo mount -t nfs 192.168.11.1:/home/[your username]/nfsfolder /mnt/nfs
```
- Script sample:
```
#!/bin/bash
sudo mount -t nfs 127.0.0.1:/home/ubuntu/blkserver_nfs/ /home/ubuntu/cuju_blkserver/blkserver/
```
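If the mount fails, you can check on the server side that the export is actually active (standard nfs-kernel-server tooling):
```
$ sudo exportfs -ra          # re-read /etc/exports
$ showmount -e localhost     # list the folders currently exported
```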
### 2. Build Cuju
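Cuju is QEMU-based, so the usual QEMU build prerequisites apply. On Ubuntu, something like the following should cover the basics (the exact package list is an assumption, not taken from the Cuju repository):
```
$ sudo apt-get install git build-essential zlib1g-dev libglib2.0-dev libpixman-1-dev
```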
* Clone Cuju into your NFS folder from GitHub
```
$ cd /mnt/nfs
$ git clone https://github.com/Cuju-ft/Cuju.git
```
* Configure & Compile Cuju-ft
```
$ cd Cuju
$ ./configure --enable-cuju --enable-kvm --disable-pie --target-list=x86_64-softmmu
$ make -j8
```
* Configure, Compile & insmod Cuju-kvm module `*1` `*2`
```
$ cd Cuju/kvm
$ ./configure
$ make -j8
$ ./reinsmodkvm.sh
```
>`*1` If you encounter `error: incompatible type for argument 5 of '__get_user_pages_unlocked'`, you can apply this patch:
>```
>$ cd Cuju
>$ patch -p1 < ./patch/__get_user_pages_unlocked.patch
>```
>
>`*2` If you encounter `error: implicit declaration of function 'use_eager_fpu' [-Werror=implicit-function-declaration]`, you can apply this patch:
>```
>$ cd Cuju
>$ patch -p1 < ./patch/use_eager_fpu.patch
>```
### 3. Execute Cuju
* Before launching your VM, reload the kvm module on both the Primary and Backup nodes:
```
$ cd /mnt/nfs/Cuju/kvm
$ ./reinsmodkvm.sh
```
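For reference, `reinsmodkvm.sh` essentially swaps the stock kvm modules for the freshly built Cuju ones; a minimal sketch of the idea, assuming Intel hardware and that the `.ko` files sit in the kvm build tree (the real script may differ):
```
#!/bin/bash
# Sketch only: unload the running kvm modules, then load the Cuju builds.
sudo rmmod kvm_intel 2>/dev/null
sudo rmmod kvm 2>/dev/null
sudo insmod ./kvm.ko
sudo insmod ./kvm-intel.ko
```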
* Boot VM (on Primary Host)
* ```runvm.sh```
```
sudo ./x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,id=drive0,cache=none,format=raw,file=/mnt/nfs/Ubuntu20G-1604.img \
-device virtio-blk,drive=drive0 \
-m 1G -enable-kvm \
-net tap,ifname=tap0 -net nic,model=virtio,vlan=0,macaddr=ae:ae:00:00:00:25 \
-cpu host \
-vga std -chardev socket,id=mon,path=/home/[your username]/vm1.monitor,server,nowait -mon chardev=mon,id=monitor,mode=readline
```
You need to change the guest image path (`file=/mnt/nfs/Ubuntu20G-1604.img`) and the monitor path (`path=/home/[your username]/vm1.monitor`) to match your environment.
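The `-net tap,ifname=tap0` option also assumes a usable tap device; QEMU's default `/etc/qemu-ifup` usually attaches it to the bridge, but if not, you can create it by hand (a sketch assuming the `br0` bridge from the Environment section):
```
#!/bin/bash
# Create tap0 and attach it to the 1G bridge br0 (see the Environment section).
sudo ip tuntap add dev tap0 mode tap
sudo ip link set tap0 up
sudo brctl addif br0 tap0
```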
* Use VNC to see the console
```
$ vncviewer :5900 &
```
The default `account/password` is `root/root` if you use the guest image we provide.
* Start Receiver (on Backup Host)
* ```recv.sh```(sample `/home/ubuntu/cuju_blkserver/blkserver/Cuju/run_primary`)
```
sudo x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,id=drive0,cache=none,format=raw,file=/mnt/nfs/Ubuntu20G-1604.img \
-device virtio-blk,drive=drive0 \
-m 1G -enable-kvm \
-net tap,ifname=tap1 -net nic,model=virtio,vlan=0,macaddr=ae:ae:00:00:00:25 \
-vga std -chardev socket,id=mon,path=/home/[your username]/vm1r.monitor,server,nowait -mon chardev=mon,id=monitor,mode=readline \
-cpu host \
-incoming tcp:0:4441,ft_mode
```
* After the VM has booted and the Receiver is ready, you can execute the following script to enter FT mode
* ```ftmode.sh```(sample `/home/ubuntu/cuju_blkserver/blkserver/Cuju/ft_mode`)
```
sudo echo "migrate_set_capability cuju-ft on" | sudo nc -U /home/[your username]/vm1.monitor
sudo echo "migrate -c tcp:192.168.111.2:4441" | sudo nc -U /home/[your username]/vm1.monitor
```
You need to change the IP address and port (`tcp:192.168.111.2:4441`) to match your environment; this is the Backup Host's IP. Also change the monitor path (`/home/[your username]/vm1.monitor`) for your environment.
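To confirm that FT mode engaged, standard QEMU monitor queries can be sent through the same socket (the exact fields Cuju reports are not documented here):
```
$ echo "info status" | sudo nc -U /home/[your username]/vm1.monitor    # VM run state
$ echo "info migrate" | sudo nc -U /home/[your username]/vm1.monitor   # migration/FT state
```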
## Group Fault Tolerance
---
### 1. GFT workflow on the servers
On 100.73.42.5:
* After boot, first run `reinsmodkvm.sh` in `~/nfs/Cuju/kvm`
* Start the VMs: in `~/nfs/Cuju`, run `./open_primary.sh` -> `./open_backup.sh`
  * This starts 5 VMs: id 0~4
  * `open_primary.sh` runs `run_ft` ~ `run_ft5` (each VM's boot script)
    * Adjust the gft id, RAM size, smp count, etc. there
  * `open_backup.sh` runs `recv_b` ~ `recv_b5` (each backup VM's boot script)
* To start more than 5 VMs: on 100.73.42.6, run `./open_primary6.sh` -> `./open_backup6.sh` in `~/nfs/Cuju`
  * This starts 5 more VMs: id 5~9
* Use vncviewer to confirm the VMs have booted
* Once they are up, run `add_host.sh`
  * Adjust the parameters in `add_host.sh` to match the number of group members (see the sketch after this list)
  * `gft_add_host [gft_id] [primary physical host IP] [GFT PORT] [VM MAC] [slave physical host IP] [port on which the slave receives backups]`
  * Every `gft_add_host` command is issued to the leader
  * For more than five VMs, use `add_host_more.sh`
* For more than five VMs, you also need to run `cuju_on_6.sh` in `~/nfs` on 100.73.42.6
  * This script issues `migrate_set_capability cuju_ft on` to the VMs on .6; the monitor of a VM started on .6 cannot be controlled from .5, so those VMs have to be set up first
* Finally, on .5, run `./gft_init.sh` in `~/nfs` to start GFT
  * `gft_init.sh` has two parts: `migrate_set_capability cuju_ft on` and `gft_init`
    * `migrate_set_capability cuju_ft on` must be issued to every group member
    * `gft_init` only needs to be issued to the leader
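A hypothetical `add_host.sh` for a two-member group, following the argument order above; every concrete value (ports, MACs, monitor path) is a placeholder for your own setup:
```
#!/bin/bash
# Sketch only: issue one gft_add_host per group member to the leader's monitor.
# gft_add_host [gft_id] [primary host IP] [GFT PORT] [VM MAC] [slave host IP] [slave backup port]
MON=/home/ubuntu/vm1.monitor   # the leader's monitor socket
echo "gft_add_host 0 100.73.42.5 4442 ae:ae:00:00:00:25 100.73.42.5 4441" | sudo nc -U $MON
echo "gft_add_host 1 100.73.42.5 4452 ae:ae:00:00:00:26 100.73.42.5 4451" | sudo nc -U $MON
```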
#### Individual FT & non-buffered FT testing
* Start the VMs as in the steps above and confirm they have booted
* Once they are up, run `ftmode[X].sh` against each VM, where `[X]` is the VM's index (`ftmode.sh` ~ `ftmode10.sh`)
* For the non-buffered FT test, first modify the code in `~/nfs/Cuju/hw/net/virtio-net.c`, changing
```
// For CUJU-FT
//ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
// out_sg, out_num, virtio_net_tx_complete);
ret = qemu_sendv_packet_async_proxy(qemu_get_subqueue(n->nic, queue_index),
out_sg, out_num, virtio_net_tx_complete);
```
to
```
// For CUJU-FT
ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
out_sg, out_num, virtio_net_tx_complete);
//ret = qemu_sendv_packet_async_proxy(qemu_get_subqueue(n->nic, queue_index),
// out_sg, out_num, virtio_net_tx_complete);
```
#### Run stage length testing
* Modify `EPOCH_TIME_IN_MS` in `Cuju/include/migration/cuju-kvm-share-mem.h`, as shown below
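For example, to try a 20 ms run stage (the stock default may differ), the define would read:
```
#define EPOCH_TIME_IN_MS 20   /* length of one run stage (epoch) in milliseconds */
```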
#### Saturating memory
* Running the following script inside the VM saturates its memory:
```
#!/bin/bash
stress --vm-bytes $(awk '/MemFree/{printf "%d\n", $2 * 0.9;}' < /proc/meminfo)k --vm-keep -m 1
```
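The script assumes the `stress` tool is present in the guest; on an Ubuntu guest it can be installed with:
```
$ sudo apt-get install stress
```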
### 2. APM synthetic multi-tier service testing
#### Work flow
1. Edit the topology configuration in `_amtapp_linear.yaml` (the file name can be changed) in `~/nfs/amtt`
2. Generate amttcn: `$ ./amttcngen.sh _amtapp_linear.yaml`
3. Confirm all VMs have finished booting
4. In `~/nfs/cmd`, use amttcn to deploy the topology onto the VMs: `$ ./amttcn`
5. Once deployed, you can send requests to the exposed IP and port: in `~/nfs`, run `$ ./curl.sh` (see the sketch below)
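A minimal sketch of what `curl.sh` does, assuming the entry point of the topology listens at a placeholder IP/port (adjust both, and the loop count, to your setup):
```
#!/bin/bash
# Sketch only: fire N POST requests at the first service in the topology.
for i in $(seq 1 100); do
    curl -X POST -d "request $i" http://192.168.105.11:8080/
done
```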
#### Script functions
* `_amtapp_linear.yaml` (renamable) -> defines the topology
  * `compute_node` -> configures the physical machines
    * Because the two machines are interconnected and share NFS, configuring one machine here is enough to run the experiment across both
  * The top-level component -> no modification needed
  * `application`
    * The yaml has no application entry for `case_linear`, so the linear test reuses `case_loadbalancer`
    * Configures each VM; a VM's name is simply its IP
    * A forwarder takes arg1~arg5: {the forwarder's own IP / its own port / the destination VM's IP / the destination VM's port / delay (can be left unchanged)}; see the yaml sketch after this list
    * A DB server takes arg1~arg3: {the DB server's own IP / its own port / delay}
    * A load balancer takes arg1~arg3: {its own IP / its own port / delay}
      * `content` sets how many VMs it can forward to (fanout)
    * A multicaster is configured like the load balancer
* `curl.sh`
  * A script that runs curl repeatedly to send POST requests
  * The repetition count is adjustable (the for loop)
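A hypothetical excerpt showing the forwarder argument layout described above; the real amtt yaml schema may name these keys differently, so treat this purely as an illustration:
```
application:
  case_loadbalancer:
    192.168.105.11:          # a forwarder VM, named by its IP
      arg1: 192.168.105.11   # the forwarder's own IP
      arg2: 8080             # its own port
      arg3: 192.168.105.12   # destination VM IP
      arg4: 8080             # destination VM port
      arg5: 0                # delay (can be left unchanged)
```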
### 3. GFT
* `#define GFT_RESYNC` in migration.c -> toggles the resync feature (comment it out to disable it)
* `#define ft_debug_mode_enable` in migration.c -> toggles debug messages
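Roughly how the two switches look in migration.c (comment a line out to turn that feature off):
```
#define GFT_RESYNC              /* resync support; comment out to disable */
#define ft_debug_mode_enable    /* verbose FT debug messages */
```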
#### leader code flow
* hmp_gft_init
* qmp_gft_leader_init
* set fd handler => fd:group_ft_master_sock / read:gft_master_accept_other_master
* set the value of group_ft_leader_inited
* distribute group member information to all group nodes
* SEND MIG_JOIN_GFT_ADD_HOST & MIG_JOIN_GFT_INIT to master
* set fd handler for each master => fd:sd / read:gft_leader_read_master
* gft_start_migration()
* gft_start_migration
* find self index in group_ft_members array and start qmp_migrate respectively for each node
* qmp_migrate
* cuju_tcp_start_outgoing_migration
* cuju_socket_start_outgoing_migration
* cuju_migration_channel_connect
* migrate_fd_connect
* migration_thread
* after thread created, enter migrate_ft_trans_get_ready(how?)
* gft_connect_internal
* gft_master_connect_other_master
* gft_leader_read_master
* expect MIG_JOIN_GFT_MIGRATION_DONE from individual nodes
* when receive enough acks, call gft_leader_broadcast_all_migration_done()
* gft_leader_broadcast_all_migration_done
* send MIG_JOIN_GFT_MIGRATION_ALL to all nodes, which will be read by gft_master_read_leader
#### master code flow
* gft_init
* set fd handler => fd:group_ft_master_sock / read:gft_master_accept_leader
* gft_master_accept_leader
* accept MIG_JOIN_GFT_ADD_HOST & MIG_JOIN_GFT_INIT from leader
* group members get the info about other group members
* set fd handler => fd:server_fd(group_ft_master_sock) / read:gft_master_accept_other_master
* set fd handler => fd:fd(group_ft_leader_sock) / read:gft_master_read_leader
* gft_start_migration()
* gft_master_accept_other_master
* accept the connection from another master
* if(GFT_WAIT) call migrate_ft_trans_get_ready();
* gft_master_read_leader
- Receive command MIG_JOIN_GFT_MIGRATION_ALL
- close fd
- set group_ft_members_ready = group_ft_members_size
#### master code flow while GFT_WAIT
* gft_master_read_master
* detect other master VM failed
* gft_status = GFT_WAIT
* call gft_reset_all
* gft_reset_all
* gft_reset_connections current migrate state
* gft_reset_connections previous migrate state
* set s->ft_state to FT_INIT
* set n->ft_state to FT_TRANSACTION_PRE_RUN
* set fd handler => fd:group_ft_master_sock / read:gft_master_accept_leader
* gft_master_accept_leader
* qemu_iohandler_ft_pause(false)
* recv MIG_JOIN_GFT_ADD_HOST & MIG_JOIN_GFT_INIT
* recv info. about other group members
* set fd handler => fd:server_fd(group_ft_master_sock) / read:gft_master_accept_other_master
* set fd handler => fd:fd(group_ft_leader_sock) / read:gft_master_read_leader
* send MIG_JOIN_GFT_INIT_ACK
* gft_start_migration()
* gft_master_accept_other_master
* s1 = migrate_get_current();
* s2 = migrate_get_next(s1);
* don't set cur_off, using current cur_off
* call migrate_ft_trans_get_ready();
* migrate_ft_trans_get_ready
* gft_connect_internal
* gft_master_connect_other_master
* send MIG_JOIN_GFT_NEW & my_gft_id
* gft_master_notify_leader_migration_done
* if (group_ft_leader_sock)
* **send MIG_JOIN_GFT_MIGRATION_DONE (by group_ft_leader_sock)**
* else if group_ft_members_ready == group_ft_members_size
* gft_leader_broadcast_all_migration_done
* gft_master_wait_all_migration_done
* if group_ft_members_size == group_ft_members_ready
* gft_prepare_snapshot_bitmap
* gft_master_start_listen_other_masters
* migration_paused = false
* qemu_iohandler_ft_pause(false)
* vm_start
* migrate_run
* change to GFT_PRE
### 4. GFT add/remove member
* Manually adding a member after a failure (requires reverting to the previous commit)
  * Issue the live migration command to the failed-over member -> `gft_member_live_mig.sh`
  * After the live migration finishes, issue the add_member command to the leader -> `gft_add_member.sh`
* Automatically adding a member after a failure
  * Before building the GFT, add `gft_add_backup` (in the `Cuju/gft_script` folder) to `add_host.sh` to register the backup pool VMs
  * Run `auto_resync.sh` against the slave VMs so that they know about the backup pool VMs
  * Current status: killing the last VM resyncs correctly, but killing any other VM may cause problems
* New variables for add/remove
  * group_ft_backup
    * Stores the info of the backup-pool machines added via `gft_add_backup`
  * group_ft_failed_members
    * Stores the info of machines detected as failed
  * resync_sock
    * A separate socket that gives the failover member a channel to notify the leader once its initial live-migration backup completes
  * is_gft_adding_new_member
    * Whether the GFT is currently in the add-new-member phase (after `gft_add_member` has been issued)
  * is_accept_finish
    * On the member side of the resync process, ensures trans_get_ready is entered only after accept_other_master has finished
  * gft_backup_used
    * How many backup-pool VMs have already been taken as actual backup machines
  * gft_backup_pool_size
  * resync_conn_count
    * Tunes how often migrate_run checks, via a socket connect, whether the failover member has finished its initial backup
  * resync_is_conn
    * Whether resync_sock has been established
  * resync_is_recv_flag
    * Whether the message sent after resync_sock was established has been received
* New functions for add/remove
  * Steps 1~7 below form the dynamic resync flow
  1. A machine in the GFT fails
  2. The other, healthy members detect it
     * `gft_master_read_master()` detects it (len == -EINVAL)
     * Determine which member failed
       * The leader receives a message from every member; the one it does not hear from is the failed member
       * If the leader itself failed -> the VM with gft_id = 1 takes over as the new leader
     * Save the failed member's info
     * call gft_reset_all()
  3. The other members pause, drop the failed member, and rebuild the GFT with one fewer member
     * `gft_reset_all()` checks whether this node is the leader
       * If so, it rebuilds the GFT info, skipping the failed member's entry
       * Then runs gft_leader_init()
  4. The failed machine fails over to its backup machine
  5. The failed-over machine live-migrates its initial state to a new backup VM (taken from the backup pool)
     * `cuju_ft_trans_close()` in cuju-ft-trans-file.c
       * Checks whether the backup pool still has an available backup VM
       * If used >= pool_size, there is no available backup VM
         * The VM simply keeps running, without FT for now
       * Else -> the failover VM connects to the leader, then runs `qmp_migrate()` to live-migrate its initial state as a backup
         * gft_backup_used++ (one backup-pool VM consumed)
         * `qmp_migrate()` runs until `gft_master_wait_all_migration_done()` and then pauses
  6. Once the failover VM finishes its initial-state backup, it pauses and notifies the leader to perform the member add
     * In `gft_master_wait_all_migration_done()`: if(is_gft_new_member)
       * send a message to the leader
       * close resync_sock after sending
  7. The leader tells all members to pause and rebuilds the GFT with one more member
     * migrate_run()
       * if(!resync_is_recv_flag && is_leader)
         * Every `migrate_run()` call tries to receive the message saying the failover VM has finished its initial-state backup (MIG_JOIN_GFT_READY_TO_RESYNC)
       * After receiving MIG_JOIN_GFT_READY_TO_RESYNC, close resync_sock and run `qmp_gft_add_member()` and `gft_reset_all()`
         * `qmp_gft_add_member()` -> sends a message to all group members so that they, too, enter `gft_reset_all()` from `gft_master_read_master()` and pause
         * The leader enters `gft_reset_all()` -> rebuilds the GFT, adding the existing members first
         * After leaving `gft_reset_all()`, the failover member is added, and then `qmp_gft_leader_init()` restarts the GFT
## Block server
---
### 1. Switch to the block server branch and recompile Cuju
```
$ git checkout feature/blk-server
$ make -j8
```
### 2. Build Block server
- Create the "server", "primary", and "backup" scripts for the block server
- `blk_server.sh`(sample `/home/ubuntu/cuju_blkserver/blkserver/Cuju/run_blkserver`)
```
sudo ./x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,id=drive0,cache=none,format=raw,file=/mnt/nfs/Ubuntu20G-1604.img \
-device virtio-blk,drive=drive0 \
-m 1G -enable-kvm \
-net tap,ifname=tap0 -net nic,model=virtio,vlan=0,macaddr=ae:ae:00:00:00:25 \
-cpu host \
-vga std -chardev socket,id=mon,path=/home/[your username]/vm1.monitor,server,nowait -mon chardev=mon,id=monitor,mode=readline \
-ft-join-port 4000 -blk-server-listen :5001
```
- `blk_primary.sh`(sample `/home/ubuntu/cuju_blkserver/blkserver/Cuju/run_primary_blk`)
```
sudo ./x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,id=drive0,cache=none,format=raw,file=/mnt/nfs/Ubuntu20G-1604.img \
-device virtio-blk,drive=drive0 \
-m 1G -enable-kvm \
-net tap,ifname=tap0 -net nic,model=virtio,vlan=0,macaddr=ae:ae:00:00:00:25 \
-cpu host \
-vga std -chardev socket,id=mon,path=/home/[your username]/vm1.monitor,server,nowait -mon chardev=mon,id=monitor,mode=readline \
-blk-server [server IP]:5001
```
- `blk_backup.sh`(sample `/home/ubuntu/cuju_blkserver/blkserver/Cuju/run_backup_blk`)
```
sudo x86_64-softmmu/qemu-system-x86_64 \
-drive if=none,id=drive0,cache=none,format=raw,file=/mnt/nfs/Ubuntu20G-1604.img \
-device virtio-blk,drive=drive0 \
-m 1G -enable-kvm \
-net tap,ifname=tap1 -net nic,model=virtio,vlan=0,macaddr=ae:ae:00:00:00:25 \
-vga std -chardev socket,id=mon,path=/home/[your username]/vm1r.monitor,server,nowait -mon chardev=mon,id=monitor,mode=readline \
-cpu host \
-incoming tcp:0:4441,ft_mode \
-blk-server [server IP]:5001
```
### 3. Execution Flow
1. Execute the server
2. Execute the primary (use vncviewer to make sure the VM has started)
3. Execute the backup
4. Execute ft_mode (make sure ft_mode uses the 10G net)
Sample ft_mode (`/home/ubuntu/cuju_blkserver/blkserver/Cuju/ft_mode`):
```
sudo echo "migrate_set_capability cuju-ft on" | sudo nc -U /home/ubuntu/cuju_blkserver/blkserver/vm1.monitor
sudo echo "migrate -d -c tcp:192.168.105.1:4441" | sudo nc -U /home/ubuntu/cuju_blkserver/blkserver/vm1.monitor
```
5. Use vncviewer or ssh to connect to the VM. On our servers, 100.73.42.51 is assigned to the Cuju VM:
```
$ ssh root@100.73.42.51
```
### 4. Code
#### Main function of block server
#### kvm_blk.c
* kvm_blk_server_init
  * 1 Initializes the block server's server terminal
  * 2 Called from vl.c -> main()
* kvm_blk_client_init
  * 1 Initializes the block server's client terminal
  * 2 Called from vl.c -> main()
  * 3 Called from migration/cuju-ft-trans-file.c -> cuju_ft_trans_close()
#### kvm_blk_client.c
* kvm_blk_aio_readv
  * 1 Takes a read request from the VM and sends it to the block server's server terminal
  * 2 Called from migration/event-tap.c -> blk_aio_preadv_proxy()
* kvm_blk_aio_write
  * 1 Takes a write request from the VM and sends it to the block server's server terminal
  * 2 Called from hw/block/virtio-blk.c -> submit_requests()
* kvm_blk_epoch_timer
  * 1 Handles the epoch state of fault tolerance
  * 2 Called from migration/migration.c -> migrate_timer()
* kvm_blk_epoch_commit
  * 1 Handles the commit state of fault tolerance
  * 2 Called from migration/migration.c -> kvmft_flush_output()
* kvm_blk_notify_ft
  * 1 Starts fault tolerance mode
  * 2 Called from migration/migration.c -> migration_thread()
* kvm_blk_save_pending_request
  * 1 Called from hw/block/virtio-blk.c -> submit_requests()
#### kvm_blk_serv.c
* See the [HackMD](https://hackmd.io/@QvGjH100RCOnbhPOvK9CrQ/rJIu-yuUX/%2FZf0-nWixRzytkgLGeM7gnQ?type=book) notes from Cuju
#### Interface of block server
* vl.c
* main()
Main function for block server init
* migration/event-tap.c
* blk_aio_preadv_proxy()
  Read requests are intercepted by this function from hw/block/virtio-blk.c -> submit_requests()
* migration/migration.c
* migration_thread()
* Init the fault tolerance of every struct
* migrate_timer()
* Do the `Snapshot` and `Transfer` state of fault tolerance
* migrate_run()
* Do the `Run` state of fault tolerance
* kvmft_flush_output()
* Do the `Commit` state of fault tolerance
* send_commit1()
* Uncommenting `//printf("%s",s->time_buf)` shows the timing of the four states
### 5. Hints
- Make sure the backup and the block server are on the same node
- All block server code lives in `/home/ubuntu/cuju_blkserver/blkserver`, which is mounted from server1's (100.73.42.5) `/home/ubuntu/blkserver_nfs/` folder
- `/home/ubuntu/cuju_blkserver/mount.sh` helps mount the blkserver folder over NFS
- Both servers (100.73.42.5 and 100.73.42.6) use the same path for the block server