# Backup & Recovery (BR)

> - **Objective:** Learn to back up and recover a TiDB cluster on AWS (with Kubernetes)
> - **Prerequisites:**
>     - Background knowledge of TiDB components
>     - Background knowledge of Kubernetes and TiDB Operator
>     - Background knowledge of [BR](https://pingcap.com/docs/stable/reference/tools/br/br/#command-line-description)
>     - AWS account
>     - TiDB cluster on AWS
> - **Optionality:** Required
> - **Estimated time:** 1 hour

This document describes how to perform physical backups of a TiDB cluster and how to use those backups to recover the cluster.

## Prepare

> **Optionality:** Required

### Prepare Data

> **Optionality:** You can skip this section if you already have data in the TiDB cluster.

Prepare data using sysbench. Refer to [sysbench](https://hackmd.io/0RpTgviPTfShBTDoEBhPfw#Sysbench).

### Grant AWS Account Permissions

> - **Optionality:** Required

Before you perform a backup, AWS account permissions need to be granted to the Backup Custom Resource (CR) object. There are three methods to grant AWS account permissions:

- [Grant permissions by importing AccessKey and SecretKey](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#three-methods-to-grant-aws-account-permissions)
- [Grant permissions by associating IAM with Pod](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#grant-permissions-by-associating-iam-with-pod)
- [Grant permissions by associating IAM with ServiceAccount](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#grant-permissions-by-associating-iam-with-pod)

In this document, we grant AWS account permissions by importing AccessKey and SecretKey.

> **Note**
>
> Granting permissions by associating IAM with Pod or with ServiceAccount is recommended in production environments.

### Create S3 Bucket

If you don't have an S3 bucket for backups, create one in the same AWS region as your EKS cluster:

![create S3 bucket](https://i.imgur.com/4Prh6yx.png)

You can skip this section if you already have an S3 bucket to store backups.

### Install RBAC

Download [backup-rbac.yaml](https://github.com/pingcap/tidb-operator/blob/master/manifests/backup/backup-rbac.yaml), and execute the following command to create the role-based access control (RBAC) resources in your namespace:

```{.bash .copyable}
kubectl apply -f backup-rbac.yaml -n ${cluster_namespace}
```

```{.output}
serviceaccount/tidb-backup-manager created
rolebinding.rbac.authorization.k8s.io/tidb-backup-manager created
```

### Create Secrets

#### Create s3-secret

TiDB Operator needs to access S3 when performing backup operations. To allow that, create the s3-secret secret, which stores the credentials used to access S3:

```{.bash .copyable}
kubectl create secret generic s3-secret --from-literal=access_key=${aws_access_key} --from-literal=secret_key=${aws_secret_key} --namespace=${cluster_namespace}
```

```{.output}
secret/s3-secret created
```

This s3-secret will be referenced in your Backup CR.
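Note that `kubectl get secrets` only confirms that the secret exists. If you want to double-check the stored values (for example, after rotating keys), you can decode the secret's data fields. This is a minimal sketch; the `access_key` key name matches the `--from-literal` argument above:

```{.bash .copyable}
# Print the decoded access key stored in s3-secret.
kubectl get secret s3-secret -n ${cluster_namespace} -o jsonpath='{.data.access_key}' | base64 --decode
```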
#### Create tidb-secret

TiDB Operator needs to access TiDB when performing backup operations. To allow that, create a secret which stores the password of the user account used to access the TiDB cluster:

```{.bash .copyable}
kubectl create secret generic tidb-secret --from-literal=password=${password} --namespace=${cluster_namespace}
```

```{.output}
secret/tidb-secret created
```

> **Note**
>
> If the user account has no password, leave ${password} blank.

#### Verify Secrets

Verify that the secrets are properly created:

```{.bash .copyable}
kubectl get secrets -n ${cluster_namespace}
```

## Ad-hoc Full Backup

> - **Optionality:** Required

This section describes how to perform a full backup. We use a Backup Custom Resource (CR) to describe an ad-hoc full backup. TiDB Operator performs the backup operation based on the specification in the Backup CR.

### Configure Backup CR

The following is an example Backup CR:

```
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: backup01
  namespace: ${cluster_namespace}
spec:
  backupType: full
  br:
    cluster: ${cluster_name}
    sendCredToTikv: true
  from:
    host: ${cluster_name}-tidb
    secretName: tidb-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: ${aws_region}
    bucket: ${my_bucket}
    prefix: ${my_folder}
```

Replace the `${...}` placeholders with the values for your environment and save the file as `backup-aws-s3.yaml`.

### Perform Backup

You can perform an ad-hoc full backup using the following command:

```{.bash .copyable}
kubectl apply -f backup-aws-s3.yaml
```

```{.output}
backup.pingcap.com/backup01 created
```

### Verify Backup

#### Check Backup Status

You can use the following command to check the backup status:

```{.bash .copyable}
kubectl get bk -n ${cluster_namespace} -o wide
```

```{.output}
NAME       BACKUPPATH                        BACKUPSIZE   COMMITTS             STARTED   COMPLETED   AGE
backup01   s3://${my_bucket}/${my_folder}/   1611872      416522333306486785   3m58s     3m55s       15m
```
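If the backup does not complete, the CR's events and conditions usually explain why. A minimal troubleshooting sketch using `kubectl describe` (with `backup01`, the CR name from the example above):

```{.bash .copyable}
# Show conditions and events for the Backup CR, e.g. credential or connectivity errors.
kubectl describe bk backup01 -n ${cluster_namespace}
```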
#### Inspect Backup Log

You can use the following command to find the pod for the ad-hoc full backup:

```{.bash .copyable}
kubectl get po -n ${cluster_namespace}
```

You can then use the following command to check the backup progress:

```{.bash .copyable}
kubectl logs ${backup_pod} -n ${cluster_namespace}
```

```{.output}
...
I0508 02:56:48.029966       1 backup.go:92] [2020/05/08 02:56:48.029 +00:00] [INFO] [domain.go:607] ["domain closed"] ["take time"=2.586075ms]
I0508 02:56:48.030625       1 backup.go:92] [2020/05/08 02:56:48.030 +00:00] [INFO] [collector.go:203] ["Full backup Success summary: total backup ranges: 16, total success: 16, total failed: 0, total take(s): 1.98, total kv: 16000, total size(MB): 1.92, avg speed(MB/s): 0.97"] ["backup fast checksum"=1.561152ms] ["backup checksum"=21.194566ms] ["backup total regions"=16]
I0508 02:56:48.032898       1 backup.go:92]
I0508 02:56:48.033027       1 backup.go:107] Backup data for cluster anthony/backup01 successfully
I0508 02:56:48.040218       1 manager.go:207] reset cluster anthony/backup01 tikv_gc_life_time to 10m0s success
I0508 02:56:48.040241       1 manager.go:218] backup cluster anthony/backup01 data to s3://anthonybr/anthony_01/ success
I0508 02:56:48.136334       1 manager.go:232] Get size 1611872 for backup files in s3://anthonybr/anthony_01/ of cluster anthony/backup01 success
I0508 02:56:48.156090       1 manager.go:244] get cluster anthony/backup01 commitTs 416522333306486785 success
I0508 02:56:48.179889       1 backup_status_updater.go:66] Backup: [anthony/backup01] updated successfully
```

#### Check Backup Files

You can use the S3 console or the `aws s3` command to check the backup files:

```{.bash .copyable}
aws s3 ls s3://${my_bucket}/${my_folder}/
```

```{.output}
2020-05-17 17:06:03          0
2020-05-17 17:59:48      11598 1_100_25_3283d3a03adc06548749d97c17e127615132cbbf66b60da73a957847f19b62f7_write.sst
2020-05-17 17:59:48     187058 1_100_25_80992061af3e5194c3f28a5b79d486c5e9db2feda1afb3f84b4ca229ddce9932_write.sst
2020-05-17 17:59:48     187374 1_104_26_6d17617cf027713cbd7e33a2a124fe83bc2c35035a776d0c11e63551cbae7815_write.sst
2020-05-17 17:59:48      11487 1_104_26_a48ed62e9b0b2959f804d7850a940f2fc34e1382be1ab74d1518368e2512aba2_write.sst
2020-05-17 17:59:49      11544 1_108_27_2334a18b4455213018fb96ec3c92e951dcc4b10106e91896d882a5ee0d2dbea4_write.sst
2020-05-17 17:59:49     187315 1_108_27_4d4bf84bd30a3ae10b645ad9354c2c26cbc5ff2a95b338b93b820db95359617b_write.sst
2020-05-17 17:59:49      11551 1_116_29_96deb4607345137477d0eea19a6e7a200a37064f87c259d5fa8b08f3c52ed426_write.sst
2020-05-17 17:59:49     187206 1_116_29_9fb8afb3f7322e93f506f0d9d11e9b1569bc90b7c3779da9bb7e35137e8e6597_write.sst
2020-05-17 17:59:49     187222 1_2_29_393d3575060c0d616300c1199ef1c015784fdd3d13e950a6251007bbcbaf2c06_write.sst
2020-05-17 17:59:49      11640 1_2_29_e738a916dbc786a9aef2a69b15289ac934265a4fe851e1dc11140d2ac17b28e8_write.sst
2020-05-17 17:59:48      11556 1_92_23_c882bdf96118f3fcc64e32ac9040b3194ab67d910d1f5232b928416088211ba5_write.sst
2020-05-17 17:59:48     187312 1_92_23_f34f254bf742607feee767bcba259d92794f5f373e6fd415af086e4b42689491_write.sst
2020-05-17 17:59:48     187683 1_96_24_38e4059222579a22c8c937f6506dd8e2198a19c3d7fb4a245c0a4f804cd85adc_write.sst
2020-05-17 17:59:48      11487 1_96_24_675ff68620bc779c8c35ff210de4be67260ad904c9d18c17fed517a4fcf4226b_write.sst
2020-05-17 17:59:49      11564 5_112_28_20ea02d2b95115ebc0c5516aee14058da5e27a1f5bbc545fb831e6c3446fda82_write.sst
2020-05-17 17:59:49     187310 5_112_28_2550d727ffe3e408af41f541b134d5e064440c25988bbecfb665933d1195a45d_write.sst
2020-05-17 17:59:49      20770 backupmeta
```
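To cross-check the `BACKUPSIZE` reported by `kubectl get bk` against what actually landed in S3, you can ask the AWS CLI to total the objects under the prefix; a minimal sketch:

```{.bash .copyable}
# Recursively list the backup prefix and print "Total Objects" / "Total Size" at the end.
aws s3 ls s3://${my_bucket}/${my_folder}/ --recursive --summarize --human-readable
```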
## Scheduled Full Backup

> - **Optionality:** Optional

You can set up a backup policy to perform scheduled backups of a TiDB cluster, and set a retention policy for the backups. A scheduled full backup is described by a BackupSchedule CR object.

### Configure BackupSchedule CR

The following is an example BackupSchedule CR. The `schedule` field uses standard cron format, so `"*/2 * * * *"` triggers a backup every two minutes (suitable for testing only), and `maxReservedTime: "3h"` prunes backups older than three hours:

```
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: demo-backup-schedule-s3
  namespace: ${cluster_namespace}
spec:
  #maxBackups: 5
  #pause: true
  maxReservedTime: "3h"
  schedule: "*/2 * * * *"
  backupTemplate:
    backupType: full
    br:
      cluster: ${cluster_name}
      clusterNamespace: ${cluster_namespace}
      sendCredToTikv: true
    from:
      host: ${cluster_name}-tidb
      secretName: tidb-secret
    s3:
      provider: aws
      secretName: s3-secret
      region: ${aws_region}
      bucket: ${my_bucket}
      prefix: ${my_folder}
```

Replace the `${...}` placeholders with the values for your environment and save the file as `backup-scheduler-aws-s3.yaml`.

### Perform Scheduled Backup

You can start the scheduled full backup using the following command:

```{.bash .copyable}
kubectl apply -f backup-scheduler-aws-s3.yaml
```

```{.output}
backupschedule.pingcap.com/demo-backup-schedule-s3 created
```

### Verify Scheduled Backup

#### Check Backup Status

You can use the following command to check the backup status:

```{.bash .copyable}
kubectl get bk -n ${cluster_namespace} -o wide
```

```{.output}
NAME                                          BACKUPPATH                                                                 BACKUPSIZE   COMMITTS             STARTED   COMPLETED   AGE
backup02                                      s3://anthonybr/anthony_02/                                                 1611578      416550273190199297   5m1s      4m58s       5m12s
demo-backup-schedule-s3-2020-05-09t08-36-00   s3://anthonybr/anthony_sche/anthony-pd.anthony-2379-2020-05-09t08-36-00/   1611578      416550320389226497   2m1s      118s        2m2s
demo-backup-schedule-s3-2020-05-09t08-38-00                                                                              0                                                       2s
```
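Once you have confirmed the schedule works, you may want to stop it from firing every two minutes. The commented-out `pause` field in the CR above controls this; a minimal sketch that flips it with `kubectl patch` (schedule name from the example above):

```{.bash .copyable}
# Pause the schedule; set "pause":false (or remove the field) to resume.
kubectl patch backupschedule demo-backup-schedule-s3 -n ${cluster_namespace} --type merge -p '{"spec":{"pause":true}}'
```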
#### Inspect Backup Log

You can use the following command to find the pod for the scheduled full backup:

```{.bash .copyable}
kubectl get pod -n ${cluster_namespace}
```

You can then use the following command to check the backup progress:

```{.bash .copyable}
kubectl logs ${backup_pod} -n ${cluster_namespace}
```

```{.output}
...
I0509 08:36:10.257977       1 backup.go:92] [2020/05/09 08:36:10.257 +00:00] [INFO] [client.go:149] ["save backup meta"] [path=s3://anthonybr/anthony_sche/anthony-pd.anthony-2379-2020-05-09t08-36-00] [jobs=0]
I0509 08:36:10.258477       1 backup.go:92] [2020/05/09 08:36:10.258 +00:00] [INFO] [progress.go:102] [Checksum] [progress=100.00%!](MISSING)
I0509 08:36:10.296059       1 backup.go:92] [2020/05/09 08:36:10.295 +00:00] [INFO] [ddl.go:407] ["[ddl] DDL closed"] [ID=061caa48-f309-422b-a53e-11623f55e1a8] ["take time"=973.143µs]
I0509 08:36:10.296130       1 backup.go:92] [2020/05/09 08:36:10.296 +00:00] [INFO] [ddl.go:301] ["[ddl] stop DDL"] [ID=061caa48-f309-422b-a53e-11623f55e1a8]
I0509 08:36:10.297872       1 backup.go:92] [2020/05/09 08:36:10.297 +00:00] [INFO] [domain.go:607] ["domain closed"] ["take time"=2.820476ms]
I0509 08:36:10.298207       1 backup.go:92] [2020/05/09 08:36:10.298 +00:00] [INFO] [collector.go:203] ["Full backup Success summary: total backup ranges: 16, total success: 16, total failed: 0, total take(s): 2.00, total kv: 16000, total size(MB): 1.92, avg speed(MB/s): 0.96"] ["backup checksum"=16.581592ms] ["backup fast checksum"=1.633146ms] ["backup total regions"=16]
I0509 08:36:10.300690       1 backup.go:92]
I0509 08:36:10.300730       1 backup.go:107] Backup data for cluster anthony/demo-backup-schedule-s3-2020-05-09t08-36-00 successfully
I0509 08:36:10.328351       1 manager.go:207] reset cluster anthony/demo-backup-schedule-s3-2020-05-09t08-36-00 tikv_gc_life_time to 10m0s success
I0509 08:36:10.328368       1 manager.go:218] backup cluster anthony/demo-backup-schedule-s3-2020-05-09t08-36-00 data to s3://anthonybr/anthony_sche/anthony-pd.anthony-2379-2020-05-09t08-36-00/ success
I0509 08:36:10.491981       1 manager.go:232] Get size 1611578 for backup files in s3://anthonybr/anthony_sche/anthony-pd.anthony-2379-2020-05-09t08-36-00/ of cluster anthony/demo-backup-schedule-s3-2020-05-09t08-36-00 success
I0509 08:36:10.520015       1 manager.go:244] get cluster anthony/demo-backup-schedule-s3-2020-05-09t08-36-00 commitTs 416550320389226497 success
I0509 08:36:10.541470       1 backup_status_updater.go:66] Backup: [anthony/demo-backup-schedule-s3-2020-05-09t08-36-00] updated successfully
```

#### Check Backup Files

You can use the S3 console or the `aws s3` command to check the backup files:

```{.bash .copyable}
aws s3 ls s3://${my_bucket}/${my_folder}/
```

## Restore

> - **Optionality:** Required

This section demonstrates how to use a backup created in the previous sections to restore a database.

### Cleanup

Note that BR only supports restoring to an empty cluster, so you first need to connect to the TiDB cluster and drop the database `sbtest`. If you don't already have a client session open, see the connection sketch below.
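A minimal connection sketch, assuming the default TiDB service name (`${cluster_name}-tidb`), the default MySQL port 4000, and the `root` user; adjust these to your deployment:

```{.bash .copyable}
# Forward the TiDB service port to localhost, then connect with a MySQL client.
kubectl port-forward -n ${cluster_namespace} svc/${cluster_name}-tidb 4000:4000 &
mysql -h 127.0.0.1 -P 4000 -u root -p
```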
Once connected, drop the database:

```
MySQL [(none)]> drop database sbtest;
Query OK, 0 rows affected (0.31 sec)
```

### Configure Restore CR

As with backup, we use a Restore Custom Resource (CR) to describe a restore operation. TiDB Operator performs the restore operation based on the specification in the Restore CR. The following is an example Restore CR:

```
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: demo-restore-s3
  namespace: ${cluster_namespace}
spec:
  br:
    cluster: ${cluster_name}
    clusterNamespace: ${cluster_namespace}
  to:
    host: ${cluster_name}-tidb
    secretName: tidb-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: ${aws_region}
    bucket: ${my_bucket}
    prefix: ${my_folder}
```

Replace the `${...}` placeholders with the values for your environment and save the file as `restore-aws-s3.yaml`.

### Perform Restore

You can perform the restore using the following command:

```{.bash .copyable}
kubectl apply -f restore-aws-s3.yaml
```

```{.output}
restore.pingcap.com/demo-restore-s3 created
```

### Verify Restore

#### Check Restore Status

You can use the following command to check the restore status:

```{.bash .copyable}
kubectl get rt -n ${cluster_namespace} -o wide
```

```{.output}
NAME              STARTED   COMPLETED   AGE
demo-restore-s3   43s       25s         49s
```

#### Inspect Restore Log

You can use the following command to check the restore progress:

```{.bash .copyable}
kubectl logs ${restore_pod} -n ${cluster_namespace}
```

```{.output}
...
I0508 09:04:46.106204       1 restore.go:86] [2020/05/08 09:04:46.106 +00:00] [INFO] [collector.go:203] ["Full restore Success summary: total restore files: 16, total success: 16, total failed: 0, total take(s): 0.51, total kv: 16000, total size(MB): 1.92, avg speed(MB/s): 3.79"] ["split region"=110.963551ms] ["restore checksum"=19.994152ms] ["restore ranges"=16]
I0508 09:04:46.108742       1 restore.go:86]
I0508 09:04:46.108804       1 restore.go:101] Restore data for cluster anthony/demo-restore-s3 successfully
I0508 09:04:46.120192       1 manager.go:206] reset cluster anthony/demo-restore-s3 tikv_gc_life_time to 10m0s success
I0508 09:04:46.120219       1 manager.go:217] restore cluster anthony/demo-restore-s3 from succeed
I0508 09:04:46.143491       1 restore_status_updater.go:66] Restore: [anthony/demo-restore-s3] updated successfully
```

#### Check Database

You can check the data from the TiDB cluster after the restore completes successfully:

```
MySQL [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| INFORMATION_SCHEMA |
| METRICS_SCHEMA     |
| PERFORMANCE_SCHEMA |
| mysql              |
| sbtest             |
| test               |
+--------------------+
6 rows in set (0.00 sec)

MySQL [(none)]> use sbtest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MySQL [sbtest]> show tables;
+------------------+
| Tables_in_sbtest |
+------------------+
| sbtest1          |
| sbtest2          |
| sbtest3          |
| sbtest4          |
| sbtest5          |
| sbtest6          |
| sbtest7          |
| sbtest8          |
+------------------+
8 rows in set (0.00 sec)

MySQL [sbtest]> select count(*) from sbtest1;
+----------+
| count(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.01 sec)
```
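Row counts are a quick sanity check. For a stronger integrity check, TiDB's `ADMIN CHECKSUM TABLE` statement computes a table checksum; BR already verifies checksums during restore (see "restore checksum" in the log above), but you can run it manually as an extra check. A minimal sketch, assuming the port-forward from the connection sketch above is still running:

```{.bash .copyable}
# Compute a checksum of a restored table via the forwarded TiDB port.
mysql -h 127.0.0.1 -P 4000 -u root -p -e "ADMIN CHECKSUM TABLE sbtest.sbtest1;"
```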