# Import & Export
> - **Objective:** Learn how to export data from and import data into a TiDB cluster on AWS (with Kubernetes)
> - **Prerequisites:**
> - Background knowledge of TiDB components
> - Background knowledge of Kubernetes and TiDB Operator
> - Background knowledge of [Mydumper & Lightning](https://pingcap.com/docs/dev/how-to/maintain/backup-and-restore/mydumper-lightning/)
> - AWS account
> - TiDB cluster on AWS
> - **Optionality:** This is required if you plan to go through the Binlog section next, because both need two TiDB clusters.
> - **Estimated time:** TBD
In this document, we will demonstrate how to use MyDumper and Lightning to export data from a TiDB cluster as SQL files and import those SQL files into another TiDB cluster.
## Prepare
### Prepare Data
> - **Optionality:** You can skip this section if you already have data in the TiDB cluster.
Prepare data using sysbench. Refer to [Run Sysbench](https://hackmd.io/0RpTgviPTfShBTDoEBhPfw#Sysbench). Save the following as a file named `config`:
```
mysql-host=${tidb_EXTERNAL-IP}
mysql-port=4000
mysql-user=root
mysql-db=sbtest
time=1200
threads=8
report-interval=10
db-driver=mysql
```
Prepare data:
```
$ sysbench --config-file=config oltp_point_select --tables=1 --table-size=1000 prepare
```
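To confirm the data was prepared, you can count the rows in the generated table. This is a quick check, reusing the `${tidb_EXTERNAL-IP}` placeholder from the config above:
```
$ mysql -h ${tidb_EXTERNAL-IP} -P 4000 -u root -e "SELECT COUNT(*) FROM sbtest.sbtest1;"
```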
### Deploy Downstream Cluster
Prepare another cluster to import the data into.
#### Provision Nodes
You can edit `clusters.tf` and add the following to provision machines for the downstream cluster.
```
module example-cluster2 {
  providers = {
    helm = "helm.eks"
  }
  source = "../modules/aws/tidb-cluster"

  eks                         = local.eks
  subnets                     = local.subnets
  region                      = var.region
  cluster_name                = "cluster-import"
  ssh_key_name                = module.key-pair.key_name
  pd_count                    = 1
  pd_instance_type            = "c5.large"
  tikv_count                  = 3
  tikv_instance_type          = "c5d.large"
  tidb_count                  = 1
  tidb_instance_type          = "c4.large"
  monitor_instance_type       = "c5.large"
  create_tidb_cluster_release = false
}
```
In the above configuration, we provision `1` PD instance, `3` TiKV instances, and `1` TiDB instance. You can modify the corresponding instance types and counts to match your needs.
To apply the change, execute the following two commands:
```
$ terraform init
```
```
$ terraform apply
```
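After `terraform apply` completes, a quick sanity check is to confirm that the new worker nodes for the downstream cluster have joined the EKS cluster (node names and counts depend on your environment):
```
$ kubectl get nodes -o wide
```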
#### Deploy TiDB Cluster
You can create the TidbCluster manifest for the downstream cluster by copying the example file:
```
$ cp manifests/db.yaml.example downstream_cluster.yaml
```
In `downstream_cluster.yaml`, you need to change `CLUSTER_NAME` to the same name as `cluster_name` in `clusters.tf`, and make sure the `replicas` in the `pd`, `tikv`, and `tidb` specs match `pd_count`, `tikv_count`, and `tidb_count` in `clusters.tf`.
To apply the change, execute the following two commands:
```
$ kubectl create namespace ${downstream_namespace}
$ kubectl create -f downstream_cluster.yaml -n ${downstream_namespace}
```
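It can take several minutes for the downstream cluster's components to start. You can watch the pods until they are all `Running`:
```
$ kubectl get pods -n ${downstream_namespace} --watch
```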
Now we have two TiDB clusters in two different namespaces.
### Grant AWS Account Permissions
Before you perform backup, AWS account permissions need to be granted to the Backup Custom Resource (CR) object. There are three methods to grant AWS account permissions:
- [Grant permissions by importing AccessKey and SecretKey](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#three-methods-to-grant-aws-account-permissions)
- [Grant permissions by associating IAM with Pod](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#grant-permissions-by-associating-iam-with-pod)
- [Grant permissions by associating IAM with ServiceAccount](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#grant-permissions-by-associating-iam-with-serviceaccount)
In this doc, we grant AWS account permissions by importing AccessKey and SecretKey.
---
**NOTE**
Granting permissions by associating IAM with Pod or by associating IAM with ServiceAccount is recommended in a production environment.
---
### Create S3 Bucket
You can skip this section if you already have an S3 bucket to store backups.
If you don't have one, create an S3 bucket in the same AWS region as your EKS cluster.
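For example, with the AWS CLI (a sketch; `${bucket}` and `${region}` are placeholders for your own values, and `--create-bucket-configuration` must be omitted when the region is `us-east-1`):
```
$ aws s3api create-bucket --bucket ${bucket} --region ${region} \
    --create-bucket-configuration LocationConstraint=${region}
```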
### Install RBAC
Download [backup-rbac.yaml](https://github.com/pingcap/tidb-operator/blob/master/manifests/backup/backup-rbac.yaml), and execute the following commands to create the role-based access control (RBAC) resources in both the upstream and downstream namespaces:
```
$ kubectl apply -f backup-rbac.yaml -n ${upstream_namespace}
$ kubectl apply -f backup-rbac.yaml -n ${downstream_namespace}
```
---
**NOTE**
The `backup-rbac.yaml` file is in the `manifests/backup` directory of the tidb-operator repository.
---
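To confirm the RBAC resources were created, you can list them in both namespaces (the resource names come from `backup-rbac.yaml`):
```
$ kubectl get serviceaccount,role,rolebinding -n ${upstream_namespace}
$ kubectl get serviceaccount,role,rolebinding -n ${downstream_namespace}
```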
### Create Secrets
#### Create s3-secret
TiDB Operator needs to access S3 when performing data import & export operations. To do that, create the `s3-secret` secret, which stores the credentials used to access S3, in both the upstream and downstream namespaces:
```
$ kubectl create secret generic s3-secret --from-literal=access_key=${aws_access_key} --from-literal=secret_key=${aws_secret_key} --namespace=${upstream_namespace}
secret/s3-secret created
```
```
$ kubectl create secret generic s3-secret --from-literal=access_key=${aws_access_key} --from-literal=secret_key=${aws_secret_key} --namespace=${downstream_namespace}
secret/s3-secret created
```
#### Create tidb-secret
TiDB Operator needs to access TiDB when performing data import & export operations. To do that, create a secret storing the password of the user account used to access the TiDB cluster, in both the upstream and downstream namespaces:
```
$ kubectl create secret generic export-secret --from-literal=password=${password} --namespace=${upstream_namespace}
secret/export-secret created
```
```
$ kubectl create secret generic import-secret --from-literal=password=${password} --namespace=${downstream_namespace}
secret/import-secret created
```
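Before continuing, you can confirm that the secrets exist in both namespaces:
```
$ kubectl get secret s3-secret export-secret -n ${upstream_namespace}
$ kubectl get secret s3-secret import-secret -n ${downstream_namespace}
```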
## Data Export
This section describes how to perform data export. We use a Backup Custom Resource (CR) to describe a data export; TiDB Operator performs the data export operation based on the specification in the Backup CR.
### Checksum Table
Before exporting, run a checksum on the upstream table and record the result so you can verify data consistency after the import:
```
mysql> admin checksum table sbtest.sbtest1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| sbtest | sbtest1 | xxx | xxx | xxx|
+---------+------------+--------------------+-----------+-------------+
```
### Configure Backup CR
The following is an example Backup CR:
```
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: export-to-s3
  namespace: ${upstream_namespace}
spec:
  from:
    host: ${cluster_name}-tidb
    port: ${TiDB_port}
    user: ${TiDB_user}
    secretName: export-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: ${region}
    bucket: ${bucket}
    prefix: ${prefix}
  storageClassName: local-storage
  storageSize: 10Gi
```
Replace the `${...}` values with the specific values for your environment and save the file as `export-aws-s3.yaml`.
### Perform Data Export
You can perform data export using the following command:
```
$ kubectl apply -f export-aws-s3.yaml
```
### Verify Data Export
You can use the following command to check the data export status:
```
$ kubectl get pod ${backup_pod} -n ${upstream_namespace} -o wide
```
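Since the export is described by a Backup CR, you can also inspect the CR itself, and check the log of the backup pod if the export appears stuck. This assumes the `backup` resource name registered by TiDB Operator's CRDs; `${backup_pod}` is the pod created for this Backup CR:
```
$ kubectl get backup -n ${upstream_namespace} -o wide
$ kubectl logs ${backup_pod} -n ${upstream_namespace}
```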
## Data Import
In this section, we will demonstrate how to use the data exported in the previous section to restore the database in the downstream cluster.
### Configure Restore CR
Create the Restore CR as below and save it as `import-aws-s3.yaml`:
```
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: import-from-s3
  namespace: ${downstream_namespace}
spec:
  backupType: full
  to:
    host: ${TiDB_external_ip_downstream_namespace}
    port: ${TiDB_port}
    user: ${TiDB_user}
    secretName: import-secret
  s3:
    provider: aws
    region: ${region}
    secretName: s3-secret
    path: s3://${backup_path}
  storageClassName: local-storage
  storageSize: 1Gi
```
### Perform Data Import
You can perform data import using the following command:
```
$ kubectl apply -f import-aws-s3.yaml
```
### Verify Data Import
You can use the following command to check the import status:
```
$ kubectl get po -n ${downstream_namespace} -o wide
```
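Similarly, you can inspect the Restore CR to check its status, assuming the `restore` resource name registered by TiDB Operator's CRDs:
```
$ kubectl get restore -n ${downstream_namespace} -o wide
```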
### Checksum Table After Import
Run the same checksum on the downstream cluster:
```
mysql> admin checksum table sbtest.sbtest1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| sbtest | sbtest1 | xxx | xxx | xxx|
+---------+------------+--------------------+-----------+-------------+
```
Compare this checksum with the one recorded before the export to confirm that the two clusters have the same data.
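For example, you can run the checksum against both clusters from the shell and compare the output. This reuses the upstream and downstream address placeholders from earlier and assumes the default TiDB port `4000`; add `-p` if the `root` user has a password:
```
# Upstream cluster
$ mysql -h ${tidb_EXTERNAL-IP} -P 4000 -u root -e "ADMIN CHECKSUM TABLE sbtest.sbtest1;"
# Downstream cluster
$ mysql -h ${TiDB_external_ip_downstream_namespace} -P 4000 -u root -e "ADMIN CHECKSUM TABLE sbtest.sbtest1;"
```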