# Import & Export
> - **Objective:** Learn how to export data from and import data into a TiDB cluster on AWS (with Kubernetes)
> - **Prerequisites:**
> - Background knowledge of TiDB components
> - Background knowledge of Kubernetes and TiDB Operator
> - Background knowledge of [Mydumper & Lightning](https://pingcap.com/docs/dev/how-to/maintain/backup-and-restore/mydumper-lightning/)
> - AWS account
> - TiDB cluster on AWS
> - **Optionality:** This is required if you plan to go through the Binlog section next, because both need two TiDB clusters.
> - **Estimated time:** TBD
In this document, we will demonstrate how to use MyDumper and Lightning to export data from a TiDB cluster as SQL files and import those SQL files into another TiDB cluster.
## Prepare
### Prepare Data
> - **Optionality:** You can skip this section if you already have data in the TiDB cluster.
Prepare data using sysbench. Refer to [Run Sysbench](https://hackmd.io/0RpTgviPTfShBTDoEBhPfw#Sysbench). Save the following as a file named `config`:
```
mysql-host=${tidb_EXTERNAL-IP}
mysql-port=4000
mysql-user=root
mysql-db=sbtest
time=1200
threads=8
report-interval=10
db-driver=mysql
```
Prepare data:
```
$ sysbench --config-file=config oltp_point_select --tables=1 --table-size=1000 prepare
```
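To confirm the data was prepared, you can count the rows in the generated table. This is a quick check, reusing the `${tidb_EXTERNAL-IP}` placeholder from the config above:
```
$ mysql -h ${tidb_EXTERNAL-IP} -P 4000 -u root -e "SELECT COUNT(*) FROM sbtest.sbtest1;"
```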
### Deploy Downstream Cluster
Prepare another cluster to import the data into.
#### Provision Nodes
You can edit `clusters.tf` and add the following to provision machines for the downstream cluster.
```
module example-cluster2 {
  providers = {
    helm = "helm.eks"
  }
  source = "../modules/aws/tidb-cluster"

  eks                         = local.eks
  subnets                     = local.subnets
  region                      = var.region
  cluster_name                = "cluster-import"
  ssh_key_name                = module.key-pair.key_name
  pd_count                    = 1
  pd_instance_type            = "c5.large"
  tikv_count                  = 3
  tikv_instance_type          = "c5d.large"
  tidb_count                  = 1
  tidb_instance_type          = "c4.large"
  monitor_instance_type       = "c5.large"
  create_tidb_cluster_release = false
}
```
In the above configuration, we provision `1` PD instance, `3` TiKV instances, and `1` TiDB instance. You can modify the corresponding instance types and counts to match your needs.
To apply the change, execute the following two commands:
```
$ terraform init
```
```
$ terraform apply
```
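After `terraform apply` completes, a quick sanity check is to confirm that the new worker nodes for the downstream cluster have joined the EKS cluster (node names and counts depend on your environment):
```
$ kubectl get nodes -o wide
```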
#### Deploy TiDB Cluster
You can create the TidbCluster manifest for the downstream cluster by copying the example file:
```
$ cp manifests/db.yaml.example downstream_cluster.yaml
```
In `downstream_cluster.yaml`, you need to change `CLUSTER_NAME` to the same name as `cluster_name` in `clusters.tf`, and make sure the `replicas` in the `pd`, `tikv`, and `tidb` specs match `pd_count`, `tikv_count`, and `tidb_count` in `clusters.tf`.
To apply the change, execute the following two commands:
```
$ kubectl create namespace ${downstream_namespace}
$ kubectl create -f downstream_cluster.yaml -n ${downstream_namespace}
```
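It can take several minutes for the downstream cluster's components to start. You can watch the pods until they are all `Running`:
```
$ kubectl get pods -n ${downstream_namespace} --watch
```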
Now we have two TiDB clusters in two different namespaces.
### Grant AWS Account Permissions
Before you perform backup, AWS account permissions need to be granted to the Backup Custom Resource (CR) object. There are three methods to grant AWS account permissions:
- [Grant permissions by importing AccessKey and SecretKey](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#three-methods-to-grant-aws-account-permissions)
- [Grant permissions by associating IAM with Pod](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#grant-permissions-by-associating-iam-with-pod)
- [Grant permissions by associating IAM with ServiceAccount](https://pingcap.com/docs/tidb-in-kubernetes/stable/backup-to-aws-s3-using-br/#grant-permissions-by-associating-iam-with-serviceaccount)
In this doc, we grant AWS account permissions by importing AccessKey and SecretKey.
---
**NOTE**
Granting permissions by associating IAM with Pod or by associating IAM with ServiceAccount is recommended in a production environment.
---
### Create S3 Bucket
You can skip this section if you already have an S3 bucket to store backups.
If you don't have one, create an S3 bucket in the same AWS region as your EKS cluster.
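For example, with the AWS CLI (a sketch; `${bucket}` and `${region}` are placeholders for your own values, and `--create-bucket-configuration` must be omitted when the region is `us-east-1`):
```
$ aws s3api create-bucket --bucket ${bucket} --region ${region} \
    --create-bucket-configuration LocationConstraint=${region}
```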
### Install RBAC
Download [backup-rbac.yaml](https://github.com/pingcap/tidb-operator/blob/master/manifests/backup/backup-rbac.yaml), and execute the following commands to create the role-based access control (RBAC) resources in both the upstream and downstream namespaces:
```
$ kubectl apply -f backup-rbac.yaml -n ${upstream_namespace}
$ kubectl apply -f backup-rbac.yaml -n ${downstream_namespace}
```
---
**NOTE**
The `backup-rbac.yaml` file is in the `manifests/backup` directory of the tidb-operator repository.
---
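To confirm the RBAC resources were created, you can list them in both namespaces (the resource names come from `backup-rbac.yaml`):
```
$ kubectl get serviceaccount,role,rolebinding -n ${upstream_namespace}
$ kubectl get serviceaccount,role,rolebinding -n ${downstream_namespace}
```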
### Create Secrets
#### Create s3-secret
TiDB Operator needs to access S3 when performing data import & export operations. To do that, create the `s3-secret` secret, which stores the credentials used to access S3, in both the upstream and downstream namespaces:
```
$ kubectl create secret generic s3-secret --from-literal=access_key=${aws_access_key} --from-literal=secret_key=${aws_secret_key} --namespace=${upstream_namespace}
secret/s3-secret created
```
```
$ kubectl create secret generic s3-secret --from-literal=access_key=${aws_access_key} --from-literal=secret_key=${aws_secret_key} --namespace=${downstream_namespace}
secret/s3-secret created
```
#### Create tidb-secret
TiDB Operator needs to access TiDB when performing data import & export operations. To do that, create a secret storing the password of the user account used to access the TiDB cluster, in both the upstream and downstream namespaces:
```
$ kubectl create secret generic export-secret --from-literal=password=${password} --namespace=${upstream_namespace}
secret/export-secret created
```
```
$ kubectl create secret generic import-secret --from-literal=password=${password} --namespace=${downstream_namespace}
secret/import-secret created
```
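Before continuing, you can confirm that the secrets exist in both namespaces:
```
$ kubectl get secret s3-secret export-secret -n ${upstream_namespace}
$ kubectl get secret s3-secret import-secret -n ${downstream_namespace}
```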
## Data Export
This section describes how to perform data export. We use a Backup Custom Resource (CR) to describe a data export; TiDB Operator performs the data export operation based on the specification in the Backup CR.
### Checksum Table
Before exporting, run a checksum on the upstream table and record the result so you can verify data consistency after the import:
```
mysql> admin checksum table sbtest.sbtest1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| sbtest | sbtest1 | xxx | xxx | xxx|
+---------+------------+--------------------+-----------+-------------+
```
### Configure Backup CR
The following is an example Backup CR:
```
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: export-to-s3
  namespace: ${upstream_namespace}
spec:
  from:
    host: ${cluster_name}-tidb
    port: ${TiDB_port}
    user: ${TiDB_user}
    secretName: export-secret
  s3:
    provider: aws
    secretName: s3-secret
    region: ${region}
    bucket: ${bucket}
    prefix: ${prefix}
  storageClassName: local-storage
  storageSize: 10Gi
```
Replace the `${...}` values with the specific values for your environment and save the file as `export-aws-s3.yaml`.
### Perform Data Export
You can perform data export using the following command:
```
$ kubectl apply -f export-aws-s3.yaml
```
### Verify Data Export
You can use the following command to check the data export status:
```
$ kubectl get pod ${backup_pod} -n ${upstream_namespace} -o wide
```
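Since the export is described by a Backup CR, you can also inspect the CR itself, and check the log of the backup pod if the export appears stuck. This assumes the `backup` resource name registered by TiDB Operator's CRDs; `${backup_pod}` is the pod created for this Backup CR:
```
$ kubectl get backup -n ${upstream_namespace} -o wide
$ kubectl logs ${backup_pod} -n ${upstream_namespace}
```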
## Data Import
In this section, we will demonstrate how to use the data exported in the previous section to restore the database in the downstream cluster.
### Configure Restore CR
Create the Restore CR as below and save it as `import-aws-s3.yaml`:
```
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: import-from-s3
  namespace: ${downstream_namespace}
spec:
  backupType: full
  to:
    host: ${TiDB_external_ip_downstream_namespace}
    port: ${TiDB_port}
    user: ${TiDB_user}
    secretName: import-secret
  s3:
    provider: aws
    region: ${region}
    secretName: s3-secret
    path: s3://${backup_path}
  storageClassName: local-storage
  storageSize: 1Gi
```
### Perform Data Import
You can perform data import using the following command:
```
$ kubectl apply -f import-aws-s3.yaml
```
### Verify Data Import
You can use the following command to check the import status:
```
$ kubectl get po -n ${downstream_namespace} -o wide
```
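Similarly, you can inspect the Restore CR to check its status, assuming the `restore` resource name registered by TiDB Operator's CRDs:
```
$ kubectl get restore -n ${downstream_namespace} -o wide
```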
### Checksum Table After Import
Run the same checksum on the downstream cluster:
```
mysql> admin checksum table sbtest.sbtest1;
+---------+------------+--------------------+-----------+-------------+
| Db_name | Table_name | Checksum_crc64_xor | Total_kvs | Total_bytes |
+---------+------------+--------------------+-----------+-------------+
| sbtest | sbtest1 | xxx | xxx | xxx|
+---------+------------+--------------------+-----------+-------------+
```
Compare this checksum with the one recorded before the export to confirm that the two clusters have the same data.
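For example, you can run the checksum against both clusters from the shell and compare the output. This reuses the upstream and downstream address placeholders from earlier and assumes the default TiDB port `4000`; add `-p` if the `root` user has a password:
```
# Upstream cluster
$ mysql -h ${tidb_EXTERNAL-IP} -P 4000 -u root -e "ADMIN CHECKSUM TABLE sbtest.sbtest1;"
# Downstream cluster
$ mysql -h ${TiDB_external_ip_downstream_namespace} -P 4000 -u root -e "ADMIN CHECKSUM TABLE sbtest.sbtest1;"
```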