# Change Data Capture
> - **Objective:** Learn to deploy TiCDC in a TiDB cluster on AWS (with Kubernetes)
> - **Prerequisites:**
> - Background knowledge of TiDB components
> - Background knowledge of Kubernetes and TiDB Operator
> - Background knowledge of [TiCDC](https://pingcap.com/docs/stable/ticdc/ticdc-overview/)
> - **Optionality:** Optional
> - **Estimated time:** 30 mins
## Deploy Downstream TiDB Cluster
> - **Optionality:** Optional
TODO: extract instructions on how to deploy a second TiDB cluster.
If you have a downstream cluster already deployed, you can skip this section.
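Until those instructions are extracted here, a minimal sketch of deploying a second cluster with TiDB Operator might look like the following; the namespace placeholder and the manifest file name `downstream-tidb-cluster.yaml` are hypothetical, not part of this guide:
```
# Hypothetical sketch: create a namespace for the downstream cluster and
# apply a TidbCluster manifest describing it.
$ kubectl create namespace ${downstream_namespace}
$ kubectl apply -f downstream-tidb-cluster.yaml -n ${downstream_namespace}
```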
## Provision TiCDC Nodes
In the Terraform scripts you used to provision the cluster on AWS, add the following variables to create a dedicated node pool for TiCDC:
```
variable "create_cdc_node_pool" {
  description = "whether to create a node pool for TiCDC"
  default     = true
}
variable "cluster_cdc_count" {
  default = 3
}
variable "cluster_cdc_instance_type" {
  default = "c5.2xlarge"
}
```
To apply the changes, you can run:
```
$ terraform apply
```
The process might take 10 minutes or more to finish.
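Alternatively, you can override the defaults on the command line without editing the variable definitions; a sketch using the variables defined above:
```
# Override the node pool variables for a single apply.
$ terraform apply -var "create_cdc_node_pool=true" -var "cluster_cdc_count=3" -var "cluster_cdc_instance_type=c5.2xlarge"
```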
## Deploy TiCDC
To deploy TiCDC, you can edit the `TidbCluster` CR:
```
$ kubectl edit tc ${upstream} -n ${upstream_namespace}
```
In the editor, add the TiCDC specification under `spec`:
```
  ticdc:
    baseImage: pingcap/ticdc
    replicas: 3
```
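If you prefer a non-interactive change, a merge patch achieves the same edit; this is a sketch, not the only way:
```
# Add the TiCDC spec to the TidbCluster CR with a merge patch.
$ kubectl patch tc ${upstream} -n ${upstream_namespace} --type merge \
    -p '{"spec":{"ticdc":{"baseImage":"pingcap/ticdc","replicas":3}}}'
```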
Once you have saved the changes, TiDB Operator starts deploying TiCDC. You can use the following command to check the status of the TiCDC pods:
```
$ kubectl get pod -n ${upstream_namespace}
NAME READY STATUS RESTARTS AGE
basic-discovery-6bb656bfd-sps8z 1/1 Running 0 4h7m
basic-pd-0 1/1 Running 0 4h7m
basic-pd-1 1/1 Running 0 4h7m
basic-pd-2 1/1 Running 2 4h7m
basic-ticdc-0 1/1 Running 0 3h15m
basic-ticdc-1 1/1 Running 0 3h15m
basic-ticdc-2 1/1 Running 0 3h15m
basic-tidb-0 2/2 Running 0 4h6m
basic-tidb-1 2/2 Running 0 4h6m
basic-tikv-0 1/1 Running 0 4h7m
basic-tikv-1 1/1 Running 0 4h7m
basic-tikv-2 1/1 Running 0 4h7m
```
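To watch only the TiCDC pods until all replicas are ready, you can filter by label; this assumes TiDB Operator's usual `app.kubernetes.io/component` labeling:
```
# Watch only the TiCDC pods (assumes the standard component label).
$ kubectl get pod -n ${upstream_namespace} -l app.kubernetes.io/component=ticdc -w
```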
You can use the following command to check the status of the TiCDC service:
```
$ kubectl get svc -n ${upstream_namespace}
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
basic-discovery ClusterIP 10.108.13.111 <none> 10261/TCP 3h37m
basic-pd ClusterIP 10.103.226.105 <none> 2379/TCP 3h37m
basic-pd-peer ClusterIP None <none> 2380/TCP 3h37m
basic-ticdc-peer ClusterIP None <none> 8301/TCP 165m
basic-tidb ClusterIP 10.108.186.92 <none> 4000/TCP,10080/TCP 3h35m
basic-tidb-peer ClusterIP None <none> 10080/TCP 3h35m
basic-tikv-peer ClusterIP None <none> 20160/TCP 3h36m
```
Take note of the `ClusterIP` of `basic-pd` and `basic-tidb`; TiCDC will use them when you create the changefeed.
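Instead of copying the addresses by hand, you can capture them with a JSONPath query; the shell variable names below are illustrative stand-ins for the `${pd_CLUSTER-IP}` and `${tidb_CLUSTER-IP}` placeholders used later:
```
# Illustrative: store the ClusterIPs of the PD and TiDB services.
$ PD_CLUSTER_IP=$(kubectl get svc basic-pd -n ${upstream_namespace} -o jsonpath='{.spec.clusterIP}')
$ TIDB_CLUSTER_IP=$(kubectl get svc basic-tidb -n ${upstream_namespace} -o jsonpath='{.spec.clusterIP}')
```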
## Create Changefeed
To create a changefeed, first log in to one of the TiCDC pods:
```
$ kubectl exec -it basic-ticdc-0 -n ${upstream_namespace} -- sh
```
Inside the pod, you can first inspect the TiCDC cluster:
```
$ /cdc cli capture list --pd="http://${pd_CLUSTER-IP}:2379"
[
{
"id": "391d4695-a4fb-456a-b800-5a07fb1bc9d6",
"is-owner": false,
"address": "basic-ticdc-0.basic-ticdc-peer.demo.svc:8301"
},
{
"id": "659b88a5-0656-47bf-997f-f47956ae9e1e",
"is-owner": true,
"address": "basic-ticdc-2.basic-ticdc-peer.demo.svc:8301"
},
{
"id": "c83b6c55-8293-4613-9f49-73c6142abc75",
"is-owner": false,
"address": "basic-ticdc-1.basic-ticdc-peer.demo.svc:8301"
}
]
```
To create a changefeed, you can execute the following command:
```
$ /cdc cli changefeed create --sink-uri="mysql://root:@${tidb_CLUSTER-IP}:4000/" --pd="http://${pd_CLUSTER-IP}:2379"
Create changefeed successfully!
ID: 145ee6dd-1220-43f2-8d0b-423ab175944f
Info: {"sink-uri":"mysql://root:@10.104.118.45:4000/","opts":{},"create-time":"2020-05-30T19:34:11.4398499Z","start-ts":417036304749166593,"target-ts":0,"admin-job-type":0,"sort-engine":"memory","sort-dir":".","config":{"case-sensitive":true,"filter":{"ignore-txn-start-ts":null,"ddl-white-list":null},"mounter":{"worker-num":16},"sink":{"dispatch-rules":null},"cyclic-replication":{"enable":false,"replica-id":0,"filter-replica-ids":null,"id-buckets":0,"sync-ddl":false}}}
```
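You can also list the changefeeds to confirm that the new one is registered:
```
$ /cdc cli changefeed list --pd="http://${pd_CLUSTER-IP}:2379"
```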
You can check the replication processors currently in progress:
```
$ /cdc cli processor list --pd="http://${pd_CLUSTER-IP}:2379"
[
{
"changefeed-id": "145ee6dd-1220-43f2-8d0b-423ab175944f",
"capture-id": "e2692613-9aaf-408e-8718-3d710fd2117e"
}
]
```
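For more detail on a single changefeed, such as its checkpoint, you can query it by ID:
```
$ /cdc cli changefeed query --changefeed-id=145ee6dd-1220-43f2-8d0b-423ab175944f --pd="http://${pd_CLUSTER-IP}:2379"
```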
## Run Sysbench
It is recommended to explore TiCDC with an empty database.
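The configuration below targets a database named `cdc`, so create it on the upstream cluster first if it does not exist:
```
$ mysql -h ${upstream_tidb_EXTERNAL-IP} -P 4000 -u root -e "CREATE DATABASE IF NOT EXISTS cdc;"
```
Save the following sysbench configuration to a file named `config`: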
```
mysql-host=${upstream_tidb_EXTERNAL-IP}
mysql-port=4000
mysql-user=root
mysql-db=cdc
time=1200
threads=8
report-interval=10
db-driver=mysql
```
To prepare data, you can run the following command:
```
$ sysbench --config-file=config oltp_point_select --tables=1 --table-size=1000 prepare
```
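The `prepare` step itself generates writes that TiCDC replicates. To keep write traffic flowing for the configured `time=1200` seconds, you can optionally run one of sysbench's bundled write workloads, such as `oltp_insert` (a sketch):
```
# Optional: generate ongoing writes for TiCDC to replicate.
$ sysbench --config-file=config oltp_insert --tables=1 --table-size=1000 run
```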
## Verify Data
### Verify Data is Synced
You can get the checksum of the `cdc.sbtest1` table in both the upstream and downstream TiDB clusters:
```
$ mysql -h ${upstream_tidb_EXTERNAL-IP} -P 4000 -u root
```
```
mysql> admin checksum table cdc.sbtest1;
```
```
$ mysql -h ${downstream_tidb_EXTERNAL-IP} -P 4000 -u root
```
```
mysql> admin checksum table cdc.sbtest1;
```
The checksum values should match. You can run SQL queries for further data verification.
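For example, a quick row-count comparison, run against both clusters (an illustrative query):
```
mysql> select count(*) from cdc.sbtest1;
```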
## Cleanup
### Remove Changefeed
Log in to one of the TiCDC pods again:
```
$ kubectl exec -it basic-ticdc-0 -n ${upstream_namespace} -- sh
```
Inside the pod, remove the changefeed:
```
$ /cdc cli changefeed remove --changefeed-id=145ee6dd-1220-43f2-8d0b-423ab175944f --pd="http://${pd_CLUSTER-IP}:2379"
```
Check that the removal succeeded:
```
$ /cdc cli processor list --pd="http://${pd_CLUSTER-IP}:2379"
[]
```
### Remove TiCDC in TidbCluster CR
You can remove TiCDC from the `TidbCluster` CR by deleting the `ticdc` section you added earlier:
```
$ kubectl edit tc ${upstream} -n ${upstream_namespace}
```
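Alternatively, a JSON patch removes the `ticdc` section non-interactively; a sketch:
```
# Remove the TiCDC spec from the TidbCluster CR with a JSON patch.
$ kubectl patch tc ${upstream} -n ${upstream_namespace} --type json \
    -p '[{"op":"remove","path":"/spec/ticdc"}]'
```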
### Delete TiCDC StatefulSet
After that, you can delete the TiCDC StatefulSet. First, list the StatefulSets in the namespace:
```
$ kubectl get sts -n ${upstream_namespace}
NAME READY AGE
basic-pd 0/3 2d12h
basic-ticdc 0/3 2d11h
basic-tidb 0/2 2d12h
basic-tikv 0/3 2d12h
```
```
$ kubectl delete sts basic-ticdc -n ${upstream_namespace}
statefulset.apps "basic-ticdc" deleted
```
You can verify that the StatefulSet is successfully deleted:
```
$ kubectl get pod -n ${upstream_namespace}
```
#### Troubleshooting
If the TiCDC pods are stuck in the `Terminating` state, you can force delete them:
```
$ kubectl delete pod basic-ticdc-2 -n ${upstream_namespace} --force --grace-period=0
```