# Hybrid Transactional and Analytical Processing
> - **Objective:** Learn to deploy TiFlash in a TiDB cluster on AWS (with Kubernetes)
> - **Prerequisites:**
>     - Background knowledge of TiDB components
>     - Background knowledge of Kubernetes and TiDB Operator
>     - Background knowledge of [TiFlash](https://pingcap.com/docs/stable/tiflash/tiflash-overview/)
> - **Optionality:** Required
> - **Estimated time:** 30 mins
TiFlash is the key component that makes TiDB a Hybrid Transactional and Analytical Processing (HTAP) database. As a columnar storage extension of TiKV, TiFlash provides both a good isolation level and a strong consistency guarantee.
In TiFlash, the columnar replicas are asynchronously replicated according to the Raft consensus algorithm. When these replicas are read, the Snapshot Isolation level of consistency is achieved by validating the Raft index and using multi-version concurrency control (MVCC).
For more information on TiFlash, refer to [TiFlash Overview](https://pingcap.com/docs/stable/tiflash/tiflash-overview/).
## Prepare
This document assumes that you have a TiDB cluster deployed in Kubernetes and data available in the TiDB cluster.
To deploy a TiDB cluster in AWS EKS, you can follow the instructions in [Deploy a TiDB Cluster](/kGIwMT8_QceY7C5_XExngg).
To generate data with sysbench, you can follow the instructions in [Run Sysbench](/0RpTgviPTfShBTDoEBhPfw). Alternatively, you can create your own table and ingest data.
## Provision TiFlash Nodes
Before deploying TiFlash, you need to provision TiFlash nodes. To do that, modify `variables.tf` to make the following changes:
```
create_tiflash_node_pool = true
cluster_tiflash_count = 1
cluster_tiflash_instance_type = "i3.4xlarge"
```
To apply the changes, you can run:
```
$ terraform apply
```
It might take 10 minutes or more to finish the process.
## Deploy TiFlash
To deploy TiFlash, you can edit the `TidbCluster` CR:
```
$ kubectl edit tc ${cluster_name} -n ${cluster_namespace}
```
In the editor that opens, add the TiFlash specification:
```
spec:
  tiflash:
    baseImage: pingcap/tiflash
    maxFailoverCount: 3
    replicas: 1
    storageClaims:
    - resources:
        requests:
          storage: 100Gi
      storageClassName: ebs-gp2
```
Once you have saved the changes, TiDB Operator starts to deploy TiFlash. You can use the following command to observe the status of the TiFlash pod:
```
$ kubectl get pod -n ${cluster_namespace} --watch
NAME                              READY   STATUS    RESTARTS   AGE
basic-discovery-6cd9cc794-vn7l6   1/1     Running   0          91m
basic-pd-0                        1/1     Running   0          91m
basic-tidb-0                      2/2     Running   0          89m
basic-tiflash-0                   5/5     Running   0          13m
basic-tikv-0                      1/1     Running   0          90m
basic-tikv-1                      1/1     Running   0          90m
basic-tikv-2                      1/1     Running   0          90m
```
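If you prefer a scripted check over watching the output by eye, the readiness condition shown above (all containers ready and status `Running`) can be sketched as a small shell helper. The function name `tiflash_pods_ready` is hypothetical, and the sketch assumes TiFlash pod names end in `-tiflash-<n>`, as in the sample output:

```shell
# Hypothetical helper: reads `kubectl get pod` output on stdin and succeeds
# only if at least one pod named "*-tiflash-<n>" is listed and every such pod
# is Running with all of its containers ready (READY column "n/n").
tiflash_pods_ready() {
  awk '
    $1 ~ /-tiflash-[0-9]+$/ {
      seen = 1
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Usage sketch (assumes kubectl is configured for the cluster):
#   kubectl get pod -n "${cluster_namespace}" | tiflash_pods_ready \
#     && echo "TiFlash is ready"
```

The helper only inspects text, so it can be tried against saved output before pointing it at a live cluster.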
## Add TiFlash Replica
After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated:
```
MySQL [sbtest]> ALTER TABLE `sbtest`.`sbtest1` SET TIFLASH REPLICA 1;
Query OK, 0 rows affected (0.10 sec)
```
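Sysbench typically creates several tables named `sbtest1`, `sbtest2`, and so on, and each one needs its own `ALTER TABLE` statement. As a minimal sketch, the statements can be generated with a shell helper. The function name `tiflash_replica_ddl` and the `tidb_host` variable in the usage note are hypothetical:

```shell
# Hypothetical helper: print one ALTER TABLE statement per sysbench table.
# Arguments: database name, number of tables, TiFlash replica count.
tiflash_replica_ddl() (
  db="$1"
  tables="$2"
  replicas="$3"
  i=1
  while [ "$i" -le "$tables" ]; do
    printf 'ALTER TABLE `%s`.`sbtest%d` SET TIFLASH REPLICA %d;\n' \
      "$db" "$i" "$replicas"
    i=$((i + 1))
  done
)

# Usage sketch (assumes a mysql client that can reach TiDB on port 4000):
#   tiflash_replica_ddl sbtest 4 1 | mysql -h "${tidb_host}" -P 4000 -u root
```

Printing the statements first, rather than executing them directly, lets you review the DDL before it runs.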
You can check the status of the TiFlash replicas of a specific table using the following statement:
```
SELECT * FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '<db_name>' AND TABLE_NAME = '<table_name>';
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_ID | REPLICA_COUNT | LOCATION_LABELS | AVAILABLE | PROGRESS |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
| sbtest       | sbtest1    |       89 |             1 |                 |         1 |        1 |
+--------------+------------+----------+---------------+-----------------+-----------+----------+
1 row in set (0.01 sec)
```
In the result of the above statement:
- The `AVAILABLE` column indicates whether the TiFlash replicas of this table are available for queries. `1` means available and `0` means unavailable. If you use DDL statements to modify the number of replicas, the replication status is recalculated.
- The `PROGRESS` column indicates the progress of the replication. The value is between `0.0` and `1.0`. `1.0` means at least one replica is replicated.
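The two columns can be combined into a single readiness check when you want to wait for replication from a script. The following is a minimal sketch. The function name `replica_is_ready` and the `tidb_host` variable are hypothetical, and the usage note assumes a `mysql` client that can reach TiDB:

```shell
# Hypothetical helper: given the AVAILABLE and PROGRESS values of one row,
# succeed only when the replica is available (1) and replication has
# reached 1.0.
replica_is_ready() {
  awk -v a="$1" -v p="$2" 'BEGIN { exit !(a == 1 && p >= 1) }'
}

# Usage sketch (-N drops the header row, -B prints tab-separated values):
#   row=$(mysql -h "${tidb_host}" -P 4000 -u root -N -B -e \
#     "SELECT AVAILABLE, PROGRESS FROM information_schema.tiflash_replica
#      WHERE TABLE_SCHEMA = 'sbtest' AND TABLE_NAME = 'sbtest1'")
#   replica_is_ready $row && echo "replica ready"
```

An empty query result (no replica configured for the table) is treated as not ready.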
## Query with TiFlash
For tables with TiFlash replicas, the TiDB optimizer automatically determines whether to use TiFlash replicas based on cost estimation. For example:
```
MySQL [sbtest]> explain analyze select count(*) from sbtest.sbtest1;
+----------------------------+----------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------+----------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+------+
| StreamAgg_24 | 1.00 | 1 | root | | time:18.113388ms, loops:2 | funcs:count(Column#10)->Column#5 | 372 Bytes | N/A |
| └─TableReader_25 | 1.00 | 2 | root | | time:18.102075ms, loops:2, rpc num: 2, rpc max:14.679672ms, min:14.493199ms, avg:14.586435ms, p80:14.679672ms, p95:14.679672ms, proc keys max:0, p95:0 | data:StreamAgg_8 | 206 Bytes | N/A |
| └─StreamAgg_8 | 1.00 | 2 | cop[tiflash] | | proc max:0s, min:0s, p80:0s, p95:0s, iters:2, tasks:2 | funcs:count(1)->Column#10 | N/A | N/A |
| └─TableFullScan_22 | 10000.00 | 1000 | cop[tiflash] | table:sbtest1 | proc max:0s, min:0s, p80:0s, p95:0s, iters:1, tasks:2 | keep order:false, stats:pseudo | N/A | N/A |
+----------------------------+----------+---------+--------------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+------+
```
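In the plan above, the `task` column shows `cop[tiflash]` for the operators that ran on TiFlash. As a rough sketch, a script can check for that marker in saved `EXPLAIN` output. The function name `plan_uses_tiflash` and the file name in the usage note are hypothetical, and matching on the literal text `[tiflash]` is an assumption based on the sample output:

```shell
# Hypothetical helper: reads EXPLAIN or EXPLAIN ANALYZE output on stdin and
# succeeds if any line contains the literal text "[tiflash]".
plan_uses_tiflash() {
  grep -qF '[tiflash]'
}

# Usage sketch (plan.txt is a placeholder for saved EXPLAIN output):
#   plan_uses_tiflash < plan.txt && echo "query reads from TiFlash"
```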
You can find more details on using TiFlash in [Use TiFlash](https://pingcap.com/docs/stable/reference/tiflash/use-tiflash/#use-tiflash).
## Cleanup TiFlash
TODO