# TiDB Monitoring - Understanding Metrics Fundamentals

[TOC]

## Purpose of this Tutorial Guide

This tutorial guide is a set of hands-on labs for learning the fundamentals of TiDB metrics. Metrics for TiDB and its components are collected through Prometheus and displayed graphically in Grafana and the TiDB Dashboard. The goal is to guide you, step by step, through generating a variety of real-time workloads so that you become familiar with the fundamental metrics and understand how different workloads affect each component of TiDB, using both Grafana and the TiDB Dashboard.

The guide is a learning tool; it is not meant to be a definitive performance tuning guide. However, performance topics are discussed as they relate to the workloads you will run. Several benchmark utilities are used in these labs: Sysbench, TiUP Bench, DBT2, and YCSB. A short description of each utility is provided so that you understand what it does.

The guide consists of multiple articles, each dedicated to a specific topic. For the smoothest overall experience, the articles build on one another, so we suggest that you work through the guide sequentially in article order. Some articles are marked as Optional, meaning their content is unnecessary for some users and can be skipped without affecting subsequent chapters. For the sake of repeatability, we use a fixed TiDB version, a fixed tidb-operator version, and fixed tool versions throughout this guide.

By the end of this guide you will understand the fundamentals of TiDB metrics, how they relate to different workloads, and how the TiDB components work and affect each other. We introduce the fundamental principles of TiDB monitoring metrics using hands-on examples that compare and contrast how varying workloads affect different components of TiDB, and we cover the basic metrics you need to understand.

## TiDB Architecture - The Basics

Before we get to the fun stuff, standing up TiDB and running the workloads, it is worth spending a few moments on the TiDB architecture: what the components are and what they do. As a distributed database, TiDB is designed to consist of multiple components. These components communicate with each other and form a complete TiDB system. The architecture is as follows:

![](https://i.imgur.com/gxMONoV.png)

### TiDB server

The TiDB server is a stateless SQL layer that exposes the MySQL protocol connection endpoint to the outside. It receives SQL requests, performs SQL parsing and optimization, and ultimately generates a distributed execution plan. It is horizontally scalable and provides a unified interface to the outside through load balancing components such as Linux Virtual Server (LVS), HAProxy, or F5. It does not store data; it is only for computing and SQL analysis, transmitting actual data read requests to TiKV nodes (or TiFlash nodes).

### Placement Driver (PD) server

The PD server is the metadata managing component of the entire cluster. It stores the metadata of the real-time data distribution of every single TiKV node and the topology of the entire TiDB cluster, provides the TiDB Dashboard management UI, and allocates transaction IDs to distributed transactions.
The PD server is "the brain" of the entire TiDB cluster: it not only stores the cluster's metadata, but also sends data scheduling commands to specific TiKV nodes according to the data distribution state reported by TiKV nodes in real time. The PD server consists of at least three nodes for high availability, and it is recommended to deploy an odd number of PD nodes.

PD also provides the Timestamp Oracle (TSO). The timestamp oracle plays a significant role in the Percolator transaction model (in which TiDB has its roots): it is a server that hands out timestamps in strictly increasing order, a property required for correct operation of the snapshot isolation protocol. Since every transaction requires contacting the timestamp oracle twice, this service must scale well. The timestamp oracle periodically allocates a range of timestamps by writing the highest allocated timestamp to stable storage; it can then satisfy future requests from that allocated range strictly from memory. If the timestamp oracle restarts, the timestamps jump forward to the maximum allocated timestamp. Timestamps never go "backwards".

### Storage servers

#### TiKV server

The TiKV server is responsible for storing data. TiKV is a distributed, transactional key-value storage engine. The Region is the basic unit for storing data: each Region stores the data for a particular key range, a left-closed, right-open interval from StartKey to EndKey. Multiple Regions exist on each TiKV node. The TiKV APIs provide native support for distributed transactions at the key-value pair level and support the Snapshot Isolation level by default. This is the core of how TiDB supports distributed transactions at the SQL level. After processing SQL statements, the TiDB server converts the SQL execution plan into actual calls to the TiKV API, so the data is ultimately stored in TiKV. All data in TiKV is automatically maintained in multiple replicas (three by default), so TiKV has native high availability and supports automatic failover.

#### TiFlash server

The TiFlash server is a special type of storage server. Unlike ordinary TiKV nodes, TiFlash stores data by column and is mainly designed to accelerate analytical processing. In this tutorial guide we will not run workloads against the TiFlash server; that will be covered in another tutorial. We mention it here so you understand the complete TiDB platform.

## TiDB Storage

We'll introduce some design ideas and key concepts of TiKV at a high level.

![](https://i.imgur.com/chEaLm7.png)

### Key-Value pairs

The first thing to decide for a data storage system is the data storage model, that is, the form in which the data is saved. TiKV's choice is the Key-Value model, with an ordered traversal method. There are two key points to TiKV's data storage model:

- It is a huge Map, which stores Key-Value pairs.
- The Key-Value pairs in the Map are ordered by the keys' binary order, which means you can Seek to the position of a particular Key and then call the Next method to get the Key-Value pairs larger than this Key in increasing order.

Note that the KV storage model of TiKV described in this document has nothing to do with SQL tables. This document does not discuss any concepts related to SQL and only focuses on how to implement a high-performance, high-reliability, distributed Key-Value store such as TiKV.

### Local storage (RocksDB)

For any persistent storage engine, data is eventually saved on disk, and TiKV is no exception.
TiKV does not write data directly to disk; instead, it stores data in RocksDB, which is responsible for the actual data storage. The reason is that developing a standalone storage engine costs a lot, especially a high-performance engine that requires careful optimization. RocksDB is an excellent standalone storage engine open-sourced by Facebook, and it meets the various requirements TiKV has for a single engine. For our purposes, you can simply think of RocksDB as a single persistent Key-Value Map.

### Raft protocol

The implementation of TiKV also faces a more difficult problem: keeping data safe when a single machine fails. A simple way is to replicate data to multiple machines, so that even if one machine fails, the replicas on other machines are still available. In other words, you need a data replication scheme that is reliable, efficient, and able to handle a failed replica. All of this is made possible by the Raft algorithm. Raft is a consensus algorithm; this document only introduces it briefly. For more details, see "In Search of an Understandable Consensus Algorithm". Raft has several important features:

* Leader election
* Membership changes, such as adding replicas, deleting replicas, and transferring leaders
* Log replication

TiKV uses Raft to perform data replication. Each data change is recorded as a Raft log. Through Raft log replication, data is safely and reliably replicated to multiple nodes of the Raft group. Note that according to the Raft protocol, a successful write only requires the data to be replicated to a majority of the nodes.

![](https://i.imgur.com/Rb8Jocx.png)

In summary, TiKV stores data quickly on disk via the standalone RocksDB engine and replicates data to multiple machines via Raft to survive machine failure. Data is written through the Raft interface instead of directly to RocksDB. With Raft, TiKV becomes a distributed Key-Value store: even with a few machine failures, TiKV can automatically restore replicas via the native Raft protocol without impacting the application.

### Region

To make this easy to understand, let's assume that all data has only one replica. As mentioned earlier, TiKV can be regarded as a large, ordered KV Map, and data must be distributed across multiple machines in order to achieve horizontal scalability. For a KV system, there are two typical solutions for distributing data across multiple machines:

* Hash: hash the Key and select the corresponding storage node according to the hash value.
* Range: divide the key space into ranges, where a segment of consecutive keys is stored on one node.

TiKV chooses the second solution and divides the whole Key-Value space into a series of consecutive key segments. Each segment is called a Region. There is a size limit on the data each Region stores (96 MB by default, and the size is configurable). Each Region can be described by its StartKey and EndKey.

![](https://i.imgur.com/BHOlm2M.png)

Note that the Region here has nothing to do with a table in SQL. In this document, forget about SQL and focus on KV for now. After dividing data into Regions, TiKV performs two important tasks:

- Distributing data to all nodes in the cluster, using the Region as the basic unit, and doing its best to ensure that the number of Regions on each node is roughly similar.
- Performing Raft replication and membership management per Region.

These two tasks are very important and will be introduced one by one.
For the first task, data is divided into many Regions according to Key, and the data for each Region is stored on only one node (ignoring multiple replicas for now). The PD component is responsible for spreading Regions as evenly as possible across all nodes in the cluster. This scales the storage capacity horizontally (Regions on existing nodes are automatically scheduled to a newly added node) and achieves load balancing (one node will not end up with most of the data while the others have little). At the same time, to ensure that upper-layer clients can access the data they need, PD records the distribution of Regions across nodes; that is, given any Key, PD can tell which Region the Key belongs to and on which node that Region is placed.

For the second task, TiKV replicates data per Region, which means the data in one Region has multiple copies called "Replicas". The Replicas of a Region are stored on different nodes and form a Raft Group, which is kept consistent through the Raft algorithm. One of the Replicas serves as the Leader of the group and the others as Followers. By default, all reads and writes are processed through the Leader: reads are served by the Leader, and writes are replicated to the Followers. The following diagram shows the whole picture of Regions and Raft groups.

![](https://i.imgur.com/Asq00zc.png)

As we distribute and replicate data in Regions, we have a distributed Key-Value system that, to some extent, has disaster recovery capability. You no longer need to worry about capacity, or about disk failure causing data loss.

### MVCC

Many databases implement multi-version concurrency control (MVCC), and TiKV is no exception. Imagine two clients modifying the value of a Key at the same time. Without MVCC, the data would need to be locked, which in a distributed scenario might cause performance and deadlock problems. TiKV's MVCC implementation is achieved by appending a version number to the Key. In short, without MVCC, TiKV's data layout can be seen as:

![](https://i.imgur.com/0GEmf0V.png)

With MVCC, the key space of TiKV logically looks like this:

![](https://i.imgur.com/7DgpfwL.png)

Note that for multiple versions of the same Key, versions with larger numbers are placed first. When you look up a Value by Key + Version, the MVCC key can be constructed from the Key and the Version as Key_Version, and RocksDB can then directly locate the first position greater than or equal to this Key_Version.

### Distributed ACID transaction

TiDB's transaction model is inspired by Google's Percolator. It is mainly a decentralized two-phase commit (2PC) protocol with some practical optimizations. The model relies on a timestamp allocator to assign an increasing timestamp to each transaction. In this model, TiKV only locks data in the final two-phase commit stage, that is, when the client calls the Commit() function. To deal with lock contention when reading/writing data during 2PC, TiKV adds a simple Scheduler layer in the storage node that queues requests locally before returning to the client for retry, which reduces network overhead. The default isolation level of TiKV is Repeatable Read (snapshot isolation), and it exposes a lock API, used for implementing SSI (serializable snapshot isolation) for the client, such as SELECT ... FOR UPDATE in MySQL.
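To make the last point concrete, here is a minimal, hypothetical sketch of how that lock API surfaces at the SQL layer from a MySQL client. The table and values are made up purely for illustration; the point is only that `SELECT ... FOR UPDATE` locks the selected rows for the remainder of the transaction:

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE account (id BIGINT PRIMARY KEY, balance DECIMAL(10,2));

BEGIN;
-- Lock the row so concurrent writers must wait until this transaction ends.
SELECT balance FROM account WHERE id = 1 FOR UPDATE;
UPDATE account SET balance = balance - 100 WHERE id = 1;
COMMIT;
```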
## Putting It All Together - The Journey Of A Transaction

Now that we have a basic understanding of the components of TiDB, it will be helpful to "visually" see how a transaction makes its way through TiDB to read and write data.

### TiDB Performance Map

What is it?

- An introduction that clearly shows you the architecture of the TiDB database.
- A flow diagram that details how the modules and components interact with each other.
- A manual that gives tuning metrics to help you troubleshoot the bottleneck of the system.

The TiDB Performance Map makes understanding TiDB performance simple. Follow the link below:

https://asktug.com/_/tidb-performance-map/#/

During our step-by-step examples I'll discuss the core concepts of the flow through the important component processes. At this point, don't get too concerned about the details. You just need to understand the flow; it will become much clearer later.

## Building a TiDB Test Environment

You have several options for building a TiDB environment to execute the labs and run the workloads; your learning goals will drive your choice. macOS and Linux are supported in this guide.

### TiUP Playground - Appendix A

If you are interested in getting up quickly and starting the labs, the playground can be deployed in minutes. All components run as processes rather than as independent nodes in a cluster. It is probably the fastest way to get started and learn your way around the various Grafana and Dashboard panels before creating larger cluster environments. The same workloads are used across all of the test environments available to you.

### TiUP Cluster - Appendix B

A good choice if you want to run a true multi-node TiDB cluster. In this guide we use Vagrant to deploy the necessary nodes and TiDB topology. The setup usually takes up to 30 minutes depending on your access to resources. It is also a good choice if you are planning to do further testing of TiDB outside the scope of this guide.

### TiDB Operator - Appendix C

You may want to choose this option if Kubernetes is your platform of choice for deploying applications. The TiDB Operator can be deployed on Minikube, Kind, self-managed Kubernetes, AWS EKS, or Google GKE. It is also a good choice if you are planning to do further testing of TiDB outside the scope of this guide.

## Run your first workload with TiUP Bench - Understand Basic Metrics and Grafana Navigation

To run our first workload, it is assumed that you have selected where you are running TiDB, and installed and tested it as outlined in the appropriate appendix. The intent of this exercise is to learn, at a basic level, how to make your way around the TiDB Grafana metrics dashboards. I'll direct you to certain panels to see the various TiDB components in action. Don't be concerned with performance at this point; as I stated earlier, this is not a performance troubleshooting guide. I'll cover those aspects in the next published guide.

### Install TiUP Bench

We'll use TiUP Bench for our first few workload labs to establish a baseline and to understand a few simple Dashboard and Grafana panels. This will help you get an initial comfort level with where everything is and how to access it. We'll also take a look at some of the interaction between TiDB components. Appendix B has a complete description of TiUP Bench. Let's get started!

### Create a database for TiUP Bench to use

```shell
mysql -h 192.168.1.203 -u root -P 4000
```

Note: your IP address will most likely be different.
```sql
create database benchdb01;
```

### Create and load the tables

TiUP Bench automatically creates the tables and loads them with the following command. The load takes roughly 15 minutes depending on the resources you have allocated for the TiDB cluster:

```shell
tiup bench tpcc -H 192.168.1.203 -D benchdb01 --warehouses 4 --parts 4 prepare
```

Nine tables will be created:

```
MySQL [benchdb01]> show tables;
+---------------------+
| Tables_in_benchdb01 |
+---------------------+
| customer            |
| district            |
| history             |
| item                |
| new_order           |
| order_line          |
| orders              |
| stock               |
| warehouse           |
+---------------------+
```

### Run TiUP Bench workload

```shell
tiup bench tpcc -H 192.168.1.203 -D benchdb01 --threads 100 --time 10m --warehouses 4 --parts 4 run
```

The `--time` parameter limits how long the utility runs; in the example above it is set to 10 minutes. This should give you enough metrics to collect as you go through this guide. You can set it longer if you wish to collect metrics over a longer period, or simply re-run the utility.

#### Overview Dashboard

During routine operations, you can get an overview of the status of the components (PD, TiDB, TiKV) and of the entire cluster from the Overview dashboard, where the key metrics are displayed. The dashboard can be found at `http://<IP Address>:3000/`, where the IP address is that of the Grafana server. If you used TiUP Playground, it is the same address as TiDB.

- Service Port Status: shows the status of all components.
- PD: Region-related status and PD command durations.
- TiDB: OPS, QPS, connections, transactions, and connection status with TiKV and PD.
- TiKV: Regions, size, scheduler pending, coprocessor status.
- System Info: CPU, memory, IO, network.

#### Go to the TiDB Grafana Dashboard

Let's take a look at the TiDB Grafana dashboard. You can access it at `http://<Your IP Address for Grafana>:3000/`. After logging into Grafana, the Home page is shown by default.

![](https://i.imgur.com/1b6xHQ8.png)

Click the "Home" label, scroll down, and click the drop-down button; several dashboards are revealed. You should see your cluster name, similar to this:

![](https://i.imgur.com/J9F6Zjb.png)

As you read in the TiDB Architecture section, there are three main components of the TiDB platform: the TiDB server, the Placement Driver, and TiKV storage. Let's take a look at these components, and I'll discuss some of the key processes within them. The number of available dashboards and the numerous metrics displayed within each panel can be a bit overwhelming, so for now the intent is simply for you to grasp the key processes and metrics.

When you need to check the running status of TiDB, the Overview dashboard provides enough information for your reference. It is divided into five tabs in total.

#### Service Port Status

Shows the running status of all components. Green on the left indicates that a component is operating normally; red means the component's status is abnormal, and you can identify the specific failing component by name.

#### PD

PD mainly does management work, such as providing TSO (Timestamp Oracle) and Region-related management. In this tab you can see Region scheduling information, PD response information, and the status of TiKV.

#### TiDB

TiDB is the computing node of the TiDB cluster. This tab shows overall performance, such as OPS, QPS, TPS, and connection count, as well as the performance of interactions with PD and TiKV.
It also shows information about memory and locks.

#### TiKV

TiKV is the storage node. This tab displays storage-related performance and also includes coprocessor operating information.

#### System Info

Shows the running status at the operating system level: CPU, memory, IO, and network information.

![](https://i.imgur.com/uAvLQBn.png)

With the Overview expanded, the three TiDB components are shown. Let's start with System Info. Expand the System Info section.

![](https://i.imgur.com/y3us4JX.png)

![](https://i.imgur.com/A9T8TXj.png)

The first thing to pay attention to is the system-related monitoring; the purpose is to see whether system resources have reached a bottleneck. This monitoring is on the Grafana Overview page:

- If CPU usage exceeds 80%, the CPU has very likely reached a bottleneck. Since TiKV also involves IO operations, 75% may already be a bottleneck for TiKV nodes.
- The CPU load should not exceed the number of CPU cores.
- The memory usage of a TiKV node should not exceed 60%, leaving a portion of the memory for caching data before compression.
- The network traffic should not show full usage of the available bandwidth of the network interface.
- IO utilization is best kept below 80%.

Now let's take a look at the TiDB component.

![](https://i.imgur.com/1g03cDf.png)

![](https://i.imgur.com/01IBA4F.png)

Next we come to the Grafana TiDB page. For a transactional (TP) workload, the .99 latency should be less than 100 ms, and there should not be too many slow queries. Get token duration should not exceed 1 ms; otherwise, check whether the number of connections exceeds the `token-limit` parameter configured for TiDB.

#### Query Summary

- Duration: .99 latency should be less than 100 ms for an OLTP workload.
- Slow query: there should not be too many slow queries.
- Get token duration: preferably < 1 ms; otherwise, check whether the `token-limit` configuration is larger than the total number of connections.

#### Executor

- Parse duration: preferably < 10 ms
- Compile duration: preferably < 30 ms

#### KV Errors

- Lock Resolve OPS: preferably < 500 for both `expired` and `not_expired`; otherwise there may be too many conflicts.
- KV Backoff OPS: preferably < 500 for both `txnLockFast` and `txnLock`.
- PD TSO Wait Duration should be less than 5 ms; otherwise the pressure on TiDB may be relatively high.

You can scroll down to see the other metrics collected for the TiDB layer. Here are some of the key metrics:

#### Statement OPS

The number of SQL statements executed per second, counted by statement type: `SELECT`, `INSERT`, `UPDATE`, and so on.

#### Duration

The execution time.

1. The duration between the time the client's network request is sent to TiDB and the time the request is returned to the client after TiDB has executed it. In general, client requests are sent in the form of SQL statements; however, this duration can include the execution time of commands such as `COM_PING`, `COM_SLEEP`, `COM_STMT_FETCH`, and `COM_SEND_LONG_DATA`.
2. Because TiDB supports Multi-Query, a client can send multiple SQL statements at one time, such as `select 1; select 1; select 1;`. In this case, the total execution time of the query includes the execution time of all the SQL statements.

#### QPS By Instance

The QPS on each TiDB instance, classified according to the success or failure of command execution results.
#### Failed Query OPM

The statistics of error types (such as syntax errors and primary key conflicts) based on the errors that occur when executing SQL statements per second on each TiDB instance. The module in which the error occurs and the error code are included.

#### Connection Count

The number of connections to each TiDB instance.

#### Memory Usage

The memory usage statistics of each TiDB instance, divided into the memory occupied by the process and the memory allocated by Golang on the heap.

#### Transaction OPS

The number of transactions executed per second.

#### Transaction Duration

The execution time of a transaction.

#### KV Cmd OPS

The number of executed KV commands.

#### KV Cmd Duration 99

The execution time of KV commands (P99).

#### PD TSO OPS

The number of TSOs that TiDB obtains from PD per second.

#### PD TSO Wait Duration

The duration that TiDB waits for PD to return a TSO.

#### TiClient Region Error OPS

The number of Region-related errors returned by TiKV.

#### Lock Resolve OPS

The number of TiDB operations that resolve locks. When a TiDB read or write request encounters a lock, it tries to resolve the lock.

#### KV Backoff OPS

The number of errors returned by TiKV.

We'll now take a look at the Placement Driver and some key metrics. Select PD in the cluster overview dashboard.

![](https://i.imgur.com/dNSbFzS.png)

Expand the PD dashboard and navigate up and down to get familiar with it. As you become comfortable with the dashboards, I won't be supplying as many screenshots and callouts.

Let's take a look at the performance-related metrics in the Grafana PD page that a DBA needs to pay attention to. For PD, we are mainly concerned with the WAL fsync duration under the etcd server; this represents the delay of PD writing to disk, and if it is too high, it may cause the PD leader to switch. Another indicator that needs attention is Heartbeat: if this latency is relatively high, it means TiKV is sending too many heartbeats and PD can't keep up, so you need to find a way to reduce the number of Regions in the cluster.

#### etcd

- 99% WAL fsync duration: preferably < 5 ms

#### Heartbeat

- 99% Region heartbeat latency: preferably < 5 ms

![](https://i.imgur.com/1a3nYsA.png)

You can scroll down to see the metrics collected for the PD layer. Here are some of the key metrics:

#### PD role

The role of the current PD.

#### Storage capacity

The total storage capacity of the TiDB cluster.

#### Current storage size

The occupied storage capacity of the TiDB cluster, including the space occupied by TiKV replicas.

#### Normal stores

The number of nodes in the normal state.

#### Abnormal stores

The number of nodes in an abnormal state.

#### Number of Regions

The total number of Regions in the current cluster. Note that the number of Regions has nothing to do with the number of replicas.

#### 99% completed_cmds_duration_seconds

The 99th percentile duration to complete a pd-server request. It should be less than 5 ms.

#### Handle_requests_duration_seconds

The network duration of a PD request.

#### Region health

The state of each Region. Generally, the number of pending peers is less than 100, and the number of missing peers should not stay greater than `0`.

#### Hot write Region's leader distribution

The total number of leaders that are write hotspots on each TiKV instance.

#### Hot read Region's leader distribution

The total number of leaders that are read hotspots on each TiKV instance.
#### Region heartbeat report

The count of heartbeats reported to PD per instance.

#### 99% Region heartbeat latency

The heartbeat latency per TiKV instance (P99).

Finally, let's take a look at TiKV storage and some key metrics. We'll select TiKV in the cluster overview dashboard.

![](https://i.imgur.com/mlcvdP5.png)

You can scroll down to see the metrics collected for the TiKV layer. Let's take a look at the performance-related metrics in the Grafana TiKV page that DBAs need to pay attention to:

- The Region count of a single TiKV node should not exceed 50K. If it does, enable Region merge to reduce the number of Regions, or enable the hibernate region parameter to reduce Region overhead.
- The .99 gRPC message duration should be less than 100 ms (except for complex coprocessor requests).
- In addition, pay attention to the CPU usage of each module of TiKV to see if it is close to a bottleneck.

### Cluster

- Region: preferably < 50K; otherwise the Region merge feature and the hibernate region feature are necessary.

#### gRPC

- .99 gRPC message duration: preferably < 100 ms (except complex coprocessor requests)

#### Thread CPU

- Raft store CPU: preferably < 75% * `raftstore.store-pool-size`
- Async apply CPU: preferably < 75% * `raftstore.apply-pool-size`
- Scheduler worker CPU: preferably < 80% * `storage.scheduler-worker-pool-size`
- gRPC poll CPU: preferably < 80% * `server.grpc-concurrency`
- Unified read pool CPU: preferably < 80% * `readpool.unified.max-thread-count`
- Storage ReadPool CPU: preferably < 80% * `readpool.storage.normal-concurrency`

![](https://i.imgur.com/FQ2jPFN.png)

#### Some Key Points

Raft IO includes the time spent writing the Raft log, committing the Raft log, and applying the Raft log. For Raft Propose, pay attention to the two wait durations: if the Propose wait duration is high, the raftstore thread may be stuck, for example because of a high append log duration or a very busy raftstore CPU; if the Apply wait duration is high, the apply module may be busy, either because writing to RocksDB is slow or because the apply pool is not large enough. Check whether you need to increase the `raftstore.apply-pool-size` parameter. It is best to avoid "server is busy" errors.

Here are some of the other key metrics:

#### leader

The number of leaders on each TiKV node.

#### region

The number of Regions on each TiKV node.

#### CPU

The CPU usage ratio on each TiKV node.

#### Memory

The memory usage on each TiKV node.

#### store size

The size of the storage space used by each TiKV instance.

#### cf size

The size of each column family (CF for short).

#### channel full

The number of "channel full" errors on each TiKV instance.

#### server report failures

The number of error messages reported by each TiKV instance.

#### scheduler pending commands

The number of pending commands on each TiKV instance.

#### coprocessor executor count

The number of coprocessor operations received by TiKV per second. Each type of coprocessor is counted separately.

#### coprocessor request duration

The time consumed to process coprocessor read requests.

#### raft store CPU

The CPU usage ratio of the raftstore thread. The default number of threads is 2 (configured by `raftstore.store-pool-size`). A value of over 80% for a single thread indicates that the CPU usage ratio is very high.

#### Coprocessor CPU

The CPU usage ratio of the coprocessor thread.

### Your first learning point

You now understand the three components of TiDB and have seen some of the key metrics for those components.
TiDB is a distributed database. From your reading about the TiDB server, you understand that client applications initially enter the system through the TiDB server. However, you may have noticed that when running our initial workload we specified only one TiDB server port. Essentially, the second TiDB server never receives client requests! The end result is that we never take advantage of load balancing for client requests. To use the full capabilities of the TiDB servers, we need to install HAProxy to round-robin client requests between the two TiDB servers we have installed.

### Install HAProxy to provide load balancing

See Appendix D.

## Run second workload with TiUP Bench YCSB - A deeper look into PD and TiKV Read/Write Flow Metrics

In this second load we'll make some parameter changes to add threads, change the lock isolation level, and target a number of operations. You can use the same database you created previously.

### Create and load the tables using TiUP Bench YCSB

TiUP Bench YCSB will automatically create the tables and load them with the following command; the load will take roughly 15 minutes depending on the resources you have allocated for the TiDB cluster:

`tiup bench ycsb -D bench01 prepare -d /usr/share/java/mysql-connector-java.jar -H 192.168.1.203 -P 4000 --pd 192.168.1.201:2379 -U root - 100000 -D bench01`

#### Let's take a look at how TiDB reads

This is the basic read flow. We will look at Grafana metrics to observe how this read flow takes place, then we'll move on to the write flow. It is assumed that you are now somewhat familiar with TiDB. However, the read and write flow sections can become somewhat complex. If you just want to become familiar with some of the key metrics and are not overly concerned with how the "sausage is made", then continue on with the guide. If you'd like to go deeper into TiKV storage, you may want to read this excellent paper: [https://tikv.github.io/deep-dive-tikv/overview/introduction.html](https://tikv.github.io/deep-dive-tikv/overview/introduction.html)

If you like, you can run a workload to observe the metrics in real time.

![](https://i.imgur.com/VSVj3ui.png)

The main function of the protocol layer is to manage client connections, parse MySQL commands, and return the execution results.

- Handles the MySQL protocol
- Unlikely to be the bottleneck
- Controls the number of concurrently running sessions to stay below `token-limit`

![](https://i.imgur.com/nAevjpj.png)

An important metric in the protocol layer is Get Token Duration: the time cost of getting a token on each connection. It can be seen in the Grafana dashboard here:

![](https://i.imgur.com/Kv5fED6.png)

![](https://i.imgur.com/tatUKlD.png)

![](https://i.imgur.com/pEwZzNB.png)

- Parses the SQL statement
- Long and complex SQL statements cost more
- Typically < 1 ms for SELECT statements
- Using PREPARE statements is recommended to avoid the parsing cost when possible

This key metric can be found here:

![](https://i.imgur.com/ImrlfvY.png)

![](https://i.imgur.com/CRb8YEV.png)

- Compiles ASTs (abstract syntax trees) into physical plans
- Complex SQL statements can take a bit longer
- The plan cache (experimental) can reduce the optimizer cost

![](https://i.imgur.com/DiDMHId.png)

![](https://i.imgur.com/spnhGk2.png)

- Executors execute the physical plan produced by the optimizer
- The (Batch) Point Get executor sends an RPC to TiKV to get rows by key
- Some executors (e.g.
selection and aggregation) can be pushed down to the coprocessor of TiKV/TiFlash
- The transaction write buffer is combined with the result from TiKV/TiFlash

Key executor metrics:

- Parse Duration: statistics of the parsing time of SQL statements
- Compile Duration: statistics of the time taken to compile the parsed SQL AST into the execution plan
- Execution Duration: statistics of the execution time of SQL statements
- Expensive Executor OPS: statistics of the operators that consume many system resources per second, including Merge Join, Hash Join, Index Lookup Join, Hash Agg, Stream Agg, Sort, TopN, and so on
- Queries Using Plan Cache OPS: statistics of queries using the plan cache per second

![](https://i.imgur.com/TrY6QCv.png)

Another key metric during the read flow that needs to be taken into account is the time it takes TiDB to obtain a TSO (timestamp) from PD. A high PD TSO Wait Duration can slow down reads.

#### Possible causes

- Busy Go runtime
- High PD load
- Network issue between TiDB and PD

This can be seen in the PD Client dashboard:

![](https://i.imgur.com/xAsX3ZQ.png)

We'll now move on to how TiKV processes reads. As you learned earlier in the guide, TiKV is responsible for storage using Raft, RocksDB, and the KV (Key-Value) store.

![](https://i.imgur.com/MKlfcFh.png)

- Read RPCs from TiDB need to get a RocksDB snapshot first
- Local read (a.k.a. lease read) is used if the leader lease is valid
- Otherwise, raftstore needs to confirm its leadership via a read index before generating the RocksDB snapshot
- Follower read needs a read index to guarantee linearizability

Let's open a few more dashboards.

![](https://i.imgur.com/WpNpiny.png)

![](https://i.imgur.com/QfhQdDG.png)

- The leader lease is updated on write operations and read index
- Read index OPS should be far lower than local read OPS (except for "Follower Read" cases)

The Follower Read feature refers to using any follower replica of a Region to serve a read request while still providing strongly consistent reads. This feature improves the throughput of the TiDB cluster and reduces the load on the leader. It contains a series of load balancing mechanisms that offload TiKV read load from the leader replica to follower replicas in a Region. TiKV's Follower Read implementation provides users with strongly consistent reads.

![](https://i.imgur.com/78sgWKp.png)

- Storage async snapshot duration should be very short in most cases
- A high duration is mostly caused by too many read indexes

At this point you have a general, high-level understanding of the read flow, and you've seen some of the key metrics involved in the read process. This is not meant to be an exhaustive exercise; there are many more metrics, but rather than lose you in the weeds, we'll shift to an important part of TiKV, the Coprocessor, and look at some metrics to understand its function.

#### TiKV Coprocessor

The Coprocessor is used to improve the performance of TiDB. For some operations such as `select count(*)`, there is no need for TiDB to fetch the data row by row and then count it. The quicker way is for TiDB to push these operations down to the corresponding TiKV nodes; the TiKV nodes do the computing, and TiDB then consolidates the final results.
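If you want to see this pushdown for yourself, EXPLAIN shows which operators are executed as coprocessor tasks. A minimal sketch against the TPC-C data loaded earlier (the exact plan, operator names, and row estimates vary by TiDB version and data):

```sql
EXPLAIN SELECT COUNT(*) FROM benchdb01.customer;
-- Operators whose task column shows cop[tikv] run inside the TiKV Coprocessor;
-- the root task on the TiDB server only merges the partial results.
```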
![](https://i.imgur.com/06OaZFI.png)

Here we can find the relevant metrics for the Coprocessor:

![](https://i.imgur.com/XcgPKDU.png)

![](https://i.imgur.com/qLJiKGM.png)

#### Coprocessor Overview

- Request duration: the total duration from the time a coprocessor request is received to the time processing finishes
- Total Requests: the number of requests by type per second
- Handle duration: the histogram of time spent actually processing coprocessor requests per minute
- Total Request Errors: the number of coprocessor request errors per second. There should not be a lot of errors in a short time.
- Total KV Cursor Operations: the total number of KV cursor operations by type per second, such as select, index, analyze_table, analyze_index, checksum_table, checksum_index, and so on
- KV Cursor Operations: the histogram of KV cursor operations by type per second
- Total RocksDB Perf Statistics: the statistics of RocksDB performance
- Total Response Size: the total size of coprocessor responses

![](https://i.imgur.com/VifsdVl.png)

#### Coprocessor Detail

- Handle duration: the histogram of time spent actually processing coprocessor requests per minute
- 95% Handle duration by store: the time consumed to handle coprocessor requests per TiKV instance per second (P95)
- Wait duration: the time coprocessor requests spend waiting to be handled. It should be less than 10 s (P99.99).
- 95% Wait duration by store: the time coprocessor requests spend waiting to be handled per TiKV instance per second (P95)
- Total DAG Requests: the total number of DAG requests per second
- Total DAG Executors: the total number of DAG executors per second
- Total Ops Details (Table Scan): the number of RocksDB internal operations per second when executing a select scan in the coprocessor
- Total Ops Details (Index Scan): the number of RocksDB internal operations per second when executing an index scan in the coprocessor
- Total Ops Details by CF (Table Scan): the number of RocksDB internal operations for each CF per second when executing a select scan in the coprocessor
- Total Ops Details by CF (Index Scan): the number of RocksDB internal operations for each CF per second when executing an index scan in the coprocessor

Scrolling through, you can see that the metrics are quite numerous. The intent is to get you familiar with the role the Coprocessor plays in TiKV and its relevance to the overall read/write process. If you wish to thoroughly understand all the important key metrics in TiKV beyond just the Coprocessor, you can read: [https://docs.pingcap.com/tidb/stable/grafana-tikv-dashboard](https://docs.pingcap.com/tidb/stable/grafana-tikv-dashboard)

Let's continue with the read flow.

![](https://i.imgur.com/cdeWH1r.png)

![](https://i.imgur.com/IbL6dch.png)

![](https://i.imgur.com/p2LC7NJ.png)

![](https://i.imgur.com/aEgXcAy.png)

- Handle duration and cursor operations show which types of coprocessor requests use a lot of resources.

The "Total KV Cursor Operations" panel can be found in the Coprocessor Overview dashboard illustrated earlier. The next two metrics below are key metrics that can be found in the "Coprocessor Detail" dashboard you viewed earlier.

![](https://i.imgur.com/jo9u4VW.png)

![](https://i.imgur.com/CQ3dfvZ.png)

The next two metrics below are key metrics that can be found in the "RocksDB-KV" dashboard located in the TiKV Details dashboard that you've been using.

![](https://i.imgur.com/GLhx7Am.png)

![](https://i.imgur.com/QNslBf1.png)

#### Summary

In general, databases are an integral part of any information system.
They are used for storing, querying, and updating critical business data. Naturally, the availability, performance, and security of a database system are of primary concern for any database administrator. To facilitate this, system administrators typically make use of various database monitoring tools. A properly configured database monitoring regimen has a number of benefits. For example: proactive monitoring is always better than a reactive approach, and it's best to identify warning signs before they become major incidents. When apps malfunction or crawl at a snail's pace, the first place people start investigating (and blaming) is the database; having database monitoring in place lets you quickly pinpoint possible issues and resolve them. Understanding the metrics available in TiDB is the first step toward being proactive and having the ability to optimize performance and resolve performance anomalies.

# Appendices

# Appendix A

### Quickly Deploy a Local TiDB Cluster

The TiDB cluster is a distributed system that consists of multiple components. A typical TiDB cluster consists of at least three PD nodes, three TiKV nodes, and two TiDB nodes. If you want to run some quick tests on TiDB, you might find it time-consuming and complicated to manually deploy so many components. This appendix introduces the playground component of TiUP and how to use it to quickly build a local TiDB test environment.

### TiUP playground overview

TiUP can be downloaded by following the instructions here: [http://tiup.io](http://tiup.io)

The basic usage of the playground component is as follows:

```bash
tiup playground [version] [flags]
```

If you directly execute the `tiup playground` command, TiUP uses the locally installed TiDB, TiKV, and PD components, or installs the stable version of these components, to start a TiDB cluster that consists of one TiKV instance, one TiDB instance, and one PD instance. This command actually performs the following operations:

- Because this command does not specify the version of the playground component, TiUP first checks the latest version of the installed playground component. Assuming the latest version is v1.3.0, this command works the same as `tiup playground:v1.3.0`.
- If you have not used TiUP playground to install the TiDB, TiKV, and PD components, the playground component installs the latest stable version of these components and then starts these instances.
- Because this command does not specify the version of the TiDB, PD, and TiKV components, TiUP playground uses the latest version of each component by default. Assuming the latest version is v5.0.0, this command works the same as `tiup playground:v1.3.0 v5.0.0`.
- Because this command does not specify the number of each component, TiUP playground by default starts the smallest cluster, consisting of one TiDB instance, one TiKV instance, and one PD instance.
- After starting each TiDB component, TiUP playground reminds you that the cluster was successfully started and provides some useful information, such as how to connect to the TiDB cluster through the MySQL client and how to access the [TiDB Dashboard](/dashboard/dashboard-intro.md).
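For the labs in this guide you will typically want more than the smallest cluster. A minimal sketch, using the flags described below; the version number and instance counts are illustrative only, so adjust them to the fixed versions you chose for your environment:

```bash
# Illustrative only: 1 TiDB, 1 PD, 3 TiKV, plus the Prometheus monitoring component.
tiup playground v5.0.0 --db 1 --pd 1 --kv 3 --monitor
```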
The command-line flags of the playground component are described as follows:

```bash
Flags:
      --db int                   TiDB instance number (default 1)
      --db.binpath string        TiDB instance binary path
      --db.config string         TiDB instance configuration file
      --db.host host             Set the listening address of TiDB
      --drainer int              Set the Drainer data of the cluster
      --drainer.binpath string   Specify the location of the Drainer binary files (optional, for debugging)
      --drainer.config string    Specify the Drainer configuration file
  -h, --help                     help for tiup
      --host string              Playground cluster host (default "127.0.0.1")
      --kv int                   TiKV instance number (default 1)
      --kv.binpath string        TiKV instance binary path
      --kv.config string         TiKV instance configuration file
      --monitor                  Start prometheus component
      --pd int                   PD instance number (default 1)
      --pd.binpath string        PD instance binary path
      --pd.config string         PD instance configuration file
      --pump int                 Specify the number of Pump nodes in the cluster. If the value is not "0", TiDB Binlog is enabled.
      --pump.binpath string      Specify the location of the Pump binary files (optional, for debugging)
      --pump.config string       Specify the Pump configuration file (optional, for debugging)
      --tiflash int              TiFlash instance number
      --tiflash.binpath string   TiFlash instance binary path
      --tiflash.config string    TiFlash instance configuration file
```

## Examples

### Use the nightly version to start a TiDB cluster

```shell
tiup playground nightly
```

In the command above, `nightly` is the version number of the cluster. Similarly, you can replace `nightly` with `v5.0.0`, and the command becomes `tiup playground v5.0.0`.

### Start a cluster with monitoring

```shell
tiup playground nightly --monitor
```

This command starts Prometheus on port 9090 to display the time series data of the cluster.

### Override PD's default configuration

First, copy the [PD configuration template](https://github.com/pingcap/pd/blob/master/conf/config.toml). Assuming you place the copied file at `~/config/pd.toml` and modify it according to your needs, you can then execute the following command to override PD's default configuration:

```shell
tiup playground --pd.config ~/config/pd.toml
```

### Replace the default binary files

By default, when playground is started, each component is started using the binary files from the official mirror. If you want to put a temporarily compiled local binary into the cluster for testing, you can use the `--{comp}.binpath` flag to replace it. For example, execute the following command to replace the binary file of TiDB:

```shell
tiup playground --db.binpath /xx/tidb-server
```

### Start multiple component instances

By default, only one instance is started for each of the TiDB, TiKV, and PD components. To start multiple instances of each component, add the following flags:

```shell
tiup playground v3.0.10 --db 3 --pd 3 --kv 3
```

## Quickly connect to the TiDB cluster started by playground

TiUP provides the `client` component, which is used to automatically find and connect to a local TiDB cluster started by playground. The usage is as follows:

```shell
tiup client
```

This command lists, on the console, the TiDB clusters started by playground on the current machine. Select the TiDB cluster to connect to. After pressing <kbd>Enter</kbd>, a built-in MySQL client is opened to connect to TiDB.
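If you prefer the plain MySQL client used earlier in this guide, you can also connect to the playground directly. A minimal sketch, assuming the playground defaults of host 127.0.0.1, TiDB port 4000, and the root user with no password:

```shell
mysql -h 127.0.0.1 -P 4000 -u root
```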
## View information of the started cluster

```shell
tiup playground display
```

The command above returns results such as the following:

```
Pid    Role     Uptime
---    ----     ------
84518  pd       35m22.929404512s
84519  tikv     35m22.927757153s
84520  pump     35m22.92618275s
86189  tidb     exited
86526  tidb     34m28.293148663s
86190  drainer  35m19.91349249s
```

## Scale out a cluster

The command-line parameters for scaling out a cluster are similar to those for starting a cluster. You can scale out two TiDB instances by executing the following command:

```shell
tiup playground scale-out --db 2
```

## Scale in a cluster

You can specify a `pid` in the `tiup playground scale-in` command to scale in the corresponding instance. To view the `pid`, execute `tiup playground display`.

```shell
tiup playground scale-in --pid 86526
```

# Appendix B

## Stress Test TiDB Using the TiUP Bench Component

When you test the performance of a database, you often need to stress test it. To facilitate this, TiUP has integrated the bench component, which provides two workloads for stress testing: TPC-C and TPC-H. The commands and flags are as follows:

```bash
tiup bench
```

```
Starting component `bench`: /home/tidb/.tiup/components/bench/v1.3.0/bench
Benchmark database with different workloads

Usage:
  tiup bench [command]

Available Commands:
  help        Help about any command
  tpcc
  tpch

Flags:
      --count int           Total execution count, 0 means infinite
  -D, --db string           Database name (default "test")
  -d, --driver string       Database driver: mysql
      --dropdata            Cleanup data before prepare
  -h, --help                help for /Users/joshua/.tiup/components/bench/v0.0.1/bench
  -H, --host string         Database host (default "127.0.0.1")
      --ignore-error        Ignore error when running workload
      --interval duration   Output interval time (default 10s)
      --isolation int       Isolation Level 0: Default, 1: ReadUncommitted, 2: ReadCommitted, 3: WriteCommitted, 4: RepeatableRead, 5: Snapshot, 6: Serializable, 7: Linerizable
      --max-procs int       runtime.GOMAXPROCS
  -p, --password string     Database password
  -P, --port int            Database port (default 4000)
      --pprof string        Address of pprof endpoint
      --silence             Don't print error when running workload
      --summary             Print summary TPM only, or also print current TPM when running workload
  -T, --threads int         Thread concurrency (default 16)
      --time duration       Total execution time (default 2562047h47m16.854775807s)
  -U, --user string         Database user (default "root")
```

The following sections describe how to run TPC-C and TPC-H tests using TiUP.

## Run a TPC-C test using TiUP

The TiUP bench component supports the following commands and flags to run the TPC-C test:

```bash
Available Commands:
  check       Check data consistency for the workload
  cleanup     Cleanup data for the workload
  prepare     Prepare data for the workload
  run         Run workload

Flags:
      --check-all        Run all consistency checks
  -h, --help             help for tpcc
      --output string    Output directory for generating csv file when preparing data
      --parts int        Number to partition warehouses (default 1)
      --tables string    Specified tables for generating file, separated by ','. Valid only if output is set. If this flag is not set, generate all tables by default.
      --warehouses int   Number of warehouses (default 10)
```

### Test procedures

1. Create 4 warehouses using 4 partitions via hash:

    ```shell
    tiup bench tpcc --warehouses 4 --parts 4 prepare
    ```

2. Run the TPC-C test:

    ```shell
    tiup bench tpcc --warehouses 4 run
    ```

3. Clean up data:

    ```shell
    tiup bench tpcc --warehouses 4 cleanup
    ```

4. Check the consistency:

    ```shell
    tiup bench tpcc --warehouses 4 check
    ```

5. Generate the CSV file:

    ```shell
    tiup bench tpcc --warehouses 4 prepare --output data
    ```

6. Generate the CSV file for the specified tables:

    ```shell
    tiup bench tpcc --warehouses 4 prepare --output data --tables history,orders
    ```

7. Enable pprof:

    ```shell
    tiup bench tpcc --warehouses 4 prepare --output data --pprof :10111
    ```

## Run a TPC-H test using TiUP

The TiUP bench component supports the following commands and parameters to run the TPC-H test:

```bash
Available Commands:
  cleanup     Cleanup data for the workload
  prepare     Prepare data for the workload
  run         Run workload

Flags:
      --check            Check output data, only when the scale factor equals 1
  -h, --help             help for tpch
      --queries string   All queries (default "q1,q2,q3,q4,q5,q6,q7,q8,q9,q10,q11,q12,q13,q14,q15,q16,q17,q18,q19,q20,q21,q22")
      --sf int           scale factor
```

### Test procedures

1. Prepare data:

    ```shell
    tiup bench tpch --sf=1 prepare
    ```

2. Run the TPC-H test by executing one of the following commands:

    - If you want to check the result, run this command:

        ```shell
        tiup bench tpch --sf=1 --check=true run
        ```

    - If you do not want to check the result, run this command:

        ```shell
        tiup bench tpch --sf=1 run
        ```

3. Clean up data:

    ```shell
    tiup bench tpch cleanup
    ```

# Appendix C

### TiDB Kubernetes Operator

There are many ways to install the TiDB Operator in a Kubernetes cluster. Information can be found here: [https://docs.pingcap.com/tidb-in-kubernetes/dev/get-started](https://docs.pingcap.com/tidb-in-kubernetes/dev/get-started)

# Appendix D

### Install HAProxy - Best Practices

[https://docs.pingcap.com/tidb/stable/haproxy-best-practices](https://docs.pingcap.com/tidb/stable/haproxy-best-practices)
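To make Appendix D a little more concrete, here is a minimal haproxy.cfg sketch for round-robining MySQL connections across the two TiDB servers discussed in the load-balancing section. The listen port 3390 and the second backend address are assumptions (only 192.168.1.203 appears earlier in this guide); treat it as a starting point and follow the best-practices link above for a production-ready configuration:

```
# Minimal sketch only; tune timeouts, health checks, and logging per the best-practices doc.
defaults
    mode tcp                   # TiDB speaks the MySQL protocol, so proxy at the TCP layer
    timeout connect 5s
    timeout client  30m
    timeout server  30m

listen tidb-cluster
    bind 0.0.0.0:3390          # clients connect to HAProxy on 3390 instead of a TiDB server on 4000
    balance roundrobin
    server tidb-1 192.168.1.203:4000 check
    server tidb-2 192.168.1.204:4000 check   # hypothetical address of the second TiDB server
```

With this in place, point the mysql client and the TiUP Bench `-H`/`-P` flags at the HAProxy address and port, and client requests will be balanced across both TiDB servers.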