# MySQL Compatibility
TiDB aims for compatibility with MySQL 5.7, in terms of wire/client protocol, SQL syntax, and many behaviors. There are, however, many exceptions to this that are important to understand.
You should carefully review the [MySQL Compatibility](https://pingcap.com/docs/stable/reference/mysql-compatibility/) page in the documentation to see whether there are some clear issues that might affect your application.
This document aims to help you understand a few common issues that users encounter when assessing and executing a migration from MySQL to TiDB.
## TiDB is cluster-native
TiDB is a clustered database platform, which means that transactions can execute concurrently on many different front-end TiDB Server instances, against many different back-end TiKV Server instances. As a result, commands and behaviors that you might expect on a standalone MySQL instance work differently in TiDB. Here are some examples:
- `SHOW PROCESSLIST` shows only the sessions connected to a single TiDB Server instance
- Use `information_schema.cluster_processlist` to see all transactions across the whole cluster, for example:
```
MySQL [(none)]> select * from information_schema.cluster_processlist\G
*************************** 1. row ***************************
INSTANCE: poc-cluster-tidb-1.poc-cluster-tidb-peer.tidb-cluster.svc:10080
      ID: 1249
    USER: root
    HOST: 10.0.68.251
      DB: NULL
 COMMAND: Query
    TIME: 0
   STATE: autocommit
    INFO: select * from information_schema.cluster_processlist
     MEM: 0
TxnStart: 04-29 23:56:57.365(416338311214530561)
*************************** 2. row ***************************
INSTANCE: poc-cluster-tidb-1.poc-cluster-tidb-peer.tidb-cluster.svc:10080
      ID: 1048
    USER: root
    HOST: 10.0.68.251
      DB: test
 COMMAND: Query
    TIME: 1
   STATE: autocommit
    INFO: select sleep(30)
     MEM: 0
TxnStart:
*************************** 3. row ***************************
INSTANCE: poc-cluster-tidb-0.poc-cluster-tidb-peer.tidb-cluster.svc:10080
      ID: 458
    USER: root
    HOST: 10.0.68.251
      DB: test
 COMMAND: Sleep
    TIME: 123
   STATE: in transaction
    INFO: NULL
     MEM: 0
TxnStart: 04-29 23:54:50.015(416338277830492161)
3 rows in set (0.01 sec)
```
- `KILL` requires an additional `TIDB` keyword, because connection IDs are unique only within a single TiDB Server instance; without it, you could accidentally kill a query on the wrong instance. An example follows below.
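For example, to terminate connection 1249 from the processlist output above, connect to the instance that owns it and run:
```
kill tidb 1249;
```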
- Because DDL is executed asynchronously by a single TiDB Server instance elected as the DDL owner, a plain `KILL` cannot stop it; use `ADMIN SHOW DDL` (or `ADMIN SHOW DDL JOBS`) to view DDL jobs and `ADMIN CANCEL DDL JOBS` to cancel them:
```
tidb 1> alter table sbtest1 add key(c);
<blocks>
```
```
tidb 2> admin show ddl\G
*************************** 1. row ***************************
   SCHEMA_VER: 134
     OWNER_ID: 9864057a-a2ee-428d-9678-e22c3628f3e0
OWNER_ADDRESS: poc-cluster-tidb-1.poc-cluster-tidb-peer.tidb-cluster.svc:4000
 RUNNING_JOBS:
      SELF_ID: 9864057a-a2ee-428d-9678-e22c3628f3e0
        QUERY:
1 row in set (0.01 sec)
tidb 2> admin cancel ddl jobs 153;
+--------+------------+
| JOB_ID | RESULT     |
+--------+------------+
|    153 | successful |
+--------+------------+
1 row in set (0.01 sec)
```
```
(tidb 1)
ERROR 8214 (HY000): Cancelled DDL job
```
- Constructs like `GET_LOCK()` and `LOCK TABLES` don't work the way you'd expect coming from MySQL
- The TiDB slow query log is written separately by each TiDB Server instance, so you need to aggregate those logs when debugging slow queries across the cluster; one option is shown below.
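On recent versions (4.0 and later), one way to do this from SQL is the `information_schema.cluster_slow_query` table, which exposes every instance's slow log through a single query. A minimal sketch:
```
select instance, time, query_time, query
from information_schema.cluster_slow_query
order by query_time desc
limit 10;
```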
## AUTO_INCREMENT & AUTO_RANDOM
- `AUTO_INCREMENT` in TiDB can have surprising gaps. Each TiDB Server instance is allocated a range of auto-increment values, so values are monotonically increasing for inserts through a single instance, but *not* across the entire cluster:
```
$ # setup (for reference): a minimal AUTO_INCREMENT table
$ mysql -e 'create table test.t1 (id int unsigned not null auto_increment primary key)'
$ mysql -e 'insert into test.t1 () values ();
select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-0 |                1 |
+--------------------+------------------+
$ mysql -e 'insert into test.t1 () values ();
select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-1 |            30001 |
+--------------------+------------------+
$ mysql -e 'insert into test.t1 () values ();
select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-0 |                2 |
+--------------------+------------------+
$ mysql -e 'insert into test.t1 () values ();
select @@hostname, last_insert_id();'
+--------------------+------------------+
| @@hostname         | last_insert_id() |
+--------------------+------------------+
| poc-cluster-tidb-1 |            30002 |
+--------------------+------------------+
```
More problematic is the tendency of auto-increment keys to create a write hotspot in TiKV under insert-heavy workloads, because all new rows land in the same region.
TiDB has experimental support for `AUTO_RANDOM`, which scatters inserted values across the full range of the column's data type instead of clustering them in a small window. If your workload relies on massively parallel inserts into auto-increment columns, plan on testing `AUTO_RANDOM`:
https://pingcap.com/docs/stable/reference/sql/attributes/auto-random/
- The `allow-auto-random` option must be enabled in TiDB Server; for example, with TiDB Operator on Kubernetes:
```
kubectl patch tc poc-test-cluster -n tidb-cluster --type=merge -p \
  '{"spec":{"tidb":{"config":{"experimental":{"allow-auto-random":true}}}}}'
```
## DDL
DDL operations in a distributed database are more complex than in a standalone database system because schema changes must be synchronized across multiple nodes. Fortunately, TiDB has adopted the approach described in [Online, Asynchronous Schema Change in F1](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41376.pdf), which affords online DDL that does not block reads or writes, even while columns are added to or removed from a table.
One [important DDL behavior difference between MySQL and TiDB](https://pingcap.com/docs/stable/reference/mysql-compatibility/#ddl) that may affect your application:
- Most compound (multiple operation) `ALTER TABLE` statements are not supported:
```
tidb> alter table t1 add key(a), add key(b);
ERROR 8200 (HY000): Unsupported multi schema change
```
If your application relies on setup, upgrade, or migration scripts that use this type of compound `ALTER TABLE` statement, those will need to be modified to work with TiDB.
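The rewrite is usually mechanical: issue each change as its own statement. For example:
```
alter table t1 add key(a);
alter table t1 add key(b);
```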
## Transactions & Locking
TiDB now defaults to a pessimistic locking model, which means most behaviors are similar to MySQL/InnoDB, with some important caveats: https://pingcap.com/docs/stable/reference/transactions/transaction-pessimistic/#difference-with-mysql-innodb
Older versions of TiDB supported only optimistic locking, which remains available as an option. With optimistic locking, constraint checking is deferred and deadlocks cannot occur; conflicting writes instead fail at commit time.
Here's a scenario using pessimistic locking that shows deadlock detection in action:
```
tidb 1> create table t1 (id int unsigned not null auto_increment primary key);
Query OK, 0 rows affected (1.01 sec)
tidb 1> create table t2 (id int unsigned not null auto_increment primary key);
Query OK, 0 rows affected (1.02 sec)
tidb 1> select @@hostname;
+--------------------+
| @@hostname         |
+--------------------+
| poc-cluster-tidb-0 |
+--------------------+
1 row in set (0.00 sec)
tidb 1> set tidb_txn_mode=pessimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 1> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 1> insert into t1 (id) values (1);
Query OK, 1 row affected (0.02 sec)
```
```
tidb 2> select @@hostname;
+--------------------+
| @@hostname         |
+--------------------+
| poc-cluster-tidb-1 |
+--------------------+
1 row in set (0.00 sec)
tidb 2> set tidb_txn_mode=pessimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 2> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 2> insert into t2 (id) values (1);
Query OK, 1 row affected (0.01 sec)
```
```
tidb 1> insert into t2 (id) values (1);
<blocks until the deadlock is detected in session 2>
Query OK, 1 row affected (8.32 sec)
```
```
tidb 2> insert into t1 (id) values (1);
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
```
And here's the same scenario with session 1 using optimistic locking (session 2 stays pessimistic): no deadlock occurs, and the conflict surfaces only when session 1 commits:
```
tidb 1> set tidb_txn_mode=optimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 1> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 1> insert into t1 (id) values (1);
Query OK, 1 row affected (0.01 sec)
```
```
tidb 2> set tidb_txn_mode=pessimistic;
Query OK, 0 rows affected (0.00 sec)
tidb 2> begin;
Query OK, 0 rows affected (0.00 sec)
tidb 2> insert into t2 (id) values (1);
Query OK, 1 row affected (0.02 sec)
```
```
tidb 1> insert into t2 (id) values (1);
Query OK, 1 row affected (0.01 sec)
```
```
tidb 2> insert into t1 (id) values (1);
Query OK, 1 row affected (0.01 sec)
```
```
tidb 1> commit;
<blocks>
```
```
tidb 2> commit;
Query OK, 0 rows affected (0.00 sec)
```
```
(tidb 1)
ERROR 9007 (HY000): Write conflict, txnStartTS=416337970833653761, conflictStartTS=416337972537589761, conflictCommitTS=416337975893032964, key={tableID=144, handle=1} primary={tableID=144, handle=1} [try again later]
```
TiDB supports only "Snapshot Isolation" (SI), which is similar to MySQL's Repeatable Read but has some important differences: https://pingcap.com/docs/stable/reference/transactions/transaction-isolation/
## Character Sets and Collations
TiDB supports single-byte (binary, latin1, ascii) character sets as well as utf8 and utf8mb4.
Until version 4.0, TiDB supported only binary collations. Starting with version 4.0, TiDB also supports the `utf8_general_ci` and `utf8mb4_general_ci` collations. However, whether to enable the new collation model must be decided **at the time of cluster creation**; this restriction avoids backward compatibility problems.
If the old collation model is in place, collation clauses are generally ignored and binary collations are used implicitly. You can check which model is in use by querying the `mysql.tidb` table. In the first example below, the old model is active and the unsupported collation clause is silently accepted; in the second, the new model rejects it:
```
tidb> select variable_value from mysql.tidb
where variable_name = 'new_collation_enabled';
+----------------+
| VARIABLE_VALUE |
+----------------+
| False          |
+----------------+
1 row in set (0.00 sec)
tidb> create table t1 (id int unsigned not null auto_increment primary key,
v varchar(32) character set utf8 collate utf8_unicode_ci);
Query OK, 0 rows affected (0.11 sec)
```
```
tidb> select variable_value from mysql.tidb
where variable_name = 'new_collation_enabled';
+----------------+
| VARIABLE_VALUE |
+----------------+
| True           |
+----------------+
1 row in set (0.00 sec)
tidb> create table t1 (id int unsigned not null auto_increment primary key,
v varchar(32) character set utf8 collate utf8_unicode_ci);
ERROR 1273 (HY000): Unsupported collation when new collation is enabled: 'utf8_unicode_ci'
```
## Partitioning
There's no need to use explicit partitioning in TiDB for data to be split into multiple regions and distributed across multiple TiKV instances.
However, TiDB does support [explicit partitioning](https://pingcap.com/docs/stable/reference/sql/partitioning/) for cases where the volume or distribution of data is already well-known and the overhead of region splitting can be avoided.
TiDB also has a [`SPLIT REGION`](https://pingcap.com/docs/stable/reference/sql/statements/split-region/) statement that lets you pre-split a table (or index) into regions before loading data, attacking the same problem from a different angle; see the sketch below.
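A minimal sketch, assuming a table `t` (hypothetical) with an integer primary key whose values will fall roughly between 0 and 1000000:
```
-- Pre-split t's key range into 16 evenly spaced regions.
split table t between (0) and (1000000) regions 16;
```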
## SQL Syntax
TiDB accepts but ignores a *lot* of MySQL-compatible SQL syntax. We already discussed collations above, but here are some other important items that are accepted by the parser but ignored:
- Foreign Key constraints
- `ENGINE` clauses
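For example, both of the following statements run without error on a pair of hypothetical tables, but the foreign key is never enforced and the `ENGINE` clause has no effect:
```
create table parent (id int primary key);
create table child (
    id int primary key,
    parent_id int,
    foreign key (parent_id) references parent (id)  -- parsed, not enforced
) engine=innodb;                                    -- parsed, ignored
-- Succeeds even though there is no parent row with id 999:
insert into child values (1, 999);
```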
### SQL Modes & Optimizer Hints
- TiDB supports most [SQL modes](https://pingcap.com/docs/stable/reference/sql/sql-mode/) from MySQL 5.7
- TiDB also accepts optimizer hints in MySQL-style `/*+ ... */` comments, although the set of supported hints differs from MySQL 5.7's
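As a quick sanity check (the mode list here is just a common MySQL 5.7 pair), you can set and read back the SQL mode exactly as in MySQL:
```
set session sql_mode = 'STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION';
select @@sql_mode;
```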
## EXPLAIN
TiDB uses a [different `EXPLAIN` output format](https://pingcap.com/docs/stable/reference/performance/understanding-the-query-execution-plan/) than MySQL. It takes some getting used to, but `EXPLAIN` in TiDB offers much more information than `EXPLAIN` in MySQL or MariaDB, especially about work pushed down to the "coprocessor", TiDB's name for the computation layer that runs on each TiKV instance.
```
tidb> explain select c from sbtest1 where k between 1000 and 10000;
+---------------------------+-------------+-----------+---------------+---------------------------------------------------------+
| id                        | estRows     | task      | access object | operator info                                           |
+---------------------------+-------------+-----------+---------------+---------------------------------------------------------+
| Projection_4              | 74467098.77 | root      |               | sbtest.sbtest1.c                                        |
| └─TableReader_7           | 74467098.77 | root      |               | data:Selection_6                                        |
|   └─Selection_6           | 74467098.77 | cop[tikv] |               | ge(sbtest.sbtest1.k, 1000), le(sbtest.sbtest1.k, 10000) |
|     └─TableFullScan_5     | 98494812.00 | cop[tikv] | table:sbtest1 | keep order:false                                        |
+---------------------------+-------------+-----------+---------------+---------------------------------------------------------+
4 rows in set (0.00 sec)
```
```
mysql> explain select c from sbtest1 where k between 1000 and 10000;
+------+-------------+---------+-------+---------------+------+---------+------+-------+-----------------------+
| id   | select_type | table   | type  | possible_keys | key  | key_len | ref  | rows  | Extra                 |
+------+-------------+---------+-------+---------------+------+---------+------+-------+-----------------------+
|    1 | SIMPLE      | sbtest1 | range | k_1           | k_1  | 4       | NULL | 98640 | Using index condition |
+------+-------------+---------+-------+---------------+------+---------+------+-------+-----------------------+
1 row in set (0.00 sec)
```