# collation differences experiment
## docker run both services
### mysql
- mysql container
```=bash
docker run --name mysql-server -p 3306:3306 -e MYSQL_ROOT_PASSWORD=your_password -d mysql:8.0
docker exec -it mysql-server bash
mysql -u root -p
```
- create test_db and test_table
```=bash
CREATE DATABASE test_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE test_db;
CREATE TABLE test_table (test_column VARCHAR(255));
```
### postgresql
- postgresql container
```=bash
docker run --name postgres-server -p 5432:5432 -e POSTGRES_PASSWORD=your_password -d postgres:13.7
docker exec -it postgres-server bash
psql -U postgres
```
- create test_db and test_table
```=bash
CREATE DATABASE test_db WITH ENCODING 'UTF8' LC_COLLATE='C.UTF-8' LC_CTYPE='C.UTF-8';
\c test_db
CREATE TABLE test_table (test_column VARCHAR(255));
```
## Test cases
if postgresql: collation_name=C.UTF-8
if mysql: collation_name=utf8mb4_unicode_ci
### 1. Alphabet Sorting
```=sql
INSERT INTO test_table (test_column) VALUES ('a'), ('A'), ('b'), ('B'), ('C'), ('ç'), ('Đ'), ('e');
SELECT * FROM test_table ORDER BY test_column COLLATE "collation_name";
```
- outputs
- mysql(case insensitive所以把a跟A放一起,e放最後)
```
+-------------+
| test_column |
+-------------+
| |
| |
| a |
| A |
| b |
| B |
| C |
| e |
+-------------+
8 rows in set (0.00 sec)
```
- postgresql(case sensitive所以把大寫字元照順序放前面,接續小寫,再來是特殊字元)
```
test_column
-------------
A
B
C
a
b
e
ç
Đ
(8 rows)
```
### 2. special character sorting
|Char|ASCII|Unicode|
|---|---|---|
|+|43|U+002B|
|$|36|U+0024|
|-|45|U+002D|
|!|33|U+0021|
|@|64|U+0040|
|#|35|U+0023|
```=sql
TRUNCATE TABLE test_table;
INSERT INTO test_table (test_column) VALUES ('+'), ('$'), ('-'), ('!'), ('@'), ('#');
SELECT * FROM test_table ORDER BY test_column COLLATE "collation_name";
```
- outputs
- mysql
```
+-------------+
| test_column |
+-------------+
| - |
| ! |
| @ |
| # |
| + |
| $ |
+-------------+
6 rows in set (0.00 sec)
```
- postgresql (using ascii)
```
test_column
-------------
!
#
$
+
-
@
(6 rows)
```
### 3. Accent Mark Sorting
```=sql
TRUNCATE TABLE test_table;
INSERT INTO test_table (test_column) VALUES ('résumé'), ('resume'), ('resumé');
SELECT * FROM test_table ORDER BY test_column COLLATE "collation_name";
```
- outputs
- mysql(辨識不出accent mark)
```
+-------------+
| test_column |
+-------------+
| resum |
| resume |
| rsum |
+-------------+
3 rows in set (0.01 sec)
```
- postgresql (可以辨識出accent mark)
```
test_column
-------------
resume
resumé
résumé
(3 rows)
```
### 4. Case-Sensitive Search
```=sql
TRUNCATE TABLE test_table;
INSERT INTO test_table (test_column) VALUES ('Hello'), ('HELLO'), ('HeLLo');
SELECT * FROM test_table WHERE test_column = 'hello' COLLATE "collation_name";
```
- outputs
- <font style color = 'red'>**mysql**</font>
```
ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'
```
- 這邊也會報一樣的錯
```=sql
INSERT INTO test_table (test_column) VALUES ('Yeah'),('YEAH'), ('YeAH');
SELECT * FROM test_table WHERE test_column = 'yeah' COLLATE "utf8mb4_unicode_ci";
```
- postgresql
```
test_column
-------------
(0 rows)
```
- <font style color='blue'>use ILIKE</font>
```=sql
SELECT * FROM test_table WHERE test_column ilike 'hello';
```
```
test_column
-------------
Hello
HELLO
HeLLo
(3 rows)
```
### 5. Accent-Sensitive Search
```=sql
TRUNCATE TABLE test_table;
INSERT INTO test_table (test_column) VALUES ('résumé'), ('resume'), ('resumé');
SELECT * FROM test_table WHERE test_column = 'resume' COLLATE "collation_name";
```
- outputs
- mysql
```
ERROR 1253 (42000): COLLATION 'utf8mb4_unicode_ci' is not valid for CHARACTER SET 'latin1'
```
- postgresql
```
test_column
-------------
resume
(1 row)
```
- use ilike
```=sql
SELECT * FROM test_table WHERE test_column ilike 'resume' COLLATE "C.UTF-8";
```
```
test_column
-------------
resume
(1 row)
```