# SQL 筆記
此筆記是基於 Mosh Hamedani 推出的的 SQL 課程 [Complete SQL Mastery](https://codewithmosh.com/p/complete-sql-mastery) 所撰寫。
# Getting started
## 什麼是 SQL?
SQL(Structured Query Language,結構化查詢語言)是專門用來處理(包括查詢和修改)關係型數據庫的標準語言。
## Database 數據庫或稱資料庫
- A DATABASE is a collection of data stored in a format that can easily be accessed
數據庫是一個以易訪問格式存儲的數據集合
- 為了管理我們的數據庫我們使用一種叫做數據庫管理系統(DBMS, Database Management System)的軟體。我們連接到一個DBMS 然後下達查詢或者修改數據的指令,DBMS 就會執行我們的指令並返回結果
- 有關係型和非關係型兩類數據庫,在更流行的關係型數據庫中,我們把數據存儲在**通過某些關係相互關聯的數據表中,每張表儲存特定的一類數據**,這正是關係型數據庫名稱的由來。(如:顧客表通過顧客id與訂單表相聯繫,訂單表又通過商品id與商品表相聯繫)
- 不同關係型數據庫管理系統語法略有不同,但都是基於標準SQL,本課使用最流行的開源關係型數據庫管理系統,MySQL。
# Retrieving Data From a Single Table
## The SELECT Statement
```sql
SELECT
name,
unit_price,
unit_price * 1.1 AS new_price -- 或 'new price'
FROM products
```
## The WHERE Clause
```sql
SELECT *
FROM customers
-- WHERE points > 3000
WHERE state != 'va' -- 'VA' 和 'va' 一樣結果
```
1. SQL 運算子:`>、<、=、>=、<=、!=、<>`
2. 可對日期格式或字串進行運算,注意 SQL 日期格式需要用引號包起來:
`WHERE birth_date > '1990-01-01'`
## The AND, OR and NOT Operators
```sql
SELECT *
FROM customers
WHERE birth_date > '1990-01-01' OR
(points > 1000 AND state = 'VA') -- AND 優先於 OR,建議加括號,更清楚一些。
```
### NOT 用法
```sql
WHERE NOT (birth_date > '1990-01-01' OR points > 1000)
等於下面
WHERE birth_date <= '1990-01-01' AND points <= 1000
```
## The IN Operator
IN 可以找出所有符合的值
```sql
-- 選出符合 'va'、'fl'、'ga' 州的顧客
SELECT * FROM customers
WHERE state = 'va' OR state = 'fl' OR state = 'ga'
-- 可以改寫為下面醬:
WHERE state IN ('va', 'fl', 'ga')
-- 也可在加 NOT
WHERE state NOT IN ('va', 'fl', 'ga')
```
## The BETWEEN Operator
必須和 AND 一起用!
```sql
SELECT * FROM customers
WHERE points >= 1000 and points <= 3000
-- 等於下面醬
WHERE points BETWEEN 1000 AND 3000
```
## The LIKE Operator (已過時)
- % 代表任意數量的字元(包含 0 個)
- _ 代表一個字元
```sql
SELECT * FROM customers
WHERE last_name like 'brush%' OR
last_name like 'b____y'
```
## The REGEXP Operator 正規表達式
```sql
SELECT * FROM customers
WHERE last_name like '%field%'
-- 等於下面
WHERE last_name REGEXP 'field'
```
```sql
WHERE last_name REGEXP '^mac|field$|rose'
WHERE last_name REGEXP '[gi]e|e[fmq]' -- 查找含 ge/ie 或 ef/em/eq 的
WHERE last_name REGEXP '[a-h]e'
```
- ^ 開頭
- $ 結尾
- [abc] 包含 a 或 b 或 c
- [a-c] 包含 a 到 c
- | 或 logical or
## The IS NULL Operator
```sql
SELECT * FROM customers
WHERE phone IS NULL / IS NOT NULL
```
## The ORDER BY Clause
```sql
SELECT *
FROM customers
ORDER BY state, first_name
-- 先依照 state 排序,若有兩個 state 相同再依照 first_name 排序
```
- Order By 後面可以接運算式跟沒有被 SELECT 到的欄位。
## The LIMIT Clause
```sql
SELECT * FROM customers
LIMIT 3 -- 只選三個
LIMIT 6, 3 -- 跳過六個之後選三個
```
# Retrieving Data From Multiple Tables
## Inner Joins
```sql
SELECT *
FROM orders
JOIN customers -- inner join 的 inner 可省略不寫
ON orders.customer_id = customers.customer_id
```
建議第一次可選 `SELECT *` 選取所有欄位後再選取想要的欄位名稱⬇️
```sql
SELECT order_id, first_name, last_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
```
⚠️ 注意若我也想 `SELECT customer_id` 欄位:
```sql
-- 這一段會報錯,因為 SQL 不知道選 orders 還是 customers 的 customer_id
-- 雖然它們是一樣的
SELECT order_id, customer_id, first_name, last_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
-- 你應在 customer_id 前加上資料表名稱(兩個都可以):
SELECT order_id, orders.customer_id, first_name, last_name
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id
```
替資料表寫別名(語法糖)
```sql
-- 這邊取別名不用再加 AS
SELECT order_id, o.customer_id, first_name, last_name
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
```
## Joining Across Databases (跨資料庫連接)
若要 JOIN 的資料表在另一個資料庫,就在該資料表前加上該資料庫前綴:
```sql
-- 這邊 sql_inventory 是另一個資料庫
select * from order_items oi
join sql_inventory.products p
on oi.product_id = p.product_id
```
## Self Joins
利用 `inner join` 一個表與自己合併。常用於員工及員工主管或商品及商品分類的 table。
⚠️ 要注意的是因為是自己 join 自己,所以資料表必須要取別名:
```sql
SELECT *
FROM employees e
JOIN employees m
ON e.reports_to = m.employee_id
```

讓 reports_to 欄位的值去關聯 employee_id,因為 reports_to 的值全部都是 37270 所以每一筆 SELECT 出來的資料都是 Yovonnda。需要注意的是第二筆資料的值是 NULL,所以 SELECT 出來時第二筆資料就不會顯示!

SELECT 出來的結果。每個員工的主管都是 Yovonnda,而 Yovonnda 本身的 reports_to 欄位值是 NULL (他沒有主管),在結果中就不會顯示他的資料。
承上,我們只 `SELECT` employee_id 跟 first_name 就好:
```sql
-- 這邊 first_name 出現兩次,可以將 m.first_name 另取別名為 manager 語意比較清楚
-- AS 在這邊是可以省略的
SELECT e.employee_id, e.first_name, m.first_name AS manager
FROM employees e
JOIN employees m
ON e.reports_to = m.employee_id
```

## Joining Multiple Tables
FROM 一個主要表 ,用多個 JOIN... ON... 去關聯多個不同資料表。
```sql
SELECT
o.order_id,
o.order_date,
c.first_name,
c.last_name,
os.name AS status
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
JOIN order_statuses os
ON o.status = os.order_status_id
```
## Compound Join Conditions 複合 join 條件
💡需使用 `AND` 連結:
```sql
SELECT *
FROM 表A
JOIN 表B
ON 關聯條件1
AND 關聯條件2
```
像 order_items 這個資料表的 order_id 的值是有重複的,必須連 product_id 也一起看才能辨識唯一一筆紀錄:

打開設計模式選像可以看到 order_id 和 product_id 都是 **Primary Key** ⬇️

## Implicit Join Syntax 隱式關聯語法
就是將 FROM... JOIN... 寫法改成用 FROM 表1, 表2 WHERE... 🔽
```sql
SELECT *
FROM orders o
JOIN customers c
ON o.customer_id = c.customer_id
-- 下面是隱式關聯語法
SELECT *
FROM orders o, customers c
WHERE o.customer_id = c.customer_id
```
⚠️ 注意!**請盡量避免使用隱式關聯語法**,若忘記加上 `WHERE` 判斷會得到 cross join 的結果,若有 10 筆 order 和 10 筆 customer 就會撈出 100 筆資料。
## Outer Joins
- (INNER) JOIN 只會有兩個表交集的結果。
- LEFT / RIGHT (OUTER) JOIN 則會根據關聯條件顯示全部筆數的資料,沒有交集的話值就是 NULL。
這是 INNER JOIN 的用法,我們關聯 `c.customer_id = o.customer_id` 可以撈出有下訂單的顧客。
```sql
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM customers c
JOIN orders o
ON c.customer_id = o.customer_id
ORDER BY customer_id
```

但若我們想撈出所有顧客時就須使用 OUTER JOIN 的方式(將所有 c.customer_id 都撈出來儘管有些人沒下訂單 order_id 為 NULL)
```sql
SELECT
c.customer_id,
c.first_name,
o.order_id
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
ORDER BY customer_id
```

## Outer Join Between Multiple Tables 多個 outer join
```sql
SELECT
c.customer_id,
c.first_name,
o.order_id,
sh.name AS shipper
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
LEFT JOIN shippers sh
ON o.shipper_id = sh.shipper_id
ORDER BY customer_id
```
⚠️ 請一律使用 INNER JOIN 跟 LEDT JOIN 就好,不然很容易出錯看到眼花。
## The USING Clause
當使用 JOIN 時的關聯條件的欄位名稱相同可使用 USING:
```sql
SELECT
o.order.id,
c.first_name
FROM orders o
JOIN customers c
-- ON o.customer_id = c.customer_id
USING (customer_id)
```
複合關聯條件:
```sql
SELECT *
FROM order_items oi
JOIN order_item_notes oin
ON oi.order_id = oin.order_Id AND
oi.product_id = oin.product_id
-- USING (order_id, product_id)
```
## Natural Joins
此語法會自動檢視同名的欄位作為關聯條件,基本上不要用這個,因為沒辦法掌握結果,知道有這個語法就好。
```sql
SELECT
o.order_id,
c.first_name
FROM orders o
NATURAL JOIN customers c
```
## Cross Joins
交集兩個資料表所有欄位的組合。
```sql
SELECT
c.first_name AS customer,
p.name AS product
FROM customers c
CROSS JOIN products p
ORDER BY c.first_name
```
隱式語法如下:
```sql
SELECT
c.first_name,
p.name
FROM customers c, products p
ORDER BY c.first_name
```
盡量別用隱式的用法吧,語意比較不清楚。
## Unions
將不同次查詢資料上下串聯起來成一個結果。
- 不同次 SELECT 的欄位數量必須相同,否則會報錯。
- 執行結果的欄位名稱由第一個下 SELECT 語法的欄位名稱優先。
```sql
SELECT
order_id,
order_date,
'Active' AS status
FROM orders
WHERE order_date >= '2019-01-01'
UNION
SELECT
order_id,
order_date,
'Archived' AS status
FROM orders
WHERE order_date < '2019-01-01';
```

範例二:
```sql
SELECT
customer_id,
first_name,
points,
'Bronze' AS type
FROM customers
WHERE points < 2000
UNION
SELECT
customer_id,
first_name,
points,
'Silver' AS type
FROM customers
WHERE points BETWEEN 2000 and 3000
UNION
SELECT
customer_id,
first_name,
points,
'Gold' AS type
FROM customers
WHERE points > 3000
ORDER BY customer_id
```

# Inserting, Updating, and Deleting Data
varchar(50) VS char(50)
將一欄位屬性設為 varchar(50) 若只有 5可不照順序)
## Inserting a Row 插入一列
```sql
INSERT INTO 資料表 (欄位名稱) -- 欄位名稱是選填選項,盡量寫比較好!
VALUES (欄位值)
```
範例一(不填欄位名稱)
```sql
-- 若省略欄位名稱,值就必須照順序填,欄位屬性若是像 ID 有 AI 屬性,就要填 DEFAULT 值給它
INSERT INTO customers
VALUES (
DEFAULT,
'Michael',
'Jackson',
'1958-08-29',
DEFAULT,
'5225 Figueroa Mountain Rd',
'Los Olivos',
'CA',
DEFAULT
);
```
範例二(有填欄位名稱,這種寫法較優,可跳過有預設值的欄位或可不照順序)
```sql
INSERT INTO customers (
address,
city,
state,
last_name,
first_name,
birth_date,
)
VALUES (
'5225 Figueroa Mountain Rd',
'Los Olivos',
'CA',
'Jackson',
'Michael',
'1958-08-29',
)
```
## Inserting Multiple Rows 插入多列
就是 VALUES 後用多個括號給值:
```sql
INSERT INTO products (name, quantity_in_stock, unit_price)
VALUES ('product1', 1, 10),
('product2', 2, 20),
('product3', 3, 30)
```
## Inserting Hierarchical Rows 插入層級列
本小節主要重點是可以利用 `LAST_INSERT_ID()` 來獲得上一筆插入資料的 ID,可以藉此將此 ID 再插入到另一張資料表裡面,通常是用在有父層子層關係的資料表中,如範例是一個 order 訂單中的一筆資料會有多個 order_item 項目(同一筆 order 可以有很多個 order_item,它們用 order_id 關聯)。
```sql
-- 先在 orders 插入一筆訂單資料
INSERT INTO orders (customer_id, order_date, status)
VALUES (1, '2019-01-01', 1);
-- 可以用 SELECT last_insert_id() 確認一下目前的序數
-- 然後這筆單的關聯條件就是last_insert_id(),包含什麼產品、數量跟單價一起寫入資料庫
INSERT INTO order_items
VALUES
(last_insert_id(), 1, 2, 2.5),
(last_insert_id(), 2, 5, 1.5)
```
<aside>
💡 **注意!`LAST_INSERT_ID()` 只能用在有設置 Auto Incrememt 屬性的欄位上!!**
</aside>
## Creating a Copy of a Table 拷貝一個資料表
### 方法一、`CREATE TABLE 資料表名稱 AS 子查詢語法`
```sql
-- 複製 orders 資料表到 orders_archived
CREATE TABLE orders_archived AS
SELECT * FROM orders
```
只拷貝部分欄位
```sql
CREATE TABLE orders_archived AS
SELECT * FROM orders
WHERE order_date < '2019-01-01'
```
### 方法二、`INSERT INTO 資料表名稱 AS 子查詢語法`
```sql
INSERT INTO orders_archived -- 這邊不用再輸入 (欄位名稱),會直接使用子查詢後的欄位名
SELECT * FROM orders -- 這邊的子查詢對應的是 VALUES (...),(...),(...)
WHERE order_date < '2019-01-01'
```
## Updating rows 更新欄位值
語法
```sql
UPDATE 資料表名稱
SET 要更新的欄位名稱 = 值 / DEFAULT / NULL / 表達式
WHERE 條件
-- 要更新多筆資料就是增加 WHERE 的條件而已
```
範例
```sql
UPDATE invoices
SET
payment_total = 0.5 * invoice_total,
payment_date = NULL
WHERE invoice_id IN (3, 4)
```
## Using Subqueries in Updates 在 Update 中使用子查詢
```sql
UPDATE invoices
SET payment_total = 567, payment_date = due_date
WHERE client_id =
-- 子查詢用括號包起來確保可先執行
(SELECT client_id
FROM clients
WHERE name = 'Yadel');
-- 若子查詢返回多筆資料就要用 IN 而不是 =
WHERE client_id IN
(SELECT client_id
FROM clients
WHERE state IN ('CA', 'NY'))
```
# Summarizing Data
## Aggregate Functions 聚合函數
輸入一系列的值並得到一個結果。
```sql
SELECT
MAX(invoice_total) AS highest,
MIN(invoice_total) AS lowest,
AVG(invoice_total) AS average,
SUM(invoice_total * 1.1) AS total, -- 聚合函數裡面也可放表達式
COUNT(invoice_total) AS number_of_invoices,
COUNT(*) AS total_records, -- 聚合函數會忽略空值,想計算所有欄位可用 * 號
COUNT(payment_date) AS number_of_payments,
COUNT(DISTINCT client_id) AS number_of_distinct_clients
-- DISTINCT client_id 篩選該欄位重複值,再用 COUNT 去計算數量
FROM invoices
WHERE invoice_date > '2019-07-01' -- 用 WHERE 去限縮結果,只看下半年的
```
- 聚合函數會忽略空值
- 聚合函數可放表達式
Exercise
```sql
SELECT
'1st_half_of_2019' AS date_range,
SUM(invoice_total) AS total_sales,
SUM(payment_total) AS total_payments,
SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices
WHERE invoice_date BETWEEN '2019-01-01' AND '2019-06-30'
UNION
SELECT
'2st_half_of_2019' AS date_range,
SUM(invoice_total) AS total_sales,
SUM(payment_total) AS total_payments,
SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices
WHERE invoice_date BETWEEN '2019-07-01' AND '2019-12-31'
UNION
SELECT
'Total' AS date_range,
SUM(invoice_total) AS total_sales,
SUM(payment_total) AS total_payments,
SUM(invoice_total - payment_total) AS what_we_expect
FROM invoices
WHERE invoice_date BETWEEN '2019-01-01' AND '2019-12-31'
```

💡 `SUM(invoice_total - payment_total)` 等於 `SUM(invoice_total) - SUM(payment_total)`
## The GROUP BY Clause
SELECT 出所有銷售
```sql
SELECT
SUM(invoice_total) AS total_sales
FROM invoices
```

那若想知道每個顧客的總銷售呢?用 `GROUP BY` 對列進行分組:
```sql
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
GROUP BY client_id
```

<aside>
💡 默認的順序是依照 `GROUP BY` 進行分組的欄位去排序的。可以再用 `ORDER BY` 去排序 🔽
</aside>
```sql
SELECT
client_id,
SUM(invoice_total) AS total_sales
FROM invoices
-- WHERE invoice_date >= '2019-07-01'
-- 也可以再加篩選語句
GROUP BY client_id
ORDER BY total_sales DESC
```

<aside>
💡 注意!`GROUP BY` 的位置要放在 `FROM`、`WHERE` 之後 `ORDER BY` 之前!
</aside>
## The HAVING Clause
`HAVING` 和 `WHERE` 都是篩選語法,寫法也都一樣,它們的不同同在於:
1. `WHERE` 只能篩選資料本中本來有的欄位,如使用聚合函數過後的欄位就無法用。
2. `HAVING` 後面只能接有被 SELECT 的欄位。
```sql
SELECT
client_id,
SUM(invoice_total) AS total_sales,
COUNT(*) AS number_of_invoices
FROM invoices
GROUP BY client_id
HAVING total_sales > 500 AND number_of_invoices > 5
```
這邊如果將 `HAVING` 替換成 `WHERE` 會報錯,因為原資料表中沒有 total_sales 這欄。
## WITH ROLLUP 運算子 (MySQL 才有的語法)
若 `SELECT` 的欄位有運用到聚合函數,加上 `WITH ROLLUP` 之後的結果會新增一列計算該欄位的總和。
```sql
SELECT
client_id,
SUM(invoice_total)
FROM invoices
GROUP BY client_id WITH ROLLUP
```

# Writing Complex Query
### Subqueries 子查詢
簡單來說就是 `SELECT` 語句中在寫 `SELECT`:
```sql
SELECT *
FROM products
WHERE unit_price > ( -- 記得用括號包起來
SELECT unit_price
FROM products
WHERE product_id = 3
)
```
```sql
-- 查詢薪水大於平均的人
SELECT *
FROM employees
WHERE salary > (
SELECT AVG(salary)
FROM employees
)
```
## The ALL Keyword
`> ALL(SELECT...)` 與 `> (SELECT MAX(column)` 同校,目的都是要找出最大值。
範例:sql_invoicing 資料庫中要選出大於 `id = 3` 的顧客的最大發票金額的所有發票
1. MAX() 寫法
```sql
SELECT *
FROM invoices
WHERE invoice_total > (
SELECT MAX(invoice_total)
FROM invoices
WHERE client_id = 3
)
```
1. ALL() 寫法
```sql
SELECT *
FROM invoices
WHERE invoice_total > ALL (
SELECT invoice_total
FROM invoices
WHERE client_id = 3
)
```
`MAX()` 方法會返回一個顧客 `id = 3` 的最大的發票金額,再判斷哪些發票的金額比這個值大;`ALL()` 方法是先返回顧客 `id = 3`的所有發票金額,是一列值,再用 `ALL()` 判斷比這些金額都大的發票有哪些。
## The ANY Keyword
`> ANY/SOME (SELECT...)` 與 `> (SELECT MIN(column))` 同效
`= ANY/SOME (SELECT...)` 與 `IN (…)` 同效
範例一:sql_invoicing 資料庫中要選出大於 `id = 3` 的顧客的最小發票金額的所有發票
1. ANY() 寫法
```sql
SELECT *
FROM invoices
WHERE invoice_total > ANY (
SELECT invoice_total
FROM invoices
WHERE client_id = 3
)
```
1. MIN() 寫法
```sql
WHERE invoice_total > (
SELECT MIN(invoice_total)
FROM invoices
WHERE client_id = 3
)
```
範例二:選出至少有兩張發票的顧客
1. ANY() 寫法
```sql
SELECT *
FROM clients
WHERE client_id = ANY (
SELECT client_id
FROM invoices
GROUP BY client_id
HAVING COUNT(*) >= 2
)
```
1. IN 寫法
```sql
SELECT *
FROM clients
WHERE client_id IN (
SELECT client_id
FROM invoices
GROUP BY client_id
HAVING COUNT(*) >= 2
)
```
## Correlated Subqueries 相關子查詢
選出 sql_hr.employees 裡那些薪水超過他所隸屬的辦公室的平均薪水的員工:
```sql
SELECT *
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE office_id = e.office_id -- 記得取別名區分
)
```
因為 WHEHR 會去逐列比對去做篩選,所以可以將 e.office_id 想像成從 1 開始一直遞增,類似迴圈的概念,(其中一張表必須要取別名去區分)。
## The EXISTS Operator
`EXISTS (相關子查詢)` 等於 `IN (SELECT...子查詢)` 但 EXISTS 效率較佳。
1. IN 寫法
```sql
SELECT *
FROM clients
WHERE client_id IN (
SELECT DISTINCT client_id
FROM invoices
)
```
2. EXISTS
```sql
SELECT *
FROM clients c
WHERE EXISTS (
SELECT client_id
FROM invoices
WHERE clients_id = c.client_id
)
```
待續...