# 惱人的 SQL

# Select
`select column_name from table_name`
語法:
* `''` 表示 string
關鍵字:
* distinct - 去除重複值:`select distinct column from table`
* and, or, not, `!=, <>` - 邏輯運算子
* limit - 限制 # of rows return from the query,常和 order by 一起用
* order by - 搭配 DESC, ASC(預設)
* 舉例:`select address, name, id from student order by name, address`
* between - match a value against a range of values 有包含性
* 留意:'2007-02-014' 預設後面的毫秒千秒等等的是 0
* in - create a condition that checks to see if a value in included in a list of multiple options,也可以搭配 not
* 舉例:`select column from table where columns in(property_1, property_2)`
* like and unlike - using pattern matching. like 是 case-sensitive 可以用 ilike 去達成 insensitive. postgresql 也有支援 regex
* like 搭配 % match any sequences of character
* liek 搭配 _ match any single character
* as - 可以增加可讀性,尤其是和 aggregate function 搭配。因為他在 query 最後才被執行,所以不能使用 alias inside a where operator or a having operator to filter by
* union - combine the result-set of two or more select statements
* `select column_name1 from table1 union select column_name2 from table2`
舉例:



`select * from customer
where first_name ilike '%er%'` 任何有 er 在 first_name 中出現的
內建 function:
* count - return the # of input rows that match the query
* 舉例:`select count(distinct name) from actor`
# Group by statement
group by 的用途是在 aggreate data and apply functions 目的是讓我們更好地理解
aggregate functions:
* idea:將多個 input 變成一個 output
* 舉例:`AVG(), COUNT(), MIN(), MAX(), SUM()`
* 使用在 select clause or having clause
舉例
1. AVG() - 回傳為 decimal, 可以透過 ROUND() 去 format
## Group by
用法:選定 categorical column to group by
底層運作:將表依據你選定的 categorical column 去分組,透過 aggregate function 將分組後的欄位去做相對應的運算 (take multiple values reduce them into one single value)
```sql
select categorical_col, AGG(data_col)
from table
where categorical_col != 'A' -- 和 aggregate 完的結果無關
group by AGG(data_col)
```
在 SQL 的 SELECT 語句中,如果你使用了聚合函數(如SUM、COUNT、AVG等),那麼在 SELECT 語句中出現的列(columns)就必須是聚合函數,或者是 GROUP BY 子句中列出的列。這是因為聚合函數是針對整個分組(group)進行計算的,而不是針對每個單獨的行進行計算的。如果你想要在 SELECT 語句中列出其他列,並且不是聚合函數或 GROUP BY 子句中列出的列,那麼就會報錯,因為這樣的列沒有定義計算的規則。
```sql
select customer_id, SUM(amount)
from payment
group by customer_id -- total sum amount per customer_id
order by SUM(amount) desc
select customer_id, count(amount)
from payment
group by customer_id -- transcation occurr per customer_id
order by count(amount) desc
select customer_id, staff_id, sum(amount)
from payment
group by customer_id, staff_id
order by customer_id, staff_id
-- the totam amount spend per staff per customer
select date(payment_date), sum(amount) from payment
group by date(payment_date)
-- how much it process each day
select round(avg(replacement_cost), 2), rating
from film
group by rating
```
處理 group by date 要做轉換, 目的是捨去 timestamp 後面的時分秒
## Having
用途: filter 已發生的 aggregation
(可以想成 aggregate statement 後的 where)
```sql
select customer_id, sum(amount)
from payment
where customer_id not in(184, 87, 477)
group by customer_id
having sum(amount) < 100
select store_id, count(customer_id) -- 語意較好
from customer
group by store_id
having count(customer_id) > 300
select customer_id, count(*)
from payment
group by customer_id
having count(*) >= 40
select customer_id, sum(amount)
from payment
where staff_id = 2
group by customer_id
having sum(amount) > 100
```
# JOINS

[A Visual Explanation of SQL Joins
](https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/)
## Inner Join
An Inner Join will result the set of records that match in both tables. only grab the rows that happen to be in both table A and table B.

1. talbe order 不重要:從表 A 去 join 表 B,跟從表 B 去 join 表 A 得到的結果是一樣的
2. postgres join 預設為 inner join
## Outer Join
目的:how to deal with values only present in one of the table being joined
1. Full outer Join: 可替換 table 的順序,因為他是對稱的。


> full outer join with where - get rows unique to either table(rows not found in both table)

```sql=
select * from customer
full outer join payment
on customer.customer_id = payment.customer_id
where customer.customer_id is null
or payment.customer_id is null
```
2. left outer join: order matters
> left outer join with where - get row unique with left table

3. right outer join: same as left join expect the tables are switched


# 資料庫對應關係
[「筆記」- 資料關聯的三種關係](https://medium.com/pierceshih/%E7%AD%86%E8%A8%98-%E8%B3%87%E6%96%99%E9%97%9C%E8%81%AF%E7%9A%84%E4%B8%89%E7%A8%AE%E9%97%9C%E4%BF%82-245152c093da)
# SubQuery
> perform a query on the results of another query.
The syntax involves two select statements
1. subquery 括號裡面的會先執行,一執行完畢就回傳結果,sql 就執行外部的 query
2. 可以搭配 in operator in conjuction with a subquery to check against multiple results returned
```sql
-- 可以透過 join 也可以用 subquery
select student, score from
test_scores
where student in(
select student from honer_roll_table
)
```
3. The Exists operator is used to test for existence of rows in a subquery. Typically a subquery is passed in the Exists() to check if any rows are returned with the subquery.
```sql
select column_name from table_name
where exists
(select column_name from table_name
where condition
)
```
如果 subquery 回傳的是 multiple value 你應該要使用 in operator
```sql
select title, rental_rate
from film
where rental_rate >
(select avg(rental_rate) from film)
order by rental_rate desc
-- grab film title between certain dates
select film.film_id, film.title
from film where film.film_id in (
select inventory.film_id from rental
inner join inventory on inventory.inventory_id = rental.inventory_id
where return_date
between '2005-05-29' and '2005-05-30')
-- find customers at least have one payment whose amount is greater than 11
select first_name, last_name
from customer as c where not exists(
select * from payment as p
where p.customer_id = c.customer_id
and amount > 11
)
```
# self join

名詞解釋:A self-join is a query in which a table is joined to itself 可以被看作兩個相同的表去做關聯。The table is not actually copied, but SQL performs the command as though it were.
用途:compare values in a column of rows within the same table
性質:
1. use alias for the table ow the table name would be ambiguous
用途:

```sql
select emp.name, rep.name as rep
from employees as emp
join employees as rep on
emp.emp_id = rep.report_id
```
透過這個 JOIN 子句,您可以將 Employees 表格自己連接到自己,其中 emp_id 列被連接到 report_id 列。這表示,每個員工報告給另一個員工,而 emp_id 是報告人的 ID,而 report_id 是該報告人報告給的人的 ID。
因此,您需要在 JOIN 子句中將 emp_id 列連接到 report_id 列,以便能夠正確匹配員工和他們的報告對象。如果您將 emp_id 列連接到 emp_id 列,那麼您將只會找到員工報告給自己,而不是他們實際上報告給的人。
```
-- find all the pair of film that have the same length
select f1.title, f2.title, f2.length
from film as f1
inner join film as f2
on f1.film_id != f2.film_id
and f1.length = f2.length
```
## QA
> What's difference between subquery and join ?
子查询是在一個查詢語句內嵌套另一個查詢語句,而 JOIN 是通過將兩個或多個表的行匹配來組合數據。
子查询返回一個單獨的值或一個結果集,而 JOIN 返回由兩個或多個表的結合行組成的結果集。
子查询可以在 SELECT、INSERT、UPDATE 和 DELETE 陳述中使用,而 JOIN 只能在 SELECT 陳述中使用。
當你需要使用多個表或子集進行查詢時,可以考慮使用 JOIN。JOIN 提供了更靈活和有效的方式來組合表中的數據。通常情況下,當你需要選擇和組合多個表中的數據時,JOIN 更為適用。
而子查询通常適用於需要使用特定條件過濾數據的情況,例如查找具有某些屬性的行,或者查找與特定值相關聯的行。當你需要在 WHERE 子句中使用特定條件來過濾數據時,子查询是更好的選擇。
# Other Functions
## Timestamps and Extract
data type:
1. time: contains only time
2. date: contains only date
3. timestamp: contains date and time
4. timestamptz: contains date, time, and timezone
functions and operations:
1. timezone
2. now
3. timeofday
4. current_time
5. current_date
```sql;
show timezone
select now()
select timeofday()
select current_time
select current_date
```
擷取資料常用函數
1. extract(): extract or obtain a sub-component of a date value, `EXTRACT(field FROM source)`,用於從日期/時間值中提取指定的元素,例如年、月、日、小时、分钟、秒等等。
* year: `extract(year from date_col)`
* month
* day
* week
* ouarter
3. age(): calculates and returns the current age given a timestamp
* age(date_col) 回傳 2 years 1 mon 5 day 11.32.13.003423
5. to_char(): convert data type to text. 通常會用在 timestamp 的 format 上
6. dow(): 日期/時間函數,用於獲取指定日期的星期幾。`SELECT EXTRACT(DOW FROM TIMESTAMP '2001-02-16 20:38:40');
`
```sql
select
extract(year from payment_date) as payment_year,
extract(month from payment_date) as payment_month,
age(payment_date),
to_char(payment_date, 'YYYY/MM/DD') as payment_date
from payment
select count(*) as payment_day from payment
where to_char(payment_date, 'ID') = '1'
```
# Mathamatical functions and operators
Left: `LEFT(string, length)`,是一個選取字串左邊的 function
```sql
select
upper(first_name) || ' ' || lower(last_name) as full_name,
length(first_name),
lower(left(first_name, 1)) || last_name || '@gmail.com' as new_mail
from customer
```
# HW
[sql HW2](https://docs.google.com/document/d/1wiuYbTQslmfolQWgeVPB356csjK6yqOUBhgC7fM44o8/edit)
```sql
-- Q01
select * from cd.facilities
-- Q02
select
distinct fc.name, fc.membercost
from cd.facilities fc
-- Q03
select
distinct *
from cd.facilities fc
where fc.membercost > 0
-- Q04
select
distinct fc.facid,
fc.name, fc.membercost, fc.monthlymaintenance
from cd.facilities fc
where fc.membercost > 0 and fc.membercost < (fc.monthlymaintenance / 50.0)
-- Q05
select * from cd.facilities fc
where fc.name like '%Tennis%'
-- Q06 使用IN運算子,可以在一個查詢中同時搜尋多個值,而不需要使用OR運算子。
select * from cd.facilities fc
where fc.facid in (1, 5)
-- Q07 也可以使用 date function ex: Date('2012-09-01')
select memid, surname, firstname, joindate
from cd.members
where to_char(joindate, 'YYYY-MM-DD') >= '2012-09-01'
-- Q08
select distinct surname from cd.members
order by surname
limit 10
-- Q09 也可以透過 MAX() MIN() 去解題
select mb.joindate from cd.members mb
order by mb.joindate desc
limit 1
-- Q10
select count(*) from cd.facilities fc
where fc.guestcost > 10
-- Q11 per statement 也就是要用 group by
select sum(slots), bk.facid
from cd.bookings bk
inner join cd.facilities fc
on fc.facid = bk.facid
where to_char(bk.starttime, 'YYYY-MM') = '2012-09'
group by bk.facid
order by sum(slots) desc
-- Q12
select fc.facid, sum(bk.slots) from cd.bookings bk
inner join cd.facilities fc
on fc.facid = bk.facid
group by fc.facid
having sum(bk.slots) > 1000
order by fc.facid
-- Q13
select
cd.bookings.starttime as start,
cd.facilities.name
from cd.bookings
inner join cd.facilities
on cd.bookings.facid = facilities.facid
where cd.facilities.name like '%Tennis Court%'
and to_char(cd.bookings.starttime, 'YYYY-MM-DD') = '2012-09-21'
order by cd.bookings.starttime
-- Q14
select count(*) from cd.members
inner join cd.bookings
on cd.bookings.memid = cd.members.memid
where firstname = 'David' and surname = 'Farrell'
```
# PK and FK
## PK
* 名詞解釋: columns or a group of columns used to identify a row uniquely in a table
* 用途:去 join 其他表
## FK
* 名詞解釋: field or a group of fields in a table that uniquely identify a row in other table
* 用途:定義自己這張表如何跟其他表的 PK 去做參照
# Constraints
* 名詞解釋:表格欄位定義的規則,避免無效的資料進到 DB
* 用途:建立表或定義欄位時,可以透過 constraints 去設定為 PK 或建立 FK 跟其他表的關係
* table constraints & column constraints
common constraints
1. not null
2. unique
3. check: ensures all values in the columns satisfy certain conditions
4. exclusion: 如果任意兩個 rows 跟特定欄位或特定運算子形成的表達式做比較,要使用它。not all of the comparisons will return true.
table constraints
* 名詞解釋: columns or a group of columns used to identify a row uniquely in a table
* 用途:去 join 其他表
* 特質:unique & not null


舉例:Each row for PK is uniquely ; constraints data based on columns on other table
# Create DB and tables
## Create
```sql
Create Table table_name(
column_name Type column_constraint,
column_name Type column_constraint,
table_constraint, table_constraint
) INHERITS existing_table_name
create table job(
job_id serial primary key,
job_name varchar(50) unique not null
)
-- 使用中間表關聯兩張表
create table account_job(
user_id integer references account(user_id),
job_id integer references job(job_id),
hire_date timestamp
)
```
serial - if the row is removed, the column with the serial data type will not adjust 作為表的主鍵,如果是要說跟其他表的關聯,就用 integer
```sql
create table account(
user_id serial primary key,
name varchar(50) unique not null,
password varchar(50) not null,
email varchar(200) unique not null,
created_on timestamp not null,
last_login timestamp
)
```
## Insert
目的:add a row into table
the inserted row values must match up for the table, including constraints. serial columns don't need to be a provided a value.
```sql
insert into table(column1, column2)
values (value1, value2), (valu1, value2)....
insert into table(column1, column2)
select colum1, column2
from another table where condition ...
```
```sql
-- Example
insert into account(
name, password, email, created_on
) values ('Angela', 'password', 'angela@gmail.com', current_timestamp)
insert into job(
job_name
) values('president')
insert into account_job(
user_id, job_id, hire_date
) values(1, 1, current_timestamp)
```
## Update
目的:allows us for the changing of values of the columns in the table
```sql
update table
set colum1 = value1
column2 = value2
where condition
```
```sql
-- update join
update tableA
set origin_col = tableB.new_col
from tableB
where tableA.id = tableB.id
-- return affected rows
update account
set last_login = created_on
returning account_id, last_login
-- update join
update account_job
set hire_date = account.created_on
from account
where account.user_id = account_job.user_id
```
## Delete
用途:remove rows from a table
```sql
-- delete join
delete from table_A
using table_B
where table_A.id = table_B.id
-- delete all rows from a table
delete from table
```
可以搭配 returning 關鍵字,去得到我們移除的 rows 是什麼
```sql
insert into job(job_name)
values('Doctor')
delete from job
where job_name = 'Doctor'
returning job_id, job_name
```
## Alter
用途:change for existing table structure
1. add, drop, or rename columns
2. chage a column's data type
3. set default value for a column
4. add check constraint
5. rename table
語法:
```sql
alter table table_name action
```
```sql
-- add columns
alter table table_name
add column new_col type
-- remove columns
alter table table_name
drop column col_name
-- alter constraint
alter table table_name
alter column col_name
set default value
alter table table_name
alter column col_name
drop default
alter table table_name
alter column col_name
set not null
alter table table_name
alter column col_name
add constraint constraint_name
```
```sql!
create table information(
info_id serial primary key,
title varchar(500) not null,
person varchar(50) not null unique
)
-- rename table
alter table information
rename to new_information
-- rename column
alter table new_information
rename person to people
-- remove constraint for a column
alter table new_information
alter column people drop not null
```
## Drop
用途:completely remove a column in a table
在 postgresql 裏面,會自動移除這個欄位相關的 constraints and indexes
**However, it will not remove columns used in views, triggers, or stored procedures without the additional CASCADE clause**
```sql!
-- general syntax
alter table table_name
drop column col_name
-- remove all dependencies
alter table table_name
drop column col_name cascade
-- check for existence for avoiding error
alter table table_name
drop column if exists col_name
-- drop multiple columns
alter table table_name
drop column col_one,
drop column col_two
```
NOTICE: column "people" of relation "new_information" does not exist, skipping
ALTER TABLE
```sql!
alter table new_information
drop if exists people
-- Query returned successfully in 45 msec.
```
## Check
用途:create constraints that adhere to certain conditions.
```sql!
-- ex
create table example(
id serial primary_key,
age smallint check (age > 21),
parent_age smallint check (parent_age > age)
)
```
# Conditional Expressions and Procedures
目的:增加邏輯和 workflow 到 sql 裡面去
## Case
The main difference is that the case expression syntax first evaluates an expression then compares the result with each value in the when clauses sequentially.
```sql!
-- general case statement
case
when condition1 then result1
when condition2 then result2
else some_other_result
as column_name
end
-- case expression
case expression
when condition1 then result1
when condition2 then result2
else some_other_result
as column_name
end
```
```sql
select customer_id,
case
when ( customer_id <= 100 ) then 'Premium'
when ( customer_id between 100 and 200) then 'Plus'
else 'Normal'
end as customer_class
from customer
select customer_id,
case customer_id
when 2 then 'Winner'
when 5 then 'Second Place'
else 'Normal'
end raffle_results
from customer
select sum(
case rental_rate
when 0.99 then 1
else 0
end
) as num_of_bargain_movies,
sum(
case rental_rate
when 2.99 then 1
else 0
end
)as num_of_regular_movies,
sum(
case rental_rate
when 4.99 then 1
else 0
end
)as num_of_premium_movies
from film
select
sum(
case rating
when 'R' then 1
else 0
end
) as r,
sum(
case rating
when 'PG' then 1
else 0
end
) as pg,
sum(
case rating
when 'PG-13' then 1
else 0
end
) as pg13
from film
```
## Coalesce
定義:Coalesce function accepts an unlimited number of arguments. It returns the first argument that is not null. 如果所有引述都為 null 就回傳 null
`Coalesce(argument_1, argument_2, ...)`
通常這些引數為 col_name
用途:表如果有 null,又想要替換 null 為其他的值,例如 `
select item, (price - coalesce(discount, 0) as final_price from table`
## Cast
定義:cast operator let's you convert from one data type into another
並不是任意的資料型態都可以透過 cast 被轉成任意的資料型態,他必須要合理。
```sql!
select
cast('5' as integer), -- cast function
'3'::integer -- cast operator
```
用途:select query with a column name instead of a single instance
```sql!
select cast(date as timestamp) from table
select char_length(inventory_id::varchar)
from rental
```
## NULLIF
定義:NULLIF function take two arguments, and returns null if both are equal ow it returns the first argument passed
用途:Null 會在某些 query 造成無法預期的結果或錯誤,這時 NULLIF 就派上用場了
```sql!
NULLIF(arg_1, arg_2)
-- example
create table department(
first_name varchar(50),
department varchar(50)
)
insert into department(
first_name,
department
)
values
('Tom', 'A'),
('Peter', 'A'),
('Nick', 'B');
select (
sum(
case
when department = 'A' then 1
else 0
end
) /
sum (
case
when department = 'B' then 1
else 0
end
)
) as depart_ratio -- 2
from department
-- 如果 department = 'B' 裡面的人都離開了,這個 ratio 就會出錯
delete from department where department = 'B'
-- 使用 nullif 的時機到了
select (
sum(
case
when department = 'A' then 1
else 0
end
) /
nullif(
sum (
case
when department = 'B' then 1
else 0
end
), 0
)
) as depart_ratio
from department
```
## Views
將常用到的 query 存起來

示意:

1. A view is a database object that is of a stored query
2. A view can be accessed as a virtual table in postgresql
3. A view simply stores query
4. 也可以針對既有的 view 去 update or alter
```sql!
-- 語法
create view view_name as query
create view customer_info as
select first_name, last_name, address from customer
inner join address
on customer.address_id = address.address_id
-- 要異動 view 裡面的內容
create or replace view customer_info as
select first_name, last_name, address, district from customer
inner join address
on customer.address_id = address.address_id
-- remove view
drop view if exists customer_info
-- change name of view
alter view customer_info rename to c_info
```
## Import and Export
用途:import data from a.csv file to an already existing table
檔案路徑要正確且檔案裡面的資料型態要可以被 sql 正確解讀,這樣 import 才到 db 才會正常

[COPY](https://www.postgresql.org/docs/15/sql-copy.html)
import command 並沒有幫你創建表,他假設使用前你已經自己建好了
```sql!
-- 先建立表
create table simple(
a integer,
b integer,
c integer
)
```
[Can I automatically create a table in PostgreSQL from a csv file with headers?](https://stackoverflow.com/questions/21018256/can-i-automatically-create-a-table-in-postgresql-from-a-csv-file-with-headers)
[How to import and export data using CSV files in PostgreSQL
](https://www.enterprisedb.com/postgres-tutorials/how-import-and-export-data-using-csv-files-postgresql)
[How to import CSV file data into a PostgreSQL table](https://stackoverflow.com/questions/2987433/how-to-import-csv-file-data-into-a-postgresql-table)
# 完食碎念時間
好感人,終於完食了 SQL 初階課程。雖然很久以前也在 cocademy 上練習過 sql 但果然沒用到的東西,過不久就會通通忘記。因為需求又不得不回歸 sql 的懷抱,完食初級就要到了進階版的 advanced SQL。
這週的課程很基礎,學習上沒有太大的壓力,也有發現一些有趣的小東西,跟有在專案裡看過的老朋友們。
希望下週的進食,不會吃到想吐出來,大吼大叫表示:腦子你為什麼要學這麼慢 ORZ。
Q182 done