owned this note
owned this note
Published
Linked with GitHub
Databases and SQL for Data Science with Python <br> WEEK_3 - Intermediate SQL
====
###### tags: `IBM Data Engineering Professional Certificate`,`Reading Note`,`Coursera`,`Databases and SQL for Data Science with Python`
### Overview
>* 介紹 WHERE 的LIKE, BETWEEN, OR 提取資料。
>* 介紹 ORDER BY 排序資料。
>* 介紹 GROUP BY 的AS, HAVING 為資料進行分類。
>* 介紹 Built in Function 的 SUM, MAX/MIN, AVG, ROUND,LENGTH,UCASE/LCASE 的功能。
>* 介紹 Date & Time 的資料類型以及時間的函數。
>* 介紹 Sub-Queries 和 Nested Selects 的資料提取功能。
>* 提取多個 tables 的資料。
>* 介紹 join 多個 tables。
<br>
## Refining your Results
### 1. Using String Patterns and Ranges
* Retrieving rows - using a String Pattern
* WHERE...LIKE
* 使用LIKE搜尋String Pattern
```SQL=
SELECT <column_name> FROM <table_name>
WHERE <column_name> LIKE <string_pattern>
```
* 使用LIKE的範例:
```SQL=
SELECT firstname FROM Author
WHERE firstname LIKE 'R%'
```
* Retrieving rows - using a Range
* WHERE...BETWEEN
* 使用rows的區間提取資料
```SQL=
SELECT <column_name> FROM <table_name>
WHERE <column_name> BETWEEN <num> AND <num>
```
* 使用BETWEEN的範例:
```SQL=
SELECT title, pages FROM Book
WHERE pages BETWEEN 290 AND 300
```
* Retrieving rows - using a Set of Values
* WHERE...OR
* 使用OR搜尋更多關鍵字
```SQL=
SELECT <column_name> FROM <table_name>
WHERE <column_name> = 'keyword 1' OR <column_name> = 'keyword 2'
```
* 使用OR的範例:
```SQL=
SELECT fisrtname, lastname, country FROM Author
WHERE country = 'AU' OR country = 'BR'
```
<br>
### 2. Sorting Result Sets
* Using the ORDER BY clause
* ORDER BY
* 使用 ORDER BY 為 column 中的資料進行排序
```SQL=
SELECT <column_name> FROM <table_name>
ORDER BY <column_name>
```
* ORDER BY clause - Descending Order
* ORDER BY...DESC / ASC
* 使用 ORDER BY 為 column 中的資料進行遞減或遞增排序
```SQL=
SELECT <column_name> FROM <table_name>
ORDER BY <column_name> DESC
```
* Specifying Column Sequence Number
* ORDER BY
* 使用範例
```SQL=
SELECT title, pages FROM Book
ORDER BY 2
```
OUTPUT :

<br>
### 3. Grouping Result Sets
* GROUP BY clause
* GROUP BY
* 使用 GROUP BY 為資料進行分類
```SQL=
SELECT <column_name> FROM <table>
GROUP BY <column_name>
```
* 使用 GROUP BY 的範例
```SQL=
SELECT country, count(country) FROM Author
GROUP BY country
```
OUTPUT :

* AS
* 在 SELECT 查詢可以使用 AS 關鍵字為查詢結果取一個別名
```SQL=
SELECT country, count(country) AS Count FROM Author
GROUP BY country
```
OUTPUT :

* HAVING
* 在 SELECT 查詢中使用 HAVING 来指定 GROUP BY 的过滤条件
```SQL=
SELECT country, count(country) AS Count FROM Author
GROUP BY country HAVING COUNT(country) > 4
```
OUTPUT :

<br>
## Functions, Multiple Tables, and Sub-queries
### 1. Built-in Database Functions
* Built-in Function
* 提升資料的處理能力
* 減少提取資料的數量
* Aggregate or Column Functions
* SUM Function:
* 把整個Column的數值做相加
```SQL=
SUM(column_name) FROM <table_name>
```
* SUM的使用範例
```SQL=
SELCT SUM(COST) FROM PETRESCUE
```
OUTPUT :

* 為輸出結果取別名
```SQL=
SELCT SUM(COST) AS SUM_OF_COST
FROM PETRESCUE
```
OUTPUT :

* MIN, MAX
* MAX / MIN
* 回傳在 Column 裡最大的數值
```SQL=
SELECT MAX(column_name) FROM <table_name>
```
* MIN 與 MAX 的 syntax 一樣的寫法
* 有條件搜尋 MAX / MIN 的值
* 加 WHERE
```SQL=
SELECT MIN(column_name) FROM <table_name>
WHERE <column_name> = 'keyword'
```
* Average
* AVG
* 計算column的平均值
```SQL=
SELECT AVG(column_name) FROM <table_name>
```
* 可以處理Column之間的數學運算
* 以計算平均一隻狗的價格為範例
```SQL=
SELECT AVG(COST / QUANTITY) FROM PETRESUE
WHERE ANIMAL = 'Dog'
```
OUTPUT:

* SCALAR and STRING FUNCTIONS
* ROUND(X, D)
* 回傳x舍入到最接近的整數。如果第D參數,則函數回傳x四捨五入至第D位小數點
```SQL=
SELECT ROUND(column_name, integer) FROM <table_name>
```
* LENGTH
* 回傳字串的長度
```SQL=
SELECT LENGTH(column_name) FROM <table_name>
```
* LENGTH 可以直接匯入字串
```SQL=
SELECT LENGTH('string')
```
* UCASE, LCASE
* UCASE / LCASE
* 把字串轉為大或小寫
```SQL=
SELECT UCASE(column_name) FROM <table_name>
```
<br>
### 2. Date and Time Built-in Functions
* 多數的數據庫會提供時間資料類型

DATE:
>Y : Year
>M : Month
>D : Day
TIME
>H : Hour
>M : Minute
>S : Seconds
TIMESTAMP
>Y : Year
>X : Month
>D : Day
>H : Hour
>M : Minute
>S : Seconds
>Z : Microseconds
* 關於時間與日期的函數
* YEAR()
* 回傳日期的年份
* MONTH()
* 回傳日期的月份
* DAY()
* 回傳日期的天
* DAYOFMONTH()
* 回傳日期月份中的天,回傳值範圍會在0至31之間
* DAYOFWEEK()
* 回傳日期的工作索引值,如:SUNDAY為1, MONDAY為2,SATURDAY為7
* DAYOFYEAR()
* 回傳一年內的第幾天
* WEEK()
* 回傳一年內的第幾週
* HOUR()
* 回傳時間的小時部分,範圍為0至23
* MINUTE()
* 回傳時間的分鐘部分,範圍為0至59
* SECOND
* 回傳時間的秒,範圍為0至59
<br>
### 3. Sub-Queries and Nested Selects
* SQL 查詢語在另一個 SQL 查詢語句中
* 錯誤示範
```SQL=
SELECT * FROM employees
WHERE salary > AVG(salary)
```
OUTPUT :

* 無法輸出正常結果
* 使用 Sub-query 完成任務
```SQL=
SELECT EMP_ID, F_NAME, L_NAME, SALARY
FROM employees
WHERE SALARY < (SELECT AVG(SALARY) FROM employees);
```
OUTPUT:

* Sub-queries in list of columns
* 為 Sub-queries 取別名,錯誤示範
```SQL=
SELECT EMP_ID, SALARY, AVG(SALARY) AS AVG_SALARY
FROM employees;
```
* 正確的寫法
```SQL=
SELECT EMP_ID, SALARY,
(SELECT AVG(SALARY) FROM employees)
AS AVG_SALARY
FROM employees;
```
OUTPUT:

* Sub-queries in FROM clause
* Derived Tables / Table Expressions
```SQL=
SELECT * FROM
(SELECT EMP_ID, F_NAME, L_NAME, DEP_ID FROM emplyees) AS EMP4ALL;
```
OUTPUT :

<br>
### 4. Working with Multiple Tables
#### 1. Accessing Multiple Tables with Sub-queries
* 這部分有點複雜,所以會用範例的方式講解:

* 提取 DEPARTMENTS table 與 EMPLOYEES table 有交集的資料:
```SQL=
SELECT * FROM employees
WHERE DEP_ID IN (SELECT DEPT_ID_DEP FROM departments);
```
* 提取 EMPLOYEES table 與 DEPARTMENTS table 的 location ID 有交集的資料
```SQL=
SELECT * FROM employees
WHERE DEP_ID IN
(SELECT DEPT_ID_DEP FROM departments WHERE LOC_ID = 'L0002');
```
OUTPUT:

* 提取 department ID 和 employees name, 薪資待遇有超過 70000
```SQL=
SELECT DEPT_ID_DEP, DEP_NAME FROM departments
WHERE DEPT_ID_DEP IN
(SELECT DEP_ID FRPM employees WHERE SALARY > 70000);
```
#### 2. Accessing multiple tables with Implicit Join
* Full join / Cartesian join
```SQL=
SELECT * FROM <table_name_1>, <table_name_2>;
```
* SQL 範例
```SQL=
SELECT * FROM employees, departments;
```
OUTPUT:

* 提取部分資料
```SQL=
SELECT * FROM employees, departments
WHERE employees.DEP_ID = departments.DEPT_ID_DEP;
```
* 簡寫版本
```SQL=
SELECT * FROM employees E, departments D
WHERE E.DEP_ID = D.DEPT_ID_DEP;
```
OUTPUT:

* 提取每位 employee 的 department name
```SQL=
SELECT EMP_ID, DEP_NAME FROM employees E, departments D
WHERE E.DEP_ID = D.DEPT_ID_DEP;
```
OUTPUT:

## Summary & Highlights
* 課程完整整理的內容,所以把它記錄下來
* You can use the WHERE clause to refine your query results.
* You can use the wildcard character (%) as a substitute for unknown characters in a pattern.
* You can use BETWEEN ... AND ... to specify a range of numbers.
* You can sort query results into ascending or descending order, using the ORDER BY clause to specify the column to sort on.
* You can group query results by using the GROUP BY clause.
* Most databases come with built-in functions that you can use in SQL statements to perform operations on data within the database itself.
* When you work with large datasets, you may save time by using built-in functions rather than first retrieving the data into your application and then executing functions on the retrieved data.
* You can use sub-queries to form more powerful queries than otherwise.
* You can use a sub-select expression to evaluate some built-in aggregate functions like the average function.
* Derived tables or table expressions are sub-queries where the outer query uses the results of the sub-query as a data source.