# SQL Bootcamp 2021: Go from Zero to Hero
> thanks to ['Jose Portilla](https://www.udemy.com/user/joseportilla/)' on Udemy, if you like please buy this course: https://www.udemy.com/course/the-complete-sql-bootcamp/
## Introduction
### Spreadsheets vs Databases
* Spreadsheets
* One-time analysis
* Quickly need to chart something out
* Reasonable data set size
* Ability for untrained people to work with data
* Databases
* Data integrity
* Can handle massive amount of data
* Quickly combine different datasets
* Automate steps for re-use
* Can support data for websites and applications
```csvpreview {header="true"}
Spreadsheets,Databases
Tab, Table
Column, Column
Row, Row
```
### Database Platforms
- PostgreSQL (Free, Open Source)
- MySQL (Free, Open Source)
- MS Sql Server Express (Free but limitations, only Windows)
- Microsoft Access (Cost, not easy to use)
- SQLite (Free, Open Source; cons=command line)
## Statements
### SELECT
- Most common statement used
- Allows to retrieve information from a table
- Example:
~~~~sql
SELECT col1, col2 FROM table_name;
~~~~
- In general, it is not good practice to use an asterisk (*) in the SELECT statement if you don't really net all columns.
- The semicolon is to denote the end of a query but if you delete that semicolon, that would be no problem.
- It doesn't matter if statements (select, fRoM) are capital case or not, both would work.
### SELECT DISTINCT
- Sometimes a table contains a column that has duplicate values and you may find yourself in a situation where you only want to list the unique/distinct values.
- The `DISTINCT` keyword can be used to return only the distinct values in a column.
- The `DISTINCT` keyword operates on a column. The syntax looks like this:
~~~~sql
SELECT DISTINCT col1 FROM table_name;
~~~~
- To clarify which column `DISTINCT` is being applied to, you can also use paranthesis for clarity. Later on, while using more calls such as `COUNT` and `DISTINCT` together, the paranthesis will be necessary.
~~~~sql
SELECT DISTINCT(col1) FROM table_name;
~~~~
### COUNT
- The `COUNT` function returns the number of input rows that match a specific condition of a query.
- We can apply `COUNT` on a specific column or just pass `COUNT(*)` that returns the same result.
- All return the same thing, since the original table had N rows:
- SELECT COUNT(name) FROM table;
- SELECT COUNT(choice) FROM table;
- SELECT COUNT(*) FROM table;
- Because of this `COUNT` by itself simply returns back a count of the number of rows in a table; `COUNT` is much more useful when combined with other commands, such as `DISTINCT`.
- Examples:
~~~~sql
SELECT COUNT(DISTINCT col1) FROM table_name;
~~~~
~~~~sql
SELECT COUNT(DISTINCT customer_id) FROM payment;
~~~~
### SELECT WHERE
- `SELECT` and `WHERE` are most fundamental SQL statements.
- The `WHERE` statement allows to specify conditions on columns for the rows to be returned.
- Basic syntax example:
~~~~sql
SELECT col1, col2
FROM table
WHERE conditions;
~~~~
- The `WHERE` clause apperas immediately after the `FROM` clause of `SELECT` statement.
- The condiions are used to filter the rows returned from the `SELECT` statement.
- PostgreSQL provides a variety of standard operators to construct the conditions.
#### Comparison Operators
```csvpreview {header="true"}
Operator, Description
=,Equal
>,Greater than
<, Less than
>=, Greater than or equal to
<=, Less than or equal to
<> or !=, Not equal to
```
#### Logical Operators
- AND
- OR
- NOT
```csvpreview {header="true"}
Name, Choice
Zach,Green
David,Green
Claire,Yellow
David,Red
```
- Simple Syntax Example -> Get only the people named David with Choice Red
~~~~sql
SELECT * FROM table_1
WHERE Name='David' AND Choice='Red'
~~~~
Note: Use single quotes to define strings.
### ORDER BY
- It can be used to sort rows based on a column value, in either ascending or descending order.
- Basic syntax example:
~~~~sql
SELECT col1, col2
FROM table_1
ORDER BY col_1 ASC
~~~~
- If you leave `ASC`/`DESC` field blank, `ORDER BY` uses `ASC` by default.
- `ORDER BY` can also be used for multiple columns. This makes sense when one column has duplicate entries.
~~~~sql
SELECT col1, col2, col3
FROM table_1
ORDER BY col_1 DESC, col3 ASC
~~~~
- Without requesting in `SELECT` statement, it still can be sorted by any other column.
### LIMIT
- The `LIMIT` command allows to limit the number of rows returned for a query.
- Useful for not wanting to return every single row in a table, but only view the top few rows to get an idea of the table layout.
- `LIMIT` also becomes useful in combination with `ORDER BY`.
- `LIMIT` goes at the very end of a query request and is the last command to be executed.
- Basic syntax example:
~~~~sql
SELECT col1, col2
FROM table_1
ORDER BY col_1 ASC
LIMIT 20
~~~~
### BETWEEN (and NOT BETWEEN)
- Used to match a value against a range of values:
- The `BETWEEN` operator is the same as:
- value >= low `AND` value <= high
- value `BETWEEN` low `AND` high (!! low and high values are included!!)
- `BETWEEN` operator can also be combined with the `NOT` logical operator.
- The ` NOT BETWEEN` operator is the same as:
- value `NOT BETWEEN` low `AND` high
- value `<` low `OR` value `>` high
- `BETWEEN` operator can also be used with dates. Note that you need to format dates in the ISO 8601 standard format, which is YYYY-MM-DD
- date `BETWEEN` '2007-01-01' AND `2007-02-01`
- Basic Syntax Example
~~~~sql
SELECT * FROM payment
WHERE amount BETWEEN a AND b
~~~~
### IN (and NOT IN)
- It can be used to create a condition that checks to see if a value is included in a list of multiple options.
- Example query:
~~~~~sql
SELECT color FROM table_1
WHERE color IN ('red','blue','green')
~~~~~
- These are the same:
~~~~~sql
SELECT * FROM payment
WHERE amount IN (0.99,1.98,1.99)
~~~~~
~~~~~sql
SELECT * FROM payment
WHERE amount = 0.99 OR amount = 1.98 OR amount=1.99
~~~~~
### LIKE and ILIKE
- The `LIKE` operator allows to perform pattern matching against string data with the use of wildcard characters:
- Percent '%': Matches any sequence of characters
- Underscore '_': Matches any single character
- You can use multiple underscores.
- Examples:
- All names that begin with an 'A'
- `WHERE` name `LIKE` 'A%'
- All names that end with an 'a'
- `WHERE` name `LIKE` '%a'
- Notice that `LIKE` is case-sensitive, we can use `ILIKE` which is case-insensitive.
- Get all Mission Impossible films
- `WHERE` title `LIKE` 'Mission Impossible _'
- Imagine we had version string codes in the format 'Version#A4', 'Version#B7', etc..
- `WHERE` value `LIKE` 'Version#__'
- We can also combine pattern matching operators to create more complext patterns such as:
- `WHERE` name `LIKE` '_her%' --> Cheryl, Theresa, Sherri
:::info
POSTGRESQL REGEX
https://www.postgresql.org/docs/12/functions-matching.html
:::
- Some examples:
~~~~sql
SELECT * FROM customer
WHERE first_name LIKE 'J%' AND last_name LIKE 'S%';
SELECT * FROM customer
WHERE first_name LIKE '_her%';
SELECT * FROM customer
WHERE first_name LIKE 'C%'
ORDER BY last_name
~~~~
## AGGREGATE FUNCTIONS
- Most common aggregate functions:
- `AVG()` - returns average value
- `COUNT()` - returns number of values
- `MAX()` - returns maximum value
- `MIN()` - returns minimum value
- `SUM()` - returns the sum of all values
- Aggregate function calls happen only in the `SELECT` clause or the `HAVING` clause.
- Special Notes:
- `AVG()` returns a floating point value many decimal places (e.g. 2.342418...) You can use `ROUND()` to specify precision after the decimal.
- `COUNT()` is simply returns the number of rows, which means by convention we just use `COUNT(*)`
- Some examples:
~~~~sql
SELECT MIN(replacement_cost) FROM film;
SELECT ROUND(AVG(replacement_cost),3) FROM film;
SELECT MIN(replacement_cost), MAX(replacement_cost), ROUND(AVG(replacement_cost),2) FROM film;
~~~~
### GROUP BY
- General syntax:
~~~~sql
SELECT category_col, AGG(data_col)
FROM table_1
GROUP BY category_col
~~~~
- The `GROUP BY` clause must appera right after `FROM` or `WHERE` statement.
- In the `SELECT` statement, columns must either have an aggregate function or be in the `GROUP BY` call.
- Example:
~~~~sql
SELECT company, division, SUM(sales)
FROM finance_table
GROUP BY company, division
~~~~
- `WHERE` statements should not refer to the aggreagation result, later on it will be used by `HAVING` statement.
~~~~sql
SELECT company, division, SUM(sales)
FROM finance_table
WHERE division IN('marketing','transport')
GROUP BY company, division
~~~~
- If you want to sort results base on the aggregate, make sure to reference the entire function.
~~~~sql
SELECT company SUM(sales)
FROM finance_table
GROUP BY company
ORDER BY SUM(sales)
--
SELECT customer_id, SUM(amount) FROM payment
GROUP BY customer_id
ORDER BY SUM(amount) DESC
--
SELECT DATE(payment_date), SUM(amount) FROM payment
GROUP BY DATE(payment_date)
ORDER BY SUM(amount) DESC
--
SELECT customer_id, SUM(amount)
FROM payment
GROUP BY customer_id
ORDER BY SUM(amount) DESC
LIMIT 5
~~~~
### HAVING
- Allows to filter **after** an aggregation has already taken place.
- We have already seen an example below:
~~~~sql
SELECT company, SUM(sales)
FROM finance_table
WHERE company != 'Google'
GROUP BY company
~~~~
But we cannot use `WHERE` to filter based off of aggregate results, becausee those happen **after** a `WHERE` is executed.
- `HAVING` allows to use the aggregate results as a filter along with a `GROUP BY`
~~~~sql
SELECT company, SUM(sales)
FROM finance_table
WHERE company != 'Google'
GROUP BY company
HAVING SUM(sales) >1000
~~~~
- Example: What are the customer ids of customers whp have spent more than $100 in payment transactions with our staff_id number 2?
~~~~sql
SELECT customer_id, SUM(amount) FROM payment
WHERE staff_id=2
GROUP BY customer_id
HAVING SUM(amount)>100
~~~~
- Assessment Test
1. Return the customer IDs of customers who have spent at least $110 with the staff member who has an ID of 2.
~~~~sql
SELECT customer_id, SUM(amount) FROM payment
WHERE staff_id=2
GROUP BY customer_id
HAVING SUM(amount) >= 110
~~~~
2. How many films begin with the letter J?
~~~~sql
SELECT COUNT(*) FROM film
WHERE title LIKE 'J%'
~~~~
3. What customer has the highest customer ID number whose name starts with an 'E' and has an address ID lower than 500?
~~~~sql
SELECT first_name, last_name FROM customer
WHERE first_name LIKE 'E%' AND address_id < 500
ORDER BY customer_id DESC
~~~~
## JOIN
### AS
- It allows to create an "alias" for a column or result.
- Basic syntax:
~~~~sql
SELECT column_1 as new_name
FROM table_1
--
SELECT amount AS rental_price FROM payment
--
Column "amount" would be "rental_price"
~~~~
- The `AS` operator gets executed at the very end of a query, meaning that we cannot use the ALIAS inside a `WHERE` operator.
### INNER JOIN
- Our company is holding a conference for people in the movie rental industry. We will have people register online beforehand and the nlogin the day of the conference.
```csvpreview {header=true}
registration_id,name
1,Andrew
2,Bob
3,Charlie
4,David
```
```csvpreview {header=true}
login_id,name
1,Xavier
2,Andrew
3,Yolanda
4,Bob
```
- An `INNER JOIN` will result with the set of records that match in both tables.
~~~~sql
SELECT * FROM Registrations
INNER JOIN Logins
ON Registrations.name = Logins.name
~~~~
RESULTS
```csvpreview {header=true}
registration_id,name,login_id,name
1,Andrew,2,Andrew
2,Bob,4,Bob
```
- Another example for more clear results
~~~~sql
SELECT registration_id, Logins.name, login_id FROM Registrations
INNER JOIN Logins
ON Registrations.name = Logins.name
~~~~
RESULTS
```csvpreview {header=true}
registration_id,name,login_id
1,Andrew,2
2,Bob,4
```
- Remember that table order won't matter in an `INNER JOIN`
-
- If you see `JOIN` without the `INNER`, PostgreSQL will treat it as an `INNER JOIN`
### OUTER JOIN
- There are few types of `OUTER JOIN`s
- They allow to specify how to deal with values only present in one of the tables being joined.
- Syntax:
#### FULL OUTER JOIN
- Basic syntax
~~~~sql
SELECT * FROM table_A
FULL OUTER JOIN table_B
ON table_A.col_name1 = table_B.col_name2
~~~~
- Table order won't matter in a `FULL OUTER JOIN`
- Example:
```csvpreview {header=true}
registration_id,name
1,Andrew
2,Bob
3,Charlie
4,David
```
```csvpreview {header=true}
login_id,name
1,Xavier
2,Andrew
3,Yolanda
4,Bob
```
---> FULL OUTER JOIN <---
RESULTS
```csvpreview {header=true}
registration_id,name1,login_id,name2
1,Andrew,2,Andrew
2,Bob,4,Bob
3,Charlie,null,null
null,null,1,Xavier
null,null,3,Yolanda
```
#### FULL OUTER JOIN with WHERE
~~~~sql
SELECT * FROM table_A
FULL OUTER JOIN table_B
ON table_A.col_name1 = table_B.col_name2
WHERE table_A.id IS null OR table_B IS null
~~~~
Example:
~~~~sql
SELECT * FROM payment
FULL OUTER JOIN customer
ON payment.customer_id = customer.customer_id
WHERE payment.customer_id IS null OR
customer.customer_id IS null
~~~~
#### LEFT OUTER JOIN
- It results in the set of records that are in the left table, if there is no match with the right table, the results are null.
- Basic syntax
~~~~sql
SELECT * FROM table_A
LEFT OUTER JOIN table_B
ON table_A.col_name1 = table_B.col_name2
~~~~
- Table order will matter in a `LEFT OUTER JOIN`
- Example:
```csvpreview {header=true}
registration_id,name
1,Andrew
2,Bob
3,Charlie
4,David
```
```csvpreview {header=true}
login_id,name
1,Xavier
2,Andrew
3,Yolanda
4,Bob
```
---> LEFT OUTER JOIN <---
RESULTS
```csvpreview {header=true}
registration_id,name1,login_id,name2
1,Andrew,2,Andrew
2,Bob,4,Bob
3,Charlie,null,null
4,David,null,null
```
#### LEFT OUTER JOIN with WHERE
~~~~sql
SELECT * FROM table_A
FULL OUTER JOIN table_B
ON table_A.col_name1 = table_B.col_name2
WHERE table_B.id IS null
~~~~
Example:
~~~~sql
SELECT * FROM payment
FULL OUTER JOIN customer
ON payment.customer_id = customer.customer_id
WHERE payment.customer_id IS null
~~~~
#### RIGHT OUTER JOIN
- It is essentially the same as `LEFT JOIN`, except the tables are switched. This would be the same as switching the table order in a `LEFT OUTER JOIN`
### UNION
- It is used to combine the result-set of two or more `SELECT` statements.
- It basically serves to directly concatenate two results together, essentially "pasting" them together.
- Basic syntax:
~~~~sql
SELECT col1, col2 .. FROM table1
UNION
SELECT col3, col4 .. FROM table2
~~~~
>Practice
+ California sales tax laws have changed and we need to alert our customers to this through email. What are the emails of the customers who live in California?
~~~~sql
SELECT district, email FROM address
INNER JOIN customer
ON address.address_id = customer.address_id
WHERE district = 'California'
~~~~
+ A customer walks in and is a huge fan of the actor "Nick Wahlberg" and wants to know which movie he is in. Get a list of all the movies "Nick Wahlberg" has been in.
~~~~sql
SELECT title, first_name, last_name FROM actor
INNER JOIN film_actor ON actor.actor_id = film_actor.actor_id
INNER JOIN film ON film_actor.film_id = film.film_id
WHERE first_name = 'Nick' AND last_name = 'Wahlberg'
~~~~
## Advanced SQL Commands
- They are more useful when creating a new database rather than querying a database.
### Timestamps and Extract
* TIME - Contains only time
* DATE - Contains only date
* TIMESTAMP - Contains date and time
* TIMESTAMPTZ - Contains date, time, and timezone
~~~~sql
SELECT NOW()
> returns 2021-05-02 00:36:09.161744+03
SHOW TIMEZONE
> returns Europe/Istanbul
SELECT TIMEOFDAY()
> returns Sun May 02 00:48:36.717651 2021 +03
SELECT CURRENT_DATE
> returns 2021-05-02
~~~~
### EXTRACT() & AGE() & TO_CHAR()
- EXTRACT() allows to extract or obtain a sub-component of a date value.
- AGE() calculates and returns the current age given a timestamp.
- TO_CHAR() is useful for timestamp formatting.
~~~~sql
EXTRACT(YEAR FROM date_col)
> returns 2020
AGE(date_col)
> returns 13 years 1 mon 5 days 01:34:13.003423
TO_CHAR(date_col,'mm-dd-yyyy')
> returns '05-02-2020'
~~~~
### Mathematical Functions and Operators
### String Functions and Operators
### SubQuery
### Self-Join