# SQL Bootcamp 2021: Go from Zero to Hero > thanks to ['Jose Portilla](https://www.udemy.com/user/joseportilla/)' on Udemy, if you like please buy this course: https://www.udemy.com/course/the-complete-sql-bootcamp/ ## Introduction ### Spreadsheets vs Databases * Spreadsheets * One-time analysis * Quickly need to chart something out * Reasonable data set size * Ability for untrained people to work with data * Databases * Data integrity * Can handle massive amount of data * Quickly combine different datasets * Automate steps for re-use * Can support data for websites and applications ```csvpreview {header="true"} Spreadsheets,Databases Tab, Table Column, Column Row, Row ``` ### Database Platforms - PostgreSQL (Free, Open Source) - MySQL (Free, Open Source) - MS Sql Server Express (Free but limitations, only Windows) - Microsoft Access (Cost, not easy to use) - SQLite (Free, Open Source; cons=command line) ## Statements ### SELECT - Most common statement used - Allows to retrieve information from a table - Example: ~~~~sql SELECT col1, col2 FROM table_name; ~~~~ - In general, it is not good practice to use an asterisk (*) in the SELECT statement if you don't really net all columns. - The semicolon is to denote the end of a query but if you delete that semicolon, that would be no problem. - It doesn't matter if statements (select, fRoM) are capital case or not, both would work. ### SELECT DISTINCT - Sometimes a table contains a column that has duplicate values and you may find yourself in a situation where you only want to list the unique/distinct values. - The `DISTINCT` keyword can be used to return only the distinct values in a column. - The `DISTINCT` keyword operates on a column. The syntax looks like this: ~~~~sql SELECT DISTINCT col1 FROM table_name; ~~~~ - To clarify which column `DISTINCT` is being applied to, you can also use paranthesis for clarity. Later on, while using more calls such as `COUNT` and `DISTINCT` together, the paranthesis will be necessary. ~~~~sql SELECT DISTINCT(col1) FROM table_name; ~~~~ ### COUNT - The `COUNT` function returns the number of input rows that match a specific condition of a query. - We can apply `COUNT` on a specific column or just pass `COUNT(*)` that returns the same result. - All return the same thing, since the original table had N rows: - SELECT COUNT(name) FROM table; - SELECT COUNT(choice) FROM table; - SELECT COUNT(*) FROM table; - Because of this `COUNT` by itself simply returns back a count of the number of rows in a table; `COUNT` is much more useful when combined with other commands, such as `DISTINCT`. - Examples: ~~~~sql SELECT COUNT(DISTINCT col1) FROM table_name; ~~~~ ~~~~sql SELECT COUNT(DISTINCT customer_id) FROM payment; ~~~~ ### SELECT WHERE - `SELECT` and `WHERE` are most fundamental SQL statements. - The `WHERE` statement allows to specify conditions on columns for the rows to be returned. - Basic syntax example: ~~~~sql SELECT col1, col2 FROM table WHERE conditions; ~~~~ - The `WHERE` clause apperas immediately after the `FROM` clause of `SELECT` statement. - The condiions are used to filter the rows returned from the `SELECT` statement. - PostgreSQL provides a variety of standard operators to construct the conditions. #### Comparison Operators ```csvpreview {header="true"} Operator, Description =,Equal >,Greater than <, Less than >=, Greater than or equal to <=, Less than or equal to <> or !=, Not equal to ``` #### Logical Operators - AND - OR - NOT ```csvpreview {header="true"} Name, Choice Zach,Green David,Green Claire,Yellow David,Red ``` - Simple Syntax Example -> Get only the people named David with Choice Red ~~~~sql SELECT * FROM table_1 WHERE Name='David' AND Choice='Red' ~~~~ Note: Use single quotes to define strings. ### ORDER BY - It can be used to sort rows based on a column value, in either ascending or descending order. - Basic syntax example: ~~~~sql SELECT col1, col2 FROM table_1 ORDER BY col_1 ASC ~~~~ - If you leave `ASC`/`DESC` field blank, `ORDER BY` uses `ASC` by default. - `ORDER BY` can also be used for multiple columns. This makes sense when one column has duplicate entries. ~~~~sql SELECT col1, col2, col3 FROM table_1 ORDER BY col_1 DESC, col3 ASC ~~~~ - Without requesting in `SELECT` statement, it still can be sorted by any other column. ### LIMIT - The `LIMIT` command allows to limit the number of rows returned for a query. - Useful for not wanting to return every single row in a table, but only view the top few rows to get an idea of the table layout. - `LIMIT` also becomes useful in combination with `ORDER BY`. - `LIMIT` goes at the very end of a query request and is the last command to be executed. - Basic syntax example: ~~~~sql SELECT col1, col2 FROM table_1 ORDER BY col_1 ASC LIMIT 20 ~~~~ ### BETWEEN (and NOT BETWEEN) - Used to match a value against a range of values: - The `BETWEEN` operator is the same as: - value >= low `AND` value <= high - value `BETWEEN` low `AND` high (!! low and high values are included!!) - `BETWEEN` operator can also be combined with the `NOT` logical operator. - The ` NOT BETWEEN` operator is the same as: - value `NOT BETWEEN` low `AND` high - value `<` low `OR` value `>` high - `BETWEEN` operator can also be used with dates. Note that you need to format dates in the ISO 8601 standard format, which is YYYY-MM-DD - date `BETWEEN` '2007-01-01' AND `2007-02-01` - Basic Syntax Example ~~~~sql SELECT * FROM payment WHERE amount BETWEEN a AND b ~~~~ ### IN (and NOT IN) - It can be used to create a condition that checks to see if a value is included in a list of multiple options. - Example query: ~~~~~sql SELECT color FROM table_1 WHERE color IN ('red','blue','green') ~~~~~ - These are the same: ~~~~~sql SELECT * FROM payment WHERE amount IN (0.99,1.98,1.99) ~~~~~ ~~~~~sql SELECT * FROM payment WHERE amount = 0.99 OR amount = 1.98 OR amount=1.99 ~~~~~ ### LIKE and ILIKE - The `LIKE` operator allows to perform pattern matching against string data with the use of wildcard characters: - Percent '%': Matches any sequence of characters - Underscore '_': Matches any single character - You can use multiple underscores. - Examples: - All names that begin with an 'A' - `WHERE` name `LIKE` 'A%' - All names that end with an 'a' - `WHERE` name `LIKE` '%a' - Notice that `LIKE` is case-sensitive, we can use `ILIKE` which is case-insensitive. - Get all Mission Impossible films - `WHERE` title `LIKE` 'Mission Impossible _' - Imagine we had version string codes in the format 'Version#A4', 'Version#B7', etc.. - `WHERE` value `LIKE` 'Version#__' - We can also combine pattern matching operators to create more complext patterns such as: - `WHERE` name `LIKE` '_her%' --> Cheryl, Theresa, Sherri :::info POSTGRESQL REGEX https://www.postgresql.org/docs/12/functions-matching.html ::: - Some examples: ~~~~sql SELECT * FROM customer WHERE first_name LIKE 'J%' AND last_name LIKE 'S%'; SELECT * FROM customer WHERE first_name LIKE '_her%'; SELECT * FROM customer WHERE first_name LIKE 'C%' ORDER BY last_name ~~~~ ## AGGREGATE FUNCTIONS - Most common aggregate functions: - `AVG()` - returns average value - `COUNT()` - returns number of values - `MAX()` - returns maximum value - `MIN()` - returns minimum value - `SUM()` - returns the sum of all values - Aggregate function calls happen only in the `SELECT` clause or the `HAVING` clause. - Special Notes: - `AVG()` returns a floating point value many decimal places (e.g. 2.342418...) You can use `ROUND()` to specify precision after the decimal. - `COUNT()` is simply returns the number of rows, which means by convention we just use `COUNT(*)` - Some examples: ~~~~sql SELECT MIN(replacement_cost) FROM film; SELECT ROUND(AVG(replacement_cost),3) FROM film; SELECT MIN(replacement_cost), MAX(replacement_cost), ROUND(AVG(replacement_cost),2) FROM film; ~~~~ ### GROUP BY - General syntax: ~~~~sql SELECT category_col, AGG(data_col) FROM table_1 GROUP BY category_col ~~~~ - The `GROUP BY` clause must appera right after `FROM` or `WHERE` statement. - In the `SELECT` statement, columns must either have an aggregate function or be in the `GROUP BY` call. - Example: ~~~~sql SELECT company, division, SUM(sales) FROM finance_table GROUP BY company, division ~~~~ - `WHERE` statements should not refer to the aggreagation result, later on it will be used by `HAVING` statement. ~~~~sql SELECT company, division, SUM(sales) FROM finance_table WHERE division IN('marketing','transport') GROUP BY company, division ~~~~ - If you want to sort results base on the aggregate, make sure to reference the entire function. ~~~~sql SELECT company SUM(sales) FROM finance_table GROUP BY company ORDER BY SUM(sales) -- SELECT customer_id, SUM(amount) FROM payment GROUP BY customer_id ORDER BY SUM(amount) DESC -- SELECT DATE(payment_date), SUM(amount) FROM payment GROUP BY DATE(payment_date) ORDER BY SUM(amount) DESC -- SELECT customer_id, SUM(amount) FROM payment GROUP BY customer_id ORDER BY SUM(amount) DESC LIMIT 5 ~~~~ ### HAVING - Allows to filter **after** an aggregation has already taken place. - We have already seen an example below: ~~~~sql SELECT company, SUM(sales) FROM finance_table WHERE company != 'Google' GROUP BY company ~~~~ But we cannot use `WHERE` to filter based off of aggregate results, becausee those happen **after** a `WHERE` is executed. - `HAVING` allows to use the aggregate results as a filter along with a `GROUP BY` ~~~~sql SELECT company, SUM(sales) FROM finance_table WHERE company != 'Google' GROUP BY company HAVING SUM(sales) >1000 ~~~~ - Example: What are the customer ids of customers whp have spent more than $100 in payment transactions with our staff_id number 2? ~~~~sql SELECT customer_id, SUM(amount) FROM payment WHERE staff_id=2 GROUP BY customer_id HAVING SUM(amount)>100 ~~~~ - Assessment Test 1. Return the customer IDs of customers who have spent at least $110 with the staff member who has an ID of 2. ~~~~sql SELECT customer_id, SUM(amount) FROM payment WHERE staff_id=2 GROUP BY customer_id HAVING SUM(amount) >= 110 ~~~~ 2. How many films begin with the letter J? ~~~~sql SELECT COUNT(*) FROM film WHERE title LIKE 'J%' ~~~~ 3. What customer has the highest customer ID number whose name starts with an 'E' and has an address ID lower than 500? ~~~~sql SELECT first_name, last_name FROM customer WHERE first_name LIKE 'E%' AND address_id < 500 ORDER BY customer_id DESC ~~~~ ## JOIN ### AS - It allows to create an "alias" for a column or result. - Basic syntax: ~~~~sql SELECT column_1 as new_name FROM table_1 -- SELECT amount AS rental_price FROM payment -- Column "amount" would be "rental_price" ~~~~ - The `AS` operator gets executed at the very end of a query, meaning that we cannot use the ALIAS inside a `WHERE` operator. ### INNER JOIN - Our company is holding a conference for people in the movie rental industry. We will have people register online beforehand and the nlogin the day of the conference. ```csvpreview {header=true} registration_id,name 1,Andrew 2,Bob 3,Charlie 4,David ``` ```csvpreview {header=true} login_id,name 1,Xavier 2,Andrew 3,Yolanda 4,Bob ``` - An `INNER JOIN` will result with the set of records that match in both tables. ~~~~sql SELECT * FROM Registrations INNER JOIN Logins ON Registrations.name = Logins.name ~~~~ RESULTS ```csvpreview {header=true} registration_id,name,login_id,name 1,Andrew,2,Andrew 2,Bob,4,Bob ``` - Another example for more clear results ~~~~sql SELECT registration_id, Logins.name, login_id FROM Registrations INNER JOIN Logins ON Registrations.name = Logins.name ~~~~ RESULTS ```csvpreview {header=true} registration_id,name,login_id 1,Andrew,2 2,Bob,4 ``` - Remember that table order won't matter in an `INNER JOIN` - - If you see `JOIN` without the `INNER`, PostgreSQL will treat it as an `INNER JOIN` ### OUTER JOIN - There are few types of `OUTER JOIN`s - They allow to specify how to deal with values only present in one of the tables being joined. - Syntax: #### FULL OUTER JOIN - Basic syntax ~~~~sql SELECT * FROM table_A FULL OUTER JOIN table_B ON table_A.col_name1 = table_B.col_name2 ~~~~ - Table order won't matter in a `FULL OUTER JOIN` - Example: ```csvpreview {header=true} registration_id,name 1,Andrew 2,Bob 3,Charlie 4,David ``` ```csvpreview {header=true} login_id,name 1,Xavier 2,Andrew 3,Yolanda 4,Bob ``` ---> FULL OUTER JOIN <--- RESULTS ```csvpreview {header=true} registration_id,name1,login_id,name2 1,Andrew,2,Andrew 2,Bob,4,Bob 3,Charlie,null,null null,null,1,Xavier null,null,3,Yolanda ``` #### FULL OUTER JOIN with WHERE ~~~~sql SELECT * FROM table_A FULL OUTER JOIN table_B ON table_A.col_name1 = table_B.col_name2 WHERE table_A.id IS null OR table_B IS null ~~~~ Example: ~~~~sql SELECT * FROM payment FULL OUTER JOIN customer ON payment.customer_id = customer.customer_id WHERE payment.customer_id IS null OR customer.customer_id IS null ~~~~ #### LEFT OUTER JOIN - It results in the set of records that are in the left table, if there is no match with the right table, the results are null. - Basic syntax ~~~~sql SELECT * FROM table_A LEFT OUTER JOIN table_B ON table_A.col_name1 = table_B.col_name2 ~~~~ - Table order will matter in a `LEFT OUTER JOIN` - Example: ```csvpreview {header=true} registration_id,name 1,Andrew 2,Bob 3,Charlie 4,David ``` ```csvpreview {header=true} login_id,name 1,Xavier 2,Andrew 3,Yolanda 4,Bob ``` ---> LEFT OUTER JOIN <--- RESULTS ```csvpreview {header=true} registration_id,name1,login_id,name2 1,Andrew,2,Andrew 2,Bob,4,Bob 3,Charlie,null,null 4,David,null,null ``` #### LEFT OUTER JOIN with WHERE ~~~~sql SELECT * FROM table_A FULL OUTER JOIN table_B ON table_A.col_name1 = table_B.col_name2 WHERE table_B.id IS null ~~~~ Example: ~~~~sql SELECT * FROM payment FULL OUTER JOIN customer ON payment.customer_id = customer.customer_id WHERE payment.customer_id IS null ~~~~ #### RIGHT OUTER JOIN - It is essentially the same as `LEFT JOIN`, except the tables are switched. This would be the same as switching the table order in a `LEFT OUTER JOIN` ### UNION - It is used to combine the result-set of two or more `SELECT` statements. - It basically serves to directly concatenate two results together, essentially "pasting" them together. - Basic syntax: ~~~~sql SELECT col1, col2 .. FROM table1 UNION SELECT col3, col4 .. FROM table2 ~~~~ >Practice + California sales tax laws have changed and we need to alert our customers to this through email. What are the emails of the customers who live in California? ~~~~sql SELECT district, email FROM address INNER JOIN customer ON address.address_id = customer.address_id WHERE district = 'California' ~~~~ + A customer walks in and is a huge fan of the actor "Nick Wahlberg" and wants to know which movie he is in. Get a list of all the movies "Nick Wahlberg" has been in. ~~~~sql SELECT title, first_name, last_name FROM actor INNER JOIN film_actor ON actor.actor_id = film_actor.actor_id INNER JOIN film ON film_actor.film_id = film.film_id WHERE first_name = 'Nick' AND last_name = 'Wahlberg' ~~~~ ## Advanced SQL Commands - They are more useful when creating a new database rather than querying a database. ### Timestamps and Extract * TIME - Contains only time * DATE - Contains only date * TIMESTAMP - Contains date and time * TIMESTAMPTZ - Contains date, time, and timezone ~~~~sql SELECT NOW() > returns 2021-05-02 00:36:09.161744+03 SHOW TIMEZONE > returns Europe/Istanbul SELECT TIMEOFDAY() > returns Sun May 02 00:48:36.717651 2021 +03 SELECT CURRENT_DATE > returns 2021-05-02 ~~~~ ### EXTRACT() & AGE() & TO_CHAR() - EXTRACT() allows to extract or obtain a sub-component of a date value. - AGE() calculates and returns the current age given a timestamp. - TO_CHAR() is useful for timestamp formatting. ~~~~sql EXTRACT(YEAR FROM date_col) > returns 2020 AGE(date_col) > returns 13 years 1 mon 5 days 01:34:13.003423 TO_CHAR(date_col,'mm-dd-yyyy') > returns '05-02-2020' ~~~~ ### Mathematical Functions and Operators ### String Functions and Operators ### SubQuery ### Self-Join