--- title: Agenda description: Agenda of this lecture duration: 220 card_type: cue_card --- ## Agenda - Joins - Self Join - SQL query as pseudocode - Joining Multiple Tables --- title: Joins description: duration: 2100 card_type: cue_card --- ## Joins ***`Note: Use the following macros for inner joins demonstration:`*** https://docs.google.com/spreadsheets/d/1RHZ4T7Kmibk5InZUrsTiBGQFAbZoOyVdaYH5hbmMcTw/edit#gid=0 Welcome everyone to the next class of your SQL Module. Today we are going to up the complexity of SQL Read queries we are going to write while still using the same foundational concepts we had learnt in the previous class on CRUD. Till now, whenever we had written an SQL query, the query found data from how many tables? > NOTE: Wait for the learners to say 1 Correct, every SQL query we had written till now was only finding data from 1 table. Most of the queries we had written in the previous class were on the `film` table where we applied multiple filters etc. But do you think being able to query data from a single table is enough? Let's take a scenario of Scaler. Let's say we have 2 tables as follows in the Scaler's database: `batches` | batch_id | batch_name | |----------|------------| | 1 | Batch A | | 2 | Batch B | | 3 | Batch C | `students` | student_id | first_name | last_name | batch_id | |------------|------------|-----------|----------| | 1 | John | Doe | 1 | | 2 | Jane | Doe | 1 | | 3 | Jim | Brown | 2 | | 4 | Jenny | Smith | 3 | | 5 | Jack | Johnson | 2 | Suppose, someone asks you to print the name of every student, along with the name of their batch. The output should be something like: | student_name | batch_name | |--------------|------------| | John | Batch A | | Jane | Batch A | | Jim | Batch B | | Jenny | Batch C | | Jack | Batch B | Will you be able to get all of this data by querying over a single table? No. The `student_name` is there in the students table, while the `batch_name` is in the batches table! We somehow need a way to combine the data from both the tables. This is where joins come in. What does the word `join` mean to you? > NOTE: Give hints to get someone to say `combine`. Correct! Joins, as the name suggests, are a way to combine data from multiple tables. For example, if I want to combine the data from the `students` and `batches` table, I can use joins for that. Think of joins as a way to stitch rows of 2 tables together, based on the condition you specify. Example: In our case, we would want to stitch a row of students table with a row of batches table based on what? Imagine that every row of `students` I try to match with every row of `batches`. Based on what condition to be true between those will I stitch them? > NOTE: Give hints to get someone to say `batch_id` in students' row to match with `batch_id` in batches' row. > Demonstrate learners matching each row of 1 table with each row of other table. Correct, we would want to stitch a row of students table with a row of batches table based on the `batch_id` column. This is what we call a `join condition`. A join condition is a condition that must be true between the rows of 2 tables for them to be stitched together. Let's see how we can write a join query for our example. ```sql SELECT students.first_name, batches.batch_name FROM students JOIN batches ON students.batch_id = batches.batch_id; ``` Let's break down this query. The first line is the same as what we have been writing till now. We are selecting the `first_name` column from the `students` table and the `batch_name` column from the `batches` table. The next line is where the magic happens. We are using the `JOIN` keyword to tell SQL that we want to join the `students` table with the `batches` table. The next line is the join condition. We are saying that we want to join the rows of `students` table with the rows of `batches` table where the `batch_id` column of `students` table is equal to the `batch_id` column of `batches` table. This is how we write a join query. > NOTE: While discussing below, share screen. Let's take an example of this on the Sakila database. Let's say for every film, we want to print its name and the language. How can we do that? ```sql SELECT film.title, language.name FROM film JOIN language ON film.language_id = language.language_id; ``` Now, sometimes typing name of tables in the query can become difficult. For example, in the above query, we have to type `film` and `language` multiple times. To make this easier, we can give aliases to the tables. For example, we can give the alias `f` to the `film` table and `l` to the `language` table. We can then use these aliases in our query. Let's see how we can do that: ```sql SELECT f.title, l.name FROM film f JOIN language l ON f.language_id = l.language_id; ``` **Please explain joins using color and sizes table in template. Where only the rows matching some condition are returned in output.** --- title: Quiz 1 description: duration: 45 card_type: quiz_card --- # Question What does JOIN command do in SQL? # Choices - [ ] Returns all rows from both tables - [x] Returns only the matching rows between the tables - [ ] Returns all rows from the single table - [ ] None of the above --- title: Quiz 2 description: duration: 45 card_type: quiz_card --- # Question When joining multiple tables, what is the purpose of the ON clause? # Choices - [x] It specifies the conditions for joining the tables - [ ] It specifies the order of the tables to be joined - [ ] It filters the rows from the joined tables - [ ] None of the above --- title: Quiz 3 description: duration: 45 card_type: quiz_card --- # Question Which SQL statement combines rows from two tables where there is a match in both tables? # Choices - [ ] SELECT * FROM table1 and table2 ON table1.id = table2.id; - [x] SELECT * FROM table1 JOIN table2 ON table1.id = table2.id; - [ ] SELECT * FROM table1 JOIN table2 ON id; - [ ] None of the above --- title: Self Join description: duration: 600 card_type: cue_card --- ## Self Join Let's say at Scaler, for every student we assign a Buddy. For this we have a `students` table, which looks as follows: `id | name | buddy_id` This `buddy_id` will be an id of what? > NOTE: Give hints to get someone to say `student` Correct. Now, let's say we have to print for every student, their name and their buddy's name. How will we do that? Here 2 rows of which tables would we want to stitch together to get this data? > NOTE: Give hints to get someone to say `students` table Correct, an SQL query for the same shall look like: ```sql SELECT s1.name, s2.name FROM students s1 JOIN students s2 ON s1.buddy_id = s2.id; ``` This is an example of SELF join. A self join is a join where we are joining a table with itself. In the above query, we are joining the `students` table with itself. In a self joining, aliasing tables is very important. If we don't alias the tables, then SQL will not know which row of the table to match with which row of the same table (because both of them have same names as they are the same table only). --- title: SQL query as pseudocode description: duration: 600 card_type: cue_card --- ### SQL query as pseudocode As we have been doing since the CRUD class, let's also see how Joins can be represented in terms of pseudocode. Let's take this query: ```sql SELECT s1.name, s2.name FROM students s1 JOIN students s2 ON s1.buddy_id = s2.id; ``` In pseudocode, it shall look like: ```python3 ans = [] for row1 in students: for row2 in students: if row1.buddy_id == row2.id: ans.add(row1 + row2) for row in ans: print(row.name, row.name) ``` --- title: Quiz 4 description: duration: 45 card_type: quiz_card --- # Question What does a self-join do in SQL? # Choices - [ ] Combines rows from two different tables. - [x] Combines rows from the same table based on a related column. - [ ] Returns only the rows from the left table. - [ ] Returns only the rows from the right table. --- title: Quiz 5 description: duration: 45 card_type: quiz_card --- # Question Which one of the following is correct query to do a self join on table based on column id? # Choices - [ ] Select * from table join table on table.id = table.id; - [ ] Select * from table t1 join table t2 on table.id = table.id; - [x] Select * from table t1 join table t2 on t1.id = t2.id; - [ ] None of the above --- title: Quiz 6 description: duration: 45 card_type: quiz_card --- # Question Can we apply self join without using aliasing? # Choices - [x] NO, there will be ambiguity in tables. - [ ] Yes - [] Coconut water --- title: Joining Multiple Tables description: duration: 1620 card_type: cue_card --- ## Joining Multiple Tables Till now, we had only joined 2 tables. But what if we want to join more than 2 tables? Let's say we want to print the name of every film, along with the name of the language and the name of the original language. How can we do that? If you have to add 3 numbers, how do you do that? To get the name of the language, we would first want to combine film and language table over the `language_id` column. Then, we would want to combine the result of that with the language table again over the `original_language_id` column. This is how we can do that: ```sql SELECT f.title, l1.name, l2.name FROM film f JOIN language l1 ON f.language_id = l1.language_id JOIN language l2 ON f.original_language_id = l2.language_id; ``` Let's see how this might work in terms of pseudocode: ```python3 ans = [] for row1 in film: for row2 in language: if row1.language_id == row2.id: ans.add(row1 + row2) for row in ans: for row3 in language: if row.language_id == row3.language_id: ans.add(row + row3) for row in ans: print(row.name, row.language_name, row.original_language_name) ``` --- title: Quiz 7 description: duration: 45 card_type: quiz_card --- # Question When joining multiple tables, what is the purpose of the ON clause? # Choices - [x] It specifies the conditions for joining the tables - [ ] It specifies the order of the tables to be joined - [ ] It filters the rows from the joined tables - [ ] None of the above --- title: Quiz 8 description: duration: 45 card_type: quiz_card --- # Question Is there any change in sequence of output while joining multiple tables at a time? # Choices - [ ] No, overall sequence/order is going to be same - [x] Overall sequence will change however the answer will have same rows - [ ] None of the above --- title: Announcements: description: duration: 150 card_type: cue_card --- Note: Please check out these notes for revision: https://drive.google.com/file/d/1ZDN2AuVF6dmBSnyXUPKTa7SdLLMk4PZ8/view?usp=drive_link Folks this is where learners stop solving questions as things gets a bit tricky out here. But let me assure you that the only way is by going through. Once you start solving the questions you will gradually start getting good understanding of this topic. Let's take this challenge to solve atleast all assignment questions of this session.