---
title: Agenda
description: Agenda of this lecture
duration: 120
card_type: cue_card
---
## Agenda
- Aggregate Queries
- Aggregate Functions
- COUNT
- \* (asterisk)
- Other aggregate functions
- GROUP BY Clause
- HAVING Clause
---
title: Aggregate Queries
description:
duration: 300
card_type: cue_card
---
## Aggregate Queries
Hello Everyone, till now whatever SQL queries we had written worked over each row of the table one by one, filtered some rows, and returned the rows.
Eg: We have been answering questions like:
- Find the students who ...
- Find the batches who ...
- Find the name of every student.
But now we will be answering questions like:
- What is the average salary of all the employees?
- What was the count of movies released in each year?
- What is the maximum salary of all the employees?
> Note: Give examples for above questions.
In above questions, we are not interested in the individual rows, but we are interested to get some data by combining multiple rows. For example, to find the answer of first, you will have to get the rows for all of the employees, go through their salary column, average that and print.
How to do this is what we are going to learn about in today's class.
---
title: Aggregate Functions
description:
duration: 180
card_type: cue_card
---
## Aggregate Functions
SQL provides us with some functions which can be used to aggregate data. These functions are called aggregate functions. Imagine a set of column. With the values of that column across all rows, what all operations would you may want to do?
> NOTE: Give hints to learners to get SUM, average, max etc.
Correct, to allow for exactly all of these operations, SQL provides us with aggregate functions. Aggregate functions will always output 1 value. Let's go through some of these functions one by one and see how they work.
---
title: COUNT
description:
duration: 1440
card_type: cue_card
---
### COUNT
Count function takes the values from a particular column and returns the number of values in that set. Umm, but don't you think it will be exactly same as the number of rows in the table? Nope. Not true. Aggregate functions only take not null values into account. So, if there are any null values in the column, they will not be counted.
Example: Let's take a students table with data like follows:
| id | name | age | batch_id |
|----|------|-----|----------|
| 1 | A | 20 | 1 |
| 2 | B | 21 | 1 |
| 3 | C | 22 | null |
| 4 | D | 23 | 2 |
If you will try to run COUNT and give it the values in batch_id column, it will return 3. Because there are 3 not null values in the column. This is different from number of rows in the students table.
Let's see how do you use this operation in SQL.
```sql
SELECT COUNT(batch_id) FROM students;
```
To understand how aggregate functions work via a pseudocode, let's see how SQL query optimizer may execute them.
```python
table = []
count = 0
for row in table:
if row[batch_id]:
count += 1
print(count)
```
Few things to note here:
While printing, do we have access to the values of row? Nope. We only have access to the count variable. So, we can only print the count. Extrapolating this point, when you use aggregate functions, you can only print the result of the aggregate function. You cannot print the values of the rows.
Eg:
```sql
SELECT COUNT(batch_id), batch_id FROM students;
```
This will be an invalid query. Because, you are trying to print the values of `batch_id` column as well as the count of `batch_id` column. But, you can only print the count of `batch_id` column.
---
title: asterisk
description:
duration: 180
card_type: cue_card
---
### * (asterisk) to count number of rows in the table
What if we want to count the number of rows in the table? We can do that by passing a * to the count function.
\* as we know from earlier, refers to all the columns in the table. So, count(*) will count the number of rows in the table.
```sql
SELECT COUNT(*) FROM students;
```
The above query will print the number of rows in the table.
---
title: Other aggregate functions
description:
duration: 480
card_type: cue_card
---
### Other aggregate functions:
We can use multiple aggregation function in the same query as well. For example:
```sql
SELECT COUNT(batch_id), AVG(age) FROM students;
```
Some aggregate functions are as follows.
1. MAX: Gives Maximum value
2. MIN: Gives minimum value
Note that, values in the column must be comparable for MAX and MIN.
3. AVG: Gives average of non NULL values from the column.
For example: AVG(1, 2, 3, NULL) will be 2.
4. SUM: Gives sum, ignoring the null values from the column.
> Note: Include examples of above functions.
---
title: Quiz 1
description:
duration: 45
card_type: quiz_card
---
# Question
Which of the following is NOT an aggregate function in SQL?
# Choices
- [ ] SUM()
- [x] CONCAT()
- [ ] AVG()
- [ ] MAX()
---
title: Quiz 2
description:
duration: 45
card_type: quiz_card
---
# Question
What does the MIN() function in SQL do?
# Choices
- [ ] Returns the smallest value in a row
- [ ] Returns the smallest value in a table
- [ ] Returns the maximum value in a column
- [x] Returns the smallest value in a column
---
title: Quiz 3
description:
duration: 45
card_type: quiz_card
---
# Question
Which of the following is a valid aggregate function for students table?
# Choices
- [ ] sum(avg(psp))
- [ ] Min(avg(psp))
- [x] concat(first_name, last_name)
- [ ] None of the above
---
title: GROUP BY clause
description:
duration: 900
card_type: cue_card
---
## GROUP BY Clause
### Please use macros for demo: https://docs.google.com/spreadsheets/d/1OLpm37Id5SuwK-2kP-xmxoTE7JGKXcNS6dXQRoqZy5I/edit?usp=sharing
Till now we combined multiple values into a single values by doing some operation on all of them. What if, we want to get the final values in multiple sets? That is, we want to get the set of values as our result in which each value is derived from a group of values from the column.
The way Group By clause works is it allows us to break the table into multiple groups so as to be used by the aggregate function.
For example: `GROUP BY batch_id` will bring all rows with same `batch_id` together in one group
> Note: Also, GROUP BY always works before aggregate functions. Group By is used to apply aggregate function within groups (collection of rows). The result comes out to be a set of values where each value is derived from its corresponding group.
Let's take an example.
| id | name | age | batch_id |
|----|------|-----|----------|
| 1 | A | 20 | 1 |
| 2 | B | 21 | 3 |
| 3 | C | 22 | 1 |
| 4 | D | 23 | 2 |
| 5 | E | 23 | 1 |
| 6 | F | 25 | 2 |
| 7 | G | 22 | 3 |
| 8 | H | 21 | 2 |
| 9 | I | 20 | 1 |
```sql
SELECT COUNT(*), batch_id FROM students GROUP BY batch_id;
```
The result of above query will be:
| COUNT(\*) | batch_id |
|-----------|----------|
| 4 | 1 |
| 3 | 2 |
| 2 | 3 |
Explanation: The query breaks the table into 3 groups each having rows with `batch_id` as 1, 2, 3 respectively. There are 4 rows with `batch_id = 1`, 3 rows with `batch_id = 2` and 2 rows with `batch_id = 3`.
Note that, we can only use the columns in SELECT which are present in Group By because only those columns will have same value across all rows in a group.
> Ask if students have any doubts and take examples to explain them.
---
title: Quiz 4
description:
duration: 45
card_type: quiz_card
---
# Question
Which of the following queries will not result in an error when executed on the 'students' table?
# Choices
- [ ] Select avg(psp), batch_id from students group by batch_id;
- [ ] Select avg(psp), first_name from students group by batch_id;
- [x] Select avg(psp), last_name from students group by batch_id;
- [ ] None of the above
---
title: Quiz 5
description:
duration: 45
card_type: quiz_card
---
# Question
When applying a GROUP BY clause to a table, which columns can be included in the SELECT command?
# Choices
- [ ] Any column.
- [x] Only those columns which have a common value to your group.
- [ ] Only Primary Key column
- [ ] None of the above
---
title: HAVING Clause
description:
duration: 1200
card_type: cue_card
---
## HAVING Clause
HAVING clause is used to filter groups. Let's take a question to understand the need of HAVING clause:
There are 2 tables: Students(id, name, age, batch_id) and Batches(id, name). Print the batch names that have more than 100 students along with count of the students in each batch.
```sql
SELECT COUNT(S.id), B.name
FROM Students S
JOIN Batches B ON S.batch_id = B.id
GROUP BY B.name;
HAVING COUNT(S.id) > 100;
```
Here, `GROUP BY B.name` groups the results by the `B.name` column (batch name). It ensures that the count is calculated for each distinct batch name.
`HAVING COUNT(S.id) > 100` condition filters the grouped results based on the count of `S.id` (number of students). It retains only the groups where the count is greater than 100.
The sequence in which query executes is:
- Firstly, join of the two tables is done.
- Then is is divided into groups based on `B.name`.
- In the third step, result is filtered using the condition in HAVING clause.
- Lastly, it is printed through SELECT.
FROM -> WHERE -> GROUP BY -> HAVING -> SELECT
WHERE is not build to be able to handle aggregates. We can not use WHERE after GROUP BY because WHERE clause works on rows and as soon as GROUP BY forms a result, the rows are convereted into groups. So, no individual conditions or actions can be performed on rows after GROUP BY.
---
title: Quiz 6
description:
duration: 45
card_type: quiz_card
---
# Question
Which SQL statement correctly demonstrates the use of the HAVING clause?
# Choices
- [ ] SELECT * FROM orders HAVING total > 100;
- [ ] SELECT * FROM orders WHERE total > 100 GROUP BY customer_id;
- [ ] SELECT customer_id, COUNT(id) FROM orders GROUP BY customer_id where COUNT(id) > 5;
- [x] SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id HAVING SUM(total) > 1000;
---
title: Quiz 7
description:
duration: 45
card_type: quiz_card
---
# Question
Which one of the following query will not yield an error?
# Choices
- [ ] Select avg(psp), batch_id from students group by batch_id where avg(psp) > 80;
- [ ] Select avg(psp), psp from students group by batch_id having avg(psp) > 80;
- [x] Select avg(psp), batch_id from students group by batch_id having avg(psp) > 80;
- [ ] None of the above
---
title: Announcements:
description:
duration: 150
card_type: cue_card
---
Note: Please check out these notes for revision: https://drive.google.com/file/d/1XBd2_EOAi_OqO8ltOMI2a9BOa6TnPEy4/view?usp=drive_link
Since, this is quite an easy topic let's try our hands on the assignments now.
That's all for this class. If there are any doubts feel free to ask now. Thanks!