# SQL Introduction
:::info
[TOC]
:::

<br/>
## Preface
SQL is a **relational query language** used in practice.
- Implements ideas from **Relational Algebra, Relational Calculus, and Datalog**.
> (without recursion, with slight modifications)
- More **user-friendly** and closer to **natural language**.
- Focuses on **core relational functionalities**.
- Has many standards. (e.g., SQL99)
<br/>
### Basic Form
1. `SELECT` returned attribute(s)
→ Specifies **which attributes (columns)** to return.
2. `FROM` relation(s)
→ Specifies **which relation(s) (tables)** to query.
3. `WHERE` condition(s)
→ Filters tuples based on **conditions on the tables**.
>Execution Semantics:
> - Apply the **`WHERE` conditions** to all relations in the **`FROM`** clause.
> - Return the **values of attributes** listed in the **`SELECT`** clause
<br/>
## Core Functionality
- ✰ Example Schema:
```C++
Coffee(coffee_name, producer)
CoffeeShop(shop_name, address, license)
CoffeeDrinker(drinker_name, address, phone)
Likes(drinker_name, coffee_name)
Sells(shop_name, coffee_name, price)
Frequents(drinker_name, shop_name)
```
<br/>
### 1. Single Relational Query
> - _What brands are made by Baristas?_
```SQL
SELECT coffee_name FROM CoffeeTable WHERE producer = 'Baristas';
```
- **Original Table**
| coffee_name | producer |
|-|-|
| Caribou | Bristas |
| Costa | Coava |
| Kenya | Delta Coffee |
- **Query Result**
| coffee_name |
|-|
| Caribou |
<br/>
### 2. Select All Attributes Query `(*)`
> - What coffee brands are produced by Baristas?
```SQL
SELECT * FROM CoffeeTable WHERE producer = 'Baristas';
```
- **Original Table**
| coffee_name | producer |
|-|-|
| Caribou | Bristas |
| Costa | Coava |
| Kenya | Delta Coffee |
- **Query Result**
| coffee_name | producer |
|-|-|
| Caribou | Baristas |
<br/>
### 3. Query with `WHERE` clause
> - Used to **filter tuples (rows)** based on conditions.
> - May contain **complex conditions**.
- Operators
| Operator | Example |
|-|-|
| **Logical operators** | `AND`, `OR`, `NOT` |
| **Comparison operators** | `<`, `>`, `=`, `<>` _(not equal to)_, `<=`, `>=`, etc. |
| **Type-specific operators** | `LIKE`, etc. |
> - What coffee brands are produced by Baristas or Coava?
```SQL
SELECT coffee_name FROM CoffeeTable WHERE producer = 'Baristas' OR producer = 'Coava';
```
- **Original Table**
| coffee_name | producer |
|-|-|
| Caribou | Bristas |
| Costa | Coava |
| Kenya | Delta Coffee |
- **Query Result**
| coffee_name |
|-|
| Caribou |
| Costa |
<br/>
### 4. Multi Relation Query `(Join)`
> - Find relations between different types of entities.
> - What are the coffee brands liked by at least one person who frequents _Culture_?
- **Solution Without Join**
```SQL
SELECT DISTINCT coffee_name FROM Likes, Frequents
WHERE Frequents.shop_name = ‘Culture’
AND Frequents.drinker_name = Likes.drinker_name;
```
---
- **Join Example**
```SQL
SELECT * FROM CoffeeShop
LEFT JOIN Frequents
ON CoffeeShop.shop_name = Frequents.shop_name;
```
> By default, a `join` returns only tuples that satisfy the join condition.
> **Left outer `join`** returns all tuples from the left relation.
- **Original Tables**
- _CoffeeShop_ Table
| shop_name | address | license |
|-|-|-|
| Culture | King Blvd. | 201 |
| Interzone | Monroe Ave. | 302 |
- _Frequents_ Table
| drinker_name | shop_name |
|-|-|
| Jonn | Culture |
- **Query Result**
| shop_name | address | license | drinker_name | shop_name |
|-|-|-|-|-|
| Culture | King Blvd. | 201 | John | Culture |
| Interzone | Monroe Ave. | 302 | `null` | `null` |
> Right (full) outer join retains all tuples from the right (all) relation.
<br/>
## Nested Query
1. Queries that have another query in their `WHERE` clause
2. The queries that appear in the `WHERE` clause are called **subqueries**.
- ✰ Example Schema:
```C++
Coffee(coffee_name, producer)
CoffeeShop(shop_name, address, license)
CoffeeDrinker(drinker_name, address, phone)
Likes(drinker_name, coffee_name)
Sells(shop_name, coffee_name, price)
Frequents(drinker_name, shop_name)
```
<br/>
> - **++Question 1++**
> Using `Sells(shop_name, coffee_name, price)`, find the coffee shops that serve **Costa** for the ++same price++ that **Culture** charges for **Kenya**.
- Step 1: Find **Culture**’s price for **Kenya** _(Culture-Kenya Price)_
```SQL
SELECT price FROM Sells WHERE shop_name = 'Culture' AND coffee_name = 'Kenya';
```
- Step 2: Find shops that sell **Costa** at that price **(Nested Query)**
```SQL
SELECT shop_name FROM Sells
WHERE coffee_name = 'Costa' AND
price = (
SELECT price FROM Sells
WHERE shop_name = 'Culture'
AND coffee_name = 'Kenya'
);
```
---
<br/>
> - **++Question 2.a++**
> Using `Sells(shop_name, coffee_name, price)`, find the coffee shops that serve **Kenya** for a _cheaper price_ than the price that ==all== coffee shops charge for **Costa**.
- Step 1: Find the set of all prices for **Costa**.
- Step 2: Find the shops that offer **Kenya** at a cheaper price than all values in ++Step 1++.
```SQL
SELECT shop_name FROM Sells
WHERE coffee_name = 'Kenya' AND
price < ALL (SELECT price FROM Sells WHERE coffee_name = 'Costa');
```
> - **++Question 2.b++**
> Using `Sells(shop_name, coffee_name, price)`, find the coffee shops that serve **Kenya** for a cheaper price than the price that ==at least one== coffee shop charges for **Costa**.
- Step 1: Find the set of all prices for **Costa**.
- Step 2: Find the shops that offer **Kenya** at a _cheaper_ price than ++at least one value++ in ++Step 1++.
```SQL
SELECT shop_name FROM Sells
WHERE coffee_name = 'Kenya' AND
price < ANY (SELECT price FROM Sells WHERE coffee_name = 'Costa');
```
:::spoiler **📓 `ALL`, `ANY` clause** {state='close'}
<br/>
- They are used to **compare a value with a set of values**.
- `ALL`: Condition is true only if the comparison holds for all values in the set.
> Equivalent to logical **AND**
- `ANY`: Condition is true if the comparison holds for **at least one value** in the set.
> Equivalent to logical **OR**
:::
---
<br/>
> - **++Question 3++**
> Using `Coffee(coffee_name, producer)` and `Likes(drinker_name, coffee_name)` find the producers of the coffee brands **John** likes.
- Step 1: Find the set of coffee brands that **John likes**.
- Step 2: Then find the producers of those coffee brands.
```SQL
SELECT producer FROM Coffee
WHERE coffee_name
IN (SELECT coffee_name FROM Likes WHERE drinker_name = 'John');
```
:::spoiler **📓 `IN`, `NOT IN` clause** {state='close'}
<br/>
- `IN`: Checks whether a value belongs to a **set returned by a subquery**.
- `NOT IN`: To check if the result of a subquery ++**does not**++ contains a particular value.
:::
---
<br/>
> - **++Question 4++**
> Using `Coffee(coffee_name, producer)`, find the coffee brands that are the only brand made by their producers.
- Brand `c` is the only brand made by producer `p` ⇔ There does **NOT** exist another brand made by `p` with `name ≠ c`
```SQL
SELECT coffee_name FROM Coffee c
WHERE NOT EXISTS (
SELECT * FROM Coffee c2
WHERE c2.producer = c.producer AND c2.coffee_name <> c.coffee_name
);
```
:::spoiler **📓 `EXISTS`, `NOT EXISTS` clause** {state='close'}
<br/>
- `EXISTS`: Checks whether a subquery returns **at least one tuple**.
- `NOT EXISTS`: Checks that a subquery returns **no tuples**.
:::
<br/>
## Aggregation Function
1. Used to **compute summary values** over a set of tuples.
2. Operates on one or more attributes to produce a **single value** per group or for the whole table.
### Standard Functions
| Function | Description |
| -------- | --------------------------- |
| `COUNT` | Number of tuples |
| `SUM` | Sum of attribute values |
| `AVG` | Average of attribute values |
| `MIN` | Minimum value |
| `MAX` | Maximum value |
### Advanced Functions
| Function | Description |
| ---------- | ---------------------------- |
| `VARIANCE` | Variance of numeric values |
| `STDDEV` | Standard deviation |
| `REGR_*` | Linear regression statistics |
:::warning
- Aggregation ignores `NULLs` by default (except `COUNT(*)`).
- Often used with `GROUP BY` to aggregate per group instead of the whole table.
:::
<br/>
- ✰ Example Schema:
```C++
Coffee(coffee_name, producer)
CoffeeShop(shop_name, address, license)
Sells(shop_name, coffee_name, price)
```
<br/>
> - **++Question 1++**
> Using `Coffee(coffee_name, producer)`, find the ++number++ of all coffee brands.
```SQL
Select COUNT(coffee_name) FROM Coffee
```
<br/>
> - **++Question 2++**
> Using `Sell(shop_name, coffee_name, price)`, find the ++number++ of all coffee shops.
- Sell might have multiple tuples with the same shop_name! (duplicates)
- `DISTINCT shop_name` ensures each shop is counted once.
```SQL
SELECT COUNT(DISTINCT shop_name) FROM Sell;
```
:::info
`DISTINCT`:A modifier → used to remove duplicate values before aggregation.
:::
<br/>
> - **++Question 3++**
> Using `Sells(shop_name, coffee_name, price)` find the ++minimum++ price of each coffee brand.
- Step 1: Group tuples with the same brand.
- Step 2: Compute `MIN` over the prices in each group.
```SQL
SELECT coffee_name, MIN(price) AS min_price
FROM Sells GROUP BY coffee_name;
```
> 1. `GROUP BY coffee_name` → groups tuples with the same coffee brand
> 2. `MIN(price)` → computes the minimum price within each group
> 3. `AS` → Rename the `MIN(price)` output attribute as '**min_price**'
:::info
**`GROUP BY`**:Groups tuples that have the same values on specified attributes, and applies aggregation functions **within each group**.
1. Can use **multiple attributes**.
2. In the `SELECT` clause, every attribute must be **either**
- an aggregated value (e.g., `MIN`, `COUNT`), or
- an attribute listed in the `GROUP BY` clause.
- ❌ ==Wrong Example==
> There is **no single value** of `shop_name` for the group.
> Therefore, it must be added to `GROUP BY` or removed / aggregated.
```SQL
SELECT coffee_name, MIN(price), shop_name FROM Sells GROUP BY coffee_name;
```
:::
<br/>
> - **++Question 4++**
> Using `Sells(shop_name, coffee_name, price)` find the ++minimum++ price of each coffee brand **sold in _Interzone_ or _Ava_**.
```SQL
SELECT coffee_name, MIN(price) AS minprice FROM Sells
WHERE shop_name = 'Interzone' OR shop_name = 'Ava'
GROUP BY coffee_name;
```
> 1. `WHERE` filters tuples to only those sold by _Interzone_ or _Ava_
> 2. Remaining tuples are grouped by `coffee_name`
> 3. `MIN(price)` is computed within each group
<br/>
> - **++Question 5++**
> Using `Sells(shop_name, coffee_name, price)`, find the ++minimum++ price of each brand **whose ++maximum++ price is less than 11**.
- **❌ Invalid**
```SQL
SELECT coffee_name, MIN(price) FROM Sells
WHERE MAX(price) < 11 -- invalid
GROUP BY coffee_name;
```
- **✅ Valid**
```SQL
SELECT coffee_name, MIN(price) AS minprice FROM Sells
GROUP BY coffee_name
HAVING MAX(price) < 11;
```
:::info
**`HAVING`**:Filter Groups (After `GROUP BY`)
1. `WHERE` **cannot** use aggregation functions (e.g., `MAX`, `MIN`)
2. ++Aggregated conditions++ must be written in `HAVING`
3. `HAVING` filters **groups**, not individual tuples
:::
<br/>
> - **++Question 5++**
> Using `Sells(shop_name, coffee_name, price)`, find the ++minimum++ price of each brand **served in more than three coffee shops**.
```SQL
Select coffee_name, Min(price) As minprice From Sells
Group By coffee_name
Having Count(shop_name) > 3;
```
<br/>
## Set Operation
1. Set operations treat query results as **sets of tuples**.
2. Relations involved must be **union-compatible**.
- (same number of attributes, compatible domains)
### Basic Operation
| Operation | Meaning |
|---------|--------|
| `R UNION S` | Tuples that appear in **R or S** |
| `R INTERSECT S` | Tuples that appear in **both R and S** |
| `R EXCEPT S` | Tuples in **R but not in S** _(Not supported in some system)_ |
```SQL
SELECT a FROM R WHERE a NOT IN ( SELECT a FROM S ); -- to simulate `EXCEPT`
```
### Duplicates Operation
> Set operations **remove duplicates by default**
> Using `ALL` **keeps duplicates**
| Operation | Default | With ALL |
|---------|--------|----------|
| `UNION` | removes duplicates | `UNION ALL` keeps duplicates |
| `INTERSECT` | removes duplicates | `INTERSECT ALL` keeps duplicates |
| `EXCEPT` | removes duplicates | `EXCEPT ALL` keeps duplicates |
<br/>
- ✰ Example Schema:
```C++
Pair(parent, child)
```
<br/>
> - **++Question 1++ (Warmup)**
> Using `Pair(parent, child)`, find **grandchildren of Joe**.
```SQL
SELECT p2.child FROM Pair p1, Pair p2
WHERE (p1.parent = 'Joe') AND (p1.child = p2.parent);
```
<br/>
> - **⭐ ++Question 2++ (Messedup)**
> Using `Pair(parent, child)`, find **all descendants of Joe**.
```SQL
SELECT p1.child FROM Pair p1 -- first generation
WHERE (p1.parent == 'Joe');
UNION
SELECT p2.child FROM Pair p1, Pair p2 -- second generation
WHERE (p1.parent = 'JOE') AND (p1.child = p2.parent);
UNION
SELECT p3.child FROM Pair p1, Pair p2, Pair p3 -- third generation
WHERE (p1.parent = 'JOE') AND (p1.child == p2.parent) AND (p2.child = p3.parent);
UNION
... -- #sub-queries = #generations descending from Joe
```
> One sub-query per each generation,
> but we don't know #generations descendinf from _Joe_.
> Query must **==recursively== traverse** parentship relation to find all descendants.
<br/>
## MySQL Common Table Expression (CTE)
### Theory of Limitations
- **Basic SQL** does **not support recursive queries**
- Recursive queries cannot be expressed in:
- **Relational Algebra (RA)**
- **Relational Calculus (RC)**
- **Non-recursive Datalog**
<br/>
### Beyond Basic SQL
- More recent SQL standards introduce **limited recursion**
- These features go **beyond RA and RC**
- Support varies across systems:
- MySQL: ==recursive `WITH` (CTE)==
- Oracle: `CONNECT BY`
---
<br/>
### Common Table Expression (CTE)
1. A **CTE** is a temporary relation (relation variable)
2. Exists **only within the scope of a single query**
3. Acts like a **virtual view**
4. Defined using `WITH`
- ✰ Example
```sql
WITH
cte1 AS (SELECT a, b FROM table1)
cte2 AS (SELECT a, b FROM table2)
SELECT cte1.b, cte2.d
FROM cte1 JOIN cte2 ON cte1.a = cte2.c;
```
> CTE can be referenced in the main query, but it disappears after the query ends.
- ✰ Recursive Query Example ①
```sql
WITH
RECURSIVE NUM(n) AS (
SELECT 1 -- base
UNION ALL
SELECT n + 1 FROM NUM WHERE n < 5 -- recursion step
)
SELECT * FROM NUM;
```
<br/>
- ✰ Recursive Query Example ②
> Using `Employ(id, manager_id)` produce management chain of every employee.
> - Each employee ⇒ ids of his/her managers in the order of management
```sql
WITH RECURSIVE CHAIN(id, path) AS (
-- Base case (find top managers)
SELECT id, CAST(id AS CHAR(200)) FROM Employ WHERE manager_id IS NULL
UNION ALL
-- Recursive step (concat path)
SELECT e.id, CONCAT(c.path, ' → ', e.id)
FROM CHAIN c JOIN Employ e ON c.id = e.manager_id
)
SELECT * FROM CHAIN;
```
- **Input Schema**
| id | manager_id | |
| --- | ---------- | ------------- |
| 1 | NULL | ← top manager |
| 5 | 1 | |
| 6 | 1 | |
| 101 | 5 | |
| 102 | 5 | |
| 103 | 6 | |
- **Output Schema**
| id | path |
| --- | ------- |
| 1 | 1 |
| 5 | 1 → 5 |
| 6 | 1 → 6 |
| 101 | 1 → 5 → 101 |
| 102 | 1 → 5 → 102 |
| 103 | 1 → 6 → 103 |
<br/>
:::success
- **Non-recursive CTE** → improves readability and modularity
- **Recursive CTE** → enables queries beyond RA and RC
:::
<br/>
### Recursion Support
- **Most systems provide only limited recursion**
- **MySQL**:
- Only **one recursive reference** to the CTE
- Recursive step can appear **only in FROM clause → linear recursion**
- **Cannot use `GROUP BY` / Aggregation** in recursive step
- **Other systems (e.g., SQL Server, Oracle)**
- Allow more constructs
- Can support more complex recursion
| Type | Meaning | Example |
|-|-|-|
| Linear recursion | Recursive step references **one previous result** | Management chain |
| Non-linear recursion | Recursive step references **multiple previous results** | Combinatorial functions like `n choose k` |
<br/>
## Conclusion
- 1️⃣ SQL is a **relational query language**, implementing RA, RC, and Datalog.
- Basic query format: `SELECT … FROM … WHERE …`
- `WHERE` filters tuples, `SELECT` chooses columns.
- 2️⃣ Supports **single-table queries**, **multi-table joins**, **select all (`*`)**, and **condition**.
- 3️⃣ Supports **subqueries** with `IN/NOT IN`, `EXISTS/NOT EXISTS`, `ALL/ANY`.
- 4️⃣ Supports **aggregation functions**: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`
- `DISTINCT` removes duplicates
- `GROUP BY` groups tuples
- `HAVING` filters groups **after aggregation**
- 5️⃣ Supports **set operations**: `UNION`, `INTERSECT`, `EXCEPT`
- Default removes duplicates; `ALL` keeps duplicates
- `EXCEPT` can be simulated using nested queries with `NOT IN`
- 6️⃣ **Common Table Expressions (CTE)**: temporary relations within a single query
- Non-recursive: improves readability
- Recursive: defines a relation in terms of itself
- 7️⃣ **Recursive CTE**: base case + recursive step
- Example: management chain or number sequences
- `CAST` may be needed to convert numeric values for string concatenation
- 8️⃣ **Limitations of basic SQL**: recursion is not supported
- Advanced SQL standards provide **limited recursion**
- MySQL: linear recursion, single recursive reference, no aggregation in recursive step
- 9️⃣ **Recursion types**:
- Linear recursion → one previous result produces the next
- Non-linear recursion → multiple previous results produce the next
- 🔟 Database systems vary in recursion support and may limit recursion complexity.
<br/>
:::spoiler Relevant Resource
[SQL Tutorial](http://w3schools.com/sql/)
[Ramakrishnan Database Management Systems](https://raw.githubusercontent.com/pforpallav/school/master/CPSC404/Ramakrishnan%20-%20Database%20Management%20Systems%203rd%20Edition.pdf)
:::