# NORMALIZATION
ok. here we go.
## 1NF
values in each cell atomic, and no repeated groups
(no Q1, Q2, Q3)
no duplicate columns
no significance to the order of rows or columns
in every row, each column must have a single value
so if we have
| recId | FirstName | LastName | Department | WorkshopNames |
| ----- | --------- | -------- | ---------- | ------------ |
| 1 | Marlon | Kuzmick | LL | Effective Presentation Tips, Leading Discussion |
| 2 | Pamela | Pollock | P&P | Effective Presentation Tips, Leading Discussion |
| 3 | Katie | Gilligan | LL | Effective Presentation Tips |
we have a problem, because the `WorkshopNames` field contains multiple, non-atomic values. (note, this is NOT what we have in AT when there's a multiple-select-link-to-another-table . . . that's a hidden-join table---we WOULD have this problem if we saved "Effective Presentation Tips, Leading Discussion" for Pamela as a simple string).
You might THINK that this would be ok . . .
| recId | FirstName | LastName | Department | Workshop1Name | Workshop2Name |
| ----- | --------- | -------- | ---------- | ------------ | --- |
| 1 | Marlon | Kuzmick | LL | Effective Presentation Tips | Leading Discussion |
| 2 | Pamela | Pollock | P&P | Effective Presentation Tips | Leading Discussion |
| 3 | Katie | Gilligan | LL | Effective Presentation Tips |
but no. That's not cool.
While it's true that we have 3 people who have registered, we actually have **5** instances of workshop-registration. But while you might THINK that we can do this:
| recId | FirstName | LastName | Department | WorkshopName |
| ----- | --------- | -------- | ---------- | ------------ |
| 1 | Marlon | Kuzmick | LL | Effective Presentation Tips |
| 2 | Marlon | Kuzmick | LL | Leading Discussion |
| 3 | Pamela | Pollock | P&P | Effective Presentation Tips |
| 4 | Pamela | Pollock | P&P | Leading Discussion |
| 5 | Katie | Gilligan | LL | Effective Presentation Tips |
We're actually going to learn that while that satisfies **1NF**, it fails to satisfy the requirements of **2NF** . . .
## 2NF
For **2NF**, no cell's value can be derived from just PART of a composite (candidate) key. So if we have a registration table that looks like this
| recId | FirstName | LastName | Department | WorkshopName |
| ----- | --------- | -------- | ---------- | ------------ |
| 1 | Marlon | Kuzmick | LL | Effective Presentation Tips |
| 2 | Marlon | Kuzmick | LL | Leading Discussion |
| 3 | Pamela | Pollock | P&P | Effective Presentation Tips |
| 4 | Pamela | Pollock | P&P | Leading Discussion |
| 5 | Katie | Gilligan | LL | Effective Presentation Tips |
then we have a problem, because while the Candidate Key for each row is `{FirstName, LastName, WorkshopName}` (i.e. that cluster of values is unique . . . there cannot be two rows with precisely the same values for those three), `Department` can be derived from `{FirstName, LastName}` alone, rather than the entire composite key. This can create problems, because someone could update the first instance of `Pamela Pollock` with a new `Department`, but not the second.
So we need a separate table for `People` or `Registrants` that contains their `Department` info.
| PersonID | FirstName | LastName | Department |
| -------- | --------- | -------- | ---------- |
| 00001 | Marlon | Kuzmick | LL |
| 00002 | Pamela | Pollock | P&P |
| 00003 | Katie | Gilligan | LL |
and then we can reference these people in our workshop-registration table:
| recId | Person | WorkshopName |
| ----- | --------- | -------- | ---------- | ------------ |
| 1 | 00001 | Effective Presentation Tips |
| 2 | 00001 | Leading Discussion |
| 3 | 00002 | Effective Presentation Tips |
| 4 | 00002 | Leading Discussion |
| 5 | 00003 | Effective Presentation Tips |
## 3NF
For 3NF to be satisfied, we need to ensure that there are no values that can be derived from any non-key field.
"A relation is in 3NF if at least one of the following condition holds in every non-trivial function dependency X –> Y:
X is a super key.
Y is a prime attribute (each element of Y is part of some candidate key)."
so to continue our example from above, if we have
| recId | Person | WorkshopName | Room |
| ----- | --------- | -------- | ---------- | ------------ |
| 1 | 00001 | Effective Presentation Tips | Sever 104 |
| 2 | 00001 | Leading Discussion | Sever 206 |
| 3 | 00002 | Effective Presentation Tips | Sever 104 |
| 4 | 00002 | Leading Discussion | Sever 206 |
| 5 | 00003 | Effective Presentation Tips | Sever 104 |
we have a problem, because we can actually derive `Room` from JUST `WorkshopName` alone, so we should actually have a separate workshops table
| recId | WorkshopName | Room |
| ----- | --------- | -------- |
| 1 | Effective Presentation Tips | Sever 104 |
| 2 | Leading Discussion | Sever 206 |
and then reference this in the registrations table
| recId | PersonID | WorkshopID |
| ----- | --------- | -------- | ---------- | ------------ |
| 1 | 00001 | 1 |
| 2 | 00001 | 2 |
| 3 | 00002 | 1 |
| 4 | 00002 | 2 |
| 5 | 00003 | 1 |
## AIRTABLE: DIFFERENCES AND TAKEAWAYS
Airtable's `Link to Another Record` "field" handles many operations that would typically be handled by join tables.
## SQL EQUIVS
```
CREATE TABLE WorkshopRegistrations (
RegID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
PRIMARY KEY (RegID),
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);
```