# SQL System Notes
This article has not completed yet and the update is not in the progress currently. I will finish this maybe few months later.
## ER Model
*unfinished*
## Relational Data Model
### Definition
#### Domain
* A set of atomic value
* data type and format
* e.g., name: The set of character strings
#### Relation Schema
* Definition:
* Has a relation name R
* Has a list of attributes, A1, A2, ... An
* denoted by R(A1, A2, ..., An)
* Degree : The number of attributes n of its relation schema
* E.g., STUDENT(Name, ID, Age) or STUDENT(Name: string, ID: string, Age: integer)
* degree = 7
* refer to attributes by their positions
* 3rd attribute is Age
#### Relation State
* Definition:
* A set of r = {t1, t2, ..., tm} of m n-tuples (schema: R(A1, A2, ..., An))
* denoted by r(R)
* t[Ai]: ith value in a tuple t (attribute Ai)
* e.g., 
#### Values and NULL in Tuples
* Each value is an atomic value
* composite and multivalued attributes are not allowed
* the model is also called "flat relational model"
* the assumption is called "first normal form assumption"
* NULL
* exist but we don't know
* don't know whether exist
* don't apply to a tuple
#### Relational Database Schema
* Definition
* a set of relation schemas S = {R1, R2, ... Rm} and a set of integrity constraints IC
#### Relational Database State
* Definition
* a set of relation states DB = {r1, r2, ..., rm}
* ri is a state of Ri
* ri satisfy all integrity constraints specified in IC
### Constraints
* (Inherent) model-based constraints
* Schema-based constraints
* Applicaiton-based or semantic constraints
#### Schema-Based Constraints
* Domain Constraints:
* Within each tuple, the value of each attribute must be an atomic value from the domain dom(A).
* Key Constraints:
* Key attribute of each tuple must be unique.
* Superkey
* subset of attributes to make this set unique
* key
* minimal subset of Superkey to make this set unique
* each of the keys is called a **candidate key**
* Primary Key
* one of the candidate keys
* choice of primary key depands on Dev
* Constraints on NULL
* Specifies whether NULL value is permitted for a particular attribute
* Entity Integrity Constraints
* no primary key value can be NULL

* Referential Integrity Constraints
* Foreign Key
* a set of attributes FK that references relation R2, if:
* FK and PK have the same domain(s)
* value of FK must occurs as value of PK or NULL

* can refer to its own relation

### Data Manipulations
* Modifications
* insert, delete, update
* Retrievals
#### Insertion
* Can violate
* Domain, key, Not null, Entity, Integrity, Referential Integrity
* When violate
* System usually rejects operation

#### Delete
* Can violate
* referential integrity (referenced side)
* When violate
* Restrict: reject
* Cascade: also recursively delete tuples that reference the tuple being deleted
* Set null/default

#### Update
* Update of non-key attributes
* Can violate
* Domain constraint, Not null
* When violate
* Restrict or correction
* Update of primary-key attributes
* Can violate
* Domain, key, entity Integrity, Referential Integrity (referenced side)
* When violate
* Restrict, cascade, set to null/default
* Update of foreign-key attributes
* Can violate
* Referential constraint (referencing side)
* When violate
* Restrict, set to null/default

## Basic SQL
#### Terminology: RDB vs SQL


#### Sublanguages
* DDL: data definition language
* DDL: data definition language
* SDL: storage definition language
* DML: data manipulation language
* DCL: data control language
* DQL: data query language
* DML: data manipulation language
#### Structure
* SQL Schema: Namesapces in SQL
* identified by a schema name
* owned by a specific group/user
* includes metadata
* CREATE SCHEMA
* SQL catalog
* a named collection of schemas
* CREATE CATALOG
* SQL Database
* a named collection of catalog of schemas
* CREATE DATABASE
* DBMS instance > Database >= Catalog >= Schema > Relations/Tables
#### SQL Commands
Main "Verbs" in SQL

* Understand your DBMS
* Connect (to a DBMS server)
* Help
* Show databases
* Get to your database (namespace)
* Create database/schema/catalog
* Use database/schema/catalog
* Drop database/schema/catalog (Delete)
* Show tables
* Create your DB schema (in database)
* Create table
* Alter table (Update)
* Drop table
* Describe table
* Access data
* Insert, delete, update, select (query)
* Most useful commands
* Connect
* Help
* Show
* Use
* Create
* Drop
* Insert
* Delete
* Update
* Select
* Alter
### Data Definition Language
#### Create Database
```
CREATE {DATABASE | SCHEMA} [IF NOT EXISTS] db_name [create_option] ...
create_option: [DEFAULT] {
CHARACTER SET [=] charset_name | COLLATE [=] collation_name | ENCRYPTION [=] {'Y' | 'N'}
}
```
* Example:
* `CREATE DATABASE STUDENT;`
* `CREATE SCHEMA DEPT IF NOT EXISTS;`
#### Create Table
```
CREATE TABLE PROJECT(
Pname VARCHAR(15) NOT NULL,
Pnumber INT NOT NULL,
Plocation VARCHAR(15),
Dnum INT NOT NULL,
PRIMARY KEY (Pnumber),
UNIQUE (Pname),
FOREIGN KEY (Dnum)
REFERENCES DEPARTMENT(Dnumber)
);
```
specify:
* Table name
* Attribute names
* Attribute domains
* Data Types


* CREATE DOMAIN: User Defined Attribute Types
* Not all DBMS support this! (Mostly only supported by large commercial DBMS)
* MySQL (as of v8.0) does not support this
```
CREATE DOMAIN PNUM_TYPE AS CHAR(9);
CREATE TABLE PROJECT(
Pname VARCHAR(15) NOT NULL,
Pnumber PNUM_TYPE NOT NULL,
Plocation VARCHAR(15),
Dnum INT NOT NULL,
PRIMARY KEY (Pnumber),
UNIQUE (Pname),
FOREIGN KEY (Dnum)
REFERENCES DEPARTMENT(Dnumber)
);
```
* Constraints (details are in the next section)

#### Constraints
```
CREATE TABLE PROJECT(
Pname VARCHAR(15) NOT NULL, // Attribute constraint: NOT NULL
Pnumber INT NOT NULL,
Plocation VARCHAR(15) DEFAULT ‘TAIPEI’, // Attribute constraint: DEFAULT
Dnum INT NOT NULL CHECK (Dnum > 0 AND Dnum < 21), // Attribute constraint: CHECK
PRIMARY KEY (Pnumber), // Key Constraints: PRIMARY KEY
UNIQUE (Pname), // Key Constraints: UNIQUE KEY
FOREIGN KEY (Dnum) // Referential Integrity Constraints: FOREIGN KEY
REFERENCES DEPARTMENT(Dnumber) // Referential Integrity Constraints: FOREIGN KEY
);
```
* Attribute Constraints
* NOT NULL
* DEFAULT
* CHECK
* limit the range of attribute values
* Key Constraints
* PRIMARY KEY:
* single attribute: specified after the attribute directly
* `Pnumber INT PRIMARY KEY NOT NULL`
* single or multiple attributes: specified at the end of attribution definition section
* ```
...
PRIMARY KEY (Pnumber, ...)
...
```
* UNIQUE
* the specified way is the same as PRIMARY KEY
* Referential Integrity Constraints
* FOREIGN KEY
* may be violated

* referential triggered action
* Specified with ON UPDATE or ON DELETE
* RESTRICT
* SET NULL
* SET DEFAULT
* CASCADE
* example
* constraints name: e.g., `CONSTRAINT EMPPK PRIMARY KEY (Ssn)`
```
CREATE TABLE EMPLOYEE ( . . . , Dno INT NOT NULL DEFAULT 1,
CONSTRAINT EMPPK PRIMARY KEY (Ssn),
CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn)
REFERENCES EMPLOYEE(Ssn) ON DELETE SET NULL ON UPDATE CASCADE,
CONSTRAINT EMPDEPTFK FOREIGN KEY(Dno)
REFERENCES DEPARTMENT(Dnumber) ON DELETE SET DEFAULT ON UPDATE CASCADE
);
CREATE TABLE DEPARTMENT ( . . . , Mgr_ssn CHAR(9) NOT NULL DEFAULT ‘888665555’, . . . ,
CONSTRAINT DEPTPK PRIMARY KEY(Dnumber),
CONSTRAINT DEPTSK UNIQUE (Dname),
CONSTRAINT DEPTMGRFK FOREIGN KEY (Mgr_ssn)
REFERENCES EMPLOYEE(Ssn) ON DELETE SET DEFAULT ON UPDATE CASCADE
);
CREATE TABLE DEPT_LOCATIONS ( . . . ,
PRIMARY KEY (Dnumber, Dlocation),
FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber)
ON DELETE CASCADE ON UPDATE CASCADE)
;
```
* Semantic Constraints
* CHECK
```
CREATE TABLE t1(
CHECK (c1 <> c2),
c1 INT CHECK (c1 > 10),
c2 INT CONSTRAINT c2_positive CHECK (c2 > 0),
c3 INT CHECK (c3 < 100),
CONSTRAINT c1_nonzero CHECK (c1 <> 0),
CHECK (c1 > c3)
);
```
* usually supported only in large commercial DBMS
* MySQL (as of v8.0) supports CHECK format but does not process it
* Constraints in single table: single column, multi-column
* ASSERTION
```
CREATE ASSERTION SALARY_CONSTRAINT
CHECK ( NOT EXISTS (
SELECT * FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D
WHERE E.Salary>M.Salary
AND E.Dno=D.Dnumber
AND D.Mgr_ssn=M.Ssn
));
```
* usually supported only in large commercial DBMS
* Constraints between multi-tables
#### Delete Schema: Drop
```
DROP {DATABASE | SCHEMA} [IF EXISTS] db_name;
// example
DROP TABLE departments RESTRICT;
```
* Can drop database, schema, catalog, table
* Restrict or cascade clause: decide what to do with affected objects
#### Change Schema: Alter
```
ALTER {DATABASE | SCHEMA} db_name alter_specification …;
// example
ALTER TABLE students DROP COLUMN grade CASCADE;
```
* Can add, drop column, constraint, index
* Can rename column or change the definition of column
## Relational Algebra and Advanced SQL
### Operations
#### Select (pick row)
* Syntax:
* 
* Meaning:
* Choose a subset of the tuples from a relation that satisfies a selection condition
* Op:
* and/or
* {=,<,≤,>,≥,≠}
* Selectivity:
* Ratio of tuple selected = |R’| / |R|
* E.g. (has commutativity)
* 
#### Project (pick column)
* Syntax:
* 
* Meaning:
* Selects certain columns from the table and discards the other columns.
* Op:
* no commutativity -> no and/or
* {=,<,≤,>,≥,≠}
* Degree:
* the number of attributes (column)
* Degree of R’ <= degree of R
* Cardinality
* 
* 
* E.g. (no commutativity)
* 
#### Rename
* Syntax:
* 
* Meaning:
* rename
* E.g. (combination of a rename and a select)
* 
#### Union
* Union compatibility
* Two relations R(A1,A2,...,An) and S(B1,B2,...,Bn) are said to be union compatible if:
* R and S have the same degree n
* dom(Ai) = dom(Bi) for 1 ≤ i ≤ n
* SQL operations
* UNION, INTERSECT, EXCEPT
* UNION ALL, INTERSECT ALL, EXCEPT ALL
* E.g.
* 
#### Join

* Theta join
* syntax:
* 
* meaning:
* combine related tuples from two relations into single “longer” tuples
* op:
* and
* {=,<,≤,>,≥,≠}
* join selectivity:
* Ratio of the cardinality of the join result and the cartesian product of the input relations
* 
* e.g.
* 
* 
* Equi join
* In \<join condition\>, only = is used
* Natural join
* similar to equi join, but
* remove one of the join attribute
* requires the same attribute name
* Semi join
* similar to natural join, but
* certain columns excluded (left, right)
* syntax:
* 
* Anti join
* similar to semi join, but
* includes as result only those tuples in R for which there is no tuple in S with an equal value on their common attribute names (left, right)
* syntax:
* 
* 
* Division
* 
* syntax:
* 
* e.g.
* 
#### Additional Relational Operations
* Generalized Projection
* allow functions on attributes as project attributes
* 
* Recursive closure
* e.g., find out supervisors of all employees recursively
* Aggregate functions and grouping
* find out summary information for each “group” of tuples in a relation
* syntax:
* 
* aggregate functions
* `SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT`
* e.g.
* 
* Outer join
* those tuples not selected by JOIN conditions are also kept
* three types
* left, right, full
* Outer union
* Union between two relations that have some, but not all, attributes in common
### Advanced SQL and Complex Queries
#### Select and NULL
* Allows one to select tuples with NULL values with `IS/IS NOT`
* E.g.
* 
* `IS NULL`, not `= NULL`
* SQL consider every NULL to be different from other NULL
#### Three-Valued Logic
* Value:
* TRUE, FALSE, UNKNOWN
* Logic Op (and/or):
* 
* 
#### Complex Queries
* Nested queries
* IN Operator
* Test whether a value is in a set (explict set value, dynamic set value)
* e.g.
* 
* Op
* Ordinary comparison operatorS:
* {=,<,≤,>,≥,<>}
* Set member comparison operators:
* `IN`
* {=,>,<,>=,<=} combined with `SOME, ANY, ALL`
* E.g. `=SOME`, `>ALL`
* Note:
* SOME and ANY have the same effect
* `=SOME` and `=ANY` have the same effect as `IN`
* Dynamic set
* 
* 
* Multiple dynamic set
* 
* Nest Queries with SOME and ALL
* 
* Alias
* When two relations may use the same attribute names, alias becomes necessary
* 
* Correlated Nested Queries
* In `WHERE` clause, some attribute referenced from the outer query
* 
* Nested Queries flattening
* nested query using the `=` or `IN` can be expressed as a single block query
* 
* Exist
* check whether the result of a correlated nested query is empty or not
* The result of `EXISTS` is a Boolean value
* 
* Joined tables
* The following three select statements are the same
* 
* Outer joins
* 
* Aggregate functions
* `COUNT, SUM, MAX, MIN, AVG`
* 
* Group By and Having
* 
## Design Theory and Normalizaiton
### Mapping of ER Diagram


### RDB Design Guidelines
#### Make sure every relation has a clear meaning
#### Reducing redundant information in tuples
* update anomalies
* Insertion anomalies
* information being consistent across multiple tuples
* Example: Insert students of the same department
* Cannot insert before insert other information
* Example: cannot insert a new department if there is no student of that department yet
* Delete anomalies
* When delete tuple of one entity, we delete last piece of information about another entity
* Example: Delete the last student of the a department, we lose the last copy of information about that department
* Modification anomalies
* Update one piece of information, need update multiple tuples, or there will be inconsistency
* Example: If department has a new head, need to update many copies of D_HEAD attribute in STUDENT_DEPT relation
#### Reducing NULL values in tuples
* especially avoid “NOT-APPLICABLE” type of NULL
* E.g. If only 3% of students have offices, avoid “office” as an attribute of the STUDENT table
#### Avoid generating spurious tuples
* Avoid relations with matching attributes that are not a foreign key-primary key
* may produce spurious tuples by "join"
* Design proper relation schemas: can be joined with foreign-primary relationship
* Guarantee that the join does not generate spurious tuples
* Example of spurious tuples
* 
### Dependency
#### Functional Dependency (FD)
* Definition
* Let X and Y be two sets of attributes in a relation schema R, r is a relation state of R
* Denoted as X → Y
* For all r, for any two tuples t1 and t2 in r, if t1[X] = t2[X], then t1[Y] = t2[Y]
* Full functional dependency
* X → Y is a full functional dependency if
* removal of any attribute A from X means that the dependency does not hold anymore
* X → Y is a partial functional dependency if
* the dependency still holds after some removal
* Trivial FD
* if Y is a subset of X
#### Transitive Dependency (TD)
* X → Y is a transitive dependency if
* there exists a set of attributes Z in R such that
* X → Z and Z → Y
* Z is not
* a candidate key
* a subset of any key of R
#### Multivalued Dependency (MVD)
* Definition
* Let R be a relation schema, t be a tuple in R, X and Y be attribute subsets of R, and Z denote (R − (X ∪ Y))
* Denoted as X ↠ Y
* For all relation state r of R, there is a multivalued dependency X → Y if
* if t1[X] = t2[X], then
* t3[X] = t4[X] = t1[X] = t2[X]
* t3[Y] = t1[Y] and t4[Y] = t2[Y]
* t3[Z] = t2[Z] and t4[Z] = t1[Z]
* Example:
* 
#### Join Dependency (JD)
* Definition
* Assume R is a relation schema, r is a state of R
* Denoted as JD(R1, R2, … , Rn)
* Every legal state r of R has a **nonadditive join decomposition** into R1, R2, … , Rn
* That is, for every r, we have: 
* MVD is a special case of a JD with n = 2
* JD(R1, R2) implies an MVD (R1 ∩ R2) ↠ (R1 − R2)
* Trivial JD
* if some Ri = R
### Normal Form
#### Prime Attributes
* An attribute is prime attribute if it is a member of some candidate key of relation schema R
* An attribute is nonprime attribute if it is not a prime attribute
#### Two Checks for Normalization
* Nonadditive (or lossless) join property
* Guarantees that the spurious tuples does not occur with respect to the relation schemas created after decomposition
* Dependency preservation property
* Ensures that each meaningful functional dependency is represented in some individual relation resulting after decomposition
* If both cannot be kept, **Nonadditive > Dependency**
#### 1st Normal Form
* domain must include only **atomic values** (simple, indivisible values)
* The value of any attribute must be a **single value**
* cannot be
* Multivalued (array)
* Composite values
* Some other relation
#### 2nd Normal Form
* Every nonprime attribute A in R is **fully functionally dependent** on the primary key of R
* Example:
* 
* General Definition
* primary key -> any cadidate key
#### 3rd Normal Form
* no nonprime attribute of R is **transitively dependent** on the primary key
* Example:
* 
* General Definition
* whenever a nontrivial functional dependency X → A holds in R, either
* X is a superkey of R
* partial/transit dependency cannot exist
* A is a prime attribute of R
* Example:
* 
* 
* Alternative general Definition
* primary key -> any cadidate key
* (Equivalently) Every nonprime attribute of R is **fully functionally/nontransitively** dependent on every key of R
#### BCNF
* Whenever a nontrivial functional dependency X → A holds in R
* X is a superkey of R
* Example:
* 
* 3NF: true
* BCNF: false
* BCNF Decomposition (use the example above)
* 
* The best way depends on which one conforms to “Non-additive join property”
#### 4th Normal Form
* For every nontrivial multivalued dependency X ↠ Y
* X is a superkey for R
* Example: decompose EMP_INFO into EMP_PROJ and EMP_DEPEND
* 
* 
#### 5th Normal Form
* Definition
* For every nontrivial join dependency JD(R1, R2, … , Rn), every Ri is a **superkey** of R
## Database Programming
*unfinished*