SQL System - HackMD

# SQL System Notes This article has not completed yet and the update is not in the progress currently. I will finish this maybe few months later. ## ER Model *unfinished* ## Relational Data Model ### Definition #### Domain * A set of atomic value * data type and format * e.g., name: The set of character strings #### Relation Schema * Definition: * Has a relation name R * Has a list of attributes, A1, A2, ... An * denoted by R(A1, A2, ..., An) * Degree : The number of attributes n of its relation schema * E.g., STUDENT(Name, ID, Age) or STUDENT(Name: string, ID: string, Age: integer) * degree = 7 * refer to attributes by their positions * 3rd attribute is Age #### Relation State * Definition: * A set of r = {t1, t2, ..., tm} of m n-tuples (schema: R(A1, A2, ..., An)) * denoted by r(R) * t[Ai]: ith value in a tuple t (attribute Ai) * e.g., ![](https://i.imgur.com/wtrRHJR.png) #### Values and NULL in Tuples * Each value is an atomic value * composite and multivalued attributes are not allowed * the model is also called "flat relational model" * the assumption is called "first normal form assumption" * NULL * exist but we don't know * don't know whether exist * don't apply to a tuple #### Relational Database Schema * Definition * a set of relation schemas S = {R1, R2, ... Rm} and a set of integrity constraints IC #### Relational Database State * Definition * a set of relation states DB = {r1, r2, ..., rm} * ri is a state of Ri * ri satisfy all integrity constraints specified in IC ### Constraints * (Inherent) model-based constraints * Schema-based constraints * Applicaiton-based or semantic constraints #### Schema-Based Constraints * Domain Constraints: * Within each tuple, the value of each attribute must be an atomic value from the domain dom(A). * Key Constraints: * Key attribute of each tuple must be unique. * Superkey * subset of attributes to make this set unique * key * minimal subset of Superkey to make this set unique * each of the keys is called a **candidate key** * Primary Key * one of the candidate keys * choice of primary key depands on Dev * Constraints on NULL * Specifies whether NULL value is permitted for a particular attribute * Entity Integrity Constraints * no primary key value can be NULL ![](https://i.imgur.com/wdNzY9C.png =200x) * Referential Integrity Constraints * Foreign Key * a set of attributes FK that references relation R2, if: * FK and PK have the same domain(s) * value of FK must occurs as value of PK or NULL ![](https://i.imgur.com/udqbOtQ.png) * can refer to its own relation ![](https://i.imgur.com/LP2aWk0.png =200x) ### Data Manipulations * Modifications * insert, delete, update * Retrievals #### Insertion * Can violate * Domain, key, Not null, Entity, Integrity, Referential Integrity * When violate * System usually rejects operation ![](https://i.imgur.com/MajmoDm.png =500x) #### Delete * Can violate * referential integrity (referenced side) * When violate * Restrict: reject * Cascade: also recursively delete tuples that reference the tuple being deleted * Set null/default ![](https://i.imgur.com/3uQWiIf.png =400x) #### Update * Update of non-key attributes * Can violate * Domain constraint, Not null * When violate * Restrict or correction * Update of primary-key attributes * Can violate * Domain, key, entity Integrity, Referential Integrity (referenced side) * When violate * Restrict, cascade, set to null/default * Update of foreign-key attributes * Can violate * Referential constraint (referencing side) * When violate * Restrict, set to null/default ![](https://i.imgur.com/XI7cFee.png =400x) ## Basic SQL #### Terminology: RDB vs SQL ![](https://i.imgur.com/EFnqaMS.png =400x) ![](https://i.imgur.com/NuTRFRw.png =400x) #### Sublanguages * DDL: data definition language * DDL: data definition language * SDL: storage definition language * DML: data manipulation language * DCL: data control language * DQL: data query language * DML: data manipulation language #### Structure * SQL Schema: Namesapces in SQL * identified by a schema name * owned by a specific group/user * includes metadata * CREATE SCHEMA * SQL catalog * a named collection of schemas * CREATE CATALOG * SQL Database * a named collection of catalog of schemas * CREATE DATABASE * DBMS instance > Database >= Catalog >= Schema > Relations/Tables #### SQL Commands Main "Verbs" in SQL ![](https://i.imgur.com/gpHOm3l.png) * Understand your DBMS * Connect (to a DBMS server) * Help * Show databases * Get to your database (namespace) * Create database/schema/catalog * Use database/schema/catalog * Drop database/schema/catalog (Delete) * Show tables * Create your DB schema (in database) * Create table * Alter table (Update) * Drop table * Describe table * Access data * Insert, delete, update, select (query) * Most useful commands * Connect * Help * Show * Use * Create * Drop * Insert * Delete * Update * Select * Alter ### Data Definition Language #### Create Database ``` CREATE {DATABASE | SCHEMA} [IF NOT EXISTS] db_name [create_option] ... create_option: [DEFAULT] { CHARACTER SET [=] charset_name | COLLATE [=] collation_name | ENCRYPTION [=] {'Y' | 'N'} } ``` * Example: * `CREATE DATABASE STUDENT;` * `CREATE SCHEMA DEPT IF NOT EXISTS;` #### Create Table ``` CREATE TABLE PROJECT( Pname VARCHAR(15) NOT NULL, Pnumber INT NOT NULL, Plocation VARCHAR(15), Dnum INT NOT NULL, PRIMARY KEY (Pnumber), UNIQUE (Pname), FOREIGN KEY (Dnum) REFERENCES DEPARTMENT(Dnumber) ); ``` specify: * Table name * Attribute names * Attribute domains * Data Types ![](https://i.imgur.com/lESebET.png) ![](https://i.imgur.com/fmkKhYT.png) * CREATE DOMAIN: User Defined Attribute Types * Not all DBMS support this! (Mostly only supported by large commercial DBMS) * MySQL (as of v8.0) does not support this ``` CREATE DOMAIN PNUM_TYPE AS CHAR(9); CREATE TABLE PROJECT( Pname VARCHAR(15) NOT NULL, Pnumber PNUM_TYPE NOT NULL, Plocation VARCHAR(15), Dnum INT NOT NULL, PRIMARY KEY (Pnumber), UNIQUE (Pname), FOREIGN KEY (Dnum) REFERENCES DEPARTMENT(Dnumber) ); ``` * Constraints (details are in the next section) ![](https://i.imgur.com/PnZzpku.png) #### Constraints ``` CREATE TABLE PROJECT( Pname VARCHAR(15) NOT NULL, // Attribute constraint: NOT NULL Pnumber INT NOT NULL, Plocation VARCHAR(15) DEFAULT ‘TAIPEI’, // Attribute constraint: DEFAULT Dnum INT NOT NULL CHECK (Dnum > 0 AND Dnum < 21), // Attribute constraint: CHECK PRIMARY KEY (Pnumber), // Key Constraints: PRIMARY KEY UNIQUE (Pname), // Key Constraints: UNIQUE KEY FOREIGN KEY (Dnum) // Referential Integrity Constraints: FOREIGN KEY REFERENCES DEPARTMENT(Dnumber) // Referential Integrity Constraints: FOREIGN KEY ); ``` * Attribute Constraints * NOT NULL * DEFAULT * CHECK * limit the range of attribute values * Key Constraints * PRIMARY KEY: * single attribute: specified after the attribute directly * `Pnumber INT PRIMARY KEY NOT NULL` * single or multiple attributes: specified at the end of attribution definition section * ``` ... PRIMARY KEY (Pnumber, ...) ... ``` * UNIQUE * the specified way is the same as PRIMARY KEY * Referential Integrity Constraints * FOREIGN KEY * may be violated ![](https://i.imgur.com/wIoofuH.png =500x) * referential triggered action * Specified with ON UPDATE or ON DELETE * RESTRICT * SET NULL * SET DEFAULT * CASCADE * example * constraints name: e.g., `CONSTRAINT EMPPK PRIMARY KEY (Ssn)` ``` CREATE TABLE EMPLOYEE ( . . . , Dno INT NOT NULL DEFAULT 1, CONSTRAINT EMPPK PRIMARY KEY (Ssn), CONSTRAINT EMPSUPERFK FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE SET NULL ON UPDATE CASCADE, CONSTRAINT EMPDEPTFK FOREIGN KEY(Dno) REFERENCES DEPARTMENT(Dnumber) ON DELETE SET DEFAULT ON UPDATE CASCADE ); CREATE TABLE DEPARTMENT ( . . . , Mgr_ssn CHAR(9) NOT NULL DEFAULT ‘888665555’, . . . , CONSTRAINT DEPTPK PRIMARY KEY(Dnumber), CONSTRAINT DEPTSK UNIQUE (Dname), CONSTRAINT DEPTMGRFK FOREIGN KEY (Mgr_ssn) REFERENCES EMPLOYEE(Ssn) ON DELETE SET DEFAULT ON UPDATE CASCADE ); CREATE TABLE DEPT_LOCATIONS ( . . . , PRIMARY KEY (Dnumber, Dlocation), FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber) ON DELETE CASCADE ON UPDATE CASCADE) ; ``` * Semantic Constraints * CHECK ``` CREATE TABLE t1( CHECK (c1 <> c2), c1 INT CHECK (c1 > 10), c2 INT CONSTRAINT c2_positive CHECK (c2 > 0), c3 INT CHECK (c3 < 100), CONSTRAINT c1_nonzero CHECK (c1 <> 0), CHECK (c1 > c3) ); ``` * usually supported only in large commercial DBMS * MySQL (as of v8.0) supports CHECK format but does not process it * Constraints in single table: single column, multi-column * ASSERTION ``` CREATE ASSERTION SALARY_CONSTRAINT CHECK ( NOT EXISTS ( SELECT * FROM EMPLOYEE E, EMPLOYEE M, DEPARTMENT D WHERE E.Salary>M.Salary AND E.Dno=D.Dnumber AND D.Mgr_ssn=M.Ssn )); ``` * usually supported only in large commercial DBMS * Constraints between multi-tables #### Delete Schema: Drop ``` DROP {DATABASE | SCHEMA} [IF EXISTS] db_name; // example DROP TABLE departments RESTRICT; ``` * Can drop database, schema, catalog, table * Restrict or cascade clause: decide what to do with affected objects #### Change Schema: Alter ``` ALTER {DATABASE | SCHEMA} db_name alter_specification …; // example ALTER TABLE students DROP COLUMN grade CASCADE; ``` * Can add, drop column, constraint, index * Can rename column or change the definition of column ## Relational Algebra and Advanced SQL ### Operations #### Select (pick row) * Syntax: * ![](https://i.imgur.com/32FK1mW.png =x25) * Meaning: * Choose a subset of the tuples from a relation that satisfies a selection condition * Op: * and/or * {=,<,≤,>,≥,≠} * Selectivity: * Ratio of tuple selected = |R’| / |R| * E.g. (has commutativity) * ![](https://i.imgur.com/JzkWo1q.png =250x) #### Project (pick column) * Syntax: * ![](https://i.imgur.com/eVdEp9M.png =x25) * Meaning: * Selects certain columns from the table and discards the other columns. * Op: * no commutativity -> no and/or * {=,<,≤,>,≥,≠} * Degree: * the number of attributes (column) * Degree of R’ <= degree of R * Cardinality * ![](https://i.imgur.com/12XWKHw.png =x25) * ![](https://i.imgur.com/G5HjOKZ.png =x45) * E.g. (no commutativity) * ![](https://i.imgur.com/QKIaZkn.png =400x) #### Rename * Syntax: * ![](https://i.imgur.com/jShRxCd.png =x25) * Meaning: * rename * E.g. (combination of a rename and a select) * ![](https://i.imgur.com/n1n3I4Q.png =500x) #### Union * Union compatibility * Two relations R(A1,A2,...,An) and S(B1,B2,...,Bn) are said to be union compatible if: * R and S have the same degree n * dom(Ai) = dom(Bi) for 1 ≤ i ≤ n * SQL operations * UNION, INTERSECT, EXCEPT * UNION ALL, INTERSECT ALL, EXCEPT ALL * E.g. * ![](https://i.imgur.com/zjtVBTb.png =x150) #### Join ![](https://i.imgur.com/Q6Tham0.png) * Theta join * syntax: * ![](https://i.imgur.com/Ny8RT9R.png =x20) * meaning: * combine related tuples from two relations into single “longer” tuples * op: * and * {=,<,≤,>,≥,≠} * join selectivity: * Ratio of the cardinality of the join result and the cartesian product of the input relations * ![](https://i.imgur.com/ruvgubi.png =x80) * e.g. * ![](https://i.imgur.com/tnC1vfW.png =x40) * ![](https://i.imgur.com/LvCcQ3u.png =x120) * Equi join * In \<join condition\>, only = is used * Natural join * similar to equi join, but * remove one of the join attribute * requires the same attribute name * Semi join * similar to natural join, but * certain columns excluded (left, right) * syntax: * ![](https://i.imgur.com/X4EnxaP.png =x20) * Anti join * similar to semi join, but * includes as result only those tuples in R for which there is no tuple in S with an equal value on their common attribute names (left, right) * syntax: * ![](https://i.imgur.com/rgYhrSd.png =x18) * ![](https://i.imgur.com/839AbOM.png =x22) * Division * ![](https://i.imgur.com/SBGuTc7.png =x160) * syntax: * ![](https://i.imgur.com/NwwabkI.png =x18) * e.g. * ![](https://i.imgur.com/MRDMIJY.png =x120) #### Additional Relational Operations * Generalized Projection * allow functions on attributes as project attributes * ![](https://i.imgur.com/paYOZjE.png =x80) * Recursive closure * e.g., find out supervisors of all employees recursively * Aggregate functions and grouping * find out summary information for each “group” of tuples in a relation * syntax: * ![](https://i.imgur.com/dJH4rOE.png =x30) * aggregate functions * `SUM, AVERAGE, MAXIMUM, MINIMUM, COUNT` * e.g. * ![](https://i.imgur.com/MYGOsmY.png =x200) * Outer join * those tuples not selected by JOIN conditions are also kept * three types * left, right, full * Outer union * Union between two relations that have some, but not all, attributes in common ### Advanced SQL and Complex Queries #### Select and NULL * Allows one to select tuples with NULL values with `IS/IS NOT` * E.g. * ![](https://i.imgur.com/VgFyoTh.png =x60) * `IS NULL`, not `= NULL` * SQL consider every NULL to be different from other NULL #### Three-Valued Logic * Value: * TRUE, FALSE, UNKNOWN * Logic Op (and/or): * ![](https://i.imgur.com/tNcj1qW.png =x130) * ![](https://i.imgur.com/9jYttcG.png =x131) #### Complex Queries * Nested queries * IN Operator * Test whether a value is in a set (explict set value, dynamic set value) * e.g. * ![](https://i.imgur.com/JQY6GIH.png =x120) * Op * Ordinary comparison operatorS: * {=,<,≤,>,≥,<>} * Set member comparison operators: * `IN` * {=,>,<,>=,<=} combined with `SOME, ANY, ALL` * E.g. `=SOME`, `>ALL` * Note: * SOME and ANY have the same effect * `=SOME` and `=ANY` have the same effect as `IN` * Dynamic set * ![](https://i.imgur.com/ggP4r9r.png =x120) * ![](https://i.imgur.com/YQuXPq1.png =x110) * Multiple dynamic set * ![](https://i.imgur.com/R3XfBrt.png =x150) * Nest Queries with SOME and ALL * ![](https://i.imgur.com/dE3pxBD.png =x200) * Alias * When two relations may use the same attribute names, alias becomes necessary * ![](https://i.imgur.com/kz0Yd8q.png =x150) * Correlated Nested Queries * In `WHERE` clause, some attribute referenced from the outer query * ![](https://i.imgur.com/ceHR2p0.png =x120) * Nested Queries flattening * nested query using the `=` or `IN` can be expressed as a single block query * ![](https://i.imgur.com/1UwTBGH.png) * Exist * check whether the result of a correlated nested query is empty or not * The result of `EXISTS` is a Boolean value * ![](https://i.imgur.com/wq2ZfS2.png =x120) * Joined tables * The following three select statements are the same * ![](https://i.imgur.com/OqJktaS.png =x160) * Outer joins * ![](https://i.imgur.com/i8vXKdA.png =x80) * Aggregate functions * `COUNT, SUM, MAX, MIN, AVG` * ![](https://i.imgur.com/j2LAO7m.png =x120) * Group By and Having * ![](https://i.imgur.com/QT6wil8.png =x250) ## Design Theory and Normalizaiton ### Mapping of ER Diagram ![](https://i.imgur.com/k1vnXtO.png =600x) ![](https://i.imgur.com/oyoCmaO.png =600x) ### RDB Design Guidelines #### Make sure every relation has a clear meaning #### Reducing redundant information in tuples * update anomalies * Insertion anomalies * information being consistent across multiple tuples * Example: Insert students of the same department * Cannot insert before insert other information * Example: cannot insert a new department if there is no student of that department yet * Delete anomalies * When delete tuple of one entity, we delete last piece of information about another entity * Example: Delete the last student of the a department, we lose the last copy of information about that department * Modification anomalies * Update one piece of information, need update multiple tuples, or there will be inconsistency * Example: If department has a new head, need to update many copies of D_HEAD attribute in STUDENT_DEPT relation #### Reducing NULL values in tuples * especially avoid “NOT-APPLICABLE” type of NULL * E.g. If only 3% of students have offices, avoid “office” as an attribute of the STUDENT table #### Avoid generating spurious tuples * Avoid relations with matching attributes that are not a foreign key-primary key * may produce spurious tuples by "join" * Design proper relation schemas: can be joined with foreign-primary relationship * Guarantee that the join does not generate spurious tuples * Example of spurious tuples * ![](https://i.imgur.com/1MF4fFN.png =500x) ### Dependency #### Functional Dependency (FD) * Definition * Let X and Y be two sets of attributes in a relation schema R, r is a relation state of R * Denoted as X → Y * For all r, for any two tuples t1 and t2 in r, if t1[X] = t2[X], then t1[Y] = t2[Y] * Full functional dependency * X → Y is a full functional dependency if * removal of any attribute A from X means that the dependency does not hold anymore * X → Y is a partial functional dependency if * the dependency still holds after some removal * Trivial FD * if Y is a subset of X #### Transitive Dependency (TD) * X → Y is a transitive dependency if * there exists a set of attributes Z in R such that * X → Z and Z → Y * Z is not * a candidate key * a subset of any key of R #### Multivalued Dependency (MVD) * Definition * Let R be a relation schema, t be a tuple in R, X and Y be attribute subsets of R, and Z denote (R − (X ∪ Y)) * Denoted as X ↠ Y * For all relation state r of R, there is a multivalued dependency X → Y if * if t1[X] = t2[X], then * t3[X] = t4[X] = t1[X] = t2[X] * t3[Y] = t1[Y] and t4[Y] = t2[Y] * t3[Z] = t2[Z] and t4[Z] = t1[Z] * Example: * ![](https://i.imgur.com/MNhCf0X.png =x150) #### Join Dependency (JD) * Definition * Assume R is a relation schema, r is a state of R * Denoted as JD(R1, R2, … , Rn) * Every legal state r of R has a **nonadditive join decomposition** into R1, R2, … , Rn * That is, for every r, we have: ![](https://i.imgur.com/ikfdHaa.png =x20) * MVD is a special case of a JD with n = 2 * JD(R1, R2) implies an MVD (R1 ∩ R2) ↠ (R1 − R2) * Trivial JD * if some Ri = R ### Normal Form #### Prime Attributes * An attribute is prime attribute if it is a member of some candidate key of relation schema R * An attribute is nonprime attribute if it is not a prime attribute #### Two Checks for Normalization * Nonadditive (or lossless) join property * Guarantees that the spurious tuples does not occur with respect to the relation schemas created after decomposition * Dependency preservation property * Ensures that each meaningful functional dependency is represented in some individual relation resulting after decomposition * If both cannot be kept, **Nonadditive > Dependency** #### 1st Normal Form * domain must include only **atomic values** (simple, indivisible values) * The value of any attribute must be a **single value** * cannot be * Multivalued (array) * Composite values * Some other relation #### 2nd Normal Form * Every nonprime attribute A in R is **fully functionally dependent** on the primary key of R * Example: * ![](https://i.imgur.com/SPShZ9U.png =500x) * General Definition * primary key -> any cadidate key #### 3rd Normal Form * no nonprime attribute of R is **transitively dependent** on the primary key * Example: * ![](https://i.imgur.com/yrRwNhl.png =500x) * General Definition * whenever a nontrivial functional dependency X → A holds in R, either * X is a superkey of R * partial/transit dependency cannot exist * A is a prime attribute of R * Example: * ![](https://i.imgur.com/8ARNdhc.png =500x) * ![](https://i.imgur.com/B7kFHiP.png =500x) * Alternative general Definition * primary key -> any cadidate key * (Equivalently) Every nonprime attribute of R is **fully functionally/nontransitively** dependent on every key of R #### BCNF * Whenever a nontrivial functional dependency X → A holds in R * X is a superkey of R * Example: * ![](https://i.imgur.com/9AwHku0.png =200x) * 3NF: true * BCNF: false * BCNF Decomposition (use the example above) * ![](https://i.imgur.com/InVq5Vl.png =x70) * The best way depends on which one conforms to “Non-additive join property” #### 4th Normal Form * For every nontrivial multivalued dependency X ↠ Y * X is a superkey for R * Example: decompose EMP_INFO into EMP_PROJ and EMP_DEPEND * ![](https://i.imgur.com/f0W6B0b.png =x150) * ![](https://i.imgur.com/gLNEQtV.png =x100) #### 5th Normal Form * Definition * For every nontrivial join dependency JD(R1, R2, … , Rn), every Ri is a **superkey** of R ## Database Programming *unfinished*