# Relational Model :::info [TOC] ::: ![RM](https://hackmd.io/_uploads/HyauvW0N-x.png) <br/> ## Introduction The relational model is a foundational data model in database systems that represents data using tables (relations). It provides a simple, mathematically defined structure that supports precise querying and data integrity, and it forms the basis of SQL and modern relational databases. 1. Defines a model of data - Specifies how data is **structured, represented, and interpreted**. 2. Data → **a set of relations** - The entire database is viewed as a **collection of tables (relations)**, where each relation is a **set of tuples**. <br/> ### ★ Eample Relation Relation: ==Student(SID, Name, Major)== | SID | Name | Major | |-----|-------|--------| | 1 | Alice | CS | | 2 | Bob | EE | - Each row is a **tuple** - Each column is an **attribute** - The table as a whole is a **relation** - Duplicate rows are not allowed <br/> ## Properties 1. Each relation has **at least one key** → A **key** is a set of attributes whose values are **unique** across all tuples. 2. Attributes **without duplicate values** → No two tuples can share the same key value. - ✰ Example: ISBN _(International Standard Book Number)_ ++uniquely++ identifies each book. 3. Each tuple is **uniquely identifiable** → A key guarantees that every tuple can be distinguished from all others. > ++Reordering attributes/tuples++ does not change the relation. 4. Attribute domain → The **set of all possible values** that an attribute can take. - ✰ Example: The domain of attribute `price` is real numbers between `$0` and `$1000`. - Each attribute value is **indivisible** (e.g., integer, string). 5. Value of an attribute cannot be a relation → An attribute cannot contain a table or set of values. 6. **Schema (pattern)** of a relation → The **meta-data** that describes the structure of a relation. - Includes: - ✰ Example: **Book(ISBN, title, price, year)** - `Book`: Name of the relation - `ISBN, title, price, year...`: Names of attributes - `year > 0, ...`: Domains of attributes - Constraints on attributes (e.g., key, non-key) - Instance: _(Like an obj instance of class `List()` in Python)_ ```S [ (11, ‘AI Systems’, $102, 2001), (22, ‘Cell’, $201, 1954) ] ``` > A schema may have **numerous instances**, just like there can be many different tables. <br/> ## Relation Database (DB) A relational database is a **set of relations**. - **DB schema**: the set of **schemas of all relations** in the database - ✰ Example: - **Book(ISBN, title, price, year)** - **Category(CID, name)** - **Book-Category(CID, ISBN)** > Equal to structure of all tables. - **Instance of a DB schema**: a set of **instances of its relations** → The actual data stored in the database at a given time. > Means the actual data in all tables. <br/> ## Relational Languages Relational languages are formal languages used to **query and manipulate data** in a relational database. They provide a **mathematical foundation** for operations on relations. - Main types: 1. **Datalog**: logic-based, rule-driven language 2. **Relational algebra (RA)**: procedural, specifies *how* to retrieve 3. **Relational calculus (RC)**: declarative, specifies *what* to retrieve | Language | Type | What/How | Example Focus | |----------|------------|-----------|--------------------------| | **Datalog** | Logic | What | Rules / Facts / Queries | | **RA** | Procedural | How | Operations on tables | | **RC** | Declarative| What | Specify conditions only | :::info **RA** = **Safe Nonrecursive Datalog** = **Domain-Independent RC** ::: :::warning - Avoid using Rules that cannot be evaluated in a finite time (unsafe rules) - A rule is safe if every variable ++appear in at least one positive atom (body)++ ::: <br/> ### 1️⃣ Datalog (nonrecursive) Specify facts and rules, querying database using logic. - Each ++tuple++ in a DB is a ++**fact**++: - ✰ Example: ```S Movie(8,‘Godfather II’, 1974, 390) Actor(9, ’Robert De Niro’, 1943) ``` - Each ++query++ is a ++**rule**++: - ✰ Example ①: ```S Q1(y):- Movie(x, y, 1998, 20) ``` > (Movies that were produced in 1998 and made $20) - ✰ Example ②: ```S Q2(y):- Actor(x, y, z), Plays(t, x), Movie(t, v, w, 20) ``` > (Actors who played in a movie whose total gross is $20) - ✰ Example ③: ```S ExpensiveBook(B):- Book(B, _, P, _), P > 150 ``` > May use `=`, `<`, `>` in rules > (`B` is an ExpensiveBook if `B` is a Book and its price `P > 150`) - ✰ Example ④: ```S Q4(y):- Actor(x, y, z), Plays(t, x), Movie(t, v, w, 20) Q4(y):- Actor(x, y, z), Plays(t, x), Movie(t, v, 1990, h) ``` > ++Union of rules++ > (Actors who played in a movie with gross of $20 or a movie made in 1990) - ✰ Example ⑤: ```S U5(x, y, z):- Actor(x, y, z), Plays(t, x), Plays(t, f), Actor(f, 'William', g) Q5(y):- Actor(x, y, z), not U5(x,y,z) ``` > ++Using negation++ (All actors who did not play in a movie with "William") :::success ==Head `:-` Body== acts like implication (`←`) in logic ==Atom== is the basic fact in the **body**. ::: <br/> ### 2️⃣ Relational Algebra (RA) Algebraic view on representing queries, performing operations on relations and returns another relation. - **Key operators**: | Operation | Operator | Description | |-|-|-| | Selection (filter tuples) | `σ` | Select a subset of rows from relation | | Projection (select columns) | `π` | Select desired columns from relation | | Join | `⋈` | Combine tuples from two relations based on a condition | | Intersection | `∩` | Tuples that appear in **both** relations | | Union | `∪` | Tuples in reln. 1 and in reln. 2. | | self-Difference | `-` | Tuples in reln. 1, but not in reln. 2 | | Cartesian Product (cross-product) | `×` | Combine two relations | <br/> - ✰ Example ①: $$π_{title} (σ_{price>150} (Book))$$ > (Find titles of books priced over 150) - ✰ Example ②: $$Actor_{⋈} (𝐴𝑐𝑡𝑜𝑟.𝑎𝑖𝑑 =𝑃𝑙𝑎𝑦𝑠.𝑎𝑖𝑑) Play$$ > Perform a **join** between `Actor` and `Play` where `Actor.aid = Plays.aid`. > More meaningful than cross-product :::success **Union-Compatible**: - Two input relations have same number of attributes - Corresponding attributes have the same domain (type) - The output has the same schema as input ::: <br/> ### 3️⃣ Relational Calculus (RC) Describe properties of the desired result without specifying computation steps and apply **first-order logic**. 1. It is a **Declarative** Query Language. 2. SQL and RA can theoretically be **mapped to RC**. 3. It doesn't describe "++how to do it++", but only "++what conditions the answer must meet.++" > Query results = All variable assignments that make the logical formula **True** <br/> - **Key Predicates**: | Relational Predicate | Meaning / Description | |-|-| | `Atom` | A basic fact or relation, e.g., `Actor(x, y, z)` | | `P ∧ P` (AND) | **Conjunction**: both predicates must be true | | `P ∨ P` (OR) | **Disjunction**: either predicate can be true | | `P ⇒ P` | **Implication**: if first predicate is true, second must be true | | `not(P)` | **Negation**: predicate is false | | `∀x.P` | **Universal quantifier**: P holds for all x | | `∃x.P` | **Existential quantifier**: P holds for at least one x | | `Query Q(x1, …, xn)` | A query is a predicate over selected variables | :::spoiler **Quantifier Negation** <br/> | Original | Equivalent | Intuitive Meaning | |-|-|-| | ¬∀x P(x) | ∃x ¬P(x) | **Not all satisfy P → At least one does not satisfy P** | | ¬∃x P(x) | ∀x ¬P(x) | **No element satisfies P → All tuples do not satisfy P** | ::: <br/> - Two Forms of RC: | Form | Name | Description | |-|-|-| | **TRC** | Tuple Relational Calculus | The variable represents a "Tuple" | | **DRC** | Domain Relational Calculus | The variable represents the "Field Value" | <br/> > ⦿ DRC Syntax > `Q(x1, x2, ..., xn) := P(x1, x2, ..., xn)` > - `Q(...)`: Query result schema > - `P(...)`:: Logical Conditions <!-- $$ \{ t.title \mid Book(t) \wedge t.price > 150 \} $$ $$ \{ t \mid \exists ISBN, price, year (Book(ISBN, t, price, year) \wedge price > 150) \} $$ <br/> --> - ✰ Example ①: ```S Q(name) := ∃a, b, m, t, y (Actor(a, name, b) ∧ Plays(m, a) ∧ Movie(m, t, y, 2000)) ``` > (Actors who played in a movie with total gross of $2,000) - ✰ Example ②: ```S Q(name) := ∃a, b (Actor(a, name, b) ∧ ∀m ( Plays(m, a) ⇒ ∃t, g (Movie(m, t, 1990, g)) )) ``` > (All the films the actor has starred in were produced in 1990) > `∀m ( Plays(m, a) ⇒ ... )` → ++If `a` played `m`, then `m` must have been born in 1990++ - ✰ Example ③: ```S Q(name) := ∃a, b, m (Actor(a, name, b) ∧ Plays(m, a) ∧ ¬∃a2 ((a2 ≠ a) ∧ Plays(m, a2))) ``` > (Find the actors who have acted in "one-man show films") > `Plays(m, a)` → `a` is in movie `m` > `(Not Exist) ¬∃a2 ≠ a ...` → There are no other actors who also acted in `m` > Hence, `m` has only one actor. - ✰ Example ④: **(Negation)** > Negation must be restricted to the known relation. ```S Q(name) := ∃a, b (Actor(a, name, b) ∧ ¬∃m (Plays(m, a) ∧ ∃f, g (Actor(f, 'Sharon', g) ∧ Plays(m, f)))) ``` > (Actors who have never worked with "Sharon") - ✰ Example ⑤: ==(Unsafe / Domain-Dependent RC)== ```S ❌ Q(x) := not Actor(x, y, z) ``` > `x` can be any value in any universe (infinite), > Result in an infinite number of answers. - ✰ Example ⑥: **(Correspondence between ++RC++ and ++SQL++)** - _**RC**_ ```R ∃a, b, m, t, y (Actor(a, name, b) ∧ Plays(m, a) ∧ Movie(m, t, y, 2000)) ``` - _**SQL**_ ```SQL SELECT A.name FROM Actor A, Plays P, Movie M WHERE A.aid = P.aid AND P.mid = M.mid AND M.gross = 2000 ``` > SQL WHERE == RC Predicate <br/> ## Conclusion - The relational model provides a strong theoretical foundation for databases. - **SQL** is translated internally into **RA**. - **Datalog** and **RC** are widely used for reasoning, constraints, and knowledge representation. - **RA, safe nonrecursive Datalog, and domain-independent RC are equivalent in expressive power**. <br/> :::spoiler Relevant Resource [Ramakrishnan Database Management Systems](https://raw.githubusercontent.com/pforpallav/school/master/CPSC404/Ramakrishnan%20-%20Database%20Management%20Systems%203rd%20Edition.pdf) <!-- [SQL Tutorial](http://w3schools.com/sql/) --> :::