tags: `Data Base`

第十章: Relational Database Design I

Measures of Quality

讓 attribute semantics 是清楚易懂的
減少重複訊息
減少 NULL 數量
不允許虛假的 tuple 出現的可能性

重複資訊會造成的異常現象(anomalies)

Anomalies 的問題
- 造成多餘的工作被執行
Anomalies 種類
- Example: EMP_PROJ(Emp#, Proj#, Ename, Pname,No_hours)
- Insertion anomaly: 有條件的新增值
  - Cannot insert a project unless an employee is assigned to it.
- Deletion anomaly: 刪除不能刪除的資料
  - When a project is deleted, it will result in deleting all the employees who work on that project.
- Update anomaly: 需要重複更新多筆一樣的資料
  - Changing the name of project number P1 from “Billing” to “Customer Accounting” may cause this update to be made for all 100 employees working on project P1.

NULL Values in Tuples

NULL 的問題
- 浪費儲存空間
- 造成資料沒有可讀性
使用 NULL 的時機
- Attribute 不適用或無效
  - e.g. 男生有沒有懷孕?
- Attribute value 是 unknown
  - e.g. 期末考成績還沒出來
- Value 存在，但不可用

Spurious(虛假) Tuples

JOIN 後產生錯誤的 tuple(無意義的)
lossless join 指保證 join 後出現的資料都是有意義的

Functional Dependencies(FDs)

功用: 能夠偵測 Spurious 與 anomalies
定義: 限制兩個 attribute setes 之間的關係
何謂FD?
- (if x 能決定唯一 y) => X FD Y
FD 用於推導現實世界的 attribute 的 constraint
E.g. SSN -> ENAME , {SSN, PUNMBER} -> HOURS

Use of FDs

使用 FDs 測試 relation 是否滿足 F of FDs set，如果滿足，則稱 r satisfies F

Inference Rules for FDs

Armstrong's inference rules:
- IR1(Reflexive) if Y subset-of X, then X->Y
- IR2(Augmentation) if X -> Y, then XZ -> YZ
  - (Notation: XZ stands for X \(\cup\) Z)
- IR3(Transitive) if X->Y and Y -> Z, then X -> Z
Armstrong's inference rules 完備了 inference rules
還有額外的 inference rules(可推導):
- Decomposition: if X -> YZ, then X -> Y and X -> Z
- Union: if X -> Y and X->Z, then X -> YZ
  - Prove:
  1. X → Y (given)
  2. X → Z (given)
  3. X → XY (using IR2 on 1 by augmentation with X. Where XX = X)
  4. XY → YZ (using IR2 on 2 by augmentation with Y)
  5. X → YZ (using IR3 on 3 and 4)
- Psuedotransitivity: if X -> Y and WY -> Z, then WX -> Z

Closure

Closure of set F of FDs => 從 F 可被推導的 FD
Closure of set F of attribute X => 所有可被 attribute X 的 attributes

Equivalence of Sets of FDs

if 所有 FD in F 可以被 G 推導
if 所有 FD in G 可以被 F 推導
- 兩個 sets pf FDs F and G are equivalent => F+ = G+
Definition(Covers):
- 如果 FD in G 可以被 F 推導 => F covers G
F cover G and G cover F => F and G 是等價

Minimal Sets of FDs(1)

定義:
- 等號右邊有一個 attribute
- 每個 FD 都不能拿走
- 不能用 R1 代替

Normalization of Relations

2NF, 3NF, BCNF
- based on keys and FDs of a relation schema
4NF
- based on keys, multi-valued dependencies: MVDs;
5NF
- based on keys, join dependencies: JDs
  大部分做到 3NF, BCNF, 4NF

要求目標

Nonadditive join property(lossless join)
Dependency perservation property

Promblem with Decompositions

normalize 的代價
- queries 時間變長
- denormalize 不一定變回原來的 schema
- tradeoff: these issue v.s. redunancy

Keys and Attributes

prime attribute 是指存在某些 candidate key 的 attribute
nonprime attribute 是指不存在任何 candidate key 裡的 attribute

First Normal Form

Disallows:
- composite attributes
- multivalue attributes
- nested relations: non-atomic attribute
method:
- 將重複的資料項分別儲存到不同的記錄中, 並加上適當的主鍵

Example:
multivalue resolve

nested relations

Second Normal Form

Uses the concepts of FDs, primary key
目標:
- 符合1NF
- 每個非主鍵屬性必須完全相依於主鍵
method:
- 分割資料表:將部分相依的欄位分割出去，另組成新的資料表
Definitions:
- Prime attribute: 指存在某些 candidate key 的 attribute
- Full functional dependency: 拿掉任何 attribute 的 Y 皆無法滿足 Y -> Z 的 FD
  - Example:
2NF: 所有 non-prime attribute A in R(table) 都是 fully functionally dependent on the primary key
R 可以藉由 2NF normalization 分解成 2NF relations

(以 candidate 為主軸，一個個把不同 prime attribute 的表組合起來)
(只最少的 attribute 就可以辨認出單一 attribute ，把它變成表)

Third Normal Form

目標:
- 符合 2NF
- 各個欄位與主鍵之間沒有遞移相依的關係
如何找出遞移關係
- 若要找出資料表中各欄位與「主鍵」之間的遞移相依性, 最簡單的方法就是從左到右掃瞄資料表中各欄位有沒有『與主鍵無關的相依性』存在
  - 可能的情況如下：
    1.如果有存在時，則代表有「遞移相依」的關係
    2.如果有不存在時，則代表沒有「遞移相依」的關係
method:
- 檢查有無遞移關係
- 將遞移關係分割出去，組成新的資料表
Transitive dependency: FD X -> Z 可以從 X -> Y and Y -> Z推導
如果表 R 是 2NF 且沒有 non-prime attribute A in R 是跟 pk 有 tansitively dependent 關係，稱之為 3NF
利用 3NF normalization 可做出 3NF relations
例如: X -> Y 且 Y -> Z 且 X \(\in\) PK，接下來考慮是否 Y \(\in\) candidate key
- 若 Y 是 candidate key, 則沒有 transitive dependency
- 若不是，把它變成另一個表，並以 Y 為 PK
- Example:
It is fully functionally dependent on every key of R.
- 跟每個 key 都是 FD
It is nontransitively dependent on every key of R.
- 沒有 transitively dependent

Normalization into 2NF and 3NF

Normal Forms Defined Informally

1NF: All attributes depend on the key
2NF: All attributes depend on the whole key
3NF: All attribute depend on nothing but the key

BCNF(Boyce-Codd Normal Form)

如果主鍵是由多個欄位組成，則進行 Boyce-Codd 正規化
目標:
- 符合 3NF 的格式
- 主鍵不可相依於其他非主鍵的欄位
當表 R 的 FD X -> A, X 是 R 的 superkey 時，是為 BCNF

Modeling Temporal Data

Temporal data 是增加時間區段
沒有 accepted standard
Temporal functional dependency: 假如 X 包含全部的 legal instance Y 的 snapshot ，則 X -> Y FD 存在於表 R 中。
在實務上，time attribute 會增加 start 與 end
FK 會參考到某個時間點的 tuple

Syntax	Example	Reference
# Header	Header	基本排版
- Unordered List	Unordered List
1. Ordered List	Ordered List
- [ ] Todo List	Todo List
> Blockquote	Blockquote
Bold font	Bold font
Italics font	Italics font
~~Strikethrough~~	~~Strikethrough~~
19^th^	19^th
H~2~O	H₂O
++Inserted text++	Inserted text
==Marked text==	Marked text
[link text](https:// "title")	Link
![image alt](https:// "title")	Image
`Code`	`Code`	在筆記中貼入程式碼
```javascript var i = 0; ```	`var i = 0;`	在筆記中貼入程式碼
:smile:		Emoji list
{%youtube youtube_id %}	Externals
$L^aT_eX$	L^aT_eX
:::info This is a alert area. :::	This is a alert area.

tags: Data Base

第十章: Relational Database Design I

Measures of Quality

重複資訊會造成的異常現象(anomalies)

NULL Values in Tuples

Spurious(虛假) Tuples

Functional Dependencies(FDs)

Use of FDs

Inference Rules for FDs

Closure

Equivalence of Sets of FDs

Minimal Sets of FDs(1)

Normalization of Relations

要求目標

Promblem with Decompositions

Keys and Attributes

First Normal Form

Second Normal Form

Third Normal Form

Normalization into 2NF and 3NF

Normal Forms Defined Informally

BCNF(Boyce-Codd Normal Form)

Modeling Temporal Data

tags: `Data Base`