資料庫系統導論2023: 重點筆記

名詞解釋

基本概念

DBMS : 對DB進行操作和管理的系統/軟體包 (例如:SQL)
DB System : 由DBMS和資料本身組成
MetaData :
- 用來描述資料的資料，稱之為中繼資料 (Meta-data)
- 資料庫系統具有自我描述 (Self-Description) 的特性
- 包含資料的型態、儲存格式、關係及各種限制、實際儲存結構，ex: 循序檔、索引檔
Three-Schema Architecture :
- External Level : 一般使用者看到的資料
- Conceptual Level : 描述資料庫資料的整體架構和限制等
- Internal Level : 資料實際儲存的結構
Data Independence :
- Logical : 外部層與概念層獨立
- Physical : 外部層與內部層獨立，概念層與內部層獨立
Data Definition Language (DDL)
- 用於描述DB的conceptual schema
Data Manipulation Language (DML)
- 用於描述DB資料的更新與提取

Data Model種類

Network model
- 可描述一對一，一對多，多對多
- 父節點可能有多個
- 存取速度快，可達到資料獨立性
- 結構複雜，指標過多難理解，難以優化
Hierarchical model
- 樹狀階層，指標連接，建立相對簡單
- 可描述一對一，無法描述多對多
- 存取速度快，適合大量結構固定的資料
- 父節點被刪除，子節點也會被刪除
- 插入、刪除動作複雜
Relational model
- 關聯必無重複的紀錄，且記錄順序不重要，屬性順序也不重要
- 須遵守整合性限制 (個體整合性、參考整合性限制)
- 採宣告式查詢語言 (SQL)
- 不適用於大量資料處理
Object Relational model
- 融合了OOP與relational model概念的data model
- 有繼承的功能

ER Model

Entity : 特定的物件
Attribute : 用於描述entity的資訊
- atomic attribute (simple) : 不可繼續分割
- composite : 一個attribute由好幾個其他attribute組成
- multi-valued : 一個attribute可能有好幾個不同的值
Key Attribute : 具有獨特值的attribute
- 可以是composite
- 一個entity可以有好幾個key
Entity type : 具有相同基本attribute的entity進行分組成為entity type
Entity set : 每個entity type包含的複數個entity
Domain : attribute的value set，表示可容許的資料類型
Degree : 一個relationship type有多少個entity types參與
Relationship instance : 描述兩個特定entity的relationship
Relationship type V.S set :
- type表示relation的名稱與參與的entity
- set表示當下這個DB state中的relation instance
Cardinality Ratio : 指relation所關聯的entity數量限制
Weak Entity Types : 沒有key attribute，一定有一個identifying relationship
Binary V.S Ternary : Ternary = 3個1-N Binary，其中中間為一個weak entity

EER Model

Subclass V.S Superclass :
- subclass的entity繼承superclass的所有attribute和relationships
Specialization : 定義一個superclass的subclass set
Generalization : 找到多個subclass的共同屬性並定義出superclass
Disjointness constraints :
- Disjoint : 一個superclass中的entity最多只能是一個subclass的成員
- overlapping : 一個superclass中的entity可以是複數subclass的成員
Completeness constraints :
- Total : 每個superclass中的entity都必須是某些subclass的成員 (畫兩條線)
- Partial : superclass中的entity可以不屬於任何一個subclass (畫一條線)
Hierarchy (single inheritance) : subclass只有一個superclass
Lattice (multiple inheritance) : subclass可以有好幾個superclass
- 這時這個subclass被稱作shared subclass
UNION TYPE :
- 一個subclass是好幾個superclass的聯集
- 與shared subclass不同 (交集)，shared subclass必須出現在所有superclass中

Relational Data Model

Inherent or Implicit Constraints
- 基於data model本身的限制
Schema-based or Explicit Constraints
- 在schema中描述
Application based or Semantic Constraints
- 無法透過data model或schema本身描述，必須以應用或其他方式描述
Relational Integrity Constraints
- explicit schema-based
- Key, Entity integrity, Referential integrity
Key Constraint :
- Superkey : 在relation state中，任兩個tuple內的superkey，值不可相同
- Key : 最小化的superkey，若刪除其中一個元素則會造成非superkey
- 從key中選出primary key
Entitiy Integrity Constraints :
- Primary key的值不可為null
Referentail Integrity :
- Foreign key的值可以是別的relation的primary key，或是null
INSERT可能違反以下Constraints :
- Domain, Key, Referential integrity, Entity integrity
DELETE只有可能違反Referential integrity constraint

Relational Algebra

SELECT : 依照條件選出指定的tuple
PROJECT :
- 依照條件選出指定的column(attribute)
- 結果為tuple set，因此會移除任何重複的tuple
RENAME :
- 可以同時將relation跟attribute改名，或只改名attribute
UNION :
- 將兩個relation中的tuple做聯集，其中相同的tuple會被移除
- type compatible : 兩個relation的attribute數量要相同
INTERSECTION :
- 將兩個relation中的tuple做交集，一樣要符合type compatible
SET DIFFERENCE :
- 又稱 MINUS (-)
- R-S = 取得在R中且不在S中的tuple
- 一樣要符合type compatible
CARTESIAN PRODUCT :
- 獲得兩個relation中所有可能的組合
- 新的tuple數量 : m * n
- 新的attribute數量 : m + n
JOIN :
- 以符合條件的attribute為依據合併兩個relation
EQUIJOIN :
- 條件只有相等的JOIN
NATURAL JOIN :
- 自動合併名稱相同的attribute，合併後該attribute只會出現一次
OUTER JOIN :
- 會保留不符合JOIN條件的tuple，此時若無法在另一relation找到對應值則補null
- LEFT : 保留左側relation的全部tuple
- RIGHT : 保留右側relation的全部tuple
- FULL : 左右全部保留
DIVISION :
- R除以S : 找到R中值符合所有S中值的相關attribute
AGGREGATE FUNCTION :
- COUNT只會計算所有row的數量而不移除值相同的部分
- 可以在AGGR.左邊加上想要grouping的名稱，以該attribute為基底做計算

Relational Calculus

Tuple Relational Calculus :
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
- 左側自由變數不須在右側用任何quantifier表示
Domain Relational Calculus :
- Image Not Showing Possible Reasons
  The image file may be corrupted
  The server hosting the image is unavailable
  The image path is incorrect
  The image format is not supported
  Learn More →
- Domain Variable = Attribute
Unsafe Query :
- 可能造成無限輸出，時常發生在有遞迴的情況中

ER to Relational schema mapping

Step 1: Mapping 一般 Entity
- 針對每個regular entity建立新的relation
- 將attribute複製過去，其中key為原本entity的primary key
Step 2: Mapping Weak Entity
- 針對每個weak entity建立新的relation
- 以自己的partial key + 依賴對象的primary key做為新的key
- 其中，依賴對象的primary key同時也是foreign key，要連回去
- 加入所有attribute
Step 3: Mapping 1:1 Relationship Type
- 將total方的primary key加入partial方做為foreign key
- 原本relationship若有attribute，加入total方
- 若兩方皆為partial，則無限制加入哪一方
- 若兩方皆為total，則建立一個新的relation並將兩者merge
Step 4: Mapping 1:N Relationship Type
- 將1端的primary key加入N端作為foreign key
- 若relationship有attribute則加入N端
Step 5: Mapping M:N Relationship Type
- 建立一個新的relation，M端與N端的Primary key一起作為新的primary key
- 同時，這兩個key也要作為foreign key連回原本相關的entity
Step 6: Mapping Multivalued Attribute
- 建立新的relation，並以原entity的primary key和多值的attribute作為新的primary key
- 同時原本entity的primary key也要連回去
Step 7: Mapping of N-ary Relationship Types
- 建立新的relation，所有參與的原entity primary key一起作為新的primary key
- 同時這些key也要連回原本的entity
Special: Mapping Recursive
- 1:1 and 1:N Recursive:
  - 找到相關的key，並將其作為新的foreign key加入表格且重新命名
- M:N Recursive:
  - 建立一個新的relation
  - 將舊relation中相關的兩個key作為新relation中的primary key
  - 同時這兩個key也要作為foreign連回去

期末考範圍

SQL

Data Definition:
- CREATE
  - 定義attribute名稱+類型+限制
    - 用法: 名稱 + 類型(含長度限制) + NULL/NOT NULL
  - 定義on delete/on update時違反constraint的處理方式
    - RESTRICT : DBMS不允許該操作
    - CASCADE : foreign key對應資料連帶更新
    - SET NULL
    - SET DEFAULT
    - 用法: ON DELETE + 處理 + ON UPDATE + 處理
- DROP
- ALTER
  - 改變表格設定
    - 用法: ALTER TABLE 名稱動作
Data Manipulation:
- INSERT
  - 對該表格插入一組tuple，可以為全部attr或部分attr
- DELETE
  - 以WHERE作為條件刪除資料
- UPDATE
  - 以WHERE作為條件更新資料
  - 用法:
    - UPDATE 表格名
    - SET Attr1=Value1, Attr2=Value2, …
    - WHERE 條件
Data Query
- SELECT
  - SELECT * : 列出符合WHERE條件Attr的所有資料
  - 若不同表格有相同名稱，則要標示: TableName.AttrName
  - SELECT DISTINCT Attr: 去除重複的資料做SELECT
  - Aggregate:
    - SUM, AVG, MIN, MAX…
    - 以Aggregate作為條件不能放在WHERE中(執行順序)
      - 要先GROUP之後用HAVING限制
    - COUNT用法:
      - 計算有幾筆非空值
      - COUNT( * ): 整個表格內所有非空值
      - COUNT( Attr ): 該Attr的所有非空值
      - COUNT( DISTINCT Attr): 該Attr的所有非重複非空值
  - 兩個SELECT的結果可以做SET計算
    - UNION, MINUS, INTERSECT
    - SET運算都會去除重複值
- FROM
  - 可以使用JOIN來合併表格
- WHERE
  - 若WHERE為空則代表列出全部資料
  - Alias:
    - 若需要用到相同Table中的資料兩次，可用AS取別名
    - 例: TableA AS A1 / TableA AS A2
  - LIKE:
- GROUP BY
  - 將SELECT出的資料以某個Attr為依據做分群(該Attr資料相同的分成同一群)
- HAVING
  - GROUP BY的限制條件(要滿足這個條件才會被分群)
- ORDER BY
  - 格式: ORDER BY Attr1 Attr2
  - 以屬性內的資料為依據做排序
  - 可以用DESC/ASC標註降序/升序(預設為ASC)
- 執行順序:
  - FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY
Nested query
- Single row subquery :
  - 只回傳單一row
- Multi row subquery :
  - 回傳多個row
  - IN : A是否存在於subquery回傳的資料集合中
  - ALL, ANY通常搭配>,<,NOT這類運算子使用:
    - ALL : A的條件是否滿足所有回傳的查詢結果
    - ANY : A的條件是否滿足任意回傳的查詢結果
- EXIST運算子 : 檢驗subquery回傳的資料是否為非空，回傳TRUE/FALSE
  - 使用WHERE Attr IN (Set of values)也可以達到一樣的效果
Constraint as assertion
- 格式:
  - CREATE ASSERTION 名稱
  - CHECK 條件
  - 條件要一直為真(若用NOT EXIST則該條件query回傳值必須保持NULL)
SQL Trigger
- 格式:
  - CREATE TRIGGER 名稱
  - Event (當這些事件進行時)
  - Condition
  - Action (當條件滿足時觸發動作)
View in SQL
- View是一種虛擬表格
- 格式:
  - CREATE VIEW 名稱 AS
  - SQL Querys…
- 永遠是最新資料(以建立當下的實體表格為主)

DB Normalization

參考資料PPT

若一個attribute要是candidate key，他必須FD所有其他attribute

Transaction Processing

Transaction :
- 由多個針對資料庫存取的動作組成，為資料庫處理的最小邏輯單元，不可分割
Concurrency control :
- ACID :
  - Atomicity : 基元性，不可被分割，是recovery的責任
  - Consistency : 具有一致性
  - Isolation : 孤立性，每次交易不可被其他交易插隊，直到被commit
  - Durability : 永久性，commit後，該操作對資料庫永遠有效
- Schedule :
  - Serial : 不會有任何交錯的交易
  - Serializable : 與相同n筆交易的Serial schedule等價，可以有並行交易
    - Result equivalent : 排程完成後產生相同的資料庫狀態
    - Conflict equivalent : 交易發生衝突的位置相同
- 三個問題
  - Lost update problem :
    - 不同交易針對相同資料進行操作，造成內容值不正確
  - Temporary update problem :
    - 讀取的資料先前被未commit的交易更改，若先前的交易要rollback，會造成錯誤資料讀取
  - Incorrect summary problem
    - 進行aggregate計算時，資料遭其他交易更改，導致結果不正確
- Lock的種類 :
  - Dead Lock
  - Live Lock
    - 資源在某段時間內被較高優先權的交易占用，導致其他交易無法進行
  - Starvation
    - 若Live Lock的資源占用永遠持續，則特定交易會發生永遠無法執行的狀況
- 處理Lock的手段 :
  - Locking
    - Binary Locking
      - 若資料鎖定，則其他人無法存取
    - Two Phase Locking
      - 當資料要被取用時，將資料鎖住，取用後再解鎖
      - 2PL的所有鎖定動作必須發生在所有解鎖之前
        
        分為兩階段，第一階段只允許鎖定，第二階段只允許解鎖
  - Timestamp
    - 每筆交易進入後，給予timestamp來判斷優先順序
  - Multi-version Concurrency Control
    - 多重版本並行控制，以時間為基礎，每次資料被修改時，保存舊的版本，需要較多的記憶體
  - Optimisitic Concurrency
    - 假設所有交易皆可順利的完成，所以不需要在交易期間做任何檢查，如果交易違反可序列化則選擇影響較小的，結束後確保可序列化才存到資料庫中
- Transaction流程:
  - Rollback : 表示交易失敗，回復整個transaction的操作
Recovery :
- Write Ahead Logging :
  - 資料寫入disk之前會先寫入log
- Steal / No Steal
  - Steal : cache可以在commit之前刷新
  - No Steal : cache不能在commit之前刷新
- Deffered update : 到commit才會被確認寫入
  - No Undo/Redo
  - 交易開始時將交易放入undo
  - commit 將交易從undo移到redo
  - checkpoint 清空 redo
  - 若發生crash，重新執行redo中的transaction
  - 缺點: Redo十分耗時
- Inmediate update : 操作一發生就會被寫入
  - Undo/Redo or Undo/No Redo
    - 區別: commit時寫入系統日誌 / 資料庫
  - 流程與Deffered相同
  - crash發生時要對undo中的動作進行回推

Database File Indexing

Single Level
- Dense V.S Sparse
  - Dense : 一筆索引對應一筆資料
  - Sparse : 一筆索引對應多筆資料
- Primary Index
  - 以primary key排序作為索引
  - Sparse
  - index數 = Data block數
- Clustering Index
  - 以Non key排序作為index
  - Sparse
- Secondary Index
  - 以Secondary key(Dense)或non key(Sparse)作為index

資料庫系統導論2023: 重點筆記

名詞解釋

基本概念

Data Model種類

ER Model

EER Model

Relational Data Model

Relational Algebra

Relational Calculus

ER to Relational schema mapping

期末考範圍

SQL

DB Normalization

Transaction Processing

Database File Indexing

Read more

【論文】Dense Passage Retrieval for Open-Domain Question Answering 重點翻譯

【Note】Deep Learning Neural Network : 卷積神經網路 (CNN)

Leetcode 解題速記 30. Substring with Concatenation of All Words

Leetcode 解題速記 108. Convert Sorted Array to Binary Search Tree