# Database Management System

[TOC]

<hr>

Further reading:
- [**Introduction to Database Systems**](https://medium.com/twelvefish/%E4%B8%80-%E8%B3%87%E6%96%99%E5%BA%AB%E7%B3%BB%E7%B5%B1%E6%A6%82%E8%AB%96-3a32898b8851)

## 1. Introduction to Database Management Systems (DBMS)

<img src="https://ithelp.ithome.com.tw/upload/images/20220924/20152201xBaizlIvUB.png" alt="DBMS relationship diagram">

A database management system (DBMS) is **a software system responsible for defining, creating, operating, managing, and maintaining databases**. Its purpose is to keep data secure and reliable and to make database applications simpler and more convenient to build. A DBMS works by **translating user operations on data into operations on the system's stored files**, which effectively implements the mapping among the three levels of the database.

Data independence in a DBMS is the ability to modify the schema without having to rewrite programs and applications. Because data is separated from programs, changes made to the data do not affect program execution or applications.

A DBMS serves as **an interface between an end user and a database**, letting users **insert (INSERT), modify (UPDATE), delete (DELETE), and query (SELECT)** data in the database.

Data in a DBMS is usually organized by following a schema design technique called **"normalization"**, which splits a large table into smaller tables when any of its attributes contain redundant values. A DBMS offers many benefits over traditional file systems, including flexibility and a more sophisticated backup system.

From largest to smallest, the data hierarchy is: database (made up of multiple tables), table (made up of multiple records), record (made up of multiple fields), field (made up of multiple bytes), byte (made up of 8 bits), and bit.

<img src='https://ithelp.ithome.com.tw/upload/images/20220924/20152201Jq3x5DL5c9.png' alt='Example of the database hierarchy'>

### A. DBMS vs File System

- #### File System
  File-based systems were an early attempt to computerize manual record keeping. Also called the traditional approach, it was decentralized: each department stored and controlled its own data with the help of a data processing specialist, whose main role was to create the necessary computer file structures.

  <img src='https://static.javatpoint.com/dbms/images/dbms-vs-files-system.png'>

- #### DBMS
  In the database approach, data is a well-organized collection of items related in a meaningful way, which can be accessed by different users but is stored only once in the system. The operations a DBMS performs include insertion, deletion, selection, and sorting.
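  <img src='https://static.javatpoint.com/dbms/images/dbms-vs-files-system2.png'>

### B. Database Architecture

A database can have a single-tier or multi-tier architecture: 1-tier, 2-tier, or 3-tier. In practice, the 2-tier and 3-tier architectures are the ones mainly used.

- **1-tier Architecture**
  - The database is directly available to the user.
  - Any change is made directly on the database itself.
  - Typically used for developing local applications, which can communicate directly with the database for fast response.
- **2-tier Architecture**
  - Same as the basic client-server architecture: the application on the client side communicates directly with the database on the server side.
    > The interaction uses APIs such as ODBC and JDBC.
  - The user interface and application programs run on the client side.
  - The server side provides query processing and transaction management.
  - To communicate with the DBMS, the client-side application establishes a connection with the server side.

  <img src='https://static.javatpoint.com/dbms/images/dbms-2-tier-architecture.png'>
- **3-tier Architecture**
  - Adds another layer between the client and the server; the client cannot communicate directly with the database server.
  - The application on the client interacts with an application server, which in turn communicates with the database system.
  - The end user is unaware of any database beyond the application server, and the database is unaware of any user beyond the application.
  - Used for large web applications.

  <img src='https://static.javatpoint.com/dbms/images/dbms-3-tier-architecture.png'>

### C. DBMS Language

- **Data Definition Language (DDL)**
  Defines database objects such as databases, tables, views, indexes, stored procedures, triggers, and functions. It is used to create, alter, and drop tables, schemas, domains, indexes, and views.
  > Commands: CREATE, ALTER (change), DROP (delete), TRUNCATE (remove all rows)
- **Data Manipulation Language (DML)**
  Works with the data stored in tables.
  > Commands: INSERT (add), UPDATE (modify), DELETE (remove)
- **Data Control Language (DCL)**
  Controls access privileges on tables and views, providing database security.
  > Commands: GRANT (give privileges), REVOKE (withdraw privileges), COMMIT (complete the work), ROLLBACK (on failure, restore the changed data to its state before the transaction)

  > COMMIT and ROLLBACK can also be classified as Transaction Control Language (TCL).
- **Data Query Language (DQL)**
  Queries the data in tables; these statements do not modify the data itself.
  > Command: SELECT

### Related Terms

- GISs (Geographic information systems)
  A geographic information system (GIS) is a computer system for capturing, storing, checking, and displaying data related to positions on the Earth's surface.
- **OLAP (Online analytical processing)**
  **Online analytical processing** (**OLAP**) is an approach for quickly answering multidimensional analytical queries.

  **OLAP vs OLTP**: OLAP systems are optimized mainly for reads, whereas OLTP systems must handle all kinds of operations (read, insert, update, and delete).
- Data warehouses
  A data warehouse is an enterprise system used for the analysis and reporting of structured and semi-structured data from multiple sources.
  > Such as point-of-sale transactions, marketing automation, and customer relationship management
- **DBA (Database Administrator)**
  Database administrators (DBAs) usually fall into three types:
  1. Production DBA: responsible for database installation, operation, maintenance, troubleshooting, performance tuning, and software license planning, as well as designing, testing, and executing high availability (HA) and disaster recovery (DR).
  2. Development DBA: responsible for designing database components and their relationships, writing the code for SQL queries and Stored Procedures, and helping test the connection and functionality between applications and the database.
  3. DevOps DBA: responsible for deploying database changes, tuning SQL queries, and monitoring performance.

<hr>

## 2. Types of Databases

<img src='https://static.javatpoint.com/dbms/images/types-of-databases.png'>

### A. Centralized Database

A centralized database stores data in one central database system. Users can easily access the stored data from different locations through multiple applications, which include an authentication process so that users access the data securely. An example is a central library that holds a central database of every library in a college.

#### Advantages of a Centralized Database

- It reduces the risk of data management: manipulating the data does not affect the core data.
- Data consistency is maintained because the data is managed in a central repository.
- It provides better data quality and lets the organization establish data standards.
- It costs less because fewer vendors are needed to handle the data sets.

#### Disadvantages of a Centralized Database

- A centralized database is large, which increases the response time for fetching data.
- Updating such an extensive database system is not easy.
- If the server fails, the entire data set may be lost, which can be a huge loss.

### B. Distributed Database

Unlike a centralized database system, a distributed system spreads the data across different database systems of an organization. These database systems are connected through communication links, which let end users access the data easily. Distributed databases are further divided into:

- **Homogeneous DDB**: database systems that run on the same operating system, use the same application processes, and have the same hardware devices.
- **Heterogeneous DDB**: database systems that run on different operating systems, use different application processes, and have different hardware devices.

<img src='https://static.javatpoint.com/dbms/images/types-of-databases2.png'>

#### Advantages of a Distributed Database

- Modular development is possible: the system can be expanded by adding new computers and connecting them to the distributed system.
- The failure of one server does not affect the entire data set.

### C. Relational Database

A relational database is based on the relational data model. It stores data in rows (tuples) and columns (attributes), which together form a table (relation). A relational database uses SQL to store, manipulate, and maintain data. Each table carries a key that makes each piece of data unique from the others. Examples of relational databases include MySQL, Microsoft SQL Server, and Oracle.

#### Properties of a Relational Database

The relational model has four commonly known properties, called the ACID properties:

**A (Atomicity)**: A data operation either completes successfully or fails as a whole. It follows the "all or nothing" strategy. For example, a transaction is either committed or aborted.

**C (Consistency)**: Any operation on the data must leave it in a correct state, preserving its validity before and after the operation. For example, the account balance before and after a transaction should be correct, i.e. it should remain conserved.

**I (Isolation)**: Concurrent users can access data from the database at the same time, so their transactions must remain isolated from each other. For example, when multiple transactions occur at the same time, the effects of one transaction should not be visible to the other transactions in the database.

**D (Durability)**: Once an operation has completed and the data has been committed, the changes remain permanent.

### D. NoSQL Database

NoSQL ("Not only SQL") is a type of database used to store a wide range of data sets. It is not a relational database, because it stores data not only in tabular form but **in several different ways**. It emerged as the demand for building modern applications grew, and it offers a wide variety of database technologies in response to those demands. NoSQL databases can be further divided into the following four types:

<img src='https://static.javatpoint.com/dbms/images/types-of-databases3.png'>

1. **Key-value storage**: The simplest type of database storage. It stores every item as a key (or attribute name) together with its value.
2. **Document-Oriented Database**: A type of database that stores data as JSON-like documents. It helps developers store data using the same document model format they use in application code.
3. **Graph Database**: Stores large amounts of data in a graph-like structure. Social networking sites are the most common users of graph databases.
4. **Wide-column stores**: Similar to the data representation in relational databases, except that data is stored together in large columns instead of in rows.

#### Advantages of a NoSQL Database

- Good productivity in application development, because data does not need to be stored in a structured format.
- A better option for managing and handling large data sets.
- High scalability.
- Users can quickly access data in the database through key-value lookups.

### E. Cloud Database

A cloud database stores its data in a virtual environment and runs on a cloud computing platform. It gives users various cloud computing services (SaaS, PaaS, IaaS, etc.) for accessing the database. Examples: Amazon Web Services (AWS), Microsoft Azure, Kamatera, phoenixNAP, ScienceSoft, Google Cloud SQL.

### F. Hierarchical Database

A hierarchical database stores data as nodes in parent-child relationships. In the figure below, the data is organized in a tree-like structure:

<img src='https://static.javatpoint.com/dbms/images/types-of-databases4.png'>

<hr>

## 3. Relational Databases (RDBMS) vs Non-relational Databases (NoSQL)

Reference: [RDBMS vs. NoSQL | Relational vs. Non-relational Databases](https://medium.com/@eric248655665/rdbms-vs-nosql-關聯式資料庫-vs-非關聯式資料庫-1423c9fbb91a)

### A. Relational Database Management System (RDBMS)

A relational database is made up of multiple tables, and the tables can be related to one another to express the relationships between them.
> Examples: MySQL, PostgreSQL

Characteristics of a relational database:

- It is made up of tables, where a row represents one record and a column represents a data field.
- **The schema must be defined in advance**, and **only data in that same format can be inserted or modified**. Changing the schema later is cumbersome, because the existing data must be migrated to match.
- JOIN can be used to connect multiple tables for more complex queries. (JOIN is a SQL construct.)
- It has the **ACID properties** ([reference article on ACID](http://pse.is/58wmg7))
  > ACID refers to the four properties a database management system (DBMS) must have so that transactions are correct and reliable while data is being written or updated: atomicity (indivisibility), consistency, isolation (independence), and durability.
- It uses SQL (Structured Query Language) to manage and query data.

<img src='https://miro.medium.com/v2/resize:fit:828/format:webp/1*SmnbbxQaAkvysdfLgRY4Nw.png'/>

### B. Non-relational Database Management System (Non-SQL, Not only SQL, NoSQL)

Unlike a relational database, a non-relational database **does not require a predefined schema and has no relations between tables**.
> Examples: MongoDB, Redis

Characteristics of a non-relational database (described here in terms of a document store such as MongoDB):

- The database is **made up of collections**.
- **Each record in a collection is a document**, and documents do not need to share the same format.
- It is designed around the CAP theorem ([reference article on CAP](https://medium.com/後端新手村/cap定理101-3fdd10e0b9a))
  > The CAP theorem concerns data partitioning caused by network problems in a distributed architecture. Its three properties are Consistency, Availability, and Partition tolerance. According to the theorem, only two of the three can be satisfied at once, so a trade-off is required.
- It is commonly used in distributed cloud systems.

<img src='https://miro.medium.com/v2/resize:fit:828/format:webp/1*zDFS4FVd1hBgIvmFm8FC7g.png'/>

<hr>

## 4. Data Structure - Abstraction

Abstraction means that, when solving a problem, we bring in the things relevant to it and, when describing them, focus only on the parts that matter to the problem and ignore the other details, so as not to make the problem harder or distract the problem solver.

### A. Abstraction in Software Engineering

Software engineering places particular emphasis on modularity to control complexity during software development. Modules usually mean **methods** and **classes**; when describing these modules, only their specification is given, not their implementation details.

- **Functional Abstraction**
  Describes only what functionality a method provides (what the function does), not how the method achieves it (how the function does it).
- **Data Abstraction**
  Describes how data can be operated on, not how those operations and the data storage are implemented. In practice this is usually achieved by producing **Abstract Data Types (ADT)** during the system analysis phase.
- **Abstract Data Types (ADT)**
  A collection of data together with a set of operations that can be performed on that data.
  > **ADT** is a collection of data and a set of operations on that data.

<img src='https://miro.medium.com/v2/resize:fit:828/format:webp/1*kkJPYC2VQgafv5j04Vm38A.png'/>

#### :question: What is the difference between a Data Structure and an Abstract Data Type (ADT)?

> A data structure describes how a collection of data is stored in a programming language, so a data structure is **part of the implementation details** of an ADT.

<hr>

## 5. DBMS-Three Level Architecture

### A. Introduction to Schema

**A schema describes how the database behaves for each input of data and how the output from the database should behave.** It is essentially the blueprint of a database: it contains the details of each individual table, such as the table's name, the types of data it holds, its columns and keys, and the properties of each column.

### B. Schema: Advantages vs. Disadvantages

**Advantages** of Schema:

- Data can be managed independently of the physical storage.
- Faster migration to new graphical environments.
- Different sets of developers can work on different levels.
- It is more secure, because the customer has no direct access to the business logic of the database.
- No data loss in the event of a failure, as the data remains accessible through the other levels.

**Disadvantages** of Schema:

- A complete DB schema is a complex structure that is difficult for everyone to understand.
- Physical separation of the levels may affect database performance.

SQL (relational) databases use a relational schema, which **describes how individual pieces of data relate to each other**. A non-relational database schema, by contrast, **works like a file structure in which data items are not related to each other**. As a result, duplicates can exist; in addition, the main ID is generated by the database itself and the data is described in JSON format.

### C. Introduction to the Entity Relationship Diagram and the ER Model

The Entity-Relationship Model is a diagramming model used in **database design**. It gives a high-level overview of a data domain, specifically **through "entities" and the relationships between them**.

<img src="https://miro.medium.com/v2/resize:fit:1100/format:webp/1*F1DWB4z2gnUlLKLOj1DSYA.png">

- **Entity Type**: A description or classification of a particular group of members in a set of data. Think of a **class** in OOP.
- **Entity**: A reference to the member itself. Think of an **object, or instance of a class**, in OOP.
- **Attribute**: A property (that we are interested in keeping track of) of an Entity Type. Think of a **member variable in a class** in OOP.
- **Relationship**: A description of how two entities interact with each other in the database. Many of these relationships have no clear OOP analog, but essentially think of **how different objects interact with one another**.

Details: [Section 6, A. ER Model Concept](https://hackmd.io/V14gGMK0R_GTQbjNFro6Ng?both#A-ER-Model-Concept)

### D. Introduction to the Three-schema Architecture

Why was the three-level architecture created? It was created to **keep the physical database separate from the user applications**.

- **Internal level**
  The internal schema describes the complete details of how the database is physically stored and of its access paths. A physical data model is built for the chosen database management system and shows how the data is implemented and stored in the database. For a relational database, the physical data model consists of creating tables and relationships and defining indexes.
- **Conceptual level**
  The conceptual schema, also called the logical level, focuses on describing the structure of the whole database and is maintained by the DBA (database administrator). A representational data model is usually used to describe it. The most common choice is the Entity-Relationship model (ER Model), expressed graphically as an Entity-Relationship diagram (ER Diagram), which focuses on the details of entities, data types, relationships, user operations, and constraints.
- **External level**
  The external schema, also called the user view, presents the part of the data that each kind of user needs to see and hides the rest of the database.

<img src='https://miro.medium.com/v2/resize:fit:1400/format:webp/1*Mld5pLhDxJcI_WYx1jab9w.png'>

### E. Data Independence

- Data independence can be explained using the **three-schema architecture**.
- Data independence refers to the ability to **modify the schema at one level** of the database system without altering the schema at the next higher level.

There are two types of data independence:

- #### Logical Data Independence
  - Logical data independence is the ability to change the conceptual schema without having to change the external schema.
  - Logical data independence is used to **separate the external level from the conceptual view**.
  - If we make any changes in the conceptual view of the data, the user view of the data is not affected.
  - **Logical data independence occurs at the user interface level**.
- #### Physical Data Independence
  - Physical data independence is the capacity to change the internal schema without having to change the conceptual schema.
  - If we make any changes in the storage size of the database system server, the conceptual structure of the database is not affected.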
  - Physical data independence is used to **separate the conceptual level from the internal level**.
  - **Physical data independence occurs at the logical interface level**.

<img src='https://static.javatpoint.com/dbms/images/dbms-data-independence.png'>

<hr>

## 6. ER Modeling

### A. ER Model Concept

ER model stands for **Entity-Relationship model**. It is a high-level data model used to **define the data elements and relationships** for a specified system. It produces a conceptual design for the database and a view of the data that is simple and easy to design.

#### Essential Components of an ER Diagram:

<img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept-diagram.png'>

- **Entity**
  An entity may be any object, class, person, or place. In an ER diagram, an entity is represented as a **rectangle**.

  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept2.png'>
- **Weak Entity**
  An entity that **depends on another** entity is called a weak entity. A weak entity **doesn't contain any key attribute** of its own. It is represented by a **double rectangle**.

  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept3.png'>
- **Attribute**
  An attribute describes a **property of an entity**. An **ellipse** is used to represent an attribute.

  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept4.png'>
- **Key Attribute**
  The key attribute represents the **main characteristic** of an entity. It represents a **primary key**. The key attribute is drawn as an ellipse with the text **underlined**.

  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept5.png'>
- **Composite Attribute**
  An attribute that is **composed of many other attributes**. The composite attribute is drawn as an ellipse, and the ellipses of its component attributes are **connected to** it.
  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept6.png'>
- **Multivalued Attribute**
  An attribute that can **have more than one value**. A **double oval** is used to represent a multivalued attribute.

  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept7.png'>
- **Derived Attribute**
  An attribute that can be **derived from other attributes** is known as a derived attribute. It is represented by a **dashed ellipse**.

  <img src='https://static.javatpoint.com/dbms/images/dbms-er-model-concept8.png'>

### B. Keys

- #### Primary key
  The one key chosen from among the candidate keys, namely the key that identifies records most meaningfully.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys3.png'>
- #### Candidate key
  A key that satisfies both uniqueness and minimality.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys4.png'>
- #### Super Key
  A key that satisfies uniqueness. A super key is a superset of a candidate key.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys5.png'>
- #### Alternate Key
  The candidate keys that were not chosen as the primary key.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys7.png'>
- #### Foreign Key
  A key in one relation that is used to reference the primary key of another table.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys6.png'>
- #### Composite key (or Concatenated Key)
  Whenever a primary key consists of more than one attribute, it is known as a composite key. For example, an employee may be assigned several roles and take part in several projects, so the primary key would be the combination of Emp_Id, Emp_Role, and Proj_Id.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys8.png'>
- #### Artificial key
  In the example above, the primary key would be a long combination of values; in that case an artificial key can be used as a representative key value instead.

  <img src='https://static.javatpoint.com/dbms/images/dbms-keys9.png'>

Further reading: [Data Modeling](https://hackmd.io/@k139/r1y-9LmK4/%2Fs%2Fr1z0TUQKE?type=book)

### C. Generalization

- Generalization is like a **bottom-up** approach in which two or more **lower-level entities combine to form a higher-level entity** if they have some attributes in common.
- In generalization, a higher-level entity can also combine with lower-level entities to form an entity at a still higher level.
- In generalization, entities are combined to form a more generalized entity, i.e., subclasses are combined to make a superclass.

### D. Specialization

- Specialization is a **top-down** approach, the opposite of generalization. In specialization, **one higher-level entity is broken down into two or more lower-level entities**.
- Specialization is used to identify a subset of an entity set that shares some distinguishing characteristics.
- Normally the superclass is defined first, the subclasses and their related attributes are defined next, and the relationship sets are then added.

### E. Aggregation

In aggregation, the relationship between two entities is treated as a single entity: a relationship, together with its corresponding entities, is aggregated into a higher-level entity.

### F. Reduction of ER diagram to Table

- **Each entity type becomes a table.**
- **Each single-valued attribute becomes a column of the table.**
- **The key attribute of the entity type is represented by the primary key.**
- **A multivalued attribute is represented by a separate table.**
- **A composite attribute is represented by its components.**
- **Derived attributes are not stored in the table.**
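
The reduction rules above can be sketched with Python's built-in `sqlite3` module. This is only an illustrative sketch under stated assumptions: the `student` entity, its attributes (`student_id` as key attribute, a composite `name`, a multivalued `hobby`, a derived `age`), and the helper functions are hypothetical and do not come from the notes themselves.

```python
import sqlite3
from datetime import date

# Hypothetical STUDENT entity type, used only for illustration:
#   key attribute         -> student_id (becomes the primary key)
#   composite attribute   -> name, stored as its components first_name/last_name
#   multivalued attribute -> hobby, moved to its own table
#   derived attribute     -> age, not stored; computed from date_of_birth
SCHEMA = """
CREATE TABLE student (
    student_id    INTEGER PRIMARY KEY,
    first_name    TEXT NOT NULL,
    last_name     TEXT NOT NULL,
    date_of_birth TEXT NOT NULL
);
CREATE TABLE student_hobby (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    hobby      TEXT NOT NULL,
    PRIMARY KEY (student_id, hobby)  -- composite key: one row per (student, hobby)
);
"""


def build_demo_db():
    """Create an in-memory database and load one sample student."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the foreign key
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO student VALUES (1, 'Ada', 'Lin', '2000-05-17')")
    conn.executemany(
        "INSERT INTO student_hobby VALUES (?, ?)",
        [(1, "chess"), (1, "hiking")],
    )
    conn.commit()
    return conn


def hobbies_of(conn, student_id):
    """Read the multivalued attribute back from its separate table."""
    rows = conn.execute(
        "SELECT hobby FROM student_hobby WHERE student_id = ? ORDER BY hobby",
        (student_id,),
    )
    return [row[0] for row in rows]


def age_on(conn, student_id, today):
    """Compute the derived attribute on demand instead of storing it."""
    (dob,) = conn.execute(
        "SELECT date_of_birth FROM student WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    born = date.fromisoformat(dob)
    # Subtract one year if this year's birthday has not happened yet.
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))
```

For example, `hobbies_of(build_demo_db(), 1)` reads the multivalued attribute from `student_hobby`, while `age_on(...)` derives the age from the stored date of birth. Keeping `age` out of the table avoids storing a value that would go stale over time.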