Database Deepdive

# Database Deepdive ## 1. Database Connection Pool - Why and How [圖片引用於此](http://peggg327.blogspot.com/2014/11/connection-pool.html) ![普通連線](https://i.imgur.com/pC59Z4d.png) ![連接池](https://i.imgur.com/qz4WUhA.png) ### 什麼是 connection pool? > a database connection cache implementation which can be configured to suit specific requirements > 介於 db & client 中間的連線快取機制 ### 為什麼要用？因為建立資料庫連線是一件很貴又很花時間的事。 > By just simply implementing a database connection container, which allows us to reuse a number of existing connections, we can effectively save the cost of performing a huge number of expensive database trips, hence boosting the overall performance of our database-driven applications. > Persistent connection vs connection pooling Object pool pattern > 物件池（object pool pattern）是一種設計模式。一個物件池包含一組已經初始化過且可以使用的物件，而可以在有需求時創建和銷毀物件。池的用戶可以從池子中取得對象，對其進行操作處理，並在不需要時歸還給池子而非直接銷毀它。這是一種特殊的工廠物件。 > > 若初始化、實例化的代價高，且有需求需要經常實例化，但每次實例化的數量較少的情況下，使用對象池可以獲得顯著的效能提升。從池子中取得對象的時間是可預測的，但新建一個實例所需的時間是不確定。 ### 怎麼用？直接看 Google 怎麼建 MySQL connection pool https://cloud.google.com/sql/docs/mysql/manage-connections 設定： * Create pool * Open / close connections * Connection count * ~~Exponential backoff~~ (The mysql module automatically uses exponential delays between failed connection attempts.) * Connection timeout * Connection timeout * ~~Connection duration~~ (The 'mysql' Node.js library currently doesn't offer any functionality to control the duration of a connection.) ### CreatePool觀念： [簡明教程：Pool 连接池](https://www.jmjc.tech/less/113) 用 createConnection 创建 Mysql 连接，每执行一次 connection.query 都是一个全新的连接，会造成一个资源的极大浪费，降低性能。连接池是另外的一种执行方法，它一次性的创建了多个连接，然后根据客户端的查询，自动的分发、复用、管理这些连接。 ``` const mysql = require('mysql') // 链接池：创建多个链接、复用与分发链接 const pool = mysql.createPool({ host: '127.0.0.1', port: '3306', user: 'root', password: '' }) // 封装 query = function(sql,callback){ pool.getConnection(function(err,connection){ connection.query(sql,function(err,results){ callback(err, results) // 结果回调 connection.release() // 释放连接资源 | 跟 connection.destroy() 不同，它是销毁 }) } }) } // 随机分配一个连接 pool.query(sql, function (err, result) { // ... } ``` connectionLimit: 是針對client端的連線限制數量。创建连接池的方法是createPool(option)，option里多了一个参数connectionLimit指的是一次性在连接池里创建多少个连接对象，默认10个。如果你想共享一个连接对象，可以使用pool.getConnection。根據[github上 mysql模組](https://github.com/mysqljs/mysql#pool-options)的解釋： > This is a shortcut for the pool.getConnection() -> connection.query() -> connection.release() code flow. > Using pool.getConnection() is useful to share connection state for subsequent queries. This is because two calls to pool.query() may use two different connections and run in parallel. ### 何時用pool.query及何時用pool.getConnection [參考tread](https://github.com/mysqljs/mysql/issues/1668) >`getConnection` if you need transactions or some state between queries. Reference: [MySQL Doc - Connection Pooling with Connector/J](https://dev.mysql.com/doc/connector-j/8.0/en/connector-j-usagenotes-j2ee-concepts-connection-pooling.html) [Wikipedia - Object pool pattern](https://en.wikipedia.org/wiki/Object_pool_pattern) [MySQL - Persistent connection vs connection pooling ](https://stackoverflow.com/questions/9736188/mysql-persistent-connection-vs-connection-pooling) [如何理解node链接mysql时使用连接池的connectionLimit（链接限制数）](https://cnodejs.org/topic/5787d38e3b501f7054983003) [Github JDBC 入門](https://github.com/JustinSDK/JavaSE6Tutorial/blob/master/docs/CH20.md) [Node连接mysql数据库方法](https://segmentfault.com/a/1190000007981124) [簡明教程：Pool 连接池](https://www.jmjc.tech/less/113) [連接池介紹](http://peggg327.blogspot.com/2014/11/connection-pool.html) --- ## 2. Database Normalization - Optimization Guidance > 是指將原先關聯表格的所有資訊，在「分解」之後，仍能由數個新關聯表格中經過「合併」得到相同的資訊。即所謂的「無損失分解」 (Lossless decomposition) 的觀念。 ### 為何要正規化？重複的資料佔資料庫儲存空間要去除的問題： * 資料冗餘 data redundancy * 不一致的相依性 inconsistent dependency 沒有正規化易發生的異常們： * 新增異常 (Insert Anomaly) * 修改異常 (Update Anomaly) * 刪除異常 (Delete Anomaly) ### 正規化方法每一個階段都是以欄位的「相依性」, 做為分割資料表的依據之一。 NF = Normal Form ![](https://i.imgur.com/skSpAzC.png) BCNF被視為大部分應用程式所需的最高階正規形式。可以看這兩個表哪個比較有感（講的是一樣的東西）： ![](https://i.imgur.com/YHoseGF.png) ![](https://i.imgur.com/mFv6EV1.png) ### 正規化型式們 1. First Normal Form: 是指在資料表中的所有記錄之屬性內含值都是基元值 (Atomic Value)，亦即無重覆項目群。沒有做1NF的話就必須要預留很大的空間給這某些欄位, 造成儲存空間的浪費。 - How? 將重複的資料項分別儲存到不同的記錄中, 並加上適當的主鍵。 - Before: sizes 欄位中有超過一個值 ![](https://i.imgur.com/EDTKxbd.png) - After: sizes 拆開放 ![](https://i.imgur.com/TYpFrvu.png) 3. Second Normal Form: 每一非鍵屬性必須「完全相依」於主鍵；即不可「部分功能相依」(partial dependency)於主鍵。 - How? 如果有哪些資料只和一個鍵的一部份有關的話，就得把它們獨立出來變成另一個資料表。如果一個資料表的鍵只有單個欄位的話，它就一定符合第二正規化。 - stylish tables 都只有一個 PK，符合 2NF 5. Third Normal Form: 各欄位與「主鍵」之間沒有「遞移相依」(transitive dependency)的關係。 - How? ：將「遞移相依」的欄位「分割」出去，再另外組成「新的資料表」，並且利用外鍵(F.K.) 來連接二個資料表。 - Before: variant 表中 color_name 僅相依於 color_code, color_code 相依於 variantid (PK) ![](https://i.imgur.com/HjMwCqy.png) - After: 新增 color_object 表切出顏色對照，並將 variant 表中添加 FK color_id 做為對照 ![](https://i.imgur.com/rEstzIG.png) ![](https://i.imgur.com/cbMm0wT.png) product 表把 sizes 交給 variant 表負責，精簡很多 ![](https://i.imgur.com/tY3oFZN.png) 7. BCNF: 是由Boyce和Codd於1974年所提出來的3NF的改良式。如果資料表的「主鍵」是由「多個欄位」組成的, 則必須再執行BoyceCodd 正規化。去除功能相依 (functional dependency) - stylish tables 都只有一個 PK，符合 BCNF * stylish 最後優化：拆出重複的 product_info - Before: ![](https://i.imgur.com/tY3oFZN.png) - After: ![](https://i.imgur.com/a3ZgHfa.png) ![](https://i.imgur.com/eZf4Lox.png) * EER diagram (product related) ![](https://i.imgur.com/OQSvn2s.png) #### 過早優化是萬惡的根源 qps 可能降低資料維護不易 ### Reference [Normalization of Database (有印度腔影片可看)](https://www.studytonight.com/dbms/database-normalization.php) [維基：數據庫正規化](https://zh.wikipedia.org/wiki/%E6%95%B0%E6%8D%AE%E5%BA%93%E8%A7%84%E8%8C%83%E5%8C%96) [資料庫正規劃：水果訂單範例](https://hackmd.io/@TSMI_E7ORNeP8YBbWm-lFA/rykcj8kmM?type=view) [Cust.edu slides：學生課程老師範例](http://cc.cust.edu.tw/~ccchen/doc/db_04.pdf) ### Appendix  #### 2NF eg. - Before: 訂單編號＆商品名稱為主鍵 ![](https://i.imgur.com/cDCfkrV.png) - After: ![](https://i.imgur.com/AzM1aUh.png) #### Fourth Normal Form: 去除多值相依 3NF 之後接下來有點玄。。。 Eg. 飯店提供的披薩種類與配貨地區無關，也即飯店為所有的供貨地區提供它能製作的所有披薩類型。那麼，這個表不滿足第四規格化。因為這個表在{Restaurant}屬性（它不是超鍵）上提供了兩個非平凡的多值依賴: {Restaurant} -> {Pizza Variety} {Restaurant} -> {Delivery Area} - Before: ![](https://i.imgur.com/VmqnJMx.png) - After: ![](https://i.imgur.com/tZ1G2XG.png) #### Fifth Normal Form: 去除連接依賴 (Join Dependency) - How: ![](https://i.imgur.com/Ypcmy2h.png) - Before: ![](https://i.imgur.com/xvHRCyZ.png) - After: ![](https://i.imgur.com/vowkjeq.png) - Reference: [第五正規劃](https://www.youtube.com/watch?v=mbj3HSK28Kk)