Nathan.Lu
    • Create new note
    • Create a note from template
      • Sharing URL Link copied
      • /edit
      • View mode
        • Edit mode
        • View mode
        • Book mode
        • Slide mode
        Edit mode View mode Book mode Slide mode
      • Customize slides
      • Note Permission
      • Read
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Write
        • Only me
        • Signed-in users
        • Everyone
        Only me Signed-in users Everyone
      • Engagement control Commenting, Suggest edit, Emoji Reply
    • Invite by email
      Invitee

      This note has no invitees

    • Publish Note

      Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

      Your note will be visible on your profile and discoverable by anyone.
      Your note is now live.
      This note is visible on your profile and discoverable online.
      Everyone on the web can find and read all notes of this public team.
      See published notes
      Unpublish note
      Please check the box to agree to the Community Guidelines.
      View profile
    • Commenting
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
      • Everyone
    • Suggest edit
      Permission
      Disabled Forbidden Owners Signed-in users Everyone
    • Enable
    • Permission
      • Forbidden
      • Owners
      • Signed-in users
    • Emoji Reply
    • Enable
    • Versions and GitHub Sync
    • Note settings
    • Note Insights
    • Engagement control
    • Transfer ownership
    • Delete this note
    • Save as template
    • Insert from template
    • Import from
      • Dropbox
      • Google Drive
      • Gist
      • Clipboard
    • Export to
      • Dropbox
      • Google Drive
      • Gist
    • Download
      • Markdown
      • HTML
      • Raw HTML
Menu Note settings Versions and GitHub Sync Note Insights Sharing URL Create Help
Create Create new note Create a note from template
Menu
Options
Engagement control Transfer ownership Delete this note
Import from
Dropbox Google Drive Gist Clipboard
Export to
Dropbox Google Drive Gist
Download
Markdown HTML Raw HTML
Back
Sharing URL Link copied
/edit
View mode
  • Edit mode
  • View mode
  • Book mode
  • Slide mode
Edit mode View mode Book mode Slide mode
Customize slides
Note Permission
Read
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Write
Only me
  • Only me
  • Signed-in users
  • Everyone
Only me Signed-in users Everyone
Engagement control Commenting, Suggest edit, Emoji Reply
  • Invite by email
    Invitee

    This note has no invitees

  • Publish Note

    Share your work with the world Congratulations! 🎉 Your note is out in the world Publish Note

    Your note will be visible on your profile and discoverable by anyone.
    Your note is now live.
    This note is visible on your profile and discoverable online.
    Everyone on the web can find and read all notes of this public team.
    See published notes
    Unpublish note
    Please check the box to agree to the Community Guidelines.
    View profile
    Engagement control
    Commenting
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    • Everyone
    Suggest edit
    Permission
    Disabled Forbidden Owners Signed-in users Everyone
    Enable
    Permission
    • Forbidden
    • Owners
    • Signed-in users
    Emoji Reply
    Enable
    Import from Dropbox Google Drive Gist Clipboard
       owned this note    owned this note      
    Published Linked with GitHub
    Subscribed
    • Any changes
      Be notified of any changes
    • Mention me
      Be notified of mention me
    • Unsubscribe
    Subscribe
    --- title: DB淺談-存儲 tags: DB description: DB淺談-存儲 --- ![https://books.google.com.tw/books?id=-F2vDwAAQBAJ&dq=Database+Internals:+A+Deep+Dive+into+HOw+Distributed+Data+System+Work&hl=zh-TW](https://i.imgur.com/vHPy3u8.png) <!-- more --> 資料庫系統主要把數據存儲在記憶體與硬碟上. 主要有2種存儲地方: Memory與硬碟上. Memory存儲的稱為Main Memory DBMS(內存資料庫), 透過log來恢復數據 硬碟存儲資料庫就直接使用HD做存儲. 任何RDBMS在事務完成持久化到硬碟前, 都會將修改給先寫到一個有順序的Log文件上, 能把還未完成的事務操作給重放. ### MySQL為例 Redo Log [MySQL · InnoDB · Redo log](http://mysql.taobao.org/monthly/2019/03/03/) [引出-redo-log-的作用](https://www.cnblogs.com/ZhuChangwu/p/14096575.html#%E4%B8%80%E5%BC%95%E5%87%BA-redo-log-%E7%9A%84%E4%BD%9C%E7%94%A8) ### SQL Servere為例 有ldf檔案做類似redo log的紀錄, 然後mdf才是數據保存的真正檔案; 因此備份這兩個檔案就可以達到備份的效果. ----- 資料庫基本都需要Persistance(持久化), 將記憶體中的狀態寫入到硬碟上, 就需要存儲引擎. 存儲引擎基本就是一個數據文件, 寫就寫到該文件上, 讀取就從這文件上去找. 直接訪問Memory的速度直到現在還是比硬碟快上幾個量級[(註)](https://colin-scott.github.io/personal_website/research/interactive_latency.html?fbclid=IwAR0zZslYKSKnXe3iLcRJ2gvUkuUsbZvvWnvoAjJDv-051a-G2fzQC47WcaU). ![https://zhuanlan.zhihu.com/p/444331688](https://i.imgur.com/MUpuW2L.png) 來看看[硬碟](https://zh.wikipedia.org/wiki/%E7%B0%87)與SSD ![](https://i.imgur.com/Kw2h5rh.png)(https://zh.wikipedia.org/wiki/%E7%B0%87) 重點看Sector(扇區), 它是物理讀寫的基本單元, 通常是512Bytes; 所以讀取數據不可能比Sector還小. 多個相鄰連續的Sector連在一起稱為IO **Block(磁區組)**; 所以通常是512、1024、2048、4096bytes這樣的大小, 就是sector的整數倍. SSD也是多個Block組成一個Page, 讀寫最小單位常見的是4k 常講的4k對齊, 指的就是block的大小, 表示OS讀取HD時一次讀取的數據大小; 如果一次讀取4K, 但block大小才2K, 則等於要定址兩次做讀取; 若剛好4k則只需要定址一次. IO操作就是讀取IO **Blocks**緩存到memory中作為buffer提供操作. Virtual Memory存儲許多Page的數據, 各Page又透過map到memory/block上, block又對到多個sector. ![](https://i.imgur.com/fAVQVFU.png) [On Disk IO](https://dengziming.github.io/post/%E6%95%B0%E6%8D%AE%E5%BA%93/%E7%A3%81%E7%9B%98io/#sector-block-page) https://www.jianshu.com/p/50d05952b9c5 https://www.youtube.com/watch?v=qlH4-oHnBb8 MySQL InnoDB為例, 他會把數據拆解成若干個Page, 以Page作為Memory與Disk交互的基本單位. 一般InnoDB的page size是16KB(16384), 也就是一次最少會讀取16KB的內容到Memory中; 持久化時也是一次把16KB的記憶體中的資料給寫入到硬碟的Page內. ![](https://i.imgur.com/eqJ5LS7.png) 緩存頁面如果有任何的被修改, 則變成```髒頁```, 就會寫到Redo Log; 然後會由Purge thread負責將髒頁的結果持久化回硬碟中. 讓寫入的動作由InnoDB來控, 避免順序錯亂. 到SQL層談的Page都是virutal memory的大小. # Hash ### 史上最簡單的資料庫 Pile file(堆文件) 透過兩個bash function實現 ```db_set key value```把key/value存在資料庫內 ```db_get key```查找該key相關的最新一筆value給返回 ```bash= #!/bin/bash db_set () { echo "$1,$2" >> database } db_get () { grep "^$1," database | sed -e "s/^$1,//" | tail -n 1 } [[ -z "${1-}" ]] case $1 in db_set|db_get) "$1" "${@:2}" ;; *);; esac ``` ```bash= ./store.sh db_set 123456 '{"name":"London","attractions":["Big Ben","London Eye"]}' ./store.sh db_set 42 '{"name":"San Francisco","attractions":["Golden Gate Bridge"]}' ./store.sh db_get 42; > {"name":"San Francisco","attractions":["Golden Gate Bridge"]} ``` 將數據儲存在file上, 每行包含一個```,```分格key/value 每次db_set都是對file做append, 並不會覆蓋舊版本的key; 而查找最新值時, 需要找文件中key最後一次出現的位子(tail -n 1) 但這樣的結果有可能會從尾到頭把文件整個翻了遍, 這樣的開銷是O(n) ## Index 為了提高查找特定Key的值, 所誕生的資料結構就是Index(索引) - Index會保存一些額外的metadata作為路標, 快速指向想要的數據 - Index是從主數據衍生的附加數據結構 - INdex最有力的影響是查詢性能 - 在寫入新資料時, 需要維護這額外的數據結構, 會產生一些性能開銷; 寫入能簡單的追加寫到文件尾部, 但索引結構無法, 它需要排序整併 主要是在記憶體中建立一組[Hash Table結構](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution)(相似於Dictionary), 查找效率為O(1) ![](https://i.imgur.com/i3ML7pW.png) ![](https://i.imgur.com/rXWVF6L.png) Hash table的value存的是文件的byte offset, 直接指名可以找到對應值的位置 文件只做append寫入 當一個新的key, 被寫入時, 需要更新hash table 當文件一直倍append, 總是幾年後可能會塞滿一個硬碟的空間 再把文件分為特定大小的分片, 當檔案增長到特定大小時, 就關閉目前文件的打開, 並開始寫入一個新的文件分片; 然後就能對原有的文件分片做壓縮(compaction), 這動作主要就是刪除檔案中重複的key, 只保留最新的一筆. ![](https://i.imgur.com/2tGBqkD.png) 又或是把多個已經歷史文件分片的內容壓縮並且合併到一起 ![](https://i.imgur.com/JZT9XUK.png) 被壓縮後的文件不會被修改 壓縮過程中此過程可以在背景執行, 因此舊的那份文件檔案還是可以提供讀取 壓縮合併完成後, 就轉換讀取請求到新的文件上, 然後就可以刪除舊的 這時每個文件被載入完成時, 都會有一份in-memory hash table. 因此查找key時, 就只查找hash table, 再跳到對應位置循序讀取. 但檔案太大時, 載入時間會很久 - 刪除操作 要效率快, 可以進行軟刪除, 在文件中加入一個特殊的刪除紀錄, 等壓縮合併時排除掉就好. 這樣才不用去一直重新整理磁盤上碎裂的區塊 - Append方式寫入文件的好處 1. append是順序寫入, 比起隨機寫入快上非常多(少了很多尋址動作與載入hash map) 2. 如果文件上的已寫入資料都是不可被修改的, 在併發處理上就簡單了, 一個檔案給一個線程做開檔寫入的動作就好, 崩潰恢復也是照順序讀取做恢復 ## Hash優點 單鍵查找快速 ## 缺點 hash table必須被載入到memory, 且hash table本身是無序的 範圍查找GG, 不支持查找一個條件區間內所有值, 一部分也是因為無序 > mysql [Comparison of B-Tree and Hash Indexes](https://dev.mysql.com/doc/refman/5.6/en/index-btree-hash.html) 當key太多, hash function生成出來的hash值難免碰撞, 需要在一層結構儲存 ### [Collision處理](https://en.wikipedia.org/wiki/Hash_table#Collision_resolution) - seprate chaining ![](https://i.imgur.com/AEV5Ed8.png) - Open addressing ![](https://i.imgur.com/tBEoKTZ.png) 許多關聯式資料庫其實都有內建Hash table, 來存近期倍操作過得key與value [MySQL Adaptive Hash index](https://dev.mysql.com/doc/refman/5.6/en/innodb-adaptive-hash.html) > MySQL Adaptive Hash index > 使用LRU算法, 維護Hash table的數量 MySQL BinLog, Redo log 其實也是以寫入的順序append寫入bin log file ## 小結 其實硬碟內的所有數據, 從理論上都能放進Memory中, 但適合的場景有限 適合每條紀錄佔用的空間不超過幾KB, 且數量是"有限"的. 例: 學生數據, 用戶數據, 庫存資料等等的 像操作Log這類無限增長的就不適合了. # Column-Oriented VS Row-Oriented DBMS 大部分的DBMS都由表組成, 表由Col與Row組成. 通常一個數據的字段(在資料庫就是一個值), 是由Col與Row兩者的交集而成, 某種類型的值. 屬於同一行的字段具有相同的資料類型與結構. 例: 用戶資料表, UserName的值則都是相同的資料類型, 並且屬於同一行. 所以資料庫大致能分成用Row或是用Column進行分類做存儲. Table就能做水平分區(同一個row的值存在一起)或是垂直分區(同一行的存在一起) ![](https://i.imgur.com/l1lLt10.png) ![](https://i.imgur.com/3G4wzz0.png) 關聯式資料庫(RDBMS)大多是Row-Oriented DBMS (MySQL, Postgres,SQL) 而NoSQL很多是Column-Oriented, 像[BigQuery](https://cloud.google.com/bigquery),[ClickHouse](https://clickhouse.com/) ## Row-Oriented DBMS 優點: 1. 空間局部性利用率高: 通常按照業務邏輯做排序擺放後, 如果存取某一筆, 很大機率附近的資料也被訪問 2. 按照Block訪問完整數據, 通常RDBMS都是按照Block做資料存放, 所以一個Block非常有可能就擁有某一行的完整數據, 對業務訪問上方便; 但只查單欄位的話, 就是一種浪費. 3. 如果CPU有支援[SIMD](https://zhuanlan.zhihu.com/p/31271788), 可充分支援這樣的訪問 ## Column-Oriented 優點: 1. 因為前面講到同一行的字段通常是同一種資料類型, 其大小是固定的, 所以存儲上能壓縮到最緊湊.1個block能被塞的值就能最有效的利用, 要達到數據壓縮比1:10不是難處. 2. 方便做分析與複雜的聚合計算, 例如查找趨勢, 計算區段平均值. 2.1 簡單的說, 因為同一行的數據都在連續的磁區上, 磁頭移位尋址次數變少了非常多 2.2 移位過程中, 也不必像RDBMS那樣把沒要的數據給一併讀取 3. 適用於大數據領域 缺點: 1. 新增 2. 更新 3. 刪除 ## 小結 存儲方式只是區別方式的一種 主要還是根據"訪問方式"來決定所需 [(影)列式存储与行式存储区别!](https://www.zhihu.com/zvideo/1432232308811816960) # 其他存儲類型?? ## 寬列式存儲wide-column stores https://blog.logrocket.com/nosql-wide-column-stores-demystified/ Wide-Column store常見的有HBase與BigTable. 在這種存儲格式下, column被分組為(column family列族, 用來存儲相類類型的數據), 並且在每個column family中, 數據被逐行存放, 因此這種格式適合由一個key或是一組key來檢索的數據. # Data file & Index file 資料庫使用特定文件組織形式主要原因有 1. 儲存效率 - 以最少次IO的形式進行存儲 2. 訪問效率 - 盡可能用最少的步驟來定位找到數據 3. 更新效率 - 讓磁盤更改的次數最少化 通常資料庫的每個表都是以一個獨立文件來表示. 為了能快速定位到數據紀錄的所在. 都會需要使用[Index](#Index)這種輔助結構. 資料庫通常也是把Index file給獨立存放, Index file中存儲meta data以及用來定位data file的紀錄. 再做數據的Insert或Update時, 並不會直接地對page上的資料做變更. 而是透過Deletion marker/tombstone(刪除標誌), 在資料庫做GC或壓縮時, 將之回收.

    Import from clipboard

    Paste your markdown or webpage here...

    Advanced permission required

    Your current role can only read. Ask the system administrator to acquire write and comment permission.

    This team is disabled

    Sorry, this team is disabled. You can't edit this note.

    This note is locked

    Sorry, only owner can edit this note.

    Reach the limit

    Sorry, you've reached the max length this note can be.
    Please reduce the content or divide it to more notes, thank you!

    Import from Gist

    Import from Snippet

    or

    Export to Snippet

    Are you sure?

    Do you really want to delete this note?
    All users will lose their connection.

    Create a note from template

    Create a note from template

    Oops...
    This template has been removed or transferred.
    Upgrade
    All
    • All
    • Team
    No template.

    Create a template

    Upgrade

    Delete template

    Do you really want to delete this template?
    Turn this template into a regular note and keep its content, versions, and comments.

    This page need refresh

    You have an incompatible client version.
    Refresh to update.
    New version available!
    See releases notes here
    Refresh to enjoy new features.
    Your user state has changed.
    Refresh to load new user state.

    Sign in

    Forgot password

    or

    By clicking below, you agree to our terms of service.

    Sign in via Facebook Sign in via Twitter Sign in via GitHub Sign in via Dropbox Sign in with Wallet
    Wallet ( )
    Connect another wallet

    New to HackMD? Sign up

    Help

    • English
    • 中文
    • Français
    • Deutsch
    • 日本語
    • Español
    • Català
    • Ελληνικά
    • Português
    • italiano
    • Türkçe
    • Русский
    • Nederlands
    • hrvatski jezik
    • język polski
    • Українська
    • हिन्दी
    • svenska
    • Esperanto
    • dansk

    Documents

    Help & Tutorial

    How to use Book mode

    Slide Example

    API Docs

    Edit in VSCode

    Install browser extension

    Contacts

    Feedback

    Discord

    Send us email

    Resources

    Releases

    Pricing

    Blog

    Policy

    Terms

    Privacy

    Cheatsheet

    Syntax Example Reference
    # Header Header 基本排版
    - Unordered List
    • Unordered List
    1. Ordered List
    1. Ordered List
    - [ ] Todo List
    • Todo List
    > Blockquote
    Blockquote
    **Bold font** Bold font
    *Italics font* Italics font
    ~~Strikethrough~~ Strikethrough
    19^th^ 19th
    H~2~O H2O
    ++Inserted text++ Inserted text
    ==Marked text== Marked text
    [link text](https:// "title") Link
    ![image alt](https:// "title") Image
    `Code` Code 在筆記中貼入程式碼
    ```javascript
    var i = 0;
    ```
    var i = 0;
    :smile: :smile: Emoji list
    {%youtube youtube_id %} Externals
    $L^aT_eX$ LaTeX
    :::info
    This is a alert area.
    :::

    This is a alert area.

    Versions and GitHub Sync
    Get Full History Access

    • Edit version name
    • Delete

    revision author avatar     named on  

    More Less

    Note content is identical to the latest version.
    Compare
      Choose a version
      No search result
      Version not found
    Sign in to link this note to GitHub
    Learn more
    This note is not linked with GitHub
     

    Feedback

    Submission failed, please try again

    Thanks for your support.

    On a scale of 0-10, how likely is it that you would recommend HackMD to your friends, family or business associates?

    Please give us some advice and help us improve HackMD.

     

    Thanks for your feedback

    Remove version name

    Do you want to remove this version name and description?

    Transfer ownership

    Transfer to
      Warning: is a public team. If you transfer note to this team, everyone on the web can find and read this note.

        Link with GitHub

        Please authorize HackMD on GitHub
        • Please sign in to GitHub and install the HackMD app on your GitHub repo.
        • HackMD links with GitHub through a GitHub App. You can choose which repo to install our App.
        Learn more  Sign in to GitHub

        Push the note to GitHub Push to GitHub Pull a file from GitHub

          Authorize again
         

        Choose which file to push to

        Select repo
        Refresh Authorize more repos
        Select branch
        Select file
        Select branch
        Choose version(s) to push
        • Save a new version and push
        • Choose from existing versions
        Include title and tags
        Available push count

        Pull from GitHub

         
        File from GitHub
        File from HackMD

        GitHub Link Settings

        File linked

        Linked by
        File path
        Last synced branch
        Available push count

        Danger Zone

        Unlink
        You will no longer receive notification when GitHub file changes after unlink.

        Syncing

        Push failed

        Push successfully