張星瑀 Sheena
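# Literature Review: Big Data and NoSQL

Chapter 1 examined the challenges companies face amid the ESG wave. This chapter builds on that survey and narrows the focus of digital transformation to big data analytics and processing and to the use of databases. My notes currently draw on two courses from [成大育才網](https://ge.ncku.edu.tw/?lang=zh_tw): [智慧製造資料庫與大數據分析 (Smart Manufacturing Databases and Big Data Analytics)](https://ge.ncku.edu.tw/enrol/index.php?id=5605) and [智慧製造感測聯網與數據處理分析技術 (Smart Manufacturing Sensor Networking and Data Processing and Analysis Techniques)](https://ge.ncku.edu.tw/enrol/index.php?id=5608). Readers are welcome to learn along as the article unfolds!

- [ ] Understand big data processing architectures
- [ ] The difference between Lambda and Kappa
- [ ] Understand structured and unstructured data
- [ ] The four NoSQL data models

## How is big data processed?

First, we need to understand what happens to data after it is generated. The figure below is a schematic of a big data architecture, and each icon in it is a key element of that architecture:

* **Data sources**
* **Data storage → Batch processing**
* **Real-time message ingestion → Stream processing**
* **Analytical data store**
* **Analytics and reporting**
* **Orchestration** (pulls data out of the separate storage paths and merges and integrates it so that analytics tools can use it; it also helps companies move toward more automated, data-stream-driven decision support).

[![](https://hackmd.io/_uploads/Skfzt0Sph.jpg)](https://imgur.com/shtSTQb)

This architecture style is typically a good fit when:

* The volume of data to store and process is too large for a [traditional database](https://www.geeksforgeeks.org/difference-between-traditional-data-and-big-data/).
* Unstructured data has to be transformed for analysis and reporting.
* Unbounded streams of data have to be captured, processed, and analyzed in real time or with low latency.
* Azure-related services are in use.

> Source: ***Big data architecture style, Microsoft*** *https://learn.microsoft.com/en-us/azure/architecture/guide/architecture-styles/big-data*

From the above we can see that the right way to store and process data depends on the characteristics of its source. The following briefly introduces two big data processing architectures:

### 1. Lambda architecture

[![](https://hackmd.io/_uploads/H1X7FRrp2.jpg)](https://imgur.com/BYswuUK)

This architecture addresses the need for real-time queries over big data while remaining robust and highly tolerant of human error. Proposed by Nathan Marz, it splits the data flow into two paths:

* **Batch Layer** (cold path): stores all incoming data in its rawest format and runs batch processing over it; the results are exposed as batch views.
* **Speed Layer** (hot path): analyzes and processes data in real time, including data that has not yet reached the batch views because of batch-layer latency. Data on this path is bound by the latency requirement it must meet, so some accuracy is sacrificed.

This architecture never modifies the original input data. Any new or changed data is always appended to what is already stored, and earlier data is never overwritten; a change to a particular piece of data is recorded with a new timestamp.

Finally, the **Serving Layer** indexes the batch views, merges them with the latest data accumulated by the speed layer, and delivers the result to the client.

Although the Lambda architecture meets the need for real-time data processing, it has a drawback: higher complexity. Because the batch layer and the speed layer must be maintained as two entirely separate codebases, maintenance and debugging can be more difficult.

>Sources:\
>*https://www.databricks.com/glossary/lambda-architecture*
>*https://www.interviewbit.com/blog/lambda-architecture/*
>*https://iter01.com/528938.html*

### 2. Kappa architecture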
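[![](https://hackmd.io/_uploads/SyWNYCHp2.jpg)](https://imgur.com/Qg8bDqq)

To address the complexity of the Lambda architecture, Jay Kreps proposed the Kappa architecture, which has only a single real-time processing layer (that is, it keeps only the speed layer). It still preserves the property of never modifying the original input data, and it relies on stream processing engines such as Apache Storm, Kinesis, Kafka, and Flink to reduce the need for batch processing.

Specifically, the speed layer in Kappa has two main elements:

* **Ingestion**: collects data from a variety of sources in real time and stores it in order; sources include, but are not limited to, APIs, sensors, and log files.
* **Processing**: once data has been ingested, it must be processed immediately using stream-processing algorithms or techniques, and the current state is always derived by appending new events rather than by overwriting old ones.

Although development and maintenance are easier (there is no need to design two entirely separate codebases), cost and the risk of data loss still have to be weighed against the user's requirements.

#### Kappa architecture in practice: Uber!

[![](https://hackmd.io/_uploads/ryjov0H62.png)](https://www.uber.com/en-KE/blog/dynamic-pricing/)

Uber, which places a premium on real-time, interactive ride-hailing, depends on low-latency data analysis and management to build its [real-time pricing system](https://www.uber.com/zh-HK/blog/upfrontfare/). See: ***Designing a Production-Ready Kappa Architecture for Timely Data Stream Processing***, https://www.uber.com/en-TW/blog/kappa-architecture-data-stream-processing/

>Sources:\
>*https://www.sqlservercentral.com/articles/advantages-of-kappa-architecture-in-the-modern-data-stack*
>*https://pradeepl.com/blog/kappa-architecture/*
>*https://zhuanlan.zhihu.com/p/584255261*

## Big data and databases

### Structured vs. unstructured data

Before looking at databases, we first need to understand the different characteristics of data. A biologist identifying an organism may examine its external features, dissect it to observe its internal structure, or look at its geographic distribution and the literature. Because our concern here is how data is collected and stored, we will use one classification that divides data into two types: structured data and unstructured data. In short:

**Structured data**: data stored in a fixed format, or according to a model with predefined columns and rows. Because the structure is fixed, this data is easier to analyze and takes less storage space. For example, an employee table has only fixed fields such as name, position, and salary; a financial statement includes assets, liabilities, net profit, and so on.

**Unstructured data**: as the name suggests, data that has not been organized or stored according to a predesigned data model or schema. It may include text, images, video, web pages, music, and so on.

![](https://hackmd.io/_uploads/rkdF1L_0h.png)

>Sources:\
>淺談資料格式 — 結構化與非結構化資料 *https://medium.com/marketingdatascience/%E6%B7%BA%E8%AB%87%E8%B3%87%E6%96%99%E6%A0%BC%E5%BC%8F-%E7%B5%90%E6%A7%8B%E5%8C%96%E8%88%87%E9%9D%9E%E7%B5%90%E6%A7%8B%E5%8C%96%E8%B3%87%E6%96%99-50c89a4b15e0*
>結構化資料 vs. 非結構化資料 *https://www.purestorage.com/tw/knowledge/big-data/structured-vs-unstructured-data.html*

The first article introduced the 3V characteristics of big data. They tell us that big data is dynamic, so it is more appropriate to classify it as unstructured data. The figure above also shows that this kind of data should be stored in a **non-relational (NoSQL) database**. Its advantages are:

**1. Fast reads and writes**: because records are not stored in relation to one another, looking up a single record returns all of its information, which is what makes retrieval fast. The figure below shows Oracle's performance test of its NoSQL cloud database service. The blue line is 300 read operations per second, with a 95th-percentile latency of roughly 3-4 ms; the green line is 150 write operations per second, with a 95th-percentile latency of 4-5 ms.

![](https://hackmd.io/_uploads/BJ9ucJjAn.png)

*(If, like me, you could not at first make sense of the axes and curves in this figure, I hope this video helps you understand <font color="#586D80"> Throughput vs. Latency</font>.)*

{%youtube f7VsHLk_Z8c?si=f22Xg-t17_DfViKV%}

>Further reading: [RDBMS vs. NOSQL](https://medium.com/@eric248655665/rdbms-vs-nosql-%E9%97%9C%E8%81%AF%E5%BC%8F%E8%B3%87%E6%96%99%E5%BA%AB-vs-%E9%9D%9E%E9%97%9C%E8%81%AF%E5%BC%8F%E8%B3%87%E6%96%99%E5%BA%AB-1423c9fbb91a)
> [color=#D6DCDB]

**2. Better scalability**: traditional SQL databases typically scale vertically, by upgrading the CPU and RAM of a single server. NoSQL databases are easier to scale horizontally, spreading data across additional servers instead of requiring ever more powerful hardware, which can lower costs.

>Further reading:
>[1) NoSQL Beginner Guide: Pros, Cons, Types, and Philosophy](https://www.altexsoft.com/blog/nosql-pros-cons/)
>[2) 什麽是 NoSQL?](https://www.oracle.com/tw/database/nosql/what-is-nosql/#:~:text=NoSQL%20%E8%B3%87%E6%96%99%E5%BA%AB%E7%9A%84%E5%84%AA%E9%BB%9E,-%E9%81%8E%E5%8E%BB20%20%E5%B9%B4&text=%E4%BD%BF%E7%94%A8SQL%20%E8%B3%87%E6%96%99%E5%BA%AB%E6%99%82,%E6%8F%90%E4%BE%9B%E6%9B%B4%E5%84%AA%E8%B3%AA%E7%9A%84%E6%9C%8D%E5%8B%99%E3%80%82)
> [color=#D6DCDB]

**3. Flexibility**: unlike traditional databases, NoSQL databases do not require a schema to be defined up front. They also do not necessarily keep all of the ACID guarantees (Atomicity, Consistency, Isolation, Durability); instead they make the trade-offs described by the CAP theorem, so the constraints on reads and writes are relaxed. *(<font color="#586D80">Watch the video below for an introduction to the CAP theorem</font>)*

{%youtube gkg-FAEXIkY?si=EeUzNHPwFL9__dfT%}

>Further reading: [初步認識分散式資料庫與 NoSQL CAP 理論](https://oldmo860617.medium.com/%E5%88%9D%E6%AD%A5%E8%AA%8D%E8%AD%98%E5%88%86%E6%95%A3%E5%BC%8F%E8%B3%87%E6%96%99%E5%BA%AB%E8%88%87-nosql-cap-%E7%90%86%E8%AB%96-a02d377938d1)
> [color=#D6DCDB]

### Types of NoSQL databases

1. **Document-oriented databases**: store data as XML, JSON, or BSON documents, and build indexes on specific fields for fast search.
2. **Key-value databases**: each record is a key-value pair that expresses the relationship and is used for every operation; for example, a record's value can be accessed directly through its key.
3. **Column-oriented databases**: manage data grouped by column. Reads are faster, but consistent writes are harder to maintain.
4. **Graph databases**: store data using graph structures, so the database represents the relationships between nodes.

>*Image source*: 黃士嘉與吳佩儒(2017)。7天學會大數據資料處理NoSQL: MongoDB入門與活用 (第2版),第5頁。博碩文化股份有限公司。

![](https://hackmd.io/_uploads/SJrAOln0n.png)

### Summary

From big data processing architectures to the different types of data, it becomes clear that making the best use of the data around us is no easy task. Although this chapter focuses on NoSQL, I must stress that it is only one means to an end: the real goal is to choose the database that fits your own needs for handling big data. Being familiar with the fundamental principles of big data processing and with the strengths and weaknesses of different databases may therefore be more helpful when designing and building a database.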
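
To make the four NoSQL data models listed earlier a little more concrete, here is a minimal Python sketch that lays out the same small data set in each model using only built-in data structures. It is an illustration under stated assumptions, not how any particular NoSQL product stores data: the `employees` records, the `employee:<id>` key format, the `REPORTS_TO` relationship, and the helper `managers_of` are all hypothetical names made up for this example.

```python
import json

# Hypothetical sample records, reused for all four models.
employees = [
    {"id": "e1", "name": "Ada", "title": "Engineer", "manager": None},
    {"id": "e2", "name": "Bob", "title": "Analyst", "manager": "e1"},
]

# 1. Document model: each record is a self-describing document, and an
#    index on one field (here "title") makes lookups by that field fast.
documents = {e["id"]: dict(e) for e in employees}
title_index = {}
for doc_id, doc in documents.items():
    title_index.setdefault(doc["title"], []).append(doc_id)

# 2. Key-value model: the store only knows a key and an opaque value;
#    the value is fetched directly by its key.
kv_store = {f"employee:{e['id']}": json.dumps(e) for e in employees}

# 3. Column-oriented model: all values of one column are kept together,
#    so reading a single column does not touch the others.
columns = {
    field: [e[field] for e in employees]
    for field in ("id", "name", "title")
}

# 4. Graph model: nodes plus explicit edges that describe relationships.
nodes = {e["id"]: {"name": e["name"]} for e in employees}
edges = [
    (e["id"], "REPORTS_TO", e["manager"])
    for e in employees
    if e["manager"] is not None
]


def managers_of(employee_id):
    """Follow REPORTS_TO edges from one node (hypothetical helper)."""
    return [
        dst
        for src, rel, dst in edges
        if src == employee_id and rel == "REPORTS_TO"
    ]
```

The point of the sketch is that the data is identical in all four cases; what changes is which question is cheap to answer: find documents by an indexed field, fetch a value by key, scan one column, or follow a relationship between nodes.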
