Building Self-serve Data Platform Based On Dagster, DBT and DuckDB - 賴宗智

Welcome to the MOPCON 2024 collaborative notes

Collaborative notes index: https://hackmd.io/@mopcon/2024
On mobile, tap the button above to expand the agenda list.

Start here

Modern Data Stack (MDS)

Characteristics Of Modern Data Stack

Most bloggers' point of view:

  • Cloud-native
  • ELT focused
  • Modular
  • SQL first

What Is Modern Data Stack (MDS)

Good

  • Layers: ingestion, warehousing, transformation, BI.
  • Horizontal products and unlimited scale on cloud infrastructure (cost is the primary constraint on data processing).
  • Low up-front investment in infrastructure and data engineers.
  • United by SQL.
  • Fast both in iteration speed and in pure query execution time.
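
The ELT-focused, SQL-first layering above can be illustrated with a minimal sketch: raw data is loaded untouched, then transformed with SQL inside the database. This is only an illustration under stated assumptions. The standard-library `sqlite3` module stands in for a cloud warehouse, and the table names and `RAW_ORDERS` records are hypothetical.

```python
import sqlite3

# Hypothetical raw records, standing in for the output of an ingestion tool.
RAW_ORDERS = [
    ("o1", "alice", "12.50"),
    ("o2", "bob", "7.25"),
    ("o3", "alice", "30.00"),
]


def run_elt(conn):
    # Extract + Load: land the data as-is in a raw table (amounts stay text).
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

    # Transform: done in SQL inside the database, after loading.
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY customer
        """
    )
    return conn.execute(
        "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt(sqlite3.connect(":memory:")))
```

Keeping the raw table around is the point of ELT in this sketch: the transformation can be rewritten and rerun in SQL without ingesting the data again.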

Bad

  • Governance is immature (tooling and best practices, such as a data catalog, are still needed)
  • Batch-based
  • Polling and job scheduling
  • Data doesn't feed back into operational tools (the gap that Reverse ETL targets)

Data Mesh

  • Data mesh is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments.
  • Principles
    • Domain ownership (decentralization)
    • Data as a product (product thinking)
    • Self-serve data platform (the focus of this talk)
    • Federated computational governance

(Ran out of time… XD)

Dagster Pipeline With DBT And DuckDB

  • Pros
    • Versatility (general-purpose ETL/ELT data pipelines)
    • Flexibility (manage multiple database environments, adapt by data model/contract)
    • Cost efficiency (via DuckDB)
    • Easy to roll back for disaster recovery (GitOps)
  • Cons
    • EL tasks must be maintained for various sources and destinations
    • Lack of built-in data lineage for dbt models
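
One possible way to work around the lineage gap listed under Cons is to derive upstream dependencies from dbt's `manifest.json` artifact, where each entry under `nodes` lists its parents in `depends_on.nodes`. The sketch below is not taken from the talk. `SAMPLE_MANIFEST` and every node name in it are hypothetical, and a real manifest contains many more fields.

```python
def upstream_lineage(manifest, node_id):
    """Return all ancestors of node_id by following depends_on.nodes."""
    seen = set()
    stack = [node_id]
    while stack:
        current = stack.pop()
        # Sources are not listed under "nodes", so they have no parents here.
        node = manifest["nodes"].get(current, {})
        for parent in node.get("depends_on", {}).get("nodes", []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return sorted(seen)


# Hypothetical, heavily trimmed stand-in for a dbt manifest.json.
SAMPLE_MANIFEST = {
    "nodes": {
        "model.shop.stg_orders": {
            "depends_on": {"nodes": ["source.shop.raw.orders"]}
        },
        "model.shop.stg_customers": {
            "depends_on": {"nodes": ["source.shop.raw.customers"]}
        },
        "model.shop.customer_revenue": {
            "depends_on": {
                "nodes": ["model.shop.stg_orders", "model.shop.stg_customers"]
            }
        },
    }
}

if __name__ == "__main__":
    for ancestor in upstream_lineage(SAMPLE_MANIFEST, "model.shop.customer_revenue"):
        print(ancestor)
```

In a real project the dictionary would come from `json.load` on the manifest file that dbt writes to its target directory.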

Recap

  • Modern Data Stack
  • Data Mesh
    • Self-serve data platform
    • Data orchestration / pipeline, workflow management
  • Dagster as a self-serve data platform
    • Domain-agnostic components
    • Domain-specific repos
  • WAP-based (Write-Audit-Publish) data pipeline using dbt
    • with DuckDB, or
    • with staging tables in data warehouses
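
The staging-table variant of the WAP pattern from the recap can be sketched as follows: write a batch to a staging table, audit it, and publish only if the audit passes. This is a minimal illustration, not the speaker's pipeline. The standard-library `sqlite3` module stands in for DuckDB or a data warehouse, and the table names and audit rules are hypothetical.

```python
import sqlite3


def write_audit_publish(conn, rows):
    # Write: load the new batch into a staging table, not the published one.
    conn.execute("DROP TABLE IF EXISTS orders_staging")
    conn.execute("CREATE TABLE orders_staging (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)

    # Audit: count the staged rows and those violating the (hypothetical) rules.
    total, bad = conn.execute(
        """
        SELECT COUNT(*),
               COALESCE(SUM(order_id IS NULL OR amount IS NULL OR amount < 0), 0)
        FROM orders_staging
        """
    ).fetchone()
    if total == 0 or bad > 0:
        # Reject the batch; the published table is left untouched.
        conn.execute("DROP TABLE orders_staging")
        conn.commit()
        return False

    # Publish: replace the published table with the audited batch.
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute("ALTER TABLE orders_staging RENAME TO orders")
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(write_audit_publish(conn, [("o1", 10.0), ("o2", 5.5)]))
    print(write_audit_publish(conn, [("o3", -1.0)]))
```

Consumers only ever read `orders`, so a batch that fails the audit never becomes visible to them. In a dbt project the audit step would more naturally be expressed as dbt tests run against the staged models.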