# Building a Self-serve Data Platform Based On Dagster, dbt and DuckDB - 賴宗智

## Modern Data Stack (MDS)

### Characteristics Of the Modern Data Stack

As most bloggers describe it:

* Cloud-native
* ELT focused
* Modular
* SQL first

### What Is the Modern Data Stack (MDS)

**Good**

* Layers: ingestion, warehousing, transformation, BI.
* Horizontal products and virtually unlimited scale on cloud infrastructure (cost is the primary constraint on data processing).
* Low overhead investment (in infra/data engineers).
* United by SQL.
* Fast both in iteration speed and in raw query execution time.

**Bad**

* Governance is immature (tooling and best practices are still needed, e.g. a data catalog)
* Batch-based
* Relies on polling and job scheduling
* Data doesn't feed back into operational tools (Reverse ETL)

## Data Mesh

* Data mesh is a decentralized sociotechnical approach to sharing, accessing, and managing analytical data in complex, large-scale environments.
* Principles
    * Domain ownership (decentralization)
    * Data as a product (product thinking)
    * **Self-serve data platform (the focus of this talk)**
    * Federated computational governance (ran out of time… XD)

![P_20241027_093741-01](https://hackmd.io/_uploads/ryqX9zjgJx.jpg)

![P_20241027_094010-01](https://hackmd.io/_uploads/H1aa9zoekg.jpg)

![P_20241027_094512-01](https://hackmd.io/_uploads/Bk0hjGsgkx.jpg)

![P_20241027_094745-01](https://hackmd.io/_uploads/BJ2R6GsgJe.jpg)

## Dagster Pipeline With dbt And DuckDB

* Pros
    * Versatility (general-purpose ETL/ELT data pipelines)
    * Flexibility (manages multiple database environments; adapters chosen per data model/contract)
    * Cost efficiency (via DuckDB)
    * Easy to roll back for disaster recovery (GitOps)
* Cons
    * EL tasks must be maintained for each source and destination
    * No built-in data lineage for dbt models

## Recap

* Modern Data Stack
* Data Mesh
    * Self-serve data platform
* Data orchestration / pipeline, workflow management
* Dagster as a self-serve data platform
    * Domain-agnostic components
    * Domain-specific repos
    * WAP-based (write-audit-publish) data pipeline using dbt
        * with DuckDB, or
        * with staging tables in data warehouses (DWs)
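The recap mentions a WAP (write-audit-publish) pipeline built around a staging table. The sketch below illustrates only that pattern in plain Python. It assumes Python's built-in `sqlite3` as a stand-in for DuckDB or a warehouse so it runs without extra dependencies; the `orders` / `orders_staging` tables and the audit rules are hypothetical. In the talk's setup the write and audit steps would more likely be dbt models and dbt tests orchestrated by Dagster, not hand-written functions like these.

```python
import sqlite3

# Minimal write-audit-publish (WAP) sketch using a staging table.
# ASSUMPTION: sqlite3 stands in for DuckDB / a data warehouse; table names and
# audit rules are hypothetical examples, not part of the talk.


def write(con, rows):
    """Write: load the new batch into a staging table, never into production."""
    con.execute("DROP TABLE IF EXISTS orders_staging")
    con.execute("CREATE TABLE orders_staging (order_id INTEGER, amount REAL)")
    con.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)
    con.commit()


def audit(con):
    """Audit: run data-quality checks on staging; return a list of failures."""
    failures = []
    (nulls,) = con.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE order_id IS NULL"
    ).fetchone()
    if nulls:
        failures.append(f"{nulls} rows with NULL order_id")
    # COUNT(order_id) skips NULLs, so NULL keys are not double-counted as dupes.
    (dupes,) = con.execute(
        "SELECT COUNT(order_id) - COUNT(DISTINCT order_id) FROM orders_staging"
    ).fetchone()
    if dupes:
        failures.append(f"{dupes} duplicate order_id values")
    (negative,) = con.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE amount < 0"
    ).fetchone()
    if negative:
        failures.append(f"{negative} rows with negative amount")
    return failures


def publish(con):
    """Publish: swap the audited staging table into production in one transaction."""
    con.execute("BEGIN")
    try:
        con.execute("DROP TABLE IF EXISTS orders")
        con.execute("ALTER TABLE orders_staging RENAME TO orders")
        con.commit()
    except Exception:
        con.rollback()  # production table stays as it was
        raise


def run_wap(con, rows):
    """Run one WAP cycle; publish only if every audit passes."""
    write(con, rows)
    failures = audit(con)
    if failures:
        return failures  # production is left untouched
    publish(con)
    return []


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    print(run_wap(con, [(1, 10.0), (2, 5.5)]))  # good batch is published
    print(run_wap(con, [(3, 1.0), (3, -2.0)]))  # bad batch is rejected
```

The point of the design is that consumers only ever read the production table, so a batch that fails its audits never becomes visible. DuckDB's Python client offers a similar `duckdb.connect()` / `execute()` interface, so the same structure should carry over, though transaction details may need small adjustments.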