# Building Self-serve Data Platform Based On Dagster, DBT and DuckDB - 賴宗智
## Modern Data Stack (MDS)
### Characteristics Of Modern Data Stack
Most bloggers' point of view:
* Cloud-native
* ELT focused
* Modular
* SQL first
### What Is Modern Data Stack (MDS)
**Good**
* Layers: ingestion, warehousing, transformation, BI.
* Horizontal products and unlimited scale using cloud infrastructure (cost is the primary constraint on data processing).
* Low overhead investment (infra/data engineers).
* United by SQL.
* Fast in both iteration speed and pure query execution time.
**Bad**
* Governance is immature (tooling and best practices are still needed, e.g. a data catalog)
* Batch-based
* Polling and job scheduling
* Data doesn't feed back into operational tools (Reverse ETL)
## Data Mesh
* Data mesh is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments.
* Principles
  * Domain ownership (decentralization)
  * Data as a product (product thinking)
  * **Self-serve data platform (the focus of this talk)**
  * Federated computational governance
(Ran out of time… XD)




## Dagster Pipeline With DBT And DuckDB
* Pros
  * Versatility (general-purpose ETL/ELT data pipelines)
  * Flexibility (manages multiple database environments, adapts by data model/contract)
  * Cost efficiency (via DuckDB)
  * Easy to roll back for disaster recovery (GitOps)
* Cons
  * Need to maintain EL tasks for various sources and destinations
  * Lack of built-in data lineage for dbt models
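
The EL-then-transform shape of such a pipeline can be sketched in a few lines. This is only an illustration under stated assumptions, not the speaker's code: Python's built-in `sqlite3` stands in for DuckDB (both are embedded, in-process SQL engines), plain functions stand in for Dagster assets, and a SQL string stands in for a dbt model. The table names `raw_orders` and `orders_by_customer` and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical source rows; a real EL task would pull these from an API,
# a file, or an operational database.
SOURCE_ROWS = [
    ("o1", "alice", 30.0),
    ("o2", "bob", 12.5),
    ("o3", "alice", 7.5),
]


def extract_load(conn: sqlite3.Connection) -> None:
    """EL step: land the source rows untransformed in a raw table."""
    conn.execute("DROP TABLE IF EXISTS raw_orders")
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", SOURCE_ROWS)


def transform(conn: sqlite3.Connection) -> None:
    """T step: build a model from the raw table in SQL (the role dbt plays)."""
    conn.execute("DROP TABLE IF EXISTS orders_by_customer")
    conn.execute(
        """
        CREATE TABLE orders_by_customer AS
        SELECT customer, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM raw_orders
        GROUP BY customer
        """
    )


def run_pipeline(conn: sqlite3.Connection) -> list:
    """Run EL, then T, in dependency order (the role an orchestrator plays)."""
    extract_load(conn)
    transform(conn)
    return conn.execute(
        "SELECT customer, order_count, total_amount "
        "FROM orders_by_customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Because both steps drop and rebuild their tables, rerunning the pipeline gives the same result, which is part of what makes a Git-based rollback practical: check out an earlier revision and run it again.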
## Recap
* Modern Data Stack
* Data Mesh
  * Self-serve data platform
  * Data orchestration / pipeline, workflow management
* Dagster as a self-serve data platform
  * Domain-agnostic components
  * Domain-specific repos
* WAP-based (Write-Audit-Publish) data pipeline using dbt
  * with DuckDB, or
  * with staging tables in data warehouses (DWs)
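
The Write-Audit-Publish (WAP) idea from the recap can be sketched roughly as follows. This is a minimal illustration, not the pipeline shown in the talk: Python's built-in `sqlite3` stands in for DuckDB or a warehouse staging table, and the table names (`orders_staging`, `orders`) and the three audit checks are hypothetical. In a dbt-based setup the audit step would more likely be expressed as dbt tests.

```python
import sqlite3


def write(conn, rows):
    """Write: load the new batch into a staging table, not the live one."""
    conn.execute("DROP TABLE IF EXISTS orders_staging")
    conn.execute("CREATE TABLE orders_staging (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_staging VALUES (?, ?)", rows)


def audit(conn):
    """Audit: return the list of failed checks (empty means the batch passes)."""
    failures = []
    (total,) = conn.execute("SELECT COUNT(*) FROM orders_staging").fetchone()
    if total == 0:
        failures.append("staging table is empty")
    (nulls,) = conn.execute(
        "SELECT COUNT(*) FROM orders_staging WHERE order_id IS NULL"
    ).fetchone()
    if nulls:
        failures.append("order_id contains NULLs")
    # COUNT(order_id) skips NULLs, so this counts only real duplicates.
    (dupes,) = conn.execute(
        "SELECT COUNT(order_id) - COUNT(DISTINCT order_id) FROM orders_staging"
    ).fetchone()
    if dupes:
        failures.append("order_id is not unique")
    return failures


def publish(conn):
    """Publish: replace the live table with the audited staging table."""
    conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute("ALTER TABLE orders_staging RENAME TO orders")


def write_audit_publish(conn, rows):
    """Publish the batch only if every audit check passes."""
    write(conn, rows)
    failures = audit(conn)
    if not failures:
        publish(conn)
    conn.commit()
    return failures


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(write_audit_publish(conn, [("o1", 10.0), ("o2", 5.0)]))
    print(write_audit_publish(conn, [("o3", 1.0), ("o3", 2.0)]))
```

When a batch fails the audit, the live `orders` table is left untouched and the rejected rows stay in `orders_staging` for inspection, so downstream consumers never see unaudited data.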