Welcome to the MOPCON 2024 collaborative notes
Collaborative notes index: https://hackmd.io/@mopcon/2024
Modern Data Stack (MDS)
Characteristics Of Modern Data Stack
Most bloggers' point of view:
- Cloud-native
- ELT focused
- Modular
- SQL first
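To make the "ELT focused" and "SQL first" points concrete, here is a minimal sketch that is not from the talk: raw records are loaded untouched, and the transformation happens afterwards, inside the database, in SQL. Python's built-in `sqlite3` stands in for a cloud warehouse, and the table names (`raw_orders`, `orders_clean`) and sample rows are hypothetical.

```python
import sqlite3

def run_elt(rows):
    """Load raw rows as-is, then transform them with SQL inside the database."""
    con = sqlite3.connect(":memory:")  # stand-in for a cloud warehouse (assumption)
    # Extract + Load: land the data untouched, every column kept as text.
    con.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, status TEXT)")
    con.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    # Transform: cast types and filter in SQL, after the data is already loaded.
    con.execute("""
        CREATE TABLE orders_clean AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL) AS amount
        FROM raw_orders
        WHERE status = 'paid'
    """)
    result = con.execute(
        "SELECT order_id, amount FROM orders_clean ORDER BY order_id"
    ).fetchall()
    con.close()
    return result

# Hypothetical sample data.
sample_rows = [("1", "10.5", "paid"), ("2", "3.0", "cancelled"), ("3", "7.25", "paid")]
print(run_elt(sample_rows))
```

Because the raw table is kept, the transformation can be rewritten and rerun without extracting from the source again, which is the usual argument for ELT over ETL.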
What Is Modern Data Stack (MDS)
Good
- Layers: ingestion, warehousing, transformation, BI.
- Horizontal products and unlimited scale using cloud infrastructure. (Cost is the primary constraint to data processing)
- Low overhead investment (infra/data engineers).
- United by SQL.
- Fast in both iteration speed and pure query execution time.
Bad
- Governance is immature (tooling and best practices are still needed, e.g. a data catalog)
- Batch-based
- Polling and job scheduling
- Data doesn't feed back into operational tools (Reverse ETL)
Data Mesh
- Data mesh is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments.
- Principles
- Domain ownership (decentralization)
- Data as a product (product thinking)
- Self-serve data platform (focused on this talk)
- Federated computational governance
(Ran out of time to take notes here… XD)
Dagster Pipeline With dbt And DuckDB
- Pros
- Versatility (general-purpose ETL/ELT data pipelines)
- Flexibility (manage multiple database environments; adapt per data model/contract)
- Cost efficiency (via DuckDB)
- Easy to roll back for disaster recovery (GitOps)
- Cons
- Needs to maintain EL tasks for various sources and destinations
- Lack of built-in data lineage for dbt models
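The notes do not say how, or whether, the speaker worked around the lineage gap. As one illustration only, upstream lineage can be derived from any model-to-parents mapping with a simple graph walk. The sketch below assumes a mapping shaped like the `parent_map` section of dbt's `manifest.json`; the node names are hypothetical and `upstream_lineage` is an illustrative helper, not part of dbt or Dagster.

```python
def upstream_lineage(parent_map, node):
    """Return every transitive upstream dependency of `node`, sorted by name."""
    seen = set()
    stack = list(parent_map.get(node, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        stack.extend(parent_map.get(current, []))
    return sorted(seen)

# Hypothetical mapping: each node lists the nodes it reads from.
parent_map = {
    "model.shop.orders_daily": ["model.shop.stg_orders", "model.shop.stg_customers"],
    "model.shop.stg_orders": ["source.shop.raw.orders"],
    "model.shop.stg_customers": ["source.shop.raw.customers"],
    "source.shop.raw.orders": [],
    "source.shop.raw.customers": [],
}

for name in upstream_lineage(parent_map, "model.shop.orders_daily"):
    print(name)
```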
Recap
- Modern Data Stack
- Data Mesh
- Self-serve data platform
- Data orchestration / pipeline, workflow management
- Dagster as a self-serve data platform
- Domain-agnostic components
- Domain-specific repos
- WAP-based (Write-Audit-Publish) data pipeline using dbt
- with DuckDB, or
- with staging tables in data warehouses
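The staging-table variant of Write-Audit-Publish can be sketched roughly as follows. This is an illustration under stated assumptions, not the pipeline shown in the talk: Python's built-in `sqlite3` is swapped in for DuckDB or a warehouse, plain SQL replaces dbt models and tests, and the table names (`stg_users`, `users`) and audit rules are hypothetical.

```python
import sqlite3

def write_audit_publish(con, rows):
    """Write a batch to staging, audit it, and publish only if the audit passes."""
    con.execute("CREATE TABLE IF NOT EXISTS users (user_id INTEGER, email TEXT)")
    # Write: land the new batch in a staging table that consumers never read.
    con.execute("DROP TABLE IF EXISTS stg_users")
    con.execute("CREATE TABLE stg_users (user_id INTEGER, email TEXT)")
    con.executemany("INSERT INTO stg_users VALUES (?, ?)", rows)
    con.commit()
    # Audit: hypothetical rules (non-empty batch, no NULLs, unique user_id).
    nulls = con.execute(
        "SELECT COUNT(*) FROM stg_users WHERE user_id IS NULL OR email IS NULL"
    ).fetchone()[0]
    dupes = con.execute(
        "SELECT COUNT(*) - COUNT(DISTINCT user_id) FROM stg_users "
        "WHERE user_id IS NOT NULL"
    ).fetchone()[0]
    if not rows or nulls or dupes:
        return False  # the published table is left untouched
    # Publish: swap the audited batch into the published table in one transaction.
    with con:
        con.execute("DELETE FROM users")
        con.execute("INSERT INTO users SELECT user_id, email FROM stg_users")
    return True

con = sqlite3.connect(":memory:")
ok = write_audit_publish(con, [(1, "a@example.com"), (2, "b@example.com")])
rejected = write_audit_publish(con, [(3, "c@example.com"), (3, None)])
published = con.execute("SELECT user_id, email FROM users ORDER BY user_id").fetchall()
print(ok, rejected, published)
```

The second batch fails the audit, so readers of `users` keep seeing the last good batch while the bad rows stay in `stg_users` for inspection.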