---
title: "From Data Pipeline to Airflow - the Obstacles in Our Migration - 莊鐵鴻"
tags: PyConAPAC2022, 2022-organize, 2022-共筆
---
# From Data Pipeline to Airflow - the Obstacles in Our Migration - 莊鐵鴻
{%hackmd 3JQH2UcwQ1e5RMgz4GRiKg %}
<iframe src=https://app.sli.do/event/qBYKnWQknV3LjLUciY4BX1 height=450 width=100%></iframe>
Slide link:
YouTube link: TBA
> Collaborative notes start below
Below is where the speaker posts updates or corrections to the slides after the talk.
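- Q: Among the challenges of migrating the old system to Airflow, which do you think is most worth investigating further?
    - I am particularly interested in two areas:
        - [CWL - Common Workflow Language](https://www.commonwl.org/). It appears to be a general standard for describing workflows, and there have been related talks at Airflow conferences. If our pipeline could be described in CWL, we might be able to build it through the user interface of an existing app.
        - The benefits that finer control over a DAG Run at execution time could bring. For example, would there be an advantage to deciding the EC2 region only when the DAG Run executes? This cannot be controlled in AWS Data Pipeline.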
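As a rough illustration of the second point, deciding the EC2 region per run could look like the minimal sketch below. It is not from the talk: the `region` key, the allow-list, and the default are all hypothetical, and the function takes a plain mapping so it runs without Airflow installed. In Airflow, a task callable could pass `context["dag_run"].conf` (the configuration supplied when the run was triggered) as the argument.

```python
from typing import Mapping, Optional

# Hypothetical values; the talk does not name specific regions.
ALLOWED_REGIONS = ("us-east-1", "us-west-2", "ap-northeast-1")
DEFAULT_REGION = "us-east-1"


def choose_region(run_conf: Optional[Mapping[str, object]]) -> str:
    """Pick the EC2 region for one run from its trigger-time configuration.

    `run_conf` stands in for a DAG Run's conf mapping, which may be
    missing or empty when the run was triggered without configuration.
    """
    requested = (run_conf or {}).get("region", DEFAULT_REGION)
    if requested not in ALLOWED_REGIONS:
        raise ValueError(f"unsupported region: {requested!r}")
    return str(requested)


if __name__ == "__main__":
    print(choose_region({"region": "ap-northeast-1"}))  # region chosen at trigger time
    print(choose_region(None))  # falls back to the default
```

Validating against an allow-list keeps a mistyped region from reaching whatever step launches the instance.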
- Q: What is your motivation behind the migration to Airflow? Did you consider other pipeline solutions, and how do they compare to Airflow (and AWS Data Pipeline)?
    - We heard that AWS Data Pipeline would receive no further improvements, and we wanted more flexibility, so we decided to look for an alternative.
    - I had to make the decision without much information, so I chose Airflow because it uses Python.
    - I did read some material about Apache NiFi later.
        - Cons
            - It is developed in Java, and I do not like Java.
            - It looks like anything unusual would require writing Java, and I do not like Java.
            - NiFi seems to focus more on data, while we currently focus more on tasks.
        - Pros
            - The web interface is nice.
            - The data flow builder is almost what I want for our recommender system.