---
title: "From Data Pipeline to Airflow - the Obstacles in Our Migration - 莊鐵鴻"
tags: PyConAPAC2022, 2022-organize, 2022-共筆
---

# From Data Pipeline to Airflow - the Obstacles in Our Migration - 莊鐵鴻

Slide link:

YouTube link: TBA

> Collaborative writing starts below.

Below is the part the speaker updated or corrected after the talk:

- Q: Among the challenges you ran into while migrating the old system to Airflow, which one do you think is most worth investigating further?
    - I am particularly interested in two areas:
        - [CWL - Common Workflow Language](https://www.commonwl.org/). It appears to be a common standard for describing workflows, and there was a related talk at an Airflow conference. If our pipelines could be described in CWL, we might be able to build them through a user interface with an existing app.
        - The benefits of having more control over a DAG Run at execution time. For example, would it help to decide the EC2 region only when the DAG Run executes? That is something we cannot control in AWS Data Pipeline.
- Q: What is your motivation behind the migration to Airflow? Did you consider other pipeline solutions, and how do they compare to Airflow (and AWS Data Pipeline)?
    - We heard that AWS Data Pipeline would not receive any further improvements, and we wanted more flexibility, so we decided to look for alternatives.
    - I had to make the decision without much information, so I chose Airflow because it is Python-based.
    - Later, I did read some material about Apache NiFi.
        - Cons
            - It is developed in Java, and I do not like Java.
            - It looks like anything unusual would require writing Java, and I do not like Java.
            - NiFi seems to focus more on data, while we currently focus more on tasks.
        - Pros
            - The web interface is nice.
            - The data flow builder is close to what I want for our recommender system.
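Regarding the earlier idea of deciding the EC2 region only when a DAG Run executes: in Airflow, a triggered run can carry a JSON configuration that tasks read as `dag_run.conf`. The following is a minimal sketch of that selection step, not the speaker's implementation. The `"region"` configuration key, the `resolve_region` helper, the default region, and the allow-list are all hypothetical names chosen for illustration.

```python
from typing import Any, Mapping, Optional

# Hypothetical defaults for illustration; use the regions your pipeline actually supports.
DEFAULT_REGION = "us-east-1"
ALLOWED_REGIONS = frozenset({"us-east-1", "us-west-2", "ap-northeast-1"})


def resolve_region(
    run_conf: Optional[Mapping[str, Any]], default: str = DEFAULT_REGION
) -> str:
    """Choose the EC2 region for one DAG Run from its trigger configuration.

    In Airflow, run_conf would be the run's ``dag_run.conf``. It may be empty
    (or None) for scheduled runs, in which case the default is used.
    """
    region = (run_conf or {}).get("region", default)  # hypothetical "region" key
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"unsupported region: {region!r}")
    return region


# Possible use inside an Airflow TaskFlow task (not executed here):
#
#   from airflow.decorators import task
#   from airflow.operators.python import get_current_context
#   import boto3
#
#   @task
#   def launch_worker():
#       context = get_current_context()
#       region = resolve_region(context["dag_run"].conf)
#       ec2 = boto3.client("ec2", region_name=region)
#       ...  # start instances in the chosen region
```

A run could then be started with, for example, `airflow dags trigger --conf '{"region": "us-west-2"}' my_pipeline`, where `my_pipeline` is a placeholder DAG id. The allow-list is a design choice: a typo in the trigger configuration fails fast instead of launching instances in an unexpected region.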