From Data Pipeline to Airflow - the Obstacles in Our Migration - 莊鐵鴻
Welcome to PyCon APAC 2022 Collaborative Writing
Collaborative writing hub: https://hackmd.io/@pycontw/2022
On mobile, tap the button above to expand the agenda.
Slide link:
YouTube link: TBA
Collaborative writing starts below.
Below are the parts of the talk/tutorial that the speaker updated or corrected after the talk.
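- Q: Among the challenges of migrating the old system to Airflow, which do you think is most worth studying further?
  - I am particularly interested in two things:
    - CWL (Common Workflow Language). It seems to be a common standard for describing workflows, and there was a related talk at the Airflow conference. If our pipeline could be described in CWL, we might be able to build it through a user interface with existing apps.
    - The benefits of having control over a DAG Run at execution time. For example, would it help to decide the EC2 region only when the DAG Run executes? AWS Data Pipeline gives us no control over this.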
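A minimal sketch of what deciding the EC2 region per DAG Run could look like, assuming the region is passed in the run's configuration. Airflow exposes this per-run dict as `dag_run.conf`, and it can be set when triggering a run, e.g. `airflow dags trigger --conf '{"region": "eu-west-1"}' <dag_id>`. The function name `resolve_ec2_region`, the `region` key, the default region and the allowed-region list are hypothetical, not from the talk. The function is kept free of Airflow imports so it can be tested on its own.

```python
from typing import Any, Mapping, Optional

# Hypothetical values for illustration only; not taken from the talk.
DEFAULT_REGION = "us-east-1"
ALLOWED_REGIONS = frozenset({"us-east-1", "us-west-2", "eu-west-1", "ap-northeast-1"})


def resolve_ec2_region(run_conf: Optional[Mapping[str, Any]]) -> str:
    """Pick the EC2 region for one DAG Run from its run-time configuration.

    `run_conf` plays the role of Airflow's `dag_run.conf`: a per-run dict
    supplied when the run is triggered. A missing or empty "region" falls
    back to DEFAULT_REGION; an unknown region raises ValueError so the run
    fails before any instance is launched.
    """
    region = (run_conf or {}).get("region") or DEFAULT_REGION
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"unsupported EC2 region: {region!r}")
    return region


if __name__ == "__main__":
    print(resolve_ec2_region(None))                     # us-east-1
    print(resolve_ec2_region({"region": "eu-west-1"}))  # eu-west-1
```

Inside a task, one would pass `context["dag_run"].conf` to this function and hand the result to whatever creates the EC2 client, for example `boto3.client("ec2", region_name=region)`. Validating early keeps a typo in the trigger configuration from surfacing only after resources are requested.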
- Q: What was your motivation for migrating to Airflow? Did you consider other pipeline solutions, and how do they compare with Airflow (and AWS Data Pipeline)?
  - We heard that AWS Data Pipeline would receive no further improvements, and we wanted more flexibility, so we decided to look for alternatives.
  - I had to make the decision without much information, so I chose Airflow because it is Python-based.
  - Later, I did read some material about Apache NiFi.
    - Cons
      - It is developed in Java, and I do not like Java.
      - It looks like if we want anything unusual, we have to speak Java, and I do not like Java.
      - NiFi seems to focus more on data, while we currently focus more on tasks.
    - Pros
      - The web interface is nice.
      - The data flow builder is almost exactly what I want for our recommender system.