# Data Pipeline Tool Comparison: Airflow, Metaflow, Prefect

###### tags: `anue`

> [time=Thu, Dec 19, 2019 3:30 PM]

- Airflow
- Metaflow
- Prefect

Key concepts: DAGs and Operators

---

## Compare

| Tool     | Owner         | Feature                   | Cloud Support | GitHub Created |
| -------- | ------------- | ------------------------- | ------------- | -------------- |
| Airflow  | Airbnb/Apache | Full-featured             | GCP/AWS       | 2015-04-13     |
| Metaflow | Netflix       | Saves every run's results | AWS           | 2019-09-17     |
| Prefect  | PrefectHQ     | Lightweight               | GCP/AWS       | 2018-06-29     |

---

## Airflow

----

## DAG

```python
"""
Code that goes along with the Airflow tutorial located at:
https://github.com/apache/airflow/blob/master/airflow/example_dags/tutorial.py
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta


default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG(
    'tutorial', default_args=default_args, schedule_interval=timedelta(days=1))

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)

t2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    retries=3,
    dag=dag)

templated_command = """
    {% for i in range(5) %}
        echo "{{ ds }}"
        echo "{{ macros.ds_add(ds, 7)}}"
        echo "{{ params.my_param }}"
    {% endfor %}
"""

t3 = BashOperator(
    task_id='templated',
    bash_command=templated_command,
    params={'my_param': 'Parameter I passed in'},
    dag=dag)

# Declare dependencies: t2 and t3 both run after t1.
t2.set_upstream(t1)
t3.set_upstream(t1)
```

----

![](https://i.imgur.com/GPcR5cl.png)

---

## Metaflow

[Metaflow tutorials](https://docs.metaflow.org/getting-started/tutorials)

- Netflix
- Automatically saves everything your code produces to S3, which makes flows portable and makes it easy to pick up from failed tasks (the `resume` command restarts a run from the failed step)
- The data and the results of every `run()` over time are packaged inside the flow

----

```python
# 00-helloworld/helloworld.py
from metaflow import FlowSpec, step


class HelloFlow(FlowSpec):
    """
    A flow where Metaflow prints 'Hi'.

    Run this flow to validate that Metaflow is installed correctly.

    """
    @step
    def start(self):
        """
        This is the 'start' step. All flows must have a step named 'start' that
        is the first step in the flow.

        """
        print("HelloFlow is starting.")
        self.next(self.hello)

    @step
    def hello(self):
        """
        A step for metaflow to introduce itself.

        """
        print("Metaflow says: Hi!")
        self.next(self.end)

    @step
    def end(self):
        """
        This is the 'end' step. All flows must have an 'end' step, which is the
        last step in the flow.

        """
        print("HelloFlow is all done.")


if __name__ == '__main__':
    HelloFlow()
```

----

```bash
python 00-helloworld/helloworld.py show
```

![](https://i.imgur.com/WOEwuiG.png)

----

```bash
python 00-helloworld/helloworld.py run
```

![](https://i.imgur.com/WJTYzxD.png)

---

## Prefect

[medium](https://medium.com/the-prefect-blog?source=post_sidebar--------------------------post_sidebar-)

----

![](https://i.imgur.com/MKBQmx1.png)

----

- Integrates with Dask (see the sketch after this list)
- Like Airflow, but with a cleaner design
- Gets you started with minimal effort
- Lightweight
- Uses the `@task` decorator: each task is a node in the DAG
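
----

The Dask integration is Prefect's main scaling story: the same flow can fan out across a Dask cluster just by swapping in a different executor. Below is a minimal sketch, assuming the 0.x-era Prefect API that was current when this note was written (`prefect.engine.executors.DaskExecutor`); the scheduler address is a placeholder, not a real endpoint.

```python
from prefect import task, Flow
from prefect.engine.executors import DaskExecutor


@task
def say_hi(name):
    print("Hi, {}!".format(name))


with Flow('hello-dask') as flow:
    # Map-style fan-out: each name becomes its own task run,
    # which Dask can schedule on different workers in parallel.
    say_hi.map(['alice', 'bob', 'carol'])

# Placeholder address of an existing Dask scheduler; with no
# `address`, Prefect spins up a temporary local Dask cluster.
flow.run(executor=DaskExecutor(address='tcp://localhost:8786'))
```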
----

```python
from prefect import task, Flow

@task
def extract():
    """Get a list of data"""
    return [1, 2, 3]

@task
def transform(data):
    """Multiply each element of the input by 10"""
    return [i * 10 for i in data]

@task
def load(data):
    """Print the data to indicate it was received"""
    print("Here's your data: {}".format(data))

# Calling tasks inside the Flow context only builds the DAG;
# nothing executes until the flow is run.
with Flow('ETL') as flow:
    e = extract()
    t = transform(e)
    l = load(t)

flow.visualize()
```

----

![](https://i.imgur.com/mzjSgN6.png)

----

[cloud-scheduler](https://www.prefect.io/products/cloud-scheduler/)

![](https://i.imgur.com/L7YvMnn.png)

---

## Conclusion

- Prefect: nicer to use and much easier to get started with
- Metaflow: for data science; well suited to machine learning projects
- Airflow: for large projects with many users
{"metaMigratedAt":"2023-06-15T02:40:20.699Z","metaMigratedFrom":"Content","title":"Data Pipeline 套件比較: Airflow, MetaFlow, Prefect","breaks":true,"contributors":"[{\"id\":\"af59de6f-855f-49e9-b168-cab9f93454b9\",\"add\":0,\"del\":10},{\"id\":\"27e5f37e-eca5-4de1-8bad-e3fb269a144a\",\"add\":5921,\"del\":1553}]"}
    756 views
   Owned this note