# Data Pipeline Tool Comparison: Airflow, Metaflow, Prefect
###### tags: `anue`
> [time=Thu, Dec 19, 2019 3:30 PM]
- Airflow
- Metaflow
- Prefect

Key concepts: DAG, Operators
---
## Comparison
| Tool | Owner | Feature | Cloud Services Support | GitHub Created At |
| ------ | ------ | -------- | ------ | ----- |
| Airflow | Airbnb / Apache | Feature-complete | GCP / AWS | 2015-04-13 |
| Metaflow | Netflix | Saves every result | AWS | 2019-09-17 |
| Prefect | PrefectHQ | Lightweight | GCP / AWS | 2018-06-29 |
## Airflow
----
## DAG
```python
"""
Code that goes along with the Airflow tutorial located at:
https://github.com/apache/airflow/blob/master/airflow/example_dags/tutorial.py
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 6, 1),
    'email': ['airflow@example.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG(
    'tutorial', default_args=default_args, schedule_interval=timedelta(days=1))

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
    task_id='print_date',
    bash_command='date',
    dag=dag)

t2 = BashOperator(
    task_id='sleep',
    bash_command='sleep 5',
    retries=3,
    dag=dag)

templated_command = """
{% for i in range(5) %}
    echo "{{ ds }}"
    echo "{{ macros.ds_add(ds, 7)}}"
    echo "{{ params.my_param }}"
{% endfor %}
"""

t3 = BashOperator(
    task_id='templated',
    bash_command=templated_command,
    params={'my_param': 'Parameter I passed in'},
    dag=dag)

t2.set_upstream(t1)
t3.set_upstream(t1)
```
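The same dependencies can also be declared with Airflow's bitshift operators, which read in execution order:
```python
# Equivalent to the set_upstream() calls above: t1 runs before t2 and t3
t1 >> [t2, t3]
```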
----

---
## Metaflow
[Metaflow tutorials](https://docs.metaflow.org/getting-started/tutorials)
- Netflix
- It automatically saves every result your code produces to S3, which makes flows portable and makes it easy to pick up from failed tasks
- Data and runs from different points in time are all packaged inside the flow
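
A minimal sketch of that idea, using the Metaflow client API to look up stored runs (assuming the HelloFlow example below has already been run at least once):
```python
# Inspect runs that Metaflow has stored for a flow (client API sketch).
# Assumes the HelloFlow example below has been executed at least once.
from metaflow import Flow

run = Flow('HelloFlow').latest_run      # most recent execution
print(run.id, run.successful, run.finished_at)

# Each step's results are persisted, so past steps can be inspected too.
for step in run:
    print(step.id)
```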
----
```python
# 00-helloworld/helloworld.py
from metaflow import FlowSpec, step
class HelloFlow(FlowSpec):
    """
    A flow where Metaflow prints 'Hi'.

    Run this flow to validate that Metaflow is installed correctly.
    """

    @step
    def start(self):
        """
        This is the 'start' step. All flows must have a step named 'start' that
        is the first step in the flow.
        """
        print("HelloFlow is starting.")
        self.next(self.hello)

    @step
    def hello(self):
        """
        A step for metaflow to introduce itself.
        """
        print("Metaflow says: Hi!")
        self.next(self.end)

    @step
    def end(self):
        """
        This is the 'end' step. All flows must have an 'end' step, which is the
        last step in the flow.
        """
        print("HelloFlow is all done.")


if __name__ == '__main__':
    HelloFlow()
```
----
```bash
python 00-helloworld/helloworld.py show
```

----
```bash
python 00-helloworld/helloworld.py run
```
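
Because every intermediate result is stored, a failed run does not have to start over; `resume` restarts from the failed step and reuses the results that already exist:
```bash
# Restart the latest run from the step that failed, reusing stored results
python 00-helloworld/helloworld.py resume
```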

---
## Prefect
[The Prefect Blog on Medium](https://medium.com/the-prefect-blog)
----

----
- Integrates with Dask for distributed execution (see the run sketch after the ETL example below)
- Like Airflow, but with a cleaner design
- Minimal effort to get started
- Uses the `task` decorator to declare a node in the DAG
- Lightweight
----
```python
from prefect import task, Flow
@task
def extract():
    """Get a list of data"""
    return [1, 2, 3]


@task
def transform(data):
    """Multiply the input by 10"""
    return [i * 10 for i in data]


@task
def load(data):
    """Print the data to indicate it was received"""
    print("Here's your data: {}".format(data))


with Flow('ETL') as flow:
    e = extract()
    t = transform(e)
    l = load(t)

flow.visualize()
```
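
The block above only builds and visualizes the flow. A minimal sketch of actually executing it, locally and with the Dask integration mentioned earlier (assuming the Prefect 0.x API and Prefect installed with Dask support):
```python
from prefect.engine.executors import DaskExecutor

# Run the ETL flow defined above in the local process
state = flow.run()

# Or let Dask execute the tasks, potentially in parallel (Prefect 0.x API)
state = flow.run(executor=DaskExecutor())
print(state.is_successful())
```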
----

----
[cloud-scheduler](https://www.prefect.io/products/cloud-scheduler/)
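
Besides the hosted scheduler, a schedule can be attached to a flow directly; a minimal sketch assuming the Prefect 0.x schedules API:
```python
from datetime import timedelta

from prefect import task, Flow
from prefect.schedules import IntervalSchedule


@task
def say_hello():
    print("hello from a scheduled run")


# Trigger the flow once an hour (Prefect 0.x schedules API)
schedule = IntervalSchedule(interval=timedelta(hours=1))

with Flow('scheduled-hello', schedule=schedule) as flow:
    say_hello()

flow.run()  # keeps running and fires the flow on each scheduled interval
```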

----
## Conclusion
- Prefect: the nicest to use and the easiest to get started with
- Metaflow: built for data science; a good fit for machine learning projects
- Airflow: a good fit for large projects with many users