# English Introduction
## Overview
### Comparison of Popularity
#### GitHub Stars
* Conductor: 12.9k
* Airflow: 33.6k
* Temporal: 9.3k
* Cadence: 7.7k
## Conductor
### Introduction
* The design goal of Conductor is to serve as a "**scheduling engine**" that can flexibly control each task.
* Conductor is built on **microservices**, capable of cross-service scheduling, and can quickly construct new workflows using existing services.
* Describes the execution flow with a **JSON DSL**, cleanly separating task definition from execution; Python support is also provided.
* Capable of processing multiple tasks concurrently, with **high scalability**.
* In the world of Conductor, there are two important concepts:
* Task: Represents a **concrete piece of work or operation**, which may involve calling an external service, processing data, triggering other tasks, and so on; can be defined in JSON.
* Workflow: Composed of many Tasks, defining the **dependencies between tasks**, execution order, branching, and so on; likewise defined in JSON.
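As a rough illustration of the JSON DSL, a two-task workflow definition might look like the following sketch, built here as a Python dict and serialized with the standard library. The task and workflow names (`fetch_data`, `transform_data`, `demo_workflow`) are hypothetical; the key names follow Conductor's documented workflow schema.

```python
import json

# A minimal sketch of a Conductor-style workflow definition.
workflow_def = {
    "name": "demo_workflow",
    "version": 1,
    "tasks": [
        {
            "name": "fetch_data",
            "taskReferenceName": "fetch_data_ref",
            "type": "SIMPLE",  # a worker-executed task
            "inputParameters": {"source": "${workflow.input.source}"},
        },
        {
            "name": "transform_data",
            "taskReferenceName": "transform_data_ref",
            "type": "SIMPLE",
            # wire the previous task's output into this task's input
            "inputParameters": {"rows": "${fetch_data_ref.output.rows}"},
        },
    ],
}

print(json.dumps(workflow_def, indent=2))
```

The `${...}` expressions are how the DSL passes data between tasks, which is also why task definition and execution stay cleanly separated: workers only ever see resolved inputs.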
### Architecture

* Through the API provided by Conductor, tasks, workflows, etc., can be easily managed.
### Data Storage Method
* According to official GitHub issues:
* Netflix officially utilizes Cassandra as its primary database.
* Additionally, they employ Redis as the underlying technology for the internal queuing system and for implementing rate-limiting DAO.
* Apart from these, it is also recommended to use MySQL or PostgreSQL.
[Source](https://github.com/Netflix/conductor/issues/2059)
### Advantages
* Capable of creating complex workflows.
* Execution status can be tracked at any time.
* Capable of processing tasks concurrently.
* Highly scalable, able to handle large workloads.
### Disadvantages
* Workflow definition is somewhat cumbersome, and debugging is not easy.
* The JSON DSL is not very readable.
### Recommended Use Cases
* Management of large-scale, microservices-based scenarios.
## Red node
All search results for "Red node" lead to **Node-RED**, a browser-based IoT development tool from IBM built on Node.js. It differs significantly from the other workflow frameworks compared here, so the keyword may be a typo; this section needs confirmation and updating.
## Airflow
### Introduction
* Airflow is a workflow management platform open-sourced by Airbnb, allowing developers to schedule and process data using **configuration as code** (Python) and control scheduling and monitor status through a WebUI.
* In the world of Airflow, there are several important concepts:
* Operators: Define the "**actual work content**." Besides the built-in common operations (running bash commands, Python functions, etc.), you can define custom work and use Hooks to connect to external services or databases; templating typically uses Jinja syntax.
* Tasks: Code **assigns** actual work content to produce Tasks, which are handed to a DAG that defines how they are executed. From my understanding, Airflow further splits what Conductor calls a task into Operators + Tasks.
* DAG: Equivalent to a "workflow" in Conductor. The entire workflow is defined as a **directed acyclic graph**, which defines the relationships between tasks and their execution order. Defined in Python.
* In Airflow, each task is an independent entity, and any data generated by the task will **disappear upon task completion**.
* In version 1.10 and later, Airflow can use the **Kubernetes Executor** to allocate different CPU, memory, and other settings for each task, maximizing resource utilization.
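The "configuration as code" idea can be sketched as a DAG file. This is a hypothetical example (DAG id, task ids, and commands are placeholders) assuming Airflow 2.4+; on older 2.x versions the `schedule` parameter is called `schedule_interval`.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def _transform():
    # placeholder for the actual transformation logic
    print("transforming rows")


# DAG id, task ids, and commands are hypothetical placeholders.
with DAG(
    dag_id="daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # cron expressions are also accepted
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    transform = PythonOperator(task_id="transform", python_callable=_transform)

    # >> declares a dependency edge of the directed acyclic graph
    extract >> transform
```

Because the whole workflow is ordinary Python, it can be versioned, reviewed, and generated programmatically, which is the core of the configuration-as-code approach.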

### Architecture

* Scheduler: Continuously scans the dependencies of current tasks (can be constructed into a directed acyclic graph) and schedules task execution according to the **specified time interval**.
* Metadata Database: Airflow stores important data (such as task status) in a PostgreSQL or MySQL database to **ensure data persistence** and **fault recovery**.
* Web server: Provides a **user interaction interface** to view task execution, task dependency status, log access, etc.
* Worker Nodes: Responsible for **executing tasks**.
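The scheduler's dependency resolution can be illustrated with the Python standard library: given the DAG's edges, a task becomes runnable only once all of its upstream tasks have finished. This is only a sketch of the idea (task names are hypothetical), not Airflow's actual scheduler.

```python
from graphlib import TopologicalSorter

# A small hypothetical DAG: each task maps to the set of upstream
# tasks it depends on.
dag = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}

ts = TopologicalSorter(dag)
ts.prepare()

order = []
while ts.is_active():
    # get_ready() yields every task whose upstreams are all done --
    # these could be dispatched to worker nodes in parallel.
    ready = sorted(ts.get_ready())
    order.append(ready)
    for task in ready:
        ts.done(task)

print(order)  # [['extract'], ['transform'], ['load', 'report']]
```

Note that `load` and `report` become ready in the same round: independent branches of the graph are natural candidates for parallel execution on different workers.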
### Data Storage Method
* According to Astronomer documentation:
* Any database supported by SQLAlchemy can be used to store Airflow's metadata.
* Recommended databases include:
* Postgres (most common)
* MySQL
* SQLite
[Source](https://docs.astronomer.io/learn/airflow-database)
### Advantages
* When dealing with **heavy workloads**, Airflow can maximize performance using the Kubernetes Executor.
### Disadvantages
* Airflow is **tightly coupled to Python**, from its underlying components to the way workflows are defined.
### Recommended Use Cases
* Handling relatively simple data migration tasks.
* Initiating Dataflow for data pipeline processing.
## Temporal
### Introduction
* Core concept: "Temporal is the simple, scalable open-source way to write and run reliable cloud applications."
* In the world of Temporal, there are several important concepts:
* Workflow: Equivalent to the "workflow" in Conductor, can be defined using **multiple languages** (including Go, Java, PHP, Python, TypeScript, etc.); Workflow Definition must be deterministic.

* Activity: Refers to **executing a single, clearly defined function or method**, equivalent to the "task" in Conductor.

### Architecture

* Workers: Can be seen as compute nodes responsible for actually executing programs; can **run multiple Workers in parallel** simultaneously; Workers are **stateless**, and different Workers do not communicate with each other.
* Temporal Service: Responsible for tracking the execution status of workflows, activities, and tasks; **multiple instances** can run under high load.
* Data store: Stores information that needs **persistent storage**, such as task queues, execution status, and logs; Cassandra and MySQL are currently officially supported.
* Web Console: The **Web UI** provided by Temporal, which allows viewing workflow information at any time.
* External Client: After connecting to the Temporal Service, can **trigger** the execution of workflows.
### Data Storage Method
* According to official documentation:
* Temporal supports various databases, including:
* MySQL
* PostgreSQL
* SQLite
* Elasticsearch (highly recommended)
[Source](https://www.restack.io/docs/temporal-knowledge-temporal-database-examples)
### Advantages
* Workflows are designed to be **fault-tolerant**: execution state and progress can be recovered after an error, and a workflow only returns a failure when the code it executes fails. The official documentation **highlights this as a key advantage**.
* Workflows have no execution time limit and can **run for a long time**.
* Scalable, able to **handle large numbers of workflows** simultaneously.
### Disadvantages
* Workflow Definition must be **deterministic**.
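Determinism matters because engines of this family recover a crashed workflow by replaying its recorded history. A toy replay loop (not Temporal's actual API; all names are hypothetical) shows why the workflow code must make the same calls, in the same order, on every run:

```python
def run_with_history(workflow_fn, history):
    """Run a workflow, replaying recorded activity results first.

    A toy sketch of history replay, the mechanism engines like Temporal
    use to recover workflow state; not Temporal's real API.
    """
    position = 0

    def execute_activity(fn, *args):
        nonlocal position
        if position < len(history):
            result = history[position]  # replay a recorded result
        else:
            result = fn(*args)          # first execution: run and record
            history.append(result)
        position += 1
        return result

    return workflow_fn(execute_activity)


def order_workflow(execute_activity):
    # Deterministic: the same activities are requested in the same order,
    # so replaying the history reproduces the exact pre-crash state.
    total = execute_activity(lambda: 40)               # e.g. charge_payment
    fee = execute_activity(lambda t: t // 20, total)   # e.g. compute_fee
    return total + fee


history = []
first = run_with_history(order_workflow, history)     # records results
replayed = run_with_history(order_workflow, history)  # pure replay
print(first, replayed, history)  # 42 42 [40, 2]
```

If the workflow branched on something non-deterministic (a random number, the current time), replay would request different activities than the history recorded, and recovery would break; this is exactly the constraint the disadvantage above describes.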
### Recommended Use Cases
* In scenarios where there is a need to **handle a large number of workflows simultaneously**, Temporal can still maintain high performance.
* When workflows need to run for a long time (several months).
## Cadence
### Introduction
* Cadence is very similar to Conductor.
* In the world of Cadence, there are also two important concepts corresponding to Conductor:
* Activities: Equivalent to a "task" in Conductor, defined in **Java or Go**.
* Workflow: Cadence workflows are **code-first**, so the process is easy to read directly from the code. The biggest difference from Conductor is that iteration does not use a **loop**; instead a workflow **recursively calls another workflow via Workflow.continueAsNew**. In addition, the Workflow.Sleep method can **pause a workflow for a long time** until it is woken at the right moment.
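The continue-as-new idea can be sketched in plain Python (Cadence itself uses Java/Go; all names here are illustrative): instead of looping forever and accumulating history, the workflow processes one batch and then restarts itself with fresh arguments.

```python
def poll_batch(cursor):
    """Hypothetical activity: pretend to fetch two items after `cursor`."""
    return [cursor + 1, cursor + 2], cursor + 2


def batch_workflow(cursor, processed, remaining_runs):
    # Process one batch per "run" of the workflow.
    items, cursor = poll_batch(cursor)
    processed = processed + items

    if remaining_runs <= 1:
        return processed
    # Instead of looping, restart the workflow with new arguments --
    # the analogue of Workflow.continueAsNew, which in a real engine
    # also truncates the accumulated event history.
    return batch_workflow(cursor, processed, remaining_runs - 1)


result = batch_workflow(0, [], 3)
print(result)  # [1, 2, 3, 4, 5, 6]
```

In a real engine each "recursive call" starts a brand-new workflow execution, so the event history never grows without bound even for workflows that effectively run forever.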
### Architecture

### Data Storage Method
* According to the official GitHub documentation, Cadence provides an API on the persistence layer, allowing the use of any database that supports multi-row transactions on a single shard or partition. This includes:
* Cassandra
* DynamoDB
* AuroraDB
* MySQL
* Postgres
[Source](https://github.com/uber/cadence/blob/master/docs/persistence.md)
### Advantages
* The **code-first** architecture makes maintenance and integration easier.
* More universal: the workflow back end can be replaced with Cadence while retaining the existing workflow definition language.
### Disadvantages
* **Dependent on many tools**, not suitable for use in a single service.
### Recommended Use Cases
* Workflow management for **complex distributed systems**.
## Summary
* Compared to Conductor:
* Airflow is simpler, has a larger community, but is not suitable for high-load work scenarios and is bound to Python.
* Temporal can handle workflow errors well.
* Cadence has a code-first feature but depends on more environments and is more suitable for large-scale application scenarios.
## References
1. Airflow vs. Cadence: [Link](https://www.instaclustr.com/blog/airflow-vs-cadence-a-side-to-side-comparison/)
2. Netflix Conductor vs. Temporal (Uber Cadence) vs. Airflow: [Link](https://medium.com/@chucksanders22/netflix-conductor-v-s-temporal-uber-cadence-v-s-zeebe-vs-airflow-320df0365948)
3. Uber Cadence vs. Netflix Conductor: [Link](https://instaclustr.medium.com/workflow-comparison-uber-cadence-vs-netflix-conductor-7a332b60cd12)
4. Airflow Architecture: A Deep Dive into Data Pipeline Orchestration: [Link](https://medium.com/@bageshwar.kumar/airflow-architecture-a-deep-dive-into-data-pipeline-orchestration-217dd2dbc1c3)
5. Apache Airflow: Data Pipeline Management Console: [Link](https://tech.hahow.in/apache-airflow-%E5%B7%A5%E4%BD%9C%E6%B5%81%E7%A8%8B%E7%AE%A1%E7%90%86%E6%8E%A7%E5%88%B6%E5%8F%B0-4dc8e6fc1a6a)
6. A Practical Approach to Temporal Architecture: [Link](https://mikhail.io/2020/10/practical-approach-to-temporal-architecture/)
# 中文介紹
## 概述
### 熱門程度比較
#### GitHub Star
* Conductor: 12.9k
* Airflow: 33.6k
* Temporal: 9.3k
* Cadence: 7.7k
## Conductor
### 介紹
* Conductor 的設計目標是 "**作為排程引擎**",可以靈活控制每個任務
* Conductor 建立在**微服務**上,能夠進行跨服務排程,可以利用現有的服務快速建構出新的 workflow
* 以 **JSON DSL** 來描述 execution flow,任務的定義跟執行清楚的被分開;也提供了 Python 支援
* 能夠同步處理多個 task,且具有**高度的可擴展性**
* 在 Conductor 的世界中,有兩個重要的觀念:
* Task: 代表需要執行的**具體工作或操作**,可能涉及呼叫外部 service、處理資料、觸發其他 task 等等。可以透過 JSON 定義
定義範例
* workflow: 由許多 Task 組成,定義了**任務之間的相依性**、執行順序以及分支等等,同樣可以透過 JSON 定義

### 架構

* 透過 Conductor 提供的 API ,便能輕鬆的管理 tasks、workflows 等等
### 資料儲存方式
* 根據官方 github 的 issues
* Netflix 官方使用 Cassandra 作為主要的資料庫
* 並使用了 Redis 作為 internal queuing system 的底層,以及實作 rate limiting dao
* 除此之外,也建議使用 MySQL 或 PostgreSQL
[Source](https://github.com/Netflix/conductor/issues/2059)
### 優點
* 能創建複雜的 workflow
* 可隨時追蹤執行情況
* 能同步處理任務
* 具有可擴展性,能處理大量的工作
### 缺點
* workflow 定義略為麻煩、debug 不易
* JSON DSL 可讀性不高
### 建議的使用情境
* 管理**大規模**,且以微服務為主要架構的場景
## Red node
全部有關 Red node 的搜尋結果都導向至 "Node-Red",但 Node-Red 是 IBM 推出的一套架構在 Node.js,使用瀏覽器介面的強大物聯網開發工具,跟其他 workflow 框架差異頗大,可能是打錯關鍵字,此部分有待確認與更新。
## Airflow
### 介紹
* Airflow 是一個由 Airbnb 開源的工作流程管理平台,讓開發者可以用 **configuration as code**(Python)的方式排程處理資料,且搭配 WebUI 控制排程、監控狀態
* 在 Airflow 的世界中,有幾個重要觀念:
* Operators: 定義了 "**實際的工作內容**",除了內建常見的工作內容 (執行 bash 指令、python function 等等)外,還可以自訂工作內容,並利用 Hooks 連結外部 service 或資料庫,通常使用 Jinja 語法
* Tasks: 可以透過程式碼**指派**實際的工作內容,進而產生 Tasks,並交給 DAG 定義這些 Tasks 的執行方式。就我的理解而言,Airflow 將 Conductor 中的 task 進一步分成 Operators + Tasks
* DAG: 相當於 conductor 中的 "workflow",整個 workflow 被定義成一張**有向無環圖**,這樣做便可**定義各個任務間的關係,以及執行順序**。由 python 定義
* 在 Airflow 中,每個工作都是獨立的實體,所產生的任何資料**會隨著工作完成消失**
* 在 1.10 版以後, Airflow 能使用 **Kubernetes Executor**,根據每個工作分配不同的 CPU、memory、設定等,達到資源利用最大化

### 架構

* Scheduler: 不斷掃描當前任務的相依關係 (可以建立成一張有向無環圖),並根據**設定的指定時間間隔**來安排任務的執行
* Metadata Database: Airflow 將重要資料 (例如任務狀態) 存入 PostgreSQL 或 MySQL 資料庫,以**確保資料能持久化保存**,以及**故障復原**
* Web server: 提供**使用者互動介面**,可以查看任務執行、任務相依狀態、log 存取等等
* Worker Nodes: 負責**執行任務**
### 資料儲存方式
* 根據 Astronomer documentation
* 任何支援 SQLAlchemy 的資料庫都可以拿來儲存 Airflow 的 metadata
* 建議使用的資料庫有
* Postgres (最常見)
* MySQL
* SQLite
[Source](https://docs.astronomer.io/learn/airflow-database)
### 優點
* 在**工作量大**的時候,能使用 Kubernetes Executor 使效能最大化
### 缺點
* airflow **跟 python 高度相關**,包含其底層以及 workflow 的定義方式等等
### 建議的使用情景
* 負責處理一些較單純的資料搬移工作
* 啟動 Dataflow 處理資料管道
## Temporal
### 介紹
* 核心概念: Temporal is the simple, scalable open source way to write and run reliable cloud applications
* 在 Temporal 的世界中,有幾個重要觀念:
* workflow: 相當於 conductor 中的 "workflow",可以**由多種語言來定義** (包含 Go、Java、PHP、Python、TypeScript 等等);Workflow Definition 必須是確定性的

* Activity: 指的是**執行單一、明確定義的函式或 method**,相當於 conductor 中的 "task"

### 架構

* Workers: 可以被視作是 compute nodes,負責實際執行程式;可以**同時平行運行多個 Workers**;Workers 是**無狀態的**,且不同 Worker 間不會互相溝通
* Temporal Service: 負責追蹤 workflows, activities 及 tasks 的執行狀況,在**高負載時可以有多個**
* Data store: 儲存了 task queue、執行狀態、log 等等需要**持久化保存的必要資訊**,目前官方支援 Cassandra 和 MySQL
* Web Console: Temporal 提供的 **Web UI**,能隨時查看 workflow 等等資訊
* External Client: 連接至 Temporal Service 後,便能**觸發** workflow 的執行
### 資料儲存方式
* 根據官方 documentation
* Temporal 支援多種資料庫,包括
* MySQL
* PostgreSQL
* SQLite
* Elasticsearch (最推薦)
[Source](https://www.restack.io/docs/temporal-knowledge-temporal-database-examples)
### 優點
* workflow 特別設計成**可容錯**,發生錯誤時能夠復原 workflow 的執行狀態跟進度;workflow 只可能因為其執行的程式碼出錯而回傳執行失敗的結果;這也是**官方強調的優點**
* workflow 無執行時間上限,**能長時間執行**
* 具有可擴展性,能**同時處理大量 workflow**
### 缺點
* Workflow Definition 必須是**確定性**的
### 建議的使用情景
* 在面對需要**同時處理大量 workflow 的場景**,Temporal 仍能保持高性能
* 需要長時間 (數月) 運行 workflow 時
## Cadence
### 介紹
* Cadence 與 conductor 非常像
* 在 Cadence 的世界中,同樣有兩個與 Conductor 對應的重要概念:
* Activities: 相當於 conductor 中的 "task",透過 JAVA 或 GO 來定義
* workflow: Cadence workflow 的特性在於其是 **code first** 的,能夠透過程式碼輕易的得知這個 workflow 的工作流程;與 Conductor 最大的不同是不需要**迴圈**循環,而是**透過 Workflow.continueAsNew 遞迴的呼叫另一個 workflow 來實現**,這是 Cadence 的一大特點;此外,Workflow.Sleep 這個方法可以**長時間暫停這個 workflow**,直到在正確的時機被叫醒
### 架構

### 資料儲存方式
* 根據官方 github,Cadence 在 persistence layer 上提供了 API,故任何支援 multi-row transactions on a single shard 或是 partition 的資料庫都可被使用,包含
* Cassandra
* DynamoDB
* AuroraDB
* MySQL
* Postgres
[Source](https://github.com/uber/cadence/blob/master/docs/persistence.md)
### 優點
* code-first 的架構使得維護與整合較為輕鬆
* 較為通用,可在保留 workflow definition language 的同時,將 workflow 後端替換成 Cadence
### 缺點
* **依賴的工具多**,不適合單一服務使用
### 建議的使用情景
* **複雜**的分佈式系統的 workflow 管理
## 總結
* 相較於 conductor
* Airflow 較簡單、社群較大,但不適合高負載的工作場景,且綁定 python
* Temporal 則能很好的處理 workflow 的錯誤情況
* Cadence 有著 code-first 的特性,但依賴的環境較多,較適用於大型的應用場景
## 參考資料
1. Airflow vs. Cadence: [Link](https://www.instaclustr.com/blog/airflow-vs-cadence-a-side-to-side-comparison/)
2. Netflix Conductor vs. Temporal (Uber Cadence) vs. Airflow: [Link](https://medium.com/@chucksanders22/netflix-conductor-v-s-temporal-uber-cadence-v-s-zeebe-vs-airflow-320df0365948)
3. Uber Cadence vs. Netflix Conductor: [Link](https://instaclustr.medium.com/workflow-comparison-uber-cadence-vs-netflix-conductor-7a332b60cd12)
4. Airflow Architecture: A Deep Dive into Data Pipeline Orchestration: [Link](https://medium.com/@bageshwar.kumar/airflow-architecture-a-deep-dive-into-data-pipeline-orchestration-217dd2dbc1c3)
5. Apache Airflow: Data Pipeline Management Console: [Link](https://tech.hahow.in/apache-airflow-%E5%B7%A5%E4%BD%9C%E6%B5%81%E7%A8%8B%E7%AE%A1%E7%90%86%E6%8E%A7%E5%88%B6%E5%8F%B0-4dc8e6fc1a6a)
6. A Practical Approach to Temporal Architecture: [Link](https://mikhail.io/2020/10/practical-approach-to-temporal-architecture/)