# Airflow Installation

###### tags: `system`

References:
- [Airflow documentation](https://airflow.apache.org/docs/apache-airflow/stable/start/index.html)

## Contents
1. Normal Install
2. Install in docker-files

## Getting started
Activate the Python environment:
`source venv/bin/activate`

## Installation
### 1. Normal Install
Install Airflow quickly with `pip`.
- Set the Airflow home directory: `export AIRFLOW_HOME=~/airflow`
- Install Airflow: `pip install apache-airflow`
- Choose the database: SQLite is the default. To change it, see **a. setting db** below.
- Initialize the database: `airflow db init`
- Start the webserver: `airflow webserver -p 8080`
- Start the scheduler: `airflow scheduler` (the webserver and the scheduler both run in the foreground, so use a separate terminal for each)

> No user yet created, use flask fab command to do it:
> `FLASK_APP=airflow.www.app flask fab create-admin`

Airflow 2.x also provides the `airflow users create` command for creating users.

<img src="https://i.imgur.com/s5dKYvH.png" width=500>

- Done

<img src='https://i.imgur.com/xmu5Dzc.png' width=500>

### a. setting db
#### a.1 `MySQL & MariaDB`
1. Edit `airflow.cfg` and set `sql_alchemy_conn` to the connection string.

   With the `mysqldb` driver:
   ```
   mysql+mysqldb://<user>:<password>@<host>[:<port>]/<dbname>
   ```
   or with `mysqlconnector`:
   ```
   mysql+mysqlconnector://<user>:<password>@<host>[:<port>]/<dbname>
   ```
2. Create the Airflow database and user
   ```
   CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
   CREATE USER 'airflow_user' IDENTIFIED BY 'airflow_pass';
   GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user';
   ```
3. Initialize the database
   ```
   airflow db init
   ```

#### a.2 `Postgres`
1. Edit `airflow.cfg` and set `sql_alchemy_conn` to the connection string
   ```
   postgresql+psycopg2://<user>:<password>@<host>/<db>
   ```
2. Create the Airflow database and user
   ```
   CREATE DATABASE airflow_db;
   CREATE USER airflow_user WITH PASSWORD 'airflow_pass';
   GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
   ```
3. Initialize the database
   ```
   airflow db init
   ```

#### a.3 `Set the time zone`
In `airflow.cfg`, set:
`default_timezone = Asia/Taipei`

#### a.4 `Deploy Airflow as a service`
1. Create the services. Create the unit files under `/lib/systemd/system`:
   - `touch airflow-schedule.service`
   - `touch airflow-webserver.service`
2. Edit the service settings

   `systemctl edit --full airflow-schedule.service`
   `systemctl edit --full airflow-webserver.service`

   `airflow-schedule.service`
   ```
   [Unit]
   Description=airflow scheduler: handles all DAG schedules
   [Service]
   User=root
   Group=root
   WorkingDirectory=/var/www/html/auto-service/airflow
   ExecStart=/bin/bash -c 'ENV=dev /usr/local/python3/bin/python3 app.py'
   [Install]
   WantedBy=multi-user.target
   ```
3. Reload the systemd unit files
   `systemctl daemon-reload`
4. Start the services
   `systemctl start airflow-schedule.service`
   `systemctl start airflow-webserver.service`
5. Check the service status
   `systemctl status airflow-schedule.service`
   `systemctl status airflow-webserver.service`

#### Troubleshooting
- `pip install psycopg2` fails
  - `pg_config executable not found`

  <img src="https://i.imgur.com/24lXUPT.png" width=300>

  Fix: install the PostgreSQL development headers, which provide `pg_config`:
  `sudo yum install postgresql-devel`

---
### 2. Install in docker-files
1. Prerequisites
   - Install [Docker](https://docs.docker.com/engine/install/)
   - Install docker-compose: `pip install docker-compose`

   <img src="https://i.imgur.com/09nUWUq.png" width=500>
2. Get the Airflow compose file
   - Create a folder
     ```
     mkdir -p ./airflow
     ```
   - Enter the folder
     ```
     cd ./airflow
     ```
   - Fetch the **Airflow compose file**
     ```
     curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.1.3/docker-compose.yaml'
     ```
   - Review and edit the `docker-compose` file as needed. The Airflow compose file defines several services:
     - `airflow-scheduler` - The scheduler. It monitors all tasks and DAGs (directed acyclic graphs) and triggers task instances once their dependencies are complete.
     - `airflow-webserver` - The web server, available at `http://localhost:8080`.
     - `airflow-worker` - The worker that executes the tasks assigned by the scheduler.
     - `airflow-init` - The initialization service.
     - `flower` - The `flower` app for monitoring the environment, available at `http://localhost:5555`.
     - `postgres` - The database.
     - `redis` - The broker that forwards messages from the scheduler to the workers.

     Together these services let you run Airflow with [CeleryExecutor](https://airflow.apache.org/docs/apache-airflow/stable/executor/celery.html). For more information, see the [Architecture Overview](https://airflow.apache.org/docs/apache-airflow/stable/concepts/overview.html).
   - Some directories are mounted into the containers, so their contents are synchronized and shared between the host and the containers. Set the paths to choose which folders are mounted:
     - `./dags` - Put your DAG files here.
     - `./logs` - Contains logs from task execution and the scheduler.
     - `./plugins` - Put your custom plugins here.
3. Initialize the environment
   - Create the required folders
     ```
     mkdir -p ./dags ./logs ./plugins
     ```
     <img src="https://i.imgur.com/B6sc4jO.png" width=300>
   - Create the environment file
     ```
     echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
     ```
   - Initialize the database
     ```
     docker-compose up airflow-init
     ```
     <img src="https://i.imgur.com/pHERxOh.png" width=500>
4. Deploy
   ```
   docker-compose up
   ```
   <img src="https://i.imgur.com/lMg3UIK.png" width=500>
5. Start using Airflow at `http://localhost:8080/`

   <img src='https://i.imgur.com/xmu5Dzc.png' width=500>

   - Default login:
     - account: `airflow`
     - password: `airflow`

### a. Troubleshooting
Change a `.conf` file inside a Docker container.

Get the container ID:
```
docker ps -a
```
<img src="https://i.imgur.com/03KfWrt.png" width=1000>

Copy the config file from the container to a local folder (replace `2583837c3867` with your own container ID):
```
sudo docker cp 2583837c3867:/var/lib/postgresql/data/postgresql.conf /etc/postgresql/
```
Edit the `.conf` file:
```
vi /etc/postgresql/postgresql.conf
```
Copy the edited file back into the container:
```
sudo docker cp /etc/postgresql/postgresql.conf 2583837c3867:/var/lib/postgresql/data/postgresql.conf
```

---
## Deploying Airflow to Apache
- Create the unit files under `/lib/systemd/system`

`sudo systemctl edit --full airflow-webserver.service`
```
[Unit]
Description=airflow webserver
[Service]
User=atrustek
Group=atrustek
WorkingDirectory=/home/atrustek/airflow
ExecStart=/bin/bash -c 'source /home/atrustek/Documents/financial_process/venv/bin/activate && airflow webserver -p 8080'
[Install]
WantedBy=multi-user.target
```
`sudo systemctl edit --full airflow-scheduler.service`
```
[Unit]
Description=airflow scheduler
[Service]
User=atrustek
Group=atrustek
WorkingDirectory=/home/atrustek/airflow
ExecStart=/bin/bash -c 'source /home/atrustek/Documents/financial_process/venv/bin/activate && airflow scheduler'
[Install]
WantedBy=multi-user.target
```
**Reload the systemd unit files**
`systemctl daemon-reload`

**Start the services**
`systemctl start airflow-webserver.service`
`systemctl start airflow-scheduler.service`
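
The webserver and scheduler unit files above differ only in their description and in the Airflow subcommand they run, so both can be generated from one template. The following is a minimal sketch and not part of the original setup: `UNIT_TEMPLATE`, `render_unit`, and `SERVICES` are hypothetical names, the group is assumed to equal the user, and the user, working directory, and venv path are simply the values used in this note. It calls the `airflow` executable from the activated venv directly.

```python
from string import Template

# Template modelled on the unit files in this note; $-placeholders are filled per service.
UNIT_TEMPLATE = Template("""\
[Unit]
Description=$description

[Service]
User=$user
Group=$user
WorkingDirectory=$workdir
ExecStart=/bin/bash -c 'source $venv/bin/activate && airflow $command'

[Install]
WantedBy=multi-user.target
""")


def render_unit(description, command, user="atrustek",
                workdir="/home/atrustek/airflow",
                venv="/home/atrustek/Documents/financial_process/venv"):
    """Return the text of a systemd unit that runs `airflow <command>` inside the venv.

    The defaults are the example values from this note (assumptions, adjust per host).
    """
    return UNIT_TEMPLATE.substitute(
        description=description, command=command,
        user=user, workdir=workdir, venv=venv,
    )


# Hypothetical mapping: unit file name -> (description, airflow subcommand).
SERVICES = {
    "airflow-webserver.service": ("airflow webserver", "webserver -p 8080"),
    "airflow-scheduler.service": ("airflow scheduler", "scheduler"),
}

if __name__ == "__main__":
    # Print both unit files so they can be reviewed before installing them.
    for name, (description, command) in SERVICES.items():
        print(f"# ---- {name} ----")
        print(render_unit(description, command))
```

Running the script prints both unit files; each one can be pasted into the editor opened by `sudo systemctl edit --full <name>`. Keeping the names in one mapping also avoids mismatches between the unit file name and the name passed to `systemctl start`.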