[ML] 8.1: Elyra Airflow & Minio
===
###### tags: `ML`, `Workflow Control`, `MLOps`, `Pipeline Automation`
<br>
[TOC]
<br>
:::info
:bulb: **Test sites**
- ### JupyterLab + Elyra extension
http://10.78.26.241:8889/lab?token=95a65ee10228a6611e69a17b0c6e428dc4b8e891b489e856
- ### Airflow-web
http://10.78.26.241:9090/
admin/admin
- ### Minio
http://10.78.26.241:9000/
minioadmin/minioadmin
:::
# Elyra
## Overview
- An extension for JupyterLab notebooks
- Features
    - A tool for orchestrating workflows, i.e. arranging the execution order of the nodes
    - Pipelines can run locally, or remotely on Airflow / Kubeflow platforms
## [elyra-ai](https://github.com/elyra-ai/elyra)
- [hello_world](https://github.com/elyra-ai/examples/tree/master/pipelines/hello_world)
- [hello_world_apache_airflow](https://github.com/elyra-ai/examples/tree/master/pipelines/hello_world_apache_airflow)

- [Analyzing COVID-19 time series data](https://github.com/CODAIT/covid-notebooks)
<br>
## Installing Elyra
- ### [Elyra Documentation](https://elyra.readthedocs.io/en/latest/getting_started/installation.html)
- ### Prerequisites
    - [Node.js 12+](https://nodejs.org/en/)
    - [How can I update my nodeJS to the latest version?](https://askubuntu.com/questions/426750/how-can-i-update-my-nodejs-to-the-latest-version)

Before:
```
$ node --version
v10.15.3
```
After:
```
$ node --version
v14.16.1
```
- Python 3.x
- ### Three ways to install
    - pip
    - conda
    - Build from source
- ### Installing with pip
Create a virtual environment:
```bash
$ virtualenv -p python3 elyra_env
$ source elyra_env/bin/activate
```
Install Elyra:
```bash
# for Ubuntu
$ pip3 install elyra && jupyter lab build
```
Start JupyterLab:
```bash
$ jupyter lab --debug
```
After startup, the Launcher tab shows four new options.

- ### Running directly via Docker
```bash
# this command does not work as-is
$ docker run -it -p 8888:8888 elyra/elyra:dev jupyter lab --debug
```
- Do not pass `-d` (daemon)
    - otherwise you cannot see the log
    - and the log contains the token needed to log in
- Optional arguments
    - `-v source_path:dest_path` (mount files into the container)
    - `-w target_workspace` (set the working directory)
- Hit error: Permission denied
  The user inside the container is `jovyan`

- `-u root:root`
- Then another error
```
Running as root is not recommended. Use --allow-root to bypass.
```
- `--allow-root`
- Full command
```
$ docker run --rm -it \
    -v ~/tj_tsai/workspace/elyra_workflow_20210428:/workspace \
    -w /workspace \
    -p 18888:8888 \
    -u root:root \
    elyra/elyra:latest jupyter lab --debug --allow-root
```
- ### Deploying to K8s
```yaml=
apiVersion: v1
kind: Pod
metadata:
  name: tj-elyra-pod
  labels:
    app: tj-elyra-server
spec:
  containers:
    - name: tj-elyra-container
      image: elyra/elyra:latest
      command: ["jupyter", "lab", "--debug"]
      ports:
        - containerPort: 8888
---
apiVersion: v1
kind: Service
metadata:
  name: tj-elyra-service
spec:
  type: NodePort
  ports:
    - targetPort: 8888
      port: 80
      nodePort: 30002
  selector:
    app: tj-elyra-server
```
- ### Customizing the runtime image
    - [Creating a custom runtime container image](https://elyra.readthedocs.io/en/latest/recipes/creating-a-custom-runtime-image.html)
<br>
## Test examples
- [Download the examples](https://github.com/elyra-ai/examples/tree/master/pipelines/hello_world#setup)
```
git clone https://github.com/elyra-ai/examples.git
```
<br>
## Dependencies: the referenced files must exist
- ### Configuration

- ### File not found: penguin.csv
  A file declared as a node dependency (here `penguin.csv`) must exist relative to the notebook before the pipeline is submitted; otherwise Elyra cannot build the dependency archive it uploads to object storage, and the submission fails as below.

```
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 275, in _upload_dependencies_to_object_store
dependency_archive_path = self._generate_dependency_archive(operation)
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 261, in _generate_dependency_archive
archive_artifact = create_temp_archive(archive_name=archive_artifact_name,
File "/opt/conda/lib/python3.8/site-packages/elyra/util/archive.py", line 124, in create_temp_archive
raise FileNotFoundError(filenames_set - matched_set) # Only include the missing filenames
FileNotFoundError: {'penguin.csv'}
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/conda/lib/python3.8/site-packages/tornado/web.py", line 1704, in _execute
result = await result
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/handlers.py", line 89, in post
response = await PipelineProcessorManager.instance().process(pipeline)
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 78, in process
res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
File "/opt/conda/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor_airflow.py", line 70, in process
pipeline_filepath = self.create_pipeline_file(pipeline=pipeline,
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor_airflow.py", line 249, in create_pipeline_file
notebook_ops = self._cc_pipeline(pipeline, pipeline_name)
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor_airflow.py", line 210, in _cc_pipeline
self._upload_dependencies_to_object_store(runtime_configuration,
File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 295, in _upload_dependencies_to_object_store
raise FileNotFoundError("Node '{}' referenced dependencies that were not found: {}".
FileNotFoundError: Node 'step2_select_coumns' referenced dependencies that were not found: {'penguin.csv'}
```
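A quick pre-flight check helps avoid this before submitting the pipeline (a hypothetical helper; adjust the notebook directory and the dependency list to your own pipeline):
```python
from pathlib import Path

# hypothetical pre-flight check: every file listed under a node's
# "File Dependencies" must exist relative to its notebook before submission
notebook_dir = Path(".")          # directory containing the node's notebook
dependencies = ["penguin.csv"]    # dependencies declared on the node

for dep in dependencies:
    path = notebook_dir / dep
    print(f"{path}: {'found' if path.exists() else 'MISSING'}")
```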
<br>
## Running on Airflow
- ### Install Airflow
    - See the [**Installing Airflow**](#Installing-Airflow) section below
- ### Configuring the Apache Airflow runtime
  ![](https://i.imgur.com/mQhc5nY.png)
    - [GitHub personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)
    - [Runtime Configuration](https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html)
        - [GitHub Personal Access Token (github_repo_token)](https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html#github-personal-access-token-github-repo-token)
    - When using AWS for S3, it is unclear how to obtain the bucket endpoint
    - Minio is used here instead; see the [**Installing Minio**](#Installing-Minio) section below
- ### Launching a run


- ### Successful run
==Elyra:==

<br>
==Gitlab status:==


<br>
==Minio status:==

<br>
==Airflow status:==
![](https://i.imgur.com/jwL4x1Z.png)
It is not clear why nothing shows up in Airflow.
- ### Full DAG code for the workflow

```python=
from airflow import DAG
from airflow_notebook.pipeline import NotebookOp
from airflow.utils.dates import days_ago

# Setup default args with older date to automatically trigger when uploaded
args = {
    "project_id": "tj-airflow-0504161135",
}

dag = DAG(
    "tj-airflow-0504161135",
    default_args=args,
    schedule_interval="@once",
    start_date=days_ago(1),
    description="Created with Elyra 2.2.4 pipeline editor using tj-airflow.pipeline.",
    is_paused_upon_creation=False,
)

notebook_op_2e2461b2_6916_410d_99e3_bc229e779441 = NotebookOp(
    name="load_data",
    namespace="default",
    task_id="load_data",
    notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/load_data.ipynb",
    cos_endpoint="http://10.78.26.241:9000",
    cos_bucket="minio-workspace",
    cos_directory="tj-airflow-0504161135",
    cos_dependencies_archive="load_data-2e2461b2-6916-410d-99e3-bc229e779441.tar.gz",
    pipeline_outputs=[],
    pipeline_inputs=[],
    image="amancevice/pandas:1.1.1",
    in_cluster=True,
    env_vars={
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
        "ELYRA_ENABLE_PIPELINE_INFO": "True",
    },
    config_file="None",
    dag=dag,
)

notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407 = NotebookOp(
    name="Part_1___Data_Cleaning",
    namespace="default",
    task_id="Part_1___Data_Cleaning",
    notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/Part 1 - Data Cleaning.ipynb",
    cos_endpoint="http://10.78.26.241:9000",
    cos_bucket="minio-workspace",
    cos_directory="tj-airflow-0504161135",
    cos_dependencies_archive="Part 1 - Data Cleaning-e7290f33-0616-4879-a00b-643fc5f0e407.tar.gz",
    pipeline_outputs=[],
    pipeline_inputs=[],
    image="amancevice/pandas:1.1.1",
    in_cluster=True,
    env_vars={
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
        "ELYRA_ENABLE_PIPELINE_INFO": "True",
    },
    config_file="None",
    dag=dag,
)

(
    notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
    << notebook_op_2e2461b2_6916_410d_99e3_bc229e779441
)

notebook_op_167f5cee_4145_450f_9c0d_d43fd8607ed3 = NotebookOp(
    name="Part_2___Data_Analysis",
    namespace="default",
    task_id="Part_2___Data_Analysis",
    notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/Part 2 - Data Analysis.ipynb",
    cos_endpoint="http://10.78.26.241:9000",
    cos_bucket="minio-workspace",
    cos_directory="tj-airflow-0504161135",
    cos_dependencies_archive="Part 2 - Data Analysis-167f5cee-4145-450f-9c0d-d43fd8607ed3.tar.gz",
    pipeline_outputs=[],
    pipeline_inputs=[],
    image="amancevice/pandas:1.1.1",
    in_cluster=True,
    env_vars={
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
        "ELYRA_ENABLE_PIPELINE_INFO": "True",
    },
    config_file="None",
    dag=dag,
)

(
    notebook_op_167f5cee_4145_450f_9c0d_d43fd8607ed3
    << notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
)

notebook_op_d8c73222_f332_4134_b8a2_83cb1cd2a066 = NotebookOp(
    name="Part_3___Time_Series_Forecasting",
    namespace="default",
    task_id="Part_3___Time_Series_Forecasting",
    notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/Part 3 - Time Series Forecasting.ipynb",
    cos_endpoint="http://10.78.26.241:9000",
    cos_bucket="minio-workspace",
    cos_directory="tj-airflow-0504161135",
    cos_dependencies_archive="Part 3 - Time Series Forecasting-d8c73222-f332-4134-b8a2-83cb1cd2a066.tar.gz",
    pipeline_outputs=[],
    pipeline_inputs=[],
    image="amancevice/pandas:1.1.1",
    in_cluster=True,
    env_vars={
        "AWS_ACCESS_KEY_ID": "minioadmin",
        "AWS_SECRET_ACCESS_KEY": "minioadmin",
        "ELYRA_ENABLE_PIPELINE_INFO": "True",
    },
    config_file="None",
    dag=dag,
)

(
    notebook_op_d8c73222_f332_4134_b8a2_83cb1cd2a066
    << notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
)
```
![](https://i.imgur.com/Wpnwod3.png)
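The three parenthesised `<<` expressions at the end define the dependencies: Part 1 runs after `load_data`, and Parts 2 and 3 both depend on Part 1. An equivalent, more readable sketch using short aliases:
```python
# shorter aliases for the four operators generated above
load = notebook_op_2e2461b2_6916_410d_99e3_bc229e779441
part1 = notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
part2 = notebook_op_167f5cee_4145_450f_9c0d_d43fd8607ed3
part3 = notebook_op_d8c73222_f332_4134_b8a2_83cb1cd2a066

# the same dependency graph expressed with the >> operator
load >> part1
part1 >> [part2, part3]   # Parts 2 and 3 can run in parallel after Part 1
```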
<br>
## References
- ### [Running notebook pipelines in JupyterLab](https://medium.com/codait/running-notebook-pipelines-locally-in-jupyterlab-1fae14b8e081)

- ### [Creating notebook pipelines using Elyra and Kubeflow Pipelines](https://medium.com/codait/creating-notebook-pipelines-using-elyra-and-kubeflow-pipelines-f9606449cc53)
    - Purpose of the S3 objects: store the code to run plus its input/output files
    - Execution flow (see the sketch at the end of this section):
        - When a node is about to run:
          input: download the related files (a compressed archive)
          output: upload the results back
          then continue with the next node (nodes may run in parallel)
    - Example input/output files



<br>
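The per-node I/O pattern described above can be sketched in a few lines of boto3 against the MinIO endpoint (illustration only, not Elyra's actual bootstrapper; the uploaded output filename is hypothetical):
```python
import tarfile
import boto3

# connect to the MinIO (S3-compatible) endpoint used by the runtime configuration
s3 = boto3.client(
    "s3",
    endpoint_url="http://10.78.26.241:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

bucket = "minio-workspace"
prefix = "tj-airflow-0504161135"
archive = "load_data-2e2461b2-6916-410d-99e3-bc229e779441.tar.gz"

# input: download the node's dependency archive and unpack it
s3.download_file(bucket, f"{prefix}/{archive}", archive)
with tarfile.open(archive) as tar:
    tar.extractall(".")

# ... the node's notebook executes here ...

# output: upload the produced files so downstream nodes can fetch them
s3.upload_file("output.csv", bucket, f"{prefix}/output.csv")  # hypothetical output file
```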
<hr>
<br>
# Airflow
## Cynthia's introduction to Airflow
- [[slide] Airflow](https://docs.google.com/presentation/d/1Zh5H2YzMv6h3P3s6-UPw1ykw2Rb8biYTszdJljSLYZE/edit?usp=sharing)
<br>
## Installing Airflow
- ### Installing on K8s
    - [Airflow Helm Chart](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
    - Changes to `charts/airflow/values.yaml`:
```git=
diff --git a/charts/airflow/values.yaml b/charts/airflow/values.yaml
index ff401f3..7e47839 100644
--- a/charts/airflow/values.yaml
+++ b/charts/airflow/values.yaml
@@ -80,9 +80,9 @@ airflow:
     - username: admin
       password: admin
       role: Admin
-      email: admin@example.com
-      firstName: admin
-      lastName: admin
+      email: tj_tsai@asus.com
+      firstName: tj
+      lastName: tsai
 
   ## if we update users or just create them the first time (lookup by `username`)
   ##
@@ -981,7 +981,7 @@ dags:
   gitSync:
     ## if the git-sync sidecar container is enabled
     ##
-    enabled: false
+    enabled: true
 
     ## the git-sync container image
     ##
@@ -1011,7 +1011,7 @@ dags:
     ## EXAMPLE - SSH:
     ##   repo: "git@github.com:USERNAME/REPOSITORY.git"
     ##
-    repo: ""
+    repo: "https://github.com/tsungjung411/any-private-test.git"
 
     ## the sub-path (within your repo) where dags are located
     ##
@@ -1023,7 +1023,7 @@ dags:
     ## the git branch to check out
     ##
-    branch: master
+    branch: main
 
     ## the git revision (tag or hash) to check out
     ##
@@ -1043,7 +1043,7 @@ dags:
     ## the name of a pre-created Secret with git http credentials
     ##
-    httpSecret: ""
+    httpSecret: "airflow-http-git-secret"
```
- Following the steps above completes the installation
    - [[Additional documentation] Airflow Helm Chart](https://artifacthub.io/packages/helm/airflow-helm/airflow)
- Installation succeeded
  ![](https://i.imgur.com/570YiKa.png)
  <br>
  Bad case:
  pod/tj-airflow-cluster-web may need to be restarted several times (roughly 10 to 20 minutes) before it works


- Connectivity test
```bash
$ kubectl get all -n tj-airflow
$ kubectl port-forward -n $NAMESPACE --address 10.78.26.241 service/tj-airflow-cluster-web 9090:18080
```
- ### Uninstalling from K8s
```bash
$ kubectl get all -n $NAMESPACE
$ helm uninstall $RELEASE_NAME -n $NAMESPACE
$ kubectl get all -n $NAMESPACE
```
- ### Package issues
- `ModuleNotFoundError: No module named 'airflow_notebook'`
- [airflow-notebook 0.0.7](https://pypi.org/project/airflow-notebook/)
~~`pip install airflow-notebook`~~
- Error:
```
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
keyring 23.0.1 requires importlib-metadata>=3.6, but you'll have importlib-metadata 1.7.0 which is incompatible.
twine 3.4.1 requires importlib-metadata>=3.6, but you'll have importlib-metadata 1.7.0 which is incompatible.
```
- **Correct install command (verified on 2021/05/12)**
```
pip install airflow-notebook --use-feature=2020-resolver
```
Test the import:
```python=
import airflow_notebook
```
- Install inside the pod
```bash
$ kubectl exec -it service/tj-airflow-cluster-web -n tj-airflow -- bash
$ pip install airflow-notebook --use-feature=2020-resolver
```
- But still hit `ModuleNotFoundError: No module named 'airflow_notebook'`
    - Perhaps the airflow webserver was not restarted?
    - [Restarting it requires the root password](https://docs.qubole.com/en/latest/faqs/airflow/airflow-service-questions.html)
- The approach that finally worked: add the following at the top of the DAG code
```python=
def pip_install(python_lib):
    import importlib.util
    import sys
    import os

    # install the package only if it is not already importable
    module = importlib.util.find_spec(python_lib)
    if module is None:
        python_path = sys.executable
        cmd = python_path + ' -m pip install ' + python_lib + ' --use-feature=2020-resolver'
        os.system(cmd)
    else:
        print('module info:\n ', module)
        print('module location:\n ', module.submodule_search_locations)
        print('The package "%s" already exists.' % python_lib)

pip_install('airflow_notebook')
```
- Then hit a task ID conflict

The same filename with a different extension produces the same task ID (a short illustration appears at the end of this section)
- DAG run failed
  ![](https://i.imgur.com/b1Zvnwj.png)
```
*** Log file does not exist: /opt/airflow/logs/penguin-0512164146/step1_1_load_data/2021-05-11T00:00:00+00:00/1.log
*** Fetching from: http://:8793/log/penguin-0512164146/step1_1_load_data/2021-05-11T00:00:00+00:00/1.log
*** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/penguin-0512164146/step1_1_load_data/2021-05-11T00:00:00+00:00/1.log': No host supplied
```
- [apache-airflow-providers-cncf-kubernetes 1.2.0](https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/)
`pip install apache-airflow-providers-cncf-kubernetes`
- Not yet verified
- [Installing Airflow with extras and providers](https://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-script)
```bash
AIRFLOW_VERSION=2.0.1
PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
pip install "apache-airflow[async,postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```
- [(2.0.1)-(3.8.7) does not exist](https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.7.txt)
- [(2.0.1)-(3.8) exists](https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.txt)
```
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.txt"
pip install "apache-airflow[async,postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```
- Still unresolved (verified 2021/05/12)
`ModuleNotFoundError: No module named 'airflow_notebook'`
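A small illustration of the task-ID conflict noted earlier in this section (assuming the task ID is derived from the filename with its extension stripped, which is what the collision suggests; the filenames are hypothetical):
```python
from pathlib import Path

# two pipeline nodes whose files differ only by extension
files = ["step1_load_data.ipynb", "step1_load_data.py"]

# stripping the extension yields identical task IDs, which Airflow rejects
task_ids = [Path(f).stem for f in files]
print(task_ids)            # ['step1_load_data', 'step1_load_data']
print(len(set(task_ids)))  # 1 -> duplicate task_id within the same DAG
```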
<br>
## Usage notes
- ### [Apache Airflow — A New Way To Write DAGs](https://towardsdatascience.com/apache-airflow-a-new-way-to-write-dags-240a93c52e1a)

<br>
<hr>
<br>
# Minio (S3-compatible)
## Installing Minio
- ### [Installing with Docker](https://github.com/minio/minio#docker-installation)
```
$ docker run -p 9000:9000 -v `pwd`:/data minio/minio server /data
...
Detected default credentials 'minioadmin:minioadmin', please change the credentials immediately using 'MINIO_ROOT_USER' and 'MINIO_ROOT_PASSWORD'
IAM initialization complete
```
- IAM: Identity and Access Management
- Default root credentials: `minioadmin:minioadmin`
- Configuration for the Apache Airflow runtime (see the Python check at the end of this section)
  ![](https://i.imgur.com/VCqaHP9.png)
- ### [Installing on K8s](https://docs.min.io/docs/deploy-minio-on-kubernetes.html)
> MinIO is a high performance distributed object storage server, designed for large-scale private cloud infrastructure.
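Before pointing the Elyra runtime at the server, the endpoint and credentials can be checked from Python; a minimal sketch using the `minio` client library (assumes `pip install minio`; the bucket name matches the `cos_bucket` used in the generated DAG above):
```python
from minio import Minio

# connect to the local MinIO server with the default credentials (no TLS)
client = Minio(
    "10.78.26.241:9000",
    access_key="minioadmin",
    secret_key="minioadmin",
    secure=False,
)

# create the bucket referenced by the Elyra runtime configuration, then list buckets
if not client.bucket_exists("minio-workspace"):
    client.make_bucket("minio-workspace")
print([b.name for b in client.list_buckets()])
```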
<br>
## [Pricing](https://min.io/pricing)

<br>
<hr>
<br>
# AWS S3
## IAM
> Create user identities and the corresponding permissions
- ### User account
![](https://i.imgur.com/JnYbDcm.png)
- ### User groups
![](https://i.imgur.com/tPPEqDq.png)
![](https://i.imgur.com/PHrDYIQ.png)
- ### Other settings
  Skipped
- ### Creation complete
  ![](https://i.imgur.com/nZqTr4e.png)
| Chinese term | Key | Value |
| -------- | -------- | -------- |
| 使用者 | User name | `elyra_s3_user` |
| 密碼 | Password | |
| 存取金鑰 ID | Access key ID | `AKIAYMLWO37GH6RI3PXE` |
| 私密存取金鑰 | Secret access key | `FKb1xcVcUF7xypwcQDPO+m06ERbqZRbbtRNGPCIR` |
- ### Setting the password

<br>
## References
- ### [How To Get Amazon S3 Access Keys](https://objectivefs.com/howto/how-to-get-amazon-s3-keys)
- ### [How to Allow Public Access to an Amazon S3 Bucket & Find S3 URLs](https://havecamerawilltravel.com/photographer/how-allow-public-access-amazon-bucket/)
<br>
<hr>
<br>
# Airflow-on-K8s
## Build the image (with the airflow-notebook package): OK (2021/05/20)
```dockerfile=
# docker build -t apache/airflow:2.0.1-python3.8-by-tj .
FROM apache/airflow:2.0.1-python3.8
USER root
RUN python -m pip install airflow_notebook --use-feature=2020-resolver
USER airflow
```
## debug
### `no persistent volumes available for this claim and no storage class is set`
- Temporary workaround
  Set the postgres persistence option to false
- [Error “no persistent volumes available for this claim and no storage class is set”](https://stackoverflow.com/questions/55780083)
<br>
### `kubernetes.client.rest.ApiException: (403)`
![](https://i.imgur.com/FkC6o2o.png)
```
*** Log file does not exist: /opt/airflow/logs/penguin-0520073052/step1_load_data/2021-05-19T00:00:00+00:00/1.log
*** Fetching from: http://tj-airflow-cluster-worker-0.tj-airflow-cluster-worker.tj-airflow.svc.cluster.local:8793/log/penguin-0520073052/step1_load_data/2021-05-19T00:00:00+00:00/1.log
[2021-05-20 07:32:00,046] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: penguin-0520073052.step1_load_data 2021-05-19T00:00:00+00:00 [queued]>
[2021-05-20 07:32:00,161] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: penguin-0520073052.step1_load_data 2021-05-19T00:00:00+00:00 [queued]>
[2021-05-20 07:32:00,161] {taskinstance.py:1042} INFO -
--------------------------------------------------------------------------------
[2021-05-20 07:32:00,161] {taskinstance.py:1043} INFO - Starting attempt 1 of 1
[2021-05-20 07:32:00,162] {taskinstance.py:1044} INFO -
--------------------------------------------------------------------------------
[2021-05-20 07:32:00,174] {taskinstance.py:1063} INFO - Executing <Task(NotebookOp): step1_load_data> on 2021-05-19T00:00:00+00:00
[2021-05-20 07:32:00,246] {standard_task_runner.py:52} INFO - Started process 258 to run task
[2021-05-20 07:32:00,252] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'penguin-0520073052', 'step1_load_data', '2021-05-19T00:00:00+00:00', '--job-id', '31', '--pool', 'default_pool', '--raw', '--subdir', '/opt/airflow/dags/repo/penguin-0520073052.py', '--cfg-path', '/tmp/tmpnhagtq7b', '--error-file', '/tmp/tmps1bwwtkq']
[2021-05-20 07:32:00,252] {standard_task_runner.py:77} INFO - Job 31: Subtask step1_load_data
[2021-05-20 07:32:00,645] {logging_mixin.py:104} INFO - Running <TaskInstance: penguin-0520073052.step1_load_data 2021-05-19T00:00:00+00:00 [running]> on host tj-airflow-cluster-worker-0.tj-airflow-cluster-worker.tj-airflow.svc.cluster.local
[2021-05-20 07:32:00,755] {taskinstance.py:1255} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=penguin-0520073052
AIRFLOW_CTX_TASK_ID=step1_load_data
AIRFLOW_CTX_EXECUTION_DATE=2021-05-19T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2021-05-19T00:00:00+00:00
[2021-05-20 07:32:00,848] {taskinstance.py:1455} ERROR - (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '79d187ea-8c60-456e-a34a-f0fc94abe222', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e4a0b3a5-23f5-4643-a914-1897672eeb07', 'Date': 'Thu, 20 May 2021 07:32:00 GMT', 'Content-Length': '296'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:tj-airflow:tj-airflow-cluster\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
Traceback (most recent call last):
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
self._prepare_and_execute_task_with_callbacks(context, task)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
result = self._execute_task(context, task_copy)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
result = task_copy.execute(context=context)
File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 323, in execute
pod_list = client.list_namespaced_pod(self.namespace, label_selector=label_selector)
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 12803, in list_namespaced_pod
(data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs) # noqa: E501
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 12891, in list_namespaced_pod_with_http_info
return self.api_client.call_api(
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 340, in call_api
return self.__call_api(resource_path, method,
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
response_data = self.request(
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 362, in request
return self.rest_client.GET(url,
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 237, in GET
return self.request("GET", url,
File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 231, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '79d187ea-8c60-456e-a34a-f0fc94abe222', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e4a0b3a5-23f5-4643-a914-1897672eeb07', 'Date': 'Thu, 20 May 2021 07:32:00 GMT', 'Content-Length': '296'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:tj-airflow:tj-airflow-cluster\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
[2021-05-20 07:32:00,862] {taskinstance.py:1496} INFO - Marking task as FAILED. dag_id=penguin-0520073052, task_id=step1_load_data, execution_date=20210519T000000, start_date=20210520T073200, end_date=20210520T073200
[2021-05-20 07:32:00,984] {local_task_job.py:146} INFO - Task exited with return code 1
```
- Check values.yaml

```yaml=1204
###################################
# Kubernetes - RBAC
###################################
rbac:
  ## if Kubernetes RBAC resources are created
  ##
  ## NOTE:
  ## - these allow the service account to create/delete Pods in the airflow namespace,
  ##   which is required for the KubernetesPodOperator() to function
  ##
  create: true
  ## if the created RBAC Role has GET/LIST on Event resources
  ##
  ## NOTE:
  ## - this is needed for KubernetesPodOperator() to use `log_events_on_failure=True`
  ##
  events: true
###################################
# Kubernetes - Service Account
###################################
serviceAccount:
  ## if a Kubernetes ServiceAccount is created
  ##
  ## NOTE:
  ## - if false, you must create the service account outside of this chart,
  ##   with the name: `serviceAccount.name`
  ##
  create: true
  ## the name of the ServiceAccount
  ##
  ## NOTE:
  ## - by default the name is generated using the `airflow.serviceAccountName` template in `_helpers/common.tpl`
  ##
  name: ""
  ## annotations for the ServiceAccount
  ##
  ## EXAMPLE: (to use WorkloadIdentity in Google Cloud)
  ##   annotations:
  ##     iam.gke.io/gcp-service-account: <<GCP_SERVICE>>@<<GCP_PROJECT>>.iam.gserviceaccount.com
  ##
  annotations: {}
```
- Possible solutions
    - [Scheduler throw API exception (403) #473](https://github.com/puckel/docker-airflow/issues/473)
    - [403 forbidden when i ran in_cluster_config.py in the pod #519](https://github.com/kubernetes-client/python/issues/519)
        - "here is my blog for solving this problem": https://blog.csdn.net/u013431916/article/details/80057369
        - [Accessing the API server from inside a Pod via RBAC](https://blog.csdn.net/u013431916/article/details/80057369)
        - [[kubernetes-client / python] in_cluster_config.py](https://github.com/kubernetes-client/python/blob/master/examples/in_cluster_config.py)
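Judging from the error message, the `tj-airflow:tj-airflow-cluster` service account is only granted RBAC permissions in its own namespace, while the generated DAG's NotebookOp uses `namespace="default"`; aligning the namespace (or adding a Role/RoleBinding covering `default`) is the likely fix. To reproduce the failing permission check from inside a pod, a sketch along the lines of the in_cluster_config.py example linked above:
```python
from kubernetes import client, config

# in-cluster configuration: use the service-account token mounted into the pod
config.load_incluster_config()
v1 = client.CoreV1Api()

# the call that fails in the log: listing pods in the "default" namespace
try:
    pods = v1.list_namespaced_pod(namespace="default")
    print([p.metadata.name for p in pods.items])
except client.rest.ApiException as e:
    # a 403 here means the pod's service account lacks an RBAC Role/RoleBinding
    # covering pods in that namespace
    print("Forbidden:", e.status, e.reason)
```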