[ML] 8.1: Elyra Airflow & Minio
===

###### tags: `ML`, `workflow control`, `MLOps`, `pipeline automation`

<br>

[TOC]

<br>

:::info
:bulb: **Test sites**
- ### JupyterLab + Elyra extension
  http://10.78.26.241:8889/lab?token=95a65ee10228a6611e69a17b0c6e428dc4b8e891b489e856
- ### Airflow-web
  http://10.78.26.241:9090/
  admin/admin
- ### Minio
  http://10.78.26.241:9000/
  minioadmin/minioadmin
:::

# Elyra

## Overview
- An extension for JupyterLab notebooks
- Features
    - A tool for orchestrating workflows, i.e. arranging the execution order of the nodes
    - Pipelines can run locally, or on Airflow / Kubeflow platforms in the cloud

## [elyra-ai](https://github.com/elyra-ai/elyra)
- [hello_world](https://github.com/elyra-ai/examples/tree/master/pipelines/hello_world)
- [hello_world_apache_airflow](https://github.com/elyra-ai/examples/tree/master/pipelines/hello_world_apache_airflow)
  ![](https://i.imgur.com/TODOLaL.png)
- [Analyzing COVID-19 time series data](https://github.com/CODAIT/covid-notebooks)

<br>

## Installing Elyra
- ### [Elyra Documentation](https://elyra.readthedocs.io/en/latest/getting_started/installation.html)
- ### Prerequisites
    - [Node.js 12+](https://nodejs.org/en/)
        - [How can I update my nodeJS to the latest version?](https://askubuntu.com/questions/426750/how-can-i-update-my-nodejs-to-the-latest-version)
          Before:
          ```
          $ node --version
          v10.15.3
          ```
          After:
          ```
          $ node --version
          v14.16.1
          ```
    - Python 3.x
- ### Three ways to install
    - pip
    - conda
    - Build from source
- ### Installing with pip
  Create a virtual environment:
  ```bash
  $ virtualenv -p python3 elyra_env
  $ source elyra_env/bin/activate
  ```
  Install Elyra:
  ```bash
  # for Ubuntu
  $ pip3 install elyra && jupyter lab build
  ```
  Start JupyterLab:
  ```bash
  $ jupyter lab --debug
  ```
  After startup, the Launcher tab shows 4 new entries:
  ![](https://i.imgur.com/Ec3bNeR.png)
- ### Running directly via Docker
  ```bash
  # this command does NOT work as-is
  $ docker run -it -p 8888:8888 elyra/elyra:dev jupyter lab --debug
  ```
    - Do not pass `-d` (daemon)
        - you would not see the log
        - the log contains the token required to log in
    - Optional arguments
        - `-v source_path:dest_path` (mount files into the container)
        - `-w target_workspace` (set the working directory)
    - Error: Permission denied
      The user inside the container is `jovyan`
      ![](https://i.imgur.com/XHQHOYB.png)
        - `-u root:root`
    - Next error:
      ```
      Running as root is not recommended. Use --allow-root to bypass.
      ```
        - `--allow-root`
    - Full command:
      ```
      $ docker run --rm -it \
          -v ~/tj_tsai/workspace/elyra_workflow_20210428:/workspace \
          -w /workspace \
          -p 18888:8888 \
          -u root:root \
          elyra/elyra:latest jupyter lab --debug --allow-root
      ```
- ### Deploying to K8s
  ```yaml=
  apiVersion: v1
  kind: Pod
  metadata:
    name: tj-elyra-pod
    labels:
      app: tj-elyra-server
  spec:
    containers:
    - name: tj-elyra-container
      image: elyra/elyra:latest
      command: ["jupyter", "lab", "--debug"]
      ports:
      - containerPort: 8888
  ---
  apiVersion: v1
  kind: Service
  metadata:
    name: tj-elyra-service
  spec:
    type: NodePort
    ports:
    - targetPort: 8888
      port: 80
      nodePort: 30002
    selector:
      app: tj-elyra-server
  ```
- ### Custom runtime image
    - [Creating a custom runtime container image](https://elyra.readthedocs.io/en/latest/recipes/creating-a-custom-runtime-image.html)

<br>

## Example pipelines
- [Download the examples](https://github.com/elyra-ai/examples/tree/master/pipelines/hello_world#setup)
  ```
  git clone https://github.com/elyra-ai/examples.git
  ```

<br>

## Dependencies: the referenced files must exist
- ### Configuration
  ![](https://i.imgur.com/1wcOJA5.png)
- ### File not found: penguin.csv (see the pre-flight check after this section)
  ![](https://i.imgur.com/JHIBqu4.png)
  ```
  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 275, in _upload_dependencies_to_object_store
      dependency_archive_path = self._generate_dependency_archive(operation)
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 261, in _generate_dependency_archive
      archive_artifact = create_temp_archive(archive_name=archive_artifact_name,
    File "/opt/conda/lib/python3.8/site-packages/elyra/util/archive.py", line 124, in create_temp_archive
      raise FileNotFoundError(filenames_set - matched_set)  # Only include the missing filenames
  FileNotFoundError: {'penguin.csv'}

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "/opt/conda/lib/python3.8/site-packages/tornado/web.py", line 1704, in _execute
      result = await result
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/handlers.py", line 89, in post
      response = await PipelineProcessorManager.instance().process(pipeline)
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 78, in process
      res = await asyncio.get_event_loop().run_in_executor(None, processor.process, pipeline)
    File "/opt/conda/lib/python3.8/concurrent/futures/thread.py", line 57, in run
      result = self.fn(*self.args, **self.kwargs)
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor_airflow.py", line 70, in process
      pipeline_filepath = self.create_pipeline_file(pipeline=pipeline,
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor_airflow.py", line 249, in create_pipeline_file
      notebook_ops = self._cc_pipeline(pipeline, pipeline_name)
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor_airflow.py", line 210, in _cc_pipeline
      self._upload_dependencies_to_object_store(runtime_configuration,
    File "/opt/conda/lib/python3.8/site-packages/elyra/pipeline/processor.py", line 295, in _upload_dependencies_to_object_store
      raise FileNotFoundError("Node '{}' referenced dependencies that were not found: {}".
  FileNotFoundError: Node 'step2_select_coumns' referenced dependencies that were not found: {'penguin.csv'}
  ```
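The error above simply means that a file dependency declared on the node could not be found next to the notebook when Elyra packed the dependency archive. As a quick pre-flight check you can expand each node's dependency patterns locally, in the same spirit as Elyra's `create_temp_archive()`. The helper below is hypothetical (it is not part of Elyra), and the notebook filename is assumed; only the node name and `penguin.csv` come from the error above.

```python
from pathlib import Path

def check_node_dependencies(notebook_path, dependency_patterns):
    """Return the declared dependency patterns that match no file next to the notebook."""
    base_dir = Path(notebook_path).resolve().parent
    # Elyra resolves file dependencies relative to the notebook's directory
    return [p for p in dependency_patterns if not list(base_dir.glob(p))]

# the failing node declared 'penguin.csv'; the notebook filename here is an assumption
missing = check_node_dependencies('step2_select_coumns.ipynb', ['penguin.csv'])
if missing:
    print('Missing dependencies:', missing)   # e.g. ['penguin.csv']
```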
<br>

## Running on Airflow
- ### Install Airflow
    - See the [**Installing Airflow**](#Installing-Airflow) section
- ### Configure the Apache Airflow runtime
  [![](https://i.imgur.com/mQhc5nY.png)](https://i.imgur.com/mQhc5nY.png)
    - [GitHub personal access token](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)
    - [Runtime Configuration](https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html)
        - [GitHub Personal Access Token (github_repo_token)](https://elyra.readthedocs.io/en/latest/user_guide/runtime-conf.html#github-personal-access-token-github-repo-token)
    - For S3 on AWS it is unclear how to obtain the bucket endpoint
    - Minio is used here instead; see the [**Installing Minio**](#Installing-Minio) section
- ### Start a run
  ![](https://i.imgur.com/xISwynf.png)
  ![](https://i.imgur.com/LQSCls1.png)
- ### Run succeeded
  ==Elyra:==
  ![](https://i.imgur.com/XS4Pcgh.png)
  <br>

  ==Gitlab status:==
  ![](https://i.imgur.com/ljRuDER.png)
  ![](https://i.imgur.com/pfaFwAo.png)
  <br>

  ==Minio status:==
  ![](https://i.imgur.com/rveacBn.png)
  <br>

  ==Airflow status:==
  [![](https://i.imgur.com/jwL4x1Z.png)](https://i.imgur.com/jwL4x1Z.png)
  It is not clear why nothing shows up in Airflow at all.
- ### Full DAG code of the workflow
  ![](https://i.imgur.com/ENN2uwk.png)
  ```python=
  from airflow import DAG
  from airflow_notebook.pipeline import NotebookOp
  from airflow.utils.dates import days_ago

  # Setup default args with older date to automatically trigger when uploaded
  args = {
      "project_id": "tj-airflow-0504161135",
  }

  dag = DAG(
      "tj-airflow-0504161135",
      default_args=args,
      schedule_interval="@once",
      start_date=days_ago(1),
      description="Created with Elyra 2.2.4 pipeline editor using tj-airflow.pipeline.",
      is_paused_upon_creation=False,
  )

  notebook_op_2e2461b2_6916_410d_99e3_bc229e779441 = NotebookOp(
      name="load_data",
      namespace="default",
      task_id="load_data",
      notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/load_data.ipynb",
      cos_endpoint="http://10.78.26.241:9000",
      cos_bucket="minio-workspace",
      cos_directory="tj-airflow-0504161135",
      cos_dependencies_archive="load_data-2e2461b2-6916-410d-99e3-bc229e779441.tar.gz",
      pipeline_outputs=[],
      pipeline_inputs=[],
      image="amancevice/pandas:1.1.1",
      in_cluster=True,
      env_vars={
          "AWS_ACCESS_KEY_ID": "minioadmin",
          "AWS_SECRET_ACCESS_KEY": "minioadmin",
          "ELYRA_ENABLE_PIPELINE_INFO": "True",
      },
      config_file="None",
      dag=dag,
  )

  notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407 = NotebookOp(
      name="Part_1___Data_Cleaning",
      namespace="default",
      task_id="Part_1___Data_Cleaning",
      notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/Part 1 - Data Cleaning.ipynb",
      cos_endpoint="http://10.78.26.241:9000",
      cos_bucket="minio-workspace",
      cos_directory="tj-airflow-0504161135",
      cos_dependencies_archive="Part 1 - Data Cleaning-e7290f33-0616-4879-a00b-643fc5f0e407.tar.gz",
      pipeline_outputs=[],
      pipeline_inputs=[],
      image="amancevice/pandas:1.1.1",
      in_cluster=True,
      env_vars={
          "AWS_ACCESS_KEY_ID": "minioadmin",
          "AWS_SECRET_ACCESS_KEY": "minioadmin",
          "ELYRA_ENABLE_PIPELINE_INFO": "True",
      },
      config_file="None",
      dag=dag,
  )

  (
      notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
      << notebook_op_2e2461b2_6916_410d_99e3_bc229e779441
  )
  notebook_op_167f5cee_4145_450f_9c0d_d43fd8607ed3 = NotebookOp(
      name="Part_2___Data_Analysis",
      namespace="default",
      task_id="Part_2___Data_Analysis",
      notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/Part 2 - Data Analysis.ipynb",
      cos_endpoint="http://10.78.26.241:9000",
      cos_bucket="minio-workspace",
      cos_directory="tj-airflow-0504161135",
      cos_dependencies_archive="Part 2 - Data Analysis-167f5cee-4145-450f-9c0d-d43fd8607ed3.tar.gz",
      pipeline_outputs=[],
      pipeline_inputs=[],
      image="amancevice/pandas:1.1.1",
      in_cluster=True,
      env_vars={
          "AWS_ACCESS_KEY_ID": "minioadmin",
          "AWS_SECRET_ACCESS_KEY": "minioadmin",
          "ELYRA_ENABLE_PIPELINE_INFO": "True",
      },
      config_file="None",
      dag=dag,
  )

  (
      notebook_op_167f5cee_4145_450f_9c0d_d43fd8607ed3
      << notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
  )

  notebook_op_d8c73222_f332_4134_b8a2_83cb1cd2a066 = NotebookOp(
      name="Part_3___Time_Series_Forecasting",
      namespace="default",
      task_id="Part_3___Time_Series_Forecasting",
      notebook="elyra-ai-examples/pipelines/hello_world_apache_airflow/Part 3 - Time Series Forecasting.ipynb",
      cos_endpoint="http://10.78.26.241:9000",
      cos_bucket="minio-workspace",
      cos_directory="tj-airflow-0504161135",
      cos_dependencies_archive="Part 3 - Time Series Forecasting-d8c73222-f332-4134-b8a2-83cb1cd2a066.tar.gz",
      pipeline_outputs=[],
      pipeline_inputs=[],
      image="amancevice/pandas:1.1.1",
      in_cluster=True,
      env_vars={
          "AWS_ACCESS_KEY_ID": "minioadmin",
          "AWS_SECRET_ACCESS_KEY": "minioadmin",
          "ELYRA_ENABLE_PIPELINE_INFO": "True",
      },
      config_file="None",
      dag=dag,
  )

  (
      notebook_op_d8c73222_f332_4134_b8a2_83cb1cd2a066
      << notebook_op_e7290f33_0616_4879_a00b_643fc5f0e407
  )
  ```
  [![](https://i.imgur.com/Wpnwod3.png)](https://i.imgur.com/Wpnwod3.png)

<br>

## References
- ### [Running notebook pipelines in JupyterLab](https://medium.com/codait/running-notebook-pipelines-locally-in-jupyterlab-1fae14b8e081)
  ![](https://i.imgur.com/zVLUqos.png)
- ### [Creating notebook pipelines using Elyra and Kubeflow Pipelines](https://medium.com/codait/creating-notebook-pipelines-using-elyra-and-kubeflow-pipelines-f9606449cc53)
    - Purpose of the S3 objects: they hold the code to be executed plus the input/output files
    - Execution flow, per node (see the sketch after this list):
        - input: download the required files (a dependency archive) from S3
        - output: upload the results back to S3
        - then the next node continues (independent nodes can run in parallel)
    - Example input/output files
      ![](https://i.imgur.com/fmS1vFM.png)
      ![](https://i.imgur.com/8momSeZ.png)
      ![](https://i.imgur.com/osfEU3v.png)
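To see concretely what ends up in the bucket for each node, you can pull one of the `cos_dependencies_archive` objects back out of Minio and list its contents. This is only an inspection sketch with `boto3`; the endpoint, bucket, directory and archive name are taken from the generated DAG above, so adjust them to your own run.

```python
import tarfile
import boto3

# values taken from the generated DAG above; adjust to your own run
s3 = boto3.client(
    "s3",
    endpoint_url="http://10.78.26.241:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

prefix = "tj-airflow-0504161135/"
for obj in s3.list_objects_v2(Bucket="minio-workspace", Prefix=prefix).get("Contents", []):
    print(obj["Key"], obj["Size"])   # notebooks, outputs, dependency archives, ...

# download one dependency archive and list what the node unpacks before it runs
key = prefix + "load_data-2e2461b2-6916-410d-99e3-bc229e779441.tar.gz"
s3.download_file("minio-workspace", key, "load_data_archive.tar.gz")
with tarfile.open("load_data_archive.tar.gz", "r:gz") as tar:
    print(tar.getnames())            # the notebook plus its declared file dependencies
```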
<br>

<hr>

<br>

# Airflow

## Cynthia's Airflow introduction
- [[slide] Airflow](https://docs.google.com/presentation/d/1Zh5H2YzMv6h3P3s6-UPw1ykw2Rb8biYTszdJljSLYZE/edit?usp=sharing)

<br>

## Installing Airflow
- ### Installing on K8s
    - [Airflow Helm Chart](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
    - changes to charts/airflow/values.yaml
      ```git=
      diff --git a/charts/airflow/values.yaml b/charts/airflow/values.yaml
      index ff401f3..7e47839 100644
      --- a/charts/airflow/values.yaml
      +++ b/charts/airflow/values.yaml
      @@ -80,9 +80,9 @@ airflow:
           - username: admin
             password: admin
             role: Admin
      -      email: admin@example.com
      -      firstName: admin
      -      lastName: admin
      +      email: tj_tsai@asus.com
      +      firstName: tj
      +      lastName: tsai

         ## if we update users or just create them the first time (lookup by `username`)
         ##
      @@ -981,7 +981,7 @@ dags:
         gitSync:
           ## if the git-sync sidecar container is enabled
           ##
      -    enabled: false
      +    enabled: true

           ## the git-sync container image
           ##
      @@ -1011,7 +1011,7 @@ dags:
           ## EXAMPLE - SSH:
           ##   repo: "git@github.com:USERNAME/REPOSITORY.git"
           ##
      -    repo: ""
      +    repo: "https://github.com/tsungjung411/any-private-test.git"

           ## the sub-path (within your repo) where dags are located
           ##
      @@ -1023,7 +1023,7 @@ dags:
           ## the git branch to check out
           ##
      -    branch: master
      +    branch: main

           ## the git revision (tag or hash) to check out
           ##
      @@ -1043,7 +1043,7 @@ dags:
           ## the name of a pre-created Secret with git http credentials
           ##
      -    httpSecret: ""
      +    httpSecret: "airflow-http-git-secret"
      ```
    - Following the steps above completes the installation
    - [[extra documentation] Airflow Helm Chart](https://artifacthub.io/packages/helm/airflow-helm/airflow)
    - Installation succeeded
      [![](https://i.imgur.com/570YiKa.png)](https://i.imgur.com/570YiKa.png)
      <br>

      Bad case: pod/tj-airflow-cluster-web may need to be restarted several times (roughly 10 to 20 minutes) before it works
      ![](https://i.imgur.com/LbZq26I.png)
      ![](https://i.imgur.com/uCG7xsq.png)
    - Connection test
      ```bash
      $ kubectl get all -n tj-airflow
      $ kubectl port-forward -n $NAMESPACE --address 10.78.26.241 service/tj-airflow-cluster-web 9090:18080
      ```
- ### Uninstalling from K8s
  ```bash
  $ kubectl get all -n $NAMESPACE
  $ helm uninstall $RELEASE_NAME -n $NAMESPACE
  $ kubectl get all -n $NAMESPACE
  ```
- ### Package issues
    - `ModuleNotFoundError: No module named 'airflow_notebook'`
        - [airflow-notebook 0.0.7](https://pypi.org/project/airflow-notebook/)
          ~~`pip install airflow-notebook`~~
            - error
              ```
              ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

              We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

              keyring 23.0.1 requires importlib-metadata>=3.6, but you'll have importlib-metadata 1.7.0 which is incompatible.
              twine 3.4.1 requires importlib-metadata>=3.6, but you'll have importlib-metadata 1.7.0 which is incompatible.
              ```
        - **Working install command (verified 2021/05/12)**
          ```
          pip install airflow-notebook --use-feature=2020-resolver
          ```
          Test the import:
          ```python=
          import airflow_notebook
          ```
        - Installing inside the pod
          ```bash
          $ kubectl exec -it service/tj-airflow-cluster-web -n tj-airflow -- bash
          $ pip install airflow-notebook --use-feature=2020-resolver
          ```
            - Still hit `ModuleNotFoundError: No module named 'airflow_notebook'`
                - Was the Airflow webserver not restarted?
                - [Restarting requires the root password](https://docs.qubole.com/en/latest/faqs/airflow/airflow-service-questions.html)
        - Final workaround that works: prepend the following at the top of the DAG code
          ```python=
          def pip_install(python_lib):
              import importlib.util
              import sys
              import os

              module = importlib.util.find_spec(python_lib)
              if module is None:
                  python_path = sys.executable
                  cmd = python_path + ' -m pip install ' + python_lib + ' --use-feature=2020-resolver'
                  os.system(cmd)
              else:
                  print('module info:\n ', module)
                  print('module location:\n ', module.submodule_search_locations)
                  print('The package "%s" already exists.' % python_lib)

          pip_install('airflow_notebook')
          ```
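          For clarity, a sketch of how the workaround is meant to be used: the helper above is pasted at the very top of the generated DAG file, so the missing package is installed before the DAG module tries to import it (assuming `pip_install()` from the snippet above is already defined there):

          ```python
          # assuming pip_install() from the snippet above is defined at the top of the DAG file
          pip_install('airflow_notebook')

          # only after that does the generated import succeed on the scheduler/worker
          from airflow_notebook.pipeline import NotebookOp
          ```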
        - Another problem: task ID conflicts
          ![](https://i.imgur.com/eoaMxTj.png)
          Files with the same name but different extensions produce the same task ID
        - DAG run failed
          [![](https://i.imgur.com/b1Zvnwj.png)](https://i.imgur.com/b1Zvnwj.png)
          ```
          *** Log file does not exist: /opt/airflow/logs/penguin-0512164146/step1_1_load_data/2021-05-11T00:00:00+00:00/1.log
          *** Fetching from: http://:8793/log/penguin-0512164146/step1_1_load_data/2021-05-11T00:00:00+00:00/1.log
          *** Failed to fetch log file from worker. Invalid URL 'http://:8793/log/penguin-0512164146/step1_1_load_data/2021-05-11T00:00:00+00:00/1.log': No host supplied
          ```
    - [apache-airflow-providers-cncf-kubernetes 1.2.0](https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/)
      `pip install apache-airflow-providers-cncf-kubernetes`
        - not yet verified
    - [Installing Airflow with extras and providers](https://airflow.apache.org/docs/apache-airflow/stable/installation.html#installation-script)
      ```bash
      AIRFLOW_VERSION=2.0.1
      PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
      CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
      pip install "apache-airflow[async,postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
      ```
        - [(2.0.1)-(3.8.7) does not exist](https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.7.txt)
        - [(2.0.1)-(3.8) exists](https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.txt)
          ```
          CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-2.0.1/constraints-3.8.txt"
          pip install "apache-airflow[async,postgres,google]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
          ```
        - Still does not solve it (verified 2021/05/12):
          `ModuleNotFoundError: No module named 'airflow_notebook'`

<br>

## Usage
- ### [Apache Airflow — A New Way To Write DAGs](https://towardsdatascience.com/apache-airflow-a-new-way-to-write-dags-240a93c52e1a)
  ![](https://i.imgur.com/4qEZBt8.png)

<br>

<hr>

<br>

# Minio (S3-compatible)

## Installing Minio
- ### [Install on Docker](https://github.com/minio/minio#docker-installation)
  ```
  $ docker run -p 9000:9000 -v `pwd`:/data minio/minio server /data
  ...
  Detected default credentials 'minioadmin:minioadmin', please change the credentials immediately using 'MINIO_ROOT_USER' and 'MINIO_ROOT_PASSWORD'
  IAM initialization complete
  ```
    - IAM: Identity and Access Management
    - Default root credentials: `minioadmin:minioadmin`
    - Apache Airflow runtime configuration (a quick verification sketch follows below)
      [![](https://i.imgur.com/VCqaHP9.png)](https://i.imgur.com/VCqaHP9.png)
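      As a sanity check before pointing the Elyra runtime configuration at this Minio instance, the bucket named in the configuration can be verified (and created if missing) with the `minio` Python client. A sketch, assuming the default `minioadmin` credentials and the `minio-workspace` bucket used elsewhere in this note:

      ```python
      from minio import Minio

      # default credentials from the docker run above; secure=False because the endpoint is plain http
      client = Minio("10.78.26.241:9000",
                     access_key="minioadmin",
                     secret_key="minioadmin",
                     secure=False)

      bucket = "minio-workspace"   # the bucket name used in the Elyra runtime configuration
      if not client.bucket_exists(bucket):
          client.make_bucket(bucket)
      print([b.name for b in client.list_buckets()])
      ```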
- ### [Install on K8s](https://docs.min.io/docs/deploy-minio-on-kubernetes.html)
  > MinIO is a high performance distributed object storage server, designed for large-scale private cloud infrastructure.

<br>

## [Pricing](https://min.io/pricing)
![](https://i.imgur.com/eaLd3XR.png)

<br>

<hr>

<br>

# AWS S3

## IAM
> Create user identities and their corresponding permissions

- ### User account
  [![](https://i.imgur.com/JnYbDcm.png)](https://i.imgur.com/JnYbDcm.png)
- ### User group
  [![](https://i.imgur.com/tPPEqDq.png)](https://i.imgur.com/tPPEqDq.png)
  [![](https://i.imgur.com/PHrDYIQ.png)](https://i.imgur.com/PHrDYIQ.png)
- ### Other settings
  Skipped
- ### Creation complete
  [![](https://i.imgur.com/nZqTr4e.png)](https://i.imgur.com/nZqTr4e.png)

  | Console label (中文) | Key | Value |
  | -------- | -------- | -------- |
  | 使用者 | User name | `elyra_s3_user` |
  | 密碼 | Password | |
  | 存取金鑰 ID | Access key ID | `AKIAYMLWO37GH6RI3PXE` |
  | 私密存取金鑰 | Secret access key | `FKb1xcVcUF7xypwcQDPO+m06ERbqZRbbtRNGPCIR` |
- ### Set the password
  ![](https://i.imgur.com/6wMDF8Y.png)

<br>

## References
- ### [How To Get Amazon S3 Access Keys](https://objectivefs.com/howto/how-to-get-amazon-s3-keys)
- ### [How to Allow Public Access to an Amazon S3 Bucket & Find S3 URLs](https://havecamerawilltravel.com/photographer/how-allow-public-access-amazon-bucket/)

<br>

<hr>

<br>

# Airflow-on-K8s

## Build image (with the airflow-notebook package): OK (2021/05/20)
```dockerfile=
# docker build -t apache/airflow:2.0.1-python3.8-by-tj .
FROM apache/airflow:2.0.1-python3.8

USER root
RUN python -m pip install airflow_notebook --use-feature=2020-resolver
USER airflow
```

## Debug

### `no persistent volumes available for this claim and no storage class is set`
- Temporary workaround: set the postgres persistence to false
- [Error “no persistent volumes available for this claim and no storage class is set”](https://stackoverflow.com/questions/55780083)

<br>

### `kubernetes.client.rest.ApiException: (403)`
[![](https://i.imgur.com/FkC6o2o.png)](https://i.imgur.com/FkC6o2o.png)
```
*** Log file does not exist: /opt/airflow/logs/penguin-0520073052/step1_load_data/2021-05-19T00:00:00+00:00/1.log
*** Fetching from: http://tj-airflow-cluster-worker-0.tj-airflow-cluster-worker.tj-airflow.svc.cluster.local:8793/log/penguin-0520073052/step1_load_data/2021-05-19T00:00:00+00:00/1.log

[2021-05-20 07:32:00,046] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: penguin-0520073052.step1_load_data 2021-05-19T00:00:00+00:00 [queued]>
[2021-05-20 07:32:00,161] {taskinstance.py:851} INFO - Dependencies all met for <TaskInstance: penguin-0520073052.step1_load_data 2021-05-19T00:00:00+00:00 [queued]>
[2021-05-20 07:32:00,161] {taskinstance.py:1042} INFO -
--------------------------------------------------------------------------------
[2021-05-20 07:32:00,161] {taskinstance.py:1043} INFO - Starting attempt 1 of 1
[2021-05-20 07:32:00,162] {taskinstance.py:1044} INFO -
--------------------------------------------------------------------------------
[2021-05-20 07:32:00,174] {taskinstance.py:1063} INFO - Executing <Task(NotebookOp): step1_load_data> on 2021-05-19T00:00:00+00:00
[2021-05-20 07:32:00,246] {standard_task_runner.py:52} INFO - Started process 258 to run task
[2021-05-20 07:32:00,252] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'penguin-0520073052', 'step1_load_data', '2021-05-19T00:00:00+00:00', '--job-id', '31', '--pool', 'default_pool', '--raw', '--subdir', '/opt/airflow/dags/repo/penguin-0520073052.py', '--cfg-path', '/tmp/tmpnhagtq7b', '--error-file', '/tmp/tmps1bwwtkq']
[2021-05-20 07:32:00,252] {standard_task_runner.py:77} INFO - Job 31: Subtask step1_load_data
[2021-05-20 07:32:00,645] {logging_mixin.py:104} INFO - Running <TaskInstance: penguin-0520073052.step1_load_data 2021-05-19T00:00:00+00:00 [running]> on host tj-airflow-cluster-worker-0.tj-airflow-cluster-worker.tj-airflow.svc.cluster.local
[2021-05-20 07:32:00,755] {taskinstance.py:1255} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=penguin-0520073052
AIRFLOW_CTX_TASK_ID=step1_load_data
AIRFLOW_CTX_EXECUTION_DATE=2021-05-19T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2021-05-19T00:00:00+00:00
[2021-05-20 07:32:00,848] {taskinstance.py:1455} ERROR - (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '79d187ea-8c60-456e-a34a-f0fc94abe222', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e4a0b3a5-23f5-4643-a914-1897672eeb07', 'Date': 'Thu, 20 May 2021 07:32:00 GMT', 'Content-Length': '296'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:tj-airflow:tj-airflow-cluster\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1112, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1285, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1315, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 323, in execute
    pod_list = client.list_namespaced_pod(self.namespace, label_selector=label_selector)
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 12803, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 12891, in list_namespaced_pod_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 340, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 362, in request
    return self.rest_client.GET(url,
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 237, in GET
    return self.request("GET", url,
  File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '79d187ea-8c60-456e-a34a-f0fc94abe222', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'e4a0b3a5-23f5-4643-a914-1897672eeb07', 'Date': 'Thu, 20 May 2021 07:32:00 GMT', 'Content-Length': '296'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:tj-airflow:tj-airflow-cluster\" cannot list resource \"pods\" in API group \"\" in the namespace \"default\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
[2021-05-20 07:32:00,862] {taskinstance.py:1496} INFO - Marking task as FAILED.
dag_id=penguin-0520073052, task_id=step1_load_data, execution_date=20210519T000000, start_date=20210520T073200, end_date=20210520T073200
[2021-05-20 07:32:00,984] {local_task_job.py:146} INFO - Task exited with return code 1
```
- Check values.yaml
  ![](https://i.imgur.com/uTsl6FK.png)
  ```yaml=1204
  ###################################
  # Kubernetes - RBAC
  ###################################
  rbac:
    ## if Kubernetes RBAC resources are created
    ##
    ## NOTE:
    ## - these allow the service account to create/delete Pods in the airflow namespace,
    ##   which is required for the KubernetesPodOperator() to function
    ##
    create: true

    ## if the created RBAC Role has GET/LIST on Event resources
    ##
    ## NOTE:
    ## - this is needed for KubernetesPodOperator() to use `log_events_on_failure=True`
    ##
    events: true

  ###################################
  # Kubernetes - Service Account
  ###################################
  serviceAccount:
    ## if a Kubernetes ServiceAccount is created
    ##
    ## NOTE:
    ## - if false, you must create the service account outside of this chart,
    ##   with the name: `serviceAccount.name`
    ##
    create: true

    ## the name of the ServiceAccount
    ##
    ## NOTE:
    ## - by default the name is generated using the `airflow.serviceAccountName` template in `_helpers/common.tpl`
    ##
    name: ""

    ## annotations for the ServiceAccount
    ##
    ## EXAMPLE: (to use WorkloadIdentity in Google Cloud)
    ##   annotations:
    ##     iam.gke.io/gcp-service-account: <<GCP_SERVICE>>@<<GCP_PROJECT>>.iam.gserviceaccount.com
    ##
    annotations: {}
  ```
- Note what the error message itself says: the `tj-airflow:tj-airflow-cluster` service account is denied `list pods` in the `default` namespace, while the chart's RBAC (above) is described as covering the Airflow namespace — which suggests the `namespace="default"` set on the NotebookOps does not match the namespace the chart grants RBAC in (a small in-cluster check is sketched at the end of this section).
- Possible solutions
    - [Scheduler throw API exception (403) #473](https://github.com/puckel/docker-airflow/issues/473)
    - [403 forbidden when i ran in_cluster_config.py in the pod #519](https://github.com/kubernetes-client/python/issues/519)
      ![](https://i.imgur.com/hLyA8du.png)
        - here is my blog for solving this problem: https://blog.csdn.net/u013431916/article/details/80057369
        - [Accessing the API server from inside a Pod via RBAC](https://blog.csdn.net/u013431916/article/details/80057369)
    - [[kubernetes-client / python] in_cluster_config.py](https://github.com/kubernetes-client/python/blob/master/examples/in_cluster_config.py)
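To confirm where the service account does and does not have permission, a minimal in-cluster check in the spirit of the `in_cluster_config.py` example linked above can be run from inside an Airflow worker pod. This is only a diagnostic sketch, not part of the chart; the two namespaces are the ones involved in the 403 above.

```python
from kubernetes import client, config
from kubernetes.client.rest import ApiException

config.load_incluster_config()            # authenticate with the pod's own service-account token
v1 = client.CoreV1Api()

for ns in ("default", "tj-airflow"):      # the namespaces involved in the 403 above
    try:
        pods = v1.list_namespaced_pod(ns)
        print(f"{ns}: OK, {len(pods.items)} pods visible")
    except ApiException as e:
        print(f"{ns}: forbidden (403)" if e.status == 403 else f"{ns}: error {e.status}")
```

If `default` is forbidden but `tj-airflow` is allowed, either point the generated NotebookOps at the `tj-airflow` namespace or grant the service account a Role/RoleBinding for pods in `default`.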