# 2025 TAM Day | OpenShift EDA + OCP MCP server
## 0. Environment Overview
1. Go to the info page, click the VS Code server link, and log in with the password provided in the table.

2. Go to the info page, click the Automation controller link, and log in with the username and password provided in the table.

3. In the Automation controller console, click Jobs under "Automation Execution". The Jobs section shows the status of every job template triggered by EDA.

4. Go back to the VS Code server console and open a terminal.

5. Split the terminal so that you can run commands and observe their output side by side.

## 1. Automatically Add a Resource Quota
## 2. Automatically Back Up PVCs
## 3. Automatically Configure Ingress Encryption
## 4. Automatically Collect Debug Logs (OCP EDA for auto oc adm inspect)
### Add bastion info in AAP
1. Create Inventory - Bastion


2. Add host in the inventory



3. Create credential for bastion


4. Create Host group in bastion inventory and add bastion to the host group






### Create new playbook for oc adm inspect


1. Clone the event-driven-ansible repo in your VS Code terminal
```
cd ~
git clone https://gitea.apps.cluster-7f74v.7f74v.sandbox734.opentlc.com/lab-user/event-driven-ansible.git
```

2. Under event-driven-ansible/automation_controller create playbook `oc-inspect.yml`

oc-inspect.yml
```yaml=
- name: oc adm inspect
  hosts: bastion
  gather_facts: no
  vars:
    ns: "{{ ansible_eda.event.resource.metadata.namespace }}"
  tasks:
    - name: Create inspect file
      shell:
        "oc adm inspect ns/{{ ns }} --kubeconfig /home/lab-user/.kube/config"
      register: lsout
```
3. Commit and push the playbook to the Git project
```
cd ~/event-driven-ansible
git add *
git commit -am "playbook for oc adm inspect"
git push
```
VS Code will ask you to log in to Gitea.

After the push completes, check your commit ID.
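
To make the data flow of the `oc-inspect.yml` playbook above concrete, here is a small Python sketch of how the `ns` variable resolves from the event that EDA hands over as `ansible_eda.event`, and what command string results. The function name and the sample payload are illustrative assumptions, not part of the lab; the real payload carries many more fields.

```python
def build_inspect_command(ansible_eda: dict,
                          kubeconfig: str = "/home/lab-user/.kube/config") -> str:
    """Render the command the playbook task would run for a given EDA payload."""
    # Same lookup path as the playbook variable:
    # ansible_eda.event.resource.metadata.namespace
    ns = ansible_eda["event"]["resource"]["metadata"]["namespace"]
    return f"oc adm inspect ns/{ns} --kubeconfig {kubeconfig}"


# Hypothetical, trimmed-down payload used only for illustration.
sample = {"event": {"resource": {"metadata": {"namespace": "jace"}}}}
print(build_inspect_command(sample))
```

If the event has no `resource.metadata.namespace`, the lookup fails, which is the Python analogue of the playbook failing on an undefined variable.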

### Create job template in AAP
1. After pushing to the Git server, sync the AAP project.

Create a new job template named `oc-inspect`.


### Create new rulebook
In the VS Code console, switch to `eda-rulebooks/rulebooks`
and create a file named `oc-inspect.yml`.

Edit `oc-inspect.yml` so that it contains the following:
```yaml=
---
- name: Listen for unhealthy+warning event
  hosts: all
  sources:
    - sabre1041.eda.k8s:
        api_version: v1
        kind: Event
        namespace: jace  # replace with the name of the namespace you plan to use
  rules:
    - name: Debug
      condition:
        event.resource.reason == "Unhealthy" and event.resource.type == "Warning"
      throttle:
        once_within: 5 minutes
        group_by_attributes:
          - event.resource.metadata.namespace
          - event.resource.involvedObject.name
      action:
        run_job_template:
          name: oc-inspect  # must match the template name in AAP
          organization: Default
```
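
The rule above combines a condition with a throttle. As a rough mental model only, the Python sketch below mirrors the condition and a simplified "first event per group per window" throttle. The class and helper names are hypothetical, and the actual ansible-rulebook engine may differ in detail.

```python
from datetime import datetime, timedelta


def matches(event: dict) -> bool:
    """Mirror of the rule condition: reason == "Unhealthy" and type == "Warning"."""
    resource = event.get("resource", {})
    return resource.get("reason") == "Unhealthy" and resource.get("type") == "Warning"


class OnceWithin:
    """Simplified throttle: let one event per group through per time window."""

    def __init__(self, window: timedelta):
        self.window = window
        self.last_fired = {}

    def allow(self, event: dict, now: datetime) -> bool:
        resource = event["resource"]
        # Same grouping attributes as group_by_attributes in the rulebook.
        key = (resource["metadata"]["namespace"], resource["involvedObject"]["name"])
        last = self.last_fired.get(key)
        if last is not None and now - last < self.window:
            return False  # still inside the window for this namespace/pod pair
        self.last_fired[key] = now
        return True


def sample_event(reason: str, type_: str, pod: str = "liveness-exec") -> dict:
    """Build a hypothetical, trimmed-down event for illustration."""
    return {"resource": {"reason": reason, "type": type_,
                         "metadata": {"namespace": "jace"},
                         "involvedObject": {"name": pod}}}


throttle = OnceWithin(timedelta(minutes=5))
t0 = datetime(2025, 1, 1, 12, 0, 0)
ev = sample_event("Unhealthy", "Warning")
print(matches(ev) and throttle.allow(ev, t0))                         # True
print(matches(ev) and throttle.allow(ev, t0 + timedelta(minutes=1)))  # False
print(matches(sample_event("Pulled", "Normal")))                      # False
```

Grouping by namespace and pod name means a second failing pod in the same namespace still triggers its own job, while repeated probe failures of one pod do not flood AAP with jobs.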

#### Commit and push your rulebook
In the VS Code terminal:
```bash=
cd ~/eda-rulebooks
git add *
git commit -am "rulebook for oc adm inspect"
git push
```
VS Code will ask you to log in to Gitea.

### Create Rulebook Activation in AAP
#### Sync the rulebook project again

Click `Create rulebook activation`.

Select and fill in the fields as in the following example.

After that, you can see that your rule is activated.

### Test (create a pod whose probe fails automatically)
In your VS Code terminal:
```bash=
oc new-project jace
cat << EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    securityContext:
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
EOF
```
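
To see roughly when this pod starts producing `Unhealthy` warning events, the following Python sketch walks through an idealized probe timeline. It assumes probes fire exactly at `initialDelaySeconds` and then every `periodSeconds`, that `/tmp/healthy` disappears exactly 30 seconds after start, and that the Kubernetes default `failureThreshold` of 3 applies because the manifest does not set one. Real kubelet timing will drift from this by a few seconds.

```python
def probe_timeline(healthy_seconds=30, initial_delay=5, period=5,
                   failure_threshold=3, horizon=60):
    """Return (first_failure, restart_time) in seconds after container start.

    Idealized model: the file exists while t < healthy_seconds, and probes run
    at initial_delay, initial_delay + period, and so on up to horizon.
    """
    first_failure = None
    consecutive = 0
    for t in range(initial_delay, horizon + 1, period):
        if t < healthy_seconds:
            consecutive = 0          # probe succeeds, failure streak resets
            continue
        if first_failure is None:
            first_failure = t        # first probe that cannot find the file
        consecutive += 1
        if consecutive >= failure_threshold:
            return first_failure, t  # kubelet would restart the container here
    return first_failure, None


print(probe_timeline())  # (30, 40)
```

Under these assumptions the first failed probe, and therefore the first event that matches the rulebook condition, appears about half a minute after the pod starts.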


### Verify that the inspect succeeded


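## 5. [Optional] Automatically Package the Inspect Files and Create a Support Case
0. Generate your Red Hat token on the Red Hat website:
https://access.redhat.com/management/api

1. Continuing from lab 4, replace the contents of `oc-inspect.yml` with the following: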
```yaml=
- name: oc adm inspect
  hosts: bastion
  gather_facts: no
  vars:
    ns: "{{ ansible_eda.event.resource.metadata.namespace }}"
  tasks:
    - name: Create inspect file
      shell:
        "rm -rf ./inspect.local.{{ ns }} && oc adm inspect ns/{{ ns }} --dest-dir=./inspect.local.{{ ns }} --kubeconfig /home/lab-user/.kube/config"
      register: lsout
    - name: Compress with tar
      command: "tar -czf inspect.local.{{ ns }}.tar.gz inspect.local.{{ ns }}"
      when: lsout.rc == 0
    - name: Create support case
      shell: |
        RH_PORTAL_TOKEN="<rh-customer-portal-token>"  # replace with your token
        TOKEN=$(curl https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token -d grant_type=refresh_token -d client_id=rhsm-api -d refresh_token=$RH_PORTAL_TOKEN | jq --raw-output .access_token)
        response=$(curl -sS -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $TOKEN" --data '{
          "product": "OpenShift Container Platform",
          "version": "4.12",
          "caseType": "RCA Only",
          "description": "My pod crashed last night, I was wondering about RCA",
          "environment": "staging",
          "caseLanguage": "zh_TW",
          "severity": 3,
          "summary": "Summary message here."
        }' "https://api.access.redhat.com/support/v1/cases")
        echo "$response" | jq -r '.location[0] | capture("/cases/(?<case_no>[0-9]+)") | .case_no'
      register: case_number
      when: lsout.rc == 0
    - name: Upload logs to support case
      shell: |
        RH_PORTAL_TOKEN="<rh-customer-portal-token>"  # replace with your token
        TOKEN=$(curl https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token -d grant_type=refresh_token -d client_id=rhsm-api -d refresh_token=$RH_PORTAL_TOKEN | jq --raw-output .access_token)
        CASE_NO={{ case_number.stdout }}
        curl -X POST -F "file=@inspect.local.{{ ns }}.tar.gz" -H "Authorization: Bearer $TOKEN" https://api.access.redhat.com/support/v1/cases/${CASE_NO}/attachments
      when: lsout.rc == 0 and case_number.stdout is defined
```
2. Replace the original sample pod with a Deployment and a ConfigMap
**html-configmap.yaml**
```bash
cat << EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: html-content
data:
  index.html: |
    <h1>Hello world! Welcome to K8s Summit 2023</h1>
EOF
```
```bash=
# nginx-hello-world.yaml
cat << EOF | oc apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-hello-world
  labels:
    app: nginx-hello-world
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-hello-world
  template:
    metadata:
      labels:
        app: nginx-hello-world
    spec:
      volumes:
      - name: html-volume
        configMap:
          name: html-content
      containers:
      - name: nginx
        image: "quay.io/redhattraining/hello-nginx:v1.0"
        securityContext:
          allowPrivilegeEscalation: false
          runAsNonRoot: true
          seccompProfile:
            type: RuntimeDefault
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - name: html-volume
          mountPath: /usr/share/nginx/html
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "500m"
        livenessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            - curl -s http://localhost:8080 | grep -q "world"
          initialDelaySeconds: 5
          periodSeconds: 5
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        tty: true
        stdin: true
      serviceAccount: default
      terminationGracePeriodSeconds: 5
EOF
```
```bash=
oc expose deploy/nginx-hello-world --port 8080
oc expose svc nginx-hello-world
```
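
The `Create support case` task in the playbook of this section pulls the case number out of the API response with a `jq` capture. The Python sketch below shows the same extraction so the logic is easier to follow. The response shape (a `location` list holding the new case URL) is inferred from that `jq` expression, and the sample value is made up.

```python
import re


def extract_case_no(response: dict):
    """Python equivalent of the playbook's jq filter:
    .location[0] | capture("/cases/(?<case_no>[0-9]+)") | .case_no
    Returns None when no case number can be found.
    """
    locations = response.get("location") or []
    if not locations:
        return None
    match = re.search(r"/cases/(?P<case_no>[0-9]+)", locations[0])
    return match.group("case_no") if match else None


# Hypothetical response body used only for illustration.
sample = {"location": ["https://api.access.redhat.com/support/v1/cases/01234567"]}
print(extract_case_no(sample))  # 01234567
```

One related observation: a registered `stdout` is defined even when it is empty, so a stricter guard on the upload task, such as `case_number.stdout | length > 0`, may be worth considering.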
## 6.0 EDA Automatically Integrates with the OCP MCP Server for Error Analysis
### [On Bastion or your VS Code terminal] Install Gemini-CLI
1. Install Node.js
```
# Download and install nvm:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.3/install.sh | bash
# in lieu of restarting the shell
\. "$HOME/.nvm/nvm.sh"
# Download and install Node.js:
nvm install 22
# Verify the Node.js version:
node -v # Should print "v22.17.1".
nvm current # Should print "v22.17.1".
# Verify npm version:
npm -v # Should print "10.9.2".
```
2. Install Gemini-CLI
```
npm install -g @google/gemini-cli
```
3. Get your API key
https://aistudio.google.com/app/apikey

```
export GEMINI_API_KEY="<your-API-key>"
echo 'export GEMINI_API_KEY="<your-API-key>"' >> ~/.bashrc
```
4. Edit the Gemini settings
```
mkdir -p ~/.gemini
vi ~/.gemini/settings.json
{
  "theme": "GitHub",
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": [
        "-y",
        "rh-tam-kubernetes-mcp-server@latest"
      ]
    }
  }
}
```
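
If you would rather generate the settings file than type it into `vi`, a small script can write it and confirm that it parses as JSON. This is only a sketch: the helper name is hypothetical, and the demo writes to a temporary directory so it does not touch a real `~/.gemini`. Point it at `Path.home() / ".gemini" / "settings.json"` to use it for real.

```python
import json
import tempfile
from pathlib import Path


def write_gemini_settings(path: Path) -> dict:
    """Write the MCP server settings shown above and return them re-parsed."""
    settings = {
        "theme": "GitHub",
        "mcpServers": {
            "kubernetes": {
                "command": "npx",
                "args": ["-y", "rh-tam-kubernetes-mcp-server@latest"],
            }
        },
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    # Reading the file back confirms that what landed on disk is valid JSON.
    return json.loads(path.read_text(encoding="utf-8"))


# Demo against a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    loaded = write_gemini_settings(Path(tmp) / ".gemini" / "settings.json")
    print(loaded["mcpServers"]["kubernetes"]["command"])  # npx
```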
### Write the Playbook (let Gemini automatically analyze problems in a given namespace)
In the VS Code project path `event-driven-ansible/automation_controller`, create `gemini-analyze.yml`.

```yaml=
- name: Run gemini to troubleshoot given NS
  hosts: bastion
  gather_facts: no
  vars:
    ns: "{{ ansible_eda.event.resource.metadata.namespace }}"
  tasks:
    - name: Gemini analyze NS
      shell: 'gemini -p "Analyze what is abnormal in namespace {{ ns }} on the current OpenShift cluster (summarize the key points only; do not show the reasoning process)"'
      register: lsout
    - name: Show command output
      debug:
        var: lsout.stdout_lines
```
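
The `Gemini analyze NS` task embeds the namespace in a quoted shell string. As an aside, the Python sketch below shows one way to build the same `gemini -p` invocation with the prompt quoted as a single shell argument, which matters if a value ever contains quotes or spaces. The function name is hypothetical and the prompt is an English rendering of the one used in the playbook.

```python
import shlex


def build_gemini_command(ns: str) -> str:
    """Build a shell-safe `gemini -p "<prompt>"` command for a namespace."""
    prompt = (
        f"Analyze what is abnormal in namespace {ns} on the current OpenShift cluster "
        "(summarize the key points only; do not show the reasoning process)"
    )
    # shlex.quote keeps the whole prompt as one argument, whatever it contains.
    return f"gemini -p {shlex.quote(prompt)}"


print(build_gemini_command("jace"))
```

Splitting the result back with `shlex.split` yields exactly three tokens: `gemini`, `-p`, and the full prompt.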
In the VS Code terminal:
```bash=
cd ~/event-driven-ansible
git add *
git commit -am "Add gemini_analyze playbook"
git push
```
Resync the template project.

Create a new template called `gemini-analyze`.


### Write the Rulebook (when triggered, collect oc inspect output and let Gemini analyze the problem)
In the VS Code project path `eda-rulebooks/rulebooks`, create `oc-inspect-analyze.yml`.

```yaml=
---
- name: Listen for unhealthy+warning event
  hosts: all
  sources:
    - sabre1041.eda.k8s:
        api_version: v1
        kind: Event
        namespace: jace  # replace with the name of the namespace you plan to use
  rules:
    - name: Debug
      condition:
        event.resource.reason == "Unhealthy" and event.resource.type == "Warning"
      throttle:
        once_within: 5 minutes
        group_by_attributes:
          - event.resource.metadata.namespace
          - event.resource.involvedObject.name
      actions:
        - run_job_template:
            name: oc-inspect  # must match the template name in AAP
            organization: Default
        - run_job_template:
            name: gemini-analyze
            organization: Default
```
```bash=
cd ~/eda-rulebooks
git add *
git commit -am "add inspect_analyze rulebook"
git push
```
#### Sync the rulebook project again

#### Create a new rulebook activation


You should see that the new rule is activated.

Disable the previous rule, `oc-inspect`.


Let's deploy the pod again to trigger the event.
```
oc delete pod liveness-exec --force
cat << EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    securityContext:
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
EOF
```
After a while, you should see that the `oc-inspect` and `gemini-analyze` jobs were triggered in a row.


## 7. Gemini Summarizes the Problem and Creates a Support Case
### Set up the environment
- Obtain a Red Hat API token
- Configure the bastion environment
```
echo 'export RH_PORTAL_TOKEN="<your-RH-API-token>"' >> ~/.bashrc
```
### Create the playbook
- Create `automation_controller/oc-inspect-create-case.yml`
```
- name: Run gemini to troubleshoot given NS
  hosts: bastion
  gather_facts: no
  vars:
    ns: "{{ ansible_eda.event.resource.metadata.namespace }}"
  tasks:
    - name: Create inspect file
      shell:
        "oc adm inspect ns/{{ ns }} --dest-dir=./inspect.local.{{ ns }} --kubeconfig /home/lab-user/.kube/config"
      register: inspect_file
    - name: Compress with tar
      command: "tar -czf inspect.local.{{ ns }}.tar.gz inspect.local.{{ ns }}"
      when: inspect_file.rc == 0
    - name: Gemini analyze NS
      shell: 'gemini -p "Analyze what is abnormal in namespace {{ ns }} on the current OpenShift cluster (give a single-line summary of the key points only; do not show the reasoning process)"'
      register: lsout
    - name: Show command output
      debug:
        var: lsout.stdout_lines
    - name: create support case
      shell: 'gemini -p "Open an OCP support case titled [test case] My pod crashed last night, I was wondering about RCA, with the description {{lsout.stdout_lines}}, and upload the file inspect.local.{{ ns }}.tar.gz as an attachment"'
      register: case_output
    - name: show create case result
      debug:
        var: case_output
```
- sync project

- create job template
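
One detail of the `create support case` task in the playbook above: `{{lsout.stdout_lines}}` is a list, so it is rendered into the prompt with brackets and quotes. The Python sketch below illustrates flattening the lines into plain text before building the prompt. The function name is hypothetical and the prompt wording is an English rendering of the playbook's. Inside Ansible, `lsout.stdout_lines | join(' ')` or plain `lsout.stdout` would have a similar effect.

```python
def build_case_prompt(stdout_lines, ns,
                      title="[test case] My pod crashed last night, I was wondering about RCA"):
    """Join Gemini's analysis lines into one description and build the case prompt."""
    # Drop empty lines and join the rest so no list brackets or quotes leak in.
    description = " ".join(line.strip() for line in stdout_lines if line.strip())
    return (
        f"Open an OCP support case titled {title}, "
        f"with the description {description}, "
        f"and upload the file inspect.local.{ns}.tar.gz as an attachment"
    )


# Hypothetical analysis output used only for illustration.
lines = ["Pod liveness-exec is restarting.", "", "Liveness probe failed."]
print(build_case_prompt(lines, "jace"))
```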

### Create the rulebook
- Rulebook
```
vi ~/eda-rulebooks/rulebooks/oc_inspect_support_case.yml
---
- name: Listen for unhealthy+warning event
  hosts: all
  sources:
    - sabre1041.eda.k8s:
        api_version: v1
        kind: Event
        namespace: jace  # replace with the name of the namespace you plan to use
  rules:
    - name: Debug
      condition:
        event.resource.reason == "Unhealthy" and event.resource.type == "Warning"
      throttle:
        once_within: 5 minutes
        group_by_attributes:
          - event.resource.metadata.namespace
          - event.resource.involvedObject.name
      actions:
        - run_job_template:
            name: oc-inspect-create-case
            organization: Default
```
- Push
```
cd ~/eda-rulebooks/
git add .
git commit -m "add oc inspect create support case rulebook"
git push
```
- sync project

- create rulebook activation

### Test
```
oc delete pod liveness-exec --force
cat << EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    securityContext:
      allowPrivilegeEscalation: false
      seccompProfile:
        type: RuntimeDefault
      capabilities:
        drop:
        - ALL
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
EOF
```

## References
Rulebook syntax
https://www.redhat.com/en/topics/automation/what-is-an-ansible-rulebook
https://ansible.readthedocs.io/projects/rulebook/en/stable/rules.html