Linux System Automation and Operations

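# Telegram Real-Time Monitoring Notification System

In system administration, e-mail notifications are too slow; an instant messaging app lets you receive important alerts the moment they happen.

## Step 1: Create a Telegram Bot

1. In Telegram, search for @BotFather and start a chat.
2. Send the command /newbot to create a new bot.
3. Give the bot a display name, then choose its username. The username must end in `bot`, e.g. `Telegram_bot`.
4. Once the bot is created, BotFather replies with a long API token.
5. Create a group: create a new group in Telegram and add the bot to it.
6. Open a browser and go to the following URL (replace the placeholder with your own token): `https://api.telegram.org/bot<YOUR_HTTP_API_TOKEN>/getUpdates`. After refreshing the page, find the "chat" object; its "id" field is the Chat ID (for a group, usually a negative number).
7. Run the following code and the message arrives in the group:

```python
import requests

def send_msg(msg: str, token: str, chat_id: str) -> None:
    """Send a message through a Telegram Bot."""
    # Make sure the message is a string
    assert isinstance(msg, str), "msg must be a string"
    url = f"https://api.telegram.org/bot{token}/sendMessage"
    # Pass chat_id and text as query parameters so requests URL-encodes them;
    # characters such as &, # or + in the message would otherwise break the URL
    resp = requests.get(url, params={"chat_id": chat_id, "text": msg}, timeout=10)
    resp.raise_for_status()  # fail loudly on an invalid token or chat ID

my_str = "National Quemoy University CSIE!"
my_token = "YOUR_TOKEN"
my_chat_id = "YOUR_CHAT_ID"

# --- Send the message ---
send_msg(my_str, my_token, my_chat_id)
print("Message sent!")
```

# Deploying a Large Language Model Locally (Ollama)

1. After installing Ollama, run `ollama run llama3:8b` to download and start the model.
2. Then run the following Python code to chat with Ollama:

```python
# langchain_community's Ollama class is deprecated in newer LangChain releases
# in favor of langchain_ollama.OllamaLLM, but still works
from langchain_community.llms import Ollama

# Connect to the local Ollama service (default port 11434)
llm = Ollama(base_url="http://127.0.0.1:11434", model="llama3:8b")
res = llm.invoke("Who are you? Please answer in Traditional Chinese.")
print(res)
```

# Web UI and a RAG Knowledge Base

## Installing Page Assist - Ollama Web UI

Page Assist is a Chrome extension that automatically detects the Ollama service running on your computer and provides a feature-rich web interface.

## Using RAG

Hallucination: a small model has limited training data and no up-to-date information. When asked a question outside its knowledge, it generates an answer that sounds plausible based on probabilities instead of admitting that it doesn't know.

### Solution - RAG (Retrieval-Augmented Generation)

Give the LLM external documents and let it answer based on their content.

1. Download an embedding model: the first step of RAG is to "vectorize" our documents, which requires a dedicated embedding model: `ollama pull nomic-embed-text`
2. Set up the knowledge base: in the Page Assist interface, click the gear icon to open Settings. Under RAG Settings, select the nomic-embed-text model you just downloaded in the Embedding Model field.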
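3. Click Add Knowledge, upload the text file `金門大學資工老師.txt` (a list of NQU CSIE faculty), and give it a title (e.g. nqu_csie_teacher).

```
Without RAG: ask the model directly "Where did 柯志亨 get his PhD?" -> the model makes up an answer.
With RAG: click the Knowledge button below the input box, check the nqu_csie_teacher knowledge base you just created, and ask "Where did 柯志亨 get his PhD?" again.
Result: the model correctly answers "柯志亨 received his PhD from the Institute of Electrical Engineering, National Cheng Kung University", because it found the answer in the text file we provided.
```

### RAG Challenges - Tabular Data

We uploaded the NQU academic calendar (金門大學行事曆), a document containing many tables.

- Observation: when asked about the first day of the semester or about holidays, the model's answers were still full of errors, even with RAG.
- Conclusion: the RAG setup used here handles unstructured plain text fairly well, but its understanding of structured tabular data is still limited.

### Key Parameter: Temperature

Temperature is one of the most important LLM parameters. It rescales the probability distribution from which the model samples the next token. Low values concentrate probability on the most likely tokens, and high values spread it across more candidates.

- Close to 0: answers are more precise and deterministic; suited to knowledge Q&A and code generation.
- Close to 1: answers are more creative and imaginative; suited to writing poems and marketing copy.

# Prometheus Core Architecture

1. Prometheus Server: the brain of the monitoring system. It actively pulls metrics from each monitored target.
2. Exporter: an agent installed on a monitored host (e.g. your web server or database host). It collects local metrics (CPU, memory, network traffic, etc.) and exposes them on an HTTP endpoint for the server to scrape. The most common one is node-exporter, which monitors the state of the host itself.
3. Push Gateway: for short-lived jobs whose metrics the server cannot pull directly. These jobs push their metrics to the gateway, and the server then pulls them from the gateway.
4. Alertmanager: the alert manager. You define alerting rules on the Prometheus server (e.g. CPU usage above 80% for 5 consecutive minutes). Once a rule fires, the server sends the alert to Alertmanager, which notifies you via e-mail, Telegram, and so on.
5. Web UI / Grafana: data visualization tools. Prometheus ships with a simple web UI, but in industry Grafana is more commonly used to build rich, informative monitoring dashboards.

## Step 2: Install and Configure Prometheus (Master Side)

You need at least two virtual machines: one as the Master (Prometheus Server) and one as the monitored Client (Node Exporter).

### Time Synchronization

Make sure the clocks of all machines are synchronized. This is essential for a monitoring system, since every collected sample is timestamped.

```sh
sudo timedatectl set-timezone Asia/Taipei
sudo apt install ntpdate
sudo ntpdate tock.stdtime.gov.tw
```

### Install the Prometheus Packages

On the Master, install the server and the local exporter:

```sh
sudo apt update
sudo apt install prometheus prometheus-node-exporter
```

Browse to `http://<Master IP>:9090` and you should see the Prometheus web UI. On the Client, installing `prometheus-node-exporter` alone is enough.

### Configure Monitoring Targets (Targets)

Tell the Prometheus server which machines to monitor by editing its configuration file. For example, add the Client's `<Client IP>:9100` (the node exporter's default port) to the `targets` list of a job under `scrape_configs`:

`sudo vim /etc/prometheus/prometheus.yml`

After editing, restart the service so the new settings take effect:

`sudo systemctl restart prometheus`

# An AI Agent with Live Web Access

1. Apply for a Google API key, which authorizes your program to use Google's cloud services (https://console.cloud.google.com/apis/credentials). Once there, click `Create credentials` -> `API key`.
2. Create a new search engine so Google knows which Programmable Search Engine to query (https://programmablesearchengine.google.com/controlpanel/create). Give it any name, choose `Search the entire web`, then click `Create`. The search engine ID shown afterwards is the value for `GOOGLE_CSE_ID`.

:::info
You must enable the Custom Search API service in the Google Cloud Console.
:::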
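3. Apply for an OpenAI API key so the program can call GPT models for language understanding and generation (https://platform.openai.com/api-keys).

## Implementation

```python
import os
# These import paths come from older LangChain releases; newer releases moved
# these classes into the langchain_community and langchain_openai packages
from langchain.chat_models.openai import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.utilities import GoogleSearchAPIWrapper
from langchain.retrievers.web_research import WebResearchRetriever
from langchain.chains import RetrievalQAWithSourcesChain

# Credentials from the three steps above (replace the placeholders)
os.environ["OPENAI_API_KEY"] = "OpenAI API Key"
os.environ["GOOGLE_CSE_ID"] = "Programmable Search Engine ID"
os.environ["GOOGLE_API_KEY"] = "Google API Key"

# temperature=0 for precise, deterministic answers
llm = ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0)
# Vector store for the fetched web pages, persisted to ./chroma_db_oai
vectorstore = Chroma(embedding_function=OpenAIEmbeddings(), persist_directory="./chroma_db_oai")
search = GoogleSearchAPIWrapper()

# The LLM generates search queries; the retriever fetches and indexes the result pages
web_research_retriever = WebResearchRetriever.from_llm(
    vectorstore=vectorstore, llm=llm, search=search
)
# Answer from the retrieved pages and report which URLs were used
qa_chain = RetrievalQAWithSourcesChain.from_chain_type(llm, retriever=web_research_retriever)

user_input_question = "What is the capital of Taiwan?"
result = qa_chain({"question": user_input_question})

print("[Answer]:")
print(result["answer"])
print("-" * 30)
print("[Sources]:")
print(result["sources"])
```

# Advanced Prometheus - Calculating CPU Usage

CPU usage = 100% - CPU idle rate

1. Find the raw CPU time metric: `node_cpu_seconds_total`. It is a Counter (a value that only increases) that records the total seconds each CPU has spent in each mode (such as idle, user, system).
2. Compute the increase over a time window: `increase(node_cpu_seconds_total{mode="idle"}[5m])`. The `increase` function computes how much a Counter grew over the last 5 minutes.
3. Compute the idle rate by dividing the idle-time increase by the total-time increase: `sum by (instance, cpu) (increase(node_cpu_seconds_total{mode="idle"}[5m])) / sum by (instance, cpu) (increase(node_cpu_seconds_total[5m]))`. Both sides must first be summed over the modes. A plain division only pairs series with identical labels, so it would divide idle by idle. The result is the idle rate of each individual CPU core.
4. Aggregate across cores (SUM): a host may have several CPU cores, so we add up their data: `sum by (instance) (increase(node_cpu_seconds_total{mode="idle"}[5m]))`. `sum by (instance)` groups and sums by the `instance` label, i.e. the scraped target's address (host IP and port).

Final formula:

```
100 * (1 - (sum by (instance)(increase(node_cpu_seconds_total{mode="idle"}[5m])) / sum by (instance)(increase(node_cpu_seconds_total[5m]))))
```

# Push Gateway - Proactive Status Reporting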
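Under the hood, each push is a single HTTP POST. This is a hedged sketch that uses only the Python standard library and is not part of the original setup. It performs the same ping check as the `push_ping_status.sh` script in this section and builds the same URL and text-format body. `build_push_request`, `ping_status` and `push_metric` are hypothetical helper names, and the addresses are placeholders.

```python
import socket
import subprocess
import urllib.parse
import urllib.request


def build_push_request(gateway: str, job: str, instance: str,
                       metric: str, value: int) -> tuple[str, bytes]:
    """Return (url, body) for pushing one metric to a Prometheus Push Gateway."""
    # Grouping key goes in the path: /metrics/job/<job>/instance/<instance>.
    # quote() percent-encodes characters such as spaces; label values that contain
    # "/" would need the Push Gateway's base64 form instead (not handled here).
    url = (f"http://{gateway}/metrics/job/{urllib.parse.quote(job, safe='')}"
           f"/instance/{urllib.parse.quote(instance, safe='')}")
    # Text exposition format: "<metric_name> <value>"; every line must end in a newline
    body = f"{metric} {value}\n".encode()
    return url, body


def ping_status(target_ip: str) -> int:
    """Return 1 if a single ping (1 s timeout) succeeds, else 0, like the shell script."""
    result = subprocess.run(["ping", "-c1", "-W1", target_ip],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return 1 if result.returncode == 0 else 0


def push_metric(url: str, body: bytes) -> int:
    """POST the body to the Push Gateway (as curl --data-binary does); return the HTTP status."""
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


if __name__ == "__main__":
    # Placeholder addresses copied from the shell script; replace with your own
    status = ping_status("192.168.164.134")
    url, body = build_push_request("192.168.164.134:9091", "pushgateway_test",
                                   socket.getfqdn(), "ens33_ping_status", status)
    print("pushing", body, "to", url)
    print("HTTP status:", push_metric(url, body))
```

Keeping URL and body construction separate from the network call lets you check exactly what will be sent before pointing the script at a real gateway.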
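1. Create `push_ping_status.sh`:

```sh
#!/bin/bash
instance_name=$(hostname -f)
label="ens33_ping_status"                # metric name to push
target_ip="192.168.164.134"              # replace with your target IP
pushgateway_addr="192.168.164.134:9091"  # replace with your Push Gateway IP:Port

# Ping the target once with a 1-second timeout: status 1 = reachable, 0 = unreachable
if ping -c1 -W1 "${target_ip}" > /dev/null 2>&1; then
  status=1
else
  status=0
fi

# Push URL format: http://<gateway>/metrics/job/<job_name>/instance/<instance_name>
echo "$label $status" | curl --data-binary @- "http://${pushgateway_addr}/metrics/job/pushgateway_test/instance/${instance_name}"
echo "Metric pushed: $label = $status"
```

This assumes a Push Gateway is listening on port 9091 (on Ubuntu, the `prometheus-pushgateway` package) and that Prometheus scrapes it as a target. Run the script with `bash push_ping_status.sh`.

2. Back in the Prometheus UI, enter the custom metric `ens33_ping_status` in the query box to see the value reported by the ubuntu2 machine.

# Integrating Web Search with a Local LLM (Agent)

The script below runs a Google search for the question and passes the results to a small local model as context. The model then answers from fresh search results instead of its own memory. Pull the model first with `ollama pull llama3.2:1b`.

```python
import os
import ollama
from langchain_core.tools import Tool
from langchain_google_community import GoogleSearchAPIWrapper

os.environ["GOOGLE_CSE_ID"] = "YOUR_CSE_ID"
os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY"

# Wrap Google search as a LangChain tool
search = GoogleSearchAPIWrapper()
tool = Tool(
    name="google_search",
    description="Search Google for recent results.",
    func=search.run,
)

question = "where is the capital city of France?"
# The search result snippets become the model's context
context = tool.run(question)
print("context=", context)

prompt = f"""Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Answer only factual information based on the context.
Context: {context}.
Question: {question}
Helpful Answer:"""

# Ask the local llama3.2:1b model through Ollama
response = ollama.chat(model='llama3.2:1b', messages=[
    {
        'role': 'system',
        'content': 'You are a useful AI assistant, answer only based on the information from the user prompt and nothing else.',
    },
    {
        'role': 'user',
        'content': prompt,
    },
])
output = response['message']['content']
print("=" * 50)
print("result=", output)
```

# Agents and Function Calling

We write some tool functions and describe what each one does in plain text. When the user makes a request, the LLM interprets it and picks the most suitable tool from the toolbox we provide. In the code below, the model only returns the chosen tool's name and arguments as JSON. Actually calling the tool is a separate step.

```python
from langchain_ollama import OllamaLLM
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain.tools.render import render_text_description
from langchain_core.output_parsers import JsonOutputParser

model = OllamaLLM(model='mistral:instruct')

# The @tool decorator marks a function as a tool; its docstring becomes the description
@tool
def add(first: int, second: int) -> int:
    "Add two integers."
    return first + second

@tool
def multiply(first: int, second: int) -> int:
    """Multiply two integers together."""
    return first * second

@tool
def converse(input: str) -> str:
    "Provide a natural language response using the user input."
    return model.invoke(input)

tools = [add, multiply, converse]
# Render each tool's name, signature and description as plain text for the prompt
rendered_tools = render_text_description(tools)

system_prompt = f"""You are an assistant that has access to the following set of tools.
Here are the names and descriptions for each tool:

{rendered_tools}

Given the user input, return the name and input of the tool to use.
Return your response as a JSON blob with 'name' and 'arguments' keys.
The value associated with the 'arguments' key should be a dictionary of parameters."""

prompt = ChatPromptTemplate.from_messages(
    [("system", system_prompt), ("user", "{input}")]
)

# prompt -> model -> parse the model's reply as JSON
chain = prompt | model | JsonOutputParser()
print(chain.invoke({'input': 'What is 3 times 23'}))
print(chain.invoke({'input': 'How are you today?'}))
```

# Prometheus Alerting System

# Ansible

Ansible manages and maintains machines over SSH and can control a whole group of computers at once. Set up three machines, each with its hostname already configured.

> Passwordless login

[Passwordless login](https://github.com/stereomp3/note/blob/main/linux/111semester01/1-.md#SSH-server)

Set the hostname on each machine (centos7-1, centos7-2, centos7-3); starting a new `bash` shell shows the new name in the prompt:

```sh
$ hostnamectl set-hostname centos7-1
$ bash
```

Generate an SSH key (you can press Enter to skip every prompt):

```sh
$ ssh-keygen
```

Add every machine's IP address and hostname to `/etc/hosts`:

```sh
$ vim /etc/hosts
```

![/etc/hosts example](https://github.com/stereomp3/note/raw/main/linux/111semester01/picture/etc_host.png)

This lets SSH connect directly by hostname. Copy the public key to the other machines:

```sh
$ ssh-copy-id root@centos7-2
$ ssh-copy-id root@centos7-3
```

Connect over SSH:

```sh
$ ssh root@centos7-2
```

Start sshd on the clients and install Ansible on the server. On CentOS 7 the `ansible` package comes from the EPEL repository, so run `yum install epel-release` first if needed:

```sh
$ systemctl start sshd   # client, [centos7-2, centos7-3]
$ yum install ansible    # server, [centos7-1]
```

> Set up Ansible hosts

```sh
$ vim /etc/ansible/hosts
```

Servers are usually grouped by function or operating system. Only two VMs are managed here, so we simply define two groups, plus a `servers` group that contains both:

```
[server1]
192.168.42.136 # centos7-2

[server2]
192.168.42.135 # centos7-3

[servers]
192.168.42.136
192.168.42.135
```

Test Ansible:

```sh
$ ansible server1 -m ping   # -m selects a module; here the ping module
$ ansible servers -m ping
```

> Ansible test

Ansible can be operated in two modes.

ad hoc: quick tests of individual modules; this is the mode used above.

```sh
$ ansible servers -m shell -a "chdir=/var/log cmd='ls -l | grep log'"
```

playbook: the tasks are written as a script and executed. This is how Ansible is used in practice, and playbooks are usually written in YAML. The same task as module arguments:

```yaml
shell:
  cmd: ls -l | grep log
  chdir: /var/log
```

Check whether Ansible has MySQL modules:

```sh
$ ansible-doc -l | grep -i mysql
$ ansible-doc ping      # full documentation of the ping module
$ ansible-doc -s ping   # -s (--snippet): a short playbook snippet for the ping module
```

Remote commands with `command`: `command` is the default module, so `-m command` can be omitted:

```sh
$ ansible servers -m command -a "ifconfig"   # -a passes the module's arguments
$ ansible servers -a "ifconfig ens33"
```

Not every command works with `command`. It does not run through a shell, so commands containing shell operators such as `|` or `>` fail. `command` is only suitable for simple operations, so use the `shell` module for the rest:

```sh
$ ansible servers -m shell -a "ifconfig | grep -A 3 enp0s3"
$ ansible-doc -s shell   # see which parameters the shell module accepts
$ ansible servers -m shell -a "chdir=/tmp pwd"   # chdir changes the working directory to /tmp first
```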
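The `command` vs `shell` difference mirrors how Python's `subprocess` module behaves. It can start a program directly, or it can hand a whole command line to `/bin/sh`. The sketch below is only a local analogy and does not call Ansible. `run_like_command` and `run_like_shell` are hypothetical names.

```python
import subprocess


def run_like_command(argv: list[str]) -> str:
    """Start the program directly with no shell (like Ansible's command module)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout


def run_like_shell(cmdline: str) -> str:
    """Run the whole command line through /bin/sh (like Ansible's shell module)."""
    result = subprocess.run(cmdline, shell=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # No shell: "|", "grep" and "log" are just extra arguments passed to echo
    print(run_like_command(["echo", "syslog", "|", "grep", "log"]), end="")
    # Through a shell: the pipe works, so grep keeps only the line containing "log"
    print(run_like_shell("printf 'boot\\nsyslog\\n' | grep log"), end="")
```

Run as a script, it prints `syslog | grep log` first, because echo received the pipe as plain text. It then prints `syslog`, because the shell really piped the output into grep.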