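# Automatically Setting the NVIDIA GPU Power Limit at Boot

Excessive GPU power draw can crash the system. If the machine crashes frequently and you cannot find the cause, try lowering the GPU power limit to reduce power draw, at the cost of potentially slower model training.

Reference: https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Power-Limiting-with-Systemd-and-Nvidia-smi-1983/

For the performance impact, see: https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Wattage-Limited-MaxQ-TensorFlow-Performance-1974/

First, create a file named `nv-power-limit.sh`:

```bash
#!/usr/bin/env bash

# Set power limits on all NVIDIA GPUs

# Make sure nvidia-smi exists
command -v nvidia-smi &> /dev/null || { echo >&2 "nvidia-smi not found ... exiting."; exit 1; }

# Target power limit in watts
POWER_LIMIT=280

# Maximum limit reported by the first GPU, truncated to whole watts
MAX_POWER_LIMIT=$(nvidia-smi -q -d POWER | grep 'Max Power Limit' | tr -s ' ' | cut -d ' ' -f 6 | cut -d '.' -f 1 | head -1)

# "+0" keeps the arithmetic comparison valid even if a value is empty.
# -le (rather than -lt) so a limit equal to the maximum is accepted,
# matching the "set above" wording of the error message.
if [[ ${POWER_LIMIT%.*}+0 -le ${MAX_POWER_LIMIT%.*}+0 ]]; then
    # Persistence mode keeps the driver loaded so the limit stays in effect
    /usr/bin/nvidia-smi --persistence-mode=1
    /usr/bin/nvidia-smi --power-limit=${POWER_LIMIT}
else
    echo 'FAIL! POWER_LIMIT set above MAX_POWER_LIMIT ... '
    exit 1
fi

exit 0
```

Parameters:

- `POWER_LIMIT`: the reduced wattage. The 280 W above was determined by testing on a 3090; typically you check the maximum limit with `nvidia-smi` and tune downward from there.

Make the script executable and move it into a system directory:

```bash
chmod 744 nv-power-limit.sh
sudo mv nv-power-limit.sh /usr/local/sbin/
```

To run it automatically at boot, first create `nv-power-limit.service`:

```
[Unit]
Description=NVIDIA GPU Set Power Limit
After=syslog.target systemd-modules-load.service
ConditionPathExists=/usr/bin/nvidia-smi

[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
ExecStart=/usr/local/sbin/nv-power-limit.sh

[Install]
WantedBy=multi-user.target
```

Set its permissions and move it into a system directory:

```bash
chmod 644 nv-power-limit.service
# Create the target directory first in case it does not exist yet
sudo mkdir -p /usr/local/etc/systemd
sudo mv nv-power-limit.service /usr/local/etc/systemd/
sudo ln -s /usr/local/etc/systemd/nv-power-limit.service /etc/systemd/system/nv-power-limit.service
```

Try starting the service:

```bash
sudo systemctl start nv-power-limit.service
```

Then check its status:

```bash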
sudo systemctl status nv-power-limit.service
```

The output should look something like this:

```
● nv-power-limit.service - NVIDIA GPU Set Power Limit
     Loaded: loaded (/usr/local/etc/systemd/nv-power-limit.service; linked; vendor preset: enabled)
     Active: inactive (dead)

Nov 23 16:11:25 pslabs-ml1 systemd[1]: Started NVIDIA GPU Set Power Limit.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14583]: Enabled persistence mode for GPU 00000000:53:00.0.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14583]: All done.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14587]: Power limit for GPU 00000000:53:00.0 was set to 280.00 W from 350.00 W.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14587]: All done.
Nov 23 16:11:27 pslabs-ml1 systemd[1]: nv-power-limit.service: Succeeded.
```

Once you have confirmed that everything works, enable the service so that it runs automatically at boot:

```bash
sudo systemctl enable nv-power-limit.service
```
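
After a reboot, it can be worth confirming that the limit was actually applied on every GPU. The following is a minimal sketch of such a check, not part of the referenced article: `check_power_limits` is a hypothetical helper name, the expected value of 280 simply mirrors the `POWER_LIMIT` used above, and it assumes your driver's `nvidia-smi` supports the `index` and `power.limit` fields of `--query-gpu`.

```bash
#!/usr/bin/env bash
# Sketch: verify that every GPU reports the expected power limit.
# check_power_limits is a hypothetical helper, not an nvidia-smi feature.

# Reads "index, limit" CSV lines on stdin, prints one line per GPU, and
# returns non-zero if any limit differs from the expected wattage ($1).
check_power_limits() {
    local expected=$1 status=0 index limit
    while IFS=', ' read -r index limit; do
        [[ -z $index ]] && continue            # skip blank lines
        # Compare whole watts as strings so values like "[N/A]" do not
        # trigger an arithmetic error.
        if [[ ${limit%.*} == "$expected" ]]; then
            echo "GPU ${index}: OK (${limit} W)"
        else
            echo "GPU ${index}: MISMATCH (${limit} W, expected ${expected} W)"
            status=1
        fi
    done
    return "$status"
}

if command -v nvidia-smi &> /dev/null; then
    # 280 is assumed to match POWER_LIMIT in nv-power-limit.sh
    nvidia-smi --query-gpu=index,power.limit --format=csv,noheader,nounits \
        | check_power_limits 280
else
    echo "nvidia-smi not found; skipping check." >&2
fi
```

Keeping the comparison in a function that reads from standard input means the parsing logic can be exercised with sample text on a machine without a GPU.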