# Automatically Setting the NVIDIA GPU Power Limit at Boot
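Excessive GPU power draw can crash the system. If the machine crashes often and you cannot find the cause, try lowering the power limit to reduce power draw. The trade-off is that model training may run slower.

Reference: https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Power-Limiting-with-Systemd-and-Nvidia-smi-1983/

For the performance difference, see: https://www.pugetsystems.com/labs/hpc/Quad-RTX3090-GPU-Wattage-Limited-MaxQ-TensorFlow-Performance-1974/

First, create a file named `nv-power-limit.sh`: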
```bash
#!/usr/bin/env bash
# Set the same power limit on all NVIDIA GPUs.

# Make sure nvidia-smi exists.
command -v nvidia-smi &> /dev/null || { echo >&2 "nvidia-smi not found ... exiting."; exit 1; }

# Target power limit in watts.
POWER_LIMIT=280
# Maximum limit (integer watts) reported by the first GPU.
MAX_POWER_LIMIT=$(nvidia-smi -q -d POWER | grep 'Max Power Limit' | tr -s ' ' | cut -d ' ' -f 6 | cut -d '.' -f 1 | head -1)

# The trailing +0 keeps the arithmetic valid even if a value is empty.
# -le (not -lt) so that a limit equal to the maximum is accepted.
if [[ ${POWER_LIMIT%.*}+0 -le ${MAX_POWER_LIMIT%.*}+0 ]]; then
    /usr/bin/nvidia-smi --persistence-mode=1
    /usr/bin/nvidia-smi --power-limit=${POWER_LIMIT}
else
    echo 'FAIL! POWER_LIMIT set above MAX_POWER_LIMIT ... '
    exit 1
fi

exit 0
```
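Parameters:
- `POWER_LIMIT`: the reduced limit in watts. The 280 W above was found by testing on a 3090. The usual approach is to check the maximum limit with `nvidia-smi` and tune downward from there.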
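
As a rough sketch of how a candidate value could be checked before putting it into the script, the snippet below prints each GPU's allowed range and tests a value against it. The helper name `in_power_range` is hypothetical and not part of the original script. The `power.min_limit`, `power.max_limit` and `power.limit` query fields are the ones listed by `nvidia-smi --help-query-gpu`.

```bash
#!/usr/bin/env bash
# Sketch: check that a requested power limit falls inside a GPU's allowed range.

# in_power_range REQUESTED MIN MAX   (hypothetical helper)
# Succeeds when MIN <= REQUESTED <= MAX. Decimal parts such as ".00" are dropped.
in_power_range() {
    [ "${1%.*}" -ge "${2%.*}" ] && [ "${1%.*}" -le "${3%.*}" ]
}

# Print index, name, min, max and current limit (in watts) for each GPU,
# but only when nvidia-smi is actually installed.
if command -v nvidia-smi > /dev/null 2>&1; then
    nvidia-smi --query-gpu=index,name,power.min_limit,power.max_limit,power.limit \
               --format=csv,noheader,nounits \
        || echo "nvidia-smi query failed" >&2
fi
```

For example, `in_power_range 280 100 350 && echo ok` prints `ok`, where 100 and 350 stand in for whatever range your own card reports.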
Set the file permissions and move the script into a system directory:
```bash
chmod 744 nv-power-limit.sh
sudo mv nv-power-limit.sh /usr/local/sbin/
```
To run it automatically at boot, first create `nv-power-limit.service`:
```
[Unit]
Description=NVIDIA GPU Set Power Limit
After=syslog.target systemd-modules-load.service
ConditionPathExists=/usr/bin/nvidia-smi
[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
ExecStart=/usr/local/sbin/nv-power-limit.sh
[Install]
WantedBy=multi-user.target
```
Set the permissions and move the unit file into a system directory:
```bash
chmod 644 nv-power-limit.service
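# Create the target directory first in case it does not exist yet
sudo mkdir -p /usr/local/etc/systemd
sudo mv nv-power-limit.service /usr/local/etc/systemd/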
sudo ln -s /usr/local/etc/systemd/nv-power-limit.service /etc/systemd/system/nv-power-limit.service
```
Try starting the service:
```bash!
sudo systemctl start nv-power-limit.service
```
Then check its status:
```bash!
sudo systemctl status nv-power-limit.service
```
The output looks something like this:
```
● nv-power-limit.service - NVIDIA GPU Set Power Limit
Loaded: loaded (/usr/local/etc/systemd/nv-power-limit.service; linked; vendor preset: enabled)
Active: inactive (dead)
Nov 23 16:11:25 pslabs-ml1 systemd[1]: Started NVIDIA GPU Set Power Limit.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14583]: Enabled persistence mode for GPU 00000000:53:00.0.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14583]: All done.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14587]: Power limit for GPU 00000000:53:00.0 was set to 280.00 W from 350.00 W.
Nov 23 16:11:27 pslabs-ml1 nv-power-limit.sh[14587]: All done.
Nov 23 16:11:27 pslabs-ml1 systemd[1]: nv-power-limit.service: Succeeded.
```
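
To pull just the applied limits out of log lines like the ones above, a small filter can help. This is only a sketch: the function name `applied_limits_from_log` is made up for illustration, and the pattern assumes the exact `Power limit for GPU ... was set to ... W from ... W.` wording shown in the sample output, which may differ between driver versions.

```bash
#!/usr/bin/env bash
# applied_limits_from_log (hypothetical helper)
# Reads log lines on stdin and prints "<gpu-bus-id> <new-limit> <old-limit>"
# for every line that reports a power limit change. Other lines are ignored.
applied_limits_from_log() {
    sed -n 's/.*Power limit for GPU \([^ ]*\) was set to \([0-9.]*\) W from \([0-9.]*\) W\..*/\1 \2 \3/p'
}

# Example usage against the journal entries of the current boot:
#   journalctl -u nv-power-limit.service -b --no-pager | applied_limits_from_log
```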
Once everything looks fine, enable it so that it runs at boot:
```bash!
sudo systemctl enable nv-power-limit.service
```
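
After a reboot, it may be worth confirming that the limit was actually applied. The following is a minimal sketch, assuming every GPU should sit at the same limit. `report_mismatched_limits` is a hypothetical helper, not part of the original setup, and it expects the `index, limit` lines that `nvidia-smi --query-gpu=index,power.limit --format=csv,noheader,nounits` prints.

```bash
#!/usr/bin/env bash
# report_mismatched_limits EXPECTED   (hypothetical helper)
# Reads "index, limit" lines on stdin, prints one line for each GPU whose
# limit differs from EXPECTED (integer watts), and returns 1 if any differ.
report_mismatched_limits() {
    local expected=$1 status=0 index limit
    while IFS=', ' read -r index limit; do
        case $limit in
            [0-9]*) ;;        # numeric limit, checked below
            *) continue ;;    # blank line or a non-numeric value such as [N/A]
        esac
        if [ "${limit%.*}" -ne "$expected" ]; then
            echo "GPU ${index}: limit is ${limit} W, expected ${expected} W"
            status=1
        fi
    done
    return "$status"
}

# Example usage after a reboot:
#   systemctl is-enabled nv-power-limit.service
#   nvidia-smi --query-gpu=index,power.limit --format=csv,noheader,nounits \
#       | report_mismatched_limits 280
```

The helper prints nothing and returns 0 when every GPU reports the expected limit, so it can be chained with `&&` in a larger check.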