Azure / Virtual Machines (VM) (including GPU)
===
###### tags: `ML / Platform`
###### tags: `ML`, `Azure`, `VM`, `GPU`
<br>
[TOC]
<br>
## Azure portal entry point
https://portal.azure.com/

<br>
## Creation workflow
### Step1 - Go to the [Create a virtual machine] page
https://portal.azure.com/#create/Microsoft.VirtualMachine
<br>
### Step2 - Configure the VM: basic settings

- **Resource group: ocis-dept-test**
- **VM name: low-cost-VM-no-CPU**
- **Image: Ubuntu Server 20.04 LTS - Gen1**
---

- **See all sizes: description of each series**
[](https://i.imgur.com/enbMdsT.png)
- GPUs are listed under the **N-series**
[](https://i.imgur.com/y0h3agW.png)
- ### Standard NC12s_v2

GP100GL [Tesla P100 PCIe 16GB] x 2
- ### Standard NC12s_v3

GV100GL [Tesla V100 PCIe 16GB] x 2
- nvidia-smi
[](https://i.imgur.com/RX2tVWE.png)
- The v3 actually reports 120 MB less GPU RAM
- Hands-on test
- v2 is the P100
- v3 is the V100, which computes faster than the P100
- GPUs are also listed under **Non-premium storage VM sizes**
[](https://i.imgur.com/KwU2yQr.png)
<br>
- **See all sizes (GPU, CPU, RAM)**
[](https://i.imgur.com/h9CTCai.png)
- To prepare (upload) the data, start with the cheapest VM
- Once all the data is uploaded, switch to a GPU VM for processing
<br>
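- **Username**
The default is azureuser
<br>
- **SSH public key source**
- **Not yet created**

:::warning
:bulb: **Tip**
The next time you create a VM in Azure, you can use the SSH key you created.
Just select **[Use a key stored in Azure]** for **[SSH public key source]**. The private key is already on your computer, so you don't need to download anything.
:::
- **Already created**

Then select the key source: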

<br>
### Step2 - Configure the VM: basic settings / GPU pricing
| Instance | Pay-as-you-go | GPU | GPU-RAM | CPU | RAM |
| ------ | ------- | ------- | ------- | ------- | ------- |
| NC12s_v2 | 124.6720 TWD/hour | GP100GL [Tesla P100 PCIe 16GB] x 2 | 16280 MiB | 1 (socket) x 12 (cores/socket) x 1 (thread/core) = 12 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 225G |
| NC12s_v3 | 168.89 TWD/hour | GV100GL [Tesla V100 PCIe 16GB] x 2 | 16280 MiB | 1 (socket) x 12 (cores/socket) x 1 (thread/core) = 12 threads<br><br>GHz | 225G |
| NC24s_v2 | 249.20 TWD/hour | GP100GL [Tesla P100 PCIe 16GB] x 4 | 16280 MiB | 2 (sockets) x 12 (cores/socket) x 1 (thread/core) = 24 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 448.1G |
| NC24s_v3 | 337.78 TWD/hour | GV100GL<br>[Tesla V100 PCIe 16GB] x 4 | 16280 MiB | 2 (sockets) x 12 (cores/socket) x 1 (thread/core) = 24 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 448.1G |
| NC24rs_v3 | 371.66 TWD/hour | GV100GL<br>[Tesla V100 PCIe 16GB] x 4 | 16280 MiB | 2 (sockets) x 12 (cores/socket) x 1 (thread/core) = 24 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 448.1G |
- GPU pricing
- [GPU optimized virtual machine sizes](https://docs.microsoft.com/zh-tw/azure/virtual-machines/sizes-gpu?context=/azure/virtual-machines/context/context)
- Azure doc
- NCv2-series (P100)
- [en](https://docs.microsoft.com/en-us/azure/virtual-machines/ncv2-series)
- [zh-tw](https://docs.microsoft.com/zh-tw/azure/virtual-machines/ncv2-series)
- NCv3-series (V100)
- [en](https://docs.microsoft.com/en-us/azure/virtual-machines/ncv3-series)
> NCv3-series VMs are powered by NVIDIA Tesla V100 GPUs. These GPUs can provide 1.5x the computational performance of the NCv2-series.
> The NC24rs v3 configuration provides a low latency, high-throughput network interface optimized for tightly coupled parallel computing workloads.
- [zh-tw](https://docs.microsoft.com/zh-tw/azure/virtual-machines/ncv3-series)
- ### [Ubuntu Advantage Standard pricing](https://azure.microsoft.com/zh-tw/pricing/details/virtual-machines/ubuntu-advantage-standard/)
[](https://i.imgur.com/ufqAd76.png)
- ### [Linux virtual machine pricing](https://azure.microsoft.com/zh-tw/pricing/details/virtual-machines/linux/)
[](https://i.imgur.com/nL6icfT.png)
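
The hourly prices in the table make it easy to estimate what a GPU session costs, which is the motivation for uploading data on a cheap VM first. The following is a minimal sketch, assuming simple hourly pay-as-you-go billing with no discounts. The function name `vm_cost` and the 8-hour duration are illustrative assumptions; the rate is the NC12s_v3 row of the table.

```bash
# vm_cost RATE HOURS
# Prints RATE * HOURS with two decimals (same currency as RATE, here TWD).
# awk does the math because bash arithmetic is integer-only.
vm_cost() {
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# Example: 8 hours on NC12s_v3 at 168.89 TWD/hour (rate from the table above).
vm_cost 168.89 8
```

Running the same function with the cheap VM's hourly rate shows how much is saved by keeping the GPU VM off during uploads.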
<br>
### Step3 - Configure the VM: attach disks

- **Attach an existing data disk**

<br>
### Step4 - Configure the VM: networking

<br>
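### Step5 - Management
- **Auto-shutdown** (disabled by default)
To avoid forgetting to shut the VM down, enable auto-shutdown if needed.

A notification email is sent 30 minutes before the shutdown;
if you need more time, you can postpone the shutdown.
[](https://i.imgur.com/JeJfPiO.png)
Click the link to postpone the VM shutdown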

<br>
### Step6 - Review the settings, then create

<br>
### Step7 - Go to the resource

<br>
<hr>
<br>
## Connection workflow
### Step1 - Check the IP

- **Public IP address**: `157.55.197.208`
<br>
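### Step2 - Connect with the private key (.pem)
- ### The first connection attempt hits a permissions problem
- `Permissions 0777 for 'parabricks-test_key.pem' are too open.`
- [Fixing "Permissions 0777 for '/root/.ssh/id_rsa' are too open"](https://www.jianshu.com/p/d79d0cde061b)
- Solution
The id_rsa file's default permission is 700; it had been temporarily changed to 777 in order to open the root folder, so changing the root folder's permission back to 700 resolves the problem
- Login command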
```bash=
$ chmod 700 parabricks-test_key.pem
$ ssh -i parabricks-test_key.pem \
azureuser@157.55.197.208
```
[](https://i.imgur.com/pFMwBpE.png)
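
ssh rejects a private key file that group or other users can access, which is what the `too open` message means. The sketch below tightens the key only when needed. It assumes GNU `stat` (as shipped with Ubuntu); the function name `secure_key` is hypothetical, and mode 600 is a common choice rather than the only acceptable one.

```bash
# secure_key FILE
# If group or others have any access to FILE, restrict it to owner
# read/write (mode 600). Prints the resulting octal mode.
secure_key() {
  local key="$1" mode
  mode=$(stat -c '%a' "$key") || return 1   # GNU stat: octal permission bits
  case "$mode" in
    *00) ;;                                  # group/other bits are already zero
    *)   chmod 600 "$key" ;;
  esac
  stat -c '%a' "$key"
}

# Usage (key name and IP taken from the note above):
#   secure_key parabricks-test_key.pem
#   ssh -i parabricks-test_key.pem azureuser@157.55.197.208
```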
<br>
<hr>
<br>
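## Disk mounting
### Step1 - Create a disk or attach an existing disk
:::warning
:bulb: See the separate note: [Azure / Disk](/8f1YasxKSY-Tv6yPdCMh8w)
:::
This can be done after the VM has been created:
- Create a new disk
- Attach a **new (unformatted)** or **existing (already formatted)** disk
[](https://i.imgur.com/2pwkDzK.png)
The example here attaches a new disk and formats it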
[](https://i.imgur.com/9nJMciG.png)
<br>
### Step2 - List the disks
- `lsblk`
```bash=
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.5M 1 loop /snap/core18/1997
loop1 7:1 0 67.6M 1 loop /snap/lxd/20326
loop2 7:2 0 32.1M 1 loop /snap/snapd/11841
loop3 7:3 0 55.4M 1 loop /snap/core18/2066
loop4 7:4 0 32.1M 1 loop /snap/snapd/12057
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 29.9G 0 part /
├─sda14 8:14 0 4M 0 part
└─sda15 8:15 0 106M 0 part /boot/efi
sdb 8:16 0 4G 0 disk
└─sdb1 8:17 0 4G 0 part /mnt
sdc 8:32 0 1T 0 disk <--- here (not yet formatted)
sr0 11:0 1 628K 0 rom
```
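- The name may be sdb, sdc, sdd, sde, ...; it probably depends on the order in which the disks were attached
- No child entries means the disk has not been partitioned or formatted yet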
```
sdb 8:16 0 4G 0 disk <--- already formatted
└─sdb1 8:17 0 4G 0 part /mnt <--- child entry
sdc 8:32 0 1T 0 disk <--- not yet formatted
```
- `lsblk -o NAME,HCTL,SIZE,MOUNTPOINT` (with the `-o` option)
:::warning
- **HCTL**:
Host:Channel:Target:Lun
- **Lun**:
Logical Unit Number
:::
```bash=
$ lsblk -o NAME,HCTL,SIZE,MOUNTPOINT
NAME HCTL SIZE MOUNTPOINT
loop0 55.5M /snap/core18/1997
loop1 67.6M /snap/lxd/20326
loop2 32.1M /snap/snapd/11841
loop3 55.4M /snap/core18/2066
loop4 32.1M /snap/snapd/12057
sda 0:0:0:0 30G
├─sda1 29.9G /
├─sda14 4M
└─sda15 106M /boot/efi
sdb 1:0:1:0 4G
└─sdb1 4G /mnt
sdc 3:0:0:0 1T
sr0 5:0:0:0 628K
```
```bash=
$ lsblk -o NAME,HCTL,SIZE,MOUNTPOINT | grep sd
NAME HCTL SIZE MOUNTPOINT
sda 0:0:0:0 30G
├─sda1 29.9G /
├─sda14 4M
└─sda15 106M /boot/efi
sdb 1:0:1:0 4G
└─sdb1 4G /mnt
sdc 3:0:0:0 1T
```
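- The fourth field of `HCTL` is the LUN
> LUN: the logical unit number of the data disk. This value identifies the data disk inside the VM, so it must be unique for each data disk attached to the VM.
- `df -h`
:warning: `df` does not show the 1TB SSD, because it lists only mounted filesystems and the new disk is not mounted yet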
```bash=
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 29G 2.2G 27G 8% /
devtmpfs 203M 0 203M 0% /dev
tmpfs 207M 0 207M 0% /dev/shm
tmpfs 42M 1.1M 41M 3% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 207M 0 207M 0% /sys/fs/cgroup
/dev/loop0 56M 56M 0 100% /snap/core18/1997
/dev/loop2 33M 33M 0 100% /snap/snapd/11841
/dev/loop1 68M 68M 0 100% /snap/lxd/20326
/dev/sda15 105M 7.9M 97M 8% /boot/efi
/dev/sdb1 3.9G 16M 3.7G 1% /mnt
tmpfs 42M 0 42M 0% /run/user/1000
/dev/loop3 56M 56M 0 100% /snap/core18/2066
/dev/loop4 33M 33M 0 100% /snap/snapd/12057
```
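
The "disk with no child entries" check from the `lsblk` listings can be scripted. This is a minimal sketch, assuming the input comes from `lsblk -rn -o NAME,TYPE,PKNAME` (raw output, no header, parent device name in the third column). The function name `list_unpartitioned_disks` is hypothetical. A disk without partitions may still carry a filesystem written directly to it, so treat the result as a hint, not proof that the disk is blank.

```bash
# list_unpartitioned_disks
# Reads "NAME TYPE [PKNAME]" lines on stdin and prints every device of
# type "disk" that is not the parent of any "part" entry.
list_unpartitioned_disks() {
  awk '
    $2 == "disk"            { order[++n] = $1 }      # remember disks in input order
    $2 == "part" && NF >= 3 { has_part[$3] = 1 }     # mark the parent disk
    END {
      for (i = 1; i <= n; i++)
        if (!(order[i] in has_part)) print order[i]
    }
  '
}

# Usage on the VM:
#   lsblk -rn -o NAME,TYPE,PKNAME | list_unpartitioned_disks
```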
<br>
### Step3 - Format the disk
- ### Method 1 ([command source](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/attach-disk-portal#partition-a-new-disk))
> 2021/06/07 - OK

```
# Commands follow the documentation, with sdc replaced by sdb
# (the attached disk showed up as sdb)
$ sudo parted /dev/sdb --script mklabel gpt mkpart xfspart xfs 0% 100%
$ sudo mkfs.xfs /dev/sdb1
$ sudo partprobe /dev/sdb1
```
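- Explanation of the first command
Usage: `parted [OPTION]... [DEVICE [COMMAND [PARAMETERS]...]...]`
- DEVICE: `/dev/sdb` (`/dev/sdc` in the source documentation)
- OPTION: `--script` (never prompts for user intervention)
- COMMAND1: `mklabel LABEL-TYPE` (create a new disklabel)
- `mklabel gpt`
- COMMAND2: `mkpart PART-TYPE [FS-TYPE] START END` (make a partition)
- `mkpart xfspart xfs 0% 100%`
- PART-TYPE: `xfspart` (on a GPT disk label, parted uses this argument as the partition name)
- FS-TYPE: `xfs` (only a type hint recorded for the partition; the filesystem itself is created by `mkfs.xfs`)
- START: `0%`
- END: `100%`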
<br>
- ### Method 2 ( [command source](https://docs.microsoft.com/zh-tw/learn/modules/add-and-size-disks-in-azure-virtual-machines/3-exercise-add-data-disks-to-azure-virtual-machines) | [full Bash script](https://raw.githubusercontent.com/MicrosoftDocs/mslearn-add-and-size-disks-in-azure-virtual-machines/master/add-data-disk.sh) )
> 2021/06/08 - OK
Step1: Create a partition on the blank disk
```bash=
#!/bin/bash
# Partition the drive /dev/sdc.
# Read from standard input provide the options we want.
# n adds a new partition.
# p specifies the primary partition type.
# the following blank line accepts the default partition number.
# the following blank line accepts the default start sector.
# the following blank line accepts the default final sector.
# p prints the partition table.
# w writes the changes and exits.
sudo fdisk /dev/sdc <<EOF
n
p



p
w
EOF
```
- Output
```
Welcome to fdisk (util-linux 2.34).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x72cae361.
Command (m for help): Partition type
p primary (0 primary, 0 extended, 4 free)
e extended (container for logical partitions)
Select (default p): Partition number (1-4, default 1): First sector (2048-2147483647, default 2048): Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-2147483647, default 2147483647):
Created a new partition 1 of type 'Linux' and of size 1024 GiB.
Command (m for help): Disk /dev/sdc: 1 TiB, 1099511627776 bytes, 2147483648 sectors
Disk model: Virtual Disk
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x72cae361
Device Boot Start End Sectors Size Id Type
/dev/sdc1 2048 2147483647 2147481600 1024G 83 Linux
Command (m for help): The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
```
Step2: Create a filesystem on the partition (e.g. ext4)
> Filesystem types include: ext, ext2, ext3, ext4, vfat, ntfs, nfs
```bash=
# Write a file system to the partition.
# ext4 creates an ext4 filesystem.
# /dev/sdc1 is the device name.
sudo mkfs -t ext4 /dev/sdc1
```
- Output
```
$ sudo mkfs -t ext4 /dev/sdc1
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 268435200 4k blocks and 67108864 inodes
Filesystem UUID: 681feb55-b751-4ce1-831f-51f8f5783c81
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Creating journal (262144 blocks): done
Writing superblocks and filesystem accounting information: done
```
- ### Check the result
```
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.5M 1 loop /snap/core18/1997
loop1 7:1 0 67.6M 1 loop /snap/lxd/20326
loop2 7:2 0 32.1M 1 loop /snap/snapd/11841
loop3 7:3 0 55.4M 1 loop /snap/core18/2066
loop4 7:4 0 32.1M 1 loop /snap/snapd/12057
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 29.9G 0 part /
├─sda14 8:14 0 4M 0 part
└─sda15 8:15 0 106M 0 part /boot/efi
sdb 8:16 0 4G 0 disk
└─sdb1 8:17 0 4G 0 part /mnt
sdc 8:32 0 1T 0 disk
└─sdc1 8:33 0 1024G 0 part <--- the new partition appears
sr0 11:0 1 628K 0 rom
```
<br>
### Step4 - Mount the disk
- ### Create the mount point and mount the disk
```bash=
# Create the /uploads directory,
# which we'll use as our mount point.
sudo mkdir /uploads
# Attach the disk to the mount point.
sudo mount /dev/sdc1 /uploads
```
- ### Check the result
```bash=
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 55.5M 1 loop /snap/core18/1997
loop1 7:1 0 67.6M 1 loop /snap/lxd/20326
loop2 7:2 0 32.1M 1 loop /snap/snapd/11841
loop3 7:3 0 55.4M 1 loop /snap/core18/2066
loop4 7:4 0 32.1M 1 loop /snap/snapd/12057
sda 8:0 0 30G 0 disk
├─sda1 8:1 0 29.9G 0 part /
├─sda14 8:14 0 4M 0 part
└─sda15 8:15 0 106M 0 part /boot/efi
sdb 8:16 0 4G 0 disk
└─sdb1 8:17 0 4G 0 part /mnt
sdc 8:32 0 1T 0 disk
└─sdc1 8:33 0 1024G 0 part /uploads <--- mount point
sr0 11:0 1 628K 0 rom
```
```bash=
# Show filesystem types: /dev/sdc1 is ext4
$ df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/root ext4 29G 2.3G 27G 8% /
devtmpfs devtmpfs 203M 0 203M 0% /dev
tmpfs tmpfs 207M 0 207M 0% /dev/shm
tmpfs tmpfs 42M 1.1M 41M 3% /run
tmpfs tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs tmpfs 207M 0 207M 0% /sys/fs/cgroup
/dev/loop0 squashfs 56M 56M 0 100% /snap/core18/1997
/dev/loop2 squashfs 33M 33M 0 100% /snap/snapd/11841
/dev/loop1 squashfs 68M 68M 0 100% /snap/lxd/20326
/dev/sda15 vfat 105M 7.9M 97M 8% /boot/efi
/dev/sdb1 ext4 3.9G 16M 3.7G 1% /mnt
tmpfs tmpfs 42M 0 42M 0% /run/user/1000
/dev/loop3 squashfs 56M 56M 0 100% /snap/core18/2066
/dev/loop4 squashfs 33M 33M 0 100% /snap/snapd/12057
/dev/sdc1 ext4 1007G 768M 955G 1% /uploads
```
- ### Change ownership (root → azureuser)
```bash=
$ ls -ls / | grep uploads
4 drwxr-xr-x 3 root root 4096 Jun 8 03:11 uploads
$ sudo chown -R $(id -u):$(id -g) /uploads
$ ls -ls / | grep uploads
4 drwxr-xr-x 3 azureuser azureuser 4096 Jun 8 03:11 uploads
```
<br>
### References
- ### [Use the portal to attach a data disk to a Linux VM](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/attach-disk-portal#partition-a-new-disk)
- ### [Add and size disks in Azure virtual machines](https://docs.microsoft.com/zh-tw/learn/modules/add-and-size-disks-in-azure-virtual-machines/)
- [Initialize and format the data disk](https://docs.microsoft.com/zh-tw/learn/modules/add-and-size-disks-in-azure-virtual-machines/3-exercise-add-data-disks-to-azure-virtual-machines)
[Bash script](https://raw.githubusercontent.com/MicrosoftDocs/mslearn-add-and-size-disks-in-azure-virtual-machines/master/add-data-disk.sh)
- Partition the drive `/dev/sdc`.
- Create an `ext4` filesystem on the drive.
- Create the `/uploads` directory, which serves as the mount point.
- Attach the disk to the mount point.
- Update `/etc/fstab` so that the drive is mounted automatically after the system reboots.
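
The `/etc/fstab` update is the only step of that script not shown in the steps above. A hedged sketch of what such an entry could look like follows. It identifies the partition by UUID, since names like `sdc` can change between boots, and uses the `nofail` mount option so a missing data disk does not block booting. The function name `fstab_line` and the chosen option set are assumptions, not the exact line the Microsoft script writes.

```bash
# fstab_line UUID MOUNTPOINT FSTYPE
# Prints one /etc/fstab entry for the given filesystem.
fstab_line() {
  printf 'UUID=%s %s %s defaults,nofail 0 2\n' "$1" "$2" "$3"
}

# Usage on the VM (device and mount point as in the steps above):
#   uuid=$(sudo blkid -s UUID -o value /dev/sdc1)
#   fstab_line "$uuid" /uploads ext4 | sudo tee -a /etc/fstab
#   sudo mount -a    # check that the new entry mounts cleanly before rebooting
```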
<br>
### Command reference: `parted -h`
:::warning
:bulb: **Purpose**: partition and format data disks <sup>[[note](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/attach-disk-portal#partition-a-new-disk)]</sup>
:::
:::warning
:warning: **Note:**
If the disk is 2 tebibytes (TiB) or larger, you must use GPT partitioning. If the disk is smaller than 2 TiB, you can use either MBR or GPT partitioning.
:::
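
The 2 TiB rule can be expressed as a small size check. This is a sketch only: the function name `pick_label` and its output strings are hypothetical, and `msdos` is the name parted uses for an MBR disk label.

```bash
# pick_label SIZE_BYTES
# Prints "gpt" when the disk is 2 TiB or larger, otherwise "msdos or gpt".
pick_label() {
  local size_bytes="$1"
  local two_tib=$(( 2 * 1024 ** 4 ))   # 2 TiB = 2199023255552 bytes
  if (( size_bytes >= two_tib )); then
    echo "gpt"
  else
    echo "msdos or gpt"
  fi
}

# Usage on the VM (prints the size of /dev/sdc in bytes, then decides):
#   pick_label "$(lsblk -bdn -o SIZE /dev/sdc)"
```

For the 1 TiB disk used in this note (1099511627776 bytes) either label type is allowed, which is why both the GPT-based Method 1 and the MBR-based `fdisk` Method 2 worked.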
```
$ parted -h
Usage: parted [OPTION]... [DEVICE [COMMAND [PARAMETERS]...]...]
Apply COMMANDs with PARAMETERS to DEVICE. If no COMMAND(s) are given, run in
interactive mode.
OPTIONs:
-h, --help displays this help message
-l, --list lists partition layout on all block devices
-m, --machine displays machine parseable output
-s, --script never prompts for user intervention
-v, --version displays the version
-a, --align=[none|cyl|min|opt] alignment for new partitions
COMMANDs:
align-check TYPE N check partition N for TYPE(min|opt)
alignment
help [COMMAND] print general help, or help on
COMMAND
mklabel,mktable LABEL-TYPE create a new disklabel (partition
table)
mkpart PART-TYPE [FS-TYPE] START END make a partition
name NUMBER NAME name partition NUMBER as NAME
print [devices|free|list,all|NUMBER] display the partition table,
available devices, free space, all found partitions, or a particular
partition
quit exit program
rescue START END rescue a lost partition near START
and END
resizepart NUMBER END resize partition NUMBER
rm NUMBER delete partition NUMBER
select DEVICE choose the device to edit
disk_set FLAG STATE change the FLAG on selected device
disk_toggle [FLAG] toggle the state of FLAG on selected
device
set NUMBER FLAG STATE change the FLAG on partition NUMBER
toggle [NUMBER [FLAG]] toggle the state of FLAG on partition
NUMBER
unit UNIT set the default unit to UNIT
version display the version number and
copyright information of GNU Parted
Report bugs to bug-parted@gnu.org
```
<br>
### Command reference: `fdisk -h`
```=
$ fdisk -h
Usage:
fdisk [options] <disk> change partition table
fdisk [options] -l [<disk>] list partition table(s)
Display or manipulate a disk partition table.
Options:
-b, --sector-size <size> physical and logical sector size
-B, --protect-boot don't erase bootbits when creating a new label
-c, --compatibility[=<mode>] mode is 'dos' or 'nondos' (default)
-L, --color[=<when>] colorize output (auto, always or never)
colors are enabled by default
-l, --list display partitions and exit
-o, --output <list> output columns
-t, --type <type> recognize specified partition table type only
-u, --units[=<unit>] display units: 'cylinders' or 'sectors' (default)
-s, --getsz display device size in 512-byte sectors [DEPRECATED]
--bytes print SIZE in bytes rather than in human readable format
-w, --wipe <mode> wipe signatures (auto, always or never)
-W, --wipe-partitions <mode> wipe signatures from new partitions (auto, always or never)
-C, --cylinders <number> specify the number of cylinders
-H, --heads <number> specify the number of heads
-S, --sectors <number> specify the number of sectors per track
-h, --help display this help
-V, --version display version
Available output columns:
gpt: Device Start End Sectors Size Type Type-UUID Attrs Name UUID
dos: Device Start End Sectors Cylinders Size Type Id Attrs Boot End-C/H/S Start-C/H/S
bsd: Slice Start End Sectors Cylinders Size Type Bsize Cpg Fsize
sgi: Device Start End Sectors Cylinders Size Type Id Attrs
sun: Device Start End Sectors Cylinders Size Type Id Flags
For more details see fdisk(8).
```
<br>
<hr>
<br>
## Viewing GPU information
### Step0 - System information
```bash=
$ lshw
WARNING: you should run this program as super-user.
gpu-vm-nc12s-v3
...
*-core
...
*-display:0 UNCLAIMED <--- first GPU; UNCLAIMED means the driver is not installed yet
description: 3D controller
product: GV100GL [Tesla V100 PCIe 16GB] <---
vendor: NVIDIA Corporation
physical id: 2
bus info: pci@0001:00:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: bus_master cap_list
configuration: latency=0
resources: iomemory:100-ff iomemory:140-13f memory:41000000-41ffffff memory:1000000000-13ffffffff memory:1400000000-1401ffffff
*-display:1 UNCLAIMED <--- second GPU; UNCLAIMED means the driver is not installed yet
...
```
Configuration differences before and after installing the driver:
[](https://i.imgur.com/GR0cwDd.png)
```bash=
$ lspci
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0001:00:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
0002:00:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
```
Initially, no NVIDIA commands are available

<br>
### Step1 - Install the NVIDIA runtime (CUDA Toolkit)
- ### Look up the instructions on the [**NVIDIA website**](https://developer.nvidia.com/cuda-downloads)
https://developer.nvidia.com/cuda-downloads
[](https://i.imgur.com/Sox1jQb.png)
[](https://i.imgur.com/8U1JYPk.png)
- **Installation Instructions:**
```bash=
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
sudo apt-get update
sudo apt-get -y install cuda
```
```
...
*****************************************************************************
*** Reboot your computer and verify that the NVIDIA graphics driver can ***
*** be loaded. ***
*****************************************************************************
...
$ sudo reboot
```
- ### Run `nvidia-smi`

- CUDA Version: 11.3
- Memory per GPU: 16280 MiB (about 16 GB)
- Running Parabricks requires >= 12 GB of GPU memory
<br>
```bash=
$ nvidia-smi -L
GPU 0: NVIDIA Tesla P100-PCIE-16GB (UUID: GPU-eb0dc035-486a-e5f5-28d4-75861019ef0e)
GPU 1: NVIDIA Tesla P100-PCIE-16GB (UUID: GPU-5ff40e36-feb3-502b-f83f-473bee95b150)
```
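
The ">= 12 GB" requirement can be checked without reading the `nvidia-smi` table by eye. The sketch below assumes the per-GPU totals come from `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits` (one MiB value per line). The function name `gpus_meet_min` and the interpretation of 12 GB as 12288 MiB are assumptions.

```bash
# gpus_meet_min MIN_MIB
# Reads one memory.total value (MiB) per line on stdin, prints a status line
# per GPU, and returns 0 only if every GPU has at least MIN_MIB.
gpus_meet_min() {
  awk -v min="$1" '
    {
      ok = ($1 + 0 >= min + 0)
      printf "GPU %d: %d MiB %s\n", NR - 1, $1, (ok ? "OK" : "too small")
      if (!ok) bad = 1
    }
    END { exit (bad || NR == 0) ? 1 : 0 }   # no GPUs listed also counts as failure
  '
}

# Usage on the GPU VM:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | gpus_meet_min 12288
```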
<br>
<hr>
<br>
## Viewing CPU information
> 1 (socket) x 12 (cores/socket) x 1 (thread/core) = 12 threads
### Number of physical CPUs (unit: sockets)
```
$ cat /proc/cpuinfo |grep "physical id"|sort |uniq|wc -l
1
```
### Number of logical CPUs (unit: threads)
```
$ cat /proc/cpuinfo |grep "processor"|wc -l
12
```
### Number of cores per CPU (unit: cores/socket)
```
$ cat /proc/cpuinfo |grep "cores"|uniq
cpu cores : 12
```
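
The three `grep` commands above can be combined into one pass that also derives threads per core from the sockets x cores x threads relation. A minimal sketch, assuming the standard x86 `/proc/cpuinfo` field names (`processor`, `physical id`, `cpu cores`) and the same core count on every socket; the function name `cpu_topology` is hypothetical.

```bash
# cpu_topology
# Reads /proc/cpuinfo-formatted text on stdin and prints socket, core and
# thread counts on a single line.
cpu_topology() {
  awk -F':[ \t]*' '
    /^processor/   { logical++ }            # one entry per logical CPU
    /^physical id/ { sockets[$2] = 1 }      # distinct ids = sockets
    /^cpu cores/   { cores = $2 + 0 }       # cores per socket
    END {
      n = 0
      for (s in sockets) n++
      tpc = (n * cores > 0) ? logical / (n * cores) : 0
      printf "sockets=%d cores_per_socket=%d threads_per_core=%d logical_cpus=%d\n", n, cores, tpc, logical
    }
  '
}

# Usage on the VM:
#   cpu_topology < /proc/cpuinfo
```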
### Detailed CPU information
```
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
stepping : 1
microcode : 0xffffffff
cpu MHz : 2593.993
cache size : 35840 KB
physical id : 0
siblings : 12
core id : 0
cpu cores : 12
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips : 5187.98
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
...
...
```
- The [E5-2690](https://ark.intel.com/content/www/tw/zh/ark/search.html?_charset_=UTF-8&q=E5-2690) comes in 4 variants; check the processor base frequency and the version as well


- [Intel® Xeon® Processor E5-2690 v4 (35M Cache, 2.60 GHz)](https://ark.intel.com/content/www/tw/zh/ark/products/91770/intel-xeon-processor-e52690-v4-35m-cache-2-60-ghz.html)

- The physical processor has 14 cores and 28 threads
<br>
<hr>
<br>
## Viewing RAM information
### Total memory
```bash=
$ lsmem
RANGE SIZE STATE REMOVABLE BLOCK
0x0000000000000000-0x000000001fffffff 512M online yes 0-3
Memory block size: 128M
Total online memory: 512M <---
Total offline memory: 0B
```
```bash=
$ cat /proc/meminfo | head
MemTotal: 423720 kB <---
MemFree: 4876 kB
MemAvailable: 44636 kB
Buffers: 1320 kB
Cached: 23844 kB
SwapCached: 0 kB
Active: 261864 kB
Inactive: 9680 kB
Active(anon): 255560 kB
Inactive(anon): 172 kB
```
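
`lsmem` reports 512M while `MemTotal` shows 423720 kB; the latter excludes memory the kernel reserves for itself, so it is the better figure for what is usable. The sketch below converts it to MiB; the function name `mem_total_mib` is hypothetical.

```bash
# mem_total_mib
# Reads /proc/meminfo-formatted text on stdin and prints MemTotal in whole MiB.
mem_total_mib() {
  awk '$1 == "MemTotal:" { print int($2 / 1024) }'
}

# Usage on the VM:
#   mem_total_mib < /proc/meminfo
```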
<br>
<hr>
<br>
## Temporary storage (`/mnt`)
:::warning
:warning: **Note**: the data is lost after the VM is shut down and started again!
:::
[](https://i.imgur.com/ykJNgs7.png)
[](https://i.imgur.com/hGMHJ2s.png)
<br>
<hr>
<br>
## Ways to upload data
### scp
```bash=
$ scp -i parabricks-test_key.pem \
parabricks.tar.gz \
azureuser@70.37.107.238:/mnt/parabricks
```
<br>
### rsync (keep partial files + resume)
```bash=
$ rsync -e 'ssh -i parabricks-test_key.pem' \
--progress -zh --partial --append \
WGS-LIS-AI018A_R* \
azureuser@70.37.107.238:/mnt/parabricks
```
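- `--partial`: keep partially transferred files (if the connection drops, the incomplete file is kept instead of being deleted)
- `--append`: resume the transfer by appending the remaining data to the existing partial file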
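
Because `--partial --append` makes each rerun pick up where the last one stopped, the transfer can simply be retried until it succeeds. A minimal sketch of such a retry wrapper follows; the function name `retry` and the attempt count and delay in the usage comment are illustrative assumptions.

```bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, sleeping DELAY_SECONDS between attempts.
# Returns 1 once MAX_ATTEMPTS attempts have failed.
retry() {
  local max="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max )); then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}

# Usage (same rsync command as above, retried up to 5 times, 30 s apart):
#   retry 5 30 rsync -e 'ssh -i parabricks-test_key.pem' \
#     --progress -zh --partial --append \
#     WGS-LIS-AI018A_R* azureuser@70.37.107.238:/mnt/parabricks
```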
<br>
<hr>
<br>
## References
- ### [Quickstart: Create a Linux virtual machine in the Azure portal](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/quick-create-portal)
- ### [Setting up a deep learning development environment with an Azure GPU VM](https://ericsk.medium.com/17ee8a6886eb)
- [Install the CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
