Azure / Virtual Machines (VM), including GPU
===
###### tags: `ML / Platform`
###### tags: `ML`, `Azure`, `VM`, `GPU`

<br>

[TOC]

<br>

## Azure entry point
https://portal.azure.com/

![](https://i.imgur.com/gDhJBrz.png)

<br>

## Creating a VM

### Step1 - Go to the [Create a virtual machine] page
https://portal.azure.com/#create/Microsoft.VirtualMachine

<br>

### Step2 - Configure the VM: Basics
![](https://i.imgur.com/7GkvgNc.png)

- **Resource group: ocis-dept-test**
- **Virtual machine name: low-cost-VM-no-CPU**
- **Image: Ubuntu Server 20.04 LTS - Gen1**

---

![](https://i.imgur.com/nIHHSay.png)

- **See all sizes: description of each series**
  [![](https://i.imgur.com/enbMdsT.png)](https://i.imgur.com/enbMdsT.png)
    - GPUs are under the "**N-series**"
      [![](https://i.imgur.com/y0h3agW.png)](https://i.imgur.com/y0h3agW.png)
        - ### Standard NC12s_v2
          ![](https://i.imgur.com/XrQn3Kg.png)
          2 × GP100GL [Tesla P100 PCIe 16GB]
        - ### Standard NC12s_v3
          ![](https://i.imgur.com/2oSpoem.png)
          2 × GV100GL [Tesla V100 PCIe 16GB]
        - nvidia-smi
          [![](https://i.imgur.com/RX2tVWE.png)](https://i.imgur.com/RX2tVWE.png)
            - v3 actually reports 120 MB less GPU RAM than v2
        - Actual test
            - v2 is the P100
            - v3 is the V100, which is faster than the P100
    - GPUs are also listed under "**Non-premium storage VM sizes**"
      [![](https://i.imgur.com/KwU2yQr.png)](https://i.imgur.com/KwU2yQr.png)

<br>

- **See all sizes (GPU, CPU, RAM)**
  [![](https://i.imgur.com/h9CTCai.png)](https://i.imgur.com/h9CTCai.png)
    - To prepare (upload) the data, start with the cheapest VM.
    - Once all the data is uploaded, switch to a GPU VM for the processing.

<br>

- **Username**: defaults to `azureuser`

<br>

- **SSH public key source**
    - **Not created yet**
      ![](https://i.imgur.com/0S9OT6n.png)
      :::warning
      :bulb: **Tip**
      The next time you create a VM in Azure, you can reuse the SSH key created here: for **[SSH public key source]**, select **[Use existing key stored in Azure]**.
      The private key is already on your computer, so there is nothing to download.
      :::
    - **Already created**
      ![](https://i.imgur.com/7UdcG7h.png)
      Then select the stored key:
      ![](https://i.imgur.com/pRc8FW5.png)

<br>

### Step2 - Configure the VM: Basics / GPU pricing

| Instance | Pay-as-you-go | GPU | GPU RAM (per GPU) | CPU | RAM |
| ------ | ------- | ------- | ------- | ------- | ------- |
| NC12s_v2 | 124.6720 TWD/hour | 2 × GP100GL [Tesla P100 PCIe 16GB] | 16280 MiB | 1 socket × 12 cores/socket × 1 thread/core = 12 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 225G |
| NC12s_v3 | 168.89 TWD/hour | 2 × GV100GL [Tesla V100 PCIe 16GB] | 16160 MiB | 1 socket × 12 cores/socket × 1 thread/core = 12 threads<br><br>… GHz | 225G |
| NC24s_v2 | 249.20 TWD/hour | 4 × GP100GL [Tesla P100 PCIe 16GB] | 16280 MiB | 2 sockets × 12 cores/socket × 1 thread/core = 24 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 448.1G |
| NC24s_v3 | 337.78 TWD/hour | 4 × GV100GL<br>[Tesla V100 PCIe 16GB] | 16160 MiB | 2 sockets × 12 cores/socket × 1 thread/core = 24 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 448.1G |
| NC24rs_v3 | 371.66 TWD/hour | 4 × GV100GL<br>[Tesla V100 PCIe 16GB] | 16160 MiB | 2 sockets × 12 cores/socket × 1 thread/core = 24 threads<br><br>Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz | 448.1G |

- GPU pricing
    - [GPU-optimized virtual machine sizes](https://docs.microsoft.com/zh-tw/azure/virtual-machines/sizes-gpu?context=/azure/virtual-machines/context/context)
- Azure docs
    - NCv2-series (P100)
        - [en](https://docs.microsoft.com/en-us/azure/virtual-machines/ncv2-series)
        - [zh-tw](https://docs.microsoft.com/zh-tw/azure/virtual-machines/ncv2-series)
    - NCv3-series (V100)
        - [en](https://docs.microsoft.com/en-us/azure/virtual-machines/ncv3-series)
          > NCv3-series VMs are powered by NVIDIA Tesla V100 GPUs. These GPUs can provide 1.5x the computational performance of the NCv2-series.
          >
          > The NC24rs v3 configuration provides a low latency, high-throughput network interface optimized for tightly coupled parallel computing workloads.
        - [zh-tw](https://docs.microsoft.com/zh-tw/azure/virtual-machines/ncv3-series) (the same page in Traditional Chinese)
- ### [Ubuntu Advantage Standard pricing](https://azure.microsoft.com/zh-tw/pricing/details/virtual-machines/ubuntu-advantage-standard/)
  [![](https://i.imgur.com/ufqAd76.png)](https://i.imgur.com/ufqAd76.png)
- ### [Linux virtual machine pricing](https://azure.microsoft.com/zh-tw/pricing/details/virtual-machines/linux/)
  [![](https://i.imgur.com/nL6icfT.png)](https://i.imgur.com/nL6icfT.png)

<br>

### Step3 - Configure the VM: Disks
![](https://i.imgur.com/PgLCVyv.png)

- **Attach an existing data disk**
  ![](https://i.imgur.com/yuXnW6h.png)

<br>

### Step4 - Configure the VM: Networking
![](https://i.imgur.com/fXhgzUG.png)

<br>

### Step5 - Management
- **Auto-shutdown** (disabled by default)
  If you tend to forget to shut the VM down, enable auto-shutdown.
  ![](https://i.imgur.com/O0CQOWv.png)
  An email notification is sent 30 minutes before the shutdown; if needed, you can postpone the shutdown from it.
  [![](https://i.imgur.com/JeJfPiO.png)](https://i.imgur.com/JeJfPiO.png)
  Click the link to delay the VM shutdown:
  ![](https://i.imgur.com/svxmhFK.png)

<br>

### Step6 - Review, then create
![](https://i.imgur.com/G2vIYHT.png)

<br>

### Step7 - Go to the resource
![](https://i.imgur.com/ie055HT.png)

<br>
<hr>
<br>

## Connecting

### Step1 - Find the IP address
![](https://i.imgur.com/KtU0VEh.png)

- **Public IP address**: `157.55.197.208`

<br>

### Step2 - Connect with the private key (.pem)
- ### The first connection fails with a permissions error
    - `Permissions 0777 for 'parabricks-test_key.pem' are too open.`
    - [Fixing "Permissions 0777 for '/root/.ssh/id_rsa' are too open" (in Chinese)](https://www.jianshu.com/p/d79d0cde061b)
      > Solution: the default permission of the id_rsa file is 700. The root folder had been temporarily changed to 777 in order to open it, so changing the root folder back to 700 fixes the problem.
- Login commands
    ```bash=
    # ssh only requires that group/others have no access to the key;
    # 400 (or 600) is the usual setting, 700 also passes the check
    $ chmod 400 parabricks-test_key.pem
    $ ssh -i parabricks-test_key.pem \
          azureuser@157.55.197.208
    ```
    [![](https://i.imgur.com/pFMwBpE.png)](https://i.imgur.com/pFMwBpE.png)

<br>
<hr>
<br>

## Mounting a disk

### Step1 - Create a disk or attach an existing one
:::warning
:bulb: See the separate note: [Azure / Disk](/8f1YasxKSY-Tv6yPdCMh8w)
:::

After the VM is created, you can:
- create a new disk
- attach a **new (unformatted)** or an **existing (already formatted)** disk

[![](https://i.imgur.com/2pwkDzK.png)](https://i.imgur.com/2pwkDzK.png)
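Once a new disk is attached, it shows up inside the VM as a whole disk with no partitions under it. As a rough helper (a sketch, not something from the Azure docs; the function name `list_unpartitioned_disks` is hypothetical), the function below reads the raw output of `lsblk -nr -o NAME,TYPE,PKNAME`, where `PKNAME` is the parent disk of each partition, and prints the disks that have no partitions yet.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print whole disks that have no partitions,
# e.g. a newly attached, still-unformatted data disk.
# Input on stdin: output of `lsblk -nr -o NAME,TYPE,PKNAME`
#   (-n: no header, -r: raw space-separated columns, PKNAME: parent disk).
# Caveat: a disk formatted without a partition table also has no children.
list_unpartitioned_disks() {
  awk '
    $2 == "disk" { order[++n] = $1 }              # whole disks, in lsblk order
    $2 == "part" && NF >= 3 { has_part[$3] = 1 }  # $3 is the partition parent
    END {
      for (i = 1; i <= n; i++)
        if (!(order[i] in has_part)) print "/dev/" order[i]
    }'
}

# On the VM:
#   lsblk -nr -o NAME,TYPE,PKNAME | list_unpartitioned_disks
```

On a disk layout like the one in this note (OS disk `sda` and temporary disk `sdb` both partitioned, new 1 TB disk `sdc` empty), it would print only `/dev/sdc`; loop devices and the `sr0` DVD drive are skipped because their type is not `disk`.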
This example attaches a new disk and formats it.
[![](https://i.imgur.com/9nJMciG.png)](https://i.imgur.com/9nJMciG.png)

<br>

### Step2 - List the disks
- `lsblk`
    ```bash=
    $ lsblk
    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    loop0     7:0    0 55.5M  1 loop /snap/core18/1997
    loop1     7:1    0 67.6M  1 loop /snap/lxd/20326
    loop2     7:2    0 32.1M  1 loop /snap/snapd/11841
    loop3     7:3    0 55.4M  1 loop /snap/core18/2066
    loop4     7:4    0 32.1M  1 loop /snap/snapd/12057
    sda       8:0    0   30G  0 disk
    ├─sda1    8:1    0 29.9G  0 part /
    ├─sda14   8:14   0    4M  0 part
    └─sda15   8:15   0  106M  0 part /boot/efi
    sdb       8:16   0    4G  0 disk
    └─sdb1    8:17   0    4G  0 part /mnt
    sdc       8:32   0    1T  0 disk                 <--- here (not formatted yet)
    sr0      11:0    1  628K  0 rom
    ```
    - The name can be sdb, sdc, sdd, sde, ...; it presumably depends on the order in which the disks are attached and detected, so it is not guaranteed to stay the same. The LUN below is a more reliable identifier.
    - A disk with no child entries has not been partitioned (formatted) yet
        ```
        sdb       8:16   0    4G  0 disk            <--- already formatted
        └─sdb1    8:17   0    4G  0 part /mnt       <--- child entry
        sdc       8:32   0    1T  0 disk            <--- not formatted yet
        ```
- `lsblk -o NAME,HCTL,SIZE,MOUNTPOINT` (the `-o` option selects the columns)
  :::warning
  - **HCTL**: Host:Channel:Target:Lun
  - **Lun**: Logical Unit Number
  :::
    ```bash=
    $ lsblk -o NAME,HCTL,SIZE,MOUNTPOINT
    NAME    HCTL        SIZE MOUNTPOINT
    loop0               55.5M /snap/core18/1997
    loop1               67.6M /snap/lxd/20326
    loop2               32.1M /snap/snapd/11841
    loop3               55.4M /snap/core18/2066
    loop4               32.1M /snap/snapd/12057
    sda     0:0:0:0       30G
    ├─sda1              29.9G /
    ├─sda14                4M
    └─sda15              106M /boot/efi
    sdb     1:0:1:0        4G
    └─sdb1                 4G /mnt
    sdc     3:0:0:0        1T
    sr0     5:0:0:0      628K
    ```
    ```bash=
    $ lsblk -o NAME,HCTL,SIZE,MOUNTPOINT | grep sd
    sda     0:0:0:0       30G
    ├─sda1              29.9G /
    ├─sda14                4M
    └─sda15              106M /boot/efi
    sdb     1:0:1:0        4G
    └─sdb1                 4G /mnt
    sdc     3:0:0:0        1T
    ```
    - The fourth field of `HCTL` is the LUN
      > LUN: the logical unit number of the data disk. This value identifies the data disk inside the VM, so it must be unique for every data disk attached to the VM.
      >
      > ![](https://i.imgur.com/QNfjw2D.png)
- `df -h`
  :warning: `df` does not show the 1 TB SSD: it only lists mounted filesystems, and the new disk has not been formatted or mounted yet.
    ```bash=
    $ df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/root        29G  2.2G   27G   8% /
    devtmpfs        203M     0  203M   0% /dev
    tmpfs           207M     0  207M   0% /dev/shm
    tmpfs            42M  1.1M   41M   3% /run
    tmpfs           5.0M     0  5.0M   0% /run/lock
    tmpfs           207M     0  207M   0% /sys/fs/cgroup
    /dev/loop0       56M   56M     0 100% /snap/core18/1997
    /dev/loop2       33M   33M     0 100% /snap/snapd/11841
    /dev/loop1       68M   68M     0 100% /snap/lxd/20326
    /dev/sda15      105M  7.9M   97M   8% /boot/efi
    /dev/sdb1       3.9G   16M  3.7G   1% /mnt
    tmpfs            42M     0   42M   0% /run/user/1000
    /dev/loop3       56M   56M     0 100% /snap/core18/2066
    /dev/loop4       33M   33M     0 100% /snap/snapd/12057
    ```

<br>

### Step3 - Format the disk
- ### Method 1 ([command source](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/attach-disk-portal#partition-a-new-disk))
  > 2021/06/07 - OK

  ![](https://i.imgur.com/JzjX7P5.png)
    ```
    # Commands from the docs, with sdc replaced by sdb
    # (on this VM the attached disk showed up as sdb)
    $ sudo parted /dev/sdb --script mklabel gpt mkpart xfspart xfs 0% 100%
    $ sudo mkfs.xfs /dev/sdb1
    $ sudo partprobe /dev/sdb1
    ```
    - Explanation of the first command
      Usage: `parted [OPTION]... [DEVICE [COMMAND [PARAMETERS]...]...]`
        - DEVICE: `/dev/sdb` (`/dev/sdc` in the docs)
        - OPTION: `--script` (never prompts for user intervention)
        - COMMAND1: `mklabel LABEL-TYPE` (create a new disklabel)
            - `mklabel gpt`
        - COMMAND2: `mkpart PART-TYPE [FS-TYPE] START END` (make a partition)
            - `mkpart xfspart xfs 0% 100%`
                - PART-TYPE: `xfspart`; on a GPT label this argument is used as the partition name
                - FS-TYPE: `xfs`; this only records the partition's filesystem type, the filesystem itself is created by `mkfs.xfs`
                - START: `0%`
                - END: `100%`

<br>

- ### Method 2 ([command source](https://docs.microsoft.com/zh-tw/learn/modules/add-and-size-disks-in-azure-virtual-machines/3-exercise-add-data-disks-to-azure-virtual-machines) | [full Bash script](https://raw.githubusercontent.com/MicrosoftDocs/mslearn-add-and-size-disks-in-azure-virtual-machines/master/add-data-disk.sh))
  > 2021/06/08 - OK

  Step1: create a partition on the empty disk
    ```bash=
    #!/bin/bash
    # Partition the drive /dev/sdc.
    # Read the options we want from standard input (the here-document below).
    # n adds a new partition.
    # p specifies the primary partition type.
    # the following blank line accepts the default partition number.
    # the following blank line accepts the default start sector.
    # the following blank line accepts the default final sector.
    # p prints the partition table.
    # w writes the changes and exits.
    sudo fdisk /dev/sdc <<EOF
    n
    p



    p
    w
    EOF
    ```
    - Output
        ```
        Welcome to fdisk (util-linux 2.34).
        Changes will remain in memory only, until you decide to write them.
        Be careful before using the write command.

        Device does not contain a recognized partition table.
        Created a new DOS disklabel with disk identifier 0x72cae361.

        Command (m for help): Partition type
           p   primary (0 primary, 0 extended, 4 free)
           e   extended (container for logical partitions)
        Select (default p): Partition number (1-4, default 1): First sector (2048-2147483647, default 2048): Last sector, +/-sectors or +/-size{K,M,G,T,P} (2048-2147483647, default 2147483647):
        Created a new partition 1 of type 'Linux' and of size 1024 GiB.

        Command (m for help): Disk /dev/sdc: 1 TiB, 1099511627776 bytes, 2147483648 sectors
        Disk model: Virtual Disk
        Units: sectors of 1 * 512 = 512 bytes
        Sector size (logical/physical): 512 bytes / 4096 bytes
        I/O size (minimum/optimal): 4096 bytes / 4096 bytes
        Disklabel type: dos
        Disk identifier: 0x72cae361

        Device     Boot Start        End    Sectors  Size Id Type
        /dev/sdc1        2048 2147483647 2147481600 1024G 83 Linux

        Command (m for help): The partition table has been altered.
        Calling ioctl() to re-read partition table.
        Syncing disks.
        ```

  Step2: create a filesystem on the partition (e.g. ext4)
  > Other filesystem types: ext, ext2, ext3, ext4, vfat, ntfs, nfs

    ```bash=
    # Write a file system to the partition.
    # ext4 creates an ext4 filesystem.
    # /dev/sdc1 is the device name.
    sudo mkfs -t ext4 /dev/sdc1
    ```
    - Output
        ```
        $ sudo mkfs -t ext4 /dev/sdc1
        mke2fs 1.45.5 (07-Jan-2020)
        Discarding device blocks: done
        Creating filesystem with 268435200 4k blocks and 67108864 inodes
        Filesystem UUID: 681feb55-b751-4ce1-831f-51f8f5783c81
        Superblock backups stored on blocks:
                32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
                4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
                102400000, 214990848

        Allocating group tables: done
        Writing inode tables: done
        Creating journal (262144 blocks): done
        Writing superblocks and filesystem accounting information: done
        ```
- ### Check the result
    ```
    $ lsblk
    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    loop0     7:0    0 55.5M  1 loop /snap/core18/1997
    loop1     7:1    0 67.6M  1 loop /snap/lxd/20326
    loop2     7:2    0 32.1M  1 loop /snap/snapd/11841
    loop3     7:3    0 55.4M  1 loop /snap/core18/2066
    loop4     7:4    0 32.1M  1 loop /snap/snapd/12057
    sda       8:0    0   30G  0 disk
    ├─sda1    8:1    0 29.9G  0 part /
    ├─sda14   8:14   0    4M  0 part
    └─sda15   8:15   0  106M  0 part /boot/efi
    sdb       8:16   0    4G  0 disk
    └─sdb1    8:17   0    4G  0 part /mnt
    sdc       8:32   0    1T  0 disk
    └─sdc1    8:33   0 1024G  0 part                 <--- the new partition
    sr0      11:0    1  628K  0 rom
    ```

<br>

### Step4 - Mount the disk
- ### Create a mount point and mount the disk
    ```bash=
    # Create the /uploads directory,
    # which we'll use as our mount point.
    sudo mkdir /uploads

    # Attach the disk to the mount point.
    sudo mount /dev/sdc1 /uploads
    ```
- ### Check the result
    ```bash=
    $ lsblk
    NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
    loop0     7:0    0 55.5M  1 loop /snap/core18/1997
    loop1     7:1    0 67.6M  1 loop /snap/lxd/20326
    loop2     7:2    0 32.1M  1 loop /snap/snapd/11841
    loop3     7:3    0 55.4M  1 loop /snap/core18/2066
    loop4     7:4    0 32.1M  1 loop /snap/snapd/12057
    sda       8:0    0   30G  0 disk
    ├─sda1    8:1    0 29.9G  0 part /
    ├─sda14   8:14   0    4M  0 part
    └─sda15   8:15   0  106M  0 part /boot/efi
    sdb       8:16   0    4G  0 disk
    └─sdb1    8:17   0    4G  0 part /mnt
    sdc       8:32   0    1T  0 disk
    └─sdc1    8:33   0 1024G  0 part /uploads        <--- mount point
    sr0      11:0    1  628K  0 rom
    ```
    ```bash=
    # Show filesystem types: /dev/sdc1 is ext4
    $ df -Th
    Filesystem     Type      Size  Used Avail Use% Mounted on
    /dev/root      ext4       29G  2.3G   27G   8% /
    devtmpfs       devtmpfs  203M     0  203M   0% /dev
    tmpfs          tmpfs     207M     0  207M   0% /dev/shm
    tmpfs          tmpfs      42M  1.1M   41M   3% /run
    tmpfs          tmpfs     5.0M     0  5.0M   0% /run/lock
    tmpfs          tmpfs     207M     0  207M   0% /sys/fs/cgroup
    /dev/loop0     squashfs   56M   56M     0 100% /snap/core18/1997
    /dev/loop2     squashfs   33M   33M     0 100% /snap/snapd/11841
    /dev/loop1     squashfs   68M   68M     0 100% /snap/lxd/20326
    /dev/sda15     vfat      105M  7.9M   97M   8% /boot/efi
    /dev/sdb1      ext4      3.9G   16M  3.7G   1% /mnt
    tmpfs          tmpfs      42M     0   42M   0% /run/user/1000
    /dev/loop3     squashfs   56M   56M     0 100% /snap/core18/2066
    /dev/loop4     squashfs   33M   33M     0 100% /snap/snapd/12057
    /dev/sdc1      ext4     1007G  768M  955G   1% /uploads
    ```
- ### Change the owner (root → azureuser)
    ```bash=
    $ ls -ls / | grep uploads
    4 drwxr-xr-x   3 root root  4096 Jun  8 03:11 uploads

    $ sudo chown -R $(id -u):$(id -g) /uploads

    $ ls -ls / | grep uploads
    4 drwxr-xr-x   3 azureuser azureuser  4096 Jun  8 03:11 uploads
    ```

<br>

### References
- ### [Use the portal to attach a data disk to a Linux VM](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/attach-disk-portal#partition-a-new-disk)
- ### [Add and size disks in Azure virtual machines](https://docs.microsoft.com/zh-tw/learn/modules/add-and-size-disks-in-azure-virtual-machines/)
    - [Initialize and format the data disk](https://docs.microsoft.com/zh-tw/learn/modules/add-and-size-disks-in-azure-virtual-machines/3-exercise-add-data-disks-to-azure-virtual-machines)
      [Bash script](https://raw.githubusercontent.com/MicrosoftDocs/mslearn-add-and-size-disks-in-azure-virtual-machines/master/add-data-disk.sh)
        - Partitions the drive `/dev/sdc`.
        - Creates an `ext4` filesystem on the drive.
        - Creates the `/uploads` directory used as the mount point.
        - Mounts the disk at the mount point.
        - Updates `/etc/fstab` so that the drive is mounted automatically after a reboot.

<br>

### Reference command: `parted -h`
:::warning
:bulb: **Purpose**: partitioning and formatting data disks <sup>[[note](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/attach-disk-portal#partition-a-new-disk)]</sup>
:::

:::warning
:warning: **Note:** if the disk is 2 TiB or larger, you must use GPT partitioning. If it is smaller than 2 TiB, you can use either MBR or GPT.
:::

```
$ parted -h
Usage: parted [OPTION]... [DEVICE [COMMAND [PARAMETERS]...]...]
Apply COMMANDs with PARAMETERS to DEVICE.  If no COMMAND(s) are given, run in
interactive mode.

OPTIONs:
  -h, --help                      displays this help message
  -l, --list                      lists partition layout on all block devices
  -m, --machine                   displays machine parseable output
  -s, --script                    never prompts for user intervention
  -v, --version                   displays the version
  -a, --align=[none|cyl|min|opt]  alignment for new partitions

COMMANDs:
  align-check TYPE N                    check partition N for TYPE(min|opt) alignment
  help [COMMAND]                        print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE            create a new disklabel (partition table)
  mkpart PART-TYPE [FS-TYPE] START END  make a partition
  name NUMBER NAME                      name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]  display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                  exit program
  rescue START END                      rescue a lost partition near START and END
  resizepart NUMBER END                 resize partition NUMBER
  rm NUMBER                             delete partition NUMBER
  select DEVICE                         choose the device to edit
  disk_set FLAG STATE                   change the FLAG on selected device
  disk_toggle [FLAG]                    toggle the state of FLAG on selected device
  set NUMBER FLAG STATE                 change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                toggle the state of FLAG on partition NUMBER
  unit UNIT                             set the default unit to UNIT
  version                               display the version number and copyright information of GNU Parted

Report bugs to bug-parted@gnu.org
```

<br>

### Reference command: `fdisk -h`
```=
$ fdisk -h

Usage:
 fdisk [options] <disk>      change partition table
 fdisk [options] -l [<disk>] list partition table(s)

Display or manipulate a disk partition table.

Options:
 -b, --sector-size <size>      physical and logical sector size
 -B, --protect-boot            don't erase bootbits when creating a new label
 -c, --compatibility[=<mode>]  mode is 'dos' or 'nondos' (default)
 -L, --color[=<when>]          colorize output (auto, always or never)
                                 colors are enabled by default
 -l, --list                    display partitions and exit
 -o, --output <list>           output columns
 -t, --type <type>             recognize specified partition table type only
 -u, --units[=<unit>]          display units: 'cylinders' or 'sectors' (default)
 -s, --getsz                   display device size in 512-byte sectors [DEPRECATED]
     --bytes                   print SIZE in bytes rather than in human readable format
 -w, --wipe <mode>             wipe signatures (auto, always or never)
 -W, --wipe-partitions <mode>  wipe signatures from new partitions (auto, always or never)

 -C, --cylinders <number>      specify the number of cylinders
 -H, --heads <number>          specify the number of heads
 -S, --sectors <number>        specify the number of sectors per track

 -h, --help                    display this help
 -V, --version                 display version

Available output columns:
 gpt: Device Start End Sectors Size Type Type-UUID Attrs Name UUID
 dos: Device Start End Sectors Cylinders Size Type Id Attrs Boot End-C/H/S Start-C/H/S
 bsd: Slice Start End Sectors Cylinders Size Type Bsize Cpg Fsize
 sgi: Device Start End Sectors Cylinders Size Type Id Attrs
 sun: Device Start End Sectors Cylinders Size Type Id Flags

For more details see fdisk(8).
```

<br>
<hr>
<br>

## Checking GPU information

### Step0 - System information
```bash=
$ lshw
WARNING: you should run this program as super-user.
gpu-vm-nc12s-v3
    ...
  *-core
       ...
     *-display:0 UNCLAIMED      <--- first GPU; UNCLAIMED means no driver is installed yet
          description: 3D controller
          product: GV100GL [Tesla V100 PCIe 16GB]      <---
          vendor: NVIDIA Corporation
          physical id: 2
          bus info: pci@0001:00:00.0
          version: a1
          width: 64 bits
          clock: 33MHz
          capabilities: bus_master cap_list
          configuration: latency=0
          resources: iomemory:100-ff iomemory:140-13f memory:41000000-41ffffff memory:1000000000-13ffffffff memory:1400000000-1401ffffff
     *-display:1 UNCLAIMED      <--- second GPU; UNCLAIMED means no driver is installed yet
          ...
```
Configuration differences before and after installing the driver:
[![](https://i.imgur.com/GR0cwDd.png)](https://i.imgur.com/GR0cwDd.png)

```bash=
$ lspci
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
0001:00:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
0002:00:00.0 3D controller: NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB] (rev a1)
```

Initially, no NVIDIA commands are available:
![](https://i.imgur.com/q6rJ3Ql.png)

<br>

### Step1 - Install the NVIDIA runtime (CUDA Toolkit)
- ### Look up the instructions on the [**NVIDIA website**](https://developer.nvidia.com/cuda-downloads)
  https://developer.nvidia.com/cuda-downloads
  [![](https://i.imgur.com/Sox1jQb.png)](https://i.imgur.com/Sox1jQb.png)
  [![](https://i.imgur.com/8U1JYPk.png)](https://i.imgur.com/8U1JYPk.png)
- **Installation Instructions:**
    ```bash=
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
    sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
    sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
    sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
    sudo apt-get update
    sudo apt-get -y install cuda
    ```
  :warning: NVIDIA rotated the signing key of this repository in 2022, so `apt` may reject the `7fa2af80.pub` key used above; check the current instructions on the download page before copying these commands.
    ```
    ...
    *****************************************************************************
    *** Reboot your computer and verify that the NVIDIA graphics driver can   ***
    *** be loaded.                                                            ***
    *****************************************************************************
    ...

    $ sudo reboot
    ```
- ### Run `nvidia-smi`
  ![](https://i.imgur.com/GUT0cFM.png)
    - CUDA Version: 11.3
    - Memory per GPU: 16280 MiB ≈ 16 GB
        - Running Parabricks requires at least 12 GB

    ```bash=
    $ nvidia-smi -L
    GPU 0: NVIDIA Tesla P100-PCIE-16GB (UUID: GPU-eb0dc035-486a-e5f5-28d4-75861019ef0e)
    GPU 1: NVIDIA Tesla P100-PCIE-16GB (UUID: GPU-5ff40e36-feb3-502b-f83f-473bee95b150)
    ```

<br>
<hr>
<br>

## Checking CPU information
> 1 socket × 12 cores/socket × 1 thread/core = 12 threads

### Number of physical CPUs (unit: sockets)
```
$ cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l
1
```

### Number of logical CPUs (unit: threads)
```
$ cat /proc/cpuinfo | grep "processor" | wc -l
12
```

### Number of cores per CPU (unit: cores/socket)
```
$ cat /proc/cpuinfo | grep "cores" | uniq
cpu cores       : 12
```

### Full CPU details
```
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 79
model name      : Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
stepping        : 1
microcode       : 0xffffffff
cpu MHz         : 2593.993
cache size      : 35840 KB
physical id     : 0
siblings        : 12
core id         : 0
cpu cores       : 12
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 20
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
bugs            : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs taa itlb_multihit
bogomips        : 5187.98
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
...
...
```
- [E5-2690](https://ark.intel.com/content/www/tw/zh/ark/search.html?_charset_=UTF-8&q=E5-2690) comes in 4 variants, so also check the "Processor Base Frequency" and the version
  ![](https://i.imgur.com/ngz2o2y.png)
  ![](https://i.imgur.com/fWyJGHn.png)
- [Intel® Xeon® Processor E5-2690 v4 (35M Cache, 2.60 GHz)](https://ark.intel.com/content/www/tw/zh/ark/products/91770/intel-xeon-processor-e52690-v4-35m-cache-2-60-ghz.html)
  ![](https://i.imgur.com/CknejOI.png)
    - 14 cores × 2 threads/core = 28 threads per processor; this VM is allocated 12 of them as vCPUs

<br>
<hr>
<br>

## Checking RAM information

### Total memory
```bash=
$ lsmem
RANGE                                 SIZE  STATE REMOVABLE BLOCK
0x0000000000000000-0x000000001fffffff 512M online       yes   0-3

Memory block size:       128M
Total online memory:     512M   <---
Total offline memory:      0B
```

```bash=
$ cat /proc/meminfo | head
MemTotal:         423720 kB   <---
MemFree:            4876 kB
MemAvailable:      44636 kB
Buffers:            1320 kB
Cached:            23844 kB
SwapCached:            0 kB
Active:           261864 kB
Inactive:           9680 kB
Active(anon):     255560 kB
Inactive(anon):      172 kB
```

<br>
<hr>
<br>

## Temporary storage (`/mnt`)
:::warning
:warning: **Note**: data stored here is lost after the VM is shut down and started again!
:::

[![](https://i.imgur.com/ykJNgs7.png)](https://i.imgur.com/ykJNgs7.png)
[![](https://i.imgur.com/hGMHJ2s.png)](https://i.imgur.com/hGMHJ2s.png)

<br>
<hr>
<br>

## Uploading data

### scp
```bash=
$ scp -i parabricks-test_key.pem \
      parabricks.tar.gz \
      azureuser@70.37.107.238:/mnt/parabricks
```

<br>

### rsync (survives disconnects and resumes)
```bash=
$ rsync -e 'ssh -i parabricks-test_key.pem' \
      --progress -zh --partial --append \
      WGS-LIS-AI018A_R* \
      azureuser@70.37.107.238:/mnt/parabricks
```
- `--partial`: if the connection drops, keep the partially transferred file instead of deleting it
- `--append`: resume by appending the rest of the file to the partial copy (this assumes the partial copy matches the beginning of the source file; `--append-verify` also checksums the whole file)

<br>
<hr>
<br>

## References
- ### [Quickstart: Create a Linux virtual machine in the Azure portal](https://docs.microsoft.com/zh-tw/azure/virtual-machines/linux/quick-create-portal)
- ### [Setting up a deep-learning development environment on an Azure GPU VM](https://ericsk.medium.com/17ee8a6886eb)
    - [Install the CUDA Toolkit](https://developer.nvidia.com/cuda-downloads)
      ![](https://i.imgur.com/yIn8tOu.png)
        - Base Installer
          ```bash=
          wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
          sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
          sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
          sudo add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
          sudo apt-get update
          sudo apt-get -y install cuda
          ```
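After installing CUDA and rebooting, a quick sanity check is to confirm that every GPU is visible to the driver and has enough memory. The sketch below is a hypothetical helper (the name `check_gpu_memory` and the default threshold of 12288 MiB, i.e. 12 GiB for the Parabricks requirement mentioned earlier, are illustrative assumptions). It reads the CSV printed by `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits` and exits non-zero if no GPU is listed or any GPU falls below the threshold.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): check that each GPU reports at least
# min_mib MiB of memory.
# Input on stdin: the CSV printed by
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits
# e.g. "Tesla P100-PCIE-16GB, 16280". The 12288 MiB default is an assumption
# based on the ">= 12GB" Parabricks requirement; pass another value as $1.
check_gpu_memory() {
  local min_mib="${1:-12288}"
  awk -F', *' -v min="$min_mib" '
    NF >= 2 {
      n++
      if ($2 + 0 < min + 0) {
        bad++
        printf "GPU %d (%s): %s MiB < %s MiB\n", n - 1, $1, $2, min
      }
    }
    END {
      printf "%d GPU(s) found, %d below %s MiB\n", n + 0, bad + 0, min
      exit (n == 0 || bad > 0)   # non-zero status if no GPU or any GPU too small
    }'
}

# On the VM, after `sudo reboot`:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | check_gpu_memory
```

With the two P100s shown by `nvidia-smi -L` in this note (16280 MiB each), it would print a single summary line and exit with status 0; the exit status makes it usable as a guard in a provisioning script before starting a long GPU job.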