# Replacing a Hard Disk with MegaCli

###### tags: `c4lab`

## Intro

Our server's storage is built on MegaRAID, so there are two ways to inspect the disks:

1. Shut down and look in the storage controller's BIOS (what we did before; enter this BIOS by pressing CTRL + R)
2. megacli + smartctl (what this note covers)

Concept Overview: all the related terms are in this figure.

![](https://i.imgur.com/65hjb04.png)

## Preparation

In short, you have to download MegaCli yourself; it is not available from yum or apt.

### MegaCli

MegaCli download site: https://www.broadcom.com/support/download-search?pg=&pf=&pn=&pa=&po=&dk=megacli&pl=

``` bash
# download
wget https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-14_MegaCLI.zip

# Unzip
unzip 8-07-14_MegaCLI.zip
cd Linux

# install
sudo yum localinstall MegaCli-8.07.14-1.noarch.rpm
```

Manual for the MegaCli command line: https://www.alteeve.com/w/MegaCli64_Cheat_Sheet

### Output

List all the HDDs

```
[linnil1@lncrna MegaCli]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll
Enclosure Device ID: 8
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 0
WWN: 50014xxxxxxxxxx
Sequence Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 5.458 TB [0x2baa0f4b0 Sectors]
Non Coerced Size: 5.457 TB [0x2ba90f4b0 Sectors]
Coerced Size: 5.457 TB [0x2ba900000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0A82
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500304xxxxxxxxxx
Inquiry Data: WD-WX31D95Hxxxxxxx WD60EFRX-xxxxxxx 82.00A82
Device Speed: 6.0Gb/s
```

List all virtual drives

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -ldinfo -lALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :raid6vd01
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 21.829 TB
Sector Size : 512
Parity Size : 7.276 TB
State : Optimal
Strip Size : 128 KB
Number Of Drives : 8
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
```

To read the serial number of the disk you want to replace, run the following, replacing `15` with that disk's `Device Id`.

```
[linnil1@lncrna MegaCli]$ sudo smartctl -d megaraid,15 -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Seagate NAS HDD
Device Model: ST3000VN000-xxxxxx
Serial Number: Zxxxxx
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
```

Reference
* https://mellowhost.com/blog/how-to-get-disk-serial-number-in-megaraid.html

### Identify Bad Disk

A failed disk puts a raid1/raid5/raid6 array into the degraded state.

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives    : 3
  Degraded        : 1
  Offline         : 0
Physical Devices  : 24
  Disks           : 24
  Critical Disks  : 0
  Failed Disks    : 1
```

Find which disk failed

```
(base) [linnil1@exon MegaCli]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | egrep "(Arm)|(Device Id)|(Error)|(state)"
Device Id: 13
Media Error Count: 0
Other Error Count: 0
Firmware state: Online, Spun Up
Drive's position: DiskGroup: 1, Span: 0, Arm: 6
Device Id: 15
Media Error Count: 495
Other Error Count: 3
Firmware state: Failed
```

Of course, if the disk is really dead, smartctl may not even be able to connect to it.

```
(base) [linnil1@exon ~]$ sudo smartctl -d sat+megaraid,15 -a /dev/sda
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-754.31.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed: megasas_cmd result: 0.15 = 0/46

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
```

### Make Sure the Motherboard Supports Hot-plug

Check your motherboard documentation, or the model of the machine as sold, to confirm that HDDs can be plugged and unplugged while the server is running.

Motherboard PDF: https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1258.pdf

![](https://i.imgur.com/inMs51L.png)

## Remove it

Use megacli to mark the failed disk as removable.

Reference: https://www.advancedclustering.com/act_kb/replacing-a-disk-with-megacli/

(to be completed)

The parameter is `-physdrv[<enclosure_ID>:<slot_id>]`, e.g. `-physdrv[8:14]`

Always double-check the drive before removing it

```
sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -a0
```

Then remove it

```
MegaCli64 -pdoffline -physdrv[8:14] -a0
MegaCli64 -pdmarkmissing -physdrv[8:14] -a0
MegaCli64 -pdprprmv -physdrv[8:14] -a0
```

Make its LED blink red

```
MegaCli64 -pdlocate -start -physdrv[8:14] -a0
```

The light on the outside of that drive bay then turns red.

### Physically pull it out

(This should go without trouble.)

### MegaCli Double Check

One disk fewer

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives    : 3
  Degraded        : 1
  Offline         : 0
Physical Devices  : 24
  Disks           : 23
  Critical Disks  : 0
  Failed Disks    : 0
```

VD1 shows Partially Degraded

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -ldinfo -lALL -aALL
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 43.660 TB
Sector Size : 512
Parity Size : 10.915 TB
State : Partially Degraded
Strip Size : 64 KB
Number Of Drives : 10
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
```

Check the same slot

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -aALL
Adapter 0: Device at Enclosure - 8, Slot - 14 is not found.
Exit Code: 0x00
```

## Install the New Disk

### Disk

Check the specs (capacity, read/write speed, serial number, model number).

Remember to get an invoice with the tax ID (統編) and take a photo.

(The new disk is a WD60EFZX.)

![](https://i.imgur.com/SbwsFsL.png)

### MegaCli: Prepare for rebuilding

Find the disk you just inserted

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -a0
Enclosure Device ID: 8
Slot Number: 14
Enclosure position: N/A
Device Id: 15
WWN: 50014xxxxxxxxxxx
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 5.458 TB [0x2baa0f4b0 Sectors]
Non Coerced Size: 5.457 TB [0x2ba90f4b0 Sectors]
Coerced Size: 5.457 TB [0x2ba900000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 0A81
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50030480xxxxxxx
Connected Port Number: 0(path0)
Inquiry Data: WD-C81KHxxxxxx WD60EFZX-xxxxxxx 81.00A81
```

Counts after inserting the disk

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives    : 3
  Degraded        : 1
  Offline         : 0
Physical Devices  : 25 (missing and unconfigured)
  Disks           : 24
  Critical Disks  : 0
  Failed Disks    : 0
```

Find the position it belongs to

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -PdgetMissing -a0
Adapter 0 - Missing Physical drives
No.   Array   Row   Size Expected
0     1       6     5722624 MB
Exit Code: 0x00
```

### MegaCli rebuild

Fill in its position: the array and row reported above

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -PdReplaceMissing -PhysDrv[8:14] -array1 -row6 -a0
Adapter: 0: Missing PD at Array 1, Row 6 is replaced.
Exit Code: 0x00
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -start -PhysDrv[8:14] -a0
Started rebuild progress on device(Encl-8 Slot-14)
Exit Code: 0x00
```

### MegaCli rebuild progress

Meanwhile, you should see the red light of the disk being rebuilt blinking.

The commands below only check the status.

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -PdgetMissing -a0
Adapter 0 - No Missing Drive is Found.
Exit Code: 0x00
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -a0
Firmware state: Rebuild
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -ShowProg -PhysDrv[8:14] -a0
Rebuild Progress on Device at Enclosure 8, Slot 14 Completed 0% in 3 Minutes.
Exit Code: 0x00
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -ShowProg -PhysDrv[8:14] -a0
[sudo] password for linnil1:
Rebuild Progress on Device at Enclosure 8, Slot 14 Completed 17% in 135 Minutes.
Exit Code: 0x00
```

Reference: https://www.advancedclustering.com/act_kb/replacing-a-disk-with-megacli/

### MegaCli rebuild Done

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -ShowProg -PhysDrv[8:14] -a0
Device(Encl-8 Slot-14) is not in rebuild process
Exit Code: 0x00
```

Everything is back to normal

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives    : 3
  Degraded        : 0
  Offline         : 0
Physical Devices  : 25
  Disks           : 24
  Critical Disks  : 0
  Failed Disks    : 0
```

The state goes from degraded to optimal

```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -ldinfo -lALL -aALL
Virtual Drive: 1 (Target Id: 1)
State : Optimal
```

## Installation bug?

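
When `MegaCli64` refuses to start at all, a quick first check is whether the dynamic loader can resolve every shared library the binary links against. The sketch below is only an illustration: it assumes `ldd` and `awk` are available, uses the install path `/opt/MegaRAID/MegaCli/MegaCli64` that appears in the commands above, and `filter_missing` is a made-up helper name for this note.

```bash
#!/usr/bin/env bash
# filter_missing (hypothetical helper): read `ldd` output on stdin and
# print only the library names that the loader reports as "not found".
filter_missing() {
    awk '/=> not found/ { print $1 }'
}

# Assumed install path (the one used in the commands above); override via $MEGACLI.
MEGACLI="${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}"

# Only run the check when the binary is actually present.
if [ -x "$MEGACLI" ]; then
    ldd "$MEGACLI" | filter_missing
fi
```

If nothing is printed, every library was resolved; any name that is printed is a library the system still needs.
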
### Install on Ubuntu

https://www.broadcom.com/support/knowledgebase/1211161500661/installing-megacli-in-debian-or-ubuntu

### libncurses.so.5 not found

```
(env) [linnil1@rna server]$ sudo /opt/MegaRAID/MegaCli/MegaCli64
/opt/MegaRAID/MegaCli/MegaCli64: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory
```

On CentOS 8, install this package

```
sudo yum install ncurses-compat-libs
```

On Ubuntu 20.04, install the [old ncurses library](https://askubuntu.com/questions/1252062/how-to-install-libncurses-so-5-in-ubuntu-20-04) from apt

```
sudo add-apt-repository universe
sudo apt install libncurses5
```

# Adding New Disks

After inserting the disks

```
Enclosure Device ID: 8
Slot Number: 17
Enclosure position: N/A
Device Id: 20
WWN: 5000cca2c1d1020f
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 14.552 TB [0x746c00000 Sectors]
Non Coerced Size: 14.551 TB [0x746b00000 Sectors]
Coerced Size: 14.551 TB [0x746b00000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: W232
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500304801780011d
Connected Port Number: 0(path0)
Inquiry Data: 2PH6DW3J WDC WUH721816ALE6L4 PCGNW232
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature : N/A
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No
```

Create a RAID 6 virtual drive from the eight disks

```
(base) linnil1@exon:~$ sudo ./MegaCli64 -cfgldadd -r6 [8:8,8:9,8:10,8:11,8:14,8:15,8:16,8:17] -a0
Adapter 0: Created VD 1
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
(base) linnil1@exon:~$ sudo ./MegaCli64 -h
```

Check the new virtual drive

```
(base) linnil1@exon:~$ sudo ./MegaCli64 -LDinfo -L1 -aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 87.313 TB
Sector Size : 512
Parity Size : 29.104 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 8
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Ongoing Progresses:
  Background Initialization: Completed 0%, Taken 0 min.
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
```
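
Before handing a list of `[enclosure:slot]` pairs to `-cfgldadd`, it can help to see every drive's state on one line each. This is a minimal sketch, assuming the field names shown in the physical-drive outputs in this note (`Enclosure Device ID`, `Slot Number`, `Device Id`, `Firmware state`) and that each field starts at the beginning of its line; `pd_summary` is a hypothetical helper name, not part of MegaCli.

```bash
#!/usr/bin/env bash
# pd_summary (hypothetical helper): condense MegaCli physical-drive output
# read from stdin into one "[enclosure:slot] id=<Device Id> state=<state>"
# line per drive. A line is emitted each time "Firmware state" is seen.
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID:/ { enc  = $2 }
        /^Slot Number:/         { slot = $2 }
        /^Device Id:/           { id   = $2 }
        /^Firmware state:/      { printf "[%s:%s] id=%s state=%s\n", enc, slot, id, $2 }
    '
}
```

A possible use is `sudo ./MegaCli64 -PDList -aAll | pd_summary`; appending `| grep -v Online` leaves only drives in other states, such as `Failed`, `Rebuild`, or `Unconfigured(good)`.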