# Replacing a Hard Disk with MegaCli
###### tags: `c4lab`
## Intro
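The storage on our servers is built on MegaRAID, so there are two ways to inspect the disks:
1. Shut down and look in the storage controller's BIOS (what we always did before; this BIOS was entered by pressing CTRL + R).
2. Use megacli + smartctl (what this note covers).

Concept Overview: all the related terms are explained there.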

## Preparation
MegaCli has to be downloaded manually; it is not available from yum or apt.
### MegaCli
MegaCli download Site:
https://www.broadcom.com/support/download-search?pg=&pf=&pn=&pa=&po=&dk=megacli&pl=
``` bash
# download
wget https://docs.broadcom.com/docs-and-downloads/raid-controllers/raid-controllers-common-files/8-07-14_MegaCLI.zip
# Unzip
unzip 8-07-14_MegaCLI.zip
cd Linux
# install
sudo yum localinstall MegaCli-8.07.14-1.noarch.rpm
```
MegaCli command-line cheat sheet:
https://www.alteeve.com/w/MegaCli64_Cheat_Sheet
### Output
List all physical drives:
```
[linnil1@lncrna MegaCli]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll
Enclosure Device ID: 8
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 0
WWN: 50014xxxxxxxxxx
Sequence Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 5.458 TB [0x2baa0f4b0 Sectors]
Non Coerced Size: 5.457 TB [0x2ba90f4b0 Sectors]
Coerced Size: 5.457 TB [0x2ba900000 Sectors]
Sector Size: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0A82
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500304xxxxxxxxxx
Inquiry Data: WD-WX31D95Hxxxxxxx WD60EFRX-xxxxxxx 82.00A82
Device Speed: 6.0Gb/s
```
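When scripting around this output, it can help to split it into one record per drive. The following is a minimal sketch in Python and not part of MegaCli: the function name `parse_pdlist` is hypothetical, and it assumes every drive block starts with an `Enclosure Device ID:` line, as in the output above.

```python
def parse_pdlist(text):
    """Split `MegaCli64 -PDList -aAll` output into one dict per drive.

    Assumption: every drive block begins with an "Enclosure Device ID:" line.
    """
    drives = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines and lines without "key: value"
        key, value = key.strip(), value.strip()
        if key == "Enclosure Device ID":
            drives.append({})  # a new drive block starts here
        if drives:
            drives[-1][key] = value
    return drives


sample = """\
Enclosure Device ID: 8
Slot Number: 0
Device Id: 0
Firmware state: Online, Spun Up
Enclosure Device ID: 8
Slot Number: 14
Device Id: 15
Firmware state: Failed
"""

for d in parse_pdlist(sample):
    # The enclosure:slot pair is what -PhysDrv[...] expects.
    print(f"[{d['Enclosure Device ID']}:{d['Slot Number']}]", d["Firmware state"])
```

Keeping the enclosure and slot together with the `Device Id` matters because MegaCli addresses a drive by `[enclosure:slot]`, while smartctl addresses it by `Device Id`.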
List all virtual drives:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -ldinfo -lALL -aALL
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :raid6vd01
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 21.829 TB
Sector Size : 512
Parity Size : 7.276 TB
State : Optimal
Strip Size : 128 KB
Number Of Drives : 8
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
```
To read the serial number of the disk you want to replace, run the following, replacing `15` with that disk's `Device Id`:
```
[linnil1@lncrna MegaCli]$ sudo smartctl -d megaraid,15 -a /dev/sda
=== START OF INFORMATION SECTION ===
Model Family: Seagate NAS HDD
Device Model: ST3000VN000-xxxxxx
Serial Number: Zxxxxx
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
```
Reference
* https://mellowhost.com/blog/how-to-get-disk-serial-number-in-megaraid.html
### Identify Bad Disk
A failed disk puts a RAID 1/RAID 5/RAID 6 virtual drive into the degraded state:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives : 3
Degraded : 1
Offline : 0
Physical Devices : 24
Disks : 24
Critical Disks : 0
Failed Disks : 1
```
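For a periodic health check, these counters can be read by a script instead of by eye. This is only a sketch: `device_present` is a hypothetical helper name, and it assumes it is given just the "Device Present" excerpt shown above rather than the full `-AdpAllInfo` output, which contains many other `key : value` lines.

```python
def device_present(text):
    """Parse the "Device Present" counters into a dict of ints."""
    counts = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counts[key.strip()] = int(value)
    return counts


sample = """\
Device Present
================
Virtual Drives : 3
Degraded : 1
Offline : 0
Physical Devices : 24
Disks : 24
Critical Disks : 0
Failed Disks : 1
"""

counts = device_present(sample)
# Any non-zero value in these counters means something needs a closer look.
watch = ("Degraded", "Offline", "Critical Disks", "Failed Disks")
needs_attention = any(counts[name] > 0 for name in watch)
print(counts, needs_attention)
```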
Find out which disk has failed:
```
(base) [linnil1@exon MegaCli]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aAll | egrep "(Arm)|(Device Id)|(Error)|(state)"
Device Id: 13
Media Error Count: 0
Other Error Count: 0
Firmware state: Online, Spun Up
Drive's position: DiskGroup: 1, Span: 0, Arm: 6
Device Id: 15
Media Error Count: 495
Other Error Count: 3
Firmware state: Failed
```
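Note that in the filtered output the `Drive's position` line belongs to the drive that follows it, because MegaCli prints it before `Device Id`. To avoid misreading the blocks, the same check can be sketched in Python. The helper name `unhealthy_drives` is hypothetical, and it assumes `Firmware state` is the last of the collected fields for each drive; it reports every drive that is not `Online` or that has media errors, so hot spares and unconfigured drives would show up too.

```python
def unhealthy_drives(text):
    """Return (device_id, firmware_state, media_errors) for suspicious drives."""
    found = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        current[key] = value.strip()
        if key == "Firmware state":
            errors = int(current.get("Media Error Count", "0"))
            state = current["Firmware state"]
            if not state.startswith("Online") or errors > 0:
                found.append((current.get("Device Id"), state, errors))
            current = {}  # the next line belongs to the next drive
    return found


sample = """\
Device Id: 13
Media Error Count: 0
Other Error Count: 0
Firmware state: Online, Spun Up
Drive's position: DiskGroup: 1, Span: 0, Arm: 6
Device Id: 15
Media Error Count: 495
Other Error Count: 3
Firmware state: Failed
"""

print(unhealthy_drives(sample))
```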
Of course, if the disk is really dead, smartctl may not be able to reach it at all:
```
(base) [linnil1@exon ~]$ sudo smartctl -d sat+megaraid,15 -a /dev/sda
smartctl 5.43 2016-09-28 r4347 [x86_64-linux-2.6.32-754.31.1.el6.x86_64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl: Device Read Identity Failed: megasas_cmd result: 0.15 = 0/46
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
```
### Make Sure Hot-plug for Motherboard
Check your motherboard documentation, or the model of the machine as sold,
and make sure the HDDs can be hot-swapped.
Motherboard PDF: https://www.supermicro.com/manuals/motherboard/C606_602/MNL-1258.pdf

## Remove it
Use megacli to mark the failed disk as removable.
Reference: https://www.advancedclustering.com/act_kb/replacing-a-disk-with-megacli/
(To be completed.)
The drive parameter is `-physdrv[<enclosure_ID>:<slot_id>]`, e.g. `-physdrv[8:14]`.
Always double-check the drive before removing it:
```
sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -a0
```
Then remove it:
```
# take the drive offline
MegaCli64 -pdoffline -physdrv[8:14] -a0
# mark it as missing from its array
MegaCli64 -pdmarkmissing -physdrv[8:14] -a0
# prepare it for physical removal
MegaCli64 -pdprprmv -physdrv[8:14] -a0
```
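Since the same `[enclosure:slot]` has to be typed three times, a typo here could take the wrong disk offline. One way to reduce that risk is to generate the sequence from a single pair of numbers. The sketch below only prints the commands; `removal_commands` is a hypothetical helper, and the binary path is the install location used earlier in this note.

```python
MEGACLI = "/opt/MegaRAID/MegaCli/MegaCli64"  # install path used earlier in this note


def removal_commands(enclosure, slot, adapter=0):
    """Build the offline -> mark missing -> prepare-for-removal sequence as argv lists."""
    drive = f"-physdrv[{enclosure}:{slot}]"
    adapter_arg = f"-a{adapter}"
    return [
        [MEGACLI, action, drive, adapter_arg]
        for action in ("-pdoffline", "-pdmarkmissing", "-pdprprmv")
    ]


for argv in removal_commands(8, 14):
    # Dry run: print only. Running them for real needs root,
    # e.g. subprocess.run(argv, check=True) after reviewing the output.
    print(" ".join(argv))
```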
Make its LED blink red:
```
MegaCli64 -pdlocate -start -physdrv[8:14] -a0
```
The LED on the outside of that disk's bay then turns red.
### Physically pull the disk out
(This part should be straightforward.)
### MegaCli Double Check
One disk fewer:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives : 3
Degraded : 1
Offline : 0
Physical Devices : 24
Disks : 23
Critical Disks : 0
Failed Disks : 0
```
VD1 shows Partially Degraded:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -ldinfo -lALL -aALL
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 43.660 TB
Sector Size : 512
Parity Size : 10.915 TB
State : Partially Degraded
Strip Size : 64 KB
Number Of Drives : 10
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
```
Check the same slot again:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -aALL
Adapter 0: Device at Enclosure - 8, Slot - 14 is not found.
Exit Code: 0x00
```
## Install the new disk
### Disk
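* Check the specs (capacity, read/write speed, serial number, model number).
* Remember to get an invoice with the company tax ID (統一編號) on it.
* Take photos.
* The new disk is a WD60EFZX.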

### MegaCli: Prepare for rebuilding
Find the newly inserted disk:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -a0
Enclosure Device ID: 8
Slot Number: 14
Enclosure position: N/A
Device Id: 15
WWN: 50014xxxxxxxxxxx
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 5.458 TB [0x2baa0f4b0 Sectors]
Non Coerced Size: 5.457 TB [0x2ba90f4b0 Sectors]
Coerced Size: 5.457 TB [0x2ba900000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 0A81
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x50030480xxxxxxx
Connected Port Number: 0(path0)
Inquiry Data: WD-C81KHxxxxxx WD60EFZX-xxxxxxx 81.00A81
```
Counts after inserting it:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives : 3
Degraded : 1
Offline : 0
Physical Devices : 25 (missing and unconfigured)
Disks : 24
Critical Disks : 0
Failed Disks : 0
```
Find the position it belongs to:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -PdgetMissing -a0
Adapter 0 - Missing Physical drives
No. Array Row Size Expected
0 1 6 5722624 MB
Exit Code: 0x00
```
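The `Array` and `Row` columns are what the replace step needs, and `Size Expected` can be compared against the new disk before committing. The sketch below pulls those numbers out of the table; `parse_missing` is a hypothetical helper, it assumes the whitespace-separated layout shown above, and the size check assumes 512-byte sectors.

```python
def parse_missing(text):
    """Return [{"array": int, "row": int, "size_mb": int}, ...] from -PdgetMissing output."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like: "0 1 6 5722624 MB"
        if len(parts) == 5 and parts[4] == "MB" and all(p.isdigit() for p in parts[:4]):
            rows.append({"array": int(parts[1]), "row": int(parts[2]), "size_mb": int(parts[3])})
    return rows


sample = """\
Adapter 0 - Missing Physical drives
No. Array Row Size Expected
0 1 6 5722624 MB
Exit Code: 0x00
"""

missing = parse_missing(sample)
for m in missing:
    print(f"-array{m['array']} -row{m['row']}  (expects {m['size_mb']} MB)")

# Coerced size of the new disk, taken from its "Coerced Size" line
# (0x2ba900000 sectors), converted to MB assuming 512-byte sectors.
new_disk_mb = 0x2ba900000 * 512 // 2**20
print(new_disk_mb >= missing[0]["size_mb"])
```

In this case the new disk's coerced size works out to exactly the expected 5722624 MB, so it fits the missing slot.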
### Megacli rebuild
Fill in its position (array and row), then start the rebuild:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -PdReplaceMissing -PhysDrv[8:14] -array1 -row6 -a0
Adapter: 0: Missing PD at Array 1, Row 6 is replaced.
Exit Code: 0x00
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -start -PhysDrv[8:14] -a0
Started rebuild progress on device(Encl-8 Slot-14)
Exit Code: 0x00
```
### Megacli rebuild progress
Meanwhile, you should see the red LED of the disk being rebuilt blinking.
The commands below only check the status:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -PdgetMissing -a0
Adapter 0 - No Missing Drive is Found.
Exit Code: 0x00
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdInfo -PhysDrv[8:14] -a0
Firmware state: Rebuild
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -ShowProg -PhysDrv[8:14] -a0
Rebuild Progress on Device at Enclosure 8, Slot 14 Completed 0% in 3 Minutes.
Exit Code: 0x00
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -ShowProg -PhysDrv[8:14] -a0
[sudo] password for linnil1:
Rebuild Progress on Device at Enclosure 8, Slot 14 Completed 17% in 135 Minutes.
Exit Code: 0x00
```
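The progress line is enough for a rough estimate of when the rebuild will finish. The following sketch extrapolates linearly from the percentage and elapsed minutes; `rebuild_eta` is a hypothetical helper, and because MegaCli reports whole percents the estimate is coarse, especially early on.

```python
import re


def rebuild_eta(line):
    """Return (percent, elapsed_min, remaining_min) from a -ShowProg line.

    remaining_min is a linear extrapolation and is None while percent is 0.
    Returns None if the line does not report rebuild progress.
    """
    m = re.search(r"Completed (\d+)% in (\d+) Minutes", line)
    if m is None:
        return None
    percent, elapsed = int(m.group(1)), int(m.group(2))
    remaining = None if percent == 0 else elapsed * (100 - percent) / percent
    return percent, elapsed, remaining


line = "Rebuild Progress on Device at Enclosure 8, Slot 14 Completed 17% in 135 Minutes."
percent, elapsed, remaining = rebuild_eta(line)
print(f"{percent}% done after {elapsed} min, about {remaining / 60:.1f} h to go")
```

For the 17% in 135 minutes reading above, this works out to roughly 11 more hours, assuming the rebuild rate stays constant.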
Reference: https://www.advancedclustering.com/act_kb/replacing-a-disk-with-megacli/
### Megacli rebuild Done
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -pdrbld -ShowProg -PhysDrv[8:14] -a0
Device(Encl-8 Slot-14) is not in rebuild process
Exit Code: 0x00
```
Everything is back to normal:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -AdpAllInfo -aALl
Device Present
================
Virtual Drives : 3
Degraded : 0
Offline : 0
Physical Devices : 25
Disks : 24
Critical Disks : 0
Failed Disks : 0
```
The state goes from degraded to optimal:
```
(base) [linnil1@exon MegaCli]$ sudo ./MegaCli64 -ldinfo -lALL -aALL
Virtual Drive: 1 (Target Id: 1)
State : Optimal
```
## Installation bug?
### Install on Ubuntu
https://www.broadcom.com/support/knowledgebase/1211161500661/installing-megacli-in-debian-or-ubuntu
### libncurses.so.5 not found
```
(env) [linnil1@rna server]$ sudo /opt/MegaRAID/MegaCli/MegaCli64
/opt/MegaRAID/MegaCli/MegaCli64: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory
```
On CentOS 8, install the compatibility package:
```
sudo yum install ncurses-compat-libs
```
On Ubuntu 20.04, install the [old ncurses library](https://askubuntu.com/questions/1252062/how-to-install-libncurses-so-5-in-ubuntu-20-04) from apt:
```
sudo add-apt-repository universe
sudo apt install libncurses5
```
# Adding a new disk
After inserting the disk:
```
Enclosure Device ID: 8
Slot Number: 17
Enclosure position: N/A
Device Id: 20
WWN: 5000cca2c1d1020f
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 14.552 TB [0x746c00000 Sectors]
Non Coerced Size: 14.551 TB [0x746b00000 Sectors]
Coerced Size: 14.551 TB [0x746b00000 Sectors]
Sector Size: 0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: W232
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x500304801780011d
Connected Port Number: 0(path0)
Inquiry Data: 2PH6DW3J WDC WUH721816ALE6L4 PCGNW232
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature : N/A
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's NCQ setting : N/A
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Drive has flagged a S.M.A.R.T alert : No
```
Create a RAID 6 virtual drive from eight disks, including the new one in slot 17:
```
(base) linnil1@exon:~$ sudo ./MegaCli64 -cfgldadd -r6 [8:8,8:9,8:10,8:11,8:14,8:15,8:16,8:17] -a0
Adapter 0: Created VD 1
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
(base) linnil1@exon:~$ sudo ./MegaCli64 -h
```
Check the new virtual drive; background initialization has started:
```
(base) linnil1@exon:~$ sudo ./MegaCli64 -LDinfo -L1 -aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 1 (Target Id: 1)
Name :
RAID Level : Primary-6, Secondary-0, RAID Level Qualifier-3
Size : 87.313 TB
Sector Size : 512
Parity Size : 29.104 TB
State : Optimal
Strip Size : 64 KB
Number Of Drives : 8
Span Depth : 1
Default Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAdaptive, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Disk's Default
Ongoing Progresses:
Background Initialization: Completed 0%, Taken 0 min.
Encryption Type : None
Bad Blocks Exist: No
Is VD Cached: Yes
Cache Cade Type : Read Only
```
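The reported `Size` and `Parity Size` can be sanity-checked by hand: RAID 6 spends two drives' worth of space on parity, so with n equal drives the usable space is (n - 2) drives and the parity space is 2 drives. The sketch below redoes that arithmetic from the coerced sector count. `raid6_sizes` is a hypothetical helper, and it assumes 512-byte sectors (matching `Sector Size : 512` on the virtual drive) and that MegaCli's "TB" means 2^40 bytes.

```python
SECTOR_BYTES = 512  # assumption: matches "Sector Size : 512" on the virtual drive
TIB = 2 ** 40       # assumption: MegaCli's "TB" is 2**40 bytes


def raid6_sizes(coerced_sectors, drives):
    """Return (usable_tb, parity_tb) for a single-span RAID 6 of equal drives."""
    per_drive_tb = coerced_sectors * SECTOR_BYTES / TIB
    return (drives - 2) * per_drive_tb, 2 * per_drive_tb


# New VD: 8 drives, each with Coerced Size 0x746b00000 sectors.
usable, parity = raid6_sizes(0x746b00000, 8)
print(f"usable {usable:.3f} TB, parity {parity:.3f} TB")

# The earlier 10-drive VD built from 0x2ba900000-sector disks.
usable_old, parity_old = raid6_sizes(0x2ba900000, 10)
print(f"usable {usable_old:.3f} TB, parity {parity_old:.3f} TB")
```

Both results agree with the MegaCli output in this note: 87.313 TB / 29.104 TB for the new virtual drive, and 43.660 TB / 10.915 TB for the 10-drive one.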