# OpenShift storage via mdadm and LVM
Several of my customers have recently asked about installing OpenShift on a single node that has multiple SSD / NVMe storage drives. These are my notes on combining the drives using Linux's software RAID (`mdadm`) and then dynamically allocating slices of the big RAID array as Logical Volumes (`LVM`).
[`mdadm` is the Multi-Device Administration (CLI) tool](https://github.com/md-raid-utilities/mdadm/).
[`LVM` is the Logical Volume Manager toolset](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/configuring_and_managing_logical_volumes/overview-of-logical-volume-management_configuring-and-managing-logical-volumes#overview-of-logical-volume-management_configuring-and-managing-logical-volumes).
:::info
I describe using `mdadm` because neither the [LVMStorage Operator](https://docs.openshift.com/container-platform/4.16/storage/persistent_storage/persistent_storage_local/persistent-storage-using-lvms.html) nor the [topolvm project](https://github.com/topolvm/topolvm?tab=readme-ov-file#topolvm) uses LVM's native RAID capabilities.
:::
## Step 1 - Create a RAID array
Combining several disks into one usable pool is commonly known as "creating a RAID array." RAID is an abbreviation of **R**edundant **A**rray of **I**ndependent **D**isks. Most RAID arrays will sacrifice capacity (and performance) in order to protect your data. But if you like to live dangerously (or your data is backed up by other means), you can disable all data protection.
For example, a 2-disk RAID array can be set up as either RAID 0 (no protection) or RAID 1 (all data on the first drive is mirrored/copied to the second drive). Protecting your data in this example means your usable disk space is 50% of the raw/total disk space. Performance likely decreases as well, because every time you save something to disk it has to be saved twice. If you have three (3) or more disks, `mdadm` can use _parity_ calculations to protect your data without sacrificing as much usable capacity. These types of arrays are known as RAID 5 and RAID 6. The _parity_ calculations are quick, but they can still reduce performance. A RAID 5 array sacrifices the capacity of one drive in order to protect your data; RAID 6 sacrifices the capacity of two drives.
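As a concrete (hypothetical) example: six 4 TB drives hold 24 TB raw. As RAID 0 that is 24 TB usable, as RAID 5 it is (6 − 1) × 4 TB = 20 TB usable, and as RAID 6 it is (6 − 2) × 4 TB = 16 TB usable.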
RAID 6 has become the de facto standard when creating RAID arrays from large-capacity HDDs. Sacrificing two drives' worth of capacity means your data is still accessible even after two drives have failed. Many people report that replacing a failed drive puts additional strain on the remaining (old) drives, which increases the likelihood of another drive failing during the rebuild.
Additional information can be found in the [Red Hat Enterprise Linux 9 documentation - Managing Storage Devices - Creating a software RAID on an installed system](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_storage_devices/managing-raid_managing-storage-devices#creating-a-software-raid-on-an-installed-system_managing-raid).
:::spoiler Click here to learn a bit more about RAID 0, RAID 1, RAID 5 and RAID 6
https://en.wikipedia.org/wiki/Standard_RAID_levels
**RAID 0** - Max capacity because your data is not protected
**RAID 1 & 10** - Half capacity because your data is mirrored
**RAID 5** - Your data is still available with one (1) failed drive
**RAID 6** - Your data is still available with two (2) failed drives
:::
### Create a RAID-1 array
```bash
# mirror two drives (RAID 1); the array will appear as /dev/md/lvm-storage-raid-1
mdadm --create /dev/md/lvm-storage-raid-1 --level 1 --raid-devices 2 /dev/vdb /dev/vdc
```
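To confirm the array was created and to watch the initial sync, you can check `/proc/mdstat` or query the array directly:
```bash
# high-level status of all md arrays, including resync progress
cat /proc/mdstat

# detailed state of the new array
mdadm --detail /dev/md/lvm-storage-raid-1
```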
### Create a RAID-6 array
```bash
# The shell glob /dev/nvme[1-6]n1 expands to /dev/nvme1n1, /dev/nvme2n1, ... /dev/nvme6n1
mdadm --create /dev/md/lvm-storage-raid-6 --level 6 --raid-devices 6 /dev/nvme[1-6]n1
```
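Arrays are normally auto-assembled from their on-disk metadata at boot, but you can also record them explicitly. A minimal sketch, assuming a RHEL-family host where the config file lives at `/etc/mdadm.conf`:
```bash
# append the current array definitions so the array names stay stable across reboots
mdadm --detail --scan >> /etc/mdadm.conf
```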
### Create a partition label
It's a good idea to tell LVMStorage to use a partition with a particular GPT partition label (_partlabel_), so the `LVMCluster` can reference the device via a stable `/dev/disk/by-partlabel/...` path.
You can create the partition and label like this:
```bash
# -n 0:0:0 creates one partition spanning the whole device; -c 0:... sets its GPT name
sgdisk -n 0:0:0 -c 0:"lvm-storage" /dev/md/lvm-storage-raid-6
```
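udev should then expose the partition under `/dev/disk/by-partlabel/`, which is exactly the path the `LVMCluster` examples below reference:
```bash
# the symlink should point at partition 1 of the md device
ls -l /dev/disk/by-partlabel/lvm-storage
```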
## Step 2 - Create an LVM-based StorageClass
The final step is to install the LVMStorage Operator and create an `LVMCluster` resource, which will dynamically allocate space (logical volumes) to your containers / VMs. Two example `LVMCluster` resources are shown below:
### Single StorageClass example
```yaml=
---
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: lvm-storage
        deviceSelector:
          paths:
            - /dev/disk/by-partlabel/lvm-storage
        thinPoolConfig:
          name: thin-pool
          overprovisionRatio: 10
          sizePercent: 90
```
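Once this `LVMCluster` is ready, LVMStorage creates a StorageClass named `lvms-<deviceClass name>`, i.e. `lvms-lvm-storage` here. A minimal `PersistentVolumeClaim` that requests a volume from it might look like this (the claim name, namespace and size are placeholders):
```yaml=
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data        # hypothetical claim name
  namespace: my-app        # hypothetical namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi        # arbitrary example size
  storageClassName: lvms-lvm-storage
```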
### Multiple StorageClass example
```yaml=
---
apiVersion: lvm.topolvm.io/v1alpha1
kind: LVMCluster
metadata:
  name: lvmcluster
  namespace: openshift-storage
spec:
  storage:
    deviceClasses:
      - name: hdd-vg        ### the StorageClass name will be "lvms-hdd-vg"
        default: true       ### make this the default StorageClass
        deviceSelector:
          paths:
            - /dev/disk/by-partlabel/lvm-storage-hdd
        thinPoolConfig:
          name: hdd-tp
          overprovisionRatio: 10
          sizePercent: 90
      - name: nvme-vg       ### the StorageClass name will be "lvms-nvme-vg"
        default: false
        deviceSelector:
          paths:
            - /dev/disk/by-partlabel/lvm-storage-nvme
        thinPoolConfig:
          name: nvme-tp
          overprovisionRatio: 10
          sizePercent: 90
```
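After the operator reconciles this `LVMCluster`, you should see one StorageClass per device class:
```bash
# expect to see lvms-hdd-vg (marked default) and lvms-nvme-vg
oc get storageclass
```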