# Connecting OpenShift to NFS storage with csi-driver-nfs
> Should this be superseded by https://github.com/jooho/nfs-provisioner-operator ?
Connecting an NFS server, such as a common Linux server, to OpenShift can be a convenient way to provide storage for multi-pod (ReadWriteMany/RWX) Persistent Volumes. *Fancy NFS servers*, like NetApp or Dell/EMC Isilon, have their own Container Storage Interface (CSI) drivers and shouldn't use the csi-driver-nfs described in this document. Using the appropriate CSI driver for your storage provides additional features like efficient snapshots and clones.
The goal of this document is to describe how to install the [csi-driver-nfs](https://github.com/kubernetes-csi/csi-driver-nfs) provisioner onto OpenShift. The procedure can be performed using the Web Console, but I'll document the Helm/command-line method because it's easier to copy and paste.
The process of configuring a Linux server to export NFS shares is [described in the Appendix below](#RHEL-as-an-NFS-server).
**Please note:** The `ServiceAccounts` created by the official [Helm chart](https://github.com/kubernetes-csi/csi-driver-nfs/blob/master/charts/README.md) require additional permissions in order to overcome OpenShift's default security policy and run the `Pods`.
## Overview
I assume that you already have an NFS server available. If not, follow the instructions in the Appendix in order to make a Linux server provide NFS services. Once an NFS server is available, the remaining steps are to:
1. [Make sure `helm` is installed](#Install-the-Helm-CLI-tool)
2. [Connect `helm` to the csi-driver-nfs chart repository](#Add-the-Helm-repo-and-list-available-charts)
3. [Install the chart with certain overrides](#Install-a-chart-release)
4. [Give additional permissions to the `ServiceAccounts`](#Grant-additional-permissions-to-the-ServiceAccounts)
5. [Create a StorageClass](#Create-a-StorageClass)
## Install the Helm CLI tool
You must have Helm installed on your workstation before proceeding. Alternatively, you can use OpenShift's [Web Terminal Operator](https://www.redhat.com/en/blog/a-deeper-look-at-the-web-terminal-operator-1) to run all of the commands from the OpenShift Web Console. The Web Terminal Operator environment has `helm` and `oc` already installed.
```bash
# Download the helm binary and save it to $HOME/bin/helm
curl -L -o $HOME/bin/helm https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64
# Allow helm to be executed
chmod a+x $HOME/bin/helm
# Confirm that helm is working by reporting the version
helm version
```
## Add the Helm repo and list available charts
Helm uses the term "chart" when referring to the collection of resources that make up a Kubernetes application. Helm charts are published in repositories. The csi-driver-nfs chart instructs OpenShift to create various resources, such as a `Deployment` that creates a controller `Pod` and a `DaemonSet` that creates a mount/unmount `Pod` on each node in the cluster. Two `ServiceAccounts` are also created in order to run the `Pods` with appropriate permissions.
:::info
These notes were created using csi-driver-nfs version `4.6.0` in April 2024.
:::
```bash
# Tell helm where to find the csi-driver-nfs chart repository
helm repo add csi-driver-nfs https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
# Ask helm to list the repo contents. You should see v4.6.0 and several other versions.
helm search repo -l csi-driver-nfs
```
## Install a chart release
Helm uses the term "release" when referring to an instance of a chart/application running in a Kubernetes cluster. A single chart can be installed many times into the same cluster; each time it's installed, a new release is created. **The csi-driver-nfs chart should only be installed once!** More info is available at https://helm.sh/docs/intro/using_helm
My example command below uses several `--set name=value` arguments to override certain default chart values. By overriding the default values I can:
1. Create a new namespace/project for the resources created by csi-driver-nfs
2. Allow the pods that create NFS subdirectories to run on the ControlPlane nodes
3. Have two NFS controller Pods running for high-availability
4. Deploy the optional CSI external snapshot controller
5. Control whether or not the Custom Resource Definitions (CRD) required for the external snapshot controller are installed (they can only be installed once)
:::info
OpenShift installs the external snapshot CRDs by default. Helm will fail and abort when it finds the existing CRDs, because they weren't created (and labeled) by Helm. It's OK to disable snapshot CRD creation when you see this error:
> Error: INSTALLATION FAILED: rendered manifests contain a resource that already exists. Unable to continue with install: CustomResourceDefinition "volumesnapshots.snapshot.storage.k8s.io" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "csi-driver-nfs"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "csi-driver-nfs"
:::
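If you'd rather let Helm adopt the pre-existing snapshot CRDs than disable their creation, the error message above spells out the ownership metadata Helm expects to find on each CRD. A sketch of that metadata (values taken directly from the error message):

```yaml
# Ownership metadata Helm requires before it will adopt an
# existing resource into a release (per the error message above)
metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: csi-driver-nfs
    meta.helm.sh/release-namespace: csi-driver-nfs
```

Disabling the CRD creation (as in the options below) is the simpler path; labeling the CRDs by hand is only worthwhile if you want Helm to manage them going forward.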
Option 1 - Install everything, but don't try to recreate the external snapshot CRDs that OpenShift installs by default
```bash
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --version v4.6.0 \
--create-namespace \
--namespace csi-driver-nfs \
--set controller.runOnControlPlane=true \
--set controller.replicas=2 \
--set controller.strategyType=RollingUpdate \
--set externalSnapshotter.enabled=true \
--set externalSnapshotter.customResourceDefinitions.enabled=false
```
Option 2 - Install on a Single Node OpenShift (SNO) cluster. With only one node in the cluster, there's no need for multiple replicas of the controller pod.
```bash
helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --version v4.6.0 \
--create-namespace \
--namespace csi-driver-nfs \
--set controller.runOnControlPlane=true \
--set externalSnapshotter.enabled=true \
--set externalSnapshotter.customResourceDefinitions.enabled=false
```
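If you find yourself reinstalling with the same options, the `--set` overrides above can be captured once in a values file and passed with `-f`. A sketch, with key paths taken directly from the flags in Option 1 (the filename `my-values.yaml` is just an example):

```yaml
# my-values.yaml -- equivalent to the --set flags in Option 1
controller:
  runOnControlPlane: true
  replicas: 2
  strategyType: RollingUpdate
externalSnapshotter:
  enabled: true
  customResourceDefinitions:
    enabled: false
```

Then install with `helm install csi-driver-nfs csi-driver-nfs/csi-driver-nfs --version v4.6.0 --create-namespace --namespace csi-driver-nfs -f my-values.yaml`.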
## Grant additional permissions to the ServiceAccounts
:::danger
These commands allow the Pods to run as root with all permissions. Ideally, only the specific permissions required would be granted, with everything else restricted. I started working on this in the Appendix, but never finished it. 😭
:::
```bash
oc adm policy add-scc-to-user privileged -z csi-nfs-node-sa -n csi-driver-nfs
oc adm policy add-scc-to-user privileged -z csi-nfs-controller-sa -n csi-driver-nfs
```
## Create a StorageClass
Now that the csi-driver-nfs chart has been installed, and allowed to run with elevated permissions, it's time to create a `StorageClass`. The `StorageClass` we create must reference the csi-driver-nfs provisioner and provide additional parameters identifying which NFS server and exported folder/share to use. An additional `subDir:` parameter controls the name of the folder that is dynamically created when a Persistent Volume Claim is created.
```yaml
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.example.com ### NFS server's IP/FQDN
  share: /example-dir ### NFS server's exported directory
  subDir: ${pvc.metadata.namespace}-${pvc.metadata.name}-${pv.metadata.name} ### Folder/subdir name template
reclaimPolicy: Delete
volumeBindingMode: Immediate
```
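The `subDir:` template above is expanded per-claim by the provisioner. A quick illustration of the resulting directory name, using hypothetical names for the namespace and PVC (the PV name is generated by Kubernetes):

```shell
# Hypothetical values standing in for the template variables:
ns="my-app"                                    # ${pvc.metadata.namespace}
pvc="test-nfs"                                 # ${pvc.metadata.name}
pv="pvc-5daf545e-1ca0-4285-ba91-64fd67133b00"  # ${pv.metadata.name}

# The subdirectory created under the NFS export:
echo "${ns}-${pvc}-${pv}"
# → my-app-test-nfs-pvc-5daf545e-1ca0-4285-ba91-64fd67133b00
```

Including all three names keeps the directories unique and makes it easy to spot which claim owns which folder when browsing the export on the NFS server.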
If the NFS server only supports NFSv4, add mount options to the StorageClass:
```yaml
mountOptions:
- nfsvers=4.1
```
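For clarity, `mountOptions` is a top-level field of the `StorageClass`, a sibling of `parameters:` rather than nested inside it. Combined with the example above, the result looks like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi
provisioner: nfs.csi.k8s.io
parameters:
  server: nfs-server.example.com
  share: /example-dir
mountOptions:
  - nfsvers=4.1
reclaimPolicy: Delete
volumeBindingMode: Immediate
```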
:::warning
I find it helpful to have a default StorageClass. It's possible to have multiple default StorageClasses, but the behavior is undefined. The thought of undefined behavior scares me! 😱
Check that no other StorageClass has been marked as the default with
```bash
oc get storageclass
```
You can make this StorageClass the default by adding an annotation after creating/applying the StorageClass YAML above
```bash
oc annotate storageclass/nfs-csi storageclass.kubernetes.io/is-default-class=true
```
:::
:::success
All done! The StorageClass has been created, and you can see it working by creating a sample PVC with this YAML
```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-nfs
  namespace: csi-driver-nfs
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeMode: Filesystem
  storageClassName: nfs-csi ### remove this line to test the default StorageClass
```
The PVC should become `Bound` in a few seconds...
```bash
oc get pvc -n csi-driver-nfs
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
test-nfs Bound pvc-abcdef-12345-... 1Gi RWX nfs-csi 5s
```
:::
# Appendix
## Update the Image Registry's configuration to use shared NFS storage
If you have recently installed OpenShift and didn't provide storage configuration details for the internal Image Registry, you may be interested in updating the Image Registry's configuration to use the new NFS StorageClass.
:::info
This command will reconfigure the internal Image Registry. The updated configuration will:
- Create a default Persistent Volume Claim from the default StorageClass (the default PVC requests a 100Gi ReadWriteMany/RWX volume)
- Make the internal registry highly available by running 2 pods/replicas
- Revert the management state to Managed if it had been marked as Removed
```bash
oc patch config.imageregistry.operator.openshift.io/cluster --type=merge \
-p '{"spec":{"rolloutStrategy":"RollingUpdate","replicas":2,"managementState":"Managed","storage":{"pvc":{"claim":""}}}}'
```
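For readability, here is the same patch expressed as YAML:

```yaml
spec:
  managementState: Managed
  replicas: 2
  rolloutStrategy: RollingUpdate
  storage:
    pvc:
      claim: ""
```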
:::
:::danger
The command below will terminate the Image Registry pods and delete the default Persistent Volume Claim. PVCs that were created manually will not be deleted.
Only use this if you're planning on changing the internal Image Registry's configuration. All data/images in the Image Registry will be lost!
```bash
oc patch config.imageregistry.operator.openshift.io/cluster --type=merge -p '{"spec":{"managementState":"Removed"}}'
```
:::
## RHEL as an NFS server
Here is an abbreviated set of commands to configure a RHEL8 server as an NFS server.
```bash
# Install the RPM
dnf install nfs-utils
# Create an export directory with adequate permissions and SElinux labels
mkdir -p /exports/openshift
chmod 755 /exports/openshift
chown -R nobody:nobody /exports/openshift # RHEL 8 merged "nfsnobody" into "nobody"
semanage fcontext --add --type nfs_t "/exports/openshift(/.*)?"
restorecon -R -v /exports/openshift
# Add the directory to the list of exports
echo "/exports/openshift *(insecure,no_root_squash,async,rw)" >> /etc/exports
# Make sure the firewall isn't blocking anything
firewall-cmd --add-service nfs \
--add-service mountd \
--add-service rpc-bind \
--permanent
firewall-cmd --reload
# Start the NFS server service (also starts following reboots)
systemctl enable --now nfs-server
# Check everything from the server's perspective
systemctl status nfs-server
exportfs -av
showmount -e localhost
```
You can mount the export from another Linux system for testing purposes.
```bash
mount nfs.example.com:/exports/openshift /mnt
touch /mnt/test-file
mkdir /mnt/test-dir
ls -lR /mnt #confirm these folders and files are owned by root
rm -rvf /mnt/test-*
umount /mnt
```
## Ubuntu as an NFS server
:::danger
Ubuntu 22.04 LTS ships with an NFS server configuration that rewrites the group list provided by NFS clients (OpenShift). In order for Ubuntu to function with csi-driver-nfs and OpenShift, you must override the "`manage-gids=y`" setting in the `/etc/nfs.conf` file using these two commands.
```bash
echo -e "[mountd]\nmanage-gids=n" | sudo tee /etc/nfs.conf.d/disable-mountd-manage-gids.conf
sudo systemctl restart nfs-server nfs-idmapd
```
:::
## Logs and debugging
If you're curious about how this provisioner works, or need to troubleshoot and debug failures, you can get more information by inspecting the namespace events and pod logs with the commands below. The output shows three log groupings: the first shows the events in the namespace when a PVC is created; the second shows the controller logs as a PVC is created, which results in a subdirectory being created on the NFS server; the third shows the node logs when a Pod attempts to mount and use the storage.
```bash
oc get events -n csi-driver-nfs &
LAST SEEN TYPE REASON OBJECT MESSAGE
7m28s Normal ExternalProvisioning persistentvolumeclaim/test-nfs waiting for a volume to be created, either by external provisioner "nfs.csi.k8s.io" or manually created by system administrator
7m28s Normal Provisioning persistentvolumeclaim/test-nfs External provisioner is provisioning volume for claim "csi-driver-nfs/test-nfs"
7m24s Normal ProvisioningSucceeded persistentvolumeclaim/test-nfs Successfully provisioned volume pvc-5daf545e-1ca0-4285-ba91-64fd67133b00
oc logs -f -l app=csi-nfs-controller -n csi-driver-nfs &
I1114 18:12:23.858858 1 controller.go:1366] provision "csi-driver-nfs/new450" class "nfs-csi": started
I1114 18:12:23.859290 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"csi-driver-nfs", Name:"new450", UID:"70e4a89e-571c-49e3-adf3-3264c6dc553d", APIVersion:"v1", ResourceVersion:"150582625", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "csi-driver-nfs/new450"
I1114 18:12:24.163417 1 controller.go:923] successfully created PV pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d for PVC new450 and csi volume name rhdata6.dota-lab.iad.redhat.com#data/hdd/ocp-vmware#pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d##
I1114 18:12:24.163682 1 controller.go:1449] provision "csi-driver-nfs/new450" class "nfs-csi": volume "pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d" provisioned
I1114 18:12:24.163752 1 controller.go:1462] provision "csi-driver-nfs/new450" class "nfs-csi": succeeded
I1114 18:12:24.178981 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"csi-driver-nfs", Name:"new450", UID:"70e4a89e-571c-49e3-adf3-3264c6dc553d", APIVersion:"v1", ResourceVersion:"150582625", FieldPath:""}): type: 'Normal' reason: 'ProvisioningSucceeded' Successfully provisioned volume pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d
oc logs -f -l app=csi-nfs-node -c nfs &
I1114 18:23:17.540284 1 utils.go:107] GRPC call: /csi.v1.Node/NodePublishVolume
I1114 18:23:17.540327 1 utils.go:108] GRPC request: {"target_path":"/var/lib/kubelet/pods/ec96ea93-a4d9-4e0b-86a4-8768b324242e/volumes/kubernetes.io~csi/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d/mount","volume_capability":{"AccessType":{"Mount":{}},"access_mode":{"mode":5}},"volume_context":{"csi.storage.k8s.io/pv/name":"pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d","csi.storage.k8s.io/pvc/name":"new450","csi.storage.k8s.io/pvc/namespace":"csi-driver-nfs","server":"rhdata6.dota-lab.iad.redhat.com","share":"/data/hdd/ocp-vmware","storage.kubernetes.io/csiProvisionerIdentity":"1699985041046-4693-nfs.csi.k8s.io","subdir":"pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d"},"volume_id":"rhdata6.dota-lab.iad.redhat.com#data/hdd/ocp-vmware#pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d##"}
I1114 18:23:17.540807 1 nodeserver.go:123] NodePublishVolume: volumeID(rhdata6.dota-lab.iad.redhat.com#data/hdd/ocp-vmware#pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d##) source(rhdata6.dota-lab.iad.redhat.com:/data/hdd/ocp-vmware/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d) targetPath(/var/lib/kubelet/pods/ec96ea93-a4d9-4e0b-86a4-8768b324242e/volumes/kubernetes.io~csi/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d/mount) mountflags([])
I1114 18:23:17.541201 1 mount_linux.go:245] Detected OS without systemd
I1114 18:23:17.541229 1 mount_linux.go:220] Mounting cmd (mount) with arguments (-t nfs rhdata6.dota-lab.iad.redhat.com:/data/hdd/ocp-vmware/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d /var/lib/kubelet/pods/ec96ea93-a4d9-4e0b-86a4-8768b324242e/volumes/kubernetes.io~csi/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d/mount)
I1114 18:23:19.898894 1 nodeserver.go:140] skip chmod on targetPath(/var/lib/kubelet/pods/ec96ea93-a4d9-4e0b-86a4-8768b324242e/volumes/kubernetes.io~csi/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d/mount) since mountPermissions is set as 0
I1114 18:23:19.898932 1 nodeserver.go:142] volume(rhdata6.dota-lab.iad.redhat.com#data/hdd/ocp-vmware#pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d##) mount rhdata6.dota-lab.iad.redhat.com:/data/hdd/ocp-vmware/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d on /var/lib/kubelet/pods/ec96ea93-a4d9-4e0b-86a4-8768b324242e/volumes/kubernetes.io~csi/pvc-70e4a89e-571c-49e3-adf3-3264c6dc553d/mount succeeded
I1114 18:23:19.898945 1 utils.go:114] GRPC response: {}
```
:::spoiler List of TODO items
## TODO: YAML that gets applied when using the Web / Developer Console
```yaml
---
apiVersion: helm.openshift.io/v1beta1
kind: ProjectHelmChartRepository
metadata:
  name: csi-driver-nfs
  namespace: csi-driver-nfs
spec:
  name: NFS CSI driver for Kubernetes
  connectionConfig:
    url: https://raw.githubusercontent.com/kubernetes-csi/csi-driver-nfs/master/charts
```
```
TODO - add ChartRelease? YAML here
```
## TODO: Create a customized SecurityContextConstraint
ADDITIONAL TESTING REQUIRED HERE
```yaml
---
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: csi-driver-nfs
allowHostDirVolumePlugin: true
allowHostNetwork: true
allowHostPorts: true
allowPrivilegedContainer: true
allowPrivilegeEscalation: true
allowedCapabilities:
  - SYS_ADMIN
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: system:openshift:scc:csi-driver-nfs
rules:
  - apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    resourceNames:
      - csi-driver-nfs
    verbs:
      - use
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: system:openshift:scc:csi-driver-nfs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:openshift:scc:csi-driver-nfs
subjects:
  - kind: ServiceAccount
    name: csi-nfs-node-sa
    namespace: csi-driver-nfs
  - kind: ServiceAccount
    name: csi-nfs-controller-sa
    namespace: csi-driver-nfs
```
:::