# Understand Ceph Upgrades (Brownfield) [TM#195](https://github.com/airshipit/treasuremap/issues/195)

[TOC]

## Investigation and plan for an upgrade of Ceph on an existing cluster

This POC determines the proposed plan for upgrading an existing Ceph cluster from v15.2.13 to v16.2.6. It also identifies the required sequence, dependencies and constraints, as well as any impact on cluster availability and performance during the upgrade.

Assuming the original cluster was deployed with the Rook Ceph operator, the brownfield scenario consists of two independent steps:

* Operator upgrade
* Ceph upgrade

Both steps can be performed in either order, provided the following rules are observed:

1. Before the upgrade, the Ceph cluster should be in a healthy state. It is possible (but not recommended) to upgrade a cluster that reports some warnings; in that case, the person responsible for maintenance must decide whether to proceed. Examples of warnings that still allow the upgrade to proceed:
   * some OSDs are permanently out/down because of drive errors
   * some PGs are in a peering/waiting state because of scrubbing or deep scrubbing
   * some PGs have not been scrubbed in time

   However, warnings such as:
   * OSD nearly full
   * OSDs flapping, and similar

   should be treated as a red flag for a brownfield upgrade. In short, the severity of any warning requires a human decision.
2. The upgrade should be performed between two consecutive major releases, e.g. Rook 1.6 -> 1.7 and/or Ceph 15.x -> 16.x. It is recommended to upgrade Ceph to the latest minor release before performing a major release upgrade.
3. When planning a Ceph upgrade to the next major release, it is recommended to upgrade the operator first. The Rook operator usually supports three major Ceph releases (N-1, N and N+1); e.g. Rook 1.7 supports Nautilus, Octopus and Pacific.
4. It is possible to perform a downgrade as well.
   For Ceph, the downgrade was tested between minor releases. When downgrading, pay close attention to the Ceph release notes: downgrading between bug-fix releases is possible, but feature releases must not be downgraded under any circumstances. For example, the latest Octopus release should not be downgraded to previous minor versions because of database schema changes.
5. Different upgrade scenarios performed in the local lab confirmed that there is no significant impact on performance or availability. According to the Rook documentation, the operator upgrade does not affect Ceph functionality: the Ceph cluster remains fully functional with only minimal limitations. The performance impact during the Ceph upgrade is comparable to that of regular maintenance such as an OSD node reboot or a hard-drive replacement; this level of impact is expected and well documented. In summary, neither brownfield operation harms the cluster.

## Ceph Upgrade Process

In this scenario, we upgrade Ceph from v15.2.13 to v16.2.6.

### Pre-requisites and health status

Initial status of the Ceph cluster:

```
airship@d105:~$ kubectl get cephclusters.ceph.rook.io -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH      EXTERNAL
rook-ceph   /var/lib/rook     3          6d23h   Ready   Cluster created successfully   HEALTH_OK
```

```
kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
  cluster:
    id:     0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: a(active, since 6d)
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)

  data:
    pools:   2 pools, 33 pgs
    objects: 64 objects, 158 MiB
    usage:   3.5 GiB used, 15 GiB / 18 GiB avail
    pgs:     33 active+clean
```

Verify the Rook operator version:

```
airship@d105:~$ kubectl get deployments.apps rook-ceph-operator -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
NAME                 IMAGE
rook-ceph-operator   rook/ceph:v1.7.11
```

Ceph version upgrades: Rook v1.7 supports the following Ceph versions:

* Ceph Pacific v16.2.0 or newer
* Ceph Octopus v15.2.0 or newer
* Ceph Nautilus v14.2.5 or newer

Existing Ceph version in the cluster:

```
airship@d105:~$ kubectl get deployments.apps -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
NAME                              IMAGE
csi-cephfsplugin-provisioner      k8s.gcr.io/sig-storage/csi-attacher:v3.3.0,k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0,k8s.gcr.io/sig-storage/csi-resizer:v1.3.0,k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0,quay.io/cephcsi/cephcsi:v3.4.0,quay.io/cephcsi/cephcsi:v3.4.0
csi-rbdplugin-provisioner         k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0,k8s.gcr.io/sig-storage/csi-resizer:v1.3.0,k8s.gcr.io/sig-storage/csi-attacher:v3.3.0,k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0,quay.io/cephcsi/cephcsi:v3.4.0,quay.io/cephcsi/cephcsi:v3.4.0
rook-ceph-crashcollector-node03   ceph/ceph:v15.2.13
rook-ceph-crashcollector-node04   ceph/ceph:v15.2.13
rook-ceph-crashcollector-node05   ceph/ceph:v15.2.13
rook-ceph-mgr-a                   ceph/ceph:v15.2.13
rook-ceph-mon-a                   ceph/ceph:v15.2.13
rook-ceph-mon-b                   ceph/ceph:v15.2.13
rook-ceph-mon-c                   ceph/ceph:v15.2.13
rook-ceph-operator                rook/ceph:v1.7.11
rook-ceph-osd-0                   ceph/ceph:v15.2.13
rook-ceph-osd-1                   ceph/ceph:v15.2.13
rook-ceph-osd-2                   ceph/ceph:v15.2.13
```

```
airship@d105:~$ kubectl get pods -n rook-ceph
NAME                                               READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-8f2zm                             3/3     Running     0          6d23h
csi-cephfsplugin-d27bl                             3/3     Running     0          6d23h
csi-cephfsplugin-kmz8j                             3/3     Running     0          6d23h
csi-cephfsplugin-provisioner-689686b44-bfpzp       6/6     Running     0          6d23h
csi-cephfsplugin-provisioner-689686b44-d699m       6/6     Running     0          6d23h
csi-rbdplugin-9dsst                                3/3     Running     0          6d23h
csi-rbdplugin-fw2nk                                3/3     Running     0          6d23h
csi-rbdplugin-provisioner-5775fb866b-7fng8         6/6     Running     0          6d23h
csi-rbdplugin-provisioner-5775fb866b-7r4xf         6/6     Running     0          6d23h
csi-rbdplugin-rs2w8                                3/3     Running     0          6d23h
rook-ceph-crashcollector-node03-df5fccdc4-xj44l    1/1     Running     0          6d23h
rook-ceph-crashcollector-node04-7d5b4dd9df-8pzz7   1/1     Running     0          6d23h
rook-ceph-crashcollector-node05-77d88cf7bd-fbdfp   1/1     Running     0          6d23h
rook-ceph-mgr-a-84855f9b9d-wg8vd                   1/1     Running     0          6d23h
rook-ceph-mon-a-5cb4fbdf47-wgh2w                   1/1     Running     0          7d1h
rook-ceph-mon-b-88d5c7db6-7n9kc                    1/1     Running     0          7d1h
rook-ceph-mon-c-cdf7b8bc-zx5wt                     1/1     Running     0          7d1h
rook-ceph-operator-8595fc774f-gr75s                1/1     Running     0          6d23h
rook-ceph-osd-0-c5cccc678-ptvlv                    1/1     Running     0          6d23h
rook-ceph-osd-1-5d7f769f5d-w5bhk                   1/1     Running     0          6d23h
rook-ceph-osd-2-799c7ddb87-wg68x                   1/1     Running     0          6d23h
rook-ceph-osd-prepare-node03-gd2nf                 0/1     Completed   0          127m
rook-ceph-osd-prepare-node04-59m72                 0/1     Completed   0          126m
rook-ceph-osd-prepare-node05-pqzss                 0/1     Completed   0          126m
rook-ceph-tools-65c94d77bb-6czmn                   1/1     Running     0          6d23h
```

Cluster with one master and three worker nodes:

```
airship@d105:~$ kubectl get nodes
NAME     STATUS   ROLES                  AGE    VERSION
node01   Ready    control-plane,master   12d    v1.21.2
node03   Ready    <none>                 7d1h   v1.21.2
node04   Ready    <none>                 7d1h   v1.21.2
node05   Ready    <none>                 12d    v1.21.2
```

Each worker node has two disks: one that runs the OS (the root disk) and one used for Ceph storage. The output below shows the available storage, which is identical on every host. /dev/sda is the root disk containing the OS install, and /dev/sdb is the second disk, dedicated to Ceph.

```
deployer@node05:~$ sudo fdisk -l | grep /dev/sd
Disk /dev/sda: 30 GiB, 32212254720 bytes, 62914560 sectors
/dev/sda1  2629632 62781439 60151808 28.7G Linux filesystem
/dev/sda2     2048    10239     8192    4M BIOS boot
/dev/sda3    10240  1056767  1046528  511M EFI System
/dev/sda4  1056768  2629631  1572864  768M Linux filesystem
/dev/sda5 62781440 62914526   133087   65M Linux filesystem
Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
/dev/sdb1     2048 12584959 12582912    6G 83 Linux
```

### Steps for Ceph Upgrade

1. Update the main Ceph daemons

   Begin the upgrade by changing the Ceph image field in the cluster CRD (`spec.cephVersion.image`).

   ```
   ROOK_CLUSTER_NAMESPACE=rook-ceph  # namespace of the Rook cluster in this lab
   NEW_CEPH_IMAGE='quay.io/ceph/ceph:v16.2.6-20210918'
   CLUSTER_NAME="$ROOK_CLUSTER_NAMESPACE"  # change if your cluster name is not the Rook namespace
   kubectl -n "$ROOK_CLUSTER_NAMESPACE" patch CephCluster "$CLUSTER_NAME" --type=merge \
     -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
   ```

2. Wait for the daemon pod updates to complete

   The status can be monitored in the same way as during the Rook upgrade:

   ```
   watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{"  \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{"  \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
   ```

3. Wait for the upgrade to complete

   The Ceph mons, mgr and OSDs are terminated and replaced with updated versions in sequence. The cluster may be offline very briefly while the mons update, and the Ceph Filesystem may go offline a few times while the MDSes are upgrading. This is normal.
   The versions of the components can be viewed as they are updated:

   ```
   kubectl get deployments.apps -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
   ```

   After the upgrade, the pods are recreated with the new Ceph image version:

   ```
   rook-ceph-crashcollector-node03-796bc855d5-4lmvd   1/1   Running     0   16m
   rook-ceph-crashcollector-node04-7949c4dddb-djdnf   1/1   Running     0   13m
   rook-ceph-crashcollector-node05-f9c854567-mcz78    1/1   Running     0   15m
   rook-ceph-mgr-a-bcddcb64b-r9pr2                    1/1   Running     0   13m
   rook-ceph-mon-a-6878cc4679-c97cc                   1/1   Running     0   15m
   rook-ceph-mon-b-76584cf74b-6txkl                   1/1   Running     0   13m
   rook-ceph-mon-c-58d994876c-98hvh                   1/1   Running     0   16m
   rook-ceph-operator-8595fc774f-gr75s                1/1   Running     0   7d
   rook-ceph-osd-0-64cb7bb64f-lvrhc                   1/1   Running     0   12m
   rook-ceph-osd-1-5587cf66f9-th6b2                   1/1   Running     0   12m
   rook-ceph-osd-2-8fbb84756-z2mqc                    1/1   Running     0   12m
   rook-ceph-osd-prepare-node03-cffpg                 0/1   Completed   0   13m
   rook-ceph-osd-prepare-node04-wr465                 0/1   Completed   0   12m
   rook-ceph-osd-prepare-node05-qzz79                 0/1   Completed   0   12m
   ```

   ```
   rook-ceph-crashcollector-node03   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-crashcollector-node04   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-crashcollector-node05   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-mgr-a                   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-mon-a                   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-mon-b                   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-mon-c                   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-operator                rook/ceph:v1.7.11
   rook-ceph-osd-0                   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-osd-1                   quay.io/ceph/ceph:v16.2.6-20210918
   rook-ceph-osd-2                   quay.io/ceph/ceph:v16.2.6-20210918
   ```

4. Verify the updated cluster

   ```
   airship@d105:~$ kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
   ceph-version=16.2.6-0
   ```

## Observations

1. During the Ceph upgrade, Ceph components such as the crash collectors, mons, OSDs and mgr are upgraded to the new version.
2. The OSDs went down one at a time, one node at a time, across the three nodes, so at least two OSDs remained up at any moment.
3. The Ceph cluster health went to HEALTH_WARN while an OSD was down. After the upgrade, the OSDs came back up and the cluster health returned to HEALTH_OK.
4. No node reboot is required after the Ceph upgrade.
5. The Ceph upgrade has no impact on the Rook operator.
6. The upgrade scenarios performed in the local lab confirmed that there is no significant impact on performance or availability. According to the Rook documentation, the Ceph upgrade does not affect Rook functionality, and the Ceph cluster remains fully functional with only minimal limitations. The performance impact is comparable to regular maintenance such as an OSD node reboot or a hard-drive replacement, which is expected and well documented.
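Because health drops to HEALTH_WARN while each OSD restarts and only returns to HEALTH_OK once the upgraded OSDs are back, it can help to block until the cluster settles before moving on (for example, before the next maintenance step). The following is a minimal sketch, not part of the original procedure. It assumes the Rook toolbox runs as the deployment `rook-ceph-tools` in the cluster namespace (`rook-ceph` in this lab). The function names `ceph_health` and `wait_for_health_ok` and the default timeout and poll interval are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch: block until Ceph reports HEALTH_OK, or give up after a timeout.
# Assumption: the Rook toolbox runs as deployment rook-ceph-tools in the
# cluster namespace (rook-ceph in this lab).
NAMESPACE="${ROOK_CLUSTER_NAMESPACE:-rook-ceph}"

# Print the one-line health summary, e.g. "HEALTH_OK" or "HEALTH_WARN 1 osds down".
ceph_health() {
  kubectl -n "$NAMESPACE" exec deploy/rook-ceph-tools -- ceph health
}

# wait_for_health_ok [timeout_seconds] [poll_interval_seconds]
# Returns 0 once health is HEALTH_OK, 1 on timeout, 2 on a bad interval.
wait_for_health_ok() {
  local timeout="${1:-1800}" interval="${2:-10}" waited=0 status=""
  if (( interval <= 0 )); then
    echo "poll interval must be positive" >&2
    return 2
  fi
  while (( waited < timeout )); do
    status="$(ceph_health 2>/dev/null)"
    if [[ "$status" == HEALTH_OK* ]]; then
      echo "cluster healthy after ${waited}s"
      return 0
    fi
    echo "still waiting after ${waited}s: ${status:-no response}"
    sleep "$interval"
    waited=$(( waited + interval ))
  done
  echo "timed out after ${timeout}s, last status: ${status:-none}" >&2
  return 1
}
```

For example, `wait_for_health_ok 1800 15` could be run once before patching the CephCluster and again after the last OSD has been replaced. Gating strictly on HEALTH_OK is deliberately conservative. A cluster that is knowingly upgraded with tolerated warnings (rule 1) would need a looser check.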
### Performance Impact During the Upgrade

Initial Ceph status:

```
kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
  cluster:
    id:     0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: a(active, since 6d)
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)

  data:
    pools:   2 pools, 33 pgs
    objects: 64 objects, 158 MiB
    usage:   3.5 GiB used, 15 GiB / 18 GiB avail
    pgs:     33 active+clean
```

During the upgrade, one OSD at a time went down, and the Ceph mons also went down one at a time. The cluster showed a warning status:

```
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            Degraded data redundancy: 65/195 objects degraded (33.333%), 28 pgs degraded

  data:
    pools:   2 pools, 33 pgs
    objects: 65 objects, 158 MiB
    usage:   516 MiB used, 17 GiB / 18 GiB avail
    pgs:     24.242% pgs not active
             49/195 objects degraded (25.128%)
             22 active+undersized+degraded
             8  peering
             3  active+undersized
```

After the OSDs were upgraded to the new version, there was no impact on the data: data written before the upgrade was still intact.

```
airship@d105:~$ kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
  cluster:
    id:     0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 16h)
    mgr: a(active, since 18h)
    osd: 3 osds: 3 up (since 18h), 3 in (since 7d)

  data:
    pools:   2 pools, 96 pgs
    objects: 104 objects, 248 MiB
    usage:   1.0 GiB used, 17 GiB / 18 GiB avail
    pgs:     96 active+clean

  io:
    client:   4.7 KiB/s wr, 0 op/s rd, 0 op/s wr
```

Write operations continue to work after the Ceph upgrade:

```
airship@d105:~$ kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph osd status
ID  HOST     USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  node03   428M  5716M       0        0       0        0  exists,up
 1  node04   447M  5696M       0     6552       0        0  exists,up
 2  node05   446M  5697M       0        0       0        0  exists,up
```
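As a final post-upgrade check, the label query from step 4 can be turned into a pass/fail gate, for example in a CI job. The check fails unless every Rook-managed deployment reports exactly one `ceph-version` label, optionally matching an expected value. This is a sketch under assumptions. It relies on the `rook_cluster` and `ceph-version` deployment labels shown in step 4, and it assumes the namespace `rook-ceph`. The function names `list_ceph_versions` and `check_single_ceph_version` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: fail unless all Rook-managed deployments carry the same ceph-version label.
# Assumption: deployments are labelled rook_cluster=<namespace> and carry a
# ceph-version label, as queried in step 4.
NAMESPACE="${ROOK_CLUSTER_NAMESPACE:-rook-ceph}"

# One ceph-version label value per deployment (empty line if the label is missing).
list_ceph_versions() {
  kubectl -n "$NAMESPACE" get deployments -l "rook_cluster=$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.metadata.labels.ceph-version}{"\n"}{end}'
}

# check_single_ceph_version [expected_version]
check_single_ceph_version() {
  local expected="${1:-}" versions count
  # Drop empty lines (deployments without the label) and de-duplicate.
  versions="$(list_ceph_versions | sed '/^$/d' | sort -u)"
  count="$(printf '%s\n' "$versions" | sed '/^$/d' | wc -l)"
  if [[ "$count" -ne 1 ]]; then
    echo "expected exactly one ceph version, found ${count}:" >&2
    printf '%s\n' "$versions" >&2
    return 1
  fi
  if [[ -n "$expected" && "$versions" != "$expected" ]]; then
    echo "cluster runs ceph-version=${versions}, expected ${expected}" >&2
    return 1
  fi
  echo "all daemons on ceph-version=${versions}"
}
```

With the cluster in this POC, `check_single_ceph_version 16.2.6-0` would be expected to succeed once all deployments are rolled. A mixed result, such as some deployments still on 15.2.13, makes it return non-zero.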