# Understand Ceph Upgrades (Brownfield) [TM#195](https://github.com/airshipit/treasuremap/issues/195)
[TOC]
## Investigation and plan for an upgrade of Ceph on an existing cluster
This POC defines the proposed plan for upgrading an existing Ceph cluster from v15.2.13 (Octopus) to v16.2.6 (Pacific).
It also identifies the required sequence, dependencies and constraints, as well as any impact on cluster availability and performance during the upgrade.
Assuming that the original cluster was deployed using the rook-ceph operator, the brownfield scenario consists of two independent steps:
* Operator upgrade
* Ceph upgrade
Both steps can be performed in either order, subject to the rules listed below:
1. Before the upgrade, the Ceph cluster should be in a healthy state. It is possible (but not recommended) to upgrade a cluster that reports some warnings; in that case, the person responsible for maintenance must decide whether to proceed. Examples of warnings that still allow the upgrade to proceed:
   * some OSDs are permanently out/down because of drive errors;
   * some PGs are in a peering/waiting state because of scrubbing or deep scrubbing;
   * some PGs have not been scrubbed in time.

   However, warnings such as:
   * OSDs nearly full;
   * flapping OSDs, and similar

   should be treated as a red flag for the brownfield upgrade. In short, the severity of any warning requires a human decision.
2. Upgrade one release at a time (between adjacent releases only), e.g. Rook 1.6 -> 1.7 and/or Ceph 15.x -> 16.x. It is recommended to upgrade Ceph to the latest minor release of the current major release before performing a major release upgrade.
3. When planning a Ceph upgrade to the next major release, it is recommended to upgrade the operator first. A Rook operator usually supports three major Ceph releases (N-1, N and N+1); e.g. Rook 1.7 supports Nautilus, Octopus and Pacific.
4. Downgrades are also possible. For Ceph, downgrades were tested between minor releases only. Before any downgrade, read the Ceph release notes carefully: downgrading between bug-fix releases is possible, but feature releases must not be downgraded under any circumstances. For example, the latest Octopus release should not be downgraded to earlier minor versions because of database schema changes.
5. Several upgrade scenarios performed in the local lab confirmed that there is no significant impact on performance or availability. According to the Rook documentation, the operator upgrade does not affect Ceph functionality: the Ceph cluster remains fully functional with only minimal limitations. The performance impact during the Ceph upgrade is comparable to that of regular maintenance such as an OSD node reboot or a hard drive replacement; this level of impact is expected and well documented.
To summarize, both brownfield operations can be performed on a healthy cluster without significant disruption.
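Rule 1 can be partly scripted. The following is a minimal sketch, not a Rook or Ceph tool: the function name `upgrade_gate`, its exit codes and the red-flag keyword list are assumptions. It reads the text printed by `ceph health` and separates a healthy cluster, red flags, and warnings that need a human decision.

```shell
#!/usr/bin/env bash
# Minimal sketch of rule 1 as a go/no-go gate (hypothetical helper, not part
# of Rook or Ceph). Reads `ceph health` or `ceph health detail` text on stdin.
# Exit codes are our own convention:
#   0 = HEALTH_OK, proceed
#   1 = red flag (HEALTH_ERR, full or nearfull OSDs, flapping OSDs), do not upgrade
#   2 = any other warning, a human has to decide
upgrade_gate() {
  local health
  health=$(cat)

  # Assumed red-flag keywords; extend the pattern to match your own policy.
  if printf '%s\n' "$health" | grep -Eqi 'HEALTH_ERR|nearfull|backfillfull|full osd|flapping'; then
    echo "STOP: red-flag condition found" >&2
    return 1
  fi

  if printf '%s\n' "$health" | grep -q 'HEALTH_WARN'; then
    echo "REVIEW: warnings present, needs a human decision" >&2
    return 2
  fi

  if printf '%s\n' "$health" | grep -q 'HEALTH_OK'; then
    echo "OK: cluster is healthy" >&2
    return 0
  fi

  echo "REVIEW: unrecognised health output" >&2
  return 2
}
```

For example, `kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph health | upgrade_gate` uses the toolbox pod from this lab; the pod name will differ on other clusters.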
## Ceph Upgrade Process
In this scenario, we upgrade Ceph from v15.2.13 to v16.2.6.
### Pre-requisites and health status
Initial status of the Ceph cluster:
```
airship@d105:~$ kubectl get cephclusters.ceph.rook.io -n rook-ceph
NAME        DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH      EXTERNAL
rook-ceph   /var/lib/rook     3          6d23h   Ready   Cluster created successfully   HEALTH_OK
```
```
kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
  cluster:
    id:     0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: a(active, since 6d)
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)

  data:
    pools:   2 pools, 33 pgs
    objects: 64 objects, 158 MiB
    usage:   3.5 GiB used, 15 GiB / 18 GiB avail
    pgs:     33 active+clean
```
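A stricter form of the pre-upgrade health check can also require every PG to be `active+clean`. The sketch below (hypothetical function `cluster_ready_for_upgrade`) parses the plain-text `ceph status` layout shown above; that layout is an assumption and may change between Ceph releases, so the JSON output (`ceph status --format json`) is a more robust source for real automation. Note that it is stricter than rule 1: PGs that are only scrubbing make it fail.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upgrade check: reads `ceph status` text on stdin and succeeds
# only if health is HEALTH_OK and all PGs are active+clean. It depends on the
# plain-text layout shown above.
cluster_ready_for_upgrade() {
  awk '
    $1 == "health:"        { health = $2 }
    $1 == "pools:"         { total = $4 }          # "pools: 2 pools, 33 pgs"
    $NF == "active+clean"  { clean = $(NF - 1) }   # "pgs: 33 active+clean"
    END {
      if (health == "HEALTH_OK" && total != "" && clean == total) exit 0
      exit 1
    }'
}
```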
Verify Rook Operator Version:
```
airship@d105:~$ kubectl get deployments.apps rook-ceph-operator -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
NAME                 IMAGE
rook-ceph-operator   rook/ceph:v1.7.11
```
Ceph Version Upgrades:
Rook v1.7 supports the following Ceph versions:
* Ceph Pacific v16.2.0 or newer
* Ceph Octopus v15.2.0 or newer
* Ceph Nautilus v14.2.5 or newer
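The support matrix above and rules 2 to 4 can be expressed as a small version check. This is a hedged sketch: `norm_ver`, `ver_ge`, `rook17_supports_ceph` and `ceph_step_allowed` are hypothetical helpers, and treating "the next major release" as "major version + 1" is our reading of rule 2. The "upgrade to the latest minor release first" recommendation is not checked, because it needs the list of published releases.

```shell
#!/usr/bin/env bash
# Hedged sketch of the version rules; all function names are hypothetical.

# Strip a leading "v" and any "-suffix": "v16.2.6-20210918" -> "16.2.6".
norm_ver() {
  local v=${1#v}
  printf '%s\n' "${v%%-*}"
}

# Succeed when dotted version $1 >= $2, comparing numerically per component.
ver_ge() {
  awk -v a="$(norm_ver "$1")" -v b="$(norm_ver "$2")" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Minimum Ceph versions for Rook v1.7, taken from the list above.
rook17_supports_ceph() {
  local v
  v=$(norm_ver "$1")
  case ${v%%.*} in
    14) ver_ge "$v" 14.2.5 ;;
    15) ver_ge "$v" 15.2.0 ;;
    16) ver_ge "$v" 16.2.0 ;;
    *)  return 1 ;;
  esac
}

# Rules 2-4: 0 = upgrade within the same or to the next major release,
# 1 = not allowed, 2 = downgrade inside one major release (check release notes).
ceph_step_allowed() {
  local cur tgt cur_major tgt_major
  cur=$(norm_ver "$1"); tgt=$(norm_ver "$2")
  cur_major=${cur%%.*}; tgt_major=${tgt%%.*}
  if ver_ge "$tgt" "$cur"; then
    [ $((tgt_major - cur_major)) -le 1 ] && return 0
    return 1
  fi
  [ "$tgt_major" -eq "$cur_major" ] && return 2
  return 1
}
```

For the upgrade in this document, `ceph_step_allowed 15.2.13 16.2.6` returns 0 and `rook17_supports_ceph v16.2.6-20210918` succeeds.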
Existing Ceph Version in the Cluster:
```
airship@d105:~$ kubectl get deployments.apps -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
NAME                              IMAGE
csi-cephfsplugin-provisioner      k8s.gcr.io/sig-storage/csi-attacher:v3.3.0,k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0,k8s.gcr.io/sig-storage/csi-resizer:v1.3.0,k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0,quay.io/cephcsi/cephcsi:v3.4.0,quay.io/cephcsi/cephcsi:v3.4.0
csi-rbdplugin-provisioner         k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0,k8s.gcr.io/sig-storage/csi-resizer:v1.3.0,k8s.gcr.io/sig-storage/csi-attacher:v3.3.0,k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0,quay.io/cephcsi/cephcsi:v3.4.0,quay.io/cephcsi/cephcsi:v3.4.0
rook-ceph-crashcollector-node03   ceph/ceph:v15.2.13
rook-ceph-crashcollector-node04   ceph/ceph:v15.2.13
rook-ceph-crashcollector-node05   ceph/ceph:v15.2.13
rook-ceph-mgr-a                   ceph/ceph:v15.2.13
rook-ceph-mon-a                   ceph/ceph:v15.2.13
rook-ceph-mon-b                   ceph/ceph:v15.2.13
rook-ceph-mon-c                   ceph/ceph:v15.2.13
rook-ceph-operator                rook/ceph:v1.7.11
rook-ceph-osd-0                   ceph/ceph:v15.2.13
rook-ceph-osd-1                   ceph/ceph:v15.2.13
rook-ceph-osd-2                   ceph/ceph:v15.2.13
```
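To confirm which Ceph image the daemons run without reading the table by eye, the output above can be summarised per image tag. A minimal sketch, assuming the NAME/IMAGE column layout produced by the `custom-columns` query; `ceph_image_tags` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarises the NAME/IMAGE table from the custom-columns
# query above (read on stdin) as "<ceph image tag> <number of deployments>".
ceph_image_tags() {
  awk 'NR > 1 {
         n = split($2, imgs, ",")           # one deployment may list several images
         for (i = 1; i <= n; i++) {
           if (imgs[i] ~ /ceph\/ceph:/) {   # matches ceph/ceph and quay.io/ceph/ceph
             tag = imgs[i]
             sub(/.*:/, "", tag)            # keep only the tag after the last ":"
             count[tag]++
           }
         }
       }
       END { for (t in count) print t, count[t] }' | sort
}
```

Piping the `kubectl get deployments.apps` command above into `ceph_image_tags` would print `v15.2.13 10` for this cluster; the operator (`rook/ceph`) and CSI images are ignored.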
```
airship@d105:~$ kubectl get pods -n rook-ceph
NAME                                               READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-8f2zm                             3/3     Running     0          6d23h
csi-cephfsplugin-d27bl                             3/3     Running     0          6d23h
csi-cephfsplugin-kmz8j                             3/3     Running     0          6d23h
csi-cephfsplugin-provisioner-689686b44-bfpzp       6/6     Running     0          6d23h
csi-cephfsplugin-provisioner-689686b44-d699m       6/6     Running     0          6d23h
csi-rbdplugin-9dsst                                3/3     Running     0          6d23h
csi-rbdplugin-fw2nk                                3/3     Running     0          6d23h
csi-rbdplugin-provisioner-5775fb866b-7fng8         6/6     Running     0          6d23h
csi-rbdplugin-provisioner-5775fb866b-7r4xf         6/6     Running     0          6d23h
csi-rbdplugin-rs2w8                                3/3     Running     0          6d23h
rook-ceph-crashcollector-node03-df5fccdc4-xj44l    1/1     Running     0          6d23h
rook-ceph-crashcollector-node04-7d5b4dd9df-8pzz7   1/1     Running     0          6d23h
rook-ceph-crashcollector-node05-77d88cf7bd-fbdfp   1/1     Running     0          6d23h
rook-ceph-mgr-a-84855f9b9d-wg8vd                   1/1     Running     0          6d23h
rook-ceph-mon-a-5cb4fbdf47-wgh2w                   1/1     Running     0          7d1h
rook-ceph-mon-b-88d5c7db6-7n9kc                    1/1     Running     0          7d1h
rook-ceph-mon-c-cdf7b8bc-zx5wt                     1/1     Running     0          7d1h
rook-ceph-operator-8595fc774f-gr75s                1/1     Running     0          6d23h
rook-ceph-osd-0-c5cccc678-ptvlv                    1/1     Running     0          6d23h
rook-ceph-osd-1-5d7f769f5d-w5bhk                   1/1     Running     0          6d23h
rook-ceph-osd-2-799c7ddb87-wg68x                   1/1     Running     0          6d23h
rook-ceph-osd-prepare-node03-gd2nf                 0/1     Completed   0          127m
rook-ceph-osd-prepare-node04-59m72                 0/1     Completed   0          126m
rook-ceph-osd-prepare-node05-pqzss                 0/1     Completed   0          126m
rook-ceph-tools-65c94d77bb-6czmn                   1/1     Running     0          6d23h
```
Cluster with one master and three worker nodes:
```
airship@d105:~$ kubectl get nodes
NAME     STATUS   ROLES                  AGE    VERSION
node01   Ready    control-plane,master   12d    v1.21.2
node03   Ready    <none>                 7d1h   v1.21.2
node04   Ready    <none>                 7d1h   v1.21.2
node05   Ready    <none>                 12d    v1.21.2
```
Each worker node has two disks: one for the OS (the root disk) and one for Ceph storage. The output below shows the available storage, which is identical on each host: /dev/sda is the root disk holding the OS installation, and /dev/sdb is the disk dedicated to Ceph, with a single 6 GiB partition (/dev/sdb1).
```
deployer@node05:~$ sudo fdisk -l | grep /dev/sd
Disk /dev/sda: 30 GiB, 32212254720 bytes, 62914560 sectors
/dev/sda1  2629632 62781439 60151808 28.7G Linux filesystem
/dev/sda2     2048    10239     8192    4M BIOS boot
/dev/sda3    10240  1056767  1046528  511M EFI System
/dev/sda4  1056768  2629631  1572864  768M Linux filesystem
/dev/sda5 62781440 62914526   133087   65M Linux filesystem
Disk /dev/sdb: 10 GiB, 10737418240 bytes, 20971520 sectors
/dev/sdb1     2048 12584959 12582912    6G 83 Linux
```
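The fdisk figures can be cross-checked against the capacity that Ceph reports. A small worked example, assuming 512-byte sectors (the /dev/sdb line confirms this: 10737418240 bytes / 20971520 sectors = 512) and assuming each of the three OSDs is backed by one 6 GiB /dev/sdb1 partition; `sectors_to_gib` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Converts an fdisk sector count to GiB, assuming 512-byte sectors.
sectors_to_gib() {
  awk -v s="$1" 'BEGIN { printf "%.1f\n", s * 512 / (1024 ^ 3) }'
}

# Assumed layout: three OSDs, each backed by one /dev/sdb1 partition.
osd_count=3
per_osd=$(sectors_to_gib 12582912)   # /dev/sdb1 -> 6.0 GiB
awk -v n="$osd_count" -v g="$per_osd" 'BEGIN { printf "raw capacity: %.1f GiB\n", n * g }'
```

This prints `raw capacity: 18.0 GiB`, which matches the 18 GiB total in the `ceph status` output above.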
### Steps for Ceph Upgrade
1. Update the main Ceph daemons
Begin the upgrade by changing the Ceph image field (`spec.cephVersion.image`) in the CephCluster custom resource.
```
export ROOK_CLUSTER_NAMESPACE=rook-ceph   # namespace of the CephCluster (rook-ceph in this lab)
NEW_CEPH_IMAGE='quay.io/ceph/ceph:v16.2.6-20210918'
CLUSTER_NAME="$ROOK_CLUSTER_NAMESPACE"    # change if your cluster name is not the Rook namespace
kubectl -n "$ROOK_CLUSTER_NAMESPACE" patch CephCluster "$CLUSTER_NAME" --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"
```
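The inline JSON in the patch command is easy to break when editing the image string. One option, sketched below with a hypothetical `ceph_image_patch` helper, is to build the same merge patch with `printf`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the JSON merge patch that sets
# spec.cephVersion.image on a CephCluster to the image given as $1.
ceph_image_patch() {
  printf '{"spec": {"cephVersion": {"image": "%s"}}}\n' "$1"
}

NEW_CEPH_IMAGE='quay.io/ceph/ceph:v16.2.6-20210918'
ceph_image_patch "$NEW_CEPH_IMAGE"
```

The result can be passed as `-p "$(ceph_image_patch "$NEW_CEPH_IMAGE")"` to the same `kubectl patch` command; running it once with `--dry-run=server` lets the API server validate the patch without applying it.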
2. Wait for the daemon pod updates to complete
Progress can be monitored the same way as during a Rook operator upgrade, by watching the replica counts and the `ceph-version` label of each deployment:
```
watch --exec kubectl -n $ROOK_CLUSTER_NAMESPACE get deployments -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'
```
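For unattended runs, the same jsonpath output can be checked instead of watched. A minimal sketch, assuming the line format produced by the query above and that every listed deployment carries the `ceph-version` label (as in the verification output in step 4); `rollout_done` is a hypothetical helper, and the expected label value `16.2.6-0` comes from that output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines like
#   "rook-ceph-mgr-a   req/upd/avl: 1/1/1   ceph-version=16.2.6-0"
# on stdin, prints deployments that are not done yet, and exits 0 only when
# every deployment has req == upd == avl and the expected ceph-version label.
rollout_done() {
  awk -v want="$1" '
    NF == 0 { next }
    {
      split($3, c, "/")                  # c[1]=requested, c[2]=updated, c[3]=available
      ver = $4
      sub(/^ceph-version=/, "", ver)
      if (c[1] != c[2] || c[1] != c[3] || ver != want) {
        print "pending:", $1
        bad = 1
      }
      seen = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }'
}
```

Piping the `kubectl` command above (without `watch --exec`) into `rollout_done 16.2.6-0` exits 0 once every deployment is updated and ready, and otherwise lists the deployments still pending.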
3. Wait for the upgrade to complete
The Ceph mons, mgr and OSDs are terminated and replaced with updated versions one at a time. The cluster may be unavailable very briefly while the mons update, and a Ceph Filesystem (if deployed) may go offline a few times while the MDS daemons upgrade. This is normal. No MDS is deployed in this lab cluster.
The versions of the components can be viewed as they are updated:
```
kubectl get deployments.apps -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
```
After the upgrade, the Ceph daemon pods are recreated with the new Ceph image version:
```
rook-ceph-crashcollector-node03-796bc855d5-4lmvd   1/1     Running     0          16m
rook-ceph-crashcollector-node04-7949c4dddb-djdnf   1/1     Running     0          13m
rook-ceph-crashcollector-node05-f9c854567-mcz78    1/1     Running     0          15m
rook-ceph-mgr-a-bcddcb64b-r9pr2                    1/1     Running     0          13m
rook-ceph-mon-a-6878cc4679-c97cc                   1/1     Running     0          15m
rook-ceph-mon-b-76584cf74b-6txkl                   1/1     Running     0          13m
rook-ceph-mon-c-58d994876c-98hvh                   1/1     Running     0          16m
rook-ceph-operator-8595fc774f-gr75s                1/1     Running     0          7d
rook-ceph-osd-0-64cb7bb64f-lvrhc                   1/1     Running     0          12m
rook-ceph-osd-1-5587cf66f9-th6b2                   1/1     Running     0          12m
rook-ceph-osd-2-8fbb84756-z2mqc                    1/1     Running     0          12m
rook-ceph-osd-prepare-node03-cffpg                 0/1     Completed   0          13m
rook-ceph-osd-prepare-node04-wr465                 0/1     Completed   0          12m
rook-ceph-osd-prepare-node05-qzz79                 0/1     Completed   0          12m
```
```
rook-ceph-crashcollector-node03   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-crashcollector-node04   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-crashcollector-node05   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-mgr-a                   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-mon-a                   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-mon-b                   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-mon-c                   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-operator                rook/ceph:v1.7.11
rook-ceph-osd-0                   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-osd-1                   quay.io/ceph/ceph:v16.2.6-20210918
rook-ceph-osd-2                   quay.io/ceph/ceph:v16.2.6-20210918
```
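The pod listing above can also be checked mechanically after the upgrade. This sketch (hypothetical `unhealthy_pods`) reads `kubectl get pods -n rook-ceph` output and reports any pod that is neither `Running` with all containers ready nor a finished `Completed` pod such as `rook-ceph-osd-prepare-*`; it assumes the default column order NAME, READY, STATUS, RESTARTS, AGE.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods` output on stdin (header optional)
# and prints every pod that is neither Running with all containers ready nor
# Completed. Exits 1 if at least one such pod was found.
unhealthy_pods() {
  awk '
    $1 == "NAME" { next }                # skip the header line if present
    NF >= 5 {
      split($2, r, "/")                  # READY column, e.g. "1/1" or "6/6"
      ok = ($3 == "Running" && r[1] == r[2]) || $3 == "Completed"
      if (!ok) { print $1, $2, $3; bad = 1 }
    }
    END { if (bad) exit 1 }'
}
```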
4. Verify the updated cluster: all Ceph deployments should now report the same `ceph-version` label.
```
airship@d105:~$ kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
ceph-version=16.2.6-0
```
## Observations
1. During the Ceph upgrade, all Ceph components (crash collectors, mons, mgr and OSDs) are upgraded to the new version.
2. The OSDs were restarted one at a time, one node at a time across the 3 nodes (each node hosts one OSD), so 2 of the 3 OSDs were always up and the data remained available. Quorum is a property of the mons, not the OSDs; the mons were also restarted one at a time.
3. Cluster health went to HEALTH_WARN while each OSD was down. After the upgraded OSDs came back up, cluster health returned to HEALTH_OK.
4. No node reboot is required after the Ceph upgrade.
5. The Ceph upgrade has no impact on the Rook operator: the operator pod was not restarted (its age is 7d after the upgrade).
6. As with the operator upgrade, the lab runs showed no significant performance or availability impact. The performance impact during the Ceph upgrade is comparable to that of regular maintenance such as an OSD node reboot or a hard drive replacement; this level of impact is expected and well documented.
### Performance impact during the upgrade process
Initial Ceph Status:
```
kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
  cluster:
    id:     0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 6d)
    mgr: a(active, since 6d)
    osd: 3 osds: 3 up (since 6d), 3 in (since 6d)

  data:
    pools:   2 pools, 33 pgs
    objects: 64 objects, 158 MiB
    usage:   3.5 GiB used, 15 GiB / 18 GiB avail
    pgs:     33 active+clean
```
While one OSD is down (the Ceph mons are also restarted one at a time during the upgrade), the cluster reports a warning status:
```
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
            Degraded data redundancy: 65/195 objects degraded (33.333%), 28 pgs degraded

  data:
    pools:   2 pools, 33 pgs
    objects: 65 objects, 158 MiB
    usage:   516 MiB used, 17 GiB / 18 GiB avail
    pgs:     24.242% pgs not active
             49/195 objects degraded (25.128%)
             22 active+undersized+degraded
             8 peering
             3 active+undersized
```
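The 33.333% in the health summary is consistent with three-way replication, which we infer from the counts rather than from the pool configuration: 65 objects and 195 object copies give 3 copies per object. Assuming the three copies of each object sit on three different hosts, one host down makes exactly one copy of every object unavailable:

$$
\frac{195}{65} = 3 \ \text{copies per object}, \qquad \frac{65}{195} = \frac{1}{3} \approx 33.333\,\%
$$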
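After all OSDs are upgraded to the new version, the cluster returns to HEALTH_OK (judging by the mon and mgr ages, the status below was captured about 16-18 hours after the upgrade).
There is no impact on the data: data written to the database before the upgrade is still intact.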
```
airship@d105:~$ kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
  cluster:
    id:     0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 16h)
    mgr: a(active, since 18h)
    osd: 3 osds: 3 up (since 18h), 3 in (since 7d)

  data:
    pools:   2 pools, 96 pgs
    objects: 104 objects, 248 MiB
    usage:   1.0 GiB used, 17 GiB / 18 GiB avail
    pgs:     96 active+clean

  io:
    client:   4.7 KiB/s wr, 0 op/s rd, 0 op/s wr
```
Write operations continue after the Ceph upgrade (note the non-zero WR DATA on osd.1):
```
airship@d105:~$ kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph osd status
ID  HOST     USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  node03   428M  5716M       0        0       0        0  exists,up
 1  node04   447M  5696M       0     6552       0        0  exists,up
 2  node05   446M  5697M       0        0       0        0  exists,up
```
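The OSD table can be checked the same way. A minimal sketch (hypothetical `all_osds_up`) that parses the borderless `ceph osd status` layout shown above; other Ceph releases may print a different layout (for example with table borders), which this parser does not handle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ceph osd status` output on stdin, prints any OSD
# whose STATE is not "exists,up", and succeeds only if all listed OSDs are up.
all_osds_up() {
  awk '
    $1 == "ID" { next }                  # header row
    $1 ~ /^[0-9]+$/ {                    # one row per OSD id
      total++
      if ($NF == "exists,up") up++
      else print "osd." $1, $NF
    }
    END { if (total == 0 || up != total) exit 1 }'
}
```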