# Upgrade Rook for a brownfield site [TM#194](https://github.com/airshipit/treasuremap/issues/194)
[TOC]
## Dependencies and constraints for a Rook upgrade in brownfield deployment
This POC will determine the proposed plan to upgrade an existing Rook operator from v1.6.11 to v1.7.11.
It will also identify the required sequence, dependencies and constraints, as well as any impacts to cluster availability and performance during the upgrade.
Assuming that the original cluster was deployed using the rook-ceph operator, the brownfield scenario can be broken down into two independent steps:
* Operator upgrade
* Ceph upgrade
Both steps can be performed in either order by following the set of rules listed below:
1. Before the upgrade, the Ceph cluster should be in a healthy state (a health-check sketch is given after this list). It is possible (but not recommended) to perform an upgrade on a cluster that has some warning alarms, but in this case the person responsible for maintenance has to make the decision. Below are some examples of warnings with which we can still proceed with the upgrade:
* some OSDs are permanently out/down because of drive errors
* some PGs are in a peering/waiting state because of scrubbing or deep scrubbing
* some PGs have not been scrubbed in time
* there are large omap objects
However, warnings like
* OSDs almost full
* OSDs flapping, and similar
should be considered a red flag for the brownfield upgrade. To summarize the warnings listed above: a human decision must be made about the severity of each warning before proceeding.
2. The upgrade should span no more than one major release at a time, e.g. Rook 1.6 -> 1.7 and/or Ceph 15.x -> 16.x. It is recommended to upgrade Ceph to the latest minor release before performing a major release upgrade.
3. When planning a Ceph upgrade to the next major release, it is recommended to perform the operator upgrade first. The Rook operator usually supports three Ceph major releases (N-1, N and N+1), e.g. Rook 1.7 supports Nautilus, Octopus and Pacific.
4. It is possible to perform a downgrade as well. For Ceph, the downgrade was tested between minor releases. When performing a downgrade, attention should be paid to the Ceph release notes: we can downgrade between bug-fix releases, but feature releases should not be downgraded under any circumstances. For example, the latest Octopus should not be downgraded to previous minor versions because of database schema changes.
5. Different upgrade scenarios performed in the local lab confirm that there are no significant performance or availability impacts. The operator upgrade does not affect Ceph functionality; according to the Rook documentation, the Ceph cluster remains fully functional with only minimal limitations. The performance impact during the Ceph upgrade is comparable to the impact of regular maintenance such as an OSD node reboot or hard drive replacement. This level of impact is expected and well documented.
To summarize the statements above, both brownfield operations are safe for the cluster when these rules are followed.
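As a pre-check for rule 1, the cluster state can be inspected from the Rook toolbox pod before starting either step. A minimal sketch, assuming the toolbox is deployed with the usual `app=rook-ceph-tools` label in the `rook-ceph` namespace:
```
# Resolve the toolbox pod name once.
TOOLS_POD=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.name}')
# Overall health; anything other than HEALTH_OK needs a human review of the listed warnings.
kubectl -n rook-ceph exec "$TOOLS_POD" -- ceph health detail
# All OSDs should be up/in and all PGs active+clean before proceeding.
kubectl -n rook-ceph exec "$TOOLS_POD" -- ceph osd stat
kubectl -n rook-ceph exec "$TOOLS_POD" -- ceph status
```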
## Rook Operator Upgrade Process
In this scenario we will upgrade the Rook operator from v1.6.11 to v1.7.11.
### Pre-requisites and health status
Initial status of the rook operator and cephcluster:
```
airship@d105:~/shon/rook/cluster/examples/kubernetes/ceph$ kubectl get cephclusters.ceph.rook.io -n rook-ceph
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL
rook-ceph /var/lib/rook 3 80m Ready Cluster created successfully HEALTH_OK
```
```
airship@d105:~/shon/rook/cluster/examples/kubernetes/ceph$ kubectl get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-84jjc 3/3 Running 0 29m
csi-cephfsplugin-provisioner-775dcbbc86-6kw67 6/6 Running 0 29m
csi-cephfsplugin-provisioner-775dcbbc86-bhqr4 6/6 Running 0 29m
csi-cephfsplugin-qvqxx 3/3 Running 0 29m
csi-cephfsplugin-zs42r 3/3 Running 0 29m
csi-rbdplugin-chnkn 3/3 Running 0 29m
csi-rbdplugin-mnmb7 3/3 Running 0 29m
csi-rbdplugin-provisioner-5868bd8b55-6c4jw 6/6 Running 0 29m
csi-rbdplugin-provisioner-5868bd8b55-z7xd5 6/6 Running 0 29m
csi-rbdplugin-vklx8 3/3 Running 0 29m
rook-ceph-crashcollector-node03-678646c48c-gs4gr 1/1 Running 0 26m
rook-ceph-crashcollector-node04-6f688cdd56-phzdv 1/1 Running 0 26m
rook-ceph-crashcollector-node05-cc865d54f-bpwf2 1/1 Running 0 26m
rook-ceph-mgr-a-696fb58d75-h5pw4 1/1 Running 0 26m
rook-ceph-mon-a-5cb4fbdf47-wgh2w 1/1 Running 0 29m
rook-ceph-mon-b-88d5c7db6-7n9kc 1/1 Running 0 27m
rook-ceph-mon-c-cdf7b8bc-zx5wt 1/1 Running 0 27m
rook-ceph-operator-bfdc879fd-24xpg 1/1 Running 0 32m
rook-ceph-osd-0-85648cfd7-frj48 1/1 Running 0 26m
rook-ceph-osd-1-5748896bc-pm4nd 1/1 Running 0 26m
rook-ceph-osd-2-744c4b4c9d-clqbb 1/1 Running 0 26m
rook-ceph-osd-prepare-node03-hbv9c 0/1 Completed 0 26m
rook-ceph-osd-prepare-node04-m7vfj 0/1 Completed 0 26m
rook-ceph-osd-prepare-node05-5945n 0/1 Completed 0 26m
```
Operator/Ceph versions:
```
airship@d105:~/shon/rook/cluster/examples/kubernetes/ceph$ kubectl get deployments.apps -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
NAME IMAGE
csi-cephfsplugin-provisioner k8s.gcr.io/sig-storage/csi-attacher:v3.2.1,k8s.gcr.io/sig-storage/csi-snapshotter:v4.1.1,k8s.gcr.io/sig-storage/csi-resizer:v1.2.0,k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2,quay.io/cephcsi/cephcsi:v3.3.1,quay.io/cephcsi/cephcsi:v3.3.1
csi-rbdplugin-provisioner k8s.gcr.io/sig-storage/csi-provisioner:v2.2.2,k8s.gcr.io/sig-storage/csi-resizer:v1.2.0,k8s.gcr.io/sig-storage/csi-attacher:v3.2.1,k8s.gcr.io/sig-storage/csi-snapshotter:v4.1.1,quay.io/cephcsi/cephcsi:v3.3.1,quay.io/cephcsi/cephcsi:v3.3.1
rook-ceph-crashcollector-node03 ceph/ceph:v15.2.13
rook-ceph-crashcollector-node04 ceph/ceph:v15.2.13
rook-ceph-crashcollector-node05 ceph/ceph:v15.2.13
rook-ceph-mgr-a ceph/ceph:v15.2.13
rook-ceph-mon-a ceph/ceph:v15.2.13
rook-ceph-mon-b ceph/ceph:v15.2.13
rook-ceph-mon-c ceph/ceph:v15.2.13
rook-ceph-operator rook/ceph:v1.6.11
rook-ceph-osd-0 ceph/ceph:v15.2.13
rook-ceph-osd-1 ceph/ceph:v15.2.13
rook-ceph-osd-2 ceph/ceph:v15.2.13
```
Health status:
```
airship@d105:~/shon/rook/cluster/examples/kubernetes/ceph$ kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 77m)
mgr: a(active, since 76m)
osd: 3 osds: 3 up (since 76m), 3 in (since 76m)
data:
pools: 1 pools, 1 pgs
objects: 1 objects, 0 B
usage: 3.0 GiB used, 15 GiB / 18 GiB avail
pgs: 1 active+clean
```
```
airship@d105:~/shon/upgrade/rook/cluster/examples/kubernetes$ kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/ironic-pv-volume 10Gi RWO Retain Bound metal3/ironic-pv-claim default 5d4h
persistentvolume/pvc-51ff734a-db0e-4044-a999-07da1f5e7d98 2Gi RWO Delete Bound default/mysql-pv-claim rook-ceph-block 13s
persistentvolume/pvc-9738a6b2-03d4-41a9-9fa9-d344429aef9f 2Gi RWO Delete Bound default/wp-pv-claim rook-ceph-block 7s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/mysql-pv-claim Bound pvc-51ff734a-db0e-4044-a999-07da1f5e7d98 2Gi RWO rook-ceph-block 13s
persistentvolumeclaim/wp-pv-claim Bound pvc-9738a6b2-03d4-41a9-9fa9-d344429aef9f 2Gi RWO rook-ceph-block 8s
```
### Steps for the Rook operator upgrade
#### 1) Update common resources and CRDs
First, get the common resources manifests for the target release, which contain the latest changes.
```
git clone --single-branch --depth=1 --branch v1.7.11 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
```
Then apply the latest changes.
```
kubectl apply -f common.yaml -f crds.yaml
```
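As a quick sanity check (not part of the official procedure), the Ceph CRDs can be listed to confirm they are registered after applying `crds.yaml`:
```
# The ceph.rook.io CRDs should all be present after the apply.
kubectl get crd | grep ceph.rook.io
```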
#### 2) Update the Rook Operator
The largest portion of the upgrade is triggered when the operator’s image is updated to v1.7.x. When the operator is updated, it will proceed to update all of the Ceph daemons.
```
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.7.11
```
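Before watching the daemon updates, it can be useful to confirm that the operator deployment itself has rolled out. A small sketch, assuming the default `rook-ceph` namespace:
```
# Wait for the new operator pod to become ready.
kubectl -n rook-ceph rollout status deploy/rook-ceph-operator --timeout=300s
# Double-check the image the operator is now running.
kubectl -n rook-ceph get deploy/rook-ceph-operator -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```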
#### 3) Wait for the upgrade to complete
Watch now in amazement as the Ceph mons, mgrs, OSDs, rbd-mirrors, MDSes and RGWs are terminated and replaced with updated versions in sequence. The cluster may be offline very briefly as mons update, and the Ceph Filesystem may fall offline a few times while the MDSes are upgrading. This is normal.
The versions of the components can be viewed as they are updated:
```
watch --exec kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'
```
During the upgrade:
```
Every 2.0s: kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.nam... d105: Mon Jan 24 11:29:44 2022
rook-ceph-crashcollector-node03 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-crashcollector-node04 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-crashcollector-node05 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mon-a req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mon-b req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-mon-c req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-0 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.6.11
rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.6.11
```
#### 4) Verify the updated cluster
```
# kubectl -n $ROOK_CLUSTER_NAMESPACE get deployment -l rook_cluster=$ROOK_CLUSTER_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq
This cluster is not yet finished:
rook-version=v1.6.11
rook-version=v1.7.11
This cluster is finished:
rook-version=v1.7.11
```
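In the same way, the Ceph version reported by each daemon deployment can be checked via the `ceph-version` label (taken from the Rook upgrade documentation); after an operator-only upgrade a single, unchanged Ceph version is expected:
```
kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
```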
After the upgrade completed:
```
Every 2.0s: kubectl -n rook-ceph get deployments -l rook_cluster=rook-ceph -o jsonpath={range .items[*]}{.metadata.nam... d105: Mon Jan 24 11:32:43 2022
rook-ceph-crashcollector-node03 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-crashcollector-node04 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-crashcollector-node05 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-mgr-a req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-mon-a req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-mon-b req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-mon-c req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-osd-0 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-osd-1 req/upd/avl: 1/1/1 rook-version=v1.7.11
rook-ceph-osd-2 req/upd/avl: 1/1/1 rook-version=v1.7.11
```
```
airship@d105:~/shon/upgrade/rook/cluster/examples/kubernetes/ceph$ kubectl get deployments.apps -n rook-ceph -o=custom-columns="NAME:.metadata.name,IMAGE:.spec.template.spec.containers[*].image"
NAME IMAGE
csi-cephfsplugin-provisioner k8s.gcr.io/sig-storage/csi-attacher:v3.3.0,k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0,k8s.gcr.io/sig-storage/csi-resizer:v1.3.0,k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0,quay.io/cephcsi/cephcsi:v3.4.0,quay.io/cephcsi/cephcsi:v3.4.0
csi-rbdplugin-provisioner k8s.gcr.io/sig-storage/csi-provisioner:v3.0.0,k8s.gcr.io/sig-storage/csi-resizer:v1.3.0,k8s.gcr.io/sig-storage/csi-attacher:v3.3.0,k8s.gcr.io/sig-storage/csi-snapshotter:v4.2.0,quay.io/cephcsi/cephcsi:v3.4.0,quay.io/cephcsi/cephcsi:v3.4.0
rook-ceph-crashcollector-node03 ceph/ceph:v15.2.13
rook-ceph-crashcollector-node04 ceph/ceph:v15.2.13
rook-ceph-crashcollector-node05 ceph/ceph:v15.2.13
rook-ceph-mgr-a ceph/ceph:v15.2.13
rook-ceph-mon-a ceph/ceph:v15.2.13
rook-ceph-mon-b ceph/ceph:v15.2.13
rook-ceph-mon-c ceph/ceph:v15.2.13
rook-ceph-operator rook/ceph:v1.7.11
rook-ceph-osd-0 ceph/ceph:v15.2.13
rook-ceph-osd-1 ceph/ceph:v15.2.13
rook-ceph-osd-2 ceph/ceph:v15.2.13
```
```
airship@d105:~/shon/upgrade/rook/cluster/examples/kubernetes/ceph$ kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/ironic-pv-volume 10Gi RWO Retain Bound metal3/ironic-pv-claim default 5d4h
persistentvolume/pvc-51ff734a-db0e-4044-a999-07da1f5e7d98 2Gi RWO Delete Bound default/mysql-pv-claim rook-ceph-block 17m
persistentvolume/pvc-9738a6b2-03d4-41a9-9fa9-d344429aef9f 2Gi RWO Delete Bound default/wp-pv-claim rook-ceph-block 17m
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/mysql-pv-claim Bound pvc-51ff734a-db0e-4044-a999-07da1f5e7d98 2Gi RWO rook-ceph-block 17m
persistentvolumeclaim/wp-pv-claim Bound pvc-9738a6b2-03d4-41a9-9fa9-d344429aef9f 2Gi RWO rook-ceph-block 17m
```
## Observations:
1) When we perform the Rook operator upgrade, the Rook-managed components (rook-ceph-mgr, rook-ceph-mon, rook-ceph-osd) are upgraded to the new Rook version.
2) The OSDs went down one after another, one node at a time out of the 3 nodes, so there were always 2 of the 3 OSDs up and data remained available.
3) The Ceph cluster health went to HEALTH_WARN while an OSD was down. Once the upgraded OSD came back up, the cluster health returned to HEALTH_OK.
4) A node reboot is not required after upgrading the Rook operator.
5) The Rook operator did not upgrade the Ceph version; the Ceph upgrade has to be performed as a separate step/process (see the sketch after this list).
6) The upgrade scenarios performed in the local lab confirm that there are no significant performance or availability impacts; the impact observed during the upgrade is comparable to regular maintenance such as an OSD node reboot or hard drive replacement.
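As noted in observation 5, the Ceph version itself is upgraded by changing the image in the CephCluster CR, after which the operator performs a rolling update of the daemons. A minimal sketch, assuming the CephCluster resource is named `rook-ceph` and using `ceph/ceph:v16.2.7` as a hypothetical target Pacific image:
```
# Point the CephCluster CR at the desired Ceph image; the operator rolls the daemons one by one.
kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge -p '{"spec": {"cephVersion": {"image": "ceph/ceph:v16.2.7"}}}'
# Watch the ceph-version label converge across all daemon deployments.
kubectl -n rook-ceph get deployment -l rook_cluster=rook-ceph -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq
```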
#### Performance impact during the upgrade process:
```
Initial ceph status:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:30:01 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 110m)
mgr: a(active, since 109m)
osd: 3 osds: 3 up (since 109m), 3 in (since 109m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.4 GiB used, 15 GiB / 18 GiB avail
pgs: 33 active+clean
1st OSD went down from node03
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:31:05 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_WARN
1 osds down
1 host (1 osds) down
Degraded data redundancy: 64/192 objects degraded (33.333%), 28 pgs degraded
services:
mon: 3 daemons, quorum a,b,c (age 111m)
mgr: a(active, since 14s)
osd: 3 osds: 2 up (since 5s), 3 in (since 110m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 64/192 objects degraded (33.333%)
28 active+undersized+degraded
5 active+undersized
After OSD upgraded to latest version:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:31:16 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 111m)
mgr: a(active, since 25s)
osd: 3 osds: 3 up (since 5s), 3 in (since 110m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 25 active+clean
8 active+clean+wait
2nd OSD went down from node04:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:31:21 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_WARN
1 osds down
1 host (1 osds) down
Degraded data redundancy: 31/192 objects degraded (16.146%), 12 pgs degraded
services:
mon: 3 daemons, quorum a,b,c (age 111m)
mgr: a(active, since 31s)
osd: 3 osds: 2 up (since 4s), 3 in (since 110m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 31/192 objects degraded (16.146%)
12 active+undersized+degraded
11 active+clean
8 stale+active+clean
2 active+undersized
Post OSD upgrade on node04:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:31:27 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_WARN
1 osds down
1 host (1 osds) down
Degraded data redundancy: 64/192 objects degraded (33.333%), 28 pgs degraded
services:
mon: 3 daemons, quorum a,b,c (age 111m)
mgr: a(active, since 36s)
osd: 3 osds: 2 up (since 10s), 3 in (since 111m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 64/192 objects degraded (33.333%)
28 active+undersized+degraded
5 active+undersized
OSD upgrade on node05:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:31:38 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_WARN
Degraded data redundancy: 64/192 objects degraded (33.333%), 28 pgs degraded
services:
mon: 3 daemons, quorum a,b,c (age 111m)
mgr: a(active, since 47s)
osd: 3 osds: 3 up (since 0.858719s), 3 in (since 111m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 64/192 objects degraded (33.333%)
28 active+undersized+degraded
5 active+undersized
Post upgrade of OSD on node05:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:31:45 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_WARN
1 osds down
1 host (1 osds) down
Reduced data availability: 2 pgs peering
services:
mon: 3 daemons, quorum a,b,c (age 111m)
mgr: a(active, since 55s)
osd: 3 osds: 2 up (since 1.42293s), 3 in (since 111m)
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 33.333% pgs not active
14 active+clean+wait
11 peering
8 stale+active+clean
Final Ceph cluster status:
Every 2.0s: kubectl exec -n rook-ceph rook-ceph-tools-65c94d77bb-6czmn -- ceph status d105: Mon Jan 24 11:32:07 2022
cluster:
id: 0b59ebfb-2e36-45aa-af62-02e1d41cc2e6
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 112m)
mgr: a(active, since 76s)
osd: 3 osds: 3 up (since 4s), 3 in (since 111m)
task status:
data:
pools: 2 pools, 33 pgs
objects: 64 objects, 158 MiB
usage: 3.5 GiB used, 15 GiB / 18 GiB avail
pgs: 22 active+clean
11 active+clean+wait
```