# Configure Ceph Full Thresholds on storagecluster CR

## Before specifying values for the ratios

### Storagecluster

```
~ $ oc get storagecluster ocs-storagecluster -o=jsonpath='{.spec.managedResources.cephCluster}' | jq
{}
```

### Cephcluster

```
~ $ oc get cephcluster ocs-storagecluster-cephcluster -o=jsonpath='{.spec.storage}' | jq '{nearFullRatio, backfillFullRatio, fullRatio}'
{
  "nearFullRatio": 0.75,
  "backfillFullRatio": 0.8,
  "fullRatio": 0.85
}
```

### Toolbox

```
sh-5.1$ ceph osd dump | grep full_ratio
full_ratio 0.85
backfillfull_ratio 0.8
nearfull_ratio 0.75
```
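The `ceph osd dump` output above is taken from the Rook toolbox pod. A minimal way to reach it (a sketch, assuming the default `ocs-init` OCSInitialization resource and the standard `rook-ceph-tools` deployment in `openshift-storage`) is:

```
# Enable the toolbox if it is not already running (assumes the default
# OCSInitialization resource named "ocs-init").
oc patch ocsinitialization ocs-init -n openshift-storage --type json \
  --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

# Open a shell in the toolbox pod; "ceph osd dump | grep full_ratio" and the
# other ceph commands shown in this document can then be run from that shell.
oc rsh -n openshift-storage "$(oc get pods -n openshift-storage -l app=rook-ceph-tools -o name | head -n 1)"
```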
### Promrule

```
~ $ oc get prometheusrule prometheus-ceph-rules -o yaml | yq '.spec.groups[] | select(.name == "cluster-utilization-alert.rules") | .rules[] | select(.alert == "CephClusterNearFull" or .alert == "CephClusterCriticallyFull" or .alert == "CephClusterReadOnly")'
alert: CephClusterNearFull
annotations:
  description: Storage cluster utilization has crossed 75% and will become read-only at 85%. Free up some space or expand the storage cluster.
  message: Storage cluster is nearing full. Data deletion or cluster expansion is required.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterNearFull.md
  severity_level: warning
  storage_type: ceph
expr: |
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes > 0.75
for: 5s
labels:
  severity: warning
alert: CephClusterCriticallyFull
annotations:
  description: Storage cluster utilization has crossed 80% and will become read-only at 85%. Free up some space or expand the storage cluster immediately.
  message: Storage cluster is critically full and needs immediate data deletion or cluster expansion.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterCriticallyFull.md
  severity_level: error
  storage_type: ceph
expr: |
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes > 0.80
for: 5s
labels:
  severity: critical
alert: CephClusterReadOnly
annotations:
  description: Storage cluster utilization has crossed 85% and will become read-only now. Free up some space or expand the storage cluster immediately.
  message: Storage cluster is read-only now and needs immediate data deletion or cluster expansion.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterReadOnly.md
  severity_level: error
  storage_type: ceph
expr: |
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes >= 0.85
for: 0s
labels:
  severity: critical
```

```
~ $ oc get prometheusrule prometheus-ceph-rules -o yaml | yq '.spec.groups[] | select(.name == "osd-alert.rules") | .rules[] | select(.alert == "CephOSDCriticallyFull" or .alert == "CephOSDNearFull")'
alert: CephOSDCriticallyFull
annotations:
  description: Utilization of storage device {{ $labels.ceph_daemon }} of device_class type {{ $labels.device_class }} has crossed 80% on host {{ $labels.hostname }}. Immediately free up some space or add capacity of type {{ $labels.device_class }}.
  message: Back-end storage device is critically full.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephOSDCriticallyFull.md
  severity_level: error
  storage_type: ceph
expr: |
  (ceph_osd_metadata * on (ceph_daemon, namespace, managedBy) group_right(device_class,hostname) (ceph_osd_stat_bytes_used / ceph_osd_stat_bytes)) >= 0.80
for: 40s
labels:
  severity: critical
alert: CephOSDNearFull
annotations:
  description: Utilization of storage device {{ $labels.ceph_daemon }} of device_class type {{ $labels.device_class }} has crossed 75% on host {{ $labels.hostname }}. Immediately free up some space or add capacity of type {{ $labels.device_class }}.
  message: Back-end storage device is nearing full.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephOSDNearFull.md
  severity_level: warning
  storage_type: ceph
expr: |
  (ceph_osd_metadata * on (ceph_daemon, namespace, managedBy) group_right(device_class,hostname) (ceph_osd_stat_bytes_used / ceph_osd_stat_bytes)) >= 0.75
for: 40s
labels:
  severity: warning
```

---

## Specifying values for the ratios

Patch the StorageCluster CR; the operator propagates the new ratios to the CephCluster CR, to the Ceph OSD full ratios, and to the alert thresholds in the PrometheusRule. Keep `nearFullRatio` < `backfillFullRatio` < `fullRatio`.

### Set nearFullRatio to 0.70

```
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephCluster/nearFullRatio", "value": 0.70 }]'
```

### Set backfillFullRatio to 0.85

```
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephCluster/backfillFullRatio", "value": 0.85 }]'
```

### Set fullRatio to 0.95

```
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/managedResources/cephCluster/fullRatio", "value": 0.95 }]'
```
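All three ratios can also be applied in one call; this is simply the three JSON patch operations above combined into a single patch (same CR and paths):

```
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[
  { "op": "replace", "path": "/spec/managedResources/cephCluster/nearFullRatio",     "value": 0.70 },
  { "op": "replace", "path": "/spec/managedResources/cephCluster/backfillFullRatio", "value": 0.85 },
  { "op": "replace", "path": "/spec/managedResources/cephCluster/fullRatio",         "value": 0.95 }
]'
```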
---

## After specifying values for the ratios

### Storagecluster

```
~ $ oc get storagecluster ocs-storagecluster -o=jsonpath='{.spec.managedResources.cephCluster}' | jq
{
  "backfillFullRatio": 0.85,
  "fullRatio": 0.95,
  "nearFullRatio": 0.7
}
```

### Cephcluster

```
~ $ oc get cephcluster ocs-storagecluster-cephcluster -o=jsonpath='{.spec.storage}' | jq '{nearFullRatio, backfillFullRatio, fullRatio}'
{
  "nearFullRatio": 0.7,
  "backfillFullRatio": 0.85,
  "fullRatio": 0.95
}
```

### Toolbox

```
sh-5.1$ ceph osd dump | grep full_ratio
full_ratio 0.95
backfillfull_ratio 0.85
nearfull_ratio 0.7
```

### Promrule

```
~ $ oc get prometheusrule prometheus-ceph-rules -o yaml | yq '.spec.groups[] | select(.name == "cluster-utilization-alert.rules") | .rules[] | select(.alert == "CephClusterNearFull" or .alert == "CephClusterCriticallyFull" or .alert == "CephClusterReadOnly")'
alert: CephClusterNearFull
annotations:
  description: Storage cluster utilization has crossed 70.00% and will become read-only at 95.00%. Free up some space or expand the storage cluster.
  message: Storage cluster is nearing full. Data deletion or cluster expansion is required.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterNearFull.md
  severity_level: warning
  storage_type: ceph
expr: |
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes > 0.700000
for: 5s
labels:
  severity: warning
alert: CephClusterCriticallyFull
annotations:
  description: Storage cluster utilization has crossed 82.50% and will become read-only at 95.00%. Free up some space or expand the storage cluster immediately.
  message: Storage cluster is critically full and needs immediate data deletion or cluster expansion.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterCriticallyFull.md
  severity_level: error
  storage_type: ceph
expr: |
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes > 0.825000
for: 5s
labels:
  severity: critical
alert: CephClusterReadOnly
annotations:
  description: Storage cluster utilization has crossed 95.00% and will become read-only now. Free up some space or expand the storage cluster immediately.
  message: Storage cluster is read-only now and needs immediate data deletion or cluster expansion.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephClusterReadOnly.md
  severity_level: error
  storage_type: ceph
expr: |
  ceph_cluster_total_used_raw_bytes / ceph_cluster_total_bytes >= 0.950000
for: 0s
labels:
  severity: critical
```

```
~ $ oc get prometheusrule prometheus-ceph-rules -o yaml | yq '.spec.groups[] | select(.name == "osd-alert.rules") | .rules[] | select(.alert == "CephOSDCriticallyFull" or .alert == "CephOSDNearFull")'
alert: CephOSDCriticallyFull
annotations:
  description: Utilization of storage device {{ $labels.ceph_daemon }} of device_class type {{ $labels.device_class }} has crossed 82.50% on host {{ $labels.hostname }}. Immediately free up some space or add capacity of type {{ $labels.device_class }}.
  message: Back-end storage device is critically full.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephOSDCriticallyFull.md
  severity_level: error
  storage_type: ceph
expr: |
  (ceph_osd_metadata * on (ceph_daemon, namespace, managedBy) group_right(device_class,hostname) (ceph_osd_stat_bytes_used / ceph_osd_stat_bytes)) >= 0.825000
for: 40s
labels:
  severity: critical
alert: CephOSDNearFull
annotations:
  description: Utilization of storage device {{ $labels.ceph_daemon }} of device_class type {{ $labels.device_class }} has crossed 70.00% on host {{ $labels.hostname }}. Immediately free up some space or add capacity of type {{ $labels.device_class }}.
  message: Back-end storage device is nearing full.
  runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephOSDNearFull.md
  severity_level: warning
  storage_type: ceph
expr: |
  (ceph_osd_metadata * on (ceph_daemon, namespace, managedBy) group_right(device_class,hostname) (ceph_osd_stat_bytes_used / ceph_osd_stat_bytes)) >= 0.700000
for: 40s
labels:
  severity: warning
```
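To return to the defaults shown in the "Before" section (0.75 / 0.80 / 0.85), one option is to drop the overrides from the StorageCluster spec again; the operator should then reconcile the CephCluster back to those defaults. This is a sketch, assuming the ratios were set exactly as above; re-run the verification commands afterwards to confirm.

```
~ $ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json --patch '[
  { "op": "remove", "path": "/spec/managedResources/cephCluster/nearFullRatio" },
  { "op": "remove", "path": "/spec/managedResources/cephCluster/backfillFullRatio" },
  { "op": "remove", "path": "/spec/managedResources/cephCluster/fullRatio" }
]'
```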