# Ceph Daemon Container Promotion Pipeline
###### tags: `Design`
## Problem
The storage team depends on a manual process for testing and promoting the ceph-daemon container image:
* https://review.opendev.org/c/openstack/tripleo-common/+/832810
* https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/832756
## Promotion Workflow
1. Promoting latest ceph-daemon stable tag
* Objective: read latest stable tag from quay.io/ceph/daemon and promote it to 'ci-testing'
* https://quay.io/repository/ceph/daemon?tab=tags
* Role/playbook that will read all tags from registry and find latest stable tag
1. Promotion: create an artifact on the RDO image server that contains the latest stable tag
* we can reuse existing code (compose_promoter on ci-config)
* https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/infra-setup/roles/compose_promoter
* https://images.rdoproject.org/centos_compose/
* ci-testing tag (ceph-ci-testing, ceph-daemon-ci-testing)
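
A minimal sketch of the tag-selection logic from step 1, in Python. It assumes that stable tags look like `v6.0.10-stable-6.0-pacific-centos-8-x86_64`; that pattern, the function name `latest_stable_tag`, and the `release` parameter are illustrative assumptions and should be checked against the real tag list. Fetching the tags from the registry is deliberately left out.

```python
import re

# Assumed tag shape, e.g. "v6.0.10-stable-6.0-pacific-centos-8-x86_64".
# Verify against the real tag list before relying on this pattern.
STABLE_TAG = re.compile(
    r"^v(\d+)\.(\d+)\.(\d+)-stable-\d+\.\d+-(?P<release>[a-z]+)-"
)


def latest_stable_tag(tags, release):
    """Return the highest-versioned stable tag for a Ceph release, or None."""
    candidates = []
    for tag in tags:
        match = STABLE_TAG.match(tag)
        if match and match.group("release") == release:
            version = tuple(int(part) for part in match.group(1, 2, 3))
            candidates.append((version, tag))
    if not candidates:
        return None
    # Compare versions numerically so that 6.0.10 sorts above 6.0.9.
    return max(candidates)[1]
```

Comparing version tuples rather than raw strings matters here, because a plain string sort would rank `v6.0.9` above `v6.0.10`.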
2. Extend jobs to consume the 'ci-testing' tag instead of 'job.docker_ceph_tag' or the default value
* Start with standalone jobs - read ceph-ci-testing tag from server
* we can create a new role, so we can reuse it later in other jobs
* Jobs:
* periodic-tripleo-ci-centos-9-scenario001-standalone-ceph-{branch}
* periodic-tripleo-ci-centos-9-scenario004-standalone-ceph-{branch}
* Repos to be used:
* OpenStack: current-tripleo
* Cephadm: "-release" repo - https://buildlogs.centos.org/centos/9-stream/storage/x86_64/ceph-pacific/
2.1. Report results to DLRN
* Publish as regular integration job, with aggregate_hash (current-tripleo)
* We need to publish the ceph_tag somewhere:
1. send ceph_daemon_tag as 'notes' in 'report-result'
* https://dlrn.readthedocs.io/en/latest/api.html#post-api-report-result
* this is the only field left that can carry this information
~~2. use extended_hash (md5 dlrn_ceph_tag)?~~
3. do not publish that info, and extract it from somewhere else
4. any other idea?
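
Option 1 could be sketched as follows, in Python. The `ceph_daemon_tag=<tag>` encoding inside `notes` is a hypothetical convention, and the helper names are made up for illustration. The parameter names (`job_id`, `aggregate_hash`, `success`, `url`, `timestamp`, `notes`) follow my reading of the linked report-result documentation and should be verified there before use.

```python
NOTES_KEY = "ceph_daemon_tag"  # hypothetical convention, not defined by DLRN


def build_report(job_id, aggregate_hash, success, ceph_tag, url, timestamp):
    """Build the parameters for a report-result call, with the tag in 'notes'."""
    return {
        "job_id": job_id,
        "aggregate_hash": aggregate_hash,
        "success": success,
        "url": url,
        "timestamp": timestamp,
        "notes": f"{NOTES_KEY}={ceph_tag}",
    }


def ceph_tag_from_notes(notes):
    """Recover the tag from a 'notes' string, or None if it carries no tag."""
    for token in (notes or "").split():
        key, sep, value = token.partition("=")
        if key == NOTES_KEY and sep and value:
            return value
    return None
```

Keeping the encoding and decoding next to each other makes it harder for the reporting jobs and the promotion job to drift apart on the format.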
3. Promotion job to promote ceph daemon tag
* Reads from DLRN API: jobs with aggregate_hash and filter by 'ceph_daemon_tag'
* Promote if the criteria are met:
* Promote artifact in RDO image-server
* Push container image to quay.io (tripleo namespace)
* (chandan): https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/container-build/tasks/non_tripleo_containers.yml - we need to take care of how to skip non-tripleo containers that exist in the tripleo namespace
* We can reuse code from compose_promoter + dlrnapi_promoter
* Question: Should we use our promoter instead of a job?
* +1:
* reusing code is usually good (container push); server already configured and running.
* -1:
* mixing ceph promotion with our component/integration promotions
* we probably need to extend promoter code to promote simple artifacts (similar to qcow client and compose_promoter)
* a new job is easier to modify and maintain (by other teams/storage/ceph)
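
The criteria check in step 3 might look like the sketch below, in Python. It assumes each job result is a dict with `job_id`, `success`, and `notes` keys, and that the tag was reported in `notes` as `ceph_daemon_tag=<tag>`; both the result shape and that encoding are assumptions made for illustration, as is the function name `should_promote`.

```python
def should_promote(results, ceph_tag, required_jobs):
    """Return True only if every required job passed with the given tag."""
    marker = f"ceph_daemon_tag={ceph_tag}"  # hypothetical 'notes' encoding
    passed = {
        result["job_id"]
        for result in results
        if result.get("success")
        and marker in (result.get("notes") or "").split()
    }
    # Every required job needs at least one successful run for this tag.
    return set(required_jobs) <= passed
```

Results reported for a different tag, or without a tag at all, are ignored, so an older successful run cannot promote a newer tag.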
## Alternative Workflow
Sandeep proposal, using Zuul dependencies:
* we will still need jobs to fetch the latest stable tag and to promote/push content, but this avoids using DLRN to store job results. E.g.:
```
- job_which_find_the_latest_stable_hash  # this job returns latest_stable_hash as an artifact
- sc01_job_with_latest_stable_hash:
    dependencies:
      - job_which_find_the_latest_stable_hash
- sc04_job_with_latest_stable_hash:  # this job returns pass/fail as an artifact
    dependencies:
      - job_which_find_the_latest_stable_hash
- ovb_job_with_latest_stable_hash:
    dependencies:
      - job_which_find_the_latest_stable_hash
- job_which_push_container_to_registry_with_current-tripleo_hash:  # uses the artifact from job_which_find_the_latest_stable_hash
    dependencies:
      - job_which_find_the_latest_stable_hash
      - sc01_job_with_latest_stable_hash
      - sc04_job_with_latest_stable_hash
      - ovb_job_with_latest_stable_hash
```
* +1:
* easier to implement and maintain
* we can still run testproject jobs with dependencies
* don't need to depend on DLRN 'notes' to store results
* we can use zuul return vars to avoid promotions without running jobs
* -1:
* the promotion job itself will not validate the promotion criteria; it relies only on its dependencies. If someone runs that job directly, it will always promote the latest tag and push it to the registry
* we may want to use zuul artifacts to report success/failure of the dependency jobs
* we can maintain the criteria within the job's run playbook, and use DNM changes to skip a job in the criteria
Comments:
* on the promotion job, consider adding an option to force-push a different tag, in case we need to roll back a tag
* we should make unwanted promotions harder, so users need to know what they are doing when they skip a criterion to promote a tag
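
One way to address both comments is to have the promotion job check the dependency results itself and to require an explicit confirmation for a forced tag. The Python sketch below is only an illustration: `tag_to_promote`, `forced_tag`, `confirm_force`, and the shape of `dependency_results` (job name mapped to pass/fail, e.g. collected from zuul return data) are all hypothetical.

```python
class PromotionRefused(Exception):
    """Raised when the criteria are not met and no valid override is given."""


def tag_to_promote(latest_tag, dependency_results, forced_tag=None,
                   confirm_force=False):
    """Pick the tag to push, or raise PromotionRefused."""
    if forced_tag is not None:
        # Rollback path: bypasses the criteria, so demand a second opt-in.
        if not confirm_force:
            raise PromotionRefused("forced tag requires explicit confirmation")
        return forced_tag
    failed = sorted(job for job, ok in dependency_results.items() if not ok)
    if not dependency_results or failed:
        # No results at all means the job was run outside the pipeline.
        raise PromotionRefused(f"criteria not met, failed jobs: {failed}")
    return latest_tag
```

With this shape, running the promotion job on its own without any dependency results refuses to promote, and a rollback needs two deliberate inputs rather than one.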
## Monitoring
1. Create/Update dashboards to show ceph-daemon jobs and promotion
2. Add nightly jobs:
* https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario001-ceph-nightly&job_name=tripleo-ci-centos-9-scenario004-ceph-nightly&skip=0