# Ceph Daemon Container Promotion Pipeline

###### tags: `Design`

## Problem

The Storage team depends on a manual process for testing and promoting the ceph-daemon container image:

* https://review.opendev.org/c/openstack/tripleo-common/+/832810
* https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/832756

## Promotion Workflow

1. Promoting latest ceph-daemon stable tag
    * Objective: read the latest stable tag from quay.io/ceph/daemon and promote it to 'ci-testing'
        * https://quay.io/repository/ceph/daemon?tab=tags
    * Role/playbook that will read all tags from the registry and find the latest stable tag
        1. Promotion: create an artifact in the RDO image server that contains the latest stable tag
            * we can reuse existing code (compose_promoter on ci-config)
                * https://github.com/rdo-infra/ci-config/tree/master/ci-scripts/infra-setup/roles/compose_promoter
                * https://images.rdoproject.org/centos_compose/
            * ci-testing tag (ceph-ci-testing, ceph-daemon-ci-testing)
2. Extend jobs to consume the 'ci-testing' tag instead of 'job.docker_ceph_tag' or the default value
    * Start with standalone jobs - read the ceph-ci-testing tag from the server
        * we can create a new role, so we can re-use it later in other jobs
    * Jobs:
        * periodic-tripleo-ci-centos-9-scenario001-standalone-ceph-{branch}
        * periodic-tripleo-ci-centos-9-scenario004-standalone-ceph-{branch}
    * Repos to be used:
        * OpenStack: current-tripleo
        * Cephadm: "-release" repo - https://buildlogs.centos.org/centos/9-stream/storage/x86_64/ceph-pacific/

    2.1. Report results to DLRN

    * Publish as a regular integration job, with aggregate_hash (current-tripleo)
    * We need to publish the ceph_tag somewhere:
        1. send ceph_daemon_tag as 'notes' in 'report-result'
            * https://dlrn.readthedocs.io/en/latest/api.html#post-api-report-result
            * the only field left to add this info
        2. ~~use extended_hash (md5 dlrn_ceph_tag)?~~
        3. do not publish that info, and extract it from somewhere else
        4. any other idea?
3. Promotion job to promote ceph daemon tag
    * Reads from DLRN API: jobs with aggregate_hash, filtered by 'ceph_daemon_tag'
    * Promote if the criteria match:
        * Promote artifact in RDO image-server
        * Push container image to quay.io (tripleo namespace)
            * (chandan): https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/container-build/tasks/non_tripleo_containers.yml - need to take care with how we skip non-tripleo containers that exist in the tripleo namespace
    * We can reuse code from compose_promoter + dlrnapi_promoter
    * Question: Should we use our promoter instead of a job?
        * +1:
            * reusing code is usually good (container push); server already configured and running.
        * -1:
            * mixing ceph promotion with our comp/integration promotions
            * we probably need to extend the promoter code to promote simple artifacts (similar to qcow client and compose_promoter)
            * a new job is easier to modify and maintain (by other teams/storage/ceph)

## Alternative Workflow

Sandeep's proposal, using Zuul dependencies:

* we will still need jobs to fetch the latest stable tag and to promote/push content, but this avoids using DLRN to store job results.

E.g.:

```
- job_which_find_the_latest_stable_hash  # this job to return latest_stable_hash as artifact
- sc01_job_with_latest_stable_hash:
    dependencies:
      - job_which_find_the_latest_stable_hash
- sc04_job_with_latest_stable_hash:  # this job to return pass/fail as artifact
    dependencies:
      - job_which_find_the_latest_stable_hash
- ovb_job_with_latest_stable_hash:
    dependencies:
      - job_which_find_the_latest_stable_hash
- job_which_push_container_to_regisry_with_current-tripleo_hash:  # Will use artifact from job_which_find_the_latest_stable_hash
    dependencies:
      - job_which_find_the_latest_stable_hash
      - sc01_job_with_latest_stable_hash
      - sc04_job_with_latest_stable_hash
      - ovb_job_with_latest_stable_hash
```

* +1:
    * easier to implement and maintain
    * we can still testproject jobs with dependencies
    * no need to depend on DLRN 'notes' to store results
    * we can use zuul return vars to avoid promotions without running jobs
* -1:
    * the promotion job itself will not validate the current criteria; it will work based on dependencies. If someone runs that job, it will always promote the latest tag and push it to the registry
        * we may want to use zuul artifacts to report success/failure on dependency jobs
        * we can maintain the criteria within the job's run playbook - and use DNM changes to skip a job on criteria

Comments:

* on the promotion job, consider adding an option to force-push a different tag, in case we want to roll back a tag
    * we should make unwanted promotions harder, so users will need to know what they are doing when skipping a criterion to promote a tag

## Monitoring

1. Create/Update dashboards to show ceph-daemon jobs and promotion
2. Add nightly jobs:
    * https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario001-ceph-nightly&job_name=tripleo-ci-centos-9-scenario004-ceph-nightly&skip=0
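
As a rough illustration of steps 1 and 3 of the Promotion Workflow (finding the latest stable tag, then checking the promotion criteria), here is a minimal Python sketch. It is not existing code from compose_promoter or dlrnapi_promoter. Everything in it is an assumption made for illustration: the function names `latest_stable_tag` and `should_promote`, the stable tag pattern (`vX.Y.Z-stable-...`), and the shape of the job results (plain dicts with `job_id`, `success` and `notes` keys, where `notes` carries the ceph_daemon_tag as proposed in option 1 of step 2.1). Fetching tags from the registry and results from DLRN is deliberately left out.

```python
import re

# Assumption: stable tags start with "vX.Y.Z-stable-", for example
# "v6.0.6-stable-6.0-pacific-centos-8-x86_64". Adjust if the naming differs.
STABLE_TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)-stable-")


def latest_stable_tag(tags):
    """Return the stable tag with the highest numeric version, or None."""
    best_version, best_tag = None, None
    for tag in tags:
        match = STABLE_TAG_RE.match(tag)
        if match is None:
            continue  # not a stable tag (e.g. "latest-pacific")
        # Compare versions as integer tuples so that 6.0.10 > 6.0.6.
        version = tuple(int(part) for part in match.groups())
        if best_version is None or version > best_version:
            best_version, best_tag = version, tag
    return best_tag


def should_promote(results, ceph_tag, required_jobs):
    """Return True only if every required job passed with this ceph tag.

    `results` is a list of dicts with hypothetical keys "job_id",
    "success" and "notes"; "notes" is assumed to hold the ceph_daemon_tag.
    """
    passed = {
        result["job_id"]
        for result in results
        if result.get("success") and result.get("notes") == ceph_tag
    }
    required = set(required_jobs)
    # An empty criteria list never promotes, to make accidental
    # promotions harder.
    return bool(required) and required <= passed
```

Note that if several tags share the same version (different distro or architecture suffixes), this sketch keeps the first one it sees; a real role would probably need to filter on the wanted release and distro before picking a tag.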