Ceph Daemon Container Promotion Pipeline

tags: Design

Problem

The storage team depends on a manual process for testing and promoting the ceph-daemon container image.

Promotion Workflow

  1. Promoting latest ceph-daemon stable tag

  2. Extend jobs to consume the 'ci-testing' tag instead of 'job.docker_ceph_tag' or the default value

    • Start with standalone jobs - read ceph-ci-testing tag from server
      • we can create a new role, so it can be reused later in other jobs
    • Jobs:
      • periodic-tripleo-ci-centos-9-scenario001-standalone-ceph-{branch}
      • periodic-tripleo-ci-centos-9-scenario004-standalone-ceph-{branch}
    • Repos to be used:

    2.1. Report results to DLRN

    • Publish as a regular integration job, with aggregate_hash (current-tripleo)
    • We need to publish the ceph_tag somewhere:
      1. send ceph_daemon_tag as 'notes' in 'report-result'
      2. do not publish that info, and extract it from somewhere else
      3. any other idea?
  3. Promotion job to promote ceph daemon tag

    • Reads from DLRN API: jobs with aggregate_hash and filter by 'ceph_daemon_tag'
    • Promote if match criteria:
    • We can reuse code from compose_promoter + dlrnapi_promoter
    • Question: Should we use our promoter instead of a job?
      • +1:
        • reusing code is usually good (container push); server already configured and running.
      • -1:
        • mixing ceph promotion with our comp/integration promotions
        • we probably need to extend promoter code to promote simple artifacts (similar to qcow client and compose_promoter)
        • a new job is easier to modify and maintain (by other teams/storage/ceph)
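
To make step 3 concrete, here is a minimal sketch of the promotion check, assuming option 1 above (the tag is stored in 'notes'). The record fields, the `ceph_daemon_tag=` notes format, and the helper names are assumptions for illustration, not the DLRN schema or existing promoter code. The real criteria are still to be defined; this sketch simply requires both standalone jobs to pass.

```python
# Sketch only: record fields and the notes format are assumptions, not the DLRN schema.
NOTES_PREFIX = "ceph_daemon_tag="  # hypothetical convention for the 'notes' field


def required_jobs(branch):
    """Job names from the workflow above, expanded for one branch."""
    return {
        f"periodic-tripleo-ci-centos-9-scenario001-standalone-ceph-{branch}",
        f"periodic-tripleo-ci-centos-9-scenario004-standalone-ceph-{branch}",
    }


def tag_from_notes(notes):
    """Return the tag stored in a notes string, or None if it is absent."""
    for token in (notes or "").split():
        if token.startswith(NOTES_PREFIX):
            return token[len(NOTES_PREFIX):]
    return None


def should_promote(results, aggregate_hash, candidate_tag, branch):
    """True only if every required job passed for this hash and this tag."""
    passed = {
        r["job_id"]
        for r in results
        if r.get("aggregate_hash") == aggregate_hash
        and r.get("success") is True
        and tag_from_notes(r.get("notes")) == candidate_tag
    }
    return required_jobs(branch) <= passed
```

The `results` list would come from whatever query the promotion job runs against the DLRN API; keeping the decision in a pure function makes it easy to reuse from either a job or the promoter.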

Alternative Workflow

Sandeep proposal, using Zuul dependencies:

  • we will still need jobs to fetch the latest stable tag and to promote/push content, but this avoids using DLRN to store job results. E.g.:
- job_which_find_the_latest_stable_hash  # returns latest_stable_hash as an artifact
- sc01_job_with_latest_stable_hash:  # returns pass/fail as an artifact
    dependencies:
      - job_which_find_the_latest_stable_hash
- sc04_job_with_latest_stable_hash:  # returns pass/fail as an artifact
    dependencies:
      - job_which_find_the_latest_stable_hash
- ovb_job_with_latest_stable_hash:  # returns pass/fail as an artifact
    dependencies:
      - job_which_find_the_latest_stable_hash
- job_which_push_container_to_registry_with_current-tripleo_hash:  # uses the artifact from job_which_find_the_latest_stable_hash
    dependencies:
      - job_which_find_the_latest_stable_hash
      - sc01_job_with_latest_stable_hash
      - sc04_job_with_latest_stable_hash
      - ovb_job_with_latest_stable_hash
  • +1:

    • easier to implement and maintain
    • we can still run testproject jobs with dependencies
    • don't need to depend on DLRN 'notes' to store results
    • we can use zuul return vars to avoid promotions without running jobs
  • -1:

    • the promotion job itself will not validate the current criteria; it relies only on its dependencies. If someone runs that job directly, it will always promote the latest tag and push it to the registry
      • we may want to use Zuul artifacts to report success/failure of the dependency jobs
      • we can maintain the criteria within the job's run playbook, and use DNM changes to skip a job in the criteria
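
One way to address the -1 above is for the promotion job to re-check what its dependencies returned before pushing anything. A minimal sketch, assuming each dependency job returns the tag it tested and a pass/fail flag; the `tag` and `passed` keys and the function name are hypothetical, and how the values reach the playbook is left open.

```python
def promotion_blockers(expected_tag, dependency_results, required_jobs):
    """Return the reasons promotion must be refused; an empty list means OK.

    dependency_results maps a job name to the data that job returned,
    e.g. {"tag": "<tag tested>", "passed": True}. These keys are assumptions.
    """
    blockers = []
    for job in required_jobs:
        result = dependency_results.get(job)
        if result is None:
            # The job never ran, e.g. the promotion job was triggered on its own.
            blockers.append(f"{job}: no result returned")
        elif result.get("tag") != expected_tag:
            blockers.append(
                f"{job}: tested {result.get('tag')!r}, expected {expected_tag!r}"
            )
        elif not result.get("passed"):
            blockers.append(f"{job}: reported failure")
    return blockers
```

With this check in the run playbook, running the promotion job without its dependencies yields one blocker per missing job instead of a silent promotion.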

Comments:

  • on the promotion job, consider adding an option to force-push a different tag, in case we want to roll back a tag
  • we should make unwanted promotions harder, so users need to know what they are doing when they skip a criterion to promote a tag
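
The two comments above could be combined as a force option that only works when the tag is typed twice. This is a sketch of the idea, not an agreed interface: the `--force-tag` and `--confirm-tag` flags and both function names are hypothetical.

```python
import argparse


def parse_args(argv):
    """Parse promotion options; a forced tag must be confirmed explicitly."""
    parser = argparse.ArgumentParser(description="Promote a ceph-daemon tag")
    parser.add_argument("--force-tag", help="promote this tag, skipping criteria")
    parser.add_argument("--confirm-tag", help="must repeat the value of --force-tag")
    args = parser.parse_args(argv)
    if args.force_tag and args.confirm_tag != args.force_tag:
        # parser.error() prints the message and exits with status 2.
        parser.error("--force-tag requires a matching --confirm-tag")
    return args


def select_tag(validated_tag, args):
    """Use the forced tag when one was given, else the tag that passed criteria."""
    return args.force_tag or validated_tag
```

A rollback would then look like `--force-tag <old-tag> --confirm-tag <old-tag>`; a bare `--force-tag` is rejected, which makes an accidental promotion harder.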

Monitoring

  1. Create/Update dashboards to show ceph-daemon jobs and promotion
  2. Add nightly jobs: