# Ruck and rover notes #24

###### tags: `ruck_rover`

## PCCI Ruck Rover Sprint #23 - 5 Mar to 26 Mar 2020

:::warning
Tripleo CI team ruck|rover: Wes (weshay) && Sandeep ysandeep
OSP CI team ruck|rover: Waldek (wznoinsk) && Vadim (vgriner)
:::

:::spoiler Useful information
* [Ruck/rover primer](https://docs.openstack.org/tripleo-docs/latest/ci/ruck_rover_primer.html)
* [Cockpit](http://dashboard-ci.tripleo.org/d/cockpit/cockpit?orgId=1)
    * [Internal Cockpit](http://tripleo-cockpit.usersys.redhat.com/?orgId=1)
* Status
    * http://cistatus.tripleo.org/
    * https://trello.com/b/j4IcIomh/production-chain-escalation
    * http://rhos-release.virt.bos.redhat.com:3030/rhosp
* [Debugging Tools](https://docs.google.com/document/d/1VZhje7ZN9sk4E31fYVrPxpqMJGz5ZhHRfhte_RYMXxg/edit#)
* [RDO project dashboard](https://review.rdoproject.org/grafana/?orgId=1&var-datasource=default&var-server=registry.rdoproject.org.rdocloud&var-inter=$__auto_interval_inter)
* [CentOS pre-release rpm updates for minor releases](http://mirror.centos.org/centos/7/cr/x86_64/Packages/)
* [Internal software factory](https://sf.hosted.upshift.rdu2.redhat.com)
* [Upstream rsync mirror logs](files.openstack.org/mirror/logs/rsync-mirrors/centos.log)
* [Trello retrospective](https://trello.com/b/0VFswmht/rdo-infra-retrospective?menu=filter&filter=label:UniSprint21)
* Internal Dashboard
    * [OSP16](https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP16/)
* [RHOS-INFRA Infrared issues](https://projects.engineering.redhat.com/issues/?filter=34183)
* [CIX escalation](https://mojo.redhat.com/docs/DOC-1098748#jive_content_id_CIX_Escalation_Automation_and_email_format)
* [CIX board](https://trello.com/b/j4IcIomh/production-chain-escalation)
* [Nodepool image logs](https://softwarefactory-project.io/nodepool-log/)
:::

# Sprint 23

## New / Transient / No bug yet:

@raukadah hey hey@w

* 11:12 < marios> weshay|ruck: fyi 11:09 < dpawlik> if this one will be merged, https://review.opendev.org/#/c/713177 jobs
will fail if they are running on f29/f28
* [OSP17][handover] jobs failing with "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ctlplane'" - https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP17/job/phase1-17_director-rhel-8.1-virthost-1cont_1comp_1ceph-ipv4-geneve-ceph/10/artifact/.sh/05-ooo-overcloud.log/*view*/ , any fixes to infrared should be submitted to gerrit with topic of 'osp17p1' (i.e.: ir_patches_topic in https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP17/job/phase1-17_director-rhel-8.1-virthost-1cont_1comp_1ceph-ipv4-geneve-ceph/10/parameters/)
* [OSP10][handover] phase2-10-rhel-7.7-openstack-all-in-one-neutron-rabbitmq failing on 'yum clean all' command, running again (to confirm) in https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/ReleaseDelivery/view/OSP10/job/phase2-10-rhel-7.7-openstack-all-in-one-neutron-rabbitmq/73/ - if it fails as well then it needs investigation, contact: migarcia
* [OSP13] PSI upshift quota exceeded - phase2-13-rhel-7.8-openstack-all-in-one-neutron-rabbitmq failing (I think vgriner was fixing this)
* **[OSP][handover] Work to get first osp16.1 jobs ongoing** - https://projects.engineering.redhat.com/browse/RHOSINFRA-3118 **found bug**: ~~osp16.1, rhel8.2: ipmitool commands via vbmc (virtualbmc) take too long and cause overcloud introspection to fail - https://bugzilla.redhat.com/show_bug.cgi?id=1813889~~
    * https://bugzilla.redhat.com/show_bug.cgi?id=1814616 - after the extended team troubleshot the issue on Mar 25th we found bogus nftables entries (which came in rhel8.2 only) blocking our dhcp/udp traffic, see the bugzilla for updates on the resolution
* **[OSP] All IPv6 jobs are broken.** - https://projects.engineering.redhat.com/browse/RHOSINFRA-3104 (reverted = resolved, they'll work on a better patch)
* **[OSP] osp16.0 jobs back working on the puddle from 2020-03-11**
* **OSP15 update job fail on No such
property: overcloud_container_images_urls** https://projects.engineering.redhat.com/browse/RHOSINFRA-3091 this would affect other osp versions as well - it's now fixed (as of ~9am GMT today)
* OSP16 - phase1 fails
    * https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/phase1-16_director-rhel-8.1-virthost-1cont_1comp_1ceph-ipv4-geneve-ceph/105/artifact/.sh/04-ooo-undercloud.log
    * ~~https://bugzilla.redhat.com/show_bug.cgi?id=1809998 duplicate~~
    * Main bug: https://bugzilla.redhat.com/1809939
    * Esca: https://trello.com/c/z0LYn4Rq

## Earlier unclosed things tracked here (first is new):

### OSP ISSUES

:::info
OSP ISSUES
:::

Thu 05 March 2020 R&R transfer:

OSP16
1) https://trello.com/c/z0LYn4Rq - main p1 blocker
2) ^ also forked into https://trello.com/c/Dkvbl5Kb - memcached container issue

OSP13
techdebt - nova-scheduler workaround - https://trello.com/c/pVkLagqH/1337-cixbz1803150ospphase2osp13nova-scheduler-hint-for-nova-scheduler-seems-to-be-ignored - fhubik will deal with that outside of R&R and sprint

Also
* Need to construct email to pntops tlv ( check w/ Amnon )
* Express a SLA to rhos-dev re: RFE: https://projects.engineering.redhat.com/browse/RHOSINFRA-3075?filter=34183
    * details from fhubik on that^ topic: https://trello.com/c/yBgjnUEn
* also we need to document the processes behind the PCCI_INFRA_SUPPORT queue (how long of inactivity before we can close/decr. priority etc.)

### TripleO ISSUES

:::info
TripleO ISSUES
:::

#### train container build failed on collectd ( 1 time )

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-rhel-8-train-containers-build-push-vexxhost/2b2c26c/logs/containers-failed-to-build.log

#### other

* :warning: https://bugs.launchpad.net/tripleo/+bug/1867744 Pacemaker cluster fails to start in CI jobs
* :warning: https://bugs.launchpad.net/tripleo/+bug/1868079 puppet-neutron container failed
* https://bugs.launchpad.net/tripleo/+bug/1867744 Mistral failed command manually execute the following script: /var/lib/mistral/overcloud/ansible-playbook-command.sh
* :warning: https://bugs.launchpad.net/tripleo/+bug/1867602 overcloud deploy failed with Systemd start for pcsd failed
* https://bugs.launchpad.net/tripleo/+bug/1867599 overcloud deploy failing on fs030 and fs016 while pulling mariadb container from undercloud registry - FIX IS HERE --> https://review.opendev.org/#/c/712013/6/config/general_config/featureset030.yml BLOCKED BY MOLECULE FAIL
* https://bugs.launchpad.net/tripleo/+bug/1866621 Can't run container mistral_db_sync
    > [name=Arx Cruz]2020-03-09 Adriano is taking a look ATM
    > [name=Arx Cruz]2020-03-09 According to Emilien this should be fixed by https://review.rdoproject.org/r/25780
    > [name=Arx Cruz]2020-03-16 I'm no longer seeing this happening, but the patch from Emilien isn't merged yet. Moving to done.
    > [name=Arx Cruz]2020-03-16 According to yatin, that's no longer happening because mistral is pinned, moving back to open issue
    > https://review.opendev.org/#/c/713873/ fix merged here testing with mistral unpinning
* https://bugs.launchpad.net/tripleo/+bug/1867323 Launchpad bug 1867323 in tripleo " standalone deploy failed at Error: error checking path "/run/libvirt": stat /run/libvirt: no such file or directory"
* https://bugs.launchpad.net/tripleo/+bug/1864953 (intermittent/race?)
Image prepare failed: [Errno 17] File exists
    * ~~https://review.opendev.org/#/c/710836/~~
    > [name=Arx Cruz]2020/03/09 This patch was abandoned, I'll poke around in the CIX meeting to check the status
* OSP
    * OSP 16 - https://bugzilla.redhat.com/show_bug.cgi?id=1807375#c10 verified
    * OSP10
        * Packstack jobs still broken https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/ReleaseDelivery/view/OSP10/job/phase2-10-rhel-7.7-openstack-all-in-one-neutron-rabbitmq/59/
        * problem between ci slaves and github fetch?
        * resolved itself, maybe temp network issue?

## Closed things tracked here:

### Tripleo Closed

:::spoiler TripleO issues
* https://bugs.launchpad.net/tripleo/+bug/1868439 All stable branches OVB job fail because "No module named 'keystoneauth1'"
* **RHEL vs CentOS 8 for Train**
    * <wes> FIXED merge conflict</wes>
    * (merge conflict on kolla patch https://review.opendev.org/#/c/693159/) try rhel8 container build without kolla patches posted https://review.opendev.org/711198 & testproject https://review.rdoproject.org/r/25663
    * nop fails cos no yum-config-manager https://review.opendev.org/#/c/693159/5/docker/base/Dockerfile.j2 https://logserver.rdoproject.org/63/25663/1/check/periodic-tripleo-rhel-8-train-containers-build-push/785736a/logs/buildah-builds/kolla-v4kkjxy6/docker/base/base-build.log
    * But really we want centos8 train.
we have the job but not the pipeline yet
    * train phase 1 green build yesterday problem reporting https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-train-current-tripleo/130/
* https://bugs.launchpad.net/tripleo/+bug/1865754 tripleo-ci-centos-8-scenario001-standalone tempest-conf fails 500 PUT http://192.168.24.1:9292/v2/images/ RADOS invalid argument
* https://bugs.launchpad.net/tripleo/+bug/1865574 centos-8 multinode and undercloud jobs are hanging on the undercloud install
* https://bugs.launchpad.net/tripleo/+bug/1866031 periodic centos7 fs2 upload/fs1 master fails overcloud deploy pcs "create constraint failed"
* https://bugs.launchpad.net/tripleo/+bug/1867332 Mistral tests getting skipped in undercloud deployment
    > [name=Arx Cruz]2020-03-13 akahat is working on this
* https://bugs.launchpad.net/tripleo/+bug/1867023 ImportError: cannot import name suppress
* https://bugs.launchpad.net/tripleo/+bug/1866543 All CentOS-8 jobs fail on missing various packages from component repos
    * https://review.rdoproject.org/r/#/c/25770/ Fix purge script
* https://bugs.launchpad.net/tripleo/+bug/1866687 tempest-conf error: Setting [volume-feature-enabled] multi_backend = True; TypeError: option values must be strings
    * https://review.opendev.org/#/c/711976/
* https://bugs.launchpad.net/tripleo/+bug/1866965 Duplicate declaration: Package[collectd-python] is already declared at puppet-collectd and puppet-tripleo
    * https://review.opendev.org/#/c/712289/
* **Gate fails: No route to host**
`
2020-03-05 02:55:49 | TASK [Gathering Facts] *********************************************************
2020-03-05 02:55:49 | Thursday 05 March 2020 02:55:49 +0000 (0:00:00.093) 0:00:07.537 ********
2020-03-05 02:55:53 | ok: [localhost]
2020-03-05 02:55:53 | ok: [192.168.24.14]
2020-03-05 02:56:20 | [WARNING]: Unhandled error in Python interpreter discovery for host
2020-03-05 02:56:20 | 192.168.24.20: Failed to connect to the host via ssh: ssh: connect to host
2020-03-05 02:56:20 | 192.168.24.20 port 22: No route to host
2020-03-05 02:56:35 | fatal: [192.168.24.20]: UNREACHABLE! => changed=false
2020-03-05 02:56:35 | msg: |-
2020-03-05 02:56:35 | Data could not be sent to remote host "192.168.24.20". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.20 port 22: No route to host
2020-03-05 02:56:35 | unreachable: true
2020-03-05 02:56:35 |
2020-03-05 02:56:35 | NO MORE HOSTS LEFT *************************************************************
`
    > [name=Arx Cruz]2020/03/06 - Wasn't able to confirm yet if this is a real issue because other failures are happening before we reach this step.
    > [name=Arx Cruz]2020/03/12 - Moving to done because jobs are running fine these days and we haven't seen this error.
* **Gate fails: RuntimeError: Ansible execution failed. playbook: /var/lib/mistral/overcloud/deploy_steps_playbook.yaml, Run Status: failed, Return Code: 2**
    * https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_265/710811/2/gate/tripleo-ci-centos-7-containers-multinode/2651e0b/logs/undercloud/home/zuul/overcloud_deploy.log
    * https://bugs.launchpad.net/tripleo/+bug/1866031 periodic centos7 fs2 upload/fs1 master fails overcloud deploy pcs "create constraint failed"
    > [name=Arx Cruz]2020/03/06 - Wasn't able to confirm yet if this is fixed because other failures are happening before we reach this step.
* https://bugs.launchpad.net/tripleo/+bug/1867035 AttributeError: 'MoleculeItem' object has no attribute 'funcargs' in molecule jobs
    * https://review.opendev.org/712527
    * https://review.opendev.org/712526
* :warning: https://bugs.launchpad.net/tripleo/+bug/1866316 ara < 1.0.0 is failing on overcloud-prep-image
    * https://review.opendev.org/#/c/711597/
    > [name=Arx Cruz]2020/03/06 - I don't know whether the best approach here is to update Ara or to get rid of pip2 and use only pip3 in the overcloud-prep-image role
* https://bugs.launchpad.net/tripleo/+bug/1866184 centos-8 multinode overcloud deploy failed with heat_resource_tree_params = heat_resource_tree['parameters'] KeyError: 'parameters'
    * https://review.opendev.org/711480
* Master fs2 upload issue seems legit:
    * ~~Error: /Stage[main]/Tripleo::Profile::Pacemaker::Ovn_dbs_bundle/Pacemaker::Constraint::Order[ip-192.168.24.12-with-ovndb_servers]/Pcmk_constraint[order-ovn-dbs-bundle-ip-192.168.24.12]/ensure: change from 'absent' to 'present' failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200303-9-1fr9x0o create constraint failed: Error: Resource 'ip-192.168.24.12' does not exist~~
    * ~~https://logserver.rdoproject.org/40/25640/1/check/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/f29d8ce/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz~~
    * ~~https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/e7bbed3/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz~~
    * filed https://bugs.launchpad.net/tripleo/+bug/1866031
* https://bugs.launchpad.net/tripleo/+bug/1865832 tox py3 virtualenv jobs failing ImportError cannot import name 'ContextManager'
:::

### OSP Closed

:::spoiler OSP issues
* shark13 finally resolved - https://projects.engineering.redhat.com/browse/RHOSINFRA-3019
:::

## UNPINNING NOTES / pipeline:

Directions for the unpinning event:

:::spoiler
Watch the
reviews in [1] for merge; there may be some packages that fail to build and do not move from current -> consistent. The RDO team will handle any of those issues. The new content will then enter the component pipeline [2]; known issues are tracked here [3]. Note that centos-8-scenario001 has been fixed w/ [4]. Please remember the component pipeline triggers the first component @ 12am UTC and the following 14 components 1.5 hours later. Use test-project jobs if you want to speed things up, but try not to starve nodepool.

* If you find a packaging or infra related issue, please alert the EOD for #rhos-ops
* If you find a new issue, please create a Launchpad bug, mark it a promotion blocker, and assign it to the DFG associated with the component ( this should be much more clear and easy to determine the owner )
* If you find a CI issue, please open a Launchpad bug and mark it alert.
* If you are not sure what you have found, leave notes in the ruck/rover hackmd [5]

To find which packages are building or have FTBFS:
https://review.rdoproject.org/r/#/q/topic:rdo-FTBFS
https://trunk.rdoproject.org/centos8-master/queue.html

I'm really proud of your effort to get us here so quickly. Thank you!!

[1] https://review.rdoproject.org/r/#/c/25612/ https://review.rdoproject.org/r/#/c/25727/
[2] http://dashboard-ci.tripleo.org/d/UDA4H3aZk/component-pipeline?orgId=1
[3] https://hackmd.io/HrQd03c9SxOMtFPFrq50tg#Failing-component-pipeline-tests
[4] https://review.opendev.org/#/c/712289/
[5] https://hackmd.io/7MBqFHurTA2e5H8kYRwgag#UNPINNING-NOTES--Component-pipeline
:::

:::info
Leave notes here on what you find from the recent unpinning
:::

### Divide and conquer

* Marios + Chandan EURO time
* Ronelle + Wes NA time
* Arx POC for tempest :)

#### openstack-periodic-master:

Failing tests:
* fs001
* fs020
* fs035
* scenario010-ovn - no plan for this to be in criteria now

#### components

* fs1 this is fuzzy atm - periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master
    * undercloud install fail https://logserver.rdoproject.org/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master/1417807/logs/undercloud/home/zuul/install-undercloud.log.txt.gz
        * unable to create route on br-ex; possibly only due to ens3 in network-scripts
    * ironic issues
        * cleaning
        * need to hold nodes and debug and discover
    * tempest issues
        * failing network-basic-ops
        * ~~https://bugs.launchpad.net/tripleo/+bug/1867807 periodic- centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master fails to download tempest cirros image~~
* **https://bugs.launchpad.net/tripleo/+bug/1867945 periodic centos8 standalone-full-tempest-tempest-master timeout**
    * tempest (?)
related issues again 18/03 & no logs :/
        * (compute) https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-tempest-master/de4fa04/job-output.txt
        * (tempest) https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-compute-master/d55ae94/job-output.txt

#### integration pipeline failures/bugs

* https://bugs.launchpad.net/tripleo/+bug/1867664 Master periodic jobs are failing overcloud deploy with ''Container(s) with bad ExitCode: [''container-puppet-neutron''], check logs in /var/log/containers/stdouts/'
* OVB is stuck on providing nodes https://bugs.launchpad.net/tripleo/+bug/1866204
* overcloud deploy failing on fs030 and fs016 while pulling mariadb container from undercloud registry
    * https://bugs.launchpad.net/tripleo/+bug/1867599
* overcloud deploy failed due to Systemd start for pcsd failed
    * https://bugs.launchpad.net/tripleo/+bug/1867602

#### Reviews still in play to add/move jobs

* Scenario 12
    * ~~[rdo-jobs] https://review.rdoproject.org/r/25910~~
    * ~~[config] https://review.rdoproject.org/r/25911~~
    * ~~[ci-config] https://review.rdoproject.org/r/25912~~
* f30 centos8
    * ~~[rdo-jobs] https://review.rdoproject.org/r/25921~~
    * ~~[config] https://review.rdoproject.org/r/25922~~
    * ~~[ci-config] https://review.rdoproject.org/r/25923~~
* podman c8
    * ~~[rdo-jobs] https://review.rdoproject.org/r/25916~~
    * ~~[config] https://review.rdoproject.org/r/25920~~
* scenario010-ovn-provider-standalone check @ https://review.opendev.org/711507 , ~~periodic @ https://review.rdoproject.org/r/25745, layout @ https://review.rdoproject.org/r/25746 , criteria @https://review.rdoproject.org/r/25747~~, test @ https://review.rdoproject.org/r/25712
* fs039 ### fs39 tracked there now https://tree.taiga.io/project/tripleo-ci-board/task/1604
    * ~~definitions@ https://review.rdoproject.org/r/25793~~
    * ~~layout@
https://review.rdoproject.org/r/25794~~
    * ~~criteria@ https://review.rdoproject.org/r/25795~~
    * ~~fix tripleo-inventory https://review.opendev.org/712962~~
    * ~~fix https://review.rdoproject.org/r/25932 config master: Set right centos image for ovb-manage rdo baremetal_image~~
    * fix https://review.opendev.org/713659 tripleo-quickstart-extras master: Update freeipa-setup role for centos-8 - ansible_pkg_mgr and packages
    * fix https://review.opendev.org/714065 tripleo-quickstart-extras master: Use correct novajoin package for centos8 featureset 39 freeipa
    * fix https://review.opendev.org/715397 openstack/tripleo-quickstart master: Move featureset039 to os-tempest conditionally adds required vars
    * ~~fix https://review.rdoproject.org/r/26036 rdo-jobs master: WIP Adds use_os_tempest for centos8 fs39 check and periodic~~ **No instead do /715397**
    * fix ... **not needed** don't build images do r/#/c/26127 instead ~~https://review.opendev.org/714627 tripleo-quickstart-extras master: Fix build-images for centos8 jobs set tripleoclient and image-yaml~~
    * fix ~~https://review.rdoproject.org/r/#/c/26127/ Set to_build false for periodic centos8 ovb featureset 039~~
    * **test**@ https://review.rdoproject.org/r/25796
* Don't run c7 upgrades on master
    * https://review.opendev.org/713277 Only run upgrades centos7 jobs on stable branches
* Reproducer
    * ~~Update get-dlrn-hash-newest to work with aggregate hashes https://review.opendev.org/713333~~
* Added jobs for distgit testing - https://review.rdoproject.org/r/25659
* Baremetal (downstream) reviews
    * ~~https://code.engineering.redhat.com/gerrit/194899 Add upstream-centos-8 node~~
    * ~~https://code.engineering.redhat.com/gerrit/#/c/194604/ Add baremetal base job for centos-8~~
    * ~~https://code.engineering.redhat.com/gerrit/194901 Add centos-8 baremetal nodeset~~
    * ~~https://softwarefactory-project.io/r/#/c/17819/ bump version to 2.34.0~~
    * https://review.rdoproject.org/r/25995 Add get-hash role to dlrn-vars-setup.yml
    *
      https://code.engineering.redhat.com/gerrit/195040 DNM - Testing bm centos 8 master jobs
        * ~~`"msg": "Failed to find required executable virtualenv in paths: /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:/usr/local/sbin"`~~
        * ~~issue with upstream-centos-8 node ... need to chat with nhicher~~
    * https://code.engineering.redhat.com/gerrit/195145 Adjust envA settings for centos 8
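
For the struck-through `Failed to find required executable virtualenv` error in the baremetal reviews above, a quick check on a node can show whether the executable is reachable in the same directories. This is a minimal Python sketch under stated assumptions, not the fix that was applied in those reviews: `find_executable` and `SEARCH_PATHS` are hypothetical names, and the directory list is copied from the error message.

```python
import os
import shutil

# Directories copied from the error message in the notes above.
SEARCH_PATHS = [
    "/sbin", "/bin", "/usr/sbin", "/usr/bin",
    "/usr/local/bin/", "/usr/local/sbin",
]


def find_executable(name, paths=SEARCH_PATHS):
    """Return the full path of `name` if it is executable in `paths`, else None."""
    # shutil.which accepts a PATH-style string, so join the directories
    # with the platform separator (":" on Linux).
    return shutil.which(name, path=os.pathsep.join(paths))


if __name__ == "__main__":
    location = find_executable("virtualenv")
    if location is None:
        print("virtualenv not found in the searched directories")
    else:
        print(f"virtualenv found at {location}")
```

Running this on the affected node only reports where (or whether) `virtualenv` resolves; what to install or adjust on the image is left to the reviews listed above.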