# Ruck Rover 2022-06-24 to 2022-06-30

###### tags: `ruck_rover`
###### Previous RR notes: https://hackmd.io/OkRqfQ0SRUWwllxJthdO3Q
###### Next RR notes: https://hackmd.io/pRC9TDaoQLWqRSQioTUl5g

[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)
[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)
[OpenStack Program Meeting 2022](https://docs.google.com/document/d/1n6ArkMh68R9zivjlyGbpedkggk1wMwEIcrMZSN2uIjc/edit)

---

## 2022-06-30

### Reruns and Investigations:

(crossed out := known bugs recorded under Known Bugs)

**NOTE:** Watch for running `testproject` jobs on https://review.rdoproject.org/zuul/status and https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/status.

* [C9 Master](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/2tivP9BWz/component-pipeline?orgId=1):
* [C9 Wallaby](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/UcMR8py7z/component-pipeline-wallaby-centos9?orgId=1):
* [C8 Wallaby](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/ZLvFbT9Mz/component-pipeline-wallaby?orgId=1):
* [C8 Train](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/aJeQpzVGz/component-pipeline-train?orgId=1):
* [RHEL9 OSP17](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/lF7RUpsnk/rhel9-rhos17-full-component-pipeline?orgId=1):
* [RHEL8 OSP17](http://tripleo-cockpit.usersys.redhat.com/d/v8gltz4Mz/rhos-17-full-component-pipeline?orgId=1):
* [RHEL8 OSP16.2](http://tripleo-cockpit.usersys.redhat.com/d/KyHCwLHMk/rhos-16-2-full-component-pipeline?orgId=1):
    * ~~https://code.engineering.redhat.com/gerrit/c/testproject/+/417258/2~~
        * component octavia
        * [periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2](https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2&skip=0) fails on tempest very often, only sporadic passes

### Known Bugs

* ~~https://bugs.launchpad.net/tripleo/+bug/1980320 - scenario001 failing~~
    * ~~fix merged: https://review.opendev.org/c/openstack/tripleo-ansible/+/848216~~
* `Error overcloud network provision failed`
    * `TASK [tripleo.operator.tripleo_overcloud_network_provision : overcloud network provision` fails in `periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master`
    * https://bugs.launchpad.net/tripleo/+bug/1980333
    * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master&skip=0
    * [passed once](https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/c2a9501/job-output.txt) and [failed once](https://logserver.rdoproject.org/37/41437/15/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master/053d715/job-output.txt) with commit_hash `e9a479519f0b7eb9b6aa86811ad992f9d5626111` and distro_hash `21f87ec3c18ca01bd0681ad8c14578a6ff52f012`
* `tripleo-ci-centos-9-standalone` and `tripleo-ci-centos-9-standalone-on-multinode-ipa` are failing the `test_minimum_basic_instance_hard_reboot_after_vol_snap_deletion` test
    * https://bugs.launchpad.net/tripleo/+bug/1980255
    * Failed to find floating IP
    * https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-standalone-on-multinode-ipa
    * Skiplist entry created: https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/848167
* ovn-dbs-bundle fails to start because ovn-ctl crashes and generates a coredump
    * https://bugs.launchpad.net/tripleo/+bug/1979276
    * puppet-glance-tripleo-standalone job failing
    * This is also affecting other jobs
    * Fix is being prepared as of 2022-06-24: https://bugzilla.redhat.com/show_bug.cgi?id=2100393
    * No TripleO CI job affected so far as of 2022-06-29
    * Fixes still work in progress as of 2022-06-30
* ~~`FATAL | Add hosts with mon label to ceph_mon inventory group for next play | undercloud | error={"msg": "Invalid data passed to 'loop', it requires a list, got this instead: [] + [ 'standalone.localdomain' ]. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup."}`~~
    * Job `tripleo-ci-centos-9-scenario010-standalone` fails with ansible error in ceph tasks
    * https://bugs.launchpad.net/tripleo/+bug/1980185
    * https://review.opendev.org/c/openstack/tripleo-ansible/+/848075 merged
    * https://review.opendev.org/c/openstack/tripleo-ansible/+/848111 - wallaby backport +w'd
    * master and wallaby both merged

### Intermittent Failures

* `tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.151 via SSH timed out.`
    * [Job `periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby` (but also master jobs) fails with the above error message in Tempest because of a kernel panic](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby/6200d3f/logs/undercloud/var/log/tempest/stestr_results.html.gz):
      > Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
* (fs001 and fs035 OVB jobs failing tempest with identity/haproxy connection errors)
    * https://bugs.launchpad.net/bugs/1971465
    * Intermittent failure
    * Track the health of fs01 and fs035
* Job `periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2` fails on tempest often, but has sporadic passes
    * https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2

### Fixed

* `neutron AttributeError: module 'pr2modules.netlink.exceptions' has no attribute 'NetlinkDumpInterrupted'`
    * https://bugs.launchpad.net/tripleo/+bug/1979646
    * fs01 network wallaby component failing on neutron/dhcp-agent
    * reported 2022-06-23, need neutron/hardprov team involvement to resolve this
    * Yatin fixed it on 2022-06-24: https://review.rdoproject.org/r/q/topic:bug%252F1979646
    * testing after wallaby c8 promotion: https://review.rdoproject.org/r/c/testproject/+/36254/125
    * Wallaby C8/C9 passed on 2022-06-29:
        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-tripleo-wallaby
        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-network-wallaby
* OVB stack create failures ...
    * https://bugs.launchpad.net/tripleo/+bug/1980343
    * `Unable to establish connection to https://rdo.vexxhost.ca:5000/v3/auth/tokens:`
        * [example log](https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-train/3c1b505/job-output.txt)
    * see conversation on #vexxhost
    * tried: https://review.rdoproject.org/r/c/config/+/43679
        * the right change is to use `https://identity.rdo.mtl2.vexxhost.net/v3`
        * contact `guilhermesp` on #vexxhost if the failure persists
    * New fix: https://review.rdoproject.org/r/c/config/+/43805 (According to Marios, `Depends-On` does not work with the config repo, so it has to be tested live.) - merged
        * Testproject: https://review.rdoproject.org/r/c/testproject/+/43806
* `ERROR: resources.StandaloneServiceChain<file:///home/zuul/tripleo-deploy/tripleo-heat-installer-templates/common/services/standalone-role.yaml>: The Resource Type (OS::TripleO::Services::NeutronCorePlugin) could not be found.`
    * https://bugs.launchpad.net/tripleo/+bug/1980202
    * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master
    * [periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master](https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master/fe1ad79/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz)
    * Fix: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/848136 - merged on 2022-06-30
    * Waiting for clean run as of 2022-06-30
    * Clean runs: https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario012-standalone

---

## 2022-06-29

## Downstream components

### 17.0 on 9

#### Octavia

* periodic-tripleo-ci-rhel-9-scenario010-standalone-octavia-rhos-17 - https://bugzilla.redhat.com/show_bug.cgi?id=2109491

#### Tripleo

* periodic-tripleo-ci-rhel-9-standalone-on-multinode-ipa-tripleo-rhos-17 - Parse container param failed
* periodic-tripleo-ci-rhel-9-scenario004-standalone-tripleo-rhos-17
* periodic-tripleo-ci-rhel-9-standalone-tripleo-rhos-17
* periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset001-internal-tripleo-rhos-

### 16.2 on 8

#### Common

* periodic-tripleo-ci-rhel-8-scenario001-standalone-common-rhos-16.2 and periodic-tripleo-ci-rhel-8-scenario002-standalone-common-rhos-16.2
    * Cannot find config file: /etc/puppet/hiera.yaml ([scenario001 log](https://sf.hosted.upshift.rdu2.redhat.com/logs/71/421971/1/check/periodic-tripleo-ci-rhel-8-scenario001-standalone-common-rhos-16.2/73bbee6/logs/undercloud/home/zuul/standalone_deploy.log), [scenario002 log](https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-common/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-scenario002-standalone-common-rhos-16.2/83e81ca/logs/undercloud/home/zuul/standalone_deploy.log))
* periodic-tripleo-ci-rhel-8-scenario004-standalone-common-rhos-16.2
    * rechecked because the last recheck didn't have logs: https://code.engineering.redhat.com/gerrit/c/testproject/+/422009
* periodic-tripleo-ci-rhel-8-standalone-common-rhos-16.2 -
    * `Error: 'rabbitmqctl eval \"lists:keymember(rabbit, 1, application:which_applications()).\" | grep -q true' returned 1 instead of one of [0]`
    * `Error: /Stage[main]/Tripleo::Profile::Pacemaker::Rabbitmq_bundle/Exec[rabbitmq-ready]/returns: change from 'notrun' to ['0'] failed: 'rabbitmqctl eval \"lists:keymember(rabbit, 1, application:which_applications()).\" | grep -q true'`

## 2022-06-29

Gate is green.

centos-9-scenario010-kvm-internal-standalone is failing in octavia tempest tests.

### master c9

rerun: https://review.rdoproject.org/r/c/testproject/+/44334

### wallaby c9

## Components:

### common on master and wallaby c9

https://bugs.launchpad.net/tripleo/+bug/1982744/ - workaround is to run `sudo dnf reinstall openstack-selinux container-selinux` and then `sudo rpm -V openstack-selinux`

---

## 2022-06-29

## Failures that need attention:

* OVB stack create failures ...
    * started this afternoon
    * see conversation on #vexxhost
    * tried: https://review.rdoproject.org/r/c/config/+/43679
        * "https://identity.rdo.mtl2.vexxhost.net/v3/" is the right change - can resubmit with that change
    * pls touch base with guilhermesp on #vexxhost if the failure is still there in your morning
* Gate failures:
    * https://bugs.launchpad.net/tripleo/+bug/1980202 - pls get https://review.opendev.org/c/openstack/tripleo-heat-templates/+/848136 merged
    * https://bugs.launchpad.net/tripleo/+bug/1980255 - skiplist entry created - pls follow up on that bug

### Reruns and Investigations:

(crossed out := known bugs recorded under Known Bugs)

* [C9 Master](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/2tivP9BWz/component-pipeline?orgId=1):
    * ~~https://review.rdoproject.org/r/c/testproject/+/41437/14~~ https://review.rdoproject.org/r/c/testproject/+/41437/15
        * network and tripleo
        * rerun because of intermittent failures (tempest)
* [C9 Wallaby](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/UcMR8py7z/component-pipeline-wallaby-centos9?orgId=1):
    * ~~https://review.rdoproject.org/r/c/testproject/+/41437/11~~
        * Component rerun
    * ~~https://review.rdoproject.org/r/c/testproject/+/41437/12~~
        * fs002 wallaby rerun - passed
    * ~~https://review.rdoproject.org/r/c/testproject/+/41469/8~~
        * Wallaby fs39 rerun: `POST_FAILURE`
    * ~~https://review.rdoproject.org/r/c/testproject/+/41278/14~~
        * tripleo component jobs rerun
        * `periodic-tripleo-ci-centos-9-standalone-on-multinode-ipa-tripleo-master` failed
    * ~~https://review.rdoproject.org/r/c/testproject/+/36254/128~~
        * CentOS-9 master network component jobs
        * `periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master` failed
    * https://review.rdoproject.org/r/c/testproject/+/42434/8
        * `periodic-tripleo-ci-centos-9-ovb-1ctlr_2comp-featureset020-wallaby`
        * `periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp_1supp-featureset039-wallaby`
        * `periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-wallaby`
        * `periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-wallaby`
* [C8 Wallaby](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/ZLvFbT9Mz/component-pipeline-wallaby?orgId=1):
    * ~~https://review.rdoproject.org/r/c/testproject/+/39960/83~~
        * CentOS-8 master network component jobs
        * `periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-network-wallaby` failed
    * https://review.rdoproject.org/r/c/testproject/+/41465/6
        * network component on wallaby c8
        * Recheck'ed because of potential intermittent failure:
          > tripleoclient.exceptions.HeatPodMessageQueueException: Message queue for ephemeral heat not created in time
* [C8 Train](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/aJeQpzVGz/component-pipeline-train?orgId=1):
* [RHEL9 OSP17](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/lF7RUpsnk/rhel9-rhos17-full-component-pipeline?orgId=1):
    * ~~https://code.engineering.redhat.com/gerrit/c/testproject/+/417259~~ - passed
        * waiting for https://bugzilla.redhat.com/show_bug.cgi?id=2101803 to be solved, then rerun
    * ~~https://code.engineering.redhat.com/gerrit/c/testproject/+/211643/150~~
        * `periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset001-internal-baremetal-rhos-17` - passed
* [RHEL8 OSP17](http://tripleo-cockpit.usersys.redhat.com/d/v8gltz4Mz/rhos-17-full-component-pipeline?orgId=1):
* [RHEL8 OSP16.2](http://tripleo-cockpit.usersys.redhat.com/d/KyHCwLHMk/rhos-16-2-full-component-pipeline?orgId=1):
    * https://code.engineering.redhat.com/gerrit/c/testproject/+/417258/2
        * component octavia
        * [periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2](https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2&skip=0) fails on tempest very often, only sporadic passes

### Known Bugs

* `Error overcloud network provision failed`
    * https://bugs.launchpad.net/tripleo/+bug/1980333
    * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master&skip=0
* `ERROR: resources.StandaloneServiceChain<file:///home/zuul/tripleo-deploy/tripleo-heat-installer-templates/common/services/standalone-role.yaml>: The Resource Type (OS::TripleO::Services::NeutronCorePlugin) could not be found.`
    * https://bugs.launchpad.net/tripleo/+bug/1980202
    * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master
    * [periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master](https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master/fe1ad79/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz)
* `FATAL | Add hosts with mon label to ceph_mon inventory group for next play | undercloud | error={"msg": "Invalid data passed to 'loop', it requires a list, got this instead: [] + [ 'standalone.localdomain' ]. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup."}`
    * Job `tripleo-ci-centos-9-scenario010-standalone` fails with ansible error in ceph tasks
    * https://bugs.launchpad.net/tripleo/+bug/1980185
    * https://review.opendev.org/c/openstack/tripleo-ansible/+/848075 merged
    * https://review.opendev.org/c/openstack/tripleo-ansible/+/848111 - wallaby backport +w'd
* `neutron AttributeError: module 'pr2modules.netlink.exceptions' has no attribute 'NetlinkDumpInterrupted'`
    * https://bugs.launchpad.net/tripleo/+bug/1979646
    * fs01 network wallaby component failing on neutron/dhcp-agent
    * reported 2022-06-23, need neutron/hardprov team involvement to resolve this
    * Yatin fixed it on 2022-06-24: https://review.rdoproject.org/r/q/topic:bug%252F1979646
    * testing after wallaby c8 promotion: https://review.rdoproject.org/r/c/testproject/+/36254/125
* ovn-dbs-bundle fails to start because ovn-ctl crashes and generates a coredump
    * https://bugs.launchpad.net/tripleo/+bug/1979276
    * puppet-glance-tripleo-standalone job failing
    * This is also affecting other jobs
    * Fix is being prepared as of 2022-06-24: https://bugzilla.redhat.com/show_bug.cgi?id=2100393
    * No TripleO CI job affected so far as of 2022-06-29
    * Fixes still work in progress as of 2022-06-29

### Intermittent Failures

* `tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.151 via SSH timed out.`
    * [Job `periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby` (but also master jobs) fails with the above error message in Tempest because of a kernel panic](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby/6200d3f/logs/undercloud/var/log/tempest/stestr_results.html.gz):
      > Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
* (fs001 and fs035 OVB jobs failing tempest with identity/haproxy connection errors)
    * https://bugs.launchpad.net/bugs/1971465
    * Intermittent failure
    * Track the health of fs01 and fs035
* Job `periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2` fails on tempest often, but has sporadic passes
    * https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-scenario010-standalone-octavia-rhos-16.2

### Fixed

* `Problem: cannot install the best update candidate for package libxml2-devel-2.9.13-1.el9.x86_64 ... nothing provides libxml2(x86-64)`
    * https://bugzilla.redhat.com/show_bug.cgi?id=2101803
    * > [28.06.22 14:34] <jschlueter> https://errata.engineering.redhat.com/advisory/95678 is currently in push
* `puppet-user: Error: Evaluation Error: Operator '[]' is not applicable to an Undef Value. (file: /etc/puppet/modules/openstacklib/manifests/defaults.pp, line: 9, column: 7) on node undercloud.localdomain`
    * https://bugs.launchpad.net/tripleo/+bug/1979985
    * ~~[1979986](https://bugs.launchpad.net/tripleo/+bug/1979986) is a duplicate~~
    * Rerun: https://review.rdoproject.org/r/c/testproject/+/39960/82
    * https://code.engineering.redhat.com/gerrit/c/testproject/+/417253
    * https://review.rdoproject.org/r/c/testproject/+/43770
* `paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 192.168.24.199`
    * standalone network wallaby failing on network tempest tests
    * https://bugs.launchpad.net/tripleo/+bug/1979665
    * Patch below affected by this bug
    * Saw it on tripleo-ci-centos-9-standalone
    * recheck'd on 2022-06-23 (known intermittent issue)
    * [job periodic-tripleo-ci-centos-8-standalone-network-wallaby still failing](https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-network-wallaby/50057b2/logs/undercloud/var/log/tempest/stestr_results.html.gz)
* `tripleo_cephadm_ceph_cli is undefined`
    * https://bugs.launchpad.net/tripleo/+bug/1979651
    * Fix: https://review.opendev.org/q/I137e335abeedccad801cdc03feee654c3e42a0e2
    * [sc001/sc004 pass for master TripleO](https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847357)
    * [sc001/sc004 pass for wallaby TripleO](https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847329)
    * [sc001/sc004 pass for master periodic](https://review.rdoproject.org/r/c/testproject/+/36256)
    * sc001 fails (sc004 passes) for wallaby periodic
        * Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade
        * https://review.rdoproject.org/r/c/testproject/+/37973
        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario001-standalone-wallaby
        * Waiting for containers to be rebuilt and pushed as of 2022-06-28

---

## 2022-06-28

### Promotions (very nicely done rr's :) )

* master
* wallaby c9
* train
* 16.2
* 17 on rhel-8
* 17 on rhel-9
* all the wallaby c9 components

Still to promote:

* master - network and tripleo (pls try to promote these)
* rhos-17 on rhel-9 - baremetal (last rerun: https://code.engineering.redhat.com/gerrit/c/testproject/+/211643)

who wants to present on program call ...
rlandy is updating the info as of EoD

### Reruns and Investigations:

(crossed out := known bugs recorded under Known Bugs)

* C9 Wallaby:
    * https://review.rdoproject.org/r/c/testproject/+/41437/11 Component rerun
    * https://review.rdoproject.org/r/c/testproject/+/41437/12 fs002 wallaby rerun
    * https://review.rdoproject.org/r/c/testproject/+/41469/8 Wallaby fs39 rerun
    * https://review.rdoproject.org/r/c/testproject/+/41278: tripleo component jobs rerun
    * https://review.rdoproject.org/r/c/testproject/+/39960: CentOS-9 master network component jobs
    * https://review.rdoproject.org/r/c/testproject/+/36254: CentOS-9 wallaby network component jobs
* C8 Train:
    * ~~c8 train fs39, c8 ovb fs35~~
        - periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-train
        - periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-train
        - ~~https://review.rdoproject.org/r/c/testproject/+/36254~~ - passed
    * ~~internal kvm job https://code.engineering.redhat.com/gerrit/c/testproject/+/417256~~ - passed
* RHEL9 OSP17:
    * https://code.engineering.redhat.com/gerrit/c/testproject/+/417259 - waiting for https://bugzilla.redhat.com/show_bug.cgi?id=2101803 to be solved, then rerun
* RHEL8 OSP17:
    * ~~ovb fs1, ovb fs35~~
        * periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-rhos-17
        * periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset035-internal-rhos-17
        * ~~https://code.engineering.redhat.com/gerrit/c/testproject/+/398758~~ - passed
* RHEL8 OSP16.2:
    * ~~https://code.engineering.redhat.com/gerrit/c/testproject/+/417258~~ - passed

### Promotion Blockers

### Known Bugs

* `puppet-user: Error: Evaluation Error: Operator '[]' is not applicable to an Undef Value. (file: /etc/puppet/modules/openstacklib/manifests/defaults.pp, line: 9, column: 7) on node undercloud.localdomain`
    * https://bugs.launchpad.net/tripleo/+bug/1979985
    * ~~[1979986](https://bugs.launchpad.net/tripleo/+bug/1979986) is a duplicate~~
    * Rerun: https://review.rdoproject.org/r/c/testproject/+/39960/82
    * https://code.engineering.redhat.com/gerrit/c/testproject/+/417253
    * https://review.rdoproject.org/r/c/testproject/+/43770
* `neutron AttributeError: module 'pr2modules.netlink.exceptions' has no attribute 'NetlinkDumpInterrupted'`
    * https://bugs.launchpad.net/tripleo/+bug/1979646
    * fs01 network wallaby component failing on neutron/dhcp-agent
    * reported 2022-06-23, need neutron/hardprov team involvement to resolve this
    * Yatin fixed it on 2022-06-24: https://review.rdoproject.org/r/q/topic:bug%252F1979646
    * testing after wallaby c8 promotion: https://review.rdoproject.org/r/c/testproject/+/36254/125
* `paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 192.168.24.199`
    * standalone network wallaby failing on network tempest tests
    * https://bugs.launchpad.net/tripleo/+bug/1979665
    * Patch below affected by this bug
    * Saw it on tripleo-ci-centos-9-standalone
    * recheck'd on 2022-06-23 (known intermittent issue)
    * [job periodic-tripleo-ci-centos-8-standalone-network-wallaby still failing](https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-network-wallaby/50057b2/logs/undercloud/var/log/tempest/stestr_results.html.gz)
* ~~FAILED: adding user 'ceph-admin', exit code: 9~~
    * https://bugs.launchpad.net/tripleo/+bug/1979093
    * https://review.opendev.org/c/openstack/python-tripleoclient/+/847844 - merged
* `tripleo_cephadm_ceph_cli is undefined`
    * https://bugs.launchpad.net/tripleo/+bug/1979651
    * Fix: https://review.opendev.org/q/I137e335abeedccad801cdc03feee654c3e42a0e2
    * [sc001/sc004 pass for master TripleO](https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847357)
    * [sc001/sc004 pass for wallaby TripleO](https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847329)
    * [sc001/sc004 pass for master periodic](https://review.rdoproject.org/r/c/testproject/+/36256)
    * sc001 fails (sc004 passes) for wallaby periodic
        * Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade
        * https://review.rdoproject.org/r/c/testproject/+/37973
        * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario001-standalone-wallaby&skip=0
        * Waiting for containers to be rebuilt and pushed as of 2022-06-28
* ovn-dbs-bundle fails to start because ovn-ctl crashes and generates a coredump
    * https://bugs.launchpad.net/tripleo/+bug/1979276
    * puppet-glance-tripleo-standalone job failing
    * This is also affecting other jobs
    * Fix is being prepared as of 2022-06-24: https://bugzilla.redhat.com/show_bug.cgi?id=2100393
    * No TripleO CI job affected so far as of 2022-06-24?!?
    * Fixes still work in progress as of 2022-06-27
* `Problem: cannot install the best update candidate for package libxml2-devel-2.9.13-1.el9.x86_64 ... nothing provides libxml2(x86-64)`
    * https://bugzilla.redhat.com/show_bug.cgi?id=2101803
    * > [28.06.22 14:34] <jschlueter> https://errata.engineering.redhat.com/advisory/95678 is currently in push
* `ERROR: resources.StandaloneServiceChain<file:///home/zuul/tripleo-deploy/tripleo-heat-installer-templates/common/services/standalone-role.yaml>: The Resource Type (OS::TripleO::Services::NeutronCorePlugin) could not be found.`
    * https://bugs.launchpad.net/tripleo/+bug/1980202
    * https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master
    * [periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master](https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario012-standalone-compute-master/fe1ad79/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz)

### Intermittent Failures

* `tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.151 via SSH timed out.`
    * [Job `periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby` (but also master jobs) fails with the above error message in Tempest because of a kernel panic](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-full-tempest-scenario-wallaby/6200d3f/logs/undercloud/var/log/tempest/stestr_results.html.gz):
      > Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option.
* (fs001 and fs035 OVB jobs failing tempest with identity/haproxy connection errors)
    * https://bugs.launchpad.net/bugs/1971465
    * Intermittent failure
    * Track the health of fs01 and fs035

### Fixed

* `error: failed to connect to the hypervisor\nerror: Cannot create user runtime directory '/run/user/1001/libvirt': Permission denied`
    * https://bugzilla.redhat.com/show_bug.cgi?id=2101413
    * Rerun: https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/status#211643
    * [Job `periodic-tripleo-ci-rhel-8-bm_envD-3ctlr_1comp-featureset035-rhos-16.2` passed today](https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/build/22b62419bc9c444d8841436811ae36e4)
* `tripleo-ci-centos-9-scenario000-multinode-oooq-container-updates fails with file not found: /home/zuul//tripleo-ansible-inventory.yaml`
    * https://bugs.launchpad.net/tripleo/+bug/1979707
    * Rabi opened [LP 1979707](https://bugs.launchpad.net/tripleo/+bug/1979707)
    * Rabi proposed fix for master: https://review.opendev.org/c/openstack/tripleo-upgrade/+/847519 - merged
    * Rabi's Wallaby backport is underway: https://review.opendev.org/c/openstack/tripleo-upgrade/+/847680 - now merged
    * Job `tripleo-ci-centos-9-scenario000-multinode-oooq-container-updates` passes now: https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario000-multinode-oooq-container-updates&skip=0

---

## 2022-06-27

### Patch status

* The following are merged:
    * https://review.opendev.org/c/openstack/tripleo-upgrade/+/847680/
    * https://review.opendev.org/c/openstack/tripleo-common/+/847437
* Main task - promote wallaby and clear component lines
    * https://review.rdoproject.org/r/c/testproject/+/36254 (rerun of failing train - plus will need a rerun on the internal kvm job)
    * https://code.engineering.redhat.com/gerrit/c/testproject/+/398758 - rerun of failing rhos 17 on rhel-8
* Try to promote 16.2 and 17

### Promotion Blockers

* Amol - you need ~~https://review.opendev.org/c/openstack/tripleo-upgrade/+/847680/~~ and https://review.opendev.org/c/openstack/tripleo-common/+/847437 to clean up wallaby (8474 keeps failing standalone). After you fix the promoter and make sure master promotes - ensure these two merge - contact fmount if you need help. We need these merged ASAP.

### Known Bugs

* `puppet-user: Error: Evaluation Error: Operator '[]' is not applicable to an Undef Value. (file: /etc/puppet/modules/openstacklib/manifests/defaults.pp, line: 9, column: 7) on node undercloud.localdomain`
    * https://bugs.launchpad.net/tripleo/+bug/1979985
    * ~~[1979986](https://bugs.launchpad.net/tripleo/+bug/1979986) is a duplicate~~
* `tripleo-ci-centos-9-scenario000-multinode-oooq-container-updates fails with file not found: /home/zuul//tripleo-ansible-inventory.yaml`
    * https://bugs.launchpad.net/tripleo/+bug/1979707
    * Rabi opened [LP 1979707](https://bugs.launchpad.net/tripleo/+bug/1979707)
    * Rabi proposed fix for master: https://review.opendev.org/c/openstack/tripleo-upgrade/+/847519 - merged
    * Rabi's Wallaby backport is underway: https://review.opendev.org/c/openstack/tripleo-upgrade/+/847680 - now merged
* `tripleo_cephadm_ceph_cli is undefined` [Fixed]
    * https://bugs.launchpad.net/tripleo/+bug/1979651
    * fixed by https://review.opendev.org/q/I137e335abeedccad801cdc03feee654c3e42a0e2
        * failed tripleo-ci-centos-9-standalone because of https://bugs.launchpad.net/tripleo/+bug/1979665
        * tested in master TripleO https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847357
            * sc001: Success
            * sc004: Success
        * tested in wallaby TripleO https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847329
            * sc001: Failure (Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade)
            * sc004: Success
        * tested in wallaby periodic https://review.rdoproject.org/r/c/testproject/+/37973
            * sc001: Failure (Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade)
            * sc004: Success
        * tested in master periodic https://review.rdoproject.org/r/c/testproject/+/36256
            * sc001: Success
            * sc004: Success
    * __Updates__:
        * Master patch (847323) in **gate**
* `neutron AttributeError: module 'pr2modules.netlink.exceptions' has no attribute 'NetlinkDumpInterrupted'`
    * https://bugs.launchpad.net/tripleo/+bug/1979646
    * fs01 network wallaby component failing on neutron/dhcp-agent
    * reported 2022-06-23, need neutron/hardprov team involvement to resolve this
    * Yatin fixed it on 2022-06-24: https://review.rdoproject.org/r/q/topic:bug%252F1979646
* ovn-dbs-bundle fails to start because ovn-ctl crashes and generates a coredump
    * https://bugs.launchpad.net/tripleo/+bug/1979276
    * puppet-glance-tripleo-standalone job failing
    * This is also affecting other jobs
    * Fix is being prepared as of 2022-06-24: https://bugzilla.redhat.com/show_bug.cgi?id=2100393
    * No TripleO CI job affected so far as of 2022-06-24?!?
    * Fixes still work in progress as of 2022-06-27
* `error: failed to connect to the hypervisor\nerror: Cannot create user runtime directory '/run/user/1001/libvirt': Permission denied`
    * https://bugzilla.redhat.com/show_bug.cgi?id=2101413
    * Rerun: https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/status#211643

### Investigations and Reruns:

(crossed out := known bugs recorded above)

* Wallaby C9:
* RHEL9 OSP17:
* RHEL8 OSP17:
    * ~~Rerun of periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset035-internal-rhos-17 in progress and QE https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/pipeline/job/pipeline_integration-pcci-17_dlrn-rhel-8.4-virthost-3cont_2comp_3ceph-ipv6-geneve-ceph/~~ - passed on 2022-06-27
* RHEL8 OSP16.2:
    * https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-bm_envD-3ctlr_1comp-featureset035-rhos-16.2
        - Rerun 1 (Jakob's): https://code.engineering.redhat.com/gerrit/c/testproject/+/416778
        - Rerun 2 (Sandeep's): https://code.engineering.redhat.com/gerrit/c/testproject/+/211643

### Intermittent Failures

* `paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 192.168.24.199`
    * standalone network wallaby failing on network tempest tests
    * https://bugs.launchpad.net/tripleo/+bug/1979665
    * Patch below affected by this bug
    * Saw it on tripleo-ci-centos-9-standalone
    * recheck'd on 2022-06-23 (known intermittent issue)
    * [job periodic-tripleo-ci-centos-8-standalone-network-wallaby still failing](https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-network-wallaby/50057b2/logs/undercloud/var/log/tempest/stestr_results.html.gz)
* `FAILED: adding user 'ceph-admin', exit code: 9`
    * Intermittent failure
    * https://bugs.launchpad.net/tripleo/+bug/1979093
    * https://review.opendev.org/c/openstack/tripleo-ansible/+/846530
        * needs to be tested in wallaby before merging in master
* (fs001 and fs035 OVB jobs failing tempest with identity/haproxy connection errors)
    * https://bugs.launchpad.net/bugs/1971465
    * Intermittent failure
    * Track the health of fs01 and fs035

### Fixed

* Amol - the promoter is broken - blocking promotion of master hash 76f5776506b3ce257820036c69be89cb - pls fix ASAP to promote master - Fixed
* `curl: (60) SSL certificate problem: certificate has expired` breaks RHEL8 OSP16.2
    * https://bugzilla.redhat.com/show_bug.cgi?id=2101405

### Promotions

* osp16.2 rhel8 - 2022-06-23
* osp17 rhel8 - 2022-06-20
* osp17 rhel9 - 2022-06-21

---

## 2022-06-24

### Promotion Blockers

* Amol - the promoter is broken - blocking promotion of master hash 76f5776506b3ce257820036c69be89cb - pls fix ASAP to promote master
* Amol - you need https://review.opendev.org/c/openstack/tripleo-upgrade/+/847680/ and https://review.opendev.org/c/openstack/tripleo-common/+/847437 to clean up wallaby (8474 keeps failing standalone). After you fix the promoter and make sure master promotes - ensure these two merge - contact fmount if you need help. We need these merged ASAP.
* Jakob - https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?job_name=periodic-tripleo-ci-rhel-8-bm_envD-3ctlr_1comp-featureset035-rhos-16.2&skip=0 has a legit failure - pls ask Sandepp to help you fix that - 16.2 is way behind on promotions * Jakob - rhos-17 on rhel-8: rerun of periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset035-internal-rhos-17 in progress and QE https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/pipeline/job/pipeline_integration-pcci-17_dlrn-rhel-8.4-virthost-3cont_2comp_3ceph-ipv6-geneve-ceph/ - pls check in your morning - pls join internal chanel #openstack-pcci - pinged attila ### Known Bugs * `tripleo-ci-centos-9-scenario000-multinode-oooq-container-updates fails with file not found: /home/zuul//tripleo-ansible-inventory.yaml` * https://bugs.launchpad.net/tripleo/+bug/1979707 * Rabi opened [LP 1979707](https://bugs.launchpad.net/tripleo/+bug/1979707) and proposed fix ~~https://review.opendev.org/c/openstack/tripleo-upgrade/+/847519~~ * wallaby backport still not merged: * `tripleo_cephadm_ceph_cli is undefined` [Fixed] * https://bugs.launchpad.net/tripleo/+bug/1979651 * fixed by https://review.opendev.org/q/I137e335abeedccad801cdc03feee654c3e42a0e2 * failed tripleo-ci-centos-9-standalone because of https://bugs.launchpad.net/tripleo/+bug/1979665 * tested in master TripleO https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847357 * sc001: Success * sc004: Success * tested in wallaby TripleO https://review.opendev.org/c/openstack/tripleo-heat-templates/+/847329 * sc001: Failure (Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade) * sc004: Success * tested in wallaby periodic https://review.rdoproject.org/r/c/testproject/+/37973 * sc001: Failure (Failed containers: gnocchi_db_sync, ceilometer_gnocchi_upgrade) * sc004: Success * tested in master periodic https://review.rdoproject.org/r/c/testproject/+/36256 * sc001: Success * sc004: Success * __Updates__: * Master patch (847323) in 
**gate** * `neutron AttributeError: module 'pr2modules.netlink.exceptions' has no attribute 'NetlinkDumpInterrupted'` * https://bugs.launchpad.net/tripleo/+bug/1979646 * fs01 network wallaby component failing on neutron/dhcp-agent * reported 2022-06-23, need neutron/hardprov team involvement to resolve this * Yatin fixed it today (2022-06-24): https://review.rdoproject.org/r/q/topic:bug%252F1979646 * ovn-dbs-bundle fails to start because ovn-ctl crashes with coredump generated * https://bugs.launchpad.net/tripleo/+bug/1979276 * puppet-glance-tripleo-standalone job failing * This is also affecting other jobs * Fix is being prepared as of 2022-06-24 https://bugzilla.redhat.com/show_bug.cgi?id=2100393 * No TripleO CI job affected so far as of 2022-06-24?!? ### Investigations and Reruns: (crossed out := known bugs recorded above) * Wallaby C9: * ~~[periodic-tripleo-ci-centos-9-scenario004-standalone-wallaby](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario004-standalone-wallaby/431393b/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz): `FATAL | search triple_run_cephadm_output of cephadm run(s) non-zero return codes`~~ * ~~[periodic-tripleo-ci-centos-9-scenario002-standalone-wallaby](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario002-standalone-wallaby/2c90044/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz): `gnocchi.incoming.SackDetectionError: Connection closed by server.`/ `redis.exceptions.ConnectionError: Connection closed by server.`~~ * ~~[periodic-tripleo-ci-centos-9-scenario001-standalone-wallaby](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario001-standalone-wallaby/928ca67/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz): `FATAL | search 
triple_run_cephadm_output of cephadm run(s) non-zero return codes`~~ * ~~[periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-wallaby](https://sf.hosted.upshift.rdu2.redhat.com/logs/34/43534/187/check-rdo/periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-wallaby/dfadf25/logs/undercloud/home/zuul/standalone_deploy.log) / [periodic-tripleo-ci-centos-9-scenario010-ovn-provider-standalone-wallaby](https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-scenario010-ovn-provider-standalone-wallaby/3c402fb/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz): `FATAL | search triple_run_cephadm_output of cephadm run(s) non-zero return codes`~~ * ~~Rerunning `periodic-tripleo-ci-centos-9-standalone-wallaby` and `periodic-tripleo-ci-centos-9-standalone-full-tempest-api-wallaby`: https://review.rdoproject.org/r/c/testproject/+/43724 - passed~~ * network component: https://review.rdoproject.org/r/c/testproject/+/43727 * RHEL9 OSP17: * ~~Rerunning `periodic-tripleo-ci-rhel-9-standalone-full-tempest-api-rhos-17` and `periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset035-internal-rhos-17`: https://code.engineering.redhat.com/gerrit/c/testproject/+/416750 - passed~~ * RHEL8 OSP16.2: * https://code.engineering.redhat.com/gerrit/c/testproject/+/416778 ### Intermittent Failures * `paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 192.168.24.199` * standalone network wallaby failing on network tempest tests * https://bugs.launchpad.net/tripleo/+bug/1979665 * Patch below affected by this bug * Saw it on tripleo-ci-centos-9-standalone * recheck'd on 2022-06-23 (known intermittent issue) * [job periodic-tripleo-ci-centos-8-standalone-network-wallaby still 
failing](https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-network-wallaby/50057b2/logs/undercloud/var/log/tempest/stestr_results.html.gz) * `FAILED: adding user 'ceph-admin', exit code: 9` - fultonj * Intermittent failure * https://bugs.launchpad.net/tripleo/+bug/1979093 * https://review.opendev.org/c/openstack/python-tripleoclient/+/847844 * does not need to be tested in wallaby before merging in master * The python-tripleoclient change won't be run by wallaby jobs until we merge https://review.opendev.org/q/topic:standalone_ceph_wallaby_squash * (fs001 and fs035 OVB jobs failing tempest with identity/haproxy connection errors) * https://bugs.launchpad.net/bugs/1971465 * Intermittent failure * Track the health of fs01 and fs035 ### Fixed * ~~periodic-tripleo-ci-centos-9-scenario001-standalone failed to download the ceph container during bootstrap~~ * https://bugs.launchpad.net/tripleo/+bug/1978998 * Periodic Job Failure (**gate**) * https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/846231 \ (Override Ceph --container-namespace for periodic jobs) * intermittently failed for [LP 1979093](https://bugs.launchpad.net/tripleo/+bug/1979093) (above) * Tested by https://review.rdoproject.org/r/c/testproject/+/36256 * [Patch merged today](https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/846231) ### Promotions * osp16.2 rhel8 - 2022-06-23 * osp17 rhel8 - 2022-06-20 * osp17 rhel9 - 2022-06-21 ---
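The Known Bugs, Intermittent Failures and Fixed lists in these notes pair verbatim error strings with the bug tracking each one. When a new job fails, the first rover step is to check whether its log matches one of them. The Python sketch below shows that check and assumes plain substring matching is good enough. `KNOWN_SIGNATURES`, `match_known_bugs` and `triage_file` are hypothetical names, not part of any TripleO CI tooling. The table holds only error strings copied from the notes above. Generic strings such as "Unable to connect to port 22" can also match unrelated SSH failures, so treat a hit as a pointer to the linked bug rather than a diagnosis.

```python
"""Hypothetical rover helper: match a job log against known-bug signatures.

Signatures and bug links are copied from the ruck rover notes; the helper
itself is illustrative only and not part of any TripleO tooling.
"""
from typing import Dict, List

# Verbatim substrings from failing job logs, mapped to the bug tracking them.
# Dict order is preserved, so matches are reported in this order.
KNOWN_SIGNATURES: Dict[str, str] = {
    "Operator '[]' is not applicable to an Undef Value":
        "https://bugs.launchpad.net/tripleo/+bug/1979985",
    "tripleo_cephadm_ceph_cli is undefined":
        "https://bugs.launchpad.net/tripleo/+bug/1979651",
    "has no attribute 'NetlinkDumpInterrupted'":
        "https://bugs.launchpad.net/tripleo/+bug/1979646",
    "Cannot create user runtime directory":
        "https://bugzilla.redhat.com/show_bug.cgi?id=2101413",
    "Unable to connect to port 22":
        "https://bugs.launchpad.net/tripleo/+bug/1979665",
    "FAILED: adding user 'ceph-admin', exit code: 9":
        "https://bugs.launchpad.net/tripleo/+bug/1979093",
    "certificate has expired":
        "https://bugzilla.redhat.com/show_bug.cgi?id=2101405",
}


def match_known_bugs(log_text: str) -> List[str]:
    """Return the bug links whose signature occurs in log_text, in table order."""
    return [bug for sig, bug in KNOWN_SIGNATURES.items() if sig in log_text]


def triage_file(path: str) -> List[str]:
    """Read a downloaded log (e.g. standalone_deploy.log) and match it.

    Undecodable bytes are replaced so that binary noise in a log does not abort the scan.
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        return match_known_bugs(fh.read())
```

An empty result from `triage_file` is the signal to dig into the log manually and, if the failure is new, open a bug and add its signature to the table.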