# Ruck Rover - 27th May 2022 - 02nd June 2022

###### tags: `ruck_rover`

###### Previous RR notes: https://hackmd.io/2hB-P772SqyqDs0KKZzZEQ?view

[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)
[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)

## Thursday 02 June

#### new/transient/no bug yet

* https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs8/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-wallaby/8dade15/job-output.txt
    * ~~because of https://review.rdoproject.org/r/c/rdo-jobs/+/42881~~ ?
    * trying/test with https://review.rdoproject.org/r/c/testproject/+/43374 Depends-On: https://review.rdoproject.org/r/c/rdo-jobs/+/43360
    * not this same result https://logserver.rdoproject.org/78/43378/3/check/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-compute-master/ff217a2/job-output.txt
    * because of this https://review.opendev.org/c/openstack/tripleo-ci/+/844389 ?
    * testing v4 @ https://review.rdoproject.org/r/c/testproject/+/43378 https://review.rdoproject.org/r/c/testproject/+/43374 looks good so far

#### Bugs:

* https://bugs.launchpad.net/tripleo/+bug/1976614 victoria standalone & undercloud upgrade jobs broken at undercloud-setup package deps

## Wednesday 01 June (and earlier ongoing things tracked here)

#### Bugs:

* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
* ~~https://bugs.launchpad.net/tripleo/+bug/1973568~~ Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* ~~https://bugzilla.redhat.com/show_bug.cgi?id=2089304~~ fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle

---

## ***STOP (all tracked bugs duplicated above stop scrolling) STOP***

---

## ~~Tuesday 31 May~~

#### Bugs:

* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block

---

* downstream:
    * rhos17 on rhel9 :
        * build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17 failing due to [1]
        * [1] https://bugzilla.redhat.com/show_bug.cgi?id=2091816
        * fix: https://code.engineering.redhat.com/gerrit/c/openstack/tripleo-ci-internal-jobs/+/412327 (works)
    * TODOs:
        * for 16.2:
            * keep an eye on fs020 results on testproject patch: https://code.engineering.redhat.com/gerrit/c/testproject/+/315285
            * **PROMOTED**
        * for rhos17 on rhel8: waiting for next integration run result
            * (rlandy) fix for failing test - https://review.opendev.org/c/openstack/neutron/+/843763/ has not yet passed in the network component line and has not reached downstream.
Rerunning failed jobs: https://review.rdoproject.org/r/c/testproject/+/36254
        * for rhos17 on rhel9: waiting for testproject patch result: https://code.engineering.redhat.com/gerrit/c/testproject/+/412202 once it passes we need to rekick the whole line
            * (rlandy) containers and images builds pass - tests are still failing on https://bugzilla.redhat.com/2089724 (new ovn did not help) NEEDS INVESTIGATION TOMORROW

##### Promotions upstream handoff:

***1*** TRAIN

* latest buildset https://review.rdoproject.org/zuul/buildset/0eb15e91b17d49dca312ae072d972aea - many fails stackviz thing https://bugs.launchpad.net/tripleo/+bug/1976247
* digging for candidate @ http://promoter.rdoproject.org/promoter_logs/centos8_train.log - https://trunk.rdoproject.org/centos8-train/tripleo-ci-testing/81/64/ from 27th :/
* rekicked line manually => decent result https://review.rdoproject.org/zuul/buildset/87f1a28cf802437f8823f57f9d2efe4c only fs35 timeout
* posted chaser fs35 rerun https://review.rdoproject.org/r/c/testproject/+/43298
* (rlandy) rerun twice - second run still in progress (if not same tempest tests, suggestion to comment this test out and promote)

***2*** ~~WALLABY 9~~ promoted

* nice build @ https://review.rdoproject.org/zuul/buildset/bc85c2bb90bd4b0696a407f79c40fa84
* brief dig inconsistent fails @ https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp_1supp-featureset039-wallaby
* posted chaser @ https://review.rdoproject.org/r/c/testproject/+/43284
* recheck tempest :/ https://review.rdoproject.org/r/c/testproject/+/43284/1#message-6da3a4d7a2b099ba15941eaae92637a645f0af94 => green & promoted

***3*** WALLABY/8

* buildset https://review.rdoproject.org/zuul/buildset/0d2c138a8d4d4a87a4c8fc92f6d55783 - all 3 fails stackviz issue.
line currently running no test yet
* (rlandy) only missing fs035 - you can decide to skip and promo if no new failures

***4*** MASTER

* buildset https://review.rdoproject.org/zuul/buildset/0771b066f0364ba18ad462f5883ba7eb 20/35/39 (2x tempest 1x mirror)
* chasing with https://review.rdoproject.org/r/c/testproject/+/43286
* dig @ http://promoter.rdoproject.org/promoter_logs/centos9_master.log another candidate from 29th https://trunk.rdoproject.org/centos9-master/tripleo-ci-testing/41/f9/ chasing with https://review.rdoproject.org/r/c/testproject/+/43285
* (rlandy) https://review.rdoproject.org/r/c/testproject/+/43286 failed twice - with what looks like a legit failure - same test neutron_tempest_plugin.scenario.test_security_groups.NetworkSecGroupTest. BUT ... in the next hash fs020 passes (failed fs039 and fs064). So skipping seems not needed - we have a choice to ignore that failure and promo or try to chase a later hash

***5*** VICTORIA

* https://review.rdoproject.org/zuul/buildset/237623aa98514f9a924981868450b10b
* new bug standalone/undercloud upgrade package conflict ? https://logserver.rdoproject.org/openstack-periodic-integration-stable2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-undercloud-upgrade-victoria/87354dd/logs/undercloud/home/zuul/install_packages.sh.log.txt.gz only seen once
* posted chaser https://review.rdoproject.org/r/c/testproject/+/43296

#### new/transient/no bug yet

* possible breakages in ceph jobs until these are all merged https://review.opendev.org/q/topic:ceph_promotion_pipeline

## ~~Monday 30 May~~

#### Bugs:

* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* ~~https://bugzilla.redhat.com/show_bug.cgi?id=2091502~~ ERROR: Cannot install stackviz because these package versions have conflicting dependencies
    * downstream testproject: https://code.engineering.redhat.com/gerrit/c/testproject/+/315285 [DNM] Test rhos16.2 failing jobs
    * upstream bug: ~~https://bugs.launchpad.net/tripleo/+bug/1976247~~ wallaby gate blocker tripleo-ci-centos-8-standalone ERROR: Cannot install stackviz
* ~~https://bugs.launchpad.net/tripleo/+bug/1976251~~ [CI] tox-ansible-test-sanity doesn't take the "ignore" anymore
* ~~https://bugs.launchpad.net/tripleo/+bug/1975917~~ AttributeError: 'Service' object has no attribute 'enabled'
* ~~https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213~~

```
<bhagyashris> slaweq, hey can you help us to merge this one https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213
<bhagyashris> we are blocked due to this ^
<slaweq> bhagyashris: sure, looking
<slaweq> bhagyashris: done
<bhagyashris> slaweq, thanks
```

* regarding the curl error:
    * pinged migarcia

```
<bhagyashris> migarcia, hey
<bhagyashris> around?
<migarcia> bhagyashris: I am, what's up?
<bhagyashris> we are facing one issue build push upload image
<bhagyashris> https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17-rhel9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17/1c70911/logs/container-builds/d200f2a7-c38e-49ad-b9ef-e83cabfa5fc0/base/base-build.log
<bhagyashris> lon is out today
<bhagyashris> issue is: Errors during downloading metadata for repository 'osptrunk-candidate-deps':
<bhagyashris> - Curl error (28): Timeout was reached for http://download.eng.bos.redhat.com/brewroot/repos/rhos-17.0-rhel-9-trunk-candidate/latest/x86_64/repodata/f6d120c5ebe86676cd598a6c179f7bef99e4aa3fc54f9291e27708b502d1f7fc-primary.xml.gz
<migarcia> bhagyashris: I would rekick, could be a network blip or that the repo was regenerated while the job was running
<migarcia> rhos-17.0-rhel-9-trunk-candidate/latest symlink gets updated regularly as new builds are tagged in
<bhagyashris> migarcia, ack thanks ! will check the result in the rekick
<migarcia> cool, let me know
<bhagyashris> migarcia, thanks
<bhagyashris> let me know once you re kicked
<bhagyashris> migarcia, hey you are rekicking or should i rekicked
<migarcia> bhagyashris: please do
<bhagyashris> migarcia, ack
<bhagyashris> migarcia, hey we are still with same issue on the recent run
<bhagyashris> https://sf.hosted.upshift.rdu2.redhat.com/logs/94/947e8a93a865e16481d14a1dd9fe1f91216e1a8d/openstack-periodic-integration-rhos-17-rhel9/periodic-tripleo-build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17/45338ea/logs/container-builds/a3e81dbf-42ba-42e9-bf72-d5aeb0e65b4f/base/base-build.log
<migarcia> bhagyashris: huh, I can download that file just fine.
<migarcia> and it looks like the job was also downloading it fine, but very slow for some reason
<migarcia> osptrunk-candidate-deps 506 B/s | 64 kB 02:10
<ysandeep> bhagyashris: could you hold a node and check mtu on cni-podman bridge
<ysandeep> bhagyashris, sounds similiar to https://bugzilla.redhat.com/show_bug.cgi?id=2060932
<bhagyashris> ysandeep, let me hit the testproject patch
```

**Update:** podman was recently updated from 2:4.0.2-6.el9_0 to 2:4.0.2-7.el9_0, which pulls in the new dependency "netavark". That creates the "podman0" bridge from here https://github.com/containers/netavark/blob/02e031fdd9f7cd849c4fdd18cdd1ecb1a135485f/src/test/config/setupopts2.test.json#L14-L22 and takes the mtu value as 1500, so downloading metadata for repository 'osptrunk-candidate-deps' takes longer, times out and fails. Will debug more on that tomorrow and file a bug if required.

#### new/transient/no bug yet

* ~~c9 broken container build https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-build-containers-centos-9-push-master/a0d5760/logs/build.log~~
    * not a blocker because https://github.com/rdo-infra/rdo-jobs/blob/c72a5465ae3b5425bb983f8a1744af8b3e839c34/zuul.d/integration-pipeline-main.yaml#L10 we are pushing to quay now that job is OK

## Friday 27 May

### Bugs:

* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
* https://bugs.launchpad.net/tripleo/+bug/1975917 AttributeError: 'Service' object has no attribute 'enabled'
* downstream notes:
    * TODOs:
        * @rlandy didn't get any reply so far from lon so will need to check with lon and rekick the line for ovn/ovs issue downstream: https://trello.com/c/GZtIK9Tb/2539-cixbz2089724rhos-17rhel-9-tripleonodesvalidation-failing-with-packet-loss-in-the-network-availability-validation-block
            * Rekicked - new base container issue - was failing on Friday due to mirror issue
            * Same issues today (Sunday) ... pls see log from https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17-rhel9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17/1c70911/logs/container-builds/d200f2a7-c38e-49ad-b9ef-e83cabfa5fc0/base/base-build.log. USA (lon/jon) will be out ...
pls ask migarcia if he can fix the curl error
        * for 16.2 https://trello.com/c/10kvMPrS/2529-cixbz2089304osp17osp162rhel8rhel9fs020-and-full-tempest-scenario-job-failing-on-tempest-test-neutrontempestpluginscenariotesttru / https://bugzilla.redhat.com/show_bug.cgi?id=2089304 waiting for patch https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213 to be merged and promoted through to integration
    * rhos16.2 on rhel8
        * (from rlandy) https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213 should be mergeable - pls ask Yatin/Rodolfo to merge ... then will need to be promoted up network component
        * Integration line: standalone and scenarios jobs failing with below error
            * log: https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2/fbb5d0f/job-output.txt
            * error: ERROR! Error when getting the collection info for ansible.posix from default (https://galaxy.ansible.com/api/) (HTTP Code: 530, Message: Code: Unknown)
            * 530 Site is frozen re-running the jobs here https://code.engineering.redhat.com/gerrit/c/testproject/+/315285
            * mostly node_failure on testproject
    * rhos17 on rhel9
        * node_failure on most of the jobs in the current run:
        * pinged on rhos-ops - no reply yet

```
<bhagyashris> #rhos-ops Hi, we are faing the node_failure on rhos17-rhel9
<bhagyashris> facing*
<bhagyashris> currently running integration line - rhos17 on rhel9
<bhagyashris> https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/status
<bhjf> Title: Zuul (at sf.hosted.upshift.rdu2.redhat.com)
<bhagyashris> psedlak|ruck, ^
<bhagyashris> dpawlik, ^
<bhagyashris> facing node failure on downstream
<dpawlik> kforde: hey, all is fine with the infra?
<dpawlik> kforde: ah, just horizon does not work. Was thinking that something happend with one vm
<dpawlik> bhagyashris: can I deque your job and recheck?
<bhagyashris> yeah
<bhagyashris> dpawlik, yeah
<dpawlik> bhagyashris: "Global Service Outage Ongoing: RDU2 DC Impact"
<dpawlik> it can be related
<bhagyashris> dpawlik, ack
<dpawlik> we got network flappings between services like zookeeper, DNS does not work...
```

## Thursday 26 (previous ruck|rover pad: https://hackmd.io/uiv6iiN5QR-Z3mfFyKWeqA)

* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
* ~~https://bugs.launchpad.net/tripleo/+bug/1975671~~ Component lines jobs and RDO Third party check jobs are failing because master containers are not available.