# Ruck Rover - 27th May 2022 - 02nd June 2022
###### tags: `ruck_rover`
###### Previous RR notes: https://hackmd.io/2hB-P772SqyqDs0KKZzZEQ?view
[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)
[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)
## Thursday 02 June
#### new/transient/no bug yet
* https://logserver.rdoproject.org/openstack-periodic-integration-stable1-cs8/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-wallaby/8dade15/job-output.txt
* ~~because of https://review.rdoproject.org/r/c/rdo-jobs/+/42881~~ ?
* trying/test with https://review.rdoproject.org/r/c/testproject/+/43374 Depends-On: https://review.rdoproject.org/r/c/rdo-jobs/+/43360
* not this same result https://logserver.rdoproject.org/78/43378/3/check/periodic-tripleo-ci-centos-9-standalone-full-tempest-api-compute-master/ff217a2/job-output.txt
* because of this https://review.opendev.org/c/openstack/tripleo-ci/+/844389 ?
* testing v4 @ https://review.rdoproject.org/r/c/testproject/+/43378 https://review.rdoproject.org/r/c/testproject/+/43374 looks good so far
#### Bugs:
* https://bugs.launchpad.net/tripleo/+bug/1976614 victoria standalone & undercloud upgrade jobs broken at undercloud-setup package deps
## Wednesday 01 June (and earlier ongoing things tracked here)
#### Bugs:
* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
* ~~https://bugs.launchpad.net/tripleo/+bug/1973568~~ Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* ~~https://bugzilla.redhat.com/show_bug.cgi?id=2089304~~ fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
---
## ***STOP (all tracked bugs duplicated above stop scrolling) STOP***
---
## ~~Tuesday 31 May~~
#### Bugs:
* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
---
* downstream:
* rhos17 on rhel9 :
* build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17 failing due to [1]
* [1] https://bugzilla.redhat.com/show_bug.cgi?id=2091816
* fix: https://code.engineering.redhat.com/gerrit/c/openstack/tripleo-ci-internal-jobs/+/412327 (works)
* To-dos:
* for 16.2:
* keep an eye on fs020 results on testproject patch: https://code.engineering.redhat.com/gerrit/c/testproject/+/315285
* **PROMOTED**
* for rhos17 on rhel8: waiting for next integration run result
* (rlandy) fix for failing test - https://review.opendev.org/c/openstack/neutron/+/843763/ has not yet passed the network component line and has not reached downstream. Rerunning failed jobs: https://review.rdoproject.org/r/c/testproject/+/36254
* for rhos17 on rhel9: waiting for testproject patch result: https://code.engineering.redhat.com/gerrit/c/testproject/+/412202 once it passes, the whole line needs a rekick
* (rlandy) containers and images builds pass - tests are still failing on https://bugzilla.redhat.com/2089724 (new ovn did not help) NEEDS INVESTIGATION TOMORROW
##### Promotions upstream handoff:
***1*** TRAIN
* latest buildset https://review.rdoproject.org/zuul/buildset/0eb15e91b17d49dca312ae072d972aea - many fails stackviz thing https://bugs.launchpad.net/tripleo/+bug/1976247
* digging for candidate @ http://promoter.rdoproject.org/promoter_logs/centos8_train.log - https://trunk.rdoproject.org/centos8-train/tripleo-ci-testing/81/64/ from 27th :/
* rekicked line manually => decent result https://review.rdoproject.org/zuul/buildset/87f1a28cf802437f8823f57f9d2efe4c only fs35 timeout
* posted chaser fs35 rerun https://review.rdoproject.org/r/c/testproject/+/43298
* (rlandy) rerun twice - second run still in progress (if it is not the same tempest tests failing, suggestion is to comment this test out and promote)
***2*** ~~WALLABY 9~~ promoted
* nice build @ https://review.rdoproject.org/zuul/buildset/bc85c2bb90bd4b0696a407f79c40fa84
* brief dig inconsistent fails @ https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp_1supp-featureset039-wallaby
* posted chaser @ https://review.rdoproject.org/r/c/testproject/+/43284
* recheck tempest :/ https://review.rdoproject.org/r/c/testproject/+/43284/1#message-6da3a4d7a2b099ba15941eaae92637a645f0af94 => green & promoted
***3*** WALLABY/8
* buildset https://review.rdoproject.org/zuul/buildset/0d2c138a8d4d4a87a4c8fc92f6d55783 - all 3 fails stackviz issue. line currently running no test yet
* (rlandy) only missing fs035 - you can decide to skip and promo if no new failures
***4*** MASTER
* buildset https://review.rdoproject.org/zuul/buildset/0771b066f0364ba18ad462f5883ba7eb 20/35/39 (2x tempest 1x mirror)
* chasing with https://review.rdoproject.org/r/c/testproject/+/43286
* dig @ http://promoter.rdoproject.org/promoter_logs/centos9_master.log another candidate from 29th https://trunk.rdoproject.org/centos9-master/tripleo-ci-testing/41/f9/ chasing with https://review.rdoproject.org/r/c/testproject/+/43285
* (rlandy) https://review.rdoproject.org/r/c/testproject/+/43286 failed twice - with what looks like a legit failure - same test neutron_tempest_plugin.scenario.test_security_groups.NetworkSecGroupTest. BUT ... in the next hash fs020 passes (failed fs039 and fs064). So skipping seems not needed - we have a choice to ignore that failure and promote, or try to chase a later hash
***5*** VICTORIA
* https://review.rdoproject.org/zuul/buildset/237623aa98514f9a924981868450b10b
* new bug standalone/undercloud upgrade package conflict ? https://logserver.rdoproject.org/openstack-periodic-integration-stable2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-undercloud-upgrade-victoria/87354dd/logs/undercloud/home/zuul/install_packages.sh.log.txt.gz only seen once
* posted chaser https://review.rdoproject.org/r/c/testproject/+/43296
#### new/transient/no bug yet
* possible breakages in ceph jobs until these are all merged https://review.opendev.org/q/topic:ceph_promotion_pipeline
## ~~Monday 30 May~~
#### Bugs:
* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* ~~https://bugzilla.redhat.com/show_bug.cgi?id=2091502~~ ERROR: Cannot install stackviz because these package versions have conflicting dependencies
* downstream testproject: https://code.engineering.redhat.com/gerrit/c/testproject/+/315285 [DNM] Test rhos16.2 failing jobs
* upstream bug: ~~https://bugs.launchpad.net/tripleo/+bug/1976247~~ wallaby gate blocker tripleo-ci-centos-8-standalone ERROR: Cannot install stackviz
* ~~https://bugs.launchpad.net/tripleo/+bug/1976251~~ [CI] tox-ansible-test-sanity doesn't take the "ignore" anymore
* ~~https://bugs.launchpad.net/tripleo/+bug/1975917~~ AttributeError: 'Service' object has no attribute 'enabled'
* ~~https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213~~
```
<bhagyashris> slaweq, hey can you help us to merge this one https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213
<bhagyashris> we are blocked due to this ^
<slaweq> bhagyashris: sure, looking
<slaweq> bhagyashris: done
<bhagyashris> slaweq, thanks
```
* regarding the curl error:
* pinged migarcia
```
<bhagyashris> migarcia, hey
<bhagyashris> around?
<migarcia> bhagyashris: I am, what's up?
<bhagyashris> we are facing one issue build push upload image
<bhagyashris> https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17-rhel9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17/1c70911/logs/container-builds/d200f2a7-c38e-49ad-b9ef-e83cabfa5fc0/base/base-build.log
<bhagyashris> lon is out today
<bhagyashris> issue is: Errors during downloading metadata for repository 'osptrunk-candidate-deps':
<bhagyashris> - Curl error (28): Timeout was reached for http://download.eng.bos.redhat.com/brewroot/repos/rhos-17.0-rhel-9-trunk-candidate/latest/x86_64/repodata/f6d120c5ebe86676cd598a6c179f7bef99e4aa3fc54f9291e27708b502d1f7fc-primary.xml.gz
<migarcia> bhagyashris: I would rekick, could be a network blip or that the repo was regenerated while the job was running
<migarcia> rhos-17.0-rhel-9-trunk-candidate/latest symlink gets updated regularly as new builds are tagged in
<bhagyashris> migarcia, ack thanks ! will check the result in the rekick
<migarcia> cool, let me know
<bhagyashris> migarcia, thanks
<bhagyashris> let me know once you re kicked
<bhagyashris> migarcia, hey you are rekicking or should i rekicked
<migarcia> bhagyashris: please do
<bhagyashris> migarcia, ack
<bhagyashris> migarcia, hey we are still with same issue on the recent run
<bhagyashris> https://sf.hosted.upshift.rdu2.redhat.com/logs/94/947e8a93a865e16481d14a1dd9fe1f91216e1a8d/openstack-periodic-integration-rhos-17-rhel9/periodic-tripleo-build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17/45338ea/logs/container-builds/a3e81dbf-42ba-42e9-bf72-d5aeb0e65b4f/base/base-build.log
<migarcia> bhagyashris: huh, I can download that file just fine.
<migarcia> and it looks like the job was also downloading it fine, but very slow for some reason
<migarcia> osptrunk-candidate-deps 506 B/s | 64 kB 02:10
<ysandeep> bhagyashris: could you hold a node and check mtu on cni-podman bridge
<ysandeep> bhagyashris, sounds similiar to https://bugzilla.redhat.com/show_bug.cgi?id=2060932
<bhagyashris> ysandeep, let me hit the testproject patch
```
**Update:** podman was recently updated from 2:4.0.2-6.el9_0 to 2:4.0.2-7.el9_0, which pulls in the new dependency "netavark". netavark creates the "podman0" bridge (see https://github.com/containers/netavark/blob/02e031fdd9f7cd849c4fdd18cdd1ecb1a135485f/src/test/config/setupopts2.test.json#L14-L22) with an MTU of 1500, which appears to slow the metadata download for repository 'osptrunk-candidate-deps' until it times out and the build fails. Will debug more tomorrow and file a bug if required.
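The MTU check ysandeep asked for can be scripted on a held node. The following is only a minimal sketch, assuming the slowdown comes from the container bridge MTU being larger than the host uplink MTU (suspected above, not confirmed). The function names, the bridge name prefixes and the uplink name passed in are illustrative assumptions; the only real interface used is the Linux `/sys/class/net/<iface>/mtu` file.

```python
from pathlib import Path


def read_mtus(sys_net: Path = Path("/sys/class/net")) -> dict[str, int]:
    """Return {interface name: MTU} for every interface found under sys_net."""
    mtus = {}
    for iface in sorted(sys_net.iterdir()):
        mtu_file = iface / "mtu"
        # Skip entries that are not interfaces (no readable mtu file).
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def oversized_bridges(
    mtus: dict[str, int],
    uplink: str,
    # Assumed bridge name prefixes: netavark's "podman0", CNI's "cni-podman0".
    bridge_prefixes: tuple[str, ...] = ("podman", "cni-podman"),
) -> list[str]:
    """List container bridges whose MTU is larger than the uplink's MTU."""
    uplink_mtu = mtus[uplink]
    return [
        name
        for name, mtu in mtus.items()
        if name.startswith(bridge_prefixes) and mtu > uplink_mtu
    ]
```

On the held node, `oversized_bridges(read_mtus(), "<uplink interface>")` would list any bridge worth a closer look; an empty list would suggest the MTU theory needs rethinking.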
#### new/transient/no bug yet
* ~~c9 broken container build https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-build-containers-centos-9-push-master/a0d5760/logs/build.log~~
* not a blocker because https://github.com/rdo-infra/rdo-jobs/blob/c72a5465ae3b5425bb983f8a1744af8b3e839c34/zuul.d/integration-pipeline-main.yaml#L10 we are pushing to quay now that job is OK
## Friday 27 May
#### Bugs:
* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
* https://bugs.launchpad.net/tripleo/+bug/1975917 AttributeError: 'Service' object has no attribute 'enabled'
* downstream notes:
* To-dos:
* @rlandy no reply so far from lon, so we will need to check with lon and rekick the line for the ovn/ovs issue downstream: https://trello.com/c/GZtIK9Tb/2539-cixbz2089724rhos-17rhel-9-tripleonodesvalidation-failing-with-packet-loss-in-the-network-availability-validation-block
* Rekicked - new base container issue - was failing on Friday due to mirror issue
* Same issues today (Sunday) ... pls see log from https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17-rhel9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-build-containers-ubi-9-internal-rhel-9-build-push-upload-rhos-17/1c70911/logs/container-builds/d200f2a7-c38e-49ad-b9ef-e83cabfa5fc0/base/base-build.log.
USA (lon/jon) will be out ... pls ask migarcia if he can fix the curl error
* for 16.2 https://trello.com/c/10kvMPrS/2529-cixbz2089304osp17osp162rhel8rhel9fs020-and-full-tempest-scenario-job-failing-on-tempest-test-neutrontempestpluginscenariotesttru / https://bugzilla.redhat.com/show_bug.cgi?id=2089304 waiting for patch https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213 to merge and get promoted through to integration
* rhos16.2 on rhel8
* (from rlandy) https://code.engineering.redhat.com/gerrit/c/networking-ovn/+/411213 should be mergeable - pls ask Yatin/Rodolfo to merge ... then will need to be promoted up network component
* Integration line: standalone and scenarios jobs failing with below error
* log: https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2/fbb5d0f/job-output.txt
* error: ERROR! Error when getting the collection info for ansible.posix from default (https://galaxy.ansible.com/api/) (HTTP Code: 530, Message: Code: Unknown)
* 530 Site is frozen; re-running the jobs here: https://code.engineering.redhat.com/gerrit/c/testproject/+/315285
* most jobs hit node_failure on testproject
* rhos17 on rhel9
* node_failure on most of the jobs in the current run:
* pinged on rhos-ops - no reply yet
```
<bhagyashris> #rhos-ops Hi, we are faing the node_failure on rhos17-rhel9
<bhagyashris> facing*
<bhagyashris> currently running integration line - rhos17 on rhel9
<bhagyashris> https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/status
<bhjf> Title: Zuul (at sf.hosted.upshift.rdu2.redhat.com)
<bhagyashris> psedlak|ruck, ^
<bhagyashris> dpawlik, ^
<bhagyashris> facing node failure on downstream
<dpawlik> kforde: hey, all is fine with the infra?
<dpawlik> kforde: ah, just horizon does not work. Was thinking that something happend with one vm
<dpawlik> bhagyashris: can I deque your job and recheck?
<bhagyashris> yeah
<bhagyashris> dpawlik, yeah
<dpawlik> bhagyashris: "Global Service Outage Ongoing: RDU2 DC Impact"
<dpawlik> it can be related
<bhagyashris> dpawlik, ack
<dpawlik> we got network flappings between services like zookeeper, DNS does not work...
```
## Thursday 26 May
(previous ruck|rover pad: https://hackmd.io/uiv6iiN5QR-Z3mfFyKWeqA)
* https://bugs.launchpad.net/tripleo/+bug/1973223 Master Sc010-kvm job is failing on octavia related tempest test: octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest
* https://bugs.launchpad.net/tripleo/+bug/1973568 Master Scenario002 is failing on Barbican related tempest test - tempest.lib.exceptions.UnexpectedResponseCode: Unexpected response code received , Details: 503
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1972163 cinder tempest.api.compute.admin.test_volumes_negative* tempest tests failing randomly in multiple branches.
* https://bugs.launchpad.net/bugs/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugzilla.redhat.com/show_bug.cgi?id=2089304 fs020 and full-tempest-scenario job failing on tempest test neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle
* https://bugzilla.redhat.com/show_bug.cgi?id=2089724 tripleo_nodes_validation failing with packet loss in the Network availability validation block
* ~~https://bugs.launchpad.net/tripleo/+bug/1975671~~ Component lines jobs and RDO Third party check jobs are failing because master containers are not available.