# Ruck Rover 2022-08-05 to 2022-08-11
###### tags: `ruck_rover`
###### Previous RR notes: https://hackmd.io/H9CSoXvlTm6nTZ4bsJkeRg
[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)
[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)
[OpenStack Program Meeting 2022](https://docs.google.com/document/d/1n6ArkMh68R9zivjlyGbpedkggk1wMwEIcrMZSN2uIjc/edit)
[Downstream promoter](http://10.0.110.143/promoter_logs/)
---
## 2022-08-11
### New/Transient no bug yet
### New bugs today
#### ~~https://bugs.launchpad.net/tripleo/+bug/1985031~~ tripleo-tox-molecule consistently failing with failed: [localhost] (item=https://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/ironic-python-agent.tar)
### Gate ongoing blocker https://bugs.launchpad.net/tripleo/+bug/1984175/comments/9
### Check Jobs
### master blocked cix https://bugs.launchpad.net/bugs/1984184
### wallaby c9 blocked cix https://bugs.launchpad.net/tripleo/+bug/1984453
### wallaby c8 chasing https://review.rdoproject.org/r/c/testproject/+/44517 Run missing fs1 job for wallaby/8 c36ce5aec4d97093b683c04f3ee56212
### train chasing https://review.rdoproject.org/r/c/testproject/+/44516 Run missing train jobs for 29018450b497197a282c6d4463b4c745
### rhos17.1 on rhel9
#### https://bugzilla.redhat.com/show_bug.cgi?id=2116287 - major blocker downstream atm across all releases. We haven't promoted a single line due to this infra issue.
* chasing for promotion : https://code.engineering.redhat.com/gerrit/c/testproject/+/286307
(https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/buildset/d1d6dfe2a5b14ca9ab1830a33deee460)
### Promotions :
(https://docs.google.com/document/d/1n6ArkMh68R9zivjlyGbpedkggk1wMwEIcrMZSN2uIjc/edit#heading=h.hqhtw5tvhd63)
* OSP 16.2 RHEL-8 promoted on Aug 4, 2022
* OSP 17 RHEL-8 promoted on Aug 5, 2022
* OSP 17 RHEL-9 promoted on Aug 7, 2022
* OSP 17.1 RHEL-9 promoted on July 27, 2022
### rhos17 on rhel9
* chasing for promotion : https://code.engineering.redhat.com/gerrit/c/testproject/+/209874
### rhos16.2
* chasing for promotion : https://code.engineering.redhat.com/gerrit/c/testproject/+/276230
### Upstream components
#### master
#### wallaby
#### wallaby c8
#### train
### Downstream components
#### rhos17 on rhel9
* network and tripleo components are promoted on 6 Aug
* now cloudops and tempest components are lagging - not consistent issues so rerunning with testproject : https://code.engineering.redhat.com/gerrit/c/testproject/+/266024
#### rhos16.2
* network and tripleo still lagging due to fs01 with node_failures atm.
* rerunning sc10 as the issue is not consistent : https://code.engineering.redhat.com/gerrit/c/testproject/+/266024
---
## 2022-08-10
### New/Transient no bug yet
* ~~https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-undercloud-upgrade&skip=0~~ https://bugs.launchpad.net/tripleo/+bug/1984175
### New bugs today
#### https://bugs.launchpad.net/tripleo/+bug/1984175 tripleo-ci-centos-9-undercloud-upgrade - cannot install both NetworkManager-1:1.39.5-1.el9.x86_64 and NetworkManager-1:1.39.12
#### https://bugs.launchpad.net/tripleo/+bug/1984453 periodic-tripleo-ci-centos-9-ovb-1ctlr_1comp-featureset002-master is failing image build - dependency fence-agents-common = 4.10.0-28.el9
#### https://bugs.launchpad.net/tripleo/+bug/1984184 fs001 and fs035 OVB jobs failing to set up private network for os_tempest ( patch under test https://review.rdoproject.org/r/c/testproject/+/36254)
#### (NON VOTING) https://bugs.launchpad.net/tripleo/+bug/1984237 [FIPS] Standalone deploy failing with: "Error in GnuTLS initialization: Error while performing self checks"
### Gate ~~green~~
* looks like a new issue on undercloud-upgrade
### Check Jobs
### master ongoing upstream RETRY bug https://bugs.launchpad.net/tripleo/+bug/1983817/comments/10
### wallaby c9
### wallaby c8
### train
### rhos17.1 on rhel9
### rhos17 on rhel9
### rhos17 on rhel8
### rhos16.2
### Upstream components
#### master
#### wallaby c9
#### wallaby c8
#### train
### Downstream components
#### rhos17 on rhel9
#### rhos17 on rhel8
#### rhos16.2
---
## 2022-08-09
### New bugs today
#### https://bugs.launchpad.net/tripleo/+bug/1984035 seen on cs8 wallaby while uploading the image
### Gate green
### Check Jobs
### master ongoing issue all branches https://bugs.launchpad.net/tripleo/+bug/1983817/comments/5
* vexx ticket: https://support.vexxhost.com/hc/en-us/requests/362706
* miss 1/35 recheck https://review.rdoproject.org/r/c/testproject/+/44460/2#message-1fa61d1a0d0fca87173d4b04842ebdfdb69d3c0d
* even with this we have to skip for https://code.engineering.redhat.com/gerrit/c/testproject/+/423882 (https://bugzilla.redhat.com/show_bug.cgi?id=2116287)
* another hash here also 1/35 https://review.rdoproject.org/r/c/testproject/+/44475 for more comparison so we can skip
* latest failures on OVB are all: "Failed to download metadata for repo 'quickstart-centos-appstreams'"
https://opendev.org/openstack/openstack-ansible-os_tempest/commit/f8c8a1ed6c59dcbf1fbe66d137d715e53af2ff51 looks like the latest commit taken by openstack-os_tempest ... marked as the master branch as of 07/13. https://opendev.org/openstack/openstack-ansible-os_tempest/graph shows other available commits. Maybe check with Arx on the impact of this change, or on moving the branch. Tried these jobs on IBM cloud: they fail at the same point.
### wallaby c9
* missing fs64 there https://review.rdoproject.org/r/c/testproject/+/44472
### wallaby c8
### train
* let's go with https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44477
* miss fs35 https://review.rdoproject.org/r/c/testproject/+/44442/4#message-829ecd515910cd3ac3dc1d9417243f3b1081b29d
* miss fs1 https://review.rdoproject.org/r/c/testproject/+/44462/3#message-a19d911453cfa39a2174e88a08dc3bf6c98c8dc1
### rhos17.1 on rhel9
* watching failed jobs with latest hash https://code.engineering.redhat.com/gerrit/c/testproject/+/286307
* still waiting for a response from kforde. Please check in with daniel tomorrow - node failures continue - no ovb stack
### rhos17 on rhel9
### rhos17 on rhel8
### rhos16.2
### Upstream components
#### master
#### wallaby c9
#### wallaby c8
#### train
### Downstream components
#### rhos17 on rhel9
#### rhos17 on rhel8
#### rhos16.2
* periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2 only the latest build failed, previous history is green, so rechecking.
* periodic-tripleo-ci-rhel-8-scenario007-standalone-network-rhos-16.2 build history is good, only one failure with retry_limit and one with node_failure, so rechecking.
* periodic-tripleo-ci-rhel-8-standalone-network-rhos-16.2 build history is good except one retry_limit and one node_failure, so rechecking.
* periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-network-rhos-16.2 right now failing with retry_limit, which is the real issue we are facing now: https://bugzilla.redhat.com/show_bug.cgi?id=2116287
* periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-network-rhos-16.2
(currently failing with node_failures/retry_limits)
```
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0fd6e83e4898: Failed to prepare to de
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap.EnabledServicesValue]: CREATE_IN_PROGRESS state changed
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap.EnabledServicesValue]: CREATE_COMPLETE state changed
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap]: CREATE_COMPLETE Stack CREATE completed successfully
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2]: CREATE_FAILED ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0fd6e83e4898: Failed to prepare to deploy: IPMI
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller]: UPDATE_FAILED Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0fd6e83e4898: Failed to
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap]: CREATE_COMPLETE state changed
2022-08-05 15:20:16 | 2022-08-05 15:20:02Z [overcloud.Controller]: CREATE_FAILED resources.Controller: Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0f
2022-08-05 15:20:16 | 2022-08-05 15:20:02Z [overcloud]: CREATE_FAILED Resource CREATE failed: resources.Controller: Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 27
```
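A quick way to pull the failing resources out of a deploy log like the excerpt above is sketched below. It is only a sketch: the regex assumes every event line has the `<local ts> | <event ts>Z [<resource>]: <STATUS> <reason>` shape seen here, and the names `EVENT_RE`, `failed_resources` and `SAMPLE` are made up for this note, not part of any tripleo tooling. The sample lines are copied from the excerpt, truncation included.

```python
import re

# Assumed line shape, taken from the excerpt above:
#   <local ts> | <event ts>Z [<resource>]: <STATUS> <reason>
EVENT_RE = re.compile(
    r"^\S+ \S+ \| (?P<ts>\S+ \S+Z) \[(?P<resource>[^\]]+)\]: "
    r"(?P<status>[A-Z_]+)\s+(?P<reason>.*)$"
)


def failed_resources(lines):
    """Return {resource: first failure reason} for *_FAILED events, in log order."""
    failures = {}
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match and match.group("status").endswith("_FAILED"):
            # Keep only the first reason seen for each resource.
            failures.setdefault(match.group("resource"), match.group("reason"))
    return failures


# Three lines copied from the log excerpt (the last one is truncated in the source).
SAMPLE = [
    '2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. , Code: 500"',
    '2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap]: CREATE_COMPLETE Stack CREATE completed successfully',
    '2022-08-05 15:20:16 | 2022-08-05 15:20:02Z [overcloud]: CREATE_FAILED Resource CREATE failed: resources.Controller: Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 27',
]

failures = failed_resources(SAMPLE)
for resource, reason in failures.items():
    print(f"{resource}: {reason[:80]}")
```

The first failing resource in log order is usually the one to read first, since the later `CREATE_FAILED` events on parent stacks mostly repeat its reason.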
### New/Transient no bug yet
* promoter missing images for c8/wallaby (holding on revert for https://review.rdoproject.org/r/c/rdo-jobs/+/44441 )
* http://promoter.rdoproject.org/promoter_logs/centos8_wallaby.log-20220807
* chkumar working on re-pushing images
---
## 2022-08-08
### New Bugs today:
#### https://bugs.launchpad.net/tripleo/+bug/1983817 periodic integration all branches RETRY Could not resolve host: mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org
#### https://bugzilla.redhat.com/show_bug.cgi?id=2116287 No promotions occurring downstream due to NODE_FAILURE since the weekend (7th August)
### Gate green
### Check Jobs
### master chasing master 3acadba0c3986b5a074088073042b411 https://review.rdoproject.org/r/c/testproject/+/44460 https://code.engineering.redhat.com/gerrit/c/testproject/+/423882
looks like a real issue on fs001 and fs035:
TASK [os_tempest : Ensure private network exists] 503 Service Unavailable
(example logs in testproject) https://logserver.rdoproject.org/60/44460/2/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/b2eee76/job-output.txt
### wallaby c9
Chasing failures at: https://review.rdoproject.org/r/c/testproject/+/36255
f40013fd0fe770cbca616a1b53663936 is missing: internal-kvm (rerun in https://code.engineering.redhat.com/gerrit/c/testproject/+/423882), fs020 and fs039 (two reruns in testproject above so can compare results)
### wallaby c8
Hash_under_test=https://trunk.rdoproject.org/api-centos8-wallaby/api/civotes_agg_detail.html?ref_hash=31a61a906cdcd8fccc322d597aa1dd5f
All tests passed ... **promoter is pushing containers but failing to find images**: http://promoter.rdoproject.org/promoter_logs/centos8_wallaby.log-20220807
pinged Amol and Chandan - pls follow up.
### train chasing train
* recheck https://review.rdoproject.org/r/c/testproject/+/44442/4#message-14062031d9b8e0b9bebbd95150f76ce3f09f2a74 (just fs35)
* chase second hash posted https://review.rdoproject.org/r/c/testproject/+/44462 e04837c814032381ae4e9d817c276a22 - rerunning with fs001 and scenario007
For hash: e04837c814032381ae4e9d817c276a22 - only missing fs001:
comparing two logs: https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/1f0a584/logs/undercloud/var/log/tempest/stestr_results.html.gz and https://logserver.rdoproject.org/62/44462/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/75ded79/logs/undercloud/var/log/tempest/stestr_results.html.gz
inconsistent test failures - **can skip promote this hash** (latest one to run in train)
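The "inconsistent failures" call above comes down to comparing the failed tests of two runs. A minimal sketch of that comparison follows, assuming the failed test ids have already been pulled out of each stestr results page by hand or by some other tool. The function names and the test ids are hypothetical, and "no test failed in both runs" is just one possible reading of "inconsistent", not an agreed promotion policy.

```python
def compare_failures(run_a, run_b):
    """Split failed test ids into those seen in both runs and those seen in one."""
    a, b = set(run_a), set(run_b)
    return {
        "both": sorted(a & b),         # repeated failures, more likely a real issue
        "only_first": sorted(a - b),   # failed only in the first run
        "only_second": sorted(b - a),  # failed only in the second run
    }


def looks_inconsistent(run_a, run_b):
    """True when no test failed in both runs (assumed criterion, see lead-in)."""
    return not (set(run_a) & set(run_b))


# Hypothetical test ids, for illustration only.
periodic_run = ["test_a", "test_b"]
testproject_run = ["test_c"]

result = compare_failures(periodic_run, testproject_run)
print(result)
print("inconsistent:", looks_inconsistent(periodic_run, testproject_run))
```

If `both` comes back non-empty, the repeated tests are the ones worth a bug before considering a skip.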
### rhos17.1 on rhel9
* rerunning a few jobs with old hash to promote https://code.engineering.redhat.com/gerrit/c/testproject/+/286307
(https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/buildset/6b594ee62d9d40729ebd1a7e6108e255)
### rhos17 on rhel9
* promoted yesterday (7th August)
### rhos17 on rhel8
* https://code.engineering.redhat.com/gerrit/c/testproject/+/209874
### rhos16.2
* rerunning a few jobs with old hash https://code.engineering.redhat.com/gerrit/c/testproject/+/276230
(https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/buildset/cd1bd2bee295456bb009084f3a2229d7)
### Upstream components
#### master
#### wallaby c9
#### wallaby c8
#### train
### Downstream components
#### rhos17 on rhel9
* no blockers
#### rhos17 on rhel8
* tripleo component lagging
* periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-tripleo-rhos-17 (posted test run - https://code.engineering.redhat.com/gerrit/c/testproject/+/266024)
* https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/pipeline_component-tripleo-pcci-17_dlrn-rhel-8.4-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/ (pinged attila - job having consistent failure)
#### rhos16.2
* network component lagging
* https://code.engineering.redhat.com/gerrit/c/testproject/+/266024
### New/Transient no bug yet
* ~~d/stream all builds BUILD_FAILURE (https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?skip=0)~~ https://bugzilla.redhat.com/show_bug.cgi?id=2116287
* unable to log in to horizon / is it down? (ping rhos-ops dpawlik)
* ~~u/stream all integration builds hit by Could not resolve host: mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org~~ https://bugs.launchpad.net/tripleo/+bug/1983817
* eg https://review.rdoproject.org/zuul/buildset/73812fd92e0f42978a461fd1cb890697
* 12:49 < ysandeep> marios|ruck, pojadhav|rover ^^ jfyi.. old stack cleanup script was disabled on vexx for debug (causing the list of old stack to grow and we start hitting "Quota exceeded for resources"), we just enabled it back.
---
## 2022-08-05
### New Bugs today:
#### https://bugzilla.redhat.com/show_bug.cgi?id=2115778
#### https://bugs.launchpad.net/tripleo/+bug/1983718 periodic master scen1 standalone fails/timeout 'manage firewall rules'
**Component: validations: - 12 days out**
https://review.rdoproject.org/r/c/testproject/+/44413
real issue: https://logserver.rdoproject.org/13/44413/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-component-master-validation/a965b91/job-output.txt
pinged Jiri
### Gate green https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029865.html
### Check Jobs
### master - blocked new bug https://bugs.launchpad.net/tripleo/+bug/1983718
### wallaby c9 - promoted today
### wallaby c8
### train - chasing https://review.rdoproject.org/r/c/testproject/+/44442 Run missing jobs train 8cf307aefe47066dfa2b89be39b174f8 && https://code.engineering.redhat.com/gerrit/c/testproject/+/423713 Run periodic-tripleo-ci-centos-8-scenario010-kvm-internal-standalone-train
* train fs035 may have a legit issue now: see https://logserver.rdoproject.org/42/44442/4/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-train/d31038e/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz. Pls bug if it repeats
### rhos17.1 on rhel9
* Testing a patch : https://code.engineering.redhat.com/gerrit/c/testproject/+/420272
(failed sc01 and sc02 standalone with same issue ceilometer_gnocchi_upgrade)
### rhos17 on rhel9
* https://code.engineering.redhat.com/gerrit/c/testproject/+/286307
### rhos17 on rhel8
* https://code.engineering.redhat.com/gerrit/c/testproject/+/286307
### rhos16.2
* No blockers
### Upstream components
#### master
#### wallaby c9
#### wallaby c8
#### train
### Downstream components
#### rhos17 on rhel9
* network component lagging
* periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset001-internal-network-rhos-17
* periodic-tripleo-ci-rhel-9-scenario010-standalone-network-rhos-17
#### rhos17 on rhel8
* reported a bug for tripleo component https://bugzilla.redhat.com/show_bug.cgi?id=2115778
Tried https://code.engineering.redhat.com/gerrit/c/testproject/+/418637 with depends-on: https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/423661. Looks like that setting is on the controller but the same error is still there. Do we need to set it somewhere else?
#### rhos16.2
* network component lagging - https://code.engineering.redhat.com/gerrit/c/testproject/+/266024/
## 2022-08-04
previous ruck|rover shift - notes @ https://hackmd.io/H9CSoXvlTm6nTZ4bsJkeRg