# Ruck Rover 2022-08-05 to 2022-08-11

###### tags: `ruck_rover`

###### Previous RR notes: https://hackmd.io/H9CSoXvlTm6nTZ4bsJkeRg

[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)
[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)
[OpenStack Program Meeting 2022](https://docs.google.com/document/d/1n6ArkMh68R9zivjlyGbpedkggk1wMwEIcrMZSN2uIjc/edit)
[Downstream promoter](http://10.0.110.143/promoter_logs/)

---

## 2022-08-11

### New/Transient no bug yet

### New bugs today

#### ~~https://bugs.launchpad.net/tripleo/+bug/1985031~~

tripleo-tox-molecule consistently failing with failed: [localhost] (item=https://images.rdoproject.org/centos8/train/rdo_trunk/current-tripleo/ironic-python-agent.tar)

### Gate

ongoing blocker https://bugs.launchpad.net/tripleo/+bug/1984175/comments/9

### Check Jobs

### master

blocked cix https://bugs.launchpad.net/bugs/1984184

### wallaby c9

blocked cix https://bugs.launchpad.net/tripleo/+bug/1984453

### wallaby c8

chasing https://review.rdoproject.org/r/c/testproject/+/44517 Run missing fs1 job for wallaby/8 c36ce5aec4d97093b683c04f3ee56212

### train

chasing https://review.rdoproject.org/r/c/testproject/+/44516 Run missing train jobs for 29018450b497197a282c6d4463b4c745

### rhos17.1 on rhel9

#### https://bugzilla.redhat.com/show_bug.cgi?id=2116287

* major blocker in d/stream atm across all releases. We haven't promoted a single line due to this infra issue.
* chasing for promotion : https://code.engineering.redhat.com/gerrit/c/testproject/+/286307 (https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/buildset/d1d6dfe2a5b14ca9ab1830a33deee460)

### Promotions : (https://docs.google.com/document/d/1n6ArkMh68R9zivjlyGbpedkggk1wMwEIcrMZSN2uIjc/edit#heading=h.hqhtw5tvhd63)

* OSP 16.2 RHEL-8 promoted on Aug 4, 2022
* OSP 17 RHEL-8 Promoted on Aug 5, 2022
* OSP-17 RHEL-9 Promoted on Aug 7, 2022
* OSP-17.1 RHEL-9 Promoted on July 27, 2022

### rhos17 on rhel9

* chasing for promotion : https://code.engineering.redhat.com/gerrit/c/testproject/+/209874

### rhos16.2

* chasing for promotion : https://code.engineering.redhat.com/gerrit/c/testproject/+/276230

### Upstream components

#### master

#### wallaby

#### wallaby c8

#### train

### Downstream components

#### rhos17 on rhel9

* network and tripleo components are promoted on 6 Aug
* now cloudops and tempest components are lagging - not consistent issues so rerunning with testproject : https://code.engineering.redhat.com/gerrit/c/testproject/+/266024

#### rhos16.2

* network and tripleo still lagging due to fs01 with node_failures atm.
* rerunning sc10 as the issue is not a consistent one : https://code.engineering.redhat.com/gerrit/c/testproject/+/266024

---

## 2022-08-10

### New/Transient no bug yet

* ~~https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-undercloud-upgrade&skip=0~~ https://bugs.launchpad.net/tripleo/+bug/1984175

### New bugs today

#### https://bugs.launchpad.net/tripleo/+bug/1984175

tripleo-ci-centos-9-undercloud-upgrade - cannot install both NetworkManager-1:1.39.5-1.el9.x86_64 and NetworkManager-1:1.39.12

#### https://bugs.launchpad.net/tripleo/+bug/1984453

periodic-tripleo-ci-centos-9-ovb-1ctlr_1comp-featureset002-master is failing image build - dependency fence-agents-common = 4.10.0-28.el9

#### https://bugs.launchpad.net/tripleo/+bug/1984184

fs001 and fs035 OVB jobs failing to set up private network for os_tempest (patch under test https://review.rdoproject.org/r/c/testproject/+/36254)

#### (NON VOTING) https://bugs.launchpad.net/tripleo/+bug/1984237

[FIPS] Standalone deploy failing with: "Error in GnuTLS initialization: Error while performing self checks"

### Gate

~~green~~

* looks like new issue undercloud-upgrade

### Check Jobs

### master

ongoing upstream RETRY bug https://bugs.launchpad.net/tripleo/+bug/1983817/comments/10

### wallaby c9

### wallaby c8

### train

### rhos17.1 on rhel9

### rhos17 on rhel9

### rhos17 on rhel8

### rhos16.2

### Upstream components

#### master

#### wallaby c9

#### wallaby c8

#### train

### Downstream components

#### rhos17 on rhel9

#### rhos17 on rhel8

#### rhos16.2

---

## 2022-08-09

### New bugs today

#### https://bugs.launchpad.net/tripleo/+bug/1984035

seen on cs8 wallaby while uploading the image

### Gate

green

### Check Jobs

### master

ongoing issue all branches https://bugs.launchpad.net/tripleo/+bug/1983817/comments/5

* vexx ticket: https://support.vexxhost.com/hc/en-us/requests/362706
* miss 1/35 recheck https://review.rdoproject.org/r/c/testproject/+/44460/2#message-1fa61d1a0d0fca87173d4b04842ebdfdb69d3c0d
* even with this we have to skip for https://code.engineering.redhat.com/gerrit/c/testproject/+/423882 (https://bugzilla.redhat.com/show_bug.cgi?id=2116287)
* another hash here also 1/35 https://review.rdoproject.org/r/c/testproject/+/44475 for more comparison so we can skip
* latest failures on OVB are all: "Failed to download metadata for repo 'quickstart-centos-appstreams'"
  https://opendev.org/openstack/openstack-ansible-os_tempest/commit/f8c8a1ed6c59dcbf1fbe66d137d715e53af2ff51 looks like the latest commit taken by openstack-os_tempest ... marked as the master branch as of 07/13.
  https://opendev.org/openstack/openstack-ansible-os_tempest/graph shows other available commits. Maybe check with Arx on the impact of this change - or moving the branch.
  Tried these jobs on IBM cloud - fails at the same point.

### wallaby c9

* missing fs64 there https://review.rdoproject.org/r/c/testproject/+/44472

### wallaby c8

### train

* let's go with https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/44477
* miss fs35 https://review.rdoproject.org/r/c/testproject/+/44442/4#message-829ecd515910cd3ac3dc1d9417243f3b1081b29d
* miss fs1 https://review.rdoproject.org/r/c/testproject/+/44462/3#message-a19d911453cfa39a2174e88a08dc3bf6c98c8dc1

### rhos17.1 on rhel9

* watching failed jobs with latest hash https://code.engineering.redhat.com/gerrit/c/testproject/+/286307
* still waiting for a response from kforde. Please check in with daniel tomorrow - node failures continue - no ovb stack

### rhos17 on rhel9

### rhos17 on rhel8

### rhos16.2

### Upstream components

#### master

#### wallaby c9

#### wallaby c8

#### train

### Downstream components

#### rhos17 on rhel9

#### rhos17 on rhel8

#### rhos16.2

* periodic-tripleo-ci-rhel-8-scenario010-standalone-network-rhos-16.2 only the latest build failed, previous history is green, so rechecking.
* periodic-tripleo-ci-rhel-8-scenario007-standalone-network-rhos-16.2 build history is good, only one failure with retry-limit and one with node_failure, that's why rechecking.
* periodic-tripleo-ci-rhel-8-standalone-network-rhos-16.2 build history good except one retry limit and one node failure, that's why rechecking
* periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-network-rhos-16.2 right now failing with retry_limit, the real issue we are actually facing now. https://bugzilla.redhat.com/show_bug.cgi?id=2116287
* periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-network-rhos-16.2 (currently failing with node_failures/retry_limits)

```
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2.Controller]: CREATE_FAILED ResourceInError: resources.Controller: Went to status ERROR due to "Message: No valid host was found. , Code: 500"
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2]: CREATE_FAILED Resource CREATE failed: ResourceInError: resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0fd6e83e4898: Failed to prepare to de
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap.EnabledServicesValue]: CREATE_IN_PROGRESS state changed
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap.EnabledServicesValue]: CREATE_COMPLETE state changed
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap]: CREATE_COMPLETE Stack CREATE completed successfully
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller.2]: CREATE_FAILED ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0fd6e83e4898: Failed to prepare to deploy: IPMI
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.Controller]: UPDATE_FAILED Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0fd6e83e4898: Failed to
2022-08-05 15:20:16 | 2022-08-05 15:20:01Z [overcloud.ComputeIpListMap]: CREATE_COMPLETE state changed
2022-08-05 15:20:16 | 2022-08-05 15:20:02Z [overcloud.Controller]: CREATE_FAILED resources.Controller: Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 278b9381-5621-435c-9b8e-0f
2022-08-05 15:20:16 | 2022-08-05 15:20:02Z [overcloud]: CREATE_FAILED Resource CREATE failed: resources.Controller: Resource CREATE failed: ResourceInError: resources[2].resources.Controller: Went to status ERROR due to "Message: Build of instance 278b9381-5621-435c-9b8e-0fd6e83e4898 aborted: Failed to provision instance 27
```

### New/Transient no bug yet

* promoter missing images for c8/wallaby (holding on revert for https://review.rdoproject.org/r/c/rdo-jobs/+/44441)
* http://promoter.rdoproject.org/promoter_logs/centos8_wallaby.log-20220807
* chkumar working on re-push images

---

## 2022-08-08

### New Bugs today:

#### https://bugs.launchpad.net/tripleo/+bug/1983817

periodic integration all branches RETRY Could not resolve host: mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org

#### https://bugzilla.redhat.com/show_bug.cgi?id=2116287

No promotions have occurred downstream due to NODE_FAILURE since the weekend (7th August)

### Gate

green

### Check Jobs

### master

chasing master 3acadba0c3986b5a074088073042b411 https://review.rdoproject.org/r/c/testproject/+/44460 https://code.engineering.redhat.com/gerrit/c/testproject/+/423882

looks like a real issue on fs001 and fs035: TASK [os_tempest : Ensure private network exists] 503 Service Unavailable (example logs in testproject) https://logserver.rdoproject.org/60/44460/2/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master/b2eee76/job-output.txt
### wallaby c9

Chasing failures at: https://review.rdoproject.org/r/c/testproject/+/36255

f40013fd0fe770cbca616a1b53663936 is missing: internal-kvm (rerun in https://code.engineering.redhat.com/gerrit/c/testproject/+/423882), fs020 and fs039 (two reruns in testproject above so can compare results)

### wallaby c8

Hash_under_test=https://trunk.rdoproject.org/api-centos8-wallaby/api/civotes_agg_detail.html?ref_hash=31a61a906cdcd8fccc322d597aa1dd5f

All tests passed ... **promoter is pushing containers but failing to find images**: http://promoter.rdoproject.org/promoter_logs/centos8_wallaby.log-20220807

pinged Amol and Chandan - pls follow up.

### train

chasing train

* recheck https://review.rdoproject.org/r/c/testproject/+/44442/4#message-14062031d9b8e0b9bebbd95150f76ce3f09f2a74 (just fs35)
* chase second hash posted https://review.rdoproject.org/r/c/testproject/+/44462 04837c814032381ae4e9d817c276a22 - rerunning with fs001 and scenario007

For hash: e04837c814032381ae4e9d817c276a22 - only missing fs001: comparing two logs: https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/1f0a584/logs/undercloud/var/log/tempest/stestr_results.html.gz and https://logserver.rdoproject.org/62/44462/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/75ded79/logs/undercloud/var/log/tempest/stestr_results.html.gz

inconsistent test failures - **can skip promote this hash** (latest one to run in train)

### rhos17.1 on rhel9

* rerunning a few jobs with old hash to promote https://code.engineering.redhat.com/gerrit/c/testproject/+/286307 (https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/buildset/6b594ee62d9d40729ebd1a7e6108e255)

### rhos17 on rhel9

* promoted yesterday (7th August)

### rhos17 on rhel8

* https://code.engineering.redhat.com/gerrit/c/testproject/+/209874

### rhos16.2

* rerunning a few jobs with old hash https://code.engineering.redhat.com/gerrit/c/testproject/+/276230 (https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/buildset/cd1bd2bee295456bb009084f3a2229d7)

### Upstream components

#### master

#### wallaby c9

#### wallaby c8

#### train

### Downstream components

#### rhos17 on rhel9

* no blockers

#### rhos17 on rhel8

* tripleo component lagging
* periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-tripleo-rhos-17 (posted test run - https://code.engineering.redhat.com/gerrit/c/testproject/+/266024)
* https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/pipeline_component-tripleo-pcci-17_dlrn-rhel-8.4-virthost-3cont_2comp_3ceph-ipv4-geneve-ceph/ (pinged attila - job having consistent failure)

#### rhos16.2

* network component lagging
* https://code.engineering.redhat.com/gerrit/c/testproject/+/266024

### New/Transient no bug yet

* ~~d/stream all builds BUILD_FAILURE (https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/builds?skip=0)~~ https://bugzilla.redhat.com/show_bug.cgi?id=2116287
* unable to login horizon/down? (ping rhos-ops dpawlik)
* ~~u/stream all integration builds hit by Could not resolve host: mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org~~ https://bugs.launchpad.net/tripleo/+bug/1983817
* eg https://review.rdoproject.org/zuul/buildset/73812fd92e0f42978a461fd1cb890697
* 12:49 < ysandeep> marios|ruck, pojadhav|rover ^^ jfyi.. old stack cleanup script was disabled on vexx for debug (causing the list of old stack to grow and we start hitting "Quota exceeded for resources"), we just enabled it back.
---

## 2022-08-05

### New Bugs today:

#### https://bugzilla.redhat.com/show_bug.cgi?id=2115778

#### https://bugs.launchpad.net/tripleo/+bug/1983718

periodic master scen1 standalone fails/timeout 'manage firewall rules'

**Component: validations: - 12 days out** https://review.rdoproject.org/r/c/testproject/+/44413 real issue: https://logserver.rdoproject.org/13/44413/1/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-component-master-validation/a965b91/job-output.txt pinged Jiri

### Gate

green https://lists.openstack.org/pipermail/openstack-discuss/2022-August/029865.html

### Check Jobs

### master

* blocked new bug https://bugs.launchpad.net/tripleo/+bug/1983718

### wallaby c9

* promoted today

### wallaby c8

### train

* chasing https://review.rdoproject.org/r/c/testproject/+/44442 Run missing jobs train 8cf307aefe47066dfa2b89be39b174f8 && https://code.engineering.redhat.com/gerrit/c/testproject/+/423713 Run periodic-tripleo-ci-centos-8-scenario010-kvm-internal-standalone-train
* train fs035 may have a legit issue now: see https://logserver.rdoproject.org/42/44442/4/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-train/d31038e/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz. Pls file a bug if it repeats

### rhos17.1 on rhel9

* Testing a patch : https://code.engineering.redhat.com/gerrit/c/testproject/+/420272 (failed sc01 and sc02 standalone with same issue ceilometer_gnocchi_upgrade)

### rhos17 on rhel9

* https://code.engineering.redhat.com/gerrit/c/testproject/+/286307

### rhos17 on rhel8

* https://code.engineering.redhat.com/gerrit/c/testproject/+/286307

### rhos16.2

* No blockers

### Upstream components

#### master

#### wallaby c9

#### wallaby c8

#### train

### Downstream components

#### rhos17 on rhel9

* network component lagging
* periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset001-internal-network-rhos-17
* periodic-tripleo-ci-rhel-9-scenario010-standalone-network-rhos-17

#### rhos17 on rhel8

* reported a bug for tripleo component https://bugzilla.redhat.com/show_bug.cgi?id=2115778
  Tried https://code.engineering.redhat.com/gerrit/c/testproject/+/418637 with depends-on: https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/423661.
  Looks like that setting is on the controller but the same error is still there. Do we need to set it somewhere else?

#### rhos16.2

* network component lagging - https://code.engineering.redhat.com/gerrit/c/testproject/+/266024/

## 2022-08-04

previous ruck|rover shift - notes @ https://hackmd.io/H9CSoXvlTm6nTZ4bsJkeRg