# Ruck Rover 2022-07-01 to 2022-07-07

###### tags: `ruck_rover`

###### Previous RR notes: https://hackmd.io/9hv3vTNlST2rw014LSDqcg

###### Next RR notes: TODO

[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)
[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)
[OpenStack Program Meeting 2022](https://docs.google.com/document/d/1n6ArkMh68R9zivjlyGbpedkggk1wMwEIcrMZSN2uIjc/edit)

---

## July 07th (Thursday)

Vexxhost low performance/network issues are back, with random failures all over. Going crazy as ruck/rovers :D
PSI was down this morning but is back now.

#### Gate

* https://review.opendev.org/c/openstack/puppet-tripleo/+/848733/3#message-b8a44d79982074120be1fdc0652de3217f755c90 - content-provider failure - interesting, we should debug this further.
* https://review.opendev.org/c/openstack/tripleo-ansible/+/848744/ - random infra issue

#### Integration line jobs

##### Master - 04th July

The hash from yesterday, 01a799f708f8c3c38d85e6a481481735, is out by only 2 jobs: OVB fs001/fs064 failed, mostly due to mirror issues:
https://logserver.rdoproject.org/48/38348/22/check/periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp_1supp-featureset064-master/d9b844b/logs/undercloud/home/zuul/overcloud1_deploy.log.txt.gz

~~~
2022-07-07 04:50:38 | 2022-07-07 04:50:38.793548 | fa163ee4-9156-4545-acd6-00000000079a | FATAL | Install tuned | overcloud1-controller-0 | error={"changed": false, "msg": "Failed to download metadata for repo 'quickstart-centos-crb': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried", "rc": 1, "results": []}
~~~

Rerunning the latest hash here: https://review.rdoproject.org/r/c/testproject/+/38348

##### Wallaby - Promoted on 05th July

* fs001/035/sc010 - b1c1cc02bd992f4ac8c75a830adefd23
    * Rerunning here: https://review.rdoproject.org/r/c/testproject/+/39357
    * sc010 internal: https://code.engineering.redhat.com/gerrit/c/testproject/+/201382

##### Wallaby c8 - Promoted on 05th July - Green

* 0038058798919720edb4da904876798d passed everything - will promote any time now.

##### Train - promoted 06th July - No new hash this morning - Green

##### 17/9 - promoted 06th July

Line running right now.

##### 17/8 - promoted 05th July

Line running right now.

##### 16.2/8 - promoted 05th July

Only out by periodic-tripleo-ci-rhel-8-standalone-full-tempest-api-rhos-16.2 (tempest failure in the last run).
full-tempest-api failed; rerunning here: https://code.engineering.redhat.com/gerrit/c/testproject/+/209874
ysandeep will open a bug.

#### Component line jobs

##### master - rerunning here: https://review.rdoproject.org/r/c/testproject/+/28446

* clients
* tripleo
* ~~common~~

##### wallaby - rerunning here: https://review.rdoproject.org/r/c/testproject/+/36254

* common
* network
* glance - running right now

##### C8 wallaby - rerunning here: https://review.rdoproject.org/r/c/testproject/+/36254

* network

##### C8 Train - rerunning here: https://review.rdoproject.org/r/c/testproject/+/36254

* network - sc010 internal - PSI is currently down
* tripleo

##### 17/9

* network - sc010 - tempest: https://sf.hosted.upshift.rdu2.redhat.com/logs/58/398758/51/check/periodic-tripleo-ci-rhel-9-scenario010-standalone-network-rhos-17/5f2ec48/logs/undercloud/var/log/tempest/failing_tests.log
    * https://code.engineering.redhat.com/gerrit/c/testproject/+/201382

New content arrived today, but we promoted yesterday:

* client
* tempest
* tripleo

---

### Known New Bugs

* Deployment randomly fails on keystone-related tasks - https://bugs.launchpad.net/tripleo/+bug/1980918
* CI jobs randomly fail while trying to start `*_db_sync` containers - https://bugs.launchpad.net/tripleo/+bug/1980921
* RHOSP 16.2: Neutron QoS-related tempest test failure, with a traceback observed in the neutron server log: "ERROR neutron.plugins.ml2.managers TypeError: qos_del() got an unexpected keyword argument 'if_exists'" - https://bugzilla.redhat.com/show_bug.cgi?id=2104931
* [tripleo-ci-centos-9-standalone and multinode-ipa are failing the test_minimum_basic_instance_hard_reboot_after_vol_snap_deletion test - Failed to find floating IP](https://bugs.launchpad.net/tripleo/+bug/1980255)
    * In skiplist
    * Need to debug the cause
* [ovn-dbs-bundle fails to start because ovn-ctl crashes with coredump generated](https://bugs.launchpad.net/tripleo/+bug/1979276) - puppet-glance-tripleo-standalone job failing; tripleo jobs not affected

### Intermittent Failures

### Fixed

* sc01 17/8
    * ~~https://bugzilla.redhat.com/show_bug.cgi?id=2104535~~
    * ~~https://bugs.launchpad.net/tripleo/+bug/1980869~~
    * ~~https://bugzilla.redhat.com/show_bug.cgi?id=2000535 sc01 failing in downstream~~
* ~~[openstack-tox-tht failing with: No module named 'testtools'](https://bugs.launchpad.net/tripleo/+bug/1980552)~~
    * Fix merged in master and wallaby
        * ~~https://review.opendev.org/c/openstack/tripleo-heat-templates/+/848474~~
* ~~[Error overcloud network provision failed](https://bugs.launchpad.net/tripleo/+bug/1980333) - Blocking master promotion~~
    * Fix merged:
        * ~~https://review.opendev.org/c/openstack/neutron/+/848396~~

---

### Older notes

---

## July 06th (Wednesday)

#### Gate

Looks good; the following are in rerun:

* ~~https://review.opendev.org/c/openstack/tripleo-ansible/+/847927/~~
* ~~https://review.opendev.org/c/openstack/tripleo-heat-templates/+/830280/~~

#### Integration lines

##### Master - Promoted on 04th July

* fs001/fs064 - pending: https://review.rdoproject.org/r/c/testproject/+/38348

##### Wallaby - Promoted on 05th July

Will check after today's run.

##### Wallaby c8 - Promoted on 05th July - All green, no new hash

##### Train - 03rd July

fs035 - rerunning here: https://review.rdoproject.org/r/c/testproject/+/32054
Skipping fs035 and promoting: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/43903 - need to revert once train promotes.

##### 17/9

sc001 legit bug:

* https://bugzilla.redhat.com/show_bug.cgi?id=2000535
* /me proposed a fix: https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/269923 - we had to disable the dashboard for scenario001 previously (https://bugzilla.redhat.com/show_bug.cgi?id=2000535)

Promoting the fix through the tripleo line:
https://code.engineering.redhat.com/gerrit/c/testproject/+/276230
https://code.engineering.redhat.com/gerrit/c/testproject/+/211643

##### 17/8 - 05th July

If the testproject run passes (https://code.engineering.redhat.com/gerrit/c/testproject/+/201382), merge the environment patch: https://code.engineering.redhat.com/gerrit/c/tripleo-environments/+/418602

##### 16.2 - 05th July

full-tempest-api failed; rerunning here: https://code.engineering.redhat.com/gerrit/c/testproject/+/209874

#### Component lines

##### wallaby - rerunning here: https://review.rdoproject.org/r/c/testproject/+/21853

* network
* clients

##### C8 wallaby

* baremetal
* tripleo
* network

Running here: https://review.rdoproject.org/r/c/testproject/+/43466

---

## 05th July (Tuesday)

#### Gate

Looks good.

### Integration lines

#### Master - Promoted on 04th July

* sc010: https://code.engineering.redhat.com/gerrit/c/testproject/+/211643
* All OVB jobs - mirror issues + random tempest test failures
    * ~~fs001~~/~~020~~/035/~~039~~
    * Rerunning here: https://review.rdoproject.org/r/c/testproject/+/38348

#### Wallaby - Promoted on 04th July

* fs035/~~39~~: https://review.rdoproject.org/r/c/testproject/+/39357
* ~~sc010: https://code.engineering.redhat.com/gerrit/c/testproject/+/211643~~ passed in rerun

#### Wallaby C8 - Promoted on 30th June

* fs035 tempest failure: https://review.rdoproject.org/r/c/testproject/+/43466

#### Train - Promoted on 03rd July

* ~~fs001~~/035 - tempest test: https://review.rdoproject.org/r/c/testproject/+/32054

#### 17/9 - Promoted on 04th July - No new hash as of this morning; line running right now.
#### 17/8 - wohoo \o/ promoted today - 05th July - All green

* ~~rerunning jenkins jobs~~ passed in rerun, should promote soon

#### 16.2 - wohoo \o/ promoted today - 05th July - All green

* ~~rerunning jenkins jobs~~ passed in rerun, should promote soon

### Component lines

#### master

* ~~Baremetal - ovb~~
* ~~common - rerunning~~
* ~~network - tempest-api~~
* ~~tripleo - ipa~~

Rerunning here: https://review.rdoproject.org/r/c/testproject/+/28446/

#### wallaby - All green :)

Retriggering a few components which got new builds in the last few hours, as per https://trunk.rdoproject.org/centos9-wallaby/report.html

* Promoted by 23:00 (note: vexxhost had an outage and the OVB jobs needed a rerun; we would have missed the deadline, so they were skipped in the criteria and are being rerun)
    * tripleo
    * common
    * clients
    * baremetal
    * network (rerunning rev'ed hash)
* Reran and promoted the tripleo component on rhos-17 on rhel-9: the standalone-scenario001 failure in the integration line required that fix. Rekicked the integration line.
* tripleo: https://review.rdoproject.org/r/c/testproject/+/42657
* network: https://review.rdoproject.org/r/c/testproject/+/21853, https://review.rdoproject.org/r/c/testproject/+/31165 (rerun)
* Tempest: https://review.rdoproject.org/r/c/testproject/+/36254
* c8 component promotion: https://review.rdoproject.org/r/c/testproject/+/43113
* compute - c9: https://review.rdoproject.org/r/c/testproject/+/39960

#### wallaby C8 - Green

* ~~network - ovb - rerunning here: https://review.rdoproject.org/r/c/testproject/+/43466~~
* ~~tripleo - ovb - failure in deployment~~ rerunning here: https://review.rdoproject.org/r/c/testproject/+/28446/

#### train - All green :)

#### 17/9 - Green

Jenkins jobs failed, but they are not in the criteria.

#### 17/8 - Green

~~Multiple components are blocked due to jenkins jobs; retriggered them as the NTP issue seems to be fixed.~~

* cinder
* clients
* common
* manila
* network
* octavia
* swift
* tripleo
* ui
* validation
* BM failed - not in criteria - need to debug

#### 16.2 - Green

* ~~network - jenkins job~~
* ~~Octavia - Jenkins and sc010 job~~
* ~~tripleo - jenkins job~~ jenkins job retriggered

~~sc010 rerunning here: https://code.engineering.redhat.com/gerrit/c/testproject/+/201382~~

---

## 2022-07-04 (Monday)

##### Gate - looks good

~~https://review.opendev.org/c/openstack/puppet-tripleo/+/848289/ in recheck~~ merged

#### Integration lines

Focusing on promoting master/wallaby - 6 days old now - no known bug - recheck dance ongoing.

* Master: woot, promoted today
    * Trying to promote hash 48f1c32c733e513ed9b5ca2e67a8ed3e - down to 2 jobs
        * Testproject: https://review.rdoproject.org/r/c/testproject/+/39357
    * Latest hash also in rerun here:
        * https://review.rdoproject.org/r/c/testproject/+/38348
        * https://code.engineering.redhat.com/gerrit/c/testproject/+/211643
* Wallaby - Woot, promoted today \o/
    * ~~trying to promote hash https://review.rdoproject.org/r/c/testproject/+/39357 - down to 1 job~~
    * ~~Testproject: https://review.rdoproject.org/r/c/testproject/+/38348~~
* wallaby c8 - promoted on 30th June
    * OVB failure (001/035) - random tempest
    * Rerunning here: https://review.rdoproject.org/r/c/testproject/+/43466
* train - promoted on 03rd July - No new content - All green
* 17/9 - promoted on 03rd July
    * Today's failed jobs in rerun: https://code.engineering.redhat.com/gerrit/c/testproject/+/201382
* 17/8 - promoted on 01st July
    * Jenkins jobs blocking promotion - retriggered
    * Failed with NTP issue - pinged psedlak; he said fhubik is looking into this.
        * https://bugzilla.redhat.com/show_bug.cgi?id=2103608#c4
* 16.2 - promoted on 28th June
    * https://bugzilla.redhat.com/2103117
    * Pinged #rhos-delivery to revert to the older version of dogpile cache.
    * Shrestha retagged the older version
    * Rekicked the 16.2 line - let's see in a few hours.
#### Component lines

* Master
    * ~~Baremetal - ovb failed~~
        * passed in rerun
* Wallaby - All green now
    * ~~Tripleo - rerunning failed jobs.~~ passed in rerun
* wallaby Centos-8
    * ~~network - ovb failure - random tempest~~
        * ~~Rerunning here: https://review.rdoproject.org/r/c/testproject/+/43466~~
* train - All green now
    * ~~network and octavia - sc010 jobs failed~~ passed in rerun
* 17/9 - All green - a few jenkins failures (but they are not in the criteria - are they supposed to be?)
* 17/8
    * jenkins - NTP issue
        * cinder
        * clients
        * common
        * manila
        * network
        * octavia
        * swift
        * tripleo
        * ui
        * validation
    * BM failed - not in criteria - will check
* 16.2
    * Components below in rerun:
        * https://code.engineering.redhat.com/gerrit/c/testproject/+/209874
    * network - jenkins and sc010
    * security
    * tripleo - jenkins

## 2022-07-03 (Weekend - Sunday)

##### Gate - Green

* https://review.opendev.org/c/openstack/tripleo-heat-templates/+/845215/ - mirror issue - rechecked - merged
* https://review.opendev.org/c/openstack/tripleo-ansible/+/833973/ - same tempest tests failing randomly (keeping an eye on this failure before reporting a bug) - merged
    * https://8e4a691f2753b7d79154-0855c69ab3b618fcee1eae2bcd2375e7.ssl.cf5.rackcdn.com/833973/1/gate/tripleo-ci-centos-9-standalone/82c8e02/logs/undercloud/var/log/tempest/failing_tests.log
* https://review.opendev.org/c/openstack/python-tripleoclient/+/846937/3#message-6e2b5a954c4eedee643e79dc93ee87db56965ad7 - weird issue, haven't seen it in other jobs - rechecked - merged

##### Periodic

Master

* Network component promoted, which will bring in the OVB network provision fix
* Master line running right now - let's see in a few hours
* Rerunning failed jobs here: https://review.rdoproject.org/r/c/testproject/+/39357 - mostly OVB tempest failures

Wallaby

* Rerunning failed jobs
    * https://review.rdoproject.org/r/c/testproject/+/38348/9/zuul.yaml
* sc010 kvm internal job failing because of the NTP issue:
    * https://sf.hosted.upshift.rdu2.redhat.com/logs/22/43822/12/check-rdo/periodic-tripleo-ci-centos-9-scenario010-kvm-internal-standalone-wallaby/63b810e/logs/undercloud/home/zuul/standalone_deploy.log
    * Jobs could not resolve clock.corp.redhat.com because there was no internal DNS entry in /etc/resolv.conf
    * Fix: https://code.engineering.redhat.com/gerrit/c/openstack/tripleo-ci-internal-config/+/418194
    * Passed after the above fix.

Wallaby c9 component: only tripleo is behind; rerunning the failed job: https://review.rdoproject.org/r/c/testproject/+/39357

Train

* OVB fs001 failed - rerunning
    * https://review.rdoproject.org/r/c/testproject/+/21853
* sc010 kvm failed due to the NTP issue - passed after https://code.engineering.redhat.com/gerrit/c/openstack/tripleo-ci-internal-config/+/418194
* Wohoo, train promoted

17/9

* sc010 - rerunning here: https://code.engineering.redhat.com/gerrit/c/testproject/+/211643
* Components are green
* Wohoo, 17/9 promoted

---

## 2022-07-02 (Weekend - Saturday)

#### Gate - resolved now

* [openstack-tox-tht failing with: No module named 'testtools'](https://bugs.launchpad.net/tripleo/+bug/1980552)
    * Updated fixes:
        * https://review.opendev.org/c/openstack/tripleo-heat-templates/+/848474
        * https://review.opendev.org/c/openstack/tripleo-heat-templates/+/848490

Thanks to takashi/fmount for reviewing on the weekend to unblock the gate.
Rechecked all the patches which were blocked by tox-tht.

#### Periodic master - still OVB issue - fix will come via the network component.
* Network fix still hanging - still in the component line
    * https://review.opendev.org/c/openstack/neutron/+/848396
* Rerunning failed network jobs
    * https://review.rdoproject.org/r/c/testproject/+/32054
* Force promoted the tripleo component for master, ignoring fs001 (already discussed with Ronelle on Friday)
    * https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/43832

#### Downstream

##### 17/9

* https://bugzilla.redhat.com/show_bug.cgi?id=2103117 solved for 17 after Jon reverted dogpile to the previous version
* Rekicked a few failing jobs for the 17 line (seems to be a random failure) - https://code.engineering.redhat.com/gerrit/c/testproject/+/276230
* sc012 failure looks weird - why is a CentOS repo used downstream? Will check on Monday
    * https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-17-rhel9/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-9-scenario012-standalone-rhos-17/1d78836/job-output.txt
* 17/9 promoted - wohoo

##### 17/9 components

tripleo/common have failed jobs - retriggered them here: https://code.engineering.redhat.com/gerrit/c/testproject/+/209874 (passed in rerun)

##### 16.2

All jobs are failing - same dogpile issue:
https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-integration-rhos-16.2/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-rhos-16.2/00f4b75/logs/undercloud/var/log/extra/errors.txt

Updated https://bugzilla.redhat.com/show_bug.cgi?id=2103117 and the CIX card: https://trello.com/c/mzY3eUif/2608-cixbz2103117osp17rhel9deployment-failing-during-keystonebootstrap-error-keystone-typeerror-set-got-an-unexpected-keyword-argumen

## 2022-07-01

### Reruns and Investigations:

(crossed out := known bugs recorded above)

**NOTE:** Watch for running `testproject` jobs on https://review.rdoproject.org/zuul/status and https://sf.hosted.upshift.rdu2.redhat.com/zuul/t/tripleo-ci-internal/status.
* Request from #rdo
    * They want to bump tempest
        * https://review.rdoproject.org/r/c/rdoinfo/+/43785
    * Testing the bump with testproject
        * https://review.rdoproject.org/r/c/testproject/+/42693
* Gate failure
    * All gate failures passed in rerun (rechecked everything which failed in the last 2 days) - no new issue observed.
* [C9 Master](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/2tivP9BWz/component-pipeline?orgId=1):
    * Last promotion: 28th
    * Known OVB network provision issue blocking promotion.
    * Tripleo component jobs in rerun
        * https://review.rdoproject.org/r/c/testproject/+/39357
        * We will skip fs001 and promote tripleo via a test patch in case fs001 fails (the tripleo component is 15 days old now).
    * https://review.rdoproject.org/r/c/testproject/+/32054 - trying to promote the network component
* [C9 Wallaby](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/UcMR8py7z/component-pipeline-wallaby-centos9?orgId=1):
    * Last promotion: 28th
    * Line running right now - will check in a few hours
    * Component line - tripleo component in rerun: https://review.rdoproject.org/r/c/testproject/+/39357
* [C8 Wallaby](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/ZLvFbT9Mz/component-pipeline-wallaby?orgId=1):
    * Last promotion: 30th
    * OVB jobs failed - not rerunning as the promotion is recent
* [C8 Train](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/aJeQpzVGz/component-pipeline-train?orgId=1):
    * Two jobs failed - in rerun: https://review.rdoproject.org/r/c/testproject/+/21853
    * Component line - only octavia lagging due to a node failure.
        * In recheck - https://review.rdoproject.org/r/c/tripleo-downstream-trigger-nested-virt/+/43534
* [RHEL9 OSP17](http://tripleo-cockpit.lab4.eng.bos.redhat.com/d/lF7RUpsnk/rhel9-rhos17-full-component-pipeline?orgId=1):
    * https://bugzilla.redhat.com/show_bug.cgi?id=2103117
        * Works fine with the older dogpile cache - worked with Herve Beraud and Jon Schulter to revert to the older known-good version.
* [RHEL8 OSP17](http://tripleo-cockpit.usersys.redhat.com/d/v8gltz4Mz/rhos-17-full-component-pipeline?orgId=1):
    * Last promotion: 30th
    * A few jobs failed - in rerun
        * https://code.engineering.redhat.com/gerrit/c/testproject/+/276230
* [RHEL8 OSP16.2](http://tripleo-cockpit.usersys.redhat.com/d/KyHCwLHMk/rhos-16-2-full-component-pipeline?orgId=1):
    * Rerunning the BM job (did a workaround on the host; we are past the older failure): https://code.engineering.redhat.com/gerrit/c/testproject/+/211643
        * https://sf.hosted.upshift.rdu2.redhat.com/logs/43/211643/154/check/periodic-tripleo-ci-rhel-8-bm_envD-3ctlr_1comp-featureset035-rhos-16.2/c3bd266/logs/undercloud/home/zuul/undercloud_install.log