# Ruck Rover - Mar 18th - 24th 2022
###### tags: `ruck_rover`
## Previous RR notes: https://hackmd.io/TBlzcsVTSv6nAKg9ZY-8cA
## Notes:
* wallaby c8 - https://review.rdoproject.org/r/c/testproject/+/35663 in rerun - missing only periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby. Ask amol to skip on promoter and promote this hash
* https://code.engineering.redhat.com/gerrit/c/testproject/+/190672/ rerun missing two jobs to promote 17
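
Whether a hash can promote, as in the two notes above, comes down to comparing the jobs in the promotion criteria with the jobs that already passed for that hash. A minimal Python sketch of that comparison follows. The function name, the `skip` parameter, and the standalone job name are illustrative assumptions, not the promoter's actual code or configuration; only the fs035 job name comes from the notes.

```python
def missing_jobs(criteria, passed, skip=()):
    """Return criteria jobs that have not passed and are not skipped, sorted."""
    return sorted(set(criteria) - set(passed) - set(skip))


# Only the fs035 job name is taken from the notes; the rest is hypothetical.
FS035 = "periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-wallaby"
criteria = [
    "periodic-tripleo-ci-centos-8-standalone-wallaby",  # assumed job name
    FS035,
]
passed = ["periodic-tripleo-ci-centos-8-standalone-wallaby"]

print(missing_jobs(criteria, passed))           # fs035 is the only job missing
print(missing_jobs(criteria, passed, [FS035]))  # empty once fs035 is skipped
```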
## Bugs
* https://bugs.launchpad.net/tripleo/+bug/1965426
* https://bugs.launchpad.net/bugs/1964940
* https://bugs.launchpad.net/tripleo/+bug/1965540
* https://bugzilla.redhat.com/2066264 - Downstream bug
* https://bugs.launchpad.net/tripleo/+bug/1966165
* ~~https://bugs.launchpad.net/tripleo/+bug/1965528~~
* ~~https://bugs.launchpad.net/tripleo/+bug/1965934~~
* ~~https://bugs.launchpad.net/tripleo/+bug/1965525~~
## March 24:
* Upstream
* Status of component lines as follows :
* **periodic-tripleo-ci-centos-9-standalone-full-tempest-api-network-master**
sporadic tempest test failures (different tests every time, based on the latest two failed runs)
* **periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001-network-master**
"tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_router_admin_state" failed once in latest run
* **periodic-tripleo-ci-centos-9-containers-multinode-network-master**
`OSError: [Errno 16] Device or resource busy: '/home/zuul/tripleo-deploy/undercloud/heat_launcher/tripleo_deploy-_oygmwuk'`, failed on the latest run only
* **periodic-tripleo-ci-centos-9-scenario007-standalone-network-wallaby**
failed in the latest run with "Error running container image prepare"
* Rerunning the failed jobs above, since the failures are not consistent, on this patch: https://review.rdoproject.org/r/c/testproject/+/33582
* Testing of failed component jobs for all stable branches : https://review.rdoproject.org/r/q/topic:testing+status:open
* Still having vexxhost issues
* https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-scenario001-standalone-train/2bc20b1/logs/undercloud/home/zuul/standalone_deploy.log.txt.gz
* Downstream
* osp16.2
* Promotions happened on March 22. Rekicked the failed jenkins job:
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/pipeline/ (Latest update: one job is green and the other is running well)
* Components :
* periodic-tripleo-ci-rhel-8-containers-multinode-network-rhos-16.2 : consistent failures of a bunch of tempest tests in the latest two runs.
(https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-containers-multinode-network-rhos-16.2/6e615ec/logs/undercloud/var/log/tempest/stestr_results.html)
* periodic-tripleo-ci-rhel-8-standalone-network-rhos-16.2 : the job fails consistently, but with different tempest tests each run (sporadic test failures). (https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-network-rhos-16.2/d95bf66/logs/undercloud/var/log/tempest/stestr_results.html)
* osp17
* Promotions happened on March 23. Retesting failed job here : https://code.engineering.redhat.com/gerrit/c/testproject/+/210197/91
* Components : no blockers on the component lines
* osp17 rhel9
* Promotions happened on March 23. No blockers.
* Components :
* periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset001-internal-clients-rhos-17 failed once (https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-clients/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-9-ovb-3ctlr_1comp-featureset001-internal-clients-rhos-17/542913d/job-output.txt)
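
The "consistent" versus "sporadic" calls above rest on comparing the failed tempest tests of successive runs: a test that fails in every run is consistent, anything else is sporadic. A minimal Python sketch of that comparison, assuming the failed test names were already extracted from each run's stestr results. The function name and the two `tempest.api.example` test names are made up for illustration.

```python
def classify_failures(runs):
    """Split failed tests into consistent (failed in every run) and sporadic.

    `runs` is a list of iterables, one per run, each holding failed test names.
    """
    failed_sets = [set(run) for run in runs]
    if not failed_sets:
        return set(), set()
    consistent = set.intersection(*failed_sets)
    sporadic = set.union(*failed_sets) - consistent
    return consistent, sporadic


# The first test name is quoted in the notes; the other two are hypothetical.
ROUTER_TEST = (
    "tempest.scenario.test_network_basic_ops."
    "TestNetworkBasicOps.test_update_router_admin_state"
)
run_a = {ROUTER_TEST, "tempest.api.example.TestA.test_one"}
run_b = {ROUTER_TEST, "tempest.api.example.TestB.test_two"}

consistent, sporadic = classify_failures([run_a, run_b])
print(sorted(consistent))  # tests failing in both runs
print(sorted(sporadic))    # tests failing in only one run
```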
## March 23:
* Upstream
* Removing fs035 from criteria for wallaby centos 8 and 9
* Rechecking those
* Recheck victoria, ussuri and master jobs
* Recheck https://review.opendev.org/c/openstack/tripleo-heat-templates/+/834285/ to confirm if https://bugs.launchpad.net/tripleo/+bug/1965934 was fixed
* Downstream
* 16.2 promoted on march 22
* green run of missing two jobs (https://code.engineering.redhat.com/gerrit/c/testproject/+/190672/) and 17 promoted on march 23
## March 22:
* Upstream
* https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/834628 to skip web_download test on other jobs
* Not only bad things happen on RR: we promoted Victoria!!!
* Containers were deleted from the rdo registry and copied again; need to merge https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/40311/ to automate the copies
* Still a lot of download mirror errors on train jobs; waiting for the queue to finish before re-running the failing ones
* Gate
* Multinode centos 7 failing
* https://bugs.launchpad.net/tripleo/+bug/1965934
* Downstream
* Rerunning the failed jobs that are promotion blockers, as there is no consistent failure:
https://code.engineering.redhat.com/gerrit/c/testproject/+/209874
## March 21:
* Downstream (Pooja pls check):
* 16.2: baremetal and validations components are both old
* rerunning: periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-rhos-16.2 and periodic-tripleo-rhel-8-upload-job-trigger-rhos-16.2 (jenkins jobs have not triggered since 03/15). Pls check in with attila/sandeep about that.
* 17
* Upstream
* Recheck master (one failure), ussuri, wallaby, victoria
* Talked with jpodivin about https://bugs.launchpad.net/tripleo/+bug/1965426. Right now it is somehow not being triggered anymore; he is still investigating:
* [09:16:47] <jpodivin> arxcruz|ruck: I do, although I'm not sure what to make of them.
[09:17:02] <jpodivin> The jobs are coming up green, which is good. but ...
[09:17:51] <jpodivin> The reason they are succeeding seems to be that the behavior triggering the failure "wrong branch being checked out" isn't happening anymore ...
[09:18:18] <jpodivin> because no branch of the project is being checked out.
[09:18:31] <jpodivin> And I don't think we have merged anything that could fix this.
[09:20:12] <jpodivin> arxcruz|ruck: compare successful (https://f38d3934203b991a7ebe-d701f6a98461967df274f184d5a7d3cd.ssl.cf2.rackcdn.com/827628/2/gate/tripleo-ci-centos-8-content-provider/2dcbb84/job-output.txt) and failing (https://60ce53b5f45ed0ef7bd3-e873feb845d99f2e0685947947034235.ssl.cf1.rackcdn.com/827628/2/gate/tripleo-ci-centos-8-content-provider/19276d3/job-output.txt) run to see what I mean
[09:21:00] <jpodivin> The one that succeeds never checks out the validations-libs component. At all. So it can't have a branch mismatch and neither the error
### Mar-18th:
* Promotion blockers
* https://bugs.launchpad.net/tripleo/+bug/1965525
* It is hard to tell whether this is related to mysql, keystone, or a dns issue, or whether all of them are related. TL;DR: some tests are failing because of an identity error, caused by keystone not being able to connect to mysql, although mysql seems to be fine.
* pinged tkajinum
* yatin is working on https://bugs.launchpad.net/bugs/1964940
* https://bugs.launchpad.net/tripleo/+bug/1965540
* Cinder test failing with "URI for web-downloaded does not pass filtering"
* https://bugs.launchpad.net/tripleo/+bug/1965546 - TestBasicOps failing when curling the metadata service.
* https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/834330 skipping
* Gate failures
* ~~https://bugs.launchpad.net/tripleo/+bug/1965528~~
* ~~Probably introduced by https://review.opendev.org/c/openstack/tripleo-ansible/+/832785/5/tripleo_ansible/roles/backup_and_restore/defaults/main.yml~~
* ~~Do not reference backup/restore config unless defined https://review.opendev.org/c/openstack/tripleo-ansible/+/834279~~
* ~~DNM Testing tripleo-ansible/+/834279 https://review.opendev.org/c/openstack/tripleo-heat-templates/+/834280~~
* https://bugs.launchpad.net/tripleo/+bug/1965426
* [14:23:47] <jpodivin> arxcruz|ruck: Hi, regarding the https://bugs.launchpad.net/tripleo/+bug/1965426 we have couple of patches in the RDO which should address the issue.
* downstream - dependency issue http://pastebin.test.redhat.com/1038119 (didn't file yet... which component? this is quickstart deps?)
### Mar-17th:
* Gate failure https://bugs.launchpad.net/tripleo/+bug/1965426
* happens only in ussuri or older branches
* affects validations jobs
* ping jpodivin or matbu tomorrow, I got no answers today
* Train C8 promoted, some components promoted
* Chasing promotion of a few old components, chasing master and wallaby c9
* Issues with mirrors still happening:
* <dviroel|ruck> guilhermesp_: hey o/ - today we start to face mirrors issues again, less then before the migration, but we still have jobs failing to download content
* <dviroel|ruck> guilhermesp_: "Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds"
* <guilhermesp_> yeah, i actually migrated to another old compute ( i was facing issues to live migrate it between those old systems to the new ones ) -- you can re-open the ticket and we can try to work on get a permanent solution. Maybe even taking a snapshot and relaunching it to a new hv if that not a lot of work on your side -- we'll see
* nhicher will re-open the ticket.
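
When triaging many failed jobs, it can help to confirm quickly whether a job log contains the mirror error quoted above. A minimal Python sketch, assuming the job output is already available as text. The function name and the sample log lines around the error are hypothetical; only the error message itself is taken from the notes.

```python
import re

# Matches the slow-download message quoted in the notes above.
SLOW_MIRROR = re.compile(
    r"Operation too slow\. Less than (\d+) bytes/sec transferred the last (\d+) seconds"
)


def find_slow_mirror_errors(log_text):
    """Return (line_number, bytes_per_sec, seconds) for each matching log line."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = SLOW_MIRROR.search(line)
        if match:
            hits.append((number, int(match.group(1)), int(match.group(2))))
    return hits


# Hypothetical log excerpt; only the middle line mirrors the real error text.
sample = "\n".join([
    "Downloading packages",
    "[Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]",
    "Complete!",
])
print(find_slow_mirror_errors(sample))  # [(2, 1000, 30)]
```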
### Mar-16th
* (doug): C8 wallaby promoted by skipping fs020 this time.
* It seems that vexxhost fixed the issues with mirrors by migrating their VMs (reopen 362150 if needed).
* There are some components that are too old; I triggered some testprojects to see how it goes. I noticed that fs001 was passing on the component line, so we should try to promote as many components as possible. Retriggered failed jobs: https://review.rdoproject.org/r/c/testproject/+/40642
* Container build failure on 16.2 needs investigation. (it is fixed now, mirror issue, https://code.engineering.redhat.com/gerrit/c/testproject/+/317875)
* Trying to promote Train C8 with 59c817e5bcaebfb5aeee50225b2dc5f2 (missing ovbs) - Already promoted
* https://bugs.launchpad.net/tripleo/+bug/1965124 mirror issues
* stable/wallaby pep8 broken? commented https://bugs.launchpad.net/tripleo/+bug/1964935/comments/3 & posted cherrypick https://review.opendev.org/c/openstack/tripleo-heat-templates/+/834012 (is merged now)
### Mar-15th
* (doug): criteria change was needed to promote wallaby-c9 (fs001). Still missing wallaby-c8 and train-c8, but unfortunately vexxhost issues are back: mirror and connectivity issues with nodes made a lot of jobs fail. (╯°□°)╯︵ ┻━┻. @chkumar we need to update program call doc tomorrow.
### Mar-14th
* (doug): Lots of failures related to mirrors. Nothing promoted so far; featureset035 is blocking wallaby and master C9 in previous hashes, and rerunning those jobs throws different errors every time. Last thing on the 14th: rerunning the failed master and wallaby c9 jobs. Some jobs are still failing due to mirrors, others with multiple tempest failures. featureset001 needs attention in almost all releases.
## Upstream
### Status:
* Gate:
* C9 main:
* C9 wallaby:
* C8 wallaby:
* C8 victoria:
* C8 ussuri:
* C8 train: Promoted Today
* C7 train: All jobs have passed, hope it promotes.
### Upstream Issues:
* CS9 master component jobs are failing due to missing containers from Registry
* https://bugs.launchpad.net/tripleo/+bug/1964457/ - https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/833027
* FS01 re-run for wallaby/victoria/ussuri - https://review.rdoproject.org/r/c/testproject/+/40525
* **Master failure**
* fs035 testing: https://review.rdoproject.org/r/c/testproject/+/40461
* Updated Depends-On with:
* https://review.opendev.org/c/openstack/tripleo-heat-templates/+/833708
* https://review.opendev.org/c/openstack/puppet-tripleo/+/833711
* with 831608: Make rescue, volume attachment compute tests to create SSH-able server | https://review.opendev.org/c/openstack/tempest/+/831608
* Failures :https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset035-master
* No pattern found - different failures every run
* As per takashi, https://bugs.launchpad.net/tripleo/+bug/1964824 might be related (rough guess)
* Added the logs there https://bugs.launchpad.net/tripleo/+bug/1964824/comments/7
* **Mirror Issue**
* https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-8-buildimage-overcloud-full-train/1fd6839/job-output.txt
* re-running cs8 train skipped/failed jobs here: https://review.rdoproject.org/r/c/testproject/+/40521
* More Jobs
* https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-9-standalone-master/b5b47fa/logs/undercloud/home/zuul/install_packages.sh.log.txt.gz
* **RETRY_LIMIT in ovb check jobs**
* Comments
* Asked on #rhos-ops with below details
* https://support.vexxhost.com/hc/en-us/requests/362100
* https://support.vexxhost.com/hc/en-us/requests/362141
* node_failures in component line https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE timestamp from 04:31:31 to 06:55:41 UTC
* Jobs with retry_limit
* `stack_status_reason: 'Resource CREATE failed: OperationalError: resources.baremetal_env.resources.openstack_baremetal_servers.resources[2].resources.baremetal_ports:` is the error
* Below is the stack id and job logs
* https://logserver.rdoproject.org/32/831932/6/openstack-check/tripleo-ci-centos-9-ovb-3ctlr_1comp-featureset001/9f37b22/job-output.txt
stack id: 177ce44b-f404-46bd-9869-888b7ce19ff6
* https://logserver.rdoproject.org/96/833196/1/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039/149d4c6/job-output.txt
stack id: 0f0d76cd-1923-4574-a199-002632bc998d
* https://logserver.rdoproject.org/22/832722/1/openstack-check/tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039/c36da1c/job-output.txt
stack id: b9431de7-975c-4401-8181-b28708dd87f2
* **Gate Failures**
* Already rechecked (if it still fails, need to log a bug)
* https://review.opendev.org/c/openstack/tripleo-heat-templates/+/832722/
* https://review.opendev.org/c/openstack/tripleo-heat-templates/+/833556/
* https://review.opendev.org/c/openstack/tripleo-heat-templates/+/818637/
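
The Zuul build listings linked above (filtered by `job_name` or `result`) could also be pulled as JSON and tallied. The Python sketch below only builds the query URL and counts results from an already decoded response, so it runs offline. It assumes the Zuul web UI at `https://review.rdoproject.org/zuul` serves its REST API under `/api/builds` with the same filters as the web page, and that each build entry carries a `result` field; the helper names and the sample response are hypothetical.

```python
from collections import Counter
from urllib.parse import urlencode

# Assumption: the REST API lives under /api next to the web UI.
ZUUL_API = "https://review.rdoproject.org/zuul/api"


def builds_url(job_name=None, result=None, limit=20, base=ZUUL_API):
    """Build a query URL for the builds endpoint, skipping unset filters."""
    params = {"job_name": job_name, "result": result, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}/builds?{query}"


def count_results(builds):
    """Count builds per result; `builds` is the decoded JSON list of builds."""
    return Counter(build.get("result") for build in builds)


print(builds_url(result="NODE_FAILURE"))

# Hypothetical decoded response, trimmed to the one field used here.
sample = [
    {"result": "SUCCESS"},
    {"result": "NODE_FAILURE"},
    {"result": "NODE_FAILURE"},
]
print(count_results(sample))
```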
---
## Downstream
### Status:
* RHOS 17:
* RHOS 16.2:
### Downstream Issues:
* Container build on 16-2 failing:
* https://sf.hosted.upshift.rdu2.redhat.com/logs/84/195884/144/check/periodic-tripleo-build-containers-ubi-8-internal-rhel-8-build-push-upload-rhos-16.2/594fe71/logs/build.log
---
## RDO
* sent a message to amoralej and jcapitao about this job failing - no responses yet
* https://jenkins-cloudsig-ci.apps.ocp.ci.centos.org/job/weirdo-victoria-centos8-promote-puppet-openstack-scenario003/
---
## Fixed:
* ~~tripleo-ansible-centos-8-molecule-tripleo_cephadm failure~~
* ~~Status: waiting for reviews~~
* ~~Patches: https://review.opendev.org/c/openstack/tripleo-ansible/+/833178~~
* ~~In gates~~
* container build is passing now
* https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-build-containers-centos-9-push-master/bbc6eb7/logs/build.log
* SSL Cert update happened that time
* re-running here https://review.rdoproject.org/r/c/testproject/+/36356/26#message-c89594e16604bb56277d91f2f61dd9b50a9ecb99
* periodic-tripleo-ci-build-containers-centos-9-push-master https://review.rdoproject.org/zuul/build/7eb5a6ac4cd94012a9b8017720f0558c : SUCCESS in 1h 05m 28s
* https://review.rdoproject.org/r/c/testproject/+/36356/26#message-bbe021d515d0c112bd69360b1927fb67396982b8
* **openstackclient>=5.2.0 conflicting pip dependencies**
* Status: under investigation
* LP: https://bugs.launchpad.net/tripleo/+bug/1964468
* CIX:
* Jobs:
* periodic-tripleo-ci-build-containers-centos-9-push-master
* Patches
* Comments:
* https://bugs.launchpad.net/tripleo/+bug/1964477 - Tengu is working on a fix that may be useful
* https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/833091
* dviroel: unable to reproduce in my c9 env
* better now on testproject (https://review.rdoproject.org/r/c/testproject/+/36356/26)
* at least the piece that was failing
* Upstream gate failure
* https://520c46833a3e12074454-e521a4dbcb573778e4bab95c0cd81671.ssl.cf2.rackcdn.com/831932/6/gate/tripleo-ci-centos-9-content-provider-wallaby/b8490fd/logs/delorean_logs/component/validation/91/63/9163d8bcbcf364db9c04323436de35fc3946b380_dev/rpmbuild.log.txt.gz
* `DEBUG: =========================
DEBUG: Failures during discovery
DEBUG: =========================
DEBUG: --- import errors ---
DEBUG: Failed to import test module: validations_libs.tests.callback_plugins.test_vf_fail_if_no_hosts
DEBUG: Traceback (most recent call last):
DEBUG: File "/usr/lib64/python3.9/unittest/loader.py", line 436, in _find_test_path
DEBUG: module = self._get_module_from_name(name)
DEBUG: File "/usr/lib64/python3.9/unittest/loader.py", line 377, in _get_module_from_name
DEBUG: __import__(name)
DEBUG: File "/builddir/build/BUILD/validations-libs-1.7.0.dev8/validations_libs/tests/callback_plugins/test_vf_fail_if_no_hosts.py", line 27, in <module>
DEBUG: from oslotest import base
DEBUG: ModuleNotFoundError: No module named 'oslotest'`
* might have been caused by this: https://github.com/openstack/validations-libs/commit/9163d8bcbcf364db9c04323436de35fc3946b380#diff-fac4c6890301d4de5c3f4266837803d5240c84a3d8b6c735bbc6a64c39d2f94eR16
* Recent run got cleared in check
* Based on this https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-content-provider-wallaby
* if it happens again, please file a bug and add python-oslotest as a BuildRequires (BR) in https://github.com/rdo-packages/validations-libs-distgit/blob/wallaby-rdo/python-validations-libs.spec
* It seems this job is passing https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-9-content-provider-wallaby&skip=0
* **Check Failures**
* https://bugs.launchpad.net/tripleo/+bug/1964595
* Blocking some patches like: https://review.opendev.org/c/openstack/tripleo-ansible/+/833182/
* https://bugs.launchpad.net/tripleo/+bug/1964530
* Comments:
* fultonj working on a fix
* https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario001-standalone
* https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario004-standalone
* Patch: https://review.opendev.org/c/openstack/tripleo-ansible/+/833182
* Merged, waiting for more runs
* **CS9 wallaby**
* Comments
* ~~re-running full-tempest-api and scenario here~~ ~~https://review.rdoproject.org/r/c/testproject/+/36255~~
* CS9 Tempest tests failing with "Host 'standalone.localdomain' is not mapped to any cell"
* https://bugs.launchpad.net/tripleo/+bug/1964269
* Affected jobs:
* periodic-tripleo-ci-centos-9-scenario007-multinode-oooq-container-master
* fs35
* Re-running failed jobs: https://review.rdoproject.org/r/c/testproject/+/40445
* fs035 passed
* centos-9-scenario001-standalone tripleo_ansible_inventory or deployed_metalsmith parameter is required [Fixed]
* https://bugs.launchpad.net/tripleo/+bug/1964530 (fultonj working on a fix)
* testing it here: https://review.opendev.org/c/openstack/tripleo-ci/+/833337
* CIX: https://trello.com/c/GLMsaCid/2401-cixlp1964530tripleociproa-centos-9-scenario001-standalone-tripleoansibleinventory-or-deployedmetalsmith-parameter-is-required
* affects all jobs that deploy ceph (scn001, scn004, scn010)
* Comments:
* we need tqe change that updates ceph container to 6.0.7 - but depends on wallaby tripleo-common too (see https://review.opendev.org/c/openstack/tripleo-ci/+/833337)
* ~~we depend on the https://bugs.launchpad.net/tripleo/+bug/1964595 fix too~~
* **ERROR** State in few jobs
* https://review.rdoproject.org/zuul/builds?result=ERROR
* https://review.rdoproject.org/zuul/build/7241289f87cf466caad03d39c48d6fee
* Error: Failed to update project ansible-collections/ansible.netcommon
* Need investigation
* No longer seen; removing it from here
* **Wallaby Failure**
* c8s DLRN stuck : Now it is fixed
* Failed due to mirror issue; re-running here: https://review.rdoproject.org/r/c/testproject/+/40440
* https://logserver.rdoproject.org/40/40440/2/check/periodic-tripleo-ci-centos-9-standalone-on-multinode-ipa-wallaby/8c11aa9/job-output.txt
* Still failing
* `Timeout was reached for http://mirror.regionone.vexxhost-nodepool-tripleo.rdoproject.org/centos-stream/9-stream/BaseOS/x86_64/os/Packages/crypto-policies-20220203-1.gitf03e75e.el9.noarch.rpm [Operation too slow. Less than 1000 bytes/sec transferred the last 30 seconds]", "[FAILED] crypto-policies-20220203-1.gitf03e75e.el9.noarch.rpm: No more mirrors to try - All mirrors were already tried without success", "", "The downloaded packages were saved in cache until the next successful transaction.", "You can remove cached packages by executing 'dnf clean packages'`
* Network issue on vexxhost
* **Gate Failure**
* tripleo-ci-centos-8-standalone-upgrade-victoria (1x)
* Tempest test failure (test_minimum_basic_scenario):
* "None matches Is(None): Failed to find floating IP '192.168.24.157' in server addresses: {'tempest-TestMinimumBasicScenario-1018522959-network'"
* https://22a8c80ab9bbcd6728f0-31bf408f42c27b0363a3847bc2706cfc.ssl.cf1.rackcdn.com/833331/1/gate/tripleo-ci-centos-8-standalone-upgrade-victoria/c3ec607/logs/undercloud/var/log/tempest/stestr_results.html
* Rechecked the job - fixed now
* https://review.opendev.org/c/openstack/tripleo-heat-templates/+/833331/1#message-d11b2fe589391abc292d67162be07786ee0c2b46