# Ruck and Rover notes #35
###### tags: `ruck_rover`
:::info
Important links for ruck/rover: [ruck/rover links to help](https://hackmd.io/07z0xroHTFi2IbX93P5ZfQ)
**Ruck Rover - Unified Sprint #35**
Dates: Oct 21 - Nov 11
TripleO CI team ruck|rover: sshnaidm, ysandeep, wes
OSP CI team ruck|rover: psedlak, aopincaru
Previous notes: https://hackmd.io/1qxCqYzATfudl1cKvaQ8-w
:::
[TOC]
---
## Ongoing issues
:::danger
## TripleO
### gate
* tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train job is failing on TASK [undercloud-deploy : Write containers-prepare-parameter.yaml] with error 'dict object' has no attribute 'registry_ip_address_branch'
https://bugs.launchpad.net/tripleo/+bug/1903980
* Sagi's patch https://review.opendev.org/#/c/761892/ (for https://bugs.launchpad.net/tripleo/+bug/1903498) will solve the issue
### periodic / 3rd party
**MASTER**
Last promotion - 12th Nov (today)
Last buildset - https://review.rdoproject.org/zuul/buildset/71d102a4e3874db39ac744c67221287c
Failing jobs (not in promotion criteria):
* periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master & periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master - Bug: https://bugs.launchpad.net/tripleo/+bug/1903508; CIX escalated to the security team; provided a reproducer to Martin today; patch is up
* periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master - Bug: https://bugs.launchpad.net/tripleo/+bug/1903033
**VICTORIA**
Last promotion - 10th Nov
Last buildset - https://review.rdoproject.org/zuul/buildset/f8e8812c49c14d2c8915af307d85731c
* periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-victoria (in criteria) & periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-victoria - Bug: https://bugs.launchpad.net/tripleo/+bug/1903508; CIX escalated to the security team; provided a reproducer to Martin today; patch is up - https://review.opendev.org/#/c/762497
**Ussuri**
Ussuri pipeline
Last promotion - 09th Nov
No bugs, no pattern of failure
Today's run: 2 jobs failed; reran them via testproject and they passed: https://review.rdoproject.org/r/#/c/26217/
**Train**
c8
Last promotion - 11th Nov
https://review.rdoproject.org/zuul/buildset/94277870c24347d2b4216c571314830a - All green
c7
Last promotion - 04th Nov
Jobs timed out: https://review.rdoproject.org/zuul/buildset/c8e285a731c04d78bfdf1f69303be237
Reported bug today: https://bugs.launchpad.net/tripleo/+bug/1903961
Reran the jobs that are in the promotion criteria via testproject: https://review.rdoproject.org/r/#/c/28442/
**STEIN/ROCKY**
Last promotion - 03rd/04th Nov
* Intermittently some jobs are timing out while gathering facts on different tasks: [tripleo-inventory : Ensure gather_facts has been run against localhost] or [validate-undercloud : gather facts used by role]
https://bugs.launchpad.net/tripleo/+bug/1903961
**Queens**
Last promotion - 04th Nov
* Queens periodic jobs are failing on tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps with testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': - FloatingIP: is at status: DOWN. failed to reach status: ACTIVE
https://bugs.launchpad.net/tripleo/+bug/1903996
* Master/Ussuri/Victoria promotion blocker
https://bugs.launchpad.net/tripleo/+bug/1900949
OVB jobs are failing on master branch: imported nodes are not transitioning to manageable state with 'Error: Unable to establish IPMI v2 / RMCP+ session\n'
* RDO Cloud infra issue; we moved the jobs to Vexxhost
* Network component jobs are failing on tempest TASK [os_tempest : Ensure private network exists]
https://bugs.launchpad.net/tripleo/+bug/1902048
## OSP
:::
### Reviews / Fixes
::: spoiler PATCHES
:::
### Bugs reported
::: spoiler PATCHES
:::
---
:::info
Add dates in descending order so the latest date is at the top. Break out TripleO and OSP sections.
:::
## Nov 12th
### TripleO
#### Heads-up
<amoralej> we are updating ovs/ovn 2.13 for master and victoria today. We've gated it with different jobs so i expect to go smooth but let us know if you find any issue
* [Blocking check jobs] tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train job is failing on TASK [undercloud-deploy : Write containers-prepare-parameter.yaml] with error 'dict object' has no attribute 'registry_ip_address_branch'
https://bugs.launchpad.net/tripleo/+bug/1903980
* Sagi's patch will solve the issue: https://review.opendev.org/#/c/761892/
* [Promotion blocker for c7 train/stein/rocky] Intermittently some jobs are timing out while gathering facts on different tasks: [tripleo-inventory : Ensure gather_facts has been run against localhost] or [validate-undercloud : gather facts used by role]
https://bugs.launchpad.net/tripleo/+bug/1903961
* Also hit on rerun via testproject - needs investigation
* Intermittently tripleo-ci-centos-8-standalone-upgrade-ussuri times out while running tempest tests.
https://bugs.launchpad.net/tripleo/+bug/1903993
* Needs investigation
* Queens periodic jobs are failing on tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps with testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': - FloatingIP: is at status: DOWN. failed to reach status: ACTIVE
https://bugs.launchpad.net/tripleo/+bug/1903996
* Needs investigation
## Nov 11th
### TripleO
* tripleo-ci-centos-8-content-provider failed on TASK [container-build : Pull failed containers from RDO registry] with error was: 'dict object' has no attribute 'split'
https://bugs.launchpad.net/tripleo/+bug/1903882
* Ronelle is working on it - https://review.opendev.org/762400
* [Fixed] periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-victoria is failing with Error while evaluating a Resource Statement, Duplicate declaration: Exec[/etc/pki/CA/certs/qemu.pem] is already declared
https://bugs.launchpad.net/tripleo/+bug/1903828
## Nov 10th
### TripleO
* [Fixed] Upstream periodic jobs are failing on TASK [undercloud-setup : Run the package installation script] with dependencies errors.
https://bugs.launchpad.net/tripleo/+bug/1903709
* [Component promotion blocker] Compute component jobs in master branch are failing with ERROR nova nova.exception.DBNotAllowed: nova-compute attempted direct database access which is not allowed by policy
https://bugs.launchpad.net/tripleo/+bug/1903655, CIX escalated
* Nova team is working on it
## Nov 09th
### TripleO
* Promotion blocker (voting job): tripleo-ci-centos-8-undercloud-upgrade-ussuri is failing with RuntimeError: Update extra packages failed: b'sudo: yum: command not found\n' https://bugs.launchpad.net/tripleo/+bug/1903498
* Patch is up - https://review.opendev.org/#/c/761892/
* [Fixed] periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master got past the earlier issue after https://review.opendev.org/#/c/760994/ merged, but featureset039 is still hitting the same issue
Updated the existing bug: https://bugs.launchpad.net/tripleo/+bug/1902478
Trying to solve with https://review.opendev.org/761863
* [Promotion blocker] periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa and featureset039 on master/victoria are failing on tempest test while spawning instance with Error - Cannot load certificate '/etc/pki/libvirt-vnc/server-cert.pem' & key '/etc/pki/libvirt-vnc/server-key.pem': Error while reading file.
https://bugs.launchpad.net/tripleo/+bug/1903508
* Provided a reproducer to Martin; patch is up: https://review.opendev.org/#/c/762497
## Nov 06th
### TripleO
* tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates is pulling containers from docker.io
https://bugs.launchpad.net/tripleo/+bug/1903581
* periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-security-* on different branches is failing with error "Quota exceeded, too many key pairs."
https://bugs.launchpad.net/tripleo/+bug/1903263
* This looks to be caused by keypairs for TLS jobs not being cleaned up.
cleanup-keys.sh, which was added in https://review.rdoproject.org/r/#/c/29167/, was not in a cron job.
Ideally those keypairs should be deleted by the job itself; the cleanup script is mainly needed for cases where the job misses them. We will investigate why the in-job cleanup is not working.
For now, we ran cleanup-keys.sh manually.
@weshayutin can we add cleanup-keys.sh to cron on the toolbox server?
* Monitor https://review.opendev.org/#/c/760994/ until it merges. Yatin posted a recheck some time ago.
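The contents of cleanup-keys.sh and the in-job cleanup are not recorded in these notes. As a rough illustration of the selection step such a script (or a cron wrapper around it) has to get right, here is a minimal Python sketch that picks stale job keypairs from a list of (name, created_at) pairs. The `tls-job-` prefix and the 24-hour window are hypothetical, and fetching or deleting keypairs from the cloud is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values: the real key naming scheme and age threshold used by
# cleanup-keys.sh are not recorded in these notes.
JOB_KEYPAIR_PREFIX = "tls-job-"
MAX_AGE = timedelta(hours=24)


def stale_keypairs(keypairs, now=None, prefix=JOB_KEYPAIR_PREFIX, max_age=MAX_AGE):
    """Return the sorted names of job keypairs older than max_age.

    keypairs: iterable of (name, created_at) tuples, with created_at a
    timezone-aware datetime. Keys without the job prefix are never selected,
    so personal or infra keys are left alone.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, created_at in keypairs
        if name.startswith(prefix) and now - created_at > max_age
    )


# Example with made-up keypairs.
now = datetime(2020, 11, 6, 12, 0, tzinfo=timezone.utc)
keys = [
    ("tls-job-abc", now - timedelta(hours=30)),  # leaked by a job: stale
    ("tls-job-def", now - timedelta(hours=2)),   # job may still be running
    ("infra-admin", now - timedelta(days=90)),   # not a job key: kept
]
print(stale_keypairs(keys, now=now))  # ['tls-job-abc']
```

Keeping the age threshold well above the longest job runtime avoids deleting keys that running jobs still need.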
## Nov 05th
### TripleO
* periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master is failing while executing cold migration & resize tempest tests with Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/<uuid>’: Permission denied\n'
https://bugs.launchpad.net/tripleo/+bug/1903033
* We got c7 Train/Stein/Rocky/Queens promotions yesterday.
## Nov 04th
### TripleO
* [Fixed for now / long-term fix needed] Container build job periodic-tripleo-ci-build-containers-ubi-8-push is failing with dependency issues: cannot install crypto-policies and cyrus-sasl-lib
https://bugs.launchpad.net/tripleo/+bug/1902846
* Patches to pin ubi8 are up: https://review.opendev.org/#/c/761463/
https://review.opendev.org/#/c/761402/
* [Fixed] tripleo: "tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates is failing in check/gate/periodic with [ERROR]: Container(s) which finished with wrong return code: ['container-puppet-clustercheck', 'container-puppet-haproxy', 'container-puppet-mysql', 'container-puppet-memcached', 'container-puppet-keystone', 'container-puppet-rabbitmq']" [Critical,Triaged]
https://bugs.launchpad.net/tripleo/+bug/1902831
* A Jenkins ppc container build job (tripleo-upstream-containers-build-master-ppc64le) was pushing ppc containers to the RDO registry without any additional suffix. Because it pushed them under the same names as the non-ppc containers, it was overwriting the non-ppc containers.
* We repushed the x86 arch containers for the current-tripleo hash to clear the issue we were hitting in check/gate jobs.
We also disabled the container push in tripleo-upstream-containers-build-master-ppc64le with https://review.rdoproject.org/r/#/c/30761/
In the long term, the tripleo-upstream-containers-build-master-ppc64le job needs to be modified to push ppc containers with the correct suffix so that they do not overwrite non-ppc containers.
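The long-term fix above boils down to giving ppc64le images a reference that can never collide with the x86_64 one. A minimal Python sketch of that idea follows; the `<tag>_<arch>` format, the registry host and the image names are hypothetical placeholders, since the notes do not say which suffix convention the job should adopt (suffixing the image name, or publishing a multi-arch manifest list, are other common options).

```python
def image_ref(registry, namespace, name, tag, arch="x86_64"):
    """Build an image reference, adding an arch suffix for non-x86_64 builds.

    Hypothetical convention: x86_64 keeps the bare tag and every other arch
    gets "<tag>_<arch>", so a ppc64le push can never overwrite an x86_64 image.
    """
    if arch != "x86_64":
        tag = f"{tag}_{arch}"
    return f"{registry}/{namespace}/{name}:{tag}"


# Placeholder registry and image names, for illustration only.
x86 = image_ref("registry.example.com", "tripleomaster",
                "openstack-nova-api", "current-tripleo")
ppc = image_ref("registry.example.com", "tripleomaster",
                "openstack-nova-api", "current-tripleo", arch="ppc64le")
print(x86)  # registry.example.com/tripleomaster/openstack-nova-api:current-tripleo
print(ppc)  # registry.example.com/tripleomaster/openstack-nova-api:current-tripleo_ppc64le
```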
## Nov 03rd
### TripleO
Master and c8 train promoted; we waived the criteria for the failing ipa job as https://review.opendev.org/#/c/760994/ will fix the issue.
## Nov 02nd
### TripleO
* Master Promotion blocker
tripleo-ci-centos-8-standalone-on-multinode-ipa is failing while configuring FreeIPA server with RuntimeError: CA configuration failed
https://bugs.launchpad.net/tripleo/+bug/1902478
* https://review.opendev.org/#/c/760994/ patch is up.
* [Observation - unable to reproduce] periodic-tripleo-centos-7-rocky-containers-build-push timed out during build-containers
https://bugs.launchpad.net/tripleo/+bug/1902480
* We tried to reproduce the issue in a testproject patch[1] and held the node for further debugging, but we were unable to reproduce the issue.
The scheduled run on 04th Nov also passed - https://review.rdoproject.org/zuul/build/ab315a7ba5224fb88549d5d346d59294
We are keeping this bug under observation.
* Some periodic jobs failed while modifying container images.
We can investigate it under existing bug: https://bugs.launchpad.net/tripleo/+bug/1902190
* [Clear in next iteration] ci.centos jobs in ussuri failing on overcloud deployment with Failed container(s): ['nova_wait_for_compute_service']
https://bugs.launchpad.net/tripleo/+bug/1902487
* Fixed - Fs039 in security component is failing in master/Victoria/Ussuri on TASK [ovb-manage : Attach instance to provision OVB network]
https://bugs.launchpad.net/tripleo/+bug/1902506
* Fix proposed https://review.rdoproject.org/r/#/c/30710/
## Oct 30th
### TripleO
* Master periodic jobs are failing with "msg": "Error running container image prepare: failed"
https://bugs.launchpad.net/tripleo/+bug/1902190
## Oct 29th
### TripleO
* tripleo-ci-centos-8-scenario010-standalone is failing on tempest test with error 'Load Balancer is immutable and cannot be updated.'
https://bugs.launchpad.net/tripleo/+bug/1901996
* [Fix committed] [Master] Network component jobs are failing on tempest TASK [os_tempest : Ensure private network exists]
https://bugs.launchpad.net/tripleo/+bug/1902048
* Already reported in neutron: https://bugs.launchpad.net/neutron/+bug/1901534 and fix https://review.opendev.org/#/c/759673/ is on the way.
## Oct 28th
### TripleO
* ci.centos jobs in master failing introspection with ERROR ironic_inspector.process TypeError: unhashable type: 'list'
https://bugs.launchpad.net/tripleo/+bug/1901917
* Patch is up https://github.com/redhat-cip/hardware/pull/156
* Ussuri promoted
## Oct 27th
### TripleO
* [Invalid] Ussuri undercloud upgrade is failing with Config error: Parsing file "/var/lib/mock/dlrn-centos8-x86_64-1/root/etc/dnf/dnf.conf" failed: Parsing file '/var/lib/mock/dlrn-centos8-x86_64-1/root/etc/dnf/dnf.conf' failed: IniParser: Missing '=' at line 59
https://bugs.launchpad.net/tripleo/+bug/1901705
* Fixed - Master promotion blocker: Undercloud install is failing with Error message: Validation ['r', 'o', 'p', 'd', '.', 'i', 'y', 'k', 'c', 'e', 'l', 's', '-', 'n', 'u', 'a', 'm'] not found in /usr/share/ansible/validation-playbooks
https://bugs.launchpad.net/tripleo/+bug/1901676
* Fixed - Master component line promotion blocker: container images with current-tripleo hash a9a790d0723c9fe6641e453c6a1f0c91 were again removed from the RDO registry. Updated https://bugs.launchpad.net/tripleo/+bug/1901186 with details.
* The issue happened for the "current-tripleo" hash images because those images were not whitelisted, due to an issue with the batch promotions used in the tripleo-ci-testing to current-tripleo promotion. It is fixed with https://softwarefactory-project.io/r/19917; now that the patch is merged and applied on all centos8 DLRN workers, the last promoted container images will no longer be removed.
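The character list in the undercloud install error above (['r', 'o', 'p', 'd', '.', ...]) looks like the classic symptom of a single string being used where a list of validation names was expected: iterating a string yields its characters, and the missing duplicates suggest a set was built from it. The actual root cause and fix are not recorded here; this Python sketch only reproduces that failure mode with a hypothetical validation name and shows one defensive normalization.

```python
def normalize_names(value):
    """Return a list of validation names from a single name or an iterable.

    A bare string is itself iterable, so code that expects a list but receives
    one name ends up iterating over characters instead of names.
    """
    if isinstance(value, str):
        return [value]
    return list(value)


name = "check-ram"  # hypothetical validation name
print(sorted(set(name)))  # ['-', 'a', 'c', 'e', 'h', 'k', 'm', 'r']  (the bug)
print(normalize_names(name))  # ['check-ram']
print(normalize_names(["check-ram", "check-cpu"]))  # ['check-ram', 'check-cpu']
```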
## Oct 23rd
### TripleO
* Fixed - Check/gate jobs failing on task "configure-mirrors : Update yum/dnf cache" with Error: Failed to download metadata for repo 'AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried https://bugs.launchpad.net/tripleo/+bug/1901142
* There were issues with one of the mirrors, mirror.regionone.limestone.opendev.org, which became inaccessible. The infra team rebooted the affected server. Under observation.
~~~
<AJaeger18> ysandeep|ruckyoctozepto, as discussed in #openstack-infra, the mirror has been rebooted. If you encounter problems with new jobs (starting now or later), please report again.
~~~
* Fixed - Master Component Line Promotion Blocker
All the component pipeline jobs in master are failing because of missing container images with "msg": "Error running container image prepare: Not found image". Different jobs are failing on different container images.
https://bugs.launchpad.net/tripleo/+bug/1901186
* Wes is trying to promote the new hash 0982128d3d3aeca87d09c7100795f104 to current-tripleo; we waived the criteria on master for the ovb and ipa jobs (as they have separate ongoing bugs)
* We are trying to fix this now with a promotion, but we need to investigate why the container images were purged
~~~
<weshay|ruck> ysandeep|ruck, ya.. we'll have to follow up w/ david paulik in #rhos-ops to see why... if it was deleted there should be a log
<weshay|ruck> ysandeep|ruck, k.. any container not tagged w/ current-tripleo or another human readable will automatically be deleted after 3 days
<ysandeep|ruck> weshay|ruck, when you have time, wondering if there is a way to tell if older hash (fa69037728ceffb81826ff6d926a7884) have a human readable name ?
~~~
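The retention rule described in the chat above can be modeled roughly as follows. This is a hypothetical Python sketch of the rule as stated there, not the registry's actual pruning code: an image tagged only with a hash becomes eligible for deletion after 3 days unless a human-readable tag such as current-tripleo points at it. `names_for_hash` is the lookup asked about for the older hash; in this model, the Oct 27 whitelisting problem corresponds to a promoted hash missing from `named_tags`. The hashes and ages in the example are placeholders.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=3)  # "deleted after 3 days", per the chat above


def names_for_hash(named_tags, target_hash):
    """Return the human-readable tags (e.g. current-tripleo) pointing at a hash."""
    return sorted(name for name, h in named_tags.items() if h == target_hash)


def purge_candidates(pushed_at, named_tags, now, retention=RETENTION):
    """Return hash tags older than `retention` with no human-readable tag.

    pushed_at: dict of hash tag -> push time (timezone-aware datetime)
    named_tags: dict of human-readable tag -> hash tag it currently points at
    """
    protected = set(named_tags.values())
    return sorted(
        h for h, pushed in pushed_at.items()
        if h not in protected and now - pushed > retention
    )


# Placeholder hashes and ages, for illustration only.
now = datetime(2020, 10, 23, tzinfo=timezone.utc)
pushed_at = {
    "hash-a": now - timedelta(days=5),
    "hash-b": now - timedelta(days=1),
    "hash-c": now - timedelta(days=10),
}
named_tags = {"current-tripleo": "hash-c"}

print(purge_candidates(pushed_at, named_tags, now))  # ['hash-a']
print(names_for_hash(named_tags, "hash-c"))          # ['current-tripleo']
print(names_for_hash(named_tags, "hash-a"))          # []
```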
## Oct 22nd
### TripleO
* Fixed - Master promotion blocker
https://bugs.launchpad.net/tripleo/+bug/1900947
Standalone with ipa server jobs is failing with "Error: Evaluation Error: Error while evaluating a Function Call, The ssl_verify_client parameter is required when setting ssl_ca"
* https://review.opendev.org/#/c/759285/ will fix this issue.
~~~
With Depends-On: 759285, testproject https://review.rdoproject.org/r/#/c/26217/ passes:
periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master https://review.rdoproject.org/zuul/build/5eb1e7b79dea47fbb0adf372b7fecd08 : SUCCESS in 1h 56m 38s
~~~
* Master promotion blocker
https://bugs.launchpad.net/tripleo/+bug/1900949
OVB jobs are failing on master branch: imported nodes are not transitioning to manageable state with 'Error: Unable to establish IPMI v2 / RMCP+ session\n'
* Fixed - https://bugs.launchpad.net/tripleo/+bug/1900957
Ussuri periodic standalone upgrade job is failing with error "stderr": "/bin/sh: docker: command not found"
* Fixed - sc12 ussuri/train jobs failing on tempest - reran with testproject
We reported this on the existing bug https://bugs.launchpad.net/tripleo/+bug/1895822, as the failure was the same and Alex had already mentioned on the BZ that he is working on sc12.
* The following patches will fix the issue.
~~~
https://review.opendev.org/#/c/759295/ - ussuri
https://review.opendev.org/#/c/759296/ - train
~~~
* Fixed - Upstream gates: a lot of RETRY_LIMIT in jobs, not TripleO specific.
* https://review.opendev.org/#/c/759251/ - excluding problematic cloud "inap-mtl01"
### OSP