ruck_rover
Important links for ruck/rover: ruck/rover links to help
Ruck Rover - Unified Sprint #35
Dates: Oct 21 - Nov 11
Tripleo CI team ruck|rover: sshnaidm, ysandeep, wes
OSP CI team ruck|rover: psedlak, aopincaru
Previous notes: https://hackmd.io/1qxCqYzATfudl1cKvaQ8-w
MASTER
Last promotion - 12th Nov (today)
Last buildset - https://review.rdoproject.org/zuul/buildset/71d102a4e3874db39ac744c67221287c
Failing jobs (not under promotion criteria):
periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master & periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-master - Bug: https://bugs.launchpad.net/tripleo/+bug/1903508. CIX escalated to the security team, reproducer provided to Martin today, patch is up.
periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master - Bug: https://bugs.launchpad.net/tripleo/+bug/1903033
VICTORIA
Last promotion - 10th Nov
Last buildset - https://review.rdoproject.org/zuul/buildset/f8e8812c49c14d2c8915af307d85731c
Ussuri
Ussuri pipeline
Last promotion - 09th Nov
No bugs, No pattern of failure
In today's run 2 jobs failed; we reran them via testproject and they passed: https://review.rdoproject.org/r/#/c/26217/
Train
c8
Last promotion - 11th Nov
https://review.rdoproject.org/zuul/buildset/94277870c24347d2b4216c571314830a - All green
c7
Last promotion - 04th Nov
Jobs timed out https://review.rdoproject.org/zuul/buildset/c8e285a731c04d78bfdf1f69303be237
Reported bug today: https://bugs.launchpad.net/tripleo/+bug/1903961
Reran testproject for the jobs that are in the promotion criteria - https://review.rdoproject.org/r/#/c/28442/
STEIN/ROCKY
Last promotion - 03rd/04th Nov
https://bugs.launchpad.net/tripleo/+bug/1903961
Queens
Last promotion - 04th Nov
* Queens periodic jobs are failing on tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps with testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': - FloatingIP: is at status: DOWN. failed to reach status: ACTIVE
https://bugs.launchpad.net/tripleo/+bug/1903996
Master/Ussuri/Victoria promotion blocker
https://bugs.launchpad.net/tripleo/+bug/1900949
Ovb jobs are failing on master branch, imported nodes are not transitioning to manageable state with ‘Error: Unable to establish IPMI v2 / RMCP+ session\n’
Network component jobs are failing on tempest TASK [os_tempest : Ensure private network exists]
https://bugs.launchpad.net/tripleo/+bug/1902048
Add dates in descending order so the latest date is at the top. Break out TripleO and OSP sections.
<amoralej> we are updating ovs/ovn 2.13 for master and victoria today. We've gated it with different jobs so i expect to go smooth but let us know if you find any issue
[Blocking check jobs] tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train job is failing on TASK [undercloud-deploy : Write containers-prepare-parameter.yaml] with error 'dict object' has no attribute 'registry_ip_address_branch'
https://bugs.launchpad.net/tripleo/+bug/1903980
[Promotion blocker for c7 train/stein/rocky] Intermittently, some jobs time out while gathering facts on different tasks: [tripleo-inventory : Ensure gather_facts has been run against localhost] or [validate-undercloud : gather facts used by role]
https://bugs.launchpad.net/tripleo/+bug/1903961
"Intermittently tripleo-ci-centos-8-standalone-upgrade-ussuri times out while running tempest tests."
https://bugs.launchpad.net/tripleo/+bug/1903993
Queens periodic jobs are failing on tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps with testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': - FloatingIP: is at status: DOWN. failed to reach status: ACTIVE
https://bugs.launchpad.net/tripleo/+bug/1903996
tripleo-ci-centos-8-content-provider failed on TASK [container-build : Pull failed containers from RDO registry] with error was: 'dict object' has no attribute 'split'
https://bugs.launchpad.net/tripleo/+bug/1903882
[Fixed] periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-victoria is failing with Error while evaluating a Resource Statement, Duplicate declaration: Exec[/etc/pki/CA/certs/qemu.pem] is already declared
https://bugs.launchpad.net/tripleo/+bug/1903828
[Fixed] Upstream periodic jobs are failing on TASK [undercloud-setup : Run the package installation script] with dependency errors.
https://bugs.launchpad.net/tripleo/+bug/1903709
[Component Promotion blocker] Compute component jobs in master branch are failing with ERROR nova nova.exception.DBNotAllowed: nova-compute attempted direct database access which is not allowed by policy
https://bugs.launchpad.net/tripleo/+bug/1903655, CIX escalated
Promotion blocker (voting job) - tripleo-ci-centos-8-undercloud-upgrade-ussuri is failing with RuntimeError: Update extra packages failed: b'sudo: yum: command not found\n' https://bugs.launchpad.net/tripleo/+bug/1903498
[Fixed] periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master got past the earlier issue after https://review.opendev.org/#/c/760994/ merged, but featureset039 is still hitting the same issue
Updated the existing bug: https://bugs.launchpad.net/tripleo/+bug/1902478
Trying to solve with https://review.opendev.org/761863
[Promotion blocker] periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa and featureset039 on master/victoria are failing on a tempest test while spawning an instance with Error - Cannot load certificate '/etc/pki/libvirt-vnc/server-cert.pem' & key '/etc/pki/libvirt-vnc/server-key.pem': Error while reading file.
https://bugs.launchpad.net/tripleo/+bug/1903508
tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates is pulling containers from docker.io
https://bugs.launchpad.net/tripleo/+bug/1903581
periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp_1supp-featureset039-security-* on different branches is failing with error "Quota exceeded, too many key pairs."
https://bugs.launchpad.net/tripleo/+bug/1903263
This appears to be caused by keypairs for TLS jobs not being cleaned up.
cleanup-keys.sh, which was added in https://review.rdoproject.org/r/#/c/29167/, was not in a cron job.
Ideally those keypairs should be deleted in the job itself; the cleanup script is mainly needed for cases where the job misses them. We will investigate why the job cleanup is not working.
We executed the script (cleanup-keys.sh) manually for now.
@weshayutin Can we add cleanup-keys.sh to cron on the toolbox server?
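As a rough illustration of the age-based cleanup discussed above, here is a minimal Python sketch of selecting leftover keypairs. The contents of cleanup-keys.sh are not shown in these notes, so the record shape (`name`, `created_at`), the `stale_keypairs` helper, and the 6-hour threshold are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_keypairs(keypairs, now, max_age=timedelta(hours=6)):
    """Return the names of keypairs created more than max_age before now.

    keypairs is a list of dicts with hypothetical keys "name" and
    "created_at" (an ISO 8601 timestamp with an explicit UTC offset).
    """
    stale = []
    for keypair in keypairs:
        created = datetime.fromisoformat(keypair["created_at"])
        if now - created > max_age:
            stale.append(keypair["name"])
    return stale


# Example with made-up keypair records.
example_now = datetime(2020, 11, 12, 12, 0, tzinfo=timezone.utc)
example_keypairs = [
    {"name": "leftover-key", "created_at": "2020-11-10T08:00:00+00:00"},
    {"name": "running-job-key", "created_at": "2020-11-12T11:00:00+00:00"},
]
to_delete = stale_keypairs(example_keypairs, example_now)
```

The returned names could then be handed to whatever deletion command the real script uses; keeping the selection separate from the deletion makes it possible to dry-run the cron job first.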
periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master is failing while executing cold migration & resize tempest tests with Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/<uuid>’: Permission denied\n'
https://bugs.launchpad.net/tripleo/+bug/1903033
We got c7 train/stein/rocky/queens promotions yesterday
[Fixed for now / Long term fix needed] Container build job periodic-tripleo-ci-build-containers-ubi-8-push is failing with dependency issues: cannot install crypto-policies/cyrus-sasl-lib and cyrus-sasl-lib
https://bugs.launchpad.net/tripleo/+bug/1902846
[Fixed] tripleo "tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates is failing in check/gate/periodic with [ERROR]: Container(s) which finished with wrong return code: ['container-puppet-clustercheck', 'container-puppet-haproxy', 'container-puppet-mysql', 'container-puppet-memcached', 'container-puppet-keystone', 'container-puppet-rabbitmq']" [Critical,Triaged]
https://bugs.launchpad.net/tripleo/+bug/1902831
A Jenkins ppc container build job (tripleo-upstream-containers-build-master-ppc64le) is pushing ppc containers to the RDO registry without any additional suffix. Because this job pushes with the same name as the non-ppc containers, it was overwriting them.
We repushed the x86 containers for the current-tripleo hash to clear the issue we were hitting in check/gate jobs.
Also, we disabled the job that pushes the containers - tripleo-upstream-containers-build-master-ppc64le - with https://review.rdoproject.org/r/#/c/30761/
In the long term, the tripleo-upstream-containers-build-master-ppc64le job needs to be modified to push ppc containers with a correct suffix so that it does not overwrite non-ppc containers.
Master and c8 train promoted; we waived the criteria for the failing ipa job, as https://review.opendev.org/#/c/760994/ will fix the issue.
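To illustrate the overwrite and the suffix-based fix, here is a minimal Python sketch. The `image_tag` helper, the `_ppc64le` suffix format, the `nova-compute` image name, and the dict standing in for the registry are all hypothetical; the real job may use a different naming scheme.

```python
def image_tag(base_tag, arch, default_arch="x86_64"):
    """Return base_tag unchanged for the default arch, else add an arch suffix."""
    if arch == default_arch:
        return base_tag
    return f"{base_tag}_{arch}"


def push(registry, name, tag, arch):
    """Record a push in a dict that stands in for the registry (illustration only)."""
    registry[(name, tag)] = arch


# Without a suffix, the second push replaces the first under the same key.
collided = {}
push(collided, "nova-compute", "current-tripleo", "x86_64")
push(collided, "nova-compute", "current-tripleo", "ppc64le")

# With a suffix, each architecture keeps its own entry.
separate = {}
for arch in ("x86_64", "ppc64le"):
    push(separate, "nova-compute", image_tag("current-tripleo", arch), arch)
```

After this runs, `collided` holds only the ppc64le entry, while `separate` holds one entry per architecture.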
Master Promotion blocker
tripleo-ci-centos-8-standalone-on-multinode-ipa is failing while configuring FreeIPA server with RuntimeError: CA configuration failed
https://bugs.launchpad.net/tripleo/+bug/1902478
[Observation - Unable to reproduce] periodic-tripleo-centos-7-rocky-containers-build-push timed out during build-containers
https://bugs.launchpad.net/tripleo/+bug/1902480
We tried to reproduce the issue in a testproject patch[1] and held the node for further debugging, but we were unable to reproduce it.
The scheduled run on 04th Nov also passed - https://review.rdoproject.org/zuul/build/ab315a7ba5224fb88549d5d346d59294
We are keeping this bug under observation.
Some periodic jobs failed while modifying container images.
We can investigate it under existing bug: https://bugs.launchpad.net/tripleo/+bug/1902190
[Clear in next iteration] ci.centos jobs in ussuri are failing on overcloud deployment with Failed container(s): ['nova_wait_for_compute_service']
https://bugs.launchpad.net/tripleo/+bug/1902487
Fixed - Fs039 in security component is failing in master/Victoria/Ussuri on TASK [ovb-manage : Attach instance to provision OVB network]
https://bugs.launchpad.net/tripleo/+bug/1902506
tripleo-ci-centos-8-scenario010-standalone is failing on tempest test with error 'Load Balancer is immutable and cannot be updated.'
https://bugs.launchpad.net/tripleo/+bug/1901996
fix-committed [Master] Network component jobs are failing on tempest TASK [os_tempest : Ensure private network exists]
https://bugs.launchpad.net/tripleo/+bug/1902048
ci.centos jobs in master are failing introspection with ERROR ironic_inspector.process TypeError: unhashable type: 'list'
https://bugs.launchpad.net/tripleo/+bug/1901917
Ussuri promoted
[Invalid]Ussuri undercloud upgrade is failing with Config error: Parsing file "/var/lib/mock/dlrn-centos8-x86_64-1/root/etc/dnf/dnf.conf" failed: Parsing file '/var/lib/mock/dlrn-centos8-x86_64-1/root/etc/dnf/dnf.conf' failed: IniParser: Missing '=' at line 59
https://bugs.launchpad.net/tripleo/+bug/1901705
Fixed - Master promotion blocker: Undercloud install is failing with Error message: Validation ['r', 'o', 'p', 'd', '.', 'i', 'y', 'k', 'c', 'e', 'l', 's', '-', 'n', 'u', 'a', 'm'] not found in /usr/share/ansible/validation-playbooks
https://bugs.launchpad.net/tripleo/+bug/1901676
Fixed - Master Component line promotion blocker: Container images with current-tripleo hash a9a790d0723c9fe6641e453c6a1f0c91 were again removed from the RDO registry. Updated https://bugs.launchpad.net/tripleo/+bug/1901186 with details.
Fixed - Check/gate jobs failing on task "configure-mirrors : Update yum/dnf cache" with Error: Failed to download metadata for repo 'AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried https://bugs.launchpad.net/tripleo/+bug/1901142
<AJaeger18> ysandeep|ruck, yoctozepto, as discussed in #openstack-infra, the mirror has been rebooted. If you encounter problems with new jobs (starting now or later), please report again.
<weshay|ruck> ysandeep|ruck, ya.. we'll have to follow up w/ david paulik in #rhos-ops to see why... if it was deleted there should be a log
<weshay|ruck> ysandeep|ruck, k.. any container not tagged w/ current-tripleo or another human readable will automatically be deleted after 3 days
<ysandeep|ruck> weshay|ruck, when you have time, wondering if there is a way to tell if older hash (fa69037728ceffb81826ff6d926a7884) have a human readable name ?
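The retention rule described in the chat above can be sketched in Python as follows. This is only an illustration: the registry's actual pruning code is not shown in these notes, and `will_be_pruned`, the contents of `PROTECTED_TAGS` beyond current-tripleo, and the strict "older than 3 days" comparison are assumptions.

```python
from datetime import datetime, timedelta

# Human-readable tags that protect an image from pruning. Only current-tripleo
# is named in the notes; any further entries would be assumptions.
PROTECTED_TAGS = frozenset({"current-tripleo"})
MAX_AGE = timedelta(days=3)


def will_be_pruned(tags, pushed_at, now):
    """Return True if an image with these tags would be deleted under the rule.

    An image is kept if it carries at least one protected tag; otherwise it
    is pruned once it is more than MAX_AGE old (boundary handling is a guess).
    """
    if PROTECTED_TAGS.intersection(tags):
        return False
    return now - pushed_at > MAX_AGE
```

Under this sketch, an image tagged only with a hash is pruned once it is older than three days, which matches the situation where the older hash had lost its human-readable tag.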
Fixed - Master promotion blocker
https://bugs.launchpad.net/tripleo/+bug/1900947
Standalone with ipa server jobs are failing with "Error: Evaluation Error: Error while evaluating a Function Call, The ssl_verify_client parameter is required when setting ssl_ca"
With a depends-on for patch 759285, the testproject https://review.rdoproject.org/r/#/c/26217/ passes:
periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master https://review.rdoproject.org/zuul/build/5eb1e7b79dea47fbb0adf372b7fecd08 : SUCCESS in 1h 56m 38s
Master promotion blocker
https://bugs.launchpad.net/tripleo/+bug/1900949
Ovb jobs are failing on master branch, imported nodes are not transitioning to manageable state with 'Error: Unable to establish IPMI v2 / RMCP+ session\n'
Fixed - https://bugs.launchpad.net/tripleo/+bug/1900957
Ussuri periodic standalone upgrade job is failing with error "stderr": "/bin/sh: docker: command not found"
Fixed - sc12 ussuri/train jobs failing on tempest - reran with test project
We reported this on the existing bug https://bugs.launchpad.net/tripleo/+bug/1895822, as the failure was the same and Alex had already mentioned on the bz that he is working on sc12.
https://review.opendev.org/#/c/759295/ - ussuri
https://review.opendev.org/#/c/759296/ - train
Fixed - Upstream gates: a lot of RETRY_LIMIT in jobs, not TripleO specific.