Ruck and Rover notes #35

tags: ruck_rover

Important links for ruck/rover: ruck/rover links to help
Ruck Rover - Unified Sprint #35
Dates: Oct 21 - Nov 11

TripleO CI team ruck|rover: sshnaidm, ysandeep, wes
OSP CI team ruck|rover: psedlak, aopincaru

Previous notes: https://hackmd.io/1qxCqYzATfudl1cKvaQ8-w


ongoing issues

TripleO

gate

periodic / 3rd party

MASTER
Last promotion - 12th Nov (today)
Last buildset - https://review.rdoproject.org/zuul/buildset/71d102a4e3874db39ac744c67221287c
Failing jobs (not under promotion criteria):

VICTORIA
Last promotion - 10th Nov
Last buildset - https://review.rdoproject.org/zuul/buildset/f8e8812c49c14d2c8915af307d85731c

Ussuri
Ussuri pipeline
Last promotion - 09th Nov
No bugs, no pattern of failure
Today's run: 2 jobs failed; reran them via testproject and they passed: https://review.rdoproject.org/r/#/c/26217/

Train

c8

Last promotion - 11th Nov

https://review.rdoproject.org/zuul/buildset/94277870c24347d2b4216c571314830a - All green

c7

Last promotion - 04th Nov

Jobs timed out https://review.rdoproject.org/zuul/buildset/c8e285a731c04d78bfdf1f69303be237

Reported bug today: https://bugs.launchpad.net/tripleo/+bug/1903961

Reran the jobs in the promotion criteria via testproject: https://review.rdoproject.org/r/#/c/28442/

STEIN/ROCKY
Last promotion - 03rd/04th Nov

  • Some jobs intermittently time out while gathering facts in different tasks: [tripleo-inventory : Ensure gather_facts has been run against localhost] or [validate-undercloud : gather facts used by role]

https://bugs.launchpad.net/tripleo/+bug/1903961

Queens
Last promotion - 04th Nov
  • Queens periodic jobs are failing on tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps with testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': - FloatingIP: is at status: DOWN. failed to reach status: ACTIVE
    https://bugs.launchpad.net/tripleo/+bug/1903996

OSP

Reviews / Fixes

PATCHES

Bugs reported

PATCHES

Add dates in descending order so the latest date is at the top. Break out TripleO and OSP sections.

Nov 12th

TripleO

Heads-up

<amoralej> we are updating ovs/ovn 2.13 for master and victoria today. We've gated it with different jobs so i expect to go smooth but let us know if you find any issue

  • [Blocking check jobs] tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train job is failing on TASK [undercloud-deploy : Write containers-prepare-parameter.yaml] with error: 'dict object' has no attribute 'registry_ip_address_branch'
    https://bugs.launchpad.net/tripleo/+bug/1903980

  • [Promotion blocker for c7 train/stein/rocky] Some jobs intermittently time out while gathering facts in different tasks: [tripleo-inventory : Ensure gather_facts has been run against localhost] or [validate-undercloud : gather facts used by role]
    https://bugs.launchpad.net/tripleo/+bug/1903961

    • Also hit on the rerun via testproject; needs investigation
  • Intermittently, tripleo-ci-centos-8-standalone-upgrade-ussuri times out while running tempest tests.
    https://bugs.launchpad.net/tripleo/+bug/1903993

    • Need to investigate
  • Queens periodic jobs are failing on tempest test: tempest.scenario.test_network_basic_ops.TestNetworkBasicOps with testtools.matchers._impl.MismatchError: 'ACTIVE' != u'DOWN': - FloatingIP: is at status: DOWN. failed to reach status: ACTIVE
    https://bugs.launchpad.net/tripleo/+bug/1903996

    • Need to investigate

Nov 11th

TripleO

Nov 10th

TripleO

  • [Fixed] Upstream periodic jobs are failing on TASK [undercloud-setup : Run the package installation script] with dependency errors.
    https://bugs.launchpad.net/tripleo/+bug/1903709

  • [Component promotion blocker] Compute component jobs in the master branch are failing with ERROR nova nova.exception.DBNotAllowed: nova-compute attempted direct database access which is not allowed by policy
    https://bugs.launchpad.net/tripleo/+bug/1903655, CIX escalated

    • Nova team is working on it

Nov 09th

TripleO

Nov 06th

TripleO

@weshayutin Can we add cleanup-keys.sh to cron on the toolbox server?

Nov 05th

TripleO

  • periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master is failing while executing cold migration & resize tempest tests with Stderr: 'mkdir: cannot create directory ‘/var/lib/nova/instances/<uuid>’: Permission denied\n'
    https://bugs.launchpad.net/tripleo/+bug/1903033

  • We got c7 train/stein/rocky/queens promotions yesterday.

Nov 04th

TripleO

  • [Fixed for now / long-term fix needed] Container build job periodic-tripleo-ci-build-containers-ubi-8-push is failing with dependency issues: cannot install crypto-policies/cyrus-sasl-lib and cyrus-sasl-lib
    https://bugs.launchpad.net/tripleo/+bug/1902846

  • [Fixed] tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates is failing in check/gate/periodic with [ERROR]: Container(s) which finished with wrong return code: ['container-puppet-clustercheck', 'container-puppet-haproxy', 'container-puppet-mysql', 'container-puppet-memcached', 'container-puppet-keystone', 'container-puppet-rabbitmq'] [Critical, Triaged]
    https://bugs.launchpad.net/tripleo/+bug/1902831

    • A Jenkins ppc container build job (tripleo-upstream-containers-build-master-ppc64le) was pushing ppc containers to the RDO registry without any additional suffix. Because the job pushed with the same names as the non-ppc containers, it was overwriting them.

      • We re-pushed the x86 containers for the current-tripleo hash to clear the issue we were hitting in check/gate jobs.

        We also disabled container pushing in tripleo-upstream-containers-build-master-ppc64le with https://review.rdoproject.org/r/#/c/30761/

        In the long term, the tripleo-upstream-containers-build-master-ppc64le job needs to be modified to push ppc containers with the correct suffix so that it does not overwrite non-ppc containers.
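A toy model of the overwrite above, assuming the registry behaves like a plain mapping from image reference to content, so a second push to the same name:tag silently replaces the first. The image name `example-image` and the `_ppc64le` suffix scheme are hypothetical; the suffix the ppc job should actually use is not recorded in these notes.

```python
# Toy model: the registry is a dict of "image:tag" -> architecture pushed last.
# "example-image" and the "_<arch>" suffix scheme are hypothetical.

HASH = "a9a790d0723c9fe6641e453c6a1f0c91"  # a current-tripleo hash from these notes


def push(registry, image, tag, arch, use_arch_suffix):
    """Push an image and return the reference that was written."""
    suffix = ""
    if use_arch_suffix and arch != "x86_64":
        suffix = "_" + arch  # hypothetical per-arch suffix for non-x86 builds
    ref = f"{image}:{tag}{suffix}"
    registry[ref] = arch  # same ref: the later push overwrites the earlier one
    return ref


# Behaviour before the ppc job was disabled: ppc64le clobbers x86_64.
unsuffixed = {}
push(unsuffixed, "example-image", HASH, "x86_64", use_arch_suffix=False)
push(unsuffixed, "example-image", HASH, "ppc64le", use_arch_suffix=False)

# Long-term fix: ppc64le gets its own suffix, so both images survive.
suffixed = {}
push(suffixed, "example-image", HASH, "x86_64", use_arch_suffix=True)
push(suffixed, "example-image", HASH, "ppc64le", use_arch_suffix=True)

for name, reg in (("unsuffixed", unsuffixed), ("suffixed", suffixed)):
    print(name, reg)
```

The unsuffixed registry ends up with a single entry pointing at ppc64le, which is exactly what broke the x86 check/gate jobs.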

Nov 03rd

TripleO

Master and c8 train promoted; we waived the criteria for the failing ipa job since https://review.opendev.org/#/c/760994/ will fix the issue.
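A minimal sketch of what waiving criteria means here, assuming a hash is promoted only when every job in the promotion criteria passed, jobs outside the criteria never block, and a waiver drops a job from the criteria for that one promotion. The function and job names are hypothetical, not the promoter's actual code or configuration.

```python
def can_promote(results, criteria, waived=frozenset()):
    """Return (promotable, blocking_jobs) for one candidate hash.

    results:  job name -> result string, e.g. "SUCCESS" or "FAILURE"
    criteria: job names that must pass for promotion
    waived:   criteria jobs temporarily excluded (known failure, fix in flight)
    """
    required = set(criteria) - set(waived)
    # A required job missing from results (never ran) also blocks.
    blocking = sorted(job for job in required if results.get(job) != "SUCCESS")
    return not blocking, blocking


# Hypothetical job names and results.
criteria = {"standalone", "scenario001", "standalone-on-multinode-ipa"}
results = {
    "standalone": "SUCCESS",
    "scenario001": "SUCCESS",
    "standalone-on-multinode-ipa": "FAILURE",
    "non-criteria-job": "FAILURE",  # outside the criteria, so it never blocks
}

print(can_promote(results, criteria))
print(can_promote(results, criteria, waived={"standalone-on-multinode-ipa"}))
```

Without the waiver the failing ipa job blocks; with it, the remaining criteria jobs are all green and the hash can promote.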

Nov 02nd

TripleO

Oct 30th

TripleO

Oct 29th

TripleO

Oct 28th

TripleO

Oct 27th

TripleO

  • [Invalid] Ussuri undercloud upgrade is failing with Config error: Parsing file "/var/lib/mock/dlrn-centos8-x86_64-1/root/etc/dnf/dnf.conf" failed: Parsing file '/var/lib/mock/dlrn-centos8-x86_64-1/root/etc/dnf/dnf.conf' failed: IniParser: Missing '=' at line 59
    https://bugs.launchpad.net/tripleo/+bug/1901705

  • Fixed - Master promotion blocker: Undercloud install is failing with Error message: Validation ['r', 'o', 'p', 'd', '.', 'i', 'y', 'k', 'c', 'e', 'l', 's', '-', 'n', 'u', 'a', 'm'] not found in /usr/share/ansible/validation-playbooks
    https://bugs.launchpad.net/tripleo/+bug/1901676

  • Fixed - Master component line promotion blocker: container images with current-tripleo hash a9a790d0723c9fe6641e453c6a1f0c91 were again removed from the RDO registry. Updated https://bugs.launchpad.net/tripleo/+bug/1901186 with details.

    • This happened to the "current-tripleo" hash images because they were not whitelisted, due to an issue with batch promotions, which are used for the tripleo-ci-testing to current-tripleo promotion. The issue is fixed with https://softwarefactory-project.io/r/19917; now that the patch is merged and applied on all centos8 DLRN workers, the last promoted container images will no longer be removed.

Oct 23rd

TripleO

  • Fixed - Check/gate jobs failing on task "configure-mirrors : Update yum/dnf cache" with Error: Failed to download metadata for repo 'AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
    https://bugs.launchpad.net/tripleo/+bug/1901142

<AJaeger18> ysandeep|ruck yoctozepto, as discussed in #openstack-infra, the mirror has been rebooted. If you encounter problems with new jobs (starting now or later), please report again.

  • Fixed - Master component line promotion blocker
    All the component pipeline jobs in master are failing because of missing container images, with error: Error running container image prepare: Not found image. Different jobs are failing on different container images.
    https://bugs.launchpad.net/tripleo/+bug/1901186
    • Wes is trying to promote the new hash 0982128d3d3aeca87d09c7100795f104 to current-tripleo; we are waiving the criteria on master for the ovb and ipa jobs (as they have separate ongoing bugs)
    • We are trying to fix it now with a promotion, but we need to investigate why the container images were purged
<weshay|ruck> ysandeep|ruck, ya.. we'll have to follow up w/ david paulik in #rhos-ops to see why... if it was deleted there should be a log
<weshay|ruck> ysandeep|ruck, k.. any container not tagged w/ current-tripleo or another human readable will automatically be deleted after 3 days
<ysandeep|ruck> weshay|ruck, when you have time, wondering if there is a way to tell if older hash (fa69037728ceffb81826ff6d926a7884) have a human readable name ?
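The retention rule weshay describes can be written as a predicate, which also frames ysandeep's question: an old hash survives only if some tag on it is human readable. This is a sketch under assumptions, not the registry's real pruning job: a "hash" tag is taken to be exactly 32 lowercase hex characters (like the DLRN hashes quoted in these notes), every other tag counts as human readable, and "after 3 days" is read as strictly older than 3 days since push.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: DLRN-style hash tags are 32 lowercase hex characters.
HASH_TAG = re.compile(r"[0-9a-f]{32}")
RETENTION = timedelta(days=3)


def is_human_readable(tag):
    """Anything that is not a bare hash (e.g. current-tripleo) counts as readable."""
    return HASH_TAG.fullmatch(tag) is None


def would_be_pruned(tags, pushed_at, now):
    """True if an image with these tags would be auto-deleted under the quoted rule."""
    if any(is_human_readable(tag) for tag in tags):
        return False  # protected by current-tripleo or another readable tag
    return now - pushed_at > RETENTION


now = datetime(2020, 10, 23, tzinfo=timezone.utc)
four_days_ago = now - timedelta(days=4)

# Hash also tagged current-tripleo: kept regardless of age.
print(would_be_pruned(["0982128d3d3aeca87d09c7100795f104", "current-tripleo"], four_days_ago, now))
# Older hash with no readable tag: pruned once past the retention window.
print(would_be_pruned(["fa69037728ceffb81826ff6d926a7884"], four_days_ago, now))
```

Under these assumptions, answering "does the older hash have a human readable name?" reduces to checking whether any tag on it fails the hash pattern.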

Oct 22nd

TripleO

  • Fixed - Master promotion blocker
    https://bugs.launchpad.net/tripleo/+bug/1900947
    Standalone with ipa server jobs are failing with "Error: Evaluation Error: Error while evaluating a Function Call, The ssl_verify_client parameter is required when setting ssl_ca"

    With depends-on this patch 759285, testproject https://review.rdoproject.org/r/#/c/26217/ passes:

    periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-master https://review.rdoproject.org/zuul/build/5eb1e7b79dea47fbb0adf372b7fecd08 : SUCCESS in 1h 56m 38s
    
  • Master promotion blocker
    https://bugs.launchpad.net/tripleo/+bug/1900949
    OVB jobs are failing on the master branch; imported nodes are not transitioning to manageable state with 'Error: Unable to establish IPMI v2 / RMCP+ session\n'

  • Fixed - https://bugs.launchpad.net/tripleo/+bug/1900957
    Ussuri periodic standalone upgrade job is failing with error "stderr": "/bin/sh: docker: command not found"

  • Fixed - sc12 ussuri/train jobs failing on tempest - reran with testproject
    We reported this on the existing bug https://bugs.launchpad.net/tripleo/+bug/1895822, as the failure was the same and Alex had already mentioned on the bz that he is working on sc12.

    • The following patches will fix the issue:
      https://review.opendev.org/#/c/759295/ - ussuri
      https://review.opendev.org/#/c/759296/ - train
    
  • Fixed - Upstream gates: many RETRY_LIMIT results in jobs, not TripleO-specific.

OSP
