Ruck and rover notes #27

tags: ruck_rover

Important links for ruck/rovers: ruck/rover links to help
Ruck Rover - Unified Sprint 27
Dates: May 6 - May 26

Tripleo CI team ruck|rover: Folco (rfolco) / Bhagyashris (bhagyashris)
OSP CI team ruck|rover (April 24 - May 15): Waldek (wznoinsk) / Avi (TalmoR)

Previous notes: https://hackmd.io/1pY-KQB_QwOe-a-5oEXTRg


on-going issues

TripleO

gate

RDO CI

OSP


Add dates in descending order so the latest date is at the top. Break out TripleO and OSP sections.

Launchpad Bugs Reported

| Bug | Name | Status | Review |
| --- | --- | --- | --- |
| 1878101 | ping br-ctlplane is failing too often, "Trying to ping default gateway" | Complete | 727942 |
| 1878190 | periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-master job is consistently failing because some tempest tests are failing | Triaged | 727192 |
| 1878197 | periodic-tripleo-ci-centos-7-ovb-1ctlr_1cellctrl_1comp-featureset063-train is failing and timing out on tempest execute | Triaged | |
| 1878150 | tox-linters jobs failing with AttributeError | In Progress | 727113 |
| 1878248 | NetworkSecGroupTest failing on fs020 stein/train | In Progress | 727287 |
| 1877031 | queens tripleo-ci-centos-7-undercloud-upgrades broken for ansible version | In Progress | 727696 |
| 1879267 | Error: Failed to download metadata for repo 'advanced-virtualization'\n level=debug msg="error running [bash -x /tmp/yum_update.sh delorean-current,quickstart-centos-ceph-nautilus] in container \"centos-binary-swift-container-working-container\": error while running runtime: exit status 1, failing standalone deployment jobs | Complete | 728761 |
| 1879638 | [Train Only] Error: No matching repo to modify: epel, failing container build push on centos-8 train | Complete | 729519 |
| 1880383 | ERROR! Unable to retrieve file contents, Could not find or access '/home/zuul/workspace/.quickstart/config/release/queens.yml' on the Ansible Controller, failing tripleo quickstart deployment on centos-8 (master, ussuri and train) | Complete | 730533 |

May 28th

Tripleo

OSP

May 27th

Tripleo

OSP

May 26th

Tripleo

OSP

May 25th

Tripleo

gate

13 failures 05-25-2020 3:50 UTC

  • no pattern, random failures, examples:
    • tempest.scenario.test_snapshot_pattern.TestSnapshotPattern failed on scen001 standalone (centos8)
    • Unable to disable service iscsid.socket on containers multinode (stein)
    • image prepare failed on containers multinode (train)
    • TIME OUT on undercloud-containers, containers-multinode, scen000 (train)
    • container-puppet tasks failed on centos7 standalone (train)
    • Wait for containers to start for step 3 using paunch >> scen003 standalone (train)
    • puppet host configuration failed on scen010 (train)
  • patches rechecked

RDO CI Failures

undercloud-upgrade failing in gate

https://bugs.launchpad.net/tripleo/+bug/1876893

OSP

May 22nd

TripleO

train promotion failed due to:

master:

  • periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master: is failing consistently because of https://bugs.launchpad.net/tripleo/+bug/1879766 (master ovb jobs failing on Destination directory /etc/pki/tls/private does not exist): Not sure why this bug is marked as invalid.
  • periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-stein: is timing out because some tempest tests are running out of time.

OSP

May 21st

Tripleo

OSP

May 20th

Tripleo

rfolco notes on gate failures

msg: '[''mysql_init_bundle''] failed to start

RUN END RESULT_TIMED_OUT

async task did not complete within the requested time - 5700s

Unable to start service tripleo_rabbitmq.service

FAILED - RETRYING: Wait for puppet host configuration to finish

tempest (network_basic_ops)

https://bce0317ca743b3733203-ca4b089b4b338eb03b97f6a00e3061e2.ssl.cf5.rackcdn.com/729105/1/gate/tripleo-ci-centos-8-scenario004-standalone/494b55d/logs/undercloud/var/log/tempest/stestr_results.html

OSP

May 19th

TripleO

Looks like stein fs020 is hitting an issue in overcloud deploy; waiting to see whether it repeats.
https://logserver.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset020-stein/27d5e4f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

2020-05-19 07:08:47 | 
2020-05-19 07:08:47 | novacompute-1: Failed to connect to the host via ssh: Warning: Permanently
2020-05-19 07:08:47 | added '192.168.24.24' (ECDSA) to the list of known hosts.  Permission denied
2020-05-19 07:08:47 | (publickey,gssapi-keyex,gssapi-with-mic).

OSP

May 18th

Tripleo

OSP

May 15th

TripleO

Pipeline: openstack-periodic-latest-released
  • ERROR: "Could not find or access '/home/zuul/workspace/.quickstart/vars/tempest_skip_ussuri.yml'"

    • Failing jobs:

      1. periodic-tripleo-ci-centos-8-scenario010-ovn-provider-standalone-ussuri
      2. periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-ussuri
      3. periodic-tripleo-ci-centos-8-standalone-full-tempest-api-ussuri
      4. periodic-tripleo-ci-centos-8-scenario012-standalone-ussuri
      5. periodic-tripleo-ci-centos-8-scenario010-standalone-ussuri
      6. periodic-tripleo-ci-centos-8-scenario007-standalone-ussuri
      7. periodic-tripleo-ci-centos-8-scenario008-standalone-ussuri
      8. periodic-tripleo-ci-centos-8-scenario003-standalone-ussuri
      9. periodic-tripleo-ci-centos-8-scenario002-standalone-ussuri
      10. periodic-tripleo-ci-centos-8-scenario001-standalone-ussuri
      11. periodic-tripleo-ci-centos-8-standalone-ussuri
      12. periodic-tripleo-ci-centos-8-undercloud-containers-ussuri
    • Error log

        2020-05-14 23:03:09.383163 | primary | TASK [Load tempest skiplist file] **********************************************
        2020-05-14 23:03:09.383171 | primary | Thursday 14 May 2020  23:03:09 +0000 (0:00:00.052)       0:44:05.425 **********
        2020-05-14 23:03:09.425360 | primary | fatal: [undercloud]: FAILED! => {
        2020-05-14 23:03:09.425423 | primary |     "ansible_facts": {},
        2020-05-14 23:03:09.425433 | primary |     "ansible_included_var_files": [],
        2020-05-14 23:03:09.425440 | primary |     "changed": false,
        2020-05-14 23:03:09.425448 | primary |     "message": "Could not find or access '/home/zuul/workspace/.quickstart/vars/tempest_skip_ussuri.yml' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"
        2020-05-14 23:03:09.425470 | primary | }
      
  • The jobs below are failing because of emit_releases_file.py: error: argument --stable-release: invalid choice: 'ussuri' (choose from 'newton', 'ocata', 'pike', 'queens', 'rocky', 'stein', 'train', 'master')

    • Error log
        2020-05-14 22:12:11.131318 | primary | ++(/home/zuul/src/opendev.org/openstack/tripleo-ci/toci_gate_test.sh:113): main(): basename /home/zuul/src/opendev.org/openstack/tripleo-quickstart/config/general_config/featureset037.yml
        2020-05-14 22:12:11.132520 | primary | +(/home/zuul/src/opendev.org/openstack/tripleo-ci/toci_gate_test.sh:113): main(): python3 /home/zuul/src/opendev.org/openstack/tripleo-ci/scripts/emit_releases_file/emit_releases_file.py --stable-release ussuri --featureset-file /home/zuul/src/opendev.org/openstack/tripleo-quickstart/config/general_config/featureset037.yml --output-file /home/zuul/workspace/logs/releases.sh --log-file /home/zuul/workspace/logs/emit_releases_file.log --distro-name centos --distro-version 8 --is-periodic
        2020-05-14 22:12:11.316668 | primary | usage: emit_releases_file.py [-h] --stable-release
        2020-05-14 22:12:11.316747 | primary |                              {newton,ocata,pike,queens,rocky,stein,train,master}
        2020-05-14 22:12:11.316756 | primary |                              --distro-name {centos} --distro-version {7,8}
        2020-05-14 22:12:11.316763 | primary |                              --featureset-file FEATURESET_FILE
        2020-05-14 22:12:11.316770 | primary |                              [--output-file OUTPUT_FILE] [--log-file LOG_FILE]
        2020-05-14 22:12:11.316776 | primary |                              [--upgrade-from] [--is-periodic]
        2020-05-14 22:12:11.317979 | primary | emit_releases_file.py: error: argument --stable-release: invalid choice: 'ussuri' (choose from 'newton', 'ocata', 'pike', 'queens', 'rocky', 'stein', 'train', 'master')
        2020-05-14 22:12:12.165646 | primary | ERROR
        2020-05-14 22:12:12.165914 | primary | {
        2020-05-14 22:12:12.165958 | primary |   "delta": "0:00:03.616096",
        2020-05-14 22:12:12.165991 | primary |   "end": "2020-05-14 22:12:11.334283",
        2020-05-14 22:12:12.166019 | primary |   "msg": "non-zero return code",
        2020-05-14 22:12:12.166078 | primary |   "rc": 2,
        2020-05-14 22:12:12.166111 | primary |   "start": "2020-05-14 22:12:07.718187"
        2020-05-14 22:12:12.166140 | primary | }
        2020-05-14 22:12:12.214538 | 
        2020-05-14 22:12:12.214686 | PLAY RECAP
        2020-05-14 22:12:12.214748 | primary | ok: 8 changed: 5 unreachable: 0 failed: 1 skipped: 11 rescued: 0 ignored: 0
        2020-05-14 22:12:12.214779 | 
        2020-05-14 22:12:12.454582 | RUN END RESULT_NORMAL: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
        2020-05-14 22:12:12.454795 | POST-RUN START: [trusted : review.rdoproject.org/config/playbooks/tripleo-ci-periodic-base/post.yaml@master]
      
      • Failed job list:
          1. periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-ussuri
          2. periodic-tripleo-ci-centos-8-multinode-1ctlr-featureset010-ussuri

    • Fix: https://review.opendev.org/#/c/723905/ (Add ussuri support for emit_releases_file.py)
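The `invalid choice: 'ussuri'` error and the `"rc": 2` in the log are what Python's argparse does when a value is missing from an option's `choices` list. The following is a minimal sketch of that behaviour, not the real emit_releases_file.py: it models only the `--stable-release` option, `build_parser` and `exit_code_for` are hypothetical helper names, and the position of 'ussuri' in the second list is an assumption.

```python
import argparse

# Release list exactly as printed in the usage message in the log.
RELEASES_WITHOUT_USSURI = ['newton', 'ocata', 'pike', 'queens',
                           'rocky', 'stein', 'train', 'master']
# Assumed shape of the list once ussuri support is added.
RELEASES_WITH_USSURI = ['newton', 'ocata', 'pike', 'queens',
                        'rocky', 'stein', 'train', 'ussuri', 'master']


def build_parser(releases):
    """Build a parser that models only the --stable-release option."""
    parser = argparse.ArgumentParser(prog='emit_releases_file.py')
    parser.add_argument('--stable-release', choices=releases, required=True)
    return parser


def exit_code_for(releases, release):
    """Return 0 if `release` is accepted, else the code argparse exits with."""
    try:
        build_parser(releases).parse_args(['--stable-release', release])
    except SystemExit as exc:
        # argparse prints the usage and error to stderr, then exits.
        return exc.code
    return 0


print(exit_code_for(RELEASES_WITHOUT_USSURI, 'ussuri'))
print(exit_code_for(RELEASES_WITH_USSURI, 'ussuri'))
```

With the first list the parser rejects 'ussuri' and exits with status 2, matching the failed task; with the second list the same command line parses cleanly.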

OSP

May 14th

TripleO

Job: periodic-tripleo-centos-7-rocky-containers-build-push is timing out in TASK [build-containers : Run image build as ansible user > /home/zuul/workspace/build.log]

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-centos-7-rocky-containers-build-push#

OSP

May 13th

TripleO

gate

OSP

  • 16.1 jobs occupy queue
    • 180 jobs still after 3/4 days
  • 16.0 - promoting content 12.1
    • getting jobs thru CI to reach some reasonable reporting state
    • solving situation around passed_phase2 promoted
      • %TODO add jira
  • 13 - live deployment reproduced
    • no CIX so far because still no clue what happens (no BZ yet)
  • 17 - nothing

May 12th

Tripleo

openstack-periodic-24h

gate

<bhagyashris|ruck> https://review.opendev.org/#/c/726993/
<bhagyashris|ruck> https://review.opendev.org/#/c/726004/

https://review.opendev.org/#/c/727113/ [linters refresh w/ afferent bugfixes] fixes this bug https://bugs.launchpad.net/tripleo/+bug/1878150 [tox-linters jobs failing with AttributeError]

Pipeline: openstack-periodic-master

https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-master#

Pipeline: openstack-periodic-latest-released

https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-latest-released

OSP

May 11th

Tripleo

Pipeline: Gate

bang.. opened bug https://bugs.launchpad.net/tripleo/+bug/1878101

Seen this a few times today
https://6d806783ed4dfdd971c5-158a33a5449e12f6f494625dd8517fb1.ssl.cf5.rackcdn.com/726374/1/gate/tripleo-ci-centos-8-undercloud-containers/2e3a947/logs/undercloud/home/zuul/undercloud_install.log

https://e453f1d8808c5b6bd184-223d8b88d73ea59070ac36b627fdc3bc.ssl.cf2.rackcdn.com/722662/1/gate/tripleo-ci-centos-8-undercloud-containers/ad8d1e6/logs/undercloud/home/zuul/undercloud_install.log

TASK [AllNodesValidationConfig] ************************************************
Monday 11 May 2020 20:46:19 +0000 (0:00:01.555) 0:00:50.282 ************
fatal: [undercloud]: FAILED! => changed=true
msg: non-zero return code
rc: 1
stderr: ''
stderr_lines: <omitted>
stdout: |-
Trying to ping default gateway 10.4.70.1Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
Ping to 10.4.70.1 failed. Retrying
FAILURE
10.4.70.1 is not pingable.
stdout_lines: <omitted>
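
The validation script itself is not quoted in these notes. The following is a minimal Python sketch of the retry behaviour implied by the output above, assuming ten attempts (the number of "Retrying" lines in the log). `check_gateway` and `ping_once` are hypothetical names, and the real check may be implemented differently.

```python
from typing import Callable, List, Tuple


def check_gateway(
    gateway: str,
    ping_once: Callable[[str], bool],
    retries: int = 10,
) -> Tuple[bool, List[str]]:
    """Ping `gateway` up to `retries` times, collecting log-style messages."""
    messages = [f"Trying to ping default gateway {gateway}"]
    for _ in range(retries):
        if ping_once(gateway):
            return True, messages
        messages.append(f"Ping to {gateway} failed. Retrying")
    # Every attempt failed: emit the same closing lines as the log above.
    messages.append("FAILURE")
    messages.append(f"{gateway} is not pingable.")
    return False, messages


# A gateway that never answers reproduces the failure sequence.
ok, log = check_gateway("10.4.70.1", ping_once=lambda host: False)
print(ok)
print("\n".join(log))
```

Passing `ping_once` as a parameter keeps the sketch testable without network access; a real check would wrap an actual ping call there.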

Pipeline: openstack-periodic-master

https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-master#

Pipeline: openstack-periodic-latest-released

https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-latest-released

Pipeline: openstack-component-compute

https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-compute-master&pipeline=openstack-component-compute

* Job failures:
* periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-compute-master
https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-compute-master&pipeline=openstack-component-compute
* Note: this is failing randomly
* Details: Execute tempest tests failing because of insufficient IP addresses
https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-compute-master/cdfe0e3/logs/undercloud/var/log/tempest/stestr_results.html.gz
* Bug: https://bugs.launchpad.net/tripleo/+bug/1852770
* Fix: https://review.opendev.org/#/c/722662/

OSP

  • octavia jobs unstable
    • debugging
  • trying to get CI reports for OSP16/16.1 for RelDel

May 8th

TripleO

cirros image fixed tempest bug

~~### tempest issues on stable branches
CentOS-7 OVB jobs are RED fs001
https://bugs.launchpad.net/tripleo/+bug/1875731
https://bugs.launchpad.net/tripleo/+bug/1876972
TRAIN: GREEN except by Tempest fail in fs039
STEIN: Tempest fail ( arx is looking at it )
ROCKY: Tempest fail ( @arxcruz FYI)
QUEENS: Tempest fail ( @arxcruz FYI)
https://bugs.launchpad.net/tripleo/+bug/1876087 > tempest bug
FYI : @rfolco @arxcruz ~~

Added by bhagyashris:

container build failures = mirror issue

### container build
https://bugs.launchpad.net/tripleo/+bug/1877416 >> mirror issue

job2: periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-master

### bring scenario10 online
@TheG Please work with the networking team to bring https://zuul.openstack.org/builds?job_name=tripleo-ci-centos-8-scenario010-ovn-provider-standalone online.
(wes) removed (non-periodic) here https://review.opendev.org/#/c/726224/1/zuul.d/layout.yaml
(rfolco) periodic-tripleo-ci-centos-8-scenario010-ovn-provider-standalone-master: remove from promotion criteria if red

OSP

See https://hackmd.io/1pY-KQB_QwOe-a-5oEXTRg?view

May 7th

Tripleo

OSP

May 6th

Tripleo

1877031: queens tripleo-ci-centos-7-undercloud-upgrades broken for ansible version

OSP


Completed On-Going

undercloud containers

cix https://trello.com/c/E7gL6d4b/1490-cixlp1878101tripleociproa-ping-br-ctlplane-is-failing-too-often-trying-to-ping-default-gateway
lp https://bugs.launchpad.net/tripleo/+bug/1878101
logstash https://bit.ly/3bnXuwc
testproject https://review.rdoproject.org/r/#/c/27453/
upstream/check https://review.opendev.org/#/c/727754/2
Fix: https://review.opendev.org/#/c/727942/ (Use /32 netmask for VIPs)

INFRA Mirror issues

NOTICE: Our CI mirrors in OVH BHS1 and GRA1 regions were offline between 12:55 and 14:35 UTC, any failures there due to unreachable mirrors can safely be rechecked


History of bugs

Bugs Reported

| Bug | Name | Status | Review |
| --- | --- | --- | --- |
| 1873770 | OVB fs001 in centos8 master fails to push certificates contents to controllers | Incomplete | |
| 1873892 | Non root login prevented on overcloud machines | Fix Released | |
| 1874019 | scenario009-multinode.yaml and openshift.yaml is missing | In Progress | |
| 1875352 | keystone container failed to start in scenario000 | Triaged | |
| 1875871 | periodic rocky jobs failing with missing name argument for pcs | Triaged | |
| 1875846 | Overcloud stack creation failed because of failed dependencies. | Closed | |
| 1875833 | The WebSocket timed out before the Workflow completed in rocky/stein jobs | New | |
| 1876087 | Queens, tempest.scenario.test_network_basic_ops.TestNetworkBasicOps failing. Timeout | Triaged | |
| 1876096 | Queens: tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern tests failed | Triaged | |
| 1876672 | Python 2 - AttributeError: 'module' object has no attribute 'get_makefile_name' | Fix Released | |
| 1876893 | Error: error removing container - device or resource busy | In Progress | |
| 1877031 | queens tripleo-ci-centos-7-undercloud-upgrades broken for ansible version | | |

Handoff notes

Notes from previous RR cycle
