Ruck and Rover notes #31

tags: ruck_rover

Important links for ruck/rovers: ruck/rover links to help
**Ruck Rover - Unified Sprint #31**
Dates: July 30th - August 19

Tripleo CI team ruck|rover: Marios Andreou (marios) Chandan Kumar (chandankumar)
Downstream CI team ruck|rover: Filip Hubik (fhubik) Tuvya Korol (tkorol)

Vexx/rdocloud introspection: vexx: https://bit.ly/3kfhIOb rdo-cloud: https://bit.ly/2Xu4jIG

Previous notes(sprint #30): https://hackmd.io/6Bx0FXwlRNCc75l39NSKvg
Next notes(sprint #32): none - sprint 31 is the current one

on-going issues

OSP

Jenkins TLV2 migration polishing
* UMB doesn't work
* https://projects.engineering.redhat.com/browse/RHOSINFRA-3635

LIBVIRT LEASE (RHEL8.x, x in 0-2) BUG:
Escalation: https://trello.com/c/I0ix688S will affect OSP16.1 for some time going forward (https://bugzilla.redhat.com/show_bug.cgi?id=1840307 [libvirt])

Remaining OSP13 escalation has not been looked at yet
Upgrade DFG p3 job is not getting SWAP_PUDDLENO* param from layer above


Thu 20 Aug

tripleo

New/Transient/Nobug yet:

Rerunning full tempest ussuri - https://review.rdoproject.org/r/29018

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-scenario-ussuri/736c131/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

{2} neutron_tempest_plugin.scenario.test_floatingip.FloatingIpMultipleRoutersTest.test_reuse_ip_address_with_other_fip_on_other_router [147.500962s] ... FAILED

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_floatingip.py", line 563, in test_reuse_ip_address_with_other_fip_on_other_router
        servers_num=1, fip_addresses=[ip_address])
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_floatingip.py", line 477, in _create_network_and_servers
        network=network, fip_address=fip))
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/scenario/test_floatingip.py", line 498, in _create_server_and_fip
        port=port)
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/api/base.py", line 634, in create_floatingip
        **kwargs)['floatingip']
      File "/usr/lib/python3.6/site-packages/neutron_tempest_plugin/services/network/json/network_client.py", line 972, in create_floatingip
        resp, body = self.post(uri, body)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 283, in post
        return self.request('POST', url, extra_headers, headers, body, chunked)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 687, in request
        self._error_checker(resp, resp_body)
      File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 808, in _error_checker
        raise exceptions.Conflict(resp_body, resp=resp)
    tempest.lib.exceptions.Conflict: Conflict with state of target resource
    Details: {'type': 'IpAddressAlreadyAllocated', 'message': 'IP address 192.168.24.133 already allocated in subnet 356b93e1-6377-4055-9a6e-ce4a93b213cb', 'detail': ''}
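
When triaging a rerun, it can help to pull only the failing test IDs out of `tempest_run.log`. The snippet below is a minimal sketch, assuming result lines keep the `{worker} test_id [seconds] ... STATUS` shape seen above; `failed_tests` and the second sample line are hypothetical and not part of any tempest tooling.

```python
import re

# Matches result lines of the form seen above:
# {2} some.test.id [147.500962s] ... FAILED
RESULT_LINE = re.compile(
    r"^\{(?P<worker>\d+)\}\s+(?P<test>\S+)\s+"
    r"\[(?P<secs>[\d.]+)s\]\s+\.\.\.\s+(?P<status>\w+)\s*$"
)


def failed_tests(lines):
    """Return (test_id, seconds) for every result line whose status is FAILED."""
    failures = []
    for line in lines:
        match = RESULT_LINE.match(line.strip())
        if match and match.group("status") == "FAILED":
            failures.append((match.group("test"), float(match.group("secs"))))
    return failures


sample = [
    "{2} neutron_tempest_plugin.scenario.test_floatingip."
    "FloatingIpMultipleRoutersTest."
    "test_reuse_ip_address_with_other_fip_on_other_router "
    "[147.500962s] ... FAILED",
    # Hypothetical passing line, included only to show it is skipped.
    "{0} hypothetical.passing.test_ok [1.250000s] ... ok",
]
print(failed_tests(sample))
```

Lines that do not follow this shape (tracebacks, captured output) are ignored rather than reported.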

osp

R&R transitioning and knowledge transfer

Libvirt fix testing (fhubik)

UMB debug (fhubik)

Wed 19 Aug \o/

tripleo

New/Transient/Nobug yet:

Train standalone jobs are failing check/gate - unable to resolve tripleo_deploy_control_virtual_ip - https://bugs.launchpad.net/tripleo/+bug/1892078
ERROR: Package 'pymod2pkg' requires a different Python: 2.7.5 not in '>=3.6' on tripleo-ci-centos-7-containers-multinode-train due to mirror issues
2020-08-19 00:08:43.959611 | primary | TASK [build-test-packages : Pip install rdopkg] ********************************
2020-08-19 00:08:43.959806 | primary | Wednesday 19 August 2020  00:08:43 +0000 (0:00:06.832)       0:05:19.651 ******
2020-08-19 00:08:48.608093 | primary | fatal: [undercloud]: FAILED! => {
2020-08-19 00:08:48.608221 | primary |     "changed": false,
2020-08-19 00:08:48.608288 | primary |     "cmd": [
2020-08-19 00:08:48.608407 | primary |         "/home/zuul/dlrn-venv/bin/pip2",
2020-08-19 00:08:48.608491 | primary |         "install",
2020-08-19 00:08:48.608558 | primary |         "-U",
2020-08-19 00:08:48.608637 | primary |         "rdopkg"
2020-08-19 00:08:48.608689 | primary |     ]
2020-08-19 00:08:48.608729 | primary | }
2020-08-19 00:08:48.608778 | primary |
2020-08-19 00:08:48.608824 | primary | MSG:
2020-08-19 00:08:48.608861 | primary |
2020-08-19 00:08:48.609182 | primary | stdout: Looking in indexes: https://mirror.ord.rax.opendev.org/pypi/simple, https://mirror.ord.rax.opendev.org/wheel/centos-7.8-x86_64
2020-08-19 00:08:48.609257 | primary | Collecting rdopkg
2020-08-19 00:08:48.609637 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/db/e0/3102e985c43b9fc6aeddb3279a7d338a745fcdacfc3cf3dd56d15d5ba1cf/rdopkg-1.2.0-py2-none-any.whl (71 kB)
2020-08-19 00:08:48.609743 | primary | Collecting distroinfo>=0.3.0
2020-08-19 00:08:48.610146 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/88/e5/e3f6a502251476966273a2065befb38d37687a58c50a3afd9dd92895f4a1/distroinfo-0.3.2-py2-none-any.whl (18 kB)
2020-08-19 00:08:48.610227 | primary | Collecting blessings
2020-08-19 00:08:48.610599 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/8d/b1/a3fe6fd8a012e6d019bafd671c2fee0597ea97ff2e76c25aadfa4545fc32/blessings-1.7-py2-none-any.whl (26 kB)
2020-08-19 00:08:48.610681 | primary | Collecting munch
2020-08-19 00:08:48.611083 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/cc/ab/85d8da5c9a45e072301beb37ad7f833cd344e04c817d97e0cc75681d248f/munch-2.5.0-py2.py3-none-any.whl (10 kB)
2020-08-19 00:08:48.611163 | primary | Collecting requests
2020-08-19 00:08:48.611545 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/45/1e/0c169c6a5381e241ba7404532c16a21d86ab872c9bed8bdcd4c423954103/requests-2.24.0-py2.py3-none-any.whl (61 kB)
2020-08-19 00:08:48.611633 | primary | Collecting pbr>=0.5.6
2020-08-19 00:08:48.612027 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/96/ba/aa953a11ec014b23df057ecdbc922fdb40ca8463466b1193f3367d2711a6/pbr-5.4.5-py2.py3-none-any.whl (110 kB)
2020-08-19 00:08:48.612108 | primary | Collecting pymod2pkg
2020-08-19 00:08:48.612465 | primary |   Downloading https://mirror.ord.rax.opendev.org/pypifiles/packages/c9/04/d59f150b9e0f9c377dc3efe71f737ddebfb34173d3ca1dc94cef09f20aa1/pymod2pkg-0.25.0.tar.gz (17 kB)
2020-08-19 00:08:48.612503 | primary |
2020-08-19 00:08:48.613225 | primary | :stderr: DEPRECATION: Python 2.7 reached the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 is no longer maintained. pip 21.0 will drop support for Python 2.7 in January 2021. More details about Python 2 support in pip can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
2020-08-19 00:08:48.613423 | primary | ERROR: Package 'pymod2pkg' requires a different Python: 2.7.5 not in '>=3.6'
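
The error above comes from pip comparing the interpreter in the venv (`pip2`, Python 2.7.5) with the `>=3.6` requirement declared by pymod2pkg 0.25.0. The check can be sketched roughly as below. This is a simplification that assumes plain dotted numeric versions and a single `>=` bound; real pip evaluates full PEP 440 specifiers, and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn '2.7.5' into (2, 7, 5) so versions compare as tuples."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(running, minimum):
    """True if the running interpreter version meets a '>=minimum' bound."""
    return parse_version(running) >= parse_version(minimum)


# Interpreter and bound taken from the log above.
print(satisfies_minimum("2.7.5", "3.6"))  # False
# A Python 3.6 interpreter would pass the same check.
print(satisfies_minimum("3.6.8", "3.6"))  # True
```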

https://bugs.launchpad.net/tripleo/+bug/1892078 Train standalone jobs are failing check/gate - unable to resolve tripleo_deploy_control_virtual_ip

NOT A BLOCKER https://bugs.launchpad.net/tripleo/+bug/1892169 periodic centos 8 train FS20 tempest fails

osp

First libvirt scratchbuild done https://bugzilla.redhat.com/show_bug.cgi?id=1868271#c3

  • testing has begun

Tue 18 Aug

tripleo ongoing issues duplicated here - no need to check previous days

https://bugs.launchpad.net/puppet-openstack-integration/+bug/1891992 [Master][scenario002][ec2api] Tempest test(test_create_delete_bucket) failing

https://bugs.launchpad.net/tripleo/+bug/1891372 rocky periodic jobs are failing with " Error: image tripleorocky/centos-binary-tempest:9801dc7461cbd6cbd73868e72e74d21d586c6708_fbb4de96-updated-20200812131025 not found"

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

https://bugs.launchpad.net/tripleo/+bug/1891971 ERROR: No matching distribution found for pprint (from -r /home/zuul/src/opendev.org/openstack/tripleo-ci/test-requirements.txt (line 6))
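
Because open bugs are carried forward from day to day, the same Launchpad links repeat throughout these notes. A minimal sketch for listing each bug once, assuming the notes are available as plain text; `unique_bugs` is a hypothetical helper, and the sample text reuses bug links from the list above with shortened titles.

```python
import re

# Launchpad bug links as they appear in these notes.
BUG_URL = re.compile(r"https://bugs\.launchpad\.net/([\w.-]+)/\+bug/(\d+)")


def unique_bugs(text):
    """Return (project, bug_number) pairs in first-seen order, without repeats."""
    seen = []
    for project, number in BUG_URL.findall(text):
        key = (project, int(number))
        if key not in seen:
            seen.append(key)
    return seen


notes = """
https://bugs.launchpad.net/tripleo/+bug/1891372 rocky periodic jobs are failing
https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails
https://bugs.launchpad.net/tripleo/+bug/1891372 rocky periodic jobs are failing
"""
print(unique_bugs(notes))
```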

New/Transient/Nobug yet:

[/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (FA:16:3E:CB:52:88) already uses address 10.0.0.1.
    2020-08-17 22:06:20.314877 | primary | [2020/08/17 10:06:20 PM] [ERROR] stdout: ERROR     : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (FA:16:3E:CB:52:88) already uses address 10.0.0.1.
2020-08-17 22:06:20.314884 | primary | , stderr: WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
2020-08-17 22:06:20.314889 | primary | WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
2020-08-17 22:06:20.314895 | primary | WARN      : [ifup] It is advised to switch to 'NetworkManager' instead - it provides 'ifup/ifdown' scripts as well.
2020-08-17 22:06:20.314901 | primary |
2020-08-17 22:06:20.314907 | primary | Traceback (most recent call last):
2020-08-17 22:06:20.314912 | primary |   File "/bin/os-net-config", line 10, in <module>
2020-08-17 22:06:20.314918 | primary |     sys.exit(main())
2020-08-17 22:06:20.314924 | primary |   File "/usr/lib/python3.6/site-packages/os_net_config/cli.py", line 349, in main
2020-08-17 22:06:20.314935 | primary |     activate=not opts.no_activate)
2020-08-17 22:06:20.314941 | primary |   File "/usr/lib/python3.6/site-packages/os_net_config/impl_ifcfg.py", line 1881, in apply
2020-08-17 22:06:20.314963 | primary |     raise os_net_config.ConfigurationError(message)
2020-08-17 22:06:20.314968 | primary | os_net_config.ConfigurationError: Failure(s) occurred when applying configuration
FATAL | Gather podman infos | overcloud-controller-1 green at test https://review.rdoproject.org/r/#/c/28994/
FATAL | Gather podman infos | overcloud-controller-1 | error={"changed": false, "msg": "Unable to gather info for ['1d80a14d7bed', '7cc5974a59bf', '5c448e9a9b79', '47516ce77307', '469b8894a78c', '9884d1e53afa', '8b3d2d5201eb', '9a8ba7c83a01', '5f304de29125', 'b7c321e8a0ca', 'd342acc6aa41', 'cd858ff84a36', 'c33350c68cbc', '4df3404e0f3a', '632c45236f23', '844896406c71', 'dc9f95bf777b', '793fde778317', 'a6c517024d9d', 'fe3b087c1ade', '1732b1f73e0c', '75bfbfbd8e16', '5a24946172c6', '09dcb0c8e7e9', 'a78f95a57861', '9ab385dc1e1e', '89d1f7f0a79d', 'e7288b4cb64d', '732098f889aa', 'b84405fd0e40', '38071fd3ea74', '08db348b3e0f', '060cab6e76d3', 'f5377e4e694f', 'e518ba88968a', '91841060c926', '971cc8c40ee2', '19b381d57ba5']: Error: error looking up container \"1d80a14d7bed\": no container with name or ID 1d80a14d7bed found: no such container\n"}
centos7 periodic ovb new bug https://bugs.launchpad.net/tripleo/+bug/1892008
heat stacks doing better today (fixed typo http://pastebin.test.redhat.com/893941)

http://dashboard-ci.tripleo.org/d/wb8HBhrWk/cockpit?orgId=1&fullscreen&panelId=231

15:11 < Tengu> marios|ruck: hello there! already seen this in stable/ussuri? ERROR: Could not find a version that satisfies the requirement python-glanceclient===3.1.2

15:12 < Tengu> marios|ruck: example job: https://review.opendev.org/#/c/746635/1

osp

Phase3 DFG:DF retrospective mtg today

UMB still occurs

  • UMB debugged by 3 people on all layers
  • looking into Upgrade jobs also

Mon 17 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1891372 rocky periodic jobs are failing with " Error: image tripleorocky/centos-binary-tempest:9801dc7461cbd6cbd73868e72e74d21d586c6708_fbb4de96-updated-20200812131025 not found"

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

* Running c8 ussuri fs030 timed out job: https://review.rdoproject.org/r/28913

prefetch_image", "changed": false, "msg": "Failed to pull image
Ussuri current-tripleo-rdo promoter error missing containers?
need to clean RDO heat stacks, ports etc.

http://dashboard-ci.tripleo.org/d/wb8HBhrWk/cockpit?orgId=1&fullscreen&panelId=231

rhos-17 component pipeline is all red

2020-08-17 04:40:55.141521 | primary | - Status code: 403 for http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/17.0-RHEL-8/latest-RHOS_TRUNK-17-RHEL-8/compose/OpenStack/x86_64/os/repodata/repomd.xml (IP: 10.0.14.183)
https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-standalone-compute-rhos-17/c87a988/job-output.txt

osp

Investigation of weekend's 16.1 p3 results

  • hitting titan80 issue - whole p3 broken initially
    • retriggering also with issues (abandoned mjobs, previous were running)
    • email explaining it sent
    • majority of results should be now complete
  • UMB still not working in automation
  • reconsidering p2->p3 triggering to be only manual?
  • plan to check OSP13 ntp escalation once new OSP13 is done

Thu 13 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1891317 openstack-tox-tht ci failing with sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error

https://bugs.launchpad.net/tripleo/+bug/1891372 rocky periodic jobs are failing with " Error: image tripleorocky/centos-binary-tempest:9801dc7461cbd6cbd73868e72e74d21d586c6708_fbb4de96-updated-20200812131025 not found"

https://bugs.launchpad.net/tripleo/+bug/1891179 periodic OVB train centos 7 failing ovb-manage No server with a name or ID

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

master integration pipeline containers failed (all else skip)
ussuri/train integration pipe ipa image build fail

osp

Fighting to get 16.1 phase3 thru

OSP13 on RHEL-7.9 situation is not clear yet

Wed 12 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1891293 periodic centos-8 scen10 standalone master fails tempest - octavia_tempest_plugin scenario.v2

https://bugs.launchpad.net/tripleo/+bug/1891317 openstack-tox-tht ci failing with sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error

https://bugs.launchpad.net/tripleo/+bug/1891372 rocky periodic jobs are failing with " Error: image tripleorocky/centos-binary-tempest:9801dc7461cbd6cbd73868e72e74d21d586c6708_fbb4de96-updated-20200812131025 not found"

https://bugs.launchpad.net/tripleo/+bug/1891179 periodic OVB train centos 7 failing ovb-manage No server with a name or ID

https://bugs.launchpad.net/tripleo/+bug/1891000 master network component failing tempest - neutron containers unexpected keyword argument 'libc'

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

https://bugs.launchpad.net/tripleo/+bug/1891287 periodic integration pipelines POST_FAIL promoted-components-to-tripleo-ci-testing

New/Transient/Nobug yet:

POST_FAIL bad buildsets periodic integration & component master/ussuri/train8/stein https://bugs.launchpad.net/tripleo/+bug/1891287
11:30 < ykarel> marios|ruck, is scenario010 tempest failures known? https://bugs.launchpad.net/tripleo/+bug/1891293

osp

Jenkins migration to https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/ done for the most part

  • p1/p2/p3 resuming normal operation, but we are monitoring issues closely and expecting/experiencing minor issues (buildmarks, reevaluating reports, UMB issue(s), stuck threads, …)
      • Jenkins restart did not help clear this

CI quality might be degraded because of libvirt issue (https://trello.com/c/I0ix688S)

OSP13 - started moving CI jobs to RHEL-7.9 as default
https://projects.engineering.redhat.com/browse/RHOSINFRA-3631

OSP16.2 - first phase1 job added, early development stage
https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/view/QE/view/OSP16.2/

Tue 11 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1891179 periodic OVB train centos 7 failing ovb-manage No server with a name or ID

https://bugs.launchpad.net/tripleo/+bug/1891000 master network component failing tempest - neutron containers unexpected keyword argument 'libc'

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1890266 centos 8 security component + integration pipeline master - Failed container(s): ['nova_wait_for_api_service

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

train 8 missed promotion for fs1 (tempest)
train 7 ovb-manage attach instance to network fails https://bugs.launchpad.net/tripleo/+bug/1891179
why is master cloudops component not promoting?
why isn't train common component promoting? https://review.rdoproject.org/r/28928
http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016438.html
  • "Ansible 2.8.14 and 2.9.12 change the default mode, that created files will get, from 0666 (with umask; which would usually produce 0644) to 0600. [1]"
  • 18:30 < ykarel|away> marios|ruck, ack i fired https://review.rdoproject.org/r/#/c/28929/ to test tripleo deploys with ansible-2.9.12,
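
The mode change quoted above can be checked with a little bit arithmetic. This is a minimal sketch, assuming the common umask of 0022 (the announcement only says the old default "would usually produce 0644"); `effective_mode` is a hypothetical helper, not an Ansible function.

```python
def effective_mode(requested, umask):
    """Permission bits that remain after the umask clears its bits."""
    return requested & ~umask


# Old behaviour: request 0666 and let the (assumed) umask 0022 apply.
old_default = effective_mode(0o666, 0o022)
# New behaviour per the announcement: files are created as 0600.
new_default = 0o600

print(oct(old_default))  # 0o644
print(oct(new_default))  # 0o600
# Bits granted before the change and no longer granted after it
# (group read and other read):
print(oct(old_default & ~new_default))  # 0o44
```

Any task that relied on group or other read access to a file created without an explicit mode is a candidate for breakage after the update.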

osp

  • libvirt bug hit again, typo in my patch, hopefully fixed now by part2
  • Jenkins restart
    • jjb not updating, 504 by Jenkins, threads deadlocked, related to UMB
      • possibly can explain the p3 UMB delay? unclear
  • CI slaves upgrade to RHEL-8.2 ongoing
    • People requesting advanced virt's qemu-kvm should hopefully get it
      • once it is done

Mon 10 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1890997 USSURI periodic ovb fs35 fails tempest "computeFault": {"code": 500, "message": "Unexpected API Error

https://bugs.launchpad.net/tripleo/+bug/1891000 master network component failing tempest - neutron containers unexpected keyword argument 'libc'

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1890266 centos 8 security component + integration pipeline master - Failed container(s): ['nova_wait_for_api_service

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

10:40 < ykarel> chkumar|rover, marios|ruck is issue with network component known?

10:40 < ykarel> we seeing in puppet promotion, so similar should be hitting in network component
10:41 < ykarel> https://logserver.rdoproject.org/ci.centos.org/weirdo-generic-puppet-openstack-scenario001/16211/weirdo-project/logs/neutron/l3-agent.txt.gz
10:46 < marios|ruck> ykarel: haven't seen it and the fails on the component are tempest
https://logserver.rdoproject.org/openstack-component-network/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-multinode-1ctlr-featureset010-network-master/b996fdb/job-output.txt
10:46 < marios|ruck> ykarel: noting on the hackmd for now
10:47 < ykarel> yes tempest failure is likely due to error i shared above
10:48 < ykarel> marios|ruck, and for info it's caused after https://review.opendev.org/#/c/722254/
filed https://bugs.launchpad.net/tripleo/+bug/1891000

new ussuri fs35 blocker tempest fails

https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-ussuri/0a45ef9/logs/undercloud/var/log/tempest/stestr_results.html.gz
https://logserver.rdoproject.org/openstack-periodic-integration-stable1/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-ussuri/45d7fc9/logs/undercloud/var/log/tempest/stestr_results.html.gz
filed https://bugs.launchpad.net/tripleo/+bug/1890997

master security component not promoting

osp

  • Mainly going thru p3 results and cross checking against the libvirt issue
  • OSP13 passed 2 shows issues, possibly related to the open ntp escalation
  • Attempt to UMB "debug", since last p3 was not triggered right
    • 4hrs delay, but might be caused by these remains of TLV outage
      • we'll keep monitoring, since we don't have any tools to debug it (afaik)

Fri 07 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1890798 periodic centos8 Ussuri multinode minor update job fails: tderr": "Error: resource 'ip-192.168.24.16' is not running on any node

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1890266 centos 8 security component + integration pipeline master - Failed container(s): ['nova_wait_for_api_service

https://bugs.launchpad.net/tripleo/+bug/1885314 vexx: OVB master job running on vexxhost show some nodes failing introspection step

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-ussuri - rerunning in testproject https://review.rdoproject.org/r/28890
train 8 can promote today \o/ (hope)

osp

  • CI resurrection in TLV2 lab, focus on 0804 p3 16.1 to be the only one that needs p3 run
    • build-marks broken, mjob not reevaluating
      • fixed by psedlak, not clear how yet, still TBI
    • libvirt bug fix testing done and merged
    • p3 triggered by UMB, but triggering issue found?!
      • delay in UMB reaction 4hrs? What the heck?!
        • TBI
      • but it run in the end, results are good (relatively)
        • a lot of jobs reached yellow/blue
        • from a quick view couldn't find any libvirt bug being hit

Thu 06 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1890571 periodic integration/component jobs failing "[Zuul] Log Stream did not terminate"

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1890266 centos 8 security component + integration pipeline master - Failed container(s): ['nova_wait_for_api_service

https://bugs.launchpad.net/tripleo/+bug/1885314 vexx: OVB master job running on vexxhost show some nodes failing introspection step

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

train gate too many requests tripleo/+bug/1889122 https://review.opendev.org/#/c/744955/
integration pipelines failing:

osp

  • outage continues, prolonged by IT by 1 day

Wed 05 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1890389 [Master] tripleoclient does not play well with cliff-3.4.0

https://bugs.launchpad.net/tripleo/+bug/1890266 centos 8 security component + integration pipeline master - Failed container(s): ['nova_wait_for_api_service

https://bugs.launchpad.net/tripleo/+bug/1889764 /sbin/pcs cluster setup tripleo_cluster standalone addr=192.168.24.1 token 10000 encryption 1' returned 1 instead of one of [0] train https://review.opendev.org/#/c/744192/ merged

https://bugs.launchpad.net/tripleo/+bug/1885314 vexx: OVB master job running on vexxhost show some nodes failing introspection step

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

NODE_FAIL periodic integration train first run today & skipped all - also master/component
various gate timeout & fails including 'too many requests'

osp

Tue 04 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1890266 centos 8 security component + integration pipeline master - Failed container(s): ['nova_wait_for_api_service

https://bugs.launchpad.net/tripleo/+bug/1889764 /sbin/pcs cluster setup tripleo_cluster standalone addr=192.168.24.1 token 10000 encryption 1' returned 1 instead of one of [0]

https://bugs.launchpad.net/tripleo/+bug/1885314 vexx: OVB master job running on vexxhost show some nodes failing introspection step

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

New/Transient/Nobug yet:

train promotion blocked https://bugs.launchpad.net/tripleo/+bug/1889764/comments/10
vexx introspection/resource issues continue - examples periodic/traincentos8
Config update on promoter https://review.rdoproject.org/r/28842

osp

phase2 OSP16.1 new puddle (RHOS-16.1-RHEL-8-20200803.n.0)

Mon 03 Aug

tripleo

https://bugs.launchpad.net/tripleo/+bug/1889764 /sbin/pcs cluster setup tripleo_cluster standalone addr=192.168.24.1 token 10000 encryption 1' returned 1 instead of one of [0]

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

https://bugs.launchpad.net/tripleo/+bug/1885314 vexx: OVB master job running on vexxhost show some nodes failing introspection step

New/Transient/Nobug yet:

osp

Fri 31 Jul

tripleo

https://bugs.launchpad.net/tripleo/+bug/1889764 /sbin/pcs cluster setup tripleo_cluster standalone addr=192.168.24.1 token 10000 encryption 1' returned 1 instead of one of [0]

https://bugs.launchpad.net/tripleo/+bug/1889122 mirror timeouts in upstream causing undercloud and standalone failures

https://bugs.launchpad.net/tripleo/+bug/1889357 Centos7 Check/Gate jobs failing with UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)

Periodic promotion

New/Transient/Nobug yet:

gate - vexx introspection issues

gate

TIMING | tripleo-modify-image : Write Dockerfile to {{ modify_dir_path }} | 0:00:22.924 | 0.04s\n\x1b[Ke30=\x1b[4D\x1b[K'
Stderr: '[WARNING]: provided hosts list is empty, only localhost is available. Note that\nthe implicit localhost does not match \'all\'\nFatal Python error: GC object already tracked\n\nCurrent thread 0x00007f75f2a69700 (most recent call first):\n  File "/usr/lib64/python3.6/multiprocessing/connection.py", line 390 in _recv\n  File "/usr/lib64/python3.6/multiprocessing/connection.py", line 411 in _recv_bytes\n  File "/usr/lib64/python3.6/multiprocessing/connection.py", line 220 in recv_bytes\n  File "/usr/lib64/python3.6/multiprocessing/queues.py", line 94 in get\n  File "/usr/lib/python3.6/site-packages/ansible/plugins/strategy/__init__.py", line 84 in results_thread_main\n  File "/usr/lib64/python3.6/threading.py", line 864 in run\n  File "/usr/lib64/python3.6/threading.py", line 916 in _bootstrap_inner\n  File "/usr/lib64/python3.6/threading.py", line 884 in _bootstrap\n\nThread 0x00007f7602bca740 (most recent call first):\n  File "/usr/lib/python3.6/site-packages/ansible/plugins/strategy/__init__.py", line 788 in _wait_on_pending_results\n  File "/usr/lib/python3.6/site-packages/ansible/plugins/strategy/linear.py", line 325 in run\n  File "/usr/lib/python3.6/site-packages/ansible/executor/task_queue_manager.py", line 244 in run\n  File "/usr/lib/python3.6/site-packages/ansible/executor/playbook_executor.py", line 169 in run\n  File "/usr/lib/python3.6/site-packages/ansible/cli/playbook.py", line 127 in run\n  File "/bin/ansible-pl
grep: /tmp/container-*/docker: No such file or directory

Investigation continues..

2020-07-31 05:19:52.344 84425 ERROR paunch [  ] Error executing ['podman', 'container', 'exists', 'rabbitmq_init_logs']: returned 1
2020-07-31 05:19:52.435 84425 WARNING paunch [  ] Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=rabbitmq_init_logs', '--filter', 'label=config_id=tripleo_step1', '--format', '{{.Names}}']" - retrying without config_id
2020-07-31 05:19:52.556 84425 WARNING paunch [  ] Did not find container with "['podman', 'ps', '-a', '--filter', 'label=container_name=rabbitmq_init_logs', '--format', '{{.Names}}']"
Best worst case (for wed) centos-7 train candidate

https://trunk.rdoproject.org/api-centos-train/api/civotes_detail.html?commit_hash=6f44509dcb4faa4bc0340ae138c86d77ef2e2c84&distro_hash=14a932b45720ecb4c2cc5a8811fc6c59ba6255d5

osp

fhubik/tkorol taking over from previous rucks

  • knowledge transfer
  • few CI p2 retriggers, but no new fires so far

Thu 30 Jul

tripleo

New/Transient/Nobug yet:

https://bugs.launchpad.net/tripleo/+bug/1889524 periodic centos8 ovb featureset 1 baremetal master fails ironic unexpected keyword argument 'hash_function

https://bugs.launchpad.net/tripleo/+bug/1889529 periodic centos8 scenario10 network component standalone master timeout tempest conflicting state

https://bugs.launchpad.net/tripleo/+bug/1889553 centos8 periodic master jobs failing tempest 'no more ip addresses'

FYI.. tripleo centos7 ovb jobs on rdocloud https://review.rdoproject.org/r/#/c/28705/

Best centos-7 train candidate
https://trunk.rdoproject.org/api-centos-train/api/civotes_detail.html?commit_hash=6f44509dcb4faa4bc0340ae138c86d77ef2e2c84&distro_hash=14a932b45720ecb4c2cc5a8811fc6c59ba6255d5

triggered centos8 train

osp

R&R transfer, sprint planning, shifting duties, finishing prev. tasks
