# Ruck and Rover notes #30
###### tags: `ruck_rover`
:::info
Important links for the ruck and rover: [ruck/rover links to help](https://hackmd.io/07z0xroHTFi2IbX93P5ZfQ)
**Ruck Rover - Unified Sprint #30**
Dates: July 9th - July 29th
TripleO CI team ruck|rover: Sorin Sbarnea (zbr), Sandeep Yadav (ysandeep); backup: rlandy and weshay
Previous notes (sprint #29): https://hackmd.io/XcuH2OIVTMiuxyrqSF6ocw?both
**Next #30 notes: https://hackmd.io/QnprH9-yRTi6uWlEfaahoQ**
:::
[TOC]
---
## on-going issues
:::danger
## TripleO
* https://bugs.launchpad.net/tripleo/+bug/1889357 - Centos7 Check/Gate jobs failing with UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)
Latest pip-20.2 (released today) breaks Centos7 jobs. A patch is up to pin pip for c7 - https://review.opendev.org/#/c/743691/
* https://launchpad.net/bugs/1889122 mirror timeouts in upstream causing undercloud and standalone failures. https://review.opendev.org/#/c/743432/ Patch to locally build container to reduce pressure on docker.io.
### gate
### periodic / 3rd party
* Master Periodic jobs: Sc001/002 failing with Error: ", " Problem: package rdma-core-26.0-8.el8.x86_64 requires dracut, but none of the providers can be installed
Patches are up to fix the issue.
* introspection issues on vexxhost continue:
A testproject recheck of the train/stein/rocky/queens jobs still shows many of them failing introspection. Note that there was a vexxhost update at the end of last week.
QUESTION: should we run these testproject jobs on RDO cloud to clear promotions?
* main (master) pipeline
Here we see a number of errors whose fixes sit in different components, so we need a combination of those components to promote.
- https://bugs.launchpad.net/tripleo/+bug/1885602
- https://bugs.launchpad.net/tripleo/+bug/1887856
- and the latest to show up ...
Failed to import test module: heat_tempest_plugin.tests.functional.test_create_update_neutron_trunk
^^ I think this requires a heat update.
QUESTION: is it time to force some component promotions to clear this?
* TASK: many tests are commented out in the promotion criteria. Request: review all the ini files on the promoter and the criteria they define.
:::
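For the TASK above about commented-out tests in the promotion criteria, a review could start from a small script like the following. This is only a sketch: it assumes the promoter ini files list one job name per line under a section header and that disabled jobs are prefixed with `#` or `;`. The helper name `commented_out_jobs` and the sample content are hypothetical and not taken from the promoter code.

```python
def commented_out_jobs(ini_text):
    """Return {section: [commented-out entries]} for an INI-style text.

    Assumption: a disabled job is a comment line holding a single
    token (no spaces); comment lines with prose are ignored.
    """
    result = {}
    section = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section and line[:1] in ("#", ";"):
            entry = line.lstrip("#;").strip()
            if entry and " " not in entry:
                result.setdefault(section, []).append(entry)
    return result


# Hypothetical sample, not the real master.ini content.
SAMPLE = """\
[current-tripleo]
periodic-job-a
#periodic-job-b
# temporarily disabled
; periodic-job-c
"""

if __name__ == "__main__":
    for name, jobs in commented_out_jobs(SAMPLE).items():
        print(name, jobs)
```

Running it over each ini file on the promoter would give a per-section list of candidates to re-enable or delete.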
## July 29th
### TripleO
* **Promotion blocker** - https://bugs.launchpad.net/tripleo/+bug/1889357 - Centos7 Check/Gate jobs failing with UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)
Latest pip-20.2 (released today) breaks Centos7 jobs.
Patch is up to pin pip for c7 - https://review.opendev.org/#/c/743691/
* https://bugs.launchpad.net/tripleo/+bug/1889394 - Undercloud deployment falsely reports success even though the deployment had already failed with RuntimeError: wait_api_port_ready: Max retries 30 reached
Patch by Emilien is up - https://review.opendev.org/#/c/743744/
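The UnicodeDecodeError from bug 1889357 above is the generic failure Python raises when non-ASCII bytes are decoded with the `ascii` codec. The snippet below is only an illustration of that error class. The sample string is an assumption (any character whose UTF-8 encoding starts with byte 0xc5 would do) and it does not reproduce the actual pip code path.

```python
# Two ASCII characters followed by U+0141, which encodes to b"\xc5\x81" in UTF-8.
data = "aa\u0141".encode("utf-8")

try:
    data.decode("ascii")
    message = ""
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)

# Decoding with the codec that produced the bytes works as expected.
assert data.decode("utf-8") == "aa\u0141"
```

This typically shows up when a tool reads UTF-8 text while the process locale or default encoding is ASCII.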
## July 28th
### TripleO
* https://launchpad.net/bugs/1889122 mirror timeouts in upstream causing undercloud and standalone failures
https://review.opendev.org/#/c/743432/ Patch to locally build container to reduce pressure on docker.io.
Another bug in the same context: https://bugs.launchpad.net/tripleo/+bug/1889372
containers image prepare should adjust numbers of workers and exp. fallback interval upon retrying connections
Bogdon proposed a patch: https://review.opendev.org/#/c/743704/
* Master has been awaiting promotion for the last 5 days.
BZ: https://bugs.launchpad.net/tripleo/+bug/1889192
Master Periodic jobs: Sc001/002 failing with Error: \", \" Problem: package rdma-core-26.0-8.el8.x86_64 requires dracut, but none of the providers can be installed
This turned out to be a known issue, already discussed between emilien and yatin.
A work-in-progress patch[1] is already proposed, and a test run[2] using it is green.
[1] https://review.opendev.org/#/c/743263/
[2] https://review.rdoproject.org/r/#/c/28723/
* Rocky has been awaiting promotion for the last 9 days.
Earlier, ovb jobs failed (apparently due to a vexxhost infra issue), and this weekend jobs were skipped because the container-push job timed out.
I ran a testproject to get a rocky promotion: https://review.rdoproject.org/r/#/c/28437/
Wes agreed to take care of the rocky promotion.
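Bug 1889372 above asks for an exponential fallback interval when container image prepare retries connections. The following sketch shows the general idea of exponential backoff with a cap and a little jitter. It is not the tripleo-common implementation; `backoff_delays`, `fetch_with_backoff` and their parameters are hypothetical names, and the worker-count adjustment mentioned in the bug is not covered.

```python
import random
import time


def backoff_delays(count, base=1.0, cap=60.0):
    """Delays of base, 2*base, 4*base, ... seconds, each capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(count)]


def fetch_with_backoff(fetch, retries=5, base=1.0, cap=60.0, sleep=time.sleep):
    """Call `fetch` up to `retries` times, sleeping longer after each failure."""
    delays = backoff_delays(retries - 1, base, cap)
    for attempt in range(retries):
        try:
            return fetch()
        except OSError:
            # Re-raise once the retry budget is exhausted.
            if attempt == retries - 1:
                raise
            delay = delays[attempt]
            # Small random jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))
```

A caller would pass a function that performs one registry request, for example `fetch_with_backoff(lambda: pull_layer(url))`, where `pull_layer` stands for whatever request is being retried.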
## July 27th
### TripleO
(Solved)
(Improvement)
Validation pipeline to be created and jobs to be added to it
* Upstream Check/Gate jobs failing during python-tripleoclient rpm build with error - No matching package to install: 'validations-common' - https://bugs.launchpad.net/tripleo/+bug/1889045
This seems to be caused by a recent component move in rdoinfo: "Create validation component for validation framework", Patch - https://review.rdoproject.org/r/#/c/28511/
We created dummy distgit commits to force the build of the first repo with all validation packages:
~~~
https://review.rdoproject.org/r/#/c/28711/ - master
https://review.rdoproject.org/r/#/c/28713/ - ussuri
https://review.rdoproject.org/r/#/c/28714/ - train
~~~
## July 20th
### TripleO
(Solved)
* To clear https://bugs.launchpad.net/tripleo/+bug/1887856 we merged https://review.rdoproject.org/r/#/c/28604/ and reran the integration main line to confirm it works.
## July 19th
### TripleO
* zuul was reset on Friday - some gate jobs may have to be rerun
## July 17th
### TripleO
(Solved)
* To fix https://bugs.launchpad.net/tripleo/+bug/1887856 we used a test patch to try to get the tripleo/clients components promoted - https://review.rdoproject.org/r/28586
Awaiting integration pipeline promotion for the fix to reach current-tripleo.
## July 16th
### TripleO
(Solved - it was duplicate of:
https://bugs.launchpad.net/tripleo/+bug/1885602)
* tripleo-ci-centos-8-scenario010-standalone check/gate/periodic jobs failing on tempest tests with error - tempest.lib.exceptions.Forbidden: Forbidden , Details: {'faultcode': 'Client', 'faultstring': 'Policy does not allow this request to be performed.', 'debuginfo': None
https://bugs.launchpad.net/tripleo/+bug/1887790
(Solved)
* [1887856 - centos-8 master tripleo component tests are failing with "ModuleNotFoundError: No module named 'blazarclient'"](https://bugs.launchpad.net/tripleo/+bug/1887856)
~~~
<tosky> ysandeep|rover: the fix for octavia tempest plugin is not there yet; it was added in the next commit
<ysandeep|rover> tosky, Hey o/ thank you, do you have that commit handy?
<tosky> ysandeep|rover: https://opendev.org/osf/python-tempestconf/commit/7ee63b1517b7412c8e25f2842b207339a70f62c6
<ysandeep|rover> tosky, thanks!
~~~
The patch tosky mentioned has merged (https://review.opendev.org/#/c/731501/) but is still stuck in the component pipeline (consistent).
Yesterday, consistent-to-component-ci-testing didn't run because of node_failure - https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-centos-8-master-component-tempest-promote-consistent-to-component-ci-testing
With Yatin's help we retriggered the tempest component pipeline from zuul - awaiting results.
In the meantime we made sc10 non-voting - https://review.opendev.org/#/c/741435/
Improvements:
**We need sc10 in the promotion criteria**
~~~
<ykarel> that job failed in periodic still it was promoted https://trunk.rdoproject.org/api-centos8-master-uc/api/civotes_agg_detail.html?ref_hash=1fcd094313791317563b22f5dcf54d3b
<ykarel> for voting jobs it shouldn't be skipped
<chandankumar> ykarel: I am not sure sc10 is the part of promotion criteria
<ykarel> chandankumar, then it shouldn't be voting
<chandankumar> ykarel: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-8/master.ini
<chandankumar> ysandeep|rover: ^^ make it nv
<ykarel> hmm i seen , yes good to make it non voting until it get's fixed
<chandankumar> ysandeep|rover: once it becomes green , add it to criteria and then make it voting
~~~
* Ussuri check/gate jobs are failing because of missing ovn containers. Jobs fail with error - "Not found image: docker://docker.io/tripleou/centos-binary-neutron-metadata-agent-ovn"
https://bugs.launchpad.net/tripleo/+bug/1887783
Ussuri gate broken - promoter missed pushing some containers
```
16th July:
11:40 < marios> zbr|ruck: well ussuri promoted but didn't push some containers to docker.io so now ussuri gate is broken
11:40 < marios> zbr|ruck: missing are 11:17 < ykarel> marios, all ovn related missing
11:40 < marios> 11:17 < ykarel> set(['ovn-northd', 'ovn-sb-db-server', 'ovn-nb-db-server', 'ovn-controller',
'neutron-metadata-agent-ovn'])
```
Chandan helping with Patch - https://review.rdoproject.org/r/#/c/28562/
* Heads-up (info from yatin/rabi):
Once the heat patch[1] merges, the tripleo component will start failing.
To fix this we need stevedore 3.1.0, which will land in the clients component once it is built. We would need https://review.rdoproject.org/r/#/c/28529/3..4/tags/victoria-uc.yml included.
Because stevedore and heat are in different components, we need to trick the promotions so that both components (clients/tripleo) promote together (maybe by manual promotion or by relaxing the criteria of these components once that patch merges?).
[1] https://review.opendev.org/#/c/741088/
* periodic-tripleo-ci-centos-8-standalone-octavia-master - failing with different tempest failures for the last 4 days
https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-octavia-master/e2a954c/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-octavia-master/8dbcbbb/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-octavia-master/087ac76/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
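For the ussuri gate breakage above, where the promoter did not push the ovn containers, the missing set quoted in the IRC log is a plain set difference between the containers a promotion should publish and those actually present in the registry. A minimal sketch follows; both input sets are hypothetical sample data (in practice they would come from the build list and a registry query), and `missing_containers` is an illustrative helper, not promoter code.

```python
def missing_containers(expected, pushed):
    """Return the sorted names that were expected but never pushed."""
    return sorted(set(expected) - set(pushed))


# Hypothetical sample data, using container names that appear in these notes.
EXPECTED = {
    "neutron-server",
    "neutron-server-ovn",
    "neutron-metadata-agent-ovn",
    "ovn-controller",
    "ovn-northd",
    "ovn-nb-db-server",
    "ovn-sb-db-server",
}
PUSHED = {"neutron-server", "neutron-server-ovn"}

if __name__ == "__main__":
    for name in missing_containers(EXPECTED, PUSHED):
        print("missing:", name)
```

Running such a check right after a promotion would flag the gap before the gate starts pulling the new tag.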
## July 15th
### TripleO
* Stein Promotion
Stein periodic jobs https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable3 didn't trigger because periodic-tripleo-centos-7-stein-containers-build-push failed with NODE_FAILURE
Posted testproject patch https://review.rdoproject.org/r/#/c/28537/ to rerun the stein pipeline in order to get a stein promotion.
* periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-train job fails with pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'overcloud.ctlplane.ooo.test' ([Errno 113] EHOSTUNREACH)") https://bugs.launchpad.net/tripleo/+bug/1887633
Need to work
* tripleo-ci-centos-8-scenario010-ovn-provider-standalone is failing on tempest tests with error Details: {'faultcode': 'Client', 'faultstring': "Provider 'ovn' is not enabled.", 'debuginfo': None}
https://bugs.launchpad.net/tripleo/+bug/1887666
Patch: https://review.opendev.org/#/c/714639/ might solve this issue. - Awaiting results
## Downstream
periodic-tripleo-build-containers-ubi-8-internal-rhel-8-build-push-upload-rhos-17 failing.
https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-rhos-17/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-build-containers-ubi-8-internal-rhel-8-build-push-upload-rhos-17/bcb2416/job-output.txt
Need to work
## July 14th
### TripleO
* ussuri promoted
* train starts promoting ... fails with:
changed: [localhost] => (item=neutron-server-ovn)
failed: [localhost] (item=neutron-metadata-agent-ovn) => {"ansible_index_var": "index", "ansible_loop_var": "item", "changed": true, "cmd": "docker manifest create docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_manifest docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_x86_64\ndocker manifest annotate --arch amd64 docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_manifest docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_x86_64\n", "delta": "0:00:01.119006", "end": "2020-07-15 00:22:29.766344", "index": 60, "item": "neutron-metadata-agent-ovn", "msg": "non-zero return code", "rc": 1, "start": "2020-07-15 00:22:28.647338", "stderr": "unexpected end of JSON input\nunexpected end of JSON input", "stderr_lines": ["unexpected end of JSON input", "unexpected end of JSON input"], "stdout": "", "stdout_lines": []}
changed: [localhost] => (item=neutron-server)
http://38.102.83.109/centos7_train.log
Sandeep -
* Even though neutron-metadata-agent-ovn manifest creation failed, the container was pushed:
https://hub.docker.com/r/tripleotrain/centos-binary-neutron-metadata-agent-ovn/tags
I am unable to reproduce this locally with the neutron-metadata-agent-ovn container: http://paste.openstack.org/show/795933/
Filed a bz - https://bugs.launchpad.net/tripleo/+bug/1887660
After discussion with senior colleagues (marios/chandan):
If there is a problem pushing manifests we might just turn that off. Manifests are only needed for tagged ppc64le containers, and those aren't available, so it's safe to have them off.
* It turns out that pushing manifests is already off in the master/ussuri branches
https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-8/master.ini#L11
https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-8/ussuri.ini#L11
Proposed: https://review.rdoproject.org/r/#/c/28540/ - we can cherry-pick the changes to the promoter.
## July 13th
### TripleO
* periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-train is failing with Error while evaluating a Resource Statement, Duplicate declaration: Exec[/etc/pki/CA/certs/vnc.crt]
https://bugs.launchpad.net/tripleo/+bug/1887376
https://review.opendev.org/#/c/740679/
* [1885602 - Octavia component: failing tempest, Details: {'faultcode': 'Client', 'faultstring': 'Policy does not allow this request to be performed.',](https://bugs.launchpad.net/tripleo/+bug/1885602) There are two patches associated with this bug - we need to get them merged.
Sandeep - Added more core reviewers to get +w
* [1887427 - TripleO CI jobs do not fail on package build errors ](https://bugs.launchpad.net/tripleo/+bug/1887427)
Sandeep - Above seems expected
## July 10th
### TripleO
* Centos8 train missing some needed Iptables rules - Timeout exception waiting for the logger. Please check connectivity to [<IP>:19885] - https://bugs.launchpad.net/tripleo/+bug/1887112
Patch is up - https://review.opendev.org/#/c/739963/
* Periodic Centos8 Scenario007 failing because neutron_ovs_agent failed with error: /usr/bin/python: No such file or directory -
https://bugs.launchpad.net/tripleo/+bug/1887146
Patch is up - https://review.opendev.org/#/c/740440/
* Periodic C7 container push molecule job - periodic-molecule-container-push-delegated-centos-7 is failing with ERROR: molecule_delegated: could not install deps
https://bugs.launchpad.net/tripleo/+bug/1887120
As discussed in scrum planning meeting(on 09th July) we also created taiga card for sprint team to fix this job - https://tree.taiga.io/project/tripleo-ci-board/task/1879?kanban-status=1447274
Patch is up - https://review.rdoproject.org/r/#/c/28482/
* Periodic c8 molecule CI-Config jobs are failing with ERROR: "python-virtualenv No match for argument: python-virtualenv"
https://bugs.launchpad.net/tripleo/+bug/1887125
Following jobs are failing:-
~~~
* periodic-molecule-tripleo-common-delegated-centos-8
* periodic-molecule-delegated-promote-images-delegated-centos-8
* periodic-molecule-container-push-delegated-centos-8
~~~
As discussed in scrum planning meeting(on 09th July) we also created taiga card for sprint team to fix these jobs -
https://tree.taiga.io/project/tripleo-ci-board/task/1880?kanban-status=1447274
* Periodic C8 Promotion-staging jobs failing with Error: msg": "Failed to download metadata for repo 'influxdb': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried" https://bugs.launchpad.net/tripleo/+bug/1887130
following jobs are affected:-
~~~
periodic-tripleo-ci-promotion-staging-single-pipeline-centos-8
periodic-tripleo-ci-promotion-staging-integration-pipeline-centos-8
~~~
Taiga - https://tree.taiga.io/project/tripleo-ci-board/task/1881?kanban-status=1447274
## July 9th
### TripleO
* Gate blocker - Undercloud minion failing validation ERROR: Heat Engine host count is 1 or less. https://bugs.launchpad.net/tripleo/+bug/1886914
The undercloud is not uploading containers to its registry with the "modify_append_tag" suffix. As a result, minion jobs fail because the minion node tries to pull containers from the undercloud using a tag that includes modify_append_tag.
**Right flags are set on the bz for a card (we probably need emilien/kevin's help on this one).**
* Component jobs failing with error FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/workspace' - https://bugs.launchpad.net/tripleo/+bug/1886941
These two errors were noticed:-
Error: Failed to download metadata for repo 'AppStream':
~~~
2020-07-09 06:31:03.412406 | primary | Errors during downloading metadata for repository 'AppStream':
2020-07-09 06:31:03.412444 | primary | - Status code: 403 for http://mirror.regionone.rdo-cloud-tripleo.rdoproject.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 38.145.32.16)
2020-07-09 06:31:03.412483 | primary | Error: Failed to download metadata for repo 'AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
~~~
FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/workspace'
~~~
2020-07-09 06:31:06.388197 |
2020-07-09 06:31:06.388348 | TASK [Report to DLRN]
2020-07-09 06:31:11.440669 | Timeout exception waiting for the logger. Please check connectivity to [38.145.32.72:19885]
2020-07-09 06:31:11.442587 | primary | MODULE FAILURE:
2020-07-09 06:31:11.442675 | primary | Traceback (most recent call last):
2020-07-09 06:31:11.442721 | primary | File "<stdin>", line 114, in <module>
2020-07-09 06:31:11.442796 | primary | File "<stdin>", line 106, in _ansiballz_main
2020-07-09 06:31:11.442859 | primary | File "<stdin>", line 49, in invoke_module
2020-07-09 06:31:11.442899 | primary | File "/usr/lib64/python3.6/imp.py", line 235, in load_module
2020-07-09 06:31:11.442937 | primary | return load_source(name, filename, file)
2020-07-09 06:31:11.442975 | primary | File "/usr/lib64/python3.6/imp.py", line 170, in load_source
2020-07-09 06:31:11.443012 | primary | module = _exec(spec, sys.modules[name])
2020-07-09 06:31:11.443072 | primary | File "<frozen importlib._bootstrap>", line 618, in _exec
2020-07-09 06:31:11.443125 | primary | File "<frozen importlib._bootstrap_external>", line 678, in exec_module
2020-07-09 06:31:11.443163 | primary | File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2020-07-09 06:31:11.443201 | primary | File "/tmp/ansible_command_payload_hixla9is/__main__.py", line 675, in <module>
2020-07-09 06:31:11.443239 | primary | File "/tmp/ansible_command_payload_hixla9is/__main__.py", line 620, in main
2020-07-09 06:31:11.443277 | primary | FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/workspace'
2020-07-09 06:31:11.499141 |
~~~
This might be a transient issue. Doing a testproject run of one job to confirm - https://review.rdoproject.org/r/#/c/26273/10/.zuul.yaml
* Random tempest failure observed - no new bz opened; it's under observation.
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6e0/739764/1/check/tripleo-ci-centos-8-scenario002-standalone/6e0f162/job-output.txt
~~~
{0} barbican_tempest_plugin.tests.scenario.test_volume_encryption.VolumeEncryptionTest.test_encrypted_cinder_volumes_cryptsetup [130.199098s] ... FAILED
~~~
https://5e09181bcc1a50499619-17764b56a5c622705c872e3c7dca2597.ssl.cf2.rackcdn.com/739495/2/gate/tripleo-ci-centos-8-standalone/32b535b/logs/undercloud/var/log/tempest/tempest_run.log
~~~
{0} tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern [422.084246s] ... FAILED
.
.
tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.144 via SSH timed out.
User: cirros, Password: None
~~~