# Ruck and Rover notes #30

###### tags: `ruck_rover`

:::info
Important links for ruck/rovers: [ruck/rover links to help](https://hackmd.io/07z0xroHTFi2IbX93P5ZfQ)

**Ruck Rover - Unified Sprint #30**
Dates: July 9th - July 29th
TripleO CI team ruck|rover: Sorin Sbarnea (zbr), Sandeep Yadav (ysandeep); backup: rlandy and weshay
Previous notes (sprint #29): https://hackmd.io/XcuH2OIVTMiuxyrqSF6ocw?both
**Next notes (sprint #31): https://hackmd.io/QnprH9-yRTi6uWlEfaahoQ**
:::

[TOC]

---

## on-going issues

:::danger
## TripleO
* https://bugs.launchpad.net/tripleo/+bug/1889357 - Centos7 check/gate jobs failing with `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)`
  The latest pip (20.2, released on July 29th) breaks Centos7 jobs.
  A patch is up to pin pip for c7: https://review.opendev.org/#/c/743691/
* https://launchpad.net/bugs/1889122 - upstream mirror timeouts are causing undercloud and standalone failures.
  Patch to build containers locally to reduce pressure on docker.io: https://review.opendev.org/#/c/743432/

### gate

### periodic / 3rd party
* Master periodic jobs: sc001/002 failing with `Error: ", " Problem: package rdma-core-26.0-8.el8.x86_64 requires dracut, but none of the providers can be installed`
  Patches are up to fix the issue.
* Introspection issues on vexxhost continue: testproject rechecked the train/stein/rocky/queens jobs, but many are still failing introspection. Note that there was a vexxhost update at the end of last week.
  QUESTION: should we run these testproject jobs on RDO cloud to clear promotions?
* Main (master) pipeline: we see a bunch of errors whose fixes sit in different components, so we need a combination of those components to promote.
  - https://bugs.launchpad.net/tripleo/+bug/1885602
  - https://bugs.launchpad.net/tripleo/+bug/1887856
  - and the latest to show up: `Failed to import test module: heat_tempest_plugin.tests.functional.test_create_update_neutron_trunk`
    ^^ we think this requires a heat update.
  QUESTION: is it time to force some component promotions to clear this?
* TASK: many tests are commented out in the promotion criteria. Please review all the ini files on the promoter and revisit the criteria.
:::

## July 29th

### TripleO
* **Promotion blocker** - https://bugs.launchpad.net/tripleo/+bug/1889357 - Centos7 check/gate jobs failing with `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 2: ordinal not in range(128)`
  The latest pip (20.2, released today) breaks Centos7 jobs.
  A patch is up to pin pip for c7: https://review.opendev.org/#/c/743691/
* https://bugs.launchpad.net/tripleo/+bug/1889394 - the undercloud deployment falsely reports success even though the deployment had already failed with `RuntimeError: wait_api_port_ready: Max retries 30 reached`
  Emilien has a patch up: https://review.opendev.org/#/c/743744/

## July 28th

### TripleO
* https://launchpad.net/bugs/1889122 - upstream mirror timeouts are causing undercloud and standalone failures.
  Patch to build containers locally to reduce pressure on docker.io: https://review.opendev.org/#/c/743432/
  Another bug in the same context: https://bugs.launchpad.net/tripleo/+bug/1889372 - container image prepare should adjust the number of workers and use an exponential fallback interval when retrying connections.
  Bogdan proposed a patch: https://review.opendev.org/#/c/743704/
* Master has been awaiting promotion for the last 5 days.
  Bug: https://bugs.launchpad.net/tripleo/+bug/1889192 - master periodic jobs sc001/002 failing with `Error: ", " Problem: package rdma-core-26.0-8.el8.x86_64 requires dracut, but none of the providers can be installed`
  This turned out to be a known issue, already discussed between Emilien and Yatin. A work-in-progress patch [1] is already proposed, and a test run [2] using it is green.
  [1] https://review.opendev.org/#/c/743263/
  [2] https://review.rdoproject.org/r/#/c/28723/
* Rocky has been awaiting promotion for the last 9 days.
  Earlier, the ovb jobs failed (apparently due to a vexxhost infra issue), and this weekend the jobs were skipped because the container-push job timed out.
  I ran a testproject to get the rocky promotion: https://review.rdoproject.org/r/#/c/28437/
  Wes agreed to take care of the rocky promotion.

## July 27th

### TripleO
(Solved)
(Improvement) A validation pipeline is to be created, and jobs are to be added to it.
* Upstream check/gate jobs failing during the python-tripleoclient rpm build with error `No matching package to install: 'validations-common'` - https://bugs.launchpad.net/tripleo/+bug/1889045
  This seems to be caused by a recent component move in rdoinfo, "Create validation component for validation framework": https://review.rdoproject.org/r/#/c/28511/
  We created dummy distgit commits to force building a first repo with all the validation packages:
~~~
https://review.rdoproject.org/r/#/c/28711/ - master
https://review.rdoproject.org/r/#/c/28713/ - ussuri
https://review.rdoproject.org/r/#/c/28714/ - train
~~~

## July 20th

### TripleO
(Solved)
* To clear https://bugs.launchpad.net/tripleo/+bug/1887856, we merged https://review.rdoproject.org/r/#/c/28604/ and reran the integration main line to confirm that it works.
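Bug 1889372 (July 28th entry) asks container image prepare to adjust the number of workers and to use an exponential fallback interval when retrying connections. The retry half of that idea can be sketched generically in Python. This is a minimal sketch under stated assumptions: `retry_with_backoff` and its parameters are hypothetical names, and this is neither the tripleo-common code nor what the proposed patch does. Capping concurrency (for example with `concurrent.futures.ThreadPoolExecutor(max_workers=...)`) is a separate knob that is not shown here.

```python
import random
import time


def retry_with_backoff(func, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func() and retry on transient errors with exponential backoff.

    Hypothetical helper for illustration only (not tripleo-common code).
    Before retry n (n = 1, 2, ...) it waits min(max_delay, base_delay * 2**(n - 1))
    seconds plus up to 10% random jitter, so parallel workers do not all
    hit the registry again at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_pull():
        # Simulated image pull that times out twice, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise TimeoutError("simulated mirror timeout")
        return "pulled"

    waits = []  # record the waits instead of actually sleeping
    print(retry_with_backoff(flaky_pull, sleep=waits.append), waits)
```

The `sleep` parameter is injectable so the waits can be inspected without actually sleeping. The jitter keeps parallel workers from retrying against docker.io in lockstep, which is the kind of pressure the mirror-timeout bugs describe.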

## July 19th

### TripleO
* Zuul was reset on Friday; some gate jobs may have to be rerun.

## July 17th

### TripleO
(Solved)
* To fix https://bugs.launchpad.net/tripleo/+bug/1887856: with the help of a test patch, we tried to get the tripleo/clients components promoted - https://review.rdoproject.org/r/28586
  Awaiting the integration pipeline promotion for the fix to reach current-tripleo.

## July 16th

### TripleO
(Solved - it was a duplicate of https://bugs.launchpad.net/tripleo/+bug/1885602)
* tripleo-ci-centos-8-scenario010-standalone check/gate/periodic jobs failing on tempest tests with error `tempest.lib.exceptions.Forbidden: Forbidden , Details: {'faultcode': 'Client', 'faultstring': 'Policy does not allow this request to be performed.', 'debuginfo': None` - https://bugs.launchpad.net/tripleo/+bug/1887790

(Solved)
* [1887856 - centos-8 master tripleo component tests are failing with "ModuleNotFoundError: No module named 'blazarclient'"](https://bugs.launchpad.net/tripleo/+bug/1887856)
~~~
<tosky> ysandeep|rover: the fix for octavia tempest plugin is not there yet; it was added in the next commit
<ysandeep|rover> tosky, Hey o/ thank you, do you have that commit handy?
<tosky> ysandeep|rover: https://opendev.org/osf/python-tempestconf/commit/7ee63b1517b7412c8e25f2842b207339a70f62c6
<ysandeep|rover> tosky, thanks!
~~~

The patch tosky mentioned has merged (https://review.opendev.org/#/c/731501/), but it is still stuck in the component pipeline (consistent). Yesterday, consistent-to-component-ci-testing didn't run because of NODE_FAILURE: https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-centos-8-master-component-tempest-promote-consistent-to-component-ci-testing
With Yatin's help, we retriggered that tempest component pipeline from zuul; awaiting results.
In the meantime, we made sc10 non-voting: https://review.opendev.org/#/c/741435/

Improvement: **we need sc10 in the promotion criteria**
~~~
<ykarel> that job failed in periodic still it was promoted https://trunk.rdoproject.org/api-centos8-master-uc/api/civotes_agg_detail.html?ref_hash=1fcd094313791317563b22f5dcf54d3b
<ykarel> for voting jobs it shouldn't be skipped
<chandankumar> ykarel: I am not sure sc10 is the part of promotion criteria
<ykarel> chandankumar, then it shouldn't be voting
<chandankumar> ykarel: https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-8/master.ini
<chandankumar> ysandeep|rover: ^^ make it nv
<ykarel> hmm i seen , yes good to make it non voting until it get's fixed
<chandankumar> ysandeep|rover: once it becomes green , add it to criteria and then make it voting
~~~

* Ussuri check/gate jobs are failing because of missing ovn containers, with error `Not found image: docker://docker.io/tripleou/centos-binary-neutron-metadata-agent-ovn` - https://bugs.launchpad.net/tripleo/+bug/1887783
  The Ussuri gate is broken because the promoter missed pushing some containers.
```
16th July:
11:40 < marios> zbr|ruck: well ussuri promoted but didn't push some containers to docker.io so now ussuri gate is broken
11:40 < marios> zbr|ruck: missing are
11:17 < ykarel> marios, all ovn related missing
11:40 < marios> 11:17 < ykarel> set(['ovn-northd', 'ovn-sb-db-server', 'ovn-nb-db-server', 'ovn-controller', 'neutron-metadata-agent-ovn'])
```
  Chandan is helping with a patch:
  https://review.rdoproject.org/r/#/c/28562/
* Heads-up (info from yatin/rabi): once the heat patch [1] merges, the tripleo component will start failing. The fix needs stevedore 3.1.0, but stevedore is built in the clients component, not in the tripleo component. We would need https://review.rdoproject.org/r/#/c/28529/3..4/tags/victoria-uc.yml included. Because stevedore and heat are in different components, we need to trick the promotions into promoting both components (clients/tripleo) together (maybe by a manual promotion, or by relaxing the criteria of these components once that patch merges?).
  [1] https://review.opendev.org/#/c/741088/
* periodic-tripleo-ci-centos-8-standalone-octavia-master has been failing with different tempest failures for the last 4 days:
  https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-octavia-master/e2a954c/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
  https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-octavia-master/8dbcbbb/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz
  https://logserver.rdoproject.org/openstack-component-octavia/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-octavia-master/087ac76/logs/undercloud/var/log/tempest/tempest_run.log.txt.gz

## July 15th

### TripleO
* Stein promotion
  Stein periodic jobs (https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable3) didn't trigger because periodic-tripleo-centos-7-stein-containers-build-push failed with NODE_FAILURE.
  Posted testproject patch https://review.rdoproject.org/r/#/c/28537/ to rerun the stein pipeline in order to get the stein promotion.
* periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-train job fails with `pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on 'overcloud.ctlplane.ooo.test' ([Errno 113] EHOSTUNREACH)")` - https://bugs.launchpad.net/tripleo/+bug/1887633
  Still needs to be worked on.
* tripleo-ci-centos-8-scenario010-ovn-provider-standalone is failing on tempest tests with error `Details: {'faultcode': 'Client', 'faultstring': "Provider 'ovn' is not enabled.", 'debuginfo': None}` - https://bugs.launchpad.net/tripleo/+bug/1887666
  Patch https://review.opendev.org/#/c/714639/ might solve this issue; awaiting results.

### Downstream
periodic-tripleo-build-containers-ubi-8-internal-rhel-8-build-push-upload-rhos-17 is failing.
https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-periodic-rhos-17/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-build-containers-ubi-8-internal-rhel-8-build-push-upload-rhos-17/bcb2416/job-output.txt
Still needs to be worked on.

## July 14th

### TripleO
* ussuri promoted
* train starts promoting ...
  fails with:
  `changed: [localhost] => (item=neutron-server-ovn)`
  `failed: [localhost] (item=neutron-metadata-agent-ovn) => {"ansible_index_var": "index", "ansible_loop_var": "item", "changed": true, "cmd": "docker manifest create docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_manifest docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_x86_64\ndocker manifest annotate --arch amd64 docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_manifest docker.io/tripleotrain/centos-binary-neutron-metadata-agent-ovn:7d0406b1a2bb054f42b198e9494ddc54372e7285_6e7a0112_x86_64\n", "delta": "0:00:01.119006", "end": "2020-07-15 00:22:29.766344", "index": 60, "item": "neutron-metadata-agent-ovn", "msg": "non-zero return code", "rc": 1, "start": "2020-07-15 00:22:28.647338", "stderr": "unexpected end of JSON input\nunexpected end of JSON input", "stderr_lines": ["unexpected end of JSON input", "unexpected end of JSON input"], "stdout": "", "stdout_lines": []}`
  `changed: [localhost] => (item=neutron-server)`
  Log: http://38.102.83.109/centos7_train.log
  Sandeep's notes:
  * Even though the neutron-metadata-agent-ovn manifest creation failed, the container was pushed: https://hub.docker.com/r/tripleotrain/centos-binary-neutron-metadata-agent-ovn/tags
    I am unable to reproduce it locally with the neutron-metadata-agent-ovn container: http://paste.openstack.org/show/795933/
    Filed a bug: https://bugs.launchpad.net/tripleo/+bug/1887660
    After discussing with senior colleagues (marios/chandan): if there is a problem pushing manifests, we might just turn that off. Manifests are only needed when ppc64le containers are tagged, and those aren't available, so it's safe to have them off.
  * It turns out that pushing manifests is already turned off in the master/ussuri branches:
    https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-8/master.ini#L11
    https://github.com/rdo-infra/ci-config/blob/master/ci-scripts/dlrnapi_promoter/config/CentOS-8/ussuri.ini#L11
    Proposed: https://review.rdoproject.org/r/#/c/28540/ - we can cherry-pick the changes to the promoter.

## July 13th

### TripleO
* periodic-tripleo-ci-centos-8-standalone-on-multinode-ipa-train is failing with `Error while evaluating a Resource Statement, Duplicate declaration: Exec[/etc/pki/CA/certs/vnc.crt]`
  Bug: https://bugs.launchpad.net/tripleo/+bug/1887376
  Patch: https://review.opendev.org/#/c/740679/
* [1885602 - Octavia component: failing tempest, Details: {'faultcode': 'Client', 'faultstring': 'Policy does not allow this request to be performed.'](https://bugs.launchpad.net/tripleo/+bug/1885602)
  There are two patches associated with this bug; we need to get them merged.
  Sandeep: added more core reviewers to get +w.
* [1887427 - TripleO CI jobs do not fail on package build errors](https://bugs.launchpad.net/tripleo/+bug/1887427)
  Sandeep: the above seems to be expected.

## July 10th

### TripleO
* Centos8 train is missing some required iptables rules: `Timeout exception waiting for the logger. Please check connectivity to [<IP>:19885]` - https://bugs.launchpad.net/tripleo/+bug/1887112
  Patch is up: https://review.opendev.org/#/c/739963/
* Periodic Centos8 scenario007 failing because neutron_ovs_agent failed with error `/usr/bin/python: No such file or directory` - https://bugs.launchpad.net/tripleo/+bug/1887146
  Patch is up: https://review.opendev.org/#/c/740440/
* Periodic C7 container push molecule job periodic-molecule-container-push-delegated-centos-7 is failing with `ERROR: molecule_delegated: could not install deps` - https://bugs.launchpad.net/tripleo/+bug/1887120
  As discussed in the scrum planning meeting (July 9th), we also created a Taiga card for the sprint team to fix this job: https://tree.taiga.io/project/tripleo-ci-board/task/1879?kanban-status=1447274
  Patch is up: https://review.rdoproject.org/r/#/c/28482/
* Periodic c8 molecule CI-Config jobs are failing with `ERROR: python-virtualenv No match for argument: python-virtualenv` - https://bugs.launchpad.net/tripleo/+bug/1887125
  The following jobs are failing:
~~~
* periodic-molecule-tripleo-common-delegated-centos-8
* periodic-molecule-delegated-promote-images-delegated-centos-8
* periodic-molecule-container-push-delegated-centos-8
~~~
  As discussed in the scrum planning meeting (July 9th), we also created a Taiga card for the sprint team to fix these jobs: https://tree.taiga.io/project/tripleo-ci-board/task/1880?kanban-status=1447274
* Periodic C8 promotion-staging jobs failing with `Failed to download metadata for repo 'influxdb': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried` - https://bugs.launchpad.net/tripleo/+bug/1887130
  The following jobs are affected:
~~~
periodic-tripleo-ci-promotion-staging-single-pipeline-centos-8
periodic-tripleo-ci-promotion-staging-integration-pipeline-centos-8
~~~
  Taiga: https://tree.taiga.io/project/tripleo-ci-board/task/1881?kanban-status=1447274

## July 9th

### TripleO
* **Gate blocker** - Undercloud minion failing validation:
  `ERROR: Heat Engine host count is 1 or less.` - https://bugs.launchpad.net/tripleo/+bug/1886914
  The undercloud is not uploading containers to its registry with the "modify_append_tag" suffix. Because of this, minion jobs fail: the minion node tries to pull containers whose tag contains modify_append_tag from the undercloud.
  **The right flags are set on the bug for a card (we probably need emilien/kevin's help on this one).**
* Component jobs failing with `FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/workspace'` - https://bugs.launchpad.net/tripleo/+bug/1886941
  These two errors were noticed:
  `Error: Failed to download metadata for repo 'AppStream'`
~~~
2020-07-09 06:31:03.412406 | primary | Errors during downloading metadata for repository 'AppStream':
2020-07-09 06:31:03.412444 | primary | - Status code: 403 for http://mirror.regionone.rdo-cloud-tripleo.rdoproject.org/centos/8/AppStream/x86_64/os/repodata/repomd.xml (IP: 38.145.32.16)
2020-07-09 06:31:03.412483 | primary | Error: Failed to download metadata for repo 'AppStream': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried
~~~
  `FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/workspace'`
~~~
2020-07-09 06:31:06.388197 |
2020-07-09 06:31:06.388348 | TASK [Report to DLRN]
2020-07-09 06:31:11.440669 | Timeout exception waiting for the logger. Please check connectivity to [38.145.32.72:19885]
2020-07-09 06:31:11.442587 | primary | MODULE FAILURE:
2020-07-09 06:31:11.442675 | primary | Traceback (most recent call last):
2020-07-09 06:31:11.442721 | primary | File "<stdin>", line 114, in <module>
2020-07-09 06:31:11.442796 | primary | File "<stdin>", line 106, in _ansiballz_main
2020-07-09 06:31:11.442859 | primary | File "<stdin>", line 49, in invoke_module
2020-07-09 06:31:11.442899 | primary | File "/usr/lib64/python3.6/imp.py", line 235, in load_module
2020-07-09 06:31:11.442937 | primary | return load_source(name, filename, file)
2020-07-09 06:31:11.442975 | primary | File "/usr/lib64/python3.6/imp.py", line 170, in load_source
2020-07-09 06:31:11.443012 | primary | module = _exec(spec, sys.modules[name])
2020-07-09 06:31:11.443072 | primary | File "<frozen importlib._bootstrap>", line 618, in _exec
2020-07-09 06:31:11.443125 | primary | File "<frozen importlib._bootstrap_external>", line 678, in exec_module
2020-07-09 06:31:11.443163 | primary | File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
2020-07-09 06:31:11.443201 | primary | File "/tmp/ansible_command_payload_hixla9is/__main__.py", line 675, in <module>
2020-07-09 06:31:11.443239 | primary | File "/tmp/ansible_command_payload_hixla9is/__main__.py", line 620, in main
2020-07-09 06:31:11.443277 | primary | FileNotFoundError: [Errno 2] No such file or directory: '/home/zuul/workspace'
2020-07-09 06:31:11.499141 |
~~~
  This might be a transient issue; doing a testproject run of one job to confirm: https://review.rdoproject.org/r/#/c/26273/10/.zuul.yaml
* Random tempest failure observed - no new bug opened; it's under observation.
  https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6e0/739764/1/check/tripleo-ci-centos-8-scenario002-standalone/6e0f162/job-output.txt
~~~
{0} barbican_tempest_plugin.tests.scenario.test_volume_encryption.VolumeEncryptionTest.test_encrypted_cinder_volumes_cryptsetup [130.199098s] ... FAILED
~~~
https://5e09181bcc1a50499619-17764b56a5c622705c872e3c7dca2597.ssl.cf2.rackcdn.com/739495/2/gate/tripleo-ci-centos-8-standalone/32b535b/logs/undercloud/var/log/tempest/tempest_run.log
~~~
{0} tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern [422.084246s] ... FAILED
.
.
tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.144 via SSH timed out.
User: cirros, Password: None
~~~
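When triaging random tempest failures like the two above, it can help to pull the failed test IDs and their durations out of a `tempest_run.log`. Below is a minimal Python sketch. It assumes only the `{worker} test_id [duration] ... STATUS` result-line format visible in the excerpts above. `failed_tests` and `RESULT_RE` are hypothetical names, not part of tempest or stestr.

```python
import re
from typing import Iterable, List, Optional, Tuple

# One result line per test, e.g.
# {0} tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern.test_volume_boot_pattern [422.084246s] ... FAILED
# The bracketed duration is treated as optional in case a line omits it.
RESULT_RE = re.compile(
    r"^\{(?P<worker>\d+)\}\s+(?P<test>\S+)"
    r"(?:\s+\[(?P<secs>\d+(?:\.\d+)?)s\])?"
    r"\s+\.\.\.\s+(?P<status>ok|FAILED|SKIPPED)"
)


def failed_tests(lines: Iterable[str]) -> List[Tuple[str, Optional[float]]]:
    """Return (test_id, duration_in_seconds) for each FAILED result line."""
    failures = []
    for line in lines:
        match = RESULT_RE.match(line.strip())
        if match and match.group("status") == "FAILED":
            secs = match.group("secs")
            failures.append((match.group("test"), float(secs) if secs else None))
    return failures


if __name__ == "__main__":
    # Result lines copied from the July 9th excerpts above.
    log_lines = [
        "{0} barbican_tempest_plugin.tests.scenario.test_volume_encryption."
        "VolumeEncryptionTest.test_encrypted_cinder_volumes_cryptsetup [130.199098s] ... FAILED",
        "{0} tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern."
        "test_volume_boot_pattern [422.084246s] ... FAILED",
        "tempest.lib.exceptions.SSHTimeout: Connection to the 192.168.24.144 via SSH timed out.",
    ]
    for test_id, secs in failed_tests(log_lines):
        print(secs, test_id)
```

The sketch ignores lines that are not result lines, such as the `SSHTimeout` message, so the underlying cause still has to be read from the log itself.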