# Ruck and rover notes #24
###### tags: `ruck_rover`
## PCCI Ruck Rover Sprint #23 - 5 Mar to 26 Mar 2020
:::warning
Tripleo CI team ruck|rover: Wes (weshay) && Sandeep (ysandeep)
OSP CI team ruck|rover: Waldek (wznoinsk) && Vadim (vgriner)
:::
:::spoiler Useful information
* [Ruck/rover primer](https://docs.openstack.org/tripleo-docs/latest/ci/ruck_rover_primer.html)
* [Cockpit](http://dashboard-ci.tripleo.org/d/cockpit/cockpit?orgId=1)
* [Internal Cockpit](http://tripleo-cockpit.usersys.redhat.com/?orgId=1)
* Status
* http://cistatus.tripleo.org/
* https://trello.com/b/j4IcIomh/production-chain-escalation
* http://rhos-release.virt.bos.redhat.com:3030/rhosp
* [Debugging Tools](https://docs.google.com/document/d/1VZhje7ZN9sk4E31fYVrPxpqMJGz5ZhHRfhte_RYMXxg/edit#)
* [RDO project dashboard](https://review.rdoproject.org/grafana/?orgId=1&var-datasource=default&var-server=registry.rdoproject.org.rdocloud&var-inter=$__auto_interval_inter)
* [CentOS pre-release rpm updates for minor releases](http://mirror.centos.org/centos/7/cr/x86_64/Packages/)
* [Internal software factory](https://sf.hosted.upshift.rdu2.redhat.com)
* [Upstream rsync mirror logs](files.openstack.org/mirror/logs/rsync-mirrors/centos.log)
* [Trello retrospective](https://trello.com/b/0VFswmht/rdo-infra-retrospective?menu=filter&filter=label:UniSprint21)
* Internal Dashboard
* [OSP16](https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP16/)
* [RHOS-INFRA Infrared issues](https://projects.engineering.redhat.com/issues/?filter=34183)
* [CIX escalation](https://mojo.redhat.com/docs/DOC-1098748#jive_content_id_CIX_Escalation_Automation_and_email_format)
* [CIX board](https://trello.com/b/j4IcIomh/production-chain-escalation)
* [Nodepool image logs](https://softwarefactory-project.io/nodepool-log/)
:::
# Sprint 23
## New / Transient / No bug yet:
* 11:12 < marios> weshay|ruck: fyi 11:09 < dpawlik> if this one will be merged, https://review.opendev.org/#/c/713177 jobs will fail if they are running on f29/f28
* [OSP17][handover] jobs failing with "The task includes an option with an undefined variable. The error was: 'dict object' has no attribute 'ctlplane'" - https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP17/job/phase1-17_director-rhel-8.1-virthost-1cont_1comp_1ceph-ipv4-geneve-ceph/10/artifact/.sh/05-ooo-overcloud.log/*view*/ , any fixes to infrared should be submitted to gerrit with topic of 'osp17p1' (i.e.: ir_patches_topic in https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/QE/view/OSP17/job/phase1-17_director-rhel-8.1-virthost-1cont_1comp_1ceph-ipv4-geneve-ceph/10/parameters/)
* [OSP10][handover] phase2-10-rhel-7.7-openstack-all-in-one-neutron-rabbitmq failing on 'yum clean all' command, running again (to confirm) in https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/ReleaseDelivery/view/OSP10/job/phase2-10-rhel-7.7-openstack-all-in-one-neutron-rabbitmq/73/ - if it fails again, it needs investigation, contact: migarcia
* [OSP13] PSI upshift quota exceeded - phase2-13-rhel-7.8-openstack-all-in-one-neutron-rabbitmq failing (I think vgriner was fixing this)
* **[OSP][handover] Work to get first osp16.1 jobs ongoing** - https://projects.engineering.redhat.com/browse/RHOSINFRA-3118
**found bug**: ~~osp16.1, rhel8.2: ipmitool commands via vbmc (virtualbmc) take too long and cause overcloud introspection to fail - https://bugzilla.redhat.com/show_bug.cgi?id=1813889~~
* https://bugzilla.redhat.com/show_bug.cgi?id=1814616 - after the extended team troubleshot the issue on Mar 25th, we found bogus nftables entries (present in rhel8.2 only) blocking our dhcp/udp traffic; see the bugzilla for updates on the resolution
* **[OSP] All IPv6 jobs are broken.** - https://projects.engineering.redhat.com/browse/RHOSINFRA-3104 (reverted = resolved, they'll work on a better patch)
* **[OSP] osp16.0 jobs back working on the puddle from 2020-03-11**
* **OSP15 update job fails on No such property: overcloud_container_images_urls**
https://projects.engineering.redhat.com/browse/RHOSINFRA-3091
this would affect other osp versions as well - it's now fixed (as of ~9am GMT today)
* OSP16 - phase1 fails
* https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/phase1-16_director-rhel-8.1-virthost-1cont_1comp_1ceph-ipv4-geneve-ceph/105/artifact/.sh/04-ooo-undercloud.log
* ~~https://bugzilla.redhat.com/show_bug.cgi?id=1809998 duplicate~~
* Main bug: https://bugzilla.redhat.com/1809939
* Esca: https://trello.com/c/z0LYn4Rq
## Earlier unclosed things tracked here (first is new):
### OSP ISSUES
:::info
OSP ISSUES
:::
Thu 05 March 2020 R&R transfer:
OSP16
1)https://trello.com/c/z0LYn4Rq - main p1 blocker
2)^ also forked into https://trello.com/c/Dkvbl5Kb - memcached container issue
OSP13
techdebt - nova-scheduler workaround - https://trello.com/c/pVkLagqH/1337-cixbz1803150ospphase2osp13nova-scheduler-hint-for-nova-scheduler-seems-to-be-ignored - fhubik will deal with that outside of R&R and sprint
Also
* Need to construct email to pntops tlv ( check w/ Amnon )
* Express an SLA to rhos-dev re: RFE: https://projects.engineering.redhat.com/browse/RHOSINFRA-3075?filter=34183
* details from fhubik on that^ topic: https://trello.com/c/yBgjnUEn
* also we need to document the processes behind the PCCI_INFRA_SUPPORT queue (how long a period of inactivity before we can close or decrease priority, etc.)
### TripleO ISSUES
:::info
TripleO ISSUES
:::
#### train
container build failed on collectd ( 1 time )
https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-rhel-8-train-containers-build-push-vexxhost/2b2c26c/logs/containers-failed-to-build.log
#### other
* :warning: https://bugs.launchpad.net/tripleo/+bug/1867744 Pacemaker cluster fails to start in CI jobs
* :warning: https://bugs.launchpad.net/tripleo/+bug/1868079 puppet-neutron container failed
* https://bugs.launchpad.net/tripleo/+bug/1867744 Mistral failed command manually execute the following script: /var/lib/mistral/overcloud/ansible-playbook-command.sh
* :warning: https://bugs.launchpad.net/tripleo/+bug/1867602 overcloud deploy failed with Systemd start for pcsd failed
* https://bugs.launchpad.net/tripleo/+bug/1867599 overcloud deploy failing on fs030 and fs016 while pulling mariadb container from undercloud registry - FIX IS HERE --> https://review.opendev.org/#/c/712013/6/config/general_config/featureset030.yml BLOCKED BY MOLECULE FAIL
* https://bugs.launchpad.net/tripleo/+bug/1866621 Can't run container mistral_db_sync
> [name=Arx Cruz]2020-03-09 Adriano is taking a look ATM
> [name=Arx Cruz]2020-03-09 According to Emilien this should be fixed by https://review.rdoproject.org/r/25780
> [name=Arx Cruz]2020-03-16 I'm no longer seeing this happening, but the patch from Emilien isn't merged yet. Moving to done.
> [name=Arx Cruz]2020-03-16 According to yatin, that's no longer happening because mistral is pinned; moving back to open issues
> https://review.opendev.org/#/c/713873/ fix merged here testing with mistral unpinning
* https://bugs.launchpad.net/tripleo/+bug/1867323 standalone deploy failed at Error: error checking path "/run/libvirt": stat /run/libvirt: no such file or directory
* https://bugs.launchpad.net/tripleo/+bug/1864953 (intermittent/race?) Image prepare failed: [Errno 17] File exists
* ~~https://review.opendev.org/#/c/710836/~~
> [name=Arx Cruz]2020/03/09 This patch was abandoned, I'll poke around in the CIX meeting to check the status
* OSP
* OSP 16 - https://bugzilla.redhat.com/show_bug.cgi?id=1807375#c10 verified
* OSP10
* Packstack jobs still broken https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/ReleaseDelivery/view/OSP10/job/phase2-10-rhel-7.7-openstack-all-in-one-neutron-rabbitmq/59/
* problem between ci slaves and github fetch?
* resolved itself, maybe temp network issue?
## Closed things tracked here:
### Tripleo Closed
:::spoiler
TripleO issues
* https://bugs.launchpad.net/tripleo/+bug/1868439 All stable branches OVB job fail because "No module named 'keystoneauth1'"
* **RHEL vs CentOS 8 for Train**
* <wes> FIXED merge conflict</wes>
* (merge conflict on kolla patch https://review.opendev.org/#/c/693159/) try rhel8 container build without kolla patches posted https://review.opendev.org/711198 & testproject https://review.rdoproject.org/r/25663
* nope, fails because there is no yum-config-manager
https://review.opendev.org/#/c/693159/5/docker/base/Dockerfile.j2 https://logserver.rdoproject.org/63/25663/1/check/periodic-tripleo-rhel-8-train-containers-build-push/785736a/logs/buildah-builds/kolla-v4kkjxy6/docker/base/base-build.log
* But really we want centos8 train. we have the job but not the pipeline yet
* train phase 1 green build yesterday problem reporting https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-train-current-tripleo/130/
* https://bugs.launchpad.net/tripleo/+bug/1865754 tripleo-ci-centos-8-scenario001-standalone tempest-conf fails 500 PUT http://192.168.24.1:9292/v2/images/ RADOS invalid argument
* https://bugs.launchpad.net/tripleo/+bug/1865574 centos-8 multinode and undercloud jobs are hanging on the undercloud install
* https://bugs.launchpad.net/tripleo/+bug/1866031 periodic centos7 fs2 upload/fs1 master fails overcloud deploy pcs "create constraint failed"
* https://bugs.launchpad.net/tripleo/+bug/1867332 Mistral tests getting skipped in undercloud deployment
> [name=Arx Cruz]2020-03-13 akahat is working on this
* https://bugs.launchpad.net/tripleo/+bug/1867023 ImportError: cannot import name suppress
* https://bugs.launchpad.net/tripleo/+bug/1866543 All CentOS-8 jobs fail on missing various packages from component repos
* https://review.rdoproject.org/r/#/c/25770/ Fix purge script
* https://bugs.launchpad.net/tripleo/+bug/1866687 tempest-conf error: Setting [volume-feature-enabled] multi_backend = True; TypeError: option values must be strings
* https://review.opendev.org/#/c/711976/
* https://bugs.launchpad.net/tripleo/+bug/1866965 Duplicate declaration: Package[collectd-python] is already declared at puppet-collectd and puppet-tripleo
* https://review.opendev.org/#/c/712289/
* **Gate fails: No route to host**
```
2020-03-05 02:55:49 | TASK [Gathering Facts] *********************************************************
2020-03-05 02:55:49 | Thursday 05 March 2020 02:55:49 +0000 (0:00:00.093) 0:00:07.537 ********
2020-03-05 02:55:53 | ok: [localhost]
2020-03-05 02:55:53 | ok: [192.168.24.14]
2020-03-05 02:56:20 | [WARNING]: Unhandled error in Python interpreter discovery for host
2020-03-05 02:56:20 | 192.168.24.20: Failed to connect to the host via ssh: ssh: connect to host
2020-03-05 02:56:20 | 192.168.24.20 port 22: No route to host
2020-03-05 02:56:35 | fatal: [192.168.24.20]: UNREACHABLE! => changed=false
2020-03-05 02:56:35 | msg: |-
2020-03-05 02:56:35 | Data could not be sent to remote host "192.168.24.20". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.20 port 22: No route to host
2020-03-05 02:56:35 | unreachable: true
2020-03-05 02:56:35 |
2020-03-05 02:56:35 | NO MORE HOSTS LEFT *************************************************************
```
> [name=Arx Cruz]2020/03/06 - Wasn't able to confirm yet whether this is a real issue because other failures are happening before we reach this step.
> [name=Arx Cruz]2020/03/12 - Moving to done because jobs are running fine these days and we haven't seen this error.
* **Gate fails: RuntimeError: Ansible execution failed. playbook: /var/lib/mistral/overcloud/deploy_steps_playbook.yaml, Run Status: failed, Return Code: 2**
* https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_265/710811/2/gate/tripleo-ci-centos-7-containers-multinode/2651e0b/logs/undercloud/home/zuul/overcloud_deploy.log
* https://bugs.launchpad.net/tripleo/+bug/1866031 periodic centos7 fs2 upload/fs1 master fails overcloud deploy pcs "create constraint failed"
> [name=Arx Cruz]2020/03/06 - Wasn't able to confirm yet whether this is fixed because other failures are happening before we reach this step.
* https://bugs.launchpad.net/tripleo/+bug/1867035 AttributeError: 'MoleculeItem' object has no attribute 'funcargs' in molecule jobs
* https://review.opendev.org/712527
* https://review.opendev.org/712526
* :warning: https://bugs.launchpad.net/tripleo/+bug/1866316 ara < 1.0.0 is failing on overcloud-prep-image
* https://review.opendev.org/#/c/711597/
> [name=Arx Cruz]2020/03/06 - I don't know whether the best approach here is to update Ara or to get rid of pip2 and use only pip3 in the overcloud-prep-image role
* https://bugs.launchpad.net/tripleo/+bug/1866184 centos-8 multinode overcloud deploy failed with heat_resource_tree_params = heat_resource_tree['parameters'] KeyError: 'parameters'
* https://review.opendev.org/711480
* Master fs2 upload issue seems legit:
* ~~Error: /Stage[main]/Tripleo::Profile::Pacemaker::Ovn_dbs_bundle/Pacemaker::Constraint::Order[ip-192.168.24.12-with-ovndb_servers]/Pcmk_constraint[order-ovn-dbs-bundle-ip-192.168.24.12]/ensure: change from 'absent' to 'present' failed: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20200303-9-1fr9x0o create constraint failed: Error: Resource 'ip-192.168.24.12' does not exist~~
* ~~https://logserver.rdoproject.org/40/25640/1/check/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/f29d8ce/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz~~
* ~~https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/e7bbed3/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz~~
* filed https://bugs.launchpad.net/tripleo/+bug/1866031
* https://bugs.launchpad.net/tripleo/+bug/1865832 tox py3 virtualenv jobs failing ImportError cannot import name 'ContextManager'
:::
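For log excerpts like the one under "Gate fails: No route to host" in the closed list, a small helper can pull out which hosts Ansible reported as unreachable. This is only a sketch: the function name `unreachable_hosts` and the `SAMPLE_LOG` constant are made up for illustration, and the regular expression assumes the `fatal: [host]: UNREACHABLE!` line format seen in that excerpt.

```python
import re

# Assumption: unreachable hosts show up as "fatal: [<host>]: UNREACHABLE!",
# which is the format in the log excerpt recorded in these notes.
UNREACHABLE_RE = re.compile(r"fatal: \[(?P<host>[^\]]+)\]: UNREACHABLE!")


def unreachable_hosts(log_text):
    """Return hosts reported UNREACHABLE, in first-seen order, without duplicates."""
    hosts = []
    for match in UNREACHABLE_RE.finditer(log_text):
        host = match.group("host")
        if host not in hosts:
            hosts.append(host)
    return hosts


# A few lines copied from the excerpt in the closed TripleO issues.
SAMPLE_LOG = """\
2020-03-05 02:55:53 | ok: [localhost]
2020-03-05 02:55:53 | ok: [192.168.24.14]
2020-03-05 02:56:35 | fatal: [192.168.24.20]: UNREACHABLE! => changed=false
"""

print(unreachable_hosts(SAMPLE_LOG))  # ['192.168.24.20']
```

Running it over a full `overcloud_deploy.log` would give a quick list of nodes to check before digging into later failures.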
### OSP Closed
:::spoiler
OSP issues
* shark13 finally resolved - https://projects.engineering.redhat.com/browse/RHOSINFRA-3019
:::
## UNPINNING NOTES / pipeline:
Directions for the unpinning event:
:::spoiler
Watch the reviews in [1] for merge, there may be some packages that fail to build and do not move from current -> consistent. The RDO team will handle any of those issues.
The new content will then enter the component pipeline [2]. Known issues are tracked here [3]; note that centos-8-scenario001 has been fixed w/ [4]. Please remember the component pipeline triggers the first component @ 12am UTC and the following 14 components 1.5 hours later. Use test-project jobs if you want to speed things up, but try not to starve nodepool.
* If you find a packaging or infra related issue, please alert the EOD for #rhos-ops
* If you find a new issue, please create a Launchpad bug, mark it as a promotion blocker, and assign it to the DFG associated with the component (the owner should now be much clearer and easier to determine)
* If you find a CI issue, please open a Launchpad bug and mark it as alert.
* If you are not sure what you have found, leave notes in the ruck/rover hackmd [5]
To find which packages are building or have FTBFS
https://review.rdoproject.org/r/#/q/topic:rdo-FTBFS
https://trunk.rdoproject.org/centos8-master/queue.html
I'm really proud of your effort to get us here so quickly. Thank you!!
[1] https://review.rdoproject.org/r/#/c/25612/ https://review.rdoproject.org/r/#/c/25727/
[2] http://dashboard-ci.tripleo.org/d/UDA4H3aZk/component-pipeline?orgId=1
[3] https://hackmd.io/HrQd03c9SxOMtFPFrq50tg#Failing-component-pipeline-tests
[4] https://review.opendev.org/#/c/712289/
[5] https://hackmd.io/7MBqFHurTA2e5H8kYRwgag#UNPINNING-NOTES--Component-pipeline
:::
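To make the trigger timing in the directions concrete, here is a rough sketch that computes when component jobs start on a given day. It assumes one reading of the directions: the first component starts at 00:00 UTC and the remaining 14 all start together 1.5 hours later. The function and constant names are hypothetical and not part of any pipeline tooling.

```python
from datetime import date, datetime, timedelta, timezone

# Values taken from the directions; the "all 14 start together" reading is an assumption.
FOLLOWING_DELAY = timedelta(hours=1, minutes=30)
FOLLOWING_COMPONENTS = 14


def component_trigger_times(day):
    """Return (first_trigger, following_trigger) as UTC datetimes for the given date."""
    first = datetime(day.year, day.month, day.day, 0, 0, tzinfo=timezone.utc)
    return first, first + FOLLOWING_DELAY


first, following = component_trigger_times(date(2020, 3, 5))
print("first component:", first.isoformat())
print(f"following {FOLLOWING_COMPONENTS} components:", following.isoformat())
```

Knowing the two start times helps decide whether waiting for the periodic run is acceptable or whether a test-project job is worth the nodepool capacity.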
:::info
Leave notes here on what you find from the recent unpinning
:::
### Divide and conquer
* Marios + Chandan EURO time
* Ronelle + Wes NA time
* Arx POC for tempest :)
#### openstack-periodic-master:
Failing tests:
* fs001
* fs020
* fs035
* scenario010-ovn - no plan for this to be in criteria now
#### components
* fs1 this is fuzzy atm - periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master
* undercloud install fail https://logserver.rdoproject.org/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master/1417807/logs/undercloud/home/zuul/install-undercloud.log.txt.gz
* unable to create route on br-ex; possibly only due to ens3 in network-scripts
* ironic issues
* cleaning
* need to hold nodes and debug and discover
* tempest issues
* failing network-basic-ops
* ~~https://bugs.launchpad.net/tripleo/+bug/1867807 periodic- centos-8-ovb-3ctlr_1comp-featureset001-baremetal-master fails to download tempest cirros image~~
* **https://bugs.launchpad.net/tripleo/+bug/1867945 periodic centos8 standalone-full-tempest-tempest-master timeout**
* tempest (?) related issues again 18/03 & no logs :/
* (tempest) https://logserver.rdoproject.org/openstack-component-tempest/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-tempest-master/de4fa04/job-output.txt
* (compute) https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-compute-master/d55ae94/job-output.txt
#### integration pipeline failures/bugs
* https://bugs.launchpad.net/tripleo/+bug/1867664 Master periodic jobs are failing overcloud deploy with "Container(s) with bad ExitCode: ['container-puppet-neutron'], check logs in /var/log/containers/stdouts/"
* https://bugs.launchpad.net/tripleo/+bug/1866204 OVB is stuck on providing nodes
* https://bugs.launchpad.net/tripleo/+bug/1867599 overcloud deploy failing on fs030 and fs016 while pulling mariadb container from undercloud registry
* https://bugs.launchpad.net/tripleo/+bug/1867602 overcloud deploy failed due to Systemd start for pcsd failed
#### Reviews still in play to add/move jobs
* Scenario 12
* ~~[rdo-jobs] https://review.rdoproject.org/r/25910~~
* ~~[config] https://review.rdoproject.org/r/25911~~
* ~~[ci-config] https://review.rdoproject.org/r/25912~~
* f30 centos8
* ~~[rdo-jobs] https://review.rdoproject.org/r/25921~~
* ~~[config] https://review.rdoproject.org/r/25922~~
* ~~[ci-config] https://review.rdoproject.org/r/25923~~
* podman c8
* ~~[rdo-jobs] https://review.rdoproject.org/r/25916~~
* ~~[config] https://review.rdoproject.org/r/25920~~
* scenario010-ovn-provider-standalone check @ https://review.opendev.org/711507 , ~~periodic @ https://review.rdoproject.org/r/25745, layout @ https://review.rdoproject.org/r/25746 , criteria @https://review.rdoproject.org/r/25747~~, test @ https://review.rdoproject.org/r/25712
* fs039
* **fs39 is now tracked here:** https://tree.taiga.io/project/tripleo-ci-board/task/1604
* ~~definitions@ https://review.rdoproject.org/r/25793~~
* ~~layout@ https://review.rdoproject.org/r/25794~~
* ~~criteria@ https://review.rdoproject.org/r/25795~~
* ~~fix tripleo-inventory https://review.opendev.org/712962~~
* ~~fix https://review.rdoproject.org/r/25932 config master: Set right centos image for ovb-manage rdo baremetal_image~~
* fix https://review.opendev.org/713659 tripleo-quickstart-extras master: Update freeipa-setup role for centos-8 - ansible_pkg_mgr and packages
* fix https://review.opendev.org/714065 tripleo-quickstart-extras master: Use correct novajoin package for centos8 featureset 39 freeipa
* fix https://review.opendev.org/715397 openstack/tripleo-quickstart master: Move featureset039 to os-tempest conditionally adds required vars
* ~~fix https://review.rdoproject.org/r/26036 rdo-jobs master: WIP Adds use_os_tempest for centos8 fs39 check and periodic~~ **No, instead do /715397**
* fix ... **not needed**, don't build images, do r/#/c/26127 instead ~~https://review.opendev.org/714627 tripleo-quickstart-extras master: Fix build-images for centos8 jobs set tripleoclient and image-yaml~~
* fix ~~https://review.rdoproject.org/r/#/c/26127/ Set to_build false for periodic centos8 ovb featureset 039~~
* **test**@ https://review.rdoproject.org/r/25796
* Don't run c7 upgrades on master
* https://review.opendev.org/713277 Only run upgrades centos7 jobs on stable branches
* Reproducer
* ~~Update get-dlrn-hash-newest to work with aggregate hashes https://review.opendev.org/713333~~
* Added jobs for distgit testing - https://review.rdoproject.org/r/25659
* Baremetal (downstream) reviews
* ~~https://code.engineering.redhat.com/gerrit/194899 Add upstream-centos-8 node~~
* ~~https://code.engineering.redhat.com/gerrit/#/c/194604/ Add baremetal base job for centos-8~~
* ~~https://code.engineering.redhat.com/gerrit/194901 Add centos-8 baremetal nodeset~~
* ~~https://softwarefactory-project.io/r/#/c/17819/ bump version to 2.34.0~~
* https://review.rdoproject.org/r/25995 Add get-hash role to dlrn-vars-setup.yml
* https://code.engineering.redhat.com/gerrit/195040 DNM - Testing bm centos 8 master jobs
* ~~`"msg": "Failed to find required executable virtualenv in paths: /sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/:/usr/local/sbin"`~~
* ~~issue with upstream-centos-8 node ... need to chat with nhicher~~
* https://code.engineering.redhat.com/gerrit/195145 Adjust envA settings for centos 8