# Ruck Rover - 29 Apr 2022 - 05 May 2022

###### tags: `ruck_rover`

###### Previous RR notes: https://hackmd.io/2hB-P772SqyqDs0KKZzZEQ

[Cockpit](http://dashboard-ci.tripleo.org/d/HkOLImOMk/upstream-and-rdo-promotions?orgId=1)

[Downstream cockpit](http://tripleo-cockpit.lab4.eng.bos.redhat.com)

---

## Thu 05 May \o/

Notes from vexxhost:

guilhermesp_ what we've seen was two computes in the ci aggregate were "stuck" not receiving instances since you guys reported. We are discussing with nova what caused the issue. Therefore, all other computes on that aggregate were overloaded, which was affecting probably in latency and failures to plug vifs

guilhermesp_ right now we are 100% of the aggregate CI in, so we believe the recent jobs should be ok hopefully, lets watch

guilhermesp_ i would keep watching throughout the day. THat's because the nodes will naturally be balanced again since we brought back full capacity earlier this afternoon

guilhermesp_ btw we have an upstream bug to keep track of the problem https://bugs.launchpad.net/nova/+bug/1971760

mnaser it looks like n-ovs-agent is just hanging, i was stracing trying to find the root cause when it decided to wake up

mnaser process_network_ports - iteration:1192598 - agent port security group processed in 6685.411

* "tripleowallabycentos9" promoting now
* `python3 roles/rrcockpit/files/telegraf_py3/ruck_rover.py --release master --distro centos-9 --aggregate_hash tripleo-ci-testing/d2/83/d28363f9b6b1ae5f6e1793f8dcabf76c` (only missing fs035 - rerunning now - could skip and promote)

### New/Transient/No bug yet

* .

### Bugs

* https://bugs.launchpad.net/tripleo/+bug/1971703 openstack-tox-tht failing test_tht_ansible_syntax "assert run.rc == 0"
    * fix candidate got merged. Verify if that solved the issue: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/840683
* https://bugs.launchpad.net/tripleo/+bug/1971751 [C8][Train] cloud-init update broke SSH to VMs
    * attempted fix: https://review.opendev.org/c/openstack/tripleo-quickstart/+/840715
    * https://review.opendev.org/c/openstack/tripleo-quickstart/+/840755 WIP: Exclude cloud-init-22.* from Appstream for c8
      we also need to downgrade in build-images - dasm is prepping that patch
    * https://review.opendev.org/c/openstack/tripleo-ci/+/840766 -- install 21.1 package with virt-customize during overcloud build (seems stalled - needs fix)

---

## Wed 04 May (and earlier ongoing things tracked here)

* https://bugs.launchpad.net/tripleo/+bug/1971586 periodic ovb fs39 component security fails "Assign admin role to admin project - Internal Server Error"
* https://bugs.launchpad.net/tripleo/+bug/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugs.launchpad.net/tripleo/+bug/1970899 master component network fs 1 fails provision - host not connected to any segments on routed provider network
* https://bugs.launchpad.net/tripleo/+bug/1970400 CS9 - OVB FS001 master job is failing on overcloud_node_provisioning Failed to connect to the host via ssh
* (cix degraded) https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1971566 various periodic promotion jobs failing package download - "all mirrors were already tried"
* ~~https://bugs.launchpad.net/tripleo/+bug/1971629~~ [C8][Wallaby] build-containers failures

---

## ***STOP (all tracked bugs duplicated above stop scrolling) STOP***

---

## ~~Tue 03 May~~

### Bugs:

* https://bugs.launchpad.net/tripleo/+bug/1971465 fs001 and fs035 OVB jobs failing tempest - identity/haproxy connection errors
* https://bugs.launchpad.net/tripleo/+bug/1970899 master component network fs 1 fails provision - host not connected to any segments on routed provider network
* https://bugs.launchpad.net/tripleo/+bug/1970400 CS9 - OVB FS001 master job is failing on overcloud_node_provisioning Failed to connect to the host via ssh
* (cix degraded) https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.

### Promotions:

* CS8 Train promoted via https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/42450 and it's reverted already
* rhos-17 on rhel-9 failed fs035 for a third time. one tempest failure at a time. May skip and promote tomorrow. Will assess then

## ~~Mon 02 May~~

### Bugs

* https://bugs.launchpad.net/tripleo/+bug/1970899 master component network fs 1 fails provision - host not connected to any segments on routed provider network
    * ran baremetal component -- passed
    * running network component: https://review.rdoproject.org/r/c/testproject/+/41128
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* https://bugs.launchpad.net/tripleo/+bug/1970400 CS9 - OVB FS001 master job is failing on overcloud_node_provisioning Failed to connect to the host via ssh
* ~~https://bugs.launchpad.net/bugs/1970484~~ CS8 Wallaby - ovb-3ctlr_1comp-featureset035 and featureset001 failing on node provision with: "conductor take over"
* ~~https://bugs.launchpad.net/tripleo/+bug/1970736~~ [victoria] fs001 and fs035 failing due to tempest.lib.exceptions.TimeoutException: Request timed out

### Promotion

* Trying to promote network CS9 Master: https://review.rdoproject.org/r/c/testproject/+/41128
* Kicked off manila and network component promotions re https://bugs.launchpad.net/tripleo/+bug/1970899
* CS8 Train: disabling fs001 & fs035 to get promotion: https://review.rdoproject.org/r/c/rdo-infra/ci-config/+/42450
    * Failures are inconclusive.

## ~~Fri 29 April~~

* https://bugs.launchpad.net/tripleo/+bug/1970899 master component network fs 1 fails provision - host not connected to any segments on routed provider network
    * ran baremetal component -- passed
    * running network component: https://review.rdoproject.org/r/c/testproject/+/41128
* https://bugs.launchpad.net/tripleo/+bug/1970736 [victoria] fs001 and fs035 failing due to tempest.lib.exceptions.TimeoutException: Request timed out
* https://bugs.launchpad.net/tripleo/+bug/1970400 CS9 - OVB FS001 master job is failing on overcloud_node_provisioning Failed to connect to the host via ssh
* https://bugs.launchpad.net/bugs/1970484 CS8 Wallaby - ovb-3ctlr_1comp-featureset035 and featureset001 failing on node provision with: "conductor take over"
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.
* ~~https://bugs.launchpad.net/tripleo/+bug/1970710~~ TASK [os_tempest : Override tempestconf profile] - The task includes an option with an undefined variable failing on undercloud-containers jobs
* ~~https://bugzilla.redhat.com/show_bug.cgi?id=2080254 Bug 2080254 - Failed to download packages: openvswitch2.15-2.15.0-53.el9fdp.x86_64~~
    * started testproject: https://code.engineering.redhat.com/gerrit/c/testproject/+/398473 Rerunning fixed the issue.

### Promotions

#### CS8 Wallaby - promoted!

#### RHEL 8 OSP 16.2

* components tripleo & tempest: https://code.engineering.redhat.com/gerrit/c/testproject/+/309187

#### RHEL 8 OSP 17

* components compute & tempest: https://code.engineering.redhat.com/gerrit/c/testproject/+/301027

#### RHEL 9 OSP 17 - promoted!

### New/Transient/No bug yet

* 09:44 < amoralej> hi, i have some jobs in rdo failing when running tempestconf
  09:44 < amoralej> https://logserver.rdoproject.org/73/41273/11/check/rdoinfo-tripleo-master-testing-centos-8-scenario001-standalone/c153e03/logs/undercloud/var/log/tempest/tempestconf.log.txt.gz
  09:44 < amoralej> ERROR tempest config_tempest.services.base.ServiceError: Request on service 'volumev3' with url 'http://192.168.24.3:8776/v3/extensions' failed with code 404

## ~~Thu 28 April~~

* https://bugs.launchpad.net/tripleo/+bug/1970736 [victoria] fs001 and fs035 failing due to tempest.lib.exceptions.TimeoutException: Request timed out
* ~~https://bugs.launchpad.net/tripleo/+bug/1970710~~ TASK [os_tempest : Override tempestconf profile] - The task includes an option with an undefined variable failing on undercloud-containers jobs
* https://bugs.launchpad.net/tripleo/+bug/1970400 CS9 - OVB FS001 master job is failing on overcloud_node_provisioning Failed to connect to the host via ssh
* https://bugs.launchpad.net/bugs/1970484 CS8 Wallaby - ovb-3ctlr_1comp-featureset035 and featureset001 failing on node provision with: "conductor take over"
* https://bugs.launchpad.net/tripleo/+bug/1964940 Compute tests are failing with failed to reach ACTIVE status and task state "None" within the required time.