Ruck and Rover notes #41

# Ruck and Rover notes #41
###### tags: `ruck_rover`

:::info
Important links for ruck rover's
[ruck/rover links to help](https://hackmd.io/07z0xroHTFi2IbX93P5ZfQ)

**Ruck Rover - Unified Sprint #<fix>**
Dates: Feb 4 - Feb 25
Tripleo CI team ruck|rover: arxcruz , ysandeep
OSP CI team ruck|rover: <fix>Names</fix>
Previous notes: [link](https://hackmd.io/_XvcCzQlQ1-A9ygYMlmvKA)
:::

[TOC]

### Issues to track

Put ongoing issues in the spoiler.

:::danger
#### tripleo

check/Gate:
Stein branch check/Gate jobs are failing because of missing container images, Error - ImageNotFoundException
https://bugs.launchpad.net/tripleo/+bug/1915921

promotions:

**Master: 17th Feb** (Yellow)
sc01/02 only failed once, passed in testproject.
We have a bug for fs39 for master, fixed now; the fix needs to hit the integration line.
* We also need to talk with security dfg about fs039 - need to drop/migrate this job

**Victoria** -
**Ussuri** - Green -
**c8 train** -
**c7 train - Red - 23rd Jan**
[CIX][LP:1915519][tripleoci][proa][Train][CentOS7][scenario004] Failing with Error: 'ip-192.168.24.3' already exists. Too many tries"
https://bugs.launchpad.net/tripleo/+bug/1915519
* Stein - Green - Promoted on 24th Feb
* Rocky - Red
  periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset010-rocky is failing with NeutronError: "Invalid input for operation: segmentation_id requires physical_network for VLAN provider network"
  https://bugs.launchpad.net/tripleo/+bug/1916695
* Queens - Promoted on 11th Feb
:::

:::info
Add dates in descending order so the latest date is at the top. Break out TripleO and OSP sections.
:::

## March 3rd
### Gate
### RDO

## Feb 25th
### Gate
### RDO
* Train C8:
    * fs002 is failing because overcloud deploy failed to create the nova compute stack through heat
      https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-train/65b8265/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
    * fs020 is failing because of some tempest test failures.
      https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-train/45dc70b/logs/undercloud/var/log/tempest/stestr_results.html.gz
      <soniya> Looking into it..probably recent scenario manager patches may have caused the issue.
      <soniya> So if we observe the logs here ...the 10.0.0.118 is not pingable, so it may be either CI, network issue or deployment issue
      <soniya> So if it constantly fails then it boils down to tempest and in this case it would most unlikely be tempest. Probably it also be due to cirros image issues.
Hence for now, we can recheck the job for starters
    * @soniya in the second run fs020 passed
    * **Note:** I submitted the testproject patch to retest the above failures here https://review.rdoproject.org/r/#/c/32107/ and both jobs got SUCCESS
        * periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-train https://review.rdoproject.org/zuul/build/e8d83beca52748c3a9412bd8db04dcdb : SUCCESS in 2h 48m 38s
        * periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-train https://review.rdoproject.org/zuul/build/7cb27abbb2734379af673222474fa092 : SUCCESS in 3h 03m 03s
* ussuri: is green and promoted as well
* master: [DNM] Retrigger master failed job https://review.rdoproject.org/r/#/c/28458/43/.zuul.yaml
* victoria: fs001, fs035 and the standalone job are failing; retriggered the jobs on testproject here https://review.rdoproject.org/r/31460

## 26th Feb
### RDO
* Seeing some node failures in rdo - https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE
    * on master and victoria integration pipeline

## 1st March
## Tripleo upstream Gate:
## RDO:
* Promotion status
    * Master - 1st March
    * Victoria - 1st March
    * Ussuri - 27th Feb
    * C8 train - 1st March
* NODE_FAILURES: https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE

```
<ykarel> is there some known issue with vexxhost, our jobs using nodepool-rdo tenant in vexxhost are failing, on looking i see vms are not spawning, Build of instance 2806cd94-f395-4f2e-a803-2de297c45749 aborted: Failed to allocate the network(s), not rescheduling
<ykarel> ohhk other tenants also have too many NODE_FAILURE https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE
<ykarel> so likely it's outage on vexxhost side, checking launcer logs
<ykarel> hmm it's same on zuul side too:- Detailed node error: Build of instance 0eee0560-1304-4f7a-ad60-57885077d066 aborted: Failed to allocate the network(s), not rescheduling
<ykarel> pinged on #vexxhost, but doubt if someone is around at this time, will wait
<bhagyashris> ykarel, ack thanks :)
<ykarel> mnaser fixed ^, now vms are being created successfully
<ykarel> bhagyashris, fyi ^
<bhagyashris> ykarel, ack thanks :)
```

* Note:
    * There are two NODE_FAILURE issues, as given below:
        1. Detailed node error: Build of instance 0eee0560-1304-4f7a-ad60-57885077d066 aborted: Failed to allocate the network(s), not rescheduling / "Failed to allocate the network(s), not rescheduling"
        2. nodepool.exceptions.LaunchNetworkException: Unable to find public IP of server : https://review.rdoproject.org/r/#/c/32123/
    * The first one is resolved, but the second one is still happening randomly; it was discussed on the #rhos-ops channel.

```
<bhagyashris> ykarel, still seeing some node failure on triggered openstack-periodic-integration-stable3 pipeline https://review.rdoproject.org/zuul/status
<ykarel> bhagyashris, looking
<ykarel> bhagyashris, okk this is different one: nodepool.exceptions.LaunchNetworkException: Unable to find public IP of server
<ykarel> it is known already, we hitting this randomly when large build requests are at a point
<ykarel> see spikes in https://softwarefactory-project.io/grafana/d/lu6loudWz/provider-vexxhost-nodepool-tripleo?orgId=1&from=now-1h&to=now
<ykarel> around 08:03 UTC
<ykarel> dpawlik, did we get something from vexxhost for ^?
<ykarel> also fyi there was one more issue since saturday, can see discussion on #vexxhost for that
<dpawlik> ykarel: so the issue is related to the "Unable to find public IP of server" right?
<ykarel> the other issue since saturday was "Failed to allocate the network(s), not rescheduling", which is now fixed. the random one for "Unable to find public IP of server" is still happening
<dpawlik> near 8 was some pick of FIPs https://prometheus.monitoring.softwarefactory-project.io/prometheus/graph?g0.expr=floating_ip&g0.tab=0&g0.stacked=0&g0.range_input=1w
<ykarel> yeap but iiuc we are not hitting quota iirc which is 125, so something wrong on server side
<dpawlik> ykarel: not really. We are calculating the fips base on what the user can get from Neutron API.
<ykarel> how we can get from admin side?
<dpawlik> ykarel: we have a task for it. I will try to put it higher in priority
<ykarel> dpawlik, okk Thanks
<ykarel> dpawlik++
<bhagyashris> ykarel, ok thanks for info and dpawlik thanks :)
<dpawlik> ykarel: seems that via horizon the calculation is ok
<dpawlik> ykarel: so prometheus says that we now have 34 floating ips in use, where in horizon for nodepool-tripleo project is 52
<dpawlik> ykarel: I will try to dig a littlebit if we can do something in our script to fix that calculations
<ykarel> dpawlik, okk
<ykarel> dpawlik, those 52 ips are in-use state?
<dpawlik> yup
<dpawlik> ykarel|lunch: thats strange, all floating ips are from subnet 38.102.83.0/24 where it should also use 38.129.56.0/24
<dpawlik> and it can be possible that the first network is out of ips
<dpawlik> ykarel|lunch: maybe I found where is an issue
<dpawlik> ykarel|lunch: https://review.rdoproject.org/r/#/c/32123/
<dpawlik> slaweq: Hey. If we have network "public" and in that network, there are two subnets: 38.102.83.0/24 and 38.129.56.0/24 . Is it possible, that neutron is taking ip address just from one subnet and it does not touch second subnet until the first is finished or it will "touch" also the second subnet?
<slaweq> dpawlik: let me check in the code
<slaweq> I don't remember exactly
<slaweq> dpawlik: it seems for me that it can get IP from any subnet
<dpawlik> slaweq++
<dpawlik> thanks
<slaweq> look here https://github.com/openstack/neutron/blob/482d0fe2bf0b078ced598aae4059862981550cae/neutron/db/ipam_pluggable_backend.py#L257
<slaweq> it makes list of available IPs from all subnets in the network
<dpawlik> cc jpena ^^
<slaweq> or wait
<slaweq> it seems it will be like that, in https://github.com/openstack/neutron/blob/482d0fe2bf0b078ced598aae4059862981550cae/neutron/ipam/drivers/neutrondb_ipam/driver.py#L174 it iterates over allocation pools and getting IP from the pool
<slaweq> but to be sure I would need to test that :)
<dpawlik> slaweq: k. Thanks
<ykarel> dpawlik, Thanks, will check post meeting
```

## 2nd March
## Tripleo upstream Gate:
## RDO periodic :
* Promotion status
    * Master - 2nd March
    * Victoria - 1st March
    * Ussuri - 2nd March
    * C8 train - 2nd March
* https://bugs.launchpad.net/tripleo/+bug/1917418 [ [all] too many NODE_FAILURES on periodic CI jobs]
* c7 train:
    * ~~https://bugs.launchpad.net/tripleo/+bug/1917422 [ [train][c7] AttributeError: 'module' object has no attribute 'run' is failing on c7 train sc004 ]~~
    * ~~https://review.opendev.org/c/openstack/tripleo-ansible/+/778146~~
    * https://review.rdoproject.org/r/#/c/32137/
* ussuri is green: https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable2#
    * periodic-tripleo-ci-centos-8-scenario010-ovn-provider-standalone-ussuri https://review.rdoproject.org/zuul/build/0274904786924709b32021e905a41403 : SUCCESS in 1h 10m 48s
* c8 train is green: https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable3
    * periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset035-train https://review.rdoproject.org/zuul/build/f0bd9043488f462d8c40398db5f3c9de : SUCCESS in 4h 15m 06s
* master is green :
https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-main

## 3rd March
## Tripleo upstream Gate:
## RDO periodic :
* Promotion status
    * Master - 2nd March
    * Victoria - 1st March
    * Ussuri - 2nd March
    * C8 train - 2nd March
* master:
    * ~~https://bugs.launchpad.net/tripleo/+bug/1917583 [ AttributeError: 'HeatNativeLauncher' object has no attribute 'install_tmp']~~
    * ~~https://review.opendev.org/c/openstack/python-tripleoclient/+/778381 [WIP] Fix AttributeError~~
    * https://review.rdoproject.org/r/32152 [DNM] Test bug/1917583
    * ~~https://bugs.launchpad.net/tripleo/+bug/1917621 'ansible_distribution' is undefined /usr/share/ceph-ansible/roles/ceph-facts~~
    * ~~https://review.opendev.org/c/openstack/python-tripleoclient/+/778422~~
    * ~~https://bugs.launchpad.net/tripleo/+bug/1917582~~
    * ~~https://review.opendev.org/c/openstack/tripleo-heat-templates/+/778453~~

## 4th March
## Tripleo upstream Gate:
## RDO periodic

## 5th March
## Tripleo upstream Gate:
## RDO periodic :
* master: https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-main#
    * Note: 4 jobs are failing because of the node failure issue, but they pass on the testproject patch: https://review.rdoproject.org/r/#/c/29351/
        * noop https://review.rdoproject.org/zuul/build/af6873099da1473293f7b941b22afb19 : SUCCESS in 0s
        * periodic-tripleo-ci-centos-8-standalone-upgrade-master https://review.rdoproject.org/zuul/build/2aa0da71d4b6489b9e2d4dbbf4d5f43e : SUCCESS in 1h 58m 31s (non-voting)
        * periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-master https://review.rdoproject.org/zuul/build/c31974e757604226b954b1a5c5a556f4 : SUCCESS in 1h 33m 24s
        * periodic-tripleo-ci-centos-8-containers-undercloud-minion-master https://review.rdoproject.org/zuul/build/6de62803261f473fb611ccb2f3c2f52a : SUCCESS in 1h 00m 07s
        * periodic-tripleo-ci-centos-8-scenario010-ovn-provider-standalone-master https://review.rdoproject.org/zuul/build/7895a32e8317422b89363ae1a809a534 : SUCCESS in 1h 14m 41s

## 8th March
## Tripleo upstream Gate:
## RDO periodic :

## 9th March
## Tripleo upstream Gate:
## RDO periodic :

## 10th March
## Tripleo upstream Gate:
## RDO periodic :
* master:
    * https://bugs.launchpad.net/tripleo/+bug/1918366

## 11th March
## Tripleo upstream Gate:
## RDO periodic :
* master:
    * https://bugs.launchpad.net/tripleo/+bug/1918597
    * https://review.opendev.org/c/openstack/openstack-tempest-skiplist/+/779941
    * https://bugs.launchpad.net/tripleo/+bug/1918672
* ussuri:
    * https://bugs.launchpad.net/tripleo/+bug/1918489
    * https://bugs.launchpad.net/tripleo/+bug/1918478
    * https://bugs.launchpad.net/tripleo/+bug/1918476

## 12th March
## Tripleo upstream Gate:
* https://bugs.launchpad.net/tripleo/+bug/1918890
## RDO periodic :
* master:
    * https://bugs.launchpad.net/tripleo/+bug/1918891 (memcached firewall rules are not being created since merging "Add non-tls listener to Memcached")

## 15th March
## Tripleo upstream Gate:
* https://bugs.launchpad.net/tripleo/+bug/1918890
## RDO periodic :
* master:
    * https://bugs.launchpad.net/tripleo/+bug/1919111
    * https://bugs.launchpad.net/tripleo/+bug/1919064

## 16th March
## Tripleo upstream Gate:
* https://bugs.launchpad.net/tripleo/+bug/1918890
## RDO periodic :
* master:
    * https://bugs.launchpad.net/tripleo/+bug/1919299
    * https://bugs.launchpad.net/tripleo/+bug/1919064
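
Follow-up to the floating IP discussion from 1st March: dpawlik observed that all floating IPs in use came from 38.102.83.0/24, although the "public" network also has 38.129.56.0/24. A minimal sketch of how that observation could be checked, assuming a list of floating IP addresses has already been exported (for example from Horizon or the Neutron API). The function name `count_fips_per_subnet` and the sample addresses are hypothetical and for illustration only; this is not the script used by the monitoring setup.

```python
import ipaddress

# The two subnets of the "public" network named in the chat log.
PUBLIC_SUBNETS = [
    ipaddress.ip_network("38.102.83.0/24"),
    ipaddress.ip_network("38.129.56.0/24"),
]


def count_fips_per_subnet(floating_ips, subnets=PUBLIC_SUBNETS):
    """Count how many of the given addresses fall into each subnet.

    Addresses that match none of the subnets are counted under "other".
    """
    counts = {str(net): 0 for net in subnets}
    for ip in floating_ips:
        addr = ipaddress.ip_address(ip)
        for net in subnets:
            if addr in net:
                counts[str(net)] += 1
                break
        else:
            # No subnet matched this address.
            counts["other"] = counts.get("other", 0) + 1
    return counts


# Hypothetical sample input, not real data from the cloud.
sample = ["38.102.83.10", "38.102.83.77", "38.129.56.5"]
print(count_fips_per_subnet(sample))
```

If the count for the second subnet stays at zero while the first one is close to its capacity, that would be consistent with the suspicion that the first subnet ran out of addresses.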
