ruck_rover
Important links to help the ruck/rover - Unified Sprint #<fix> Dates: Feb 4 - Feb 25
TripleO CI team ruck|rover: arxcruz, ysandeep
OSP CI team ruck|rover: <fix>Names</fix>
Previous notes: link
put these issues in the spoiler.
check/Gate:
Stein branch check/gate jobs are failing because of missing container images (ImageNotFoundException): https://bugs.launchpad.net/tripleo/+bug/1915921
promotions:
Master: 17th Feb (Yellow)
sc01/02 only failed once, passed in testproject
We have a bug for fs039 on master; it is fixed now, but the fix still needs to hit the integration line.
We also need to talk with the security DFG about fs039 - we need to drop/migrate this job.
**Victoria** -
**Ussuri** - Green
**C8 train** -
**C7 train** - Red - 23rd Jan - [CIX][LP:1915519][tripleoci][proa][Train][CentOS7][scenario004] Failing with Error: 'ip-192.168.24.3' already exists. Too many tries - https://bugs.launchpad.net/tripleo/+bug/1915519
Add dates in descending order so the latest date is at the top. Break out TripleO and OSP sections.
Train C8:
fs002 is failing because the overcloud deploy failed to create the nova compute stack through Heat: https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_1comp-featureset002-train/65b8265/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
fs020 is failing because of a tempest test failure: https://logserver.rdoproject.org/openstack-periodic-integration-stable3/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-train/45dc70b/logs/undercloud/var/log/tempest/stestr_results.html.gz
<soniya> Looking into it.. probably the recent scenario manager patches may have caused the issue.
<soniya> So if we observe the logs here, 10.0.0.118 is not pingable, so it may be either a CI, network, or deployment issue
<soniya> So if it constantly fails then it boils down to tempest, but in this case it is most unlikely to be tempest. It could also be due to cirros image issues. Hence for now, we can recheck the job for starters
@soniya in the second run fs020 passed
Note: I hit the testproject patch to re-test the above failures here https://review.rdoproject.org/r/#/c/32107/ and both jobs got SUCCESS.
ussuri: green and promoted as well
master: [DNM] Retrigger master failed job https://review.rdoproject.org/r/#/c/28458/43/.zuul.yaml
victoria: fs001, fs035 and the standalone job are failing; retriggered the jobs on testproject here: https://review.rdoproject.org/r/31460
Promotion status
NODE_FAILURES: https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE
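Note: a minimal Python sketch to pull the same NODE_FAILURE builds from the Zuul REST API and count them per pipeline (it assumes the API lives under /zuul/api and that the tenant is named rdoproject.org; adjust both if the deployment differs):

```python
import collections

import requests

# Assumptions: the Zuul REST API is served under /zuul/api and the tenant
# is named "rdoproject.org".
API = "https://review.rdoproject.org/zuul/api/tenant/rdoproject.org/builds"

# Fetch recent builds that ended in NODE_FAILURE (same filter as the
# dashboard link above).
resp = requests.get(API, params={"result": "NODE_FAILURE", "limit": 200}, timeout=30)
resp.raise_for_status()

# Count the failures per pipeline to see where they concentrate.
per_pipeline = collections.Counter(build["pipeline"] for build in resp.json())
for pipeline, count in per_pipeline.most_common():
    print(f"{pipeline}: {count}")
```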
<ykarel> is there some known issue with vexxhost, our jobs using nodepool-rdo tenant in vexxhost are failing, on looking i see vms are not spawning, Build of instance 2806cd94-f395-4f2e-a803-2de297c45749 aborted: Failed to allocate the network(s), not rescheduling
<ykarel> ohhk other tenants also have too many NODE_FAILURE https://review.rdoproject.org/zuul/builds?result=NODE_FAILURE
<ykarel> so likely it's an outage on the vexxhost side, checking launcher logs
<ykarel> hmm it's same on zuul side too:- Detailed node error: Build of instance 0eee0560-1304-4f7a-ad60-57885077d066 aborted: Failed to allocate the network(s), not rescheduling
<ykarel> pinged on #vexxhost, but doubt if someone is around at this time, will wait
<bhagyashris> ykarel, ack thanks :)
<ykarel> mnaser fixed ^, now vms are being created successfully
<ykarel> bhagyashris, fyi ^
<bhagyashris> ykarel, ack thanks :)
Note:
<bhagyashris> ykarel, still seeing some node failure on triggered openstack-periodic-integration-stable3 pipeline https://review.rdoproject.org/zuul/status
<ykarel> bhagyashris, looking
<ykarel> bhagyashris, okk this is different one: nodepool.exceptions.LaunchNetworkException: Unable to find public IP of server
<ykarel> it is known already, we hit this randomly when a large burst of build requests comes in at one point
<ykarel> see spikes in https://softwarefactory-project.io/grafana/d/lu6loudWz/provider-vexxhost-nodepool-tripleo?orgId=1&from=now-1h&to=now
<ykarel> around 08:03 UTC
<ykarel> dpawlik, did we get something from vexxhost for ^?
<ykarel> also fyi there was one more issue since saturday, can see discussion on #vexxhost for that
<dpawlik> ykarel: so the issue is related to the "Unable to find public IP of server" right?
<ykarel> the other issue since saturday was "Failed to allocate the network(s), not rescheduling", which is now fixed. the random one for "Unable to find public IP of server" is still happening
<dpawlik> near 8 there was some peak of FIPs https://prometheus.monitoring.softwarefactory-project.io/prometheus/graph?g0.expr=floating_ip&g0.tab=0&g0.stacked=0&g0.range_input=1w
<ykarel> yeap but iiuc we are not hitting the quota, which iirc is 125, so something is wrong on the server side
<dpawlik> ykarel: not really. We are calculating the fips based on what the user can get from the Neutron API.
<ykarel> how can we get it from the admin side?
<dpawlik> ykarel: we have a task for it. I will try to put it higher in priority
<ykarel> dpawlik, okk Thanks
<ykarel> dpawlik++
<bhagyashris> ykarel, ok thanks for info and dpawlik thanks :)
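Note: a minimal sketch for pulling the floating_ip metric from the Prometheus link above via its HTTP API (assuming the API sits under the same /prometheus prefix as the graph URL):

```python
import requests

# Assumption: the Prometheus HTTP API is served under the same /prometheus
# prefix as the graph URL linked above.
PROM = "https://prometheus.monitoring.softwarefactory-project.io/prometheus/api/v1/query"

# Instant query for the floating_ip gauge shown in the graph.
resp = requests.get(PROM, params={"query": "floating_ip"}, timeout=30)
resp.raise_for_status()

for series in resp.json()["data"]["result"]:
    labels = series["metric"]      # label set of the series (per cloud/tenant, if any)
    _ts, value = series["value"]   # [timestamp, value] pair of the instant sample
    print(labels, value)
```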
<dpawlik> ykarel: seems that via horizon the calculation is ok
<dpawlik> ykarel: so prometheus says that we now have 34 floating ips in use, whereas horizon shows 52 for the nodepool-tripleo project
<dpawlik> ykarel: I will try to dig a little bit into whether we can do something in our script to fix those calculations
<ykarel> dpawlik, okk
<ykarel> dpawlik, those 52 ips are in-use state?
<dpawlik> yup
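Note: a minimal openstacksdk sketch to cross-check the Prometheus (34) vs Horizon (52) numbers by counting the project's floating IPs and their status (it assumes a clouds.yaml entry named nodepool-tripleo, a hypothetical name here):

```python
import openstack

# Assumption: a clouds.yaml entry named "nodepool-tripleo" (hypothetical name)
# holding credentials for the vexxhost project discussed above.
conn = openstack.connect(cloud="nodepool-tripleo")

# List every floating IP visible to the project and split them by status.
fips = list(conn.network.ips())
active = [fip for fip in fips if fip.status == "ACTIVE"]
print(f"total floating ips: {len(fips)}, in use (ACTIVE): {len(active)}")
for fip in active:
    print(fip.floating_ip_address, "->", fip.port_id)
```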
<dpawlik> ykarel|lunch: that's strange, all floating ips are from subnet 38.102.83.0/24 where it should also use 38.129.56.0/24
<dpawlik> and it is possible that the first subnet is out of ips
<dpawlik> ykarel|lunch: maybe I found where the issue is
<dpawlik> ykarel|lunch: https://review.rdoproject.org/r/#/c/32123/
<dpawlik> slaweq: Hey. If we have a network "public" and in that network there are two subnets, 38.102.83.0/24 and 38.129.56.0/24, is it possible that neutron takes IP addresses from just one subnet and does not touch the second subnet until the first is exhausted, or will it "touch" the second subnet as well?
<slaweq> dpawlik: let me check in the code
<slaweq> I don't remember exactly
<slaweq> dpawlik: it seems to me that it can get an IP from any subnet
<dpawlik> slaweq++
<dpawlik> thanks
<slaweq> look here https://github.com/openstack/neutron/blob/482d0fe2bf0b078ced598aae4059862981550cae/neutron/db/ipam_pluggable_backend.py#L257
<slaweq> it makes a list of available IPs from all subnets in the network
<dpawlik> cc jpena ^^
<slaweq> or wait
<slaweq> it seems it will be like that; in https://github.com/openstack/neutron/blob/482d0fe2bf0b078ced598aae4059862981550cae/neutron/ipam/drivers/neutrondb_ipam/driver.py#L174 it iterates over the allocation pools and gets an IP from the pool
<slaweq> but to be sure I would need to test that :)
<dpawlik> slaweq: k. Thanks
<ykarel> dpawlik, Thanks, will check post meeting
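Note: one quick way to confirm the observation above that all floating IPs currently come from 38.102.83.0/24 is to group the project's floating IPs by subnet; a minimal sketch under the same assumed clouds.yaml entry as above:

```python
import ipaddress

import openstack

# Assumption: same hypothetical "nodepool-tripleo" clouds.yaml entry as above;
# the two CIDRs are the subnets of the "public" network mentioned in the chat.
SUBNETS = [ipaddress.ip_network(cidr) for cidr in ("38.102.83.0/24", "38.129.56.0/24")]

conn = openstack.connect(cloud="nodepool-tripleo")

# Count how many of the project's floating IPs fall into each subnet.
counts = {str(net): 0 for net in SUBNETS}
for fip in conn.network.ips():
    addr = ipaddress.ip_address(fip.floating_ip_address)
    for net in SUBNETS:
        if addr in net:
            counts[str(net)] += 1

# If one /24 holds (nearly) all of its ~253 usable addresses, it is exhausted
# and new allocations should start landing in the other subnet.
print(counts)
```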
Promotion status
https://bugs.launchpad.net/tripleo/+bug/1917418 - [all] too many NODE_FAILURES on periodic CI jobs
c7 train:
ussuri is green: https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable2#
c8 train is green: https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-stable3
master is green: https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-integration-main
Promotion status
master:
Note: 4 jobs are failing because of the node failure issue but they are passing on the testproject patch: https://review.rdoproject.org/r/#/c/29351/
noop https://review.rdoproject.org/zuul/build/af6873099da1473293f7b941b22afb19 : SUCCESS in 0s
periodic-tripleo-ci-centos-8-standalone-upgrade-master https://review.rdoproject.org/zuul/build/2aa0da71d4b6489b9e2d4dbbf4d5f43e : SUCCESS in 1h 58m 31s (non-voting)
periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-master https://review.rdoproject.org/zuul/build/c31974e757604226b954b1a5c5a556f4 : SUCCESS in 1h 33m 24s
periodic-tripleo-ci-centos-8-containers-undercloud-minion-master https://review.rdoproject.org/zuul/build/6de62803261f473fb611ccb2f3c2f52a : SUCCESS in 1h 00m 07s
periodic-tripleo-ci-centos-8-scenario010-ovn-provider-standalone-master https://review.rdoproject.org/zuul/build/7895a32e8317422b89363ae1a809a534 : SUCCESS in 1h 14m 41s
master:
ussuri: