owned this note changed 4 years ago
Published Linked with GitHub

Debug tempest timeout in baremetal jobs

Issue:-

Tempest is timing out on fs001 running on real baremetal job in 16.2(component pipeline)

Impacted jobs:

  • periodic-tripleo-ci-rhel-8-bm_envB-3ctlr_1comp-featureset001-baremetal-rhos-16.2
  • periodic-tripleo-ci-rhel-8-bm_envC-3ctlr_1comp-featureset001-tripleo-rhos-16.2

This is happening from a while now, Build history , Baremetal jobs are currently not in 16.2 critera, Removed after they faced some infra issue last month, but we would like to add them back in criteria.

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-bm_envB-3ctlr_1comp-featureset001-baremetal-rhos-16.2/03f88a7/job-output.txt

2021-09-23 02:39:36.277080 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2021-09-23 02:39:36.277088 | primary | Thursday 23 September 2021  02:39:36 -0400 (0:00:00.053)       0:04:44.480 ****
2021-09-23 06:41:13.961303 | RUN END RESULT_TIMED_OUT: [untrusted : opendev.org/openstack/tripleo-ci/playbooks/tripleo-ci/run-v3.yaml@master]
2021-09-23 06:41:13.986879 | POST-RUN START: [trusted : code.engineering.redhat.com/openstack/tripleo-ci-internal-config/playbooks/dlrn/post-dlrn.yaml@master]
Observation:-
  • We can see TASK [os_tempest : Execute tempest tests]
    • started at 2021-09-23 02:39:36
    • Timed out after ~4 hours at 2021-09-23 06:41:13

But if we take a look at tempest.log we can see tempest run logs only started at 2021-09-23 06:39 (this may be because of timezone difference between nodepool node and undercloud node)

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-bm_envB-3ctlr_1comp-featureset001-baremetal-rhos-16.2/03f88a7/logs/undercloud/home/zuul/tempest/tempest.log

2021-09-23 06:39:27.856 535205 INFO tempest [-] Using tempest config file /home/zuul/tempest/etc/tempest.conf

No error observed in tempest_run.log, but we can see tempest_run was not completed till logs were collected:-

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-bm_envB-3ctlr_1comp-featureset001-baremetal-rhos-16.2/03f88a7/logs/undercloud/var/log/tempest/tempest_run.log

While checking directly on baremetal node, we can see that tempest run was actually completing successfully, and it didn't took much longer only ~15 mins.

[root@undercloud tempest]# cat tempest_run.log
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_create [2.459328s] ... ok
{1} keystone_tempest_plugin.tests.api.identity.v3.test_mapping_rules.MappingRulesTest.test_mapping_rule_update [2.964076s] ... ok
{1} keystone_tempest_plugin.tests.api.identity.v3.test_mapping_rules.MappingRulesTest.test_mapping_rules_create [0.567365s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_create_with_bad_attributes [1.314761s] ... ok
{1} keystone_tempest_plugin.tests.api.identity.v3.test_mapping_rules.MappingRulesTest.test_mapping_rules_create_without_mandatory_attributes_fails [0.480817s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_create_with_enabled_true [1.131471s] ... ok
{1} keystone_tempest_plugin.tests.api.identity.v3.test_mapping_rules.MappingRulesTest.test_mapping_rules_get [1.129647s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_add_protocol_to_identity_provider [4.619346s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_create_with_relay_state_prefix [0.911674s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_create_without_mandatory_attributes [0.377334s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_get [0.672732s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_add_protocol_to_identity_provider_unknown_mapping [2.303053s] ... ok
{1} keystone_tempest_plugin.tests.api.identity.v3.test_mapping_rules.MappingRulesTest.test_mapping_rules_list [2.811598s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_list [2.188552s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_update [1.465505s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_get_protocol_from_identity_provider [3.710659s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_provider_update_with_bad_attributes_fails [0.999335s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_identity_provider_create [1.244634s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_identity_provider_create_with_enabled_true [1.864970s] ... ok
{2} keystone_tempest_plugin.tests.api.identity.v3.test_service_providers.ServiceProvidersTest.test_service_providers_in_token [3.208134s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_identity_provider_create_with_remote_ids [1.399431s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_identity_provider_get [1.958122s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_identity_provider_list [5.111974s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_identity_provider_update [2.031424s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_list_protocols_from_identity_provider [5.307588s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_update_mapping_from_identity_provider_protocol [4.324410s] ... ok
{3} keystone_tempest_plugin.tests.api.identity.v3.test_identity_providers.IndentityProvidersTest.test_update_protocol_from_identity_provider_unknown_mapping [4.544826s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_connectivity_between_vms_on_different_networks [173.841089s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_hotplug_nic [103.662164s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_mtu_sized_frames [86.420564s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_network_basic_ops [167.981590s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_preserve_preexisting_port [58.848144s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_router_rescheduling ... SKIPPED: Skipped because network extension: l3_agent_scheduler is not enabled
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_subnet_details [97.361447s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_instance_port_admin_state [119.400239s] ... ok
{0} tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_update_router_admin_state [92.592431s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_authorize_request_token [2.217951s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_create_access_token [2.438565s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_create_and_show_consumer [0.982271s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_create_request_token [0.792249s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_delete_consumer [1.415031s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_list_access_tokens [4.302080s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_list_consumers [1.646343s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_list_roles_for_access_token [2.492090s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_revoke_access_token [2.693355s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_show_role_for_access_token [2.727612s] ... ok
{0} keystone_tempest_plugin.tests.api.identity.v3.test_oauth1_tokens.OAUTH1TokensTest.test_update_consumer [1.164047s] ... ok

======
Totals
======
Ran: 47 tests in 936.9758 sec.
 - Passed: 46
 - Skipped: 1
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 984.0829 sec.

==============
Worker Balance
==============
 - Worker 0 (20 tests) => 0:15:27.901148
 - Worker 1 (5 tests) => 0:00:07.955639
 - Worker 2 (10 tests) => 0:00:14.734064
 - Worker 3 (12 tests) => 0:00:38.430176
  • Last modified date/time on var/log/tempest_run.log also confirm it took ~15 mins.
 ls -al tempest_run.log
-rw-r--r--. 1 root root 7029 Sep 23 06:55 tempest_run.log
  • I have also executed tempest tests manually and it tool ~16 mins:-
$ tempest run --concurrency 4 --serial --exclude-list tempest_excludelist.txt --include-list tempest_includelist.txt
.
.
======
Totals
======
Ran: 47 tests in 970.7604 sec.
 - Passed: 46
 - Skipped: 1
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 963.2943 sec

Comparision

  1. If we compare BM job against OVB fs001 tempest run is starting as soon as TASK [os_tempest : Execute tempest tests] runs.

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-baremetal-rhos-16.2/6255192/job-output.txt

2021-09-23 00:59:52.158176 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2021-09-23 00:59:52.158183 | primary | Thursday 23 September 2021  00:59:52 -0400 (0:00:00.049)       3:30:58.155 ****
2021-09-23 01:16:14.304582 | primary | ok: [undercloud]

https://sf.hosted.upshift.rdu2.redhat.com/logs/openstack-component-baremetal/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-internal-baremetal-rhos-16.2/6255192/logs/undercloud/home/zuul/tempest/tempest.log.txt.gz

2021-09-23 00:59:45.221 595912 INFO tempest [-] Using tempest config file /home/zuul/tempest/etc/tempest.conf
  1. If we compare against 17 working BM job. We see same behavior that undercloud node time logging is 4 hours ahead, But TASK [os_tempest : Execute tempest tests] completes within ~15 mins.
2021-09-23 02:15:47.945002 | primary | TASK [os_tempest : Execute tempest tests] **************************************
2021-09-23 02:15:47.945011 | primary | Thursday 23 September 2021  02:15:47 -0400 (0:00:00.049)       0:04:35.483 ****
2021-09-23 02:31:55.413165 | primary | ok: [undercloud]
2021-09-23 06:15:39.762 540823 INFO tempest [-] Using tempest config file /home/zuul/tempest/etc/tempest.conf

What's done:-

  • Confirmed manually tempest test run is not slow, completing within ~15mins

  • Possible explanation of 4 hours difference between "TASK [os_tempest : Execute tempest tests]" and tempest.log start time.

    • EDT and UTC have difference of 4 hours.
    • By default, the nodepool provider node we get have time in EDT
    • During undercloud/standalone we set timezone to UTC
2021-09-23 03:51:18 | 2021-09-23 03:51:18.888076 | 00137eea-76f1-4447-45de-000000000606 |       TASK | Set timezone to UTC
2021-09-23 03:51:19 | 2021-09-23 03:51:19.604711 | 00137eea-76f1-4447-45de-000000000606 |         OK | Set timezone to UTC | undercloud
Select a repo