# 162697767
## Issue
- SCG Connectivity can't be enabled; no master PSNT is shown on the Access Key tab.
## Cause
- VXM can't gather host details from the hosts' Platform Service.
## Troubleshooting
- The hosts' Platform Service can't connect to iDRAC.
- Network connections via iSM failed.
- Can't ping the iDRAC internal IP: 169.254.0.1
- iSM Status: Running (Limited functionality)
- This is because the network connection between the host and iDRAC over iSM/vmk1 failed.
## Plan
- Restore iDRAC network connection on iSM/vmk1
- Restore Platform Service
- Ensure VXM can gather the hosts' hardware information via the Platform Service.
- Ensure the Redis cache has the hosts' detail keys.
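The plan maps to one check per layer. A sketch of the verification commands used later in this case (run the first two on the affected ESXi host and the last on VXM; the `grep cache` pod-name filter matches this environment):

```
# 1) Host-to-iDRAC link over iSM/vmk1
vmkping -I vmk1 169.254.0.1

# 2) Platform Service health on the host
tail -n 30 /var/log/platform_svc.log

# 3) Host detail keys in the VXM Redis cache
kubectl exec -it $(kubectl get pods -o name | grep cache | cut -d '/' -f 2) -- redis-cli -n 1 keys '*'
```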
## What has been done
- Can't ping the iDRAC's internal IP; this is why the Platform Service failed.
- For some reason, 169.254.0.1 is not reachable.
- No specific firewall rule or security setting blocks this connection.
- Checked and found that the iSM status on iDRAC is shown as:
- iSM Status: Running (Limited functionality)
- This is why the connection fails.
- Followed KB 000042093 and the workaround given in PSE-19686 on VXEE-11377.
- Restarted iDRAC and all related components.
- This restored the vmk1-to-iDRAC connection.
- This brought the Platform Service back to normal operation.
- Now VXM is able to connect to all the hosts' Platform Service.
- The Redis cache has been populated with new hardware keys.
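The restart sequence can be sketched from the session log below; the exact steps are in KB 000042093, and the `start` command and the ipmitool reset here are assumptions, not taken verbatim from this session:

```
/etc/init.d/sfcbd-watchdog stop           # stop the sfcbd watchdog first
/etc/init.d/dcism-netmon-watchdog stop    # stop iSM; vmk1 disappears after this
/opt/vxrail/tools/ipmitool mc reset cold  # cold-reset the iDRAC (assumed reset method)
/etc/init.d/dcism-netmon-watchdog start   # start iSM again; vmk1 should come back
vmkping -I vmk1 169.254.0.1               # verify the internal link is restored
```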
### New issue found
- No Access Key tab is shown when enabling SCG Connectivity.

What has been done:
- Followed KB https://www.dell.com/support/kbdoc/en-us/000201895
- Deleted the hosts' universal keys.
## The idea:
- For SCG Connectivity to work, ensure cluster health is OK first.
- This means the whole chain from iDRAC to iSM to the Platform Service to do-host and do-cluster is working properly.
- Note: a component showing as running does not always mean it's working properly.
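"Running" and "working" can be told apart by testing function, not just status; a minimal sketch on the ESXi host (assuming vmkping's `-c` count option):

```
ps | grep -i dcism                 # iSM process shows as running...
vmkping -I vmk1 -c 3 169.254.0.1   # ...but only a successful ping proves the iSM/vmk1 link works
```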
## PuTTY Log
### Failed to ping 169.254.0.1
```
[root@MH-ESX-02:~] vmkping -I vmk1 169.254.0.1
PING 169.254.0.1 (169.254.0.1): 56 data bytes
sendto() failed (Host is down)
```
### Even though the network looks OK
Routing table:
```
[root@MH-ESX-02:~] esxcfg-route -l
VMkernel Routes:
Network Netmask Gateway Interface
10.63.104.0 255.255.255.0 Local Subnet vmk4
10.63.110.0 255.255.255.0 Local Subnet vmk3
169.254.0.0 255.255.255.0 Local Subnet vmk1
172.23.109.0 255.255.255.0 10.63.110.254 vmk3
10.63.100.0 255.255.252.0 Local Subnet vmk2
default 0.0.0.0 10.63.103.254 vmk2
```
### Platform Service status
The log shows the Platform Service failing with BrokenPipeError while dispatching messages:
```
[root@MH-ESX-02:~] tail -n 30 /var/log/platform_svc.log
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - Traceback (most recent call last):
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 65, in handle_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - hdlr(params, conn)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 94, in __call__
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - return self.handle_msg(msg, conn)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 88, in handle_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - conn.send_msg(status, resp)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/network.py", line 132, in send_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - sent_bytes = self.sock.send(b)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - BrokenPipeError: [Errno 32] Broken pipe
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - Exception dispatching message: ('platform', {'subaction': 'get'})
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - Traceback (most recent call last):
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 65, in handle_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - hdlr(params, conn)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 94, in __call__
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - return self.handle_msg(msg, conn)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 88, in handle_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - conn.send_msg(status, resp)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/network.py", line 132, in send_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - sent_bytes = self.sock.send(b)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - BrokenPipeError: [Errno 32] Broken pipe
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR -
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - During handling of the above exception, another exception occurred:
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR -
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - Traceback (most recent call last):
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/dispatch.py", line 68, in handle_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - conn.send_msg(STATUS_INTERNAL_FAILURE, str(ex))
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - File "/opt/vxrail/bin/service/network.py", line 132, in send_msg
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - sent_bytes = self.sock.send(b)
2023-02-21T03:56:20Z platform_svc: [esxcli-plugin-service-thread] ERROR - BrokenPipeError: [Errno 32] Broken pipe
```
### iDRAC status is still OK
```
[root@MH-ESX-02:~] /opt/vxrail/tools/ipmitool mc info
Device ID : 32
Device Revision : 1
Firmware Revision : 6.00
IPMI Version : 2.0
Manufacturer ID : 674
Manufacturer Name : DELL Inc
Product ID : 256 (0x0100)
Product Name : Unknown (0x100)
Device Available : yes
Provides Device SDRs : yes
Additional Device Support :
Sensor Device
SDR Repository Device
SEL Device
FRU Inventory Device
IPMB Event Receiver
Bridge
Chassis Device
Aux Firmware Rev Info :
0x00
0x23
0x1e
0x00
```
```
[root@MH-ESX-02:~] /opt/vxrail/tools/ipmitool lan print
Set in Progress : Set Complete
Auth Type Support : MD5
Auth Type Enable : Callback : MD5
: User : MD5
: Operator : MD5
: Admin : MD5
: OEM :
IP Address Source : Static Address
IP Address : 10.63.103.225
Subnet Mask : 255.255.252.0
MAC Address : b4:45:06:e8:b5:9f
SNMP Community String : rymansnmpro
IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
BMC ARP Control : ARP Responses Enabled, Gratuitous ARP Disabled
Gratituous ARP Intrvl : 2.0 seconds
Default Gateway IP : 10.63.103.254
Default Gateway MAC : 00:00:00:00:00:00
Backup Gateway IP : 0.0.0.0
Backup Gateway MAC : 00:00:00:00:00:00
802.1q VLAN ID : Disabled
802.1q VLAN Priority : 0
RMCP+ Cipher Suites : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
Cipher Suite Priv Max : Xaaaaaaaaaaaaaa
: X=Cipher Suite Unused
: c=CALLBACK
: u=USER
: o=OPERATOR
: a=ADMIN
: O=OEM
Bad Password Threshold : Not Available
```
### Recovering iSM connectivity
- A new hardware key has been created in the Redis cache.
```
mh-vxm-01:/var/log # kubectl exec -it $(kubectl get pods -o name | grep cache | cut -d '/' -f 2) -- redis-cli -n 1
Defaulted container "cacheservice" out of: cacheservice, dependency (init)
127.0.0.1:6379[1]> keys **
1) "GSMWXT3:is_linzhi"
2) "pre_fetch_task"
3) "GSMWXT3:mh-esx-01.internal.rymanhealthcare.co.nz:hardware.model"
4) "HSMWXT3:is_linzhi"
127.0.0.1:6379[1]> keys *
1) "GSMWXT3:is_linzhi"
2) "pre_fetch_task"
3) "GSMWXT3:mh-esx-01.internal.rymanhealthcare.co.nz:hardware.model"
4) "HSMWXT3:is_linzhi"
127.0.0.1:6379[1]>
mh-vxm-01:/var/log # # NOW WE HAVE NODE 01 IN REDIS CACHE
```
### After iDRAC restart
- Stopped dcism.
- vmk1 is gone.
```
[root@MH-ESX-02:~] vmkping -I vmk1 169.254.0.1
PING 169.254.0.1 (169.254.0.1): 56 data bytes
sendto() failed (Host is down)
[root@MH-ESX-02:~] less /var/run/log/vxps_main.log
[root@MH-ESX-02:~] /etc/init.d/sfcbd-watchdog stop
sfcbd-init[2536935]: args ('stop')
sfcbd-init[2536935]: Getting Exclusive access, please wait...
sfcbd-init[2536935]: Exclusive access granted.
sfcbd-init[2536935]: Request to stop sfcbd-watchdog, pid 2536935
sfcbd-init[2536935]: Invoked kill 2102166
sfcbd-init[2536935]: stop sfcbd process completed.
[root@MH-ESX-02:~] /etc/init.d/dcism-netmon-watchdog stop
[root@MH-ESX-02:~] esxcfg-vmknic -l
Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
vmk2 11 IPv4 10.63.103.227 255.255.252.0 10.63.103.255 00:50:56:64:e9:76 1500 65535 true STATIC defaultTcpipStack
vmk2 11 IPv6 fe80::250:56ff:fe64:e976 64 00:50:56:64:e9:76 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk0 4 IPv4 N/A N/A N/A 00:50:56:61:d5:f9 1500 65535 true NONE defaultTcpipStack
vmk0 4 IPv6 fe80::250:56ff:fe61:d5f9 64 00:50:56:61:d5:f9 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk3 19 IPv4 10.63.110.232 255.255.255.0 10.63.110.255 00:50:56:61:15:26 1500 65535 true STATIC defaultTcpipStack
vmk3 19 IPv6 fe80::250:56ff:fe61:1526 64 00:50:56:61:15:26 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk4 3 IPv4 10.63.104.232 255.255.255.0 10.63.104.255 00:50:56:67:79:4e 9000 65535 true STATIC defaultTcpipStack
vmk4 3 IPv6 fe80::250:56ff:fe67:794e 64 00:50:56:67:79:4e 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk5 11 IPv4 10.63.105.232 255.255.255.0 10.63.105.255 00:50:56:61:6a:14 9000 65535 true STATIC vmotion
vmk5 11
```
### Restarted dcism
- vmk1 came back.
```
[root@MH-ESX-02:~] esxcfg-vmknic -l
Interface Port Group/DVPort/Opaque Network IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type NetStack
vmk2 11 IPv4 10.63.103.227 255.255.252.0 10.63.103.255 00:50:56:64:e9:76 1500 65535 true STATIC defaultTcpipStack
vmk2 11 IPv6 fe80::250:56ff:fe64:e976 64 00:50:56:64:e9:76 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk0 4 IPv4 N/A N/A N/A 00:50:56:61:d5:f9 1500 65535 true NONE defaultTcpipStack
vmk0 4 IPv6 fe80::250:56ff:fe61:d5f9 64 00:50:56:61:d5:f9 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk3 19 IPv4 10.63.110.232 255.255.255.0 10.63.110.255 00:50:56:61:15:26 1500 65535 true STATIC defaultTcpipStack
vmk3 19 IPv6 fe80::250:56ff:fe61:1526 64 00:50:56:61:15:26 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk4 3 IPv4 10.63.104.232 255.255.255.0 10.63.104.255 00:50:56:67:79:4e 9000 65535 true STATIC defaultTcpipStack
vmk4 3 IPv6 fe80::250:56ff:fe67:794e 64 00:50:56:67:79:4e 9000 65535 true STATIC, PREFERRED defaultTcpipStack
vmk1 iDRAC Network IPv4 169.254.0.2 255.255.255.0 169.254.0.255 00:50:56:6f:45:47 1500 65535 true STATIC defaultTcpipStack
vmk1 iDRAC Network IPv6 fe80::250:56ff:fe6f:4547 64 00:50:56:6f:45:47 1500 65535 true STATIC, PREFERRED defaultTcpipStack
vmk5 11 IPv4 10.63.105.232 255.255.255.0 10.63.105.255 00:50:56:61:6a:14 9000 65535 true STATIC vmotion
vmk5 11 IPv6 fe80::250:56ff:fe61:6a14 64 00:50:56:61:6a:14 9000 65535 true STATIC, PREFERRED vmotion
```
### Platform Service started and working normally
platform_svc.log
```
2023-02-21T04:37:54Z platform_svc: [MainThread] INFO - ---------- Platform Service -----------------
2023-02-21T04:37:54Z platform_svc: [MainThread] INFO - ism address is: [169.254.0.1]
2023-02-21T04:37:54Z platform_svc: [MainThread] INFO - Registering "tasks"
2023-02-21T04:37:55Z platform_svc: [MainThread] INFO - iDracOpt. timestamp: 1676954274
2023-02-21T04:37:55Z platform_svc: [MainThread] INFO - Registering "platform"
2023-02-21T04:37:55Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Waiting for iSM ready
2023-02-21T04:37:55Z platform_svc: [MainThread] INFO - Setting up listener
2023-02-21T04:37:55Z platform_svc: [esxcli-plugin-service-thread] INFO - Starting endpoints
2023-02-21T04:37:55Z platform_svc: [esxcli-plugin-service-thread] INFO - Starting network loop
2023-02-21T04:37:55Z platform_svc: [esxcli-plugin-service-thread] INFO - Start network loop
2023-02-21T04:37:55Z platform_svc: [esxcli-plugin-service-thread] INFO - Backend started
2023-02-21T04:37:55Z platform_svc: [ThreadPoolExecutor-0_0] INFO - USB nic is ready
2023-02-21T04:37:55Z platform_svc: [ThreadPoolExecutor-0_0] INFO - iSM is ready
2023-02-21T04:37:55Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Found service acccount in slot 15
2023-02-21T04:37:55Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Refresh Password
2023-02-21T04:37:56Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Enable the account
2023-02-21T04:37:56Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Setting up the public key for the service account
2023-02-21T04:38:01Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Service account setup successfully
2023-02-21T04:38:01Z platform_svc: [MainThread] INFO - the Idrac is ready.
2023-02-21T04:38:01Z platform_svc: [MainThread] INFO - Add sticky bit for PS config file
2023-02-21T04:38:01Z platform_svc: [MainThread] INFO - started the restful API service.
2023-02-21T04:38:04Z platform_svc: [ThreadPoolExecutor-0_0] INFO - The updateable firmware info collected
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - PTAgent Account setup done
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - ConfigSetting: agent_backend='auto'
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Detected Agent Backend: 'none'
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Bypass the firmware inventory initialization
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Start PSAPI service thread
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - resistered api list: /node/discoveryinfo$ /node/info$ /node/apis$ /node/fw$ /node/fwupgrade/staging$ /node/fwupgrade/apply$ /node/idrac/reset$ /node/inventory/hard_refresh$ /node/tasks$ /node/tasks/(?P<taskid>[\w-]+)$ /node/agent/status$
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Registering "dell.idrac_account"
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Registering "techsuprep"
2023-02-21T04:38:05Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Registering "dell.esxi_tmp_account"
2023-02-21T04:38:08Z platform_svc: [ThreadPoolExecutor-0_0] INFO - Platform initialized
```
### Redis cache after fixing both hosts
```
mh-vxm-01:/var/log # kubectl exec -it $(kubectl get pods -o name | grep cache | cut -d '/' -f 2) -- redis-cli -n 1
Defaulted container "cacheservice" out of: cacheservice, dependency (init)
127.0.0.1:6379[1]> keys *
1) "pre_fetch_task"
2) "GSMWXT3:mh-esx-01.internal.rymanhealthcare.co.nz:hardware.model"
3) "day2:cluster:node:cache"
4) "HSMWXT3:is_linzhi"
5) "GSMWXT3:is_linzhi"
6) "day2:cluster:appliance:cache"
7) "HSMWXT3:mh-esx-02.internal.rymanhealthcare.co.nz:hardware.model"
8) "day2:index:moid"
127.0.0.1:6379[1]>
mh-vxm-01:/var/log # # NOW WE HAVE BOTH 02 HOSTS in the REDIS CACHE
```
### ESE missing the Access Key tab
- Something is wrong with the hosts' universal keys.
```
mh-vxm-01:/var/log # tail -f /var/log/microservice_log/short.term.log | grep -E 'dell-ese|rcs-service'
"2023-02-21 04:42:44,871" microservice.dell-ese "2023-02-21T04:42:44.192483171Z stderr F 8 2023-02-21 04:42:44,192 CP Server Thread-9 INFO cherrypy.access.140315519420784 LN: 283 ::ffff:172.28.175.102 - - [21/Feb/2023:04:42:44] ""GET /ese/status HTTP/1.1"" 200 23 """" ""kube-probe/1.23"""
"2023-02-21 04:42:44,871" microservice.dell-ese "2023-02-21T04:42:44.192484743Z stdout F ::ffff:172.28.175.102 - - [21/Feb/2023:04:42:44] ""GET /ese/status HTTP/1.1"" 200 23 """" ""kube-probe/1.23"""
"2023-02-21 04:43:00,860" microservice.rcs-service "2023-02-21T04:43:00.185602377Z stderr F 2023-02-21 04:43:00,182 [INFO] <ThreadPoolExecutor-0_5:139719195203328> commonservice.py value_from_configservice() (30): get key ese_state from configservice value = unconfigured"
"2023-02-21 04:43:04,859" microservice.rcs-service "2023-02-21T04:43:04.163401412Z stderr F 2023-02-21 04:43:04,161 [INFO] <ThreadPoolExecutor-0_6:139718708688640> universal_key_service.py get_universal_key() (162): the request :HSMWXT3 CLUSTER"
```
Follow KB:
- https://www.dell.com/support/kbdoc/en-us/000201895
- Delete hosts' universal keys.