# Interview introspection 2
###### tags: `interview`
Another interview question tested how you troubleshoot a performance issue.
The interviewer shows you an infrastructure diagram that presents a couple of server groups.
A group of clients from different countries connects to a gateway and then to a load balancer. Behind the load balancer are various application servers, and those application servers in turn connect to different database servers.
Clients <-> Gateway <-> LB <-> App Servers <-> Database Servers
The question is: assume it is your first day at the company and clients report a performance issue. How do you troubleshoot and fix it?
#### questions
I think this question tries to gauge your previous experience managing cloud systems. There are many factors to consider in order to get on the right track and link all the information together correctly.
Because the question provides very little information, I asked how slow the clients' experience was compared to the normal case. Let's say a client clicks a button on the website to load a page and it is 10 times slower than normal.
Do all clients have this issue, or only some of them? -> Only some of them.
Can we access the client environment to run a few test commands? -> No. (I asked because, in my past experience, this is a shortcut to identifying the issue quickly; we worked closely with customers and partners, so sometimes we could do that. If the answer were yes, I could ask for command output from the client side, such as traceroute, ping, DNS lookups, or tcpdump.)
#### where to start troubleshooting?
Because I do not know what applications are running or what the normal load or state looks like in the monitoring system, I think checking generic system metrics is a good first step.
There are several ways to begin. First, I could check the server log files for any unusual entries. However, I think this suits an engineer who already knows the system to a certain degree. For an engineer new to the company, it may be better to check the system load first.
System load is not just CPU and memory usage; it also includes disk IO and network interface usage. For example, it is normally unlikely for CPU load to stay over 80% or memory usage above 90% for a long period. If a few processes consume huge CPU or memory resources, it is worth checking their state or logs.
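
As a rough illustration of the CPU and memory check, the sketch below computes a busy percentage from two `/proc/stat` snapshots and a used-memory percentage from `/proc/meminfo`, then compares them with the 80% and 90% figures mentioned above. It assumes a Linux host with a kernel that reports `MemAvailable`; the function names are made up for this note, and in practice tools like `top` or `vmstat` give the same picture.

```python
import os
import time


def cpu_busy_percent(stat_before, stat_after):
    """CPU busy percentage between two /proc/stat snapshots (as text)."""

    def totals(text):
        for line in text.splitlines():
            if line.startswith("cpu "):  # aggregate line, not cpu0, cpu1, ...
                # Fields: user nice system idle iowait irq softirq steal
                values = [int(v) for v in line.split()[1:9]]
                idle = values[3] + (values[4] if len(values) > 4 else 0)
                return sum(values), idle
        raise ValueError("no aggregate 'cpu' line found")

    total_a, idle_a = totals(stat_before)
    total_b, idle_b = totals(stat_after)
    delta_total = total_b - total_a
    if delta_total <= 0:
        return 0.0
    return 100.0 * (delta_total - (idle_b - idle_a)) / delta_total


def memory_used_percent(meminfo_text):
    """Used-memory percentage from /proc/meminfo text (values are in kB)."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    total = fields["MemTotal"]
    return 100.0 * (total - fields["MemAvailable"]) / total


def over_threshold(cpu_percent, mem_percent, cpu_limit=80.0, mem_limit=90.0):
    """Return which resources exceed the limits (limits taken from the text)."""
    findings = []
    if cpu_percent > cpu_limit:
        findings.append("cpu")
    if mem_percent > mem_limit:
        findings.append("memory")
    return findings


def _read(path):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = _read("/proc/stat")
    time.sleep(1)  # sample interval; a longer window smooths out spikes
    cpu = cpu_busy_percent(first, _read("/proc/stat"))
    mem = memory_used_percent(_read("/proc/meminfo"))
    print(f"cpu {cpu:.1f}%  memory {mem:.1f}%  flagged: {over_threshold(cpu, mem)}")
```

A single one-second sample only shows a spike; to match the "for a long period" condition, the same check would have to be repeated over several minutes or read from the monitoring system.
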
Many factors can impact performance, and the network is usually not the bottleneck unless servers are connected over a slow network, such as 1G. Some further checks are possible, such as looking for dropped packets or interface flapping (check dmesg or syslog). Also, if the transmit/receive statistics look unusual or unbalanced, examine them further.
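
The dropped-packet check can be sketched as follows. This is a minimal example, assuming a Linux host where `/proc/net/dev` is available; the counters there are cumulative since boot, so the sketch compares two snapshots instead of looking at absolute values. The function names and the five-second interval are my own choices for illustration.

```python
import os
import time


def interface_counters(net_dev_text):
    """Parse /proc/net/dev text into {interface: {counter: value}}."""
    counters = {}
    for line in net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        values = [int(v) for v in rest.split()]
        if len(values) < 16:
            continue
        # 8 receive columns followed by 8 transmit columns
        counters[name.strip()] = {
            "rx_bytes": values[0], "rx_errs": values[2], "rx_drop": values[3],
            "tx_bytes": values[8], "tx_errs": values[10], "tx_drop": values[11],
        }
    return counters


def interfaces_with_problems(before, after):
    """Interfaces whose error or drop counters grew between two snapshots."""
    keys = ("rx_errs", "rx_drop", "tx_errs", "tx_drop")
    flagged = []
    for name, now in after.items():
        earlier = before.get(name)
        if earlier is not None and any(now[k] > earlier[k] for k in keys):
            flagged.append(name)
    return sorted(flagged)


if __name__ == "__main__" and os.path.exists("/proc/net/dev"):
    with open("/proc/net/dev") as handle:
        first = interface_counters(handle.read())
    time.sleep(5)  # arbitrary observation window
    with open("/proc/net/dev") as handle:
        second = interface_counters(handle.read())
    print("interfaces with new errors or drops:",
          interfaces_with_problems(first, second))
```

The same parsed counters also expose `rx_bytes` and `tx_bytes`, which is enough to spot an unbalanced transmit/receive ratio between servers in the same group.
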
Moreover, domain resolution is essential in web applications. Sometimes DNS queries take longer than usual due to a configuration or network issue. Check some critical domains to ensure they resolve quickly and correctly.
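
One way to time resolution for a list of critical domains is sketched below. It goes through the system resolver via `socket.getaddrinfo`, so the result includes `/etc/hosts` and any local cache, much as an application on that server would see it. The 0.5 second limit and the function names are assumptions for illustration, not recommended values.

```python
import socket
import time


def time_resolution(hostname, resolver=socket.getaddrinfo):
    """Return (elapsed_seconds, sorted addresses); addresses is empty on failure."""
    start = time.perf_counter()
    try:
        results = resolver(hostname, None)
    except socket.gaierror:
        results = []
    elapsed = time.perf_counter() - start
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address.
    addresses = sorted({entry[4][0] for entry in results})
    return elapsed, addresses


def slow_or_failed(hostnames, limit_seconds=0.5, resolver=socket.getaddrinfo):
    """Map each hostname that failed or exceeded the limit to its measurement."""
    report = {}
    for hostname in hostnames:
        elapsed, addresses = time_resolution(hostname, resolver)
        if not addresses or elapsed > limit_seconds:
            report[hostname] = (elapsed, addresses)
    return report
```

Calling `slow_or_failed([...])` with the domains the application depends on returns only the problematic ones; checking "correct" still means comparing the returned addresses against what is expected for that environment.
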
Finally, collect some IO statistics to identify disk IO usage. If a disk stays at 100% IO utilization for a long period, or is obviously slower than the others, check what data is stored on it and whether it relates to the application data. (For example, in Ceph storage, a single slow OSD disk can affect application data accessibility and performance.)
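
The disk utilization figure can be approximated from `/proc/diskstats`, which is roughly what `iostat` reports as `%util`. The sketch below assumes a Linux host; it reads the "time spent doing I/Os" column (in milliseconds) from two snapshots and divides the difference by the interval. The 95% limit and the function names are assumptions made for this example.

```python
import os
import time


def io_busy_ms(diskstats_text):
    """Map device name to cumulative milliseconds spent doing I/O."""
    busy = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 13:
            continue
        # parts: major, minor, name, then the statistics fields;
        # the 10th statistics field (index 12) is time spent doing I/Os in ms.
        busy[parts[2]] = int(parts[12])
    return busy


def utilization_percent(before_text, after_text, interval_seconds):
    """Per-device busy percentage over the interval between two snapshots."""
    before = io_busy_ms(before_text)
    after = io_busy_ms(after_text)
    result = {}
    for name, busy_after in after.items():
        if name in before:
            delta_ms = busy_after - before[name]
            result[name] = 100.0 * delta_ms / (interval_seconds * 1000.0)
    return result


def saturated(utilization, limit=95.0):
    """Devices at or above the limit, sorted by name."""
    return sorted(name for name, pct in utilization.items() if pct >= limit)


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    interval = 1.0
    with open("/proc/diskstats") as handle:
        first = handle.read()
    time.sleep(interval)
    with open("/proc/diskstats") as handle:
        second = handle.read()
    usage = utilization_percent(first, second, interval)
    print("saturated devices:", saturated(usage))
```

Comparing the percentages across disks of the same type is what exposes a single slow device, such as the slow OSD disk in the Ceph example.
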
#### conclusion
It is difficult to identify the root cause of a performance issue in a web application directly. It requires good knowledge of systems, and a good sense for them, to infer possible causes and narrow them down to a specific software component or hardware device based on evidence and observation. And finally, consider any updates or changes made recently.