Very Search Log Analysis
===
## Preparation
1. Download one day of adept-search logs (2019-07-09) as an example for the investigation. The original log files can be found [here](https://s3.console.aws.amazon.com/s3/buckets/adept-very/adept-search/2019/07/09/?region=us-west-2&tab=overview).
2. Build an Elasticsearch index from the logs; on the box it can be queried at:
```
http://192.168.0.215:9200/adept-search-2019-07-09/_search
```
## Checking the Number of Events
Run the following Elasticsearch aggregation to count events by type:
```
GET adept-search-2019-07-09/_search
{
  "aggs": {
    "events": {
      "terms": {
        "field": "event.keyword",
        "size": 100
      }
    }
  },
  "size": 0
}
```
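Both aggregations in this note (this one, and the later one over QU exceptions) share the same shape, so they can be generated by a small helper. The sketch below is in Python; the `build_terms_agg` helper is our own construction, not part of any client library.

```python
import json

def build_terms_agg(field, size=100, filter_term=None):
    """Build an Elasticsearch terms-aggregation request body.

    `filter_term` is an optional {field: value} dict that becomes a
    `term` query, as used later to restrict to one event type.
    """
    body = {
        "aggs": {
            "events": {
                "terms": {"field": field, "size": size}
            }
        },
        "size": 0,  # only the aggregation is wanted, not the hits
    }
    if filter_term:
        field_name, value = next(iter(filter_term.items()))
        body["query"] = {"term": {field_name: value}}
    return body

# The body of the event-count aggregation above:
print(json.dumps(build_terms_agg("event.keyword"), indent=2))
```

The same helper with `filter_term={"event.keyword": "QUERY_UNDERSTANDING_EXCEPTION"}` produces the QU-exception query used later.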
The response contains the per-event counts:
```
"aggregations": {
  "events": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      { "key": "QUERY_EXECUTION_RESPONSE_TIME", "doc_count": 366115 },
      { "key": "ADEPT_SEARCH_SEARCH", "doc_count": 362279 },
      { "key": "ADEPT_SEARCH_CACHED_MATCH", "doc_count": 358457 },
      { "key": "QUERY_UNDERSTANDING_DEBUG", "doc_count": 3827 },
      { "key": "ADEPT_SEARCH_CACHED_MISS", "doc_count": 3459 },
      { "key": "QUERY_UNDERSTANDING_RESPONSE_TIME", "doc_count": 3321 },
      { "key": "ADEPT_SEARCH_CACHED_UPDATE_SUCCEEDED", "doc_count": 3320 },
      { "key": "ADEPT_SEARCH_ES_SUGGEST_AUTOCORRECT", "doc_count": 3134 },
      { "key": "QUERY_UNDERSTANDING_EXCEPTION", "doc_count": 584 },
      { "key": "ADEPT_SEARCH_CACHED_NETWORKING", "doc_count": 359 },
      { "key": "ADEPT_SEARCH_ES_SUGGEST_AUTOCORRECT_SKIPPED", "doc_count": 128 },
      { "key": "ADEPT_SEARCH_CACHED_EXPIRED", "doc_count": 9 },
      { "key": "ADEPT_SEARCH_AUTOCOMPLETE", "doc_count": 1 }
    ]
  }
}
```
A few questions are raised by these event counts:
- cache lookup
- `ADEPT_SEARCH_CACHED_MATCH (358457) + ADEPT_SEARCH_CACHED_MISS (3459) + ADEPT_SEARCH_CACHED_NETWORKING (359) + ADEPT_SEARCH_CACHED_EXPIRED (9) = 362284 > ADEPT_SEARCH_SEARCH (362279)`
- cache update
- `ADEPT_SEARCH_CACHED_UPDATE_SUCCEEDED (3320) < ADEPT_SEARCH_CACHED_MISS (3459)`
- query execution
- `QUERY_EXECUTION_RESPONSE_TIME (366115) > ADEPT_SEARCH_SEARCH (362279)`
- QU
- `QUERY_UNDERSTANDING_DEBUG (3827) < QUERY_UNDERSTANDING_RESPONSE_TIME (3321) + QUERY_UNDERSTANDING_EXCEPTION (584)`
- `QUERY_UNDERSTANDING_DEBUG (3827) == ADEPT_SEARCH_CACHED_MISS (3459) + ADEPT_SEARCH_CACHED_NETWORKING (359) + ADEPT_SEARCH_CACHED_EXPIRED (9)`: this is a good one!
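The arithmetic behind these observations can be checked directly from the bucket counts; a quick sanity script, with the counts hard-coded from the response above:

```python
# Bucket counts copied from the aggregation response above.
counts = {
    "QUERY_EXECUTION_RESPONSE_TIME": 366115,
    "ADEPT_SEARCH_SEARCH": 362279,
    "ADEPT_SEARCH_CACHED_MATCH": 358457,
    "QUERY_UNDERSTANDING_DEBUG": 3827,
    "ADEPT_SEARCH_CACHED_MISS": 3459,
    "QUERY_UNDERSTANDING_RESPONSE_TIME": 3321,
    "ADEPT_SEARCH_CACHED_UPDATE_SUCCEEDED": 3320,
    "ADEPT_SEARCH_CACHED_NETWORKING": 359,
    "ADEPT_SEARCH_CACHED_EXPIRED": 9,
}

# Cache lookups slightly outnumber searches (362284 vs 362279).
lookups = sum(counts[k] for k in (
    "ADEPT_SEARCH_CACHED_MATCH", "ADEPT_SEARCH_CACHED_MISS",
    "ADEPT_SEARCH_CACHED_NETWORKING", "ADEPT_SEARCH_CACHED_EXPIRED"))
assert lookups == 362284 and lookups > counts["ADEPT_SEARCH_SEARCH"]

# Cache updates fall short of cache misses.
assert counts["ADEPT_SEARCH_CACHED_UPDATE_SUCCEEDED"] < counts["ADEPT_SEARCH_CACHED_MISS"]

# DEBUG events exactly match the non-cached lookups: the "good one".
non_cached = (counts["ADEPT_SEARCH_CACHED_MISS"]
              + counts["ADEPT_SEARCH_CACHED_NETWORKING"]
              + counts["ADEPT_SEARCH_CACHED_EXPIRED"])
assert counts["QUERY_UNDERSTANDING_DEBUG"] == non_cached == 3827
```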
## QU Exceptions
Aggregate the QU exceptions by error message:
```
GET adept-search-2019-07-09/_search
{
  "aggs": {
    "errors": {
      "terms": {
        "field": "properties.error.keyword",
        "size": 100
      }
    }
  },
  "query": {
    "term": {
      "event.keyword": "QUERY_UNDERSTANDING_EXCEPTION"
    }
  },
  "size": 0
}
```
response (partial)
```
"aggregations": {
  "errors": {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [
      {
        "key": "HTTPConnectionPool(host='shopreader.machine-learning', port=7081): Read timed out. (read timeout=0.3)",
        "doc_count": 503
      },
      {
        "key": "HTTPConnectionPool(host='localhost', port=7081): Max retries exceeded with url: /version (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1115eac50>: Failed to establish a new connection: [Errno 61] Connection refused',))",
        "doc_count": 1
      }
    ]
  }
}
```
It seems that most of the issues are connection timeouts, so a dedicated `QUERY_UNDERSTANDING_TIMEOUT` event should be raised for them, so that `QUERY_UNDERSTANDING_TIMEOUT` covers timeouts and `QUERY_UNDERSTANDING_EXCEPTION` covers all other exceptions.
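In the service itself the split would be done by catching the timeout exception type at the call site; as a rough sketch under that assumption, the hypothetical `classify_qu_error` below classifies on the error message string instead, since that is all the log retains:

```python
def classify_qu_error(error_message):
    """Map a QU error string to the event that should be raised.

    Timeouts get the proposed QUERY_UNDERSTANDING_TIMEOUT event;
    everything else stays QUERY_UNDERSTANDING_EXCEPTION.
    (Hypothetical helper, not existing code.)
    """
    if "timed out" in error_message.lower():
        return "QUERY_UNDERSTANDING_TIMEOUT"
    return "QUERY_UNDERSTANDING_EXCEPTION"

# The two error messages seen in the log:
timeout_err = ("HTTPConnectionPool(host='shopreader.machine-learning', port=7081): "
               "Read timed out. (read timeout=0.3)")
refused_err = ("HTTPConnectionPool(host='localhost', port=7081): "
               "Max retries exceeded with url: /version")
assert classify_qu_error(timeout_err) == "QUERY_UNDERSTANDING_TIMEOUT"
assert classify_qu_error(refused_err) == "QUERY_UNDERSTANDING_EXCEPTION"
```

In the service, catching `requests.exceptions.ReadTimeout` separately from other `RequestException`s would be more robust than string matching.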
## Other Issues
- many events are missing the `query` field
- events should carry a `search_id` field so that all the events generated by one search can easily be aggregated for analysis
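The `search_id` suggestion could be as simple as minting one UUID per incoming search and stamping it onto every event that search produces. A minimal sketch, where the `log_event` helper and the event shape are hypothetical:

```python
import uuid

def new_search_id():
    """Mint one id per incoming search request (hypothetical helper)."""
    return uuid.uuid4().hex

def log_event(search_id, event, **properties):
    """Attach the shared search_id to every event from one search."""
    return {"search_id": search_id, "event": event, "properties": properties}

sid = new_search_id()
events = [
    log_event(sid, "ADEPT_SEARCH_SEARCH", query="red shoes"),
    log_event(sid, "ADEPT_SEARCH_CACHED_MATCH"),
    log_event(sid, "QUERY_EXECUTION_RESPONSE_TIME", ms=42),
]
# All events from one search now share an id and can be grouped trivially,
# e.g. with a terms aggregation on search_id.keyword in Elasticsearch.
assert len({e["search_id"] for e in events}) == 1
```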