# Elasticsearch

**How Elasticsearch manages indices**

* Inverted index - https://codingexplained.com/coding/elasticsearch/understanding-the-inverted-index-in-elasticsearch

**Search types**

* Term searches - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html
* Compound-word searches (decompounder token filters)
  * https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-dict-decomp-tokenfilter.html
  * https://www.elastic.co/guide/en/elasticsearch/reference/7.6/analysis-hyp-decomp-tokenfilter.html
* Filter searches - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html
* Full-text searches - https://www.elastic.co/guide/en/elasticsearch/reference/current/full-text-queries.html

**For auto-complete and regex** - ES uses a trie-like data structure (Lucene's finite state transducer, FST).

**Get cluster health**

URL - http://localhost:9200/_cat/health?v&pretty

* **Green** - every primary and replica shard is allocated, so the indices are fully functional.
* **Yellow** - every primary shard is allocated but at least one replica is not; the cluster still works and serves requests, with reduced redundancy.
* **Red** - at least one primary shard is unallocated; some data is unavailable, and requests that touch it can fail or return partial results.

**Get node details (which nodes are available in the cluster)**

URL - http://localhost:9200/_cat/nodes?v&pretty

Shard and replica allocation per index is listed by `_cat/shards`.

**Create an index named cricket**

`curl -XPUT 'localhost:9200/cricket?pretty' -H 'Content-Type: application/json' -d' { "settings" : { "index" : { "number_of_shards" : 3, "number_of_replicas" : 2 } } } '`

**Notes**

* To check only whether a document exists, without fetching its body, pass `_source=false` in the GET request.
`Example - localhost:9200/company/employee/1?pretty&_source=false`

`Result - { "_index": "company", "_type": "employee", "_id": "1", "_version": 2, "found": true }`

`If _source is not false, the result is { "_index": "company", "_type": "employee", "_id": "1", "_version": 2, "found": true, "_source": { "name": "Manoj Choudhary", "designation": "Fullstack developer", "address": "89 rajsthan 382545", "intrested": [ "cricket", "article" ], "age": 26, "status": "single", "gender": "male" } }`

**Bulk insertion with curl**

Each action line and each document line must sit on its own line, and the body must end with a newline.

Example - `curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "index" : { "_index" : "company", "_type" : "employee", "_id" : "9" } }
{"name":"Malay shah","age":30}
{ "index" : { "_index" : "company", "_type" : "employee", "_id" : "10" } }
{"name":"Naseer prajapati","age":26}
'`

**Search in multiple indices**

Pass the index names separated by commas, for example `curl -XGET 'localhost:9200/cricket,news/_search?q=name:australia'`

**A few search parameters**

* Search by appending the `_search` endpoint to the URL, for example http://localhost:9200/cricket/series/_search?q=name:ind-vs-aus*
* Use a wildcard pattern such as `q=ma*`, or pass the keyword itself to match that term exactly.
* **from** is the pagination offset: to see the records after the first 10, set **from=10**.
* **size** is the number of documents to return: to see only 2 documents, set **size=2**.
* **Example** http://localhost:9200/cricket/series/_search?q=name:ma*&from=2&size=2
* **sort** can be passed in the URL as `sort=runs:desc`, or in the request body as `{"sort":[{"runs":{"order":"desc"}}]}`.
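The `from`, `size` and `sort` parameters above are easy to get wrong by one page, so here is a minimal Python sketch that turns a 1-based page number into a URI search URL. The helper name `search_url`, its arguments and the host are hypothetical choices for illustration; only the `q`, `from`, `size` and `sort` query parameters come from the notes above.

```python
from urllib.parse import urlencode


def search_url(host, index_path, query, page=1, size=10, sort=None):
    """Build a URI-search URL; `page` is 1-based and converted to `from`."""
    if page < 1 or size < 1:
        raise ValueError("page and size must be >= 1")
    params = {"q": query, "from": (page - 1) * size, "size": size}
    if sort:
        params["sort"] = sort  # e.g. "runs:desc"
    # Keep ':' and '*' unescaped so the URL stays readable.
    return f"http://{host}/{index_path}/_search?{urlencode(params, safe=':*')}"


# Hypothetical usage: second page of two results, highest runs first.
url = search_url("localhost:9200", "cricket/series", "name:ma*",
                 page=2, size=2, sort="runs:desc")
print(url)
```

Computing `from` as `(page - 1) * size` keeps the offset consistent with the page size, which is the usual source of skipped or repeated records.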
**The same parameters can also be sent in the request body**

`curl -XGET 'localhost:9200/cricket/series/_search?pretty' -H 'Content-Type: application/json' -d' { "query":{ "match_all":{}}, "from":2, "size":2 } '`

* **Send a request that matches a keyword**
  * `curl -XGET 'localhost:9200/cricket/series/_search?pretty' -H 'Content-Type: application/json' -d' { "query":{ "match":{"name":"ind-vs-africa"}}, "size":2 } '`
* To query a nested field, use dot notation to go deeper, for example **user.name.fname**

**Term Query**

The `term` query finds documents that contain the **exact** term specified in the inverted index. For instance:

`curl -XPOST 'localhost:9200/_search?pretty' -H 'Content-Type: application/json' -d' { "query": { "term" : { "series" : "ind-vs-aus" } } } '`

**Fields Projection**

You can return only the fields you want with `_source`. Wildcard patterns such as `"_source": "n*"` are allowed, and multiple fields can be passed as an array.

**Example** `"_source": ["name","runs","desc*"]` in the JSON body.

You can also include and exclude fields explicitly:

`"_source": { "includes": ["name","runs"], "excludes": ["country"] }`

**Full-text Query keywords**

* match
* match_phrase
* match_phrase_prefix

**Logical OR matches**

If you do not specify **"operator"**, it defaults to `or`, so a document matches when it contains any of the terms.

* `curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "query": { "match": { "name": { "query":"frank norris", "operator":"or" } } } }'`
* **match_phrase** matches the whole phrase: replace `match` with `match_phrase` in the query above and you get only documents that contain the terms of the phrase in that order.
* **match_phrase_prefix** works like `match_phrase` but treats the last term as a prefix, so it returns documents whose text continues the phrase.

**Low and high-frequency terms**

* If the search text is "The Quick sport", then **The** is a high-frequency term and **Quick** and **sport** are low-frequency terms. Elasticsearch first finds documents by the low-frequency terms and only then considers the high-frequency term (this is the idea behind the `common` terms query).
* The benefit is that low-frequency terms occur in few documents, so fewer documents have to be examined and performance improves.
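To make the low-frequency-first idea concrete, here is a small, self-contained Python sketch. It is not how Elasticsearch or Lucene implement it: the toy inverted index, the sample documents, the `cutoff` value and every function name are assumptions made for illustration. It only shows the principle described above: look up candidates through the rare terms, then use the common terms for scoring alone.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def split_by_frequency(index, query, total_docs, cutoff=0.5):
    """Split query terms into (low, high) by the fraction of docs containing them."""
    low, high = [], []
    for term in query.lower().split():
        doc_freq = len(index.get(term, set())) / total_docs
        (high if doc_freq > cutoff else low).append(term)
    return low, high


def search(index, query, total_docs, cutoff=0.5):
    """Select candidates via low-frequency terms, then score with all terms."""
    low, high = split_by_frequency(index, query, total_docs, cutoff)
    if low:
        candidates = set().union(*(index.get(t, set()) for t in low))
    elif high:
        # Only common terms in the query: require all of them.
        candidates = set.intersection(*(index.get(t, set()) for t in high))
    else:
        candidates = set()
    # Score = number of query terms present in the document.
    scores = {d: sum(d in index.get(t, set()) for t in low + high)
              for d in candidates}
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents.
sample_docs = {
    1: "the quick sport",
    2: "the slow walk",
    3: "the quick fox",
    4: "a long sport day",
}
index = build_inverted_index(sample_docs)
print(split_by_frequency(index, "The Quick sport", len(sample_docs)))
print(search(index, "The Quick sport", len(sample_docs)))  # [1, 3, 4]
```

Document 2 contains only the common term "the", so it is never examined; that skipped work is where the performance gain comes from.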
**Clauses**

* **Boolean query clauses**
  * **must** - the clause must appear in matching documents and contributes to the score.
  * **should** - the clause may appear in matching documents; documents that match it score higher.
  * **must_not** - the clause must not appear in matching documents.
  * **filter** - the clause must appear in matching documents, but it does not affect the score.

**Example**

`curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "query": { "bool": { "must":[ {"match":{"name":"dhoni"}} ] } } }'`

**Note** the `bool` keyword is mandatory.

* **boost** keyword - used to change the relevance of documents. Suppose you want documents with name:"frank" to come first: add `boost` to that term and those documents are ranked higher.
  * **Example** `curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "query": { "bool": { "should":[ {"term":{"name":{"value":"frank","boost":2.0}}}, {"term":{"name":"Amanda"}} ] } } }'`
* **range** term - this search goes under the `filter` clause.
  * **Example** `"filter": {"range": {"age": {"gt": 10, "lt": 12}}}`
  * **Hierarchy** query => bool => filter => range

# Aggregation for analytics

**Fielddata=true**

Why - aggregations on `text` fields load field data into heap memory. To make sure no field uses heap memory unnecessarily, Elasticsearch disables fielddata on `text` fields by default. If you want to aggregate on a `text` field, you need to set `"fielddata": true` on that particular field.
This is done through the `_mapping` API.

* **Example** `curl -XPUT 'localhost:9200/company/_mapping/employee/?pretty' -H 'Content-Type: application/json' -d' { "properties":{"gender":{"type":"text","fielddata":true}} }'`

**Aggregation**

* **Metric aggregation**
  * Aggregation over a set of documents
    * all documents in the search result
    * documents within a logical group
  * Multi-value stats aggregation
    * lets you calculate multiple statistics in one request
  * Examples
    * sum, average, min, max, count, etc.
* **Bucketing aggregation**
  * Logically groups documents based on the search query or term
  * A document falls into a bucket if the criteria match
  * Each bucket is associated with a key
* **Matrix aggregation**
  * Operates on multiple fields and produces a matrix result
  * Experimental and may change in a future release
* **Pipeline aggregation**
  * Aggregations that work on the output of other aggregations

**Aggregation example**

`Note - You can use either the full keyword aggregations or the short form aggs in the query.`

* **Metric aggregation**
  * **Example** `curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "size":0, "aggs":{"avg_age":{"avg":{"field":"age"}}} }'`
  * If you pass `stats` instead of `avg`, you get min, max, average and the other statistics in one request.
  * **Example** `curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "size":0, "aggs":{"age_stats":{"stats":{"field":"age"}}} }'`

**Note** - `size:0` is passed here because we do not want documents in the result, only the statistics such as avg, min, max, count and sum.

**Cardinality** - the (approximate) number of distinct values of a field across the documents.

* **Example** `curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "size":0, "aggs":{"age_count":{"cardinality":{"field":"age"}}} }'`
* **Bucket aggregation**
  * Here you get one result per unique term, using the **terms** keyword. It is the same as **GROUP BY** in SQL.
  * **Example**
    * `curl -XGET 'localhost:9200/company/employee/_search?pretty' -H 'Content-Type: application/json' -d' { "size":0, "aggs":{"gender_bucket":{"terms":{"field":"gender"}}} }'`
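
To show what the `stats`, `cardinality` and `terms` aggregations above compute, here is a minimal in-memory Python sketch. It only emulates the results on a hypothetical list of employee documents; the sample data and function names are made up for illustration, and the real `cardinality` aggregation returns an approximate count rather than the exact one computed here.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample documents, shaped like the company/employee examples.
employees = [
    {"name": "Manoj", "gender": "male", "age": 26},
    {"name": "Malay", "gender": "male", "age": 30},
    {"name": "Amanda", "gender": "female", "age": 26},
    {"name": "Frank", "gender": "male", "age": 34},
]


def stats(docs, field):
    """Emulate the multi-value `stats` metric aggregation."""
    values = [d[field] for d in docs if field in d]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": mean(values), "sum": sum(values)}


def cardinality(docs, field):
    """Exact count of distinct values (Elasticsearch approximates this)."""
    return len({d[field] for d in docs if field in d})


def terms(docs, field):
    """Emulate the `terms` bucket aggregation: one bucket per unique value."""
    counts = Counter(d[field] for d in docs if field in d)
    return [{"key": key, "doc_count": n} for key, n in counts.most_common()]


print(stats(employees, "age"))
print(cardinality(employees, "age"))   # 3 distinct ages
print(terms(employees, "gender"))      # male: 3, female: 1
```

Each bucket carries its key and a document count, which mirrors the `key` / `doc_count` pairs returned for `gender_bucket` in a real response.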