---
title: ElasticSearch
tags: ELK stack
---

# ElasticSearch

![](https://i.imgur.com/ey9mS5o.png)

https://godleon.github.io/blog/Elasticsearch/Elasticsearch-distributed-mechanism/

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery.html#modules-discovery

## Core Concept

- Discovery
    - how nodes find each other
- Quorum-based Decision Making
    - how a master node gets elected
- Voting configurations
    - how the voting rules are controlled
- Shard
    - horizontal scaling
    - Elasticsearch uses ==term frequency statistics== to calculate relevance, but these statistics correspond to ==individual shards==. Maintaining only a small amount of data across many shards will tend to result in poor document relevance.

## Differences from Lucene

1. Lucene is a Java library providing powerful indexing and search features.
2. Elasticsearch is JSON-based and built on top of Lucene.
3. Elasticsearch embeds Lucene as its indexing and search engine.
4. Each Elasticsearch shard is an independent Lucene instance.
5. On top of that, Elasticsearch provides thread pooling, queue scheduling, clustering, monitoring, an HTTP protocol, cluster management, and so on.

## Shard Allocation

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-cluster.html#modules-cluster

### Decisions the Master Node Makes

1. Which node each shard should be allocated to
2. When to move shards between nodes to balance resource usage
3. How to recover after a node is lost
4. Adding/removing nodes

:::info
Common guidelines:
1. At most 20 shards per 1 GB of heap.
2. Keep the shard size from 1 GB to 5 GB for such indexes.
3. Keep the average shard size between a few GB and a few tens of GB — for time-based data, the 20–40 GB range.
4. Benchmark using realistic queries and data.
5. Production clusters should always have at least 2 replicas for failover.
6. There is no fixed limit on how large shards can be, but a shard size of 50 GB is often quoted as a limit that has been seen to work for a variety of use cases.
7. Allocate shards with a factor of 1.5 to 3 times the number of nodes in your initial configuration.
:::

:::danger
Overly large shards can negatively affect the cluster's ability to recover from failure, because it takes more time to rebalance shards onto a new node after the failure.
:::

:::danger
Your shard size may be getting too large if you are discovering issues through the cluster stats APIs or encountering minor performance degradations. If this is the case, simply add a node, and ES will rebalance the shards accordingly.
:::

### Mission

1. Which shards should be allocated to which nodes
2. Which copy acts as the primary shard and which as replica shards

### Components

- allocator: finds the optimal node on which to place a shard
- deciders: judge whether a proposed allocation is allowed

#### allocator

![](https://i.imgur.com/JAFTrOu.png)

- PrimaryShardAllocator: finds the node that holds the most recent data of a shard (for the primary shard)
- ReplicaShardAllocator: finds a node that already has this shard's data on disk (for a replica shard)
- BalancedShardsAllocator: finds the node holding the fewest shards

### deciders

![](https://i.imgur.com/BBLab4d.png)

### Creating an index

The allocator produces the list of nodes holding the fewest shards, sorted by ascending shard count, so nodes with fewer shards are preferred. For a new index, the allocator's goal is to spread the new index's shards across the cluster's nodes as evenly as possible.

The deciders then walk the allocator's node list and decide whether the shard may go to each node — for example, whether the allocation filtering rules are satisfied, or whether the shard would push the node past its disk usage threshold.

### Index already exists

For a primary shard, the allocator only allows placement on a node that already holds the complete data of that shard. For a replica shard, it first checks whether other nodes already hold a copy of the shard's data; if such nodes exist, the allocator prefers one of them.

### gazillion shards problem

The number of shards a node can hold is proportional to the available heap space. As a general rule, ==the number of shards per GB of heap space should be less than 20.==

## Changing the number of shards

1. reindex — no Elasticsearch restart required
2. shrink index API

:::success
"A little overallocation is good. A kagillion shards is bad. It is difficult to define what constitutes too many shards, as it depends on their size and how they are being used. A hundred shards that are seldom used may be fine, while two shards experiencing very heavy usage could be too many."

==A little overallocation — slightly more shards than you need — beats having too few.==
:::
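To see how a live cluster measures up against these sizing guidelines, per-shard sizes can be listed with the `_cat/shards` API — a minimal sketch, where the column selection and sort are just one reasonable choice:

```
GET _cat/shards?v&h=index,shard,prirep,state,docs,store&s=store:desc
```

Sorting by `store` descending surfaces the shards most likely to exceed the ~50 GB guideline.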
## Shrink API

https://www.elastic.co/guide/en/elasticsearch/reference/7.9/indices-shrink-index.html#indices-shrink-index

> Shrinks an existing index into a new index with fewer primary shards.

1. Reduces the number of primary shards
2. Commonly used on indices that no longer receive writes
3. In ILM, shrink runs during the warm phase

### Prerequisites

1. The index is read-only
2. A copy of every shard (primary or replica) resides on the same node
3. The index health status is green

### Recommended procedure

1. Remove the replica shards and relocate the remaining shards onto one node first:

```
PUT /index/_settings
{
  "settings": {
    "index.number_of_replicas": 0,
    "index.routing.allocation.require._name": "shrink_node_name",
    "index.blocks.write": true
  }
}
```

2. Call the shrink API (replica shards can be added back in the shrink request).

## Reindex API

Consolidates many indices into a single new index that can then be aliased — e.g., daily indices into a monthly one.

```
POST /_reindex
{
  "source": {
    "index": "my-index-2099.10.*"
  },
  "dest": {
    "index": "my-index-2099.10"
  }
}
```

## Force merge API

https://www.elastic.co/guide/en/elasticsearch/reference/7.9/indices-forcemerge.html

> Merge: combines multiple on-disk segments into one.

- Fewer segments → better search performance
- Deleted documents are physically removed

1. Forces a merge on the shards of one or more indices.
2. For data streams, the API forces a merge on the shards of the stream's backing indices.
3. Force merge is resource-intensive, so it is best run during idle periods.
4. Merging reduces the number of segments in each shard and reclaims space by physically removing deleted documents; normally, merging is an automatically triggered process.
5. Indices that are read-only may benefit from being merged down to a single segment.
6. If you continue to write to a force-merged index, its performance may become much worse.

:::warning
Force merge should only be called against an index after you have finished writing to it.
:::

:::warning
Do not force-merge indices to which you are still writing, or to which you will write again in the future.
:::

:::danger
The force merge API can produce huge segments (> 5 GB). If data is later written to this index again, the automatic merge policy will not consider these huge segments for merging (unless they mostly consist of deleted documents), so disk usage can keep growing.
:::

## Global Ordinals

https://www.elastic.co/guide/en/elasticsearch/reference/current/eager-global-ordinals.html

For an index that is frequently used in aggregations, eager global ordinals can be enabled to improve query performance.

```=
PUT my-index-000001/_mapping
{
  "properties": {
    "tags": {
      "type": "keyword",
      "eager_global_ordinals": true
    }
  }
}
```

1. Global ordinals are a data structure that is used to ==optimize the performance of aggregations==.
2. Useful for fields that are heavily used for bucketing aggregations.
3. They are calculated lazily and stored in the JVM heap as part of the field data cache.

:::warning
Eager loading increases heap usage and can make refreshes take longer.
:::

## Cache

### filesystem cache

https://www.elastic.co/guide/en/elasticsearch/reference/current/preload-data-to-file-system-cache.html

If Elasticsearch is restarted, the filesystem cache will be empty, so it will take some time before the operating system loads hot regions of the index into memory and search operations become fast again. You can explicitly tell the operating system which files should be loaded into memory eagerly, depending on the file extension, using the `index.store.preload` setting.

:::warning
Loading data into the filesystem cache eagerly on too many indices or too many files will make search slower if the filesystem cache is not large enough to hold all the data. Use with caution.
:::
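A sketch of the setting just described, following the pattern in the reference page — here norms (`nvd`) and doc values (`dvd`) files are preloaded at index creation; the index name is a placeholder:

```
PUT /my-index-000001
{
  "settings": {
    "index.store.preload": ["nvd", "dvd"]
  }
}
```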
### query cache

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-cache.html

The results of queries used in the filter context are cached in the node query cache for fast lookup.

- One query cache per node, shared by all shards.
- LRU eviction policy.
- You cannot inspect the contents of the query cache.
- ==Term queries== and ==queries used outside of a filter context== are not eligible for caching.
- By default, the cache holds a maximum of 10,000 queries in up to 10% of the total heap space.

The following setting is static and must be configured on every data node in the cluster:

```=
indices.queries.cache.size
```

The following can be configured per index, but only at index creation time or on a closed index:

```=
index.queries.cache.enabled
```

:::info
Caching is done on a per-segment basis, and only if a segment contains at least 10,000 documents and holds at least 3% of the shard's total documents. Because ==caching is per segment==, merging segments can invalidate cached queries.
:::

### request cache

https://www.elastic.co/guide/en/elasticsearch/reference/current/shard-request-cache.html

```=
PUT /my-index-000001
{
  "settings": {
    "index.requests.cache.enable": false
  }
}
```

:::info
The cache is managed at the node level and has a ==default maximum size of 1% of the heap==. This can be changed in the config/elasticsearch.yml file with:
```
indices.requests.cache.size: 2%
```
:::

#### Steps

1. A search request targets an index or many indices.
2. Each shard executes the search locally and returns its local results to the ==coordinating node==.
3. The coordinating node combines the shard-level results into a global result set.

#### Features

- The cache is enabled by default.
- The shard-level request cache module caches the local results on each shard.
- Frequently used search requests return almost instantly.
- A very good fit for the logging use case.
- Cached results are invalidated when the shard's data is updated.
- By default, the request cache only caches the results of search requests where size=0, so it will not cache hits, but it will cache `hits.total`, `aggregations`, and `suggestions`.
> `size` is the number of returned hits (10 by default). size=0 means the exact results (the documents and their scores) are not cached — only metadata such as the total number of results (`hits.total`), aggregations, and suggestions.
- Most queries that use now (see Date Math) cannot be cached.
- ==Scripted queries that use non-deterministic API calls, such as Math.random() or new Date(), are not cached.==
> For this reason, script functions are not recommended here.
- The longer the refresh interval, the longer cached entries remain valid, even when documents change.
- If the cache is full, the least recently used cache keys are evicted.
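A sketch of a request the shard request cache can serve — size=0 with only an aggregation. The index, field, and aggregation names are placeholders, and the `request_cache=true` parameter just makes the intent explicit:

```
GET /my-index-000001/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "daily_requests": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "day"
      }
    }
  }
}
```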
### Monitoring cache usage

```=
GET /_stats/request_cache?human
```

```=
GET /_nodes/stats/indices/request_cache?human
```

## Tune for search speed

https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html

## [xpack] Security settings

```=
xpack.security.enabled
```

- By default, the Elasticsearch security features are disabled.
- The relevant package needs to be installed on each node.
- enable anonymous access
- message authentication
- set up document- and field-level security
- configure realms
- encrypt communications with SSL
- audit security events

## Network

### Commonly used settings

```bash=
network.host
```

> Sets the address of this node for both HTTP and transport traffic. The node will bind to this address and will also use it as its publish address.

- Defaults to `_local_` (i.e., `127.0.0.1`)
- Set to `0.0.0.0` to listen on all network interfaces
- Serves two purposes: the node's bind address and its publish address

:::warning
With a bridge network driver, the publish address differs from the bind address, so they have to be set separately: `network.host` & `network.publish_host`, or `network.bind_host` & `network.publish_host`.
:::

`_local_` — any loopback address on the system, for example 127.0.0.1.
`_site_` — any site-local address on the system, for example 192.168.0.1.
`_global_` — any globally-scoped address on the system, for example 8.8.8.8.

```bash=
http.port
```

> The port to bind for HTTP client communication.

- A single value or a range
- Defaults to `9200-9300`

```bash=
transport.port
```

> The port to bind for communication between nodes.

- A single value or a range
- Defaults to `9300-9400`
- Master-eligible nodes must be reachable on this port for discovery

### Discovery and cluster formation settings

```bash=
discovery.seed_hosts
```

> Provides a list of the addresses of the master-eligible nodes in the cluster.

- `["host:port", "host", ...]`
- IPv6 addresses must be enclosed in square brackets.
- If the port is omitted, it falls back to `transport.profiles.default.port`, then `transport.port`
- The default port is 9300

```bash=
discovery.type
```

> Specifies whether Elasticsearch should form a multiple-node cluster.

- single-node: a single-node cluster

```bash=
cluster.initial_master_nodes
```

> Sets the initial set of master-eligible nodes in a brand-new cluster.

- By default this list is empty

## meta-field

### _routing

1. Decides which shard a document is written to
2. Decides how a specific document is located at query time
3. Multiple `_routing` values are supported

```bash=
# default
shard_num = hash(_routing) % num_primary_shards
```

:::success
`_routing` defaults to the document's `_id` (or its `_parent`, where applicable).
:::

The goal is to spread data evenly across shards. A query without a routing value visits every shard (note that scoring happens on each individual shard), and the results are finally merged by the coordinating node, which returns them to the client.

Possible challenges:
1. Large, evenly distributed data: the average search time is long
2. Large, unevenly distributed data: every shard is still visited

Mitigation:
1. For skewed data — say it is keyed by the month of the year — partition (route) by month.
> Searches can then specify `_routing` instead of visiting all shards.

:::warning
If `required: true` is set but an indexing request specifies no routing value, a `routing_missing_exception` is raised.
```
{
  "mappings": {
    "doc": {
      "_routing": {
        "required": true
      }
    }
  }
}
```
:::

## Node Roles

### master-eligible node

- master role
- Creates/deletes indices
- Eligible to be elected as the master node
- Can be marked voting-only: it votes in elections but is never a candidate
- Tracks which nodes are part of the cluster
- Decides which shards to allocate to which nodes
- Its `path.data` directory stores the ==cluster metadata==
    - describes how to read the data stored on the data nodes
    - if the cluster metadata is lost, the data on the data nodes becomes unreadable

### dedicated master node

- Avoids overloading the master node
- Has only the master role, though it still acts as a coordinating node for requests it receives
- A dedicated master node focuses on managing the cluster
- Suited to clusters with many nodes (where a shared master could be overloaded)

```bash=
node.roles: [ master ]
```

### voting-only master-eligible node

- Only master-eligible nodes can be marked `voting_only`
- Needs less heap and a less powerful CPU than the true master nodes

```bash=
node.roles: [ data, master, voting_only ]
```

- dedicated voting-only master-eligible node:

```bash=
node.roles: [ master, voting_only ]
```

:::success
A dedicated master node marked as voting-only still needs:
- fast persistent storage
- a reliable, low-latency network
:::

### data node

- data role
- Holds the shards that contain the documents you have indexed
- Holds data and performs data-related operations such as CRUD, search, and aggregations
- I/O-, memory-, and CPU-intensive
- dedicated data node:

```bash=
node.roles: [ data ]
```

> The main benefit of having dedicated data nodes is the separation of the master and data roles.

### ingest node

- ingest role
- Able to apply an ingest pipeline to a document in order to transform and enrich the document ==before indexing==.
- With a heavy ingest load, it makes sense to use dedicated ingest nodes and to exclude the ingest role from nodes that have the master or data roles.
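Mirroring the role snippets above, a dedicated ingest node carries only the ingest role:

```bash=
node.roles: [ ingest ]
```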
### coordinating node

:::success
Every node is implicitly a coordinating node.
:::

:::warning
The node needs enough memory and CPU to deal with the gather phase.
:::

1. Receives search requests or bulk-indexing requests
2. Forwards them to the data nodes that hold the data
3. The data nodes return their results to the coordinating node

> Gather phase: the coordinating node reduces the results coming back from the individual shards into a single result set for the client.

## Data streams

https://www.elastic.co/guide/en/elasticsearch/reference/master/data-streams.html#backing-indices

Pattern:
```
.ds-<data-stream>-<yyyy.MM.dd>-<generation>
```

- A data stream lets you store ==append-only== time series data ==across multiple indices==.
- Data streams are well suited for logs, events, metrics, and other continuously generated data.
- The stream automatically routes each request to the backing indices that store the stream's data.
- ==Index lifecycle management (ILM)== helps automate the management of these backing indices.

:::info
What ILM can do:
- move old backing indices to less expensive hardware
- delete indices that are no longer needed
- well suited to continuously growing data
:::

### backing indices

A data stream consists of one or more hidden, auto-generated backing indices.

![](https://i.imgur.com/4eZbLoa.png)

- A data stream requires a matching index template
- The template supplies the mappings and settings for the backing indices

```=
"template": {
  "settings": {
    "index.lifecycle.name": "log-stream-policy"
  }
}
```

- Every document indexed to a data stream must contain a ==@timestamp== field, mapped as a `date` field type.
> If not specified, Elasticsearch maps @timestamp as a date field with default options.
- The same index template can be used by multiple data streams.

:::danger
You CANNOT delete an index template that is in use by a data stream.
:::

### Read requests

When a data stream receives a read request, it queries all of its backing indices.

![](https://i.imgur.com/mx8ysJ4.png)

### Write

Writes go to the most recent backing index; you cannot write into older backing indices.

![](https://i.imgur.com/0hAb459.png)

:::danger
You also cannot perform operations on a ==write index== that may hinder indexing, such as: Clone, Delete, Freeze, Shrink, and Split.
:::

### Rollover

A rollover creates a new backing index that becomes the stream's new write index.

:::success
It is recommended to use ILM to automatically roll over data streams when the write index reaches a specified age or size.
:::

:::warning
If you frequently update or delete existing time series data, use an index alias with a write index instead of a data stream.
:::

## Rollover API

https://www.elastic.co/guide/en/elasticsearch/reference/master/indices-rollover-index.html

Creates a new index for a ==data stream== or ==index alias==.

```
POST /<rollover-target>/_rollover/
POST /<rollover-target>/_rollover/<target-index>
```

Example with an index alias:

```bash=
# 1. Create the first index and attach the alias that rollover will act on
PUT /logs-000001
{
  "aliases": {
    "logs_write": {}
  }
}

# 2. Define the rollover trigger conditions against the logs_write alias
POST /logs_write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 1000
  }
}
```

Once a condition is met, rollover keeps creating successive indices: logs-000002, logs-000003, ...

To override the default naming rule (an incrementing numeric suffix), specify a target name:

```bash=
POST /logs_write/_rollover/{__name__}
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 1000
  }
}
```

:::warning
It is recommended to use ILM's rollover action to automate rollovers.
```
PUT /_ilm/policy/{policy_name}
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "2gb"
          }
        }
      }
    }
  }
}
```
:::

#### Prerequisites

An index bound to rollover must match the naming pattern:
> The last character of the index name must be a digit, e.g., logs-1.

#### Trigger conditions

1. Index size
2. Document count
3. Age

:::success
Once triggered, a new index is created behind the write alias, and subsequent rollovers keep using the same settings.
:::

## Aliases API

https://ithelp.ithome.com.tw/articles/10241035

- An alias is an alternative name for an index
- An alias combined with a filter is the equivalent of a view in an RDBMS
- If data will be written through an alias, you must explicitly mark which index is the `is_write_index`

```
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "my-index-000001",
        "alias": "alias2",
        "filter": {
          "term": {
            "user.id": "kimchy"
          }
        }
      }
    }
  ]
}
```

### Q: alias has more than one write index

Cause: an index template that itself contains an alias definition

```bash=
"aliases": {
  "log-alias": {
    "is_write_index": true
  }
}
```

which makes Logstash create a duplicate alias and fail:

```bash=
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "%{[@metadata][target_index]}"
    action => "create" # created for stream
  }
}
```

## Rollup API

A rollup job is a periodic task that aggregates data from indices specified by an index pattern, and then rolls it into a new index. Well suited for later reporting or visualization of how the data evolves over time.

> Kibana: Stack Management > Rollup Jobs

### Pros

- Saves disk storage
- Saves memory cache space when searching
- Saves JVM heap memory when loading an index
- Speeds up every query

https://www.elastic.co/guide/en/elasticsearch/reference/current/data-rollup-transform.html

## Life Cycle of Elasticsearch

### Steps

1. A document is written to Elasticsearch
2. The data enters the Index Buffer
3. **Refresh**: the Index Buffer is written into a segment
    - after the segment is written, the Index Buffer is cleared
    - frequency: `index.refresh_interval`
    - also triggered when the buffer reaches its default size of 10% of the JVM heap
    - only data written to a segment can be searched (the segment is first written to the cache, where it is already searchable; an fsync later writes it to disk, and the transaction log is then cleaned)
    - while a document is being indexed, it is also appended to the transaction log (written to disk)
    - the transaction log is the basis for recovery when a node fails

## Index Lifecycle Management (ILM)

### Hot Node
- handles indexing requests

### Warm Node
- does not handle indexing
- half of the JVM heap serves query requests

### Cold Node
- transient caches produced by query requests are released as soon as they are no longer used

![](https://i.imgur.com/oUa6CBr.jpg)

![](https://i.imgur.com/QvPcYX3.png)

## Inverted Index is immutable

### Pros
- no need to handle multiple documents being written concurrently
- caches are easy to build and maintain
- the data can be compressed

### Cons
- adding a new document requires rebuilding the index

## Quorum-based Decision Making

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-quorums.html

### quorum (majority)

Used to avoid the split-brain problem; it operates on a subset of the master-eligible nodes in the cluster.

### split-brain problem

> A cluster is split into two clusters, which then produce conflicting data.

- partitioned into two pieces
- each piece may make decisions
- inconsistent with those of the other piece

### Prevention

```bash=
discovery.zen.minimum_master_nodes = (master_eligible_nodes / 2) + 1
```

:::success
A master requires at least (master_eligible_nodes / 2) + 1 reachable master-eligible nodes.
:::

:::info
High availability (HA) clusters require at least 3 master-eligible nodes, at least two of which are not voting-only nodes.
:::

https://blog.csdn.net/zuodaoyong/article/details/104719508

## voting configuration

- lives on the master-eligible nodes
- is part of the cluster state
- the cluster state contains:
    - cluster info
    - routing table
    - metadata (index mappings & settings)
    - node information

### When an election is triggered

1. The current master-eligible node is not the master
2. The current master-eligible node cannot find a master via the other nodes
3. The number of nodes connected to master-eligible nodes drops below `discovery.zen.minimum_master_nodes`
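The committed voting configuration can be read back from the cluster state; a sketch, assuming the 7.x cluster-state layout for the `filter_path` value:

```
GET /_cluster/state?filter_path=metadata.cluster_coordination.last_committed_config
```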
### Example: Modifying the cluster state

1. The master node starts a cluster-state change
2. The master node broadcasts the modified cluster state
3. It waits for confirmations from the nodes in the cluster
4. Did the master node receive responses from a majority of the master-eligible nodes before the timeout?
    - Yes: next step
    - No: the master has failed; restart the election
5. The master node commits the modification and broadcasts the commit to all nodes
6. The nodes receive the notification, apply the update locally, and send a response to the master node
7. The master waits for confirmations from the nodes in the cluster
    - on timeout, it is simply ready for the next cluster-state change
    - otherwise, done

## Zen Discovery & Gossip Algorithm

### Zen Discovery

- implemented with a gossip algorithm
- time complexity: $O(\ln N)$, where N denotes the number of nodes
- `discovery.zen.fd.ping_timeout`
    - applies whenever traffic between hosts is measured (fault detection)
- `discovery.zen.ping_timeout`
    - applies when electing a master node

### Gossip Algorithm

Assume two nodes A and B, and consider three modes:

1. A pushes to B
```bash=
A -[DigestA]-> B
B -[DigestAB]-> A
A -[updated A-B]-> B
```
2. A pulls from B
```bash=
A -[DigestA]-> B
B -[updatedBA]-> A
```
3. A pulls from B and pushes to B
```bash=
A -[DigestA]-> B
B -[updatedBA && DigestAB]-> A
A -[updated A-B]-> B
```

![](https://i.imgur.com/q6b4590.png)

> Push & pull spreads information to every node faster.

## Shard

### Features

- the shard is the unit of distribution in ES
- primary shards & replica shards
- data is stored across shards
- each shard is implemented as an independent Lucene instance/index
- ==the master node decides which data node each shard is allocated to==
- one thread per shard

:::info
An index in Lucene means "a collection of segments plus a commit point".
:::

### Primary Shards: increase storage capacity

- distribute the indexed data across a variety of nodes
- scalability: horizontal expansion
- the number of primary shards is fixed when the index is created and cannot be changed afterwards; changing it requires a reindex (or the shrink API)

:+1: Problem: too few primary shards
1. If data grows quickly, the cluster cannot expand horizontally (a reindex is required)
2. As the data in each shard grows, redistribution becomes more and more time-consuming

:+1: Problem: too many primary shards
1. A single data node holds too many shards, which hurts performance
2. Relevance scoring statistics become inaccurate

### Replica Shards: increase data availability

- Replica shards provide high availability: when a primary shard is lost, a replica shard is promoted to primary to keep the data complete
- The number of replica shards can be adjusted dynamically, so that every data node can hold a full copy of the data
- Replica shards can improve read (query) performance to some extent

:+1: Problem: too many replica shards
1. Lower write performance (every node must perform the write)

:::danger
A replica shard must be allocated on a different data node than its primary shard, but all of the primary shards may sit on the same data node.
:::

### Shard effect on scoring

Inverse document frequency is computed from the total number of documents on the current shard, so scores differ from shard to shard.

Workarounds:

#### Document routing

https://www.elastic.co/blog/customizing-your-document-routing

Ensures that documents of the same index are routed to the same shard based on a particular field, such as a date. It takes effect at index time and can then be specified on search requests.

#### Search type

https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-search-type.html

Acts during the scatter/gather execution (the query is scattered to all shards, then the results are gathered back):

1. How many results to retrieve from each shard?
2. What happens when a query is executed on a specific shard rather than all shards?

For scrolling over large result sets, it is best to sort by `_doc` ==if the order in which documents are returned is not important.==

Search type lets you specify ==an order of events== you want the search to perform. It will query all the shards to get the term frequencies distributed across them, then perform the calculations on the matching documents.

:::success
Elasticsearch is very flexible and allows you to control the type of search to execute on a ==per search request basis==.
:::
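The search type is chosen per request via the `search_type` query parameter — a sketch requesting the DFS variant described below; the index and field names are placeholders:

```
GET /my-index/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": { "title": "sky lantern" }
  }
}
```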
:+1: query_then_fetch (the default) — two steps:
1. The query is forwarded to all involved shards, and each shard returns its ranked results to the coordinating node.
2. The coordinating node requests the document content (and highlighted snippets, if any) from only the relevant shards.

:+1: dfs_query_then_fetch
- Same as query-then-fetch
- Adds an initial scatter phase that computes the distributed term frequencies for more accurate scoring
- (The author notes it is not entirely clear how these term frequencies are computed.)

## Why is Elasticsearch near-real-time search?

![](https://i.imgur.com/jnWld43.png)

### Index Refresh

```bash=
writing document to ES
writing to the Index Buffer
writing to OS cache as a segment from the Index Buffer (Refresh)
```

- Happens once per second (hence "near real-time")
- Once a document has been refreshed into a segment, it is searchable
- Heavy write traffic therefore produces many segments
- A refresh is also triggered when the Index Buffer fills up; its default size is 10% of the JVM heap
- After a commit, a new segment is added to the commit point and the index buffer is cleared
- (Elasticsearch) -[filesystem cache]-> (disk)
- Lucene allows new segments to be written and opened — making the documents they contain visible to search — without performing a full commit

### Transaction log (translog)

- fsynced every 5 seconds by default
- The translog can be configured so that every write operation is fsynced directly to disk, at a large performance cost
- Every shard has its own transaction log
- The transaction log is written on disk
- Recovery: when a node recovers from a failure, it replays the transaction log first to restore its data

### Flush (commit)

```=
refresh, clean the index buffer
fsync, writing segments from cache to the disk
clean the transaction log
```

- By default every 30 minutes
- Or when the translog reaches its default maximum size of 512 MB

:::warning
`Refresh`: the index buffer is written into a segment.
`Flush`: segments are written to disk.
:::

### Merge

> Combines multiple on-disk segments into one.

1. Reduces the number of segments
2. Physically removes deleted documents

## Distributed Search: Query-then-Fetch

### Query Phase

1. The user sends a search to Elasticsearch. For each of the n shards involved, the coordinating node randomly picks one copy among the primary & replica shards and sends the request to those data nodes.
2. Each selected shard runs the query and sort, then returns from + size sorted document IDs and sort values to the coordinating node.

:::success
At this point only document IDs have been obtained — no document content yet.
:::

### Fetch Phase

1. The coordinating node re-sorts the document IDs obtained from each shard in the query phase and selects the final document ID list according to from & size.
2. It then retrieves the detailed documents from the corresponding shards with a multi-GET.

### Performance

- Each shard has to examine from + size documents
- In the end, the coordinating node has to process number_of_shards * (from + size) documents
- Deep pagination therefore performs very badly
- Each shard computes relevance from its own data only; with little data, the more shards there are, the less accurate the scores become

## DSL (Domain Specific Language)

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl.html#query-dsl

### Leaf query clauses

> Look for a particular value in a particular field — e.g., match, term, and range.

#### match
Returns documents that match a provided text, number, date, or boolean value.

#### term
Exact matching.

#### range
Returns documents that contain terms within a provided range.

### Compound query clauses

> Wrap other leaf or compound queries, and are used to combine multiple queries.

### Relevance scores

- `_score` metadata field
- The query context & filter context affect how the score is computed
- Query Context
    - computes how well each document matches the query; a relevance score is always calculated in query context
- Filter Context
    - mostly used for filtering structured data
        - Does this timestamp fall into the range 2015 to 2016?
        - Is the status field set to "published"?
    - Frequently used filters are cached automatically by Elasticsearch to speed up performance.
    - bool { "filter": ..., }
- Boosting query (see the sketch after this list)
    > Returns documents matching a positive query while reducing the relevance score of documents that also match a negative query.
    - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-boosting-query.html
- Constant score query
    > Wraps a filter query and returns every matching document with a relevance score equal to the boost parameter value.
    - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html#query-dsl-constant-score-query
- Function score query (IMPORTANT)
    > The function_score allows you to modify the score of documents that are retrieved by a query.
    - https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#query-dsl-function-score-query
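A sketch of the boosting query from the list above, adapted from the reference page — the index, field, and query terms are placeholders:

```
GET /my-index/_search
{
  "query": {
    "boosting": {
      "positive": { "term": { "text": "apple" } },
      "negative": { "term": { "text": "pie" } },
      "negative_boost": 0.5
    }
  }
}
```

Documents matching only the positive clause keep their score; documents that also match the negative clause have their score multiplied by `negative_boost`.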
## Real-world simulation

### A client sends a write request; the `_routing` rule decides which shard receives the document

1. The client sends a request to write data.
2. Check `_routing` to select a specific shard.
3. Check the `fields` within the Index Request.
4. If there are no `fields`, check the `mapping` within the configuration.
5. If there is no `mapping`, use `_id` as the routing parameter.
6. hash(`_routing`) decides which shard receives the document.
7. The request is sent to the shard through the primary node.

## Deep Pagination

### Scenario

Suppose the query is From=990&Size=10: 1,000 documents are fetched from every shard, the coordinating node merges all the results, and the top 1,000 documents are selected after sorting.

This means all of the data goes through the retrieve, collect, and sort stages, which consumes a lot of resources when the data volume is large.

The deeper the page, the more memory is used. To keep memory consumption bounded, Elasticsearch limits the result window to 10,000 documents by default (adjustable via index.max_result_window).

### Prevention

1. Cap the number of results a search may request
2. search_after
3. Scroll API

## Datatypes of text and keyword

https://kb.objectrocket.com/elasticsearch/when-to-use-the-keyword-type-vs-text-datatype-in-elasticsearch

| dataType | utility | description |
| -------- | -------- | -------- |
| text | analyzed at indexing time | broken down into individual terms at indexing to allow partial matching |
| keyword | none | indexed as-is |

### Differences between them

#### text over keyword

1. Can be split into multiple terms
   > i.e., invited to Jenny's banquet => [invited, to, Jenny's, banquet]
2. Fuzzy matches
3. Typical use case: product descriptions

#### keyword over text

1. Can be sorted alphabetically
2. Exact matches
3. Typical use cases: state, email address

## Heap Memory Usage

### When heap memory usage is too high

- Oversharding
- Large aggregation sizes

## Normalizer vs Analyzer

| | Normalizer | Analyzer |
| -------- | -------- | -------- |
| field | keyword field | text field |
| activated | prior to indexing and search | indexing and search |

The normalizer is applied before the analyzer would be.

## Analyzer

Chinese tokenizer: https://github.com/medcl/elasticsearch-analysis-ik

### `ik_max_word` vs `ik_smart`

| | ik_max_word | ik_smart |
| -------- | -------- | -------- |
| granularity | fine-grained | coarse-grained |
| typical use | index time | search time |

:::info
Text analysis occurs at two times:
1. index time
> When a document is indexed, ==any text field== values are analyzed — by the so-called index analyzer.
2. search time
> When running a ==full-text search== on a ==text field==, ==the query string (the text the user is searching for)== is analyzed — by the so-called search analyzer.
> Search time is also called query time.

https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-index-search-time.html
:::

### `standard` versus `ik_max_word`

| | standard | ik_max_word |
| -------- | -------- | -------- |
| type | \<IDEOGRAPHIC> | CN_CHAR |
| special symbols (!, ?, ...) | filtered out | filtered out |
| tokenization | character-level | word-level, segments words on its own |

## copy-to

https://www.elastic.co/guide/en/elasticsearch/reference/current/copy-to.html

Combines the values of multiple fields into a single field.

```=
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text",
        "copy_to": "full_name"
      },
      ...
```
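A usage sketch for the mapping above, following the reference page: because values are copied at index time, the combined field is queried like any other field, even though it is never sent in the document itself. The query text is illustrative:

```
GET my-index-000001/_search
{
  "query": {
    "match": {
      "full_name": {
        "query": "John Smith"
      }
    }
  }
}
```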
:::warning
copy_to is not supported for field types whose values take the form of objects, e.g., date_range.
:::

:::warning
Searching the combined field can behave differently from searching the original fields — e.g., a multi_match across the original fields may find a document that a match against one single field does not.
:::

## mappings

### dynamic_templates

When to use:

1. The same field holds more than one kind of data
> e.g., a field that should be a date breaks as soon as Chinese text shows up
2. Converting content into a vector
> If the vector is left to Elasticsearch's automatic type detection, it is classified as float, and float cannot later be converted into a vector for computation — not even forcibly via escli after the fact.

How it works: the templates are checked in order and the first match is applied.

```bash=
{
  "mappings": {
    "dynamic_templates": [
      {
        # template identifier (customized)
        "template_id": {
          # applied to which field names
          # path_match: exact path of the field
          "match": "*_es",
          # applied to what kind of field
          "match_mapping_type": "string",
          # the mapping you expect it to convert to
          "mapping": {
            "type": "string",
            "analyzer": "standard"
          }
        }
      }]
  }
}
```

## elasticsearch-analysis-ik

https://github.com/medcl/elasticsearch-analysis-ik

- Handles Chinese word segmentation
- A Chinese-language tokenizer

### Takeaways

1. After updating the contents of `IKAnalyzer.cfg.xml`, Elasticsearch must be restarted.

:::success
Hot updating avoids the restart:
```
<!-- configure a remote extension dictionary here -->
<entry key="remote_ext_dict">dict url</entry>
<!-- configure a remote extension stopword dictionary here -->
<entry key="remote_ext_stopwords">dict url</entry>
```
- Requires the HTTP response headers Last-Modified & ETag (both strings); whenever either changes, the dictionary is reloaded
- One term per line, separated by the newline character \n
:::

## Match

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html

The most basic kind of search. It targets text fields (processed by an analyzer). The query string is tokenized at search time, using the index-time analyzer by default.

:::danger
If a `default_search` analyzer is configured, it is used instead of the `default` analyzer.
:::

## `query_string` vs `multi_match`

==query_string== supports Lucene syntax to interpret the text.

==multi_match== just attempts to match the given "text" against the listed fields' indexed values.

## Validation Methods

### analyzer

```=
GET /_analyze
{
  "analyzer" : "standard",
  "text" : "台北平溪放天燈"
}
```

### explain

```=
GET /turtour_plan/_explain/211672
{
  "query": {
    "match": {
      "description": "台北加碼GO專案"
    }
  }
}
```

### highlight

```=
GET /turtour_plan/_search/
{
  // "explain": true,
  "_source": {
    "excludes": [ "feature_vector" ]
  },
  "query": {
    "match": {
      "description": {
        "query": "台北加碼GO專案"
        //"analyzer": "standard"
      }
    }
  },
  "highlight": {
    "pre_tags": ["<em>"],
    "post_tags": ["</em>"],
    "fields": {
      "description": {}
    }
  }
}
```

## When to use wildcard search?

https://www.elastic.co/blog/find-strings-within-strings-faster-with-the-new-elasticsearch-wildcard-field

![](https://i.imgur.com/RWfaYSF.png)

## Tune the search speed

https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-search-speed.html#_document_modeling

## Circuit Breaker

https://segmentfault.com/a/1190000020929897

Circuit breakers prevent operations from causing an out-of-memory error; you can specify how much memory each kind of operation may use.

### Scenario

```
Data too large, data for [xxx] would be larger than limit
```

### Solution

1. Clear the index cache

```bash=
curl -XPOST 'index/_cache/clear'
```

2. Raise the circuit breaker limit

```bash=
# default
indices.breaker.total.limit: 70%
```

### Parent circuit breaker

```bash=
# default
indices.breaker.total.limit: 70%
indices.breaker.total.use_real_memory: true
```

==indices.breaker.total.use_real_memory: false== => the limit defaults to 70% of the JVM heap.

==indices.breaker.total.use_real_memory: true== => the limit defaults to 95% of the JVM heap.
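The current limit, estimated usage, and trip count of each breaker can be read from the node stats API:

```
GET /_nodes/stats/breaker
```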
### Field data circuit breaker

Limits how much field data may be loaded into memory.
Default: 60% of the JVM heap.

### Request circuit breaker

Limits the memory a single request may consume.
Default: 60% of the JVM heap.

### In-flight requests circuit breaker

Limits the memory used by requests that are currently in flight.
Default: 100% of the JVM heap.

:::success
That is, it is effectively bounded only by the parent circuit breaker.
:::

### Accounting requests circuit breaker

(The author notes this one is unclear to them.) Allows Elasticsearch to limit the memory used by things that are held in memory but are not released when a request completes.
Default: 100% of the JVM heap.

### Script compilation circuit breaker

Unlike the memory-based circuit breakers above, this one limits the number of script compilations within a period of time.

```bash=
script.max_compilations_rate
```

Default: 75/5m (at most 75 script compilations every five minutes).

## Practical Scoring Function

https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html

https://www.compose.com/articles/how-scoring-works-in-elasticsearch/

The score of a document is determined by the field matches for the query and any additional configuration you apply to the search.

$$score(q,d) = queryNorm(q) \cdot coord(q,d) \cdot \sum_{t \in q} \Big( tf(t\ in\ d) \cdot idf(t)^2 \cdot t.getBoost() \cdot norm(t,d) \Big)$$

----

$score(q,d)$ is the relevance score of document d for query q.

$queryNorm(q)$ is the query normalization factor.
> Normalizes a query so that the results of one query can be compared with the results of another.

$coord(q,d)$ is the coordination factor.
> Multiplies by (matching terms in the document) / (total number of terms in the query).

$tf(t\ in\ d)$ is the term frequency within a document.

$idf(t)$ is the inverse document frequency.
> This is the score component that changes with sharding: $idf(t) = 1 + \log(numDocs / (docFreq + 1))$

:::warning
When the query finds a match in a document, the number of documents found on that particular shard is what goes into the inverse document frequency calculation.
:::

:::success
A document found on a shard with ==more total documents== is scored lower than the same document on a shard with fewer total documents.

A document found on a shard with ==more additional matching documents== is scored lower than one found on a shard with fewer or no additional matching documents.
:::

$t.getBoost()$ defaults to 2.2.

$norm(t,d)$ is the field-length norm.
> How long is the field? The shorter the field, the higher the weight.

:::info
[Supplement] A boosting query takes positive & negative clauses and a negative_boost in (0, 1): "Returns documents matching a positive query while reducing the relevance score of documents that also match a negative query."
:::

:::success
To ignore the influence of idf on the score, use a constant score query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-constant-score-query.html#query-dsl-constant-score-query
:::

```=
GET /turtour_plan/_search/
{
  "explain": true,
  "_source": {
    "excludes": [ "feature_vector" ]
  },
  "query": {
    "match": {
      "description": "台北"
    }
  }
}
```

### Why is the boost 2.2?

With the default, untuned scoring configuration, the boost value is 2.2 (it corresponds to BM25's k1 + 1 with the default k1 = 1.2).

https://github.com/elastic/elasticsearch/issues/42625

### A term exists but cannot be found?

The cause is the choice of tokenizer: the tokens produced at search time do not line up with the tokens produced at index time.

e.g., 天燈 (sky lantern)
- search time: 天燈 -> [天燈]
- index time: 天燈 -> [天, 燈]

Referring to the official documentation:

:::info
Usually, the same analyzer should be applied at index time and at search time, to ensure that the terms in the query are in the same format as the terms in the inverted index.
:::

### Masking sensitive data

```=
"_source": {
  "excludes": [ "feature_vector" ]
}
```

## Cross-cluster replication (CCR)

CCR provides a way to automatically synchronize indices from your primary cluster to a secondary remote cluster that can serve as a hot backup. If the primary cluster fails, the secondary cluster can take over. You can also use CCR to create secondary clusters to serve read requests in geo-proximity to your users.

- Cross-cluster replication is active-passive.
- The index on the primary cluster is the active leader index and handles all write requests.
- Indices replicated to secondary clusters are read-only followers.
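A minimal sketch of turning an index into a follower with the CCR follow API, assuming a remote cluster connection named `leader` is already configured; the index names are placeholders:

```
PUT /follower-index/_ccr/follow
{
  "remote_cluster": "leader",
  "leader_index": "leader-index"
}
```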
# [X-Pack] Elasticsearch SQL

> Elasticsearch SQL is an X-Pack component that allows SQL-like queries to be executed in real-time against Elasticsearch.

```=
POST /_sql?format=txt
{
  "query": "SELECT * FROM library WHERE release_date < '2000-01-01'"
}
```

```
    author     |     name      |  page_count   |      release_date
---------------+---------------+---------------+------------------------
Dan Simmons    |Hyperion       |482            |1989-05-26T00:00:00.000Z
Frank Herbert  |Dune           |604            |1965-06-01T00:00:00.000Z
```

- Elasticsearch SQL will do its best to preserve the SQL semantics and, depending on the query, ==reject those that return fields with more than one value.==

# Reference

## Elastic Official Materials

https://www.elastic.co/

## PostgreSQL vs Elasticsearch: which to choose

https://testdriven.io/blog/django-drf-elasticsearch/