
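###### tags: `MongoDB`

# Indexes

Before reading, you may want to review [how indexes work](https://ithelp.ithome.com.tw/articles/10244479).

An index is an optimization that speeds up queries. It uses a B-tree structure to store data in sorted order, so a lookup searches sorted data instead of scattered data. As an everyday analogy, sorting data is like tidying up: things that have been put in order are easier to find than things strewn about. An index does not improve performance in every situation, however, and it must be built correctly for a query to use it; otherwise it has no effect. The main trade-offs are:

- Pros: faster queries, faster sorts, and lower memory consumption when sorting
- Cons: takes up memory, slows down inserts, updates, and deletes, and can make queries slower when misused

## When to use an index

Consider adding an index:

1. When queries are slow
2. When the data often needs to be sorted (MongoDB raises an error if the data to sort exceeds 32 MB)

Indexes are very efficient for retrieving a small subset of the data, but in some cases skipping the index is faster. For example, the larger the share of the collection that the result covers, the slower the index becomes. An index lookup takes two steps (one to find the index entry, one to follow the index pointer to the corresponding document), while a full collection scan takes only one. In the worst case (the query returns every document in the collection), the index performs twice as many lookups as a full collection scan.

| Choosing between an index and a full collection scan | Index | Full collection scan |
| -------- | -------- | -------- |
| Most frequent operations on the target collection | Queries | Inserts, updates, deletes |
| Size of the target collection and of its individual documents | Large | Small |
| Share of the collection returned by the query | Small | Large |

## How indexes affect sorting

When query results must be sorted and no index is used, the data is sorted in RAM. The sort may use at most 32 MB of RAM (the figure given in the v3.0 documentation quoted below). Past that limit MongoDB refuses to sort, and an index is needed to reduce RAM consumption. Sorts that use an index therefore usually perform better.

:::success
[Official documentation](https://docs.mongodb.com/v3.0/tutorial/sort-results-with-indexes/)
In MongoDB, sort operations can obtain the sort order by retrieving documents based on the ordering in an index. If the query planner cannot obtain the sort order from an index, it will sort the results in memory. Sort operations that use an index often have better performance than those that do not use an index. In addition, sort operations that do not use an index will abort when they use 32 megabytes of memory.
:::

## Index types

### Single field

### Compound index

The order of the fields in a compound index affects its efficiency.
A related concept is [index intersection](https://docs.mongodb.com/manual/core/index-intersection/).

### Multikey index

### Text index

Notes and open questions on regular expressions versus text indexes:

- Does matching against a text index always require `$text` and `$search`? An ordinary regex does not seem to hit the index.
- The three restrictions on compound text indexes.
- With a single field first and the text field after it, the search does not work. The reason is still unclear.

Restrictions:

1. A collection can have only one text index. Trying to create more than one produces the following error message:
   ```
   "errmsg" : "only one text index per collection allowed..."
   ```
2.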
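   To drop a text index, use the name of the index: [Use the Index Name to Drop a text Index](https://docs.mongodb.com/manual/tutorial/avoid-text-index-name-limit/#use-the-index-name-to-drop-a-text-index)

### Geospatial index

### Hashed index

### Other index properties

Unique, sparse, background index builds, and time-to-live (TTL).

Index values:

- `1`: ascending order
- `-1`: descending order

Index-related commands:

```
db.user.createIndex({field:value}) // create an index
db.user.dropIndex({field:value})   // drop an index
db.user.getIndexes()               // list the indexes on the collection
```

https://www.runoob.com/mongodb/mongodb-indexing.html

## Hands-on walkthrough

### Creating the data

Start by creating a large amount of data so that differences in query speed are noticeable.

#### Step1: Open a terminal and run the mongo command to start the mongo shell

#### Step2: Select the database

```
use ntut
```

#### Step3: Generate 500,000 documents with a for loop

```
// Insert 500,000 documents, each with a name and a random integer math score
for (var i = 0; i < 500000; i++) {
  db.score.insert({
    name: "name" + i,
    math: NumberInt(_rand() * 100)
  })
}
```

### Database profiling

#### Step1: Turn on the database profiler

```
db.setProfilingLevel(2)
```

#### Step2: Run a query

```
db.score.find({math:{$gt:60,$lt:80}})
```

#### Step3: View the profiler results

```
db.system.profile.find().pretty()
```

### Performance without an index

Use find to query the students who scored between 60 and 80 (both bounds exclusive), and append explain("executionStats") to display the execution statistics of the query (the matching documents themselves are not printed).

```
db.score.find({math:{$gt:60,$lt:80}}).explain("executionStats")
```

<details>
<summary>The terminal prints output like this</summary>

```
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    ...
    "winningPlan" : {
      "stage" : "COLLSCAN",
      ...
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 95591,
    "executionTimeMillis" : 273,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 500000,
    "executionStages" : {
      "stage" : "COLLSCAN",
      ...
    }
  },
  ...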
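}
```

==Fields inside executionStats==

- nReturned: the number of matching documents; 95591 documents match this time
- executionTimeMillis: the execution time; 273 milliseconds this time
- totalKeysExamined: the number of index keys scanned during the query; 0 because no index has been added yet
- totalDocsExamined: how many documents MongoDB had to read to find the matches; with no index it has to read all of them, that is, 500,000
- executionStages.stage: the execution strategy; COLLSCAN because no index matches

</details>

### Performance with an index

#### Step1: Add an index

```
db.score.createIndex({math:1})
```

<details>
<summary>The terminal prints output like this</summary>

```
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}
```

- numIndexesBefore: the number of indexes before the command ran; 1 here, because MongoDB creates an index on ==\_id== by default
- numIndexesAfter: the number of indexes after the command ran; 2 here, because an ascending index on the math field was created

</details>

#### Step2: Run the query

Use find to query the students who scored between 60 and 80 (both bounds exclusive), and append explain("executionStats") to display the execution statistics of the query (the matching documents themselves are not printed).

```
db.score.find({math:{$gt:60,$lt:80}}).explain("executionStats")
```

<details>
<summary>The terminal prints output like this</summary>

```
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    ...
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        ...
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 95591,
    "executionTimeMillis" : 158,
    "totalKeysExamined" : 95591,
    "totalDocsExamined" : 95591,
    "executionStages" : {
      "stage" : "FETCH",
      ...
      "inputStage" : {
        "stage" : "IXSCAN",
        ...
      }
    }
  },
  ...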
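}
```

==Fields inside executionStats==

- nReturned: the number of matching documents; 95591 documents match this time
- executionTimeMillis: the execution time; 158 milliseconds this time
- totalKeysExamined: the number of index keys scanned during the query; 95591 here
- totalDocsExamined: how many documents MongoDB had to read to find the matches; with the index in place it reads only 95591
- executionStages.inputStage.stage: the execution strategy; IXSCAN because an index matches

</details>

### Creating two compound indexes with different field orders

This example shows how the field order of a compound index affects performance.
We create two compound indexes, one that leads with math and one that leads with name.
The query condition is that math is between 60 and 80 and name starts with name1.

==Analyzing the performance in theory==

- 95591 documents have math between 60 and 80 (known from the previous example).
- The documents whose name starts with name1 are name1, name10 to name19, name100 to name199, and so on, for a total of 111111 (1+10+100+1000+10000+100000).
- The compound index that leads with math first narrows the search to 95591 documents and then looks for names starting with name1 among them. In the worst case every one of them matches, so it goes through 95591 entries.
- The compound index that leads with name first narrows the search to 111111 documents and then looks for math between 60 and 80 among them. In the worst case every one of them matches, so it goes through 111111 entries.

In theory, then, the compound index that leads with math should perform better.

#### Step1: Add the two compound indexes

```
db.score.createIndex({math:1, name:1})
db.score.createIndex({name:1, math:1})
```

#### Step2: Run the query with {math:1, name:1} as the index

Use find to query the students who scored between 60 and 80 and whose name starts with name1.
Use hint to force {math:1, name:1} as the index.
Finally, append explain("executionStats") to display the execution statistics of the query (the matching documents themselves are not printed).

```
db.score.find({math:{$gt:60,$lt:80}, name:{$regex:/^name1/}}).hint({math:1, name:1}).explain("executionStats")
```

<details>
<summary>The terminal prints output like this</summary>

```
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    ...
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        ...
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 21200,
    "executionTimeMillis" : 41,
    "totalKeysExamined" : 21221,
    "totalDocsExamined" : 21200,
    "executionStages" : {
      "stage" : "FETCH",
      ...
      "inputStage" : {
        "stage" : "IXSCAN",
        ...
      }
    }
  },
  ...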
}
```

==Fields inside executionStats==

- nReturned: the number of matching documents; 21200 documents match this time
- executionTimeMillis: the execution time; 41 milliseconds this time
- totalKeysExamined: the number of index keys scanned during the query; 21221 here
- totalDocsExamined: how many documents MongoDB had to read to find the matches; with the index in place it reads only 21200
- executionStages.inputStage.stage: the execution strategy; IXSCAN because an index matches

</details>

#### Step3: Run the query with {name:1, math:1} as the index

Use find to query the students who scored between 60 and 80 and whose name starts with name1.
Use hint to force {name:1, math:1} as the index.
Finally, append explain("executionStats") to display the execution statistics of the query (the matching documents themselves are not printed).

```
db.score.find({math:{$gt:60,$lt:80}, name:{$regex:/^name1/}}).hint({name:1, math:1}).explain("executionStats")
```

<details>
<summary>The terminal prints output like this</summary>

```
{
  "queryPlanner" : {
    "plannerVersion" : 1,
    ...
    "winningPlan" : {
      "stage" : "FETCH",
      "inputStage" : {
        "stage" : "IXSCAN",
        ...
      }
    },
    "rejectedPlans" : [ ]
  },
  "executionStats" : {
    "executionSuccess" : true,
    "nReturned" : 21200,
    "executionTimeMillis" : 177,
    "totalKeysExamined" : 111112,
    "totalDocsExamined" : 21200,
    "executionStages" : {
      "stage" : "FETCH",
      ...
      "inputStage" : {
        "stage" : "IXSCAN",
        ...
      }
    }
  },
  ...
}
```

==Fields inside executionStats==

- nReturned: the number of matching documents; 21200 documents match this time
- executionTimeMillis: the execution time; 177 milliseconds this time
- totalKeysExamined: the number of index keys scanned during the query; 111112 here
- totalDocsExamined: how many documents MongoDB had to read to find the matches; with the index in place it reads only 21200
- executionStages.inputStage.stage: the execution strategy; IXSCAN because an index matches

</details>

## References

- [Index Introduction](https://docs.mongodb.com/v3.0/core/indexes-introduction/)
- [Indexing Strategies](https://docs.mongodb.com/v3.0/applications/indexes/)
- [MongoDB series: handling MongoDB index questions in interviews](https://codertw.com/%E7%A8%8B%E5%BC%8F%E8%AA%9E%E8%A8%80/660096/)
- [MongoDB index limitations](http://www.w3big.com/zh-TW/mongodb/mongodb-indexing-limitations.html)
- [Learning MongoDB: the Text Search feature](https://www.itread01.com/article/1493690665.html)
- [MongoDB: Text Indexes](https://blog.csdn.net/chechengtao/article/details/106679784)
- [Wildcard Indexes, new in 4.2](https://docs.mongodb.com/manual/core/index-wildcard/)
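
The selectivity estimate from the compound-index walkthrough (how many documents each condition matches on its own, and how many match both) can be reproduced without a database. The following is a minimal Node.js sketch and not part of the original walkthrough. `buildScores` and `selectivity` are hypothetical helper names, and `math` is assumed to be `i % 100` instead of a random value so that the counts are reproducible, which is why they differ slightly from the 95591 and 21200 reported by explain.

```javascript
// Hypothetical stand-in for the 500,000-document `score` collection.
// Assumption: math is i % 100 (not random) so the counts are reproducible.
function buildScores(total) {
  const docs = [];
  for (let i = 0; i < total; i++) {
    docs.push({ name: "name" + i, math: i % 100 });
  }
  return docs;
}

// The same two conditions as the query in the walkthrough.
const mathInRange = (doc) => doc.math > 60 && doc.math < 80;
const nameHasPrefix = (doc) => /^name1/.test(doc.name);

// Count how many documents satisfy each condition alone and both together.
function selectivity(docs) {
  let byMath = 0;
  let byName = 0;
  let byBoth = 0;
  for (const doc of docs) {
    const m = mathInRange(doc);
    const n = nameHasPrefix(doc);
    if (m) byMath++;
    if (n) byName++;
    if (m && n) byBoth++;
  }
  return { byMath, byName, byBoth };
}

const result = selectivity(buildScores(500000));
console.log(result); // { byMath: 95000, byName: 111111, byBoth: 21109 }
```

Under these assumptions the name prefix alone covers more documents (111111) than the math range alone (95000), which is in line with the theoretical analysis favoring the index that leads with math.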