要快就要少 - PostgreSQL Partial Index

# 要快就要少 - PostgreSQL Partial Index ## 背景我們有張 table `order_history` 記載著每筆訂單的付款紀錄，某天收到了一份需求，要可以很快速地取得當前付款失敗的訂單記錄，公司才可以派出債務協商業務去把錢給討回來。 **限制條件**：無法重構這張表的設計，也沒有權限可以開立新表，只能針對查詢做修改，或者是增減index。 *orderhistory* | columnn | type | | ------- | ---- | | id | int | | user_id | int | | success | boolean | ## 建置使用 docker PostgreSQL ```sh docker run -d --name demo-postgres -p 5432:5432 -e POSTGRES_PASSWORD=postgres postgres ``` user_name: postgres password: postgres port: 5432 *init.sql* ```sql CREATE TABLE public.order_history ( id int NOT NULL, user_id int NOT NULL, success boolean NOT NULL DEFAULT false, CONSTRAINT order_history_pk PRIMARY KEY (id) ); --- 新增3百萬筆訂單測試資料 insert into order_history (id, user_id, success) select g.id, floor(random() * 100000), random() < 0.9 from generate_series(1,3000000) as g(id); --- 預設index CREATE INDEX order_history_success_idx ON public.order_history (id, success); ``` ## 小試身手為了撈取目前欠款的訂單記錄，我們先撰寫出了第一版的 SQL 如下 ```sql select id from order_history oh where oh.success is false; ``` 再來我們看一下這句 SQL 的查詢效率如何 ```sql explain analyze select id from order_history oh where oh.success is false; ``` 有使用到 index (success)，速度約 `95.108 ms` ```log QUERY PLAN | --------------------------------------------------------------------------------------------------------------------------------------------------------------+ Index Only Scan using order_history_success_idx on order_history oh (cost=0.43..58387.43 rows=297500 width=4) (actual time=0.044..84.446 rows=300804 loops=1)| Index Cond: (success = false) | Heap Fetches: 0 | Planning Time: 0.165 ms | Execution Time: 95.108 ms | ``` 然而，老闆對於這個查詢的速度非常的不滿意，並且要求你立即改善。 ## 思考 > 會欠錢的人是少數我們先意識到了這件事情！ ```sql > select count(1) from order_history oh where oh.success is false; > 300804 ``` 我們全部有的紀錄是 3M 筆，而欠款的人是 30萬，占比約 10%。 Index 雖然有加速查詢的效果，但是 index 中如果存在一些我們根本不需要的資料，那就是累贅。業務場景中只關心那些欠錢的人，而不會需要查詢付款成功的訂單記錄，那有沒有方式可以只幫這些資料建立 index？ ## Partial Indexs [Partial Indexes](https://www.postgresql.org/docs/current/indexes-partial.html) >A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries only for those table rows that satisfy the predicate. Partial indexes are a specialized feature, but there are several situations in which they are useful. ## 嘗試我們稍微幫 index 做個調整，加上一個條件，並且排除掉我們所不要回傳的資料欄位，進行以下調整。 ```sql! DROP INDEX public.order_history_success_idx; CREATE INDEX order_history_success_idx ON public.order_history (id) where success is false; ``` Before Size ![Before](https://i.imgur.com/lN7yeY3.png) After Size ![After](https://i.imgur.com/0HDSiHY.png) 這邊可以看到，前後的 index 大小就已經有了近10倍的大小差距，那既然 index 中都已經不存在我們不關心的資料，是不是查詢速度也相對的提升了呢？ ```sql explain analyze select id from order_history oh where oh.success is false; ``` ```log QUERY PLAN | -------------------------------------------------------------------------------------------------------------------------------------------------------------+ Index Only Scan using order_history_success_idx on order_history oh (cost=0.42..7777.64 rows=297948 width=4) (actual time=0.037..28.343 rows=300804 loops=1)| Heap Fetches: 0 | Planning Time: 0.119 ms | Execution Time: 38.976 ms | ``` 相較於前面 `95ms` 我們現在只需要 `38.9ms` 就可以完成這項任務，老闆表示非常滿意。 ## 結論根據我們查詢所專注的資料範圍建立 index 可以有效地增加查詢速度，也減少 index size。 index 中存在過多不需要的資料，就會多花費時間在過濾，如果可以盡量的精簡，要快就要少！而我們判斷是否要使用 partial index 的條件就是要觀察資料的分佈，如果查詢範圍永遠是全部資料集的 10%，這樣建立 partial index 就可以精煉為 10% 的大小，以此類推就能先簡單判斷自己是否需要套用這樣的設計。 ###### tags: `blog` `PostgreSQL`

Read more

採用快取機制的歷程

MongoDB Atlas Data API - 再次解放你的雙手

Java Lombok 踩坑紀錄

MongoDB insert 小知識