tags: `jubo` `sql` `postgresql`

如何調校 Postgres query: 從 1s 到 1ms

如何調校 Postgres query: 從 1s 到 1ms

摘要

使用 EXPLAIN 調查 postgres 的效能瓶頸
了解 WITH Queries (Common Table Expressions, CTE) 及它的 RECURSIVE modifier
了解 subquery 與 LATERAL subquery 的差異

情境與目標

情境：有一張表，每秒不斷插入資料 (e.g. 10 QPS)，每筆資料有「組別」及「時間戳記 (單調遞增)」
目標：撈出每一個組別最新的資料

準備測資

可抽象化我們的問題、並使用以下語法創建實驗用的資料表，包含十萬筆測資：

DROP TABLE IF EXISTS test;
CREATE TABLE IF NOT EXISTS test (
	id integer,
	group_id text,
	ts float,
	PRIMARY KEY (id)
);
---
INSERT INTO test
SELECT 
	id, 
	floor(random() * 21) + 1 as group_id,
	random() * id as ts
FROM generate_series(1, 100000) as id;

test=# SELECT * FROM test LIMIT 10;
 id | group_id |         ts
----+----------+---------------------
  1 | 14       |  0.8738292700068797
  2 | 2        | 0.04276924171868757
  3 | 6        |  1.8895307871257785
  4 | 12       |  0.2828856022108823
  5 | 18       |   1.565112559748556
  6 | 15       |  3.4001230361198083
  7 | 7        |   6.962811861454497
  8 | 18       |  3.1767478852914905
  9 | 1        |  1.3585558463557987
 10 | 16       |   8.664995739231856
(10 rows)

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

random() 與 generate_series()

皆是 postgres 內建的 function

random() - random value in the range 0.0 <= x < 1.0

generate_series() - 用來產生 series data。你也可以真的用它來產生 time series data，可參考此文章或以下 code snippets：

DROP TABLE IF EXISTS test;
CREATE TABLE IF NOT EXISTS test (
    id int,
    group_id text,
    ts timestamp,
    PRIMARY KEY (id)
);
INSERT INTO test
SELECT 
    cast(extract(epoch from ts) as integer) as id,
    floor(random() * 21) + 1 as group_id,
    ts
FROM generate_series(
    '2021-10-01 00:00:00',
    '2021-10-10 23:59:59', INTERVAL '1 second'
  ) as ts;

`DISTINCT ON`

postgres 獨有的功能 SELECT DISTINCT ON (expression) 可直接完成這題：

SELECT DISTINCT ON (expression) 會根據 expression 來分組，留下每一組的「第一個 row」
通常會搭配 ORDER BY，否則你無法預測第一個 row 是誰

test=# SELECT DISTINCT ON (group_id) group_id, ts FROM test ORDER BY group_id, ts DESC;
 group_id |        ts
----------+-------------------
 1        | 99112.07624905343
 10       | 98869.90539286354
 11       | 99730.91905459965
 12       | 98017.65032148497
 13       | 99434.19380268104
 14       | 98658.40954038972
 15       | 99335.33749995827
 16       | 98117.60123883729
 17       | 97094.92662912408
 18       | 99073.96975479793
 19       | 98198.59356591586
 2        | 98062.79233482035
 20       | 98099.25779874563
 21       | 99026.72343275082
 3        | 99363.51577292381
 4        | 98451.43327742901
 5        | 98490.15275335223
 6        | 98540.02564950146
 7        | 96433.46464303069
 8        | 97942.25337316301
 9        | 97892.12043570109
(21 rows)

若表中有百萬筆資料呢？

修改一下前面的語法，重新插入百萬筆資料之後再 query 一次…此時會發現，query 時間來到了「秒」的數量級：

test=# SELECT COUNT(*) FROM test;
  count
---------
 1000000
(1 row)

-- 開啟顯示執行時間的功能
test=# \timing
Timing is on.

test=# SELECT DISTINCT ON (group_id) group_id, ts FROM test ORDER BY group_id, ts DESC;
 group_id |        ts
----------+-------------------
 1        | 990273.0128625423
 10       | 999226.3812369105
 11       | 992656.9255788977
 12       | 997310.3971343199
 13       | 997192.9010228362
 14       | 989239.5313995752
 15       | 998120.1311129808
 16       | 995527.5426085395
 17       | 992236.2212767225
 18       | 995355.4447963368
 19       |  998788.216376456
 2        |  996437.390885592
 20       | 995076.1035105237
 21       | 993803.7471270775
 3        | 993276.0061478146
 4        | 993683.4488705536
 5        | 990940.8277919117
 6        | 994064.5686872238
 7        | 996748.4075327903
 8        |  995949.111045784
 9        | 991948.4967691106
(21 rows)

Time: 1508.229 ms (00:01.508)

使用 `EXPLAIN` 查看執行計畫

RDBMS 在執行你的 SQL statement 之前，planner 會先算出一個 cost 最低的 query plan，然後再使用該 plan 執行你的語句。

EXPLAIN 可以讓我們看到該 plan 長什麼樣，且不會真的執行。

使用方式為在你的 query 前加 EXPLAIN：

test=# EXPLAIN SELECT DISTINCT ON (group_id) * FROM test ORDER BY group_id, ts DESC;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=132154.34..137154.34 rows=21 width=10)
   ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=10)
         Sort Key: group_id, ts DESC
         ->  Seq Scan on test  (cost=0.00..15406.00 rows=1000000 width=10)

query plan 會以 tree of plan nodes 的結構呈現。此 plan 告訴我們它將會執行三個計畫節點 (plan node)，執行順序由 leaf node 開始依序為：Seq Scan、Sort 及 Unique。

其中，每個 plan node 會輸出 cost, rows, width 資訊。以 Seq Scan 的 (cost=0.00..15406.00 rows=1000000 width=10) 為例：

cost=0.00..15406.00 - 0.00 指此 node 可以開始輸出資料前的啟動成本 (start-up cost)，且會包含 child node 的總啟動成本；15406.00 則是估計此 node 完成操作需要花費的成本 (total cost)。cost 越高，可預期實際執行時間就越長
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
何謂 cost?
postgres 已定義好的一些 CPU 執行與 I/O 等動作的權重 (20.7.2. Planner Cost Constants)，給 planner 估計時做加總。另有一篇 start-up cost vs. total cost 解釋的很清楚。
rows=1000000 - 估計此節點完成後會輸出 1000000 rows 給 parent node
width=10 - 估計每一個 row 的大小為 10 bytes

故從這份 query plan 我們大概能了解到此 query 的行為：

首先會做全表掃描 (Seq Scan)，將整張表從 disk 讀進 memory
接著會根據 ORDER BY 給的條件做排序 (Sort)
排序完才能根據 DISTINCT ON 的條件做唯一值過濾 (Unique)

`ANALYZE` and `BUFFERS`

但其實只看 cost，還是只能猜，此時可再打開兩個選項了解更多執行細節：

ANALYZE - 實際執行此 query，取得實際執行時間及每個 plan node 實際輸出的 rows
BUFFERS - 有多少資料是在 cache 中找到、又有多少是從 disk IO 中讀取(寫入)

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

see more EXPLAIN options

 test=# EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT ON (group_id) * FROM test ORDER BY group_id, ts DESC;
                                                         QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=132154.34..137154.34 rows=21 width=10) (actual time=1261.147..1661.021 rows=21 loops=1)
   Buffers: shared hit=5406, temp read=3902 written=3919
   ->  Sort  (cost=132154.34..134654.34 rows=1000000 width=10) (actual time=1261.145..1593.136 rows=1000000 loops=1)
         Sort Key: group_id, ts DESC
         Sort Method: external merge  Disk: 25496kB
         Buffers: shared hit=5406, temp read=3902 written=3919
         ->  Seq Scan on test  (cost=0.00..15406.00 rows=1000000 width=10) (actual time=2.303..95.573 rows=1000000 loops=1)
               Buffers: shared hit=5406
 Planning Time: 0.040 ms
 JIT:
   Functions: 3
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 0.347 ms, Inlining 0.000 ms, Optimization 0.164 ms, Emission 1.998 ms, Total 2.509 ms
 Execution Time: 1668.240 ms
(14 rows)

actual time - 每個 plan node 的實際執行時間 (ms)，一樣包含 start-up and total
Buffers:
- shared hit - 有多少 data block 來自 memory cache (1 data block = 8kb by default)
- read - 有多少 data block 來自 disk IO
- write - 為了處理此 plan node，有多少 data block 需寫入 disk
Execution Time - 總執行時間

query plan 越來越複雜，直接看有點累了，找一些視覺化工具來幫忙判讀：

(https://explain.dalibo.com/plan/eJP)

延伸閱讀

Understanding EXPLAIN plans - GitLab Docs

如何加速

create index

最直覺想到的是，加 index 如何？針對我們的 ORDER BY 條件加 composite index 看看 (死馬當活馬醫)：

test=# CREATE INDEX idx_composite ON test (group_id, ts DESC);
test=# EXPLAIN (ANALYZE, BUFFERS) SELECT DISTINCT ON (group_id) * FROM test ORDER BY group_id, ts DESC;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Unique  (cost=0.42..54516.29 rows=21 width=14) (actual time=0.008..546.256 rows=21 loops=1)
   Buffers: shared hit=1003332
   ->  Index Scan using idx_composite on test  (cost=0.42..52016.29 rows=1000000 width=14) (actual time=0.007..470.336 rows=1000000 loops=1)
         Buffers: shared hit=1003332
 Planning Time: 0.159 ms
 Execution Time: 546.301 ms

確實加速了

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

(1668.240 ms -> 546.301 ms)！
但仔細看會發現，加速的原因其實是因為所有的資料都已經在 buffer 裡了 (Buffers: shared hit=1003332)，因為沒有額外做 disk I/O 所以快。但實際上，Index Scan 還是輸出了所有的 rows 給 Unique node，在 Unique node 的階段才找出我們想要的那 21 筆資料。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

postgres 會盡可能地將 index 的資訊儲存在記憶體裡面提升效率。以此例來說，primary key (id) 已經在 memory，後來加上的 composite key 也將剩餘兩個 columns 的資訊給 cache 起來了。

該如何再進一步提升效率？有沒有辦法不要看完整張表來找到特定的資料？

問 google 大神

小插曲，在問 google 的過程中發現，stackoverflow 很多有關 postgres 的最佳解答都是這位眯眼哥：

也有其他人發現總是這位眯眼哥在回答 postgres 的問題：

那原來眯眼哥就是 postgres 的其中一位 contributor

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

我們可以從 Erwin 的回答中獲得一些 hints。以下的問答他都詳細地解釋了各種考量之下、可以嘗試的優化方向：

`WITH RECURSIVE` 與 `LATERAL`

在 Erwin 的回答中提到，需運用到 WITH RECURSIVE 與 LATERAL 語法的支援，來要求 postgres 模擬所謂的 loose indexscan。

拼拼湊湊，把我們的 query 改成以下、然後執行：

WITH RECURSIVE res AS (
	(SELECT * FROM test ORDER BY group_id, ts DESC LIMIT 1)
	UNION
	SELECT inner_res.* FROM res, LATERAL (
		SELECT * FROM test AS base
		WHERE base.group_id > res.group_id
		ORDER BY base.group_id, base.ts DESC
		LIMIT 1
	) inner_res
)
SELECT *
FROM res
ORDER BY group_id, ts DESC

                                    QUERY PLAN
---------------------------------------------------------------------------------
 Sort  (cost=64.15..64.40 rows=101 width=44) (actual time=0.372..0.374 rows=21 loops=1)
   Sort Key: res.group_id, res.ts DESC
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=69 read=18
   CTE res
     ->  Recursive Union  (cost=0.42..58.77 rows=101 width=14) (actual time=0.008..0.358 rows=21 loops=1)
           Buffers: shared hit=69 read=18
           ->  Limit  (cost=0.42..0.48 rows=1 width=14) (actual time=0.007..0.008 rows=1 loops=1)
                 Buffers: shared hit=4
                 ->  Index Scan using idx_composite on test  (cost=0.42..52016.29 rows=1000000 width=14) (actual time=0.007..0.007 rows=1 loops=1)
                       Buffers: shared hit=4
           ->  Nested Loop  (cost=0.42..5.63 rows=10 width=14) (actual time=0.016..0.016 rows=1 loops=21)
                 Buffers: shared hit=65 read=18
                 ->  WorkTable Scan on res res_1  (cost=0.00..0.20 rows=10 width=32) (actual time=0.000..0.000 rows=1 loops=21)
                 ->  Limit  (cost=0.42..0.52 rows=1 width=14) (actual time=0.016..0.016 rows=1 loops=21)
                       Buffers: shared hit=65 read=18
                       ->  Index Scan using idx_composite on test base  (cost=0.42..32573.15 rows=333333 width=14) (actual time=0.015..0.015 rows=1 loops=21)
                             Index Cond: (group_id > res_1.group_id)
                             Buffers: shared hit=65 read=18
   ->  CTE Scan on res  (cost=0.00..2.02 rows=101 width=44) (actual time=0.009..0.362 rows=21 loops=1)
         Buffers: shared hit=69 read=18
 Planning Time: 0.169 ms
 Execution Time: 0.392 ms
(23 rows)

(https://explain.dalibo.com/plan/vaJ)

好了，下課了各位

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

解讀 `WITH RECURSIVE` query 與 `LATERAL` subquery

為了解讀 WITH RECURSIVE query 與 LATERAL subquery 施展了什麼魔法，需要先帶大家分別了解一下：

WITH query 與 WITH RECURSIVE query 的差別
subquery 與 LATERAL subquery 的差別

`WITH` query vs. `WITH RECURSIVE` query

WITH query
WITH query 用來產生暫時的 table(s)，輔助主要 query 的查詢。

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

WITH quries 在官方文件中又稱為 Common Table Expression (CTE)。以下會使用 CTE 來表示一段會輸出 temporary table 的語句。

借個 official doc 上的範例:
















WITH regional_sales AS (
    SELECT region, SUM(amount) AS total_sales
    FROM orders
    GROUP BY region
), top_regions AS (
    SELECT region
    FROM regional_sales
    WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
       product,
       SUM(quantity) AS product_units,
       SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;

regional_sales 會先根據 region 產生 (L1-4)
top_regions 再根據 regional_sales 產生 (L5-8)
最後再根據 top_regions 產生想要的結果 (L10-16)

(example from 7.8.1. SELECT in WITH)

WITH RECURSIVE query

而 WITH RECURSIVE query 的目的也是產生一個 temporary table，但運作原理是採用迭代的方式。語法為：

WITH RECURSIVE cte_name AS (
    CTE_query_non_recursive -- non-recursive term
    UNION [ALL]
    CTE_query_recursive     -- recursive term
) 
SELECT * 
FROM cte_name;

其中：

non-recursive term:
- 此 CTE 用來初始化第一個 result set (R₀)
recursive term:
- 此 CTE (R_i) 可以利用前一步 (R_i-1) 的 result set 作為 input value 來計算此步的 result set
UNION 會移除重複的 rows；UINON ALL 則保留

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

NOTE
Strictly speaking, this process is iteration not recursion, but RECURSIVE is the terminology chosen by the SQL standards committee.

舉個例子，假設有以下 employees table：

test=# SELECT * FROM employees;
 employee_id | manager_id |     full_name
-------------+------------+-------------------
           1 |            | Michael North
           2 |          1 | Megan Berry
           3 |          1 | Sarah Berry
           4 |          1 | Zoe Black
           5 |          1 | Tim James
           6 |          2 | Bella Tucker
           7 |          2 | Ryan Metcalfe
           8 |          2 | Max Mills
           9 |          2 | Benjamin Glover
          10 |          3 | Carolyn Henderson
          11 |          3 | Nicola Kelly
          12 |          3 | Alexandra Climo
          13 |          3 | Dominic King
          14 |          4 | Leonard Gray
          15 |          4 | Eric Rampling
          16 |          7 | Piers Paige
          17 |          7 | Ryan Henderson
          18 |          8 | Frank Tucker
          19 |          8 | Nathan Ferguson
          20 |          8 | Kevin Rampling
(20 rows)

(example from Learn PostgreSQL Recursive Query By Example)

想要取出員工 id=2 及他的所有下屬，可以這樣下：

WITH RECURSIVE subordinates AS (
	(SELECT * FROM employees WHERE employee_id = 2) -- non-recursive term
	UNION
	(
		SELECT e.* FROM employees e
		JOIN subordinates s ON e.manager_id = s.employee_id
	) -- recursive term
) 
SELECT * FROM subordinates;

讓我們一步步拆解，non-recursive term (R₀) 的 result set 為：

 employee_id | manager_id |  full_name
-------------+------------+-------------
           2 |          1 | Megan Berry

並且暫時存放在 temporary working table subordinates 中給 recursive term 參考引用。

接著使用 recursive term 迭代 (R₁)：

SELECT e.* FROM employees e
JOIN subordinates s ON e.manager_id = s.employee_id

我們將 employees table 與 subordinates (此時僅有一筆 Megan Berry) 做 JOIN，找出主管為 Megan Berry 的資料。得到新的 result set 為：

 employee_id | manager_id |    full_name
-------------+------------+-----------------
           6 |          2 | Bella Tucker
           7 |          2 | Ryan Metcalfe
           8 |          2 | Max Mills
           9 |          2 | Benjamin Glover

而此新的 result set (4 rows) 會取代掉 subordinates 的內容作為下次迭代之用。而前一次的 result set (1 row) 也會先放進 temporary intermediate table。

此時再實行一次 recursive term (R₂)，尋找主管 id 為 6, 7, 8, 9 的員工，可以得到以下 result set：

 employee_id | manager_id |    full_name
-------------+------------+-----------------
          16 |          7 | Piers Paige
          17 |          7 | Ryan Henderson
          18 |          8 | Frank Tucker
          19 |          8 | Nathan Ferguson
          20 |          8 | Kevin Rampling

一樣，此 result set (5 rows) 會取代掉 subordinates、並將前一次的 result set (4 rows) 放到 temporary intermediate table。

此時再執行一次 recursive term (R_３)，會發現 result set 為空了 (因為沒有員工的主管是 16, 17, 18, 19, 20)，故 recursive quey 終止。

最後，統整 subordinates (5 rows) 與 temporary intermediate table 裡 (共 1 + 4 rows) 的內容至 subordinates 供後續 query 使用。最後的執行結果：

test=# WITH RECURSIVE subordinates AS (
test(#     (SELECT * FROM employees WHERE employee_id = 2)
test(#     UNION
test(#     (
test(#         SELECT e.* FROM employees e
test(#         JOIN subordinates s ON s.employee_id = e.manager_id
test(#     )
test(# )
test-# SELECT * FROM subordinates;
employee_id | manager_id |    full_name
-------------+------------+-----------------
           2 |          1 | Megan Berry
           6 |          2 | Bella Tucker
           7 |          2 | Ryan Metcalfe
           8 |          2 | Max Mills
           9 |          2 | Benjamin Glover
          16 |          7 | Piers Paige
          17 |          7 | Ryan Henderson
          18 |          8 | Frank Tucker
          19 |          8 | Nathan Ferguson
          20 |          8 | Kevin Rampling
(10 rows)

subquery vs. `LATERAL` subquery

再次感謝 Erwin，在另一篇回應中詳細解釋了 subquery 與 LATERAL subquery 的差異與適用情境。

subquery

subquery 顧名思義一定是在 main query 中、有個 sub query 會先去執行，然後拿到結果給 main query 使用，但使用上有些限制。

所謂的限制，是它回傳的 result set 只能是 single column (但可以是 single row or multiple rows)，然後再作為主要 query 的條件來使用。例如：

SELECT id, title, boxoffice
FROM films
WHERE boxoffice > (
        SELECT AVG (boxoffice)
        FROM films
    );

(PostgreSQL Subquery)

LATERAL subquery

而 LATERAL subquery 則是 for loop 一張 outer-table 的 rows，並在迭代中執行 subquery，最後將每一個 subquery 的結果與 outer-table 的 row JOIN 起來。

這邊借一個例子簡單示意：

test=# SELECT * FROM t_wishlist;
 wishlist_id | username | desired_price
-------------+----------+---------------
           1 | hans     | 450
           2 | joe      | 60
           3 | jane     | 1500
(3 rows)

SELECT *
FROM t_wishlist AS w,
    LATERAL (
        SELECT      *
        FROM        t_product AS p
        WHERE       p.price < w.desired_price
        ORDER BY p.price DESC
        LIMIT 3
       ) AS x
ORDER BY wishlist_id, price DESC;
---
wishlist_id | username | desired_price | product_id | price              | product
-------------+----------+---------------+------------+--------------------+-------------
           1 | hans     | 450           | 708        | 447.0511375753179  | product 708
           1 | hans     | 450           | 126        | 443.6560873146138  | product 126
           1 | hans     | 450           | 655        | 438.0566432022443  | product 655
           2 | joe      | 60            | 40         | 59.32252841190291  | product 40
           2 | joe      | 60            | 19         | 59.2142714048882   | product 19
           2 | joe      | 60            | 87         | 58.78014573804254  | product 87
           3 | jane     | 1500          | 687        | 1495.8794483743645 | product 687
           3 | jane     | 1500          | 297        | 1494.4586352980593 | product 297
           3 | jane     | 1500          | 520        | 1490.7849437550085 | product 520
(9 rows)

(完整的例子可參考 Understanding LATERAL joins in PostgreSQL)

LATERAL subquery 會 loop t_wishlist 每一個 row 作為參考，執行 LATERAL (...) 內的 subquery。所以以此例來說 LATERAL 內的 subquery 會被執行 3 次，每次執行時 w.desired_price 參考到的值分別是 450, 60 與 1500。

結合上述知識點

有了上述的基礎之後，來一步步看我們的 1ms-query 怎麼運作：













WITH RECURSIVE res AS (
	(SELECT * FROM test ORDER BY group_id, ts DESC LIMIT 1)
	UNION
	SELECT inner_res.* FROM res, LATERAL (
		SELECT * FROM test AS base
		WHERE base.group_id > res.group_id
		ORDER BY base.group_id, base.ts DESC
		LIMIT 1
	) inner_res
)
SELECT *
FROM res
ORDER BY group_id, ts DESC

(L2) 先拿一個 row 當作 initial result set。這個 row 的 group_id 會是最小、且 ts 最大的那一筆而已。這一筆結果會先被暫存到 res 中給後續每一次迭代參考
Image Not Showing Possible Reasons
- The image file may be corrupted
- The server hosting the image is unavailable
- The image path is incorrect
- The image format is not supported
Learn More →
字串型態比大小
這邊運用到字串比大小的技巧，而字串比的是他們的 code points (numerical value)。以 ASCII 編碼來看， "A" 的十進位值為 65，小於 "a" 的值 97，故 "a" > "A"。若有多個字元，則先比第一個字元，相等的話再比下一個字元，e.g. "Aa" > "AA"。
(L4) res 裡只有一筆資料，也就是這裡的 LATERAL 只會根據 res 裡那筆資料執行一次 subquery 而已
(L5-8) subquery 裡，則是從 test table 中，取出 group_id 比 res.group_id 還大、且 ts 最大的那一筆
(repeat L4) step 3. 拿到的那一筆資料會成為新的 res，此時再重複 step 2.，直到 step 3. 為 empty result set

上述的語法，更細緻地控制 postgres 的處理方式，避免從 disk/memory 輸出了一堆不需要的 rows 才能進行運算；而是更有效地先利用 index 來定位資料，然後僅輸出必要的 rows 進行處理。

總結

了解如何使用 EXPLAIN 探索效能瓶頸
了解更多 postgres 功能 (WITH, WITH RECURSIVE, subquery, LATERAL subquery)
了解如何用不同的語法更細緻地控制 postgres 的行為

維護性更佳的實務建議

以「撈出每一個組別最新的資料」這個需求來說

You probably don't want to hear this, but the best option to speed up SELECT DISTINCT is to avoid DISTINCT to begin with. In many cases (not all!) it can be avoided with better database-design or better queries.
Erwin Brandstetter
commentted on How to speed up select distinct?

是否有更好的 schema/table design 來避開使用 DISTINCT ON、或上述維護性較差的語法來 query？是否能避免 OLTP 系統有大海撈針的行為？

可以考慮

新資料進來的時候，規劃一張 table (e.g. latest_records)「只儲存」每個組別最新的那筆資料。當前端要資料時，後端直接從 latest_records 裡取出來吐給前端即可。

此時若還有效能瓶頸，改進的手段就比較容易理解了：

針對 latest_records 加 cache layer
- 但還是得記得考慮是否有 cache penetration, cache avalanche 與 cache stampede 的問題 (good overview article)
剖析需求，加入適當的 WHERE/LIMIT/OFFSET 條件來避免存取不必要的資料

Image Not Showing Possible Reasons

The image file may be corrupted
The server hosting the image is unavailable
The image path is incorrect
The image format is not supported

Learn More →

OLAP vs. OLTP: What’s the Difference?
我們的系統流量雖然不及 B2C，但就性質上也應該比較接近 OLTP、而非 OLAP。故我們應該要對 query 讀取效率有較高的自我要求。

tags: jubo sql postgresql

如何調校 Postgres query: 從 1s 到 1ms

摘要

情境與目標

準備測資

DISTINCT ON

若表中有百萬筆資料呢？

使用 EXPLAIN 查看執行計畫

ANALYZE and BUFFERS

如何加速

create index

問 google 大神

WITH RECURSIVE 與 LATERAL

解讀 WITH RECURSIVE query 與 LATERAL subquery

WITH query vs. WITH RECURSIVE query

subquery vs. LATERAL subquery

結合上述知識點

總結

維護性更佳的實務建議

Read more

Pub/Sub 與 RabbitMQ 實務考量

Learning roadmap

Guides for Design Review & Code Review

General Guiding Principles

tags: `jubo` `sql` `postgresql`

`DISTINCT ON`

使用 `EXPLAIN` 查看執行計畫

`ANALYZE` and `BUFFERS`

`WITH RECURSIVE` 與 `LATERAL`

解讀 `WITH RECURSIVE` query 與 `LATERAL` subquery

`WITH` query vs. `WITH RECURSIVE` query

subquery vs. `LATERAL` subquery