PostgreSQL Upsert 的探討

# PostgreSQL Upsert 的探討之前有寫一篇Postgresql 的冪等操作探討[按我前往](https://hackmd.io/@pgsql-tw/Hks3X2EOJe),裡面有使用了upsert. 現在對upsert再來做一些探討. ## 第一個例子 ```sql= insert into upsert0223a select n, random(1, 10) from generate_series(1, 100) g(n); select last_autoanalyze , n_tup_ins , n_live_tup , n_dead_tup from pg_stat_all_tables where relname = 'upsert0223a'; last_autoanalyze | n_tup_ins | n_live_tup | n_dead_tup ------------------------------+-----------+------------+------------ 2025-02-23 21:30:56.93961+08 | 100 | 100 | 0 (1 row) insert into upsert0223a values (50, 3), (88, 4); ERROR: 23505: duplicate key value violates unique constraint "upsert0223a_pkey" DETAIL: Key (id)=(50) already exists. select n_tup_ins , n_live_tup , n_dead_tup from pg_stat_all_tables where relname = 'upsert0223a'; n_tup_ins | n_live_tup | n_dead_tup -----------+------------+------------ 101 | 100 | 1 (1 row) ``` 可以觀察到第一次輸入100筆,在系統統計表中, insert 了 100 tuple, live tuple 也是100, dead tuple 0. 第二次insert , 使用 values 兩筆, id 分別是 50, 88. 產生了 error, 違反了 PK. 查詢系統表, 做了101次 insert, 一個 dead tuple. 就是id=50這筆會先insert,然後違反PK,拋出錯誤,然後id=80這個緊接著的insert 並未操作. 因為 values 多筆,等於是 begin end 包起來的多筆單一insert.當有錯誤發生時,後面的就不執行了. ## 使用upsert 的情況 ```sql= insert into upsert0223a values (50, 3), (88, 4) on conflict do nothing; INSERT 0 0 select n_tup_ins , n_live_tup , n_dead_tup from pg_stat_all_tables where relname = 'upsert0223a'; n_tup_ins | n_live_tup | n_dead_tup -----------+------------+------------ 101 | 100 | 1 (1 row) ``` 可以觀察到,並未產生error, 而是回報 insert 0 0. 查詢系統表,insert數未增加,dead tuple 也未增加. 可以說upsert是比較聰明的先判斷是否違反限制,而不是先insert,既避免了錯誤發生,也有利於空間的使用,減少了後續做 dead tuple的清理. 錯誤若有發生,我們需要做相對應的處理,捕捉exception! 資料insert了,需要做IO操作.變成dead tuple,空間也佔了,後續清理也是成本,資料頁面又產生了空洞. upsert雖然看來只是insert 的一點改進,但是實際上帶來的好處是很多的. ## 不是PK的情況 ```sql= create table upsert0223b ( id int generated always as identity primary key , gal text unique ); insert into upsert0223b(gal) values ('佐山愛'),('小島南'),('夢実かなえ'); select n_tup_ins , n_live_tup , n_dead_tup from pg_stat_all_tables where relname = 'upsert0223b'; n_tup_ins | n_live_tup | n_dead_tup -----------+------------+------------ 3 | 3 | 0 (1 row) insert into upsert0223b(gal) values ('小島南') on conflict (gal) do nothing; INSERT 0 0 select * from upsert0223b; id | gal ----+------------ 1 | 佐山愛 2 | 小島南 3 | 夢実かなえ (3 rows) select n_tup_ins , n_live_tup , n_dead_tup from pg_stat_all_tables where relname = 'upsert0223b'; n_tup_ins | n_live_tup | n_dead_tup -----------+------------+------------ 3 | 3 | 0 (1 row) ``` 可以觀察到,使用unique key, 一樣也能使用upsert! ## 結語 Upsert表面上看來只是語法上,較為便利.實際上卻能帶來開發上與效能及空間利用的改善. ## 感謝感謝佐山愛,小島南,夢実かなえ.