Postgresql 的冪等操作探討

先直接來看例子









































create table gals1 (
  id int generated always as identity primary key
, name text not null unique
);

insert into gals1 (name)
values ('小島南')
returning *;

 id |  name
----+--------
  1 | 小島南
(1 row)

INSERT 0 1
--

insert into gals1 (name)
values ('佐山愛')
returning *;

 id |  name
----+--------
  2 | 佐山愛
(1 row)

INSERT 0 1

---
再次 insert 小島南

insert into gals1 (name)
values ('小島南')
returning *;

ERROR:  23505: duplicate key value violates unique constraint "gals1_name_key"
DETAIL:  Key (name)=(小島南) already exists.
SCHEMA NAME:  elixir
TABLE NAME:  gals1
CONSTRAINT NAME:  gals1_name_key

這個操作(單純insert)就不是冪等(idempotent).
因為要回應相同的結果.

接著先來看一個時常見到的需求.
就是要做insert,但是若有重複,就不要insert.
所以要先做查詢,判斷是否有找到.

很多人會使用一般自己熟悉的程式語言,來寫個function,
使用 if else 把兩道SQL組合起來.

這裡先用 plpgsql 來寫個function.









































































create or replace function select_insert_gals1 (_name text)
returns setof gals1
language plpgsql as
$code$
declare
  rtn gals1%rowtype;
begin
  select * into rtn
    from gals1
   where name = _name;
  if found then
    return next rtn;
  else
    return query
    insert into gals1 (name)
    values (_name)
    returning *;
  end if;
end;
$code$;


---
select select_insert_gals1('河北彩伽');

 select_insert_gals1
---------------------
 (4,河北彩伽)
(1 row)

備註: 3 因為剛才 第二次 insert 小島南時會先產生,
然後 unique 生效, 回報錯誤.此時。sequence 是 3.

再來一次 河北彩伽

select select_insert_gals1('河北彩伽');

 select_insert_gals1
---------------------
 (4,河北彩伽)

驗證一下 佐山愛

select select_insert_gals1('佐山愛');

 select_insert_gals1
---------------------
 (2,佐山愛)

驗證一下 小島南

select *
  from select_insert_gals1('小島南');

 id |  name
----+--------
  1 | 小島南

因為此函數回傳的結果是如同 gals1,是 set returning function.
所以使用 select * from func()這樣的格式.
直接 select func() 這樣的格式, Postgresql 會把結果
使用 row 方式展現.

也可以把 row 轉成 json 格式.

select row_to_json(n.*)
  from select_insert_gals1('河北彩伽') as n;

        row_to_json
----------------------------
 {"id":4,"name":"河北彩伽"}
(1 row)

由上面例子可以看到,使用SQL 搭配函數,十分地靈活好用.
還可以這樣用.










select current_gals.*
  from (values ('夢乃あいか'), ('石川澪'), ('佐山愛') ) as newgals(name)
     , select_insert_gals1(name) as current_gals;

 id |    name
----+------------
  5 | 夢乃あいか
  6 | 石川澪
  2 | 佐山愛
(3 rows)

由上面的結果,可以觀察到, 夢乃あいか, 石川澪是新增,佐山愛是原本有的.
而 newgals 可以是另一個table.
這樣直接使用一道SQL,就完成了!

而不是寫一隻程式,逐筆讀取新值, 然後先做查詢(select),判斷,再insert.
若沒搭配使用 retuning 時, 像MySQL 會有一個 last_insert_id() ,
有的會使用此函數取得PK,然後再次查詢.如此一來,透過迴圈,並多次讀寫,
效率自然不是很好.甚至有的並未使用(不知道)last_insert_id(),就會由外部程式
來產生uuid,並堅持一定只能這樣.
其實若有觀察到,我們上面的例子中,並未使用PK來判斷.而是使用 unique constraint.

雖然已經有達到冪等操作(請看上面的佐山愛),但是還有另一個坎,就是 Race condition.
先來查看文件
按我前往
裡面提到也要參考 The condition names can be any of those shown in Appendix A.
其中有
23505 unique_violation

接著我們來演化一下 function, 建立一個新的好了.



























create or replace function insert_gals1 (_name gals1.name%type)
returns setof gals1
language plpgsql as
$code$
declare
  rtn gals1%rowtype;
begin
  select * into rtn
    from gals1
   where name = _name;
  if found then
    return next rtn;
  else
    begin
      return query
      insert into gals1 (name)
      values (_name)
      returning *;
      exception when unique_violation then
       	return query
       	select *
       	  from gals1
       	 where name = _name;
    end;
  end if;
end;
$code$;

這個新的函數, 利用 exception 來處理 unique_violation,
這樣當有競爭發生時,落後的 process, 就是改做查詢,符合了冪等操作.
另外注意到輸入參數,改為使用 %type 方式,會更加靈活.

有發現到這個函數一開始的查詢部分,跟後面excption 的處理部分,幾乎相同.
換一個說法就是,因為有了 unique constraint, 而函數又能改處理 unique_violation,
所以此函數可以改為先 insert 吧, 發生 unique_violation 時,就查詢.
















create or replace function insert_gals1 (_name gals1.name%type)
returns setof gals1
language plpgsql as
$code$
begin
  return query
  insert into gals1 (name)
  values (_name)
  returning *;
  exception when unique_violation then
   	return query
   	select *
   	  from gals1
   	 where name = _name;
end;
$code$;

就是把原本 else 之後的 block 拿來就可以了.還減少了 declare 部分.
驗證看看





























select current_gals.*
  from (values ('夢乃あいか'), ('本鄉愛'), ('明里つむぎ') ) as newgals(name)
     , insert_gals1(name) as current_gals;

 id |    name
----+------------
  5 | 夢乃あいか
  8 | 本鄉愛
  9 | 明里つむぎ
(3 rows)

夢乃あいか 一樣是 id 5.
-------------
現在來看一下 gals1 裡的內容

 id |    name
----+------------
  1 | 小島南
  2 | 佐山愛
  4 | 河北彩伽
  5 | 夢乃あいか
  6 | 石川澪
  8 | 本鄉愛
  9 | 明里つむぎ
(7 rows)

3 , 第二次 insert 小島南.
6 , 第二次 insert 夢乃あいか.
消耗掉

Postgresql upsert 的方式

















https://www.postgresql.org/docs/current/sql-insert.html

[ ON CONFLICT [ conflict_target ] conflict_action ]

where conflict_target can be one of:

    ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ]
    ON CONSTRAINT constraint_name
    
and conflict_action is one of:

    DO NOTHING
    DO UPDATE SET { column_name = { expression | DEFAULT } |
                    ( column_name [, ...] ) = [ ROW ] ( { expression | DEFAULT } [, ...] ) |
                    ( column_name [, ...] ) = ( sub-SELECT )
                  } [, ...]
              [ WHERE condition ]

使用 do nothing

建立新的 table, 與 gals1 相同












































































create table gals3
(like gals1 including all);

\d gals3

                          Table "elixir.gals3"
 Column |  Type   | Collation | Nullable |           Default
--------+---------+-----------+----------+------------------------------
 id     | integer |           | not null | generated always as identity
 name   | text    |           | not null |
Indexes:
    "gals3_pkey" PRIMARY KEY, btree (id)
    "gals3_name_key" UNIQUE CONSTRAINT, btree (name)

constraint, index 都如同 gals1.
當然是沒有資料的.

select *
  from gals3;

 id | name
----+------
(0 rows)

--
接著使用 insert on conflict do nothing 的方式

insert into gals3 (name)
values ('小島南'), ('佐山愛')
on conflict (name)
do nothing
returning *;

 id |  name
----+--------
  1 | 小島南
  2 | 佐山愛
(2 rows)

-- 再來一次

 id | name
----+------
(0 rows)

INSERT 0 0

可以觀察到,雖然不會重複輸入,但是 returning 是沒有資料的.

搭配 cte 與 union all

with t1 as (
insert into gals3 (name)
values ('河北彩伽'), ('小島南'), ('佐山愛')
on conflict (name)
do nothing
returning *
)
select *
  from t1
union all
select *
  from gals3
 where name in ('河北彩伽', '小島南', '佐山愛');

 id |   name
----+----------
  5 | 河北彩伽
  1 | 小島南
  2 | 佐山愛
(3 rows)

id 3, 4 在第二次 insert 時消耗了.
小島南, 佐山愛 的 id 保持 1, 2.

以上是多筆資料一次輸入的情境,無需另外開發函數,只需要組合成一道SQL即可.

參考

可以參考呂健誠這篇
按我前往

感謝

感謝佐山愛,小島南,河北彩伽諸位.

Postgresql 的冪等操作探討

Postgresql upsert 的方式

參考

感謝

Read more

PostgreSQL Upsert 的探討

正規化探討

巧用 array 處理 tags 續篇

使用statement trigger 檢查delete 是否有where條件