正規化探討

前言

正規化是一個重要,但是近年來較少討論,甚至被誤解的議題.
最近時常看到一些人,說到正規化,關聯式資料庫,就是以前磁碟貴,要節省空間與金錢,所以才這樣做.現在儲存科技進步,所以就不需要云云.
正規化主要是做一致性,去冗餘只是為了一致性的過程中,帶來的特性,並不是目的.

還是來看實際的案例

來一個簡化情況的影片網站.

分析與設計

gal 放 av女優 id,名字.
company 放片商 id, 名字.
videos 放影片id.
我們先假設av女優使用的藝名不變,簡化情況,看來這樣好像就可以了.
video 建立兩個 fk,分別對應 gal 與 company.
但是實際上,女優會與片商多次合作,這時候相同的組合會出現多次,將此組合另外建立一個table,由此table 與 gal, company 建立兩個fk, video 在與此table 做fk.這也是一種方式. 這時候就面臨決策思考點,這樣的方式,在查詢時還要多經一個table,而且這樣還要多一個table,管理上,或是建立的語法都不熟悉,因為書上或網路的範例,幾乎都是一層的.
但是系統不是只有查詢,還有update與delete! 雖然不少人時常說crud沒什麼,bala bala一堆的.也有都是用soft delete,對真的delete有較進一步操作或理解的較少.
在這裡我會增加一個女優片商配對的table, 雖然說女優有分專屬單體,但是為了好理解起見,這個table就取名為 contract.
在初步的階段,往往會不這樣設計,因為不見得有容易發現的實體.
但是經過先建立上面三個實體的table,再進一步分析,就會得到.
這個帶來的好處,在後面的實際例子中,就會看到,先不用急著了解.

建立 schema 與 tables

首先建立兩個 schema, s1212a 與 s1212bak.
s1212a 是主要的schema, s1212bak 的用處稍後就能理解.































































































create schema s1212a;
create schema s1212bak;

create table s1212a.gal (
  id int not null primary key
, gname text
);

create table s1212a.company (
  id int not null primary key
, cname text unique
);

insert into s1212a.gal values
(1, '小島みなみ'), (2, '葵つかさ'), (3, 'あやみ旬果'),
(4, '鈴村あいり');

insert into s1212a.company values
(1, 'Alice Japan'), (2, 'S1'), (3, 'PRESTIGE');


create table s1212a.contract (
  id int not null primary key
, gal_id int not null REFERENCES s1212a.gal(id) 
    ON DELETE CASCADE ON UPDATE CASCADE
, com_id int not null REFERENCES s1212a.company(id)
    ON DELETE CASCADE ON UPDATE CASCADE
, unique (gal_id, com_id)
);

insert into s1212a.contract values
(1, 1, 1), (2, 1, 2), -- 小島みなみ
(3, 2, 1), (4, 2, 2), -- 葵つかさ
(5, 3, 3), (6, 3, 2), -- あやみ旬果
(7, 4, 3);


select c.id, g.gname, co.cname
  from s1212a.contract c
  join s1212a.gal g
    on c.gal_id = g.id
  join s1212a.company co
    on c.com_id = co.id;

 id |   gname    |    cname    
----+------------+-------------
  1 | 小島みなみ | Alice Japan
  2 | 小島みなみ | S1
  3 | 葵つかさ   | Alice Japan
  4 | 葵つかさ   | S1
  5 | あやみ旬果 | PRESTIGE
  6 | あやみ旬果 | S1
  7 | 鈴村あいり | PRESTIGE
(7 rows)


create table s1212a.videos (
  id int not null primary key
, cid int not null REFERENCES s1212a.contract(id)
    ON DELETE CASCADE ON UPDATE CASCADE
, vid text not null
);

insert into s1212a.videos values
(1, 1, 'DV-1279'), (2, 2, 'SNIS-133'), (3, 2, 'SSIS-911'), -- 小島みなみ
(4, 3, 'DV-1195'), (5, 4, 'SNIS–436'), (6, 4, 'SSIS-974'), -- 葵つかさ
(7, 5, 'EDD-209'), (8, 5, 'ABP-356'), (9, 6, 'SSNI-203'), -- あやみ旬果
(10, 7, 'EDD-218'), (11, 7, 'ABF-061');                   -- 鈴村あいり

--
create view s1212a.current_vidoes as
select g.gname, co.cname
     , array_agg(v.vid) as vids
  from s1212a.contract c
  join s1212a.gal g
    on c.gal_id = g.id
  join s1212a.company co
    on c.com_id = co.id
  join s1212a.videos v
    on c.id = v.cid
 group by g.gname, co.cname;

select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 葵つかさ   | Alice Japan | {DV-1195}
 葵つかさ   | S1          | {SNIS–436,SSIS-974}
 あやみ旬果 | PRESTIGE    | {EDD-209,ABP-356}
 あやみ旬果 | S1          | {SSNI-203}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SSIS-911,SNIS-133}
 鈴村あいり | PRESTIGE    | {EDD-218,ABF-061}
(7 rows)

最後建立了一個view.方便我們取得資料.

傳播性更新

在上面我們建立 REFERENCES 時,有宣告 ON DELETE CASCADE ON UPDATE CASCADE
我們以葵つかさ為例,來看在 gal 與 contract 的 id











select g.gname, g.id, c.gal_id
  from s1212a.gal g
  join s1212a.contract c
    on c.gal_id = g.id
 where g.gname = '葵つかさ';

  gname   | id | gal_id 
----------+----+--------
 葵つかさ |  2 |      2
 葵つかさ |  2 |      2
(2 rows)

更新 id 而且還是pk
































update s1212a.gal
   set id = -1
 where gname = '葵つかさ';

select g.gname, g.id, c.gal_id
  from s1212a.gal g
  join s1212a.contract c
    on c.gal_id = g.id
 where g.gname = '葵つかさ';
 
  gname   | id | gal_id 
----------+----+--------
 葵つかさ | -1 |     -1
 葵つかさ | -1 |     -1
(2 rows)

-- s1212a.contract 的 gal_id 也跟著更新了!

-- view 的內容也跟上面相同(多table join 結果也維持一致性)
select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 葵つかさ   | Alice Japan | {DV-1195}
 葵つかさ   | S1          | {SNIS–436,SSIS-974}
 あやみ旬果 | PRESTIGE    | {EDD-209,ABP-356}
 あやみ旬果 | S1          | {SSNI-203}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SSIS-911,SNIS-133}
 鈴村あいり | PRESTIGE    | {EDD-218,ABF-061}
(7 rows)

刪除的情境

先建立 mirror table






-- create mirror tables
-- 使用 create table like
create table s1212bak.gal (like s1212a.gal including all);
create table s1212bak.company (like s1212a.company including all);
create table s1212bak.contract (like s1212a.contract including all);
create table s1212bak.videos (like s1212a.videos including all);

這樣就會在 schmea 下建立對應的mirror table,而且會連pk,index 都建立好.
但是 references 是沒有建立的,可以讓我們視需要再行建立.

建立 trigger































































create or replace function s1212a.f_gal_del()
returns trigger
language plpgsql as
$code$
begin
  insert into s1212bak.gal 
  select old.*;
  
  return old;
end;
$code$;

create or replace function s1212a.f_company_del()
returns trigger
language plpgsql as
$code$
begin
  insert into s1212bak.company
  select old.*;
  
  return old;
end;
$code$;

create or replace function s1212a.f_contract_del()
returns trigger
language plpgsql as
$code$
begin
  insert into s1212bak.contract
  select old.*;
  
  return old;
end;
$code$;

create or replace function s1212a.f_videos_del()
returns trigger
language plpgsql as
$code$
begin
  insert into s1212bak.videos
  select old.*;
  
  return old;
end;
$code$;

create trigger tri_bd_gal
before delete on s1212a.gal
for each row execute function s1212a.f_gal_del();

create trigger tri_bd_company
before delete on s1212a.company
for each row execute function s1212a.f_company_del();

create trigger tri_bd_contract
before delete on s1212a.contract
for each row execute function s1212a.f_contract_del();

create trigger tri_bd_videos
before delete on s1212a.videos
for each row execute function s1212a.f_videos_del();

假設 PRESTIGE 與我們終止合作,相關video都要下架!








































































select *
  from s1212a.company;
  
 id |    cname    
----+-------------
  1 | Alice Japan
  2 | S1
  3 | PRESTIGE
(3 rows)

delete
  from s1212a.company
 where cname = 'PRESTIGE';

commit; -- 注意到這個 commit.

select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 葵つかさ   | Alice Japan | {DV-1195}
 葵つかさ   | S1          | {SNIS–436,SSIS-974}
 あやみ旬果 | S1          | {SSNI-203}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SNIS-133,SSIS-911}
(5 rows)

select *
  from s1212bak.company;

 id |  cname   
----+----------
  3 | PRESTIGE
(1 row)

select *
  from s1212bak.contract;

 id | gal_id | com_id 
----+--------+--------
  5 |      3 |      3
  7 |      4 |      3
(2 rows)

select *
  from s1212bak.videos;

 id | cid |   vid   
----+-----+---------
  7 |   5 | EDD-209
  8 |   5 | ABP-356
 10 |   7 | EDD-218
 11 |   7 | ABF-061
(4 rows)

select g.gname, co.cname
     , array_agg(v.vid) as vids
  from s1212bak.contract c
  join s1212a.gal g
    on c.gal_id = g.id
  join s1212bak.company co
    on c.com_id = co.id
  join s1212bak.videos v
    on c.id = v.cid
 group by g.gname, co.cname;

   gname    |  cname   |       vids        
------------+----------+-------------------
 あやみ旬果 | PRESTIGE | {EDD-209,ABP-356}
 鈴村あいり | PRESTIGE | {EDD-218,ABF-061}
(2 rows)

可以看到只有使用一個簡單的delete!
關聯資料都已經順利刪除,並且在s1212bak下的mirror table 也都對應存起來了.

假設 PRESTIGE 與我們恢復合作,相關video都要上架!












































































-- 復原的方式
with t1 as (
delete 
  from s1212bak.company
 where cname = 'PRESTIGE'
 returning *
)
insert into s1212a.company
select *
  from t1;

with t1 as (
delete 
  from s1212bak.contract c
 using s1212a.company co
 where co.cname = 'PRESTIGE'
   and c.com_id  = co.id
 returning c.*
)
insert into s1212a.contract
select *
  from t1;
  
-- 觀察現在s1212a下的 company, contract

select co.*, '|'
     , c.*
  from s1212a.company co
  join s1212a.contract c
    on c.com_id  = co.id
 where co.cname = 'PRESTIGE';

 id |  cname   | ?column? | id | gal_id | com_id 
----+----------+----------+----+--------+--------
  3 | PRESTIGE | |        |  5 |      3 |      3
  3 | PRESTIGE | |        |  7 |      4 |      3
(2 rows)

-- videos

with t1 as (
delete 
  from s1212bak.videos v
 using s1212a.contract c
     , s1212a.company co
 where co.cname = 'PRESTIGE'
   and c.com_id  = co.id
 returning v.*
)
insert into s1212a.videos
select *
  from t1;

select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 葵つかさ   | Alice Japan | {DV-1195}
 葵つかさ   | S1          | {SNIS–436,SSIS-974}
 あやみ旬果 | PRESTIGE    | {ABP-356,EDD-209}
 あやみ旬果 | S1          | {SSNI-203}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SSIS-911,SNIS-133}
 鈴村あいり | PRESTIGE    | {EDD-218,ABF-061}
(7 rows)

-- 資料都回來了
-- 檢查 s1212bak schema 下的資料

select *
  from s1212bak.videos;

 id | cid | vid 
----+-----+-----
(0 rows)

使用delete 搭配join的方式,再做insert.先做最上級,然後逐步搭配已經復原的資料,就能復原了!

不使用 cascade 的方式也可以啊

是的! 不使用 cascade 的方式,一樣可以透過join, 先刪除最下級資料,
逐步往上,與復原那段類似但反向.
但是,select大家常下,delete其實是相對較少的,而且有時候下架時機是有壓力的,甚至有時候會有一些商務法務上的需求.可以,但是有時候就出包,往往還是大包.
為了減少失誤,提高效率與正確性,現成的機制,為何不善用?

假設あやみ旬果引退後要下架影片

這時候需要與片商協商,目前已經接獲 S1 同意,先下架あやみ旬果與 S1 合作之影片.
































































delete 
  from s1212a.contract c
 using s1212a.gal g
     , s1212a.company co
 where c.gal_id = g.id
   and c.com_id = co.id
   and g.gname = 'あやみ旬果'
   and co.cname = 'S1';

select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 葵つかさ   | Alice Japan | {DV-1195}
 葵つかさ   | S1          | {SSIS-974,SNIS–436}
 あやみ旬果 | PRESTIGE    | {ABP-356,EDD-209}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SSIS-911,SNIS-133}
 鈴村あいり | PRESTIGE    | {EDD-218,ABF-061}
(6 rows)

-- 可以看到 あやみ旬果 與 S1 的影片都下架了!
-- 檢查 s1212bak schema 下的資料
select *
  from s1212bak.contract;

 id | gal_id | com_id 
----+--------+--------
  6 |      3 |      2
(1 row)

select *
  from s1212bak.videos;

 id | cid |   vid    
----+-----+----------
  9 |   6 | SSNI-203
(1 row)

rollback;

-- 因為我是把 autocommit off, 所以 rollback 就都復原了!

select *
  from s1212bak.videos;

 id | cid | vid 
----+-----+-----
(0 rows)

select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 葵つかさ   | Alice Japan | {DV-1195}
 葵つかさ   | S1          | {SNIS–436,SSIS-974}
 あやみ旬果 | PRESTIGE    | {ABP-356,EDD-209}
 あやみ旬果 | S1          | {SSNI-203}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SSIS-911,SNIS-133}
 鈴村あいり | PRESTIGE    | {EDD-218,ABF-061}
(7 rows)

一樣是一道delete 搞定!

假設我們需要下架葵つかさ所有影片,不分公司.

這時候有兩種作法, 一種是直接將 gal 裡面葵つかさ刪除,
另一種是將 contract 裡面與葵つかさ關聯的刪除.
第二種方法與上面あやみ旬果下架的方式類似,而且還不用判斷company部分.只跟gal join.第一種方法更簡單.
至於使用哪種方式,要看我們實際需求,一般來說,我會採用第二種,保留av女優的名字在系統裡.因為有共演片,至於共演片是否下架,這是另外議題,假設共演片先不下架,這樣就能關聯到已下架影片女優的名字了.當然這是比現在範例複雜許多的情境.










































delete 
  from s1212a.gal
 where gname = '葵つかさ';

-- 觀察這時候的 s1212bak schema 下的

select g.gname, co.cname
     , array_agg(v.vid) as vids
  from s1212bak.contract c
  join s1212bak.gal g
    on c.gal_id = g.id
  join s1212a.company co
    on c.com_id = co.id
  join s1212bak.videos v
    on c.id = v.cid
 group by g.gname, co.cname;

  gname   |    cname    |        vids         
----------+-------------+---------------------
 葵つかさ | Alice Japan | {DV-1195}
 葵つかさ | S1          | {SNIS–436,SSIS-974}
(2 rows)

-- 觀察 current_videos

select *
  from s1212a.current_vidoes;

   gname    |    cname    |        vids         
------------+-------------+---------------------
 あやみ旬果 | PRESTIGE    | {ABP-356,EDD-209}
 あやみ旬果 | S1          | {SSNI-203}
 小島みなみ | Alice Japan | {DV-1279}
 小島みなみ | S1          | {SSIS-911,SNIS-133}
 鈴村あいり | PRESTIGE    | {EDD-218,ABF-061}
(5 rows)

-- 已經符合需求了,輕鬆簡單

rollback;

-- rollback 就全部回來了.

結語

正規化是很重要,且需要我們深入理解與實踐進而能夠更有效的使用.

正規化探討

前言

還是來看實際的案例

分析與設計

建立 schema 與 tables

傳播性更新

更新 id 而且還是pk

刪除的情境

先建立 mirror table

建立 trigger

假設 PRESTIGE 與我們終止合作,相關video都要下架!

假設 PRESTIGE 與我們恢復合作,相關video都要上架!

不使用 cascade 的方式也可以啊

假設 あやみ旬果 引退後要下架影片

假設 我們需要下架 葵つかさ 所有影片,不分公司.

結語

Read more

PostgreSQL Upsert 的探討

Postgresql 的冪等操作探討

巧用 array 處理 tags 續篇

使用statement trigger 檢查delete 是否有where條件

假設あやみ旬果引退後要下架影片

假設我們需要下架葵つかさ所有影片,不分公司.