comp3311 database note

# comp3311 database note ## ER Model **Attributes:** * Derived atttributes (從其他attribute推導的) * Composite attribute * Mulitvalued attribute (很多種類) ![](https://i.imgur.com/90nItyJ.jpg) **Relationship:** ![](https://i.imgur.com/XJ4DdVH.jpg) ![](https://i.imgur.com/yuZHL5e.jpg) **Subclass and inheritance** properties of subclasses: * overlapping or disjoint * total or partial # Relational Model **ER->relational mapping** (1.4) * 順序：entity->add attributes-> relationship * 各entity變成各table name * primary key & foreign key 用粗體表示 * relationship 不顯示 vs 有自己的table (多對多) **conclusion** **1 to many ** * ER圖多的那一方放1的foreign key當作relationship * relationship上的attribute放在多的那一方 * relational箭頭由ER圖中多的一側出發 **many to many** * relationship有自己的table * relationship有自己的attribute和其他連接entity(primary key) * 箭頭由ER圖中relationship出發 **1 to 1** * relarionship放在粗的一側 * ~~細的那側entity(primary key)加在粗的那側???~~ * 箭頭由ER圖中粗的一側出發 **補注:** * 如在ER diagram遇multivalue attribute(多層圈圈) 在顯示relational model時要另外幫他創表->內容有 1.從哪個entity延伸的表名 2.他本身這個attribute的名字 3.relational model的箭頭要由這個表的1.延伸到本來entity table * relational model表示不出來的 1.total participation ### Create sql table * foreign key 會reference在ER圖中多對一多的那一方(aka. relational model有箭頭出去的那方) * relational model中有底線且是entity table的是primary key （multivalue attribute有底線的不是） * relational model中有箭頭出去但沒底線是foreign key ![](https://i.imgur.com/gTSZnRf.jpg) * relational model中有粗的延伸在sql table中設為not null(total participation) We cannot enforce the total participation constraint (an employee may have no associated subclass tuples). We cannot enforce the disjoint subclasses constraint (an employee may have several associated subclass tuples). ## Modify DB ## **SQL query** * **biggest and smallest** (min/max) method 1: create another view ![](https://i.imgur.com/Ps4MaWI.png) method 2: ![](https://i.imgur.com/lygCig1.png) * **decimal places** ![](https://i.imgur.com/oUa9V1s.png) (group by+ having-> filtering) * **group by** 使用時要select group by後的attributes * **coalesce**-> d找不到找c, c找不到找b... select coalesce(d,coalesce(c,coalesce(b,a))) * **combine two column** (concat) select concat(e.givenname, '', e.familyname) * **checking string** ![](https://i.imgur.com/RTkRVTi.jpg) * **set operation** union/intersect/expect * **summarizing** :aggrigate values in single column avg(), sum(), count() * **not exist** ![](https://i.imgur.com/LHW6WA7.png) * *join** * left outer join ## **Functional Dependency** X→ Y (1 to 1) can be read as: * Y functionally depends on X * X determines Y * if we know X then we know Y *Rules* * reflexivity: X → X * augmentation: X → Y get XZ → YZ * transitivity: X → Y, Y → Z get X → Z * additivity: X → Y, X → Z get X → YZ * projectivity: X → YZ get X → Y, X → Z * pseudotransitivity: X → Y, YZ → W XZ → W #### What is a trivial functional dependency X → Y? Y is a subset of X (e.g. ABC→AB, A→A) LHS包含RHS ## Closure X+ largest set of attributes that can be derived from X using F where X set of attributes F set of functional dependencies ## Super Key set of attributes that uniquely identifies a tuple in a table any set X, such that X+ = R 是super key不等於是candidate key ## Candidate Key a.k.a **primary key** 不可拆除其中任何一項 minimal superkey any set X, such that X+ = R and there is no Y subset of X such that Y+ = R > **example:** BD+ = {A,B,C,D,E,F,G} result: BD is super key && candidate key ABD & ABCD is super key (可以拆除A && C)BD+ = {A,B,C,D,E,F,G} (redundency level: 1NF (most redundent) < 3NF < BCNF) ## Boyce-Codd Normal Form (BCNF) for all functional dependencies X → Y either * X → Y is trivial or * X is a super key (**LHS "all" have to be super key**) ## Third Normal Form (3NF) for all functional dependencies X → Y either * X → Y is trivial or * X is a super key or * Y is single attribute of a candidate key (RHS "part" is super key) > > example: Consider the relation R(A,B,C,D,E) and the set set of functional dependencies F = { A → B, BC → E, DE → A } 1. List all of the candidate keys for R. {ACD} {BCDE} {CDE} 2. Is R in third normal form (3NF)? to do: a.檢查所有RHS要部分在candidate key b.右側是不是單一element 3. Is R in Boyce-Codd normal form (BCNF)? to do: a.檢查"所有LHS"都要包含其中一組candidate key {ACD} {BCDE} {CDE}都沒有 b.RHS要是LHS的subset F = { A → B, BC → E, DE → A } 右邊皆不是左邊的subset -> non trival ![](https://i.imgur.com/AkDTGHk.png) ### 3NF Normalisation example1 R=BCANLM, F{B→CA, L→MN}, key(R)=BL 1. compute minimal cover = {B→C, B→A, L→M, L→N} (把所有都分開) 2. Reduce minimal cover = {B→CA, L→MN} （在合再一起） 3. convert into relations: S=BCA, T=LNM （分成不同table) 4. **No relation has key BL so add new table containing key U=BL** 檢查完整key(BL)有沒有在table中,沒有的話另加一個table, **如果有多組cadidate key只要一組有在表裡面就好了** 5. Result is S=BCA, T=LNM, U=BL ### 3NF Normalisation example 2 candidate key: B fd: {C→D, C→A, B→C} 1. reduce minimal cover {C→AD, B→C} 2. split into table S=CAD, T=BC 3. no need to add extra table (B（Key) is in it) 5. result ### BCNF Normalisation example1 R=BCANLM, F={B→CA, L→MN} Key(R)=BL 1. Reduce minimal cover 1. R is not in BCNF, because B→CA is not a whole key We decomposit into 隨意挑選一組fds(e.g. B→CA) 生成第一組table 2. S=BCA, F'S'＝{B→CA} Key(S)=B KEY為新生成的table 他的fds怎麼來的另一個table為原先R所有的減去剛才生成table的右半側(CA) 3. T=BNLM,F'T'={L→NM}, key(T)=**BL** **如果以上的table不是BCNF的話還要拆開**(fds左側不是key) 分解第二個table T=BNLM,F'T'={L→NM} 4. U=LNM, F'U'={L→NM}, key(U)=L, is BCNF 5. V=BL, F'V'={}, key(V)=BL, is result: R, U, V ### BCNF Normalisation example2 R=ABCD, F={C→D, C→A, B→C} Key(R)=B reduce minimal cover: {C→AD, B→C} case 1: choose {C→AD} CAD {C→AD} key:C, is BCNF BC {B→C} key: B, is BCNF case 2: choose {B→C} BC {B→C} key:B,is BCNF ABD {}空的因為C被取走 key ABD, is BCNF ### Relational algebra * **selection** = "where" in sql * **projecttion** = "select" * **renaming** = "as" **example** SQL->Rel Alg 1. select * from Bars where addr = 'Coogee' Res = Sel[addr=COOGEE](Bars) 2. select addr from Bars where addr ='Coogee' 法1 Res = Proj[addr](Sel[addr=Coogee](Bars)) 法2 Tmp = Sel[addr=Coogee](Bars) Res = Proj[addr](Tmp) 3. select price from Sells where bar = 'CBH' and beer = 'New' Res = Proj[price](Sel[bar=CBH && beer=New](Sells)) 4. select * from Sells where price = (select price from Sells where bar = 'CBH' and beer = 'New') P = Proj[price](Sel[bar=CBH && beer=New](Sells)) Res = Sel[(price in P)](Sells) 5. select b.name as beer, b.manf as brewer, d.name as drinker from Beers b join Likes L on b.name = L.beer join Drinkers d on L.drinker = d.name Tmp1(name, manf, beer, drinker) = Beers join[name=beer] Likes Tmp0 Rename[dname,addr](Drinkers) --rename name as dname Tmp2(name,manf,beer,drinker,name,addr)= Tmp0 join[drinker=dname]Tmp1 Tmp3 = Proj[name,manf,dname](Tmp2) Res = Rename[beer,brewer,drinker]Tmp3 ### Query evaluation example: --schema: R(a,b) S(c,d) > select a as x from R join S on (b=c) where d = 100 Tmp1(a,b,c,d) = R Join[b=c] S Tmp2(a,b,c,d) = Sel[d=100](Tmp1) Tmp3(a) = Proj[a](Tmp2) Res(x)=Rename[Res(x)](Tmp3) (a)相同query為何第二次比較快 Since when executing the query for the first time, it has to read from the disk, which takes a long time. However, for the second time, the information is already in memory, which makes the second time faster. (B)用index比較快 In the frist query, the id is integer, and thus can be done by "index", while in the second query birthday is of type text, which can't be done with index **Query evaluation** 1. parser: takes SQL query produces effectively 2. relational algebra expression feeds it throgh the 3. query optimiser which gives the execution plan * **Execution plan** is a sequence of relational operations **Query Optimisation** best method for evaluating a query * best = lowest cost = fatest evaluation time * cost is measure in terms of pages read/written **speed up** 1. query * **join** is faster than subquery (especially subquery is correlated) * first filter then join(avoid large intermediate tables) * avoid applying functions in where/group-by claues 2. indexing * indexing will speed-up based on indexed attributes (in postgre it automatically makes primary key index) * hash table or btree (if queries are always equally test-> hash table, > < -> btree) check the time: 1. after p1-> psql... 2. command: \timing compariing query cost exercise select count(*) from Movies where substring(title,1,1) = 'D'; --7 using substring function select count(*) from Movies where title like 'D%'; --6 select count(*) from Movies where title ilike 'd%'; --32 select count(*) from Movies where title ~ '^D.*'; --16 select count(*) from Movies where title ~* '^d.*'; --16 select count(*) from Movies where title = 'District 9'; --5 select count(*) from Movies where id = 111113; --0.6 bc of id select max(id) from Movies; ->先跑id再跑title, title速度會變快 select max(title) from Movies;