# Introduction of Computer Science<br>Ch14 & Ch15
NTNU 計算機概論
##### [Back to Note Overview](https://reurl.cc/XXeYaE)
##### [Back to Introduction of Computer Science](https://hackmd.io/@NTNUCSIE112/SyU5Uuyhr)
{%hackmd @NTNUCSIE112/csabb %}
{%hackmd @NTNUCSIE112/csch %}
<!--
https://hackmd.io/@sophie8909/csabb
https://hackmd.io/@sophie8909/csch
-->
###### tags: `NTNU` `CSIE` `必修` `Introduction of Computer Sciencce`
# Ch14 Databases
## 14-1 Introduction<!---->
A database is a collection of related, logically coherent data used by the application programs in an organization.
- Database system application
- Bank
- School
- etc.
- the method to build database system
- using programming language
- using specialized database software (e.g. Microsoft Access)
### Database Management Systems(DBMS)
```graphviz
digraph hierarhcy
{
nodesep=1.0
node[color = pink, fontname=Courier,shape=box]
edge [color=black,style=line]
user->Application->DBMS
"Database Manager"->DBMS->{Database1 Database2 Database3}
}
```
```graphviz
digraph hierarhcy
{
nodesep=1.0
node[color = pink, fontname=Courier,shape=box]
edge [color=black,style=line]
DBMS->{Hardware Softare Data Users Procedures}
}
```
---
## 14-2 Database Architecture
### ANSI/SPARC structure
```graphviz
digraph hierarhcy
{
nodesep=1.0
node[color = pink, fontname=Courier,shape=box]
edge [color=black,style=line]
{" user view" "user view" "user view "}->"External level"
"External level"->"conceptual level"->"Internal level"
"External level"
}
```
+ **External level**
interacts directly with the user.(what the user see)
+ **Conceptual level**
defines the logical view of data. (connection between internal and external)
+ **Internal level**
determines where data are actually stored.
---
## 14-3 Database models
+ A database model defines the logical design of data.
+ The model also describes the relationship between different parts of data.
+ Three models:
+ Hierarchical model: obsolete
+ Network model: obsolete
+ Relational model:widely used today
### Hierarchical model
- Data are organized as an upside down tree.
- Each entity has only one parent but can have several children.
- Advantage:easy to insert and delete structure
- Disadvantage:delete parent node will both delete child nodes
- Examples:IBM's IMS and DL/1
- 
### Network model
- The entities are organized in a graph, where some entities can be accessed through several path.
- Advantage:variability
- Disadvantage:complex
- Examples:HP's IMAGE and Honeywell's IDS
- 
---
## 14-4 Relationship model
- 
- Data are organized in two-dimensional tables called relations(關係).
| Attributes | v | v | |
|:----------:|:-------------:|:------:|:------:|
| No | Course-Name | Unit | |
| ------- | ------- | ------ | Tuples |
| CIS17 | Intro to java | 5 | **<-** |
| CIS15 | Intro to C | 5 | **<-** |
| CIS19 | Unix | 4 | **<-** |
| CIS51 | Networking | 5 | **<-** |
<!-- 這樣是對的ㄇ -->
- The number of **attributes** in a relation is its **degree**
- The number of **rows** in a relation is its **cardinality**
- Advantage:Suitable for expressing more complex relationships
- Disadvantage:This structure is inefficient when the number of random queries is small
- Relational database is using widest
- e.g. Microsoft SQL Server, Sybase, Informix, MySQL, PostgreSQL, Dbase III Plus, Microsoft Access
---
## 14-5 Operations on relations
+ Select : unary operation
+ Insert : unary operation
+ Delete : unary operation
+ Update : unary operation
+ Project : unary operation
+ 選取一關係圖中特定幾項元素顯示
<br>
+ Join : binary operation
+ 加入額外元素成為新關係圖
+ Union : binary operation
+ 取兩關係圖間的聯集
+ Intersection : binary operation
+ 取兩關係圖間的交集
+ Difference : binary operation
+ A關係圖多餘B關係圖之元素
### SQL(Standard Query Language)
SQL is the language standardized by ANSI and ISO for use on relation database
SQL語言可以分為三個部分:
1. Data Definition Language(DDL):可用來建立新的或刪除已有的表格
2. Data Manipulation Language(DML):可用來對已有的表格進行新增,刪除,存取......等
3. Data Control Language(DCL):可用來控制資料存取的權限
4. Online SQL Test:[SQL Tryit Editor](https://www.w3schools.com/sql/trysql.asp?filename=trysql_op_in)
5. SQL Library:[Microsoft SQL](https://docs.microsoft.com/zh-tw/sql/sql-server/?view=sql-server-ver15)
(以下示範左邊為原表格,右邊為經過指令修改/後顯示的東西)
#### Data Definition Language
- Create table
```SQL=
CREATE TABLE member
(
member_id CHAR(8),
person_id CHAR(10),
name VARCHAR(8),
birtday DATE,
)
#### Data Manipulation Language
- Select Operation
```SQL=
SELECT * --SQL指令大寫 註解用--
FROM relation_name
WHERE criteria
```
- COURSES
| No | Course-Name | Unit | | Course-Name | Unit |
|:-----:|:-------------:|:----:|:------:|:-------------:|:----:|
| CIS17 | Intro to java | 5 | -> | Intro to java | 5 |
| CIS15 | Intro to C | 5 | select | Intro to C | 5 |
| CIS19 | Unix | 4 | -> | Unix | 4 |
| CIS51 | Networking | 5 | | Networking | 5 |
- Insert Operation
```SQL=
INSERT INTO relation_name
VALUES (xx,xx,xx)
```
- COURSES
| No | Course-Name | Unit | | No | Course-Name | Unit |
|:-----:|:-------------:|:----:|:------:|:---------:|:-------------:|:-----:|
| CIS17 | Intro to java | 5 | -> | CIS17 | Intro to java | 5 |
| CIS15 | Intro to C | 5 | insert | CIS15 | Intro to C | 5 |
| CIS19 | Unix | 4 | -> | CIS19 | Unix | 4 |
| CIS51 | Networking | 5 | | CIS51 | Networking | 5 |
| | | | | **CIS52** | **TCP/IP** | **6** |
- Delete Operation
```SQL=
DELETE FROM relation_name
WHERE criteria
```
- COURSES
| No | Course-Name | Unit | | No | Course-Name | Unit |
|:--------:|:--------------:|:-----:|:------:|:-----:|:-------------:|:----:|
| CIS15 | Intro to C | 5 | | CIS15 | Intro to C | 5 |
| CIS17 | Intro to java | 5 | -> | CIS17 | Intro to java | 5 |
| CIS19 | Unix | 4 | delete | CIS51 | Networking | 5 |
| **CIS5** | **Networking** | **5** | -> | CIS52 | TCP/IP | 6 |
| CIS52 | TCP/IP | 6 | | | | |
- Update Operation
```SQL=
UPDATA relation_name
SET attribute1 = value1, attribute2 = value2...
WHERE criteria
```
- COURSES
| No | Course-Name | Unit | | No | Course-Name | Unit |
|:-----:|:-------------:|:----:|:------:|:-----:|:-------------:|:----:|
| CIS15 | Intro to C | 5 | | CIS15 | Intro to C | 5 |
| CIS17 | Intro to java | 5 | -> | CIS17 | Intro to java | 5 |
| CIS19 | Unix | 4 | update | CIS19 | Unix | 4 |
| CIS51 | Networking | 5 | -> | CIS51 | Networking | **6** |
| CIS52 | TCP/IP | 6 | | CIS52 | TCP/IP | 6 |
- Project Operation
```SQL=
SELECT attribute_list
FROM relation_name
```
- COURSES
| No | Course-Name | Unit | | No | Unit |
|:-----:|:-------------:|:----:|:-------:|:-----:|:----:|
| CIS15 | Intro to C | 5 | | CIS15 | 5 |
| CIS17 | Intro to java | 5 | -> | CIS17 | 5 |
| CIS19 | Unix | 4 | project | CIS19 | 4 |
| CIS51 | Networking | 5 | -> | CIS51 | 5 |
| CIS52 | TCP/IP | 6 | | CIS52 | 6 |
- Join Operation
```SQL=
SELECT attribute_list
FROM relation1, relation2
WHERE criteria
```
- join
- COURSES
| No | Course-Name | Unit |
|:-----:|:-------------:|:----:|
| CIS15 | Intro to C | 5 |
| CIS17 | Intro to java | 5 |
| CIS19 | Unix | 4 |
| CIS51 | Networking | 5 |
| CIS52 | TCP/IP | 6 |
- TAUGHT-BY
| No | Professor |
|:-----:|:---------:|
| CIS15 | Lee |
| CIS17 | Lu |
| CIS19 | Walter |
| CIS51 | Lu |
| CIS52 | Lee |
- Result
| No | Course-Name | Unit | Professor |
|:-----:|:-------------:|:----:|:---------:|
| CIS15 | Intro to C | 5 | Lee |
| CIS17 | Intro to java | 5 | Lu |
| CIS19 | Unix | 4 | Walter |
| CIS51 | Networking | 5 | Lu |
| CIS52 | TCP/IP | 6 | Lee |
- Union Operation
```SQL=
SELECT *
FROM relation1
UNION
SELECT *
FROM relation2
```
- Union
- CIS15-Roster
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 245-89-6580 | Anne | Green |
| 459-98-6789 | Ted | Purple |
- CIS52-Roster
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 342-88-9999 | Rich | White |
- Result
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 245-89-6580 | Anne | Green |
| 459-98-6789 | Ted | Purple |
| 342-88-9999 | Rich | White |
- Intersection operation
```SQL=
SELECT*
FROM relation1
INTERSECT
SELECT*
FROM relation2
```
- Intersection
- CIS15-Roster
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 245-89-6580 | Anne | Green |
| 459-98-6789 | Ted | Purple |
- CIS52-Roster
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 342-88-9999 | Rich | White |
- **Result**
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
- Different Operation
```SQL=
SELECT *
FROM relation1
MINUS
SELECT *
FROM relation2
```
- Different
- CIS15-Roster
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 342-88-9999 | Rich | White |
- CIS52-Roster
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 145-67-6754 | John | Brown |
| 232-56-5690 | George | Yellow |
| 245-89-6580 | Anne | Green |
| 459-98-6789 | Ted | Purple |
- Result
| StudentID | F-Name | L-Name |
|:-----------:|:------:|:------:|
| 245-89-6580 | Anne | Green |
| 459-98-6789 | Ted | Purple |
---
## 14-6 Database Design
```graphviz
digraph{
nodesep=1
node[color = pink, fontname=Courier,shape=box]
edge [color=pink,style=line]
了解顧客需求->邏輯資料設計->實體資料設計->建立資料庫
}
```
### Entity-Relationship Diagram(E-R)
```graphviz
digraph{
nodesep=1
node[color = pink, fontname=Courier,shape=box]
edge [color=pink]
{
No, Name, Unit
}->Course
{
s_id[flag=ID]
s_name[flag=Name]
s_add[flag=Address]
}->Student
{
p_id[flag=ID]
p_name[flag=Name]
p_add[flag=Address]
node[shape=diamond]
}->Professor
Student->takes->Professor
Course->teaches->Professor
}
```
### Normalization
Normalization is the process by which a given set of relation are transformed to a new set of relations with a more solid structure.
- First normal form (1NF)
- 避免資料表上有多筆資料的情況
- 分割放置在不同row
- Second normal form (2NF)
- 某欄位只與Primary Key 的部分有相依性
- 拆解出來建一個新表
- Also having 3NF, 4NF...
---
## 14-7 Other Database Models
- 分段式資料庫
- 片段分散式資料庫(fragmented distributed database)<br>
資料局部分散在各電腦,每部電腦維護部分資料庫。
- 複製分散式資料庫(replicated distributed database)<br>
一部電腦擁有另一部電腦精確複製的資料,儲存於某一部電腦資料所駔的修改,會確切的在其他電腦重複該動作。
- 物件導向資料庫(object-oriented database)
- 例:Jasmine、Alltalk、GemStone、O2
---
# Ch15 Data Compression
## 15-1 Introduction
### Objectivese
- Realize the need for data compression
- Differentiate between lossless and lossy compression
- Understand three lossless compression encoding techniques
- run-length
- Huffman
- Lempel Ziv
- Understand two lossy compression methods
- JPEG
- MPEG
- Understand the main idea behind the MP3 standard for compressing audio
### Data compression methods
- Data compression methods are either lossless(all information is recoverable) or lossy(some information is lost)
- Data compression takes the original message and reduces the number of bits that are to be transmitted
## 15-2 Lossless Compression Methods
### Run-length encoding
- replace consecutive repeating occurrences of a symbol by number of occurrences
- AAAAAABBBBBB -> A6B6
- for two symbols
- encode one symbol which is more frequent than the other
- This example only encode 0 between 1
- {00001|0100} {0001|0011}{1|0000} {0001|0011}
### Huffman encoding
- Frequency ↑ bits ↓ (fewer bits represent for the letter with higher frequency)
- Optimization Algorithm
- 二元樹避免歧異
### Morse Code
- 1836 built by Samuel F.B. Morse
- 1838 Morse improve Telegraph receiver
- 1844 The USA built 60km
- 1949 G.K. Zipf 統計每一英文字母出現的頻率
- 缺點:
- 良好的壓縮率
- 太慢,讀取二次才可編碼
- 第一次統計各符號的數量
- 第二次對訊息編碼
- LZ解決問題
### Lempel Ziv encoding
- Dictionary-based encoding
- GIF and TIFF using this
- One parsed
## 15-3 Lossy Cpmpression Methods
### Format
#### GIF
- Compression:Lempel Ziv
- 優點:
- 可有效節省壓縮空間,提高壓縮比例
- 支援動畫(將多幅的圖案合為一個GIF檔案後依序顯示)
- 缺點:僅支援256色
#### TIFF
- Compression:LZ, Huffman Coding, Run Length Encoding
- 色彩:
- 單色
- 灰階
- 全彩
#### PCX影像壓縮技術
- 以Run Length Encoding 作為壓縮的核心技術
- 優點:演算法簡單、易懂且易實作
- 缺點:壓縮率會隨著影像複雜度的不同而有所改變。遇到重複性低的影像資料,PCX處理後的影像空間有時反而會不減反增
#### JPEG
- 由ISO及CCITT在1991年發表
- JPEG為一種有損影像壓縮法,解壓縮後無法完全恢復為原始影像
- JPEG壓縮效果比較好,成為現今最常用的影像壓縮法之一
- 壓縮全彩或是8位元的灰階影像
- Compression:[DCT](####DCT)
<br>(過程詳情請見講義)
### Animation and Video
#### MPEG
- [I-frames](####I-frames)
- [P-frames](####P-frames)
- [B-frames](####B-frames)
### Audio
#### MP3
- 1987由德國Fraunhofer與Erlangen大學共同制定的數位聲音壓縮演算法
- MP3是MPEG影音壓縮演算法中針對聲音部分所制定的演算法標準
- 原理:為降低聲音失真度,MP3利用人耳聽覺的特性,去除掉人耳聽不到的過高或過低的頻率,大幅減少儲存空間
#### 聲音檔比較
| 格式 | 簡介 | 優點 | 缺點 |example|
|:----:|:--------------------:|:--------:|:--------------:|:-:|
| WAV | 從CD或直接錄音而來 | 原音呈現 | 占空間 |44.1kHz 16bits <br>單聲道取樣<br>22675KB 263秒|
| MP3 | 聲音檔的壓縮格式之一 | 壓縮率高 | 某種程度的失真 | 24kHz 16bits <br>單聲道取樣<br>1029KB 263秒|
### Compression Methods
#### DCT
- F(i,j)
- Uniform Gray Scale
- Case 1: Uniform gray scale
- Case 2: Two sections
- Gradient Gray Scale
- Case 3:
- Quantization & Compression
- Reading the table
#### I-frames
- It is an **independent frame** that is not related to any other frame.
- JPEG encoding
- They are presented at regular intervals.
- 12~15 images having 1 I-frame
#### P-frames
- Motion compensation can be predicted from the previous I-frame and P-frame changes
- MPEG:Motion compensation
- only saving different part
#### B-frames