W2 - Using Relational Databases
====
## Creating Tables and Loading Data
### 1. Types of SQL statements (DDL vs. DML)
* Data Defination Language(DDL)
* Defining Object (tables)
* DDL statement
* Define
* Change
* Drop data
* Common DDL
* CREATE
* ALTER
* TRUNCATE
* DROP
* Data Manipulation Language(DML)
* Manipulating data in tables
* DML statement
* Read
* Modify data
* CRUD operations(Create, Read, Update, Deletw rows)
* Common DML
* INSERT
* SELECT
* UPDATE
* DELETE
### 2. Creating Tables
* Creating tables
* Schema
* Table name
* Column names
* Data Types
* Duplicates
* Null values
* Methods for creating tables
* Visual or UI tools
* Db2 on Cloud console
* MySQL phpMyAdmin
* PostgreSQL PGAdmin
* CREATE TABLE SQL statement
* Administrative APIs
### 3. CREATE TABLE Statement
* CREATE table
* Syntax:

* Example:

* Example based on the Library database:
* Blueprint:

* Example:

### 4. ALTER, DROP, and Truncate tables
* ALTER TABLE ... ADD COLUMN
* Add or remove columns
* Modify the data type of columns
* ADD or remove keys
* Add or remove contraints
* Syntax:

* ADD Example:

* Change data type Example:

* Drop Example:

* Drop table Example:

* TRUNCATE TABLE Example:

* 清空表格里的所有数据
### 5. Data Movement Utilities
* Scenarios for Data Movement
* Initially populate the entire database
* Create a development and testing copy
* Create a snapshot for disaster recovery
* Create new table using data from external source / file
* Add or append data in existing table
* Data Movement Tools and Utilities
* 3种转移数据的工具
* Backup and Restore
* Backup create a file for the entire database
* Restore creates exact copy of the database from file
* Preserve all database objects and their data:
* Schemas, Table, Views
* User Defined Types, Functions, Stored Procedures
* Constraints, Triggers, Security
* Backups performed periodically for disaster recovery
* Create copies of database for development and test
* Import and Export
* Import insert data from a file into a table
* Export saves table data into a file
* Operations available using different interfaces
* Command line
* API
* GUI
* Third Party Tools
* IMPORT/EXPORT file formats
<br>
* Load
* Supports XML, large objects, and user-defined types
* Faster than the import utility
* Does't perform as many checks
* Preferred option for loading very large datasets
* Initiate from cammand line / API / Visual Tool
### 6. Loading Data
* Source of data
* Load data
* From different data sources
* In different formats
* Using the Load Data utility
* 4 steps to loading data:
1. Source
2. Target
3. Define
4. Finaliza
## Designing Keys, Index, and Contraints
### 1. Database Objects & Hierarchy (Including Schemas)
* Database hierarchy

* Instances
* Is a logical boundary for databases objects, and configuration.
* Provides unque database server environment.
* Allows isolation between databases.

* Relational databases
* Stores, manages, and provides access to data.
* Contains objects like tables, views, indexes.
* Uses related tables to reduce data redundancy
* Can be distributed across multiple systems.

* Schemas
* Organize database objects
* Default schema is the user schema
* Provide a naming context
* System schemas contain database configuration information

* Database partitions
* Data is managed across multiple partitions
* Split tables that contain very large quantities of data
* Hold a subset of the data Commom in data warehousing

* Database objects
* Physical database design consistes of defining database objects
* Common database objects:
* Tables
* Contraints
* Indexs
* Views
* Aliases
* Create and manage using Graphical tools, scripting, APIs, SQL
### 2. Primary Keys and Foreign Keys
* Primary key

* 一个表只能有一个primary key,但可有多个unique key。
* 创建 primary keys

* Foreign key

* 一个建立父子表关系的键
* 创建 foreign key

* 更新表 ON UPDATE

* 删除表 ON DELETE CASCADE

<br>
### 3. Overview of Indexes
* Index

* book_id沒有依序排列会造成搜寻的效率降低。

* 经有依序排列的book_id的pointer表指向原有的表,当搜寻排列过的book_id寻找相对的数据,搜寻效率会提高。
* Creating an index

* 使用Index的优点及缺点
* 优点:
* 增强SELECT queries的效率
* 减少数据的排序
* 检查rows的唯一性
* 缺点:
* 占用数据存储空间
* 降低INSERT,UPDATE,and DELETE queries的效率。
### 4. Normalization
* Normalization减少数据重复导致不一致的可能。
* 提升读取/写入的效率。
* 提升数据的完整性
* Data engineer 会特別常用的normal form:
* First normal form
* 每一行必须是唯一的
* 每个cell里只允许有一个值
* 简称为1NF

* book_id 401 的 format 同一行只能有一个值。
* Second normal form
* 数据库必须在 first normal form 之中
* 独立一個table存取有几个行的数据
* 简称为2NF


* book_id 401 的 format 有两类值,所以另外独立一个table存取该数据。
* Third normal form
* 数据库必须在 first normal form 和 second normal form 之中
* 另设一列指向新的Table
* 简称为3NF


* Normalization in OLTP and OLAP
* OLTP
* 数据读取/写入的频率高
* 数据 normalized 到 3NF 或 BCNF
* OLAP
* 数据被读取的频率高
* 数据 de-normalized to 2NF 或 1NF
### 5. Relational Model Constraints - Advanced
* Entity integrity constraint
* Ensures that the primary key is a unique value that identifies each tuple (or row.)
* Referential integrity constraint
* Defines relationships between tables.
* Semantic integrity constraint
* Refers to the correctness of the meaning of the data.
* Domain constraint
* Specifies the permissible values for a given attribute.
* Null constraint
* Specifies that attribute values cannot be null.
* Check constraint
* Limits the values that are accepted by an attribute.