# DSCI - 551 Midterm - 1 & 2 & Final review
## Midterm - 1
- Overview
- Video 3V
- Volume, velocity and variety
- Hadoop: 2 major components
- HDFS: Hadoop Distributed File System -> store data
- MapReduce -> process data
- Firebase Database
- JSON data
- valid & invalid data
- commands
- CRUD
- create ```PUT``` / ```POST```
- differences between these two?
- read ```GET```
- update ```PATCH```
- delete ```DELETE```
- filtering
- ```orderBy``` = "\$key" / "\$value" / "name"
- ```startAt``` / ```endAt``` / ```equalTo```
- ```limitToFirst``` = 2
- ```limitToLast``` = 1
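A minimal sketch of how the filtering parameters above combine into a REST query URL. The database URL and the `users` node are hypothetical, and `query_url` is a helper invented for illustration; it only builds the URL and sends nothing over the network. String values such as `"$key"` must keep their double quotes, which is why each value is JSON-encoded.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical database URL and node name, used only for illustration.
BASE_URL = "https://example-project.firebaseio.com"


def query_url(path, order_by, **filters):
    """Build a REST query URL; values are JSON-encoded so strings keep their quotes."""
    params = {"orderBy": json.dumps(order_by)}
    for name, value in filters.items():
        params[name] = json.dumps(value)
    return f"{BASE_URL}/{path}.json?{urlencode(params)}"


# First two children ordered by key.
first_two = query_url("users", "$key", limitToFirst=2)
# Children whose "name" falls between "A" and "M", keeping only the last one.
by_name = query_url("users", "name", startAt="A", endAt="M", limitToLast=1)

print(first_two)
print(by_name)
# Decode the query string again to see the quoted values.
print(parse_qs(urlsplit(by_name).query))
```

On the PUT / POST question: PUT writes to the exact path given (replacing what is there), while POST adds a new child under an auto-generated key.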
- Storage systems
- 2 kinds of devices
- HDD
- CHS: cylinder, head, sector
- Physical address
- block id
- namenode
- datanode
- ==**Sequential and random workflow**== and ==**bandwidth**==
- Metrics: transmission, latency
- In sequential workflow, transmission time dominates
- The more data you choose to transmit, the closer to/~~farther away from~~ the maximum bandwidth you get
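The bandwidth point can be checked with a little arithmetic. This is a sketch under assumed numbers (10 ms of latency per request and a 100 MB/s maximum bandwidth, neither taken from the notes): effective bandwidth is the data size divided by latency plus transmission time.

```python
def effective_bandwidth(size_mb, latency_s=0.010, max_bandwidth_mb_s=100.0):
    """Return achieved MB/s for one request of size_mb.

    latency_s and max_bandwidth_mb_s are assumed example values.
    """
    transmission_s = size_mb / max_bandwidth_mb_s
    return size_mb / (latency_s + transmission_s)


# Larger transfers amortize the fixed latency, so the achieved
# bandwidth approaches (but never reaches) the maximum.
for size_mb in (0.1, 1, 10, 100):
    print(f"{size_mb:>6} MB -> {effective_bandwidth(size_mb):6.2f} MB/s")
```

Small (random-style) requests are dominated by latency; large sequential transfers are dominated by transmission time.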
- SSD
- p/e
- program: 1 -> 0
- erase: 0 -> 1
- 1: no electrons (no charge stored)
- 0: electrons (charge stored)
- read / write / erase
- read / write can be done on page/~~block~~
- erase must be done on a whole block (a number of pages)
- Writing is slower than reading
- ?? check
- ==The adding page example!==
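A toy model of the program/erase rules above. It does not reproduce the lecture's adding-page example; `FlashBlock`, the 8-bit page, and the 4-page block are all simplifications invented for illustration. Programming may only flip bits 1 -> 0, and the only way to get a 1 back is to erase the whole block.

```python
PAGE_ERASED = 0xFF  # an erased 8-bit toy page: every bit is 1


class FlashBlock:
    """Hypothetical block made of a few 8-bit pages."""

    def __init__(self, pages=4):
        self.pages = [PAGE_ERASED] * pages

    def program(self, index, data):
        """Write one page; allowed only if no bit needs to go 0 -> 1."""
        if self.pages[index] & data != data:
            raise ValueError("page needs 0 -> 1; erase the whole block first")
        self.pages[index] = data

    def erase(self):
        """Erase works on the whole block: every page returns to all 1s."""
        self.pages = [PAGE_ERASED] * len(self.pages)


block = FlashBlock()
block.program(0, 0b10101010)      # fine: only 1 -> 0 transitions
try:
    block.program(0, 0b11110000)  # would need some bits to go 0 -> 1
except ValueError as err:
    print("rejected:", err)
block.erase()                     # resets every page in the block
block.program(0, 0b11110000)      # now allowed
```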
- File systems & Hadoop
- Inode & inumber
- Hadoop architecture
- Namenode manages metadata.
- Details of metadata.
- Reading
- getBlockLocations()
- 3 arguments
- 1 returned value
- Writing
- Data pipeline example (pages 36-39)
- 64KB/packet
- Datanode manages data.
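A toy sketch of the read and write bullets above. It assumes the three arguments of `getBlockLocations()` are the file path, a byte offset, and a length, and that the one returned value is the list of blocks (with their datanodes) covering that range. The metadata dictionary, the file path, and the 128 MB block size are hypothetical; the real call lives in Hadoop's Java client protocol, not in Python.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size, not stated in the notes
PACKET_SIZE = 64 * 1024         # 64 KB per packet, as in the notes

# Hypothetical namenode metadata: file path -> ordered blocks and replica holders.
NAMENODE_METADATA = {
    "/user/demo/data.txt": [
        {"block_id": "blk_1", "datanodes": ["dn1", "dn2", "dn3"]},
        {"block_id": "blk_2", "datanodes": ["dn2", "dn3", "dn4"]},
    ]
}


def get_block_locations(src, offset, length):
    """Return the blocks that overlap bytes [offset, offset + length)."""
    blocks = NAMENODE_METADATA[src]
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return blocks[first:last + 1]


def packets_per_block(block_size=BLOCK_SIZE, packet_size=PACKET_SIZE):
    """Number of packets needed to stream one block (ceiling division)."""
    return -(-block_size // packet_size)


print(get_block_locations("/user/demo/data.txt", 0, 10))
print(packets_per_block())
```

The client asks the namenode only for locations; the bytes themselves travel between the client and the datanodes.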
- File format / encoding
- XML
- UTF-8
- A sequence of bytes to decode
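A short illustration of "a sequence of bytes to decode": UTF-8 uses a variable number of bytes per character, so decoding has to read the byte sequence rather than assume one byte per character. The sample string is arbitrary.

```python
text = "\u00e9\u20acA"  # e-acute, euro sign, and plain ASCII "A"

encoded = text.encode("utf-8")
print(encoded)

# Each character takes a different number of bytes in UTF-8.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in text}
print(byte_counts)

# Decoding the same byte sequence recovers the original text.
decoded = encoded.decode("utf-8")
print(decoded == text)
```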
## Midterm - 2
- Overview
- ER diagram
- Relational modeling
- SQL
- Constraints
- ER diagram
- There are X entities. Entity A has X attributes including a, b, c.
- You don't need to say datatype for ER diagram.
- The relationship is between entity A and entity B. And its multiplicity is 1-to-1, 1-to-many, or many-to-many.
- Many-to-many: courses & students
- af: ERD has triangle, a special relationship
- af: superclass & subclass
- person is a super class while student is a subclass
- convert ERD into relational database
- af: Entity: bars, beers, drinkers
- af: table for E, R, state key for each table
- what is the key for table Sells: (bar, beer)
- ER approach
- af: superclass: person
- id (key), name
- af: subclass: student
- id (key), gpa
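A minimal sketch of the ERD-to-tables conversion above, using Python's built-in `sqlite3` so it runs without a database server. The `price` column on `Sells` and the sample rows are assumptions for illustration; the keys follow the notes: `(bar, beer)` for `Sells`, and `id` for both the superclass and the subclass table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Person  (id INTEGER PRIMARY KEY, name TEXT);
-- ER approach: the subclass table repeats the key and adds its own attributes.
CREATE TABLE Student (id INTEGER PRIMARY KEY REFERENCES Person(id), gpa REAL);
-- Relationship table: the key is the pair of entity keys.
CREATE TABLE Sells   (bar TEXT, beer TEXT, price REAL, PRIMARY KEY (bar, beer));
""")

conn.execute("INSERT INTO Person VALUES (?, ?)", (1, "Ann"))
conn.execute("INSERT INTO Student VALUES (?, ?)", (1, 3.8))
conn.execute("INSERT INTO Sells VALUES (?, ?, ?)", ("Joe's", "Bud", 3.0))


def try_insert(sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


# Same (bar, beer) pair again: rejected by the primary key.
print(try_insert("INSERT INTO Sells VALUES (?, ?, ?)", ("Joe's", "Bud", 3.5)))
# A student with no matching person: rejected by the foreign key.
print(try_insert("INSERT INTO Student VALUES (?, ?)", (2, 3.0)))
```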
- SQL
- mo:
- subquery
- group by
- aggregation
- constraints
- PK, unique, FK
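A small sketch of group by, aggregation, and a subquery, run through `sqlite3`. The `salaries` columns (`emp_no`, `salary`) and the rows are made up for illustration and are not the course dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (emp_no INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [(1, 100), (1, 120), (2, 80), (3, 200)],  # hypothetical rows
)

# Group by + aggregation: one average per employee.
avg_per_emp = conn.execute(
    "SELECT emp_no, AVG(salary) FROM salaries GROUP BY emp_no ORDER BY emp_no"
).fetchall()

# Subquery: employees with some salary above the overall average.
above_avg = conn.execute(
    "SELECT DISTINCT emp_no FROM salaries "
    "WHERE salary > (SELECT AVG(salary) FROM salaries) ORDER BY emp_no"
).fetchall()

print(avg_per_emp)
print(above_avg)
```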
- af:
- like operator
- whole bunch of join
- cross join, theta join, natural join, outer join, left outer, right outer, full outer (union all)
- 'select * from S, R' is also a join.
- if this join has a where, we call it a theta join
- example
- select * from titles limit 5;
- select * from salaries limit 5;
- natural join: join all common attributes
- example
- select * from Beers;
- select * from Likes;
- these two tables don't have a common attribute
- when you do a natural join between tables without any common attribute, the result is the Cartesian product.
- example
- select * from Likes;
- select * from Frequents;
- select * from Likes natural join Frequents;
- select * from Likes l join Frequents f where l.drinker = f.drinker;
- outer join
- subqueries
- constraints: PK, FK
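The join examples above can be tried with `sqlite3`. This is a sketch with made-up rows; the column names (`Beers(name, manf)`, `Likes(drinker, beer)`, `Frequents(drinker, bar)`) are assumed, and SQLite's behavior may differ in details from the MySQL setup used in class.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Beers     (name TEXT, manf TEXT);
CREATE TABLE Likes     (drinker TEXT, beer TEXT);
CREATE TABLE Frequents (drinker TEXT, bar TEXT);
INSERT INTO Beers VALUES ('Bud', 'AB'), ('Miller', 'MC');
INSERT INTO Likes VALUES ('Ann', 'Bud'), ('Ann', 'Miller'), ('Bob', 'Bud'), ('Cal', 'Bud');
INSERT INTO Frequents VALUES ('Ann', 'Joe''s'), ('Bob', 'Joe''s'), ('Bob', 'Sue''s');
""")


def rows(sql):
    """Run a query and return all result rows."""
    return conn.execute(sql).fetchall()


# Natural join: matches on the common attribute (drinker) and keeps it once.
natural = rows("SELECT * FROM Likes NATURAL JOIN Frequents")
# Theta join: same matching rows, but both drinker columns are kept.
theta = rows("SELECT * FROM Likes l JOIN Frequents f WHERE l.drinker = f.drinker")
# No common attribute: the natural join degenerates to the Cartesian product.
cross = rows("SELECT * FROM Beers NATURAL JOIN Likes")
# Left outer join: Cal has no Frequents row, so bar is padded with NULL.
left = rows("SELECT * FROM Likes l LEFT OUTER JOIN Frequents f ON l.drinker = f.drinker")

for name, result in [("natural", natural), ("theta", theta), ("cross", cross), ("left", left)]:
    print(name, len(result), "rows,", len(result[0]), "columns")
```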
# Final Exam