# DSCI - 551 Midterm - 1 & 2 & Final review

## Midterm - 1

- Overview
  - Video 3V
    - Volume, velocity and variety
  - Hadoop: 2 major components
    - HDFS: Hadoop Distributed File System -> store data
    - MapReduce -> process data
- Firebase Database
  - JSON data
    - valid & invalid data
  - commands
    - CRUD
      - create ```PUT``` / ```POST```
        - differences between these two?
      - read ```GET```
      - update ```PATCH```
      - delete ```DELETE```
    - filtering
      - ```orderBy``` = "\$key" / "\$value" / "name"
      - ```startAt``` / ```endAt``` / ```equalTo```
      - ```limitToFirst``` = 2
      - ```limitToLast``` = 1
- Storage systems
  - 2 kinds of devices
    - HDD
      - CHS: cylinder, head, sector
      - Physical address
      - block id
        - namenode
        - datanode
      - ==**Sequential and random workflow**== and ==**bandwidth**==
        - Metrics: transmission, latency
        - In a sequential workflow, transmission time dominates
        - The more data you choose to transmit, the closer/~~farther~~ you get to the maximum bandwidth
    - SSD
      - p/e
        - program: 1 -> 0
        - erase: 0 -> 1
        - 1: no electrons stored
        - 0: electrons stored
      - read / write / erase
        - read / write can be done on a page/~~block~~
        - erase must be done on a whole block (a number of pages)
      - Writing is slower than reading
        - ?? check
      - ==The adding page example!==
- File systems & Hadoop
  - Inode & inumber
  - Hadoop architecture
    - Namenode manages metadata.
      - Details of metadata.
      - Reading
        - getBlockLocations()
          - 3 arguments
          - 1 returned value
      - Writing
        - Data pipeline example (pages 36-39)
          - 64KB/packet
    - Datanode manages data.
- File format / encoding
  - XML
  - UTF-8
    - A sequence of bytes to decode

## Midterm - 2

- Overview
  - ER diagram
  - Relational modeling
  - SQL
  - Constraints
- ER diagram
  - There are X entities. Entity A has X attributes including a, b, c.
  - You don't need to state datatypes in an ER diagram.
  - The relationship is between entity A and entity B, and its multiplicity is 1-to-1, 1-to-many, or many-to-many.
  - Many-to-many: courses & students
  - af: ERD has a triangle, a special (is-a) relationship
  - af: superclass & subclass
    - person is a superclass while student is a subclass
- convert ERD into relational database
  - af: Entity: bars, beers, drinkers
  - af: one table per entity (E) and per relationship (R); state the key for each table
    - what is the key for table Sells? (bar, beer)
  - ER approach
    - af: superclass: person
      - id (key), name
    - af: subclass: student
      - id (key), gpa
- SQL
  - mo:
    - subquery
    - group by
    - aggregation
    - constraints
      - PK, unique, FK
  - af:
    - like operator
    - whole bunch of joins
      - cross join, theta join, natural join, outer join, left outer, right outer, full outer (union all)
      - 'select * from S, R' is also a join.
        - if this join has a where clause, we call it a theta join
        - example
          - select * from titles limit 5;
          - select * from salaries limit 5;
      - natural join: joins on all common attributes
        - example
          - select * from Beers;
          - select * from Likes;
          - these two tables don't have a common attribute
          - when you do a natural join between tables without any common attribute, it generates the Cartesian product.
        - example
          - select * from Likes;
          - select * from Frequents;
          - select * from Likes natural join Frequents;
          - select * from Likes l join Frequents f where l.drinker = f.drinker;
      - outer join
    - subqueries
    - constraints: PK, FK

# Final Exam
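
As a review sketch for the join notes in Midterm 2, the snippet below replays the Beers / Likes / Frequents examples in an in-memory SQLite database via Python's built-in `sqlite3` module. Several things here are assumptions, not taken from the course material: the column names (`name`, `manf`, `drinker`, `beer`, `bar`), the sample rows, the helper name `run_join_demo`, and the choice of SQLite, which may differ in details from the database used in class. The theta join is written with `ON` rather than `WHERE`; both forms give the same rows here.

```python
import sqlite3


def run_join_demo():
    """Build tiny Beers/Likes/Frequents tables and run the joins from the notes."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Assumed schemas and made-up sample rows, for illustration only.
    cur.executescript(
        """
        CREATE TABLE Beers(name TEXT PRIMARY KEY, manf TEXT);
        CREATE TABLE Likes(drinker TEXT, beer TEXT);
        CREATE TABLE Frequents(drinker TEXT, bar TEXT);
        INSERT INTO Beers VALUES ('Bud', 'AB'), ('Coors', 'MC');
        INSERT INTO Likes VALUES ('Ann', 'Bud'), ('Bob', 'Coors'), ('Cat', 'Bud');
        INSERT INTO Frequents VALUES ('Ann', 'Joes'), ('Ann', 'Sues'), ('Bob', 'Joes');
        """
    )

    # Beers and Likes share no column name, so the natural join
    # degenerates to the Cartesian product (2 x 3 = 6 rows).
    no_common = cur.execute("SELECT * FROM Beers NATURAL JOIN Likes").fetchall()

    # 'select * from S, R' style: an explicit cross product.
    cross = cur.execute("SELECT * FROM Beers, Likes").fetchall()

    # Likes and Frequents share 'drinker', so the natural join matches on it
    # and reports that column only once.
    natural = cur.execute("SELECT * FROM Likes NATURAL JOIN Frequents").fetchall()

    # The same result written as a theta join with an explicit condition.
    theta = cur.execute(
        "SELECT l.drinker, l.beer, f.bar "
        "FROM Likes l JOIN Frequents f ON l.drinker = f.drinker"
    ).fetchall()

    conn.close()
    # Sort in Python so the comparison does not depend on row order.
    return sorted(no_common), sorted(cross), sorted(natural), sorted(theta)


if __name__ == "__main__":
    no_common, cross, natural, theta = run_join_demo()
    print(len(no_common), "rows from Beers NATURAL JOIN Likes")
    print(natural)
```

Note that 'Cat' likes a beer but frequents no bar in this sample data, so she drops out of the natural join; keeping such rows is what the outer joins in the notes are for.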