# DDIA - chp 2 ## Introduction - Each abstraction layer hides the complexity of the layers below it by providing a clean data model. - Focus on data storage and querying ## Relation model vs Document model - Relation model - Still dominant nowadays - The goal is to hide that implementation detail behind a cleaner interface - NoSQL - Scalability - Free and open sourced - Specialized operations - Dynamic and expressive data model ### The object-relational database mismatch - Object programming issue (ORM) - Example ![](https://i.imgur.com/uQdNg1G.png) ```json { "user_id": 251, "first_name": "Bill", "last_name": "Gates", "summary": "Co-chair of the Bill & Melinda Gates... Active blogger.", "region_id": "us:91", "industry_id": 131, "photo_url": "/p/7/000/253/05b/308dd6e.jpg", "positions": [ {"job_title": "Co-chair", "organization": "Bill & Melinda Gates Foundation"}, {"job_title": "Co-founder, Chairman", "organization": "Microsoft"} ], "education": [ {"school_name": "Harvard University", "start": 1973, "end": 1975}, {"school_name": "Lakeside School, Seattle", "start": null, "end": null} ], "contact_info": { "blog": "http://thegatesnotes.com", "twitter": "http://twitter.com/BillGates" } } ``` ### Many-to-one and Many-to-many relationships - Why Id instead of fixed readable text - Consistent style and spelling across profiles - Avoiding ambiguity - Localization support - Better search - Example ![](https://i.imgur.com/GcE5xN5.png) ### Are Document database repeating history - History - Hierarchical model → relation model → document model - Hierarchical model - document based tree structure - Network model - The only way of accessing a record was to follow a path from a root record along these chains of links. This was called an access path - Traversal of a linked list - Relation model - Query optimizer - Comparison to document databases - Different to hierarchical model: storing nested records within their parent record rather than in a separate table. ## Relational Versus Document Databases Today ### Document database pros - Good for document-like structured application - Schema flexibility - Example ```json if (user && user.name && !user.first_name) { // Documents written before Dec 8, 2013 don't have first_name user.first_name = user.name.split(" ")[0]; } ``` ```sql ALTER TABLE users ADD COLUMN first_name text; UPDATE users SET first_name = split_part(name, ' ', 1); -- PostgreSQL UPDATE users SET first_name = substring_index(name, ' ', 1); -- MySQL ``` ### Document database cons - Cannot refer directly to a nested item within a document - many-to-many relationships: The poor support for joins - Data locality for queries ### Convergence of document and relational databases - Relational database support document data type - Document database side, RethinkDB supports relational-like joins ## Query Languages for Data ### Imperative query - Example ```sql function getSharks() { var sharks = []; for (var i = 0; i < animals.length; i++) { if (animals[i].family === "Sharks") { sharks.push(animals[i]); } } return sharks; } ``` ### Declarative query - Parallel execution - Better optimization - Example ```sql SELECT * FROM animals WHERE family = 'Sharks'; ``` ### Declarative Queries on the Web - Declarative ![](https://i.imgur.com/tQkUWBM.png) ![](https://i.imgur.com/rtm3RIc.png) - Imperative query ![](https://i.imgur.com/kAHuOVF.png) ## MapReduce Querying - SQL version ![](https://i.imgur.com/yq8U6BO.png) - MongoDB’s MapReduce ![](https://i.imgur.com/ccneizr.png) - Result ![](https://i.imgur.com/sWqYhKt.png) - MongoDB aggregation pipeline ![](https://i.imgur.com/Njm8FKD.png) ## Graph-Like Data Models ### Property Graphs - Representing a property graph using a relational schema ![](https://i.imgur.com/qsI7F0O.png) ### The Cypher Query Language - Cypher query ![](https://i.imgur.com/ZsEHlu3.png) - Cypher query to find people who emigrated from the US to Europe ![](https://i.imgur.com/NTrdPxr.png) - Declarative query optimization ### Graph Queries in SQL - Example ![](https://i.imgur.com/8dEiErO.png) ![](https://i.imgur.com/L2yGoGS.png) ### Triple-Stores and SPARQL - Subset of the data in Figure 2-5, represented as Turtle triples ![](https://i.imgur.com/1Xiobgy.png) - Another form ![](https://i.imgur.com/Q7dnghC.png) - The semantic web - The RDF data model ![](https://i.imgur.com/wKccAq3.png) - The SPARQL query language ![](https://i.imgur.com/ADLeilb.png) ### The Foundation: Datalog - A subset of the data in Figure 2-5, represented as Datalog facts ![](https://i.imgur.com/Nn910l1.png) - The same query as Example 2-4, expressed in Datalog ![](https://i.imgur.com/iNcSfdj.png) ![](https://i.imgur.com/4PlkfqO.png) ## Summary - Hierarchical data model - Relation data model - NoSQL database - Document databases - Graph databases - Other domain specified database