<style> html, body, .ui-content { background-color: #222121; color: #ddd; } ::selection { background-color: #d46b95; color: black; } .markdown-body:not(.next-editor) pre { padding: 16px; background-color: #353535; color: #e0def4; } .markdown-body:not(.next-editor) strong { color: #d46b95; } .markdown-body:not(.next-editor) em { color: #fbaa77; } .markdown-body:not(.next-editor) code { color: #eee; background-color: #353535; } .markdown-body h1, .markdown-body h2, .markdown-body h3, .markdown-body h4, .markdown-body h5, .markdown-body h6 { color: #dfdad9; } .markdown-body h1:hover, .markdown-body h2:hover, .markdown-body h3:hover, .markdown-body h4:hover, .markdown-body h5:hover, .markdown-body h6:hover { color: #ffff90; } .markdown-body hr { height: 0em; padding: 0; margin: 24px 0; background-color: #212121; border: 0; border-top: 3px dashed #8c8b8b; } pre { background-color:#353535; } code { background-color:#353535 } .hljs-keyword { color: #ffaacc; } .hljs-type { color: #9ccfd8 } .hljs-number { color: #fbaa77 } .hljs-operator { color: #fbea77 } .hljs-string { color: #a3be8c } .hljs-built_in { color: #fbea77 } .markdown-body h1, .markdown-body h2 { border-bottom-color: #ffffff69; } .markdown-body h1 .octicon-link, .markdown-body h2 .octicon-link, .markdown-body h3 .octicon-link,i .markdown-body h4 .octicon-link, .markdown-body h5 .octicon-link, .markdown-body h6 .octicon-link { color: #fff; } .markdown-body img { background-color: transparent; } .ui-toc-dropdown .nav>.active:focus>a, .ui-toc-dropdown .nav>.active:hover>a, .ui-toc-dropdown .nav>.active>a { color: white; border-left: 2px solid white; } .expand-toggle:hover, .expand-toggle:focus, .back-to-top:hover, .back-to-top:focus, .go-to-bottom:hover, .go-to-bottom:focus { color: white; } .ui-toc-dropdown { background-color: #212121; } .ui-toc-label.btn { background-color: #191919; color: white; } .ui-toc-dropdown .nav>li>a:focus, .ui-toc-dropdown .nav>li>a:hover { color: white; border-left: 1px solid white; } .markdown-body blockquote { color: #bcbcbc; } .markdown-body table tr { background-color: #5f5f5f; } .markdown-body table tr:nth-child(2n) { background-color: #4f4f4f; } .markdown-body code, .markdown-body tt { color: #eee; background-color: rgba(230, 230, 230, 0.36); } a, .open-files-container li.selected a { color: #5EB7E0; } </style> ⠀ ## Acessing SQL from python Why Python with MySQL * Efficient language * Portability and Platform Independent * Python supports SQL cursors. * Python supports relational database systems.⠀⠀ ### MySQL connector Mysql connector python is a module or library available in python to communicate with MySQL. #### How to establish a connection ? * Establish a connection using ‘mysql.connector.connect()’ which takes (host name, user name, password, database name(optional)) * Create a cursor object: A cursor is used to execute SQL queries and fetch results from the database. * .execute() function used with cursor to execute sql queries * Close cursor and connection ![image](https://hackmd.io/_uploads/H1KkEsVU6.png) ![image](https://hackmd.io/_uploads/BJSbVi4IT.png) ### Functions for CRUD 1. cursor.execute(query, values) ![image](https://hackmd.io/_uploads/HJrDVs4Ia.png) 2. cursor.commit() : commit the changes to the database 3. cursor.fetchone(): Fetches the next row from the result set ![image](https://hackmd.io/_uploads/ByJESsNUa.png) 5. cursor.fetchall(): Fetches all the rows from the result set returned by a SELECT query. ## Semi Structured Databases ### XML : Extensible Markup Language * XML is a standard for structuring and exchanging data over the Web in text files. * Self-describing documents. **Why XML over HTML ?** * Provides standardized way to describe data, making it machine-readable. * Chosen when primary goal is to structure and exchange data (rather than web presentation) * Allows users to define custom data structures using elements and attributes * Semantic tag names convey the meaning of data (HTML tags covey how content must be displayed) #### XML Hierarchical (Tree) Data Model Has elements and attributes * Elements are identified by start tags enclosed in angle brackets (<...>) and end tags with a slash (</...>). * Attributes provide additional information that describes elements. **Provides** * Schema Definitions * Heirarchy * Flexibility ### Structured Data * Data that follows a predefined, organized format, often stored in relational database. ### Semi Structured Data * Does not have rigid, predifined structure but posseses some level of organization/heirarchy * Displaying semi-structured data as directed graph * Internal nodes represent individual object or composite attributes * Leaf nodes represent actual atomic values. * Flexibility in terms of structure and schema handling ![image](https://hackmd.io/_uploads/ByXLwjVUT.png) ### Unstructured Data * Does not have a predefined structure or format * Lacks a clear and organized schema, making it more challenging to analyze and process * Ex: audio, text documents, web pages ## NoSQL * NoSQL databases emerged as the cost of storage dramatically decreased. * Allow developers to store huge amounts of unstructured data, giving them a lot of flexibility. * Distributed databases focused on semistructured data storage, high performance, availability, data replication, and scalability * Do not require powerful query language and concurrency control ### Characteristics of NoSQL Db **Related to related to distributed databases and distributed systems** * Scalability * Availability, Replication and Eventual Consistency * Replication Models * master-slave * master-master * Sharding: Distributes the load of accessing the file records to multiple nodes. * High-Performance Data Access **Related to data models and query languages** * Not Requiring a Schema * Less Powerful Query Languages * Versioning ### Categories of NoSQL Systems ![image](https://hackmd.io/_uploads/B1gTpoVUa.png) ### NoSQL vs SQL ![image](https://hackmd.io/_uploads/ryv-CiV8a.png) ### CAP Theorem It is not possible to guarantee all three of the desirable properties— consistency, availability, and partition tolerance at the same time in a distributed system with data replication ![image](https://hackmd.io/_uploads/BkU_RsVUa.png) ## MongoDB * Designed as a scalable database * Document-based NOSQL systems typically store data as collections of similar documents. These types of systems are also sometimes known as document stores. * MongoDB documents are stored in BSON (Binary JSON) * Documents are organized in a structure called collection. * **Document : row** in a table and **collection : entire table** 1. createCollection() `db.createCollection(“project”, { capped : true, size : 1310720, max : 500 }` ![image](https://hackmd.io/_uploads/SkikznN8T.png) 2. `db.<collection_name>.insert(<document(s)>)` ![image](https://hackmd.io/_uploads/B1ytfh4La.png) 3. Read `db.<collection_name>.find(<condition>)` ![image](https://hackmd.io/_uploads/H1GRfnN8a.png) ![image](https://hackmd.io/_uploads/HJQGX2VIa.png) ![image](https://hackmd.io/_uploads/Sky4X2ELp.png) ![image](https://hackmd.io/_uploads/rJQ6m34La.png) 4. Update `db.collection_name.update(selection_criteria, update_value)` ![image](https://hackmd.io/_uploads/S1sQNnNIa.png) * Since MongoDB is a distributed system, the two-phase commit method is used to ensure atomicity and consistency of multi-document transactions. * Most MongoDB updates are atomic if they refer to a single document 5. Delete `db.<collection_name>.remove(<condition>)` ![image](https://hackmd.io/_uploads/H1SKE348a.png) ![image](https://hackmd.io/_uploads/ry13VhNUp.png) ### Replication in MongoDB ![image](https://hackmd.io/_uploads/Ske1r2E8T.png) * The concept of replica set is used in MongoDB to create multiple copies of the same data set on different nodes in the distributed system * It uses a variation of the master-slave approach for replication. * One primary copy and at least one secondary copy of a collection stored on different nodes * A replica set must have a minimum of three participants, including an arbiter if only one secondary copy is needed. * The arbiter doesn't store data but participates in elections to select a new primary if the current one fails. * The total number of members in a replica set must be odd for the election process to function properly, with an arbiter added if necessary. * All write operations must be applied to the primary copy and then propagated to the secondaries. * The default read preference processes all reads at the primary copy, so all read and write operations are performed at the primary node. #### Sharding Sharding is a method for distributing or partitioning data across multiple machines. * no single machine can handle large workloads * Horizontal scalability/ scale out ![image](https://hackmd.io/_uploads/By3IUh48T.png) There are two ways to partition a collection into shards in MongoDB ➢ Range partitioning ![image](https://hackmd.io/_uploads/H1U9L2EIa.png) ➢ Hash partitioning ![image](https://hackmd.io/_uploads/HJUsU24IT.png) ## DynamoDB ![1 KN2Anqo6dY5uGcQvg_zb5w](https://hackmd.io/_uploads/H1IHlUXUT.png) ● Key: A unique identifier associated with a data item and is used to locate this data item rapidly. ● Value: The data item itself, and it can have very different formats for different key-value storage systems. * Table: A table in DynamoDB does not have a schema; it holds a collection of self-describing items. * Item: Each item will consist of a number of (attribute, value) pairs, and attribute values can be single-valued or multivalued. A table will hold a collection of items, and each item is a self-describing record (or object). * The table name and a primary key is required when creating a table. Primary key must exist in all the items in the table. #### Types of PK **Single Attribute/ Hash PK:** This attribute is used to build a hash index on the items in the table. **Pair of Attributes/ Hash and range type PK :** The primary key is a pair of attributes (A, B) where A is used for hashing and B is used for ordering the records which have the same A value ### Voldemort Key-Value Distributed Data Source High performance, high scalability, high availability (i.e replication, sharding, horizontal scalability) are realized through a technique to distribute the key-value pairs among the nodes of a distributed cluster known as **consistent hashing**. #### Features ![image](https://hackmd.io/_uploads/HyGQt34Up.png) ![image](https://hackmd.io/_uploads/rk64K2N8T.png) ![image](https://hackmd.io/_uploads/BJNLY2NI6.png) *This scheme allows horizontal scalability because when a new node is added to the distributed system, it can be added in one or more locations on the ring depending on the node capacity.* *This scheme also allows replication by placing the number of specified replicas of an item on successive nodes on the ring in a clockwise direction.* *The sharding is built into the method, and different items in the store (file) are located on different nodes in the distributed cluster, which means the items are horizontally partitioned (sharded) among the nodes in the distributed system.* ![image](https://hackmd.io/_uploads/ryHwqnEIT.png) ![image](https://hackmd.io/_uploads/rJ8tcn4IT.png) ## Query Optimization ![image](https://hackmd.io/_uploads/r19Fh34Ip.png) ## Neo4j ![image](https://hackmd.io/_uploads/ByGDUJWHa.png) ![image](https://hackmd.io/_uploads/BkU9UkZBa.png) ![image](https://hackmd.io/_uploads/SyIh8JZB6.png) ![image](https://hackmd.io/_uploads/Sy-gDk-H6.png) ![image](https://hackmd.io/_uploads/rJhWYvXU6.png) ![image](https://hackmd.io/_uploads/BJMfKPXUT.png) ![image](https://hackmd.io/_uploads/HJ1QYPmU6.png) ![image](https://hackmd.io/_uploads/B1qXFDmI6.png) ![image](https://hackmd.io/_uploads/SyrVFv7I6.png) ![image](https://hackmd.io/_uploads/Hk0EKD7Lp.png) ![image](https://hackmd.io/_uploads/SkIrYvmL6.png) **you can have constraints on the property labels(Employee id primary key you can set)** ![image](https://hackmd.io/_uploads/BJ1su1ZS6.png) ## Query Processing ![image](https://hackmd.io/_uploads/B1XAi1Wra.png) ![image](https://hackmd.io/_uploads/BkUw7uELp.png) ![image](https://hackmd.io/_uploads/r1JDVuN86.png) *done by optimizer* ![image](https://hackmd.io/_uploads/SJHTVd4L6.png) ![image](https://hackmd.io/_uploads/SJQ86yWS6.png) ![image](https://hackmd.io/_uploads/Sk1zyl-Bp.png) ![image](https://hackmd.io/_uploads/Byt2HdNL6.png) ![image](https://hackmd.io/_uploads/rJcw_l-Ba.png) ![image](https://hackmd.io/_uploads/B1RO_gbrp.png)