Introduction to database

# Introduction to database ## What is database? * In computing, a database is ==an organized collection of data or a type of data== store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. :::info :pencil2: **What is data?** **Data** refers to any information that is captured and stored about an information or an object- called an entity- as well as the attributes of that entity. ::: * **Database Management Systems (DBMS)** are software systems used to store, retrieve, and run queries on data. A DBMS serves as an interface between an end-user and a database, allowing users to create, read, update, and delete data in the database. * DBMS, which include a layer of database management tools, help with the issue of the growing of a collection of data > ![sqlserver-dbms_components-h_half_column_mobile](https://hackmd.io/_uploads/B10Oe9sJR.png) --- ## What are the advantages of using the database approach? 1. Restricting unauthorized access to data 2. Providing persistent storage for program objects 3. Providing backup and recovery services 4. Enforcing integrity constraints on the database --- ## Type of the database Types of databases are broadly grouped into relational and non-relational databases. Relational databases are highly structured and understand a programming language called Structured Query Language (SQL). Non-relational databases are highly diverse, supporting a variety of data structures. Since many non-relational databases do not use SQL, they are often called NoSQL databases. > ***Relational database*** > Data is organized into tables that hold information about each entity and represent pre-defined categories through rows and columns. This structured data is both efficient and flexible to access. >:::success > A relational database stores data in separate tables rather than putting all the data in one big storeroom. >::: > ex. MySQL, PostgreSQL, SQL server > > > ***<Advantages of relational database>*** >* ACID (Atomicity, consistency, isolation, and durability) compliance >* Data accuracy (no duplicate information) >* [Normalization (eliminating redundancy and inconsistent dependency for the database](https://learn.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description) > > ***<Disadvantages of relational databases>*** >* Scalability- If the requirements of the machine are insufficient, due to data size or an increase in the frequency of access, it is required to improve the hardware in the machine, also known as vertical scaling. >* Flexibility- In relational databases, ==the schema is rigid.== You define the columns and data types for those columns, including any restraints such as format or length. :::warning ***MySQL vs PostgreSQL vs SQL server*** * **MySQL** is the most widely adopted open source relational database and serves as the primary relational data store for many popular websites, applications, and commercial products. * **SQL Server** is a leading RDBMS that is built on top of SQL and developed by Microsoft. It is used to manage and store data to support numerous enterprise use cases for business intelligence, transaction processing, data analytics, and machine learning services. It is known for its high availability, fast performance when handling large workloads, and easy integration with other applications to gain business intelligence across your entire data estate. * **PostgreSQL** is an open source object-relational database management system, released under the PostgreSQL License. ==It supports both relational (SQL) and non-relational (JSON)== and offers advanced SQL functions, including foreign keys, subqueries, and triggers. PostgreSQL is also highly extensible, allowing you to define data types and generate custom functions. * [Key difference btw SQL Server and MySQL](https://aws.amazon.com/compare/the-difference-between-sql-and-mysql/): * SQL Server supports many platforms, but not nearly the same quantity as MySQL. SQL Server primarily supports Windows, with cloud deployment options for Linux and macOS via Docker, but MySQL supports more platforms than SQL Server. * When you increase the number of rows in SQL Server, there isn’t much increase in response time. When the number of data rows increases in MySQL, you also experience an increase in the time it takes to do things. * [Key difference btw MySQL and PostgreSQL](https://aws.amazon.com/compare/the-difference-between-mysql-vs-postgresql/): * For ACID compliance, **MySQL** offers only when you use it with InnoDB and NDB Cluster storage engines or software modules. **PostgreSQL** is fully ACID compliant in all configurations. * MySQL is a purely relational database. PostgreSQL, on the other hand, is an object-relational database. This means that in PostgreSQL, you can store data as objects with properties. Working with PostgreSQL is more intuitive for database developers. PostgreSQL also supports other additional data types like arrays and XML. * ***How to choose between PostgreSQL vs MySQL:*** * PostgreSQL is better suited for enterprise-level applications with frequent write operations and complex queries. * If your application requires frequent data updates, PostgreSQL is a better choice. However, if you require frequent data reads, MySQL is preferred. * PostgreSQL has built-in multiversion concurrency control support without read-write locks, but MySQL uses write locks to achieve real concurrency. * ***[How to choose between PostgreSQL vs SQL Server](https://cloud.google.com/learn/postgresql-vs-sql):*** * SQL Server has been the choice for bigger organizations that depend on Microsoft products. PostgreSQL, however, has carved out space for itself as a free, easy-to-implement database management system that offers maximum flexibility and functionality. * PostgreSQL supports most major operating systems. It can be hosted on a wide range of operating systems, including Linux, macOS, Windows, BSD, and Solaris. On the other hand, SQL Server does not support operating systems other than Microsoft Windows, Microsoft Server, or Linux. ::: > ***Non-relational databases*** > Non-relational databases, store unstructured or semi-structured data. ==They don't use tables with columns and rows the way that relational databases do.== Instead, they use a storage model that's optimized for the specific requirements of the type of data being stored. Non-relational databases allow for larger sets of distributed data to be accessed, updated, and analyzed quickly. > :::info > ***Document database*** > Document databases store data in documents, which are usually JSON-like structures that support a variety of data types. > * Due to documents being JSON-like, they are much easier to read and understand for a user. The data is organized, and also easy to view. > * There is also no schema, meaning you can have flexibility should documents of different shapes be inserted. > * Document databases are also highly scalable, which can be scaled horizontally. This means having the databases duplicated across multiple servers, while still being kept in sync. > > ***Key-value database*** > This is the most basic type of database, where information is stored in two parts: key and value. Because everything is stored as a unique key and a value that is either the data or a location for the data, reading and writing will always be fast. However, this simplicity restricts the type of use cases it can be used for. More complex data requirements can’t be supported. > > ***Graph databases*** > Graph databases are the most specialized of the non-relational database types. They use a structure of elements called nodes that store data, and edges between them contain attributes about the relationship. Relationships are defined in the edges, which makes searches related to these relationships naturally fast. > However, they are not very good for querying the whole database, where relationships aren’t as well—or at all—defined. They also don’t have a standard language for querying, which means moving between different graph database types comes with a learning requirement. > > ***Wide-column database*** > Wide-column databases, similar to relational databases, store data in tables, columns, and rows. However, the names and formatting of the columns don’t have to match in each row. The columns can even be stored across multiple servers. They are considered two-dimensional key-value stores because they use multi-dimensional mapping to reference data by row and column. > Like two-column key-value databases, wide-column databases have the benefit of being flexible, so queries are fast. They are good at handling “big data” and unstructured data because of this flexibility. > However, compared to relational databases, wide-column databases are much slower when handling transactions. > ::: --- ## [How to chose a relational database or a non-relational database](https://www.mongodb.com/compare/relational-vs-non-relational-databases) ### *Choosing a relational database* * the data is predictable, in terms of structure, size, and frequency of access. * Normalization can help reduce the size of the data on disk by limiting duplicate data and anomalies, decreasing the risk of requiring vertical scaling in future. ### *Choosing a non-relational database* * If the data you are storing needs to be flexible in terms of shape or size, or if it needs to be open to change in future, then a non-relational database is the answer. | Feature | Relational | Non-Relational | |:------------:|:----------:|:--------------:| | Availability | High | High | | Horizontal Scaling | Low | High | | Vertical Scaling | High | High | | Data Storage | Medium to large data | Optimized for huge data volumes | | Performance | Low to Medium | High | | Reliability | High | Medium | | Complexity | Medium | Low | | Flexibility | Low | High | --- # Example- [PostgreSQL](https://www.postgresql.org/) :::info ***Architectural Fundamentals*** In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs): * A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres. * A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres. As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine. The PostgreSQL server can handle multiple concurrent connections from clients. To achieve this it starts (“forks”) a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the supervisor server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.) ::: --- ## How to use [PostgreSQL](https://www.postgresql.org/docs/current/tutorial-sql.html): ### STEP1. Creating a Database By default, Postgres uses a concept called “roles” to handle authentication and authorization. The installation procedure created a user account called postgres that is associated with the default Postgres role. There are a few ways to utilize this account to access Postgres. One way is to switch over to the postgres account on your server by running the following command: ```shell $ sudo -i -u postgres ``` Then you can access the Postgres prompt by running: ```shell $ psql ``` This will log you into the PostgreSQL prompt, and from here you are free to interact with the database management system right away. To exit out of the PostgreSQL prompt, run the following: ```shell postgres=# \q ``` This will bring you back to the postgres Linux command prompt. To return to your regular system user, run the exit command: ```shell postgres@server:~$ exit ``` --- ### STEP2. [Create a table](https://www.digitalocean.com/community/tutorials/how-to-install-and-use-postgresql-on-ubuntu-20-04) Suppose that the Postgres authentication system makes by default is that for any role used to log in, that role will have a database with the same name which it can access. This means that if the user you created in the last section is called mydb, that role will attempt to connect to a database which is also called “mydb” by default. You can create the appropriate database with the createdb command. If you are logged in as the postgres account, you would type something like: ```shell postgres@server:~$ createdb mydb ``` Once the database is created, you can connect to the database by typing: ```shell postgres@server:~$ psql mydb ``` If it is required to delete the database, you can type: ```shell postgres@server:~$ dropdb mydb ``` After connecting to the database, The basic syntax for creating tables is as follows: ```shell mydb=# CREATE TABLE weather ( city varchar(80), temp_lo int, -- low temperature temp_hi int, -- high temperature prcp real, -- precipitation date date ); ``` You can enter this into psql with the line breaks. psql will recognize that the command is not terminated until the semicolon. :::info 1. White space (i.e., spaces, tabs, and newlines) can be used freely in SQL commands. That means you can type the command aligned differently than above, or even all on one line. Two dashes (“- -”) introduce comments. 2. PostgreSQL supports the standard SQL types int, smallint, real, double precision, char(N), varchar(N), date, time, timestamp, and interval, as well as other types of general utility and a rich set of geometric types. ::: --- ### STEP3. Generate a GUI [connected to database](https://tembo.io/docs/getting-started/postgres_guides/connecting-to-postgres-with-c-sharp) :::danger Note: : [libpqxx](https://pqxx.org/libpqxx/) library is the official C++ client API for PostgreSQL, which also makes the developer use [C++ to connect the database](https://www.tutorialspoint.com/postgresql/postgresql_c_cpp.htm). However, by leveraging [.NET framework](https://dotnet.microsoft.com/en-us/learn/dotnet/what-is-dotnet-framework) to create the app in windows, [Npgsql](https://www.npgsql.org/) (written in C#) is a better candidate for this project. ::: Before starting a project, we should install the Npgsql library in your project. You can install it via .NET command line-interface by running the following command in the terminal: ```shell dotnet add package Npgsql ``` Another way is to rely on [NuGet](https://www.nuget.org/) Package Manager UI in Microsoft Visual Studio. Once installed, the following code is to create the connection: ```shell NpgsqlConnection con = new NpgsqlConnection( connectionString: "Server=localhost;Port=5432;User Id=postgres;Password=passw0rd;Database=testdb;"); con.Open(); NpgsqlCommand cmd = new NpgsqlCommand(); cmd.Connection = con; ``` --- # [Design the Database Management System](https://github.com/preslavmihaylov/booknotes/tree/master/system-design/system-design-interview/chapter23) ## [Database Schema design](https://support.microsoft.com/en-us/office/database-design-basics-eb2159cf-1e30-401a-8084-bd4f9c9ca1f5) ### What is a database schema? :::danger Note: : The term database schema is most commonly used for relational databases, which organize information in tables and use the SQL query language. ::: A database schema defines how data is organized within a relational database; this is inclusive of logical constraints such as, table names, fields, data types and the relationships between these entities. ### [Types of database schemas](https://www.ibm.com/topics/database-schema) While the term schema is broadly used, it is commonly referring to three different schema types—a conceptual database schema, a logical database schema, and a physical database schema. 1. **Conceptual database schemas** They offer a big-picture view of what the system will contain, how it will be organized, and which business rules are involved. 2. **Logical database schemas** They clearly define schema objects with information, such as table names, field names, entity relationships, and integrity constraints. 3. **Physical database schemas** They provide the technical information that the logical database schema type lacks in addition to the contextual information, such as table names, field names, entity relationships, et cetera. That is, it also includes the syntax that will be used to create these data structures within disk storage. --- # Useful reference ## [(a) How to choose a right data structure in C#](https://learn.microsoft.com/en-us/dotnet/standard/collections/)