---
title: Database Systems Pre-Lab 5
---
<h1 style='border: none'><center>Database Systems Pre-Lab 5</center></h1>
<h2 style='border: none'><center>Intermediate SQL</center></h2>
<h5><center>The Islamic University of Gaza<br>Engineering Faculty<br>Department of Computer Engineering</center></h5>
<h6>Authors: Usama R. Al Zayan & Rasha E. Kahil<span style="float:right">2023/03/16</span></h6>
---
## Expecteed outcomes
* To be introduced to Database Authorization commands and concepts.
* To learn how to define, alter, and use Views and Materialized Views in PostgreSQL.
* To practice working with dates in SQL.
* To learn how to import CSV file into a PostgreSQL table.
## Lab 5: Time and Plan
| Tasks | Timing |
| -------- | -------- |
| Quiz 5 | 10 min |
| Task 1 | 10 min |
| Task 2 | 40 min |
| Task 3 | 15 min |
| Task 4 | 20 min |
## Table of Content
[toc]
# Introduction
<p style="text-align:justify">
In the last couple of labs, we had the chance to work with SQL and define many queries with a fair amount of complexity. We had the chance to look at topics like aggregate functions, joins, string functions, nested subqueries and many more. In this lab, we continue our SQL journey by exploring 3 of the most important topics in practice. Authorization, Views, and working with Dates or temporal data in general.
</p>
We start by introducing a nice new statement that will be useful in out quest.
## The EXPLAIN Statement
<p style="text-align:justify">
The <B>explain</B> statement is a special statement in PostgreSQL that -by itself- does not introduce any changes to the database or the query. <B>Explain</B> returns a description of the execution plan the query planner generates for a given statement, it also shows how the tables involved in the statement will be scanned by index scanned or sequential scan, etc. Furthermore, one of the most important and useful information that the <B>explain</B> statement gives is start-cost before the first row can be returned and the total cost to return the complete result set.
</p>
The following shows the syntax of the explain statement:
<center>
```sql
explain [ (option [, ...]) ] sql_statement;
```
</center>
Where option can be one of many options but we are now interested in:
* ANALYZE
* VERBOSE
### ANALYZE
<p style="text-align:justify">
The <B>analyze</B> option causes the SQL statement to be executed first and then actual run-time statistics is returned, including total elapsed time expended within each phase and the number of rows that phase actually returned. The <B>analyze</B> statement actually executes the SQL statement, therefore, if you want to analyze any DML statement such as <B>insert</B>, <B>update</B>, or <B>delete</B> without affecting the data, you should wrap the <B>explain analyze</B> in a transaction like the following example:
</p>
```sql=
begin;
explain analyse update instructor i
set salary = salary + (select count(*) from teaches t where t.id = i.id) * 1000;
rollback;
```
Note: you can write analyse or analyze. Both will work. Yes, PostgreSQL is a respectable DBMS š
### VERBOSE
<p style="text-align:justify">
The <B>verbose</B> parameter allows you to show additional information regarding the plan. This parameter sets to false by default. Try adding it after analyze and observe the extra details you get.
</p>
We will use `explain analyse` many times in this lab and to following one to be able to note in details what is actually happening when we execute a statement.
# Simple Database Authorization
<p style="text-align:justify">
Database authorization is a security mechanism used to determine access levels or user privileges related to database resources and objects. Our database security system is based on a two-step process. The first step is authentication, which verifies the user identity. The second stage is authorization, which is the process of the database system determining which database operations the user can perform on which objects. Our database security system is role-based, which allows the user to use his various privileges on resources and objects based on his given role/roles.
</p>
## Roles
<p style="text-align:justify">
PostgreSQL uses the concept of roles to manage database access permissions. A role can be considered as a "user", a "group", or both, depending on how the role is set up. A role that has login right is called user. A role may also be a member of other roles, which are known as groups.
Roles can own database objects {relations, functions, views, etc.} and have database privileges. A role can control who has access to which object by assign privileges on this object to other roles. Furthermore, it is possible to grant membership in a role to another role.
</p>
```sql=
--To list out all existing roles on postgresql server
Select * from pg_roles;
```
In psql command line, `\du` command is useful for listing the existing roles.
### Creating roles
```sql=
--To define a new database role
CREATE ROLE rolename [ [ WITH ] [options ... ] ];
```
<p style="text-align:justify">
Only superusers and users with <B>CREATEROLE</B> privilege can create new roles. However, if you wish to create a new superuser, you must connect as a superuser, <B>CREATEROLE</B> privilege is not enough in this case. The default user āpostgresā is a superuser. Being a superuser implies the ability to bypass all access permission checks within the database, so superuserdom should not be granted lightly!
<B>CREATE USER</B> is the same as <B>CREATE ROLE</B> except that it implies LOGIN. Other than that, <B>CREATE GROUP</B> and <B>CREATE USER</B> are aliases for <B>CREATE ROLE</B>.
</p>
### Role attributes
Attributes of a database role define roleās privileges including login, superuser, database creation, role creation, password, etc.
```sql=
--To creates a role that has login privilege, password, and valid date
CREATE ROLE user1 WITH LOGIN PASSWORD 'pg@pass1999' VALID UNTIL '2023-04-01';
--To creates a superuser which means this role can bypass all authorization checks
CREATE ROLE super_admin SUPERUSER;
--To create a role with database and role creation privilege
CREATE ROLE admin CREATEDB CREATEROLE;
```
**<U>Important Note:</U>** if you skip the clause `PASSWORD` when crating a role, PostgreSQL will assign that role a `password = NULL`. A role cannot log in when it has a NULL password so make sure you include it.
### Role membership
<p style="text-align:justify">
Managing roles as groups allows us to grant or revoke privileges from a group as a whole. In PostgreSQL, you create a role that represents a group, and then grant membership in the group role to individual user roles.
By convention, a group role does not have LOGIN privilege. To create a group role use the CREATE ROLE statement as follows:
</p>
<center>
```sql
CREATE ROLE group_role;
```
</center>
```sql=
--To add user roles or group roles to a group role
GRANT group_role TO user_role1, user_role2,..., group_role1;
--And to remove a user role from a group role
REVOKE group_role FROM user_role1, user_role2, ...;
```
PostgreSQL does not allow the circular membership loops, in which a role is the member of another role and vice versa. So be aware not to create membership chains.
A member of a group role can use the privileges of the role **<U>in two ways:</U>**
#### SET Role
<p style="text-align:justify">
The member temporarily ābecomesā the group role. In this state, the member database session has access to the privileges of the group role rather than the original login role, and any database objects created are considered owned by the group role not the original member role.
</p>
#### Role inheritance
The member role that has the INHERIT attribute automatically has use of the privileges of roles of which he is member, including any privileges inherited by those roles.
Carefully follow the example below:
```sql=
CREATE ROLE Ahmad LOGIN INHERIT;
CREATE ROLE sales NOINHERIT;
CREATE ROLE marketing;
GRANT sales to Ahmad;
GRANT marketing to sales;
```
If you connect to PostgreSQL as `Ahmad`, you will have privileges of `Ahmad`, plus privileges granted to `sales`, because Ahmad user role has the INHERIT attribute. However, you do not have privileges of `marketing` because the `NOINHERIT` attribute is defined for the sales user role.
When you are connected as `Ahmad`, and after executing the following statement:
```sql=
SET ROLE marketing;
```
You will have only those privileges granted to `marketing`, not the ones granted to `Ahmad`. To restore the original privileges, you can use the following statement:
```sql=
RESET ROLE;
```
**Notice** that only privileges on database objects are inheritable. `LOGIN`, `SUPERUSER`, `CREATEROLE`, and `CREATEDB` are special roles that cannot be inherited as an ordinary privilege.
#### Drop roles
<p style="text-align:justify">
NOTE: Before removing a role, you must reassign or remove all objects it owns and revoke its privileges.
If you remove a group role, PostgreSQL revokes all memberships in a group automatically. The user roles of the group are not affected.
</p>
<center>
```sql
DROP ROLE role_name;
```
</center>
## Authorization Specification
<p style="text-align:justify">
When a database object is created, it is assigned an owner. For most kinds of objects, the initial state is that only the owner (or a superuser) can do anything with the object. The owner could, however, choose to revoke some of their own privileges for safety or security reasons. To allow other roles to use the object, privileges must be granted.
</p>
A <B>privilege</B> is a right to execute a particular type of SQL statement or to access another user's object.
There are different kinds of privileges: `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `REFERENCES`, `CREATE`, and `CONNECT`.
### The GRANT Command
Grant gives privileges and confers authorizations on database objects.
The grantor of the privilege must be the owner, superuser, or must hold the privilege on the specified item with the explicit permeation to give them to others āwith grant optionā.
```sql
GRANT {{privileges list} | ALL PRIVILEGES ( column_name [, ...] )
{privileges list} | ALL PRIVILEGES ( column_name [, ...] ) ... }
ON { [ TABLE ] table_or_view_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
-- ON SCHEMA schema_name [, ...]
-- ON DATABASE database_name [, ...]
TO role_specification [, ...] [ WITH GRANT OPTION ]
```
1. `ALL [PRIVILEGES]` -- Grant all of the privileges available for the object's type.
The `PRIVILEGES` keyword is optional in PostgreSQL.
2. `role_specification` -- Can be: `role_name` | `PUBLIC` | `CURRENT_USER` | `SESSION_USER`.
3. `PUBLIC` can be thought of as an implicitly defined group that always includes all roles.
Any particular role will have the sum of privileges granted directly to it, privileges granted to any role it is presently a member of, and privileges granted to `PUBLIC`.
### The REVOKE Command
Revoke is used to strip previously granted privileges from one or more roles.
```sql
REVOKE {{privileges list} | ALL PRIVILEGES ( column_name [, ...] )
{privileges list} | ALL PRIVILEGES ( column_name [, ...] ) ... }
ON { [ TABLE ] table_or_view_name [, ...] | ALL TABLES IN SCHEMA schema_name [, ...] }
-- ON SCHEMA schema_name [, ...]
-- ON DATABASE database_name [, ...]
FROM role_specification [, ...] [ cascade | restrict ]
```
<U>Note:</U> Revoking a privilege from `PUBLIC` does not necessarily mean that all roles have lost that privilege on the object; those who have it granted directly or via another role will still have it!
# Views In PostgreSQL
<p style="text-align:justify">
As we have seen in our lecture, a view is a database object that is in reality just a stored query. We have seen that a view can be accessed as a virtual table. In other words, we can issue select statements on it, we can join it with other tables or views, or use it anyway we use a real table. Keep in mind that a view does not store data physically, each time we use the view the DBMS runs the stored query and constructs the table to be used. This is the case for all view except for a materialized view which we will explore later.
</p>
<p style="text-align:justify">
A view can be very useful in many cases. It helps simplify the complexity of a query because we can query a view -which can be based on a complex query- using a simple select statement. Also, like a table; you can grant permission to users through a view that contains specific data those users are authorized to see.
</p>
To create a view, we use create view statement like the following example:
```sql=
create view student_credits as select name, tot_cred from student;
```
To make simple changes to the defining query of a view, we can use the same create statement with or replace addition as follows:
```sql=
create or replace view student_credits as
select name, tot_cred from student where tot_cred is not null;
```
<p style="text-align:justify">
<B>Note</B> that in PostgreSQL using the create or replace clause you cannot remove an existing column in the view. </p>
:::warning
:warning:
If you try to do it, you will get an error message:
ā[Err] ERROR: cannot drop columns from viewā.
:::
<p>The new query must generate the same columns that were generated when the view was created. To be more specific, the new columns must have the same names, same data types, and in the same order as they were created. However, you can append additional columns at the end of the column list.
</p>
<p style="text-align:justify">
To be able to drastically change the defining query of a view, your only good option is to drop that view and create it again. You can do both the drop of the old view and the creation of the new one in a single transaction to be safe. But things get more complicated if your view has dependent objects. This is beyond the scope of this lab but you are encouraged to read about it.
</p>
<p style="text-align:justify">
However, if you wish to make simple changes to the view such as changing its name or giving its columns a default value (that can be used in the insert) you can use the alter view statement. For example:
</p>
```sql=
alter view a rename to b;
create table base_table (id int, ts timestamptz);
create view a_view as select * from base_table;
alter view a_view alter column ts set default now();
insert into base_table(id) values(1); -- ts will receive a null
insert into a_view(id) values(2); -- ts will receive the current time
```
:::info
For your vast Knowledge: For historical reasons, alter table can be used with views too!
:::
<p style="text-align:justify">
It is also worth noting that a view is not only a stored query, PostgreSQL stores additional information and parameters such as the schema of the view when created. You can test thatš. Views created with <code>select * from</code> wouldn't "automatically inherit" the new column, because * gets expanded into a column-list during view creation. Yet if you dropped and re-created the view, the column would appear. You can follow up on the pervious example by executing the following statements and observing the results!
</p>
```sql=
select * from base_table;
select * from a_view;
alter table base_table add new_column varchar(20);
select * from base_table;
select * from a_view;
```
Finally, I guess I donāt have to tell you that you can drop a view use drop view. Anyway, letās include that example for the sake of completion.
```sql=
drop view if exists a_view;
```
## View Usage Example
<p style="text-align:justify">
As discussed in Section 4.4.8 in our textbook, we expect the constraint āan instructor cannot teach sections in two different classrooms in a semester in the same time slotā to hold. Letās use a view to help us list all (instructor, section) combinations that violate this constraint.
</p>
First, we create a view showing the ID, name, and all section information for each section thought by any instructor in our university.
```sql=
create or replace view all_sections as
select
ID, "name" as instructor_name , course_id, sec_id, semester,
"year", time_slot_id, building , room_number
from
instructor natural join teaches natural join "section";
```
We can use this view to answer many queries such as
* How many sections did an instructor teach in a certain year?
* How many sessions does every room serve?
* What is the course with the largest number of different instructors throughout time?
* And so on.
So now we can use this view to write our main query looking for instructors teaching sections in two different classrooms in a semester in the same time slot. The query will be as simple as:
```sql=
select ID
from all_sections
group by (ID, instructor_name, sec_id, semester, "year", time_slot_id)
having count( (building , room_number) ) > 1
```
## Materialized Views
<p style="text-align:justify">
Materialized views extend the view concept to allow views to store data physically. A materialized view can cache the result of a complex or expensive query and allow us to read those results repeatedly without having to execute the query every time. We can still run the query when we want to refresh this result. Materialized views are useful in many cases that require fast data access therefore they are often used in data warehouses or business intelligent applications.
</p>
To create a materialized view, you use the create materialized view statement
```sql
create materialized view view_name as
<<SQL query>>
with [no] data;
```
<p style="text-align:justify">
If we want to load data into the materialized view at the creation time, we use with data option, otherwise we put with no data. In case we use with no data, the view is flagged as unreadable. It means that we cannot query data from the view until you load data into it. To load data into a materialized view, we use the refresh materialized view statement.
</p>
For example, if we wish to materialize the results of our view āall sectionsā we can use:
```sql=
create materialized view mat_all_sections as
select
ID, "name" as instructor_name , course_id, sec_id,
semester, "year", time_slot_id, building , room_number
from
instructor natural join teaches natural join "section"
with no data;
```
If we then try to query this view before refreshing it, **we get an error**. Try it and note the hint.
# Dates
<p style="text-align:justify">
Dates hold a special place in terms of value. Sometimes the date and time an event occurs are the most important information about it, and for that SQL gives date and time the special treatment they deserve. In addition to providing date, time, timestamp, and interval datatypes. SQL provides a rich collection of date functions we can use to achieve the exact results we need. We will first look into datesā datatypes then we will explore the functions.
</p>
## Dates Datatypes
### DATE
<p style="text-align:justify">
PostgreSQL uses 4 bytes to store a date value, so the lowest and highest values a date can have are 4713 BC and 5874897 AD. When storing a date value, PostgreSQL -like all database systems-uses the <code>yyyy-mm-dd</code> format. It also uses this format for inserting data into a date column.
</p>
<p style="text-align:justify">
If you create a table that has a date column and you want to use the current date as the default value for the column, you can use the CURRENT_DATE after the default keyword. Also, if you wish to use it in your query you can write select CURRENT_DATE;
</p>
To extracting day, month, or year from a date value, we use the extract function as follows:
```sql=
Select
CURRENT_DATE,
extract (day from CURRENT_DATE) as day,
extract (month from CURRENT_DATE) as month,
extract (year from CURRENT_DATE) as year;
```
### TIME
<p style="text-align:justify">
The TIME data type that allows you to store the time of day values. A time value may have a precision up to 1 microsecond. The TIME data type requires 8 bytes and its allowed range is from 00:00:00 to 24:00:00. Besides the TIME data type, PostgreSQL provides the āTIME with time zoneā data type that allows you to store and manipulate the time of day with time zone. To get the current time with time zone, you use CURRENT_TIME. You can also use in your query and specify the precession. To get the local time, you use LOCALTIME.
</p>
To extracting hours, minutes, seconds from a time value, we use the extract function as follows:
```sql=
Select
LOCALTIME,
extract (hour from LOCALTIME) as hour,
extract (minute from LOCALTIME) as minute,
extract (second from LOCALTIME) as second,
extract (milliseconds from LOCALTIME) as milliseconds,
extract (microseconds from LOCALTIME) as microseconds;
```
### Timestamp
<p style="text-align:justify">
As we have seen in our lecture, timestamp data type allows us to store both date and time. PostgreSQL provides two temporal data types for handling timestamp, one without time zone (timestamp) and one with time zone (timestamptz). PostgreSQL stores the timestamptz in UTC value. When you insert a value into a timestamptz column, PostgreSQL converts the timestamptz value into a UTC value and stores the UTC value in the table. When you query timestamptz from the database, PostgreSQL converts the UTC value back to the time value of the time zone set by the database server, the user, or the current database connection. Both timestamp and timestamptz uses 8 bytes for storing the timestamp value.
</p>
You can get the time zone of your server by issuing the command: `show TIMEZONE;`
To set the time zone of your database server you can sue `set timezone = 'your new t z';`
To get the current timestamp you can use the now () function for example: `select now ();`
To get the time of day in the string format, you can use the `timeofday()` function.
Similar to date and time, to extracting hours, minutes, seconds, day, month, year from a timestamp value, we use the extract function exactly as mentioned above;
### Interval
<p style="text-align:justify">
The interval data type allows us to store and manipulate a period of time in years, months, days, hours, minutes, seconds, etc. An interval value requires 16 bytes storage size that can store a period in the range from -178,000,000 years to 178,000,000 years. In addition, an interval value can have an optional precision value p with the permitted range is from 0 to 6. The precision p is the number of fraction digits retained in the second fields.
</p>
The following examples show some interval values:
```sql=
select interval '2 weeks ago';
select interval '2 hours 10 minutes';
select interval '2 years 1 month 3 days 6 hours 12 minutes 30 seconds';
```
You can apply the arithmetic operators (+, -, *, /) to the interval values, for examples:
```sql=
select interval '2h 50m' + interval '10m' -- is 03:00:00
select interval '2h 50m' - interval '50m' -- is 02:00:00
select interval '2h 20m' / 2 -- is 01:10:00
select 600 * interval '1 minute' -- is 10:00:00
```
<p style="text-align:justify">
Similar to date, time, and timestamp, to extracting hours, minutes, seconds, day, month, year from an interval value, we use the extract function. Note that the returned value here is equivalent to ācastingā the part of this interval as another smaller interval. It will not do time conversion for you.
Discuss the following examples with your colleague:
</p>
```sql=
select extract (day from interval '2 weeks ago'); -- is -14
select extract (hour from interval '2 weeks ago'); -- is 0
select extract (hour from interval '2.5 days ago'); -- is -12
select extract (year from interval '6 month'); -- is 0
select extract (year from interval '30 month'); -- is 2
select extract (month from interval '30 month'); -- is 6
```
### Special Date/Time Inputs
| Input String | Valid Types | Description |
| ------------ | ----------- | ----------- |
| infinity | date, timestamp | later than all other time stamps |
| -infinity | date, timestamp | earlier than all other time stamps |
| Now | date, time, timestamp | current transaction's start time |
| Today | date, timestamp | midnight (00:00) today |
| tomorrow | date, timestamp | midnight (00:00) tomorrow |
| yesterday | date, timestamp | midnight (00:00) yesterday |
| allballs | time | 00:00:00.00 UTC |
## Date Functions
### Age()
<p style="text-align:justify">
As the name suggests, age() function will calculate the age for you. If you pass one date value to the age() function, it will subtract that date value from the current date. If you pass two arguments to the age() function, it will subtract the second argument from the first argument.
</p>
### date_part(text, timestamp)
This function is exactly equivalent to extract function we explored above. It appears in many contexts so when you see it now you know :smile:.
### make_date(year int, month int, day int)
It simply makes a date. You can also write: date ā2020-03-07ā
### date_trunc(āfieldā, timestamp/interval)
<p style="text-align:justify">
The function date_trunc is conceptually similar to truncating numbers. In the function you can specify a field that tells to which precision to truncate the input value. The return value is likewise of type timestamp, or interval, and it has all fields that are less significant than the selected one set to zero (or one, for day and month).
</p>
Valid values for field are: `{microseconds, milliseconds, second, minute, hour, day, week, month, quarter, year, decade, century, millennium}`.
Examples:
```sql=
select date_trunc('hour', timestamp '2001-02-16 20:38:40');
--result: 2001-02-16 20:00:00
select date_trunc('year', timestamp '2001-02-16 20:38:40');
--result: 2001-01-01 00:00:00
select date_trunc('day', timestamp '2001-02-16 20:38:40');
--result: 2001-02-16 00:00:00
select date_trunc('hour', interval '3 days 02:47:33');
--result: 3 days 02:00:00
```
### SQL overlaps operator
<center>
```
(start1, end1) OVERLAPS (start2, end2)
or
(start1, length1) OVERLAPS (start2, length2)
```
</center>
<p style="text-align:justify">
This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; overlaps automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that 5-6 and 6-7 do not overlap.
</p>
Try the following example:
```sql=
select (date '2001-02-16', date '2001-12-21') overlaps
(date '2001-10-30', date '2002-10-30'); -- True
select (date '2001-02-16', interval '100 days') overlaps
(date '2001-10-30', date '2002-10-30'); -- False
select (date '2001-10-29', date '2001-10-30') overlaps
(date '2001-10-30', date '2001-10-31'); -- False
select (date '2001-10-30', date '2001-10-30') overlaps
(date '2001-10-30', date '2001-10-31'); -- True
```
**Unfortunately**, our university Database contains no dates to experiment with. Donāt worry, in the next segment we will import a dataset that will allow us to experiment and do some exercises with dates.
# More pre-Lab
## Importing data from a CSV file
<p style="text-align:justify">
PostgreSQL provides a nice way of importing data into tables even from files such as csv files that can be exported from almost any program. PostgreSQL uses a streaming protocol called the copy protocol that reads data and inserts it to the table. If you are using PostgreSQL from the shell, you can use the <code>\copy</code> command. If you use DBeaver or pgAdmin, both of them provide a GUI to perform the import.
</p>
We wish to import the file named `web_events.csv` uploaded for you [here](https://drive.google.com/file/d/1zG2i0YFzRwJsgCkS6t-i1Oae5YmG4KHD/view). But first we must create the table that will house the data.
```sql=
create table web_events (
event_time timestamptz,
event_type text,
product_id text,
category_id text,
category_code text,
brand text,
price numeric,
user_id text,
user_session text );
```
After creating the table and refreshing your schema you can
1. Navigate to the table inside your Database Navigator menu
2. Right click on the table and choose āimport Dataā
3. From the newly opened menu choose āCSV import from CSV file(s)ā
4. Add āweb_events.csvā that you have downloaded as a resource.
5. In āImporter settingsā section, make sure the option āset empty strings to NULLā is active.
6. Click Next, Next, Next, and start.
After this process is completed you will have all new data imported to your new table `web_events`.
This data is a subset of events happening on an online store for the entire month of October 2019. You can explore the data by running any select statements you find useful. Some useful ones I will think of are:
```sql=
select * from web_events limit 10;
select count(*) from web_events ;
select count(distinct user_session) from web_events;
select count(distinct event_time) from web_events;
select count(distinct product_id) from web_events;
select count(distinct category_id) from web_events;
select count(distinct user_id) from web_events;
select distinct event_type from web_events;
```
Try executing those quires and any others you think of. Write down your notes about his dataset and try to discuss this with your colleague.
Exploring large datasets and making sense of them using SQL commands can prove to be a very important skill in your arsenal, so make sure you practice this whenever you get a chance.
## Queries with date values and functions
We can now use this dataset to answer the following queries.
1. Show the number of `purchase` events that happened every day of this month ordered descending from the day with the most events to the least.
```sql=
select
date_trunc('day' , event_time) as day,
count(*) as Number_of_purchase_events
from web_events
where event_type = 'purchase'
group by 1
order by 2 desc;
```
Note how we used column numbers (in the order they appear in the select clause) in both the group by and order by classes. This makes writing the query easier. Be carful when you do that while adding or removing columns from your query as the order of the columns now matter.
2. Show the day of the week that contains the most purchase events (you can add limit 1 if you like).
```sql=
select
to_char( event_time, 'Day' ) as day_of_week,
count(*) as Number_of_purchase_events
from web_events
where event_type = 'purchase'
group by 1
order by 2 desc;
```
3. Show the hour of the day with the least events (you can add limit 1 if you like).
```sql=
select
extract(hour from event_time) as hour_of_day,
count(*) as Number_of_events
from web_events
group by 1
order by 2;
```
4. Show the daily total sales throughout the month, ordered descending from the day with the most sales (by value) to the day with the least.
```sql=
Select
date_trunc('day' , event_time) as day,
sum(price) as totoal_sales
from web_events
where event_type = 'purchase'
group by 1
order by 2 desc;
```
5. Show the day(s) that has the maximum number of purchase events.
```sql=
with daily_purchase as (
select
date_trunc('day' , event_time) as day,
count(*) as Number_of_purchase_events
from web_events
where event_type = 'purchase'
group by 1
),
max_purchases as (
select max(Number_of_purchase_events) as value
from daily_purchase
)
select * from
daily_purchase, max_purchases
where max_purchases.value = daily_purchase.Number_of_purchase_events;
```
6. Show the day that has the maximum number of different visitors.
```sql=
select
date_trunc('day' , event_time) as day,
count (distinct user_id) as visitors
from web_events
group by 1
order by 2 desc;
```
7. For each week, show the trending category, the category has products that are viewed the most.
```sql=
with weekly_cat_report as (
select
extract(week from event_time) as week,
category_code,
count(*) as number_of_view_events
from web_events
where event_type = 'view'
group by 1,2
)
select *
from weekly_cat_report r1
where r1.number_of_view_events >= all ( select r2.number_of_view_events
from weekly_cat_report r2
where r1.week = r2.week);
```
8. Show the minimum, maximum, and average time interval between the view of a product, and adding it to cart by buyers.
```sql=
with time_of_first_view as (
select user_id , product_id , min (event_time) as first_view
from web_events
where event_type = 'view'
group by 1,2
), time_of_first_cart as (
select user_id , product_id , min (event_time) as first_cart
from web_events
where event_type = 'cart'
group by 1,2
)
select
min(first_cart - first_view),
max(first_cart - first_view),
avg(first_cart - first_view)
from time_of_first_view natural join time_of_first_cart
where first_cart > first_view;-- because our dataset is not complete!
```
9. For each week, show the product that is the most frequently viewed but not purchased.
```sql=
with product_views as (
select extract(week from event_time) as week,
product_id , count(*) as view_count
from web_events
where event_type = 'view'
group by 1,2
), purchased_products as (
select distinct product_id
from web_events
where event_type = 'purchase'
), weekly_views_of_not_purchased as(
select week, product_id, view_count from product_views
where product_id not in (select product_id from purchased_products)
) select *
from weekly_views_of_not_purchased v1
where v1.view_count >= all (select view_count
from weekly_views_of_not_purchased v2
where v1.week = v2.week )
```
###### tags: `Database Systems` `Pre-Lab` `IUG` `Computer Engineering`
<center>End Of Pre-Lab 5</center>