# Database Design Database connection (We use Azure Database instead of GCP Database): ![](https://i.imgur.com/wBcomdS.png) ![](https://i.imgur.com/CllN389.png) ## tables overview ![](https://i.imgur.com/HlwOy0Y.png) <img src="https://i.imgur.com/JopwGUN.png" style="zoom: 50%;" /> <img src="https://i.imgur.com/0CRVJxz.png" style="zoom: 50%;" /> <!-- ![](https://i.imgur.com/y6W4zzN.png) --> <img src="https://i.imgur.com/y6W4zzN.png" style="zoom: 50%;" /> <img src="https://i.imgur.com/j0xeNPD.png" style="zoom: 50%;" /> <!-- ![](https://i.imgur.com/Rn8Rhgx.png) --> <img src="https://i.imgur.com/Rn8Rhgx.png" style="zoom: 50%;" /> <img src="https://i.imgur.com/eWii8Cr.png" style="zoom: 50%;" /> ## DDL commands ```mysql -- ----------------------------------------------------- -- Table account -- ----------------------------------------------------- DROP TABLE IF EXISTS account; CREATE TABLE IF NOT EXISTS account ( UserID INT NOT NULL AUTO_INCREMENT, EmailAddr VARCHAR(50), LastName VARCHAR(50), FirstName VARCHAR(50), LoginToken VARCHAR(50), PRIMARY KEY (UserID) ); -- ----------------------------------------------------- -- Table attends -- ----------------------------------------------------- DROP TABLE IF EXISTS attends; CREATE TABLE IF NOT EXISTS attends( UserID INT NOT NULL, EventID INT NOT NULL, PRIMARY KEY (UserID, EventID), FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (EventID) REFERENCES movieevent(EventID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table buys -- ----------------------------------------------------- DROP TABLE IF EXISTS buys; CREATE TABLE IF NOT EXISTS buys ( DateTime DATETIME, TransactionID VARCHAR(50), UserID INT, Amount REAL, PaymentMethod VARCHAR(50), PRIMARY KEY(TransactionID), FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table credentials -- ----------------------------------------------------- DROP TABLE IF EXISTS credentials ; CREATE TABLE IF NOT EXISTS credentials ( PasswordHash VARCHAR(50), PasswordSalt VARCHAR(50), UserID INT, PRIMARY KEY (UserID), FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table movie -- ----------------------------------------------------- DROP TABLE IF EXISTS movie; CREATE TABLE IF NOT EXISTS movie ( IMDB_Number VARCHAR(50), title VARCHAR(50), PRIMARY KEY (IMDB_Number) ); -- ----------------------------------------------------- -- Table dislikes -- ----------------------------------------------------- DROP TABLE IF EXISTS dislikes; CREATE TABLE IF NOT EXISTS dislikes ( UserID INT, Movie VARCHAR(50), PRIMARY KEY (UserID, Movie), FOREIGN KEY (Movie) REFERENCES movie (IMDB_Number) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table isfriendwith -- ----------------------------------------------------- DROP TABLE IF EXISTS isfriendwith; CREATE TABLE IF NOT EXISTS isfriendwith ( User1 INT, User2 INT, pending_status BOOL, PRIMARY KEY (User1, User2), FOREIGN KEY (User1) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (User2) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table movie_nite.likes -- ----------------------------------------------------- DROP TABLE IF EXISTS likes; CREATE TABLE IF NOT EXISTS likes ( UserID INT, Movie VARCHAR(50), PRIMARY KEY (UserID, Movie), FOREIGN KEY (Movie) REFERENCES movie(IMDB_Number) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table membership -- ----------------------------------------------------- DROP TABLE IF EXISTS membership ; CREATE TABLE IF NOT EXISTS membership ( membershipID INT AUTO_INCREMENT, ExpDate DATE, Level INT, UserID INT, PRIMARY KEY (membershipID), FOREIGN KEY (UserID) REFERENCES account (UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table movieevent -- ----------------------------------------------------- DROP TABLE IF EXISTS movieevent ; CREATE TABLE IF NOT EXISTS movieevent ( EventID INT AUTO_INCREMENT, Location VARCHAR(50), DateTime DATETIME, Host INT, PRIMARY KEY (EventID), FOREIGN KEY (Host) REFERENCES account (UserID) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table reviews -- ----------------------------------------------------- DROP TABLE IF EXISTS reviews ; CREATE TABLE IF NOT EXISTS reviews ( UserID INT, Movie VARCHAR(50), Comments VARCHAR(255), Time DATETIME, Date DATE, PRIMARY KEY (UserID, Movie), FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (Movie) REFERENCES movie(IMDB_Number) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table scores -- ----------------------------------------------------- DROP TABLE IF EXISTS scores ; CREATE TABLE IF NOT EXISTS scores ( UserID INT, Movie VARCHAR(10), Score DOUBLE, PRIMARY KEY(UserID, Movie), FOREIGN KEY (UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (Movie) REFERENCES movie (IMDB_Number) ON DELETE CASCADE ON UPDATE CASCADE ); -- ----------------------------------------------------- -- Table movievote -- ----------------------------------------------------- DROP TABLE IF EXISTS movievote; CREATE TABLE IF NOT EXISTS movievote ( UserID INT, EventID INT, IMDB_Number VARCHAR(50), VoteTime DateTime, VoteCount INT, PRIMARY KEY(UserID, EventID, IMDB_Number), FOREIGN KEY(UserID) REFERENCES account(UserID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY(EventID) REFERENCES movieevent(EventID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY(IMDB_Number) REFERENCES movie(IMDB_Number) ON DELETE CASCADE ON UPDATE CASCADE ); ``` ## Advanced query example1 Find the movie(s) after a chosen year with highest number of votes among all your friends during a certain period of time. Input: `currUser`, `startTime`, `endTime`, `startYear` values: * currUser : 30 * startTime : \'2022-02-13 18:59:35\' * endTime : now() * startYear: 1990 ```mysql SELECT SUM(VoteCount) as votes, title From MovieVote natural join movie Where UserID in (Select User1 from isFriendWith Where User2 = currUser ) OR UserID in (Select User2 from isFriendWith Where User1 = currUser ) AND VoteTime >= startTime AND VoteTime <= endTime AND year_released >= startYear Group by IMDB_Number Order By votes desc Limit 15; ``` result: ![](https://i.imgur.com/4KcMCz9.png) ## EXPLAIN ANALYZE before adding indexes ```bash -> Limit: 15 row(s) (actual time=7.489..7.491 rows=15 loops=1) -> Sort: votes DESC, limit input to 15 row(s) per chunk (actual time=7.488..7.490 rows=15 loops=1) -> Table scan on <temporary> (actual time=0.001..0.009 rows=99 loops=1) -> Aggregate using temporary table (actual time=7.430..7.446 rows=99 loops=1) -> Nested loop inner join (cost=902.25 rows=2000) (actual time=0.103..5.704 rows=984 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())))) (cost=202.25 rows=2000) (actual time=0.079..1.248 rows=984 loops=1) -> Table scan on MovieVote (cost=202.25 rows=2000) (actual time=0.031..0.865 rows=2000 loops=1) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.013..0.020 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.001..0.001 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.010..0.010 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.007..0.007 rows=0 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())) and (movie.year_released >= 1990))) (cost=0.25 rows=1) (actual time=0.004..0.004 rows=1 loops=984) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=movievote.IMDB_Number) (cost=0.25 rows=1) (actual time=0.004..0.004 rows=1 loops=984) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.013..0.020 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.001..0.001 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.010..0.010 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.007..0.007 rows=0 loops=1) ``` ### EXPLAIN ANALYZE after adding index on `title` ```bash -> Limit: 15 row(s) (actual time=10.672..10.674 rows=15 loops=1) -> Sort: votes DESC, limit input to 15 row(s) per chunk (actual time=10.671..10.672 rows=15 loops=1) -> Table scan on <temporary> (actual time=0.001..0.010 rows=99 loops=1) -> Aggregate using temporary table (actual time=10.616..10.630 rows=99 loops=1) -> Nested loop inner join (cost=902.25 rows=2000) (actual time=0.102..8.189 rows=984 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())))) (cost=202.25 rows=2000) (actual time=0.078..1.538 rows=984 loops=1) -> Table scan on MovieVote (cost=202.25 rows=2000) (actual time=0.028..1.074 rows=2000 loops=1) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.001..0.001 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.013..0.019 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.001..0.001 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.011..0.011 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.008..0.008 rows=0 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())) and (movie.year_released >= 1990))) (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=984) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=movievote.IMDB_Number) (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=984) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.001..0.001 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.013..0.019 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.001..0.001 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.011..0.011 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.008..0.008 rows=0 loops=1) ``` ### EXPLAIN ANALYZE after adding index on `VoteTime` ```bash -> Limit: 15 row(s) (actual time=11.067..11.069 rows=15 loops=1) -> Sort: votes DESC, limit input to 15 row(s) per chunk (actual time=11.066..11.067 rows=15 loops=1) -> Table scan on <temporary> (actual time=0.001..0.007 rows=99 loops=1) -> Aggregate using temporary table (actual time=11.010..11.024 rows=99 loops=1) -> Nested loop inner join (cost=902.25 rows=2000) (actual time=0.086..9.418 rows=984 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())))) (cost=202.25 rows=2000) (actual time=0.064..1.146 rows=984 loops=1) -> Table scan on MovieVote (cost=202.25 rows=2000) (actual time=0.029..0.768 rows=2000 loops=1) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.009..0.014 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.000..0.000 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.012..0.012 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.009..0.009 rows=0 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())) and (movie.year_released >= 1990))) (cost=0.25 rows=1) (actual time=0.008..0.008 rows=1 loops=984) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=movievote.IMDB_Number) (cost=0.25 rows=1) (actual time=0.008..0.008 rows=1 loops=984) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.009..0.014 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.000..0.000 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.012..0.012 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.009..0.009 rows=0 loops=1) ``` ### EXPLAIN ANALYZE after adding index on `year_released` ```bash -> Limit: 15 row(s) (actual time=9.881..9.883 rows=15 loops=1) -> Sort: votes DESC, limit input to 15 row(s) per chunk (actual time=9.880..9.881 rows=15 loops=1) -> Table scan on <temporary> (actual time=0.001..0.008 rows=99 loops=1) -> Aggregate using temporary table (actual time=9.831..9.844 rows=99 loops=1) -> Nested loop inner join (cost=902.25 rows=2000) (actual time=0.103..7.760 rows=984 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())))) (cost=202.25 rows=2000) (actual time=0.075..1.503 rows=984 loops=1) -> Table scan on MovieVote (cost=202.25 rows=2000) (actual time=0.028..1.082 rows=2000 loops=1) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.001..0.001 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.013..0.019 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.001..0.001 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.011..0.011 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.007..0.007 rows=0 loops=1) -> Filter: (<in_optimizer>(movievote.UserID,movievote.UserID in (select #2)) or (<in_optimizer>(movievote.UserID,movievote.UserID in (select #3)) and (movievote.VoteTime >= TIMESTAMP'2022-02-13 18:59:35') and (movievote.VoteTime <= <cache>(now())) and (movie.year_released >= 1990))) (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=984) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=movievote.IMDB_Number) (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=984) -> Select #2 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User1)) (actual time=0.001..0.001 rows=0 loops=41) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=41) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User1=movievote.UserID) (actual time=0.001..0.001 rows=0 loops=41) -> Materialize with deduplication (actual time=0.002..0.002 rows=0 loops=41) -> Index lookup on isFriendWith using User2 (User2=30) (cost=2.26 rows=20) (actual time=0.013..0.019 rows=20 loops=1) -> Select #3 (subquery in condition; run only once) -> Filter: ((movievote.UserID = `<materialized_subquery>`.User2)) (actual time=0.001..0.001 rows=0 loops=1) -> Limit: 1 row(s) (actual time=0.000..0.000 rows=0 loops=1) -> Index lookup on <materialized_subquery> using <auto_distinct_key> (User2=movievote.UserID) (actual time=0.000..0.000 rows=0 loops=1) -> Materialize with deduplication (actual time=0.011..0.011 rows=0 loops=1) -> Index lookup on isFriendWith using PRIMARY (User1=30) (cost=0.35 rows=1) (actual time=0.007..0.007 rows=0 loops=1) ``` ### Example1 index report We explored three different indexes for query example1, which are `VoteTime`, `year_released`, `title`. The first index we chose is `VoteTime` since we did a range comparison with `VoteTime` (from end time to start time). But adding the index on `VoteTime` did not improve the performance. We think that the reason is the whole table scan of `movievote` cannot be avoided with the index on `VoteTime`. The second index we chose is `year_released` since there is one comparison in our query (look for movies released after startYear). Adding the index on `year_released` did not improve the performance either. The whole table scan of `movievote` still cannot be avoided with the index on `year_released`. The last index we chose is `title`. We were running out of choices since we have `UserID`, `EventID`, `IMDB_Number`, `VoteTime`, `VoteCount` in movieVote table and `IMDB_Number`, `title`, `genre`, `year_released` in movie table. Except for primary keys such as `UserID`, `EventID`, `IMDB_Number`, we can only choose from `VoteTime`, `VoteCount`, `title`, and `year_released`. We've done analysis on `VoteTime` and `year_released`, so we decided to choose `title` in this case. The index on title did not improve the performance since whole table scan of `movievote` cannot be avoided. ## Advanced query example2 Find the title, score and genre of movie(s) that are recently scored by users when searching for a genre (keyword search), in descending order. input: `currGenre`, `StartTime`, `endTime` * currGenre : "Oth%" * StartTime : \'2022-02-13 18:59:35\' * endTime : now() ```mysql SELECT title, Score, genre FROM movie JOIN scores ON movie.IMDB_Number = scores.Movie WHERE IMDB_Number IN ( SELECT IMDB_Number FROM movie WHERE genre LIKE currGenre ) AND ScoreTime >= StartTime AND ScoreTime <= endTime ORDER BY Score DESC Limit 15; ``` Result: ![](https://i.imgur.com/4YZrI0a.png) ### EXPLAIN ANALYZE before adding indexes ```bash -> Limit: 15 row(s) (actual time=2.244..2.342 rows=15 loops=1) -> Nested loop inner join (cost=1496.78 rows=356) (actual time=2.243..2.341 rows=15 loops=1) -> Nested loop inner join (cost=910.14 rows=3200) (actual time=2.236..2.288 rows=16 loops=1) -> Sort: scores.Score DESC (cost=323.50 rows=3200) (actual time=2.213..2.214 rows=16 loops=1) -> Filter: (scores.ScoreTime >= TIMESTAMP'2022-02-13 18:59:35') (actual time=0.079..1.886 rows=1090 loops=1) -> Table scan on scores (actual time=0.076..1.401 rows=3200 loops=1) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.004..0.004 rows=1 loops=16) -> Filter: (movie.genre like 'Oth%') (cost=0.25 rows=0) (actual time=0.003..0.003 rows=1 loops=16) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.003..0.003 rows=1 loops=16) ``` ### EXPLAIN ANALYZE after adding index on `genre` ```bash -> Limit: 15 row(s) (actual time=3.822..3.904 rows=15 loops=1) -> Nested loop inner join (cost=1004.22 rows=513) (actual time=3.821..3.902 rows=15 loops=1) -> Nested loop inner join (cost=910.14 rows=513) (actual time=3.815..3.864 rows=15 loops=1) -> Sort: scores.Score DESC (cost=323.50 rows=3200) (actual time=3.730..3.731 rows=16 loops=1) -> Filter: (scores.ScoreTime >= TIMESTAMP'2022-02-13 18:59:35') (actual time=0.059..3.379 rows=1090 loops=1) -> Table scan on scores (actual time=0.056..2.963 rows=3200 loops=1) -> Filter: (movie.genre like 'Oth%') (cost=0.25 rows=0) (actual time=0.004..0.004 rows=1 loops=16) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.004..0.004 rows=1 loops=16) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=15) ``` ### EXPLAIN ANALYZE after adding index on `Score` ```bash -> Limit: 15 row(s) (actual time=0.544..0.745 rows=15 loops=1) -> Nested loop inner join (cost=551.62 rows=5) (actual time=0.543..0.742 rows=15 loops=1) -> Nested loop inner join (cost=280.48 rows=45) (actual time=0.531..0.643 rows=16 loops=1) -> Filter: (scores.ScoreTime >= TIMESTAMP'2022-02-13 18:59:35') (cost=9.34 rows=45) (actual time=0.508..0.528 rows=16 loops=1) -> Index scan on scores using score_idx (reverse) (cost=9.34 rows=135) (actual time=0.500..0.516 rows=53 loops=1) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.007..0.007 rows=1 loops=16) -> Filter: (movie.genre like 'Oth%') (cost=0.25 rows=0) (actual time=0.006..0.006 rows=1 loops=16) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.005..0.005 rows=1 loops=16) ``` ### EXPLAIN ANALYZE after adding index on `ScoreTime` ```bash -> Limit: 15 row(s) (actual time=2.957..3.158 rows=15 loops=1) -> Nested loop inner join (cost=1508.50 rows=356) (actual time=2.956..3.156 rows=15 loops=1) -> Nested loop inner join (cost=916.00 rows=3200) (actual time=2.943..3.045 rows=16 loops=1) -> Sort: scores.Score DESC (cost=323.50 rows=3200) (actual time=2.912..2.915 rows=16 loops=1) -> Filter: (scores.ScoreTime >= TIMESTAMP'2022-02-13 18:59:35') (actual time=0.068..2.395 rows=1090 loops=1) -> Table scan on scores (actual time=0.065..1.822 rows=3200 loops=1) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.008..0.008 rows=1 loops=16) -> Filter: (movie.genre like 'Oth%') (cost=0.25 rows=0) (actual time=0.006..0.007 rows=1 loops=16) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.006..0.006 rows=1 loops=16) ``` ### Example2 index report We tried three different indexes in this example, which are `genre`, `Score`, `ScoreTime`. First, we believed the index `genre` could help us search the related genre much faster when compared with table scan, but in reality there's no satisfying improvement for searching efficiency, we believe it is due to the table scan of `scores`, which has already inner joined with `movie` and it tried to sort the entire score table first as: ``` -> Sort: scores.Score DESC (cost=323.50 rows=3200) (actual time=2.912..2.915 rows=16 loops=1) -> Filter: (scores.ScoreTime >= TIMESTAMP'2022-02-13 18:59:35') (actual time=0.068..2.395 rows=1090 loops=1) -> Table scan on scores (actual time=0.065..1.822 rows=3200 loops=1) -> Single-row index lookup on movie using PRIMARY (IMDB_Number=scores.Movie) (cost=0.25 rows=1) (actual time=0.008..0.008 rows=1 loops=16) ``` Thus, the index did not improve the efficiency since other operations happened before the filtering. As for the `ScoreTime` index, it didn't help this query a lot from the analysis, which is due to the table scan of `scores` . The reason why we chose this index is that we wanted to make use of index range search in the filter of ScoreTime. When it comes to the index for`Score`, we chose it because we thought it could help us to sort faster, and we could see an obviously improvement for the query, the cost of `Score DESC` costs 323.50, but improved to 9.34 with `score_idx`, so we will keep this index for an efficient search.