# CS322 REPORT OF DELIVERABLE ONE
#### Part 0: Data Exploration
The data given is a subset of the famous Yelp data which consists on the feebacks of clinets on business.The
We have 4 csv files:
- tip.csv
- review.csv
- user.csv
- business.csv
Each of these csv files contains various columns which represent to some extent the attributes or characteristics of the entity. As we only want our attributes values to be atomic, we decide to separate the columns of each csv files into two kinds: atomic, non-atomic.
Here we list all fields which are not atomic because they're the more difficult to treat with.
- __user.csv__ contains the following non-atomic fileds:
- elite
- friends
- __Business.csv__ contains the following non-atomic fileds:
- attributes
- categories
- hours
- __tip.csv__ contains the following non-atomic fields:
- Null
- __review.csv__ contains the following non-atomic fileds:
- Null
#### Part 1: ER Diagram
An ER diagram should contain entity set, relation set, attributes , and also the link between entity set and relation set.
We first consider business, tips, reviews and users as four entities sets and all the atomic fields contained in the corresponding csv file are considered as attributes linked to the entity.
As for non-atomic fields, we discuss them one by one:
- __elite__ :
- a list of years
- one user can have many years for elite
- represent it by a weak entity set for user
- contain each corresponding elite year for each users as separate rows
- __friends__:
- a list of user_id of friends (we call them invited users of a particular inviting users)
- inviting users can have many or no friends, and one user can become invited users of many or no users
- represent it by a relation set
- contain user_id of both inviting users and invited users
- __attributes__:
- have 6 catogaries: Music,GoodForMeal,BusinessParking,Ambience,DietaryRestrictions, Noise
- all of them (except noise which is atomic) have its own (fixed) set of keys and corresponding atomic values (of boolean)
- represent noise as attribute of business
- represent others as individual weak entity sets for business, containing their keys as attributes
- __categories__:
- list of words with over 1000 unique value of words
- impossible to have columns for each unique value of words
- represent as attribute of business (of long string)
- __hours__:
- have 7 atomic values (time) for each weekdays
- represent by a weak entity sets for business, with each weekdays as attribute
### Constraints:
#### Entity set:
- user has user_id as __primary key__
- review has review_id as __primary key__
- business has business_id as __primary key__
- tip has no primary key. It's a __weak entity__. It has two owner sets: user and business. __User_id, business_id__ and __tip.date__ all together form a __primary key__ for tip set.
- elite is a __weak entity__ whose owner set is user. User_id and elite_year together form a __primary key__ for elite set.
- each non-atomic catagory of attributes (Music,GoodForMeal,BusinessParking,Ambience,DietaryRestrictions) are __weak entity__ whose owner set is business. business_id is both the __foreign key__ and __primary key__
- hours is a __weak entity__ whose owner set is business. business_id is both __foreign key__ and __primary key__.
| Entity set | Primary Key | Foreign Key | NOT NULL | UNIQUE |
| ------------------- | ----------------------------------| ------------------ | ------------------ | ------ |
| User | user_id | None | None | None |
| Review | review_id | user_id, business_id | user_id, business_id | None |
| Business | business_id | None | None | None |
| Tip (weak entity) | (user_id, business_id, date) | user_id, business_id | user_id, business_id | None |
| Hours(weak entity) | business_id | business_id | business_id | None |
| Music(weak entity) | business_id | business_id | business_id | None |
| GoodForMeal(weak entity) | business_id | business_id | business_id | None |
| BusinessParking(weak entity) | business_id | business_id | business_id | None |
| Ambience(weak entity) | business_id | business_id | business_id | None |
| DietaryRestrictions(weak entity)| business_id | business_id | business_id | None |
| Elite(weak entity) | (user_id,year) | user_id | user_id,year | None |
| Relation set | Primary Key | Foreign Key | NOT NULL | UNIQUE |
| ------------ | ------------------ | --------------- | --------------- | ------ |
| friends | (userID1,userID2 ) | userID1,userID2 | userID1,userID2 | None |
## DDL code
```sql
CREATE TABLE Ambiance(
CASUCAL BOOLEAN,
CLASSY BOOLEAN,
DIVEY BOOLEAN,
HIPSTER BOOLEAN,
INTIMATE BOOLEAN,
ROMANTIC BOOLEAN,
TOURISTY BOOLEAN,
TRENDY BOOLEAN,
UPSCALE BOOLEAN,
PRIMARY KEY (BUSINESS_ID),
FOREIGN KEY (BUSINESS_ID) REFERENCE BUSINESS,
)
CREATE TABLE HaveBusinessParking(
GARAGE BOOLEAN,
LOT BOOLEAN,
STREET BOOLEAN,
VALET BOOLEAN,
VALIDATED BOOLEAN,
PRIMARY KEY (BUSINESS_ID),
FOREIGN KEY (BUSINESS_ID) REFERENCE BUSINESS,
)
CREATE TABLE HaveDietaryRestrictions(
DIARY_FREE BOOLEAN,
GULTEN_FREE BOOLEAN,
HALAL BOOLEAN,
KOSHER BOOLEAN,
SOY_FREE BOOLEAN,
VEGAN BOOLEAN,
VEGETARIAN BOOLEAN
PRIMARY KEY (BUSINESS_ID),
FOREIGN KEY (BUSINESS_ID) REFERENCE BUSINESS,
)
CREATE TABLE HaveGoodForMeal(
BREAKFAST BOOLEAN,
BRUNCH BOOLEAN,
DESSERT BOOLEAN,
DINNER BOOLEAN,
LATENIGHT BOOLEAN,
LUNCH BOOLEAN,
PRIMARY KEY (BUSINESS_ID),
FOREIGN KEY (BUSINESS_ID) REFERENCE BUSINESS,
)
CREATE TABLE HaveMusic(
BACKGROUND_MUSIC BOOLEAN,
DJ BOOLEAN,
JUKEBOX BOOLEAN,
KARAOKE BOOLEAN,
LIVE BOOLEAN,
NO_MUSIC BOOLEAN,
VIDEO BOOLEAN,
BUSINESS_ID CHAR [22]
PRIMARY KEY (BUSINESS_ID),
FOREIGN KEY (BUSINESS_ID) REFERENCE BUSINESS,
)
CREATE TABLE Open_in_Hours(
MONDAY CHAR[11], # e.g. 8:00-19:00
TUESDAY CHAR[11]
WEDNESDAY CHAR[11]
THURSDAY CHAR[11]
FRIDAY CHAR[11]
SATURDAY CHAR[11]
SUNDAY CHAR[11]
PRIMARY KEY (BUSINESS_ID),
FOREIGN KEY (BUSINESS_ID) REFERENCE BUSINESS,
)
CREATE TABLE BUSINESS(
BUSINESS_ID CHAR [22]
POSTE_CODE VARCHAR[16] # max poste code lenth in data=8
ADDRESS VARCHAR[128] # max address lenth in data=118
CITY VARCHAR[64] # max name lenth in data =50
LATITUDE FLOAT,
LONGITUDE FLOAT,
IS_OPEN BOOLEAN,
NAME VARCHAR(128), #MAX IN DATA =64
REVIEW_COUNT INT.
STARS INT,
STATE CHAR[2], # AZ,ON...
NOISE_LEVEL CHAR [10],#{'None', 'average', 'loud', 'quiet', 'very_loud'}
CATEGORIES TEXT, # max of lenth of categories in data =550
)
CREATE TABLE YELP_USER (
average_stars FLOAT,
compliment_cool INTEGER,
compliment_cute INTEGER,
compliment_funny INTEGER,
compliment_hot INTEGER,
compliment_list INTEGER,
compliment_more INTEGER,
compliment_note INTEGER,
compliment_photos INTEGER,
compliment_plain INTEGER,
compliment_profile INTEGER,
compliment_writer INTEGER,
cool INTEGER,
fans INTEGER,
funny INTEGER,
name varchar(64),
review_count INTEGER,
useful INTEGER,
user_id char(22),
yelping_since DATETIME,
PRIMARY KEY (user_id)
);
CREATE TABLE ELITE(
awardee char(22),
year INTEGER,
PRIMARY KEY (awardee,year),
FOREIGN KEY (awardee) REFERENCES YELP_USER(user_id)
)
CREATE TABLE FRIENDS(
inviting char(22),
invited char(22),
PRIMARY KEY (inviting,invited),
FOREIGN KEY (inviting) REFERENCES YELP_USER(user_id),
FOREIGN KEY (invited) REFERENCES YELP_USER(user_id)
)
create table TIP (
business_id char(22),
compliment_count INTEGER,
date DATETIME,
text varchar(1024),
user_id char(22),
PRIMARY key (business_id,date,user_id),
FOREIGN KEY (business_id) REFERENCES BUSINESS(business_id),
FOREIGN KEY (user_id) REFERENCES YELP_USER(user_id),
);
create TABLE REVIEW(
business_id char(22),
cool INTEGER,
date DATETIME,
funny FLOAT,
review_id char(22),
stars FLOAT,
text varchar(1024),
useful FLOAT,
user_id char(22),
PRIMARY key (review_id),
FOREIGN KEY (business_id) REFERENCES BUSINESS(business_id),
FOREIGN KEY (user_id) REFERENCES YELP_USER(user_id),
)
```