# Project Summary - Team 13

## 1. Motivation and problem definition

<div style="text-align: justify">

The goal of this project is to predict the rating a user would give to a restaurant they have not yet visited, based on their previous interests and behavior. Such predictions benefit both businesses and users. A business can use them to identify potential customers and design a business plan that maximizes the satisfaction of its target customers. In addition, Yelp can use them to recommend new businesses to its users and increase platform usage. This creates a self-reinforcing loop: users engage with the platform more often, their additional activity makes business ratings more accurate, and the cycle keeps generating new data.

## 2. Algorithms and methods

We plan to build the recommendation system around two main approaches.

### 2.1 Content-based filtering

Content-based filtering uses item features to recommend items similar to those a user already likes. It relies on the user's previous actions (e.g., past visits) or on explicit feedback (e.g., ratings). In this setting, we build separate profiles for users and for restaurants and define a metric that measures the distance between a user profile and a restaurant profile. As a baseline, we will use the cosine distance between the user vector and the item vector to estimate the user's preference for that item. A restaurant's item profile may include its city, state, categories, attributes, etc. A user's profile will be the weighted average of the profiles of the restaurants that user has rated.

### 2.2 Collaborative filtering

Collaborative filtering (CF) uses similarities between users and items simultaneously to make recommendations. This allows recommending an item to user A based on the interests of a similar user B. To find these similarities, we can use well-known measures such as Jaccard similarity, cosine similarity, and the Pearson correlation coefficient. We will also use a nearest-neighbor algorithm for CF.

## 3. Dataset and features

In this project, we will use the Yelp dataset, focusing on businesses in the restaurant category. We will work with three of its JSON files:

1. `business.json` contains 160,585 business records, including each business's name, address, city, state, categories, attributes, etc.
2. `review.json` contains 8,635,403 review records, including the review text, user ID, restaurant ID, star rating (1-5), etc.
3. `user.json` contains 2,189,457 user records, including each user's name, review count, friends, average stars, etc.

## 4. Metrics to evaluate the model

We will use Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), two standard metrics for regression problems such as rating prediction.

</div>
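Both metrics compare held-out star ratings $r_i$ with predicted ratings $\hat r_i$ over $n$ test reviews: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(r_i - \hat r_i)^2}$ and $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}|r_i - \hat r_i|$. Below is a minimal Python sketch. It assumes ratings and predictions arrive as two equal-length lists. The function names and sample values are illustrative and do not come from the project's code.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both metrics need exactly one prediction per held-out rating.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Square Error: square root of the mean squared difference."""
    _check(y_true, y_pred)
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: mean of the absolute differences."""
    _check(y_true, y_pred)
    absolute = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return absolute / len(y_true)


if __name__ == "__main__":
    # Hypothetical held-out star ratings and model predictions.
    actual = [4, 3, 5, 2]
    predicted = [3.5, 3, 4, 3]
    print(f"RMSE = {rmse(actual, predicted):.3f}")  # RMSE = 0.750
    print(f"MAE  = {mae(actual, predicted):.3f}")   # MAE  = 0.625
```

RMSE squares each error, so it penalizes large misses (e.g., predicting 5 stars for a 2-star review) more heavily than MAE. Reporting both therefore shows whether a model's error comes from a few large mistakes or many small ones.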

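The content-based baseline described under algorithms and methods can be sketched in a few lines of Python. It builds a user profile as the weighted average of rated restaurant profiles, then ranks unvisited restaurants by cosine similarity to that profile. This is a minimal illustration under several assumptions. Restaurant profiles are sparse one-hot feature dictionaries (e.g., `city=...`, `cat=...`). Each rated restaurant is weighted by the user's star rating. All names (`build_user_profile`, `rank_unvisited`, the sample restaurants) are hypothetical rather than part of the project's code.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: a sparse feature vector such as
# {"city=Toronto": 1.0, "cat=Pizza": 1.0}.
Vector = Dict[str, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(value * b.get(feature, 0.0) for feature, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_user_profile(rated: Iterable[Tuple[Vector, float]]) -> Vector:
    """Weighted average of rated restaurant profiles.

    Assumption: each restaurant is weighted by the user's star rating for it.
    """
    totals: Vector = {}
    total_weight = 0.0
    for profile, stars in rated:
        total_weight += stars
        for feature, value in profile.items():
            totals[feature] = totals.get(feature, 0.0) + stars * value
    if total_weight == 0.0:
        return {}
    return {feature: value / total_weight for feature, value in totals.items()}


def rank_unvisited(
    user: Vector, restaurants: Dict[str, Vector], visited: Set[str]
) -> List[Tuple[str, float]]:
    """Score restaurants the user has not visited, highest cosine similarity first."""
    scores = [
        (rid, cosine_similarity(user, features))
        for rid, features in restaurants.items()
        if rid not in visited
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    restaurants = {
        "r1": {"city=Toronto": 1.0, "cat=Pizza": 1.0, "attr=Delivery": 1.0},
        "r2": {"city=Toronto": 1.0, "cat=Sushi": 1.0},
        "r3": {"city=Phoenix": 1.0, "cat=Pizza": 1.0, "attr=Delivery": 1.0},
        "r4": {"city=Phoenix": 1.0, "cat=Sushi": 1.0},
    }
    # The user gave r1 five stars and r2 two stars.
    user = build_user_profile([(restaurants["r1"], 5.0), (restaurants["r2"], 2.0)])
    for rid, score in rank_unvisited(user, restaurants, visited={"r1", "r2"}):
        print(rid, round(score, 3))  # r3 (Pizza + Delivery) ranks above r4 (Sushi)
```

Weighting by raw star ratings means even a 1-star restaurant pulls the profile toward its features. One common refinement is to subtract the user's mean rating before weighting. The trade-off is that profiles whose weights sum to zero must then be handled separately.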