These are my personal notes for the Machine Learning course by Stanford. Feel free to check the assignments.
Also, if you want to read my other notes, feel free to check them at my blog.
A recommender system or a recommendation system is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item.
Problem Formulation: (Predicting Movie Ratings case)
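Let's define $n_u$ as the number of users, $n_m$ as the number of movies, $r(i,j) = 1$ if user $j$ has rated movie $i$ (0 otherwise), and $y^{(i,j)}$ as the rating given by user $j$ to movie $i$ (defined only if $r(i,j) = 1$).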
The objective of the recommender system is to use the movies already rated by the users to predict the rating a user would give to a movie they have not rated yet.
To do so, there are 2 methods: content-based recommendation and collaborative filtering.
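For content-based recommendation, let's define $x^{(i)}$ as the feature vector of movie $i$, $\theta^{(j)}$ as the parameter vector of user $j$, $m^{(j)}$ as the number of movies rated by user $j$, and $n$ as the number of features.
The goal of content-based recommendation is to predict the user rating on a non-rated movie based on the movies' characteristics. For example, when a friend asks you for a book recommendation, it's pretty natural to ask what kinds of books they have read and liked. From there, you could think of a few titles (book characteristics: fantasy, sci-fi ...) that are similar to the things they've read and liked in the past (the user's past book reviews).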
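So, in our case, it means that you need to have beforehand the movie characteristics (the feature vector $x^{(i)}$ of each movie $i$, with $x_0^{(i)} = 1$ as the bias unit) and the users' past ratings ($y^{(i,j)}$, the rating given by user $j$ to movie $i$, available wherever $r(i,j) = 1$).
With that, you can find the preference of user $j$, denoted as $\theta^{(j)}$. Once you have the user preference, it becomes easy to predict the user rating on a non-rated movie: the predicted rating of movie $i$ by user $j$ is $(\theta^{(j)})^T x^{(i)}$.
Suppose user $j$ has rated $m^{(j)}$ movies, then learning $\theta^{(j)}$ can be treated as a linear regression problem. So, to learn $\theta^{(j)}$ (dropping the constant factor $1/m^{(j)}$, which does not change the minimizer),

$$\min_{\theta^{(j)}} \frac{1}{2} \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

where $n$ is the number of features.
To get the parameters for all our users, we do the following:

$$\min_{\theta^{(1)}, \dots, \theta^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

The cost function is then:

$$J(\theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

The gradient descent update is then:

$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) x_k^{(i)} \quad \text{for } k = 0$$

$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right) \quad \text{for } k \neq 0$$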
Remark: The effectiveness of content-based recommendation depends on identifying the right features for the items, which is often not easy.
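As a minimal sketch of the content-based approach, the following NumPy code runs gradient descent on the users' parameter vectors while the movie features stay fixed. The toy features, ratings, learning rate and regularization strength are all hypothetical values chosen for illustration, and `content_based_cost` and `content_based_step` are names made up for this sketch.

```python
import numpy as np

def content_based_cost(Theta, X, Y, R, lam):
    """Regularized squared error over rated entries (bias column not regularized)."""
    err = (X @ Theta.T - Y) * R
    return 0.5 * np.sum(err ** 2) + 0.5 * lam * np.sum(Theta[:, 1:] ** 2)

def content_based_step(Theta, X, Y, R, alpha, lam):
    """One gradient-descent step on every user's parameter vector.

    X:     (n_m, n + 1) known movie features, column 0 is the bias x_0 = 1
    Theta: (n_u, n + 1) one row of parameters per user
    Y, R:  (n_m, n_u) ratings and the 0/1 "has rated" indicator
    """
    err = (X @ Theta.T - Y) * R          # prediction error, zero where unrated
    grad = err.T @ X                     # sum over rated movies, shape (n_u, n + 1)
    grad[:, 1:] += lam * Theta[:, 1:]    # regularize every k except the bias k = 0
    return Theta - alpha * grad

# Hypothetical toy data: 4 movies with features (bias, romance, action), 2 users.
X = np.array([[1.0, 0.9, 0.0],
              [1.0, 1.0, 0.1],
              [1.0, 0.1, 1.0],
              [1.0, 0.0, 0.9]])
Y = np.array([[5.0, 0.0],
              [5.0, 0.0],
              [0.0, 5.0],
              [0.0, 5.0]])
R = np.array([[1, 1],
              [1, 0],    # user 1 has not rated movie 1
              [1, 1],
              [0, 1]])   # user 0 has not rated movie 3

Theta0 = np.zeros((2, 3))
Theta = Theta0.copy()
for _ in range(2000):
    Theta = content_based_step(Theta, X, Y, R, alpha=0.05, lam=0.1)

# predictions[i, j] is the predicted rating of movie i by user j.
predictions = X @ Theta.T
```

Entries of `Y` where `R` is 0 are placeholders and never influence the result, because the error is masked by `R` before the gradient is computed.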
Collaborative filtering has the intrinsic property of feature learning (it can learn by itself what features to use) which helps overcome drawbacks of content-based recommender systems.
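Given the users' past movie ratings $y^{(i,j)}$ and the parameter vectors $\theta^{(j)}$, the algorithm learns the values of the features $x^{(i)}$ by applying linear regression.
Intuitively, this boils down to the scenario where, given a movie $i$, its ratings by various users ($y^{(i,j)}$) and the user preferences $\theta^{(j)}$, the collaborative filtering algorithm tries to find the best features $x^{(i)}$ to represent the movie such that the squared error between the predicted ratings $(\theta^{(j)})^T x^{(i)}$ and the actual ratings $y^{(i,j)}$ is minimized.
Since this is very similar to the linear regression problem, a regularization term is introduced to prevent overfitting of the learned features. By extending this, it is possible to learn all the features for all the movies $i \in \{1, \dots, n_m\}$. Thus,

$$\min_{x^{(1)}, \dots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

where the gradient descent update on the features is:

$$x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$$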
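But it is also possible to solve for both $x^{(1)}, \dots, x^{(n_m)}$ and $\theta^{(1)}, \dots, \theta^{(n_u)}$ simultaneously, using a cost function which is nothing but the combination of the earlier two objectives. Thus,

$$J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

where $\sum_{(i,j):r(i,j)=1}$ is equivalent to looping through all the data where $r(i,j) = 1$.
And the minimization objective can be written as,

$$\min_{x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)}} J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)})$$

Remark: Because the algorithm can learn features by itself, the bias units $x_0 = 1$ and $\theta_0$ have been removed, therefore $x^{(i)} \in \mathbb{R}^n$ and $\theta^{(j)} \in \mathbb{R}^n$.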
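To summarize, the collaborative filtering algorithm has the following steps:
1. Initialize $x^{(1)}, \dots, x^{(n_m)}$ and $\theta^{(1)}, \dots, \theta^{(n_u)}$ to small random values.
2. Minimize $J(x^{(1)}, \dots, x^{(n_m)}, \theta^{(1)}, \dots, \theta^{(n_u)})$ using gradient descent (or an advanced optimization algorithm), with the following updates for every $i = 1, \dots, n_m$, $j = 1, \dots, n_u$ and $k = 1, \dots, n$:

$$x_k^{(i)} := x_k^{(i)} - \alpha \left( \sum_{j:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)} + \lambda x_k^{(i)} \right)$$

$$\theta_k^{(j)} := \theta_k^{(j)} - \alpha \left( \sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right) x_k^{(i)} + \lambda \theta_k^{(j)} \right)$$

3. For a user with parameters $\theta$ and a movie with learned features $x$, predict a rating of $\theta^T x$.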
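The steps of collaborative filtering can be sketched in NumPy as follows. This is only an illustration under assumptions: the ratings matrix is made-up toy data, and the number of features, learning rate, regularization strength and iteration count are arbitrary choices, not values from the course. The names `cofi_cost_and_grads` and `fit` are hypothetical.

```python
import numpy as np

def cofi_cost_and_grads(X, Theta, Y, R, lam):
    """Joint cost J and its gradients with respect to X and Theta.

    X: (n_m, n) movie features, Theta: (n_u, n) user parameters,
    Y, R: (n_m, n_u) ratings and the 0/1 "has rated" indicator.
    """
    err = (X @ Theta.T - Y) * R                       # zero where r(i, j) = 0
    J = 0.5 * np.sum(err ** 2) \
        + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
    X_grad = err @ Theta + lam * X                    # dJ/dX, shape (n_m, n)
    Theta_grad = err.T @ X + lam * Theta              # dJ/dTheta, shape (n_u, n)
    return J, X_grad, Theta_grad

def fit(Y, R, n_features=2, alpha=0.01, lam=0.1, n_iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n_m, n_u = Y.shape
    # Step 1: small random initial values (zeros would never leave zero).
    X = 0.1 * rng.standard_normal((n_m, n_features))
    Theta = 0.1 * rng.standard_normal((n_u, n_features))
    history = []
    # Step 2: gradient descent, updating X and Theta simultaneously.
    for _ in range(n_iters):
        J, X_grad, Theta_grad = cofi_cost_and_grads(X, Theta, Y, R, lam)
        history.append(J)
        X, Theta = X - alpha * X_grad, Theta - alpha * Theta_grad
    return X, Theta, history

# Hypothetical toy ratings: 4 movies x 3 users, 0 in R marks "not rated".
Y = np.array([[5.0, 5.0, 0.0],
              [5.0, 0.0, 1.0],
              [0.0, 1.0, 5.0],
              [1.0, 0.0, 5.0]])
R = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 0, 1]])

X, Theta, history = fit(Y, R)

# Step 3: predicted rating of movie 0 by user 2, which that user has not rated.
prediction = Theta[2] @ X[0]
```

Plain gradient descent is used here to mirror the update rules; any other optimizer that accepts the cost and its gradients could be substituted.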
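Consequently, the matrix of all predicted ratings of all movies by all users can be written as:

$$X \Theta^T = \begin{bmatrix} (\theta^{(1)})^T x^{(1)} & \cdots & (\theta^{(n_u)})^T x^{(1)} \\ \vdots & \ddots & \vdots \\ (\theta^{(1)})^T x^{(n_m)} & \cdots & (\theta^{(n_u)})^T x^{(n_m)} \end{bmatrix}$$

where the rows of $X$ are the vectors $(x^{(i)})^T$, the rows of $\Theta$ are the vectors $(\theta^{(j)})^T$, and the entry $(i,j)$, equal to $(\theta^{(j)})^T x^{(i)}$, is the predicted rating for movie $i$ by user $j$.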
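As a small illustration of this matrix form, the snippet below computes all predicted ratings at once with a single matrix product. The values of `X` and `Theta` are hypothetical stand-ins for learned features and parameters.

```python
import numpy as np

# Hypothetical learned values: 3 movies, 2 users, 2 features.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])       # row i is the feature vector of movie i
Theta = np.array([[5.0, 0.0],
                  [0.0, 4.0]])   # row j is the parameter vector of user j

# P[i, j] is the predicted rating of movie i by user j.
P = X @ Theta.T

# The same entry computed from the definition theta_j^T x_i.
single = Theta[1] @ X[2]
```

Each entry of `P` matches the per-pair prediction, so `P[2, 1]` and `single` hold the same value.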
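How to handle the case where a user has not rated any movies?
The squared-error term $\sum_{i:r(i,j)=1} \left( (\theta^{(j)})^T x^{(i)} - y^{(i,j)} \right)^2$ in our cost function is equal to 0 for such a user $j$ because the summation applies only if the user has rated a movie. Thus, the only term of the cost function that depends on this user's parameters is the regularization term:

$$\frac{\lambda}{2} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^2$$

When minimizing our cost function, we will find $\theta^{(j)}$ equal to the 0 vector because the only term that could pull $\theta^{(j)}$ away from 0, against the regularization term, is itself equal to 0 (see above). Thus, when it comes to predicting ratings for Eve (a user who has not rated any movie), they will all be equal to 0 ($(\theta^{(j)})^T x^{(i)} = 0$ for every movie $i$), which does not seem intuitively correct.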
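To prevent this, we will do mean normalization. For each movie $i$, we compute the mean of the ratings it has received,

$$\mu_i = \frac{\sum_{j:r(i,j)=1} y^{(i,j)}}{\sum_{j=1}^{n_u} r(i,j)}$$

where the sums only count the users who have rated movie $i$, and we subtract it from every rating of that movie before learning: $y^{(i,j)} \leftarrow y^{(i,j)} - \mu_i$.
Then, let's define the predicted rating of movie $i$ by user $j$ as $(\theta^{(j)})^T x^{(i)} + \mu_i$.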
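Which means that in our cost function, we need to put $(\theta^{(j)})^T x^{(i)} + \mu_i$ instead of $(\theta^{(j)})^T x^{(i)}$, where $\mu_i$ is the mean rating of movie $i$. A user who has not rated any movie still ends up with $\theta^{(j)} = 0$, but the predicted rating is now $\mu_i$, the average rating of the movie, rather than 0.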
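Here is a minimal sketch of mean normalization in NumPy. The ratings matrix is hypothetical toy data in which the last user (playing the role of Eve) has rated nothing, and `mean_normalize` is a name chosen for this sketch. Guarding against movies with no ratings at all is an extra assumption, not something the notes discuss.

```python
import numpy as np

def mean_normalize(Y, R):
    """Subtract each movie's mean rating, computed over rated entries only."""
    counts = R.sum(axis=1)
    # np.maximum avoids dividing by zero for a movie nobody has rated.
    mu = (Y * R).sum(axis=1) / np.maximum(counts, 1)
    Y_norm = (Y - mu[:, None]) * R      # unrated entries stay at 0
    return Y_norm, mu

# Hypothetical toy data: 3 movies x 3 users, user 2 has rated nothing.
Y = np.array([[5.0, 4.0, 0.0],
              [1.0, 2.0, 0.0],
              [4.0, 0.0, 0.0]])
R = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 0]])

Y_norm, mu = mean_normalize(Y, R)

# With no ratings, regularization drives this user's parameters to zero.
theta_new_user = np.zeros(2)
X = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.8, 0.2]])              # hypothetical learned movie features

# Adding mu back means the prediction falls back to each movie's mean rating.
pred_new_user = X @ theta_new_user + mu
```

The model would be trained on `Y_norm` instead of `Y`, and `mu` is added back only at prediction time.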
A content-based recommendation engine works with existing user profiles. A profile holds information about a user and their taste, where taste is based on the user's ratings for different items. Generally, whenever a user creates a profile, the recommendation engine runs a user survey to get initial information about the user and avoid the new-user problem.
In the recommendation process, the engine compares the items that the user has already rated positively with the items the user has not rated and looks for similarities. Items similar to the positively rated ones are recommended to the user. In this way, a content-based model built on the user's taste and behavior recommends articles relevant to that taste. This model is efficient and personalized, yet it has a limitation.
Let us understand this with an example. Assume there are four categories of news, two of which are Technology and Politics, and there is a user A who has read articles related to Technology and Politics. The content-based recommendation engine will only recommend articles related to these two categories and may never recommend anything in the other categories, as the user has never viewed such articles before.
This problem can be solved using another variant of recommendation algorithm known as Collaborative Filtering.
The idea of collaborative filtering is to find users in a community who share similar tastes. If two users have rated the same, or almost the same, items in a similar way, then they have similar taste. Such users form a group, a so-called neighborhood. A user gets recommendations for items they have not rated before but that were positively rated by users in their neighborhood.
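The neighborhood idea can be sketched as follows. The text does not say how similarity is measured, so cosine similarity over co-rated items, a neighborhood of size `k`, and a "positively rated" threshold of 4 are all assumptions made for this illustration, as are the function names and the toy ratings.

```python
import numpy as np

def user_similarity(Y, R, a, b):
    """Cosine similarity between users a and b over items both have rated."""
    both = (R[:, a] == 1) & (R[:, b] == 1)
    if not both.any():
        return 0.0
    u, v = Y[both, a], Y[both, b]
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0

def recommend(Y, R, user, k=1, like_threshold=4):
    """Items the user has not rated that their k nearest neighbours liked."""
    n_items, n_users = Y.shape
    sims = [(user_similarity(Y, R, user, other), other)
            for other in range(n_users) if other != user]
    neighbours = [other for _, other in sorted(sims, reverse=True)[:k]]
    liked = set()
    for other in neighbours:
        for item in range(n_items):
            unseen = R[item, user] == 0
            liked_by_neighbour = R[item, other] == 1 and Y[item, other] >= like_threshold
            if unseen and liked_by_neighbour:
                liked.add(item)
    return sorted(liked)

# Hypothetical toy data: 4 items x 3 users; user 0 has not rated items 2 and 3.
Y = np.array([[5.0, 5.0, 1.0],
              [1.0, 1.0, 5.0],
              [0.0, 5.0, 1.0],
              [0.0, 1.0, 5.0]])
R = np.array([[1, 1, 1],
              [1, 1, 1],
              [0, 1, 1],
              [0, 1, 1]])

recommended = recommend(Y, R, user=0)
```

In this toy data user 1 rates the shared items exactly like user 0, so user 1 becomes the neighbour and the item user 1 liked that user 0 has not seen is the one recommended.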
Collaborative filtering has basically 2 approaches: user-based and item-based.
Let's try to understand the above picture. Let's say there are three users A, B and C.