I am a Netflix subscriber. Every time I visit the site it recommends movies I might like, and most of the time the recommendations are very good. How does it know what I like? I do not know how the actual algorithm is implemented, but I do know some of the basic concepts behind it. Let us look at them in detail.
Imagine you really liked the movies Armageddon and Gladiator. You discuss this with a friend who has watched both movies and really likes them too. You would want to know which other movies he likes so that you can watch them. You do this because you consider your friend to have tastes similar to yours, and hence you trust his recommendations. This is one of the core ideas Netflix uses: recommend movies liked by others who are similar to you. In this post I will focus on movie recommendations based on similar others.
Take a look at the table containing the list of 10 movies along with the ratings given by 5 members. The rating scale is from 1 to 5. If the member has not rated a movie let us assume that they have not watched it.
Movie | Alice | Bob | Dan | Joe | Peter |
Armageddon | 4.5 | 5 | 3 | 4.2 | 2.5 |
Brave Heart | 4 | 4.8 | 3.5 | 4.2 | 3.2 |
Cast Away | 4.7 | 2.7 | |||
Gladiator | 5 | 4.8 | 2.9 | 4.5 | 2.2 |
Ocean’s Eleven | 4.2 | 4.1 | 4.1 | 2.8 | |
Speed | 5 | 4.5 | |||
The Bourne Identity | 3 | 2 | |||
The Fugitive | 5 | 1 | |||
The Sixth Sense | 2.8 | 4.9 | 2.5 | ||
Titanic | 5 | 4.5 |
Let us recommend some movies for Alice. The first step is to find the members who are similar to Alice. How do we do this? We can use correlation. Correlation measures the degree to which two members' ratings move together, on a scale from -1 (opposite tastes) to 1 (identical tastes). Make sure you understand how correlations are calculated before proceeding further. Given below are the correlations between the members.
Member | Alice | Bob | Dan | Joe | Peter |
Alice | 1 | 0.334900163 | -0.933256525 | 0.862840305 | -0.964280699 |
Bob | 0.334900163 | 1 | -0.928178054 | 0.463139202 | 0.264353677 |
Dan | -0.933256525 | -0.928178054 | 1 | -0.628618557 | 0.970562185 |
Joe | 0.862840305 | 0.463139202 | -0.628618557 | 1 | -0.944720637 |
Peter | -0.964280699 | 0.264353677 | 0.970562185 | -0.944720637 | 1 |
Take a look at the values on the diagonal. They are all 1, because the diagonal holds the correlation of each member with themselves. Since we are going to recommend movies for Alice, let us look at her correlation with the other members.
Joe | 0.862840305 |
Bob | 0.334900163 |
Dan | -0.933256525 |
Peter | -0.964280699 |
You can see that Joe's tastes are very similar to Alice's, as the correlation is close to 1. Bob's tastes are a weaker match, but the correlation is still positive. Dan's and Peter's tastes do not match Alice's, as they are negatively correlated. Hence we treat Joe and Bob as the members with tastes similar to Alice's.
Now let us rate the movies that have been watched by Joe and/or Bob but not by Alice. The rating formula is given below. Why are we multiplying the correlation with the ratings? More weight is given to the ratings of members who are highly correlated with Alice. In this case Joe's rating is given more weight than Bob's.
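The post does not say which correlation is used, so the following is only a sketch: it assumes a Pearson correlation computed over the movies both members have rated. The names `ratings` and `pearson` are hypothetical, and the Ocean's Eleven values (Alice 4.2, Bob 4.1, Joe 4.1) are an assumed reading of the ratings table, whose empty cells are ambiguous. Under those assumptions it reproduces the Alice/Joe and Alice/Bob values shown above.

```python
from math import sqrt

# Ratings for Alice, Bob, and Joe. The Ocean's Eleven entries are an assumed
# reading of the table; the other values appear in the post's worked examples.
ratings = {
    "Alice": {"Armageddon": 4.5, "Brave Heart": 4.0, "Gladiator": 5.0,
              "Ocean's Eleven": 4.2},
    "Bob": {"Armageddon": 5.0, "Brave Heart": 4.8, "Cast Away": 4.7,
            "Gladiator": 4.8, "Ocean's Eleven": 4.1, "Speed": 5.0,
            "The Sixth Sense": 2.8},
    "Joe": {"Armageddon": 4.2, "Brave Heart": 4.2, "Gladiator": 4.5,
            "Ocean's Eleven": 4.1, "The Fugitive": 5.0,
            "The Sixth Sense": 2.5},
}


def pearson(a, b):
    """Pearson correlation over the movies rated by both members."""
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0  # not enough overlap to say anything
    mean_a = sum(a[m] for m in common) / n
    mean_b = sum(b[m] for m in common) / n
    cov = sum((a[m] - mean_a) * (b[m] - mean_b) for m in common)
    var_a = sum((a[m] - mean_a) ** 2 for m in common)
    var_b = sum((b[m] - mean_b) ** 2 for m in common)
    if var_a == 0 or var_b == 0:
        return 0.0  # one member gave identical ratings; correlation undefined
    return cov / sqrt(var_a * var_b)


# Keep only the members who are positively correlated with Alice.
similar = {}
for member in ("Bob", "Joe"):
    corr = pearson(ratings["Alice"], ratings[member])
    if corr > 0:
        similar[member] = corr
    print(member, round(corr, 6))
```

Only co-rated movies enter the calculation, so a member who has rated many movies Alice has not seen is neither rewarded nor penalized for them.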
For movie m:
Rating for Alice = (Joe's rating of m * correlation(Alice, Joe) + Bob's rating of m * correlation(Alice, Bob)) / (correlation(Alice, Joe) + correlation(Alice, Bob))
If a member has not rated the movie, his term is dropped from both the numerator and the denominator.
Using the formula for The Fugitive, which only Joe has rated:
Rating for The Fugitive = (5 * 0.862840305) / 0.862840305 = 4.314201525 / 0.862840305 = 5
Using the formula for The Sixth Sense, which both Joe and Bob have rated:
Rating for The Sixth Sense = (2.5 * 0.862840305 + 2.8 * 0.334900163) / (0.862840305 + 0.334900163) = (2.1571007625 + 0.9377204564) / 1.197740468 = 3.0948212189 / 1.197740468 = 2.58
Movie | Joe Rating * Correlation | Bob Rating * Correlation | Rating |
Cast Away | 0 | 1.574030766 | 4.7 |
Speed | 0 | 1.674500815 | 5 |
The Fugitive | 4.314201525 | 0 | 5 |
The Sixth Sense | 2.157100763 | 0.937720456 | 2.58 |
You can recommend Speed, The Fugitive, and Cast Away to Alice as they all have very high ratings.
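As a rough sketch, the correlation-weighted average above can be written in a few lines of Python. The correlations and ratings are copied from the post; the names `CORR`, `NEIGHBOR_RATINGS`, and `predict` are hypothetical, and skipping a member who has not rated the movie is this sketch's reading of the zero entries in the table.

```python
# Correlations with Alice, copied from the correlation table.
CORR = {"Joe": 0.862840305, "Bob": 0.334900163}

# Ratings by Joe and Bob for the movies Alice has not rated.
NEIGHBOR_RATINGS = {
    "Joe": {"The Fugitive": 5.0, "The Sixth Sense": 2.5},
    "Bob": {"Cast Away": 4.7, "Speed": 5.0, "The Sixth Sense": 2.8},
}


def predict(movie):
    """Correlation-weighted average of the neighbors' ratings for a movie."""
    weighted_sum = 0.0
    weight_total = 0.0
    for member, corr in CORR.items():
        rating = NEIGHBOR_RATINGS[member].get(movie)
        if rating is None:
            continue  # member has not rated it: no term in either sum
        weighted_sum += rating * corr
        weight_total += corr
    if weight_total == 0:
        return None  # nobody similar has rated this movie
    return weighted_sum / weight_total


for movie in ("Cast Away", "Speed", "The Fugitive", "The Sixth Sense"):
    print(movie, round(predict(movie), 2))
```

When only one neighbor has rated a movie, the weights cancel and the prediction is simply that neighbor's rating, which is why Cast Away, Speed, and The Fugitive come out at 4.7, 5, and 5.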
There is one problem with the above solution. A rating is a number from 1 to 5. For the movie Armageddon, Bob has given a rating of 5 and Joe has given a rating of 4.2. What if Bob habitually gives higher ratings than Joe, even though both of them like the movie equally? The formula does not take each member's overall rating habits into account. Hence we need to adjust the formula to anchor it on each member's average rating. The average ratings of Alice, Bob, and Joe are
Member | Average Rating |
Alice | 4.43 |
Bob | 4.46 |
Joe | 4.08 |
Here is the modified formula based on average.
For movie m:
Rating for Alice = Average(Alice's ratings) + ((Joe's rating of m - Average(Joe's ratings)) * correlation(Alice, Joe) + (Bob's rating of m - Average(Bob's ratings)) * correlation(Alice, Bob)) / (correlation(Alice, Joe) + correlation(Alice, Bob))
Using the formula for The Fugitive, which only Joe has rated:
Rating for The Fugitive = 4.43 + ((5 - 4.08) * 0.862840305) / 0.862840305 = 4.43 + (0.7938130806 / 0.862840305) = 4.43 + 0.92 = 5.35, which is capped at the maximum allowed rating of 5
Using the formula for The Sixth Sense:
Rating for The Sixth Sense = 4.43 + ((2.5 - 4.08) * 0.862840305 + (2.8 - 4.46) * 0.334900163) / (0.862840305 + 0.334900163) = 4.43 + (-1.91922195248 / 1.197740468) = 4.43 - 1.60 = 2.83
The table below was computed with the unrounded averages, so its intermediate values differ slightly from the hand calculation and The Sixth Sense comes out at 2.82.
Movie | Joe (Adjusted Rating * Correlation) | Bob (Adjusted Rating * Correlation) | Rating |
Cast Away | 0 | 0.081332897 | 4.67 |
Speed | 0 | 0.181802946 | 4.97 |
The Fugitive | 0.790936946 | 0 | 5.00 |
The Sixth Sense | -1.366163816 | -0.554977413 | 2.82 |
You can recommend The Fugitive, Speed, and Cast Away to Alice as they all have very high ratings.
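The average-adjusted formula can be sketched the same way. This version uses the rounded averages listed above (4.43, 4.46, 4.08), so The Sixth Sense comes out near 2.83 rather than the 2.82 in the table. The names `AVERAGE`, `predict_centered`, and the clamping to the 1 to 5 scale are assumptions of this sketch; the post only mentions capping at the maximum.

```python
# Rounded averages and correlations, copied from the tables in the post.
AVERAGE = {"Alice": 4.43, "Bob": 4.46, "Joe": 4.08}
CORR = {"Joe": 0.862840305, "Bob": 0.334900163}

# Ratings by Joe and Bob for the movies Alice has not rated.
NEIGHBOR_RATINGS = {
    "Joe": {"The Fugitive": 5.0, "The Sixth Sense": 2.5},
    "Bob": {"Cast Away": 4.7, "Speed": 5.0, "The Sixth Sense": 2.8},
}

MIN_RATING, MAX_RATING = 1.0, 5.0


def predict_centered(movie, target="Alice"):
    """Target's average plus the weighted average of neighbors' deviations."""
    weighted_dev = 0.0
    weight_total = 0.0
    for member, corr in CORR.items():
        rating = NEIGHBOR_RATINGS[member].get(movie)
        if rating is None:
            continue  # member has not rated the movie
        # How far above or below his own average this member rated the movie.
        weighted_dev += (rating - AVERAGE[member]) * corr
        weight_total += corr
    if weight_total == 0:
        return None
    raw = AVERAGE[target] + weighted_dev / weight_total
    # Keep the prediction on the 1 to 5 rating scale.
    return min(MAX_RATING, max(MIN_RATING, raw))


for movie in ("Cast Away", "Speed", "The Fugitive", "The Sixth Sense"):
    print(movie, round(predict_centered(movie), 2))
```

Subtracting each neighbor's own average turns a rating into "how much more than usual did he like it", which is what lets a generous rater like Bob and a stricter one like Joe be compared on equal terms.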
This should give an overall idea of how movie recommendations can be made using the likes of similar others. Netflix's actual implementation will be very different and will take other signals into account. To appreciate the power of Netflix's recommendations, read this excerpt from Think Twice.
Netflix, a Web-based DVD-rental firm founded in 1997, realized early on that successfully matching subscribers to movies was central to customer satisfaction – and hence the vibrancy of the business. In 2000, the company launched a service called CineMatch, a program of algorithms that pairs viewers and discs. Using consumer feedback, CineMatch rapidly improved its ability to anticipate consumer tastes and now drives well over half of Netflix’s rentals, keeping users happy and reducing reliance on new releases… CineMatch, or whatever program ultimately unseats it, is vastly better than the video-store employee in New York City.