Non Personalized Recommenders

In non personalized recommenders everyone gets to see the same recommendation. The site does not know who you are. If I go to as an anonymous user it shows items that are currently viewed by other members. This is one example of a non personalized recommendation.

There are two variations in this – Predictions and Recommendations. Predictions are bold statements and they come in the form of scores, stars and counts. Recommendations on the other hand are not bold statements. They just place items without any numbers associated with it.

One of my favorite book is The Signal and the Noise by Nate Silver. In the scale of 1 to 5 the book has got 4.3 stars out of 5. This is an example of prediction.


The math behind this is very simple.

Score = (364 * 5 + 161 * 4 + 62 * 3 + 21 * 2 + 21 * 1)
Score = 2713 / 629
Score = 4.3 out of 5 stars

In the same page it also displays the information about the other books which the customers bought after buying Signal and Noise. This is an example of recommendation.


How would amazon do this? In order to understand this I came up with a matrix of members and different books purchased by them.

Signal and Noise Naked Statistics Influence Made to Stick How to read a book
Allen O O O
Bruce O O O
Peter O O
Tim O O
Tom O O

Let X be the number of customers who purchased the book Signal and Noise. Let Y be the other books they purchased. You need to compute the ratio given below for each book and sort them by descending order. Pick up the Top K books and show them as related.

Total Customers who purchased X and Y / Total Customers who purchased X

If you do the above math for all the other four books you will get

Naked Statistics 100%
Influence 100%
Made to Stick 0%
How to read a book 0%

Naked Statistics makes perfect sense. But why did Influence came to the top when it has been purchased by customers who have not purchased Signal and Noise. This situation is called as banana trap. In a grocery store most of the customers will buy bananas. If someone buys a razor and a banana then you cannot tell that the purchase of a razor influenced the purchase of banana. Hence we need to adjust the formula to handle this case as well. The modified version is

(Total Customers who purchased X and Y / Total Customers who purchased X) / 
(Total Customers who did not purchase X but got Y / Total Customers who did not purchase X)

Substituting the numbers we get

Naked Statistics = (2 / 2) / (1 / 3) = 3
Influence = (2 / 2) / (3 / 3) = 1

The denominator acts as a normalizer and you can see that Naked Statistics clearly stands out.

Let move from to Hacker news. Hacker News is a social news website that caters to programmers and entrepreneurs, delivering content related to computer science and entrepreneurship. How does the site decide which news should show at the top? Readers of the news articles can submit up votes and comments. The algorithm uses this information along with the time at which the post was submitted to decide its rank. Excerpt from their site.

How are stories ranked? 

On the front page, by points divided by a power of the time since they were submitted. Comments in comment threads are ranked the same way. Stories on the new page and comments on the comments page are listed chronologically.

In this article the formula for calculating the score is given as

Score = (P-1) / (T+2) ^ G

P = points of an item (and -1 is to negate submitters vote)
T = time since submission (in hours)
G = Gravity, defaults to 1.8 in news.arc

Points Time since submission (in hours) Score
Article 1 100 0 28.43
Article 2 200 2 16.41
Article 3 500 20 1.91
Article 4 2000 200 0.14

Even though Article 1 has lowest points it got the highest score. This makes perfect sense as more weight is given to the recent articles.

Reddit is another website which is called as the front page of internet. Users submit articles in the from of link and text. These articles can be voted up and down. The site uses a slightly different algorithm to score the articles. The details of the algorithm can be found here. Given below is the key piece of code which does the scoring.

def hot(ups, downs, date):
    s = score(ups, downs)
    order = log(max(abs(s), 1), 10)
    sign = 1 if s > 0 else -1 if s < 0 else 0
    seconds = epoch_seconds(date) - 1134028003
    return round(order + sign * seconds / 45000, 7)

Reddit differs from Hacker news in the following ways.

  1. Instead of using the absolute net points(ups – downs) the logarithm of the net points is used. Why would they do this? The site does not want to differentiate the scores for 100 and 200 points but it wants to differentiate the scores between 100 and 2500 points. Logarithms helps in solving this problem. In base10 Log(100) = 2 and Log(2500) = 3.39. If you need a refresher in logarithms click here.
  2. What does this number 1134028003 seconds mean? This comes to 35.95 years. Adding this to epoch of Jan 1 1970 this comes to the year when Reddit was founded.
  3. The scores will not decrease as time goes by. But the new articles are ranked higher the older articles.
Points Time since submission (in seconds) Score
Article 1 1100 252288000 5609.441393
Article 2 1500 252288000 5609.576091
Article 3 12000 252288000 5610.479181
Article 4 500000 252288000 5612.09897

All the articles are submitted at the same time in 2013. Even though there is a huge difference in the points between Article 1 and Article 4 the score differs only by 3 points as it uses logarithmic scale.

If you want to learn more about how recommendations systems work sign up for the free online course.