The Netflix Prize: Crowdsourcing to Improve DVD Recommendations

Netflix used crowdsourcing to develop an innovative solution that improved its recommendation engine by 10%. The three-year Netflix Prize attracted 44,014 submissions and was ultimately won by a team that combined algorithms with other contestants after the second year of the contest, proving that you can crowdsource a crowdsource.

In 2006, Netflix launched the Netflix Prize, “a machine learning and data mining competition for movie rating prediction.” Netflix hoped the $1 million prize would encourage a range of algorithmic solutions to improve the company’s existing recommendation program, Cinematch, by 10%. Cinematch used “straightforward statistical linear models with a lot of data conditioning” and served two major purposes: it acted as a competitive differentiator by recommending unfamiliar movies to customers, and, more importantly, it enabled Netflix to reduce demand for blockbuster new releases and improve asset turns across all DVDs in inventory.

Project Scope: Netflix defined the project scope as a 10% reduction of the “root mean squared error” (RMSE) from Cinematch’s existing 0.9525, and set a series of guidelines for competition participants. Netflix provided a dataset containing 100 million anonymous movie ratings for participants to train and test their algorithms.
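For concreteness, here is a minimal sketch of how RMSE is computed over predicted star ratings; a lower score means better predictions. The numbers below are made-up illustrative values, not Netflix Prize data.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((predicted - actual) ** 2))

# Toy example with five ratings (illustrative numbers only).
actual = [4, 3, 5, 2, 4]
predicted = [3.8, 3.4, 4.5, 2.6, 3.9]
print(round(rmse(predicted, actual), 3))  # ~0.405
```

In these terms, the 10% goal meant driving the RMSE on Netflix’s hidden test set from 0.9525 down to roughly 0.857.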

2007 Progress Prize: In 2007, the BellKor team, composed of three researchers from the Statistics Research group at AT&T Labs, achieved an 8.43% improvement over Cinematch and was awarded the first of two $50,000 Progress Prizes. The team spent nearly 2,000 hours developing a final solution that contained 107 algorithms and achieved an RMSE of 0.8712. Netflix engineers investigated the source code (a requirement for the prize) and identified the two best-performing algorithms of the 107: Matrix Factorization (often referred to as Singular Value Decomposition (SVD)) and Restricted Boltzmann Machines (RBM). “A linear blend” of the two algorithms was ultimately put to use in Netflix’s recommendation system, but the company set a further 1% improvement over BellKor’s solution as the requirement for the 2008 Progress Prize.
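The prize-winning code was far more elaborate, but a toy sketch conveys the two ideas Netflix adopted: predicting a rating as a baseline plus the dot product of learned user and movie factor vectors (matrix factorization), and linearly blending that prediction with the output of a second model, standing in here for the RBM. All dimensions, values, and the blend weight below are illustrative assumptions, not details from the winning solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_movies, k = 5, 4, 3             # tiny toy dimensions

# Latent factors; in practice these are learned from observed ratings
# (e.g. by stochastic gradient descent), not drawn at random.
user_factors = rng.normal(0.0, 0.1, (n_users, k))
movie_factors = rng.normal(0.0, 0.1, (n_movies, k))
global_mean = 3.6                           # overall average rating (illustrative)

def mf_predict(user, movie):
    """Matrix-factorization prediction: baseline plus factor dot product."""
    return global_mean + user_factors[user] @ movie_factors[movie]

def second_model_predict(user, movie):
    """Stand-in for a second model such as an RBM; a dummy constant here."""
    return 3.5

def blended_predict(user, movie, w=0.6):
    """Linear blend of the two predictors; w would be tuned on held-out data."""
    return w * mf_predict(user, movie) + (1 - w) * second_model_predict(user, movie)

print(round(blended_predict(0, 2), 3))
```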


2008 Progress Prize: The BellKor in BigChaos team (the original BellKor team combined with colleagues from Commendo Research) won the second Progress Prize with an RMSE of 0.8627, a 9.44% improvement over Cinematch.
2009 Grand Prize: With 24 minutes remaining before the close of the three-year contest, BellKor’s Pragmatic Chaos, a further expansion of the original BellKor team, submitted the winning $1 million grand prize solution. The final set of algorithms achieved an RMSE of 0.8567, a 10.06% improvement over Cinematch.
The entire three-year contest included 51,051 contestants and 41,305 teams (representing 186 countries). Netflix ultimately received 44,014 valid submissions from 5,169 teams.
Crowdsourcing: At each stage of the contest, the original BellKor team expanded, adding new members or, in the case of the final submission, an entire competing team, to further refine its algorithms and drive for a >10% improvement. Second-place finisher “The Ensemble” also formed as a collection of teams that had submitted individual solutions earlier in the contest. Wired magazine summarized this crowdsourcing within a crowdsourcing competition: “The secret sauce for both BellKor’s Pragmatic Chaos and The Ensemble was collaboration between diverse ideas, and not in some touchy-feely, unquantifiable, ‘when people work together things are better’ sort of way. The top two teams beat the challenge by combining teams and their algorithms into more complex algorithms incorporating everybody’s work. The more people joined, the more the resulting team’s score would increase.”
“At first, a whole lot of teams got in — and they got 6-percent improvement, 7-percent improvement, 8-percent improvement, and then it started slowing down, and we got into year two. There was this long period where they were barely making progress, and we were thinking, ‘maybe this will never be won…’ Then there was a great insight among some of the teams — that if they combined their approaches, they actually got better. It was fairly unintuitive to many people [because you generally take the smartest two people and say ‘come up with a solution’]… when you get this combining of these algorithms in certain ways, it started out this ‘second frenzy.’ In combination, the teams could get better and better and better,” explained Netflix chief product officer Neil Hunt.
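Mechanically, “combining algorithms” in this sense usually meant learning weights for a linear blend of each model’s predictions on a held-out set of ratings. Below is a minimal sketch of that idea using random stand-in data, purely to show the mechanics rather than any team’s actual method.

```python
import numpy as np

rng = np.random.default_rng(1)
true_ratings = rng.integers(1, 6, size=200).astype(float)

# Each column holds one model's predictions for the same held-out ratings;
# here they are just the truth plus independent noise of different sizes.
model_preds = np.column_stack([
    true_ratings + rng.normal(0, 0.9, 200),
    true_ratings + rng.normal(0, 1.1, 200),
    true_ratings + rng.normal(0, 1.0, 200),
])

# Least-squares blend weights: the linear combination that best fits the held-out set.
weights, *_ = np.linalg.lstsq(model_preds, true_ratings, rcond=None)
blended = model_preds @ weights

def rmse(p, a):
    return np.sqrt(np.mean((p - a) ** 2))

for i in range(model_preds.shape[1]):
    print(f"model {i} RMSE: {rmse(model_preds[:, i], true_ratings):.3f}")
print(f"blend RMSE:   {rmse(blended, true_ratings):.3f}")
```

Because the individual models make somewhat independent errors, the blend typically beats every single model, which is why merging teams kept moving the leaderboard.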

While still an independent team, Pragmatic Theory discovered that the number of movies rated by an individual on a given day could be used as an indicator of how much time had passed since the viewer watched the movie. The team also tracked “how memory affected particular movie ratings.” (ed. note: unclear how this was done). Although this discovery was not particularly successful on its own at achieving the >10% improvement, when combined with BellKor’s algorithms it gave the new team a slight edge over the competition.
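The post does not explain how this signal was wired into the rating model, but the raw feature itself is easy to picture: count how many ratings a user entered on the date of each rating, and treat a large count (a batch-rating session) as a hint that the viewings happened some time ago. A hypothetical sketch with made-up data:

```python
from collections import Counter

# Illustrative (user, movie, stars, date) tuples -- not real Netflix Prize data.
ratings = [
    ("u1", "m1", 4, "2006-03-01"),
    ("u1", "m2", 3, "2006-03-01"),
    ("u1", "m3", 5, "2006-03-01"),
    ("u2", "m1", 2, "2006-04-10"),
]

# Frequency feature: number of ratings each user entered on that rating's date.
day_counts = Counter((user, date) for user, _, _, date in ratings)
freq_feature = [day_counts[(user, date)] for user, _, _, date in ratings]
print(freq_feature)  # [3, 3, 3, 1] -- u1 rated three movies in one sitting
```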

The Netflix Prize demonstrates the power of crowdsourcing in developing innovative solutions to complex problems. Further, it is an interesting example of how staging a competition with intermediate prizes can push teams to new levels of success by combining their solutions with those of other contestants.


Student comments on The Netflix Prize: Crowdsourcing to Improve DVD Recommendations

  1. Thank you for the post! I am always amused by Netflix and what’s happening to the business. They were successful in disrupting Blockbuster, and a lot of the credit for their business model actually lies in their superior movie recommendation algorithm. I had never realized how they went about refining the algorithm and making it so defensible that others could not spend the dollars to do it themselves. Clearly they have demonstrated a lot of innovation in doing so! Going forward, though, I am curious to see how much the algorithm can help them differentiate, given that video is now moving to an OTT model and that players like IMDB have a much larger database that Amazon Video has access to with their videos. Netflix needs to innovate again if it wants to survive and do something which other players cannot. Maybe the answer lies in how they can innovatively leverage the crowd again so that what they did to Blockbuster does not happen to them.

  2. Great example, Noah. I always thought this was such a brilliant move on Netflix’s part to improve their product. However, I think this is an example where casting a wide, crowd-sourced net can create some unintended consequences. Apparently the initial contest caused some privacy concerns that resulted in a class-action lawsuit and the cancellation of the second contest (http://www.wired.com/2010/03/netflix-cancels-contest/). Additionally, my understanding is that not all of the contest innovations were used because of the engineering costs associated with implementation (http://arstechnica.com/gadgets/2012/04/netflix-never-used-its-1-million-algorithm-due-to-engineering-costs/).

  3. Great post! Really interesting to see examples of crowd-sourcing very specific requests. I always like to think about whether the incentives to participate are optimized. Do you think if the award was bigger, Netflix would have gotten a better outcome? If it was smaller, would the outcome have been significantly worse? Always interesting to debate, and your blog really made me think of this.

    Can’t wait to read your next post!
