Kaggle: Building a Market for Data Science (and Scientists)
Kaggle is an innovative two-sided platform that connects companies facing hard problems with the data scientists who can solve them.
The online data science and machine learning community Kaggle is just one of the many two-sided platforms that have emerged in recent years. Recently acquired by Google, the site is home to over one million users ranging from computer science Ph.D. holders conducting cutting-edge research to absolute beginners. Kaggle is best known for its data science competitions, which offer (substantial) cash prizes, but it also serves as an educational tool for autodidacts and as a place to present one’s portfolio of related work.
Various organizations use Kaggle to sponsor contests to develop machine learning algorithms for a slew of purposes. The analytics company Two Sigma recently created a competition to build an algorithm predicting stock market fluctuations in relation to news developments, offering $100,000 in total prize money. The United States Department of Homeland Security sponsored a challenge with a first-place prize of $500,000 for the algorithm that best streamlines airport passenger screening procedures.
By allowing organizations to crowd-source answers to extremely complex questions, Kaggle serves as a marketplace for these solutions. For what is usually less than the cost of a single full-time data scientist (almost always $100,000 or more a year), companies can employ an army of freelancers to build bespoke models for their problems. Furthermore, it allows companies to conduct a completely merit-based evaluation of potential employees. Applicants (who may or may not know they are trying out for a full-time position) also have the chance to earn money while going through the recruiting process for a position with the hosting company.
Although the strongest attractions for users are the competitions, the educational and portfolio aspects of the site serve as a mechanism to draw data scientists to it and keep them there. The ability to explore other users’ algorithms and collaborate on problems has almost certainly led to the massive growth in the platform’s user base.
Finally, Kaggle creates value for society by organizing competitions on a pro bono basis for non-profit and research organizations. By partnering with these groups to facilitate research into important but less financially lucrative problems, Kaggle helps direct the brainpower of its user base towards noble ends.
It does not appear that Google captures substantial monetary value from Kaggle. The platform does not run any advertisements and signing up is free for users. Organizations hosting competitions, however, likely pay a fee to do so, and Kaggle may offer consulting services to help the organization structure its competition. Press observers have speculated that the primary reason Google made the acquisition is to improve its brand in the data science community and to facilitate recruitment of these in-demand professionals. Kaggle’s parent company likely pays close attention to the top-ranked contestants on the platform, putting them in touch with Google’s recruiting team as appropriate.
By connecting talented data scientists with tough problems, motivating them through lucrative cash prizes, and assisting their professional development through educational and portfolio resources, Kaggle creates substantial value for its users. Partner organizations can similarly develop custom-built solutions for their business challenges while identifying the best talent to recruit. Finally, Google earns revenue from these partners while at the same time building its credibility in the arena of data science. The mutual value creation for all of the involved parties suggests that Kaggle will remain a dominant player in the space for years to come.
Image from Kaggle.
Student comments on Kaggle: Building a Market for Data Science (and Scientists)
I think Kaggle is a fascinating company. I wonder if you think that being acquired by Google was a smart move. As Kaggle had already made a name for itself, do you think it might have been wiser to remain a separate entity, keeping its decision rights and potentially a bigger valuation, by only partnering with Google?
Super interesting post! I had never heard of Kaggle, and in reading your post the first question I had was around the intellectual property of algorithms developed on the site, particularly through the crowdsourced challenges. From some quick research, it looks like in exchange for the prize money the data scientists must grant “a worldwide, perpetual, irrevocable and royalty-free license […] to use the winning Entry”. I understand this is true under virtually any corporate employment contract, but I wonder if any data scientists are unwilling to compete in these challenges since they don’t know the exact value of their algorithm when it’s developed and they waive any rights to future royalties.
Similarly, now that Google owns the platform, does it have any rights to, or even just visibility into, the algorithms and IP developed on the platform? If so, that could be a huge asset in addition to identifying hard-to-source data science talent.
Very interesting post — I also hadn’t heard of Kaggle. Reading your article, I am left with a question and a thought. The question: I wonder what the play for Google is. The thought: this business model ties in beautifully with Prof. Karim Lakhani’s work on “Crowds.”
Regarding the question, it is interesting that this was announced at the Cloud conference. With the platform’s focus on data scientists, this seems like a strong complement to Google’s growing cloud business. I wonder if this would be of any interest to some of the older enterprise clients that are starting to use Google’s cloud services. I doubt it would be a huge money-maker, but it makes for an interesting bolt-on. I also like the recruiting argument that you make.
Regarding the comment about Professor Lakhani’s work, I think that this is a wide-open space that has yet to be fully explored: the power of leveraging crowds to solve otherwise private, corporate problems. There is a private/public challenge to this, as well as a “matching” problem related to marrying “problems” with the right “sort” of talent.
Very insightful post! In a vein similar to Maren, I am curious about IP rights and whether users will be able to claw back value from their contributions to any given initiative (pro bono or for profit), should it be successful in the future, by demonstrating clear evidence of their contribution. My hope is that this platform democratizes data science for early-stage ventures, both by making the hiring of developers and data scientists more meritocratic (as you detail) and by allowing early-stage companies to crowdsource solutions to MVPs and data interpretation rather than hiring a less-than-‘full stack’ data scientist or developer.
Great post, Walter! Kaggle sounds like a great platform for beginners to gain exposure and enhance their skills while also offering smaller start-ups the ability to access a broader set of knowledge workers. This made me think a lot about the ZBJ case as well. Maren’s point regarding IP rights is well taken: there is a potential secondary side effect of data scientists self-selecting out of the system, or not accepting a winner’s prize, because they do not want to relinquish their IP rights. I’d also be curious to see how long the average participant remains involved with the site and whether any of the submissions have led to longer-term service agreements or formal job offers.
Great post! I’ve used Kaggle in the past to find datasets to use for online programming courses, but had no idea it was acquired by Google two years ago. This seems like a natural fit and a good brand extension for Google. I wonder if Google will more tightly integrate the platform with its other services in the future, or if it just plans to use it for its “brand” and maintain it as a separate entity? Most press releases seem to suggest the latter. Either way, this seems like a great addition to the Google portfolio and should help with its recruiting efforts and its attempts to maintain its reputation as a leader in the data science community.
Love this post! I used Kaggle in the past as well to find a dataset for a data visualization assignment. It’s always interesting to see a platform that promotes growth by crowdsourcing skills while also making the process competitive. I wonder if Google acquired this platform in order to look at data science trends in other industries, broadening its horizontal view.
Despite the differences between Kaggle and typical data science work, Kaggle competitions can be an excellent learning tool for beginners. Beyond the competitions themselves, you may need to use other datasets in the course of a competition, and you can browse Kaggle’s datasets to find them.