One Robot to Rule Them All
Data scientists and software engineers have built tools that successfully replace many high-skilled jobs. But are we about to automate the data scientists themselves?
Imagine you founded a successful business…
Your product is getting traction and even revenue. You’re also collecting tons of interesting data. To stay competitive, you need to use that data, perhaps to predict user behavior, recommend content, customize the experience, or price more effectively. But that means you must hire a data scientist, one of the most in-demand and expensive skill sets on the market today. Even if you’re able to find a good one, you will end up paying him or her more than $150,000 a year in salary alone. And it is going to take a few months for that person to ramp up… In comes DataRobot.
DataRobot is a software company that attempts to automate the work done by data scientists. It promises to be better, faster, and cheaper.
This sounds too good to be true…
DataRobot uses brute force: it ingests your datasets and algorithmically outputs the best predictive model for them. The platform evaluates thousands of models from open source libraries. It then “searches through millions of possible combinations of algorithms, pre-processing steps, features, transformations and tuning parameters to deliver the best models for your dataset and prediction target”. The process is computationally expensive, but far more comprehensive, cheaper, and quicker than hiring a data scientist.
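To make the brute-force idea concrete, here is a minimal sketch of an exhaustive search over pre-processing steps, algorithms, and tuning parameters. It is not DataRobot’s actual implementation: the toy dataset, the transform names, the model names, and the function brute_force_search are all assumptions made up for illustration, and a real platform would search a vastly larger space with proper cross-validation.

```python
import itertools
import random
import statistics


def make_data(n=200, seed=0):
    """Toy dataset (an assumption for this sketch): y is roughly 3 * x^2."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x * x + rng.gauss(0, 1.0) for x in xs]
    return xs, ys


# Candidate pre-processing steps (hypothetical names).
TRANSFORMS = {
    "identity": lambda x: x,
    "square": lambda x: x * x,
    "sqrt": lambda x: x ** 0.5,
}


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    mean_y = statistics.fmean(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares with a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def fit_knn(k):
    """k-nearest-neighbours regression; k is the tuning parameter."""
    def fit(xs, ys):
        pairs = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean([y for _, y in nearest])

        return predict
    return fit


# Candidate algorithms, including two settings of one tuning parameter.
MODELS = {
    "mean": fit_mean,
    "linear": fit_linear,
    "knn_1": fit_knn(1),
    "knn_5": fit_knn(5),
}


def mse(predict, xs, ys):
    """Mean squared error of a fitted model on held-out data."""
    return statistics.fmean([(predict(x) - y) ** 2 for x, y in zip(xs, ys)])


def brute_force_search(xs, ys, holdout=0.25):
    """Try every (transform, model) pair; return results sorted by error."""
    split = int(len(xs) * (1 - holdout))
    train_x, test_x = xs[:split], xs[split:]
    train_y, test_y = ys[:split], ys[split:]
    results = []
    for (t_name, t), (m_name, fit) in itertools.product(
        TRANSFORMS.items(), MODELS.items()
    ):
        predict = fit([t(x) for x in train_x], train_y)
        score = mse(predict, [t(x) for x in test_x], test_y)
        results.append((score, t_name, m_name))
    results.sort()
    return results


if __name__ == "__main__":
    xs, ys = make_data()
    for score, t_name, m_name in brute_force_search(xs, ys)[:3]:
        print(f"{t_name:>8} + {m_name:<6} held-out MSE = {score:.2f}")
```

Even in this tiny example the cost grows multiplicatively: 3 transforms times 4 models means 12 fits, and every extra pre-processing option or tuning value multiplies that count again. That is why the real search is so computationally expensive.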
DataRobot creates value by running state-of-the-art analytics on your datasets. It captures that value by charging a fee for use of an API that encapsulates the best predictive model it found for that dataset. So now, with only one additional line of code, your software makes an API call and gains a predictive analytics layer. It’s like magic!
For a visual explanation of how DataRobot works, see this video.
Who else is out there?
Today, DataRobot’s primary competition is data scientists. And good ones are hard to find, easy to lose, and expensive to retain. They do not pose a great threat to DataRobot. But other companies are attempting to build platforms that are just like DataRobot, such as Watson Analytics and Loom Systems. There is even machineJS, a push in the open source community to build this kind of platform. These pose a more serious threat than humans. DataRobot’s “brute force” method (testing every open source predictive model on a dataset and picking the best combination) is not particularly opaque, so it is likely that these competitors can easily catch up. What is more, this brute force method requires a lot of servers and computational power, which is expensive. And the bigger competitors, such as Amazon and Google, which have their own cloud infrastructures, can do this at a lower cost than DataRobot, which relies on AWS for the heavy lifting.
However, DataRobot has one interesting advantage: data about its data. It can develop a broad understanding of what kinds of algorithmic methods work best for any given dataset. Given a certain data type, distribution, size, or shape, it can look to previous analyses it has conducted and reduce its compute time and costs. In order to stay competitive, DataRobot must optimize, and learn from, its own previous analysis attempts.
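The “data about its data” advantage can be sketched as a simple lookup: describe each past dataset by a few summary features, remember which pipeline won, and for a new dataset try the winners from the most similar past datasets first. The sketch below is only an illustration under stated assumptions. The PastRun record, the choice of summary features, the pipeline labels, and the shortlist function are all hypothetical, and nothing here reflects how DataRobot actually stores or uses its history.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PastRun:
    """One previously analysed dataset and its winning pipeline (hypothetical)."""
    n_rows: int
    n_features: int
    frac_categorical: float
    best_pipeline: str


def meta_features(n_rows, n_features, frac_categorical):
    """Summarise a dataset's size and shape; logs keep the scales comparable."""
    return (math.log10(n_rows), math.log10(n_features), frac_categorical)


def shortlist(history, n_rows, n_features, frac_categorical, k=3, top=2):
    """Return the pipelines that won most often on the k most similar past datasets."""
    query = meta_features(n_rows, n_features, frac_categorical)
    by_distance = sorted(
        history,
        key=lambda run: math.dist(
            query,
            meta_features(run.n_rows, run.n_features, run.frac_categorical),
        ),
    )
    winners = Counter(run.best_pipeline for run in by_distance[:k])
    return [name for name, _ in winners.most_common(top)]


# Made-up history for illustration only; these are not real results.
HISTORY = [
    PastRun(1_000, 10, 0.0, "regularized_linear"),
    PastRun(2_000, 12, 0.1, "regularized_linear"),
    PastRun(1_500, 8, 0.0, "random_forest"),
    PastRun(1_000_000, 200, 0.6, "gradient_boosting"),
    PastRun(800_000, 150, 0.5, "gradient_boosting"),
    PastRun(500_000, 300, 0.7, "neural_net"),
]

if __name__ == "__main__":
    # A small, mostly numeric dataset resembles the first three past runs.
    print(shortlist(HISTORY, n_rows=1_200, n_features=9, frac_categorical=0.05))
```

Evaluating only the shortlisted pipelines first, and falling back to the full brute-force search when they underperform, is one way the accumulated history could cut compute time and cost.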
So while today we see data scientists and software engineers building tools to successfully replace many high-skilled jobs, we are about to see them automate their own work too!
Student comments on One Robot to Rule Them All
Very interesting, Bansi, thanks for sharing! While I think there is great potential in providing third-party data analytics capabilities, I am concerned about their ability to compete with Amazon/Google. As you correctly pointed out, Amazon/Google have their own cloud infrastructure, and most data scientists rely on Amazon cloud services to run their ‘big data’ projects. How do you think Amazon will react if DataRobot grows fast?
Great post! Thanks, Bansi. Interesting to think about data scientists’ new role in this DataRobot world. I wonder if the critical thinking role of the data scientist can ever be replaced by “dumb” brute force methods, as it seems that connecting online and offline worlds will continue to require human judgment. In which situations do you think robots will (or won’t) replace data scientists?
Thanks, Bansi! I’m super interested in how technology is going to transform job requirements and workforce retraining in the future. Daniel Franklin of the Economist spoke on campus yesterday about his new book, MegaTech, which outlines what he thinks technology will look like in 2050. Someone asked him about the impact of technology on jobs, and he believes the people who will succeed in the coming economy are those who have strong empathy and emotional intelligence (skills that are harder for computers to replicate). Thought it was an interesting, related idea.
My actual question 🙂 is related to what Rahul asked in class today. Is this a replacement or a complement? You seem pretty confident (based on your data background) that it’s the former; however, the marketing video you shared does frame DataRobot as a tool that supplements and empowers the data scientist. Do you think this is simply a tactic to avoid threatening the professional identity of key stakeholders (e.g. data scientists may be decision makers in a company’s purchasing process), or do you think the company really believes this product is better in combination with human judgement?
Really enjoyed your post, Bansi. I’m wondering how DataRobot will be able to make better decisions when it comes to companies/organizations that need to evoke the emotional side of humans. My sister is running a pet accessories store, and a lot of the products sold are not rational purchases (scarves/bow ties/clothes etc. for dogs), so I feel there is a need for a human who can interpret data given the context on a case-by-case basis. It will be interesting to see how this field evolves and when, if ever, data robots can actually replace the scientists.