Data in Drug Research: “We have no idea what we know”

Using data to discover treatments faster

Pharmaceutical companies spend billions of dollars per year creating data. But the past few decades have seen an arms race over who can not only create data but also mine and analyze it effectively, placing educated bets on which compound might become the next blockbuster drug.

Thirty-five years ago, pharmaceutical companies faced a dilemma: they simply couldn’t run experiments fast enough to generate the data they needed to identify promising targets for further study. Most laboratory steps, from cell culture to serial dilutions, were done by hand by technicians, and the speed at which those technicians could work was the rate-limiting factor in eliminating poor development candidates or finding the next silver bullet against a stubborn and devastating cancer. Then, in the mid-1990s, a new technology found its way into the laboratories of the big pharma companies with cash to spend: High-Throughput Screening.

Figure: An example of a High-Throughput Screening system.

High-Throughput Screening is a lab bench on steroids. The systems, which can cost anywhere from $500,000 to many millions of dollars, are large automated platforms that mix chemicals, carry out the necessary experimental steps (e.g., incubation, centrifugation), and then read out the results of the biological reactions using phosphorescence detectors or microscopic imaging readers. This solved the data scarcity problem.

But suddenly, pharmaceutical companies (and later biotechnology companies) had a new problem on their hands: famine had turned to glut. There was simply too much data and not enough tools to analyze the information pouring out of the automated platforms. Novartis estimated in 2010 that up to 90% of the data on its research servers had never been fully analyzed, causing major R&D waste through repeated experiments and high IT storage costs.

It became clear that pharmaceutical companies are data companies. The key to the next multi-billion-dollar drug could be buried in the data already sitting on their servers. If a drug company wanted to maintain its edge and beat competitors to a treatment, it needed to understand the data coming from its experiments and look for trends in the results. But it also needed to understand the data coming from clinical trials, patients, caregivers, and biological processes in order to steer its research toward valuable and promising opportunities.


A large number of software analytics tools were created, both in-house and by third-party vendors. These tools sifted through data that might cover 2 million different compounds in a single experiment and quickly eliminated poor candidates.

More remarkably, the face of drug discovery began to change with the consideration of a different type of data: DNA. By combining their research data with the data coming from the mapping of the human genome, pharmaceutical companies were able to focus on the emerging field of biologics, drugs in which a protein or other biologically derived molecule is designed to fit like a puzzle piece into the part of a biological system that is “broken”. By mining the vast body of knowledge about biological processes and building comprehensive computer models, researchers could run “in silico” experiments (a play on the traditional term “in vitro”). Testing in a simulated environment could save a research group millions of dollars in compound and reagent costs in a single year, and the data produced by these tools could be saved for future use when building new simulations.
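The article does not say how these screening tools decided which compounds to discard. As a purely hypothetical illustration, one common way to filter screening data is to score each compound's signal against the rest of its plate and keep only strong outliers as "hits". The sketch below assumes an inhibition assay (a lower readout means a more active compound) and uses a median/MAD-based robust z-score with an illustrative cutoff of -3; the function names, threshold, and sample data are all assumptions, not any vendor's actual method.

```python
import statistics


def robust_z_scores(signals):
    """Return a median/MAD-based robust z-score for each well's signal."""
    med = statistics.median(signals)
    mad = statistics.median([abs(x - med) for x in signals])
    # 1.4826 scales the MAD so it matches the standard deviation for normally distributed data.
    scale = 1.4826 * mad
    if scale == 0:
        # Every well (or most wells) identical: no meaningful spread, so nothing stands out.
        return [0.0 for _ in signals]
    return [(x - med) / scale for x in signals]


def call_hits(compound_ids, signals, threshold=-3.0):
    """Keep compounds whose signal falls at least |threshold| robust SDs below the plate median.

    Hypothetical assumption: an inhibition assay, where a lower signal means a more active compound.
    """
    scores = robust_z_scores(signals)
    return [cid for cid, z in zip(compound_ids, scores) if z <= threshold]


if __name__ == "__main__":
    # Made-up plate: five inactive compounds and one that strongly suppresses the signal.
    ids = ["C1", "C2", "C3", "C4", "C5", "C6"]
    signals = [100, 98, 102, 101, 99, 40]
    print(call_hits(ids, signals))  # ['C6']
```

Using the median and MAD rather than the mean and standard deviation keeps a handful of very active compounds from inflating the estimated spread and hiding one another, which matters when a plate contains several genuine hits.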

The industry is still grasping for any new tool it can use to better analyze its data. IBM’s Watson, for example, is being used to analyze more than 70,000 scientific journal articles and look for patterns related to a single protein, a volume no researcher could hope to read. Tools like this bring a new level of capability to deciding where R&D dollars can be spent most efficiently and effectively.

Pharma companies’ business models depend entirely on their ability to spot trends in the ocean of data around human health. Novartis’ top ten pharmaceutical drugs accounted for 35% of its 2014 revenues, and all of them were created as a result of data-driven R&D. The faster data can be analyzed, the faster a drug can come to market, which allows a company to capture more value from a new drug before its patent expires and generic manufacturers can produce it. The winners in this industry will be the companies with the best data analytics platforms.

Student comments on Data in Drug Research: “We have no idea what we know”

  1. You mentioned that in 2010 up to 90% of the data sitting on Novartis’ servers had not been properly analyzed. I wonder whether this is a company-specific or an industry-wide phenomenon.
