Faster, Cheaper, Better: The Promise of Machine Learning for Drug Discovery

GSK is rethinking its drug discovery pipeline using machine learning

Innovation in the pharmaceutical industry has always been labor- and cost-intensive. Drug discovery in particular requires intensive research into the molecular basis of diseases and the compounds that interact effectively with molecular targets. This happens alongside extensive testing to validate hypotheses and examine other, potentially unanticipated, effects.1 There is always a possibility that a research project goes nowhere and is never commercialized to bring in revenue. Studies have found the total cost of developing a new drug to be $1.4 billion (out-of-pocket) or $2.6 billion (capitalized).2 Further, pharmaceutical companies are incentivized to constantly look for new, innovative drugs, as patent expiration and the potential for generics can lessen the value of their already commercialized products. Thus, GlaxoSmithKline (GSK) spent £4.5 billion ($5.8 billion USD) on Research & Development in 2017.3

Machine learning has the potential to dramatically alter the product development paradigm for pharmaceuticals. Instead of relying on the existing lengthy, labor-intensive, and expensive process of medicinal chemistry analysis, new technologies use pattern recognition algorithms to identify relationships between “small molecules and extrapolate them to predict chemical, biological and physical properties of novel compounds.”4 Through these “quantitative structure-activity relationship” (QSAR) models, researchers can test chemical modifications via machine simulations instead of through physical tests. As the algorithms improve and additional datasets are added, the accuracy of the machine learning techniques should likewise increase.
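To make the QSAR idea concrete, here is a minimal sketch of one simple similarity-based approach: predict the activity of an untested compound from the measured activities of its most structurally similar known compounds. Everything in it is an assumption for illustration. The fingerprints are hypothetical sets of structural feature IDs, the activity values are invented, and nearest-neighbor averaging with Tanimoto similarity is just one classic technique, not a description of what GSK or its partners actually use.

```python
from typing import FrozenSet, List, Tuple

# A "fingerprint" is modeled here as the set of structural feature IDs
# present in a molecule (hypothetical values, for illustration only).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared features divided by total distinct features."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_activity(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, float]],
    k: int = 2,
) -> float:
    """Similarity-weighted average of the k most similar known compounds."""
    scored = sorted(
        ((tanimoto(query, fp), activity) for fp, activity in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_similarity = sum(sim for sim, _ in scored)
    if total_similarity == 0:
        # Nothing similar is known: fall back to the overall mean activity.
        return sum(activity for _, activity in training) / len(training)
    return sum(sim * activity for sim, activity in scored) / total_similarity


# Invented training data: (fingerprint, measured activity score).
TRAINING: List[Tuple[Fingerprint, float]] = [
    (frozenset({1, 2, 3, 4}), 7.0),
    (frozenset({1, 2, 3, 5}), 6.0),
    (frozenset({8, 9, 10}), 3.0),
]

if __name__ == "__main__":
    # A proposed modification that combines features of the first two compounds.
    candidate = frozenset({1, 2, 3, 4, 5})
    print(predict_activity(candidate, TRAINING, k=2))
```

In this toy setup, a chemist could score many proposed modifications in software and send only the most promising ones to the lab. Real QSAR models use far richer descriptors, much larger datasets, and more capable learning algorithms.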

Illustration by Michele Marconi

Big pharma seeks partners for machine learning expertise

Building robust machine learning capabilities alongside best-in-class traditional research methods is another costly task, even for a big pharma behemoth. Organizationally, GSK established a specialized drug discovery unit to explore how to use these new techniques to make drug discovery faster, more precise, and cheaper. To get ahead, however, GSK has turned to start-ups that focus exclusively on artificial intelligence in drug discovery. In 2017, GSK entered two large partnerships to leverage existing machine learning technologies for its own product development: a £33 million ($43 million USD) deal with Exscientia and a deal with Insilico Medicine whose terms were not disclosed.5

Exscientia is tasked with using its AI-enabled platform to identify new, high-quality drug candidates for 10 disease-related targets nominated by GSK. Within the platform, the company created a “rapid design-make-test cycle” that enables it to learn quickly and design effective compounds with increasingly complex profiles. Exscientia claims its approach has “delivered candidate-quality molecules in roughly one-quarter of the time, and at one-quarter of the cost of traditional approaches.”6

Insilico will focus on vaccine discovery using biologics (substances made from living organisms). Its technology builds “network models of common [disease] host organisms” and runs simulations of various bioprocesses to identify which candidate vaccines will be safest and most effective.7 The two-year partnership also includes explicit goals for using what is learned to improve upon existing processes.

The longer-term perspective

Despite significant media attention and hype around machine learning, many aspects of the technology are still unproven. There is no consensus within the life sciences industries about the potential and application of these technologies, and GSK must continue to be skeptical and critical of partners’ technologies.8 Continuing to work with multiple partners, instead of relying on just one technology, and understanding how the inputs and outputs differ between them will be critical to ensuring that GSK reliably generates the most value.

Further, GSK must prioritize the development of in-house platforms and expertise in machine learning. This is important in the shorter term so that GSK can knowledgeably assess its partners’ work products and move forward in a high-risk business with a strong understanding of its drugs. In the long term, continuing to rely on partners will put GSK’s competitive advantages in innovation at risk, because those partners also work with its competitors. Exscientia, for example, has also signed partnership deals with Evotec and Sanofi.9 If machine learning becomes a predominant and integral part of drug discovery, GSK must have the capabilities within its own organization to succeed.

GSK was not at the forefront of machine learning for drug discovery. As the field of advanced data analytics through artificial intelligence grows, GSK should strive to be the one to identify new research, or additional ways to take existing technologies and apply them across its extensive business (including in manufacturing and distribution). However, should these new technologies replace existing traditional methods of research altogether? If so, how fast, and are there additional risks to consider? What are the costs required to develop this in-house, or to acquire the knowledge from others? Is that value proposition worth it?


(750 words)



1 Julia James, “Why drug development is time consuming and expensive (hint: it’s hard),” Scope (blog), Stanford University School of Medicine, July 21, 2010, accessed November 2018.

2 Joseph A. DiMasi, “Innovation in the pharmaceutical industry: New estimates of R&D costs.” Journal of Health Economics 47 (May 2016): 20-33.

3 GlaxoSmithKline, 2017 Annual Report, accessed November 2018.

4 Yu-Chen Lo, “Machine learning in chemoinformatics and drug discovery.” Drug Discovery Today 23(8) (August 2018): 1538-1546.

5 Amirah Al Idrus, “GlaxoSmithKline taps Baltimore’s Insilico for AI-based drug discovery,” Fierce Biotech, August 16, 2017, accessed November 2018.

6 “Exscientia enters strategic drug discovery collaboration with GSK,” press release, July 2, 2017, on Exscientia website, accessed November 2018.

7 “Insilico and GSK establish a collaboration to promote biomanufacturing of the future,” press release, March 1, 2017, on Insilico Biotechnology website, accessed November 2018.

8 “Drug discovery and development may never be the same — the life sciences are embracing AI,” STAT News, November 6, 2018, accessed November 2018.

9 Evelyn Warner, “Europe’s Brightest Stars: The 9 Best Biotech Companies of 2017,” LABIOTECH, May 12, 2017, accessed November 2018.




Student comments on Faster, Cheaper, Better: The Promise of Machine Learning for Drug Discovery

  1. A very well-written piece, though a bit light on the scientific underpinnings of this novel approach to drug discovery. Would appreciate additional perspective from the author on the questions below:

    1. The author does a nice job explaining how machine learning can be applied to accelerate understanding of chemical modifications to compounds. This likely presumes a worthwhile base compound to alter. Did the author discover any information on machine learning applications geared towards the discovery of novel compounds?

    2. As noted by the author, this technology is yet unproven. The evidence cited in the piece refers to “candidate-quality” molecules. The strength of this evidence depends on the definition of “candidate-quality.” Does the author have insight into the odds of “candidate-quality” molecules materializing as marketable therapeutics?

    3. The author mentions biologics in the form of vaccines (i.e., preventative medicines). Did the author encounter any information on therapeutic biologics, such as those used to treat rheumatoid arthritis? I wonder if network models of disease states (rather than host organisms; e.g., inflammation in HLA-B27 autoimmune disorders) would be similarly applicable.
