Flatiron Health – Using Healthcare Data to Make Better Drugs

On the heels of IBM Watson’s failure to leverage data in healthcare, Flatiron Health embarked on a mission to standardize data collection across oncology. The company hoped to create a comprehensive database that could be used to make better therapeutics.

The Problem with Healthcare Data

In healthcare, real world data is all data that is collected outside the confines of a clinical study – in short, data from real world patients.1 The vast majority of this data is spread across a number of distinct data types (images, doctor’s notes, genomic data, etc.) and locations (electronic medical records, pharmaceutical companies, patients, etc.). It can be extremely difficult to compile a comprehensive set of data that can be queried and used to understand trends and implications on an anonymized basis.


One key area where these issues are rampant is the Electronic Medical Record (EMR) space. There is a significant lack of interoperability among EMR systems at hospitals and clinics because 1) there are so many systems to choose from and 2) hospitals and clinics often customize their EMR systems to improve ease of use for physicians.2

The ownership structure and location of real world data are so variable that it’s almost impossible to consolidate all the data into a single location that can be used to draw conclusions.

Flatiron Health

Flatiron Health was started in 2012 and offers a suite of services for oncology practices, the most important of which is OncoEMR. OncoEMR is a cloud-based EMR system specifically designed for oncology. By focusing specifically on oncology, Flatiron was able to target and convince community oncology practices to transition to their platform. Flatiron has now amassed more than 3 million patient records across 280+ community oncology clinics and 8 major cancer centers.3 This data is an absolute treasure trove for a variety of stakeholders across the healthcare ecosystem. Comprehensive healthcare data that effectively captures a patient’s journey through a treatment lifecycle can enable significant improvements in therapeutics.

Real world data can be used in a number of different ways across the therapeutic industry: from identifying novel targets for new cancer therapeutics, to recruiting patients for clinical trials, to replacing control arms in clinical trials.4

How Can This Treasure Trove of Data Be Used?

One way to understand how this data can be used across the therapeutics landscape is to walk through the pharmaceutical development lifecycle.

Research: In this segment of the therapeutic lifecycle, pharmaceutical companies are trying to understand which areas they should be focusing on. Data from OncoEMR can be used to understand epidemiological trends in oncology. Are certain genetic expressions of cancer becoming more common? Are they being seen more often in specific sub-segments of the population (e.g., women, smokers, people with comorbidities)? This sort of information can help pharma companies prioritize specific areas and indications to target. Further, genomic data from cancer patient biopsies can be used to identify specific genetic targets or sequences that have not yet been identified. A compiled database of oncology patients can help companies understand if there are new genetic targets that they should be making drugs for!5

Development – During the development stage, pharma companies are working on getting approval for the drugs they have developed. A comprehensive dataset like OncoEMR can help identify patients who may meet the inclusion criteria for a clinical study.6 Another use case is using the data as a synthetic control arm. Traditionally you would recruit patients for both the control arm (placebo) and the treatment arm. However, a comprehensive dataset of historical patient data can approximate a group of patients who never received the drug. As such, you can create a synthetic control arm using data from historical patients who match the inclusion criteria.7
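The cohort-selection step behind a synthetic control arm can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PatientRecord` fields, the criteria, and the sample patients are all hypothetical and do not reflect OncoEMR's actual schema. A real analysis would also need statistical adjustment (for example, matching or weighting on patient characteristics) before comparing outcomes.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # Hypothetical, simplified fields; a real EMR record is far richer.
    patient_id: str
    age: int
    diagnosis: str
    stage: int
    received_study_drug: bool


def meets_inclusion_criteria(p, *, diagnosis, min_age, max_age, stages):
    """Return True if the patient satisfies the (hypothetical) trial criteria."""
    return (
        p.diagnosis == diagnosis
        and min_age <= p.age <= max_age
        and p.stage in stages
    )


def build_synthetic_control(records, **criteria):
    """Select historical patients who match the criteria and never got the drug."""
    return [
        p
        for p in records
        if not p.received_study_drug and meets_inclusion_criteria(p, **criteria)
    ]


# Made-up historical records, for illustration only.
historical = [
    PatientRecord("p1", 64, "NSCLC", 4, False),
    PatientRecord("p2", 71, "NSCLC", 3, False),
    PatientRecord("p3", 58, "NSCLC", 4, True),    # excluded: received the drug
    PatientRecord("p4", 45, "breast", 4, False),  # excluded: different diagnosis
    PatientRecord("p5", 82, "NSCLC", 4, False),   # excluded: outside age range
]

control = build_synthetic_control(
    historical, diagnosis="NSCLC", min_age=18, max_age=75, stages={3, 4}
)
print([p.patient_id for p in control])  # ['p1', 'p2']
```

The same filter, without the `received_study_drug` check, is roughly how a dataset could be queried to find candidates for trial recruitment.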

Commercialization – Once a drug is approved and being marketed, there are still various uses for comprehensive EMR data. Flatiron could use their data to help pharma companies demonstrate the efficacy of their drugs compared to other drugs. This data could also be sold to health insurance providers to help them decide which drugs are best to cover in their formularies. Further, OncoEMR data can be used to identify instances where a drug has value beyond its current indications. Since a doctor can prescribe a drug to a patient even if it is not approved for that patient’s condition, you can use the data from these patients to understand if it’s worth trying to get a specific drug approved for a new kind of patient.8

These examples only scratch the surface when it comes to how effectively this data can be used. Flatiron Health has since expanded to provide a number of additional services through which they can collect data to help pharmaceutical companies improve oncology therapeutics.



Student comments on Flatiron Health – Using Healthcare Data to Make Better Drugs

  1. Great post!

    Data in the healthcare space has always been tricky as it is typically protected by privacy regulations worldwide. To make matters more complicated, most of the data is unstructured and can’t be used directly for ML.

    I see great potential for the commercialization of EMR. Drugs undergo extensive testing before being rolled out to the market (for obvious reasons). However, the case could be made that this is not enough. A trial with thousands of patients is not representative of the global population. People with existing rare diseases might have negative reactions to the drug that did not come up during FDA testing.
