The Sentinel Initiative — How the FDA Might Know Why You’re Sick Before Your Doctor Does

An examination of the potential to harness big data to improve public health and safety using natural language processing.

The Sentinel Initiative

The U.S. Food and Drug Administration (FDA) regulates over 25 percent of the American economy and ensures the safety of medical products. Until recently, the FDA manually reviewed all incoming reports to identify potential safety issues (called adverse events) not uncovered during clinical trials. In 2007, Congress directed the FDA to establish a system to actively monitor such events rather than waiting for these reports. In 2008, the FDA launched the Sentinel Initiative, a national electronic system for medical product safety surveillance.

Fully launched in February 2016, the Sentinel System uses data generated from patient interactions with the U.S. health care system.[i] Specifically, Sentinel’s “distributed data infrastructure approach allows the FDA to rapidly and securely access electronic health care data” on over 193 million patients from multiple data partners, consisting mostly of claims data from national health insurers and managed care organizations.[ii][iii]
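One way to picture a distributed data approach is as a query that runs separately at each data partner, with only summary counts returned for pooling. The sketch below is illustrative only and is not Sentinel's actual implementation: the record layout, the partner names, the codes, and the functions local_count and pooled_counts are all hypothetical.

```python
def local_count(records, drug, diagnosis):
    """Run at one data partner (hypothetical): return aggregate counts only.

    `records` is a list of (patient_id, code) pairs, where a code is either
    a drug or a diagnosis. No patient identifiers leave this function.
    """
    exposed = {pid for pid, code in records if code == drug}
    with_event = {pid for pid, code in records if code == diagnosis}
    return {
        "exposed": len(exposed),
        "exposed_with_event": len(exposed & with_event),
    }


def pooled_counts(partner_records, drug, diagnosis):
    """Run by the coordinator (hypothetical): sum each partner's summary."""
    totals = {"exposed": 0, "exposed_with_event": 0}
    for records in partner_records.values():
        summary = local_count(records, drug, diagnosis)
        for key in totals:
            totals[key] += summary[key]
    return totals


# Made-up data for two made-up partners.
partners = {
    "insurer_a": [("p1", "drug_x"), ("p1", "angioedema"), ("p2", "drug_x")],
    "insurer_b": [("p3", "drug_x"), ("p3", "angioedema"), ("p4", "angioedema")],
}
result = pooled_counts(partners, "drug_x", "angioedema")
print(result)
```

In this toy setup the coordinator sees only two integers per partner, which is the property that lets row-level data stay with the organization that holds it.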

Sentinel currently has over 425 million person-years of observation time.[iv] With over 5,000 adverse event reports filed daily plus the deluge of claims data, Sentinel helps the FDA hear the signal through the noise. Machine learning in Sentinel presents an incredible opportunity to improve public health and safety by reducing labor needs, deriving clinical insights, and mitigating risks from drugs even faster.


Machine Learning in Sentinel

Sentinel is currently exploring applications for machine learning. Integrating a natural language processing (NLP) solution for the initial review and classification of reports from the FDA Adverse Events Reporting System (FAERS) and Vaccine Adverse Events Reporting System (VAERS) would significantly reduce the workforce required for reviewing reports. This would allow the FDA to focus instead on the second- and third-step analyses, which require greater judgement and discretion.

In a preliminary study, unstructured text from VAERS was used to identify and classify possible incidents of anaphylaxis after vaccination. While promising, the algorithm produced a misclassification rate of 40 percent due to its “inability… to make the same clinical judgements as human experts.”[v] To explore this further, the FDA is running a pilot with the Centers for Disease Control and Prevention and IBM Watson to refine the “NLP techniques used to assess safety reports” received through FAERS and VAERS.[vi] These early results show great promise for the use of NLP in surveillance, but more work is needed on prediction and on ETHER (Event-based Text-Mining of Health Electronic Records).
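To make the idea of a misclassification rate concrete, the sketch below flags free-text reports with a simple keyword rule and compares the flags with expert labels. It is a minimal illustration and not the algorithm used in the VAERS study: the term list, the min_hits threshold, the sample reports, and the function names are all assumptions made for this example.

```python
import re

# Hypothetical terms for illustration only; not a clinical case definition.
ANAPHYLAXIS_TERMS = ("anaphylaxis", "anaphylactic", "throat swelling", "wheezing", "hives")


def flag_report(text, terms=ANAPHYLAXIS_TERMS, min_hits=2):
    """Flag a report when at least `min_hits` distinct terms appear in its text."""
    lowered = text.lower()
    hits = sum(
        1 for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    )
    return hits >= min_hits


def misclassification_rate(predictions, expert_labels):
    """Share of reports where the automated flag disagrees with the expert."""
    if len(predictions) != len(expert_labels):
        raise ValueError("predictions and expert_labels must be the same length")
    if not predictions:
        raise ValueError("at least one report is required")
    wrong = sum(p != e for p, e in zip(predictions, expert_labels))
    return wrong / len(predictions)


# Made-up reports paired with made-up expert labels (True = anaphylaxis).
reports = [
    ("Patient developed hives and wheezing ten minutes after vaccination.", True),
    ("Anaphylactic reaction with throat swelling; epinephrine given.", True),
    ("Mild soreness at injection site.", False),
    ("Hives on both arms the next day, no breathing difficulty.", False),
    ("Collapsed with low blood pressure shortly after the shot.", True),
]
predictions = [flag_report(text) for text, _ in reports]
labels = [label for _, label in reports]
rate = misclassification_rate(predictions, labels)
print(rate)
```

The last report shows the kind of case a keyword rule misses: an expert reads the description as a likely severe reaction, but none of the listed terms appear in the text.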


Looking Forward

Electronic Health Records

The FDA is developing a corpus of validator cases for training around ETHER.[vii] However, it is critical that the agency also partner with Cerner and Epic, the two largest vendors of electronic health records (EHRs). The full potential of Sentinel will not be realized until the system can successfully use NLP to analyze real-time health data pulled from EHRs and catch drug-related adverse events proactively.

Real World Evidence

Congress mandated that the FDA incorporate Sentinel into its human drug review process by 2020.[viii] Advanced NLP could further enable Sentinel to serve as a platform to collect real-world evidence, potentially allowing the FDA to approve existing drugs for new indications. Leveraging Sentinel’s scale with appropriate predictive and NLP capabilities could significantly reduce the cost of subsequent approvals by eliminating the need for additional clinical trials.[ix]


  1. What other sources (online patient forums, Twitter, etc.) should the FDA use to inform its active postmarket surveillance?
  2. How should the FDA balance opening Sentinel to external partners, who could advance machine learning and derive insights from a system with heavily fragmented data, against the need to preserve capacity for its own regulatory and surveillance activities within the system?



[i] Coyle, D. T. U.S. Food and Drug Administration Center on Drug Evaluation and Research, Sentinel Operations Center. “Sentinel System Overview.” Accessed 9 Nov 2018.

[ii] U.S. Food and Drug Administration. “Sentinel Initiative – Transforming How We Monitor Product Safety – Background.” Accessed 11 Nov 2018.

[iii] U.S. Food and Drug Administration. “Sentinel Initiative: Final Assessment Report.” September 2017. Accessed 9 Nov 2018. 

[iv] Ibid.

[v] Ball, R. U.S. Food and Drug Administration, Center for Drug Evaluation and Research, Office of Surveillance and Epidemiology. “Improving the Efficiency of Outcome Validation in the Sentinel System: Defining the Problem.” Remarks at the Duke-Margolis Center for Health Policy. 26 July 2018. Accessed 9 Nov 2018.

[vi] Proestel, S. U.S. Food and Drug Administration, Center for Drug Evaluation and Research. “Investigation of Artificial Intelligence in the Interpretation of Adverse Event Reports.” 14 Sept 2018. Accessed 9 Nov 2018.

[vii] Brown, J. Harvard Department of Population Medicine. “Next Steps to Advance the Sentinel System.” Remarks at the Duke-Margolis Center for Health Policy. 26 July 2018. Accessed 9 Nov 2018.

[viii] Public Law 115-52: Food and Drug Administration Reauthorization Act of 2017. (131 Stat. 1005; Date: 8/18/17; enacted H.R. 2430). Accessed 9 Nov 2018.

[ix] Corrigan-Curay J, Sacks L, Woodcock J. “Real-World Evidence and Real-World Data for Evaluating Drug Safety and Effectiveness.” JAMA. 2018;320(9):867–868. doi:10.1001/jama.2018.10136. Accessed 13 Nov 2018.




Student comments on The Sentinel Initiative — How the FDA Might Know Why You’re Sick Before Your Doctor Does

  1. Great overview clearly explaining the operating method and significance of this ML application. The risks of data fragmentation and of regulation and privacy that you identify have come up across several ML-related posts. Specific to this case, do you see this application of ML to EHRs requiring any adjustments or modifications because of HIPAA?

    1. Because the data in Sentinel is aggregated, in theory, no. However, the FDA would need to be sure that it has strong compliance systems to convince providers and EHR vendors that this would in fact be the case. HIPAA violations are complex in their enforcement, and doctors are incredibly wary of engaging in any activity that has even the faint whiff of a violation. Thus, these compliance assurances would be needed to get providers and vendors to participate.

  2. This is a really exciting use of machine learning which could make some huge improvements to our healthcare system! I think that EMRs are the most important place for them to focus, and extending partnerships in that space seems critical to growing an accurate prediction system. Other inputs could be valuable in the future, but I think they should be less of a priority. I also wonder if there is an opportunity to proactively send surveys or gather data from consumers when the algorithms are seeing red flags? This would allow the FDA to gather a ton of additional, highly relevant information. Clearly there are lots of additional questions and concerns about such an approach, but it could be an interesting one to explore.
