Sharing Data to Advance Cancer Care at Memorial Sloan Kettering Cancer Center

The trend towards precision oncology and machine learning for cancer care is unmistakable. In a fast-growing industry, however, it is important to consider the risks and unintended consequences of sharing data.

Because cancer is such a personal and sensitive topic, significant developments in the field attract considerable attention. One important development in cancer research has been the understanding that the disease is “extremely heterogeneous.” As a result, precision medicine is especially valuable in oncology: there is no single “cure” for cancer, and because each cancer patient has a different genetic profile, it is more effective to tailor treatment to the individual patient [1]. With an enormous amount of data being generated, it is crucial to find ways to leverage this data for better healthcare outcomes. Collecting the data is only one part of the equation, albeit a critical one; importantly, “the human ability to process this data without effective decision support is finite…computer models are required to help clinicians organize the data, recognize patterns, interpret results, and set thresholds for actions” [2]. While machine learning and big data offer major opportunities in cancer care, there are important issues to keep in mind as they relate to product development in the healthcare industry more generally.

Memorial Sloan Kettering Cancer Center (MSK), one of the most highly respected cancer centers in the world, understands the significance of machine learning in oncology. At the NIPS 2017 conference, MSK launched a competition that “was conceived because precision medicine and genetic testing are disrupting the way diseases like cancer are treated”; the competition was a “call to develop a machine learning algorithm that, using MSK’s database, can begin to help automatically classify actionable genetic variations” [3]. While this appears to be a longer-term investment, MSK has recently made shorter-term commitments as well. In February 2018, the hospital partnered with Paige.AI, described as “a new company focused on revolutionizing clinical diagnosis and treatment in oncology through use of artificial intelligence (AI).” The start-up applies machine learning to pathology to help diagnose cancer “with greater speed, accuracy, objectivity and reproducibility.” This capability depends on data; accordingly, MSK signed an agreement that gives the start-up “exclusive rights to MSK’s library of 25 million pathology slides,” a significant step for both sides [4].

There is an important, symbiotic relationship between MSK and its collaborators, but the hospital must be careful to avoid conflicts of interest. To leverage machine learning, MSK needs to keep collaborating by offering its valuable data. One of the biggest problems with health data “is lack of standardization, which would allow for information to be combined from multiple data centers to develop a deeper understanding of clinical outcomes” [5]. Rather than working exclusively with for-profit companies, MSK should also share data with other hospitals; pooling records across institutions would provide an entirely new source of valuable information.

When collaborating, MSK must also be careful about how it shares data and, just as importantly, about how the hospital is perceived. Recently, MSK’s partnership with Paige.AI came under intense scrutiny. One issue was patient data: although patients’ information was de-identified, sharing images of cells or genetic variations blurs the ethical boundary, because genetic information in particular is difficult to fully anonymize. MSK must be transparent with patients about how their data is being used, and it should educate them about machine learning and the importance of collecting data; better-informed patients may be more willing, even excited, to share their information. The case also raised a broader question: how do we value data, and who has the rights to it? Certain MSK executives held equity in Paige.AI, and pathologists believed “it is unfair that the founders received equity stakes in a company that relies on the pathologists’ expertise and work amassed over 60 years” [6]. To avoid the perception that individuals are profiting personally from others’ work, MSK should not allow specific employees to benefit individually from companies with which it collaborates. Such relationships should still be encouraged, but any financial benefits should accrue to the hospital as a whole.

Cancer care (and healthcare more generally) faces a kind of scrutiny that other industries do not. MSK and other hospitals must continue to collaborate with innovative companies, but they need more specific policies and procedures governing the data they share. Creating a perception of impropriety where patients’ lives are concerned can jeopardize tremendous progress. One of the most important questions to consider is how we value the data we contribute. Who has rights to this data? Should it be treated as a common good? Other industries may face similar questions about the value of data, but in healthcare the stakes are higher and these questions are amplified. [785 words].


[1] Seung Ho Shin, Ann M. Bode, and Zigang Dong, “Precision medicine: the foundation of future cancer therapeutics,” npj Precision Oncology 1, no. 12 (2017), accessed November 2018.

[2] Daniel Richard Leff and Guang-Zhong Yang, “Big Data for Precision Medicine,” Engineering 1, no. 3 (September 2015): 277-279, via ScienceDirect, accessed November 2018.

[3] “Memorial Sloan Kettering Advances Its AI, Machine Learning at NIPS 2017,” press release, December 11, 2017, on MSK website, accessed November 2018.

[4] “Paige.AI Created to Transform Cancer Diagnosis and Treatment by Applying Artificial Intelligence to Pathology,” press release, February 5, 2018, via BusinessWire, accessed November 2018.

[5] Cary Jo R. Schlick, Joshua P. Castle, and David J. Bentrem, “Utilizing Big Data in Cancer Care,” Surgical Oncology Clinics of North America 27, no. 4 (October 2018): 641-652, via ScienceDirect, accessed November 2018.

[6] Charles Ornstein and Katie Thomas, “Sloan Kettering’s Cozy Deal With Start-Up Ignites a New Uproar,” New York Times, September 20, 2018, accessed November 2018.




Student comments on Sharing Data to Advance Cancer Care at Memorial Sloan Kettering Cancer Center

  1. I agree with the author’s sentiments – sharing data has the potential to benefit cancer centers and the healthcare industry as a whole, but this data sharing carries risks. At the end of the essay, the author raises the question of how to think about the value of the data we are inputting and whether it is safe. With that in mind, I’d like to introduce the author to Imprivata, a company built around HIPAA regulations that enables healthcare data to be transferred securely between people, technology and information. Companies such as these will allow the healthcare industry to continue leveraging digital tools in a protective and secure manner. Before shared data is standardized and hospitals start exchanging information with one another, it is critical that patients feel their data is protected and that large treatment centers such as MSK invest in data protection for the short and long term.

  2. The author raised an interesting point about equity and equitability when it comes to making money from cancer treatment. While pathologists are certainly responsible for major breakthroughs in cancer care, they are compensated for their treatment of patients. MSK’s partnership with Paige.AI is a product of for-profit healthcare, the same system that compensates doctors at different rates at different hospitals for different jobs; this is just an extension of that. I think that anything less threatens to undermine the important innovations currently being made by aggregating large data sets and applying machine learning.

  3. Very interesting article. It reminds me a lot of the debate over Henrietta Lacks, a patient whose cancer cells were cultured and used in years and years of research without her explicit consent or compensation. Patient de-identification is key for this type of work. However, a patient’s genetic information is the ultimate identifier, which complicates these questions, especially as direct-to-consumer sequencing becomes more widely available. Finally, in the spirit of collaboration and data sharing, I wonder whether these hospitals are actually disincentivized from sharing their data or organizing it in a digestible, transferable form, because keeping it to themselves lets them retain proprietary rights from which they can extract scientific findings and compensation.
