In today’s data-driven world, businesses face the dual challenge of leveraging vast datasets to gain insights while ensuring compliance with stringent data privacy regulations. The concept of machine unlearning, a method for efficiently removing the influence of specific data points from machine learning models, represents a paradigm shift in managing data responsibly.
Recent research explores a new framework for machine unlearning in the article, “Attribute-to-Delete: Machine Unlearning via Datamodel Matching,” by Seth Neel, Harvard Business School Assistant Professor, Faculty Affiliate, and Principal Investigator at the Digital Data Design Institute (D^3) Trustworthy AI Lab; Roy Rinberg, PhD student in Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences (co-advised with Seth Neel); Kristian Georgiev, PhD candidate at MIT’s Electrical Engineering & Computer Science (EECS) Department; Sung Min Park, Postdoctoral Scholar in Computer Science at Stanford University; Shivam Garg, PhD student at Stanford University; Andrew Ilyas, Stein Fellow at Stanford University; and Aleksander Madry, Cadence Design Systems Professor at MIT’s EECS Department.
Key Insight: The Growing Need for Machine Unlearning
“The goal of machine unlearning is to remove (or ‘unlearn’) the impact of a specific collection of training examples from a trained machine learning model.” [1]
The research emphasizes how regulatory pressures, like the EU’s Right to Be Forgotten, and practical needs—such as mitigating the effects of poisoned, toxic, or outdated data and resolving copyright infringement issues in generative AI models—are driving the demand for machine unlearning. The authors demonstrate how machine unlearning can address these challenges by enabling models to function as though specific data points (the “forget set”) were never part of the training process.
Key Insight: A Breakthrough Framework—Datamodel Matching
“Datamodel Matching (DMM) […] introduces a reduction from unlearning to data attribution, allowing us to translate future improvements in the latter field to better algorithms for the former.” [2]
The authors introduce DMM, a novel approach that links machine unlearning to data attribution. Unlike traditional retraining, which can be computationally expensive, DMM uses data attribution to predict what the model’s outputs would be if it were retrained without the forget-set data, and then fine-tunes the model to match these predicted outputs.
Key concepts:
- Data attribution: A framework within machine learning that connects specific training data samples to the predictions made by a trained model. It focuses on understanding and quantifying the influence of individual training data points on a model’s behavior and on predicting how changes to the training dataset, such as adding or removing data points, would affect a model’s outputs (a brief illustrative sketch follows this list).
- Oracle Matching (OM): A hypothetical and idealized approach to machine unlearning where a model is fine-tuned to match the outputs of an oracle model. The oracle model represents a machine learning model that has been retrained from scratch on the dataset excluding the data points to be unlearned (the forget set).
- Fine-tuning: A process in which an already-trained machine learning model is updated to achieve a specific objective by making small adjustments to its parameters. In the context of machine unlearning, fine-tuning is used to modify a model so it behaves as though the forget-set data were never part of the original training process. The fine-tuned model’s behavior should be statistically indistinguishable from the oracle model on both the forget set and the retained data.
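To ground the data attribution concept referenced in the first bullet above, here is a minimal, illustrative sketch of how a linear datamodel could predict the output margin of an oracle model retrained without the forget set. The array names and the linear form are expository assumptions, not the authors’ implementation.

```python
import numpy as np

def predict_oracle_margins(datamodel_weights, datamodel_bias, train_mask, forget_indices):
    """Predict per-example output margins of a model retrained without the forget set.

    A linear datamodel approximates an evaluation example's margin as a weighted
    sum over training-set inclusion indicators:
        margin(x) ~= sum_i w[x, i] * 1[i in training set] + b[x]
    Removing the forget set corresponds to zeroing those indicators.

    datamodel_weights: (n_eval, n_train) array of per-training-point influence weights.
    datamodel_bias:    (n_eval,) array of per-example intercepts.
    train_mask:        (n_train,) 0/1 array marking which points were in the training set.
    forget_indices:    indices of the training points to "unlearn".
    """
    counterfactual_mask = train_mask.astype(float).copy()
    counterfactual_mask[forget_indices] = 0.0  # pretend these points were never used
    return datamodel_weights @ counterfactual_mask + datamodel_bias
```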
Key Insight: Addressing the Missing Targets Problem
“A pervasive challenge […] for fine-tuning-based approaches is what we refer to as the missing targets problem.” [3]
Existing fine-tuning-based unlearning methods suffer from the “missing targets” problem, which describes the challenge of determining the precise output a model should produce after forgetting a particular data point or group of points. DMM circumvents this issue by using data attribution to estimate the target outputs of an oracle model, and then fine-tuning to match, ensuring stability and preventing overshooting or undershooting the target loss.
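As a rough illustration of “estimate the targets, then fine-tune to match,” the sketch below pairs predicted oracle margins (as in the previous example) with a simple PyTorch fine-tuning loop. The margin definition, loss choices, data-loader interface (forget-set batches of inputs, labels, and dataset indices), and hyperparameters are all assumptions rather than the paper’s exact recipe.

```python
import torch
import torch.nn.functional as F

def finetune_to_targets(model, forget_loader, retain_loader, target_margins, epochs=1, lr=1e-4):
    """Illustrative fine-tuning loop: push the model's forget-set margins toward
    datamodel-predicted oracle margins while preserving retain-set behavior.

    target_margins: dict mapping forget-example index -> predicted oracle margin.
    The margin here is (correct-class logit) - (max other-class logit); other
    definitions are possible.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for (xf, yf, idx), (xr, yr) in zip(forget_loader, retain_loader):
            opt.zero_grad()

            # Regression loss on the forget set: match the predicted oracle margins.
            logits_f = model(xf)
            correct = logits_f.gather(1, yf.unsqueeze(1)).squeeze(1)
            others = logits_f.clone()
            others.scatter_(1, yf.unsqueeze(1), float("-inf"))
            margins = correct - others.max(dim=1).values
            targets = torch.tensor([target_margins[i.item()] for i in idx],
                                   dtype=logits_f.dtype)
            loss_forget = F.mse_loss(margins, targets)

            # Standard loss on the retain set: keep original performance.
            loss_retain = F.cross_entropy(model(xr), yr)

            (loss_forget + loss_retain).backward()
            opt.step()
    return model
```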
Key Insight: Practical Efficiency with Broad Applications
“[DMM] achieves state-of-the-art performance across a suite of empirical evaluations.” [4]
To better assess unlearning performance, the researchers propose a new evaluation metric called KL Divergence of Margins (KLoM). This metric directly measures the distributional difference between unlearned model outputs and those of models retrained without the forget set. The authors’ research demonstrates that DMM delivers results comparable to full retraining at a fraction of the computational cost.
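The paper defines KLoM precisely; the snippet below is only a simplified sketch of the underlying idea: for a given example, compare the distribution of margins under many unlearned models with the distribution under many retrained (oracle) models. Here both distributions are fit with Gaussians so the KL divergence has a closed form, an approximation that may differ from the authors’ estimator.

```python
import numpy as np

def klom_gaussian(unlearned_margins, oracle_margins, eps=1e-6):
    """Simplified KLoM-style score for a single evaluation example.

    unlearned_margins: margins of the example under many unlearned models (1-D array).
    oracle_margins:    margins under many models retrained without the forget set.
    """
    mu_p, var_p = np.mean(unlearned_margins), np.var(unlearned_margins) + eps
    mu_q, var_q = np.mean(oracle_margins), np.var(oracle_margins) + eps
    # Closed-form KL( N(mu_p, var_p) || N(mu_q, var_q) )
    return 0.5 * (np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)
```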
Why This Matters
DMM represents a significant step forward in the machine unlearning field, offering a more reliable and efficient approach to unlearning in complex neural networks. For C-suite executives and business professionals, this research highlights the potential for improved data management practices and reduced computational costs associated with model maintenance. This approach opens new avenues for future research and offers practical solutions for addressing privacy concerns and data removal requests in real-world applications.
References
[1] Kristian Georgiev, Roy Rinberg, Sung Min Park, Shivam Garg, Andrew Ilyas, Aleksander Madry, and Seth Neel, “Attribute-to-Delete: Machine Unlearning via Datamodel Matching”, arXiv preprint arXiv:2410.23232 (October 2024): 1-47, 1.
[2] Georgiev et al., “Attribute-to-Delete: Machine Unlearning via Datamodel Matching,” 3.
[3] Georgiev et al., “Attribute-to-Delete: Machine Unlearning via Datamodel Matching,” 2.
[4] Georgiev et al., “Attribute-to-Delete: Machine Unlearning via Datamodel Matching,” 3.
Meet the Authors

Seth Neel is an Assistant Professor housed in the Department of Technology and Operations Management (TOM) at Harvard Business School, and a Faculty Affiliate in Computer Science at SEAS. He is the Principal Investigator at the Digital Data Design Institute (D^3) Trustworthy AI Lab.

Roy Rinberg is a PhD student in Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences, and is co-advised by Seth Neel. His research interests focus on public-interest technology, with a recent focus on privacy technology.

Kristian Georgiev is a PhD candidate at MIT’s Electrical Engineering & Computer Science (EECS) Department advised by Aleksander Madry. They are interested in the science of deep learning and deep learning for science.

Sung Min Park is a Postdoctoral Scholar at Stanford working with Prof. Tatsu Hashimoto, Prof. Percy Liang, and Prof. James Zou. He received his PhD from MIT, where he was advised by Prof. Aleksander Mądry. He is interested in understanding and improving machine learning (ML) methodology through the lens of data.

Shivam Garg is a PhD student at Stanford, advised by Greg Valiant. He is part of the Machine Learning Group and the Theory Group at Stanford. Prior to Stanford, he worked at Microsoft Research India.

Andrew Ilyas is a Stein Fellow at Stanford University. His research pursues a precise empirical understanding of the entire machine learning pipeline, with an emphasis on data. His interests span tracing predictions back to training data, identifying and alleviating data bias, and studying machine learning robustness.

Aleksander Madry is the Cadence Design Systems Professor of Computing in the MIT EECS Department and a member of CSAIL. He received his Ph.D. from MIT in 2011. He is the Director of the MIT Center for Deployable Machine Learning and a Faculty Co-Lead of the MIT AI Policy Forum. Prior to joining the MIT faculty, he spent a year as a postdoctoral researcher at Microsoft Research New England.