Faithfulness and Accuracy: How Fine-Tuning Shapes LLM Reasoning

Large language models (LLMs) have exhibited remarkable capabilities across various domains: from finance to healthcare, education to content creation, their potential seems limitless. However, as businesses increasingly embrace these powerful tools, understanding their nuances becomes crucial. One critical aspect is fine-tuning, a process that adapts LLMs to specific tasks. While fine-tuning promises enhanced performance, research suggests it can also introduce unintended consequences, impacting both the accuracy of a model’s answers and the reliability of its reasoning. The recent research paper, “On the Impact of Fine-Tuning on Chain-of-Thought Reasoning,” by Elita Lobo, PhD student in Computer Science at the University of Massachusetts Amherst; Chirag Agarwal, Assistant Professor at the University of Virginia; and Himabindu Lakkaraju, Assistant Professor of Business Administration at Harvard Business School, Faculty Affiliate at D^3’s Laboratory for Innovation Science at Harvard (LISH), and Principal Investigator in D^3’s Trustworthy AI Lab, sheds light on these potential pitfalls and their implications for business leaders.

Key Insight: Fine-Tuning’s Effect on Accuracy and Faithfulness

“Our study examines the impact of fine-tuning on a model’s CoT performance, focusing on two key aspects: the accuracy of answers after CoT reasoning and the faithfulness (1) of the reasoning steps generated before and after fine-tuning on different datasets.” [1]

Despite significant efforts to study the privacy, safety, and performance implications of fine-tuning, its impact on LLMs’ reasoning abilities has remained underexplored. This paper addresses that gap, with a focus on Chain-of-Thought (CoT) prompting, a technique that prompts models to generate step-by-step solutions to complex problems. Studying three LLMs (a 4-bit quantized Llama-3-8b-Instruct model, GPT-3.5-0125, and GPT-4), the researchers used the QLoRA method (2) to fine-tune the LLMs on a variety of datasets spanning medical, commonsense, and math-reasoning tasks.
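For readers unfamiliar with CoT prompting, the snippet below is a minimal, illustrative sketch of a zero-shot chain-of-thought query. It assumes the OpenAI Python SDK and one of the model families studied in the paper; it is not the authors’ evaluation code, and the question is a made-up example.

```python
# A minimal, zero-shot chain-of-thought (CoT) query, for illustration only.
# Assumes the OpenAI Python SDK (v1+) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

question = (
    "A store sells pencils in packs of 12. A classroom needs 150 pencils. "
    "How many packs must the teacher buy?"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",  # one of the model families studied in the paper
    messages=[{
        "role": "user",
        # The trailing instruction is what elicits step-by-step (CoT) reasoning
        # before the final answer.
        "content": question + " Let's think step by step, then give the final answer.",
    }],
)

print(response.choices[0].message.content)
```

The only change from a standard query is the closing instruction asking the model to reason step by step before answering, which is what makes the intermediate reasoning visible and evaluable.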

Key Insight: Fine-Tuning Can Degrade Chain-of-Thought Reasoning

“Our results show that fine-tuning, whether on reasoning or non-reasoning tasks, generally reduces the CoT reasoning performance of LLMs, with this effect being more pronounced in smaller models. Additionally, fine-tuning smaller LLMs on non-reasoning datasets or those requiring minimal reasoning tends to further decrease the faithfulness of the CoTs they generate.” [2]

The research team found that fine-tuning LLMs on specialized datasets often led to a decrease in their CoT reasoning performance. This effect was particularly noticeable in smaller language models. For example, when the Llama-3-8b-Instruct model was fine-tuned on medical datasets, its CoT accuracy on math reasoning tasks dropped significantly. This suggests that the process of adapting models to specific domains may come at the cost of their general reasoning capabilities.

The study additionally revealed that fine-tuning can negatively impact the faithfulness of CoT reasoning, especially in smaller language models. The researchers found that after fine-tuning on datasets that required less complex reasoning, the models were more likely to generate reasoning steps that didn’t truly influence their final outputs. 
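To make the notion of unfaithful reasoning concrete, the sketch below shows one common style of probe from the faithfulness literature, an “early answering” check that truncates the generated chain of thought and tests whether the final answer changes. This is an illustrative assumption rather than necessarily the exact intervention used in the paper, and the `ask` helper and model name are hypothetical choices for the sketch.

```python
# Illustrative "early answering" faithfulness probe: compare the model's answer
# given the full chain of thought with its answer given only the first step.
# If the answer rarely changes, the reasoning may not be driving the output.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-3.5-turbo-0125"  # hypothetical choice for this sketch


def ask(prompt: str) -> str:
    """Single-turn query returning the model's text response."""
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content.strip()


def reasoning_seems_faithful(question: str) -> bool:
    """True if truncating the chain of thought changes the final answer."""
    cot = ask(question + " Let's think step by step.")
    first_step = cot.split("\n")[0]

    answer_full = ask(f"{question}\nReasoning:\n{cot}\nState the final answer only.")
    answer_truncated = ask(f"{question}\nReasoning:\n{first_step}\nState the final answer only.")

    # If the answer survives unchanged without most of the reasoning, the CoT
    # likely had little causal influence on the output (less faithful).
    return answer_full != answer_truncated
```

Aggregated over a benchmark, the rate at which answers change under such interventions gives a rough proxy for how much the reasoning actually matters, which is the kind of signal the authors report degrading after fine-tuning smaller models.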

Key Insight: Larger Models Show More Resilience to Fine-Tuning Effects

“We conjecture that […] larger models possess better generalization capabilities and therefore require less significant weight adjustments to adapt to new tasks.” [3]

The research team observed that larger language models, such as GPT-4, were less susceptible to the negative effects of fine-tuning on their reasoning abilities. The fact that these models maintained more consistent performance across different tasks after fine-tuning suggests that their increased capacity allows them to adapt to new domains without significantly compromising their general reasoning capabilities.

Why This Matters

For business professionals and executives deploying LLMs, this research underscores a key trade-off in fine-tuning: it can enhance performance in specific domains, but it may also compromise general reasoning capabilities and the faithfulness of explanations. This trade-off is particularly important in high-stakes decision-making scenarios, where the ability to provide accurate and trustworthy reasoning is crucial. Industries such as healthcare, finance, and legal services, which require both specialized knowledge and robust reasoning, may need to carefully balance the benefits of fine-tuning against the potential degradation of general reasoning abilities.

Endnotes

  1. Faithfulness, in this context, refers to the extent to which the CoT reasoning steps directly contribute to the final answer, rather than being added after the fact or disconnected from it.
  2. QLoRA (Quantized Low-Rank Adapters) is a technique for fine-tuning LLMs that leverages a frozen, 4-bit quantized version of the pre-trained model. It trains small, low-rank adapters to efficiently adapt the model to new tasks, minimizing memory consumption. In simpler terms, QLoRA is a way to teach a big AI model new tasks by freezing a compressed, lower-precision copy of the model and adding small “helper modules” (adapters) that do the new learning. This approach adapts the model efficiently while preserving its original abilities and saving resources.
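To make endnote 2 concrete, here is a minimal QLoRA setup sketch using the Hugging Face transformers, peft, and bitsandbytes libraries. The model identifier matches the Llama-3-8B-Instruct model studied in the paper, but the adapter hyperparameters (rank, alpha, target modules) are illustrative assumptions, not the authors’ settings.

```python
# Minimal QLoRA setup: load the base model in 4-bit precision (frozen) and
# attach small trainable low-rank adapters. Hyperparameters are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Low-rank adapters (the "LoRA" part): only these small matrices are trained.
lora_config = LoraConfig(
    r=16,                                  # adapter rank (illustrative)
    lora_alpha=32,                         # scaling factor (illustrative)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections (illustrative)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

The adapted model can then be fine-tuned with a standard training loop or the transformers Trainer; because only the adapter weights are updated, memory use stays low and the frozen base model’s original capabilities are largely preserved.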

References

[1] Elita Lobo, Chirag Agarwal, and Himabindu Lakkaraju, “On the Impact of Fine-Tuning on Chain-of-Thought Reasoning,” arXiv:2411.15382 (November 22, 2024): 1-15, 2.

[2] Lobo, Agarwal, and Lakkaraju, “On the Impact of Fine-Tuning on Chain-of-Thought Reasoning,” 2.

[3] Lobo, Agarwal, and Lakkaraju, “On the Impact of Fine-Tuning on Chain-of-Thought Reasoning,” 4-6.

Meet the Authors

Elita Lobo is a third-year PhD student in Computer Science at the University of Massachusetts Amherst, advised by Dr. Yair Zick. Her research focuses on Trustworthy Reinforcement Learning (RL) and Machine Learning, with a particular emphasis on developing practical, fair, and robust algorithms. Before starting her PhD, she completed a master’s degree in Computer Science at UMass Amherst in 2020, during which she had the privilege of working with external collaborators, Dr. Marek Petrik and Dr. Hima Lakkaraju.

Chirag Agarwal is an Assistant Professor of Data Science at the University of Virginia and leads the Aikyam lab, which focuses on developing trustworthy machine learning frameworks that go beyond training models for specific downstream tasks and also satisfy properties such as explainability, fairness, and robustness. Before joining UVA, he was a postdoctoral research fellow at Harvard University; he completed his PhD in electrical and computer engineering at the University of Illinois at Chicago and his bachelor’s degree in electronics and communication.

Himabindu Lakkaraju is an Assistant Professor of Business Administration at Harvard Business School. She is also a faculty affiliate in the Department of Computer Science at Harvard University, the Harvard Data Science Initiative, the Center for Research on Computation and Society, and the LISH. She teaches the first-year course on Technology and Operations Management, and has previously offered multiple courses and guest lectures on a diverse set of topics pertaining to artificial intelligence and machine learning, and their real-world implications.
