Back to the Beginnings of AI at Work

What a Landmark AI Study Tells Us About When to Trust, and When Not to Trust, AI

In September 2023, a working paper out of Harvard Business School landed at an unusually consequential moment. Generative AI had been publicly available for less than a year, organizations were scrambling to understand its implications, and almost no rigorous field evidence existed on how it actually affected professional performance. “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality” offered exactly that. Now, in March 2026, that research has been formally published in the peer-reviewed journal Organization Science. To mark this milestone, we’re revisiting the study and its findings. The questions it set out to answer (what AI actually does to knowledge worker performance, where it helps, where it hurts, and why) were foundational then. They remain foundational now.

Key Insight: An Experiment Built for the Real World

“[T]hese tasks were ‘very much in line with part of the daily activities of the consultants’ involved.” [1]

To test the impact of generative AI on high-end knowledge work, the researchers collaborated with Boston Consulting Group (BCG) on a randomized controlled trial involving 758 consultants. After establishing an individual performance baseline, participants were randomly assigned to one of three conditions: no AI access, GPT-4 access, or GPT-4 access paired with a brief prompt engineering overview. The core of the design involved testing how these professionals navigated realistic tasks simulating real-world workflows. The researchers created two kinds of consulting-style assignments. One centered on product innovation and go-to-market work, including ideation, analysis, writing, and persuasion. The other was a difficult brand strategy case that required participants to reconcile spreadsheet data with subtle clues embedded in interview notes. This design let the researchers ask not just whether AI boosts productivity in general, but whether the answer depends on the nature of the task itself.

Key Insight: AI’s Capabilities Don’t Follow a Smooth Line

“[W]ithin the same knowledge workflow, some tasks are beyond the frontier, whereas others remain within it, making effective AI use challenging.” [2]

The paper introduces its signature jagged technological frontier concept to describe the uneven capabilities of generative AI. Tasks that appear similarly difficult to humans can fall on opposite sides of this boundary. When a task falls inside the frontier, AI is capable of generating accurate, high-quality outputs that support human work. Outside the frontier, AI fails outright or hallucinates, producing believable but incorrect output. In such tasks, performance still depends on human judgment, guidance, or synthesis that the AI cannot reliably provide on its own. The danger is that professionals have no obvious signal telling them which side of the line a task is on.

Key Insight: AI as a Booster and Disruptor

“[E]xperienced and incentivized knowledge professionals, engaged in tasks akin to some of their daily responsibilities, performed worse when given access to AI.” [3]

For tasks inside the frontier (the product innovation and go-to-market exercise), AI access produced striking improvements. Consultants using GPT-4 completed 12.2% more subtasks, worked roughly 25% faster, and delivered work that human graders rated about 32% higher in quality. But for the task outside the frontier (the brand strategy case), the results flipped sharply. The control group (no AI) answered correctly 84.5% of the time. Among consultants using AI, accuracy fell to 70.6% for those with GPT-4 access and 60% for those who also received the prompt-engineering overview. The AI correctly processed the surface-level numerical data but missed a critical insight buried in the interview materials. Consultants who used AI tended to trust its analysis and follow it to the wrong conclusion. Over-reliance on AI output, not ignorance of the task, was the mechanism of failure.

Key Insight: The Biggest Gains Go to the Lower Half

“[T]he most significant beneficiaries of using AI are the bottom-half-skill subjects.” [4]

The distribution of gains was not uniform. When the research team segmented consultants by their baseline assessment performance, they found that the largest beneficiaries of AI assistance were those in the lower half of the skill distribution. Bottom-half performers improved by 31% on the experimental task; top-half performers improved by 11%. This pattern suggests that AI can function as a meaningful equalizer within professional environments, lifting those furthest from peak performance (while still delivering meaningful gains to those at the top).

Why This Matters

For executives and leaders, this paper remains foundational because it frames AI adoption as a problem of decisions, strategy, and execution. The lesson is not that AI is universally good or dangerously flawed; it’s that leaders have to understand where, in a workflow, AI strengthens performance and where it creates deceptive failures. This means training people to exercise judgment rather than outsource it, and recognizing that polished output is not the same as sound reasoning. How will you guide your team along the jagged frontier?

Bonus

The need for human oversight, the risk of overtrusting polished outputs, and the challenge of separating assessment from interpretation are tensions that run through many of the most important conversations in AI today. Seen through these lenses, the paper is not only about consulting work, but about a broader shift in how decisions get made when AI becomes part of the process. For another look at those themes in the context of screening and evaluating ideas, check out “The Future of Decision-Making: How Generative AI Transforms Innovation Evaluation.”

References

[1] Dell’Acqua, Fabrizio, et al., “Navigating the Jagged Technological Frontier: Field Experimental Evidence of the Effects of Artificial Intelligence on Knowledge Worker Productivity and Quality,” Organization Science 37(2): 403-423, 405. https://doi.org/10.1287/orsc.2025.21838

[2] Dell’Acqua et al., “Navigating the Jagged Technological Frontier,” 404.

[3] Dell’Acqua et al., “Navigating the Jagged Technological Frontier,” 419.

[4] Dell’Acqua et al., “Navigating the Jagged Technological Frontier,” 410.

Meet the Authors

Fabrizio Dell’Acqua is a postdoctoral researcher at Harvard Business School. His research explores how human/AI collaboration reshapes knowledge work: the impact of AI on knowledge workers, its effects on team dynamics and performance, and its broader organizational implications.

Edward McFowland III is an Assistant Professor in the Technology and Operations Management Unit at Harvard Business School and Principal Investigator at the Digital Data Design Institute (D^3) Data Science and AI Operations Lab hosted within the Laboratory for Innovation Science.

Ethan Mollick is an Associate Professor at the Wharton School of the University of Pennsylvania, where he studies and teaches innovation and entrepreneurship, and examines the effects of artificial intelligence on work and education. Ethan is the Co-Director of the Generative AI Lab at Wharton, which builds prototypes and conducts research to discover how AI can help humans thrive while mitigating risks.

Hila Lifshitz is a Professor of Management at Warwick Business School (WBS) and visiting faculty at Harvard University, at the Laboratory for Innovation Science at Harvard (LISH). She heads the Artificial Intelligence Innovation Network at WBS.

Katherine C. Kellogg is the David J. McGrath Jr Professor of Management and Innovation and a Professor of Business Administration at the MIT Sloan School of Management. Her research focuses on helping knowledge workers and organizations develop and implement predictive and generative AI products, on the ground in everyday work, to improve decision making, collaboration, and learning.

Saran Rajendran is Director of Strategy and Execution at Palo Alto Networks.

Lisa Krayer is Principal at Boston Consulting Group (BCG).

François Candelon is Partner, Value Creation & Portfolio Monitoring, at Seven2.

Karim R. Lakhani is the Dorothy & Michael Hintze Professor of Business Administration at Harvard Business School. He specializes in technology management, innovation, digital transformation, and artificial intelligence. He is also the Co-Founder and Faculty Chair of D^3 and the Founder and Co-Director of the Laboratory for Innovation Science at Harvard (LISH).

Watch a video version of the Insight Article here.
