Research shows that giving AI to everyone doesn’t help everyone
Lately, you may have noticed that some colleagues seem to use AI for everything. They ask it to sharpen emails, produce meeting notes, brainstorm strategy, and pressure-test campaigns. Watching them, it’s tempting to think the advantage lies in sheer volume: maybe the people getting the most from AI are simply the ones using it most often. But in the back of your mind, you might be thinking about the human decisions that happen in the space between an AI’s output and work in the real world. If AI gives us five ideas and we pick the wrong one, are we better off than if we had no AI at all?

This dynamic sits at the heart of research from early in the generative AI era. As the Digital Data Design Institute at Harvard (D^3) becomes the Harvard Business School AI Institute, we’re revisiting some important scholarship from the institute’s first three years. In “The Uneven Impact of Generative AI on Entrepreneurial Performance: Evidence from a Field Experiment in Kenya,” a research team including HBS AI Institute principal investigator Rembrand Koning followed over 300 entrepreneurs in Kenya as they integrated AI into their daily operations. They discovered that the impact of AI is far from uniform.
Key Insight: Putting AI Advice to a Real-World Test
“Entrepreneurship offers an especially promising context for this broader evaluation.” [1]
The authors note that much of the early evidence on generative AI has come from tightly defined, text-based tasks like drafting copy, summarizing documents, and generating memos. Running a business, however, demands far more, and the researchers wanted to know whether AI could help with the full, complex burden of decision-making that entrepreneurs navigate daily. To find out, they ran a randomized controlled trial across Kenya in which half of 640 entrepreneurs were randomly assigned access to a custom AI assistant built on GPT-4 and delivered through WhatsApp, a platform already woven into daily life there. The assistant encouraged deep engagement by providing three to five enumerated recommendations for every query and allowing users to request granular details on any specific point. The control group received standard business training guides from the International Labour Organization for the same twelve-week period. The authors tracked business performance through repeated surveys of weekly and monthly profits and revenues, then combined those measures into a standardized performance index.
Key Insight: The Average Hides the Story
“[W]e find that low and high performers selected and then implemented different aspects of the nearly two dozen suggestions that the average entrepreneur received from our AI assistant.” [2]
At first glance, the headline result looks unremarkable: on average, access to the AI assistant had no statistically significant effect on firm performance. Averages can be misleading, however, and when the researchers looked closer at the data, they found that the AI’s impact depended entirely on how well a business was performing before the experiment began. For entrepreneurs who were high performers at baseline, the AI was a significant boost, potentially increasing performance by over 15%. For those who were lower performers at the start, the AI assistant actually made things worse, leading to a performance decrease of nearly 10%.

What caused the split? Not, apparently, differences in basic usage. Low and high performers engaged with the tool at similar rates, asked similar types of questions, and received similar types of answers. Both groups also appeared to act on the AI’s suggestions. The difference emerged in selection and implementation. Lower-performing entrepreneurs were more likely to act on generic recommendations such as cutting prices, offering discounts, or investing in advertising. Those moves can sound sensible in the abstract, but they can also erode margins or add costs without solving underlying problems. Higher-performing entrepreneurs, by contrast, were more likely to translate the AI’s suggestions into tailored business changes. One entrepreneur used the AI to think through backup power during blackouts; another took the AI’s advice and added gaming accessories to a cybercafe. Treated high performers were also more likely to use non-generic, business-specific language in describing the changes they made and to explicitly mention having learned something from the AI.
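The arithmetic behind a "null average" is easy to see. Below is a minimal sketch of how two opposite subgroup effects can cancel into a near-zero overall effect. The group sizes and effect values are hypothetical, chosen only to loosely echo the article’s +15% and -10% figures; they are not the study’s actual data.

```python
# Illustrative only: how opposite subgroup effects can cancel into a
# near-zero average treatment effect. Numbers are hypothetical.

def average_effect(groups):
    """Weighted mean of per-group treatment effects.

    groups: list of (n, effect) tuples, where n is group size and
    effect is the change in the performance index for that group.
    """
    total_n = sum(n for n, _ in groups)
    return sum(n * effect for n, effect in groups) / total_n

# (n_entrepreneurs, effect on performance index) -- hypothetical split
high_performers = (160, +0.15)   # AI lifts firms with strong baselines
low_performers = (160, -0.10)    # AI hurts firms with weak baselines

avg = average_effect([high_performers, low_performers])
print(f"Average effect: {avg:+.3f}")  # small number masking both effects
```

The point of the sketch: a near-zero weighted average is fully consistent with the AI helping one group substantially and hurting another, which is why the researchers’ subgroup analysis, not the headline average, carries the story.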
Key Insight: Judgment Decides Value
“AI’s impact depends critically on user judgment and selection capabilities when the advice is open-ended rather than constrained.” [3]
The paper’s conclusion is not that AI is overrated; it’s that AI is uneven. Generative AI can clearly influence real-world performance, but not in a uniform way, and not necessarily in a democratizing pattern. In open-ended settings, the technology generates a mix of useful and less useful suggestions, which means results depend on the human capability to choose and execute well. The authors note that this challenge may be especially important in emerging markets, where training data is often less representative and where firms often have fewer resources to absorb mistakes.
Why This Matters
For business leaders and executives, this research is an important reminder that using AI is not enough. If AI can produce both strong and weak recommendations from the same prompt and conversation, then advantage will come from knowing how to distinguish between good and bad advice, how to test it, and how to translate good advice into action. This has concrete implications for how companies design AI-assisted workflows, how they train people to work with AI tools, and how they measure whether AI is delivering real results or just superfluous activity.
Bonus
This research shows that it’s important to focus on how people work with AI. To take the next step and look at three different modes of AI collaboration, check out The Three Ways Professionals Work with AI – Which One Are You?
References
[1] Otis, Nicholas et al., “The Uneven Impact of Generative AI on Entrepreneurial Performance: Evidence from a Field Experiment in Kenya,” working paper (February 27, 2024): 7. https://dx.doi.org/10.2139/ssrn.4671369
[2] Otis et al., “The Uneven Impact of Generative AI on Entrepreneurial Performance,” 4.
[3] Otis et al., “The Uneven Impact of Generative AI on Entrepreneurial Performance,” 33.
Meet the Authors

Nicholas G. Otis is a PhD candidate at the Berkeley Haas School of Business.

Rowan Clarke is a PhD candidate at Harvard Business School.

Solene Delecourt is an assistant professor in the Management of Organizations Group at the Berkeley Haas School of Business.

David Holtz is an assistant professor in the Decisions, Risk, and Operations (DRO) Division at Columbia Business School.

Rembrand Koning is Mary V. and Mark A. Stevens Associate Professor of Business Administration at Harvard Business School, and the co-director and co-founder of the Tech for All lab at the HBS AI Institute.