In a rapidly evolving business landscape, decision-makers must be agile. In a recent paper, Biyonka Liang, PhD candidate in Statistics at Harvard University, and Iavor Bojinov, Assistant Professor of Business Administration at HBS and PI at D^3’s Data Science and AI Operations Lab, introduce the Mixture Adaptive Design (MAD), which gives industry researchers greater control and flexibility in their experiments. Their study, “An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits,” outlines how the MAD lets businesses optimize experimentation without compromising results.
Key Insight: Experiment Faster Without Compromising Statistical Validity
Traditional A/B testing, while accurate, uniformly assigns 50% of users to each treatment regardless of how well or poorly the treatments are performing, which delays decision-making. Multi-Armed Bandit (MAB) algorithms,¹ on the other hand, prioritize fast identification of successful strategies but sacrifice statistical depth. The MAD offers a hybrid approach that combines any chosen MAB algorithm with a Bernoulli design,² pairing the quick adaptability of MABs with the rigorous inference of A/B testing. This balance allows businesses to experiment faster without losing confidence in the validity of their results.
By adjusting the weight between exploration (testing different options) and exploitation (favoring the current best option), the MAD gives managers control over the experiment’s speed and precision. This is particularly useful in environments where quick decisions are critical, but businesses cannot afford to compromise on accuracy.
Key Insight: Minimizing Business Risk Through Early Stopping
A standout feature of the MAD is the flexibility it offers in experiment design. With traditional designs, managers must commit to a fixed sample size or duration up front and have no valid way to monitor interim results and shut down a harmful experiment early. The MAD changes this by allowing managers to monitor results continuously and adjust the balance between exploration and exploitation.
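To make the monitoring loop concrete, here is a minimal Python sketch. The `experiment` and `cs_interval` hooks are hypothetical stand-ins, not an API from the paper: `experiment.step()` is assumed to run one round of adaptive assignment and return the data collected so far, and `cs_interval(data)` is assumed to return the paper’s anytime-valid confidence interval for the treatment effect.

```python
def monitor(experiment, cs_interval, max_rounds: int):
    """Continuously monitor an experiment with an anytime-valid stopping rule.

    Because the interval returned by cs_interval is valid at every
    timestep (not just at a pre-committed end date), stopping the moment
    it excludes zero does not inflate the false-positive rate.
    """
    for t in range(1, max_rounds + 1):
        data = experiment.step()      # hypothetical: one round of assignment
        lo, hi = cs_interval(data)    # hypothetical: current anytime-valid interval
        if hi < 0:
            return "stopped early: treatment looks harmful", t
        if lo > 0:
            return "stopped early: treatment looks beneficial", t
    return "no conclusive effect by the horizon", max_rounds
```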
For businesses like e-commerce platforms or consumer-facing services, these capabilities are essential. The MAD can help quickly phase out underperforming variations and focus on the successful ones, protecting both the customer experience and the company’s bottom line.
Key Insight: How the MAD Achieves Anytime-Valid Inference and Controls Regret Relative to Standard Bandit Designs
To get into the nitty-gritty: at each timestep, the MAD assigns treatments according to either the Bernoulli design (with probability δt³) or the bandit algorithm (with probability 1 − δt). The deterministic sequence δt controls the balance between these two components. By ensuring that δt converges to zero more slowly than 1/t^(1/4), the MAD retains inferential power and validity similar to the Bernoulli design while still leveraging the adaptive learning capabilities of the bandit algorithm.
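As a concrete illustration, here is a minimal Python sketch of this assignment rule (not the authors’ implementation; the function names and the exponent 0.24 are illustrative choices, and uniform randomization over the arms stands in for the multi-arm analogue of the Bernoulli design):

```python
import numpy as np

rng = np.random.default_rng(0)

def delta(t: int, alpha: float = 0.24) -> float:
    """Mixing rate δt = 1 / t**alpha.

    Any exponent alpha < 1/4 makes δt converge to zero more slowly
    than 1 / t**(1/4), as the design requires.
    """
    return 1.0 / t**alpha

def mad_assign(t: int, bandit_probs: np.ndarray) -> int:
    """One MAD treatment assignment at timestep t.

    With probability δt, draw an arm uniformly at random (the
    design-based component); with probability 1 - δt, draw from the
    bandit algorithm's current assignment probabilities.
    """
    k = len(bandit_probs)
    if rng.random() < delta(t):
        return int(rng.integers(k))             # Bernoulli-design draw
    return int(rng.choice(k, p=bandit_probs))   # bandit-algorithm draw
```

Choosing alpha closer to 1/4 hands control to the bandit faster, favoring speed; choosing it closer to 0 keeps more design randomization and thus more statistical precision.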
Meanwhile, the MAD’s regret⁴ is a weighted sum of the regret from the underlying bandit algorithm and the regret from the Bernoulli design, with weights determined by δt. By carefully selecting δt, one can explicitly control the trade-off between regret minimization and statistical power. At each time point, the regret difference between the MAD and the underlying bandit algorithm decreases towards zero at a rate proportional to δt.
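Schematically, the weighted-sum structure described above can be written per period as follows (a sketch, not the paper’s formal statement, which accounts for the full assignment history):

```latex
r_t^{\mathrm{MAD}} = \delta_t\, r_t^{\mathrm{Bern}} + (1-\delta_t)\, r_t^{\mathrm{bandit}}
\quad\Longrightarrow\quad
r_t^{\mathrm{MAD}} - r_t^{\mathrm{bandit}} = \delta_t \bigl(r_t^{\mathrm{Bern}} - r_t^{\mathrm{bandit}}\bigr) = O(\delta_t),
```

so as δt shrinks, the per-period price paid for the extra design randomization vanishes at exactly the rate δt.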
Why This Matters
For modern business leaders, speed and accuracy are essential in decision-making. Whether it’s optimizing digital marketing strategies, enhancing user experience, or improving product features, the MAD provides a powerful tool for executives looking to make data-driven decisions more efficiently. In industries where rapid innovation is key, the MAD can help firms maintain their competitive edge by enabling smarter, more flexible experimentation.
Footnotes
1. A multi-armed bandit (MAB) algorithm is a type of reinforcement learning algorithm used for sequential decision-making when there are multiple options (or “arms”) to choose from and the goal is to maximize rewards over time. The name derives from the analogy of a gambler facing a row of slot machines (“one-armed bandits”), each with an unknown payout probability. The gambler’s objective is to figure out which machine offers the best rewards and exploit it as much as possible. (A minimal code sketch of one such algorithm appears after these footnotes.)
2. Bernoulli Design is a fundamental experimental design where each unit is independently assigned to either treatment or control with a fixed probability, typically 0.5. In other words, it’s like flipping a coin to determine the treatment assignment for each participant.
3. The sequence δt is a crucial element of the Mixture Adaptive Design (MAD). It is a deterministic, user-specified sequence of values in the half-open interval (0, 1], i.e., each δt is strictly greater than 0 and at most 1.
4. Regret, in the context of multi-armed bandit (MAB) experiments, is the expected difference in outcome between always choosing the best-performing treatment (arm) and the outcome achieved by the algorithm used in the experiment. In simpler terms, regret quantifies the opportunity cost of not consistently choosing the best option. The goal of many MAB algorithms is to minimize regret over time by learning which arm yields the best results. (This definition is written out in standard notation after these footnotes.)
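To make footnote 1 concrete, here is a minimal sketch of one common MAB algorithm, Thompson sampling, in Python (the payout rates 0.3 and 0.5 are invented for illustration; the MAD works with any chosen bandit algorithm, not this one specifically):

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_arm(successes: np.ndarray, failures: np.ndarray) -> int:
    """Pick an arm by Thompson sampling with Beta posteriors.

    Sample a plausible payout rate for each arm from Beta(s + 1, f + 1)
    and pull the arm whose sampled rate is highest.
    """
    return int(np.argmax(rng.beta(successes + 1, failures + 1)))

# Two "slot machines" with unknown payout probabilities 0.3 and 0.5.
true_rates = np.array([0.3, 0.5])
s, f = np.zeros(2), np.zeros(2)
for _ in range(1000):
    arm = thompson_arm(s, f)
    reward = rng.random() < true_rates[arm]
    s[arm] += reward
    f[arm] += 1 - reward
# Over time, pulls concentrate on the better arm (rate 0.5).
```

And footnote 4’s definition of regret, written out in standard notation, with μk the expected outcome of arm k, A_t the arm chosen at time t, and μ* the best arm’s mean:

```latex
R_T = \sum_{t=1}^{T} \bigl(\mu^{*} - \mu_{A_t}\bigr), \qquad \mu^{*} = \max_{k}\, \mu_{k}.
```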
References
[1] Biyonka Liang and Iavor Bojinov, “An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits,” https://arxiv.org/abs/2311.05794 (June 14, 2024): 1-48, 1.
[2] Liang and Bojinov, “An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits,” 2.
[3] Liang and Bojinov, “An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed Bandits,” 15.
Meet the Authors
Biyonka Liang is a PhD candidate in Statistics at Harvard University. Her research, which is supported by the NSF Graduate Research Fellowship, focuses on developing statistical methods and models for challenging practical problems in adaptive experimentation, reinforcement learning, and causal inference.
Iavor Bojinov is an Assistant Professor of Business Administration and the Richard Hodgson Fellow at HBS, a faculty PI at D^3’s Data Science and AI Operations Lab, and a faculty affiliate in the Department of Statistics at Harvard University and the Harvard Data Science Initiative. His research focuses on developing novel statistical methodologies to make business experimentation more rigorous, safer, and more efficient, homing in on the application of experimentation to the operationalization of artificial intelligence (AI): the process by which AI products are developed and integrated into real-world applications.