Duolingo: Language Learning through Deep Learning

Duolingo presents as a cutesy, unassuming layout on your screen. But here’s how one of the largest language learning platforms masterfully utilizes machine learning and data to fuel core business functions.

Duolingo has grown into the largest language learning platform and one of the most downloaded education apps worldwide since its launch in 2011. The app teaches 42 languages to over 500 million users. The Duolingo experience is driven by manageable, interactive chunks that can be done anywhere at any time.

Indeed, Duolingo presents as a cutesy elementary layout on your screen. But the company masterfully utilizes machine learning, data, and AI to fuel core business functions, such as product development, user retention efforts, recruitment priorities, and revenue strategies.

Why is it important for Duolingo to use data to its advantage? Because 24/7 immersion is key to language learning. No matter the use case (whether a user is studying English for a new job or Spanish to learn the language of their roots), Duolingo uses an enormous amount of data to keep its users practicing for as much time as possible.

Duolingo’s data journey began when it hired well-known computer scientist Burr Settles. Settles transformed Duolingo into an AI-powered language learning platform. He focused on comprehensive data models to navigate language-learning with the highest levels of efficiency. He started with the app’s most important feature: the proficiency test.

The proficiency test is the first step a user must take to enter Duolingo’s learning program at an appropriate level. It therefore sets the tone for customer acquisition and retention, and it helps Duolingo generate and refresh more and more data as the user base grows.

Under Settles, the company allocated 90% of AI resources to Duolingo’s proficiency test. While the user is engaged in the 45-minute exam, Duolingo is engaged in intensive data-gathering. Data gathered includes a person’s identity, engagement levels during a lesson, performance on test items and questions, scoring methodologies, and adaptive testing. Duolingo regularly runs A/B tests to find an optimal way to design the test, and all of its courses, testing users on learning efficiency to adapt the experience and smooth the learning curve. These customized machine learning models helped Duolingo deliver the level of personalization it was targeting.

Additionally, data fuels Duolingo’s fun points-based reward system. (For example, users can compete against friends and random people by completing courses, and they receive rewards for streaks.) The research team studied how often one should practice a language and to what degree the game-like curriculum increases practice time. Using data from A/B testing, Duolingo adapts its levels and game-like experience to attract the target user base.
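To make the A/B testing idea concrete, here is a minimal sketch of how two versions of a lesson could be compared on retention. It is an illustration only, not Duolingo’s actual method: the function name, the choice of a two-proportion z-test, and all of the numbers are assumptions made for this example.

```python
import math

def two_proportion_z_test(retained_a, users_a, retained_b, users_b):
    """Compare retention rates of variant A (control) and variant B.

    Returns (z, p_value) for a two-sided two-proportion z-test.
    All inputs are hypothetical counts, not real Duolingo data.
    """
    rate_a = retained_a / users_a
    rate_b = retained_b / users_b
    # Pooled retention rate under the null hypothesis of no difference.
    pooled = (retained_a + retained_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Made-up example: 10,000 users per variant.
z, p = two_proportion_z_test(4000, 10000, 4200, 10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest that the difference in retention between the two variants is unlikely to be due to chance alone, which is the kind of evidence a product team might use before rolling out a change.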

How does such an unassuming app utilize sophisticated AI and data? Duolingo uses deep learning, a technique loosely modeled on how the brain works, to evaluate data quickly and predict user behavior. Through deep learning, the company can customize algorithms for everything from non-native speech recognition to classification for automated scoring to predicting which questions a user will get correct. The company uses the PyTorch deep learning framework on Amazon Web Services (AWS). Using that framework, Duolingo makes upwards of 300 million predictions a day, each one informed by 100,000 to 30 million data points at any given time.
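As a rough illustration of what “predicting which questions a user will get correct” can look like, the sketch below trains a tiny logistic regression in plain Python. This is a deliberately simplified stand-in, not Duolingo’s PyTorch models: the two features (days since last practice and past accuracy), the training data, and all function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_logistic(examples, labels, lr=0.05, epochs=1000):
    """Fit a logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            pred = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = pred - label  # gradient of the log loss w.r.t. the logit
            bias -= lr * error
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights, bias

def predict_correct(weights, bias, features):
    """Estimated probability that the user answers the question correctly."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

# Hypothetical history: [days since last practice, past accuracy] -> answered correctly?
examples = [[0.5, 0.9], [1.0, 0.8], [2.0, 0.7], [5.0, 0.5], [6.0, 0.4], [7.0, 0.3]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_logistic(examples, labels)
print(predict_correct(weights, bias, [1.0, 0.9]))  # recently practiced, strong history
print(predict_correct(weights, bias, [7.0, 0.2]))  # long gap, weak history
```

A production system would use far more signals and a deep network rather than two hand-picked features, but the basic loop is the same: learn from past answers, then score how likely the next answer is to be correct.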

Thus, Duolingo’s aggressive push to be data-driven allows the company to capture enormous value and transform the industry. Gone are the days of Rosetta Stone or conversation partners, now that a proven interactive application is free and at your fingertips. The result? Today, over 300 universities worldwide use Duolingo as a language proficiency test for students, including New York University, UCLA and Duke. The company is currently valued at $2.4 billion.

Only 4% of the company’s customers pay, yet they bring in nearly 85% of revenue. The rest is from advertisements. The founder of the company is considering an initial public offering as soon as this year. Duolingo continues to incorporate AI-powered features to increase paid subscribers.

Moving forward, I anticipate that the company will need to think more broadly about its user engagement strategy. While much of the appeal of Duolingo is in the gamification of the experience, I imagine this strategy can be modified for an older, more sophisticated user. Perhaps data can be used to modify the platform’s graphics depending on the user.

Also, Covid-19 increased usage by 101% in March 2020; the impact of stay-at-home learning was immediate. I anticipate that when students return to school full time, some users will transition out of the app. Duolingo will have to strategize how to keep them engaged when language learning returns to the physical classroom.



“Duolingo.” Research, research.duolingo.com/.

Farrell, Maureen. “WSJ News Exclusive | Duolingo Valued at $2.4 Billion in Fundraising Round.” The Wall Street Journal, Dow Jones & Company, 18 Nov. 2020, www.wsj.com/articles/duolingo-valued-at-2-4-billion-in-fundraising-round-11605700806.

Gagliordi, Natalie. “How Duolingo Uses AI to Disrupt the Language Learning Market.” ZDNet, ZDNet, 13 Nov. 2018, www.zdnet.com/article/how-duolingo-uses-ai-to-disrupt-the-language-learning-market/.

Peranandam, Cynthya. “AI Helps Duolingo Personalize Language Learning.” Wired, Conde Nast, 6 Dec. 2018, www.wired.com/brandlab/2018/12/ai-helps-duolingo-personalize-language-learning/.




Student comments on Duolingo: Language Learning through Deep Learning

  1. Hi — great post! As an avid Duolingo user myself, it was really interesting to read how the company is leveraging ML and using A/B testing to create optimal experiences for its users.

    Duolingo has faced controversy in the past over its “gamification” approach — as they’re engaging in significant amounts of A/B testing to leverage some of the addictive tactics that normal mobile games utilize.

    I wonder what your thoughts are on this — given that Duolingo is a language learning app first, how should they draw the line between “gamification” of education and full out gaming?

  2. Hey Cristina, amazing post! For the brief period that I used Duolingo, it was actually very strange how they could predict my schedule and the level at which I progressed, making me repeat parts I was struggling with and move faster past phrases I understood quickly. It makes sense that they use pattern-recognizing deep learning algorithms.

    Also, considering Google’s stake in the company, do you think that they might be using speech data from Duolingo for better training their AI to understand languages in various foreign accents?
