Duolingo: Machine Learning Our Forgetfulness

Duolingo is using machine learning to build a fun, personalized, and effective language learning experience.

Luis von Ahn and Severin Hacker launched Duolingo in 2012 with the goal of revolutionizing language learning. In a 2014 interview with The Guardian, von Ahn explained his motivation:

“What I wanted to do was create a way to learn languages for free. If you look at language learning in the world, there are 1.2 billion people learning a foreign language and two thirds of those people are learning English so they can get a better job and earn more. The problem is that they don’t have equity and most language courses cost a lot of money”[1].

While there were language learning apps on the market before Duolingo, they tended to struggle with retaining users long-term. Von Ahn estimated that Busuu, a leading online language learning service that included a free option, had only 5% of users sticking with it long-term [2]. Busuu, Babbel, and other competitors were good ways to get started, but for most users they didn’t fully replace more expensive in-person language learning classes.

Duolingo’s solution was to create a personalized, fun, and effective experience using machine learning (ML) to understand and challenge its users. Their aim is readily apparent in their playful, game-like app design [3], but under the hood ML may be the true innovation. For example, their “half-life regression” (HLR) algorithm uses data from over 300 million users worldwide to build and maintain a personalized model that predicts how likely you are to remember a word at any given time [4]. Drawing on Ebbinghaus’s 1885 finding that the chance of remembering decays exponentially with time [5], the team designed HLR to determine the best time to show a user a word again. Duolingo rolled out HLR to all users after preliminary A/B tests showed that it increased overall user activity by 12% [6].

Standard “forgetting curve”: the probability of remembering goes down as a function of “lag time” Δ (days since the last practice) and “half-life” h.
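The caption’s curve can be made concrete. If the half-life h is defined as the lag at which the probability of recall falls to one half, exponential decay gives p = 2^(-Δ/h). This is also the form used in the published HLR model (Settles & Meeder, 2016). The sketch below evaluates that curve. It also illustrates one possible way to pick “the best time to show a user a word again”: solve for the lag at which predicted recall drops to a chosen threshold. The 0.5 threshold and the function names are hypothetical, and Duolingo’s actual scheduling rule is not described in this article.

```python
# Sketch of the exponential forgetting curve p = 2 ** (-delta / h).
# The review threshold is a hypothetical choice, not Duolingo's rule.
import math


def recall_probability(delta_days: float, half_life_days: float) -> float:
    """Probability of recalling a word after `delta_days` without practice."""
    if half_life_days <= 0:
        raise ValueError("half-life must be positive")
    return 2.0 ** (-delta_days / half_life_days)


def days_until_review(half_life_days: float, threshold: float = 0.5) -> float:
    """Lag at which predicted recall falls to `threshold`.

    Solves threshold = 2 ** (-delta / h) for delta: delta = -h * log2(threshold).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    return -half_life_days * math.log2(threshold)


if __name__ == "__main__":
    h = 4.0  # hypothetical half-life of 4 days
    for delta in (0, 2, 4, 8):
        print(f"lag {delta} d -> p = {recall_probability(delta, h):.3f}")
    print("review after", days_until_review(h, threshold=0.5), "days")
```

With h = 4 days, predicted recall is 0.5 after 4 days and 0.25 after 8 days. A stricter threshold, such as 0.8, would schedule the review sooner.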

Duolingo uses a personalized forgetting curve for each user and word combination. The curves become less steep each time a user remembers a word and steeper each time a user forgets.
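The personalization described above can be sketched by letting the half-life depend on each user’s history with a word. In the published HLR model, the estimated half-life is h = 2^(θ·x), where x is a feature vector and θ are weights learned by regression. The sketch below assumes a tiny, hypothetical feature set: a bias term plus square-rooted counts of correct and incorrect answers. It also uses made-up weights rather than Duolingo’s trained parameters, so it only shows the direction of the effect: more correct recalls lengthen h and flatten the curve, and more misses shorten h and steepen it.

```python
# Hypothetical sketch of a personalized half-life in the style of HLR:
# h = 2 ** (theta . x). Features and weights are illustrative assumptions,
# not Duolingo's trained model.
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class PracticeHistory:
    """One user's history with one word (hypothetical structure)."""
    right: int = 0  # times the user recalled the word correctly
    wrong: int = 0  # times the user missed the word


# Made-up weights: positive on `right` (longer half-life, flatter curve),
# negative on `wrong` (shorter half-life, steeper curve).
THETA: Dict[str, float] = {"bias": 1.0, "right": 0.8, "wrong": -0.6}


def features(hist: PracticeHistory) -> Dict[str, float]:
    """Feature vector x: a bias term plus square-rooted answer counts."""
    return {
        "bias": 1.0,
        "right": math.sqrt(1 + hist.right),
        "wrong": math.sqrt(1 + hist.wrong),
    }


def half_life(hist: PracticeHistory, theta: Dict[str, float] = THETA) -> float:
    """Estimated half-life in days: h = 2 ** (theta . x)."""
    x = features(hist)
    return 2.0 ** sum(theta[name] * value for name, value in x.items())


def predicted_recall(delta_days: float, hist: PracticeHistory) -> float:
    """Recall probability after `delta_days` on this user's own curve."""
    return 2.0 ** (-delta_days / half_life(hist))


if __name__ == "__main__":
    for label, hist in [
        ("new word", PracticeHistory()),
        ("3 correct", PracticeHistory(right=3)),
        ("3 missed", PracticeHistory(wrong=3)),
    ]:
        h = half_life(hist)
        print(f"{label}: h = {h:.2f} days, p(2 days) = {predicted_recall(2, hist):.2f}")
```

With these illustrative weights, a new word gets h ≈ 2.3 days. Three correct answers raise it to 4 days, and three misses cut it to about 1.5 days. HLR turns this into a trained model by fitting θ to real practice logs.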


Von Ahn had experience leveraging large populations to teach digital systems new information, having previously invented reCAPTCHA, which cleverly allows people on the web to verify they are not robots while helping digitize books at scale. reCAPTCHA was acquired by Google in 2009 and now includes image recognition and other uses beyond text [7].


reCAPTCHA example: the user teaches the computer that the left image is “specific” while verifying another image the computer already knows, “Donanne” [8].
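The caption describes a two-word scheme. One word the computer already knows verifies that the user is human, while the user’s answer for the other word teaches the computer. The sketch below models that idea under several assumptions:

- Answers are compared case-insensitively.
- Only users who pass the known word contribute a vote.
- An unknown word is accepted once a hypothetical number of trusted users agree.

The class name, method names, and threshold are illustrative and do not describe reCAPTCHA’s actual implementation.

```python
# Hypothetical sketch of the two-word reCAPTCHA idea: a known "control" word
# verifies the user, and their answer for an unknown word counts as a vote.
from collections import Counter, defaultdict
from typing import Dict, Optional

AGREEMENT_THRESHOLD = 3  # assumed number of matching votes needed


class TranscriptionPool:
    """Collects votes for unknown words from users who pass the control word."""

    def __init__(self, threshold: int = AGREEMENT_THRESHOLD) -> None:
        self.threshold = threshold
        self.votes: Dict[str, Counter] = defaultdict(Counter)
        self.solved: Dict[str, str] = {}

    def submit(self, control_answer: str, control_truth: str,
               unknown_id: str, unknown_answer: str) -> bool:
        """Return True if the user passes; record their vote only if they do."""
        passed = control_answer.strip().lower() == control_truth.strip().lower()
        if passed and unknown_id not in self.solved:
            guess = unknown_answer.strip().lower()
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= self.threshold:
                self.solved[unknown_id] = guess  # enough users agree
        return passed

    def transcription(self, unknown_id: str) -> Optional[str]:
        """Accepted text for an unknown word, or None if not yet agreed."""
        return self.solved.get(unknown_id)


if __name__ == "__main__":
    pool = TranscriptionPool(threshold=2)
    pool.submit("Donanne", "Donanne", "scan-17", "specific")
    pool.submit("robot", "Donanne", "scan-17", "spacific")  # fails, not counted
    pool.submit("donanne", "Donanne", "scan-17", "Specific")
    print(pool.transcription("scan-17"))  # specific
```

Requiring agreement among several independent users means a single mistyped answer is outvoted rather than written into the digitized book.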


While ML is a near-term differentiator for Duolingo, it also underpins the longer-term strategy of the seven-year-old firm, now valued at $700M [9]. More than any direct competitor, Duolingo’s primary adversary may always be attrition [10]. Beyond in-app techniques like HLR, Duolingo has also begun administering an ML-driven placement test to better challenge and retain more advanced learners [11]. This competency could also let them compete in the standardized testing and language certification markets.

Machine learning allows Duolingo to clearly quantify and steadily improve retention on its platform, but it remains unclear whether their effort is best spent further optimizing these quantifiable metrics. Perhaps the journey of learning a language cannot be reduced to perfectly personalized exercises such as multiple-choice questions and drag-and-drop games. These may be the best way to get people in the door and keep them on the platform for a while, but unless they can recreate the experience of living among native speakers and having dozens of casual conversations each day, can they really revolutionize language learning? To meaningfully help as many people as possible learn new languages, Duolingo may need to consider branching out into video coaching, immersion trip planning, and other efforts that support its overarching mission. Airbnb offers a precedent: it now organizes “Experiences” as part of its mission to “create a world where people can belong through healthy travel that is local, authentic, diverse, inclusive, and sustainable” [12].

Finally, Duolingo must weigh the business opportunity from ML alongside the user opportunity. Beyond powering HLR and other algorithms, Duolingo is amassing a considerable trove of personalized language learning data. How should they use it beyond algorithms such as HLR and placement tests? Should they branch out into other areas of education beyond language? Should they recommend other experiences within and outside of the product based on personal data? More controversially, should they consider selling aggregated or individualized data for potentially lucrative opportunities in advertising or recruiting?

(778 words)


Works Cited

[1] O’Conor, L. (2014). Duolingo creator: ‘I wanted to create a way to learn languages for free’. The Guardian. https://www.theguardian.com/education/2014/aug/27/luis-von-ahn-ceo-duolingo-interview.

[2] Ibid.

[3] Konrad, A. (2014). Language App Duolingo Raises $20M In Race To Teach English. Forbes. https://www.forbes.com/sites/alexkonrad/2014/02/18/language-learning-app-duolingo-raises-20m-in-race-to-teach-english/.

[4] Lardinois, F. (2018). Duolingo hires its first chief marketing officer as active user numbers stagnate but revenue grows. TechCrunch. https://techcrunch.com/2018/08/01/duolingo-hires-its-first-chief-marketing-officer-as-active-user-numbers-stagnate/.

[5] Murre, J. M. J., & Dros, J. (2015). Replication and Analysis of Ebbinghaus’ Forgetting Curve. PLoS ONE, 10(7), e0120644. https://doi.org/10.1371/journal.pone.0120644.

[6] Code for HLR is open sourced at https://github.com/duolingo/halflife-regression.

[7] Goedegebuure, D. (2018). You Are Helping Google AI Image Recognition. Medium. https://medium.com/@thenextcorner/you-are-helping-google-ai-image-recognition-b24d89372b7e.

[8] Licensed under Creative Commons Attribution-ShareAlike 3.0 License.

[9] Lardinois, F. (2017). Duolingo raises $25M at a $700M valuation. TechCrunch. https://techcrunch.com/2017/07/25/duolingo-raises-25m-at-a-700m-valuation/.

[10] Although Duolingo doesn’t share exact statistics, they do say on their website that “only a fraction of those who start a Duolingo course make it to the end.” https://forum.duolingo.com/comment/19245570/What-proportion-of-users-who-start-a-course-finish-it

[11] Gagliordi, N. (2018). How Duolingo uses AI to disrupt the language learning market. ZDNet. https://www.zdnet.com/article/how-duolingo-uses-ai-to-disrupt-the-language-learning-market/.

[12] Airbnb Press Room. (2018). About Us. https://press.airbnb.com/about-us/.


Student comments on Duolingo: Machine Learning Our Forgetfulness

  1. Very thought-provoking! I agree that language can only truly be perfected through in-person interaction and that digital models, while valuable in the interim, will never fully replace immersion. Perhaps Duolingo can circumvent some of the problems you cited with an opt-in immersion option, where data could be sold to companies providing this service, but in a targeted way that benefits the consumer?

  2. Boom goes the dynamite. Duolingo should definitely branch out into other areas of education besides language learning before its current tree burns to the ground. The same AI technologies that allow them to be successful are enabling people to communicate in ways that decrease the need to learn a new language.

    To your point, what additional value can they create using the data they are already collecting? They should consider the aspects of learning progression and cultural differences that might be inferred from such a data set, and conduct a market landscape analysis (through BCG or otherwise) of which industries or companies might benefit from those inferences.

  3. The user opportunity here is promising. Given its accessibility, especially to those who couldn’t normally afford language classes, Duolingo can capture more of the mass market. To further expand its portfolio, it should apply the technology to other educational areas while keeping it accessible. One extension could be a platform for K-12 classrooms. Alternatively, they could create a learning platform for companies: with the modern worker in constant need of training, Duolingo could capitalize on this need. This latter approach may be more challenging, since it could require customization. Instead of building company-specific models, they could focus on higher-level content.

  4. I am a heavy user of Duolingo and retention can really be an issue even for users like me. It is really hard to keep motivated once you’ve “completed” a language you started only for fun, a language that is not a part of your daily environment. Maybe they could use the knowledge they have about users to keep them engaged with the language through media of all kinds, beyond just the actual course.

    I also agree with what Taylor Stockton commented before me: they need to go into other educational spaces. There is a lot of potential for their method in test prep, for example for the GMAT, CFA, or other exams. As someone who has repeatedly started and given up on learning to code, I can also see how much it would help to have a Duolingo for coding, just to help students remember certain rules and tags.

  5. As someone who used Duolingo for only about a week before forgetting about it, I think there are a lot of opportunities for them to grow. While the interface brings fun and excitement into language learning, I believe that actual retention is low when the exercises are primarily multiple choice and sentence construction. With their HLR model, I think there is potential for them to integrate more interactive media into their portfolio, for example by showing you videos in other languages and checking your understanding. They could also partner with schools or businesses to further build out their platform and help companies teach their employees other languages (e.g., Rakuten).

    With regard to the data, I would be curious to see which industries (outside of education) they consider this data most useful for and how they could leverage it there. Since you referred to Airbnb, one idea around experiences could be to partner with a company like Airbnb to immerse people in a culture or environment as a supplement to language learning.

  6. Mr. ZACK! Interesting article – especially topical given the discussion on Englishnization at Rakuten earlier today.

    I really enjoyed the discussion of how they use machine learning to improve knowledge retention. The decay function is interesting, and I enjoyed the graphic. Also an interesting mention of gamification. In what ways is the app gamified? I’ve never used it, but I’m curious.

    The reCAPTCHA shout-out is interesting (albeit random). I bet there’s a great article in there about the switch from simple image recognition to more behavior-based “are you a human?” check boxes. Great piece overall!

  7. Thanks for this article. I’d never considered what Duolingo could do with this data. In response to your question, I do think it could be useful to apply data on practice rates to other educational areas. For example, in math or vocabulary, Duolingo could set up flashcard programs and use learnings from its language app to design the optimal timing, difficulty, prompts/nudges, evaluation, and grading. What will be critical is understanding how that might need to be adjusted for different age groups. As far as I know, Duolingo is mostly used by adults, so can they adapt this for preschool or elementary-school children?
