Grammar(ly): Who sets the rules?

Over the past decade, Grammarly has accumulated 7M daily users of its grammatical suggestion engine [1]. Now Grammarly hopes to go beyond grammar in its mission of improving communication. How they do that depends on us.

Crowd-sourcing Grammar

In 2013, the Merriam-Webster dictionary changed the definition of the word “literally” to include its figurative meaning [2]. By doing so, Merriam-Webster gave a nod to the fluid nature of language. But when should we stick by traditional language rules and when can we adapt? This is the question facing Grammarly as it investigates the use of crowd-sourced data for its AI engine. If users decide they don’t care about using the correct version of “your”, should Grammarly’s model change its understanding of “correct” language?

A Brief Overview

Founded in 2008, Grammarly’s mission is to “Improve Lives by Improving Communication” [3]. Grammarly’s core product is software that makes spelling and grammatical recommendations. Grammarly currently relies on a freemium model as its primary revenue source and offers several integrations, including a Microsoft Office add-in and a Google Chrome extension [4].

Product Development and Improvement at Grammarly

At the heart of Grammarly’s core product (and the products of competitors Microsoft and Google) is an AI that identifies the intended meaning of a phrase using contextual clues. That AI must then serve up suggestions on how to improve the phrase. Due to the scope of such a problem, a complex learning algorithm is needed, in this case likely a recurrent neural network model (this article focuses on data acquisition, as opposed to algorithm selection/design).

Data acquisition is critical because Grammarly’s competitive advantage over Google is never going to be their expertise in AI. Instead it will continue to be their ability to crowd-source training data for their algorithms. Traditionally, Grammarly collects data in two different forms:

  1. Public data-sets
    • For example, the AESW 2016 Shared Task corpus was created by a single editor making edits to scientific journal articles written in English by non-native English speakers [6].
  2. Crowd-sourced user data
    • Whenever users decide to accept or ignore a suggestion by Grammarly, they are providing data back to the algorithm. Due to their integration across multiple platforms, Grammarly has access to data from more contexts than Microsoft or Google.
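
The accept/ignore feedback loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the event format, the platform names, and the suggestion labels are hypothetical, and nothing here reflects how Grammarly actually stores or uses feedback.

```python
from collections import defaultdict

# Hypothetical feedback events: (platform, suggestion_type, accepted).
# The platforms and suggestion labels are made up for illustration.
events = [
    ("chrome", "your/you're", True),
    ("chrome", "your/you're", False),
    ("office", "your/you're", True),
    ("office", "your/you're", True),
    ("chrome", "comma_splice", False),
]


def acceptance_rates(events):
    """Return {(platform, suggestion_type): fraction of suggestions accepted}."""
    counts = defaultdict(lambda: [0, 0])  # key -> [accepted, total]
    for platform, suggestion_type, accepted in events:
        entry = counts[(platform, suggestion_type)]
        entry[0] += int(accepted)
        entry[1] += 1
    return {key: acc / total for key, (acc, total) in counts.items()}


rates = acceptance_rates(events)
for (platform, suggestion_type), rate in sorted(rates.items()):
    print(f"{platform:8s} {suggestion_type:14s} accepted {rate:.0%}")
```

Keying the rates by platform is the point of the sketch: the same suggestion may be welcome in an Office document and routinely ignored in a browser text box, which is the kind of context a multi-platform product can observe.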

Using these sources, Grammarly can continue improving its core service. But to continue innovating they will need a new sourcing strategy.

Next Steps for Grammarly

The future of written communication and of Grammarly is about more than just grammatical correctness. Users are increasingly looking for more stylistic suggestions. There are many ways to write the same grammatically correct sentence. But some of those sentences are easier to read, maybe because they flow better or use more accessible language.

To keep up with user demands, Grammarly will need to be able to differentiate between more and less coherent sentences. This is a problem because it requires a new type of data that doesn’t currently exist. Grammarly needs data that doesn’t just look at spot changes, but at more fundamental changes to the underlying sentence structure.

To get this initial data, Grammarly commissioned a study that took text data from various sources including Yahoo Answers and Yelp reviews [5]. They then paid 50 editors crowd-sourced via Mechanical Turk to go through the data and provide holistic changes to the text to improve its overall coherence [7].

While this initial data-set will help kick-start a new phase of innovation at Grammarly, they face two long-term difficulties as they prove out this technology:

First, paid manual editing is cost-prohibitive. Second, as Grammarly moves further away from rules-based suggestions, it will become increasingly difficult to maintain control over what their AI decides is the “correct” suggestion. For example, even in the above case with expert editors, the editors had relatively low agreement on what constituted a coherent sentence [5].
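
To make “low agreement” concrete, one common way to quantify it is Cohen's kappa, which corrects the raw agreement between two raters for the agreement expected by chance. The sketch below is a generic illustration, not the measure or the data used in [5]; the two editors and their low/medium/high coherence labels are invented for the example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    1.0 means perfect agreement; 0.0 means no better than chance.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled at random with their own label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Invented coherence ratings from two hypothetical editors for six texts.
editor_1 = ["high", "low", "medium", "high", "low", "medium"]
editor_2 = ["high", "medium", "medium", "low", "low", "high"]

kappa = cohens_kappa(editor_1, editor_2)
print(f"kappa = {kappa:.2f}")
```

In this toy case the editors agree on half of the texts, yet kappa lands well below 0.5 because a third of that agreement would be expected by chance alone. A model trained on such labels inherits the disagreement, which is why the definition of “correct” becomes harder to control.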

Recommendations

My suggestions for Grammarly are to continue expanding their integration across various platforms, focusing particularly on mobile, to maintain their competitive advantage in data collection. I would also suggest rolling out a “coherence” suggestion pilot as soon as possible to start collecting valuable data. Risks are limited because users who opt in to the pilot can always ignore suggestions.

Finally, I would continue looking forward and innovating. The natural next step is to look beyond the coherence of a single sentence to the coherence of a paragraph or an entire document. This is where platform context becomes so important, since what makes a tweet coherent is very different from what makes an email or a paper coherent.

Conclusion

Clearly there is a lot of value yet to be created in this space. Grammarly’s ability to navigate the data collection process will be the key to their long-term success. But we still haven’t answered the fundamental tension faced by crowd-trained ML algorithms. Can we trust the crowd to produce the “correct” answer? Especially when it comes to something as fluid and variable as human language?

Note: All spelling, grammatical, and stylistic errors in this article are puns made intentionally by the author

References

[1] Geron, T. (2018). Grammarly, With $110 Million, Brings Artificial Intelligence to Writing. [online] WSJ. Available at: https://www.wsj.com/articles/grammarly-with-110-million-brings-artificial-intelligence-to-writing-1494243003 [Accessed 13 Nov. 2018].

[2] Merriam-webster.com. (2018). Did We Change the Definition of ‘Literally’?. [online] Available at: https://www.merriam-webster.com/words-at-play/misuse-of-literally [Accessed 13 Nov. 2018].

[3] Grammarly.com. (2018). Help Grammarly improve lives by improving communication. [online] Available at: https://www.grammarly.com/jobs [Accessed 13 Nov. 2018].

[4] Medium. (2018). How Grammarly Quietly Grew Its Way to 6.9 Million Daily Users in 9 Years. [online] Available at: https://medium.com/swlh/how-grammarly-quietly-grew-its-way-to-6-9-million-daily-users-in-9-years-88e417dbfbdf [Accessed 13 Nov. 2018].

[5] Alice Lai and Joel Tetreault, “Discourse Coherence in the Wild: A Dataset, Evaluation and Methods,” arXiv:1805.04993, May 2018.

[6] Textmining.lt. (2018). AESW 2016. [online] Available at: http://textmining.lt/aesw/index.html [Accessed 13 Nov. 2018].

[7] Tech.grammarly.com. (2018). Paving the way for human-level sentence corrections. [online] Available at: https://tech.grammarly.com/blog/paving-the-way-for-human-level-sentence-corrections [Accessed 13 Nov. 2018].

Student comments on Grammar(ly): Who sets the rules?

  1. I think the more data that Grammarly collects, either manually or via data-mining algorithms, the better the Grammarly product will be. Often, data mining alone does not produce the best results. That’s why manual checking and review is important at first. However, over time, as the algorithm gets smarter with more data, it will be able to suggest better words or writing styles instead of only reviewing grammar.

  2. Very interesting article. I’m actually a (heavy) user of Grammarly since I’m not a native English speaker. I do share your concern of how much the company should rely on crowd-sourced data and the constraint of having the recommendations checked by paid manual editors. That said, I do believe there are contexts in which grammar mistakes (or informal choices) do not matter that much such as informal communications, chats, etc. But there are other types of situations in which even one mistake or informality has a great cost (academic writing, cover letters for a job, to mention a few). Grammarly does offer in its platform the option of choosing your intent, style, and audience, and I believe that it should try to invest to get the error rate close to zero at least in a formal context of writing. Overall, I think the product is great and it is evolving greatly due to ML and the spread of use among users.

  3. This is an interesting article. I know I have been a target customer of their ads on YouTube, so I was curious to learn more about the product. I agree with mthai that the quality of the Grammarly product should improve with the increased number of users. While I have never used Grammarly, I do wonder how Grammarly goes about marketing to different consumers to scale their company. There seems to be an inherent tension between Grammarly’s growth goals (which might focus on specific population groups) and not over-sampling the feedback from these specific population groups. Closely tied to this problem, it would be helpful to know the specific use cases of the product, since it does not seem like Grammarly can be used in all cases. If Grammarly is keenly focused on a college-aged to young working professional demographic, content submissions could be more prone to biases and grammatical errors. Overall, I think this is a great application of machine learning to help improve communication across countries and cultures, and one that can continue to improve over time.

    1. That is a great point jrod and not one I had considered. It is definitely an issue with crowd-sourcing in general that the data you collect is probably representative of your current customer only and that in some cases may preclude you from attracting a different type of customer. I also agree with others here that the platform is super important when considering the type of language “rules” Grammarly should use. One way to mitigate the issue you bring up is by being very focused on attracting a diversity of applications/platforms. My guess is that the variation in how different demographics text is much lower than the overall variation in the type of language different demographics use. Thus filtering and controlling for platform context might help with this issue.

  4. Good piece on crowd-sourcing in an unusual and complex context.
    In addition to your question of whether we can trust the crowd to decide what is right or wrong, I would point out that there could be different answers for different styles and purposes of communication. For example, informal texting could have very different gold standards than formal press releases. Therefore Grammarly should focus not only on collecting crowd-sourced data but also on sorting and labeling it, which could be done by both users and ML.
    Another difficulty I can foresee with crowd-sourcing grammar is that it could make the user experience slightly more troublesome and therefore eventually lead users to opt out. Currently, one of Grammarly’s advantages is its smooth integration into web browsers: it’s very easy to ignore suggestions, and the plug-in doesn’t require any input from users. But once users are asked to explicitly mark styles, paragraphs, and the beginnings and ends of essays, they could be annoyed by pop-up windows and/or often useless suggestions and consider switching off Grammarly altogether.

  5. This article reminded me of the Watson case, especially around understanding linguistic rules and coherence. I am curious why they don’t use data sources such as newspapers or novels. As Alexander noted, there are different kinds of communication for different occasions, and it might be useful for Grammarly to explore learning more from the best-selling novels of 2017. Is there something about the way these novels are written that makes them particularly attractive? For example, do they use similar structures in how they combine long and short sentences? What about alliteration? I think it would be a pretty valuable insight to know whether certain rhetorical techniques stand out more to our generation and can be employed by Grammarly for its recommendations.

  6. Very interesting article! I’m a heavy user of Grammarly and I always wondered how it worked and why Microsoft Word could not do a proper grammar check the way Grammarly does. I agree with Aditiya that Grammarly could learn more from published sources, though I would focus on nonfiction or scholarly journals. I guess that the primary user of Grammarly will use it for formal business documents or school essays that need to be checked. I also wonder which data sets they can use for their product, since published sources might raise copyright issues. I also wonder how its algorithm differs from Google Translate’s.

  7. I love this post – thanks Ben.

    Just because writing exists on the internet doesn’t mean it’s good writing. What data would serve as a way of scoring any piece of writing that Grammarly ingests? Is it possible to create a score in that sense without prescribing a particular point of view? The Hemingway App (http://hemingwayapp.com) is a tool I use (and recommend) to people interested in cleaning up a draft, but it favors terse language. I love the philosophical points that you raise – should grammar be objective? Definitive? If you’re curious, here are David Foster Wallace’s thoughts: https://harpers.org/wp-content/uploads/HarpersMagazine-2001-04-0070913.pdf.

  8. Thanks for the article. This was such an interesting read. As an avid user of Grammarly myself, I do find that crowdsourcing produces mixed suggestions when the algorithm is reviewing my work. I wonder if their data set is learning from the correct sources, or if they should be diversifying as new words are added to the English language each day.
