Tay: Crowdsourcing a PR Nightmare

“We are deeply sorry for the unintended offensive and hurtful tweets from Tay, which do not represent who we are or what we stand for, nor how we designed Tay.” [1]

Approximately 16 hours after launching its conversational chatterbot Tay in 2016, Microsoft shut her down in what will likely go down as one of the company’s most embarrassing moments. The company got more than it bargained for when it built its “AI with zero chill”: within hours of arriving on Twitter, Tay began posting wildly inappropriate and offensive tweets. [2]

The Idea Behind Tay: Crowdsourced training for conversational AI

Conversational chatterbots have recently attracted the attention of technology companies. For example, Facebook acquired a conversational AI company (wit.ai) in 2015 and integrated it with Bots for Messenger [3]. After opening Alexa to developers, Amazon is now sponsoring a $2.5M prize to create a conversational chatterbot [4]. Why chatterbots? Chatterbots provide these companies with a wide variety of both B2C (e.g., personal assistant, entertainment) and B2B (e.g., customer service/success tools, advertising) opportunities, which are amplified by the trend towards messaging platforms and away from apps. Why conversation? Many see conversation as the next generation of user interface.

Given Microsoft’s push to “democratize AI” and its investment in Cortana, it makes sense that Microsoft put R&D dollars into Tay. Conversational AI, however, is not simple. Unlike rule-based chatterbots, a conversational chatterbot has to respond to the effectively unlimited range of inputs a user could supply in natural language. It has to infer meaning and intent from that input, find an answer, and generate a response in natural language. The modern approach is to train machine-learning (ML) algorithms on extensive datasets. [5]

At first, Microsoft trained Tay internally. It later released her on Twitter in what appears to be an effort to crowdsource further training. In Microsoft’s words, “The more you chat with Tay, the smarter she gets.” Every time users exchanged tweets with Tay, they provided data to enhance her algorithms: each tweet was likely incorporated into Tay’s training corpus, and users’ reactions to her responses (e.g., likes) may have provided feedback. Indeed, Microsoft expected crowdsourcing to improve Tay’s AI, stating, “It’s through increased interaction where we expected to learn more and for the AI to get better and better.” [1]

The Reality of Tay: Garbage in, garbage out

Within hours of her arrival on Twitter, Tay surprised everyone with a variety of inappropriate tweets. Twitter trolls exploited two vulnerabilities in her system. First, Tay had a “repeat after me” feature that allowed users to put words into her mouth. Second, Tay did not seem to filter tweets for appropriateness before using them to train her algorithms. The result? Trolls tweeted inappropriate things at Tay, and Tay did exactly what she was designed to do: learn from those tweets. [6]

Learning from Tay

Microsoft failed to control its crowd and encourage it to be productive. This failure was particularly interesting to me because Microsoft’s crowdsourcing challenge reminded me of Weathernews’ challenge. Both companies were engaging a public crowd to collect data and create a public product, but their outcomes were wildly different.

One difference is that Weathernews had a risk mitigation strategy for trolls while Microsoft had a weak one at best. Weathernews anticipated that users could engage in destructive behaviors and deterred them by charging users to contribute. While this may not have worked for Tay, Microsoft could have employed other risk mitigation strategies. For example, it could have limited Tay’s speech to certain topics or filtered tweets for appropriateness before using them in her algorithms. Companies that want to engage public crowds need to anticipate opportunities for abuse and develop mitigation strategies.
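To make the two mitigation ideas above concrete, here is a minimal sketch in Python of a gate that screens tweets before they reach a training corpus. Nothing in it reflects how Tay actually worked: the function names, the blocked-term list, and the allowed-topic list are all hypothetical placeholders, and a real system would need far more than keyword matching (a maintained lexicon, a trained classifier, human review).

```python
import re

# Hypothetical placeholder lists, invented for illustration only.
BLOCKED_TERMS = {"slur1", "slur2", "hateword"}
ALLOWED_TOPICS = {"music", "movies", "weather", "games"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def is_appropriate(text):
    """Reject any tweet that contains a blocked term."""
    return not any(token in BLOCKED_TERMS for token in tokenize(text))


def is_on_topic(text):
    """Accept only tweets that mention at least one allowed topic."""
    return any(token in ALLOWED_TOPICS for token in tokenize(text))


def filter_for_training(tweets):
    """Split tweets into (accepted, rejected) before they reach the corpus."""
    accepted, rejected = [], []
    for tweet in tweets:
        if is_appropriate(tweet) and is_on_topic(tweet):
            accepted.append(tweet)
        else:
            rejected.append(tweet)
    return accepted, rejected


incoming = [
    "I love the new music this week",
    "repeat after me: slur1 is great",
    "what is the weather like today",
    "tell me about politics",
]
accepted, rejected = filter_for_training(incoming)
print("accepted:", accepted)
print("rejected:", rejected)
```

In this sketch, only the tweets about music and weather pass; the “repeat after me” tweet is rejected for its blocked term, and the politics tweet is rejected for being off-topic. The rejected list is kept rather than discarded so that a human could review what the gate is catching.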

Another difference is that Weathernews tried to align the crowd behind a common cause, namely increasing situational awareness of events that impact the public (weather, earthquakes). Microsoft, however, pitched Tay as a form of entertainment, the definition of which is fairly open to interpretation. It’s possible that a common purpose instilled a set of values in Weathernews’ crowd that decreased destructive behavior. In retrospect, I wonder if Microsoft would have seen less destructive behavior if it had created a different chatterbot persona and aligned the crowd behind the purpose of advancing the field of AI. It’s quite possible that this would have reduced the size of Tay’s crowd, but I would take a small, productive crowd over a large, destructive one any day.

[1] https://blogs.microsoft.com/blog/2016/03/25/learning-tays-introduction/#PsUkq77fw0qCJXQH.99

[2] https://en.wikipedia.org/wiki/Tay_(bot)

[3] http://www.recode.net/2015/1/5/11557500/facebook-acquires-wit-ai-a-startup-that-helps-people-talk-to-robots

[4] http://www.geekwire.com/2016/amazon-award-2-5m-quest-alexa-chatbot-can-converse-intelligently-20-minutes/

[5] https://www.wired.com/2016/03/fault-microsofts-teen-ai-turned-jerk/

[6] https://medium.com/@carolinesinders/microsoft-s-tay-is-an-example-of-bad-design-d4e65bb2569f#.5iso9wpvm



Student comments on Tay: Crowdsourcing a PR Nightmare

  1. Great post! On the topic of displaying emotions and learning from others, my mother (shameless self-advertising) wrote a fascinating paper on the basic building blocks of how to design AI emotions:
    I think the one lesson Microsoft didn’t realize is this: it is hard to engineer anything that one cannot precisely define, and it has been shown that exaggerated abuse of emotional outputs like the above could irritate users.

  2. Nicely written post. I grapple with why companies struggle to account for trolls in risk mitigation strategies. After the highly public crowdsourcing failures of Mountain Dew and Boaty McBoatface, you would think companies would exercise more control. Part of the solution may be acknowledging the distinction between public and private crowdsourcing. While the two are not equivalent, private crowdsourcing could provide much more control. Like you said, a reduced test size for Tay would have allowed for learning without the public backlash.

  3. Great post! The comparison between Tay and Weathernews is interesting and insightful. However, the algorithms that Microsoft and Weathernews want to build may be different, and it makes sense that Microsoft crowdsourced from the public. Since Tay will essentially serve the public, it needs to be robust. I think this is a great lesson for Microsoft.

  4. Interesting post! A friend of mine made a very similar bot in the Netherlands. It worked fantastically until it got into what he called ‘black swan’ events. It’s just very difficult to stop people from trying to game the system, precisely because they know it’s a bot. I wonder if they could have done better by not revealing that it’s a bot (presenting it as a live chat across the globe, for instance), by beta testing it first with a small community outside of the public eye, or by having a service rep keep the final say over each post, and then slowly scaling it and opening it up. Interesting to see how they and other companies next plan to pilot bots!

  5. Very curious article. Microsoft would have done well to hire or incentivize its 14-year-old Xbox gamer community to break the bot before releasing it to the public without restraint. On the other hand, from a PR perspective, the bot did what it was intended to do and also garnered Microsoft PR and acclaim on AI/ML.

  6. A very interesting read. I liked the idea of crowdsourcing so the bot could learn from the public. The result would have been fantastic had it worked. It would have been much better if the machine learning algorithms had been released without any loopholes for the public to exploit. Have you come across any other company that took a similar approach?
