Automation at The Associated Press: Could a robot have written this blog post?

The AP is using algorithms to automatically generate news stories from structured data. But can it keep up with where data goes next?

~~ Update: For a brilliant and breathtaking example of automatically generated news based on audience input, as per the last section of this post, see Sunday’s New York Times interactive describing all 649 nonillion ways the NFL season could end. I mean, wow. ~~

In the summer of 2014, the Associated Press did something that, to many journalists, was straight out of science fiction. It began assigning earnings reports stories to robots.

It was able to do this for two reasons, both related to data. First, the AP recognized that much of the information in companies’ quarterly reports could be treated as structured data: data organized in a predictable way and therefore easily read by a machine. Second, computable comparisons between data points could determine each story’s angle and details. Did a company perform better or worse than expected? Compare its current earnings to analyst predictions. Was it just shy of expectations or did it miss them entirely? Compare the difference between expectations and performance.

Get the data, add a carefully crafted algorithm, and you’ve got an automatically generated news story.
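
To make that concrete, here is a minimal sketch of the idea in Python. It is not Wordsmith or the AP’s actual system: the EarningsReport fields, the two-cent "near miss" threshold, the sentence templates and the company name are all hypothetical, chosen only to show how a comparison between reported and expected earnings can pick a story’s angle.

```python
from dataclasses import dataclass


@dataclass
class EarningsReport:
    """Hypothetical structured record for one quarterly report."""
    company: str
    quarter: str
    eps_actual_cents: int    # reported earnings per share, in cents
    eps_expected_cents: int  # analyst estimate per share, in cents


def pick_angle(report: EarningsReport, near_miss_cents: int = 2) -> str:
    """Choose the story angle by comparing results with expectations.

    The near_miss_cents threshold is an illustrative assumption.
    """
    diff = report.eps_actual_cents - report.eps_expected_cents
    if diff > 0:
        return "beat"
    if diff == 0:
        return "met"
    return "narrow_miss" if -diff <= near_miss_cents else "miss"


# One sentence ending per angle (illustrative wording only).
ENDINGS = {
    "beat": "topping analyst expectations of {expected}",
    "met": "matching analyst expectations",
    "narrow_miss": "just shy of analyst expectations of {expected}",
    "miss": "well short of analyst expectations of {expected}",
}


def dollars(cents: int) -> str:
    """Format a non-negative amount in cents as dollars, e.g. 105 -> $1.05."""
    return f"${cents / 100:.2f}"


def write_story(report: EarningsReport) -> str:
    """Fill the template that matches the computed angle."""
    ending = ENDINGS[pick_angle(report)].format(
        expected=dollars(report.eps_expected_cents)
    )
    return (
        f"{report.company} reported {report.quarter} earnings of "
        f"{dollars(report.eps_actual_cents)} per share, {ending}."
    )


example = EarningsReport("Acme Corp.", "third-quarter", 103, 105)
print(write_story(example))
# Acme Corp. reported third-quarter earnings of $1.03 per share, just shy of analyst expectations of $1.05.
```

A production system would presumably handle losses, revenue, forecasts and many more sentence variations, but the core pattern is the same: structured data in, a computed comparison, and a template chosen by the result.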

Using Wordsmith, automation software from North Carolina-based Automated Insights, the Associated Press now creates about 3,500 stories per quarter using automation, and expects to produce 4,500 such stories by the end of 2015, it told Nieman Reports. In January, the company said its automated stories had already increased the company’s story output tenfold. Today, editors estimate that the stories have freed up as much as 20 percent of its business reporters’ time to work on other content.

“Every time you’re freeing up a staffer’s time, or somebody’s time, that time is going elsewhere to do something that is more relevant in the modern media world we’re living in,” Lou Ferrara, the AP’s vice president and managing editor, told Poynter.


How an algorithm builds a story. Screenshot from Wired.



Founded in 1846, the Associated Press is one of the oldest and largest newsgathering operations in the world. It’s a not-for-profit cooperative owned by its members, a host of American broadcast and news organizations, who support it with licensing fees. In 2014, 47 percent of the AP’s $604 million in revenue came from broadcasters, 25 percent from newspapers, 9 percent from Internet companies, 7 percent from other agencies, and 5 percent from radio stations.

Though automated earnings stories free up plenty of the AP’s resources for value creation, the company doesn’t see them as a way to grow revenue from its local markets — yet. But the stories are helping the company reshape its strategy to better serve its local customers, whom it hopes to retain despite job cuts and losses across the industry. Most of the AP’s customers are local news operations, and the AP has had to cut back on coverage of local companies as it’s struggled under its own financial strain (it posted its first revenue gain in six years in 2014).

Thanks to its output of automated stories, the AP has been able to restore its coverage of more local and regional companies, and then some. It’s programmed its automated system to gather data and produce earnings stories for all companies with a market capitalization of $75 million or greater, a number that could never have been covered by its staff reporters. With the AP supplying basic earnings reports stories, local business reporters can focus on more in-depth work.



As a wire service for other news organizations, the Associated Press serves customers who value few things more than speed, especially with business-critical information like earnings reports. In that respect, if there were a way to speed up the deployment of basic stories to its customers and the AP didn’t take advantage of it, it could lose competitive ground to someone who did.

It is no surprise, then, that the AP has done so much so early to develop automation. It’s getting ahead of the industry. Before the AP deployed its automation technology, staffers pushed to post 130-word earnings stories on the wire within 20 minutes of a press release — no easy feat for a human staffer. The AP now gets 500-word stories on the wire within a minute of a press release — and can generate 2,000 stories a second, according to stats shared with Nieman Reports.



The AP, along with the rest of the news industry, is just beginning to deploy story automation to help it create value for readers and customers. Earnings stories were just the first step for the AP. It’s experimenting with automating sports stories and even election stories with returns from local polling stations. Anywhere structured data is news, automation can give a big assist.

The next big step with news automation will likely be fueled not by data about the story, but about the reader. The New York Times, ProPublica and others have already published one-off projects that customize stories to different readers based on those readers’ locations or expectations. As University of Maryland Francis King Carey School of Law professor Frank Pasquale told Nieman Reports, if stories can be tailored for users based on things like income, geography or other characteristics, newsrooms will soon feel pressure to do it. “That’s going to be seen, eventually, as revenue maximizing,” he said. He’s right.




Student comments on Automation at The Associated Press: Could a robot have written this blog post?

  1. I find this fascinating. A few disconnected thoughts:

    * Automated writing, long term, could be analogous to driverless cars. How could a machine ever replace a human? Time. Sort of scary for people who make a living by getting money for their writing (instead of fame for their writing).

    * Data already changes the stories we see, in Facebook for example, but going forward it seems like properly harnessed data could make news more like a video game: actions the user takes will change their news experience, and it will be calculated live; or at least selected from a predetermined set of options.

    * The general business of providing pre-written news that can be slotted in somewhere seems like a poor long-term business in an age where Google only recognizes content once; if you post it second, you might as well not have posted it, because Google will consider it copied content or spam. Where does that leave a wire service? Maybe the prevalence of mobile is the way out… but even still, why have the same story in more than one place?

    1. To your first and second points, the key for journalists is not to make the mistake of viewing automation as an all-or-nothing prospect. Too many workers of all types do. They assume their jobs will be either all robots, or all humans. The best is a combination of both. And as Ferrara told Poynter, the thing about news is that it never ends. If you free up people from the busywork of basic stories robots could produce, there’s an endless supply of potential and valuable stories they can tell instead, hopefully with a keener eye toward the audience.

  2. Super interesting! On your “coming up” point, if news agencies are trying to personalize news based on reader data with automated news writing algorithms, how will they be able to sustain a competitive advantage against user-generated, highly personalized content such as Twitter?

    1. Well, they may not. It’s certainly something to worry about. In fact, I think news organizations’ number one competitive threat is the companies that have way more access to audience data than they have. It tips the scales. The second Facebook wants to go into news, or Twitter wants to deepen its news delivery (it’s already taken a much bigger step than before with its recent Moments feature, which delivers staff-curated stories), the companies will have a huge audience intelligence advantage. Snapchat is another one to watch for. Probably the best thing news organizations can do right now is prioritize audience analytics wherever they can. Automation, if it spreads, could help. A big reason newsrooms innovate slowly is that they always feel strapped, with all resources strained on news production.
