Extra! Extra! Robots help Journalists! – Bringing AI into the Newsroom

How can news organizations, at a time when they might die at the hands of free media, use machine learning to drive product development and productivity?


When asked to describe machine learning in layman’s terms, quantitative futurist Amy Webb put it simply as “a branch of computer science in which computers are programmed to do things that normally require human intelligence.”

Since their inception, newsrooms have been packed with journalists hungry to uncover the truth and share it with the general public. Yet the rise of free digital sites has put this industry at a crossroads, and machine learning might very well be its last remaining lifeboat.

AI could not have replaced Woodward and Bernstein in uncovering the Watergate scandal. That work required building trust and rapport with sources, something a machine could never do. But machine learning did significantly help the ICIJ (International Consortium of Investigative Journalists) sort through the millions of documents uncovered in the Panama Papers scandal [1]. With data dumps becoming more common via sites like WikiLeaks, journalists now look to machines for much-needed assistance in finding the “truth” needle in the haystack.

Beyond its use as a “truth” search engine, news organizations have also leveraged AI as an investigative tool, exemplified by BuzzFeed News designing algorithms to help detect undercover spy planes [2] and ProPublica detecting patterns in politicians’ stated interests [3]. Some have even taken the extra step of replacing video editors and news hosts altogether [4], however experimentally.

The case of Infobae.com (a leading Argentine news site) is slightly different. In the words of feature editor Loreley Gaffoglio: “We look towards the US and now China mostly for our source of inspiration. South American news organizations are not seen as innovation thought leaders. But with some of the world’s most impactful corruption scandals (Petrolão in Brazil, Cuadernos in Argentina) we are uniquely poised to use this technology as a public service.”

Indeed, the status quo of corrupt government organizations can, perhaps for the first time, be challenged in this region, since its main pillars (fraudulent contracts and bribes) always leave behind a money or paper trail. News organizations can now deploy tools to sift through government documents and bank transactions and uncover the red flags.

In the short term, the site is looking to partner with banking consultants to leverage existing fraud detection technology. “My bank knows within seconds if a transaction is questionable, they will block it and inform me. Why not then use that same technology to help us detect bribes?” comments the editor. This could open a door to stopping corrupt practices as they happen, and it would point journalists to the right lead instead of wasting their time on a dead end. “Nothing eliminates the importance of whistleblowers and confidential sources, but this helps us know where to look for them, and that’s already 80% of the job.” In the long term, the organization is looking to expand fraud detection to include contracts. “Corrupt practices are always hidden within government, we have found it incredibly difficult to effectively identify these, but this is where our efforts are headed.”

It is on this aspect, precisely because of its difficulty and potential, that I would advise the organization to focus. By chartering a team, possibly in partnership with the ICIJ, it could develop software that intelligently detects common fraudulent practices in contracts. The main difficulty is striking a balance between supervised and unsupervised learning, a struggle the IBM Watson team also faced. A truly useful program would be able to identify discrepancies between the legal measures applied and the scope of the project under the contract. A purely supervised model would only be able to identify corrupt practices that resemble known ones (assuming the program had sufficient data inputs), so as corruption evolves it would always stay a step behind and therefore fail as an investigative tool for uncovering something new. An unsupervised learning model, on the contrary, would require a much bigger data set, but could potentially be more intuitive. It could also deliver results with a confidence percentage, as Watson did, and so point journalists towards the most attractive leads for their verification.
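To make the unsupervised idea concrete, here is a minimal sketch of how contracts could be ranked as leads without any labelled examples of corruption. Everything in it is an assumption made for illustration: the contract records, the category names, the `rank_leads` helper and the cut-off of 3.5 are hypothetical, and the technique shown (a modified z-score built on the median and the median absolute deviation) is a simple stand-in for whatever model a real team would choose. The score is an anomaly measure, not a probability of fraud; it only tells a journalist which contracts look unusual compared with their peers.

```python
from statistics import median

# Hypothetical contract records: (contract_id, category, price_per_unit).
# These values are invented for illustration only.
CONTRACTS = [
    ("C-001", "road_paving", 100.0),
    ("C-002", "road_paving", 105.0),
    ("C-003", "road_paving", 98.0),
    ("C-004", "road_paving", 102.0),
    ("C-005", "road_paving", 310.0),
    ("C-006", "school_meals", 2.0),
    ("C-007", "school_meals", 2.2),
    ("C-008", "school_meals", 1.9),
    ("C-009", "school_meals", 2.1),
]


def modified_z_scores(contracts):
    """Score each contract by how far its price sits from its category's median."""
    prices_by_category = {}
    for _, category, price in contracts:
        prices_by_category.setdefault(category, []).append(price)

    scores = {}
    for contract_id, category, price in contracts:
        prices = prices_by_category[category]
        med = median(prices)
        # Median absolute deviation: a spread measure that outliers barely move.
        mad = median([abs(p - med) for p in prices])
        if mad == 0:
            # All peers are identical; any deviation at all is maximally unusual.
            scores[contract_id] = 0.0 if price == med else float("inf")
        else:
            # 0.6745 rescales the MAD so the score is comparable to a z-score.
            scores[contract_id] = 0.6745 * abs(price - med) / mad
    return scores


def rank_leads(contracts, threshold=3.5):
    """Return (contract_id, score) pairs above the threshold, most unusual first."""
    scores = modified_z_scores(contracts)
    leads = [(cid, score) for cid, score in scores.items() if score > threshold]
    return sorted(leads, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for contract_id, score in rank_leads(CONTRACTS):
        print(f"{contract_id}: anomaly score {score:.1f}")
```

On this invented data, only the road paving contract priced at roughly three times its peers is surfaced. A flagged contract is a starting point for reporting, not a finding: a legitimate explanation (a harder stretch of road, a rushed deadline) is always possible, which is why the journalist's verification remains the final step.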

The main problem would then lie in the organization’s focus and allocation of resources. As explained by the editor: “Our main issue is that we live minute to minute, in an endless cycle of breaking news, worsened by our digital environment. We would like to have the tools to help investigations, but it’s hard to dedicate an entire team to something that’s years down the line, when we’re thinking about what we’re going to publish 2 hours from now.” And so I look to my classmates to help solve this conundrum. What would be the best way to establish an AI team within a news organization with limited resources? What should be its first area of focus? Should the effort be internally led, or should it build upon others’ inventions?


Word Count: 797




1 https://www.icij.org/blog/2018/08/how-machine-learning-is-revolutionizing-journalism/

2 https://www.buzzfeednews.com/article/peteraldhous/hidden-spy-planes#.hlN9ElJz3

3 https://www.propublica.org/nerds/teaching-a-machine-what-congress-cares-about

4 http://time.com/5450141/china-xinhua-artificial-intelligence-news-anchor/



Student comments on Extra! Extra! Robots help Journalists! – Bringing AI into the Newsroom

  1. There’s huge potential in the news industry for AI to be utilised to find ‘truth’ needles. With the endless stream of information out there, it becomes practically impossible for humans to sift through it all and uncover what should become a story. AI helps bridge that gap and augments the human effort to find the next big story.

    There are a number of efforts that focus on doing this, which can be leveraged to build a tailored solution for the news organization in question. For this reason, I would lean on available APIs to develop an internal tool capable of identifying stories. I believe it needs to be an internal effort, because your competitive advantage is your ability to source and report on stories, and by bringing this capability in-house you can differentiate yourself from your competition.

  2. In theory, using machine learning to sort through data dumps can absolutely be additive to the journalistic process, as proven by the example of the Panama Papers cited above. Equally, the role of the reporter cannot be lost in this effort, and the more that journalists use AI, the more those journalists must build a secondary knowledge base to understand what exactly the algorithms they are using are doing. For instance, without proper direction, journalists might miss key insights in a data dump, and therefore assume that, because their algorithm did not return anything of interest, there is no story to pursue.

    Additionally, if someone is creating algorithms to root out corruption, should journalists be the only responsible agents? If we are talking about countries, governments or corporations where the agent that would need to implement the technology is the same agent under suspicion of corruption, then, perhaps, journalists are the appropriate actors to step in and fill the void. But, in the long term, governments could implement this technology themselves as a self-governance mechanism to discourage corruption, which seems a more appropriate place to lay the responsibility than leaving it entirely to the “Fourth Estate.”
