Training Robots to Curate Individualized News

Social news provider Newstag is addressing issues of user engagement and content aggregation through deployment of machine learning algorithms

Newstag is a social news platform that lets users create tailored news channels from content supplied by some of the world’s largest news providers (CNN, CNBC, Bloomberg, CCTV, Reuters). The revenue model is based on advertising: brands purchase the video tags they want to be associated with. Individual users can access the full service free of charge. In addition to the advertising revenue stream, Newstag sells curated channels to organizations that want to distribute content to their employees. Newstag also works with charitable organizations, allowing users to donate 5% of the revenue they generate.

As a digital organization that curates and distributes individually tailored content to end users, Newstag faces many challenges for which machine learning and pattern recognition can provide relevant solutions. These challenges include managing increasingly large amounts of content and delivering relevant content to its users. Unless it overcomes them, Newstag will struggle to move into the mainstream and risks remaining a niche content aggregator.

The challenge of user engagement

Recommendation Engines [2]
As with other platforms serving a wide range of content (e.g. YouTube), good recommendations are key to user satisfaction and engagement with the platform. Newstag measures this through the “session duration” KPI, i.e. the average time a user engages with content during a visit. To increase engagement, the company recently moved from a simple content-based filter, which recommends items similar to those a user has already watched, to a collaborative filter built with a machine learning algorithm, which draws on the behavior of users with similar tastes. The collaborative filter has allowed Newstag to provide a more dynamic and personal content feed, keeping users engaged on the platform longer. While this is already in place, training the algorithm is a continuous process, and it improves as more users join the platform: more data leads to richer taste profiles, which in turn increases the accuracy of recommendations [1].
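
To make the idea of a collaborative filter concrete, here is a minimal sketch of user-based collaborative filtering with cosine similarity. It is an illustration only, not Newstag's actual algorithm: the user names, tags, watch minutes and function names are all hypothetical.

```python
from math import sqrt

# Hypothetical watch data: user -> {video tag: minutes watched}.
WATCH_MINUTES = {
    "ana":   {"markets": 12.0, "energy": 8.0},
    "ben":   {"markets": 10.0, "energy": 6.0, "climate": 9.0},
    "chloe": {"football": 15.0, "tennis": 7.0},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse taste profiles."""
    shared = set(a) & set(b)
    dot = sum(a[tag] * b[tag] for tag in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(user, data, top_n=3):
    """Rank tags the user has not watched, weighted by similar users' viewing."""
    own = data[user]
    scores = {}
    for other, profile in data.items():
        if other == user:
            continue
        similarity = cosine_similarity(own, profile)
        if similarity <= 0:
            continue  # users with no overlapping taste contribute nothing
        for tag, minutes in profile.items():
            if tag not in own:
                scores[tag] = scores.get(tag, 0.0) + similarity * minutes
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

In this toy data, `recommend("ana", WATCH_MINUTES)` suggests "climate" because ben's viewing overlaps with ana's. `recommend("chloe", WATCH_MINUTES)` returns nothing, since no other user shares chloe's interests. This is a small-scale version of the limited-data problem the article goes on to describe.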

Using internally generated data from historical user interactions has helped improve the recommendation algorithm and provide a relevant feed for existing users. However, to attract and retain new customers, Newstag needs to improve recommendations for customers about whom it has limited data. One approach is to rely on third-party data, for example from other social media platforms, to broaden the data available for predicting customer interests [3].

Additionally, Newstag faces the challenge of predicting user churn, since the value of the platform is mainly based on the number of active users. Leveraging self-reinforcing machine learning algorithms to improve churn prediction is becoming more common [4], and by using available data on user interaction patterns with the platform, Newstag could identify “at risk” customers and address issues proactively.
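
As a rough illustration of how interaction data could flag “at risk” users, the sketch below trains a plain logistic regression with batch gradient descent. This is a simpler stand-in for the self-reinforcing approaches cited above, and everything in it is assumed for the example: the two features (sessions per week, days since last visit), the toy history and the 0.5 threshold are hypothetical, not Newstag's data or method.

```python
from math import exp

# Hypothetical history: (sessions per week, days since last visit, churned?).
HISTORY = [
    (5, 1, 0), (4, 2, 0), (6, 1, 0), (3, 3, 0),
    (1, 14, 1), (0, 21, 1), (1, 10, 1), (2, 18, 1),
]

def features(sessions_per_week, days_since_last_visit):
    """Scale both signals to roughly 0..1 and prepend a bias term."""
    return [1.0, sessions_per_week / 7.0, days_since_last_visit / 30.0]

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(history, learning_rate=0.5, epochs=5000):
    """Fit logistic-regression weights by batch gradient descent."""
    weights = [0.0, 0.0, 0.0]
    rows = [(features(s, d), y) for s, d, y in history]
    for _ in range(epochs):
        gradient = [0.0, 0.0, 0.0]
        for x, y in rows:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                gradient[i] += error * xi / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

def churn_probability(weights, sessions_per_week, days_since_last_visit):
    x = features(sessions_per_week, days_since_last_visit)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def at_risk(weights, users, threshold=0.5):
    """Return the names of users whose predicted churn probability is high."""
    return [name for name, s, d in users
            if churn_probability(weights, s, d) >= threshold]
```

With `model = train(HISTORY)`, a call such as `at_risk(model, [("dana", 0, 21), ("eli", 6, 1)])` flags only dana, the user who has not visited in three weeks. A real system would use far more signals and would be validated on held-out data.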

The challenge of “big data”

For Newstag to offer relevant content, it needs to collect and classify data. Initially this was done manually, with each video individually tagged with relevant keywords. As the amount of available content grew, this was no longer feasible. To resolve this issue, management has started to rely on supervised machine learning algorithms to tag video content automatically. These algorithms train on features based on the metadata associated with the videos, such as description text, duration, location, etc. [1].
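
The following sketch shows one way supervised tagging from metadata could work, using a small multinomial Naive Bayes classifier over description text. It is a hedged illustration, not Newstag's actual model: the labeled examples, the two tags and the function names are hypothetical, and only the description text is used, whereas a production system would also draw on fields such as duration and location.

```python
import re
from collections import Counter, defaultdict
from math import log

# Hypothetical hand-tagged examples: (description text, tag).
LABELED = [
    ("Central bank raises interest rates as inflation climbs", "economy"),
    ("Stock markets rally after strong earnings reports", "economy"),
    ("Striker scores twice in cup final victory", "sport"),
    ("Tennis champion wins third title of the season", "sport"),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count words per tag; these counts are the whole model."""
    word_counts = defaultdict(Counter)
    tag_counts = Counter()
    for text, tag in examples:
        tag_counts[tag] += 1
        word_counts[tag].update(tokenize(text))
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return word_counts, tag_counts, vocabulary

def predict_tag(model, text):
    """Pick the tag with the highest log-probability (Laplace smoothing)."""
    word_counts, tag_counts, vocabulary = model
    total_examples = sum(tag_counts.values())
    best_tag, best_score = None, float("-inf")
    for tag in tag_counts:
        counts = word_counts[tag]
        total_words = sum(counts.values())
        score = log(tag_counts[tag] / total_examples)
        for word in tokenize(text):
            if word in vocabulary:  # ignore words never seen in training
                score += log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag
```

After `model = train(LABELED)`, the description "Inflation pushes interest rates higher" is tagged "economy" and "Champion scores in the final" is tagged "sport". The manual tags created in the early days are exactly the kind of labeled data such a classifier needs.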

To further augment its video library and ensure it can provide the most up-to-date news, Newstag should start classifying content that has no metadata attached, which would allow it to offer news from a broader range of providers. This would require the company to develop the ability to classify content through video image recognition, a still nascent technology in machine learning. The technology has nonetheless been shown to classify objects in video significantly faster than a human can [5]. Using it would allow Newstag to broaden content availability, and to attract and engage more users.

The Longer Tail of Issues

In addition to the user engagement and content classification issues, Newstag has to continue the battle of attracting users to the site in the first place, as well as providing its advertisers and corporate partners with a clear and quantifiable value proposition and improving its ability to help them target customer segments.

Word Count: 798

End Notes:

[1] Andersson, M., 2018. How Newstag is Leveraging Machine Learning [Interview] (09 11 2018).

[2] Maruti Techlabs, 2018. How do Recommendation Engines Work? And What Are The Benefits?. [Online] Available at:

[3] Lewenberg, Y., Bachrach, Y. & Volkova, S., 2015. Using Emotions to Predict User Interest Areas in Online Social Networks. Paris, France, IEEE.

[4] Heather, S., 2017. Analyzing Customer Churn using Azure Machine Learning. [Online] Available at:

[5] Wilson, H. J., Sachdev, S. & Alter, A., 2016. How Companies Are Using Machine Learning to Get Faster and More Efficient. Harvard Business Review, 3 5, p. 3.



Student comments on Training Robots to Curate Individualized News

  1. I agree that Newstag's value proposition compared to long-established newsfeed providers such as Feedly is one of the key challenges. Feedly, as a newsfeed that pulls together news from all your preferred newspapers, has been in business since 2008, reliably serving its users with their news for free.
    The only component that is missing is content recommendations, a service that many newsreaders are sceptical of due to the growing concern that, in order to engage users on the platform longer, providers like Newstag will present you with the news you want to hear instead of giving you a fair representation of the actual events. With that in mind, services like Newstag might not offer enough value for consumers to switch from long-existing platforms, and they might face criticism over whether they represent the news fairly.

  2. Great read! Machine learning could not be a better fit for an organization like Newstag. We as consumers expect the products/content platforms we use to be increasingly tailored to our needs and likes – I can’t even imagine the level of specificity consumers will demand from content platforms in 10, 20, 30 years' time. By the time our children and grandchildren are engaging with content platforms, I suspect the techniques Newstag currently employs (using historical data and third parties to generate recommendations) will look archaic.

    I completely agree that the volume of data Newstag needs to process is getting beyond human capabilities; very soon machine learning algorithms will be the ONLY way to manage this data. I’m curious to know how Newstag considers their 1) redistribution of human capital and 2) investments in software vs software developers. Are they concerned about the implications of this type of work becoming more and more software dependent, with less and less reliance on human intervention? How are they navigating that discussion? Should we as consumers be concerned if the companies who supply our content platforms are NOT discussing the ethics of moving away from human capital and towards machines?
