Symantec: Using Machine Learning to Improve Malware Research

As cybercrime grows, defenders have found a new tool for their arsenal: machine learning. Symantec has taken a creative approach that harnesses ML to detect sophisticated actors in cyberspace.

Machine learning (ML) has disrupted many industries, including automotive, healthcare, and robotics[1]. One angle of this megatrend that has not been discussed in class is how it affects malware research.

Traditionally, malware research has been a labor-intensive task that requires experienced, specialized researchers[2]. Antivirus software, the most common type of endpoint security product, protects customers using “signatures”: sets of rules that recognize malware. A researcher comes up with an idea for a signature, codes it, tests it for false positives and false negatives, and finally deploys it. The most valuable signatures identify an emerging trend or anomaly and detect that whole class of malicious behavior generically. Because malicious actors constantly change their tools to evade existing signatures, focusing too narrowly on a single sample is futile.
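The signature workflow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Sample` and `Signature` types, the byte patterns, and the three-file corpus are hypothetical, and real antivirus signatures are far richer than a single byte match. The sketch only shows why a generic rule beats one tied to a specific sample when both are tested for false positives and false negatives.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    name: str
    content: bytes
    is_malicious: bool  # ground-truth label, used only when testing a rule


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature: a named rule applied to a file's bytes."""
    name: str
    matches: Callable[[bytes], bool]


def evaluate(signature: Signature, samples: list[Sample]) -> dict[str, int]:
    """Count benign files the rule flags and malicious files it misses."""
    false_positives = sum(
        1 for s in samples if signature.matches(s.content) and not s.is_malicious
    )
    false_negatives = sum(
        1 for s in samples if not signature.matches(s.content) and s.is_malicious
    )
    return {"false_positives": false_positives, "false_negatives": false_negatives}


# A rule tied to one exact sample versus a rule for the whole family.
narrow = Signature("exact-sample", lambda data: data == b"EVIL_PAYLOAD_v1")
generic = Signature("payload-family", lambda data: b"EVIL_PAYLOAD" in data)

corpus = [
    Sample("variant-1", b"EVIL_PAYLOAD_v1", True),
    Sample("variant-2", b"xx EVIL_PAYLOAD_v2", True),  # attacker changed the tool
    Sample("benign-doc", b"quarterly report", False),
]

narrow_result = evaluate(narrow, corpus)
generic_result = evaluate(generic, corpus)
print(narrow_result, generic_result)
```

In this toy corpus the narrow rule misses the modified variant while the generic rule catches both, which is the point the paragraph makes about not overfitting to one sample.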

In recent years, the rise of cybercrime to $1 trillion in annual losses has pushed cybersecurity corporations to their limits[3]. To protect their customers effectively, they needed a disruptive way to improve dramatically rather than incrementally. Research by CB Insights paints a macro-level picture of tech giants racing to acquire AI cybersecurity startups[4], a sign that they recognize the potential of the new trend.

Symantec Corporation, a leading cybersecurity firm with a market cap of $14bn[5], adopted this approach with its Targeted Attack Analytics (TAA)[6] platform for enterprise customers. In contrast to traditional methods, TAA is a cloud-based ML platform that helps Symantec’s Security Operations Center (SOC) detect incidents and resolve them effectively. Earlier this year, Symantec exposed Thrip, a cyberespionage group, which was detected through a flag raised by TAA[7].

TAA is essentially a new product development process for Symantec. Instead of relying on individual researchers to discover trends, an ML platform identifies incidents using the huge data source Symantec has access to: data from all of its clients. As more incidents are analyzed, the model is retrained and improved. This new process lets Symantec focus its expensive resource, its researchers, on needles that have already been pulled out of the haystack.
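To make the “needles out of the haystack” idea concrete, here is a toy sketch of a score, triage, and retrain loop. It is not a description of how TAA works internally; the `RarityModel` class, the rarity-based score, and the process names are all hypothetical stand-ins chosen so the example stays small and runnable.

```python
import math
from collections import Counter


class RarityModel:
    """Toy anomaly scorer: events seen rarely across the fleet score higher."""

    def __init__(self) -> None:
        self.counts: Counter[str] = Counter()
        self.total = 0

    def train(self, events: list[str]) -> None:
        # "Retraining" here simply folds more observations into the counts.
        self.counts.update(events)
        self.total += len(events)

    def score(self, event: str) -> float:
        # Add-one smoothing gives never-seen events a finite, high score.
        probability = (self.counts[event] + 1) / (self.total + len(self.counts) + 1)
        return -math.log(probability)


def triage(model: RarityModel, events: list[str], top_k: int) -> list[str]:
    """Return the top_k most unusual distinct events for a researcher to review."""
    return sorted(set(events), key=model.score, reverse=True)[:top_k]


# Hypothetical history of process names observed across many customers.
model = RarityModel()
model.train(["winword.exe"] * 50 + ["chrome.exe"] * 40 + ["psexec.exe"] * 2)

todays = ["winword.exe", "chrome.exe", "psexec.exe", "unknown_tool.exe"]
queue = triage(model, todays, top_k=2)  # what the researchers look at first

# Suppose analysts find unknown_tool.exe is a widely deployed benign tool.
# Feeding those observations back lowers its score on the next pass.
score_before = model.score("unknown_tool.exe")
model.train(["unknown_tool.exe"] * 30)
score_after = model.score("unknown_tool.exe")
print(queue, score_before > score_after)
```

The design choice worth noting is the division of labor: the model ranks a large volume of events cheaply, and human attention is spent only on the short queue it produces.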

Facing increasing challenges in the industry, Symantec sold its Information Management division, Veritas, to The Carlyle Group[8], believing the split would help it focus on its critical task: cybersecurity. In the medium term, Symantec is pursuing expansion, for example by launching a new SOC in India[9].

Taking the global context and Symantec’s actions into consideration, I propose two ideas that could help Symantec in its quest to protect its customers. First, Symantec should develop a platform that allows independent analysts to perform research on top of TAA. This platform would crowdsource the identification and recognition of incidents to individuals who want to contribute to the world’s safety, much as Waze crowdsourced navigation. Privacy concerns would obviously need to be addressed, but such a product could sharply increase the number of trends and anomalies Symantec detects. Applying ML to the independent submissions could then help Symantec’s researchers focus on significant trends that a low-cost workforce has already flagged.

Second, Symantec should strive to create data-sharing partnerships with the tech giants: Google, Microsoft, and Apple. Symantec relies on their operating systems to deliver value to its customers, but its products are handicapped because the tech giants favor their own products and have more access to telemetry and other relevant data. One example is Windows Defender Advanced Threat Protection[10], a service that Microsoft develops for its enterprise customers. To stay at the head of the cybersecurity industry, Symantec must position itself as a complementary solution that adds value over the baseline tech giant products. For that, Symantec needs more data.

As we’ve seen, in recent years Symantec has successfully harnessed the ML megatrend to its own advantage. However, cybersecurity is a cat-and-mouse game: every technological advance by the defenders elicits a workaround by the attackers. How will attackers overcome the ML obstacle, and what should be the cyberdefense industry’s next major step?

[1] Daniel Faggella, “Artificial Intelligence Industry – An Overview by Segment”, Tech Emergence, September 16, 2018, accessed November 2018.

[2] Adam Kujawa, “So You Want To Be A Malware Analyst”, Malwarebytes Labs, September 18, 2012, accessed November 2018.

[3] Kaspersky Labs, “From a Hobby to an Industry”, accessed November 2018.

[4] CB Insights, “Cybersecurity Exits Timeline: Activity Remains Strong As Tech Corporates Target AI Startups”, May 31, 2017, accessed November 2018.

[5] Yahoo! Finance, NASDAQ:SYMC, accessed November 2018.

[6] Symantec, “Targeted Attack Analytics”, accessed November 2018.

[7] Security Response Attack Investigation Team, “Thrip: Espionage Group Hits Satellite, Telecoms, and Defense Companies”, Symantec, June 19, 2018, accessed November 2018.

[8] Symantec, “Symantec Completes Sale of Veritas, Now Singularly Focused on Cybersecurity”, January 29, 2016, accessed November 2018.

[9] Computerworld, “‘SoC 3.0’: Symantec beefs up Asia-Pacific cyber security with expanded Chennai SoC”, August 1, 2018, accessed November 2018.

[10] Microsoft, “Windows Defender Advanced Protection”, accessed November 2018.



Student comments on Symantec: Using Machine Learning to Improve Malware Research

  1. Such an interesting read. Do you think Symantec’s structural decision to focus solely on cybersecurity, particularly selling off its Information Management division, is risky in that it might be putting all of its eggs in one basket, especially when, as you mention, 1) security and privacy risks are so high and 2) it’s so hard to predict what the future dynamic between attackers and cyberdefense will look like?
