The IRS and Big Data: solving a big problem, problematically

…and you owe him back taxes.

The IRS is not the most popular of government agencies in the US, but it is one of the world’s most efficient tax collectors. For every $100 it collects, the IRS spends around $0.35, which is half the average for OECD countries and a third of what Germany, France, England, Canada, and Australia spend. (1)

But despite its increasing efficiency and a growing volume of tax returns, the IRS has recently faced funding headwinds: according to the Center on Budget and Policy Priorities, from 2010 to 2016 its budget was cut by 17% and its headcount was reduced by 14%. The report estimates that for every dollar of funding the IRS loses, it fails to collect five dollars it is owed. And there are a lot of additional dollars owed: the government estimates that between 2000 and 2010, tax evasion cost the US government over $3 trillion. (2)
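To make the arithmetic in the two paragraphs above concrete, here is a minimal sketch that uses only the figures quoted in this post. The function names and the $1 million budget cut are my own illustrative assumptions, not numbers from the reports.

```python
# Figures quoted in the post; everything else is derived arithmetic.
IRS_COST_PER_100 = 0.35         # dollars the IRS spends per $100 collected
OECD_MULTIPLE = 2               # IRS cost is "half the average" of OECD countries
PEER_MULTIPLE = 3               # IRS cost is "a third" of the five named countries
UNCOLLECTED_PER_DOLLAR_CUT = 5  # dollars left uncollected per dollar of funding lost


def implied_cost_per_100(multiple: float) -> float:
    """Cost per $100 collected implied by the IRS figure and a multiple."""
    return IRS_COST_PER_100 * multiple


def estimated_uncollected(budget_cut: float) -> float:
    """Revenue left uncollected for a given budget cut, per the quoted 5:1 estimate."""
    return budget_cut * UNCOLLECTED_PER_DOLLAR_CUT


def net_cost_of_cut(budget_cut: float) -> float:
    """Uncollected revenue minus the money saved by cutting the budget."""
    return estimated_uncollected(budget_cut) - budget_cut


if __name__ == "__main__":
    hypothetical_cut = 1_000_000  # illustrative input, not a reported figure
    print(f"Implied OECD average per $100: ${implied_cost_per_100(OECD_MULTIPLE):.2f}")
    print(f"Implied peer-country cost per $100: ${implied_cost_per_100(PEER_MULTIPLE):.2f}")
    print(f"Uncollected after the cut: ${estimated_uncollected(hypothetical_cut):,.0f}")
    print(f"Net cost of the cut: ${net_cost_of_cut(hypothetical_cut):,.0f}")
```

Under the report's estimate, a cut does not save money on net: each dollar trimmed from the budget forgoes roughly four dollars more than it saves.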

In response, the IRS has adopted more expansive and sophisticated analytics to combat tax evasion and tax fraud. Traditionally, audit targets were chosen by comparing the data in a tax return against historical returns or against sources not submitted by the taxpayer, such as a W-2. However, a recent paper by two Washington State University professors of business law and accounting offers new insight into how the IRS has expanded its data gathering program. Drawing on a number of recent court cases and FOIA requests, the authors report that the IRS has acquired the ability to monitor phone traffic, through the purchase of Stingray phone tracking technology; emails, through Electronic Communications Privacy Act court orders that permit auditors to read emails from private accounts; and publicly available social media. (3) Dean Silverman, the former Senior Advisor to the Commissioner in the Office of Compliance Analytics for the IRS, indicated that the agency uses big data for the following: (4)

  • Charting and analyzing social media such as Facebook
  • Targeting audits by matching tax filings to social media or electronic payments
  • Tracking individual Internet addresses and emailing patterns
  • Sorting data into 32,000 categories of metadata and 1 million unique “attributes”
  • Machine learning across “neural” networks
  • Statistical and agent-based modeling
  • Relationship analysis based on Social Security numbers and other personal identifiers

The government uses these sources of data to build highly detailed profiles of taxpayers, which in theory allow it to detect deviations from expected behavior more accurately and to target audits accordingly. It can also use the same data in court to bolster its case.
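The traditional matching approach described earlier, comparing what a taxpayer reports against third-party data such as a W-2, can be sketched in a few lines. This is only an illustration of the general idea: the `Filing` class, the `flag_mismatches` function, the 5% tolerance, and the sample records are all hypothetical and do not describe how the IRS actually selects audits.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    taxpayer_id: str       # hypothetical identifier
    reported_wages: float  # wages the taxpayer put on the return


def flag_mismatches(filings, third_party_wages, tolerance=0.05):
    """Return IDs whose reported wages fall short of third-party wages
    by more than `tolerance` (a fraction of the third-party figure)."""
    flagged = []
    for filing in filings:
        expected = third_party_wages.get(filing.taxpayer_id)
        if expected is None or expected <= 0:
            continue  # nothing usable to compare against
        shortfall = (expected - filing.reported_wages) / expected
        if shortfall > tolerance:
            flagged.append(filing.taxpayer_id)
    return flagged


if __name__ == "__main__":
    # Made-up records for illustration only.
    filings = [
        Filing("A", 50_000.0),  # matches the employer's report
        Filing("B", 40_000.0),  # reports far less than the employer did
        Filing("C", 30_000.0),  # no third-party record available
    ]
    employer_reports = {"A": 50_000.0, "B": 60_000.0}
    print(flag_mismatches(filings, employer_reports))
```

The big data profiles discussed in this post extend the same logic: instead of one trusted third-party number, the comparison draws on many noisier signals, which is where the concerns below come in.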

There are numerous potential issues with this type of data collection. The first is perhaps the simplest: no one is aware of its extent. Part of the problem is that the legal basis for this expanded data gathering is poorly covered by current regulations. A patchwork of different laws governs certain aspects of the IRS’s ability to gather third-party information on taxpayers, but it is largely outdated. Both the modes of communication and the means of tracking them have far outpaced attempts to govern them. As a result, the IRS is under no obligation to share the data with the public, and taxpayers cannot review the information stored about them to correct mistakes.

This last point is related to the second objection: accuracy. Internet data is not known for the rigor of its collection; the idea is that more is better, and the ‘average’ will be roughly correct. As the paper’s authors point out, big data companies can profit from bundled profiles even at low accuracy rates, and the consequences of being wrong are (relatively) low: a customer gets misidentified and shown a product or service that is not relevant to them. But the consequence of a poorly matched IRS profile is a potentially time-consuming and expensive audit.
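A small worked example shows why modest error rates matter more for audits than for ads. Every number below is made up for illustration (population size, share of evaders, detection rate, false-positive rate); none comes from the paper or the IRS, and `flag_precision` is a hypothetical helper.

```python
def flag_precision(population, evader_rate, detection_rate, false_positive_rate):
    """Return (precision, false_flags) for a profile-based flagging rule.

    precision   = share of flagged taxpayers who are actually evaders
    false_flags = number of compliant taxpayers flagged by mistake
    """
    evaders = population * evader_rate
    compliant = population - evaders
    true_flags = evaders * detection_rate
    false_flags = compliant * false_positive_rate
    total_flags = true_flags + false_flags
    precision = true_flags / total_flags if total_flags else 0.0
    return precision, false_flags


if __name__ == "__main__":
    # Hypothetical inputs: 1 million filers, 2% evaders,
    # 90% of evaders caught, 5% of compliant filers wrongly flagged.
    precision, false_flags = flag_precision(1_000_000, 0.02, 0.90, 0.05)
    print(f"Share of flags that are real evaders: {precision:.1%}")
    print(f"Compliant taxpayers flagged by mistake: {false_flags:,.0f}")
```

With these invented inputs, only about 27% of flags point at actual evaders, and roughly 49,000 compliant taxpayers are flagged. For an advertiser that is a wasted impression; for a taxpayer it could mean an audit.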

Data security is also an issue. Repeated security breaches at Target, Yahoo, and others have shown what damage can be done with consumer data. And while we might expect security at the IRS to be somewhat more robust than at these private companies, the stakes are also higher: a full repository of detailed individual profiles of every taxpaying U.S. citizen would be well worth an attacker’s attempt.

Privacy advocates have recently focused on large businesses collecting and selling personal information as an emerging threat, but as a gatherer of metadata, there is probably none more capable or incentivized than the IRS.




(2) Federal Revenue Lost to Tax Evasion, DEMOS,

(3) Houser, K. A., & Sanders, D. (2017). The Use of Big Data Analytics by the IRS: Efficient Solutions or the End of Privacy as We Know It? Retrieved April 9, 2018, from

(4) Ibid.


Student comments on The IRS and Big Data: solving a big problem, problematically

  2. I wonder what it will look like with the further development of AI? While consumer data is of vital importance (thinking of Facebook most recently) I also think there is great potential here to develop the technology such that it can help with nefarious acts. I think the IRS is often one step behind the criminal. Thanks for the post!