Techs and the City: Data and Machine Learning in Boston

The City of Boston faces new challenges as it seeks to harness the power of machine learning to improve performance, optimize processes, and explore frontiers in local governance.


Why Machine Learning is Important to the City of Boston

In an increasingly data-driven world, public bodies are no strangers to the trends of Big Data, Artificial Intelligence, and Machine Learning, nor are they immune to the challenges these trends pose.

Machine learning is valuable to local governments because it lets them tackle old problems in new, more efficient ways, driving operational efficiencies through process improvement, while also opening up problems that until now were not on the table. The first category (old problems, new solutions) tends to correspond to automation without learning (e.g. report pipelines), while the second (new frontiers) involves learning from data (e.g. traffic prediction through video processing). [1]

Where can machine learning be applied to municipal governance? The potential seems vast. A recent McKinsey Global Institute study on smart cities identifies five fields of application for machine learning in local government: Energy & Waste Management, Security, Healthcare, Economic Development, and Mobility. The study estimates that realizing this potential could improve some key quality-of-life metrics (e.g. commute time saved, emissions averted, or crime incidents prevented) by 10-30%. [2]


What the City of Boston is Doing to Incorporate Machine Learning

Mayor Walsh mandated the creation of the Citywide Analytics Team in 2015 to “bring the power of data to everything that we do […] Soon, you won’t have to fumble for quarters to pay the parking meter”. [3] The City has come a long way since, creating the position of Chief Data Officer and growing the team from the initial 6 members to 18 by the end of 2017 [4]. The Analytics Team maintains an open data portal that provides public access to more than 140 city datasets and regularly sponsors “hackathons”, inviting collaboration from the academic community as well as the general public in addressing city problems in innovative ways. [5]

Initial efforts focused on structuring the data available across the city and on launching CityScore, a balanced scorecard that aims to provide a comprehensive set of KPIs for city performance. [6] While neither effort is specific to machine learning, making data available and understanding it correctly are critical tasks that precede any successful machine learning implementation, and Boston began taking advantage of that data availability.

Currently, Citywide Analytics continues to work with city departments to identify areas in which machine learning techniques can drive efficiencies and ultimately improve quality of life in the city. A concrete example of an early machine learning project with immediate and measurable gains for the city came from addressing the challenge of food-borne disease.

Health Inspections in Boston: a Case Study

Traditionally, restaurant health inspections have been carried out essentially at random, wasting time at compliant restaurants and potentially missing public health risks that endanger constituents. The City of Boston partnered with DrivenData to target inspections more intelligently, using machine learning to predict where violations were most likely to occur based on data from previous violations, constituent complaints, and reviews on Yelp. As of 2017, the result, RHIPA (the Risk-based Health Inspection Prediction Algorithm), allowed inspectors to catch 25% more health code violations than before. [7]

Figure 1: Boston’s health inspection hotspots [8]
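
To make the idea of risk-based inspections concrete, the following is a minimal sketch that compares a random inspection order with a risk-ranked one. It is not RHIPA or the DrivenData model: the weights, the function names, and the synthetic restaurants are all hypothetical, and a real system would learn its weights from historical inspection data rather than fix them by hand.

```python
import random

# Hypothetical weights for illustration only; they are NOT taken from RHIPA.
# The three signals mirror the data sources named in the text.
WEIGHTS = {"past_violations": 0.5, "complaints": 0.3, "negative_reviews": 0.2}


def risk_score(restaurant):
    """Weighted sum of the three signals; higher means riskier."""
    return sum(weight * restaurant[signal] for signal, weight in WEIGHTS.items())


def make_synthetic_restaurants(n, seed=0):
    """Generate made-up restaurants; no real inspection data is used."""
    rng = random.Random(seed)
    restaurants = []
    for i in range(n):
        restaurant = {"id": i}
        for signal in WEIGHTS:
            restaurant[signal] = rng.randint(0, 5)
        # Assumption: the chance of a violation rises with the signals
        # (from 0.05 when all signals are 0 up to 0.90 when all are 5).
        p_violation = 0.05 + 0.17 * risk_score(restaurant)
        restaurant["has_violation"] = rng.random() < p_violation
        restaurants.append(restaurant)
    return restaurants


def violations_found(inspection_order, budget):
    """Count violations caught when only the first `budget` sites are visited."""
    return sum(r["has_violation"] for r in inspection_order[:budget])


def compare_strategies(n=2000, budget=200, seed=0):
    """Return (violations found at random, violations found by risk ranking)."""
    restaurants = make_synthetic_restaurants(n, seed)
    random_order = restaurants[:]
    random.Random(seed + 1).shuffle(random_order)
    ranked_order = sorted(restaurants, key=risk_score, reverse=True)
    return (violations_found(random_order, budget),
            violations_found(ranked_order, budget))


if __name__ == "__main__":
    random_hits, ranked_hits = compare_strategies()
    print(f"random order:      {random_hits} violations in 200 inspections")
    print(f"risk-ranked order: {ranked_hits} violations in 200 inspections")
```

With the same inspection budget, the ranked order concentrates visits on the restaurants most likely to have a violation. The size of the gap in this sketch depends entirely on the made-up probabilities and says nothing about the gains Boston actually observed.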

Challenges and Recommendations

Apart from technical problems such as the construction and maintenance of big data lakes that enable the City’s data scientists to carry out their analyses, management must focus its efforts on overcoming two key challenges.

First, the Citywide Analytics team currently works in a reactive, bottom-up manner, serving city departments on a quasi-first-come, first-served basis. This approach may be preventing the City from developing a top-down strategy, which would involve a comprehensive pipeline of projects, appropriately prioritized based on city needs and potential impact.
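
The difference between the two operating models can be sketched in a few lines. This is only an illustration under stated assumptions: the `ProjectRequest` fields, the 1-5 scores, the equal weights, and the three sample projects are hypothetical and do not describe how Citywide Analytics actually scores requests.

```python
from dataclasses import dataclass


@dataclass
class ProjectRequest:
    name: str
    arrival_order: int      # position in the request queue
    city_need: int          # hypothetical score, 1 (low) to 5 (high)
    potential_impact: int   # hypothetical score, 1 (low) to 5 (high)


def first_come_first_served(requests):
    """Reactive model: work through requests in the order they arrived."""
    return sorted(requests, key=lambda r: r.arrival_order)


def prioritized(requests, need_weight=0.5, impact_weight=0.5):
    """Top-down model: rank by weighted need and impact, ties by arrival."""
    def sort_key(r):
        score = need_weight * r.city_need + impact_weight * r.potential_impact
        return (-score, r.arrival_order)
    return sorted(requests, key=sort_key)


# Made-up requests, loosely based on examples mentioned in the text.
requests = [
    ProjectRequest("report automation", 1, city_need=2, potential_impact=2),
    ProjectRequest("inspection targeting", 2, city_need=4, potential_impact=5),
    ProjectRequest("traffic prediction", 3, city_need=5, potential_impact=3),
]

if __name__ == "__main__":
    print([r.name for r in first_come_first_served(requests)])
    print([r.name for r in prioritized(requests)])
```

The hard part in practice is agreeing on the scores and weights, which is a question of city strategy and not of code.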

Second, talent retention appears to be a serious problem that the City must address. Andrew Therriault, former Chief Data Officer, left the city government to join Facebook as a Data Science Manager in the spring of 2018 [9]. Jascha Franklin-Hodge, the former Chief Information Officer who launched the Citywide Analytics team in 2015, also left the organization earlier in 2018 [10]. As of November 2018, both positions are filled by interim officers, suggesting the City is struggling to find replacements. In addition, 27% of the team members listed in the 2017 Year in Review no longer list Citywide Analytics as their current employer on LinkedIn.

How can Citywide Analytics transition from the current reactive-to-demand model to an operating model aligned with a compelling analytics vision that is consistent with the City’s strategy? How does the 4-year horizon that constrains City leaders affect the development of a new operating model? Can the City attract and retain the best talent, or should it resign itself to seeing its best people leave for big tech companies?





[1] A. Fedyk, “How to Tell If Machine Learning Can Solve Your Business Problem,” Harvard Business Review, 25 November 2016.

[2] McKinsey Global Institute, “Smart Cities: Digital Solutions for a More Livable Future,” McKinsey & Company, 2018.

[3] M. J. Walsh, “State of the City Address,” Boston, January 2015.

[4] Citywide Analytics Team, “2017 Year in Review,” City of Boston, Boston, 2017.

[5] “Analyze Boston,” [Online]. Available: [Accessed 13 November 2018].

[6] A. Therriault, “Boston’s Citywide Analytics Team,” Social Innovations Journal, 2016. [Online]. Available: [Accessed 13 November 2018].

[7] DrivenData, “Using Yelp Reviews to Flag Restaurant Health Risks,” DrivenData, [Online]. Available: [Accessed 12 November 2018].

[8] DrivenData, “Announcing the Results of Our Keeping It Fresh Competition,” 11 November 2015. [Online]. Available: [Accessed 13 November 2018].

[9] C. Wood, “,” 14 May 2018. [Online]. Available: [Accessed 11 November 2018].

[10] C. Wood, “,” 19 December 2017. [Online]. Available: [Accessed 11 November 2018].



Student comments on Techs and the City: Data and Machine Learning in Boston

  1. I wonder if the Citywide Analytics team’s current reactive projects are a sort of “initiation tax” that they are required by the city to pay. That is, before city governments (which do not have the reputation as being the most innovative bodies in the world…) embrace a prospective analytics vision, maybe they first need to see the real results that such approaches can deliver in a reactive fashion (such as the health inspections). Hopefully, the passage of time and successful projects will allow the data science team to shift its focus to more prospective challenges. By doing so, the city would ostensibly increase the challenge and engagement for members of the analytics team, potentially providing a needed boost to employee retention.

  2. It’s great to learn about these initiatives being developed in Boston!

    I believe that reliable access to quality data will be the greatest barrier to successfully applying machine learning across a broader range of governance problems. While I am pleased to hear that the city has made a strong effort to add procedures to its collection and storage of key data elements, given that the analytics team has already experienced high levels of turnover I worry about the sustainability of these efforts. Given the long implementation time for initiatives within the government, it is especially critical to hire a team that can ensure consistency in the city’s data strategy.

  3. How does the 4-year horizon that constrains City leaders impact the development of a new operating model?

    This is a fascinating question that is very applicable to urban planning objectives. If a city is seeking to overhaul systemic issues, such as the examples you cited, a 4-year horizon can lead to a limited view. Funding is likely allocated on this time frame, so longer-term, higher-impact, and costlier projects are overlooked. I agree with your recommendation to seek out a top-down strategy for planning purposes. I’d be interested to see which projects rise to the top of the list based on importance!

  4. To your questions, it would be interesting to see as the future of work becomes more distributed (i.e., talent is not centralized within one organization and everyone is seen more as a contractor), if individuals with the skillsets needed for Citywide Analytics are able to perform this data science and engineering work sustainably for government without City leaders needing to actively attract and retain. I would be curious to see if the City can create a payment model for this type of contract work. While the 4-year horizon does constrain City leaders, more open innovation can hold new leaders more accountable because now they are part of and respond to a larger network (versus being more insular in previous development models).

  5. Interesting piece.
    I’d be interested to know more about ML applications for other urban areas. Would it make sense to try to apply the same algorithms to Boston city or at least learn what tasks could be efficiently solved by ML in this context? Would it be reasonable to create a national agency aiming to create effective and easily scalable solutions for urban areas with a similar profile?
    Talking about the risks of ML technology here I would be worried about ever-changing patterns in the city as people tend to constantly change and adapt their behavior to new rules, regulations, etc. Along the same lines, for the particular example about restaurants – wouldn’t it be possible for malevolent restaurant owners to game the system?
