Way mo’ miles, way mo’ data

'Self-driving cars' may well be a misnomer, because data drives them. 🚗🚗🚗

How does data set Waymo, the pioneer in self-driving technology, apart? Can its data capabilities allow it to tip the self-driving market?

‘Our cars can see further, perceive better, and make snap decisions faster than anyone else’

This was a quote by Waymo’s CEO, John Krafcik, at Google’s I/O conference last year[1].

The company says it is ‘building the world’s most experienced driver’[2]. As Waymo, formerly Google’s self-driving car project, ramps up its self-driving capabilities, its success story illustrates the close link between the use of big data and market leadership in autonomous driving.


1000000000000000000s of bytes of data…

Waymo released an open dataset two months back. Each tarball (compressed data set) is around 1GB in size and contains the data acquired by the sensors on a single Waymo car in 20 seconds[3]. That is no surprise: Waymo cars are extensively equipped with lidar, radar and camera sensors in no fewer than 25 different places[4].



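Waymo’s cars have together clocked over 10 million miles[5]. At roughly 1GB every 20 seconds (about 180GB per hour of driving), a quick back-of-the-envelope calculation suggests that this translates to over 20 petabytes of data generated by the cars’ sensors, even if every one of those miles had been driven at highway speeds.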
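The estimate above can be sketched in a few lines of Python. The 1GB per 20 seconds and the 10 million miles come from the figures quoted in this post; the average speeds are assumptions added purely for illustration, since the fleet's real average speed is not given here.

```python
# Back-of-the-envelope estimate of sensor data from real-world driving.
# From the post: ~1 GB per 20 seconds of driving, 10 million miles driven.
# ASSUMPTION (not from the post): the average speeds tried below.

GB_PER_SEGMENT = 1.0
SECONDS_PER_SEGMENT = 20
TOTAL_MILES = 10_000_000
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10^6 GB


def gb_per_hour():
    """Data rate of one car, in GB per hour of driving."""
    return GB_PER_SEGMENT * 3600 / SECONDS_PER_SEGMENT


def real_world_petabytes(avg_speed_mph, total_miles=TOTAL_MILES):
    """Total data in PB for a fleet covering total_miles at avg_speed_mph."""
    hours_driven = total_miles / avg_speed_mph
    return hours_driven * gb_per_hour() / GB_PER_PB


if __name__ == "__main__":
    for speed in (25, 45, 70):  # hypothetical average speeds, in mph
        print(f"{speed} mph -> {real_world_petabytes(speed):.1f} PB")
```

The slower the assumed average speed, the more hours it takes to cover the same miles and the larger the estimate, so 20 petabytes is best read as a floor rather than a precise figure.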

But there is more…

Waymo runs simulations each day to supplement the enormous mass of data it collects from real-life cars. These virtual cars reportedly clocked over 3 million miles each day, even as far back as January 2016[6]. Even assuming that this number has not increased over the last three years, and that a simulated mile produces about as much data as a real one, Waymo has generated several *exabytes* of data in the last few years.
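The same rough arithmetic extends to the simulated miles. The sketch below reuses the post's own ratio of about 20 petabytes per 10 million miles, and it assumes (this is an assumption, not something Waymo has stated) that a simulated mile produces as much data as a real one.

```python
# Rough estimate of data generated by simulated driving.
# From the post: 3 million simulated miles per day, held flat for 3 years,
# and roughly 20 PB per 10 million real-world miles.
# ASSUMPTION: one simulated mile yields as much data as one real mile.

SIM_MILES_PER_DAY = 3_000_000
YEARS = 3
DAYS_PER_YEAR = 365
PB_PER_MILLION_MILES = 20 / 10  # 20 PB over 10 million miles
PB_PER_EB = 1000  # decimal units


def simulated_exabytes(miles_per_day=SIM_MILES_PER_DAY, years=YEARS,
                       pb_per_million_miles=PB_PER_MILLION_MILES):
    """Total simulated-driving data in exabytes under the stated assumptions."""
    total_miles = miles_per_day * DAYS_PER_YEAR * years
    total_pb = total_miles / 1_000_000 * pb_per_million_miles
    return total_pb / PB_PER_EB


if __name__ == "__main__":
    print(f"Simulated data: about {simulated_exabytes():.2f} EB")
```

Under these assumptions the total comes to a little over 6.5 exabytes, which is in line with the 'several exabytes' claim.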



Why data?

A few years back, Brian Krzanich, then the CEO of Intel, argued that ‘data is the new oil’ in the world of self-driving cars[7]. Waymo and other companies in this space have consistently shown how expertise in big data collection and synthesis translates to higher-fidelity autonomous driving. Since the first company to go to market with a proven safety record is likely to capture a significant portion of the multi-trillion-dollar autonomous vehicles market[8], the importance of big data in the self-driving space cannot be overstated.

Krzanich has classified the various data collected by self-driving cars (both real and ‘virtual’, like the simulations by Waymo) into three categories –

  • Technical data, or the data collected by the cars’ sensors pertaining to driving hazards, road conditions and other environmental parameters
  • Societal data, or the collectively generated data through crowdsourcing
  • Personal data, or data specific to the user

In the absence of common data-sharing protocols and standards, most of the data leveraged by self-driving car companies is technical. With certain exceptions, companies have been very protective of the data they generate, and have released only small segments of it to spur research and open innovation[9].


How data?

Although self-driving car companies use different data capture techniques, the basic tenet is the same – all data is good data. With lidars, radars and cameras all capturing data about a car’s surroundings, most of the differentiation in data capabilities is in calibrating, synchronizing, cleaning and processing these disparate data streams. Waymo’s processing largely runs in Google’s data centers, and Waymo’s history of being incubated within Google X has contributed to many of its strengths in this area[6].

This tutorial by Waymo accompanied the dataset they released recently, and gives us a glimpse into the tech stack they employ – https://github.com/waymo-research/waymo-open-dataset. As one might have guessed, they use TensorFlow for machine learning, and TPUs (Tensor Processing Units) designed by Google to optimize ML/AI applications[10].


Does it work?

Waymo is leading the safety charts, and is comfortably ahead of its competitors in the most commonly tracked safety metric – miles per disengagement, the average distance a car drives between interventions by its human safety driver. Even better, this metric is rapidly trending upwards for Waymo, and has given the company the confidence to go to market in a controlled way with Waymo One.
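For readers unfamiliar with the metric, here is a minimal sketch of how miles per disengagement is computed and used to rank fleets. The fleet names and figures below are hypothetical placeholders chosen for illustration; they are not Waymo's or any competitor's reported numbers.

```python
# Miles per disengagement: autonomous miles driven divided by the number of
# times a human safety driver had to take over. Higher is better.
# NOTE: every fleet name and figure below is a HYPOTHETICAL placeholder.


def miles_per_disengagement(miles, disengagements):
    """Average autonomous miles between human interventions."""
    if disengagements == 0:
        return float("inf")  # no interventions recorded
    return miles / disengagements


# (autonomous miles, disengagements) for two made-up fleets
fleets = {
    "Fleet A": (1_200_000, 100),
    "Fleet B": (450_000, 900),
}

# Rank fleets from best to worst on the metric
ranked = sorted(
    fleets,
    key=lambda name: miles_per_disengagement(*fleets[name]),
    reverse=True,
)

if __name__ == "__main__":
    for name in ranked:
        miles, count = fleets[name]
        rate = miles_per_disengagement(miles, count)
        print(f"{name}: {rate:,.0f} miles per disengagement")
```

A single ratio like this hides a lot, such as where the miles were driven and how each company defines a disengagement, so it is a rough yardstick rather than a full safety record.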




Waymo One is Waymo’s self-driving taxi service, launched in geofenced areas for early adopters of Waymo’s technology[11]. This pilot will likely continue generating technical data, but it is also an attempt by Waymo to find ways to leverage personal data to make the user experience richer. Waymo One will also help Waymo evaluate the viability of this business model, and better understand mechanisms to capture the value it is creating.


Many answers, but way mo’ questions?

As is said of many other disciplines, driving is not a pure science – it is also an art. Human intuition and perception are key to safe driving. Although the data capturing and processing capabilities of Waymo’s cars are significantly superior to human senses, it remains to be seen if data-driven self-driving cars can handle unforeseen situations as well as humans can. Several videos of Waymo cars that have surfaced on social media suggest that the cars may be unnecessarily defensive in certain situations (such as merging into busy traffic), and cannot perceive oft-used cues that human drivers take for granted (such as eye contact with a pedestrian who is crossing). The advent of self-driving also introduces myriad socio-ethical conundrums, such as the trolley problem, which cannot be resolved with data.

The case of Waymo brings to mind Immanuel Kant’s famous adage – ‘Perception without conception is blind; conception without perception is empty’[12].



References –

  1. https://www.theverge.com/2018/5/9/17307156/google-waymo-driverless-cars-deep-learning-neural-net-interview
  2. https://waymo.com/
  3. https://waymo.com/open/data/
  4. https://spectrum.ieee.org/cars-that-think/transportation/self-driving/waymo-opens-up-part-of-its-humongous-selfdriving-database
  5. https://medium.com/waymo/waymo-open-dataset-6c6ac227ab1a
  6. https://medium.com/waymo/reliving-the-past-how-these-data-centers-drive-us-three-million-miles-each-day-49a8695e8c75
  7. https://www.networkworld.com/article/3147892/one-autonomous-car-will-use-4000-gb-of-dataday.html
  8. https://www.theverge.com/2017/6/1/15725516/intel-7-trillion-dollar-self-driving-autonomous-cars
  9. https://www.technologyreview.com/f/614211/waymo-is-going-to-share-its-self-driving-databut-its-still-not-enough/
  10. https://www.electronicdesign.com/industrial-automation/google-puts-out-third-generation-tensor-processing-unit
  11. https://medium.com/waymo/waymo-one-the-next-step-on-our-self-driving-journey-6d0c075b0e9b
  12. https://royalsocietypublishing.org/doi/full/10.1098/rsta.2016.0153


Student comments on Way mo’ miles, way mo’ data

  1. Great article. Waymo, like other players in the autonomous car race, gathers enormous amounts of data each second. As you indicated, it is likely that the differentiator for a potential “self-driving” car market leader will not be in the amount of data gathered but in the processing of it. While the storage and processing of all this data need to be in data centers in remote locations, I wonder how much the whole industry can develop before other technologies advance, such as quantum computers or at least better data communication such as 5G.

  2. This was a fascinating read!

    Tesla is their competitor and they are taking advantage of the cars they have on the road by collecting real-world data about how those vehicles perform (and how they might perform) with Autopilot, its current semi-autonomous system. As mentioned in your blog, Waymo seems to be leading in the safety charts. What do you think are the reasons for this? Is it just the amount of data points and type of data they have collected?
