Bikes, Data and the Crowd

The value of bike sharing services like Hubway heavily depends on bike availability at each of their stations. But how are they able to predict when, where, and how many bikes should be relocated to optimize their network? The solution lies in the data.

Data analytics is changing not only the processes of individual companies; it also has the potential to massively improve urban infrastructure and the way we live, work, and commute in our cities. It can be used to plan the (public) transportation networks of cities like Boston efficiently, and it helps tackle the complexity within these networks: high connectedness, dynamic commuting patterns of the population, and a wide variety of external factors (weather, special events, etc.). Interestingly, companies that introduce new transportation concepts without an existing infrastructure seem to benefit from data analytics in particular.

The case of Uber’s ridesharing service is probably well known, but local services like Boston’s bike sharing service Hubway are also able to use data analytics. Hubway has enjoyed rapidly increasing popularity over the last 3-4 years, as they create great value for both tourists and local commuters: riding a bike is usually cheap, healthy, environmentally friendly, and often even faster than commuting in a stifling, old subway. Most importantly, Hubway offers the convenience of over 140 bike stations in the Boston area where riders can pick up or drop off a bike (see map on the right). Hubway captures this value through either a monthly subscription fee or a (significantly higher) fee for one-time users.

However, this value heavily depends on bike availability: if you want to bike to work in the morning and the rack is empty, this not only means lost revenue for Hubway, but also quickly decreases customer satisfaction. Given the rapid growth in bike sharing users, it is a major challenge for Hubway to continuously guarantee the availability of bikes and to avoid empty stations like the one on the left. They do this by using trucks that are in operation 16 hours a day to relocate bikes between stations. But how are they able to predict when, where, and how many bikes should be relocated to optimize their network? The solution lies in the data.

Hubway collects a variety of features for each trip with one of their bikes. These include a timestamp, start/end station, bike ID, subscription type (registered or “tourist”), and some other user-related data (gender, ZIP-code, etc.). In 2014 they observed 1,192,805 trips in total. Although I have no detailed knowledge about their internal prediction models, it is obvious that this dataset offers an immense potential for data-driven predictions to optimize Hubway’s business. A couple of questions that Hubway can address by applying statistical models to the dataset:

  • What do the commuting patterns in the Boston area look like? Which stations are affected by a significant imbalance of inbound/outbound traffic in the morning/evening?
  • How do time of day and weekday influence demand? What’s the effect of holidays or special events?
  • How does the weather influence the network?
  • What’s the optimal policy for the bike relocation truck for a given day?
  • Where should Hubway extend their existing locations or build new stations?
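
The first question above, the imbalance of inbound and outbound traffic, can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not Hubway's actual method: the record layout (start time, start station, end time, end station), the station names, and the sample trips are all hypothetical and do not come from the real dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records (not real Hubway data):
# (start_time, start_station, end_time, end_station)
SAMPLE_TRIPS = [
    ("2014-06-02 08:05", "Residential A", "2014-06-02 08:25", "Office B"),
    ("2014-06-02 08:10", "Residential A", "2014-06-02 08:30", "Office B"),
    ("2014-06-02 08:40", "Residential A", "2014-06-02 08:55", "Office B"),
    ("2014-06-02 08:15", "Office B", "2014-06-02 08:35", "Residential A"),
    ("2014-06-02 17:10", "Office B", "2014-06-02 17:30", "Residential A"),
    ("2014-06-02 17:20", "Office B", "2014-06-02 17:45", "Residential A"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M"


def net_flow_by_hour(trips):
    """Return {(station, hour): arrivals - departures}.

    A negative value means the station loses bikes during that hour,
    a positive value means it accumulates them.
    """
    flow = Counter()
    for start_time, start_station, end_time, end_station in trips:
        start_hour = datetime.strptime(start_time, TIME_FORMAT).hour
        end_hour = datetime.strptime(end_time, TIME_FORMAT).hour
        flow[(start_station, start_hour)] -= 1  # a bike leaves the start station
        flow[(end_station, end_hour)] += 1      # a bike arrives at the end station
    return dict(flow)


def most_imbalanced(flow, top_n=3):
    """Return the top_n (station, hour) pairs with the largest absolute net flow."""
    return sorted(flow.items(), key=lambda item: abs(item[1]), reverse=True)[:top_n]


if __name__ == "__main__":
    for (station, hour), net in most_imbalanced(net_flow_by_hour(SAMPLE_TRIPS), top_n=4):
        print(f"{station} at {hour:02d}:00 -> net flow {net:+d}")
```

Stations with a strongly negative net flow in the morning and a strongly positive one in the evening (or the reverse) are the natural candidates for truck relocation. A real analysis would also have to account for station capacity, weather, and day of week.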

To give you a brief idea about the dimension and potential of the Hubway data, I created a simple dashboard using their publicly available datasets from 2011-2013. Feel free to play around with it – it’s interactive! 😉

Hubway didn’t just use their internal capabilities to analyze the data, but soon discovered a creative new way to leverage their data wealth: they used the crowd to get additional insights. As many people benefit immediately from improved local transportation services, Hubway’s business seems to be a perfect fit for a crowdsourcing approach. In 2013 and 2014 Hubway therefore set up a public data analytics challenge to visualize and analyze their dataset, which they made available to the public. The challenge resulted in a huge number of submissions and stunning visualizations.

Is that all there is? I guess not, and there are many different options to further work on a data-driven bike sharing future in Boston. Potential improvements include dynamic pricing models to reduce the imbalance in demand, connections with other public transportation services (Internet of Things), and further extensions to their (already very good) mobile services for smartphones and smartwatches.



Student comments on Bikes, Data and the Crowd

  1. Interesting post David! I never knew that they actually move bikes from one stop to another using trucks, but that makes a lot of sense. I think that the data collected by Hubway is valuable, but that still does not include the actual routes taken by people while they are biking, and I wonder if that would increase the value of analytics for them. Currently, Hubway has two data points at the start and end of the trip, but probably does not know what happens in between, which could teach them a lot about the demand side and allow them to identify potential new stops that have high demand of stops (if someone stops the bike somewhere else and not at one of Hubway’s stops). I have never taken one of their bikes because I find it expensive for a one time use, so I agree with you that they should use different pricing models and promotions to push people to take a bike from a stop that has many bikes. This might also decrease the need for them to relocate bikes using trucks, which I imagine is very costly. It would be valuable for them to understand how much money they spend to move bikes from one place to another, and what other things they could be doing with that money to increase customer base or number of trips per person through advertising and reduced pricing. They could also use people to relocate their bikes, just like Uber uses drivers and their cars, but in this case, Hubway owns the assets, but can reward a customer with a free bike ride if they can relocate the bike for them. Of course, this becomes tricky as it will be hard to distinguish between a customer willing to pay for the trip and one who is not, and hence only giving the offer for the latter to have a free ride to move the bike for Hubway.

    1. Instead of being free, they could do some sort of discount program for the unpopular directions of the bikes. I’m thinking about companies like Jetsuite that do this for jet repositioning- where people can buy seats on a private plane that needs to be moved from one city to another at much cheaper prices than the normal rates. Something similar to that could work well for Hubway.

  2. Thanks for this post. I have a bike so I’ve never thought about the fact that empty Hubway stations can be a source of frustration for loyal users. I wonder if they’ve considered creating a booking system (like Zipcar) that would give people certainty that they would have a bike when they know they need it.

  3. Thanks for the post! I agree that data can help Hubway better manage supply/demand and keep users satisfied. I think your point about connecting with other public transportation is a really important potential improvement, as I would guess that at least some people use Hubway as only part of a commute.
    Another thing that would be useful (not sure whether this already exists, as I don’t use Hubway) is providing real-time data to users on bike availability (and, alternatively, whether certain stations are full and can’t take any more bikes). Some other ride-sharing systems in other cities have this data available, and I’m guessing it makes it easier for people to choose a station, particularly if there are a few nearby.

  4. Great post! I’ve never used Hubway so found it really informative. Also loved the Tableau visualizations!

    I agree with Sara around the missing data around actual biking routes. I feel like getting a sense of most common routes and how traffic patterns affect how long someone has a bike out could actually help Hubway better suggest routes for their bikers and/or indicate during what time of day a user is most likely going to find a bike available at their preferred Hubway location. I’m not sure if Hubway has a mobile app, but maybe users could enter in their trip information ahead of time (pick up and drop off locations), and Hubway could provide suggestions on when they should plan to travel and how long commute times are likely to take based on data from other users. While this doesn’t necessarily change how Hubway is internally managing its bike inventory across locations, it might help manage user expectations around bike availability.

    1. Really like this post! I use Hubway quite a bit and to answer to this and the previous reply their mobile app is extremely helpful. It helps you locate the closest station and informs you about the availability of bikes. You can also check if there is available space at your drop-off location. Before using the mobile app, I was extremely frustrated when I arrived to my destination and discovered there was no space to leave my bike.

      The app has access to your location (since it suggests where the closest station is) but they probably can not track your location during your trip. They could consider enabling location tracking during the course of the trip and learn what are the fastest routes. They could then integrate a direction system guiding you to the next station through the fastest route. This would help them by increasing the utilization of each bike.

  5. Really enjoyed reading your post! I had no idea that Hubway was using so much data to optimally locate their bikes, though should’ve assumed as much. Thought your ideas on other statistical analyses that Hubway could do were spot on. I wonder if Hubway will ever go a step further and use individual usage patterns to either provide promotions for those users whose usage have lagged or partner with other locations such as movie theaters or restaurants to provide a discount for those who ride a Hubway to those locations.

  6. I have been using Hubway and also similar services in other cities and having enough bikes and docks is always the most frustrating problem. I definitely agree that this type of service especially relies on data analytics and combined with the operations and scale of street usage permits, these services are usually run by the city government, so digging into this data could possibly save us some tax dollars! What I find really interesting is the data visualization through Tableau that you’ve shared. In some stations, like the one close to SFP, demonstrated a corresponding curve in both inbound and outbound with the same peak times, while some other sites like the MIT Stata center shows opposite peaks for inbound and outbound. My assumption here is that at stations with corresponding peaks, Hubway wouldn’t have to pay too much effort in managing the bike numbers because bikes come and go at the same rate, the bike amount at the station should be constant. It is the stations that have peaks of inbound when there’s no outbound traffic or the reverse that needs trucks to arrive and make adjustments at the set time. Very interesting that there would be this kind of contrast between sites, I believe it has a lot to do with whether the area is residential or commercial zones.

  7. Really cool post! And totally agree with you that there’s probably some low hanging fruit for Hubway, especially with the dynamic pricing. Agree with Carina above that they could benefit from a zipcar “booking” system — in addition to driving additional customer value, it would probably allow for much more scale and efficiency with data collection as well.

  8. Very informative post – and amazing dashboard! That is a great example showing that data analytics can help government or general interest organizations to improve common good. As more and more data is being collected by government, analyzing it and drawing results leading to informed policy decisions would be one of the greatest achievement of the “big data” era we are currently in. That is the reason of the existence of websites like . Nonetheless, I would be curious to know how public leaders, who usually have a say on the way these bike-sharing programs are run, would react facing the results provided by data analytics: would they go along with decisions in line with what data results show? Or would they do otherwise? The power of numbers vs the power of politics….

  9. Great post David!
    I wonder how data can be used not only to move bikes around but also optimize trucks time and fuel and decide on locating additional stations in totally new locations.
