March 15, 2012

Prediction is hard, especially about buses?

The bus prediction system is potentially a very good idea — research in other cities has shown that it reduces actual waiting time, and reduces perceived waiting time even more.

Unfortunately, the prediction system gives only an illusory sense of control over one’s fate, and the illusion is easy to shatter. This morning it was a stealth bus, arriving when it was allegedly still ten minutes away. On other occasions the waiting time counts down steadily to DUE and then disappears without any bus arriving, indicating that a ghost bus has gone by.

Bus prediction involves some hard engineering problems: the buses need to know where they are, they need to be able to tell the central office, and the central office needs to be able to get the information to transit users.  Fortunately, the first problem is solved by GPS (and odometers), the second by packet radio, and the third by the internet (and text-message gateways).

What remains is mostly a statistical problem, and partly a problem in applied psychology. The data are fairly straightforward: the system knows approximately where the bus was every couple of minutes in the past, and this needs to be projected forward. The Seattle MyBus project did a good job of implementing a simple version about ten years ago: the prediction is a weighted average of when the bus would arrive if it kept its current speed and when it should arrive according to the schedule, and the weights come from a large collection of actual bus trips. There’s actually a lot more information available. For example, it’s hard to predict from a bus’s performance along Manukau Rd how long it will take to get through Newmarket, but since there’s a bus along Broadway every few minutes, the system potentially has up-to-date information on congestion and crowding.
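A minimal sketch of that weighted-average idea, assuming a single global weight `w` chosen by least squares from past trips. This is not the MyBus code, and a real system would plausibly fit different weights depending on how far the bus still has to go; all names here (`BusState`, `speed_eta`, `blended_eta`, `fit_weight`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class BusState:
    distance_to_stop_m: float  # remaining distance along the route to the stop
    recent_speed_mps: float    # average speed over the last few GPS fixes
    scheduled_eta_s: float     # seconds until the timetabled arrival at the stop


def speed_eta(state: BusState) -> float:
    """Seconds to arrival if the bus keeps its recent average speed."""
    speed = max(state.recent_speed_mps, 0.5)  # floor avoids dividing by ~0 at a stop
    return state.distance_to_stop_m / speed


def blended_eta(state: BusState, w: float) -> float:
    """Weighted average of the speed-based and timetable-based arrival estimates."""
    w = min(max(w, 0.0), 1.0)  # keep the weight in [0, 1]
    return w * speed_eta(state) + (1.0 - w) * state.scheduled_eta_s


def fit_weight(history: Sequence[Tuple[float, float, float]]) -> float:
    """Choose w by least squares from past trips.

    Each tuple is (speed_eta, scheduled_eta, actual), all in seconds.
    Minimising sum((actual - (w*a + (1-w)*b))**2) over w gives
    w = sum((a-b)*(actual-b)) / sum((a-b)**2).
    """
    num = sum((a - b) * (y - b) for a, b, y in history)
    den = sum((a - b) ** 2 for a, b, _ in history)
    if den == 0:
        return 0.5  # hypothetical default when the history carries no information
    return min(max(num / den, 0.0), 1.0)


# Usage: two past trips where the truth was 70% speed-based, 30% timetable.
w = fit_weight([(600, 300, 510), (200, 400, 260)])   # about 0.7
state = BusState(distance_to_stop_m=3000, recent_speed_mps=5, scheduled_eta_s=300)
print(blended_eta(state, w))                          # about 510 seconds
```

The closed-form fit works because the prediction is linear in `w`; a weight near 1 trusts the bus's current pace, a weight near 0 trusts the timetable.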

My guess, based on the relatively high frequency of buses that apparently go backwards (their predicted arrival time goes up instead of counting down), is that the Auckland system is a bit over-optimistic about how fast a late bus can return to schedule, and isn’t using the congestion information. It also doesn’t seem to know which incoming bus will be running each route out of the city, so the city-center predictions can be a bit useless. The real problem, though, is how to convey this uncertainty when presenting the results. Even when you’re using all the available information there’s always prediction error, and sometimes a bus will stop transmitting, either for a few minutes or for its whole route. In that case the system has to fall back on the timetable, but the Auckland system doesn’t tell you it’s done that.
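Using congestion information from other routes could look something like the sketch below, in which recent traversals of a shared segment (Newmarket, say) by any bus stand in for current conditions. It assumes segment-level travel times are logged. The 15-minute window, the use of the median, and the two-observation minimum are hypothetical choices, not a description of the Auckland system.

```python
from statistics import median
from typing import Sequence, Tuple


def segment_travel_time(
    recent: Sequence[Tuple[float, float]],
    now_s: float,
    typical_s: float,
    window_s: float = 900.0,
    min_obs: int = 2,
) -> float:
    """Estimate how long a bus will take to cross one road segment.

    `recent` holds (finish_time_s, duration_s) for traversals of this
    segment by buses on *any* route. Only traversals that finished in the
    last `window_s` seconds are used; with fewer than `min_obs` of them,
    fall back to the historical typical time for the segment.
    """
    fresh = [d for t, d in recent if now_s - window_s <= t <= now_s]
    if len(fresh) < min_obs:
        return typical_s
    return float(median(fresh))


def time_through_segments(segments, now_s: float) -> float:
    """Sum segment estimates along the remaining path.

    `segments` is a list of (recent_traversals, typical_s) pairs.
    """
    return sum(segment_travel_time(r, now_s, typ) for r, typ in segments)


# Usage: the traversal that finished at t=1000 s is outside the 15-minute window.
recent = [(1000, 300), (1500, 420), (1800, 360)]
print(segment_travel_time(recent, now_s=2000, typical_s=240))
```

The median is used so that one bus held up by a breakdown or a wheelchair ramp doesn't dominate the estimate.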

OneBusAway, in Seattle, distinguishes clearly between real predictions and timetable predictions. It also does helpful things like indicating which buses have just left, and it doesn’t seem to have ghost buses or stealth buses.  It’s also based on an open data stream that anyone can use — both the real-time bus location data and the predictions are accessible to anyone who wants to set up an improved system or just write a better app.
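The timetable fallback and its labelling can be sketched as follows, assuming the time of each bus's most recent report is known. The five-minute staleness threshold and the display strings are hypothetical, and this is not OneBusAway's code. It only illustrates keeping real-time and timetable predictions distinguishable, and showing buses that have just left.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: a bus silent for more than 5 minutes is treated as
# no longer reporting, so the prediction falls back to the timetable.
STALE_AFTER_S = 300


@dataclass
class Prediction:
    minutes: float  # minutes until arrival (negative if already departed)
    source: str     # "real-time" or "scheduled"


def predict(
    now_s: float,
    scheduled_arrival_s: float,
    last_report_s: Optional[float],
    realtime_arrival_s: Optional[float],
) -> Prediction:
    """Use the real-time estimate only if the bus has reported recently."""
    fresh = last_report_s is not None and now_s - last_report_s <= STALE_AFTER_S
    if fresh and realtime_arrival_s is not None:
        return Prediction((realtime_arrival_s - now_s) / 60, "real-time")
    return Prediction((scheduled_arrival_s - now_s) / 60, "scheduled")


def display(p: Prediction) -> str:
    """Render a prediction, flagging timetable fallbacks and buses that just left."""
    m = round(p.minutes)
    if m < 0:
        text = f"left {-m} min ago"
    elif m == 0:
        text = "DUE"
    else:
        text = f"{m} min"
    return text if p.source == "real-time" else f"{text} (scheduled)"


# Usage: the same bus, first reporting normally, then silent for over 5 minutes.
print(display(predict(1000, 1600, last_report_s=900, realtime_arrival_s=1480)))
print(display(predict(1000, 1600, last_report_s=500, realtime_arrival_s=1480)))
```

The design point is simply that the rider sees the word "scheduled" whenever the number is only the timetable, rather than a real-time-looking countdown that may turn out to be a ghost bus.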

Thomas Lumley (@tslumley) is Professor of Biostatistics at the University of Auckland. His research interests include semiparametric models, survey sampling, statistical computing, foundations of statistics, and whatever methodological problems his medical collaborators come up with. He also blogs at Biased and Inefficient.

Comments

  • Karen McDonald

    Thanks Thomas – stealth buses and ghost buses – reminiscent of Harry Potter’s ‘Knight Bus’!

    12 years ago

  • Wouldn’t this be a case where one might want to introduce bias? As a bus user I would rather wait a few minutes than miss the bus by a few seconds. After the earthquakes we lost the prediction system entirely in Christchurch, and then one realizes that approximate predictions are better than none.

    12 years ago

    • Thomas Lumley

      Yes, the optimal number to display may well not be the posterior mean. However, the Seattle system, which is just a Kalman filter, does use the posterior mean and is quite successful.

      12 years ago

  • Steve Black

    I became aware of more bus issues recently when my wife tried to take a bus into town to go to the dentist. She knew that she needed to catch the 10 AM bus to get to her appointment on time, and she needed the full appointment slot because it was a big piece of work.

    So she walked to the local bus stop at the appropriate time (allowing a bit extra). At 9:47 she was across the street from the bus stop, facing a red light, when a bus with her route number sailed by on the green light.

    But which one? Was it a late 9:30 or an early 10:00? The rider has no way to know, because buses are not uniquely identified to them. So she waited until 10:17, at which point she called the dentist and said she would have to reschedule for another day.

    So how can you provide useful information to the bus user in this situation? In addition to ghost and stealth buses, there are vague buses: you can’t tell whether you missed your specific bus until substantial time has passed, so you can’t make an informed decision about what to do next.

    Does the Seattle system do anything about uniquely identifying buses to users that we could learn from?

    12 years ago

    • Thomas Lumley

      Yes. If you click on the link to OneBusAway, it gives a forecast for a stop near the University of Washington. You can see that each forecast has a ‘minutes late or early’ field and is color-coded for early, late, or on time. If it’s a late 9:30, it will have a time to arrival of -1 minutes (because it just left) and ‘17 minutes late’; if it’s an early 10:00, it will have a time to arrival of -1 minutes and ‘13 minutes early’.

      The system has to know which bus is which, so it’s just a matter of passing that information on.

      12 years ago
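To make the exchange about bias concrete, here is a minimal sketch of a one-dimensional Kalman filter for how late a bus is, assuming a random-walk model for lateness and made-up noise variances. It is not the OneBusAway implementation. It only shows the difference between displaying the posterior mean (which, per the reply above, is what the Seattle system does) and displaying a lower quantile of the predicted arrival time, which builds in the bias the commenter asks for. `DelayFilter` and `minutes_to_show` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import NormalDist
from typing import Optional


@dataclass
class DelayFilter:
    """Scalar Kalman filter for how many minutes late a bus is.

    Assumed model (hypothetical): lateness follows a random walk between
    reports, and each report measures lateness with Gaussian noise.
    """
    mean: float = 0.0       # posterior mean lateness, minutes
    var: float = 4.0        # posterior variance, minutes^2
    drift_var: float = 1.0  # variance added between reports
    obs_var: float = 0.25   # measurement noise variance

    def predict_step(self) -> None:
        # Random walk: the mean carries over, the uncertainty grows.
        self.var += self.drift_var

    def update(self, observed_delay: float) -> None:
        # Kalman update: move towards the new report by the gain.
        gain = self.var / (self.var + self.obs_var)
        self.mean += gain * (observed_delay - self.mean)
        self.var *= 1.0 - gain


def minutes_to_show(
    f: DelayFilter,
    scheduled_minutes_away: float,
    steps_ahead: int = 0,
    quantile: Optional[float] = None,
) -> float:
    """Posterior mean arrival time, or a lower quantile to err on the early side.

    `steps_ahead` inflates the variance for drift expected before arrival.
    """
    sd = (f.var + steps_ahead * f.drift_var) ** 0.5
    arrival = NormalDist(scheduled_minutes_away + f.mean, sd)
    return arrival.mean if quantile is None else arrival.inv_cdf(quantile)


# Usage: three lateness reports, then two candidate numbers for the display.
f = DelayFilter()
for report in [2.0, 3.0, 3.5]:
    f.predict_step()
    f.update(report)
print(minutes_to_show(f, scheduled_minutes_away=5, steps_ahead=3))                # mean
print(minutes_to_show(f, scheduled_minutes_away=5, steps_ahead=3, quantile=0.1))  # earlier
```

Showing the 10th percentile tells riders the bus is coming sooner than the mean suggests, trading a little extra waiting for a smaller chance of missing the bus by a few seconds.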