Who owns transport-related data and who is prepared to share it? These questions, among many others, require resolution.
As transport grows increasingly ‘smart’, the role of data is becoming more pivotal. It’s no longer an adjunct to services or a ‘good way of keeping tabs’ on them; it’s central to the creation and operation of infrastructure, from roads to railways, and the services that run on them.
Data enables people to plan journeys and also allows operators and agencies to measure use, make forecasts and perform analysis.
It’s also vast and complex – the massive amounts of data about people using transport, transport infrastructure and transport systems and services often interact.
New data and new types of data emerge from transport use, shifting from one category to the next.
Roads were once ‘dumb’ strips of tarmac. Their use is now shaped by smart traffic lights managed by traffic management systems, with dynamic speed restrictions and lane management on smart motorways. Individuals’ mobile phone data becomes an indicator of congestion.
Bus service data is evolving from a database of bus stops to real-time information about individual buses on the routes they operate. Touch in, touch out ticketing enables fare collection, but also creates data on passenger numbers.
Newer modes are entirely data-driven, interacting with travellers via apps and managed through digital systems.
And this year, when Covid-19 has made everyone’s lives more complicated, services have had to change and the information relating to them has become more complex and requires better communication. Managing and providing this data is not just an essential part of transport information, it has become critical to transport provision.
While it’s easy to see the value of data to providing a better transport system, the devil is in the detail.
As Google slowly integrates transport into its Maps function, it has become a comprehensive default ‘go to’ for people looking up journey information – despite not being a specialist transport provider. While Google might not be everyone’s preferred app, it’s one expression of a universal vision of people being able to use digital platforms to book, pay for and use multiple modes of transport in a seamless facilitated journey.
Providing this, however, depends on many layers of data across transport infrastructure elements, service elements (and their back-office systems) as well as an individual’s data – whether they are the person planning a journey or one of the multitude whose journeys will affect the congestion and speed of services on the network.
Transport data depends on Government, organisations and individuals collecting and storing data. How that data can be used depends on who holds which data set, what standards are adhered to, which data is open and which is closed. It’s still, in some cases, moot who should hold which data set, which standards should be adhered to and which data should be open.
The underlying data – on which everything is built – is data about infrastructure. This includes roads and all the information pertaining to road traffic regulations (remember when sat-navs would direct people the wrong way down one way streets?) and also public transport infrastructure.
At the core of public transport data is the National Public Transport Access Nodes (Naptan) database. It’s a national database containing a unique entry for each point of access to public transport: railway stations, bus stops, airports, ferry terminals, coach termini, taxi ranks and so on. In short, it’s an index of every place where a passenger can join or leave public transport. Naptan contains about 500,000 records, the vast majority of which are bus stops.
The increasing complexity of transport provision was perhaps not envisioned when Naptan was created. The Department for Transport (DfT) is starting to appreciate that Naptan will need to change to remain useful in the future. As a simple example, analysis run in 2020 revealed errors in the accuracy of its location data. An error of a few metres for a bus stop doesn’t create issues for most passengers. But any future application requiring very accurate data, such as connected autonomous vehicles (CAVs), will not work until accuracy is built in at a much more granular level.
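To illustrate what such an accuracy check might involve, the sketch below compares recorded stop coordinates against independently surveyed ones and flags any stop that is out by more than a chosen tolerance. It is only a sketch: the stop identifiers, coordinates, the 2 m tolerance and the function names are all invented for illustration, and it does not describe how the 2020 analysis was actually carried out.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flag_inaccurate_stops(recorded, surveyed, tolerance_m=2.0):
    """Return {stop_id: error_in_metres} for stops further than tolerance_m from the survey."""
    flagged = {}
    for stop_id, (lat, lon) in recorded.items():
        if stop_id not in surveyed:
            continue  # no reference position to compare against
        s_lat, s_lon = surveyed[stop_id]
        error = haversine_m(lat, lon, s_lat, s_lon)
        if error > tolerance_m:
            flagged[stop_id] = round(error, 1)
    return flagged


# Hypothetical stop identifiers and coordinates, for illustration only.
recorded = {"stop_a": (51.50000, -0.12000), "stop_b": (53.48000, -2.24000)}
surveyed = {"stop_a": (51.50000, -0.12000), "stop_b": (53.48010, -2.24000)}

flagged = flag_inaccurate_stops(recorded, surveyed)
print(flagged)
```

A difference of 0.0001 degrees of latitude is roughly 11 metres on the ground: harmless for a passenger looking for a bus stop, but far too coarse for a vehicle that has to pull up at the kerb unaided.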
There’s also the issue of missing data. While new services run on data (scooter share, for instance, is completely digitised, in some instances down to geofenced areas outside which the scooters stop functioning), some of the more traditional transport and infrastructure has yet to catch up.
As an example, there’s no pavement data set – which means that walking route instructions are not necessarily adequate for journeys, as anyone who has followed Google Maps instructions only to find themselves directed across a six-lane motorway junction roundabout can attest.
Likewise, cycle infrastructure is digitally quite uncharted (outside London) meaning active travel directions are untrustworthy and inadequate. This is not just an oversight but an omission that makes a nonsense of the transport hierarchy – supposedly with walking and cycling at the top – when it comes to digital routing.
This omission reflects the fact that data has generally been created when people have seen an opportunity to derive value from it – either by monetising it directly or by saving money through optimisation.
Mapping parking spaces and, indeed, the kerb, has been one of the former. By creating a dataset of parking spaces which can be rented out via an online interface, companies like JustPark use data to monetise parking spaces (and take a fee from transactions).
In contrast, data-driven smart motorways use data to manage capacity and obviate the need to build extra lanes of motorway, saving money. Street Manager has digitised roadworks to enable better coordination.
The absence of data can create additional cost bases. Poor data for addresses limits the provision of services, slows deliveries and leads to additional mileage for logistics companies and service providers.
When it comes to mapping data, granular data for addresses, roads, street furniture and signage will become increasingly necessary for the operation of autonomous vehicles (AVs). And the ownership of that data will also be in contention.
Services and system data
The next layer, built on the infrastructure, is formed from services and systems that keep the country moving. We have become used to rail timetables and ticketing being available through multiple apps and interfaces. Rail regulation means that the rail dataset, held by Network Rail, enables operators and others to provide complete rail timetable and travel information to apps and information services, and to sell each other’s tickets.
Buses, however, have been some way behind. Regulation, commercial considerations and differing attitudes meant that different operators provided data to different extents prior to the advent of the Bus Services Act 2017, which mandated that operators provide open timetable, fare and location data.
Over the past two years the Bus Open Data Service (Bods) has been developed to enable the supply of bus service and fares data. Bus service data is more complex than rail – with hundreds of thousands of bus stops and services which interact with local authority requirements – meaning that off peak times (and associated fares) often change as they cross county boundaries. And, in some cases, multiple operators run buses along the same routes – requiring deduplication. Additional layers of complexity have been added with Covid-19 regulations and tier changes as buses cross county boundaries.
While Bods sets the scene for greater sharing of data, there is still reticence about integration of data. Transport for London (TfL) has provided an openly accessible integrated API feed of all its public transport services for some time. Achieving something similar nationally is more difficult, because commercial concerns come into play, both in the deregulated bus market and between the information-providing apps whose business models could be called into question.
“There’s a Mexican standoff between a variety of different data owners. It’s a poker game in which no one wants to be the first to show their cards”
Jonathan Raper of leading transport API provider TransportAPI
The present crisis in public transport means everyone in the sector is fire-fighting. While fewer people are travelling (generating fewer fares), services are still necessary, but have to adapt to the circumstances. High quality information about a volatile and changing service is imperative. In this climate, data cleaning and validation has become critical.
While there is progress in the standardisation and openness of data for traditional public transport there is less formality in the shared mobility sector. However, it is arguable that for a true mobility as a service (MaaS) system to exist, data from novel mobility providers will need to be of equal quality to the bus and rail networks. This exposes a ragbag of systems with some operators remaining proprietary and others working to open standards and providing APIs.
Some taxi companies are still paper-based – but those using bigger software platforms to manage their businesses are aggregated within platforms like Taxicode and can then be integrated within API ‘feeds’ such as TransportAPI.
Major European bike-share operators such as Nextbike provide open APIs, while others have proprietary platforms.
Car clubs are also diverse in their approach leading to local authorities calling for car clubs to provide four standard data sets describing their fleet and conforming to the Car Club Local Authority Data Standard (Clads) to enable local authorities to understand the location, use and impact of car clubs.
The advent of scooter share trials has presented an opportunity for local authorities to demand data in return for licensing in order to understand the use of scooters – modelling their approach on the Mobility Data Specification used in Los Angeles. Providing APIs that allow people to find, book and pay for scooters within multi-modal apps enables new mobility to be part of the broader transport network.
Los Angeles has required that all scooter companies provide their data in real time using its Mobility Data Specification. This is an open source specification with full documentation available on GitHub. It is described as:
A data standard and API specification for mobility as a service providers, such as dockless bike-share, e-scooters, and shared ride providers who work within the public right of way. Specifically, the goals of the Mobility Data Specification (MDS) are to provide API and data standards for municipalities to help ingest, compare and analyse MaaS provider data.
The specification is a way to implement real-time data sharing, measurement and regulation for municipalities and mobility as a service providers.
It is meant to ensure that governments have the ability to enforce, evaluate and manage providers.
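As a rough illustration of how an authority might ingest and compare provider data of this kind, the sketch below groups trip records by hour of day and reports the number of trips and their mean distance. The records and field names (`device_id`, `start_time`, `distance_m`) are simplified assumptions made for this example and are not copied from the specification, which should be consulted for the real schema.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical trip records; field names and values are illustrative assumptions.
trips = [
    {"device_id": "scooter-1", "start_time": 1600000000, "distance_m": 1200},
    {"device_id": "scooter-2", "start_time": 1600000900, "distance_m": 800},
    {"device_id": "scooter-1", "start_time": 1600005000, "distance_m": 2500},
]


def summarise_by_hour(records):
    """Group trips by UTC hour of day; return {hour: (trip_count, mean_distance_m)}."""
    buckets = defaultdict(list)
    for trip in records:
        # start_time is treated here as seconds since the Unix epoch
        start = datetime.fromtimestamp(trip["start_time"], tz=timezone.utc)
        buckets[start.hour].append(trip["distance_m"])
    return {
        hour: (len(distances), sum(distances) / len(distances))
        for hour, distances in sorted(buckets.items())
    }


summary = summarise_by_hour(trips)
for hour, (count, mean_distance) in summary.items():
    print(f"{hour:02d}:00 UTC  trips={count}  mean distance={mean_distance:.0f} m")
```

Because every provider would report in the same structure, the same few lines could be run over each operator’s data, which is what makes comparison and enforcement practical.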
As people have realised the extent of data harvesting and use by social media, anger and antipathy towards the collection of data relating to individuals has grown. This antipathy has the unfortunate effect of creating a general distrust of the use of any personal data.
When personal data blends with transport service data – as it must in many ways to enable the move away from cash services to data-driven ones (mobile tickets, tap on, tap off ticketing and, indeed, MaaS offers) – it provides powerful information about the use of services.
And with respect to origin and destination searches in journey-planning tools, another powerful dataset is created from individuals’ actions. In public transport, the ability to understand the journeys people want to make would be invaluable to planning and improving services. At present, journey planners don’t share this with authorities.
A debate about who owns this data is timely.
We have a situation where many of those that collect data think that the data belongs to them – and individuals are becoming increasingly unwilling to give up their data.
Martin Howell of Worldline thinks a better balance needs to be struck.
“People to whom the data pertains have at least some interest in and part-ownership of it – it’s not just owned by the people who have captured it.”
The most widely appreciated use of data is probably congestion alerts in Google Maps and other sat-navs – which are based on data produced by mobile phones to show the density and moving speed of traffic.
There are some indications that people understand the benefits of sharing data in Assurant’s Connected Decade report, published in December.
This survey of consumer attitudes found that the majority of people were comfortable with the technical status and operating health of their vehicles being shared, and also with sharing data around external traffic, weather and road conditions. However, people were less happy to share data that could identify them, reveal their driving habits or share their location in general. Understanding these concerns is essential.
Howell says: “We need a bond of trust when it comes to transport so people are happy to give up data because it’s not being abused or sold on or used for marketing.”
A recent report for the DfT by the British Standards Institution set out the need for system interoperability and standards to enable transport data to be shared between services or to enable future mobility systems such as MaaS.
While there are some well-used standards which allow widespread publishing of transport information and, indeed, ticketing to occur, there are also many proprietary systems used which keep data within their organisation or app. This is partly because it’s unclear how the market will evolve.
The most comprehensive dataset of scheduled UK public transport is compiled and kept by Traveline – the Traveline National Dataset (TNDS) – which it makes available to third parties under the Open Government Licence. It contains public transport timetables for bus, light rail, tram and ferry services in Great Britain. It is compiled from local data in the TransXChange format (the UK nationwide standard for exchanging bus schedules and related data).
In addition to TransXChange, data is often supplied to conform with the General Transit Feed Specification (GTFS), or Google feed, as it enables transport data to appear on Google Maps. GTFS enables public transit agencies to publish their transit data and developers to write applications that consume that data in an interoperable way.
Currently, Traveline transforms its datasets into GTFS to enable UK transport options to feature on Google Maps. It’s useful to note that while Google is happy to consume the transport data provided, it does not provide any reciprocal reporting on how it is used by people searching for transport options on Google.
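A GTFS feed is essentially a set of comma-separated text files, which is part of what makes it easy for developers to consume. The sketch below shows the idea using tiny in-memory stand-ins for two of those files, `stops.txt` and `stop_times.txt`. The column names follow the published specification, but the rows and the helper function `departures_from` are invented for illustration.

```python
import csv
import io

# Tiny in-memory stand-ins for two GTFS files; all rows are invented.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,High Street,51.5000,-0.1200
S2,Station Road,51.5050,-0.1250
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:10:00,08:10:00,S2,2
T2,08:30:00,08:30:00,S1,1
"""


def departures_from(stop_name, stops_txt, stop_times_txt):
    """Return the sorted departure times for the stop with the given name."""
    # Map human-readable stop names to their stop_id
    stops = {
        row["stop_name"]: row["stop_id"]
        for row in csv.DictReader(io.StringIO(stops_txt))
    }
    stop_id = stops[stop_name]
    # Collect every departure recorded against that stop_id
    times = [
        row["departure_time"]
        for row in csv.DictReader(io.StringIO(stop_times_txt))
        if row["stop_id"] == stop_id
    ]
    return sorted(times)  # zero-padded HH:MM:SS strings sort chronologically


departures = departures_from("High Street", STOPS_TXT, STOP_TIMES_TXT)
print(departures)
```

The same code would work on a feed from any agency that publishes in this format, which is the interoperability the standard is designed to deliver.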
Data standards and APIs
Besides these standards, APIs are increasingly being provided to enable data about transport to be used by apps and other services.
TfL’s unified API is based on the unification of the data for modes of transport into a common format and structure. Historically, TfL shared the data for each mode in different formats and structures – requiring developers to write code for each mode of transport.
The unified API presents all the data that is semantically similar for each mode of transport in the same format and consistent structures. This enables developers to write once and access all of the same types of data across all the modes of transport quickly, making multi-mode application development easier.
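The ‘write once’ idea can be sketched in a few lines. In the example below, two per-mode feeds with different field names are converted into one common record type, after which a single piece of code can handle both. The payloads, field names and the `Arrival` type are hypothetical and do not reproduce TfL’s actual API; they simply illustrate the principle of unifying semantically similar data.

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    """Common record shared by every mode (a hypothetical structure)."""
    mode: str
    line: str
    stop: str
    minutes: int


# Hypothetical per-mode payloads, each with its own legacy field names.
bus_feed = [{"route": "24", "stop_name": "High Street", "eta_seconds": 180}]
rail_feed = [{"service": "Line A", "station": "Central", "due_in_min": 4}]


def from_bus(item):
    """Convert a bus record, which reports its ETA in seconds."""
    return Arrival("bus", item["route"], item["stop_name"], item["eta_seconds"] // 60)


def from_rail(item):
    """Convert a rail record, which already reports minutes."""
    return Arrival("rail", item["service"], item["station"], item["due_in_min"])


def unified(bus_items, rail_items):
    """Merge both feeds into one list of Arrival records, soonest first."""
    arrivals = [from_bus(i) for i in bus_items] + [from_rail(i) for i in rail_items]
    return sorted(arrivals, key=lambda a: a.minutes)


arrivals = unified(bus_feed, rail_feed)
for a in arrivals:
    print(f"{a.mode:<5}{a.line:<8}{a.stop:<12} due in {a.minutes} min")
```

With a unified API the conversion work is done once by the publisher, rather than repeatedly by every developer who consumes the data.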
Open vs closed
The issue of open vs closed data is complex. Companies collecting their own datasets can argue it’s commercially confidential. However, where transport use of one service impacts others or a seamless journey requires data from more than one operator, this commercial confidentiality has negative consequences and people find the system too complex to navigate if they have to engage with multiple operators.
There’s an argument that providing data grows the market for transport. The data facilitates ease of use – and the more convenient something is, the more likely it is to be used. However, everyone at the moment has their idea of the business model that will eventually provide for simple user-friendly MaaS.
TfL has adopted an open data approach to its service data and encourages software developers to use its API.
While research into the business impact of increasing and improved data use is in its infancy and methodologically challenging, the existing evidence suggests wide-ranging economic benefits arising from better data use, in particular an association between efficiency, productivity and data-driven business practices.
There are also significant economic advantages from individual companies increasing data access and sharing. For example, TfL’s opening up of its data sets to travellers and third-party providers contributed up to £130 million per year to the London economy through time saved by travellers.
London is a particular example. Its regulated transport system means that bus, rail and underground operators are not competing commercially at the consumer level.
TfL can decide what information to provide for the greater benefit of the economy and Londoners’ access to services.
While the move to provide open datasets can open up markets and improve the economy, providing open data is not free. Maintaining the database infrastructure, cleaning data and providing robust services that stand up to large scale simultaneous searches carries a cost.
Working out stable business models for this is still in progress. Companies like TransportAPI and ITO World that clean data and provide API ‘feeds’ to enable information to be served through apps are commercial concerns.
At present their services are paid for by operators or services like Traveline so that high quality service information is robustly available on apps and to Google searches.
Other data users are still working out how to facilitate the fair balance of payment for provision across multi-modal apps. Start ups like MaaS Global rely on creating a new market and hoping that their pricing model derives sufficient margin for the profitable operation of the app on top of fares and charges and other costs.
Moovit, until recently focused on creating a customer interface and capturing a market with its own brand of provision, has announced it will be working with the DfT to help bus operators make information about services available through Bods while enabling developers to add that information to their apps and products.
The uses of transport data are myriad and the potential is powerful.
Data is not a magic free asset, however. There needs to be thought on how it is best used to enable better transport planning, a seamless experience for people and to better reflect the transport hierarchy.
There is increasing recognition of this. A consultation recently closed on the National Data Strategy – of which transport is a relevant part. The resulting report is expected to set out how data can be joined up across government and used to deliver wider economic benefit.
It’s perhaps time to bring the transport data community from its disparate corners to work more closely on getting data out of siloes and unlocking its benefits.
At various times, this has been achieved. The establishment of Smartex for the smart payments and media community is one example; the creation of public domain designations of standards or implementations is another potential approach.
Government – across departments and at the DfT – recognises that there is a huge potential prize if siloes can be broken and data harnessed to provide better information to individual travellers and to those who plan and provide services to enable economic resilience and sustainable communities.