Transport for London (TfL) has officially turned on wireless tracking of customer devices moving across the London Underground system as it sets out to gather data on how the public moves around the city. It hopes this will bring it new levels of insight into how people use the Tube, helping it to address issues such as overcrowding at peak times.
The data is being drawn from devices entering the system that have Wi-Fi enabled in their settings. Such devices continually search for a usable Wi-Fi network by pinging the platform-level routers that provide Wi-Fi connectivity, identifying themselves with their media access control (MAC) address.
By taking the MAC addresses of devices that have previously been authenticated to use the Virgin Media-run Wi-Fi network and depersonalising them to ensure that individual people cannot be identified, TfL can gain near-instantaneous visibility of how the devices and their owners are moving around beneath the streets of London.
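TfL has not published how it turns router sightings into movement data. As an illustration only, the sketch below assumes each sighting is a hypothetical record of a pseudonymised device token, a station name and a timestamp. Grouping sightings by token and ordering them by time yields the sequence of stations each device passed through, without the raw MAC address ever being needed.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Sighting(NamedTuple):
    """Hypothetical record layout; TfL's real schema is not public."""
    token: str      # pseudonymised device identifier, never the raw MAC
    station: str    # station whose platform router saw the device
    timestamp: int  # seconds since the Unix epoch (assumption)


def station_sequences(sightings: Iterable[Sighting]) -> Dict[str, List[str]]:
    """Return each device's time-ordered list of stations, collapsing
    consecutive sightings at the same station into one entry."""
    by_token: Dict[str, List[Sighting]] = defaultdict(list)
    for s in sightings:
        by_token[s.token].append(s)

    paths: Dict[str, List[str]] = {}
    for token, rows in by_token.items():
        rows.sort(key=lambda s: s.timestamp)  # routers report out of order
        path: List[str] = []
        for s in rows:
            if not path or path[-1] != s.station:
                path.append(s.station)
        paths[token] = path
    return paths
```

A device seen twice at King's Cross, then at Oxford Circus and finally at Waterloo would come out as `["King's Cross", "Oxford Circus", "Waterloo"]`, regardless of the order in which the sightings arrived.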
“The benefits this new depersonalised dataset could unlock across our network – from providing customers with better alerts about overcrowding to helping station staff have a better understanding of the network in near-real time – are enormous,” said TfL chief data officer Lauren Sager Weinstein.
“By better understanding overall patterns and flows, we can provide better information to our customers and help us to plan and operate our transport network more effectively for all.”
The project has been more than two years in the making. During a pilot phase conducted in December 2016, and reported on extensively by Gizmodo, TfL collected millions of data points from devices moving around just over 50 of its stations. Most of these were Zone 1 stations in central London, which are largely underground and therefore see the heaviest use of the Wi-Fi service.
It later produced a number of infographics based on the data it had collected, showing how people moved around its stations, where they waited on platforms, how long it took them to change trains, and so on.
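Measures such as the time taken to change trains can, in principle, be estimated from the gap between a device's first and last sighting at an interchange station. The sketch below is a hypothetical illustration, not TfL's method: it assumes sightings are `(token, station, timestamp)` tuples with timestamps in seconds.

```python
def interchange_seconds(sightings, station):
    """Return {token: seconds between first and last sighting at `station`}.

    `sightings` is an iterable of (token, station, unix_timestamp) tuples, a
    hypothetical record layout. Time spent at a station is only a rough proxy
    for interchange time: it also includes platform waiting, and a device that
    visits the same station twice in one period is treated as a single stay.
    """
    first, last = {}, {}
    for token, stn, ts in sightings:
        if stn != station:
            continue  # only sightings at the interchange of interest count
        first[token] = min(ts, first.get(token, ts))
        last[token] = max(ts, last.get(token, ts))
    return {token: last[token] - first[token] for token in first}
```

Summarising the returned values, for example with `statistics.median`, would give a typical interchange time for that station.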
It also revealed details of how people moved around the Tube network as a whole. For example, TfL discovered that when people are travelling between mainline hubs King’s Cross and Waterloo, 32% of them will first take the Victoria Line to Oxford Circus and then change to the Bakerloo Line to complete their journey, and 27% will first take the Victoria Line to Green Park before changing to the Jubilee Line.
A handful of people, 0.1% of the total studied, managed to change trains four times during the relatively simple journey, going via Liverpool Street, Bank and London Bridge.
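Route-choice figures like these are proportions of journeys between one origin and one destination. A minimal sketch of that arithmetic, assuming each journey has already been reduced to an ordered list of stations (a hypothetical input format, and a simplification, since Wi-Fi sightings may not capture every station passed), could look like this:

```python
from collections import Counter


def route_shares(paths, origin, destination):
    """Percentage of origin-to-destination journeys that took each route,
    where a route is the full tuple of stations the device was seen at."""
    routes = Counter(
        tuple(p)
        for p in paths
        if len(p) >= 2 and p[0] == origin and p[-1] == destination
    )
    total = sum(routes.values())
    if total == 0:
        return {}  # no journeys between this pair; avoid dividing by zero
    return {route: 100 * n / total for route, n in routes.items()}
```

With made-up input of three journeys via Oxford Circus and one via Green Park between King's Cross and Waterloo, the function would report 75.0 and 25.0 respectively; the figures in the article come from TfL's own data, not from this sketch.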
Besides generating new insights for its station staff and forward planning, TfL hopes to put the data back into the hands of passengers to help them plan journeys that avoid congestion and delays. This means the data could also be used by organisations outside TfL, such as the makers of transport app Citymapper, through TfL's existing free open-data application programming interface (API).
The data is also likely to be used to enhance TfL’s commercial revenues. By gaining a better understanding of passenger flows, the organisation will be able to “highlight the effectiveness and accountability of its advertising estate based on actual customer volumes”, meaning it can charge advertisers more for plum locations.
TfL has worked alongside the Information Commissioner’s Office (ICO) to design and implement a fit-for-purpose security regime to depersonalise and protect the location data it gathers.
An ICO spokesperson told Computer Weekly: “Organisations processing people’s Wi-Fi data must inform them clearly that they are doing this. They must also avoid excessive data collection and take steps to reduce the risk of identifying the people whose data they have collected. We have discussed how the law around protecting personal data relates to Wi-Fi location analytics with Transport for London.”
TfL’s Weinstein added: “While I am excited about the potential of this new dataset, I am equally mindful of the responsibility that comes with it. We take our customers’ privacy extremely seriously and will not identify individuals from the Wi-Fi data collected. Transparency, privacy and ethics need to be at the forefront of data work in society and we recognise the trust that our customers place in us, and safeguarding our customers’ data is absolutely fundamental.”
To this end, TfL has undertaken to collect only the MAC address, which is then hashed twice, first with a single value and then with a unique random value. This gives continuity to the data, so the same device can be followed across the network, without the MAC address itself needing to be recorded. TfL will not collect web browsing or cookie data.
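TfL has not said which hashing algorithm it uses or what its two values are. The sketch below assumes SHA-256 and treats both values as salts that stay fixed for the lifetime of a dataset, which is one way a twice-hashed token could stay consistent for the same device. The salt names and the normalisation step are hypothetical.

```python
import hashlib
import secrets

# Hypothetical salts: TfL's actual values and algorithm are not public.
FIRST_SALT = b"example-single-value"   # the "single value", same for all devices
SECOND_SALT = secrets.token_bytes(32)  # the "unique random value"; generated once
                                       # here, but it would be stored securely and
                                       # reused, or tokens would change every run


def normalise_mac(mac: str) -> bytes:
    """Strip separators and lowercase, so 'AA:BB:...' and 'aa-bb-...' match."""
    cleaned = mac.replace(":", "").replace("-", "").lower()
    if len(cleaned) != 12 or any(c not in "0123456789abcdef" for c in cleaned):
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes.fromhex(cleaned)


def pseudonymise(mac: str) -> str:
    """Hash the MAC twice: first with the single value, then with the random one."""
    first = hashlib.sha256(FIRST_SALT + normalise_mac(mac)).digest()
    return hashlib.sha256(SECOND_SALT + first).hexdigest()
```

Keeping the salts secret matters because the space of possible MAC addresses is small enough that an unsalted hash could be reversed by hashing every possible address and comparing the results.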
The depersonalised, or pseudonymised, data will then be encrypted to prevent identification of the original MAC address and device. It will be stored in a restricted area, with access governed by industry-standard authentication methods and granted only to a highly restricted group of individuals, all of whom will have to complete TfL's privacy and data protection training again every year.
The encryption keys, meanwhile, will be held in an industry-standard secrets and key management program and protected with an industry-standard one-way hashing algorithm.
TfL said it recognised that the police or intelligence services might ask for access to the data it holds as a result of the exercise, and that any such request would be dealt with on a case-by-case basis to ensure that disclosure was lawful and necessary to prevent or detect crime, or to prosecute offenders.
However, it noted, because of the hashing processes, “we would only be able to disclose pseudonymised data”.
Sue Daley, associate director of technology and innovation at TechUK, said: “The transparency shown by TfL around the data collection taking place and the steps taken to make customers aware of its purpose is welcome and should be seen as an example for others. If we are to realise the full potential and value of real-time data, it is vital that we bring the public on this journey and build a culture of data trust and confidence.”
Despite the reassurances given, people who do not want to take part in the exercise are advised to turn their device Wi-Fi off, or activate their device’s flight mode when entering the Tube.