
London councils tackle housing fraud with data science

London's local authorities have a fraud problem with houses in multiple occupation (HMOs) and are using data science to combat it, with help from Nesta, the Greater London Authority and ASI Data Science.

If everyone in both Tyneside and the Tees Valley moved to London, you would expect a housing squeeze. Something like this has happened over the last two decades: between 1997 and 2016, the capital’s population increased by a quarter, or 1.7 million people, while the supply of housing rose by just 15%, according to the Greater London Authority (GLA). The number of people sharing accommodation with non-family members rose from approximately 300,000 in 1996 to 470,000 in 2016.

This increases the likelihood of people living in overcrowded and, in some cases, dangerous rented accommodation. London’s borough councils are responsible for policing houses in multiple occupation (HMOs), defined as rented properties with at least three tenants who are not all related and who share bathrooms and kitchens.

Landlords of HMOs are meant to pay for regular fire, gas and electricity safety checks as well as keep properties well maintained. Given the extra costs, many ignore the rules.

So in 2016 Nesta, a charity set up by the government which supports public-sector innovation, started working with the GLA and London boroughs to predict which properties might be HMOs, to help council inspectors work more efficiently. By the start of 2017, London-based company ASI Data Science had compared 40 datasets held by the City of Westminster with a list of known HMOs and built an algorithm that was 500% better than picking properties at random.

But as well as showing potential, the pilot work revealed problems. Initially, 15 boroughs were interested in taking part, but this fell to six – although two more have since joined.

“Because of data quality, data availability and, in some cases, capacity, not everybody was able to play the game we wanted, to apply data science to what is a common problem,” says Andrew Collinge, the GLA’s assistant director of intelligence. “We had a winnowing-down exercise.”

The GLA is planning to report on the work in the near future, but Collinge says boroughs with the ability to undertake such work are starting to see the potential of using data science in real applications. “That is, I think, real progress, because you have frontline workers who are taking the fruits of that data science exercise and applying them in their day-to-day work.”


The pilot showed that only a small number of datasets are needed to generate effective predictions. “We can go to other London boroughs and tell them all they need to provide is four or five datasets to find HMOs,” says Collinge.

The key predictive datasets include council tax, electoral registration, benefit claimants, structural data on how many storeys a property has – and complaints, including on fly tipping. “Anything else tends to be white noise,” he says. A unique property reference system and a way to distinguish between types of property are required to join them up.
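As a rough illustration of that joining step, the sketch below merges records from several datasets into one row per property. The `uprn` key, the dataset names and the field names are hypothetical stand-ins, not the schema used in the pilot.

```python
from collections import defaultdict


def join_on_property_ref(datasets, ref_key="uprn"):
    """Merge records from several datasets into one feature row per property.

    `datasets` maps a dataset name to a list of dict records. Every record
    is assumed to carry the shared property reference under `ref_key`.
    """
    merged = defaultdict(dict)
    for name, records in datasets.items():
        for record in records:
            ref = record[ref_key]
            for field, value in record.items():
                if field != ref_key:
                    # Prefix with the dataset name so fields cannot collide.
                    merged[ref][f"{name}.{field}"] = value
    return dict(merged)


# Illustrative records only; none of these values come from a real borough.
datasets = {
    "council_tax": [{"uprn": "100", "band": "E"}, {"uprn": "200", "band": "B"}],
    "electoral": [{"uprn": "100", "registered_adults": 5}],
    "complaints": [{"uprn": "100", "fly_tipping_reports": 2}],
}
rows = join_on_property_ref(datasets)
```

A property that appears in only one dataset still gets a row, just with fewer fields, which is one reason a consistent reference across every dataset matters.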

While a small number of datasets can produce good predictions, the data they contain has to be of a good quality. “It took us five months to get enough data of sufficient quality from the boroughs to carry out the data science exercise,” says Collinge. The exercise itself took only a few days.

One particular problem was that boroughs collect data in different ways: “We really have to work hard on applying common standards for data where there is a good case for doing so,” Collinge adds. “There was an initial set of patchy results, but what we’ve found further on in the pilots is more positive results in places like Barking and Dagenham.”

Missing local context

Pye Nyunt, corporate insight hub manager for Barking and Dagenham and a consultant with Agilisys, says that the original model from ASI unfortunately didn’t find any illegal HMOs in its area. This wasn’t because it was badly constructed: “What was missing was some local context.”

For example, the original data supplied appeared to suggest an HMO hotspot in a new development in the borough. However, they were actually flats that had been miscoded. “We essentially gave the model training data which was miscoded at source without us realising it,” says Nyunt. “Local insight would have allowed us to sense-check this and remove the miscoded data before it was analysed by ASI.”

Barking and Dagenham, which has its own data science team, is now validating the predictions of its current model against the addresses of illegal HMOs found by its own inspectors, with good results, although it does not yet have complete figures. “We’d like to get to a point where our new data model churns out new addresses that our inspectors didn’t know about, and turn out to be new HMOs,” says Nyunt.

The council used eight datasets for its predictive model. Phil Canham, an insight and data scientist for the council, says that the useful ones include council tax band, as higher-banded properties were usually larger ones that could be used as HMOs, as well as receipt of housing benefit and electoral registrations.

“What turns out to be a very strong predictive factor is side-waste,” adds Canham, referring to cases where a council-issued bin is overflowing or waste is left by its side, something recorded by refuse collectors. “We also found there’s likely to be significantly more antisocial behaviour associated with HMOs. All of these things we can measure as a council.”

But he adds: “The data is incredibly imbalanced. We’ve got 75,000 properties in the borough, and to train our data we’ve only got 230 HMOs that we know about. That’s one of the biggest problems we’ve found so far.”
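To put that imbalance in numbers, the sketch below works out the share of known HMOs and the class weights a model could use to compensate. The weighting formula is a common heuristic (total samples divided by the number of classes times the class count), not necessarily what the council applies, and it assumes the 230 known HMOs are counted within the 75,000 properties.

```python
def balanced_class_weights(counts):
    """Weight each class inversely to its frequency.

    Uses the common heuristic total / (n_classes * class_count), so that
    every class contributes the same total weight during training.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


TOTAL_PROPERTIES = 75_000  # figure quoted by Canham
KNOWN_HMOS = 230           # figure quoted by Canham

# Assumption: every property not on the known list is treated as "other",
# even though some of those are presumably undiscovered HMOs.
counts = {"known_hmo": KNOWN_HMOS, "other": TOTAL_PROPERTIES - KNOWN_HMOS}

base_rate = counts["known_hmo"] / TOTAL_PROPERTIES
weights = balanced_class_weights(counts)
```

On these figures, roughly three properties in every thousand are known HMOs, so a model that labelled everything "not an HMO" would be right more than 99% of the time while finding nothing.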

Cleaning data

Another issue is that data often needs cleaning – a general problem for local authorities. Canham says sectors such as financial services make extensive use of their data to tackle fraud, and so recognise the importance of accurate information.

“In local authorities, because there are so many sections we might get data from, historically there’s some disconnection between parts of the council,” he says, such as addresses collected in different formats.

“Data sets can be a lot more messy. You need to sort that problem out before you can really start successfully going through with a data model. Having said that, I think there’s a massive opportunity here for local authorities.”
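A minimal sketch of the kind of clean-up involved, assuming the mismatch is in how street addresses are typed: it puts a street line into a single canonical form so that records from different departments can be matched. The abbreviation table and the rules are illustrative assumptions, not the council's actual process.

```python
import re

# Hypothetical abbreviation table; a real one would be much longer.
ABBREVIATIONS = {"RD": "ROAD", "ST": "STREET", "AVE": "AVENUE"}


def normalise_street_line(raw):
    """Return an upper-case, punctuation-free form of a street address line."""
    text = raw.upper()
    # Drop apostrophes so "MARY'S" becomes "MARYS" rather than "MARY S".
    text = text.replace("'", "").replace("\u2019", "")
    # Replace any other punctuation with a space.
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    tokens = text.split()  # split() also collapses repeated spaces
    # Expand only a trailing abbreviation, so a leading "ST" (Saint) is kept.
    if tokens and tokens[-1] in ABBREVIATIONS:
        tokens[-1] = ABBREVIATIONS[tokens[-1]]
    return " ".join(tokens)


first = normalise_street_line("12a, high rd.")
second = normalise_street_line("12A High Road")
saint = normalise_street_line("3 St. Mary's Rd")
```

Here `first` and `second` come out as the same string, so the two records can be linked. Rules like these are exactly where staff who know the local street names can catch wrong assumptions early.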

Canham says it is vital to involve those who are familiar with the information: “The people who understand the datasets are the people on the ground,” as they can spot problems at an early stage. For example, HMOs tend to have a history of owner-occupation rather than having been council housing, and staff know this for their areas.

But such history wasn’t included in the original pan-London work: “They couldn’t do the sense-checking on the data, so it was potentially open to errors,” he says. “Without having an absolutely clear connection between the programmers, the people on the ground and ourselves in the local authority, it can be easy to make incorrect assumptions. It’s more of a communication thing – if that’s in place, everything can work a lot better and smoother.”

Involving professionals in such work is important for another reason, according to Pye Nyunt: “All of this work has got to make a difference to a decision maker – a policymaker, an operational member of staff, someone who is facing a resident. There would be no point of us doing this for the sake of research – it’s got to be actionable,” he says.

Tools for front-line workers

Both Barking and Dagenham and the GLA are enthusiastic about applying similar techniques in other areas. Nyunt says the aim “is to help staff, not replace them”.

“Our intention is always to give the people at the frontline the tools they need to do their jobs better,” he says. The council’s insight hub is now looking at how it can predict which vulnerable residents are in danger of moving into crisis, to attempt to stop this happening.

Nyunt adds that capital-wide work, such as that by Nesta, the GLA and ASI, is valuable. “Having them look at it on the holistic level, and each London borough having the skills and capability to add a local context to it, that’s when it becomes a really powerful model.”

London Office for Data Analytics

Nesta, the GLA and ASI set up the pilot as the first project of the London Office for Data Analytics (Loda). “Its mission will be to bring forth other forms of data from private sector organisations down to household-level data in areas such as energy,” says Collinge, as well as providing advice and guidance on data usage such as on the General Data Protection Regulation.

Loda will also undertake new data analysis projects, with Collinge seeing air quality and social care as areas with strong potential.

But he warns: “To have 33 boroughs which are essentially holding data on the very same services, but in different forms to different standards, is becoming no longer acceptable. You have tools like machine learning that are becoming increasingly sophisticated, and are there and ready to be applied to shared problems.”

“This shortfall of data quality and data supply is becoming an issue getting in the way of providing the 21st century local government services that people expect,” he concludes.
