HMRC uses Hadoop to tackle corporate tax avoidance

HM Revenue & Customs (HMRC) uses open source technology with a Hadoop NoSQL big data engine to analyse tax filings

HM Revenue & Customs (HMRC) is driving the use of open source technology with a Hadoop NoSQL big data engine to analyse corporate tax.

Government austerity measures have driven HMRC costs down by 20% over the last four years. The organisation is committed to reducing costs by another 22% over the next four years.

Addressing delegates at the Open Source Open Standards 2014 conference in London, Mark Dearnley, chief digital officer of HMRC, said open source software was a great way to change the dynamics of how software is developed.

Analytics shines a light

According to Dearnley, analytics offered one of the biggest opportunities for the use of open source software at HMRC. He said: "Analytics is the first area where open source software has led the thinking."

Working with its system integrators, HMRC has developed a macro enterprise data hub, built on Hadoop. Dearnley said: “Open source software is more cost-effective. It drives the commoditisation of infrastructure and use of software and drives a different delivery model, which is massively more cost-effective.”

Corporation tax compliance is another example of Hadoop at HMRC. In the UK, companies need to submit tax returns electronically in the iXBRL format specified by HMRC.

HMRC single tax account strategy

HMRC takes £1bn in revenue on an average day to keep the UK economy running. It took £476bn in revenue last year, has 38 million customers and revenue is growing 4% annually. But there is a £35bn tax gap, which – according to Mark Dearnley, chief digital officer at HMRC – makes the business case for anything that improves compliance.

The organisation uses traditional means of communication, receiving 70 million inbound letters and 73 million calls a year. It also sends out 343 million forms and 240 million letters.

To modernise the paper and manual processes, Dearnley is developing a multi-channel strategy for HMRC based around a single digital tax account, with different versions for consumers and businesses. He said: "We want HMRC to be like working with your bank. Whatever you want to do with your tax account, it is easy, it is simple and you know where you stand."

His ambition is to join up telephone contact, scanned mail and online activities to deliver a single view of HMRC customers, and connect this to back-end tax engines. With all the data it collects, he believes HMRC will be able to simplify online tax returns and tackle the £35bn tax gap.

For instance, he said: "We know how much you earned in a year. We know how much bank interest you have got, we ought to know if you are a member of a professional organisation and the fees should be a known amount."

Dearnley said it took two and a half months to develop a complete Hadoop stack and load in all the corporation data, allowing tax officers to start analysing company tax returns. He said the users were impressed by how fast IT delivered and the speed with which they could get value.

While using Hadoop for analytics has proved the value of open source software at HMRC, he said his ambition was to create a level playing field for open source software: "At the moment the pendulum is a bit too far the other way."

Open source opportunity

HMRC runs 5,000 servers but only 3% run Linux. A quarter of its systems are virtualised, mainly on VMware, and it runs 3% of its systems in the cloud, he said – implying a substantial opportunity to deploy open source technologies in HMRC's infrastructure. Of the 500 enterprise applications at HMRC, Dearnley said 95% were based on proprietary platforms.

He admitted the penetration of open source software at HMRC was low: "We have some way to go. Our future will be a combination of private and public cloud, commodity compute, some of our databases are rather large and don't run in virtualised environments, so we will optimise our database cloud."

HMRC mainly uses VMware for its virtual servers. But Dearnley said he was interested in whether it would be viable to switch to OpenStack. He said: "We hope industry creates a level playing field so we can explore the options and feel comfortable using these technologies. You don't want us to play with your tax data."

Dearnley said he was determined to switch off the HMRC mainframe, but admitted there would still be other bits of proprietary technology that the organisation would need to maintain in the future.

HMRC is also beginning to move its digital platforms onto open source technologies. Dearnley said: "We are testing the way we develop software changes with open source, how we work with system integrators and the skills we will need."

Completing the open source circle, Dearnley said HMRC's experience with Hadoop has enabled it to contribute code back to the open source community. "As we develop in Hadoop we can put it back in the code stream. Even CESG encourages me to do that and it is encouraging for the team."

Dearnley said open source software would define the organisation's future. "It is as much about people as it is about technology – and the people have to believe in it."
