Data theft by web scraping often overlooked, says Sentor

Web scraping is a common form of data theft that is often overlooked by businesses, says security firm Sentor

Web scraping, a software technique for extracting information from websites, is a common form of data theft that is often overlooked by businesses, according to Swedish security firm Sentor.

“Data stolen in this way is also used to create spoof sites to trick users into entering their login credentials,” said Paolo Culora, UK manager for Sentor.

Online businesses such as directories, travel companies and gaming companies are the most frequent targets of competitors that steal content for their own websites.

Others affected by this practice include property companies, online auto traders and online event ticketing companies.

Sentor developed a detection and prevention capability at the request of UK local business directory Yell, which wanted a way to tell the difference between legitimate visitors and web scrapers.

The resulting technology combines several factors, including interaction speed, user behaviour, the geo-location of requests and blacklisted IP addresses, to identify and block illegitimate activity.
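As a rough illustration of how several such signals might be combined, the sketch below scores a client on the four factors named above. It is not Sentor's implementation: the weights, thresholds, country list, example IP addresses and names such as ClientActivity and scraping_score are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

# Every value below is an illustrative assumption, not a real Sentor rule.
BLACKLISTED_IPS = {"203.0.113.7"}     # address from a documentation-only range
EXPECTED_COUNTRIES = {"GB", "IE"}     # hypothetical home markets for the site
MIN_AVERAGE_GAP_SECONDS = 0.5         # faster than this looks automated
BLOCK_THRESHOLD = 3                   # score at which the client is blocked


@dataclass
class ClientActivity:
    ip: str
    country: str                  # result of a geo-IP lookup done elsewhere
    request_times: List[float]    # request timestamps in seconds, ascending
    pages_viewed: int
    searches_made: int


def scraping_score(activity: ClientActivity) -> int:
    """Add up simple per-factor penalties; a higher score is more suspicious."""
    score = 0

    # Blacklisted IP address: strongest single signal in this sketch.
    if activity.ip in BLACKLISTED_IPS:
        score += 3

    # Geo-location: requests from outside the expected markets.
    if activity.country not in EXPECTED_COUNTRIES:
        score += 1

    # Interaction speed: average gap between consecutive requests.
    gaps = [later - earlier for earlier, later
            in zip(activity.request_times, activity.request_times[1:])]
    if gaps and sum(gaps) / len(gaps) < MIN_AVERAGE_GAP_SECONDS:
        score += 2

    # User behaviour: many page views with no searches suggests bulk copying.
    if activity.pages_viewed > 100 and activity.searches_made == 0:
        score += 1

    return score


def should_block(activity: ClientActivity) -> bool:
    return scraping_score(activity) >= BLOCK_THRESHOLD
```

In this sketch no single weak signal is enough to block a client, which reflects the stated aim of stopping scrapers without affecting genuine users; only a blacklisted address or a combination of signals reaches the threshold.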

Yell, which has around two million records about UK businesses, was concerned about large-scale copying of this information for re-use by competitors.

“The challenge was to find an invisible way of blocking web scrapers without affecting genuine users of the Yell directory,” said Mathias Elvang, Stockholm-based head of consulting at Sentor.

Web scraping can be a significant problem for businesses that operate online. “For example, Sentor blocks around one million web scraping IP addresses at any given time,” Elvang told Computer Weekly.

Protecting data and bandwidth

“Blocking this activity is not only about protecting information, but it is also about protecting the business from what amounts to a type of denial of service attack,” he said.

Ladbrokes was surprised by the extent to which web scraping was affecting its online betting site, with scraping requests making up 20% to 25% of traffic on average.

“However, the number of web scraping requests can peak at up to 50% during sporting events such as premiership football matches,” said Culora.

Ladbrokes is a good example of how web scraping can hurt a business, both by consuming bandwidth and slowing down the website and by giving competitors an advantage, he said.

Although awareness of web scraping is growing, not all affected businesses know they are being scraped, and most UK businesses that do know have little idea of the scale of the problem, said Elvang.

“It often requires something to go wrong before businesses take action, while others pursue legal remedies before they realise it is more effective and less time-consuming to manage the problem proactively,” he said.

Often management spots the problem before IT, said Elvang. “Either way, the business case is relatively easy to make considering that tackling the problem saves about 20% of bandwidth on average,” he said.
