Web scraping, a software technique for extracting information from websites, is a common form of data theft that is often overlooked by businesses, according to Swedish security firm Sentor.
“Data stolen in this way is also used to create spoof sites to trick users into entering their login credentials,” said Paolo Culora, UK manager for Sentor.
Online businesses such as directories, travel companies and gaming companies are the most highly targeted by competitors stealing content for their own websites.
Others affected by this practice include property companies, online auto traders and online event ticketing companies.
Sentor developed a detection and prevention capability at the request of UK local business directory Yell, which wanted a way to tell the difference between legitimate visitors and web scrapers.
The resultant technology looks at a combination of various factors such as interaction speed, user behaviour, geo-location of the requests and blacklisted IP addresses to block illegitimate activity.
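Sentor has not published how its technology weighs these factors, so the following is only a minimal sketch of the general idea of combining several signals into a single block decision. The blacklist entries, country codes, thresholds, weights and the `VisitorProfile` fields are all hypothetical values chosen for illustration, not details of Sentor's product.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; a real system would tune these.
BLACKLISTED_IPS = {"203.0.113.7", "198.51.100.23"}
EXPECTED_COUNTRIES = {"GB", "IE"}          # where genuine users are assumed to be
MIN_SECONDS_BETWEEN_REQUESTS = 0.5         # faster than this looks automated
BLOCK_THRESHOLD = 3                        # score at or above this is blocked


@dataclass
class VisitorProfile:
    ip: str
    country: str                           # country code from a geo-IP lookup
    avg_seconds_between_requests: float    # interaction speed
    pages_requested: int
    ran_search_or_clicked: bool            # crude stand-in for user behaviour


def scraper_score(visitor: VisitorProfile) -> int:
    """Add up assumed weights for each suspicious signal."""
    score = 0
    if visitor.ip in BLACKLISTED_IPS:
        score += 3
    if visitor.avg_seconds_between_requests < MIN_SECONDS_BETWEEN_REQUESTS:
        score += 2
    if visitor.country not in EXPECTED_COUNTRIES:
        score += 1
    # Many page fetches with no searches or clicks suggests bulk copying.
    if visitor.pages_requested > 200 and not visitor.ran_search_or_clicked:
        score += 2
    return score


def should_block(visitor: VisitorProfile) -> bool:
    """Block only when the combined score crosses the threshold."""
    return scraper_score(visitor) >= BLOCK_THRESHOLD


if __name__ == "__main__":
    human = VisitorProfile("192.0.2.10", "GB", 8.0, 12, True)
    bot = VisitorProfile("192.0.2.55", "US", 0.1, 5000, False)
    print(should_block(human), should_block(bot))
```

Scoring several signals together, rather than blocking on any one of them, is one way to avoid penalising genuine users who happen to trip a single check, such as browsing from abroad.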
Yell, which has around two million records about UK businesses, was concerned about large-scale copying of this information for re-use by competitors.
“The challenge was to find an invisible way of blocking web scrapers without affecting genuine users of the Yell directory,” said Mathias Elvang, Stockholm-based head of consulting at Sentor.
Web scraping can be a significant problem for businesses that operate online. “For example, Sentor blocks around one million web scraping IP addresses at any given time,” Elvang told Computer Weekly.
Protecting data and bandwidth
“Blocking this activity is not only about protecting information, but it is also about protecting the business from what amounts to a type of denial of service attack,” he said.
Ladbrokes was surprised by the extent to which web scraping was affecting its online betting site, with scraping requests making up 20% to 25% of traffic on average.
“However, the number of web scraping requests can peak at up to 50% during sporting events such as premiership football matches,” said Culora.
Ladbrokes is a good example of how web scraping can affect a business, both by slowing down the website through bandwidth consumption and by giving competitors an advantage, he said.
Although awareness of web scraping is growing, not all affected businesses know about it, and most UK businesses that do know have little idea of the scale of the problem, said Elvang.
“It often requires something to go wrong before businesses take action, while others pursue legal remedies before they realise it is more effective and less time-consuming to manage the problem proactively,” he said.
Often management spots the problem before IT, said Elvang. “Either way, the business case is relatively easy to make considering that tackling the problem saves about 20% of bandwidth on average,” he said.