Data mining outside the firewall

How do your company's pricing models compare with those of your competitors? Are customers making it to your site's deep links, or leaving shortly after visiting the home page?

In-depth answers to these questions can be found through mining the web - that is, discovering and analysing web page content, descriptions found in web documents, overall web structure, and website use and access patterns.

Web mining is an externally focused relative of business intelligence. Retrieving data outside the firewall can be done via agent technology, by tapping into website logs or by adding data retrieval methods into website applications.

IT managers can turn to their existing data-mining tools to examine structured web data and also use text-mining tools to examine unstructured data.

Eyeballs count

In setting up the web-mining process, first define the business problem and the types of information desired. For example, with competition fierce for site visitors' time and attention, link counts and page rankings can affect the number of page views and, ultimately, revenue, so comparing your company's website with others on these measures is a useful place to start. This data can be uncovered by mining search engine data, either with text-mining tools or through a wrapper-based data-mining strategy.

Analyse page weighting within your company's sector to see which companies are most effectively drawing visitors and achieving high search-engine ranking. Then examine the content, site structure, and page layout of high- and low-ranking companies.

Finally, consider taking a broader view, analysing the web as a whole and examining those sites that are the most effective in terms of traffic and page rankings.

Likewise, analysing the structure of your web pages can yield useful insights. Using available tools, you can analyse the number of links into and out of various content. As a rule of thumb, the more links a piece of content has, the more useful it is likely to be.
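To make the link-counting idea concrete, here is a minimal sketch in Python. It assumes you have already extracted a list of (source page, target page) link pairs from your own crawl; the page paths and the link_counts function are hypothetical and for illustration only, not part of any particular tool.

```python
from collections import Counter


def link_counts(edges):
    """Return (inbound, outbound) link counters for (source, target) pairs."""
    inbound = Counter()
    outbound = Counter()
    for source, target in edges:
        outbound[source] += 1  # one more link leaving the source page
        inbound[target] += 1   # one more link arriving at the target page
    return inbound, outbound


# Hypothetical link data; in practice this would come from a site crawl.
edges = [
    ("/", "/products"),
    ("/", "/pricing"),
    ("/products", "/pricing"),
    ("/blog/launch", "/pricing"),
    ("/pricing", "/contact"),
]

inbound, outbound = link_counts(edges)

# List pages from most to least linked-to, with their outbound link counts.
for page, count in inbound.most_common():
    print(page, "in:", count, "out:", outbound[page])
```

Pages that never appear as a link target simply report an inbound count of zero, which can itself be a useful signal that content is hard to reach.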

Looking inside, do visitors to your site hit the main page, but seldom go any deeper? Access trends can pinpoint a site structure that may need to be redesigned to increase traffic. The same tools and techniques used to mine outside the firewall can reveal how customers interact with your site.

Analysis of this information might lead you to provide precise content dynamically, choose a tight or loose site structure, or opt for customised services, such as online customer representatives.

Web server logs can yield some of the information needed to perform use and access analysis of your site. But additional data gathering with third-party tools or in-house scripting programs may be needed to capture enough elements to make the analysis useful.
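As a rough illustration of what server logs can reveal, the sketch below estimates the share of visitors who never get past the home page. It assumes log lines in the Common Log Format and treats each client address as one visitor, which is a crude simplification; the sample lines, addresses, and function names are invented for the example.

```python
import re
from collections import defaultdict

# Matches the client address, request path, and status code of a
# Common Log Format line. Other log layouts would need another pattern.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+)[^"]*" (\d{3}) \S+'
)


def pages_by_visitor(lines):
    """Group successfully served paths by client address."""
    visits = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        client, path, status = match.groups()
        if status.startswith("2"):  # keep successful responses only
            visits[client].append(path)
    return visits


def home_only_share(visits, home="/"):
    """Fraction of visitors whose only successful requests were for the home page."""
    if not visits:
        return 0.0
    home_only = sum(1 for paths in visits.values() if set(paths) == {home})
    return home_only / len(visits)


# Invented sample data using documentation-reserved addresses.
sample_log = [
    '192.0.2.1 - - [10/Oct/2005:13:55:36 +0000] "GET / HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2005:13:56:01 +0000] "GET / HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2005:13:56:40 +0000] "GET /products HTTP/1.1" 200 5120',
    '192.0.2.3 - - [10/Oct/2005:13:57:12 +0000] "GET / HTTP/1.1" 200 2326',
    '192.0.2.3 - - [10/Oct/2005:13:57:30 +0000] "GET /missing HTTP/1.1" 404 310',
]

visits = pages_by_visitor(sample_log)
share = home_only_share(visits)
print(f"Visitors who saw only the home page: {share:.0%}")
```

Address-based grouping miscounts visitors behind shared proxies, which is one reason the additional data gathering mentioned above may be needed for a dependable analysis.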

Inside or out?

Data gathering for web-content mining can be handled in house, but a fair number of service providers can also tackle the task and may offer the capability of notifying you when content changes. When large data sets are involved, consider using a service provider to reduce the load that data gathering places on your network.
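If you handle change notification in house, one simple approach is to store a fingerprint of each page and compare it with the latest copy. The sketch below is only one way to do this and assumes the page text has already been retrieved; the URLs, page text, and function names are hypothetical, and it does not describe how any service provider works.

```python
import hashlib


def content_fingerprint(text):
    """Hash page text after collapsing whitespace, so layout-only edits are ignored."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def changed_pages(previous, current):
    """Return URLs whose text is new or differs from the stored fingerprint.

    previous maps URL -> fingerprint from the last run;
    current maps URL -> freshly retrieved page text.
    """
    return sorted(
        url
        for url, text in current.items()
        if previous.get(url) != content_fingerprint(text)
    )


# Hypothetical data: one competitor page monitored since the last run.
stored = {
    "https://example.com/pricing": content_fingerprint("Basic plan: 10 per month"),
}
latest = {
    "https://example.com/pricing": "Basic plan:  12 per month",
    "https://example.com/about": "About us",
}

for url in changed_pages(stored, latest):
    print("Changed or new:", url)
```

Comparing fingerprints rather than full page copies keeps the stored data small, at the cost of not showing what changed.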

Quite a few commercial and open-source tools exist to assist with web-mining efforts. For example, NetGenesis from SPSS collects and analyses web data and transforms it into useful metrics, and QL2 Software's WebQL includes a development interface, querying capabilities, and a deployment engine to extract the data needed.

Web mining extends data mining beyond the corporate walls. And including the web in your mining strategy can improve your web presence and increase your competitive intelligence.

Maggie Biggs writes for InfoWorld
