Using ASPs to analyse website traffic

Analysing the habits of website users is key to business success in today’s economy. Application service providers (ASPs) are now lowering barriers to market entry by providing such information for a monthly fee.

Given the size of today's databases, systems must analyse an overwhelming quantity of data every second. ASPs achieve this by harnessing today's standard processors and memory chips with massively parallel techniques, which also provide increased reliability and resilience.

Massively Parallel

A massively parallel installation typically uses a shared-nothing architecture: for example, 216 racks of 14 processors (3,024 processors in total) in a single installation, combined with 1.5 terabytes of RAM and 2,592 disks providing 16.8 TB of user storage (in a RAID 5 configuration). Such examples are not fixed configurations; there is a great deal of flexibility in how systems are configured to suit customer requirements.
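The figures in this example configuration can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 TB = 1,000 GB), and the per-processor and per-disk averages are derived values, not figures quoted by any supplier.

```python
# Figures taken from the example configuration in the text.
RACKS = 216
PROCESSORS_PER_RACK = 14
RAM_TB = 1.5
DISKS = 2592
USER_STORAGE_TB = 16.8

# Assumption: decimal units, 1 TB = 1,000 GB.
GB_PER_TB = 1000

processors = RACKS * PROCESSORS_PER_RACK
disks_per_rack = DISKS / RACKS
ram_per_processor_gb = RAM_TB * GB_PER_TB / processors
user_storage_per_disk_gb = USER_STORAGE_TB * GB_PER_TB / DISKS

print(f"Processors in total:   {processors}")
print(f"Disks per rack:        {disks_per_rack:.0f}")
print(f"RAM per processor:     {ram_per_processor_gb:.2f} GB")
print(f"User storage per disk: {user_storage_per_disk_gb:.2f} GB")
```

On these assumptions each processor has roughly half a gigabyte of RAM to itself, which is what a shared-nothing design implies: every processor works on its own slice of memory and disk.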

Such massive amounts of RAM are another reason that hosted ASPs perform so quickly. Because data is accessed directly in RAM rather than on disk, access times are measured in nanoseconds, not the milliseconds needed by a hard disk.
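To give a feel for the gap between nanoseconds and milliseconds, the following sketch compares the time needed for a million random lookups. The two latency figures are assumed round numbers chosen for illustration, not measurements of any particular system, and sequential disk reads would fare better than this worst case.

```python
# Assumed, illustrative latencies (not measured values).
RAM_ACCESS_NS = 100            # about 100 nanoseconds per RAM access
DISK_ACCESS_NS = 10_000_000    # about 10 milliseconds per random disk access

LOOKUPS = 1_000_000

speed_ratio = DISK_ACCESS_NS // RAM_ACCESS_NS
ram_total_s = LOOKUPS * RAM_ACCESS_NS / 1e9
disk_total_s = LOOKUPS * DISK_ACCESS_NS / 1e9

print(f"RAM is about {speed_ratio:,} times faster per access")
print(f"{LOOKUPS:,} lookups in RAM:  {ram_total_s:.1f} seconds")
print(f"{LOOKUPS:,} lookups on disk: {disk_total_s / 3600:.1f} hours")
```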


One essential in any massively parallel processing system is a set of high-speed, high-bandwidth communication paths between the processing elements that make up the server. A hierarchical, fully switched Fast Ethernet fabric handles inter-processor communication. The switches are arranged on special cards separate from the processor and disk cards. The design ensures database processing is not interrupted for message handling, and means the system can be scaled to as many as 10,000 processors with a genuinely linear performance increase.

Communication Processors handle external communication. Multiple Communication Processors can be configured for resilience, for large numbers of concurrent users and scalable load speeds. They are based on standard Unix processors, and accept PCI cards, to allow communication with current and future protocols and devices.

Database Software

Having the right hardware for a fully capable system is a prerequisite for delivering that capability, but the hardware must be fully exploited by database software optimised for high-speed data exploration. This means software designed from the very start to be truly parallel and to employ seamlessly the full power of the hardware architecture.

ASPs perform the usual database activities, such as running queries, but they do so in parallel. The most widely used query language is standard SQL, which needs no proprietary 'hints' to encourage it to do the best job. User applications or third-party query and analysis tools that generate ANSI SQL can therefore benefit from parallelism without any special technical intervention.
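As an illustration of the kind of plain, hint-free SQL a query tool might generate, here is a minimal sketch. The `page_views` table, its columns and its rows are hypothetical, and Python's built-in SQLite engine is used only so the query can be run anywhere; it is not a parallel server and is not what an ASP would host.

```python
import sqlite3

# Hypothetical website traffic table, held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views ("
    "visitor_id INTEGER, page TEXT, seconds_on_page INTEGER)"
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        (1, "/home", 30),
        (2, "/home", 50),
        (1, "/products", 120),
        (3, "/home", 40),
        (2, "/checkout", 60),
        (3, "/products", 80),
    ],
)

# Standard SQL with no vendor-specific optimiser hints.
query = """
    SELECT page, COUNT(*) AS views, AVG(seconds_on_page) AS avg_seconds
    FROM page_views
    GROUP BY page
    ORDER BY views DESC, page
"""
result = conn.execute(query).fetchall()

for page, views, avg_seconds in result:
    print(f"{page:<10} {views} views, {avg_seconds:.0f}s average")
```

The point of the example is that nothing in the query says how the work should be divided. On a parallel server, that decision is left to the database software.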

If a database is usually accessed using the same columns, then indexing those columns can speed up transactions. In decision support systems, however, indexes are often irrelevant, because indexed access is faster only when a few per cent of the total rows are selected; beyond that point it is faster to scan the data.
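The trade-off between indexed access and a full scan can be sketched with a very rough cost model. The function name and the cost figures below are assumptions made for illustration: a row fetched through an index is taken to cost 50 times as much as a row read during a sequential scan, which puts the break-even point at 2% of the rows. Real optimisers use far more detailed models.

```python
def cheaper_access_path(total_rows, selectivity,
                        scan_cost_per_row=1.0, index_cost_per_row=50.0):
    """Return 'index' or 'scan', whichever is cheaper under this toy model.

    selectivity is the fraction of rows the query selects (0.0 to 1.0).
    The per-row costs are assumed values in arbitrary units.
    """
    scan_cost = total_rows * scan_cost_per_row                   # reads every row
    index_cost = total_rows * selectivity * index_cost_per_row   # reads only matches
    return "index" if index_cost < scan_cost else "scan"


# Fraction of rows at which the two paths cost the same (with the defaults).
break_even = 1.0 / 50.0

for fraction in (0.001, 0.01, 0.05, 0.25):
    path = cheaper_access_path(1_000_000, fraction)
    print(f"Selecting {fraction:.1%} of rows: use {path}")
```

Decision support queries routinely touch a large share of the rows, which is why they tend to fall on the scan side of this line.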

The Online Transaction Processing (OLTP) systems for which today's database systems were first designed almost always access a tiny percentage of the rows in a table. As a result, database systems engineers using mass-market databases for data exploration have to be highly skilled. They need to decide for every column in the database:

Will this column be used frequently enough to justify an index?

If so, will the maintenance overhead outweigh the benefits?

If not, will the optimiser make the right decision when to use the index and when to ignore it?

If not, can we update the optimiser?

If not, how many applications need to be changed to override the optimiser?

Do we have developers who are sufficiently skilled to make and test the changes?

This would be complex enough if it were a one-off task. However, there are additional complications. Of the thousands of micro-decisions made on indexing, many will prove wrong when new data is loaded, when a user with a different requirement accesses the system, or when a user has a new idea to pursue. Moreover, most databases have some columns that are non-indexable.

To make correct decisions, businesses need to know exactly how data will be accessed. This is in direct conflict with concepts like ad hoc querying and train-of-thought processing. Businesses need an in-depth understanding of the data content and the restrictions of the database system if they are to make full use of the services provided by ASPs.

Clean sheet database

Many ASPs are developing radically different solutions to the problem of data accessibility. Conventional indexing techniques are being dismissed as inappropriate for decision support. Instead, new ASP analysis and exploration technology has adopted the concept of images of data in RAM. An image can be created from a table, view or table fragment. An image can be thought of as an index, in that it is a structure with a copy of some (or all) of a data row that provides greatly enhanced response times. Images can be created quickly and easily and dropped just as fast and simply. If data is requested which is not loaded as an image, the server dynamically (and automatically) creates a temporary image.
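The behaviour described above, in which images are created and dropped on demand and a temporary image is built automatically when a query needs data that is not loaded, might be sketched as follows. This is a simplified illustration, not any vendor's implementation: the `ImageServer` class and its methods are hypothetical, a plain dictionary stands in for disk storage, and each image holds a single column, whereas the text allows an image to cover a table, view or table fragment.

```python
class ImageServer:
    """Toy model of a server that answers queries from images held in RAM."""

    def __init__(self, tables):
        self._tables = tables    # stands in for disk: table name -> list of row dicts
        self._images = {}        # (table, column) -> list of values held in RAM
        self.temporary = set()   # images the server created by itself

    def create_image(self, table, column, temporary=False):
        """Copy one column of a table into RAM."""
        key = (table, column)
        self._images[key] = [row[column] for row in self._tables[table]]
        if temporary:
            self.temporary.add(key)
        else:
            self.temporary.discard(key)

    def drop_image(self, table, column):
        """Release an image; the underlying table is untouched."""
        key = (table, column)
        self._images.pop(key, None)
        self.temporary.discard(key)

    def count_where(self, table, column, predicate):
        """Count matching values, building a temporary image if none is loaded."""
        key = (table, column)
        if key not in self._images:
            self.create_image(table, column, temporary=True)
        return sum(1 for value in self._images[key] if predicate(value))


server = ImageServer({
    "visits": [
        {"page": "/home", "seconds": 30},
        {"page": "/products", "seconds": 120},
        {"page": "/home", "seconds": 50},
    ]
})
server.create_image("visits", "page")   # image requested explicitly
home_visits = server.count_where("visits", "page", lambda p: p == "/home")
long_visits = server.count_where("visits", "seconds", lambda s: s > 40)
print(home_visits, long_visits, sorted(server.temporary))
```

The second query asks for a column that was never loaded, so the sketch builds a temporary image for it without the user having to plan for that access path in advance.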

No more sampling

Banks, telecommunication companies and retail groups collect massive amounts of data on customers, products and sales. This is likely to increase further with the growing popularity of the Internet. The key to making full use of the Internet isn't necessarily sales or the number of users that visit an Internet site, but the data that is stored about website visitors. This has brought about the IT industry's present obsession with collecting visitor data, which can be sold as a business intelligence tool.

However, the sheer volume of data means that users need to resort to samples when looking for knowledge. This is far from perfect. As the size of databases becomes prohibitive to their full examination, many companies are working with samples as small as 1%. In doing so they ignore much of the potential knowledge they have collected, and it's not easy to decide which 99% of detail to ignore. In reality, companies need to examine all the detail they have so carefully and so expensively assembled. There seems little point in storing and requesting data unless you use it.

This is why many IT companies are taking a more thorough, less traditional approach to data exploration and analysis in the new economy, and it also explains why ASPs have suddenly become this year's hot acronym. However, even if companies buy in their data, solutions are needed to allow business managers to interact intuitively with all the data that their organisations collect. Managers still need a rudimentary understanding of the technology and its applications if they are to obtain the competitive advantage they so dearly seek.
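The cost of working from a 1% sample can be made concrete with a short calculation. The sketch below assumes each row is included in the sample independently with the same probability, and the group sizes are hypothetical. It shows how likely it is that a small group of interesting records, such as a handful of unusual visitors, is missed altogether.

```python
def chance_sample_misses_all(rare_rows, sample_fraction):
    """Probability that none of the rare rows lands in the sample.

    Assumes each row is sampled independently with probability sample_fraction.
    """
    return (1.0 - sample_fraction) ** rare_rows


SAMPLE_FRACTION = 0.01   # the 1% sample mentioned in the text

for rare_rows in (10, 50, 100, 500):
    missed = chance_sample_misses_all(rare_rows, SAMPLE_FRACTION)
    expected = rare_rows * SAMPLE_FRACTION
    print(f"{rare_rows:>3} rare rows: {missed:.0%} chance of seeing none, "
          f"{expected:.1f} expected in the sample")
```

Under these assumptions, a group of 100 records is absent from a 1% sample more than a third of the time, and even when it does appear it is usually represented by only one or two rows.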

Paul Phillips
