Storing the big e-data mountain

As e-business/commerce takes off, so the data it generates grows exponentially, producing major ramifications for storage and IT systems. Nick Enticknap considers how to meet this challenge.

There can be little doubt that the take-up of e-business systems will create a data explosion, and that information is the currency of this phenomenon. This creates important challenges for organisations running e-business systems. They will have to implement an IT strategy which ensures that they can cope with large volumes of data. This strategy must also ensure that dealing with these mountains of data does not create difficulties throughout the business, and especially across the IT system as a whole. Although storage systems are an important part of the solution, managers must take into account the impact of large volumes of data across their whole IT infrastructure.

Rates of data growth are quite staggering, especially when they are allied to e-business. For example, IT research firm International Data Corporation forecasts that by 2002 - less than two years away - some 515 million devices will be accessing the Web and conducting over $400bn-worth of business transactions. These businesses need e-mail and e-messaging, want the benefits of ERP (Enterprise Resource Planning) systems, and, of course, the potentially high returns of e-commerce, plus the business benefits of e-business.

Also, so-called pervasive computing, which means the ability to access data from anywhere, at any time, is on its way through the use of mobile and portable computers, such as palmtops, and even mobile phones. Thanks partly to WAP (Wireless Application Protocol), mobile phones will be able to access Web sites. All this adds up to massive growth in data volumes.

According to analyst Richard Winter: 'Today, most enterprises are storing 10 times as much data as they had three years ago. Most will have 20 to 50 times more in the next three years, as the Internet is increasing the pace of data growth. This massive surge in data volume has stressed the infrastructure for storing and managing it.'
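
To put Winter's multiples on a yearly footing, the short sketch below converts a total growth multiple into the equivalent compound annual factor. The function name is illustrative, and steady compound growth is an assumption made here for the arithmetic, not something the analyst states.

```python
def annual_growth_factor(total_multiple: float, years: float) -> float:
    """Return the constant yearly multiplier that compounds to total_multiple."""
    if total_multiple <= 0 or years <= 0:
        raise ValueError("total_multiple and years must be positive")
    return total_multiple ** (1.0 / years)


if __name__ == "__main__":
    # Multiples quoted in the text: 10x over the past three years,
    # 20x to 50x over the next three.
    for multiple in (10, 20, 50):
        factor = annual_growth_factor(multiple, 3)
        print(f"{multiple}x over three years is about {factor:.2f}x a year")
```

On that assumption, a tenfold rise over three years works out at a little more than doubling every year, and 20 to 50 times implies roughly 2.7 to 3.7 times a year.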

The rate of data growth for so-called 'dotcom' companies is likely to be far higher, though, of course, they are starting from a lower base. US storage giant EMC says that one of its dotcom customers, Excite, went from nothing to 45 terabytes in two years.

Another reason for the increase in data volume is that companies are collecting more data than they did with the traditional transaction processing systems of the past. According to Tony Reid, Hitachi Data Systems Storage Solutions Manager: 'Today, most transactions over the Internet are fairly simple. As time goes on, suppliers will be trying to find out more about what their customers' buying patterns are, so they can target them. The amount of data that has to be gathered is significant, because you don't know what you will need: it doesn't matter if 80 per cent is useless if 20 per cent generates business.'

Coping with all this data is one thing, but IT systems supporting e-business must also offer 24x7 reliability, availability, and scaleability.

Web-based trading knows no boundaries in either space or time. And failures of any kind cause not only embarrassment and loss of image, but loss of business - it is so easy for the customer to click onto somebody else's site that is working.

And it is not simply failures that are a concern: poor performance can cause just as much of a problem. As Tivoli SAN Strategy Manager Ron Riffe points out: 'As more and more data becomes involved, simply making a backup can impact performance, so that you're not transacting business properly. So storage is becoming more critical than the host processor these things are running on.'

Riffe's colleague Robin Pilcher, Tivoli's EMEA marketing manager for storage management, elaborates on the point. 'There is an irony. Companies have to protect their data. But the actions they take to protect it - backup and archiving - are working contrary to running the business. Up to 60 per cent of traffic running across a LAN is backup and archiving; in other words just data movement. One of the key drivers behind SAN is to move all that housekeeping off the LAN onto a fast, intelligent SAN.'

To meet this need, Tivoli has enhanced its Tivoli Storage Manager product (formerly known as IBM ADSM) to provide storage area network-based backup facilities. According to Robin Pilcher: 'At the end of September we made a LAN-free announcement. You stick it on a SAN.'

Removing 60 per cent of your LAN traffic in this way has an enormous effect on throughput, he says: 'It's like building a couple of extra lanes on the M25 motorway.'
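
As a rough illustration of that effect, the sketch below works out how much room is freed when housekeeping traffic leaves the LAN. It assumes, purely for the arithmetic, that housekeeping and business traffic compete for the same capacity at the same time; in practice backups are often scheduled off-peak, so the real gain depends on timing. The function name is hypothetical.

```python
def headroom_multiplier(housekeeping_share: float) -> float:
    """Factor by which business traffic could grow before the LAN is as
    busy as it was with housekeeping traffic still on it."""
    if not 0.0 <= housekeeping_share < 1.0:
        raise ValueError("housekeeping_share must be in the range [0, 1)")
    return 1.0 / (1.0 - housekeeping_share)


if __name__ == "__main__":
    # Pilcher's figure: up to 60 per cent of LAN traffic is backup and archiving.
    print(f"Headroom factor: {headroom_multiplier(0.6):.1f}x")
```

With a 60 per cent housekeeping share, the remaining business traffic could grow to about two and a half times its present level before the LAN carried as much as it does today.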

Pilcher says he cannot stress enough the importance of information management strategies across e-business enterprises. This must take account of all the IT elements that are impacted by, and contribute to, e-business type operations, where the goal must be 'business as usual', or, in other words, business continuity. Storage management is a key element, but certainly not the only one.

An effective information management strategy, focused on ensuring business continuity, will consider all the relevant IT elements, such as hardware, networks, databases and applications, and must be designed in such a way that when downtime occurs, systems can be brought back up in the blink of an eye.

Also, systems must be designed so that they can scale as demand grows - poor scaleability can have dramatic negative consequences. Even regular housekeeping activities can have a negative impact on e-business systems availability if not managed properly.

'There is an irony,' comments Pilcher, 'that while enterprises must back up their data for availability reasons, by performing these activities they are actually impacting their networks' and servers' ability to run applications. Up to 60 per cent of the data over the LAN is important data movement activity, but it is also non-productive in terms of earning revenues. It is simply prudent housekeeping. This problem will only worsen as data volumes continue to double year-on-year for many customers.

'I believe a sound information management strategy must address these fundamental issues which are facing so many IT managers.'

Such folk and their bosses must be wary of approaches which centre solely on storage management as the means of coping with the e-business data explosion and its consequences. They must look to cut out those activities which can harm those IT systems supporting e-business operations. And that includes not wasting LAN and WAN resources, and processor cycles on housekeeping activities, and not wasting storage space with duplicated redundant data.

One storage product development which should help is the SAN - the storage area network. This is a major new technology which the IT storage industry has developed to meet the requirements of the e-business age. One of the things a SAN offers is the ability to move traffic off the existing infrastructure, just as the LAN did when it arrived in the eighties.

But SANs offer a lot more than that. For example, they offer greater scaleability: it is easier to add storage to a network than it is to a server with a limited number of channels or SCSI/Fibre Channel adaptors.

Moreover, the intelligence contained within the SAN offers the potential of more than just the ability to add storage whenever you need it. As Tivoli's Ron Riffe says: 'Today, a human being has to watch a file system to see when it fills up, then go and buy more disk. With a SAN, we will have capacity on demand.'
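
The manual watching that Riffe describes can be pictured as a simple threshold check. The sketch below is only an illustration of that idea, not a description of how any SAN or Tivoli product provides capacity on demand; the function names and the 85 per cent threshold are assumptions.

```python
import shutil


def fill_ratio(used_bytes: int, total_bytes: int) -> float:
    """Fraction of a file system that is currently in use."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def needs_more_capacity(used_bytes: int, total_bytes: int,
                        threshold: float = 0.85) -> bool:
    """True when usage has reached the (assumed) alert threshold."""
    return fill_ratio(used_bytes, total_bytes) >= threshold


def check_path(path: str, threshold: float = 0.85) -> bool:
    """Apply the threshold check to the file system holding 'path'."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return needs_more_capacity(usage.used, usage.total, threshold)


if __name__ == "__main__":
    if check_path("."):
        print("File system is nearly full: time to add storage")
    else:
        print("File system has room to spare")
```

In the world Riffe sketches, the response to such a signal would be handled by the storage network rather than by a person placing an order for more disk.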

One of the great pluses of the SAN is that, where implemented, it enables the large volumes of data involved in storage housekeeping (such as backup and archiving) to be moved off LANs to dedicated Fibre Channel connections. SAN management software can also manage data movement, thus reducing the burden borne by server management systems. This frees up valuable processor capacity which can then be focused on activities of direct value to e-business operations. Also, the data-sharing qualities inherent in SANs should enable wasted space and duplicated data to be eliminated. This helps control the growth of data, its currency and availability, and should also contribute to cost savings through the consolidation of storage.

But these benefits can only accrue when SANs are controlled by intelligent SAN management software, a combination which Pilcher says can 'manage information across the SAN and indeed across the entire enterprise. Only then is the company (using these techniques) able to meet the challenges of e-business and mobile commerce.'

But the Holy Grail of true SANs - with multiple heterogeneous servers accessing storage devices from many different vendors under the control of software from a variety of companies - is still some way off in the view of some in the storage business. As Hitachi Data Systems European SAN and high availability manager Vincent Franceschini puts it, today 'the promise of "any data, anywhere, any time" is unrealistic.'

He lists three areas where SANs are already capable of meeting customer needs: storage consolidation; resource sharing; and backup.

Storage consolidation is a trend that has been evident for some time. Tivoli's Pilcher says: 'People are consolidating storage, and some of the difficulties of that lead to some of the benefits that SANs bring.'

Derek Warry, Storage Manager at major UK third-party storage supplier Storm, explains why. 'Users are trying to get storage under control, and servers for that matter. Some financial institutions have several thousand servers, and they don't know what's on them. So we are seeing big growth in multi-platform support... This is where the SAN comes in. That is really a tremendous number of data gateways, providing access to all the islands of storage.'

As we have already seen, the demands of 24x7 mean that the traditional luxuries, such as a 10 hour backup window, have gone. SANs are already providing a solution to this problem.

However, true SANs are some way off as the necessary standards are not yet all in place (see separate article). For this reason, it is not yet a question of 'plug and play': it takes expertise to construct a SAN from the various building blocks needed.

As EMC's Nigel Ghent puts it: 'Most production SANs are on just a single platform, or a small number of different platforms, using products qualified by particular vendors - hubs, switches, and storage devices - to work together.'

In future, software companies in particular will develop functionality to exploit the potential of SANs. An example is Tivoli's Data Protection suite, designed, according to Ron Riffe, 'to assist customers with the availability of e-business applications, such as SAP, Lotus Notes, Microsoft Exchange, Lotus Domino, DB2, Informix, and Sybase'. Tivoli is creating a whole family of 'Tivoli Data Protection for XXX' products: specific programs that interface with those applications, allow backup copies to be taken without impeding the application, and allow recovery in minimum time. They can be integrated with system management tools for performance monitoring, and so on.

The main immediate effect of the growth of e-business on storage, then, will be a change to the traditional storage architecture that has been with us since the computer was invented. Instead of having storage dedicated to and directly attached to the server, it will in future be organised separately into its own network.

This development will arrive rapidly. According to Dataquest, SANs and network-attached storage combined will account for 50 per cent of all storage sales by as early as the end of next year.

As time goes on and as the SAN concept is more fully exploited, users can expect to see significant increases in scaleability and availability, and then eventually in the ability to share data between users and applications, irrespective of the underlying technology, just as the Internet allows unfettered access to the World Wide Web today.

Supermarket chain Safeway is an example of a large retailer that is leading the way with the type of application which requires sophisticated storage management integrated with systems management. It was the first UK supermarket chain to introduce loyalty cards, and now has over one terabyte of data deriving from these cards, representing over two years of purchasing information.

The data is stored in DB2 databases on three IBM System/390 mainframes, and is currently being transferred from StorageTek Iceberg disks to a new IBM Enterprise Storage Server (ESS) subsystem (formerly codenamed Shark).

According to Systems Software Manager Paul Kelly: 'We are using this data to help raise customer satisfaction in order to gain competitive advantage... The DB2 databases will now be able to support much more interesting and advanced operations with the data stored on it. For example, if a customer buys no wine but lots of food that could go well with wine, then we can make a special offer to that customer.'

This increased exploitation of the loyalty card data is possible because of the advances in storage technology incorporated in ESS, including fibre optic cabling, as well as faster disks and larger cache. ESS will also allow easier management of data growth.

According to Kelly: 'On the former StorageTek architecture, we always had to hand balance our data storage, to avoid the bottlenecks which occurred with some data sets. Now with Shark we see that these likely bottlenecks are being resolved automatically by the storage system. This has huge advantages for us.'

Safeway is also using the loyalty card database to support a new Web site, taking advantage of the ESS capability to allow multiple accesses from different applications.

OS               1997       1998       1999       2000       2001       2002       2003
NOS           3,079.9    4,643.7    7,930.8   12,829.4   20,313.5   31,979.3   51,083.6
NT            3,503.1    8,888.0   20,311.1   42,317.1   79,542.9  147,722.7  265,847.5
Other OS        762.9      449.5      637.9    1,062.5    1,062.5    2,870.7    4,719.3
Unix          6,291.0   11,505.8   21,747.1   41,510.6   75,076.3  132,623.0  231,024.2
OpenVMS         440.8      355.2      448.1      573.2      728.4      864.0      987.1
AS/400          834.0    1,666.3    2,838.5    4,612.1    7,526.1   12,339.4   20,029.0
S/390         1,617.3    3,236.6    4,776.1    7,323.9   11,117.4   18,115.7   29,284.2
Total        16,528.8   30,745.1   58,689.6  110,228.9  196,050.6  346,514.8  602,974.9
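
The Total row gives a feel for the overall pace of growth the table forecasts. The sketch below computes the compound annual growth rate between the 1997 and 2003 totals, in whatever units the table uses; the function and constant names are illustrative.

```python
def compound_annual_growth_rate(start: float, end: float, years: int) -> float:
    """Constant yearly growth rate that takes 'start' to 'end' in 'years' years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end and years must be positive")
    return (end / start) ** (1.0 / years) - 1.0


# Totals copied from the first and last columns of the table.
TOTAL_1997 = 16_528.8
TOTAL_2003 = 602_974.9

if __name__ == "__main__":
    rate = compound_annual_growth_rate(TOTAL_1997, TOTAL_2003, 2003 - 1997)
    print(f"Compound annual growth, 1997 to 2003: {rate:.1%}")
```

Over the six years the total grows by a little over 80 per cent a year on a compound basis, which is in the same territory as the year-on-year doubling mentioned in the article.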
