Compression bandages for enterprise data

Businesses make more data every day, but the bandwidth needed to move it around is harder to acquire. This feature from Voice & Data Magazine explains how compression can help your network cope with the data deluge.

Data storage has expanded at such a healthy pace for so long that many companies have become inured to the data management challenges of the past. It's so easy to add hard drives to an enterprise storage array, after all, that once-carefully-monitored personal storage quotas are little more than memories. David Braue details the reasons why compression is becoming more important when it comes to storage over networks.

At the same time, however, the inexorable creep of critical information systems across the humble wide area network (WAN) - which is now more likely to include virtual private networks (VPNs) running over unpredictable internet connections than to be based on traditional leased lines - has meant trouble.

That's because even though we can store more data, moving that data where it needs to go can be frustrating as networking protocols suffer inordinate delays through control signal overheads and retransmission of redundant information. Even increasing bandwidth isn't the answer, since 'chatty' protocols like Microsoft's CIFS (Common Internet File System) spend so much time checking and confirming the progress of data transfers that their effective transfer speed is often far below optimal levels.

Recent efforts to improve this situation have spawned a revolution of sorts in compression technology, with new approaches to trimming the fat out of enterprise data now extending to storage servers, application servers and WAN links. This last area, in particular, has become the rallying cry for numerous companies as they seek to use new compression and anti-redundancy technologies to improve the speed of data moving over the internet and direct WAN connections.

Putting the squeeze on data

In many ways, the current focus on optimising data flows grew out of the fledgling email archiving market - which sold itself as a solution for corporate compliance but soon became a major ally in the fight to control bloated Outlook inboxes. Redundancy scanning technology looks for frequently used data, such as attachments that are forwarded repeatedly between people, and replaces the second and later references to that data with a pointer to the archived version.
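
The pointer-replacement idea can be sketched in a few lines of Python. This is a minimal illustration rather than any vendor's implementation: the `AttachmentArchive` class and its method names are hypothetical, and it assumes an attachment's SHA-256 digest can serve as the pointer to the archived copy.

```python
import hashlib


class AttachmentArchive:
    """Hypothetical single-instance store: each unique attachment is kept once."""

    def __init__(self):
        self.blobs = {}  # digest -> attachment bytes

    def archive(self, attachment: bytes) -> str:
        """Store the attachment if it is new and return a pointer (its digest)."""
        digest = hashlib.sha256(attachment).hexdigest()
        self.blobs.setdefault(digest, attachment)
        return digest

    def fetch(self, pointer: str) -> bytes:
        """Resolve a pointer back to the archived attachment."""
        return self.blobs[pointer]


archive = AttachmentArchive()
report = b"quarterly figures " * 1000

# The same attachment forwarded to three mailboxes yields three identical pointers.
pointers = [archive.archive(report) for _ in range(3)]

stored_bytes = sum(len(blob) for blob in archive.blobs.values())
naive_bytes = 3 * len(report)  # what three full copies would occupy
```

Here `stored_bytes` is a third of `naive_bytes`, because the second and third copies add only a pointer to the existing archived version.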

In this way, email archiving systems can often strip out most of the bulk from enterprise email backups. That technique is well accepted, but it also forms the basis of more recent, promising innovations such as IBM's new Venom compression technology. Released in July as part of IBM's new DB2 v9 'Viper' database, Venom uses new techniques to pick out redundant data and data patterns within database rows. The net effect: typical database compression of 50% or more, particularly in large tables with repetitive data patterns, according to widespread reviews of the beta technology.

Storing less data may seem irrelevant given the cheapness of disk storage today, but it's essential if companies are going to build effective data ecosystems in which all kinds of critical information are shuttled across the internet and corporate networks to support real-time decision-making. Compressing that data within the database stores it in a highly optimised form that makes its analysis and transfer more efficient.

Just how efficient isn't exactly clear, and, of course, depends on the type of data being transferred. Vendor claims of anywhere from 2:1 to 40:1 compression give little indication of the benefits any particular company can expect to see from the technology, although Melbourne-based Exinda Networks has tried to take the mystery out of the process by bundling its WAN optimisation appliances with application response measurement software that tracks efficiency improvements.
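
How much the ratio depends on the data is easy to see with a general-purpose compressor. The sketch below uses Python's standard `zlib` module as a stand-in (it is not the algorithm used by any product mentioned here), and the sample data and the `compression_ratio` helper are made up for illustration.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


# Highly repetitive, table-like data (made-up sample rows).
repetitive = b"customer_id,region,status\n" + b"1001,VIC,ACTIVE\n" * 5000

# Random bytes of the same length, standing in for already-compressed
# or encrypted content.
random_like = os.urandom(len(repetitive))

ratio_repetitive = compression_ratio(repetitive)
ratio_random = compression_ratio(random_like)
```

The repetitive sample shrinks many times over, while the random sample does not shrink at all. This is why a single headline ratio says little about what a particular data mix will achieve.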

Even as the data on disks is being compressed and deduped, some disk manufacturers are turning to conventional compression techniques to further reduce data storage usage. Quantum, for one, last year drew on its years of tape experience to produce its Optyon Inline Data Compression technology, which embodies the compression techniques that have long provided 2:1 and better compression of data on backup tapes.

Applied to disk-based backup appliances, Optyon is designed to reduce overall disk consumption in increasingly important disk-based backup and tiered storage environments.

"Compression is starting to become a real buzzword in the disk space," says Simon Tippett, senior system engineer with Quantum. "Tape-based systems have been keeping pace with the industry, but now that customers are putting staging areas in the middle layers - and disk was always going to be more expensive than tape - we're seeing compression used in there to reduce [storage consumption]."

A key feature of Optyon is its use of hardware-based compression and decompression, which provides the benefits of the compression algorithms without the performance hit that has put off many users from adopting compression in the past.

Although Optyon may have given Quantum the first-mover advantage after its launch last year, competitors like Neartek, ExaGrid Systems, Data Domain, and Indra Networks have all offered hardware-based compression appliances or add-ons. Major storage companies are likely to follow suit: Network Appliance, for one, announced in April that it will add hardware compression to its NearStore VTL virtual tape libraries by next year.

Brute-force compression isn't the only way that storage vendors are improving storage efficiency, however: Optyon will soon be bolstered by the fruits of Quantum's recent purchase of tape rival ADIC, which itself bought Adelaide-based compression company Rocksoft earlier this year. Rocksoft's core deduplication technology manages data using 'blocklets' that facilitate incremental data updates as well as more efficient use of space.
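
A rough sketch shows why block-level deduplication suits incremental updates. It is not Rocksoft's method: it assumes simple fixed-size 4 KB blocks identified by SHA-256 digests, and the `BlockStore` class is hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for illustration


def split_blocks(data: bytes, size: int = BLOCK_SIZE):
    """Cut data into consecutive fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockStore:
    """Hypothetical deduplicating store that keeps each unique block once."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def put(self, data: bytes):
        """Store data; return (recipe of digests, count of newly stored blocks)."""
        recipe, new_blocks = [], 0
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        return recipe, new_blocks

    def get(self, recipe):
        """Rebuild the original data from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)


store = BlockStore()

# Monday's backup: eight distinct blocks.
monday = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))
# Tuesday's backup: the same data with the fourth block overwritten.
tuesday = monday[:3 * BLOCK_SIZE] + b"\xff" * BLOCK_SIZE + monday[4 * BLOCK_SIZE:]

recipe_mon, new_mon = store.put(monday)   # stores 8 new blocks
recipe_tue, new_tue = store.put(tuesday)  # stores only the 1 changed block
```

The second backup adds a single block to the store, yet both versions can be rebuilt in full from their recipes. Fixed-size blocks break down when data is inserted rather than overwritten, which is one reason production systems tend to use more sophisticated chunking.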

Your new WAN view

Even as storage vendors work to integrate hardware-based compression into their products, other innovations are targeting different parts of the enterprise network to streamline the overall flow of data between sites.

A major element of this effort has been the use of WAFS (wide area file system), an alternative to CIFS that has been designed to provide LAN-speed data access over WANs by being less chatty and more efficient, over any type of connection. Instead of waiting for repeated acknowledgments before sending the next chunk of data, WAFS endpoints use adaptive techniques to push larger data chunks across the line using less frequent and more intelligent error checking.

The result, says McData chief operating officer Todd Oseth, is a major reduction in the time needed to access files over both wireless and fixed WAN connections. "The rate of data growth far outstretches the rate of technology improvement," Oseth explains.

"Somewhere between 150 and 160 operating system calls are normally required to open up a single file; multiply each of those by a 25 ms delay, and you find it takes an awful lot of time to open your file because the [protocols] make a lot of assumptions about where the data is residing. Today's file systems are just not designed for headers that are potentially five to six times the size of the data they're carrying."

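The arithmetic behind those figures is worth spelling out. The sketch below assumes each call waits for one full round trip before the next is issued, which is the simplest reading of the quote. The ten-round-trip case is a hypothetical comparison point, not a measured figure for any product.

```python
def serial_round_trip_delay(round_trips: int, delay_ms: float) -> float:
    """Total delay in seconds when each round trip must finish before the next starts."""
    return round_trips * delay_ms / 1000.0


# The range quoted above: 150 to 160 calls at 25 ms each.
low = serial_round_trip_delay(150, 25)    # 3.75 seconds
high = serial_round_trip_delay(160, 25)   # 4.0 seconds

# Hypothetical: a less chatty protocol that needs only 10 round trips.
batched = serial_round_trip_delay(10, 25)  # 0.25 seconds
```

Nothing in this calculation depends on link speed, which is why adding bandwidth alone does little for a chatty protocol.
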
McData is one of a number of companies that resell WAFS-based Wide-area Data Services (WDS) technology from Riverbed Technology (McData calls it Spectranet WAN Data Services Accelerator, part of its recently launched Remote Office Consolidation bundle).

As in HP's StorageWorks Enterprise File Services WAN Accelerator, also a Riverbed derivative, WDS is often paired with intelligent caching that offers deduplication of a sort by detecting retransmitted data in a data stream, then stripping out the duplicated data and sending a pointer to the file that the receiving node pulls from its own cache.
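
The following sketch illustrates the general pattern of sending pointers in place of data the far end already holds. It is not Riverbed's or HP's implementation: the `Sender` and `Receiver` classes, the 1 KB chunk size and the use of SHA-256 digests as pointers are all assumptions, and it presumes the two ends' caches never fall out of step.

```python
import hashlib

CHUNK = 1024  # assumed chunk size, for illustration only


class Sender:
    """Hypothetical sending appliance: remembers which chunks it has already sent."""

    def __init__(self):
        self.seen = set()

    def encode(self, data: bytes):
        messages = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                messages.append(("ref", digest))  # 32-byte pointer only
            else:
                self.seen.add(digest)
                messages.append(("data", chunk))  # full chunk, first time seen
        return messages


class Receiver:
    """Hypothetical receiving appliance: caches chunks and resolves pointers."""

    def __init__(self):
        self.cache = {}

    def receive(self, messages) -> bytes:
        out = []
        for kind, payload in messages:
            if kind == "data":
                self.cache[hashlib.sha256(payload).digest()] = payload
                out.append(payload)
            else:
                out.append(self.cache[payload])  # pull the chunk from the local cache
        return b"".join(out)


def wire_bytes(messages) -> int:
    """Payload bytes that actually cross the WAN."""
    return sum(len(payload) for _, payload in messages)


sender, receiver = Sender(), Receiver()
document = b"".join(bytes([i]) * CHUNK for i in range(8))  # eight distinct chunks

first = sender.encode(document)    # first transfer: all data
second = sender.encode(document)   # repeat transfer: all pointers

first_copy = receiver.receive(first)
second_copy = receiver.receive(second)
```

In this example the first transfer puts 8,192 payload bytes on the wire and the repeat transfer only 256, yet the receiver reconstructs the full file both times.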

"It's not just about blindly compressing everything," explains HP's StorageWorks product marketing manager Mark Nielsen. "This is about looking at bitstreams within files travelling across the WAN, and whether they need to travel the WAN again or not."

Another increasingly popular use for WAFS is in facilitating backup of notebook PCs: earlier this year, Riverbed rival Tacit Networks struck a path into this area by buying notebook synchronisation software vendor Mobiliti. Mobiliti's technology combines file compression, delta detection and network optimisation to facilitate the efficient transfer of data changes back to the data centre for backup.

Although they're proving highly effective for compressing data streams, WAFS-based devices are only one of many spokes in the WAN optimisation wheel. A variety of other point solutions - ranging from caching and compression to email protocol manipulation, route control, and quality-of-service bandwidth management - offer alternative ways of moving data from point A to point B more effectively.

In the long term, increasingly intelligent content-aware devices will learn to identify specific types of content streams and apply individual optimisation techniques for each one. Geoff Johnson, vice president of research for enterprise communications applications with Gartner, says such devices will be critical in the evolution of what he calls the 'application-fluent network'.

"The application-fluent network is a way of solving those pain points that are addressed by individual solutions," Johnson explains. "Multimedia's requirements are different from SAP, for example, and enterprises need a network that is optimised for particular applications. Everybody has a special solution for particular problems, but [customers] can't afford to deploy all of that branded equipment in their offices. Application optimisation and WAN optimisation will come together in a universal switch."

With 'WAN optimisation controllers' encapsulating the learning of the past few years, supported by equal advances in hardware-based storage compression, it won't be long before the technologies help customers rein in their storage growth - for a while at least. Continued growth has already been shown to be unavoidable, but by making a concerted effort to optimise storage and WAN infrastructures it's possible to make sure the glut of data doesn't become a liability.
