Enterprises are discovering new uses for wide area networks, such as voice over IP (VoIP), datacentre consolidation, and back-ups of distributed data files to a central, safe, auditable location.
But with greater versatility come far greater demands on Wan bandwidth and, although its cost is decreasing, it is not going to disappear. The key, therefore, is to make the most of available bandwidth, and compression is one of the techniques enterprises can use.
The main types of data compression are lossy compression, lossless standards-based compression and lossless advanced compression.
In lossy compression, some data is lost in the attempt to decrease file size, but it is hoped that the loss of data is not significant.
Lossy compression is used for audio and video, where data loss leads to a slightly noisier sound or a coarser image, which is acceptable because of the massive decrease in file size. JPeg, MPeg, MP3 and MPeg-2 are examples of lossy compression formats.
Files that have been compressed with a lossy compression algorithm can usually be recompressed to decrease the size of the file, and some commercial products provide that capability through on-the-fly recompression of images on a website, for example.
However, it is important to be aware of the contents of the files that are being recompressed. For example, it might not be a good idea to increase the compression (and therefore lose important detail) in an x-ray image being used for diagnosis.
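The principle can be sketched in a few lines. This is an illustrative toy, not a real codec: it "compresses" 16-bit audio samples by keeping only their top eight bits, halving the data but discarding detail that no decompressor can recover.

```python
# Toy sketch of lossy compression: quantise 16-bit samples to 8 bits.
# The data shrinks by half, but the low-order bits are lost for good.

def lossy_compress(samples):
    """Keep only the top 8 bits of each 16-bit sample."""
    return [s >> 8 for s in samples]

def decompress(compressed):
    """Restore the 16-bit range; the discarded detail stays lost."""
    return [s << 8 for s in compressed]

original = [1000, 1023, 32767, 40000]
restored = decompress(lossy_compress(original))
print(original)   # [1000, 1023, 32767, 40000]
print(restored)   # [768, 768, 32512, 39936]
```

Note that 1000 and 1023 both come back as 768: once quantised, distinct inputs become indistinguishable, which is exactly why recompressing a diagnostic x-ray is risky.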
As its name implies, lossless compression reduces the size of the data flow without losing any information. The flow delivered to the recipient is absolutely identical to the flow sent by the original sender.
Standardised lossless compression algorithms include deflate/GZip, Gif, PNG and Van Jacobson's TCP/IP header compression.
Deflate is a very common option and the decompressor is embedded in all the major browsers. (It is the actual algorithm that is known as "deflate," while GZip is a common implementation and associated file format.)
Some suppliers enable GZip compression on data flowing from the browser to the server. This is particularly useful if the user is transmitting long data fields to the server or if very large cookies are present, as is sometimes the case with single sign-on technologies.
GZip compression is also commonly used to compress and transport files from one computer to another. It is available for non-browser site-to-user data flows, although that sometimes requires a browser plug-in.
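The lossless round trip is easy to demonstrate with Python's standard zlib module, which implements the deflate algorithm (the gzip module wraps the same algorithm in the GZip file format). The payload below is a made-up example of the kind of repetitive web data deflate handles well.

```python
# Deflate round trip with Python's zlib (zlib implements deflate;
# the gzip module adds the GZip container around the same algorithm).
import zlib

payload = b"session_token=abcdef;" * 200  # repetitive, web-like data
compressed = zlib.compress(payload, level=9)
restored = zlib.decompress(compressed)

assert restored == payload  # lossless: output is byte-identical
print(len(payload), len(compressed))  # compressed size is a tiny fraction
```

The assertion is the definition of lossless compression in code: the decompressed flow is absolutely identical to what the sender produced.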
Compression software inside the individual user's client can see the data in cleartext, without any of the encryption that a virtual private network might impose. (Encrypted data cannot be compressed because it appears to be random.)
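The point about encryption can be shown directly: deflate relies on finding repeated patterns, and random bytes (a stand-in here for ciphertext) contain none, so compressing them actually adds a few bytes of overhead.

```python
# Why encrypted data cannot be compressed: it looks random, and
# deflate finds nothing repeated to exploit.
import os
import zlib

cleartext = b"GET /index.html HTTP/1.1\r\n" * 100
random_like = os.urandom(len(cleartext))  # stand-in for ciphertext

print(len(cleartext), len(zlib.compress(cleartext)))      # big win
print(len(random_like), len(zlib.compress(random_like)))  # slightly larger
```

This is why compression must happen before encryption: the client-side compressor sees cleartext, while a device downstream of a VPN tunnel sees only incompressible noise.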
Van Jacobson's TCP/IP header compression (RFC 1144) is an entirely different compression method in which intimate knowledge of the data is used in the compression algorithm. A combined TCP/IP header needs 40 bytes, uncompressed, but Jacobson found a way to transport all of the information in just three to five bytes.
The initial headers that establish the TCP/IP connection are sent uncompressed, but subsequent headers make very few changes from these. It is therefore necessary to send only the few bits that change from packet to packet, so the header size can be reduced.
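The idea of sending only what changed can be sketched as delta encoding. This is a simplified illustration of the principle, not the actual RFC 1144 bit-level encoding, and the field names are invented for the example.

```python
# Toy sketch of the idea behind Van Jacobson header compression:
# after the first packet, transmit only the header fields that changed.
# (Illustrative only; the real RFC 1144 encoding is bit-packed.)

def delta_encode(prev_header, header):
    """Return only the fields that differ from the previous header."""
    return {k: v for k, v in header.items() if prev_header.get(k) != v}

def delta_decode(prev_header, delta):
    """Rebuild the full header from the previous one plus the delta."""
    header = dict(prev_header)
    header.update(delta)
    return header

first = {"src": "10.0.0.1", "dst": "10.0.0.2", "seq": 1000, "ack": 1}
second = {"src": "10.0.0.1", "dst": "10.0.0.2", "seq": 2460, "ack": 1}

delta = delta_encode(first, second)
print(delta)  # {'seq': 2460} -- only the changed field crosses the link
assert delta_decode(first, delta) == second
```

Because addresses, ports and most flags are constant for the life of a connection, the delta is tiny, which is how 40 bytes of header shrink to three to five.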
This type of compression is usually performed within applications, because they have the necessary intimate knowledge of the data, but with comparable knowledge it is also possible to implement header compression on the communications link itself.
Lossless advanced compression
Some products can provide much higher compression, but they require the installation of a special appliance or software at either the user's location or at some intermediate site close to the user, where the decompression can be performed.
These advanced compression products provide value by:
- Increasing the compression and decompression speed, thereby allowing use on high-speed Wan links of more than 155Mbps.
- Increasing the amount of compression by using specialised algorithms, some of which can identify the type of file being compressed and then custom-fit the algorithm to the file (for example, using different compression algorithms for image files, for text files, for TCP/IP headers, and for VoIP packets).
- Creating and maintaining large dynamic compression dictionaries of repeated data sequences at both ends of the link and storing those sequences over a very long period of time (days, weeks, or more, instead of just the few moments stored in a normal algorithm). This "stateful compression" or "data reduction," usually combined with the ability to process high-bandwidth links at full speed, can result in extremely high compression ratios.
- Performing all of this work while introducing only minimal latency (typically under two milliseconds end-to-end for compression and decompression together).
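A small-scale analogue of those shared dictionaries exists in Python's zlib: both ends can hold the same preset dictionary, so sequences already known to the far side are referenced rather than retransmitted. The dictionary contents below are an invented example; real products build and age their dictionaries dynamically from live traffic.

```python
# Shared-dictionary compression in miniature: both ends hold the same
# preset dictionary, so known sequences become short back-references
# instead of literal bytes. (zlib exposes this via the zdict argument.)
import zlib

shared_dict = b"Content-Type: text/html\r\nHost: example.com\r\n"

def compress_with_dict(data):
    c = zlib.compressobj(zdict=shared_dict)
    return c.compress(data) + c.flush()

def decompress_with_dict(data):
    d = zlib.decompressobj(zdict=shared_dict)
    return d.decompress(data) + d.flush()

msg = b"Host: example.com\r\nContent-Type: text/html\r\n"
packed = compress_with_dict(msg)
assert decompress_with_dict(packed) == msg
print(len(msg), len(packed))  # dictionary hits shrink the output
```

The advanced appliances take the same idea much further: dictionaries built from days or weeks of observed traffic, maintained in lockstep at both ends of the link.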
The algorithms used by advanced compression products are very different from one another. They are completely proprietary and are not standardised.
For example, these products yield different results depending on the data being compressed and on the number of times that identical data passes over the Wan link.
Eric Siegel is a senior analyst at Burton Group