White Paper: S3TC compression technology

Various types of texture compression techniques have been employed over the years to increase performance. Now, a new technology from graphics chipset manufacturer S3 Inc. offers both increased performance and improved image quality.

S3 Inc. is one of the major manufacturers of graphics chipsets. Its current Savage3D and Savage4 chipsets provide good performance at an enticing price for the cost-conscious OEM market. Of all the features incorporated in its graphics chipsets, one technology stands out. This is a texture compression technique called S3TC, now part of Microsoft's DirectX 6 application programming interface (API). This technology has several benefits for PC users.

Texture maps are bitmap images that are applied to 3D objects. They are used to add realistic surface detail without increasing the complexity of the geometry in a 3D scene. Textures can be anything from wood grain or marble patterns to complex pictures of people, buildings and trees. To simulate real-life scenes, it is desirable to have access to a large number of detailed textures. However, this places significant demands on system or graphics memory (depending on where the textures are stored), forcing application developers to use fewer and less detailed textures in order to match a limited amount of memory storage and bandwidth.

The Accelerated Graphics Port (AGP) has made it possible to access textures directly from system memory, increasing the overall storage available. However, AGP and the system memory interface are shared resources. Besides textures, AGP is also used for passing geometry data, while system memory is used to store and run the operating system and applications. So it should not be assumed that all system memory bandwidth will be available for reading texture data. Given this, making the most of the available bandwidth is critical to a graphics subsystem optimised for AGP texturing.

S3TC texture compression helps in these two main areas by allowing more detailed textures to be stored in the same memory area and, at the same time, by significantly decreasing the bandwidth required to access them. The obvious benefit that texture compression provides is that a given amount of texture data can be stored using significantly less memory. This is most critical when texturing out of the local frame buffer. Equally important is the fact that the memory and bus bandwidth required to read textures is greatly reduced, translating to much improved performance over AGP.

System memory and AGP bus bandwidths are finite resources. With compression, the accelerator can take better advantage of these resources. Texture compression can allow for larger textures. While smaller, less detailed texture maps typically result in surfaces that look blurry or blocky, larger textures let the application provide more surface detail. Texture compression can also allow for a greater variety of textures to be used at any given time, permitting more varied scenes.

When texturing out of local frame buffer memory, compression may free up enough memory to increase the display resolution or to perform triple buffering. A higher resolution display provides a smoother, more detailed look, while triple buffering can improve performance by allowing the rendering engine to start on a new scene without waiting for the display's vertical sync. The use of triple buffering can result in a significant increase in frame rate (typically 30 per cent).

The extra memory available with compressed textures allows for the use of mip-maps, even though they add to the storage required (roughly 30 per cent, approaching one third for a full chain) over the base texture map level. Mip-maps help to reduce aliasing artifacts visible on textured surfaces that span significant distances. Without mip-maps, a pixel on an object far away may be associated with several texels from the original texture map.
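The storage cost of a mip-map chain can be checked with a short calculation. The sketch below is illustrative only: it assumes a square, power-of-two texture whose levels halve in each dimension down to 1x1, and the function name is made up for this example.

```python
def mip_chain_texels(size):
    """Return (base_texels, extra_texels) for a square texture of side `size`.

    Assumes `size` is a power of two and that each mip level halves the
    side length until a 1x1 level is reached.
    """
    base = size * size
    extra = 0
    level = size // 2
    while level >= 1:
        extra += level * level  # texels in this smaller level
        level //= 2
    return base, extra


base, extra = mip_chain_texels(256)
print(f"mip-map overhead: {extra / base:.1%}")  # prints "mip-map overhead: 33.3%"
```

Under these assumptions, each level holds a quarter of the texels of the one above it, so the extra storage stays just under one third of the base level however large the texture is.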

Low-pass filtering is needed to combine the information from those texels without introducing unwanted artifacts (shimmering, crawling pixels). Real-time filtering is expensive, so staged mip-map levels are used instead. These are pre-computed, filtered copies of the texture, and they dramatically lower the filtering complexity.

The use of mip-maps can also improve performance. Mip-maps help keep memory accesses sequential and allow longer bursts. Otherwise, as the object moves farther away, the texture is sampled more sparsely, causing memory accesses to become more random and increasing the likelihood of shorter (less efficient) bursts and page break penalties.

The S3 texture compression scheme (S3TC) was developed specifically for texture maps. Textures are compressed to a fixed size equal to four bits per texel for opaque textures (or textures with simple transparency effects) or eight bits per texel for complex transparent textures. The quality of the textures, even after compression, is very good.

S3TC breaks a texture map into 4x4 blocks of texels. For opaque texture maps, each of these texels is represented by two bits in a bitmap, for a total of 32 bits. In addition to the bitmap, each block also has two representative 16-bit colours in RGB565 format associated with it.
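Because every block has the same size, the storage needed for a compressed texture is easy to estimate. The sketch below is a rough illustration, not part of any S3 or DirectX API: the function name is made up, and rounding partial blocks up to a whole 4x4 block is an assumption. The complex-transparency option reflects the eight bits per texel quoted for such textures.

```python
def s3tc_size_bytes(width, height, complex_alpha=False):
    """Estimate the compressed size of a texture in bytes.

    Each 4x4 block stores a 32-bit index bitmap plus two 16-bit colours
    (64 bits). Complex transparency doubles this to 128 bits per block.
    Rounding partial blocks up to a whole block is an assumption.
    """
    blocks_wide = (width + 3) // 4
    blocks_high = (height + 3) // 4
    bits_per_block = 32 + 2 * 16
    if complex_alpha:
        bits_per_block += 64
    return blocks_wide * blocks_high * bits_per_block // 8


compressed = s3tc_size_bytes(256, 256)
uncompressed_16bit = 256 * 256 * 2
print(compressed, uncompressed_16bit // compressed)  # prints "32768 4"
```

For a 256x256 opaque texture this gives 32KB, a quarter of the size of the same texture stored as uncompressed 16-bit colour.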

These two explicitly encoded colours, plus two additional colours derived by uniformly interpolating between them, form a four-colour lookup table. This lookup table is used to determine the actual colour of any texel in the block. In total, the 16 texels are encoded using 64 bits, an average of four bits per texel. Simple transparency (a single bit or colour key) is handled by reserving one of the four table entries to mark a texel as transparent; in that case the third colour is simply the average of the two explicitly encoded colours.

The order of the two encoded colours determines whether the block is completely opaque or contains transparent texels. In a block with transparent texels, the fourth two-bit code (11) indicates a transparent texel, while the remaining derived value is the average of the two encoded colour values.
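The construction of the four-colour lookup table can be sketched as follows. This is a minimal illustration rather than S3's implementation. It assumes the convention used by the DXT1 format in DirectX, where a first colour numerically greater than the second marks an opaque block; the function names, the expansion of RGB565 to 8 bits per channel and the integer rounding are choices made for this example.

```python
def rgb565_to_rgb888(c):
    """Expand a 16-bit RGB565 value to an (r, g, b) tuple of 8-bit channels."""
    r = (c >> 11) & 0x1F
    g = (c >> 5) & 0x3F
    b = c & 0x1F
    # Replicate the high bits into the low bits so full scale maps to 255.
    return ((r << 3) | (r >> 2), (g << 2) | (g >> 4), (b << 3) | (b >> 2))


def build_lookup_table(color0, color1):
    """Return the four table entries; None marks a transparent texel."""
    c0 = rgb565_to_rgb888(color0)
    c1 = rgb565_to_rgb888(color1)
    if color0 > color1:
        # Opaque block (assumed ordering): two colours interpolated
        # uniformly, one third and two thirds of the way from c0 to c1.
        c2 = tuple((2 * a + b) // 3 for a, b in zip(c0, c1))
        c3 = tuple((a + 2 * b) // 3 for a, b in zip(c0, c1))
    else:
        # Block with transparency: entry 10 is the average of the two
        # encoded colours and entry 11 is reserved for transparent texels.
        c2 = tuple((a + b) // 2 for a, b in zip(c0, c1))
        c3 = None
    return [c0, c1, c2, c3]


print(build_lookup_table(0xF800, 0x001F))  # red first: opaque block
print(build_lookup_table(0x001F, 0xF800))  # blue first: last entry is None
```

Swapping the two stored colours is all it takes to switch a block between the two modes, so no extra bits are spent on signalling.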

S3TC also provides an additional 64 bits per block to encode more sophisticated transparency effects if desired. Decoding blocks compressed in S3TC format is straightforward. A two-bit index is assigned to each of the 16 texels. A four-colour lookup table is then used to determine which 16-bit colour value should be used for each texel. The decoder requires relatively little logic, which can be operated at very high speeds and replicated to allow parallel decoding for very high performance solutions. The simplicity of the implementation should result in its universal adoption throughout the graphics industry.
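The index lookup at the heart of the decoder can be sketched in a few lines. The byte layout used here is an assumption taken from the DXT1 format in DirectX: two little-endian 16-bit colours followed by a 32-bit bitmap, with the first texel in the lowest two bits. The function names are illustrative, and the lookup table is passed in as a ready-made list of four entries so that the example concentrates on the indexing step.

```python
import struct


def unpack_block(block):
    """Split an 8-byte opaque block into its two colours and 16 indices."""
    if len(block) != 8:
        raise ValueError("an opaque S3TC block is 8 bytes long")
    # Assumed layout: colour 0, colour 1, then the 32-bit index bitmap.
    color0, color1, bitmap = struct.unpack("<HHI", block)
    indices = [(bitmap >> (2 * i)) & 0b11 for i in range(16)]
    return color0, color1, indices


def decode_block(block, lookup_table):
    """Return 4 rows of 4 entries picked from a four-entry lookup table."""
    _, _, indices = unpack_block(block)
    texels = [lookup_table[i] for i in indices]
    return [texels[row * 4:row * 4 + 4] for row in range(4)]


# Stand-in table entries, used only to make the lookup visible.
table = ["c0", "c1", "c2", "c3"]
block = struct.pack("<HHI", 0xF800, 0x001F, 0b11100100)
for row in decode_block(block, table):
    print(row)
```

In this example the first row uses all four table entries in order and the remaining rows use entry 0 throughout. Each texel needs only a shift, a mask and a table read, which is why the decoder is cheap to replicate in hardware.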

While other compression techniques exist, many "simple" schemes with inexpensive decoders have inferior quality or a smaller compression factor, or both. Vector quantisation (VQ) techniques produce inferior quality and, because they rely on a code book, the decoder needs to make two memory accesses to decode each texel block. The first access reads the block code, which is the index into the code book. The second uses that index to look up the texel values for the block in the code book.
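The two dependent reads can be made visible with a toy model. Everything in the sketch below is hypothetical: the `CountingMemory` class simply counts reads, and the code book holds made-up 2x2 blocks rather than data from any real VQ format.

```python
class CountingMemory:
    """List-like store that counts how many reads are made from it."""

    def __init__(self, data):
        self._data = list(data)
        self.reads = 0

    def __getitem__(self, index):
        self.reads += 1
        return self._data[index]


def vq_decode_block(block_codes, code_book, block_number):
    """Decode one block with two reads, the second depending on the first."""
    code = block_codes[block_number]  # first read: the block code
    return code_book[code]            # second read: the code book entry


# Made-up data: three blocks that refer to a two-entry code book.
block_codes = CountingMemory([1, 0, 1])
code_book = CountingMemory([("dark",) * 4, ("light",) * 4])

texels = vq_decode_block(block_codes, code_book, 0)
print(texels, block_codes.reads + code_book.reads)
```

The second read cannot be issued until the first has returned, so the two latencies add up rather than overlap.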

The code book could theoretically be stored on-chip to avoid the second memory reference but, since this would be costly, it is not done. Plus, even if the code book were stored on-chip, it would need to be loaded and that would have a significant performance impact as well. For these reasons, VQ compression is not only much slower than S3TC, but - equally important - the quality is worse.

Palettising is one well-known form of vector quantisation. Palettised texture image quality suffers when a large variety of colours is used. While a palettised image is limited to 256 colours for the whole image, S3TC does not impose a limit on the overall number of colours available. Palettised textures also require that a new palette be downloaded for each new texture.

Other "standard" compression techniques like JPEG (TREC) are expensive to implement, and many images will still look better with S3TC. The cost is not just the large amount of logic required to do the decoding. Because of the large latency incurred in decoding a block of TREC or JPEG texture, a significant amount of latency-compensating buffering is required as well, further increasing the cost and complexity of the logic around the decoder. Also, DCT-based compression introduces low frequency artifacts - ringing or blocky artifacts - that are not easily removed with standard texture filtering algorithms like trilinear filtering. Texture images with smooth gradients will even look better with S3TC than with uncompressed 16-bit RGB565.

S3TC is the basis for the compressed texture formats used in DirectX 6. Microsoft has licensed S3TC from S3 and is making it available to both the ISV and IHV communities through the DirectX 6 API.

Software developers can expect a broad level of support for texture maps compressed in this format. Coupled with Microsoft's support for this standard is the fact that it is also easy to build into any graphics hardware. So, expect it to be universally available before too long.

Shipping with compressed textures should also ease any storage problems on the CD-ROM or diskettes used to ship the application. To give an idea of how S3TC can help to improve the quality of 3D applications, in an 8MB frame buffer, an application running at 800x600x16 double-buffered with a 16-bit z-buffer will have 5.25MB of memory left over for texture storage. If S3TC is used, those 5.25MB will be able to store the equivalent of 31.5MB of texture. Furthermore, you could switch to triple buffering and still have the equivalent of 26MB of texture storage. Or you could quadruple the resolution of all the textures, convert them all to mip-maps and still have the equivalent of 3.5MB of texture storage left over.
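The frame buffer figures can be reproduced with a few lines of arithmetic. The sketch below rests on two assumptions that are not stated in the text: a megabyte is taken as 1,048,576 bytes, and the 6:1 ratio between the quoted figures is read as 24-bit source texels against four bits per compressed texel.

```python
MB = 1024 * 1024  # assumed: binary megabytes


def buffer_bytes(width, height, bits_per_pixel):
    """Size in bytes of one full-screen buffer."""
    return width * height * bits_per_pixel // 8


frame_buffer = 8 * MB
colour = buffer_bytes(800, 600, 16)  # one colour buffer
depth = buffer_bytes(800, 600, 16)   # the 16-bit z-buffer

double_free = frame_buffer - (2 * colour + depth)
triple_free = frame_buffer - (3 * colour + depth)

RATIO = 6  # assumed: 24-bit texels compressed to 4 bits per texel

print(f"free, double-buffered:       {double_free / MB:.2f}")
print(f"equivalent texture storage:  {double_free * RATIO / MB:.1f}")
print(f"equivalent, triple-buffered: {triple_free * RATIO / MB:.1f}")

# Quadruple the original texture data, then add one third for mip-maps.
bigger = double_free * 4 * 4 / 3
print(f"left over after upgrade:     {(double_free * RATIO - bigger) / MB:.1f}")
```

Under these assumptions the results come out at about 5.25, 31.5, 26.0 and 3.5 megabytes, matching the figures quoted in the text.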

With S3TC compressed textures, applications can use a broader variety of textures, use higher resolution textures, increase performance by cutting the read bandwidth required, increase performance by switching to triple buffering and mip-mapping, or use a combination of all of these. This will increase the realism of computer generated 3D scenes significantly.

Compiled by Ajith Ram

© 1999 Microsoft
