To optimise your web-server's performance, you first need to understand how the different subsystems work together. You must also be well acquainted with your site's content, because what it serves up determines to a large extent which subsystems most influence performance. Dynamic applications built on ISAPI or CGI, for example, can quickly saturate the server's CPU, while static content can tax the memory subsystem.
So what is the optimal setup? Your web-server must have enough processor power to drive the disk subsystem and effectively manage the cache. Again, what's sufficient depends on your site's content and traffic patterns. If you're just starting out, and your site has mostly static content, you can probably get by with a Pentium 166. But as the number of hits rises, or you start running CPU-intensive CGI apps, you'll need to add processors.
RAM also plays an important role; you'll need at least 64Mb. Most web-servers use part of this RAM to store previously requested files. As clients request pages, the server software first checks the cache; if the file's there, the server retrieves it rather than grabbing a fresh copy from the disk.
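The check-the-cache-first behaviour described above can be sketched in a few lines of Python. This is a generic illustration, not how IIS, NT, or any other product manages its cache: the `FileCache` class, its least-recently-used eviction policy, and the `read_from_disk` callback are all assumptions made for the example.

```python
from collections import OrderedDict


class FileCache:
    """Read-through cache of file contents with least-recently-used eviction."""

    def __init__(self, capacity_bytes, read_from_disk):
        self.capacity_bytes = capacity_bytes
        self.read_from_disk = read_from_disk  # hypothetical callable: path -> bytes
        self.entries = OrderedDict()          # path -> bytes, least recently used first
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0

    def get(self, path):
        # Serve from RAM when the file is already cached.
        if path in self.entries:
            self.hits += 1
            self.entries.move_to_end(path)    # mark as most recently used
            return self.entries[path]
        # Otherwise fetch from disk and try to keep a copy for next time.
        self.misses += 1
        data = self.read_from_disk(path)
        if len(data) <= self.capacity_bytes:  # files larger than the cache bypass it
            while self.used_bytes + len(data) > self.capacity_bytes:
                _, evicted = self.entries.popitem(last=False)
                self.used_bytes -= len(evicted)
            self.entries[path] = data
            self.used_bytes += len(data)
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# A toy "disk" holding two 600-byte files, served through a 1,000-byte cache.
disk = {"/index.html": b"x" * 600, "/logo.gif": b"y" * 600}
cache = FileCache(capacity_bytes=1000, read_from_disk=disk.__getitem__)
for path in ["/index.html", "/index.html", "/logo.gif", "/index.html"]:
    cache.get(path)
print(f"cache hit ratio: {cache.hit_ratio():.0%}")
```

Because the two files cannot both fit in the toy cache, each one evicts the other, which is the same effect that pushes requests to disk when a server has too little RAM for its content.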
How a server manages cache depends on its software. Microsoft's Internet Information Server (IIS) 4 uses the NT file system's cache-management functions. To increase the cache, you simply increase the RAM on the server. NT automatically adjusts the amount of cache available depending on use, ensuring the most requested files remain cached. Other web-server software, such as Netscape's Enterprise Server, allocates a user-configurable, fixed amount of RAM for cache and manages this itself. This isn't as efficient as using the NT file system, but it's likely the best approach if your web-server must run on a variety of platforms.
A fast, preferably SCSI, disk subsystem will also help raise and maintain the web-server's peak performance. This is particularly true for servers that don't have enough RAM to cache their static content or that run server software that doesn't use a cache, such as O'Reilly's WebSite. Even with sufficient RAM, the web-server might be forced to satisfy some requests from the disk - requests for files that are too large to physically reside in cache, for example.
How we tested
To understand how the subsystems, site size, and content types work together to affect server performance, Windows Sources turned to WebBench 2.0. The latest version of Ziff-Davis's web-server benchmark uses a workload based on the traffic of several popular websites that receive millions of hits a day, including ZDNet, Microsoft, USA Today, and the Internet Movie Database.
The testbed was a Compaq ProLiant 6000 server with four 200-MHz processors, 512Mb of RAM, two 10/100 NICs, and a disk subsystem with four 2GB SCSI drives configured at RAID Level 0 - comparable with a real-life web-server. We tested configurations ranging from 64Mb of RAM and one CPU to 512Mb and four CPUs, using NT's boot menu to start the server with the desired amount of RAM and number of processors. The test server ran IIS 4 under Windows NT 4 with Service Pack 3.
For our purposes, we used the WebBench 2.0 static and ISAPI test suites. The static suite requests stored files such as HTML text, GIF images, and binary executables. The ISAPI suite combines static requests with dynamic requests that generate Web pages on the fly.
The WebBench 2.0 static workload consists of 6,010 files of various sizes and types and consumes approximately 63Mb of disk space. The configuration of 64Mb of RAM and one processor performed poorly serving static content. Because the operating system and server software also need memory, 64Mb of RAM was not enough for IIS 4 to cache the entire workload, so the server had to satisfy a large number of requests from disk. The NT Performance Monitor indicated that this configuration never achieved more than a 64 per cent cache hit ratio. In other words, the server had to go to the disk subsystem for more than a third of all requests.
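To see why the cache hit ratio matters so much, it helps to turn it into an average cost per request. The sketch below uses the hit ratios from our tests (64 per cent, 77 per cent, and a fully cached workload), but the 0.5 ms and 10 ms service times are hypothetical round numbers chosen only for illustration; we did not measure them.

```python
def disk_fraction(hit_ratio):
    """Share of requests that miss the cache and must be read from disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return 1.0 - hit_ratio


def mean_service_ms(hit_ratio, ram_ms, disk_ms):
    """Average time per request, weighting RAM and disk service times by hit ratio."""
    return hit_ratio * ram_ms + disk_fraction(hit_ratio) * disk_ms


# Assumed costs (not measured): 0.5 ms to serve from cache, 10 ms from disk.
RAM_MS, DISK_MS = 0.5, 10.0
for ratio in (0.64, 0.77, 1.00):
    print(f"hit ratio {ratio:.0%}: {disk_fraction(ratio):.0%} of requests from disk, "
          f"{mean_service_ms(ratio, RAM_MS, DISK_MS):.2f} ms average")
```

Under these assumptions the disk term dominates the average until the hit ratio gets close to 100 per cent, which is consistent with the large gains we saw from adding RAM.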
If your website, like most, takes fewer than 100,000 hits per day, you might be able to live with this. But if your site generates hundreds or thousands of hits per second, you'll need to increase the rate at which your server delivers static data, which you can do in two ways: by increasing the amount of available cache or adding processors.
Which is the better option? With a purely static workload, the bigger payoff comes from increasing the cache. Ideally, you'd provide enough RAM to cache all static content. Practically, you can achieve good performance if the server has enough cache to accommodate the static content that users request 95 per cent of the time.
Increasing the RAM lets NT 4 dynamically allocate system resources to cache the static content. In turn, the server can satisfy more requests from RAM, reducing the load on the disk subsystem. On our tests, doubling the RAM from 64Mb to 128Mb on a one-processor server running NT 4 increased the cache hit rate to about 77 per cent and boosted the delivery of static content by about 90 per cent.
Compare this with the results we got when we held the RAM constant at 64Mb and added a processor. The extra processor increased performance by only about 50 per cent. It accomplished this by turning requests around faster and making more efficient use of the disk subsystem. However, the second processor didn't reduce the number of requests served from disk. With static content, increasing the web-server cache delivers a bigger payoff than adding processors.
What happens when IIS 4 has enough RAM to cache the entire static WebBench workload? We found that a system with a single processor and 256Mb of RAM satisfied all requests from cache. Taking the disk subsystem completely out of the picture increased the WebBench scores by approximately 375 per cent, to a sustained rate of over 1,000 requests per second. Not bad considering the price of RAM today.
In real life, it might not be practical or possible to provide enough RAM to cache all the static content on your site. However, by analysing the web-server's log files, you should be able to determine how much RAM is needed to cache the static content that accounts for 95 per cent of hits.
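One way to approach that log analysis is sketched below in Python. It assumes you have already extracted a (path, size in bytes) pair for each logged hit; log formats differ between servers, so the parsing step is left out. The function name `ram_for_coverage` and the sample data are hypothetical, and the result is a lower bound, since it ignores cache overhead and the memory the operating system and server need for themselves.

```python
from collections import Counter


def ram_for_coverage(requests, coverage=0.95):
    """Return (bytes, file_count) for the hottest files covering `coverage` of hits.

    `requests` is an iterable of (path, size_in_bytes) pairs, one per logged hit.
    """
    hits = Counter()
    sizes = {}
    for path, size in requests:
        hits[path] += 1
        sizes[path] = size                  # keep the most recently logged size
    total_hits = sum(hits.values())
    if total_hits == 0:
        return 0, 0

    needed_hits = coverage * total_hits
    covered = 0
    cache_bytes = 0
    file_count = 0
    for path, count in hits.most_common():  # most requested files first
        if covered >= needed_hits:
            break
        covered += count
        cache_bytes += sizes[path]
        file_count += 1
    return cache_bytes, file_count


# Made-up sample: a popular page, a shared image, and a rarely fetched download.
sample = ([("/index.html", 4_000)] * 90
          + [("/logo.gif", 2_000)] * 8
          + [("/big.zip", 5_000_000)] * 2)
cache_bytes, file_count = ram_for_coverage(sample)
print(f"{file_count} files, {cache_bytes} bytes cover 95 per cent of hits")
```

In this sample the large, rarely requested download falls outside the 95 per cent set, so caching it would cost a great deal of RAM for very little gain.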
RAM speeds static Web content
The static portion of the WebBench 2.0 test simulates the typical static-content workload of a medium-to-large website. When there isn't sufficient RAM to cache the workload, the server is forced to turn to the disk subsystem, which slows things down. If you add enough RAM to cache the workload, you can take the disk subsystem out of the picture completely, and your server can achieve optimal performance. But once there's enough RAM to accommodate the entire static workload, upping the amount of memory won't speed server performance. At this point, adding processors is your best bet.
Most websites contain more than static content, of course. Many web-servers provide APIs that let you create apps that live on the server and extend its basic functions. To measure how dynamic content affects a web-server's performance, we used the WebBench ISAPI test suite on a server with only 64Mb of RAM and a single processor. Under this configuration, the ISAPI test scores were roughly 30 per cent higher than the static test results, because ISAPI requests dynamically create HTML content for clients without requiring disk access.
As we increased the RAM, the ISAPI results improved significantly. Eventually, though, we reached a point at which the CPU overhead inflicted by the ISAPI application outweighed any performance benefit derived from taking some of the load from the disk.
Although not as CPU-intensive as a CGI app, the ISAPI test takes its toll on the server CPU. Adding RAM over and above what it takes to cache the static content won't improve performance once the processor is saturated; you must then add CPUs. When we installed a second processor, test scores jumped nearly 70 per cent; they rose another 30 per cent after we added two more.
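Successive percentage gains compound, so it is worth working out what the two steps add up to. The short Python sketch below assumes the "another 30 per cent" is measured against the two-processor score and treats "nearly 70 per cent" as 70; both are simplifying assumptions, so read the result as approximate.

```python
def compound(gains_percent):
    """Combine successive percentage gains into one overall multiplier."""
    factor = 1.0
    for gain in gains_percent:
        factor *= 1.0 + gain / 100.0
    return factor


speedup_4cpu = compound([70, 30])  # one -> two CPUs, then two -> four CPUs
efficiency = speedup_4cpu / 4      # share of ideal four-fold scaling
print(f"{speedup_4cpu:.2f}x overall, {efficiency:.0%} of linear scaling")
```

On these figures, four processors deliver a little over twice the single-processor ISAPI score, so each added CPU contributes less than the one before it.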
If your site combines static content and CGI applications, adding RAM or doubling the size of your disk array might not help much. That's because even a few CGI requests can quickly saturate a CPU. Most sites with a significant number of CGI requests set up a separate server dedicated to this task. In this case, adding CPUs is your best option for improving performance.
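The guidance so far boils down to a simple rule of thumb, sketched here in Python. The function `next_upgrade` is hypothetical. The 95 per cent cache target comes from the discussion above, while the 90 per cent CPU-saturation threshold is an assumption for illustration, not a figure from our tests. Both inputs would come from a monitoring tool such as NT Performance Monitor.

```python
def next_upgrade(cache_hit_ratio, cpu_utilisation,
                 target_hit_ratio=0.95, cpu_saturated=0.90):
    """Suggest the next hardware upgrade from two monitored ratios (0.0 to 1.0)."""
    # A saturated processor limits throughput no matter how much is cached.
    if cpu_utilisation >= cpu_saturated:
        return "add CPUs"
    # With CPU headroom, a low hit ratio means too many requests go to disk.
    if cache_hit_ratio < target_hit_ratio:
        return "add RAM"
    return "no upgrade indicated"


print(next_upgrade(cache_hit_ratio=0.64, cpu_utilisation=0.55))  # static site, small cache
print(next_upgrade(cache_hit_ratio=0.99, cpu_utilisation=0.97))  # heavy dynamic content
```

Treat the thresholds as starting points to tune against your own traffic rather than fixed limits.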
Dynamic content saps the CPU
The WebBench scores for dynamic ISAPI content topped those for static content when we tested on a server configured with 64MB of RAM and a single processor. Creating dynamic content is generally more CPU-intensive than serving static content, but once the static data exhausted the limited cache, the web-server was forced to turn to the disk subsystem, and that generated greater overhead. As we added RAM to the server, the static results improved, because the server had to rely less and less on the disk subsystem. Once there was enough RAM to cache the entire static-content workload, the ISAPI scores could not catch up, because of the CPU strain involved in generating dynamic content. When you reach this point, start adding CPUs to the web-server to improve its performance.
Chris Lemmons is the development leader for ZDBOp's WebBench and NetBench products.
Compiled by Clive Morris