In-depth: Optimising data centre operations (Next-generation enterprise IT)


In fat times or thin, the datacentre manager always seems to get it in the neck. Last year, unparalleled growth meant that companies could not fit enough equipment into their buildings to support their business expansion and struggled to power it. Consequently, datacentres were told to become more efficient. Post credit-crunch, IT budgets will come under great strain as IT spending drops. Companies will be trying to save money on power and equipment, and the IT department, as ever, will be under scrutiny. In the data centre, the belt-tightening will continue. How can the datacentre manager placate the decision-makers by optimising operations?

Danny Bradbury investigates the issues in the first of our series of articles exploring next-generation IT, produced in association with IBM. There is also an accompanying podcast on optimising data centre ops.

Damian Milkins, CEO of datacentre services firm Control Circle, says an audit is the place to start. Unless you know what you are currently using in terms of CPU cycles, power consumption and space, you will not know how much you are likely to save, or where the improvements can be made. "We run things such as PlateSpin to look at servers and the performance of virtualisation platforms," he says. "We also watch the power usage on the datacentre floor, to ensure that we have an even spread without any particular hotspots. It is a combination of third-party tools and scripts that we have created."

PlateSpin Recon, each instance of which can manage 2,000 servers, will gather data using metrics such as CPU, storage and memory use, compiling it into reports to help datacentre managers identify areas for improvement. The tool, which can also take data inputs from tools such as HP Openview and Microsoft Operations Manager, provides datacentre baselining and scenario modelling functions, enabling managers to see at which point they are likely to run out of power capacity.
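The kind of scenario modelling described above can be illustrated with a rough projection of when power demand will reach the facility's capacity. This is a minimal sketch and not how PlateSpin Recon works: the function name, the compound monthly growth model and the figures are all assumptions made for illustration.

```python
def months_until_power_exhausted(current_kw, capacity_kw, monthly_growth, max_months=600):
    """Return the first month in which projected load reaches capacity.

    Assumes load grows by a fixed fraction each month (hypothetical model).
    Returns None if capacity is not reached within max_months.
    """
    load = current_kw
    for month in range(max_months + 1):
        if load >= capacity_kw:
            return month
        load *= 1 + monthly_growth
    return None


# Hypothetical baseline: 300 kW drawn today, 500 kW available, 2% growth a month.
months_left = months_until_power_exhausted(300, 500, 0.02)
```

With these assumed figures the facility has 26 months of headroom. Feeding in the audited baseline and a few alternative growth rates gives a quick view of how sensitive that date is.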

Power consumption monitoring is an important part of such an audit. Several vendors now offer tools that go beyond conventional rack power monitoring systems which used a needle to visually display power usage. IBM offers its Active Energy Manager, which measures the energy being used at a processor level. The system, available as an extension to IBM Director, enables regular power reporting and also lets administrators set power usage caps on particular servers. There are other alternatives. LS Simple tackles this at the rack level by installing hardware sensors that feed information through to its own reporting tool via the network, and Avocent provides power distribution units (PDUs) that can be controlled and monitored remotely to report power usage at the port level.

The problem with power management tools is that there are no standards to help datacentre managers handle heterogeneous environments. IBM says it can assess power usage in other equipment at the rack level by taking input from its PDU+ power distribution unit, but proper inter-vendor standards would enable a single management system to control the amount of power used by a server, disc or switch, regardless of vendor. Still, vendor-specific power management is a start.

Understanding power consumption becomes an important part of chargeback mechanisms, which in turn help to optimise datacentre usage by making business departments accountable. Even in a colocated facility, chargeback systems can be used to regulate power usage and expenditure. Brian Fry is vice-president at Canadian datacentre provider RackForce, which is building a 150,000 square foot facility in Kelowna, Canada. He says that if he puts water cooling technologies in place he will be able to squeeze a thousand watts per square foot into the facility. He will use a combination of power and space usage to help drive down customers' footprints.

"We want the customer to take as little space as possible, so the more racks they take the more it will cost them. But they also pay by the volt amp that travels to the kit," he says. "It is like a green tax that is done for the right reasons." Customers might decide to move to newer, more efficient equipment to gain a longer-term benefit in their colocation contract.
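A chargeback model of this shape, with one fee for space and another for power drawn, can be sketched in a few lines. The rates and customer figures below are invented for illustration and are not RackForce's pricing.

```python
def monthly_charge(racks, avg_volt_amps, rack_rate=800.0, va_rate=0.25):
    """Monthly colocation charge: a fee per rack plus a fee per volt-amp drawn.

    rack_rate and va_rate are hypothetical prices, not a real tariff.
    """
    if racks < 0 or avg_volt_amps < 0:
        raise ValueError("racks and avg_volt_amps must be non-negative")
    return racks * rack_rate + avg_volt_amps * va_rate


# Hypothetical customer before and after moving to denser, more efficient kit.
old_bill = monthly_charge(racks=4, avg_volt_amps=16000)
new_bill = monthly_charge(racks=2, avg_volt_amps=10000)
monthly_saving = old_bill - new_bill
```

Under these assumed rates the bill falls from 7,200 to 4,100 a month, which is the incentive to shrink both footprint and power draw that Fry describes.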

A big part of those efficiency savings will inevitably come down to consolidation, which in turn means virtualisation. This approach carries interesting ramifications. Firstly, you cannot assume that your cost savings will be linear. "While you are getting more virtual servers on a single device, you will find that the devices get bigger in terms of size and power demands, simply because they are having to support so much more," says Gary Boyd, datacentre manager at Rackspace. "You may be able to cut out six or seven other servers, but you will often find that the power demand for the new server will be more than it was for one of the original servers."
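Boyd's point about non-linear savings is easy to check with some arithmetic. The sketch below compares the saving you might naively expect from removing servers with the saving left once the larger host's power draw is counted. The wattages are assumptions chosen for illustration, not measured figures.

```python
def consolidation_saving(old_servers, old_watts_each, host_watts):
    """Return (watts saved, fraction of original power saved) for one new host."""
    before = old_servers * old_watts_each
    saved = before - host_watts
    return saved, saved / before


# Hypothetical case: seven 300 W servers replaced by one 750 W virtualisation host.
watts_saved, fraction_saved = consolidation_saving(7, 300, 750)

# The naive expectation: removing six of seven boxes saves six-sevenths of the power.
naive_fraction = (7 - 1) / 7
```

Here the real saving is 1,350 W, or about 64% of the original load, against a naive expectation of about 86%. The gap widens as the host gets bigger.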

That is particularly true if you do it sensibly and buy a fault-tolerant system. It is a brave datacentre manager who puts 100 VMs on a server that does not have dual power supplies and duplicate processors and memory. Stratus Technologies will sell you a box built on relatively cheap Wintel hardware, but it will still set you back 1.5-2.5 times the cost of a regular Wintel server. All of that must be factored into the budget when you are dealing with datacentre optimisation ROI.

However, the cost of managing the environment also needs to be addressed. Even when the large numbers of x86 servers people bought over the last decade are virtualised, they must still be managed. Colin Bannister, vice-president of technical sales at Computer Associates, argues that before datacentre managers get into issues like chargebacks for business departments, they need to handle these servers properly. "How do you provision the servers? It is the manpower needed to provision services in a virtual world that is the issue. It is done mostly manually at the moment," he says. We are starting to see tools to help automate such systems. CA's Virtualisation Automation Manager is designed to provision or deprovision virtual servers based on events or performance data.

But optimising equipment is only part of the battle. The facility (including cooling and other equipment) uses a significant proportion of the whole IT department's energy. How can this be reduced?

Those mechanical and electrical (M&E) systems are all designed around basic parameters such as ideal datacentre temperatures and humidity levels, explains Rackspace's Boyd, but in many cases, the capacity of servers deployed at the start is much less than the datacentre's total capacity. "So you are burning power that you do not need to burn. You can allow the temperature to rise." Being able to turn off a whole computer room air-conditioning (CRAC) unit could drastically reduce your power costs from cooling.

Getting a good picture of temperature and humidity in the datacentre is a big part of making those decisions. Using tools from firms such as SynapSense can help you to visually understand temperature, pressure and humidity information throughout the datacentre, enabling you to identify hotspots and cool them down. The company hooks up wireless sensors to parts of the datacentre and uses the information in its visual analysis tool.
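Once sensor readings are available, picking out hotspots is a simple filtering exercise. The following is a minimal sketch, not SynapSense's method: the rack names, the readings and the 27°C inlet limit are all assumptions, and the limit should be replaced with whatever your own equipment and facility guidelines specify.

```python
def find_hotspots(readings, limit_c=27.0):
    """Return (location, temperature) pairs above limit_c, hottest first.

    readings maps a sensor location to an inlet temperature in Celsius.
    limit_c is an assumed threshold, not a vendor recommendation.
    """
    hot = [(location, temp) for location, temp in readings.items() if temp > limit_c]
    return sorted(hot, key=lambda item: item[1], reverse=True)


# Hypothetical readings from four rack inlet sensors.
readings = {"rack-A1": 22.5, "rack-A2": 29.0, "rack-B1": 24.0, "rack-B2": 31.5}
hotspots = find_hotspots(readings)
```

With this sample data the function flags rack-B2 and rack-A2, in that order. The remaining racks are running cool enough that there may be room to raise the set point.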

So, you have looked at virtualisation, rigged up your CRAC units to talk to your systems management software, and persuaded the boss to let you turn the middle management's floor into a modular datacentre, after he fires them all next month. Good for you. But did you remember to turn all your servers off, and sort out your patch cabinet?

"Both Cassatt and VMware have software products to analyse what is going on and turn off servers that are not needed, such as development servers, in the middle of the night," says Paul McGuckin, research vice-president at Gartner. "This can represent a tremendous energy saving. Even though servers today are much better than they used to be, even at a baseline level they are using hundreds of watts of electricity that could be saved if they were turned off."
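A back-of-the-envelope calculation shows how quickly those baseline watts add up. The server count, idle draw and hours below are assumptions for illustration only. Substitute figures from your own audit.

```python
def annual_kwh_saved(servers, idle_watts, hours_off_per_day, days_per_year=365):
    """Energy saved per year, in kWh, by powering idle servers off on a schedule."""
    watt_hours = servers * idle_watts * hours_off_per_day * days_per_year
    return watt_hours / 1000


# Hypothetical: 20 development servers idling at 200 W, switched off 10 hours a night.
saved_kwh = annual_kwh_saved(servers=20, idle_watts=200, hours_off_per_day=10)
```

Under these assumptions the saving is 14,600 kWh a year before counting the cooling load that the idle servers would otherwise have generated.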

Optimising the datacentre is not something you can do with one action. It is a multi-faceted operation, encompassing changes to your facility, equipment and management methodology. Done correctly, however, it could reduce the company's overall expenditure on IT. In an increasingly cash-constrained world, that could be even more valuable tomorrow than yesterday.


Top tips for datacentre optimisation

There are a number of ways to optimise your datacentre. Here are some distinct pointers.

Modular datacentres

Datacentres used to be built from the ground up in one go, at new sites. That meant expensive facilities that would often be underused for long periods until they filled up. Modular datacentre systems (IBM sells one) can be bolted together inside existing buildings, making it easier to map capacity to requirements and reduce both capex (building provision) and opex (energy costs).

Free air cooling

Use air from outside to help cool your datacentre environment by passing it through a heat exchanger. This can cut cooling costs.


Location

Free air cooling is just one reason why you should consider the location of the data centre when optimising operations. Cooler climes mean more available free air cooling. Property costs are another factor.

Hot and cold aisles

Using ceilings and floors for aisles can help to direct cooling air to specific areas, cutting down on the amount of cooling needed.


Virtualisation

Put as many servers as possible in one box to increase power efficiency, but do not forget to allow for fault tolerance in your physical box, and for management software to handle server provisioning.

Power measurement and control

You cannot begin to make improvements until you understand how much power you are using. Install monitoring systems at the rack, and take advantage of vendor-specific power management systems to help control server energy usage.


