Green Storage and the challenges of data growth

Whatever their long-term view ought to be, many people see the premium on green goods as discretionary, and among the first things to be cut when times get tough. To take just one example, UK organic food sales are estimated to have slumped by a third in the last six months. But for datacentres, green could be the colour of survival.

Green storage is not just a tick in the box on the way to meeting corporate social responsibility targets. The many different technologies that fall under the green storage heading can all be used to rein in the datacentre's fastest-growing overheads, and can help postpone the expansion or relocation of datacentre premises. It may even be the only approach that will enable companies to keep up with the ever-swelling flood of data, and thereby stay in business.

Last year Gartner warned that by the end of 2008, nearly 50% of datacentres worldwide would be facing increasing difficulties in finding the electricity they needed to power - and cool - their equipment. Even if they can find the power, organisations which currently spend 4-8% of their IT budgets on energy may find their costs rising as much as fourfold within the next five years.

Gartner research vice-president Rakesh Kumar says that although the threat still looms, people are finding ways of working around it, such as virtualisation, running datacentres at higher temperatures, and using outside air for cooling. "Many of the datacentres in use today were built 10 or 15 years ago and are not appropriate for current requirements. People are building new sites, or going to hosting players - demand for hosting is exceeding supply in major cities in Europe and the US."

Gartner's figures are quoted in a 2007 report to Congress on datacentre energy consumption by the US Environmental Protection Agency (EPA), which adds that over the next five years, power failures and limits on power availability will halt operations at more than 90% of datacentres.

The EPA also reports that 40 out of 100 datacentre operators said they had run out of space, power or cooling capacity without sufficient notice. The likelihood is that it is data storage, as much as processing, that has brought them to the brink of meltdown. Consumers uploading videos, businesses storing multiple instances of the same data for different applications and for different levels of backup, and regulatory requirements such as Sarbanes-Oxley in the USA, are just a few of the sources of the surge. Some commentators suggest our data storage requirements could soon be doubling every year. Among soberer estimates, IDC puts the increase at 60% a year and rising.

According to IDC's March 2008 white paper "Driving Reduced Cost and Increased Return from the Data Center" (prepared on behalf of Fujitsu-Siemens), electricity currently accounts for 8.4% of datacentre running costs. IDC says the same amount of energy is consumed to supply power and cooling as to run IT equipment. "This means each watt saved in terms of IT equipment counts double because one watt is also saved in terms of the infrastructure." Other estimates say overall consumption is around 2.5 times direct consumption by the equipment.
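IDC's "counts double" claim is simple arithmetic once you model total facility draw as IT load multiplied by an overhead factor for power distribution and cooling. The function and figures below are an illustrative back-of-envelope sketch, not IDC's methodology:

```python
def facility_draw_kw(it_load_kw, overhead_factor=2.0):
    """Total datacentre draw for a given IT load.

    overhead_factor=2.0 reflects IDC's estimate that power and cooling
    consume as much energy as the IT equipment itself; other estimates
    put the factor nearer 2.5.
    """
    return it_load_kw * overhead_factor

# Trimming the IT load by 1 kW at an overhead factor of 2.0
# reduces total facility draw by 2 kW - each watt counts double.
saving = facility_draw_kw(100) - facility_draw_kw(99)
print(saving)  # 2.0
```

At the higher 2.5x estimate, the same 1 kW of IT savings would be worth 2.5 kW at the meter.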

Most attention up to now has focused on servers, through improved energy efficiency and through virtualisation. But various industry estimates suggest we are coming to the limits of the gains that server virtualisation can make.

According to the Storage Networking Industry Association's Green Storage tutorial, the proportion of datacentre power and cooling costs associated with storage can vary from less than 10% to more than 40%. The SNIA's rule of thumb is 60% for servers, and 20% each for networking and storage.

But that 20% rule of thumb should be regarded as a starting point only. Storage is not seeing the energy efficiencies which are being achieved with other technologies. The EPA's report to Congress says that while power consumption by networking equipment and servers grew 14% and 13% respectively between 2000 and 2006, storage was up by 20%.

However, the SNIA says there is "no consistent definition of storage". If, as the old industry adage goes, you cannot manage what you cannot measure, you are in an even tougher position if you cannot even fully define what you are trying to bring under control. The SNIA is working to establish metrics for storage energy consumption and management. The US Green Grid is working on a broader set of metrics for the whole datacentre.

Once upon a time, when people were cheap and hardware was expensive, managing disk to get the highest rates of utilisation and performance was a prized art. But with storage getting cheaper by half every year, according to Clod Barrera, chief technical strategist at IBM Storage, people "stopped managing storage with real science and real tools".

Instead, storage planning became a matter of deploying more than you thought you would need, so you did not get caught out. Now, Barrera says, even if you can afford the capital expense of more disk, you cannot afford the power to run it, so "there is a significant value-add for doing a better management job".

If we could parachute in teams of hardy veterans from mainframe days, we could perhaps learn to do more with what we have already got. But these are the people likeliest to have been squeezed out in the 1990s because their skills were seen as obsolete and too expensive, their frugality and obsession with efficiency and "elegance" too narrow for an age of growth without limits.

Information Lifecycle Management is the modern equivalent of those old skills. As IDC defines it, ILM "aligns requirements from business processes (expressed in service level objectives) with the possibilities offered by the available storage resources in an optimized process. Policies control the dynamic processes if, for example, the value of the information objects changes over time".

Mission-critical data can be stored on high-performance Serial Attached SCSI (Sas) disks, the 3Gb/sec replacement for Ultra SCSI introduced in 2004. Recently introduced 2.5-inch Sas disks claim 50% power and cooling savings over their 3.5-inch predecessors, as well as taking up less space. Or you could simply replace old hard drives: new drives offering several times the capacity will also be more energy efficient.

Less critical "nearline" data can be stored on "cheap and power efficient" (IDC) Serial Advanced Technology Attachment (Sata) disks, which are used in Massive Arrays of Idle Disks (Maid) systems. Disks are only "spun-up" when the data on them needs to be accessed. Maid is designed for maximum utilisation rates of 25% - but this is at the top end of the range for all arrays, even those in online use. Maid is designed for persistent or "write once, read occasionally" data, as opposed to transactional data which is updated and accessed frequently.
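The Maid principle - keep disks spun down until data on them is actually requested, then spin them down again after a period of inactivity - can be sketched in a few lines. This toy model uses illustrative names and timings, not any vendor's actual firmware logic:

```python
import time

class MaidDisk:
    """Toy model of a Maid array member: spun down until accessed,
    spun back down after an idle timeout."""

    def __init__(self, idle_timeout=300):
        self.idle_timeout = idle_timeout  # seconds of idleness before spin-down
        self.spinning = False
        self.last_access = 0.0

    def read(self, now=None):
        now = time.time() if now is None else now
        if not self.spinning:
            self.spinning = True  # spin-up latency is paid here
        self.last_access = now

    def tick(self, now=None):
        """Called periodically: spin down if idle for too long."""
        now = time.time() if now is None else now
        if self.spinning and now - self.last_access > self.idle_timeout:
            self.spinning = False

disk = MaidDisk(idle_timeout=300)
disk.read(now=0)      # an access spins the disk up
disk.tick(now=200)    # still within the timeout: keeps spinning
disk.tick(now=600)    # idle past the timeout: spins down
print(disk.spinning)  # False
```

The trade-off is visible in the model: "write once, read occasionally" data rarely pays the spin-up latency, while transactional data would keep the disks spinning constantly and erase the power savings.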

For archiving, tape consumes a fraction of the power of disk - supplier and analyst claims range between 20 and 100 times more energy efficient. Tape once played a major role in backup and recovery perhaps it is time to rethink how much data needs to be on "hot" standby on disk for the business to function, and how much could be recovered with less urgency from tape. Users' and customers' performance expectations may have to be managed downwards if the recessionary climate deepens, these expectations will probably fall anyway.

Storage (or Hierarchical) Resource Management tools can help by analysing usage patterns, and identifying under-used capacity and data that could be moved to near or offline storage. SRM is also used to predict future capacity needs. An international working group is defining standards for SRM web services.

Setting ILM policies is not as easy as it used to be. How do you define mission-critical in these days of outward-facing, entertainment-based services? Will you lose customers - or eyeballs for the ads that provide your revenue - if people have to wait too long for the skateboarding kitten video?

And how many copies of the skateboarding kitten video are you storing? Data de-duplication can help. De-duplication specialist Diligent Technology defines it as "a method for finding and eliminating redundant data from the network and/or storage infrastructure". A single instance of data appears to multiple applications as multiple "nominal" instances.

In its white paper Guidelines for the Evaluation of Enterprise De-Duplication Solutions, Diligent explains, "the de-duplication ratio is the ratio of nominal data to the physical storage used... A 10:1 ratio means that 10 times more nominal data is managed by the system than the physical space required to store it." Some suppliers use voodoo calculations to claim ratios of up to 500:1. Diligent says 25:1 is more realistic. Diligent was acquired by IBM earlier this year; its engine is used by HDS, Sun and Overland Data.
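Diligent's definition can be made concrete with a few lines of code. This sketch uses content hashing to detect redundant blocks - a common de-duplication technique, though not necessarily the one any particular vendor's engine uses:

```python
import hashlib

def dedup_ratio(blocks):
    """Ratio of nominal data to physical storage, assuming each unique
    block is stored once and duplicates are replaced by references.
    Duplicate detection here is by content hash."""
    nominal = sum(len(b) for b in blocks)
    unique = {hashlib.sha256(b).digest(): len(b) for b in blocks}
    physical = sum(unique.values())
    return nominal / physical

# Ten identical 4KB blocks de-duplicate to a single stored copy: 10:1.
blocks = [b"x" * 4096] * 10
print(dedup_ratio(blocks))  # 10.0
```

The ratio achieved in practice depends entirely on how repetitive the data is - which is why multiple generations of full backups de-duplicate spectacularly, while unique data yields a ratio of 1:1.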

De-duplication can be oversold as a green technology. As IBM Senior Storage consultant Tony Pearson points out on his blog, a de-duplicated disk is greener than a non-de-duplicated disk, "but not as green as storing the data on non-de-duplicated tape".

Storage virtualisation is another approach. Gartner's Kumar says there is a lot of misinformation about virtualisation: the benefits only come when you decommission the machines identified as under-used. "It is good operational housekeeping, but it is not going to make an enormous difference to power consumption." As for using the redundant machines to cope with future demand, he points out that you will lose out on the substantial energy efficiency advances of newer technology.

"Thin provisioning" specialist 3Par claims virtualisation requires "just one terabyte of capacity for every 2.5 terabytes required with traditional storage arrays", and can cut storage carbon footprints by 60%. Earlier this year 3Par announced that its customers are responsible for "estimated worldwide combined annualized energy savings of approximately £3.6m, equivalent to eliminating approximately 48,000 metric tons of CO2 and 250,000 kilowatt-hours of electricity."
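3Par's two headline numbers are consistent with each other, as a quick check shows - 1 terabyte of thin-provisioned capacity replacing 2.5 terabytes of traditional capacity is exactly a 60% reduction:

```python
def capacity_saving(thin_tb, traditional_tb):
    """Fractional capacity saving (and, roughly, the power and carbon
    saving that follows) from thin provisioning versus fully
    pre-allocated traditional arrays."""
    return 1 - thin_tb / traditional_tb

# 3Par's claim: 1TB thin-provisioned per 2.5TB traditional.
print(round(capacity_saving(1, 2.5) * 100))  # 60 - the quoted 60% cut
```

The saving comes from allocating physical disk only as data is actually written, rather than reserving the full volume size up front.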

These are rare hard figures in the quest for green storage. Without standardised metrics, datacentre managers who are struggling to balance demands for growth and performance with energy efficiency are at the mercy of supplier claims.

And unfortunately the benefits of green technology have been over-hyped to the point where, according to a survey by Overland Data, UK buyers are "incredibly sceptical" about these claims. Overland also found only 22% of UK storage buyers put "green" in their top three criteria, compared to three-quarters of buyers in France and Germany. But a hard-headed, business-based approach to cost reduction through energy efficiency will reach the same goal as an overtly green strategy; it may even get there more quickly.
