Keeping it up all hours

Times were when keeping the system up from nine till five was an achievement. Now it's expected 24 hours a day. Adding just one extra nine can make all the difference.

The issue of systems availability has become increasingly important over the last few years with the advent of the global economy, ever more competitive markets and the move to e-business.

For many organisations, it is no longer enough to operate on a nine-to-five basis; the imperative now is to be open for business 24 hours a day, seven days a week.

For all companies, however, system downtime costs money. At best, this might mean staff are not able to do their jobs, but at worst, it can lead to a loss of revenue if customers take their business elsewhere.

So measures need to be taken to ensure that systems can be kept running should a problem occur, although such measures clearly need to be proportionate to the impact that any failure is likely to have on the business.

Nigel Wilson, business development manager at IBM business partner, Syan, explained: "High availability allows you to retain access to your critical data and applications even during system or system component failure, and it covers both planned and unplanned downtime."

Different things
"But when looking at what strategy to employ with regard to high availability, you have to consider how long you can afford any given application to be offline and what it will mean to the business if it goes down. Business critical means different things to different companies, so it's important to think about what it means to you," he said.

Planned downtime, Wilson added, means taking a system offline to undertake such activities as software upgrades, while unplanned downtime can be brought about by anything ranging from the hard disk crashing to a lightning strike at the workplace, which causes a power outage.

However, according to Wilson, organisations have two main ways of handling these events. They can, for one, design and build their own high availability infrastructure using products such as Lakeview Technology's MIMIX Managed Availability Solution Suite, which Syan resells.

This replicates and synchronises data and applications running on one IBM iSeries (formerly known as AS/400) to a second one, so that it can take over operations in the event of a failure.
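The idea behind this kind of replication can be illustrated in a few lines. This is a deliberately simplified toy sketch of journal-based replication in general, not a description of how MIMIX itself is built: the primary machine records every change in a journal, and the standby replays those entries so its copy of the data stays in step.

```python
# Toy sketch of journal-based replication (an illustrative simplification,
# not the actual MIMIX design): the primary logs each change, and the
# standby replays the log to keep a synchronised copy.

class Node:
    def __init__(self):
        self.data = {}
        self.journal = []                  # ordered log of (key, value) changes

    def write(self, key, value):
        self.data[key] = value
        self.journal.append((key, value))  # record the change for replication

    def replay(self, entries):
        for key, value in entries:         # apply the primary's changes in order
            self.data[key] = value

primary, standby = Node(), Node()
primary.write("order-42", "shipped")
standby.replay(primary.journal)            # synchronise the standby

# On failover, the standby already holds the replicated state
assert standby.data == primary.data
```

In a real product the journal is shipped over the network continuously and the standby applies entries as they arrive, so the window of data lost on failover is kept to seconds.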

Another option, claimed Wilson, is to subscribe to disaster recovery (DR) services, which are provided by organisations such as Syan, GuardianIT, IBM Global Services and Sema and offer high availability as part of their overall offering.

John Kersley, Sema's general manager of global recovery services, advises that if a company decides to go down the DR route, the level of service they require is likely to depend on how much risk they are able to tolerate.

"Options range from doing nothing after you've considered actively that it doesn't matter if your systems go down, although that's pretty unlikely, through to recovery in minutes if that's a cost worth bearing to continue the business," he said.

But Kersley explained there is a "risk/reward and spend/reward ratio" involved in deciding what to go for, and the cost of services increases directly in line with the scale and speed of recovery.

This means that fast recovery times cost more than slower ones, and it is also more expensive to keep one mainframe and 100 PCs operational than to recover 10 PCs.

David French, channel account manager at Legato Systems, warns, however, that not all organisations are ready to trust a third party to host their data and, as a result, prefer to manage it in-house.

And Legato sells a raft of high availability software to help IT managers do just that. This includes NetWorker, its flagship backup and recovery software; Automated Availability Manager, which monitors and ensures application availability; and Co-StandbyServer, which handles failover for Windows NT and 2000 machines.

But French advises: "There's no one size fits all and most vendors just provide one or two pieces of the puzzle. High availability should be a consultancy-based sell really because it's a very fragmented market, although it doesn't tend to be."

Chris Boorman, European marketing director at Veritas Software, a rival of Legato, put figures on what each extra nine means. "If you aspire to 99% availability, that is the equivalent of 3.5 days of downtime per year. But 99.9% is 8.76 hours, 99.99% is 52 minutes, and 99.999% is five minutes per year," he said.
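Boorman's figures follow from simple arithmetic: yearly downtime is just the unavailable fraction multiplied by the length of a year. A minimal sketch (assuming a 365-day year of 8,760 hours):

```python
# Downtime per year implied by an availability percentage,
# assuming a 365-day year (525,600 minutes).

def downtime_per_year(availability_pct):
    """Return yearly downtime in minutes for a given availability %."""
    minutes_in_year = 365 * 24 * 60  # 525,600 minutes
    return (1 - availability_pct / 100) * minutes_in_year

for pct in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_per_year(pct)
    if minutes >= 60:
        print(f"{pct}% availability -> {minutes / 60:.1f} hours of downtime/year")
    else:
        print(f"{pct}% availability -> {minutes:.1f} minutes of downtime/year")
```

Each added nine cuts the downtime budget by a factor of ten, which is why the cost of achieving it rises so steeply.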

The answer
"If you're aspiring to the concept of 99.999%, you need to have answers for local problems such as disk array or CPU failures, logical problems such as bugs and viruses, and site problems such as earthquakes," he added.

But the key thing, Kusnetzky believes, is for IT managers to clarify what they are trying to accomplish, examine where the pitfalls might be to achieving this, and to build their infrastructure accordingly.

"Think of an artist painting a picture. The industry provides a whole palette of colours, but two different artists can take them and paint pictures that are wildly different but achieve the same goals. Each layer of the architecture can be designed to improve availability and layer upon layer of hardware and software can be invoked, but it depends on what's appropriate," he continued.

Colin Grocock, IBM's eServer business development director, agrees. "To design a modern ebusiness infrastructure, you have to work out what can go wrong, determine how to work round failures and how to recover from situations. To get to very high levels of availability, you need to duplicate and replicate everything," he said.

But it is also vital for IT managers to think about how they develop applications from the outset, to ensure they are written from the ground up to be as available, and to have as much redundancy, as possible, Grocock added.

People problems
Mark Lewis, Sun Microsystems' UK and Ireland product marketing manager for datacentre servers, on the other hand, claims that only 20% of system downtime is due to product failure, and the rest is down to problems with people and processes.

As a result, Sun provides professional services to help customers understand what procedures they need to put in place and what training their staff are likely to need.

"It's alright spending £1000s on different systems, but if you don't have enough people trained up to use them and one of them pulls out the wrong plug or goes off sick, the company is exposed," Lewis explained.

He also recommends that, on the process side, customers set up what is known as a runbook on their corporate intranet. This is a database that provides in-house staff or contractors with information on approved procedures should something go wrong, along with contact details of relevant personnel and suppliers.
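At its simplest, a runbook of the kind Lewis describes is a lookup from incident type to approved steps and contacts. A minimal sketch, in which the field names and sample entries are illustrative assumptions rather than any vendor's actual format:

```python
# Toy runbook: maps an incident type to its approved recovery procedure
# and the people to contact. Entries here are illustrative placeholders.

runbook = {
    "db-server-failure": {
        "procedure": [
            "Confirm the failure on the monitoring console",
            "Fail over to the standby server",
            "Notify the on-call DBA",
        ],
        "contacts": {
            "on-call DBA": "+44 20 0000 0000",        # placeholder number
            "hardware supplier": "support@example.com",  # placeholder address
        },
    },
}

def lookup(incident):
    """Return the approved steps for an incident type, or None if unlisted."""
    entry = runbook.get(incident)
    return entry["procedure"] if entry else None

steps = lookup("db-server-failure")
print(steps[0])
```

Publishing this on the intranet means whoever is on shift follows the same vetted procedure, rather than improvising, which addresses exactly the people-and-process failures Lewis highlights.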

But in the end, the key thing to remember when dealing with the thorny issue of high availability is that it is necessary to understand how system downtime is likely to affect the business and to choose products and services accordingly.
