Time was when keeping the system up from nine till five was an achievement. Now it's expected
24 hours a day. Adding just one extra nine can make all the difference.
The issue of systems availability has become increasingly important over the last few years with
the advent of the global economy, ever more competitive markets and the move to e-business.
For many organisations, it is no longer enough to operate on a nine-to-five basis: the
imperative now is to be open for business 24 hours a day, seven days a week.
For all companies, however, system downtime costs money. At best, this might mean staff are not
able to do their jobs, but at worst, it can lead to a loss of revenue if customers take their
business elsewhere.
So measures need to be taken to ensure that systems can be kept running should a problem occur,
although such measures clearly need to be proportionate to the impact that any failure is likely to
have on the business.
Nigel Wilson, business development manager at IBM business partner Syan, explained: "High
availability allows you to retain access to your critical data and applications even during system
or system component failure, and it covers both planned and unplanned downtime."
Different things
"But when looking at what strategy to employ with regard to high availability, you have to
consider how long you can afford any given application to be offline and what it will mean to the
business if it goes down. Business critical means different things to different companies, so it's
important to think about what it means to you," he said.
Planned downtime, Wilson added, means taking a system offline for activities such as software
upgrades, while unplanned downtime can be caused by anything from a hard disk crash to a
lightning strike at the workplace that causes a power outage.
According to Wilson, organisations have two main ways of handling these events. The first is to
design and build their own high availability infrastructure using products such as
Lakeview Technology's MIMIX Managed Availability Solution Suite, which Syan resells.
This replicates and synchronises data and applications running on one IBM iSeries (formerly known
as AS/400) to a second one, so that it can take over operations in the event of a failure.
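The general idea of replicating to a second system that takes over on failure can be sketched in a few lines of Python. This is only an illustration of the principle, not a description of how MIMIX or any other product works: the `Node` and `ReplicatedPair` classes are hypothetical, replication is modelled as a simple synchronous copy, and failure is just a flag.

```python
class Node:
    """A hypothetical server holding application data in memory."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.healthy = True


class ReplicatedPair:
    """A primary and a standby that are kept in step with each other."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def active(self):
        # Prefer the primary; fall back to the standby if the primary is down.
        if self.primary.healthy:
            return self.primary
        if self.standby.healthy:
            return self.standby
        raise RuntimeError("both systems are unavailable")

    def write(self, key, value):
        # Apply the change on the active node, then copy it to the other
        # node if that node is up (assumed synchronous replication).
        active = self.active()
        active.data[key] = value
        other = self.standby if active is self.primary else self.primary
        if other.healthy:
            other.data[key] = value

    def read(self, key):
        return self.active().data[key]


pair = ReplicatedPair(Node("primary"), Node("standby"))
pair.write("order-1001", "confirmed")
pair.primary.healthy = False       # simulate a failure of the primary
print(pair.active().name, pair.read("order-1001"))
```

Because every write has already been copied across, the standby can answer requests as soon as the primary is marked as failed.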
Another option, claimed Wilson, is to subscribe to disaster recovery (DR) services from
providers such as Syan, GuardianIT, IBM Global Services and Sema, which offer high
availability as part of their overall package.
John Kersley, Sema's general manager of global recovery services, advises that if a company decides
to go down the DR route, the level of service it requires is likely to depend on how much risk
it is able to tolerate.
"Options range from doing nothing after you've considered actively that it doesn't matter if your
systems go down, although that's pretty unlikely, through to recovery in minutes if that's a cost
worth bearing to continue the business," he said.
But Kersley explained there is a "risk/reward and spend/reward ratio" involved in deciding what to
go for, and the cost of services increases directly in line with the scale and speed of
recovery.
This means that fast recovery times cost more than slower ones, and that it is more expensive to
keep one mainframe and 100 PCs operational than 10 PCs.
David French, channel account manager at Legato Systems, warns, however, that some organisations
are not ready to trust a third party to host their data and, as a result, prefer to manage it
inhouse.
And Legato sells a raft of high availability software to help IT managers do just that. These
include Networker, its flagship backup and recovery software, Automated Availability Manager, which
monitors and ensures application availability, and Co-Standby Server, which undertakes failover for
Windows NT and 2000 machines.
But French advises: "There's no one size fits all and most vendors just provide one or two pieces
of the puzzle. High availability should be a consultancy-based sell really because it's a very
fragmented market, although it doesn't tend to be."
Chris Boorman, European marketing director at Veritas Software, a rival of Legato, explained why
each extra nine of availability is significant. "If you aspire to 99% availability, that is the equivalent of 3.5
days of downtime per year. But, 99.999% is 8.75 hours, 99.9999% is 52 minutes, and 99.99999% is
five minutes per year," he said.
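The downtime implied by an availability target is simple arithmetic: multiply the hours in a year by the fraction of time the system is allowed to be down. The short Python sketch below works this through, assuming a 365-day year (8,760 hours) and continuous 24-hour operation; the function names are illustrative only. By this arithmetic, 99% comes to about 3.65 days a year, and figures of roughly 8.76 hours, 52.6 minutes and 5.3 minutes correspond to 99.9%, 99.99% and 99.999% respectively.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def downtime_hours_per_year(availability_percent):
    """Hours of downtime per year permitted by an availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def describe(hours):
    """Express a duration given in hours in the most readable unit."""
    if hours >= 24:
        return f"{hours / 24:.2f} days"
    if hours >= 1:
        return f"{hours:.2f} hours"
    if hours * 60 >= 1:
        return f"{hours * 60:.1f} minutes"
    return f"{hours * 3600:.1f} seconds"


for target in (99.0, 99.9, 99.99, 99.999, 99.9999, 99.99999):
    print(f"{target}% -> {describe(downtime_hours_per_year(target))} per year")
```

Each extra nine cuts the permitted downtime by a factor of ten, which is why moving up a single step can change the infrastructure required.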
The answer
"If you're aspiring to the concept of 99.99999%, you need to have the answer to how to deal
with local problems such as disk array or CPU failures, and logical problems such as bugs, viruses,
or site problems such as earthquakes," he added.
But the key thing, Kusnetzky believes, is for IT managers to clarify what they are trying to
accomplish, examine where the pitfalls in achieving it might lie, and build their
infrastructure accordingly.
"Think of an artist painting a picture. The industry provides a whole palette of colours, but two
different artists can take them and paint pictures that are wildly different but achieve the same
goals. Each layer of the architecture can be designed to improve availability and layer upon layer
of hardware and software can be invoked, but it depends on what's appropriate," he continued.
Recovery
Colin Grocock, IBM's eServer business development director, agrees. "To design a modern
e-business infrastructure, you have to work out what can go wrong, determine how to work round
failures and how to recover from situations. To get to very high levels of availability, you need
to duplicate and replicate everything," he said.
But it is also vital for IT managers to think about how they develop applications from the outset,
so that they are written from the ground up to be as available, and to have as much redundancy,
as possible, Grocock added.
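The effect of duplicating components can be estimated with standard reliability arithmetic. The sketch below is a simplification that assumes failures are independent of one another, which real systems sharing power, networks or software bugs often are not; the function names and sample figures are illustrative.

```python
from math import prod


def parallel(availability, copies):
    """Availability of redundant copies where any one copy can carry the load.

    The group is down only when every copy is down at the same time.
    """
    return 1 - (1 - availability) ** copies


def series(*availabilities):
    """Availability of a chain of layers that must all be up at once."""
    return prod(availabilities)


single = 0.99                              # one server at 99%
print(f"{parallel(single, 2):.4%}")        # two such servers side by side
print(f"{series(0.999, 0.999, 0.999):.4%}")  # three dependent layers at 99.9%
```

Under these assumptions, duplication adds nines quickly, while every extra layer that the application depends on takes some away, which is one reason redundancy is usually considered layer by layer.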
People problems
Mark Lewis, Sun Microsystems' UK and Ireland product marketing manager for datacentre servers,
on the other hand, claims that only 20% of system downtime is due to product failure, and the rest
is down to problems with people and processes.
As a result, Sun provides professional services to help customers understand what procedures they
need to put in place and what training their staff are likely to need.
"It's alright spending £1000s on different systems, but if you don't have enough people trained up
to use them and one of them pulls out the wrong plug or goes off sick, the company is exposed,"
Lewis explained.
He also recommends that, on the process side, customers set up what is known as a runbook on their
corporate intranet. This is a database that gives inhouse staff or contractors information on
approved procedures should something go wrong, along with contact details of relevant personnel
and suppliers.
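As a rough illustration of what such a runbook might hold, the Python sketch below models an entry as an incident type, an ordered list of approved steps and a list of contacts. The class names, fields and sample data are all hypothetical and are not taken from any Sun product or service.

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    role: str
    phone: str


@dataclass
class RunbookEntry:
    incident: str                      # e.g. "disk failure"
    steps: list                        # approved procedure, in order
    contacts: list = field(default_factory=list)


class Runbook:
    """A minimal in-memory stand-in for an intranet runbook database."""

    def __init__(self):
        self._entries = {}

    def add(self, entry):
        self._entries[entry.incident.lower()] = entry

    def lookup(self, incident):
        # Returns None when no approved procedure has been recorded.
        return self._entries.get(incident.lower())


runbook = Runbook()
runbook.add(RunbookEntry(
    incident="Disk failure",
    steps=["Fail over to the standby server",
           "Log the fault with the hardware supplier",
           "Restore the replaced disk from backup"],
    contacts=[Contact("A. Example", "Duty operations manager", "extension 0000")],
))

entry = runbook.lookup("disk failure")
for number, step in enumerate(entry.steps, start=1):
    print(number, step)
```

Keeping procedures and contacts together in one record means whoever is on duty can act without needing to track down the person who normally handles the problem.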
But in the end, the key thing to remember when dealing with the thorny issue of high availability
is that it is necessary to understand how system downtime is likely to affect the business and to
choose products and services accordingly.
This was first published in December 2001
