Buying the hardware and the software to manage your storage needs
from one supplier is a logical choice for most businesses, as
there is no question who is responsible when something goes
wrong.
But it is not always the best option to use what comes out of
the box. Sometimes, buying a third-party product can make a big
difference. This is what
retail bank HBOS found when it needed to monitor its
storage area network (San) in order to tweak performance.
Another challenge IT departments face is that storage is
constantly growing. It is a costly exercise to upgrade storage
hardware and database and network infrastructure to keep pace with
this growth.
As supermarket Tesco found, upgrading is not the only option.
Sometimes, a low-cost option can make a big difference. Rather than
turbo-charging the infrastructure, Tesco simply compressed the
data.
Case study: HBOS
Keeping 67,000 users of real-time applications happy is not
easy, but the storage team at
HBOS has found high-level monitoring tools help. The UK's
largest mortgage and savings provider has a relationship with two
out of every five UK households, so the storage team has its work
cut out optimising response times of systems in order to keep the
relationship with customers and internal users smooth.
"We have lots of different systems that need to be accessed in
real time," says Simon Close, service manager of storage management
services at HBOS. These include data warehouses, large online
database systems and real-time transaction analysis applications
holding terabytes of customer data. "Such applications demand swift
and consistent response times - sub-second delays will not do,"
says Close.
In order to provide continuous data access, a preliminary San
was implemented in 2000, and another four Sans have since been
rolled out. The Sans serve the specific data access requirements of
functions including back-up, production and pre-production. Each is
maintained separately, as the data stored on it is accessed in a
different way. "We like to keep the back-up San separate so it does
not impact on live transactions," says Close.
Today, more than 5,000 San ports and 100 switches constitute
five replicated, cross-site "fabrics" - essentially ring-fenced
configurations of dedicated storage, connected by fibre links. The
fabric is continually replenished and HBOS is currently upgrading
legacy 3900 switches to 4Gbit/sec Director models from Brocade. This
provides an upgrade from 32-port switches to 256-port versions, and
lets HBOS stay ahead of the game in terms of bandwidth and
throughput.
Having such robust Sans in place does secure data across HBOS'
two datacentres located in West Yorkshire, but it does not
guarantee the optimal performance of the applications that it
supports. "Sans are not the end of all performance problems - in
many ways they are the start," says Close.
For example, the IT department must now think about the way that
each application is designed and deployed. "An Oracle application
may entail specific different read and write ratios and require
storage to be provisioned in different ways compared to other
objects," says Richard Briggs, senior technical infrastructure
developer in HBOS' storage management services group.
"Similarly, application changes, operating system patches, or
simply adding servers or storage modules may all have an impact on
San performance," Briggs adds.
Because of the scale of the datacentre and storage operation,
HBOS employs a dedicated team to look after this element of the IT
operation. A team of 30 looks after storage, and within this, a
six-strong team tends to the Sans. The latter found themselves in a
political hotspot when, despite the improved resilience they brought
to the operation, the Sans fast became the victim of the "blame it
on the network" mantra.
"One of the biggest challenges we face is the perceived
performance issues related to our Sans - for example when a user
tells us his or her application is slow," says Close. Indeed, the
majority of problems on the Sans are not failures, but intermittent
"niggles", such as slow response times or a server reacting badly
to delayed I/O. The possibilities are endless as to where the
source of the problem might be, but nonetheless, "everyone was
blaming the Sans when things went wrong", says Briggs.
The team had been relying on hardware-specific tools, which
provided a lot of statistics on a product's performance rather than
any lower-level diagnostic information. "None of the tools that we
had been using to monitor the Sans' performance could get us to the
root of the problem, and some problems in the early days went
unsolved," says Briggs.
Because of these shortcomings, HBOS relied on a process of
elimination, even though it recognised this was not an
effective way to handle such issues. Whenever there was a problem
it could not solve, the team would call its third-party maintenance
supplier, HDS, which would bring in an analyser. However, this
entailed shutting down several servers, both to plug in the
analyser and to take it out again.
The HBOS storage team realised it needed its own dedicated San
monitoring tool to allow it to troubleshoot with greater accuracy,
and selected Finisar's Netwisdom. The impact was instantaneous, if
a little alarming. "When we first plugged it in, the dashboard lit
up red and we were shocked at the amount of errors being reported,"
says Close. Since then the San team has learned to prioritise and
retune the alerting thresholds for different applications.
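Per-application alert thresholds of the kind Close describes can be pictured as a simple lookup. The sketch below is illustrative only: the application names, the millisecond limits and the classify function are all assumptions made for the example, and none of it reflects how Netwisdom is configured or the values HBOS uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Response-time limits, in milliseconds, for one application."""
    warn_ms: float
    critical_ms: float


# Hypothetical limits: a latency-sensitive banking application gets far
# tighter limits than an overnight back-up job.
THRESHOLDS = {
    "online_banking": Threshold(warn_ms=5.0, critical_ms=20.0),
    "data_warehouse": Threshold(warn_ms=20.0, critical_ms=100.0),
    "backup": Threshold(warn_ms=100.0, critical_ms=500.0),
}

# Fallback for applications with no tuned entry (also an assumption).
DEFAULT = Threshold(warn_ms=10.0, critical_ms=50.0)


def classify(application: str, response_ms: float) -> str:
    """Return 'ok', 'warning' or 'critical' for one measured response time."""
    limits = THRESHOLDS.get(application, DEFAULT)
    if response_ms >= limits.critical_ms:
        return "critical"
    if response_ms >= limits.warn_ms:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # The same 25ms reading is serious for one workload and normal for another.
    for app in ("online_banking", "data_warehouse", "backup"):
        print(app, classify(app, 25.0))
```

The point of the design is that one measurement can be an alarm for a real-time system and background noise for a batch job, which is why a single global threshold tends to light the dashboard up red.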
HBOS can now solve problems in a proactive way, such as when the
team pinpointed an underperforming piece of middleware servicing an
online banking application. "The message broker system was
struggling to cope with the demands of web-based traffic and
customers were getting impatient," says Close.
Netwisdom identified the "hot" area on the disc and prompted the
storage team to work with Unix administrators, database
administrators and the application team to improve response times
for the application.
Third-party suppliers like it too, says Briggs. "It means they
can get to the root of the problem quicker, rather than coming on
site and poring through logs. It helps them understand at a
detailed technical level how their products are operating in a
complex and demanding environment. This allows them to identify and
isolate not only defects, but also enhance their products."
However, Briggs does note that, "On a day-to-day basis we are
still not fully there yet - it is an ongoing process of tuning and
optimisation." This is inevitable as performance benchmarks degrade
over time and customers raise the bar in their expectations.
Importantly, however, finger pointing at the Sans has reduced.
"We have been able to demonstrate that San response times are well
within the thresholds that have been dictated. The problem is
either in the way the application has been configured, or how the
storage was originally provisioned," Close adds.
"Armed with that information, groups can evaluate their
application design and redevelop or tune the application
accordingly."
Case study: Tesco
Increasing storage requirements at Tesco meant backing up data was
absorbing valuable IT operations team time, and the retailer faced
the prospect of spiralling data volumes undermining the success of
its online shopping business.
The rate of growth of Tesco's online shopping operation is not
dissimilar to that of the main store business, at about 30% year on
year, says Chris Howell, IT manager of operations and
infrastructure at Tesco.com. The online shopping site generates
operational data that is held in a series of
SQL Server databases. This includes information about available
products, billing and delivery information for customers, as well
as any "favourites" that they have saved.
With the quantity of data increasing rapidly and the time taken
to perform the back-up approaching five hours, there was not much
room for error in the retailer's slender back-up window. Howell was
concerned that the main back-up procedure should not breach the
period between midnight and 6am. "If a disaster occurred after 8am
and we had not succeeded in backing-up, there would be problems
opening stores' sites," he says.
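How little headroom that leaves can be estimated with some rough arithmetic. The sketch below assumes, purely for illustration, that back-up time grows in step with the business at 30% a year and compounds smoothly; the article gives the five-hour duration and the midnight-to-6am window, but not the actual growth rate of the data. The function name is hypothetical.

```python
import math


def years_until_breach(current_hours: float, window_hours: float,
                       annual_growth: float) -> float:
    """Years until a back-up that grows by `annual_growth` per year
    (0.30 means 30%) no longer fits inside `window_hours`."""
    if current_hours >= window_hours:
        return 0.0          # already over the window
    if annual_growth <= 0:
        return math.inf     # no growth, so the window is never breached
    # Solve current * (1 + g) ** t = window for t.
    return math.log(window_hours / current_hours) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    years = years_until_breach(current_hours=5.0, window_hours=6.0,
                               annual_growth=0.30)
    print(f"Window breached in about {years * 12:.1f} months")
```

Under those assumptions the six-hour window would be exhausted in roughly eight months, which is consistent with Howell's view that there was not much room for error.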
A team of three database administrators was already performing
many back-ups in each 24-hour period using traditional log shipping
and replication methods. However, Howell decided to make a
pre-emptive strike before the besieged storage operation ran into
trouble. "We want customers to have the best possible experience,
and anything that slows their online journey is not good."
Howell reviewed three options: a database overhaul, storage
compression techniques and a network upgrade to increase the data
flow. Data compression far outweighed a database architecture
overhaul or an upgrade of network capacity in terms of value for
money, Howell discovered. Investing in compression software cost
under £20,000 and Tesco expects the software to secure another five
years' life expectancy for its current back-up arrangement.
Beefing up the databases would have been a viable alternative,
but the investment would have been more expensive and taken a lot
longer. "Investment in a major piece of database hardware and the
corresponding time in the configuration and migration would not
have stacked up," says Howell.
By deploying compression tool Litespeed from software supplier
Quest Software, Tesco has secured a 67% improvement in performance.
"Our 120Gbyte SQL Server previously took 59 minutes to back-up, but
with Litespeed it now takes 18 minutes. This improvement means that
we have ample time to deal with any back-up problems to ensure that
Tesco.com is always open," says Howell.
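The single-server figures Howell quotes can be checked with a few lines of arithmetic. The sketch below uses only the numbers in the quote (120Gbyte, 59 minutes, 18 minutes); the function name is an assumption for the example. Those figures work out to a reduction in back-up time of a little over 69%, slightly better than the 67% headline figure, which may be measured on a different basis that the article does not spell out.

```python
def backup_improvement(size_gbyte: float, before_min: float,
                       after_min: float) -> dict:
    """Summarise the change in back-up time for one database."""
    return {
        # Share of the original back-up time that has been eliminated.
        "time_reduction_pct": (before_min - after_min) / before_min * 100,
        # How many times faster the back-up now completes.
        "speedup": before_min / after_min,
        # Effective throughput measured against the uncompressed size.
        "gbyte_per_min_before": size_gbyte / before_min,
        "gbyte_per_min_after": size_gbyte / after_min,
    }


if __name__ == "__main__":
    result = backup_improvement(size_gbyte=120, before_min=59, after_min=18)
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```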
Another important factor in Litespeed's favour was the minimal
amount of time required to execute the upgrade. "It was easy to
install and we only needed to call the Quest helpdesk once," says
Howell. This was largely because of the simplicity of the concept.
The product works in tandem with SQL Server which has its own
native back-up, and Litespeed simply plugs into and supplements
this.
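The general idea of compressing a back-up stream before it is written out can be sketched with Python's standard zlib module. This is not Litespeed's method, which the article does not describe, and the sample rows are made up and deliberately repetitive; real database pages will compress to a different degree.

```python
import zlib
from typing import Iterable, Iterator


def compress_stream(chunks: Iterable[bytes], level: int = 6) -> Iterator[bytes]:
    """Compress a back-up stream chunk by chunk, so the whole database
    never has to be held in memory at once."""
    compressor = zlib.compressobj(level)
    for chunk in chunks:
        out = compressor.compress(chunk)
        if out:                      # the compressor may buffer small inputs
            yield out
    tail = compressor.flush()        # emit whatever is still buffered
    if tail:
        yield tail


# Made-up sample data standing in for pages read from a database.
row = b"customer_id=000123;favourite=semi-skimmed milk 2L;slot=18:00-19:00\n"
raw_chunks = [row * 1000 for _ in range(10)]

raw_size = sum(len(chunk) for chunk in raw_chunks)
compressed = b"".join(compress_stream(raw_chunks))

if __name__ == "__main__":
    print(f"raw: {raw_size} bytes, compressed: {len(compressed)} bytes")
    # A back-up is only useful if it restores to exactly the original data.
    assert zlib.decompress(compressed) == b"".join(raw_chunks)
```

Less data written means less time spent on disk and network I/O, at the cost of some CPU time, which is the trade-off that makes compression attractive when the back-up window rather than the processor is the constraint.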
Data compression has future-proofed the Tesco.com operation for
the next five years, and gives Howell a breather to concentrate on
other aspects of the storage operation. Freed from the headache of
back-up problems, Howell can concentrate on retuning the rest of
the storage infrastructure that keeps the Tesco.com business
growing. "It is fine-tuning performance of the Sans and
network-attached storage infrastructure that occupies me now."