Can there ever be such a thing as an 'always-up' Web site? Some
people think so, especially if you use a combination of load
balancing, clustering and IP traffic monitoring.
Anyone who trades on the Internet, whatever market they may be in,
relies on getting high on business basics: high availability, high
reliability, high security and high speed.
But with the e-commerce boom still on a rising curve, albeit a less
steep one than this time last year, already overloaded networks are
facing the inevitable strain that ever increasing amounts of
traffic across the public Internet bring to the Web party.
The buzzwords for savvy, modern business during recent years have
been Internet Protocol and convergence - and for good reason. IP is
the infrastructure technology of choice as we start life in the
21st century, so naturally its benefits - budget savings,
scalability coupled with relative simplicity, and equally simplified
support and management - have led to it becoming the focus of
converging technologies and markets.
A single wire providing myriad services has always been the holy
grail, but now the search is over and the masses are starting to
coalesce around it. But therein lies the rub. Just as the plain old
telephone system was never intended to carry data, which is why
voice users can get a rough ride when congestion reveals its hand,
so IP was never meant to be a multi-discipline traffic carrier.
With traffic types as varied as TCP/IP, bridged Ethernet, Novell
IPX, VoIP and other applications fighting for space across the
enterprise network, and the fact that their bandwidth requirement
and latency 'sensitivity' can vary dramatically, it should come as
no surprise that business-critical application performance can
become degraded.
Think about it - you've got TCP/IP and non-TCP/IP traffic competing
for network resources, along with multiple types of TCP/IP traffic
competing with each other, just to make things more complicated.
Under such circumstances can there be such a thing as the
'always-up' commercial Web site? Surprisingly, the answer is a
qualified 'yes, just about'. The trick lies in getting the balance
right through a combination of load balancing, clustering and IP
traffic monitoring techniques.
Take a long-term view
Sitara Networks' Jane Cox
outlines some of the options: "Throwing more bandwidth at a
corporate network is one option, but this is a short-term approach.
Companies have upgraded the 10Mbps LANs of 1995 to the 100Mbps fast
Ethernet LANs that fuel IT networks today, and some are
implementing gigabit Ethernet links.
"Even as the cost of bandwidth continues to fall, there are still
congestion points affecting the enterprise and service provider's
ability to meet service commitments for critical business traffic
flows. Mission-critical business suffers as bandwidth does not
recognise or prioritise data flow.
"Best-efforts delivery may be acceptable for Web traffic, but
real-time applications require a guaranteed delivery mechanism that
meets the most stringent quality of service standards."
So if increasing bandwidth doesn't solve network availability
problems because many IP-based applications such as e-mail eat up
as much bandwidth as they can, squeezing business-critical
applications into the slow lane, what about intelligent
routers?
"Routers have been designed to be high performance solutions for
forwarding traffic. Increasingly, though, vendors have been adding
some intelligence for QoS management, but not the depth of
intelligence that effective QoS services require," says Cox.
"Routers address only a subset of traffic management necessities,
not the extensive policy setting capabilities QoS appliances
possess. In addition, most routers offer limited classification
capabilities and do not provide Web caching or visual real-time
monitoring and reporting.
"Regardless of whether they use routers for traffic management, most
businesses have to implement complementary QoS devices for
classification, hierarchical policy setting, Web caching and
real-time monitoring and reporting."
How about QoS bandwidth management, then? QoS devices can enable
e-business applications - including e-commerce, intranet access,
Internet browsing and multimedia streams - to share bandwidth with
legacy and other existing applications, while ensuring that response
time for business-critical applications is always protected.
"The QoS process can be divided into basic functions: monitoring,
policy setting, policy enforcement and reporting," says Cox. "These
form a circular process that provides feedback to ensure the
desired results are not only achieved, but maintained. Networks and
applications are dynamic environments and QoS implementations need
periodic tuning to stay effective. With these elements in place,
network managers can implement QoS policies that effectively
support their business processes."
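To make the policy-setting and enforcement steps Cox describes a little more concrete, here is a minimal sketch of strict-priority queueing, one common way of keeping business-critical traffic ahead of the rest. It is an illustration only, not how Sitara's appliances work: the traffic classes, the priority numbers and the names `POLICY` and `PriorityScheduler` are all assumptions made up for this example.

```python
from collections import deque

# Hypothetical policy table: traffic class -> priority (lower number is served first).
POLICY = {"voip": 0, "erp": 1, "web": 2, "email": 3}
DEFAULT_PRIORITY = 3  # assumed best-efforts level for unclassified traffic


class PriorityScheduler:
    """Strict-priority queueing: always send from the most important non-empty queue."""

    def __init__(self, policy):
        self.policy = policy
        levels = sorted(set(policy.values()) | {DEFAULT_PRIORITY})
        self.queues = {level: deque() for level in levels}
        self.sent = {}  # per-class packet counts, for the reporting step

    def enqueue(self, traffic_class, packet):
        # Policy setting: unknown classes fall back to best-efforts delivery.
        level = self.policy.get(traffic_class, DEFAULT_PRIORITY)
        self.queues[level].append((traffic_class, packet))

    def dequeue(self):
        # Policy enforcement: scan the queues from highest to lowest priority.
        for level in sorted(self.queues):
            if self.queues[level]:
                traffic_class, packet = self.queues[level].popleft()
                self.sent[traffic_class] = self.sent.get(traffic_class, 0) + 1
                return traffic_class, packet
        return None  # nothing waiting


scheduler = PriorityScheduler(POLICY)
for traffic_class in ["email", "web", "voip", "email", "erp"]:
    scheduler.enqueue(traffic_class, b"payload")

order = [scheduler.dequeue()[0] for _ in range(5)]
print(order)  # ['voip', 'erp', 'web', 'email', 'email']
```

The `sent` counters stand in for the monitoring and reporting half of the cycle: reviewing them periodically is what would tell a network manager whether the policy table needs retuning. Strict priority can starve the lowest classes under sustained load, which is one reason real products tend to offer weighted or rate-limited schemes as well.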
Clustering versus load balancing
Weynand Kuijpers,
vice president of European operations at Web hosting company
NTT/Verio, considers the differences between these two approaches
to Web site uptime.
"Load balancers bring multiple sources for getting the same
content, either locally or geographically spread. A load balancer
provides 'instant redundancy' in terms of simple content sources.
It allows a general Web site architecture to be very scalable, by
having capacity on demand (for example, by deploying more content
serving servers as the engine behind the load balancer or deploying
more geographically spread content origins).
"In addition, they bring a logical decoupling of inbound Internet
traffic to content serving servers - the general public does not
talk to the content serving servers, which 'hides' flaws in the Web
servers and Web daemons.
"Clustering, on the other hand, in general isn't a device-based
technology, but represents clustered operating systems, where it
provides redundancy for higher order processes than a simple Web
server. Mission-critical parts of a Web hosting infrastructure such
as databases and application infrastructures require hot standby
platforms to run the application/database."
The NTT/Verio approach is to combine all the technologies,
providing customers with a total managed system. Load balancing and
clustering are technologies that, used in the correct combination,
augment each other and make the Web infrastructures more resilient,
scalable and manageable.
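The 'instant redundancy' and 'capacity on demand' Kuijpers attributes to load balancers can be sketched in a few lines. What follows is a simplified round-robin balancer that skips servers marked as down; it is an assumption-laden illustration rather than a description of NTT/Verio's or any vendor's product, and the class name, method names and server names are hypothetical.

```python
class LoadBalancer:
    """Round-robin over the content servers currently marked healthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # rotation pointer into self.servers

    def mark_down(self, server):
        # A failed health check takes the server out of rotation.
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def add_server(self, server):
        # Capacity on demand: a new content server joins the rotation.
        self.servers.append(server)
        self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation pointer.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy content servers available")


balancer = LoadBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")
print([balancer.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Clients only ever see the balancer, so a failed server simply drops out of the rotation. Nothing here protects state held on the failed machine, which is the gap Kuijpers says clustering fills for databases and application servers.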
Traffic shaping
We mustn't forget IP traffic management either. Although it is a
broad term, it can best be explained as IP filtering and
firewalling to shape the traffic, which means making sure only
authorised traffic reaches the target infrastructure.
The difficulty is that traffic shaping slows down access to the
Internet site. When asked if there were any credible alternatives
to the load balancing/clustering/IP traffic shaping trio, Kuijpers
responded with a qualified 'no'.
"I would position the load balancer as the most important
component, clustering as the second most important and then IP
traffic shaping. As the Internet grows in terms of points of
presence, it gets easier to have content in multiple places, tied
together by global load balancing technologies."
Nick Bond, technical manager at Radware UK, knows a thing or two
about traffic management because it's what his company specialises
in - and his words should be comforting to all involved in serious
e-business.
"There is such a thing as an invincible Internet site if you don't
put all your network eggs in the one basket," he says, instead
advising that "to build a fail-safe site you must disperse content
- use different ISPs in different locations, then you can look to
acquire 100 per cent availability".
Essentially, you are ensuring content is not exposed to the same
dangers. This can be taken a step further by using different
devices in different locations so that potential vulnerabilities
will not put all sites out of action simultaneously.
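A rough calculation shows why dispersal helps. The sketch below assumes the sites fail independently of one another, which is exactly what using different ISPs, locations and devices is meant to approximate; the function name and the 99.9 per cent figures are illustrative assumptions, not Radware's numbers.

```python
def combined_availability(site_availabilities):
    """Probability that at least one site is up, assuming independent failures."""
    all_down = 1.0
    for availability in site_availabilities:
        all_down *= 1.0 - availability  # chance that this site is down too
    return 1.0 - all_down


# Two hypothetical sites, each available 99.9 per cent of the time.
print(f"{combined_availability([0.999, 0.999]):.4%}")  # 99.9999%
```

The result approaches, but never reaches, 100 per cent, and any shared weakness (a common software bug, say) breaks the independence assumption and pulls the real figure down.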
Ask Bond for his tip for the future and he'll nod in the direction
of a Web-based global load balancing management service outsourced
to a third party. "It is the next wave. First we had connectivity,
then security and now we are seeing the development of highly
resilient, global solutions."
Next time your business suffers as a result of unexpected downtime,
don't say we didn't warn you.
www.radware.com
www.sitaranetworks.com
www.stratus.com
www.verio.co.uk
The fault-tolerant Web
Nick Cheetham, UK managing
director at Stratus Technologies, believes that, although nothing is
invincible, if you need a transactional e-commerce Web site up and
running 24/7 you can get pretty close.
"Some Stratus customers have gone 10, 12 and as many as 17 years
without experiencing any unplanned computer downtime," Cheetham
claims. "This level of performance is important when people rely on
applications supported by servers to make credit card purchases,
trade stock, book airline reservations or fill prescriptions around
the clock and around the world.
"To illustrate this near-invincibility, after the 1989 San
Francisco earthquake registering 7.1 on the Richter Scale, and the
1993 Los Angeles Northridge earthquake measuring 6.9, Stratus
electronically accessed all customers' machines in the area. All
were running and none required service.
"One customer's system had danced across the floor, held fast only
by its strained power cord, and kept on processing. At last a
support coordinator connected with a customer employee by phone,
who reported, 'The server is fine. Can't talk. Everything else is
down.'"
So industrial-strength Internet business is possible, but how do
the technologies we've discussed here fit into this heavy-duty
equation?
"I take issue with the idea that to achieve the highest stability
clusters are the answer," Cheetham says. "Clusters are often quoted
as providing 99.9 per cent uptime (so-called three 9s
availability). While this sounds impressive it actually translates
to a downtime of eight hours a year. The problem is you can't
predict when this will occur, so you must assume it happens at your
worst time to arrive at a potential downtime cost to the business.
"Load balancing, on the other hand, allows a service or application
to span multiple hosts. This may be desirable for two reasons:
first, to allow an application to scale above the capability of any
single system; second, to allow a single system to fail and for the
other systems in the group to recover what threads of the service
are left, reconnect the users and pick up the load, if they have
the spare capacity.
"A fault-tolerant approach provides 99.999 per cent uptime (five 9s
availability). This translates to a downtime of five minutes a
year. This is 100 times better than a cluster, so the potential
downtime cost to the business is 100 times less. On IP load
balancing, our fault-tolerant servers provide support for all the
leading approaches (Adapter Fault Tolerance, Adaptive Load
Balancing, Gigabit EtherChannel). The key issue is not whether a
user should focus on a hardware or software approach, but rather on
how available the system chosen will be."
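The arithmetic behind the 'three 9s' and 'five 9s' figures is easy to check. The short sketch below converts an uptime percentage into expected downtime over a 365-day year; the function name is our own, and the calculation deliberately ignores leap years and planned maintenance.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(availability_percent):
    """Expected downtime per year, in hours, for a given uptime percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


three_nines = annual_downtime_hours(99.9)
five_nines = annual_downtime_hours(99.999)

print(f"99.9%   -> {three_nines:.2f} hours a year")        # 8.76 hours
print(f"99.999% -> {five_nines * 60:.2f} minutes a year")  # 5.26 minutes
print(f"ratio   -> {three_nines / five_nines:.0f}x")       # 100x
```

Three 9s therefore works out at a little under nine hours a year, slightly more than the eight hours quoted, while five 9s comes to just over five minutes; the factor of 100 between them holds either way.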
What about cost factors - after all, fault-tolerant servers have
traditionally been seen as the domain of the multinationals,
banking industry and the like?
"There are comparable services costs and cheaper IT support costs
with fault tolerance. Clusters are difficult to build, configure
and maintain. The technical advantage of the Stratus is that it
looks externally like a regular Windows 2000 server, so any Windows
2000-trained Microsoft Certified Professional (MCP) can maintain
it.
"With fault-tolerant servers in failure situations there is no loss
of service, no lost transactions and downtime is 100 times less
than with clusters. This is the key point - downtime equals lost
revenue. High-profile examples of extended downtime at the London
Stock Exchange and the Egg bank illustrate that companies often
lose millions during downtimes."