
Single point failures

The recent two-hour outage of Google's Gmail, which affected the majority of its 150 million users, reflects the growing risks associated with the inevitable drift towards centralised system management.

At least Google was honest enough to issue an apology, explaining that the incident was caused by an engineer's miscalculation and that they were investigating ways to ensure it did not happen again. (Mind you, it's not the first of these incidents.) That's a big improvement over O2, whose service was down for many customers during most of Saturday without any explanation.

Expect more of these crashes. Information technology is spectacularly vulnerable to tiny errors, and we are building massive single-point failure scenarios based on cloud computing, centralised management and technology monoculture. In response, we must all raise our game in business continuity and crisis response.


Comments (3)

Anonymous:

I couldn't agree more. We really do need to raise our game in terms of redundancy.

A couple of years ago centralised management was a key selling point; now that point has rather been lost in the wind. A single place to update apps is also a single place to fail!

At VESK virtual desktop we have 3 datacentres for DR and we're finding that our power charges are being hiked up! It just means that the business is less profitable in the normal hosting sense. We have many other revenue streams, so I think, as you say, everyone needs to step up their game, which will hopefully have a reciprocal effect on the rest of the industry.

This article asserts that Cloud Computing and the current direction of software architecture make global system outages inevitable. Surely the point behind this sort of technology is to reduce such single points of failure, and hence the system outages that go with them.

Is the assertion that this is just hype, or that the implementation is poor?

System outages are inevitable and always have been; their effects can be reduced by multiple redundant sites.

We find the implementation is pretty good, but a number of factors can cause an outage, including power, Internet connection and faulty hardware or software, so multiple sites are the only option for DR.
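
To put a rough number on the multiple-sites argument, here is a minimal sketch in Python. It assumes that each site fails independently of the others, which is exactly the assumption that shared software, shared management or a technology monoculture can break. The 99% per-site availability is an illustrative figure, not one taken from the post or the comments, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(site_availability: float, sites: int) -> float:
    """Probability that at least one of `sites` sites is up.

    Assumes every site has the same availability and that failures are
    independent (an assumption, and an optimistic one).
    """
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if sites < 1:
        raise ValueError("sites must be at least 1")
    # All sites must be down at once for the service to be down.
    return 1.0 - (1.0 - site_availability) ** sites


def expected_downtime_hours(site_availability: float, sites: int) -> float:
    """Expected hours per year with no site available."""
    return (1.0 - combined_availability(site_availability, sites)) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Illustrative per-site availability of 99% (assumed, not measured).
    for n in (1, 2, 3):
        print(n, combined_availability(0.99, n), expected_downtime_hours(0.99, n))
```

Under these assumptions, each extra site multiplies the expected downtime by the per-site failure probability. A single error pushed to every site at once defeats the calculation entirely, which is the single point failure the post warns about.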


About

This page contains a single entry from the blog posted on September 2, 2009 9:22 AM.
