There is a big difference between lights-out and working in the
dark. Mark Lillycrop sheds some light on self-managing,
self-healing systems.
A couple of months back in this column, I mentioned IBM's eLiza
project, and various initiatives that are underway to combine the
enterprise management strengths of our many and varied data centre
technologies, to create "self-managing, self-healing"
systems.
What's behind projects of this kind is an overriding determination
to automate as many operational and management functions as
possible, and to simplify those that do require human intervention.
After all, why should people get involved, if systems can run
themselves?
It's not just that people are more expensive than hardware and
software, although they certainly are, and the cost of manpower is
climbing as fast as hardware prices are declining.
Less reliable
But the main issue is that, when it comes
to dull, repetitive management tasks, we're so much less reliable
than our computers. Forty per cent or more of system failures are
attributable to human error (sigh - how many times are we reminded
of that statistic?), and we now know that the Nasdaq can be brought
to its metaphorical knees by one ill-advised keystroke from a
human being.
I know I must be approaching middle-age because, when I read the
rhetoric about "self-managing, self-healing" systems, I keep
getting a feeling that I've seen it all before. And, indeed, I
have.
Glancing through my own cuttings file on automated operations,
which dates back to 1987, I find heaps of hyperbole from a decade
ago about the brave new world of darkened data centres; eulogies to
the job scheduling, console automation, and system management tools
that will allow us to turn out the lights once and for all; and
confident analyst predictions that, "a revolution is brewing, and
data centre operations in five years will look nothing like it does
today".
OK, so how many people managed to turn the lights out - and keep
them out? Well, to be fair, the industry has made giant strides in
terms of self-management in those intervening 14 years. There are
plenty of mainframes, AS/400s, even Unix systems out there that are
running back-end jobs day in and day out with little or no operator
intervention - and that's just as well, as all our technical
expertise is being employed elsewhere now.
With the advent of e-commerce, all eyes are on the network: keeping
applications running end-to-end across alien network territory,
using desperately rudimentary tools.
Today's challenge
Today's management conundrums involve
integrating our existing TCP/IP-based SNMP and RMON MIBs with
Web-inspired technologies such as xmlCIM and CIMOM, from the
Distributed Management Task Force's Web-Based Enterprise Management
(WBEM) initiative.
Today's challenges include gathering Application Response
Measurement (ARM) information on application and data transfer
rates, and feeding that into our network management infrastructure.
Then there's Microsoft's WMI, directory-enabled network functions,
etc etc.
To administrators and technical support specialists, these
technologies are all too visible and bleeding-edge to be automated.
But gradually, as the standards become more tightly integrated,
automation features will be added to allow routine performance data
to be collected and alert responses to be handled with minimum
intervention, even in these apparently primitive areas.
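To make "minimum intervention" a little more concrete, here is a minimal sketch in Python of the kind of alert triage I have in mind. Everything in it is an assumption for illustration: the alert kinds, the playbook entries and the host names are hypothetical, and no real management product or standard API is implied. Routine alerts are matched against a small playbook and handled automatically; anything unrecognised goes to a human.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Alert:
    source: str       # hypothetical host or device name
    kind: str         # hypothetical alert category
    detail: str = ""


# Hypothetical automated responses; a real site would call its own tools here.
def restart_service(alert: Alert) -> str:
    return f"restarted service on {alert.source}"


def purge_temp_files(alert: Alert) -> str:
    return f"purged temp files on {alert.source}"


# The playbook holds only the alerts that have become routine.
PLAYBOOK: Dict[str, Callable[[Alert], str]] = {
    "service_down": restart_service,
    "disk_full": purge_temp_files,
}


def triage(alerts: List[Alert]) -> Tuple[List[str], List[Alert]]:
    """Handle routine alerts automatically; escalate the rest to a person."""
    handled: List[str] = []
    escalated: List[Alert] = []
    for alert in alerts:
        action = PLAYBOOK.get(alert.kind)
        if action is None:
            escalated.append(alert)   # not yet routine: a human must look
        else:
            handled.append(action(alert))
    return handled, escalated
```

The point of the sketch is the shape rather than the detail: the playbook can only grow as fast as alerts become routine, and everything else still lands on somebody's desk.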
Meanwhile, the humans will have moved on to the next challenge -
wireless nets, support for PDAs and disposable clients,
whatever.
And I guess that's why automation has remained an elusive target:
we are constantly dealing with new complexities. Until tasks and
alerts become relatively routine in nature, and until the
management tools and consoles can be effectively integrated, I'm
afraid we humans will just have to be involved.
I'll leave you with an excellent quote I picked up while I was
rummaging through that dusty old clippings file. It comes from
Robert Becker, writing in Technical Support magazine in 1988. He
said, "There is a big difference between lights-out and working in
the dark."
Perhaps, in the IT industry, we spend so much time working in the
dark, we simply don't notice when the lights are switched off!
Mark Lillycrop is director of research at the market analyst
Xephon.