Recently in Incident Response Category

The art of strategic crisis management

Scientific American has an interesting article "How Would the U.S. Respond to a Nightmare Cyber Attack?" based on a recent crisis exercise.  

It's a good question as well as a well overdue exercise. I've been concerned about such an exposure for the past two decades. At the turn of the century I predicted that the electronic Pearl Harbor would probably not happen until 2006, by which time several trends (in threats, vulnerabilities and dependencies) would simultaneously peak or mature, resulting in a step change in the risk profile. I called this the "critical convergence period". My forecast was not based on uninformed speculation, but on analysis of technology road mapping exercises carried out by subject matter experts.  

As with many well-researched forecasts I got the outcome right but the timing wrong. Yet it's taken more than a decade for the world to wake up to the problem. At the time of my forecast ZDNet criticised me for being a "doomsayer". Some CISOs suggested I had it wrong and that the future threat landscape was more likely to be one full of numerous minor incidents rather than any big disasters. Six years later Professor Fred Piper drew attention to this debate posing the question "Who is right?" Judging by current national risk assessments, it seems that I was, though too far ahead with my timing.

We are still in the very early stages of educating managers about how to respond to a major crisis. The Scientific American article reveals a number of weaknesses, though these are to be expected as strategic crisis management capabilities are very weak across the board. For example:

  • "The teams had to come up with a response within hours. The pressure was intense." (In a real situation they'd have less time to decide a response.)
  • "As the situation grew more serious, the consensus for diplomatic engagement dissolved." (This suggests a fundamental weakness in strategy. Thank goodness we don't manage nuclear confrontations that way.)
  • "The 19 groups suddenly diverged considerably about what the proper response should be." (Challenge and diversity of options are both very good, as long as a consensus can be quickly reached through logical debate.)

Why is crisis management so weak? The answer is very simple: it's because it requires a number of skills and traits that are in short supply. This is not helped by the fact that few individuals understand how to do it, and even fewer take the trouble to teach others how to do it.  

My book "Managing the Human Factor in Information Security" has a good chapter on the subject, explaining many of the nuances of the art of strategic crisis management that are unlikely to be developed by self-taught CERT teams.   

In practice each crisis team has to develop its own unique blend of skills and chemistry. These need to include: a capability to manage large amounts of incoming information; the skill to piece together what's really happening from incomplete reports; the discipline to focus on the immediate problems at hand; the imagination to develop a compelling strategy to guide the broader response; the ability to communicate effectively to external stakeholders; and the ambition to leverage the power of the full resources available across the organization. Most importantly team members need good listening skills, which are always in short supply, especially in a fast-moving crisis team environment. 

Computer says No

A few postings ago, I mentioned the growing number of high-profile digital catastrophes reported in the media. And I wasn't referring to natural disasters such as fire and flood or deliberate attacks such as hacking. What I was really concerned about was the type of increasingly spectacular glitch caused by simple, human causes, such as inadequate software testing, fat finger mistakes, bad change management or poor data quality. These are the things we generally class as "cock-ups" rather than "conspiracies". They are the result of accidental rather than sinister actions.

One would hope, after all these years of designing and operating IT services, that we should be able to deliver services that are highly reliable. Unfortunately it's not always the case. In recent months we've seen failures of supposedly bullet-proof Cloud services and extended outages of major banking services. But that's just the tip of the iceberg. Behind every major incident are dozens of near misses, hundreds of minor incidents and thousands of bad practices.      

Why is this continuing to happen? Several trends are behind this. Hardware might be a little more reliable (though not always) but systems and infrastructure are becoming increasingly complex and harder to integrate. Project deadlines are becoming shorter because of the continuous pressure from business management to move faster and faster. There's also relentless pressure to cut costs, resulting in greater demands on resources and constantly changing supply chains. Add to this the usual elephants in the room that nobody wants to tackle, such as data quality (for which there are no standards) and intrinsically insecure legacy assets, and it's a wonder our systems manage to stay up as much as they do.

Yet this is a world moving to Cloud Computing, where we might reasonably expect better than 'five nines' service availability to keep our businesses running. A major issue is that business continuity planning is difficult and expensive for users of Cloud services. They will have few, if any, alternative sources of identical services. And switching is far from easy. Try asking a Cloud service provider how to plan for a major outage and you'll be lucky to get a sensible answer that even acknowledges the problem.
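To put 'five nines' in perspective, here is a minimal sketch of the arithmetic. It assumes a 365-day year and treats availability as a simple fraction of elapsed time; the function name and the period parameter are illustrative and are not taken from any real service level agreement.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(availability: float, period_days: int = 365) -> float:
    """Return the downtime budget, in minutes, for a given availability target.

    `availability` is a fraction (0.99999 for 'five nines').
    `period_days` is an assumed measurement period, defaulting to one year.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_days * MINUTES_PER_DAY


if __name__ == "__main__":
    targets = [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]
    for label, target in targets:
        minutes = allowed_downtime_minutes(target)
        print(f"{label}: {minutes:.2f} minutes per year")
```

On these assumptions, five nines leaves a budget of a little over five minutes a year, so a single outage lasting a few hours uses up many years' worth of allowance.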

So what can be done? Here are a few ideas. Firstly, accept that no service is invincible: they are all vulnerable to deliberate and accidental incidents. Increasing centralisation of service delivery and a growing reliance on monoculture (use of identical components and practices) are also raising the stakes by increasing the global impact of a failure. The bigger and more widespread they are, the harder they will fall. And credits for missed service levels are no substitute for lost business and damaged reputation.

Secondly, treat outages and security events like safety incidents. Monitor the minor incidents and conduct a root cause analysis for near misses and common sources of failure. There's no such thing as an isolated incident. Examine your own operations and dig into your service provider's history. Many well-known service providers fall well short of customer expectations.  

Thirdly, draw up a 'catastrophe plan'. And I don't just mean a disaster plan, which generally involves recovering from a fire or flood. I mean a full-blown catastrophe plan based on a "worst of the worst" complete or extended loss of service or data. It will demand imaginative thinking and preparation, for example ideas to speed up the recreation of databases from scratch, alternative sources of essential management information, and proactive plans to reassure customers that everything is being done to protect their interests.

Fourthly, make your own personal contingency plans. Make sure you can work offline. Carry a decent amount of cash. Top up your petrol tank. And keep a torch, maps and compass in your briefcase. Because, like it or not, we are entering an information age in which business and life will become increasingly volatile, and major crises will become more commonplace.

Personal Continuity Planning

We have computers to thank for teaching us the importance of business continuity planning. The real objective might be to keep the business running rather than prop up the technology, but the approach and plans largely grew out of computer fallback planning. That's why the manuals tend to be so thick. Business continuity planning is a simple process spoilt by consultants copying manuals from other clients.

But today's computer systems failures have a much wider impact than business processes. The consequences ripple down the supply chain affecting large numbers of customers who have grown to depend on just-in-time supplies of money, goods and transport. The problem is that unlike enterprises, consumers don't do contingency planning. It's understandable of course, given that nobody has encouraged them to do it.

Security and contingency planning are similar in that nobody bothers to do them unless forced to by compelling legislation or after experiencing a life-changing incident. Even with the highest levels of education, people won't pay attention unless the perceived consequences of not doing so are personal, immediate and certain. And they're not, or rather they haven't been in the past.

In the last few months however we've seen some compelling incentives for UK citizens. Major UK banks have failed to work as expected, in one case for a couple of weeks. Floods have disrupted travel. Immigration queues have caused travellers to miss connections. And the forthcoming Olympic Games threaten to bring parts of London to a standstill.

How should a citizen react? The answer is by anticipating disaster and preparing practical continuity plans. It's nothing new, it's just rarely practised. I have one neighbour for example with a relatively sophisticated disaster plan. We've been briefed in detail on how to respond to virtually any major disaster affecting their property, whether fire, flood, earthquake or theft. But this is a rare exception.

Today, every citizen should be prepared for extended bank outages, petrol shortages, power outages, travel disruptions and other major disasters. Fifty years ago many people worried about nuclear war. Today we need to worry about how to survive when ATMs and transport fail.

Earlier this year I published the first ever book (as far as I know) on business continuity planning for small and medium businesses. With this year's hindsight, I'd admit that I probably didn't go far enough. We now need citizen continuity plans. Because information systems and process control systems are far from foolproof and given the pressures placed by management on IT development and operations staff, they are likely to stay that way for a long, long time. 

The forgotten art of crisis management

The progressive worsening in BP's share price might in part reflect a continuing failure to address the finer points of strategic crisis management. Following on from the recent Toyota crisis, it leaves a worrying impression that many big international enterprises are not well equipped to manage large-scale incidents.

This is not a new problem of course. We've experienced many disasters before, and there are well established principles on how to go about crisis management. The snag is that they're not widely appreciated. Neither are they easy to execute. In fact very few senior executives, no matter how bright or well trained, seem to be able to translate expert advice into reality. 

Good crisis management is a rare skill. There are a few reasons for this. Partly it's because most executives are immersed in an organisation culture that is often itself a major contributing factor to the crisis, preventing them from seeing the wood for the trees. Partly it's because few executives are comfortable playing a dynamic, decision-making role that's completely different from their day job and prior experience. And partly it's because it's hard in practice to think clearly, objectively and strategically when you're under enormous pressure.

You can certainly spot some questionable decisions in BP's response: attempting to play down the size of the disaster; presenting a British image to an outraged US community; and offering up the CEO as a potential whipping boy. Lack of preparation or rehearsal for such events might also be a contributory factor, as there are press reports of factual errors in the published oil spill response plan.

As Dr Peter Sandman, a risk communications expert, once put it, "The engine of risk response is outrage". An engineering response, no matter how elegant, will never suffice. Citizen rage needs to be directed to an appropriate target. President Obama clearly recognises this and is channelling it, along with his own rage, towards BP's British management.

There are numerous learning points from this and other crises. My book "Managing the Human Factor in Information Security" contains a whole chapter on the subject of incidents and crisis management, setting out many of these points. It's a difficult art but one that needs to be studied and practised by a lot more senior executives.   

Physician, heal thyself

It saddens me to see good security initiatives holed by sloppy security practice. My in-tray has been full of emails urging me to comment on reports about the lack of security in the web site for the UK Cyber Security Challenge, sponsored by leading security organisations such as the UK Government's Office of Cyber Security, the SANS Institute, the Institute of Information Security Professionals and QinetiQ.

Operational security is easily overlooked when dealing with educational or research initiatives. That's the learning point. Reputation can be equally damaged by an incident on a minor web site as on a mission critical one. All public sites need to be safeguarded whenever brand value or reputation is important. Security professionals in particular need to aim for higher standards in widely promoted initiatives. 

The response now demanded is for the sponsors and organisers to demonstrate their crisis management skills and turn this threat into an opportunity. It's not easy, but it can be done.

In search of sensible security advice

Where does one turn to find objective, authoritative advice on security issues?

Certainly not the vendors, if the recent reports of a security flaw in Internet Explorer are anything to go by. There's a fair bit of spin or FUD in the announcements made in the last few days by Microsoft and its rivals. You have to carefully analyse the weasel words to get at the truth.

Nor can you rely on advice from governments, who seem to have created a hostage to fortune by recommending a temporary switch to other browsers. What does that mean? When will it be safe to go back? Are we talking days, weeks, months or years?

Security advice needs to consider the full range of circumstances. The size of the risk depends on many variables: products, versions, settings, behaviour, business impact, and of course the modus operandi, targets and capabilities of the attackers.

If Government wants citizens to use the Internet, then it needs to develop a more sophisticated approach to responding to vulnerabilities. Products cannot be judged to be fine one day and unsuitable the next. Security flaws in products are inevitable. We need defence in depth and better citizen education, not last-minute panic warnings.

Worst case scenarios

Every now and then we have to persuade our executive to think the unthinkable. But too much scaremongering can be counterproductive. You can read a few of my thoughts on the hazards of preparing for worst case scenarios on this Infosecurity Europe blog posting.


Single point failures

The recent two-hour outage of Google's Gmail, affecting the majority of its 150 million users, reflects the growing risks associated with the inevitable drift towards centralised system management.

At least Google was honest enough to issue an apology explaining that the incident was caused by an engineer's miscalculation and that they were investigating ways to ensure it did not happen again. (Mind you, it's not the first of these incidents.) That's a big improvement over O2, whose service was down for many customers during most of Saturday without any explanation.

Expect more of these crashes. Information technology is spectacularly vulnerable to tiny errors and we are building massive single point failure scenarios based on cloud computing, centralised management and technology monoculture. In response, we must all raise our game in business continuity and crisis response. 

Learning from mistakes

Making a mistake once is good for your education. Making it twice means you're not learning fast enough.

On Tuesday, Twitter suffered its second denial-of-service attack in a week. Admittedly the site stood up better, being down for only 30 minutes this time. But it demonstrates the importance of immediately beefing up security following a damaging incident that might be repeated.

Customer perception drives business value. To go down once is unfortunate, to go down twice can seem careless, but to go down three times might suggest that the wheels are coming off.

Safeguarding the DNA of the Internet

A few postings ago I mentioned the growing importance of random acts of kindness by unsung heroes in rescuing or maintaining vital Internet services. Make no mistake; this is the future of security. When things get bad, we need to call on brilliant technicians to fix things. Fortunately some of the best have an altruistic streak.

Team Cymru are a good example. You might not have heard of them, and you might wonder why a top US security outfit would want to adopt a Welsh name. But I'm pleased to report that their hearts, as well as their expertise, are in the right place. In fact they are a strictly not-for-profit outfit, but with state-of-the-art skills.

The latest Team Cymru offering is a free alerting system to pinpoint open DNS resolvers in your immediate area. DNS is the DNA of the Internet, though it's based on a devolved management model which means that not all servers are as secure as you might like them to be. The Million Resolvers Project is a reporting system to alert participants when open resolvers are detected in their local address space.
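For readers unfamiliar with the term, an open resolver is a DNS server that will perform recursive lookups for any client that asks. The sketch below is not Team Cymru's method, which is not described here. It only illustrates the general idea using the standard DNS header layout: build a query with the "recursion desired" flag set, then inspect the header of a reply for the "recursion available" flag. The function names and the rule of thumb in `looks_open` are illustrative assumptions.

```python
import struct

QR_FLAG = 0x8000  # set in replies
RD_FLAG = 0x0100  # recursion desired, set by the client
RA_FLAG = 0x0080  # recursion available, set by the server
RCODE_MASK = 0x000F


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion requested."""
    # Header: id, flags, then question, answer, authority and additional counts.
    header = struct.pack("!HHHHHH", query_id, RD_FLAG, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def looks_open(response: bytes) -> bool:
    """Heuristic (an assumption): a reply that offers recursion, reports no
    error and carries at least one answer suggests an open resolver."""
    if len(response) < 12:
        return False
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", response[:12])
    is_reply = bool(flags & QR_FLAG)
    offers_recursion = bool(flags & RA_FLAG)
    no_error = (flags & RCODE_MASK) == 0
    return is_reply and offers_recursion and no_error and ancount > 0
```

In use, the query bytes would be sent over UDP port 53 to a server you are authorised to test, and the raw reply passed to `looks_open`. The network step is deliberately left out of the sketch.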

If you're interested, you should contact them to get signed up. Like many information age initiatives, security is a two-way street. But what you get back is always more than what you put in.
