When IT goes critical

A new Quocirca research report, Masters of Machines III, sponsored by operational intelligence tools vendor Splunk, shows that European organisations are suffering an average of three critical IT events a month. When the costs to the IT department and to the broader business are added together, each incident costs on average over €100K. Do the sums: three incidents a month at over €100K each comes to more than €3.6 million a year for the average organisation. 65% of respondents say such an event has also damaged their business's reputation.
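The "do the sums" step can be made explicit with a few lines of arithmetic. This is a minimal sketch that uses only the survey averages quoted above; it treats €100K as a lower bound per incident and assumes the monthly rate holds for all twelve months. The function name `annual_cost_eur` is illustrative, not part of the report.

```python
# Worked example of the annual-cost arithmetic, using the survey averages.
# Assumption: the average rate of three events a month holds all year, and
# €100K is treated as a lower bound ("over €100K") per incident.
EVENTS_PER_MONTH = 3          # average critical IT events per month
COST_PER_EVENT_EUR = 100_000  # lower bound on cost per incident, in euros
MONTHS_PER_YEAR = 12


def annual_cost_eur(events_per_month: int, cost_per_event: int) -> int:
    """Estimated minimum yearly cost of critical IT events for one organisation."""
    return events_per_month * MONTHS_PER_YEAR * cost_per_event


if __name__ == "__main__":
    events = EVENTS_PER_MONTH * MONTHS_PER_YEAR
    total = annual_cost_eur(EVENTS_PER_MONTH, COST_PER_EVENT_EUR)
    print(f"{events} events a year, at least €{total:,} in costs")
    # prints: 36 events a year, at least €3,600,000 in costs
```

Because the per-incident figure is a floor, the result is a minimum; organisations with costlier incidents or more frequent events will see higher totals.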

Whilst critical IT events are inevitable, some organisations cope better than others. The average cost can be reduced by minimising the mean-time-to-repair (MTTR) for each incident and by ensuring that lessons are learnt from each event through effective root cause analysis.

The survey defined a critical IT event as one in which a business application or its supporting infrastructure is down or malfunctioning, so that a business process is halted or users are unable to reasonably carry out tasks and transactions. The new report, the third in a series following Masters of Machines (2014) and Masters of Machines II (2015), shows that system downtime has overtaken security as the top concern for IT management teams.

The overall MTTR for a critical IT event is 6.8 hours and, unsurprisingly, most organisations aspire to shorten the time taken to get applications back up and running. The team assembled to address each event, usually at short notice, averages 18 people. Teams comprise individuals with the necessary range of skills and may include third-party staff. Good team co-ordination enables events to be handled more effectively.
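Combining the average MTTR with the average team size gives a rough sense of the effort each event consumes. This is a back-of-the-envelope sketch under a loudly stated assumption not made in the report: that every team member is engaged for the full repair time. In practice people join and leave an incident, so these figures are an upper-end illustration rather than a measured result. The helper `person_hours` is hypothetical.

```python
# Rough effort estimate per critical IT event from the survey averages.
# ASSUMPTION (not from the report): all team members work the whole MTTR.
MTTR_HOURS = 6.8      # average mean-time-to-repair per critical event
TEAM_SIZE = 18        # average number of people assembled per event
EVENTS_PER_YEAR = 36  # three events a month, as quoted in the report


def person_hours(mttr_hours: float, team_size: int, events: int = 1) -> float:
    """Person-hours spent if the whole team works for the full MTTR."""
    return mttr_hours * team_size * events


if __name__ == "__main__":
    per_event = person_hours(MTTR_HOURS, TEAM_SIZE)
    per_year = person_hours(MTTR_HOURS, TEAM_SIZE, EVENTS_PER_YEAR)
    print(f"Per event: {per_event:.1f} person-hours")
    print(f"Per year:  {per_year:.1f} person-hours")
```

Under this assumption each event ties up roughly 122 person-hours, which is why shortening MTTR and co-ordinating a large ad hoc team well both feed directly into cost.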

All three Masters of Machines reports use an index to measure an organisation's operational intelligence capability (its insight across IT infrastructure). Effective operational intelligence improves IT infrastructure visibility and productivity (reducing the cost to IT of each team member involved in a critical IT event by up to 25%) and shortens the time taken for root cause analysis, which for many organisations still runs into days.

No organisation can stop IT going critical on occasion. However, it is possible to prepare for the unpredictable, with the capabilities and tools in place to minimise the cost and impact of critical IT events and to return to business as usual as soon as possible.