Human error most likely cause of datacentre downtime, finds study
Human errors such as inadvertently changing the temperature setting or pulling power cords from an asset are the biggest causes of datacentre downtime
Datacentre professionals inadvertently switching the temperature setting from Fahrenheit to Celsius, accidentally pulling power cords from an IT asset, or overloading a circuit by plugging in a server are some of the main causes of datacentre downtime, according to a study.
Human error in datacentre facilities poses one of the biggest internal threats to business continuity, found the study by Enlogic, a company that provides datacentre energy monitoring products.
More than a third of respondents viewed human error as the most likely cause of downtime. Equipment failure and the external threat of power outages were the second and third most likely causes cited by respondents.
Other studies have also concluded that as much as 75% of downtime is the result of some sort of human error.
Downtime is simply any period when IT equipment is not operational. Unplanned downtime can lead to financial loss as well as reputational damage to businesses. Estimates from the survey suggested that just one minute of downtime can cost as much as £100k, so a datacentre that suffers downtime for 60 minutes or more is likely to go out of business.
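To put the survey's figure in perspective, the arithmetic can be sketched in a few lines of Python. This is an illustration only: it assumes the upper-end estimate of £100k per minute holds for the whole outage and that cost scales linearly with duration, neither of which the study states. The function name `downtime_cost` is hypothetical.

```python
# Upper-end estimate quoted from the survey: up to £100k per minute.
COST_PER_MINUTE_GBP = 100_000


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_GBP):
    """Return an upper-bound outage cost in pounds.

    Assumes cost grows linearly with outage length (an assumption made
    here for illustration, not a claim from the study).
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


# A 60-minute outage at the upper-end rate.
hour_cost = downtime_cost(60)
print(f"£{hour_cost:,}")  # £6,000,000
```

On these assumptions, an hour-long outage would cost up to £6m, which gives a sense of why the survey links outages of that length to business failure.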
IT professionals usually overcome such downtime challenges by implementing a redundancy plan or creating a datacentre resiliency strategy.
Resilience is usually associated with other disaster planning and datacentre disaster-recovery considerations such as data protection. It is achieved through the use of redundant components, subsystems, systems or facilities.
But this approach increases upfront purchase costs and escalates energy bills by sending power to idle servers, the company warned.
“Human error has actually cropped up numerous times as the most prominent threat in this kind of survey, yet the industry allows the problem to remain. It is concerning that managers have been aware of the biggest cause of downtime but have failed to implement the right technology,” said Paul Inett, vice-president, Enlogic Europe.
Robert McFarlane, a datacentre design expert at US company Shen Milsom Wilke, blamed a lack of planning for human error within the datacentre.
For instance, most things within a facility are dual-corded – connected to two different power receptacles, coming from two different power centres. Electricians may connect one receptacle to panel A, and the other receptacle to panel B.
Furthermore, they may put circuit labels on the outlets inside a cabinet, where they are difficult to read, and put identifications on the panel schedules that do not correspond to the cabinet numbering.
This makes it all too easy to turn off circuits in the wrong cabinet or fail to power down the intended cabinet, according to McFarlane.
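The labelling problem McFarlane describes lends itself to a simple automated check. The sketch below is a hypothetical illustration, not a description of any product or of McFarlane's own method: the `Circuit` record, the `audit` function and the cabinet names are all assumptions. It flags cabinets that are not fed from two different panels and circuits whose panel-schedule label does not match the cabinet they actually feed.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Circuit:
    """One power feed into a cabinet (hypothetical record layout)."""
    panel: str           # power panel the circuit comes from, e.g. "A" or "B"
    breaker: int         # breaker position on that panel
    schedule_label: str  # cabinet name written on the panel schedule


def audit(cabinets: dict[str, list[Circuit]]) -> list[str]:
    """Return a list of human-readable labelling and wiring problems."""
    problems = []
    for cabinet_id, circuits in cabinets.items():
        # Dual-corded equipment should be fed from two different panels.
        if len({c.panel for c in circuits}) < 2:
            problems.append(f"{cabinet_id}: not fed from two different panels")
        # The panel schedule should name the cabinet the circuit really feeds.
        for c in circuits:
            if c.schedule_label != cabinet_id:
                problems.append(
                    f"{cabinet_id}: panel {c.panel} breaker {c.breaker} "
                    f"is labelled {c.schedule_label!r} on the schedule"
                )
    return problems


# Illustrative data only: CAB-02 is wired to panel A twice and one of its
# breakers is recorded against the wrong cabinet.
cabinets = {
    "CAB-01": [Circuit("A", 12, "CAB-01"), Circuit("B", 12, "CAB-01")],
    "CAB-02": [Circuit("A", 14, "CAB-02"), Circuit("A", 16, "CAB-03")],
}

problems = audit(cabinets)
for line in problems:
    print(line)
```

A check of this kind only helps if the records are kept in step with the physical wiring, which is itself a matter of the planning and training the article goes on to discuss.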
“You’ll never eliminate human error from the datacentre, but you can make choices in both technology and training that will help to reduce its severity and impact,” Inett said.