BlackBerry outage triggered by 'non-critical' system routine

BlackBerry maker RIM said that this week's widespread BlackBerry outage was caused by a non-critical system routine.

BlackBerry maker Research In Motion (RIM) has blamed last week's massive BlackBerry outage on "the introduction of a new, non-critical system routine that was designed to provide better optimisation of the system's cache."

In a seven-paragraph statement, RIM said the diagnostic analysis of the BlackBerry service interruption -- which saw mobile email grind to a halt -- is progressing, and more information will be released as it becomes available.

Around midnight GMT on 18 April, millions of BlackBerry users found themselves unable to send or receive mobile emails on their devices. Absent or spotty service continued well into Wednesday morning. Once service was fully restored, most users were greeted by a backlog of emails that had been held throughout the blackout.

Several mobility experts and BlackBerry users complained on 18 April that RIM did nothing to notify them of the outage. As of the morning of 20 April, neither the RIM nor the BlackBerry Web site made any mention of it.

It was still unclear how many of BlackBerry's 8 million worldwide users were affected, but BlackBerry has roughly 5 million users in the U.S. alone.

In a Web poll conducted Wednesday morning by ProfitLine, a telecom expense management firm, 80% of responding enterprise IT and telecom professionals said the BlackBerry outage disrupted operations. In addition, 44.5% reported a moderate or substantial impact on enterprise productivity. A smaller number, 18.2%, reported that the outage had no impact.

In a statement, ProfitLine's vice president of mobility strategies said, "These numbers show the critical role that wireless devices play in corporate America. Wireless communication has gone from a travel convenience to a mission-critical communications tool."

According to RIM's statement, the company's "first priority during any service interruption is always to restore service and then establish, monitor and maintain stability."

RIM added that it was able to definitively rule out security and capacity issues as root causes of the BlackBerry outage. RIM also found that the blackout was not caused by any hardware failure or by its core software infrastructure.

According to RIM's statement, the new system routine that caused the outage produced an "unexpected impact and triggered a compounding series of interaction errors between the system's operational database and cache. After isolating the resulting database problem and unsuccessfully attempting to correct it, RIM began its failover process to a back-up system."

RIM's failover process then failed to work correctly, despite repeated testing, which further delayed the restoration of service and the processing of the message queue.

RIM's statement concludes: "RIM apologises to customers for inconvenience resulting from the service interruption. RIM's root cause analysis and system enhancement process with respect to this incident is ongoing and RIM has already identified certain aspects of its testing, monitoring and recovery process that will be enhanced as a result of the incident and in order to prevent recurrence."
