The art of finding (and fixing) cloud faults

Senior Editor, UK

In this guest post, Ron Vermeulen, go-to-market manager for north-west Europe at IT services provider Comparex, runs through the process of finding and fixing cloud faults.

There is no doubt that cloud computing offers huge benefits to organisations, but CIOs must accept and manage the potential barriers to realising its value.

Service faults and latency issues can prove particularly problematic when the application in question is business-critical. They can also cost organisations time and money, and have a negative impact on the end-user experience.

Pinpointing where a performance issue occurs in the first place can also be a challenge.

When on-premise IT infrastructure was de rigueur, it was far easier for organisations to find the source of the problem, which could be down to a misbehaving server in the datacentre, for instance.

It’s not so simple today, because ‘your’ public cloud server is now in someone else’s facility, and the difficulty is compounded because the glitch could be closer to home, rather than the fault of the service provider.

A cloud service might be performing fine, but a network problem could be causing issues at ‘home’. A managed service can often help to lessen this headache by identifying, on behalf of the organisation, where the problem lies in the first place.
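As a rough illustration of that first triage step, the sketch below compares two health checks: one of the organisation's own network and one of the cloud service taken from an independent vantage point. The `ProbeResult` structure, the `locate_fault` function and the 200 ms latency threshold are all hypothetical names and values chosen for this example; they are not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Outcome of one health check (hypothetical structure for this sketch)."""
    name: str
    reachable: bool
    latency_ms: float


def is_healthy(probe: ProbeResult, max_latency_ms: float) -> bool:
    """A probe counts as healthy if it responded within the latency budget."""
    return probe.reachable and probe.latency_ms <= max_latency_ms


def locate_fault(local: ProbeResult, cloud: ProbeResult,
                 max_latency_ms: float = 200.0) -> str:
    """Classify where a problem most likely sits.

    `local` is a check of the in-house network; `cloud` is a check of the
    cloud service made from outside that network. The 200 ms default is an
    assumed threshold, not a recommended value.
    """
    if not is_healthy(local, max_latency_ms):
        return "in-house network"
    if not is_healthy(cloud, max_latency_ms):
        return "cloud service"
    return "no fault detected"


if __name__ == "__main__":
    office = ProbeResult("office gateway", reachable=True, latency_ms=900.0)
    service = ProbeResult("cloud service", reachable=True, latency_ms=40.0)
    print(locate_fault(office, service))
```

In this example the cloud service responds quickly while the office gateway is slow, so the problem is attributed to the in-house network rather than the provider.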

Finding cloud faults and fixing them

Fixing the fault is the next hurdle. If the problem is with a supplier’s services (rather than in-house) then another complication is added.

Each cloud supplier has its own Service Level Agreement (SLA) for fixing a fault, and managing the various terms and conditions is a mammoth task.

SLAs governing ‘time-to-repair’ can vary greatly – up to 30 hours in some cases. For a business-critical application this is an unacceptable timeframe.

Organisations can pay for a higher level of SLA to guarantee a rapid fix time, but this is rarely factored into their initial cloud costs. As such, organisations can end up paying more than expected just to keep the lights on.
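One way to keep track of those differing terms is to record each supplier's time-to-repair commitment and compare it with what the business can tolerate. The snippet below is a minimal sketch of that comparison. The supplier names and hour values are made-up placeholders (only the 30-hour figure echoes the example above), and `slas_missing_target` is a hypothetical helper.

```python
def slas_missing_target(time_to_repair_hours: dict[str, float],
                        max_acceptable_hours: float) -> list[str]:
    """Return the suppliers whose time-to-repair exceeds the acceptable limit."""
    return sorted(
        name
        for name, hours in time_to_repair_hours.items()
        if hours > max_acceptable_hours
    )


if __name__ == "__main__":
    # Placeholder data: supplier names and hours are illustrative only.
    slas = {"supplier_a": 4, "supplier_b": 30, "supplier_c": 12}
    # Assume a business-critical application can tolerate at most 8 hours.
    print(slas_missing_target(slas, max_acceptable_hours=8))
    # prints ['supplier_b', 'supplier_c']
```

Running a check like this when a service is first procured, rather than after an outage, would show up front which contracts need a higher SLA tier and what that adds to the initial cost.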

The flexibility and agility of cloud still make it the first choice for lots of organisations, but when it comes to management, many IT teams have essentially relinquished control over support and maintenance.

It is critical that organisations retain visibility across their IT infrastructure and ensure individual SLAs meet their specific needs.

Take back control of cloud

Regaining control of a cloud deployment can be achieved by adding an overarching management layer that offers visibility, or by engaging a managed service to help implement one.

This means, rather than relying on a vendor to analyse a support ticket, the analysis can begin at home.

Pinpointing an issue can be done in as little as 30 minutes using tools and services available today. A sophisticated management layer or service offers an even greater level of control: it can spot a problem before it happens, so issues can be fixed proactively.
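To give a flavour of what spotting a problem early can mean in practice, the sketch below raises a warning when the average of the most recent latency readings creeps above a threshold, before the service fails outright. It is a deliberately simple illustration: the `early_warning` function, the five-sample window and the 150 ms threshold are assumptions made for this example, and real monitoring tools use far richer signals.

```python
from statistics import mean


def early_warning(latencies_ms: list[float], window: int = 5,
                  warn_ms: float = 150.0) -> bool:
    """Return True if the mean of the latest `window` samples exceeds `warn_ms`.

    With fewer than `window` samples there is not enough data, so no
    warning is raised.
    """
    if len(latencies_ms) < window:
        return False
    return mean(latencies_ms[-window:]) > warn_ms


if __name__ == "__main__":
    # Illustrative readings: response times drifting upwards over time.
    readings = [100, 110, 120, 160, 170, 180, 190]
    if early_warning(readings):
        print("Latency is trending up: investigate before users are affected")
```

Averaging over a window, rather than reacting to a single slow response, is a simple way to avoid raising an alert for every momentary blip.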

This level of visibility into cloud is a ‘must have’, not just a ‘nice to have’ – particularly as convoluted IT infrastructures become commonplace.

The shift to multi- and hybrid-cloud installations, pointed out by Gartner, is one example of this increasing complexity. The cloud ‘stack’ no longer just encompasses software, infrastructure and platform services, but can be made up of six interlocking layers.

Ultimately, ‘out of sight, out of mind’ is not a viable approach to cloud. Ensuring seamless performance and round-the-clock availability can only be achieved by retaining visibility and control.