What are the aspects to keep in mind when it comes to managing an SLA?
SLAs are not substitutes for a good partner relationship and governance procedures, so a broader management scope is still necessary. Inordinate effort to make the SLA absolutely watertight isn't necessary, particularly when the internal IT organization is involved. Select the SLA that best represents the end user's actual experience, while being realistic about what the expectations should be. For example, application availability makes much more sense than server uptime does.
SLAs can be met only under certain conditions. For example, if the average user's Web experience is to remain above the threshold, strong URL filtering is required to ensure that Web usage is in accordance with policy, and requests for exceptions should be carefully managed. At a high level, there is little to argue about when it comes to choosing the right SLAs. However, collecting and aggregating metrics is usually quite tough. Data typically resides in multiple databases, infrastructure management systems and possibly multiple service desk solutions. Building connectors to these data sources, aggregating the data and developing dashboards for reporting are all resource-intensive tasks.
Tools don't do much here: professional services costs typically equal the average SLA management tool's licensing costs (a 1-to-1 ratio). Hiring an expert who is familiar with the metrics that secure buy-in (and truly represent the business's interests), and hiring the IT talent to build the data connectors, are much more important than any tool-related considerations.
How can IT teams leverage the various features offered by SLA management tools?
SLA management tool vendors offer value in multiple ways. First of all, they know which metrics to select. They also know where to look for data necessary to build metrics that IT and business would monitor as part of the SLA.
Vendors have the necessary tool knowledge to build connectors to the data. They also provide mechanisms such as rules engines that ease the process of aggregating the data to build the metrics to be monitored. Dashboards are usually provided to aggregate and present the metrics in an automated way, along with automated alerting services, analytics, etc. In effect, they create a single system of record that everybody can agree on. So, it's less about tool features and more about the competence that the SLA management tool provider brings to the table.
What is the scenario in India when it comes to SLA management?
Somak Roy, Butler Group, Datamonitor India
We believe that the adoption of formal SLAs is fairly limited in India. SLA formulation and management require IT and multiple business stakeholders to accept a single version of the truth, in terms of the data that drives the metrics. This is typically a fairly resource-intensive task that involves aggregating data from multiple application repositories, databases, infrastructure management systems, configuration management databases and service desks. The cost is at least on the order of tens of thousands of dollars, and in many cases above the US$100,000 mark.
A company would take up such a resource-intensive project only if sufficient scale exists and/or if the awareness of the need for IT maturity is high; these conditions are rare in India. Even globally, formalized internal SLAs are more important to the very large enterprise than to any other kind of company (this does not include service providers of all kinds, because SLAs are the core of their business).
When defining an SLA with a service provider, what are the key parameters to be specified?
That really depends on the service provided. Any infrastructure management service (including an outsourced help desk) would have SLAs that cover incident response time (including tasks such as adds, moves and changes), mean time to repair, availability (such as network and email availability) and how these variables trend over time. A lot of work usually gets done before the parties can agree on the right metric and the context for each (for example, which event defines incident closure?), the segmentation of event types (based on severity), the periodicity of monitoring, the data collection and aggregation procedure, the rules defining penalties and the escalation procedure.
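As a rough illustration of how mean time to repair and response-time compliance might be computed per severity level, here is a minimal Python sketch. The `Incident` record, the severity names and the response targets in `RESPONSE_TARGETS` are hypothetical placeholders, not values from any particular contract or tool, and incident closure is assumed to be whatever event the parties have agreed on.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    severity: str        # e.g. "sev1" or "sev2" (hypothetical labels)
    opened: datetime     # when the incident was logged
    responded: datetime  # first response by the provider
    closed: datetime     # the agreed closure event


# Hypothetical response-time targets per severity; a real SLA would define its own.
RESPONSE_TARGETS = {
    "sev1": timedelta(minutes=15),
    "sev2": timedelta(hours=1),
}


def summarize(incidents):
    """Return MTTR (minutes) and response-target compliance (%) per severity."""
    groups = defaultdict(list)
    for inc in incidents:
        groups[inc.severity].append(inc)

    report = {}
    for severity, items in groups.items():
        # Repair time is measured from opening to closure.
        repair_seconds = [(i.closed - i.opened).total_seconds() for i in items]
        # Count incidents whose first response met the target for this severity.
        met = sum(
            1 for i in items
            if i.responded - i.opened <= RESPONSE_TARGETS[severity]
        )
        report[severity] = {
            "mttr_minutes": sum(repair_seconds) / len(repair_seconds) / 60,
            "response_met_pct": 100 * met / len(items),
        }
    return report
```

Running `summarize` once per reporting period and storing the results would give the trend over time that such SLAs usually track.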
When the service involves application hosting, SLAs would be related to uptime (availability), which is qualified by factors such as the number of concurrent users to be expected. Performance is a little more difficult to define. Ideally, performance should be measured in terms of end-user experience, but measuring end-user experience sometimes involves hard-to-scale tasks such as installing agents on end users' desktops. The growth of mobile users complicates matters further, because it is hard to control the endpoint and the network.
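The uptime figure itself is simple arithmetic once the outage records are agreed on. The sketch below is one possible way to compute it, assuming outages are supplied as non-overlapping (start, end) pairs; the function name `availability_pct` is illustrative, and it does not model qualifiers such as concurrent-user load.

```python
from datetime import datetime


def availability_pct(period_start, period_end, outages):
    """Percentage of the reporting period during which the service was up.

    `outages` is a list of (start, end) datetime pairs, assumed not to overlap.
    """
    total = (period_end - period_start).total_seconds()
    if total <= 0:
        raise ValueError("period_end must be after period_start")

    downtime = 0.0
    for start, end in outages:
        # Count only the part of each outage that falls inside the period.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            downtime += (clipped_end - clipped_start).total_seconds()

    return 100.0 * (total - downtime) / total
```

For instance, a single outage of 7 hours 12 minutes in a 30-day month works out to 99 percent availability.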