Forget availability – it’s SAN performance that really matters

Nicholas Dimotakis of Virtual Instruments argues against wasteful over-provisioning and vendor-specific monitoring tools. Overall SAN performance is key, he says

Virtualisation and cloud computing are priority projects for IT departments right now, and a central part of those projects is adding storage hardware resources to ensure SAN performance in the new environment.

Yet organisations often over-provision those environments – in some cases, for example, only 5% of installed storage switch ports are actually used.

Flash arrays are often deployed as well. The result can be an environment so unbalanced that adding significantly faster storage actually slows the entire system down.

New technology deployments, such as virtualisation and cloud, require holistic management of the infrastructure. To do this effectively you really need to see what’s going on in the entire SAN environment.

Deploying technology makes things more complex

Each time you implement a new technology you add another layer of complexity, making the environment more difficult to manage. Yet many businesses traditionally respond by over-provisioning and over-engineering the hardware in a knee-jerk attempt to guarantee a stable environment.

Deploying all-flash arrays into an existing production environment and achieving optimum performance is no small feat. There are literally thousands of host and fabric conditions that have to be satisfied to enable high I/O performance across the entire system.

Blindly throwing resources at the issue is a costly way to underpin performance. It’s far better to be able to see the utilisation of each application and size or tier your resources accordingly.
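As a minimal sketch of that sizing-and-tiering idea, the snippet below maps measured per-application I/O rates to storage tiers. The application names, I/O figures and thresholds are all invented for illustration – they are not from the article:

```python
# Hypothetical sketch: assign storage tiers from measured per-application
# I/O rates instead of over-provisioning everything onto fast storage.
# Application names, IOPS figures and thresholds are invented examples.
app_iops = {"trading": 45000, "reporting": 800, "archive": 12}

def pick_tier(iops):
    """Crude illustrative tiering rule based on sustained IOPS."""
    if iops > 10000:
        return "all-flash"
    if iops > 100:
        return "hybrid"
    return "capacity disk"

for app, iops in app_iops.items():
    print(f"{app}: {pick_tier(iops)}")
```

The point is not the thresholds themselves but that the decision is driven by observed utilisation rather than a blanket hardware purchase.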

Monitoring the environment

Monitoring, adjusting and controlling resources as the entire system fluctuates is fundamental to ensuring consistent system performance.

One large European bank recently migrated from mainframe to open systems, from one to two datacentres, and virtualised at the same time. During the project it looked for the first time at switch utilisation levels and found that of 3,115 switch ports installed, 1,727 were not being used at all. The reason for this was that every time it had rolled out a new application it had rolled out new infrastructure to support it as a matter of policy.
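A few lines of back-of-the-envelope arithmetic, using the port counts from the bank example above, show how stark the utilisation figure is:

```python
# Port counts taken from the bank example above; the calculation is
# illustrative arithmetic only.
installed_ports = 3115
unused_ports = 1727

used_ports = installed_ports - unused_ports
utilisation = used_ports / installed_ports

print(f"Ports in use: {used_ports} of {installed_ports}")
print(f"Utilisation: {utilisation:.0%}")  # roughly 45%
```

In other words, more than half of the installed switch capacity was sitting idle before anyone looked at the numbers.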

By monitoring its infrastructure the bank was able to spot the bottlenecks in its storage fabric and resolve them. It was also able to pinpoint and utilise untapped existing resources in the form of thousands of unused ports instead of purchasing new switches to accommodate growth. That made for an immediate saving.

Why such under-utilisation?

Why, in this cost-conscious age, do we allow such under-utilisation of resources? Some would argue that infrastructure needs to be over-provisioned to allow for spikes of activity, but over-provisioning by 10,000% is surely excessive.

The real reason for this inefficiency is that within the datacentre you can’t see what’s going on end-to-end – from VMs right down to LUN level in the arrays, and everything in between – in real time, at line speed. Instead of showing how the whole system is performing, vendor tools provide information only about what’s going on with that specific element, and in some cases what it’s attached to. You never get the whole view.

That means that if a problem occurs everyone can look at the technology that they are responsible for and declare it is working fine while pointing fingers at other departments or suppliers.

Obviously, that isn’t helpful. If you’re on a sinking ship, there’s not much comfort in knowing that the engines are still working fine. It’s exactly the same for your datacentre. The first metric you’re normally given is the core temperature of the CPU, but wouldn’t it be better to know how the entire infrastructure is performing, so you can drill down to find the inefficiency?

The old saying “penny wise, pound foolish” applies here. We rush to follow the latest trend, be it virtualisation, cloud or mobile, but forget that to gain the improvements they promise we also need to monitor the performance of applications in the datacentre. There is no point in virtualising a server, switch or array if you then have to over-provision by a large factor to ensure it performs.

Performance, not availability

And although performance is key, we’re all still focusing on availability.

Availability is not what we should be measuring. We need to know how the application performs, the quality of the user experience and how to set a performance guarantee. You don’t buy a new car by looking at how available it is – you want to know how it performs in terms of load capacity, speed or fuel efficiency.

Likewise with the datacentre – it has to perform. That is what we are paid to provide, but the industry has until now talked only about availability, with no guarantee of overall infrastructure performance.

How to measure performance?

So, how do we resolve this? To guarantee performance to the business, we need to see exactly how the infrastructure and all its elements are performing. The only way to achieve this is to implement an infrastructure performance management platform that shows what is going on in the datacentre, from virtual machines to storage LUNs. It needs to show this in real time, across the entire infrastructure, regardless of vendor. It mustn’t introduce latency and, most of all, it mustn’t average results over minutes or hours – averages hide latency spikes and obscure where a problem might be.
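The danger of averaged results is easy to demonstrate. In the sketch below the latency samples are synthetic, invented purely to make the point: a workload with one severe spike still produces a mean that looks perfectly healthy.

```python
# Minimal sketch of why averaged metrics hide latency spikes.
# The latency samples are synthetic, purely for illustration:
# 99 fast I/Os at 2 ms and one severe 500 ms outlier.
latencies_ms = [2.0] * 99 + [500.0]

mean_ms = sum(latencies_ms) / len(latencies_ms)
worst_ms = max(latencies_ms)

print(f"Average latency: {mean_ms:.1f} ms")   # 7.0 ms – looks healthy
print(f"Worst latency:   {worst_ms:.1f} ms")  # the spike the average conceals
```

An interval average reports roughly 7 ms while one I/O actually waited half a second – which is exactly the kind of event a monitoring platform needs to surface, not smooth away.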

The virtualisation revolution, cloud and flash storage are the latest phases of IT progress, but progress with added inefficiency and cost built-in is not the way forward. Only by looking at the whole picture can new technology be implemented efficiently.

Nicholas Dimotakis is director, services and presales EMEA, at Virtual Instruments
