Today’s storage array market offers customers a bewildering choice of products, ranging from general purpose to flash storage systems, as well as software-only storage products.
Choosing the right array can sometimes seem like picking a mobile phone contract, where network operators trade off one benefit against another, making direct comparisons almost impossible.
So, what storage performance metrics should we look for when choosing a storage array? How do we translate supplier specifications so we can truly understand what’s on offer?
Unfortunately, many specifications provided by suppliers can be misinterpreted or prove misleading. So, we need to examine them in detail or put them into the correct context to make sense of them.
We will start by dividing supplier product data into three broad categories: metrics, scalability and features.
Performance and availability metrics
All storage platform specs should provide a number of key metrics that can be used when making direct comparisons. These include:
- Latency: This is a measure of the time taken to deliver a single I/O request and is typically measured in milliseconds (ms). Latency is relevant in two contexts. Firstly, achieving high application performance requires latency to be as low as possible. Secondly, latency figures should be as consistent as possible with minimal variability;
- IOPS: An abbreviation of input/output operations per second, or the number of read and write requests the system is capable of handling;
- Throughput: This is directly related to IOPS (roughly, IOPS multiplied by the I/O size) and indicates the volume of data that can be transferred to or from the system over a fixed period of time, usually measured in megabytes or gigabytes per second (MBps or GBps);
- Availability: This measures the uptime of a system, typically quoted as a percentage. A "five 9s" or 99.999% availability figure represents unplanned downtime of just over five minutes per year.
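To put the availability figure in context, an uptime percentage can be converted into the downtime it allows per year. The following is a minimal sketch; the function name and the 365.25-day average year are assumptions made for illustration, not part of any supplier's specification.

```python
# Assumption: an average year of 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Return the yearly downtime (in minutes) implied by an availability percentage."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


# Compare "three 9s", "four 9s" and "five 9s".
for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.2f} minutes/year")
```

Each extra 9 cuts the allowed downtime by a factor of 10, which is why the difference between 99.99% and 99.999% matters more than it first appears.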
Latency, IOPS and throughput are all affected by whether operations are read (usually faster) or write (often slower) and whether I/O is sequential (usually faster) or random (often slower). This applies to spinning disk and flash-based systems.
There are no agreed standards for measuring the performance of a storage array, but benchmarks such as those run by the Storage Performance Council (SPC) go some way towards producing comparable figures.
The SPC uses a number of workload types (large file processing, large database queries, video on demand) to simulate typical workloads. Unfortunately, many suppliers choose not to submit their systems for testing.
Performance figures are affected by the block size used in testing. Latency and IOPS figures are better with smaller block sizes, whereas throughput achieves better results with larger block sizes.
Array suppliers may choose to use different block size figures to show the best results for their product. If block sizes aren’t shown, ask the supplier what values were used.
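The link between block size, IOPS and throughput can be shown with simple arithmetic. This is a sketch only: the function name and the example figures are hypothetical and do not describe any real array. It assumes every I/O is the same size and uses decimal megabytes.

```python
def throughput_mbps(iops: float, block_size_bytes: int) -> float:
    """Throughput in decimal MBps, assuming every I/O is block_size_bytes long."""
    return iops * block_size_bytes / 1_000_000


# Hypothetical figures: the same array quoted at two different block sizes.
small_block = throughput_mbps(iops=100_000, block_size_bytes=4 * 1024)
large_block = throughput_mbps(iops=10_000, block_size_bytes=256 * 1024)

print(f"100,000 IOPS at 4KB  : {small_block:.1f} MBps")
print(f" 10,000 IOPS at 256KB: {large_block:.1f} MBps")
```

In this made-up example the large-block test delivers a tenth of the IOPS but more than six times the throughput, so a headline figure means little unless the block size behind it is known.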
Finally, remember to ask whether quoted figures represent peak capabilities or values that can be sustained over a long period of time. In many cases, performance under short-term peak workloads can be boosted by a large cache, but that level of performance isn't sustainable.
Redundancy and failover
Increased availability is achieved through component redundancy (multiple power supplies, host ports, back-end controllers, Raid) and the ability to hot swap many of these components with no outage.
High availability can be delivered by multiple redundant controllers or nodes, where the work of a failing node is taken up by others in the system. At the basic level it is implemented as a dual-controller architecture and at the high end as a multi-node scale-out system.
The exact nature of the failover process offered by the supplier needs to be carefully examined. Some systems deploy nodes in pairs, which means only one node takes over the workload of a failing controller, and that can result in an imbalanced configuration.
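The imbalance can be illustrated with a simplified model. The sketch below assumes every node carries the same load before the failure, expressed as a fraction of one node's capacity, and that a scale-out system spreads the failed node's work evenly. Real systems may behave differently, and the function names are hypothetical.

```python
def load_after_failure_paired(per_node_load: float) -> float:
    """Load on the surviving partner when nodes are deployed in pairs."""
    return 2 * per_node_load


def load_after_failure_scale_out(per_node_load: float, node_count: int) -> float:
    """Load on each survivor if the failed node's work is spread evenly."""
    if node_count < 2:
        raise ValueError("at least two nodes are needed to survive a failure")
    return per_node_load * node_count / (node_count - 1)


# Hypothetical eight-node system with every node running at 60% utilisation.
print(f"Paired partner : {load_after_failure_paired(0.6):.0%}")
print(f"Scale-out node : {load_after_failure_scale_out(0.6, 8):.0%}")
```

Under these assumptions the surviving partner in a pair is asked to run beyond its capacity while the other six nodes are unaffected, whereas an even spread leaves every survivor with headroom.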
Many simple dual-node configurations only ensure data is accessible during a node failure and, due to the lack of write-cache redundancy, will suffer a significant drop in performance.
Full component hot-swap capability tends to be seen only on high-end enterprise arrays, with the accompanying cost premium.
Scalability
With ever-increasing volumes of data being stored, scalability is an important consideration in storage purchasing. A storage array is a significant investment, both financially and operationally, so it needs to be able to grow without disruption.
These are some of the key specs to watch out for with regard to scalability:
- Processors: Most storage arrays are based on commodity Intel processors. More and faster processors do not necessarily mean faster I/O response times. They may instead indicate that a system has the capability to support more storage features. Question the supplier about how multi-processing support is implemented in their controller software.
- Cache/memory: This is typically used to improve I/O performance; more cache generally means more of the working set of data can be kept in memory. Look out for suppliers who replicate cache and then quote the total cache size. In this instance, the effective cache capacity is only half of that advertised.
- Disk type: This is the range of disk types (based on capacity, speed, physical size and connection) that the array supports, for example, 2TB 7,200rpm 3.5” SATA or 600GB 15,000rpm 2.5” SAS. High-capacity drives will usually give slower access times. Higher RPM drives will give lower latency. An enclosure will also hold more 2.5” drives than 3.5” drives. It is important to have a balance of multiple drive types if an array supports many different workload types.
- Drive count: Most systems grow by adding disk shelves, with the largest systems capable of supporting thousands of drives (for example, EMC VMAX3 400K with 5,760 drives). Adding drives increases the traffic and contention on controllers, which means system performance may not scale with capacity.
- Connectivity: This covers supported protocols and physical ports. The most common connectivity types are Fibre Channel or Ethernet; Fibre Channel connections use the Fibre Channel (FC) protocol and Ethernet can use iSCSI, Fibre Channel over Ethernet (FCoE) or file-based protocols such as NFS and SMB/CIFS. Generally speaking, Fibre Channel provides a highly reliable, non-routable, point-to-point private data network, but can be expensive to deploy and maintain. Ethernet provides an alternative, based on standard network equipment, but doesn’t provide lossless capabilities (which are important in highly utilised networks), unless the hardware supports Datacentre Bridging (DCB) protocols.
- Capacity: The storage capacity of a system is based on the number and capacity of drives available. Suppliers (of drives and arrays) like to use decimal rather than binary capacity figures (that is, where a kilobyte is 1,000 bytes rather than 1,024 bytes). This has an effect on the total system capacity, which is around 9% lower in binary terms when talking in terabytes. Some suppliers (such as NetApp) format drives to a fixed size, allowing HDDs from different manufacturers to be mixed in the same system. NetApp also formats SATA drives with extra CRC protection, which results in an additional 9% loss of usable capacity. Always check for the available capacity after system overheads.
- Data protection: All arrays come with some form of local data protection. Most use Raid, which in the worst case (Raid-1) results in a 50% capacity loss to resiliency. This declines with Raid-5 implementations (25% with 3+1 schemes, around 11% with 8+1) and wider Raid-6 configurations (25% with 6+2, 12.5% with 14+2). Of course, wider Raid stripes mean deploying more disks and fixing the “unit of capacity upgrade” at the Raid stripe width, as well as increasing the risk of data loss through drive failure.
- Host and LUN count: The number of hosts and LUNs (volumes) supported is an important scaling factor. Limited LUN count means capacity can only be increased through using larger LUNs. Limited host count support may require deploying multiple arrays with the associated management overhead.
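The capacity arithmetic in the list above can be checked with a short calculation. This is a rough sketch that covers only the decimal-to-binary conversion and parity overhead. It ignores supplier-specific formatting, spare drives and other system overheads, and the function names and example shelf are hypothetical.

```python
def decimal_tb_to_tib(terabytes: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return terabytes * 10**12 / 2**40


def raid_overhead(data_drives: int, parity_drives: int) -> float:
    """Fraction of a Raid group's raw capacity given up to parity (or a mirror copy)."""
    return parity_drives / (data_drives + parity_drives)


def usable_tib(drive_count: int, drive_tb: float,
               data_drives: int, parity_drives: int) -> float:
    """Binary capacity left after parity, before any other overheads."""
    raw_tb = drive_count * drive_tb
    after_raid_tb = raw_tb * (1 - raid_overhead(data_drives, parity_drives))
    return decimal_tb_to_tib(after_raid_tb)


print(f"1TB (decimal) = {decimal_tb_to_tib(1):.3f} TiB (binary)")

# Raid-1 is treated here as one data drive plus one mirror copy.
for data, parity in [(1, 1), (3, 1), (8, 1), (6, 2), (14, 2)]:
    print(f"{data}+{parity}: {raid_overhead(data, parity):.1%} of raw capacity lost")

# Hypothetical shelf: 16 x 2TB drives in a single 14+2 Raid-6 group.
print(f"Usable: {usable_tib(16, 2.0, 14, 2):.2f} TiB")
```

Working through a quote in this way shows how far the advertised raw figure can sit above the capacity actually available to hosts, even before supplier-specific overheads are taken into account.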
Features
Array makers like to add features to their products as a way of differentiating them in a crowded marketplace. The features suppliers offer tend to fall into a number of categories:
- Space optimisation: This includes thin provisioning, data compression and data deduplication, all of which aim to ensure that physical utilisation is kept to a minimum. These features can save the customer money but also come with trade-offs. Thin provisioning requires more management; compression and deduplication typically consume more processor/memory resources, which can make an impact on performance.
- Data protection: This includes remote replication technologies (both synchronous and asynchronous) as well as local protection, such as snapshots. Replication protocols tend to be proprietary, requiring customers to deploy the same supplier’s products at the primary and secondary sites. Snapshot implementations can also vary, with the best providing writeable snapshots and only storing changed blocks on disk.
- Management: Suppliers have worked hard to improve the usability of their products over the last 20 years. Today, customers should expect multiple management tools (Web/GUI, CLI and API) as well as management and reporting functions that integrate with other platforms, such as hypervisor and cloud management frameworks.
The implementation of space optimisation features depends heavily on the block size used for on-disk structures in the system architecture. Generally speaking, the smaller the on-disk block size, the more efficient the savings will be. Data deduplication can be carried out inline or as a post-processing task. Inline deduplication needs to be more efficient than post-processing, as it has a direct impact on I/O performance. Check exactly how your supplier delivers space optimisation features.
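The effect of block size on deduplication savings can be demonstrated with a toy example. The sketch below assumes fixed-size blocks fingerprinted with SHA-256 and uses synthetic data. Real arrays use their own fingerprinting and metadata schemes, so this illustrates the principle rather than describing any product.

```python
import hashlib

CHUNK = 4096  # size of each synthetic chunk in bytes


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Logical blocks divided by unique blocks, using fixed-size blocks."""
    if not data:
        raise ValueError("data must not be empty")
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


def chunk(byte_value: int) -> bytes:
    """A 4KB chunk filled with a single byte value."""
    return bytes([byte_value]) * CHUNK


# Synthetic data: one repeated chunk interleaved with four distinct chunks,
# giving the layout A B A C A D A E.
data = b"".join(chunk(0) + chunk(n) for n in range(1, 5))

print(f"4KB blocks: {dedup_ratio(data, 4096):.2f}:1")
print(f"8KB blocks: {dedup_ratio(data, 8192):.2f}:1")
```

With 4KB blocks the repeated chunk is stored once, but with 8KB blocks every block is unique and nothing is saved. The trade-off is that smaller blocks mean more fingerprints to store and look up, which is part of the processor and memory cost mentioned above.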
As we have seen, caveat emptor applies to storage array specifications. As always, don’t take your supplier’s claims at face value. It’s always worth taking time to question exactly how spec sheet figures are arrived at.