It's not quite something for nothing, but the concept behind thin provisioning might occasionally seem like it.
Allocate 20 TB of disk volumes, when all you have is a 5 TB array? Sure. Snapshot a 2 TB volume, but only copy a few hundred gigabytes? No problem.
The premise behind thin provisioning is simple: You allocate a logical volume with a stated capacity, but only allocate physical disk blocks to it as they are occupied. So a server might think it has a 2 TB volume, but if it only puts 100 GB of data on it, the volume only occupies 100 GB on disk.
Thin provisioning was first popularised by relatively small companies like 3PAR, Compellent Technologies, LeftHand Networks (now owned by HP) and Pillar Data Systems on the hardware side, as well as SAN software developers such as DataCore and FalconStor. However, storage giants such as EMC (which plans to have virtual provisioning in all its disk arrays), HDS (whose disk arrays can thin-provision other external storage systems) and IBM (which is rolling out thin provisioning on its SVC storage virtualisation devices) are now on board with thin provisioning too.
Thin provisioning sits on top of virtualised storage -- disk capacity that has been converted to a pool of blocks by a storage controller that either lives in the SAN, such as the IBM SVC device, or inside the storage array itself. Blocks from the pool can then be assembled into logical disk volumes (LUNs) and provisioned to servers which use them as if they were locally attached drives.
Normally, if you provision a 1 TB LUN, it consumes 1 TB of blocks from the pool. However, servers and applications always need room for growth, so it's common to allocate at least twice as much storage capacity as they need. Add that up over dozens of servers, and a large amount of your expensive disk space is allocated ... but empty.
Saving storage space with thin provisioning
Thin provisioning lets you share that unused space. Each volume now only occupies as many physical blocks as it has real data, and when more data comes along, the storage controller gives it more physical blocks from the shared pool, up to its full allocation if necessary.
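To make the allocate-on-write idea concrete, here is a minimal Python sketch. It is an illustrative toy model, not any supplier's implementation: the class names `ThinPool` and `ThinVolume`, the block counts and the one-block-at-a-time allocation are all assumptions made for the example.

```python
class ThinPool:
    """A shared pool of physical blocks (hypothetical model)."""

    def __init__(self, physical_blocks):
        self.free_blocks = physical_blocks

    def take_block(self):
        if self.free_blocks == 0:
            raise RuntimeError("pool exhausted: add physical disk")
        self.free_blocks -= 1


class ThinVolume:
    """A logical volume that maps blocks to the pool only when written."""

    def __init__(self, pool, logical_blocks):
        self.pool = pool
        self.logical_blocks = logical_blocks  # the capacity the server sees
        self.mapped = {}  # logical block number -> data

    def write(self, block_no, data):
        if not 0 <= block_no < self.logical_blocks:
            raise IndexError("write beyond the volume's stated capacity")
        if block_no not in self.mapped:
            self.pool.take_block()  # physical space is consumed on first write
        self.mapped[block_no] = data

    def read(self, block_no):
        # Unwritten blocks read back as zeros without occupying any disk.
        return self.mapped.get(block_no, b"\x00")

    def physical_usage(self):
        return len(self.mapped)


# Ten volumes of 2,000 blocks each (20,000 logical) on a 5,000-block pool.
pool = ThinPool(physical_blocks=5_000)
volumes = [ThinVolume(pool, logical_blocks=2_000) for _ in range(10)]

volumes[0].write(0, b"a")
volumes[0].write(1, b"b")
volumes[0].write(0, b"c")  # rewriting a mapped block takes nothing new

print(volumes[0].physical_usage())  # 2
print(pool.free_blocks)             # 4998
```

The servers see 20,000 blocks between them, yet only the two blocks actually written have left the pool. A real controller would work in larger extents and keep its mapping on disk, but the principle is the same.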
Essentially, you need fewer disks because you're wasting less of them, and that means huge savings, says Roger Bearpark, assistant head of ICT for the London Borough of Hillingdon, which recently installed thin-provisioned storage from Compellent to replace its old StorageTek SAN.
"We have data growth of 100% a year, which was heavy even by industry standards," said Bearpark. "But we also had a lot of white space -- capacity that was allocated but unused."
According to Bearpark, thin provisioning has removed the need for much of that white space -- along with the disks that it lived on. "Our power consumption is down from around 34kW to just over 1kW, mostly from taking out disks that were spinning but not doing anything," he said, "and I've avoided probably £50,000 a year in administrator salary."
Volkswagen Financial Services (UK) used DataCore's SANmelody software, running on a redundant pair of storage servers, to implement a virtualised and thin-provisioned SAN. "We are certainly more efficient," says senior network specialist Mike Duxbury. "We were wasting terabytes before." He estimates that thin provisioning has probably saved his company 20% to 30% of its storage.
A big financial advantage with thin provisioning, adds Duxbury, is the ability to over-allocate: giving the servers more total disk capacity than you physically own. So if you think your servers will need 10 TB each in three years' time, you can give them all logical 10 TB volumes without needing to buy that much disk space right now. Extra physical capacity can be added to the SAN over time, as the servers actually consume it.
One caveat: You need to keep an eye on how much spare disk capacity you currently have. Typically the storage controller would be set to alert at 70% or 80% full, giving time to add more disk, which the controller then virtualises and adds to the block pool. From there, the extra blocks can be used to expand thin-provisioned volumes as needed.
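The arithmetic behind over-allocation and the alert threshold can be sketched in a few lines of Python. The function name `pool_status` and the figures are hypothetical, chosen to echo the 20 TB on a 5 TB array example; real controllers expose this through their own management tools.

```python
def pool_status(physical_tb, used_tb, provisioned_tb, alert_at=0.8):
    """Summarise a thin pool: how over-allocated it is and how full it is.

    alert_at is the fill fraction at which to warn (0.7 to 0.8 is typical,
    according to the article); all names here are illustrative assumptions.
    """
    fill = used_tb / physical_tb
    return {
        "oversubscription": provisioned_tb / physical_tb,
        "fill": fill,
        "alert": fill >= alert_at,
    }


# 20 TB of logical volumes on a 5 TB array, with 3.6 TB actually written.
status = pool_status(physical_tb=5, used_tb=3.6, provisioned_tb=20, alert_at=0.7)
print(status["oversubscription"])  # 4.0
print(status["alert"])             # True
```

Here the pool is four times over-allocated and 72% full, so a 70% threshold has already tripped and it is time to order more disk.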
According to Duxbury, adding storage is a largely automated process -- as far as the hardware allows it to be automated. "We can just hook up new disks and chuck them into the pool," he said. "Our two storage servers have spare SCSI ports so we can add another array. We still have to reboot the storage server when adding disks, at least until we switch to hot-pluggable SAS, but our high-availability system means that can be transparent to the applications."
Pitfalls of thin provisioning
As they are hosted on a virtualised SAN, thin-provisioned volumes can be protected against physical failure, for example by RAID arrays or mirrored storage servers. But at the logical volume level, that's not always true. Thin provisioning is a relatively young technology, and some storage management software may not understand it yet.
"One of the big differentiators is some companies who offer thin provisioning can only replicate physical disks, so they can't replicate thin provisioned volumes," warns Alexandre Delcayre, EMEA technical director for FalconStor Software. He says it's important to ask potential suppliers if they can also run storage services on top, such as mirroring and replication.
Another problem is migrating to thin provisioning. If you use an image or sector-based tool that copies the free space too, the thin volume will simply take up its full assigned capacity.
Hillingdon used Compellent software, which allows fat legacy volumes to be converted into thin volumes by copying only the blocks actually in use. "We knew we had the opportunity in moving from one storage environment to another to tidy up," says Bearpark. "The Compellent thin-import option looked too good to be true, but it worked."
Volkswagen FS also uses a thin-copy scheme. "We did an exercise recently where we migrated all our data to new volumes on the SAN and deleted the old volumes," said Duxbury. "We got back 1.5 TB, because it only copied the used blocks. But of course it was a manual process. We had to take the servers offline to do it."
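The difference between an image-based copy and a thin copy can be shown with a toy Python model. This is a sketch under stated assumptions, not how Compellent or DataCore do it: the target volume is a dictionary in which every key stands for an allocated physical block, and the set `in_use` stands in for whatever allocation map the file system provides.

```python
def sector_copy(source, target):
    """Copy every block, used or not, as an image or sector-based tool does."""
    for block_no, data in enumerate(source):
        target[block_no] = data


def thin_copy(source, in_use, target):
    """Copy only the blocks the file system reports as in use."""
    for block_no in sorted(in_use):
        target[block_no] = source[block_no]


# A 2,000-block legacy volume with real data in the first 100 blocks only.
source = [b"data" if n < 100 else b"\x00" * 4 for n in range(2000)]
in_use = set(range(100))

fat_target, thin_target = {}, {}
sector_copy(source, fat_target)
thin_copy(source, in_use, thin_target)

print(len(fat_target))   # 2000
print(len(thin_target))  # 100
```

The sector copy touches all 2,000 blocks, so the new thin volume swells to its full assigned capacity; the thin copy allocates just the 100 blocks that hold data.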
Thin provisioning challenge: reclaiming deleted filespace
However, the fact that the Volkswagen FS team recovered 1.5 TB illustrates perhaps the biggest challenge facing thin provisioning today: the need to reclaim deleted filespace.
"One mistake we made was in just taking the maximum disk size we could," said Duxbury. "We gave every server 2 TB. The problem is that Windows doesn't give space back when it's finished with it, so we had a huge amount allocated but no longer used.
"For example, Exchange creates an awful lot of logfiles and deletes them when it's finished. Windows may reuse the deleted space, but it may also ask for more from the SAN. It depends where it decides to write its next files. Usually it's half and half."
The problem comes from using a file system that's designed to run on standalone machines and which doesn't know that it is now operating on a SAN. If the SAN is already protecting your data by keeping snapshots of it, there is no need for Windows to keep your deleted files around, just in case you want to undelete them.
Of course, file systems vary in their capabilities, and if your file system was designed for networked use (as many are), you may not have an issue. "We don't have much of a problem with file reclamation as we're primarily a Novell NetWare site for file and print," notes Bearpark.
Return to lender
But if, like the majority of sites, you do need to support Windows NTFS, can you recover that deleted filespace automatically? Only if you're willing to deploy agents on the servers connected to the thin-provisioned storage.
DataCore released a manual free space reclamation tool five years ago. According to Jamie Price, marketing vice president at DataCore, "We've long shied away from putting agents on servers -- people don't like them, they're messy. People were happier with administrators doing the reclamation. Now people are happier with thin provisioning though, so we're releasing an automated version."
"There has to be some level of communication with Windows, it's the only way to do it," argues Bob Fine, Compellent's director of product management. "Our Freespace Recovery agent runs in the background and compares how Windows and Compellent see the storage."
"The only way to keep an array efficient is if there is a conversation between the host and the array," agrees Geoff Hough, senior director of strategy for 3PAR. "So we partnered with Symantec to develop an API to do just that."
This API links into Symantec's Veritas Storage Foundation heterogeneous file system, which "aggressively re-uses space that has been previously used," Hough says. As long as the server has the Symantec software running, the array signals 'this is a thin-provisioned volume' and the server treats it accordingly.
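The host-and-array conversation the suppliers describe boils down to comparing two views of the same volume. The Python sketch below is a hypothetical illustration of that comparison only; it does not reproduce Compellent's agent, DataCore's tool or the 3PAR and Symantec API, and the function names and block sets are assumptions.

```python
def reclaimable_blocks(array_allocated, fs_in_use):
    """Blocks the array still holds that the file system no longer uses."""
    return array_allocated - fs_in_use


def reclaim(array_allocated, fs_in_use):
    """Return stale blocks to the pool and report how many were freed."""
    stale = reclaimable_blocks(array_allocated, fs_in_use)
    array_allocated.difference_update(stale)  # modifies the array's view in place
    return len(stale)


# The array has allocated 1,000 blocks, but after logfiles were deleted
# the file system is only using 600 of them.
array_allocated = set(range(1000))
fs_in_use = set(range(600))

freed = reclaim(array_allocated, fs_in_use)
print(freed)                 # 400
print(len(array_allocated))  # 600
```

Without something on the server to supply the file system's view, the array has no way of knowing that those 400 blocks hold nothing but deleted files.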
Looking forward, Hillingdon's Bearpark says he'd like to see thin provisioning augmented with deduplication technology like that already used by his backup software.
That can be done today, but only at the file level, says FalconStor's Delcayre. "We prefer a file-based approach -- it's complementary," he said. "But if you have an intensive Oracle application, it will be much faster on a Fibre Channel SAN than on a filer. File-level resources are much more for fileservers, archiving, etc."
One thing suppliers and users agree on is the need to set capacity thresholds and alerts, and to track and analyse usage trends. Thin provisioning may let you avoid wasted white space, but sooner or later an application may actually want to fill the volume it's nominally been given.
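As a rough illustration of usage-trend tracking, the Python sketch below projects how long a pool has before it hits its alert threshold. It assumes simple straight-line growth between the first and last daily samples; the function name `days_until_threshold` and the figures are hypothetical, and real monitoring tools may use more sophisticated forecasting.

```python
def days_until_threshold(samples_gb, physical_gb, threshold=0.8):
    """Estimate days until pool usage reaches the alert threshold.

    samples_gb holds one usage reading per day, oldest first. Returns 0.0 if
    the threshold is already reached, or None if usage is flat or shrinking.
    """
    if len(samples_gb) < 2:
        raise ValueError("need at least two samples to see a trend")
    growth_per_day = (samples_gb[-1] - samples_gb[0]) / (len(samples_gb) - 1)
    limit_gb = physical_gb * threshold
    if samples_gb[-1] >= limit_gb:
        return 0.0
    if growth_per_day <= 0:
        return None
    return (limit_gb - samples_gb[-1]) / growth_per_day


# Five daily readings on a 5,000 GB pool that alerts at 80% full.
days = days_until_threshold([3000, 3100, 3200, 3300, 3400], physical_gb=5000)
print(round(days, 1))
```

With usage climbing by 100 GB a day and 600 GB of headroom below the 4,000 GB alert level, the projection comes out at about six days.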
And don't forget: Thin provisioning is only one part of a total solution. You can't do thin provisioning without storage virtualisation, and storage virtualisation goes hand-in-hand with server virtualisation. Together, they can bring new levels of efficiency and flexibility, but you can't optimise the servers or the storage alone -- you have to re-think your resources as a whole.
About the author: Bryan Betts is a UK-based journalist specialising in business and technology. He writes from time to time on computer storage technology and networking (NAS, SANs, Fibre Channel and virtualisation) for SearchStorage.co.UK and other magazines and Web sites.