It's very odd. Dell, EMC and NetApp are all working on storage array management of application server-located flash cache. EMC has Project Lightning, NetApp has Project Mercury and Dell talked about it during its recent Dell Storage Forum in Orlando, Fla. The question is, however, why on earth would you want to do that?
Let’s have a look at flash cache, where it resides, where you can manage it from and the pros and cons of each.
There are four places you can put flash memory: in the application server, inline between the app server and array (think Avere and Alacritech), in the array controller (think NetApp Flash Cache), and as a tier of fast-access drives in array disk enclosures.
Apps execute faster if cache is in the server because they get much quicker access to data in that cache than if they have to go elsewhere for it. The drawback is that flash in an app server is local to that server and cannot be shared. By contrast, flash in an inline filer or SAN accelerator can be shared among accessing servers, as can flash in the array controller and SSD in the array itself.
Now what about the overhead of managing server flash from the array? If an array manages flash in app servers, it might have to manage dozens or hundreds of separate caches. That is going to gobble up controller processor cycles, and adding the code to do it won't be a trivial matter.
So, why do it? Let's see what happens with app server-managed flash, array-managed app server flash and array controller flash when an app requests data from or writes data to an array.
App server-managed flash will get you data fastest of all if it's in the cache because the distance travelled is so short. Of course, a cache miss means we’d have to go to the array, but cache loading can be done by fetching data from the region around the specific read data and hoping the next byte or byte set is in there somewhere.
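To make the cache-loading idea concrete, here is a minimal sketch of a server-side cache that, on a miss, fetches the requested block plus a few blocks after it. It is an illustration only, not any vendor's algorithm: the class name ReadAheadCache, the LRU eviction policy, the block numbering and the read_ahead and capacity values are all assumptions made for the example, and a plain dict stands in for the array.

```python
from collections import OrderedDict


class ReadAheadCache:
    """Toy LRU block cache that prefetches following blocks on a miss."""

    def __init__(self, backend, capacity=8, read_ahead=3):
        self.backend = backend        # dict standing in for the array: block number -> data
        self.capacity = capacity      # maximum number of blocks held in "flash"
        self.read_ahead = read_ahead  # extra blocks fetched after a miss (assumed value)
        self.blocks = OrderedDict()   # LRU order, least recently used first
        self.hits = 0
        self.misses = 0

    def _insert(self, block_no, data):
        self.blocks[block_no] = data
        self.blocks.move_to_end(block_no)
        while len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def read(self, block_no):
        if block_no in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no]
        # Miss: go to the array for the block, and pull in its neighbours too.
        self.misses += 1
        data = self.backend[block_no]
        for n in range(block_no + 1, block_no + self.read_ahead + 1):
            if n in self.backend:
                self._insert(n, self.backend[n])
        self._insert(block_no, data)
        return data


array = {n: f"block-{n}" for n in range(100)}
cache = ReadAheadCache(array)
for n in range(8):          # a purely sequential read pattern
    cache.read(n)
print(cache.hits, cache.misses)  # 6 2
```

With a sequential read pattern, only every fourth read goes to the array. A random pattern would gain nothing from the prefetch, which is why the loading policy matters more than where the policy runs.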
With array-managed app server flash cache, the same thing happens. The data will be in cache or not, and the cache load algorithm will be pretty much the same. So, time to data will be the same, but this cache will work only with the array controller that manages it, and you will lose the flexibility to use whatever arrays you want. Why would customers give up that flexibility for zero improvement in data access speed and a significant increase in array controller code complexity?
If array suppliers say they can load the cache better than an app server can, the question is: why? The app server sees the same I/O requests as the array controller. Why should the array controller be better able to interpret the pattern of those requests and load the cache better?
With array controller cache, we have a network link to cross and an array controller poking its oar in as well. "Oh," it says, "you want data? Perhaps it's in cache. Oh, yes … it is. Here you are." All of which slows data access a little more.
You may also have request contention as multiple requests come in from app servers and the array controller gets maxed out or the cache gets thrashed as data is loaded, evicted and loaded again. App server flash avoids that problem.
With array-located SSDs, the array controller has to fetch the data from the SSD before squirting it out to the app server, so you have to add the time needed to access it and get the data across the internal array backplane.
For that reason, I can't see any data access speed advantage in having array controller-managed app server flash rather than app server-managed flash caching. But I can see a lock-in advantage for vendors and loss of flexibility for customers.
Having said all that, there can be utility in having both server-side and array controller-side flash. Shawn Kung, director of product marketing with US semiconductor maker Marvell, said, "When the server-side cache decides to evict something that was previously cached, it could tell the storage array cache to keep it just in case it is required again in the future. There is [also] an opportunity for deduplicating [between] the cache[s]. [Finally] the storage array can help invalidate stale caches to prevent cache coherency issues if applications move around to different servers."
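The eviction hand-off Kung describes can be sketched as two cooperating caches, where the server-side cache passes whatever it evicts to the array-side cache. This is a rough illustration under stated assumptions: the LRU class, its on_evict callback and invalidate method, and the capacities are hypothetical names and values chosen for the example, and in a real system the hand-off would cross a network link rather than be a local function call.

```python
from collections import OrderedDict


class LRU:
    """Toy LRU cache with an optional callback that receives evicted entries."""

    def __init__(self, capacity, on_evict=None):
        self.capacity = capacity
        self.on_evict = on_evict     # called as on_evict(key, value) for each eviction
        self.items = OrderedDict()   # least recently used first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            old_key, old_value = self.items.popitem(last=False)
            if self.on_evict:
                self.on_evict(old_key, old_value)

    def invalidate(self, key):
        # Drop a possibly stale entry, e.g. after an application moves to another server.
        self.items.pop(key, None)


array_cache = LRU(capacity=4)
# The server-side cache hands evicted entries to the array-side cache.
server_cache = LRU(capacity=2, on_evict=array_cache.put)

for key in ("a", "b", "c"):
    server_cache.put(key, key.upper())

print(server_cache.get("a"))  # None
print(array_cache.get("a"))   # A
```

Note that nothing here requires the array to manage the server cache. The server cache decides what to evict and simply tells the array cache about it.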
When writing data the situation is even simpler. The application on the app server, not the array controller, knows what it wants to write. Kung thinks app server-located flash is good for writes. He calls it "dirty data" caching, as the data in cache is different from what's in the array. He said, "The principal advantage of server-side dirty data caching is that it enables a tremendous reduction in write I/O traffic to the storage array through write coalescing and pruning of overwrites."
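Write coalescing and overwrite pruning can be illustrated with a small write-back cache. This is a sketch under assumptions, not a description of any shipping product: the WriteBackCache class, its counters and the rule that adjacent dirty blocks are flushed as one I/O are all invented for the example, and a dict again stands in for the array.

```python
class WriteBackCache:
    """Toy dirty-data cache: keeps the latest write per block, flushes in extents."""

    def __init__(self, array):
        self.array = array        # dict standing in for the storage array
        self.dirty = {}           # block number -> latest data not yet on the array
        self.writes_received = 0  # writes issued by the application
        self.array_ios = 0        # write I/Os actually sent to the array

    def write(self, block_no, data):
        self.writes_received += 1
        # Overwrite pruning: an earlier unflushed version of the block is discarded.
        self.dirty[block_no] = data

    def _write_extent(self, run):
        # One I/O carries a whole run of adjacent blocks.
        self.array_ios += 1
        for block_no in run:
            self.array[block_no] = self.dirty[block_no]

    def flush(self):
        # Coalescing: group adjacent dirty blocks into a single extent write.
        run = []
        for block_no in sorted(self.dirty):
            if run and block_no != run[-1] + 1:
                self._write_extent(run)
                run = []
            run.append(block_no)
        if run:
            self._write_extent(run)
        self.dirty.clear()


array = {}
wb = WriteBackCache(array)
for block_no, data in [(10, "v1"), (10, "v2"), (11, "x"), (12, "y"), (40, "z"), (10, "v3")]:
    wb.write(block_no, data)
wb.flush()
print(wb.writes_received, wb.array_ios)  # 6 2
print(array[10])                         # v3
```

Six application writes become two array I/Os: blocks 10 to 12 go out as one extent, block 40 as another, and the two superseded versions of block 10 never leave the server.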
So, to sum up, a mixture of array- and server-side cache can be a good thing, but there’s nothing to say having the array controller manage the server-side cache is beneficial.
Array controller-managed app server cache gets you data at the same speed as app server-managed flash cache but with array supplier lock-in and added array complexity.
Array controller flash cache gets you data a few microseconds slower than a flash cache in the app server, the cost of a network hop and some CPU cycles, and a few microseconds faster than SSDs in the array enclosures.
Having server-side flash is good for write I/O reduction, but you can do that without having the array manage the server-side flash and locking you in to the array supplier.
So, am I missing something? Where is the customer benefit in having array controller-managed flash cache in application servers?
Chris Mellor is storage editor with The Register.
This was first published in June 2011.