How do SANs work

Are the claims that have been made for storage area networks true? We provide some analysis of how SANs work.
I must admit that I am rather confused about the claims being made for storage area networks (SANs). The basic principle is to introduce a very high-speed, limited-distance network that interconnects computer I/O channels with disk and tape controllers, in place of the mess of cables common today. In essence, a network of computers and storage controllers is similar in architecture to a mainframe. The claimed attraction of a SAN is that a variety of storage subsystems and computers can be interconnected.

All this seems attractive, but there are some serious questions yet to be answered. A storage subsystem only stores the actual physical data. In most applications there are layers of software above the physical sectors on the disks, which progressively apply meaning to the raw bit patterns. Above the storage layer is a file system, which manages physical sector allocation to present a logical 'file' to the operating system. This involves a low-level directory, which itself needs storage. Above the file level, software is free to open and close files by logical names, without recourse to knowledge of which part of the disk or tape is referenced. Within the file, however, the 'data' is simply a sequential string of bits/bytes. Thus a record management layer is required on top of the file system, in order to calculate which string of bytes inside the file corresponds to a desired 'record'.
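The record management layer described above can be made concrete with a rough sketch. This is purely illustrative (the names, the fixed record size, and the helper functions are my own assumptions, not any real system's API): it shows how a logical record number is turned into a byte range inside a file, with no knowledge at all of which disk sectors hold those bytes.

```python
# Hypothetical sketch of a record-management layer: fixed-length records
# are located purely by arithmetic on the file's logical byte stream.
# Sector allocation is someone else's problem (the file system's).

RECORD_SIZE = 128  # bytes per record -- an illustrative choice

def record_byte_range(record_index: int, record_size: int = RECORD_SIZE):
    """Return (offset, length) of a record within the file's byte stream."""
    if record_index < 0:
        raise ValueError("record index must be non-negative")
    return record_index * record_size, record_size

def read_record(f, record_index: int, record_size: int = RECORD_SIZE) -> bytes:
    """Seek to the record's computed offset and read exactly one record."""
    offset, length = record_byte_range(record_index, record_size)
    f.seek(offset)
    return f.read(length)
```

Note that nothing here mentions sectors or tracks: that separation of layers is exactly why the software above the file system can stay portable.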

In practice, this mapping of logical references to data gets more complex still: databases need richer, interrelated directories than simpler sequential, record-oriented file systems do. One of the key refinements of modern database software is the extensive use of caching of records in memory to speed up reads; indeed, with big databases the heavily referenced read-only parts of the database are copied into memory, the disk effectively acting as a non-volatile backup. Writing is far more complex than reading, since data written to the cache must also be mapped back to the storage system, and where there are multiple caches they must be kept synchronised.
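The write-through idea can be sketched in a few lines. Again this is a minimal illustration under my own assumptions, not any vendor's design: reads are served from memory where possible, while every write updates both the cache and the backing store, so the disk copy never goes stale. What it deliberately omits is the hard part the article points at: keeping several such caches on different machines synchronised.

```python
# Minimal write-through cache sketch (illustrative only).
# Reads are served from memory on a hit; writes go to the cache AND
# straight through to backing storage, so the stored copy stays current.

class WriteThroughCache:
    def __init__(self, backing_store: dict):
        self.backing = backing_store   # stands in for the disk subsystem
        self.cache = {}                # in-memory copies of hot records

    def read(self, key):
        if key in self.cache:          # cache hit: no disk access needed
            return self.cache[key]
        value = self.backing[key]      # cache miss: fetch and remember
        self.cache[key] = value
        return value

    def write(self, key, value):
        self.cache[key] = value        # update the in-memory copy...
        self.backing[key] = value      # ...and write through to storage
```

With multiple computers each holding a cache like this against shared storage, every write would additionally have to invalidate or update the other machines' copies, which is precisely the synchronisation burden discussed here.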

The point of this discussion is that the situation in practice is very complex, and it has taken years of effort to arrive at practical working solutions. Those solutions live largely inside big computers, or to some extent in clustered computers sharing a common storage controller. All that capability must be re-engineered to fit the SAN.

The question now arises: which supplier is responsible for synchronising this complex software? If the SAN is to be merely a new generation of mainframe storage subsystem, then this is practical. But the suppliers of SANs are allegedly targeting networks involving a variety of computers and, even more importantly, operating systems. This is where my confusion comes in. Surely each operating system will have its own algorithm for allocating physical disk sectors to logical files! The meaning of any individual disk sector is controlled by the file software, so some point of control must be imposed, and that adds a degree of complexity which is avoided in a mainframe complex (admittedly because only one operating system is involved). True sharing of SAN resources therefore seems some way off yet. There is, however, a compromise: the disk subsystem controller can split sector allocation into multiple partitions so that, say, Unix uses one section, NT another, and MVS another. This, however, shares only the storage subsystems, not the data.
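The partitioning compromise amounts to a table on the controller giving each operating system an exclusive range of sectors. The sketch below is illustrative only (the sector counts and names are invented): the box is shared, but no sector, and hence no data, ever belongs to more than one operating system.

```python
# Illustrative controller-level partition table: each operating system
# gets an exclusive, contiguous range of sectors on the shared subsystem.
# Hardware is shared; data is not.

PARTITIONS = {            # hypothetical layout on one shared subsystem
    "Unix": range(0, 1_000_000),
    "NT":   range(1_000_000, 2_000_000),
    "MVS":  range(2_000_000, 3_000_000),
}

def owner_of(sector: int) -> str:
    """Return which operating system's partition a sector falls in."""
    for os_name, sectors in PARTITIONS.items():
        if sector in sectors:
            return os_name
    raise ValueError(f"sector {sector} is not allocated")
```

Because the ranges never overlap, each OS can keep its own sector-allocation algorithm untouched, which is exactly why this compromise works without any cross-OS coordination.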

The key to the progress of SANs lies in the intelligence of the storage controllers. It would be possible, for instance, for multiple file directory systems to run on the controller, synchronised by replication to the individual computers over the SAN. If that is possible, then one of the big advantages of this architecture could be applied to archiving and backup, typically between disk and tape subsystems. At present, backup software works by reading sectors or files into the computer's memory and then writing them back out to the tape systems. If the controller holds all the necessary mapping directory information, then selective copying from disk to tape can be done across the SAN, without intermediate storage in the computer.
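The controller-side backup idea can be sketched as follows. This is a hedged illustration under assumed names, not a real controller's interface: given a directory that maps a file name to its ordered list of sector numbers, the controller streams those sectors from disk to tape directly, and the data never passes through a host computer's memory.

```python
# Hypothetical sketch of controller-resident backup: the controller holds
# the file-to-sector mapping, so it can copy a file's sectors from the
# disk subsystem to the tape subsystem without involving any host.

def backup_file(directory, file_name, disk, tape):
    """Copy one file's sectors straight from disk to tape.

    `directory` maps file names to ordered lists of sector numbers;
    `disk` (a sector-number -> block mapping) and `tape` (an append-only
    list of blocks) are simple stand-ins for the real subsystems.
    """
    for sector_no in directory[file_name]:
        block = disk[sector_no]   # read one sector from the disk subsystem
        tape.append(block)        # append it to the tape image
```

The essential point is in the loop: only the controller touches the blocks, which is what removes the read-into-host-memory step that conventional backup software relies on.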

In summary, as a storage option for multiple similar computers the SAN is very interesting; but as a replacement for huge systems with various operating systems involved, it seems far too complex to even work!