Do you know how to configure a storage-area network (SAN) switch? In this podcast from SearchStorage.co.UK, bureau chief Antony Adshead talks with Steve Pinder, principal consultant at GlassHouse Technologies (UK), about how to decide what fabric topology to implement, the importance of zoning and masking, and how to figure out fan-in and fan-out ratios.
Click on the podcast link or read the transcript that follows for all you need to know about configuring a SAN switch.
What is involved in configuring a SAN switch?
The answer to this question depends on whether you're starting a new SAN fabric or adding a switch to an existing one. If you're starting a new fabric, the configuration of the switch is much easier. All switches ship with a default setup and IP address. You'll need to connect to that default IP address using a browser or command line and make some changes so the switch is configured correctly for your environment. For a new switch, the only changes you must make are to the IP address, subnet mask and default gateway, so that you can connect to it via a browser or whatever transport you choose. All the other default settings will work for the new fabric.
The first switch in the fabric is called the principal switch and this switch holds the master database for fabric configuration. When other switches are added to the fabric, they download that information from the principal switch. All switches also have a domain ID which can be statically configured or allocated from the principal switch.
I tend to configure switches with static domain IDs so I can guarantee a particular domain ID will never be allocated to two different switches. If two switches have been allocated the same ID, this could cause fabric segmentation, outages in the fabric and denial of service to logical unit numbers (LUNs).
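As a rough illustration of guarding against duplicate static domain IDs, the sketch below checks a planned inventory before any switch joins the fabric. The switch names, the ID values and the `find_duplicate_domain_ids` helper are hypothetical and not taken from any vendor tool.

```python
from collections import defaultdict

def find_duplicate_domain_ids(switches):
    """Return {domain_id: [switch names]} for IDs assigned to more than one switch."""
    by_id = defaultdict(list)
    for name, domain_id in switches.items():
        by_id[domain_id].append(name)
    return {d: sorted(names) for d, names in by_id.items() if len(names) > 1}

# Hypothetical plan: switch name -> static domain ID
planned = {"edge-01": 1, "edge-02": 2, "core-01": 10, "edge-03": 2}

duplicates = find_duplicate_domain_ids(planned)
for domain_id, names in duplicates.items():
    print(f"Domain ID {domain_id} is assigned to: {', '.join(names)}")
```

Running a check like this against the planning spreadsheet is cheap compared with recovering from a segmented fabric.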
As a best practice you should also ensure unused switch ports are disabled. This will prevent unauthorised devices logging into the fabric and disrupting traffic. Do this after initial port testing on the new switch but before you add it to the fabric.
How do I decide what topology to implement?
Before starting on SAN topology it's important to say that when redundancy is required the known best practice is to implement two SAN fabrics and have devices connected to both of them. This ensures that a host connected to both fabrics will still be able to operate effectively if a switch, a host bus adapter (HBA) or even an entire fabric fails. In these answers I'm going to assume that if redundancy is required then two identical SAN fabrics will be implemented.
There are a number of topologies that can be used when configuring a fabric, although there are three I'd recommend, depending on the size of the fabric and number of switches.
The single switch fabric has one switch. Director-class switches can be purchased with hundreds of ports, although they're expensive compared to low-capacity switches such as those with 32 ports. The single switch SAN offers the lowest possible latency between the host and its associated storage as all devices are connected to the single switch.
As most SANs grow over time, it's more likely that an organisation with a small SAN -- possibly a single switch SAN -- will add more switches as the number of devices grows. This brings us to the second type of fabric, the mesh, in which every switch is connected to every other switch. The connections are made via inter-switch links (ISLs). In this configuration a host will have to go through a maximum of one ISL to get to the storage it uses. When using a mesh configuration it's preferable to group a host and its storage on the same switch so that the host does not have to traverse an ISL at all. As the mesh grows, the number of ISLs on each switch grows at the rate of one for every additional switch. After a certain point there's little benefit to adding extra switches as many of the additional ports are required for ISLs.
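The diminishing return from growing a mesh can be shown with some simple arithmetic. The sketch below assumes a full mesh of identical switches with exactly one ISL between each pair; the 32-port switch size is only an example, and real fabrics often use more than one ISL per pair, which makes the effect stronger.

```python
def mesh_ports(num_switches, ports_per_switch):
    """Return (ISL ports per switch, total ISLs, ports left for hosts and storage)."""
    isl_ports_per_switch = num_switches - 1              # one ISL to every other switch
    total_isls = num_switches * (num_switches - 1) // 2  # each ISL joins two switches
    device_ports = num_switches * max(ports_per_switch - isl_ports_per_switch, 0)
    return isl_ports_per_switch, total_isls, device_ports

# Assumed example: 32-port switches
for n in (1, 2, 4, 8, 16, 18, 24):
    per_switch, isls, usable = mesh_ports(n, 32)
    print(f"{n:2d} switches: {per_switch:2d} ISL ports each, "
          f"{isls:3d} ISLs, {usable:3d} device ports")
```

Under these assumptions the number of usable device ports peaks at around 16 switches and then falls, because each new switch costs every existing switch another port.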
For fabrics with a large number of ports, the third type of fabric comes in, which is called core-edge. This configuration uses a large switch at the core of the fabric to which you would generally attach storage. Hosts are attached to smaller edge switches, which are in turn attached to the core via ISLs. This topology can grow to hundreds or thousands of ports while ensuring hosts only have to traverse a maximum of two switches to access storage. Hosts that require very low latency or very high throughput can be connected directly to the core.
What are zoning and masking, and why are they important?
Zoning is a procedure that takes place on the SAN fabric and ensures devices can only communicate with those that they need to. Masking takes place on storage arrays and ensures that only particular World Wide Names (WWNs) can communicate with LUNs on that array. If the correct masking is applied to the storage array then there's no absolute necessity to configure zoning on the SAN, although using both zoning and masking is always recommended.
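One way to picture how the two controls layer is as two independent checks that must both pass. The sketch below is a simplified model, not a real switch or array API: the WWNs, the LUN name and the `can_access` helper are all made up for illustration.

```python
def can_access(host_wwn, array_port_wwn, lun, zones, masking):
    """True only if fabric zoning and the array's LUN masking both allow access."""
    # Zoning (fabric): host and array port must share at least one zone.
    zoned = any(host_wwn in zone and array_port_wwn in zone for zone in zones)
    # Masking (array): the host WWN must be on the LUN's allow list.
    masked_in = host_wwn in masking.get(lun, set())
    return zoned and masked_in

# Hypothetical WWNs and LUN name
HOST_A = "10:00:00:00:c9:aa:aa:01"
HOST_B = "10:00:00:00:c9:aa:aa:02"
ARRAY_PORT = "50:06:01:60:bb:bb:00:01"

zones = [{HOST_A, ARRAY_PORT}, {HOST_B, ARRAY_PORT}]
masking = {"LUN_7": {HOST_A}}

print(can_access(HOST_A, ARRAY_PORT, "LUN_7", zones, masking))  # True
print(can_access(HOST_B, ARRAY_PORT, "LUN_7", zones, masking))  # False: zoned but not masked in
```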
There are two distinct methods of zoning that can be applied to a SAN: World Wide Name zoning and port zoning.
WWN zoning groups a number of WWNs in a zone and allows them to communicate with each other. The switch port that each device is connected to is irrelevant when WWN zoning is configured. One advantage of this type of zoning is that when a port is suspected to be faulty a device can be connected to another port without the need for fabric reconfiguration. A disadvantage is that if an HBA fails in a server, the replacement HBA will have a different WWN, so the fabric will need to be reconfigured for the host to reattach to its storage. WWN zoning is also sometimes called 'soft zoning.'
Port zoning groups particular ports on one or more switches together, allowing any devices connected to those ports to communicate with each other. An advantage of port zoning is that you don't need to reconfigure a zone when an HBA is changed. A disadvantage is that any device can be attached to a port in the zone and communicate with every other device in the zone.
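The trade-off between the two zoning methods can be made concrete with a toy model. In the sketch below a zone is just a set of WWNs or a set of (domain ID, port number) pairs; the WWNs, port numbers and function names are hypothetical and do not correspond to any switch CLI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attachment:
    wwn: str     # World Wide Name of the HBA or array port
    port: tuple  # (switch domain ID, port number) it is plugged into

def wwn_zone_allows(zone_wwns, a, b):
    """WWN zoning: membership follows the device, wherever it is plugged in."""
    return a.wwn in zone_wwns and b.wwn in zone_wwns

def port_zone_allows(zone_ports, a, b):
    """Port zoning: membership follows the switch port, whatever is plugged in."""
    return a.port in zone_ports and b.port in zone_ports

host = Attachment("10:00:00:00:c9:aa:aa:01", (1, 4))
array = Attachment("50:06:01:60:bb:bb:00:01", (1, 12))
wwn_zone = {host.wwn, array.wwn}
port_zone = {host.port, array.port}

# Scenario 1: the HBA is replaced (new WWN, same switch port).
replaced_hba = Attachment("10:00:00:00:c9:aa:aa:99", host.port)
# Scenario 2: the host is moved off a suspect port (same WWN, new port).
moved_host = Attachment(host.wwn, (1, 5))

results = {
    "replaced HBA, WWN zoning": wwn_zone_allows(wwn_zone, replaced_hba, array),
    "replaced HBA, port zoning": port_zone_allows(port_zone, replaced_hba, array),
    "moved port, WWN zoning": wwn_zone_allows(wwn_zone, moved_host, array),
    "moved port, port zoning": port_zone_allows(port_zone, moved_host, array),
}
for scenario, allowed in results.items():
    print(f"{scenario}: {'still works' if allowed else 'needs a zone change'}")
```

Each method survives one of the two scenarios without reconfiguration and fails the other, which fits the view that neither is clearly superior.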
My opinion is that neither is particularly superior to the other, and what I find is that the type of zoning used is generally determined by what a particular consultant or organisation has done in the past.
What do I need to know about fan-in and fan-out?
The fan-in ratio denotes the number of hosts connected to a port on a storage array. There are many methods that have been used to determine the optimum number of hosts connected to a storage port, but in my experience there are no hard and fast rules to determine an absolute number.
My recommendation would always be to assess the throughput of each host you want to connect to a port, determine the maximum throughput of that port, and add hosts such that the total throughput is slightly higher than the throughput of that port. It's very important, however, to ensure you have good utilisation statistics available to detect any time period where the port is heavily utilised and could be causing a bottleneck in your SAN fabric.
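A minimal sketch of that sizing approach follows. The host names and throughput figures are invented, the 400 MB/s figure is the nominal rate of a 4 Gbps Fibre Channel port, and the 10% oversubscription allowance is only an assumed reading of "slightly higher"; in practice the numbers should come from measured utilisation statistics.

```python
def assign_hosts(hosts, port_mbps, oversubscription=1.1):
    """Add hosts in order while their combined throughput stays within
    port_mbps * oversubscription. Returns (assigned host names, total MB/s)."""
    limit = port_mbps * oversubscription
    assigned, total = [], 0.0
    for name, mbps in hosts:
        if total + mbps <= limit:
            assigned.append(name)
            total += mbps
    return assigned, total

# Hypothetical measured peak throughput per host, in MB/s
hosts = [("db01", 150), ("app01", 120), ("app02", 90), ("web01", 60), ("web02", 40)]

assigned, total = assign_hosts(hosts, port_mbps=400)
print(f"Assigned {assigned} for {total:.0f} MB/s on a 400 MB/s port")
```

With these figures the first four hosts total 420 MB/s, slightly above the port's nominal rate, and the fifth host is left for another port.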
There are a number of reasons why it's difficult to give a host count as an optimum fan-out ratio. These include: differing port speeds -- a 4 Gbps port can obviously handle twice the throughput of a 2 Gbps port and will allow you to add roughly double the number of hosts; and multipathing -- if a host has two HBAs, traffic will either be aggregated down those two HBAs in an active-active mode or all the traffic will go down one HBA and nothing down the other if the connection is active-passive.
These scenarios will have a big impact on how many hosts you can add to a particular port. In normal operating circumstances in a multipathing environment, you can connect double the number of HBAs to a particular port, as each HBA will be doing half the work of its host. If, however, there's an issue with the SAN and a device has failed over from its active port to its passive port, the remaining ports may be required to carry out twice the standard workload. This can cause poor performance if you oversubscribe hosts to storage ports.
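The failover effect can be estimated with a short calculation. The sketch below assumes every host splits its traffic evenly across two paths to two storage ports, and that a failure moves all traffic onto the surviving path; the host throughput figures and the 400 MB/s port capacity are illustrative assumptions.

```python
def load_per_surviving_port(host_mbps, paths_per_host=2, failed_paths=0):
    """Load in MB/s on each surviving storage port, assuming every host
    spreads its traffic evenly over its surviving paths."""
    surviving = paths_per_host - failed_paths
    if surviving < 1:
        raise ValueError("no surviving paths")
    return sum(host_mbps) / surviving

# Hypothetical hosts sharing the same pair of storage ports (MB/s each)
host_mbps = [150, 120, 90, 60, 150, 120, 90, 60]
PORT_MBPS = 400  # assumed nominal capacity of one storage port

normal = load_per_surviving_port(host_mbps)
failover = load_per_surviving_port(host_mbps, failed_paths=1)
print(f"Normal:   {normal:.0f} MB/s per port ({normal / PORT_MBPS:.0%} of capacity)")
print(f"Failover: {failover:.0f} MB/s per port ({failover / PORT_MBPS:.0%} of capacity)")
```

A port that looks only slightly oversubscribed in normal operation ends up carrying about double its capacity after a path failure, which is the risk described above.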