NetApp ALUA Configuration
Contents
Introduction
Prerequisites
Editing the Multipath.conf File to Use the NetApp ALUA Settings
Troubleshooting
Increasing the Number of Errors Captured in the System Log for Multipathing
Additional Information
Introduction
This document explains how to configure a high availability pair of NetApp controllers using
Asymmetric Logical Unit Access (ALUA) for the Fibre Channel protocol.
The term high availability pair refers to a NetApp array that contains two controllers configured to
fail over traffic if one of them fails.
Although many NetApp arrays now come with two controllers, to get them to work efficiently with
virtualized servers, you must enable ALUA by performing specific steps in XenServer and on the
NetApp.
Enabling ALUA in your XenServer environment is a multi-stage process with several different
places where errors can occur. This article explains some basic ALUA concepts and provides a
process for enabling ALUA, based on XenServer engineering's internal testing scenarios. The
process includes verification at each stage, so if a setting is misconfigured the issue is easier to
isolate.
Prerequisites
This article assumes that you can perform basic configuration on a NetApp storage array and create
a XenServer Storage Repository (SR). It also assumes that you have some familiarity with Linux
multipathing configuration, know about the multipath.conf file, and perhaps have configured
multipathing before.
This article assumes that you may not be familiar with ALUA. If you are already familiar with ALUA,
you may want to skip ahead to Configuring ALUA.
If you are not familiar with the prerequisite concepts, you may find you need background
information while you read this article. Some references are provided at the end of this article.
What is ALUA?
Having multiple controllers on NetApp storage provides both throughput and redundancy.
However, unless you enable ALUA, storage traffic may take inefficient paths between the XenServer
host and the storage.
ALUA is a SCSI standard some arrays use to ensure that storage traffic takes the most efficient path
from the host to the storage. If ALUA is not enabled correctly on both the NetApp and XenServer
host, storage traffic may experience latency and the NetApp may automatically contact NetApp
support.
Why is ALUA necessary?
ALUA is not used on all brands of storage. However, some vendors support ALUA to ensure
storage traffic takes the most efficient path when users configure a second storage controller for
failover. In a failover configuration, the second controller, known as the partner controller, can access
the LUN if the primary controller (owner) associated with the LUN fails.
The following illustration shows how, when ALUA is enabled, traffic to the LUN takes the path
associated with the controller that owns (or is associated with) the LUN.
This illustration shows two images. In the top image, when ALUA is enabled the traffic takes the most efficient path
through the owner controller. The second image shows an example of how storage traffic may take any path when
ALUA is not enabled. In this image, the storage traffic goes through the less efficient path by traveling through the
partner controller, the interconnect cable, and then the owner controller before reaching the LUN.
How does ALUA work?
Virtualized hosts are not aware of what path is optimal. To prevent storage traffic from taking the
less optimal path, paths to the LUN are given a weight or priority.
The most efficient paths to the LUN (through the owner controller) are given the highest weight.
For example:
1. In a two-controller configuration, enabling ALUA ensures the owner LUN's storage traffic
takes the path through the owner controller.
2. Traffic destined for the owner LUN only takes the partner-controller path if the owner path
is unavailable (for example, during failover).
When the owner controller for a LUN fails, the partner controller assumes control of the LUN.
XenServer sends traffic to the remaining links (associated with the partner). In effect, all of the
LUN's associations move to the partner controller so the traffic does not go over the cluster
interconnect. Instead, traffic uses the links associated with the partner controller so it is local to the
remaining physical controller.
In general, if ALUA and its dependent configurations are set up correctly, almost no traffic should
come across the cluster interconnect.
Enabling ALUA does not mean that the partner storage controller is sitting idle. Another LUN can
use it as its primary controller. The illustration that follows shows a typical configuration when
multiple controllers are installed.
These two illustrations show how Controller 2 processes the traffic for its LUN (LUN B) but can be used to process
the traffic for LUN A if Controller 1 is unavailable. In the first illustration, ALUA is enabled so the host can send
its traffic down the owner paths to the owner controllers for the LUN. However, as shown in the second illustration,
the storage controllers function as a High Availability pair and, when Controller 1 fails, Controller 2 takes over
LUN A and sends traffic to the LUN using the interconnect cables that are internal to the disk shelf.
When the owner controller (Controller 1) for LUN A fails, the partner controller (Controller 2)
assumes control of the LUN. As a result, until Controller 1 comes back online, Controller 2
processes storage traffic for both LUN A and LUN B.
Enabling ALUA requires configuring settings on both XenServer and the NetApp:
On each XenServer host in the pool, you must modify the multipath.conf file to use the
ALUA settings, as described in Editing the Multipath.conf File to Use the NetApp ALUA Settings.
On the owner NetApp controller, you must enable ALUA when you create the initiator
group, as described in Creating the Initiator Group.
Note: Although traditionally ALUA has been a Fibre Channel standard, NetApp supports it for
iSCSI as of Data ONTAP 8.1. However, iSCSI is beyond the scope of this document.
Configuring ALUA
This section provides information about how to configure ALUA.
Note: In our test configuration referenced throughout this document, we had two HBA ports and
two controllers, which resulted in eight active paths. All of the examples provided assume you are
performing a similar configuration. If your configuration is more complex and you have more active
paths, the number of paths active at different points in this process will be different from those
listed in this document.
Minimum Requirements
A Fibre Channel switch (two switches is the NetApp best practice)
Each host in the pool must have at least two Fibre Channel HBA ports (either (a) one HBA
with two ports or (b) two HBAs with one port each)
A version of NetApp Data ONTAP that is compatible with ALUA (see your ONTAP
documentation)
Our Test Configuration
A Brocade switch.
An Emulex Zypher-X LightPulse Fibre Channel dual-port HBA was installed in the host
Two NetApp FAS2040 controllers. As a result, the interconnect cable was internal. If you
have two separate chassis, you must plug the chassis into each other using an external
connection cable.
XenServer 6.0.2.
a. Connecting or inserting storage controllers (if two are not already installed in the
storage)
2. Configuring a pair of High Availability controllers, including ensuring the correct licenses are
present, the operating systems match, and the options match.
a. Creating the initiator group on the owner controller (and enabling ALUA in it).
b. Editing the multipath.conf file.
d. Creating the SR. (If you select the pool node in XenCenter when you create the SR,
you only need to create the SR once.)
Instead of saving verification until the end, the configuration also includes verification stages after
each major task to make it easier to isolate issues.
Configuring ALUA requires two storage controllers, which can either be in the same chassis or in
different chassis. Before configuring ALUA, if the two controllers are not already connected, you
must connect them.
If the controllers are in different chassis, you must make an interconnect link between the two
controllers using a physical interconnect cable. If the controllers are in the same chassis, they can use
the internal interconnect links in the chassis.
In some cases, if you want an internal interconnection and the second controller is not currently
installed in the chassis, you may need to add it. Inserting the second controller requires redoing your
RAID configuration and reinstalling ONTAP, which results in erasing your data from the disks.
Likewise, you will lose disk space (capacity) because adding a second controller means installing
ONTAP on some disks for that second controller to use.
For more information about installing controllers or connecting controllers in different chassis, see
your NetApp documentation.
Networking
When enabling ALUA with Fibre Channel, subnet configuration is not relevant, so the controllers
can be on the same subnet or on different subnets. In our test environment, the NetApp controller
was on a different subnet than the switch.
Note: Subnet configuration only matters for ALUA with iSCSI, which NetApp supports in ONTAP
8.1. In an iSCSI configuration, it is recommended to have both of the storage (data) networks on
different subnets (on NetApp Storage as well as XenServer). However, iSCSI ALUA configuration is
beyond the scope of this document.
Configuring a High Availability Pair of Controllers
NetApp uses the term High Availability to refer to two storage controllers configured for
redundancy so they can provide a failover path to the same LUN.
When you plug the second controller into the NetApp, NetApp automatically configures the two
controllers into a High Availability pair. However, there are specific parameters and options that
must be the same on both controllers or else the pairing may not work or may not be able to
failover smoothly.
Matching Options
Make sure the following options are the same on both controllers. To access these options, connect
to the owner and then the partner controller using a tool such as PuTTY, and run the options timed
command. There is additional information about these options in their man pages.
Time zone (timezone). The timezone option displays the current time zone. Both controllers must
use the same time zone settings. Set the time zone on the owner controller in the options.
Options timed (options timed). Running this command lists the time-of-day options. You can use
this command to specify an NTP server.
For more information, see the Data ONTAP 8.0 7-Mode High-Availability Configuration Guide.
Note: If the Time Zone settings do not match on both controllers, you may receive a Timed
Daemon warning on one of the controllers.
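For example, the following 7-Mode console commands could be used to align the time settings on
both controllers. This is a sketch only; the time zone and the NTP server address are placeholders,
and the same commands must be run on both the owner and the partner controller:
timezone US/Pacific
options timed.enable on
options timed.servers 192.168.1.10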
Finding the WWPNs of the Host HBAs
This procedure uses the generic, vendor-agnostic systool command. However, if desired, you may
be able to obtain WWPNs by running vendor-specific HBA utilities or utilities like HBAnyware.
For additional tips on finding WWPNs, see CTX118791 - Multipathing Overview for XenServer 5.0.
1. On the XenServer host (for example, by using the Console tab in XenCenter), enter the following
command at the command prompt:
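The exact command capture from the original article is not reproduced here. A commonly used,
vendor-agnostic invocation, assuming the sysfsutils package that provides systool is installed on the
host, is:
systool -c fc_host -v | grep port_name
This lists the port_name attribute (the WWPN) of each Fibre Channel HBA port on the host.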
When specifying the WWPN, omit the 0x from the port_name value. For example, for
0x10000000c9adbf06, enter 10000000c9adbf06.
Tip: It may be possible to use this procedure for configuring iSCSI HBAs as well. However,
that is beyond the scope of our testing.
Zoning the Fibre Channel Switches for ALUA
To configure ALUA, you must use a Fibre Channel switch between the links and configure zones
on those links. The specific configuration varies by switch model and manufacturer.
The NetApp best-practice configuration uses two Fibre Channel switches and zones the switches
accordingly. In the NetApp documentation, this is referred to as dual-fabric zoning. At the time this
article was created, this method was outlined in the Dual Fabric HA Pair Zoning Data section of
the ONTAP 8.1 SAN Configuration Guide for 7-Mode (Part No. 210-05673_A0).
In our testing, we only used one Fibre Channel switch and did not do dual-fabric zoning. This is a
high-level summary of the method we used during our testing:
1. Create two zone-sets in the switch. For example, zone-set1 and zone-set2.
In this illustration, the WWPN from FC-HBA1 and the WWPNs from both HBAs on controller1 and controller2 are
added to zone-set1 in the switch. The WWPN from FC-HBA2 and the WWPNs from both HBAs on controller1 and
controller2 are added to zone-set2 in the switch.
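As an illustration only, this zoning could be expressed on a Brocade switch with Fabric OS
commands along the following lines. The zone and configuration names and the WWPN
placeholders in angle brackets are hypothetical and must be replaced with your own values:
zonecreate "zone-set1", "<FC-HBA1 WWPN>; <controller1 port WWPN>; <controller2 port WWPN>"
zonecreate "zone-set2", "<FC-HBA2 WWPN>; <controller1 port WWPN>; <controller2 port WWPN>"
cfgcreate "alua_cfg", "zone-set1; zone-set2"
cfgsave
cfgenable "alua_cfg"
Consult your switch documentation for the equivalent steps on other switch models.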
Creating the Initiator Group
Best Practice
It is helpful to create a separate initiator group for each host in the XenServer pool and name those
groups based on the hostname/IP address of the associated host. Once you create the initiator
group for a host, put only the WWPNs for that host in that group and name the group after the
host. This makes it easier to distinguish which LUN is exposed to which host.
You only need to create initiator groups on the owner controller. Likewise, when you create the
LUN, you specify only the initiator group on the owner controller. The following illustration shows
how, on the owner controller, there is an initiator group for each host.
This illustration shows how you create the initiator group on the owner controller. The initiator group contains the
WWPNs for all the connected HBAs in the pool and the ALUA checkbox is enabled in the initiator groups.
However, each initiator group only contains the WWPNs from the host the group was named after.
Note: Information about creating initiator groups and retrieving WWPNs is available in the
XenServer and NetApp Storage Best Practices.
1. In NetApp System Manager, in the left pane, select LUNs, click the Initiator Groups tab,
and then click Create.
2. In the Create Initiator Group dialog, click the General tab, and do the following:
a. In the Name box, enter a name that uniquely identifies the Initiator Group XenServer hosts
will use. For example, name it after the host and/or IP address. In environments that use
multiple storage protocols, it is helpful to enter the storage protocol you will be using with
this Initiator Group (for example, iSCSI or Fibre Channel).
Note: Configuring initiator groups with any operating system besides Linux is not
recommended.
c. Select FC/FCoE as the protocol.
3. In the Create Initiator Group dialog box, click the Initiators tab and click Add.
4. Enter the WWPNs from both HBA ports from all the XenServer hosts in the pool in the Name
box, and click OK:
5. Repeat this process for each host in the pool. As previously described, the best-practice is to
create one initiator group per host on the owner controller.
Creating the LUN
For example, if you decide that Controller 1 is where you want the LUN to run normally (when there
is not a failover), create the LUN by selecting that controller in NetApp System Manager.
For more information about creating the LUN, see the Citrix XenServer and NetApp Storage Best
Practices guide.
1. In NetApp System Manager, in the tree pane, select the HA Configuration node.
Tip: You can also verify HA is enabled by running the cf status command.
cf takeover
A message appears stating that the takeover has started (or failed).
2. Run the command:
cf status
If the command returns the message, Filer X has taken over Filer Y, then the takeover
completed successfully.
In addition, after running the cf takeover command or using the System Manager Takeover
feature, the following appears in System Manager.
1. Check the High Availability status to determine whether the NetApp is ready to be restored to the
owner controller. To do so, run the cf status command on the partner controller (that is, the
controller you just ran the takeover command on):
cf status
1. If the NetApp returns Filer X is ready for giveback, run the following command on the
partner controller:
cf giveback
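The following console transcript is an illustrative sketch of the takeover and giveback sequence; the
filer names FilerA (owner) and FilerB (partner) are placeholders:
FilerB> cf takeover
FilerB> cf status
FilerB has taken over FilerA.
(later, once the owner is ready)
FilerB> cf status
FilerA is ready for giveback.
FilerB> cf giveback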
2. Editing the multipath.conf file so it uses the ALUA settings for NetApp.
4. Creating the SR.
Important: When you create the SR, do not select the (*) wildcard option, which appears in
the XenCenter SR wizard > Location Page > Target IQN box. Selecting the (*) wildcard
option is not necessary for most, if not all, NetApps. Using the wildcard option when it is
not appropriate can actually slow down storage performance.
For information about when to select the (*) wildcard option and the pattern of IQN/IP
addresses returned that requires it, see CTX136354 - Configuring Multipathing for XenServer.
4. After you determine which port you want to configure as the target, run the
fcadmin config -t <type> <adapter_name> command. For example, to configure port 0c as the
target, run:
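The original screen capture is not reproduced here. Based on the syntax above, the command is
expected to look like the following (this assumes the adapter was already taken offline in an earlier
step, as 7-Mode requires before changing an adapter type):
fcadmin config -t target 0c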
5. To put the ports back online, run the following command:
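The capture is likewise not preserved; the corresponding 7-Mode command is expected to take this
form, where 0c is the adapter used in this example:
fcadmin config -e 0c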
6. Repeat this process for the other Fibre Channel HBAs on the controllers.
Note: It may be necessary to restart the NetApp for the changes to take effect.
Editing the Multipath.conf File to Use the NetApp ALUA Settings
Although the ALUA settings appear in the multipath.conf file, the storage vendor defines these
ALUA settings, not Citrix.
Note: Be sure to detach any NetApp SRs that may be using the settings in multipath.conf before
performing this procedure, even if those SRs connect to other NetApp storage.
1. Using a program like WinSCP or the CLI method of your choice, on each host in the pool, open
the /etc/multipath.conf file.
2. In the defaults section, include the following settings:
defaults {
    user_friendly_names no
    queue_without_daemon no
    flush_on_last_del yes
}
4. Replace the text in the devices section with the text for ALUA, as shown below:
devices {
    device {
        vendor "NETAPP"
        product "LUN"
        path_grouping_policy group_by_prio
        path_checker directio
        failback immediate
        rr_weight uniform
        rr_min_io 128
    }
}
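The device stanza above appears to have lost some lines in extraction. On the multipath version
shipped with XenServer 6.0.2, a NetApp ALUA stanza of this kind typically also carries the ALUA
hardware handler and an ALUA-based priority callout. The exact lines below are an assumption and
should be checked against the settings supplied by the storage vendor:
# Assumed ALUA-specific settings; verify against your multipath version
hardware_handler "1 alua"
prio_callout "/sbin/mpath_prio_alua /dev/%n"
# On newer device-mapper-multipath releases the equivalent keyword is: prio "alua"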
5. Save and close the file.
Enabling Multipathing
When you have two storage controllers, each configured with (for example) two target ports, and
two HBA ports on the host, you will have eight active paths once you enable multipathing. All eight
of these paths must be active for ALUA to function correctly and for the owner controller to fail
over successfully.
Ideally, multipathing should be enabled before creating the SR but it is still possible to enable it after
SR creation.
1. After editing the multipath.conf file on all hosts in the pool, open XenCenter, select the host and
then the General tab.
2. Click the Properties button and then click on the Multipathing tab.
3. Select the Enable multipathing on this server check box, and click OK.
If multipathing is enabled correctly, you should see all of the paths marked as active in the
Multipathing section on the General tab for the SR. The screen capture that follows shows eight
of eight paths marked as active, but the number of paths could vary according to the total
number of paths in your configuration:
Note: To enable multipathing using the CLI, see the XenServer Administrator's Guide.
Common signs that multipathing is not working include alerts in XenCenter, the inability of traffic to
fail over, or only 4 of 8 paths being active (when both HBA cables are plugged in to the host).
To verify multipathing is working you can either do a cable push/pull test or block and unblock
switch ports (from the host side as well as the target side).
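For the switch-port method, on a Brocade switch a port can be blocked and unblocked with
commands along these lines (the port number 4 is a placeholder):
portdisable 4
portenable 4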
To verify multipathing is working (cable push/pull test)
Assuming your environment originally had eight paths to the storage, after you pull one of the
cables, multipathing is working correctly if XenCenter says 4 of 8 paths are active.
2. Check that the storage traffic failed over by running the following command:
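The command from the original capture is not preserved. One way to inspect which paths remain
active, consistent with the multipathd syntax used later in this article, is:
echo 'show topology' | multipathd -k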
Check to make sure the traffic has resumed on the HBA that is still active.
Creating the SR
After enabling multipathing, create the XenServer SR for the LUN associated with the owner
controller. Create the SR by selecting the pool node in the Resources pane of XenCenter so the SR
is available to the entire pool. (In XenCenter, right-click the pool node and click New SR.)
If you do not enable multipathing before creating the SR, you can still enable multipathing.
However, you must put the pool into maintenance mode first.
For more information about creating SRs, see the XenCenter Help.
Verifying the ALUA Configuration
Before you begin these procedures, you will need a way to generate storage traffic on the LUN. To
create data traffic, you can copy a file from a share or other location to a virtual machine whose
disks are on the SR.
To determine which devices represent the owner and partner paths
1. On the XenServer host, run the multipath -ll command.
2. In the output, look for the devices listed in the section with the highest priority. These devices
represent virtual disks and partitions of virtual disks on the LUN.
For example, in the output in the screen capture below, in the first group (the prio=200 section),
devices sdd, sdh, sdl, and sdp represent the devices on the owner path. In the prio=40 group,
devices sdb, sdf, sdn, and sdj represent devices on the partner path.
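The original screen capture is not reproduced here. As a rough sketch, multipath -ll output for such a
configuration looks something like the following; the WWID, sizes, and host:channel:target:lun
numbers are placeholders, and only the device names match the example above:
360a98000572d4461524a64703078466f dm-0 NETAPP,LUN
[size=50G][features=1 queue_if_no_path][hwhandler=1 alua]
\_ round-robin 0 [prio=200][active]
 \_ 0:0:1:0 sdd 8:48  [active][ready]
 \_ 0:0:3:0 sdh 8:112 [active][ready]
 \_ 1:0:1:0 sdl 8:176 [active][ready]
 \_ 1:0:3:0 sdp 8:240 [active][ready]
\_ round-robin 0 [prio=40][enabled]
 \_ 0:0:0:0 sdb 8:16  [active][ready]
 \_ 0:0:2:0 sdf 8:80  [active][ready]
 \_ 1:0:0:0 sdj 8:144 [active][ready]
 \_ 1:0:2:0 sdn 8:208 [active][ready]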
Notes:
2. In the previous procedure, we used the multipath -ll command simply because we were not
aware of the multipathd syntax for determining which devices represent the owner path. While
the multipath -ll and the multipathd commands are standard Linux commands and
interchangeable for this purpose, Citrix generally recommends using the multipathd command
(echo 'show topology' | multipathd -k) simply because it is consistent with our other
multipathing commands.
3. You can also run dmsetup table on the XenServer host to see the path groupings.
2. Verify that no or very few Ops run through the low-priority group.
Note:
When you start the VM, you may see some spikes on the partner devices. Occasionally, the
partner will have 2 Ops. The multipathd daemon initiates these Ops, which are essentially
path-checker Ops. The Ops count increases depending on the number of paths. Every 20
seconds, the partner controller sees 2 Ops per path for every path-checker issued.
For example, if your zoning is configured like the zoning example earlier in this article, with eight
paths on the XenServer host (four optimal, four non-optimal), and you run the iostat command on
the controller that owns the LUN, you would see 8 Ops on the partner controller.
To verify the ALUA setting (is there traffic on the partner path)
1. Connect to the owner NetApp controller (by specifying its IP address in PuTTY) and run the lun
stats command as follows:
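The command itself is not preserved in this copy of the article. Based on the placeholders described
below, it is expected to take a form similar to the following; the -o flag, which includes the partner
statistics columns, is an assumption:
lun stats -o -i <checking interval> <path to LUN>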
where <checking interval> is how often, in seconds, the lun stats command is run, and <path to
LUN> is the path to your LUN.
3. Look in the Read Ops and Write Ops columns and the Partner columns. The majority of data
should appear in the Read or Write Ops columns and not in the Partner columns. As previously
discussed, for the partner controller, 2 Ops will appear per path.
For example, if ALUA is not enabled correctly, you will see more data in the Partner KB
column, as shown in the screen capture that follows.
Note: You can try to copy a file to see a spike in the Read and Write columns, but the partner
should remain at or very close to 0.
1. Using PuTTY or a similar utility, connect to the partner controller and run the cf takeover
command to simulate the owner controller failing.
2. Check the HA Configuration tab in the System Manager. You should see an HA error for the
owner controller and the active/active state should have a failover status, as follows:
If the test was successful, an error will appear beside the HA row, and the Active/active state will
show as being in Failover.
Also, if you open XenCenter, assuming you had eight paths active previously, on the General tab of
the SR, in the multipathing status section, it will state 4 of 4 paths active.
4. After performing takeover, give the LUN back to the controller by either clicking the Giveback
button in the NetApp System Manager or running the cf giveback command.
Troubleshooting
If the ALUA configuration is incorrect, you will see the following symptoms:
The array contacts NetApp support (if this feature is enabled), which is sometimes colloquially
known as phoning home to NetApp.
Increasing the Number of Errors Captured in the System Log for Multipathing
To capture more multipathing errors in the system log, add the verbosity keyword to the defaults
section of the multipath.conf file, for example:
defaults {
    user_friendly_names no
    pg_prio_calc avg
    verbosity 6
}
Note: Valid values for the verbosity keyword are 0 to 6, where 6 is the maximum verbosity.
By default, without specifying a keyword, verbosity is set to 2.
Save the multipath.conf file and close it. Prio errors will begin to appear in the syslog.
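To watch for these messages as they arrive, a command along these lines can be used on the host;
the log location assumes the default syslog configuration on a XenServer host:
tail -f /var/log/messages | grep -i prio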
Additional Information
Data ONTAP 8.0 7-Mode High-Availability Configuration Guide
CTX118791 - Multipathing Overview for XenServer 5.0. This article includes many tips and commands
for configuring and troubleshooting multipathing that are still valid in XenServer 6.1.
XenServer 6.1 Administrator's Guide. This guide includes a small section about multipathing, some
information about the different handlers, and instructions for configuring multipathing using the
CLI.
XenServer 6.1 Quick Start Guide. This guide and the XenCenter Help provide basic information about
basic tasks, such as creating an SR.
About Citrix
Citrix Systems, Inc. (NASDAQ:CTXS) is the leading provider of virtualization, networking and
software as a service technologies for more than 230,000 organizations worldwide. Its Citrix
Delivery Center, Citrix Cloud Center (C3) and Citrix Online Services product families radically
simplify computing for millions of users, delivering applications as an on-demand service to any
user, in any location on any device. Citrix customers include the world's largest Internet companies,
99 percent of Fortune Global 500 enterprises, and hundreds of thousands of small businesses and
prosumers worldwide. Citrix partners with over 10,000 companies worldwide in more than 100
countries. Founded in 1989, annual revenue in 2010 was $1.87 billion.
© 2013 Citrix Systems, Inc. All rights reserved. Citrix, Access Gateway, Branch Repeater,
Citrix Repeater, HDX, XenServer, XenCenter, XenApp, XenDesktop and Citrix
Delivery Center are trademarks of Citrix Systems, Inc. and/or one or more of its subsidiaries, and
may be registered in the United States Patent and Trademark Office and in other countries. All other
trademarks and registered trademarks are property of their respective owners.