
ALUA


In this sample chapter from Storage Design and Implementation in vSphere 6: A Technology Deep Dive, 2nd Edition, learn how to identify various ALUA (Asymmetric Logical Unit Access) configurations and how different configurations affect the hosts.


Storage arrays provide various configurations and features, depending on their class and design. Depending on how the arrays handle I/O to devices presented to hosts, they can be classified as follows:

  • Active/Active—I/O (input/output) can be sent to logical unit numbers (LUNs) via any storage processor (SP) and port. Most of these arrays have large caches in place, and the I/O is done on the LUN representation in cache, and then the writes are flushed to the physical disks asynchronously from the I/O.

  • Active/Passive—I/O can be sent only to ports on the storage processor that “owns” the LUN (also known as the active SP). If the I/O is attempted on the LUN via ports on the “non-owner” processor (also known as a passive SP), an error is returned to the initiator that means, simply, “No entry,” or “No, you can’t do that.” (I provide the actual sense codes in Chapter 7, “Multipathing and Failover.”)

  • Pseudo-active/Active (also known as asymmetric active-active)—I/O can be sent to ports on either storage processor. However, I/O sent to the owner processor is faster than that sent to the non-owner processor because of the path the I/O takes to get to the devices from each SP. Going through the non-owner SP involves sending the I/O via some back-end channels, whereas there is a direct path via the owner SP.

The latter two types of arrays have recently started implementing a SCSI-3 specification referred to as Asymmetric Logical Unit Access (ALUA). It allows access to the array devices via both SPs but clearly identifies to the initiators which targets are on the owner SP and which are on the non-owner SP. ALUA support was first introduced in vSphere 4.0.

ALUA Definition

ALUA is described in section 5.8 of the T10 SCSI-3 specification SPC-3 (see www.t10.org/cgi-bin/ac.pl?t=f&f=spc3r23.pdf; access to this URL requires T10 membership or other organizational access).

In simpler terms, ALUA specifies a type of storage device that is capable of servicing I/O to a given LUN on two different storage processors but in an uneven manner.

As I mentioned briefly earlier, using ALUA, I/O to a given LUN can be sent to available ports on any of the SPs in the storage array. This is closer to the behavior of asymmetric active/active arrays than to that of active/passive arrays. The I/O is allowed to the LUN, but the performance of the owner SP is better than that of the non-owner SP. To allow the initiators to identify which targets would provide the best I/O, the ports on each SP are grouped together into target port groups. Each target port group is given a distinctive “state” (asymmetric access state [AAS]) that denotes the optimization of ports on one SP compared to ports on the other SP (for example, active-optimized versus active-non-optimized).

ALUA Target Port Groups

According to SPC-3, a target port group (TPG) is a set of target ports that are in the same asymmetric access state at all times with respect to a given logical unit.

This simply means that in a given storage array that has, say, two SPs—SPA and SPB—ports on SPA are grouped together, and ports on SPB are grouped in a separate group. Assume that this storage array presents two LUNs—LUN 1 and LUN 2—to initiators in ESXi hosts and that LUN 1 is owned by SPA, whereas LUN 2 is owned by SPB. For the hosts, it is better to access LUN 1 via SPA and to access LUN 2 via SPB. Relative to LUN 1, ports on SPA are in the active-optimized (AO) TPG, and ports on SPB are in the active-non-optimized (ANO) TPG. The reverse is true for LUN 2 in this example, where the TPG on SPA is ANO and the TPG on SPB is AO.

Figure 6.1 shows this example on an asymmetric active/active array. The TPG with ID=1 (the left-hand rectangle on SPA) is AO for LUN 1 (represented by the solid line connecting it to LUN 1). This same TPG is ANO for LUN 2 (represented by the interrupted line connecting TPG 1 to LUN 2).

06fig01.jpg

Figure 6.1 Illustration of TPGs

The reverse is true for TPG with ID=2. That is, it is AO for LUN 2 and ANO for LUN 1.

On some active/passive ALUA-capable arrays, you may see port groups with “Standby” AAS instead of “ANO” on the non-owner SP.
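The ownership-to-AAS mapping in this example can be sketched in a few lines of Python. This is a toy model of my own (not a VMware API): each SP's ports form one TPG, and a TPG's state for a LUN follows from which SP owns that LUN.

```python
# Toy model (not a VMware API) of the Figure 6.1 example: ports on each SP
# form one TPG, and a TPG is AO for a LUN only if its SP owns that LUN.

TPG_BY_SP = {"SPA": 1, "SPB": 2}       # one target port group per SP
LUN_OWNER = {1: "SPA", 2: "SPB"}       # LUN 1 owned by SPA, LUN 2 by SPB

def tpg_aas(lun, sp):
    """Return the AAS of the given SP's TPG with respect to a LUN."""
    return "AO" if LUN_OWNER[lun] == sp else "ANO"

for lun in sorted(LUN_OWNER):
    print(f"LUN {lun}:",
          {f"TPG {tpg}": tpg_aas(lun, sp) for sp, tpg in TPG_BY_SP.items()})
```

Running this prints the AO TPG on the owner SP and ANO on the other, for each LUN, mirroring the solid and interrupted lines in Figure 6.1.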

Asymmetric Access State

Ports in an ALUA TPG are in the same AAS at all times with respect to a given LUN. The TPG’s AAS is reported to the initiators in response to the REPORT TPGS command. The TPG descriptor is reported in byte 1 of that response.

The possible states are as follows:

    • Active-optimized (AO)—Ports are on the owner SP and provide the best I/O to the LUN.

    • Active-non-optimized (ANO)—Ports are on the non-owner SP. I/O to the LUN is less optimal compared to AO AAS.

    • Transitioning—The TPG AAS is in the process of switching from one state to another. For example, if the SP of an AO TPG is being rebooted or is taken offline, or if the SAN (storage area network) admin manually transfers LUN ownership (on EMC CLARiiON, this is known as trespass), the AAS of the TPG on the alternate SP changes to AO. While this process is ongoing, the TPG AAS is transitioning.

      While the TPG is in this state, requests received from the initiators are returned with BUSY or a CHECK CONDITION with sense key NOT READY and ASC (additional sense code) LOGICAL UNIT NOT ACCESSIBLE or ASYMMETRIC ACCESS STATE TRANSITION.

    • Standby—This state is similar to a passive SP in a non-ALUA configuration and is seen on certain ALUA-capable arrays. A TPG in this state returns a CHECK CONDITION with sense key NOT READY.

      When the TPG is in this AAS, it supports a subset of commands that it accepts when it is in AO AAS:

      INQUIRY

      LOG SELECT

      LOG SENSE

      MODE SELECT

      MODE SENSE

      REPORT LUNS (for LUN 0)

      RECEIVE DIAGNOSTIC RESULTS

      SEND DIAGNOSTIC

      REPORT TARGET PORT GROUPS

      SET TARGET PORT GROUPS

      REQUEST SENSE

      PERSISTENT RESERVE IN

      PERSISTENT RESERVE OUT

      Echo buffer modes of READ BUFFER

      Echo buffer modes of WRITE BUFFER

    • Unavailable—This AAS is usually seen when the TPG’s access to the LUN is restricted as a result of hardware errors or other SCSI device limitations. A TPG in this state is unable to transition to AO or ANO until the error subsides.

Some ALUA storage arrays certified with vSphere 6 might not support some of the latter three states (Transitioning, Standby, and Unavailable).

ESXi 6 sends the I/O to TPGs that are in AO AAS, but if they are not available, I/O is sent to TPGs that are in ANO AAS. If the storage array receives sustained I/O on TPGs that are in ANO AAS, the array transitions the TPG’s state to AO AAS. Who makes that change depends on the ALUA management mode of the storage array (see the next section).
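This AO-first selection can be sketched as follows. The function and names here are mine for illustration, not the NMP's internals:

```python
# Illustrative sketch (names are mine, not NMP internals): I/O goes to
# paths in AO AAS when any exist; otherwise it falls back to ANO paths.

def eligible_paths(paths):
    """paths: list of (path_name, tpg_state) tuples; returns usable paths."""
    ao = [name for name, state in paths if state == "AO"]
    if ao:
        return ao
    return [name for name, state in paths if state == "ANO"]

print(eligible_paths([("vmhba2:C0:T0:L1", "AO"), ("vmhba2:C0:T1:L1", "ANO")]))
print(eligible_paths([("vmhba2:C0:T1:L1", "ANO")]))
```

The first call returns only the AO path; the second, with no AO path available, falls back to the ANO path, which is when an array may transition that TPG to AO.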

ALUA Management Modes

The dynamic nature of multipathing and failover requires the flexibility of managing and controlling an ALUA TPG’s AAS. This is done via a set of commands and responses to and from the storage arrays. These commands are as follows:

  • INQUIRY—According to SPC-3, section 6.4.2, in response to this command, an array returns certain pages of the VPD (vital product data) or EVPD (extended vital product data). The inquiry data returned in response to this command includes the TPGS field. If the returned value in that field is nonzero, that device (LUN) supports ALUA. (See Table 6.3, later in this chapter, for the correlation between the value of the TPGS field and AAS management modes.)

  • REPORT TARGET PORT GROUPS (REPORT TPGs)—This command requests that the storage array send the TPG information to the initiator.

  • SET TARGET PORT GROUPS (SET TPGs)—This command requests that the storage array set the AAS of all ports in specified TPGs. For example, a TPG’s AAS can transition from ANO to AO via the SET TPGs command.

The control or management of ALUA AAS can operate in one of four modes (see Table 6.1):

Table 6.1 ALUA AAS management modes

Mode           Managed By  REPORT TPGs  SET TPGs
Not Supported  N/A         Invalid      Invalid
Implicit       Array       Yes          No
Explicit       Host        Yes          Yes
Both           Array/host  Yes          Yes

  • Not Supported—The response to the REPORT TPGs and SET TPGs commands is invalid. This means that the storage array does not support ALUA or, in the case of EMC CLARiiON, the initiator records are not configured in a mode that supports ALUA.

  • Implicit—The array responds to REPORT TPGs but not SET TPGs commands. In this case, setting the TPG’s AAS is done only by the storage array.

  • Explicit—The array responds to both REPORT TPGs and SET TPGs commands. In this case, setting the TPG’s AAS can be done only by the initiator.

  • Both—Same as Explicit, but both the array and the initiator can set the TPG’s AAS.
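Table 6.1 can be captured as a small lookup table. The encoding below is my own, purely illustrative, and not an esxcli or SATP data structure:

```python
# Sketch of Table 6.1 as a lookup (my own encoding, not an esxcli
# structure): which commands are valid in each ALUA AAS management mode.

MODES = {
    # mode:          (REPORT TPGs valid, SET TPGs valid, managed by)
    "Not Supported": (False, False, None),
    "Implicit":      (True,  False, "array"),
    "Explicit":      (True,  True,  "host"),
    "Both":          (True,  True,  "array/host"),
}

def set_tpgs_allowed(mode):
    """True if the initiator may change TPG AAS via SET TPGs in this mode."""
    return MODES[mode][1]

print(set_tpgs_allowed("Implicit"))  # False: only the array sets the AAS
print(set_tpgs_allowed("Both"))      # True
```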

ALUA Common Implementations

The combination of ALUA AAS and management modes varies by vendor. Table 6.2 shows a matrix of common combinations.

Table 6.2 Common ALUA implementations

Mode                   AO   ANO  Standby  Array Vendor Example
Implicit               Yes  Yes  No       NetApp
Explicit and Implicit  Yes  Yes  No       HP EVA, VNX or CLARiiON CX
Explicit               Yes  No   Yes      IBM DS4000

ALUA Followover

To better explain what ALUA followover does, let me first describe what happens without it. Storage designs that use active/passive arrays must avoid a condition referred to as path thrashing. Path thrashing occurs when, due to poor design or physical failure, some hosts have access to only one SP while other hosts have access to the other SP, and/or the incorrect Path Selection Plug-in (PSP) is selected for the array. I have seen this happen in two scenarios, which are described in the following sections.

Path Thrashing Scenario 1

Figure 6.2 shows a Fibre Channel SAN design for a non-ALUA active/passive array. Here Host A has access to SPA only, and Host B has access to SPB only. LUN 1 is owned by SPA. However, because Host B cannot access that SP, it requests that the array transfer the LUN ownership to SPB. When the array complies, Host A loses access to the LUN because it is no longer owned by SPA. Host A attempts to recover from this state by requesting that the array transfer the LUN ownership back to SPA. When the array complies, Host B starts this cycle again. This tug-of-war continues on and on, and neither host can issue any I/O to the LUN.

The only solution to this problem is to correct the design so that each host has access to both SPs and to use the VMW_PSP_MRU Pluggable Storage Architecture (PSA) plug-in. Note that enabling ALUA without correcting the design may not prevent this problem.

06fig02.jpg

Figure 6.2 Scenario 1: Path thrashing due to a poor cabling design choice

Path Thrashing Scenario 2

Figure 6.3 shows a variation on Scenario 1 in which the Fibre Channel fabric was designed according to VMware best practices. However, both hosts were configured with VMW_PSP_FIXED instead of VMW_PSP_MRU. This by itself wouldn’t result in path thrashing. However, the designer decided to customize each host so that the hosts have different preferred paths to LUN 1. These preferred path settings are represented by the interrupted lines (a path from Host A and another path from Host B). The expected behavior in this configuration is that as long as the defined preferred path to LUN 1 is available, the host insists on sending I/O via that path. As a result, Host A attempts to send its I/O to LUN 1 via SPA, and Host B sends its I/O via SPB. However, LUN 1 is owned by SPA, so attempts to send I/O via SPB result in a check condition with the sense key ILLEGAL_REQUEST (more on this in Chapter 7). Host B insists on sending the I/O via its preferred path, so it sends a START_UNIT or a TRESPASS command to the array. As a result, the array transfers LUN 1 ownership to SPB. Now Host A gets really upset and tells the array to transfer the LUN back to SPA, using the START_UNIT or TRESPASS command. The array complies, and the tug-of-war begins!

06fig03.jpg

Figure 6.3 Scenario 2: Path thrashing due to a poor PSP design choice

Preventing Path Thrashing

These two examples prompted VMware to create the VMW_PSP_MRU plug-in for use with active/passive arrays. In older releases, prior to ESX 4.0, this used to be a policy setting for each LUN. In 4.0 and later, including 6.0 and 6.5, MRU is a PSA plug-in. (I discuss the PSP design choices in Chapter 7.) With MRU, the host sends the I/O to the most recently used path. If the LUN moves to another SP, the I/O is sent on the new path to that SP instead of being sent to the SP that was the previous owner. Note that MRU ignores the preferred path setting.

ALUA-capable arrays that provide AO AAS for TPGs on the owner SP and ANO AAS for TPGs on the non-owner SP allow I/O to a given LUN with high priority via the AO TPGs and, conversely, lower priority via the ANO TPGs. This means that the ANO TPGs do not return a check condition with sense key ILLEGAL_REQUEST when I/O to the LUN is sent through them. Consequently, using VMW_PSP_FIXED with these arrays can result in a lighter version of path thrashing: I/O sent to the ANO TPGs does not fail if that is the preferred path, but its performance is much lower compared to using the AO TPGs. If most hosts use the AO TPGs as the preferred path, the LUN ownership stays on the SP that originally owns it. As a result, the ANO TPGs are not transitioned to AO for the offending host.

To accommodate this situation, VMware introduced a new feature for use with ALUA devices; however, it is not defined in the ALUA spec. This feature is referred to as ALUA followover.

ALUA followover simply means that when the host detects a TPG AAS change that it did not cause by itself, it does not try to revert the change even if it only has access to TPGs that are ANO. Effectively, this prevents the hosts from fighting for TPG AAS and, instead, they follow the TPG AAS of the array. Figures 6.4 and 6.5 illustrate ALUA followover interaction with TPG AAS.

06fig04.jpg

Figure 6.4 ALUA followover before failure

Figure 6.4 shows a logical storage diagram in which the switch fabrics have been removed to simplify the diagram. Here, TPG ID 1 is AO on SPA, and both hosts send the I/O to that TPG. TPG ID 2 is ANO, and I/O is not sent to it. These TPGs are configured with ALUA Explicit mode.

Figure 6.5 shows what happens after a path to the AO TPG fails: Host A has lost its path to the AO TPG shown in Figure 6.4. As a result, this host takes advantage of the ALUA Explicit mode on the array and sends a SET_TPGS command to the array so that TPG ID 2 is changed to AO and TPG ID 1 is changed to ANO. Host B recognizes that it did not make this change. But because ALUA followover is enabled, Host B simply accepts the change and does not attempt to reverse it. Consequently, the I/O is sent to TPG ID 2 because it is now the AO TPG. (Notice that the array moved the LUN ownership to SPB because this is where the AO TPG is located.)

06fig05.jpg

Figure 6.5 ALUA followover after failure

Some storage arrays implement the PREF (preference) bit, which enables an array to specify which SP is the preferred owner of a given LUN. This allows the storage administrator to spread the LUNs over both SPs (for example, even LUNs on one SP and odd LUNs on the other SP). Whenever the need arises to shut down one of the SPs, the LUNs owned by that SP (say SPA) get transferred to the surviving nonpreferred SP (SPB). As a result, the AAS of the port group on SPB is changed to AO. ALUA followover honors this change and sends the next I/O intended for the transferred LUNs to the port group on SPB. When SPA is brought back online, the LUNs it used to own get transferred back to it. This reverses the changes done earlier, and the AAS of the port group on SPA is set to AO for the transferred LUNs. Conversely, the AAS of the port group on SPB, which no longer owns the LUNs, is changed to ANO. Again, ALUA followover honors this change and switches the I/O back to the port group on SPA. This is the default behavior of ALUA-capable HP EVA storage arrays.
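The followover rule described in this section boils down to a single decision, sketched here in Python. This is my formulation of the rule in the text, not VMware source code:

```python
# Minimal sketch of the ALUA followover decision (my formulation of the
# rule described in the text, not VMware code): a host reverts a TPG AAS
# change only when it did not initiate it AND followover is disabled.

def should_revert(change_initiated_by_me, followover_enabled):
    """Return True if the host should try to reverse a TPG AAS change."""
    if change_initiated_by_me:
        return False               # the host made this change on purpose
    return not followover_enabled  # with followover on, follow the array

print(should_revert(False, True))   # followover on: accept the change
print(should_revert(False, False))  # followover off: fight for the AAS
```

With followover enabled, the second host in Figure 6.5 accepts the state change; with it disabled, both hosts would keep reversing each other's changes, which is exactly the tug-of-war described earlier.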

Identifying Device ALUA Configuration

The ESXi 6 host configuration that enables the use of ALUA devices is a PSA component in the form of a SATP (see Chapter 5, “vSphere Pluggable Storage Architecture [PSA]”). PSA claim rules determine which SATP to use, based on array information returned in response to an INQUIRY command. As mentioned earlier, part of the inquiry string is the TPGS field. The claim rules are configured such that if that field’s value is nonzero, the device is claimed by the defined ALUA SATP. In the following sections, I show how to list these claim rules and how to identify ALUA configurations from the device properties.

Identifying ALUA Claim Rule

In Chapter 5, I showed you how to list all the SATP rules. I had to split the screenshots into four quadrants so that I could show all the content of the output. Here, I’ve tried to trim it down and list only the lines I need to show. To do so, I used the following command:

esxcli storage nmp satp rule list | grep -i 'model\|satp_alua\|---' | less -S

This command lists all the SATP rules and then uses grep to keep only the lines containing the strings model, satp_alua, or ---. As a result, the output retains the column headers and separator lines, which are the first two lines of the output, while the remainder shows only the lines with satp_alua in them. Notice that the -i argument causes grep to ignore case.

Figure 6.6 shows the output from this command.

06fig06.jpg

Figure 6.6 ALUA claim rules

The following is the text of the output with blank columns removed for readability:

Name              Vendor  Model      Options                     Rule Group  Claim Options
------------      ------- ---------  --------------------------  ----------  -----------
VMW_SATP_ALUA_CX  DGC                                            system      tpgs_on
VMW_SATP_ALUA     LSI     INF-01-00  reset_on_attempted_reserve  system      tpgs_on
VMW_SATP_ALUA     NETAPP             reset_on_attempted_reserve  system      tpgs_on
VMW_SATP_ALUA     IBM     2810XIV                                system      tpgs_on
VMW_SATP_ALUA     IBM     2107900    reset_on_attempted_reserve  system
VMW_SATP_ALUA     IBM     2145                                   system
VMW_SATP_ALUA                                                    system      tpgs_on

In this output, notice that the EMC CLARiiON CX family is claimed by the VMW_SATP_ALUA_CX plug-in, based on matches on the Vendor column setting being DGC and the Claim Options setting being tpgs_on.

On the other hand, both LSI and IBM 2810-XIV are claimed by the VMW_SATP_ALUA plug-in, based on matches on the Vendor column, the Model column, and the value of tpgs_on in the Claim Options column.

NetApp is also claimed by the VMW_SATP_ALUA plug-in, based on matches on the Vendor column and the value of tpgs_on in the Claim Options column only. In this case, the Model column was not used.

IBM DS8000, which is model 2107-900 (listed in the output without the dash), and IBM SVC (listed here as model 2145) are claimed by the VMW_SATP_ALUA plug-in, based on the Vendor and Model columns only, even though the Claim Options column setting is not tpgs_on.

The remaining rule allows VMW_SATP_ALUA to claim devices with any Vendor or Model column value, as long as the Claim Options column value is tpgs_on. This means that any array not listed in the preceding rules that returns a nonzero value for the TPGS field in the inquiry response string gets claimed by VMW_SATP_ALUA. You might think of this as a catch-all ALUA claim rule that claims devices on all ALUA arrays that are not explicitly listed by vendor or model in the SATP claim rules.
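The matching behavior implied by these rules can be sketched as an ordered first-match search. This is a simplified model of my own; the real PSA matcher supports more criteria (transport, driver, and so on):

```python
# Simplified model (not PSA code) of the claim rules listed above: rules
# are tried in order, empty fields match anything, and the vendor-less,
# model-less tpgs_on rule at the end acts as the catch-all.

RULES = [
    # (satp, vendor, model, requires nonzero TPGS field)
    ("VMW_SATP_ALUA_CX", "DGC",    None,        True),
    ("VMW_SATP_ALUA",    "LSI",    "INF-01-00", True),
    ("VMW_SATP_ALUA",    "NETAPP", None,        True),
    ("VMW_SATP_ALUA",    "IBM",    "2810XIV",   True),
    ("VMW_SATP_ALUA",    "IBM",    "2107900",   False),
    ("VMW_SATP_ALUA",    "IBM",    "2145",      False),
    ("VMW_SATP_ALUA",    None,     None,        True),   # catch-all
]

def claim(vendor, model, tpgs):
    """Return the SATP that claims a device, or None if no rule matches."""
    for satp, v, m, needs_tpgs in RULES:
        if v is not None and v != vendor:
            continue
        if m is not None and m != model:
            continue
        if needs_tpgs and tpgs == 0:
            continue
        return satp
    return None

print(claim("DGC", "RAID 5", 1))   # VMW_SATP_ALUA_CX
print(claim("ACME", "X1", 3))      # unknown ALUA array hits the catch-all
```

Note how the DS8000 (2107900) and SVC (2145) rows match even with a TPGS value of 0, while an unknown array is claimed only if its TPGS field is nonzero.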

Identifying Devices’ ALUA Configurations

ALUA configurations are associated with LUNs in combination with TPGs. To list these configurations, you can run the following command:

esxcli storage nmp device list --device [device-ID]

Or you can use the abbreviated option -d instead of --device.

Here is an example:

esxcli storage nmp device list --device naa.60060160f2c43500bc280391f656e311

The output of this command is listed in the following sections, which show examples from various storage arrays.

An Example from an EMC VNX or CLARiiON CX Array

Figure 6.7 shows an example of an EMC VNX or CLARiiON CX LUN configured for ALUA.

06fig07.jpg

Figure 6.7 ALUA Configuration of a VNX or CLARiiON CX family device

This output shows the Storage Array Type field set to VMW_SATP_ALUA_CX, which is the same as the VMW_SATP_ALUA plug-in with additional code to handle certain commands specific to CLARiiON CX ALUA arrays.

  • The output also shows the Storage Array Type Device Config line (wrapped for readability), which includes a number of parts. The first set of curly brackets, {}, includes initiator registration–specific configuration. This is specific to the EMC VNX and CLARiiON family of arrays. Within this set, two options are listed:

    • navireg=on—This means that NaviAgent Registration option is enabled on this host. It registers the initiator with the VNX or CX array if it is not already registered. Note that you need to check the initiator record on the array to make sure that Failover Mode is set to 4, which enables ALUA for this initiator. (You can find more details on this in Chapter 7.)

    • ipfilter=on—This option filters the host’s IP address so that it is not visible to the storage array. (You’ll learn more about this in Chapter 7.)

  • The ALUA AAS management mode options are enclosed in a second set of curly brackets, within which is another nested pair of curly brackets for the TPG’s AAS configuration. These are the ALUA AAS management mode options:

    • Implicit_support=on—This means that the array supports the Implicit mode of AAS management (refer to Table 6.1).

    • Explicit_support=on—This means that the array supports the Explicit mode of AAS management (refer to Table 6.1).

    • Explicit_allow=on—This means that the host is configured to allow the SATP to exercise its explicit ALUA capability if the need arises (for example, in the event of a failed controller).

    • ALUA_followover=on—This enables the ALUA followover option on the host. (See the “ALUA Followover” section, earlier in this chapter.)

    • action_OnRetryErrors=off—This option is set to off by default. This means that when an I/O on that path returns an error indicating the I/O can be retried, the path remains in an on state even if the retries continue indefinitely. When this option is set to on, the path is eventually marked dead, after a certain timeout, if the I/O retries still result in an error. (You’ll learn more about this in Chapter 7.)

  • The next set of options appears within the nested pair of curly brackets for the TPG IDs and AAS:

    • TPG_id—This field shows the target port group ID. If the LUN is accessible via more than one target port group (typically two groups), both IDs are listed here. This example has TPG_id 1 and 2. Each TPG is listed within its own pair of curly brackets.

    • TPG_state—This field shows the AAS of the TPG. Notice that TPG_id 1 is in AO AAS, whereas TPG_id 2 is in ANO AAS. Based on this configuration, I/O is sent to TPG_id 1.

I cover the path-related options in Chapter 7.
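For illustration, here is a rough Python parser for the TPG portion of such a config string. The format is inferred from the output shown in Figure 6.7; this is not an official interface, and the sample string below is a mock-up:

```python
import re

# Rough parser (format inferred from the esxcli output shown; not an
# official interface) that extracts TPG IDs and their access states from
# a Storage Array Type Device Config string. The sample is a mock-up.

config = ("{navireg=on, ipfilter=on}"
          "{implicit_support=on; explicit_support=on; explicit_allow=on; "
          "alua_followover=on; action_OnRetryErrors=off; "
          "{TPG_id=1,TPG_state=AO}{TPG_id=2,TPG_state=ANO}}")

def tpg_states(cfg):
    """Map each TPG_id in the config string to its TPG_state."""
    return {int(i): s for i, s in
            re.findall(r"TPG_id=(\d+),TPG_state=(\w+)", cfg)}

print(tpg_states(config))   # {1: 'AO', 2: 'ANO'}
```

Given the configuration shown earlier, this yields TPG 1 in AO AAS and TPG 2 in ANO AAS, matching the states reported in Figure 6.7.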

More Examples from an EMC VNX/CLARiiON CX Array

The example in Figure 6.8 shows similar output but from an EMC VNX array.

06fig08.jpg

Figure 6.8 ALUA configuration of EMC VNX/CLARiiON CX FC devices

Figure 6.8 shows two devices on the same array. The two devices’ outputs are nearly identical. Note the following observations:

  • The first device in this example shows TPG_id 1 being in AO AAS, and on the second device, TPG_id 2 is also in the same state (AO AAS). This means that the devices are spread evenly over the array’s SPs. For example, TPG 1 on SPA services I/O to LUN 2, whereas TPG 2 on SPB services I/O to LUN 0. You should also notice from the Working Paths field that for LUN 2, the target portion of the pathname is T2, via vmhba2 and vmhba3, whereas it is T3 for LUN 0. (I explain in Chapter 2, “Fibre Channel Storage Connectivity,” Chapter 3, “FCoE Storage Connectivity,” and Chapter 7, “Multipathing and Failover,” how to identify which target belongs to which SP.)

  • Both devices are claimed by VMW_PSP_RR (round-robin). This means that the I/Os rotate on SPA ports via vmhba2 and vmhba3 for LUN 2, whereas for LUN 0, the I/Os rotate on SPB ports via the same HBAs. This design balances the load on the storage array SPs as well as the host’s initiators. (I explain this in Chapter 7.)

An Example from an IBM DS8000 Array

Figure 6.9 shows similar output from an IBM DS8000 array-based device.

06fig09.jpg

Figure 6.9 ALUA configuration of an IBM DS8000 device

The output is similar to the output in the preceding sections in many aspects, but note the following differences:

  • The device is claimed by VMW_SATP_ALUA instead of VMW_SATP_ALUA_CX.

  • explicit_support=off means that the array does not support the Explicit mode of AAS management.

  • There is only one TPG_id, which is 0, and it is in an AO AAS.

An Example from an IBM XIV Array

Figure 6.10 shows output from an IBM XIV array-based device.

06fig10.jpg

Figure 6.10 ALUA configuration of an IBM XIV device

The output is similar to that from the IBM DS8000 array (refer to Figure 6.9), with the following differences:

  • The device ID uses the eui format instead of naa. This is usually the result of the array supporting an ANSI revision lower than 3. (See Chapter 1, “Storage Types,” for details.)

  • The PSP in use is VMW_PSP_RR (round-robin) rather than FIXED. (I discuss PSP choices and configuration in Chapter 7.)

An Example from a NetApp Array

Figure 6.11 shows an example from a NetApp array.

06fig11.jpg

Figure 6.11 ALUA configuration of a NetApp FC device

This example is similar to the two-TPG EMC CX example in Figure 6.7, with the following differences:

  • The device is claimed by VMW_SATP_ALUA instead of VMW_SATP_ALUA_CX.

  • explicit_support=off means that the array does not support the Explicit mode of AAS management.

  • The device is claimed by VMW_PSP_RR instead of VMW_PSP_FIXED.

An Example from an HP MSA Array

Figure 6.12 shows an example from an HP MSA array.

06fig12.jpg

Figure 6.12 ALUA configuration of an HP MSA FC device

This example is similar to the NetApp array output (refer to Figure 6.11), with the difference that the PSP is VMW_PSP_MRU.

Troubleshooting ALUA

In this section, I give some troubleshooting foundation that will hopefully help you learn how to fish (also known as TIY—troubleshoot it yourself)!

First, let me familiarize you with the normal log entries. When a device is discovered by vmkernel (logged to /var/log/vmkernel.log or /var/log/boot.gz files), as I mentioned earlier, the TPGS field is included with the inquiry string. The value of that field helps vmkernel identify the AAS management mode (that is, Explicit, Implicit, or Both).

Following are examples from the storage arrays I used in the previous sections. Figure 6.13 shows vmkernel.log entries from an ESXi 6 host connected to an EMC VNX storage array.

06fig13.jpg

Figure 6.13 VMkernel.log entries of an EMC CLARiiON ALUA device

In this example, I truncated the first part of each line, which shows the date, timestamp, and host name. Notice that the ScsiScan lines show the TPGS field with a value of 3. This means that the array supports both Implicit and Explicit ALUA modes. This is spelled out in English at the end of each line as well.

Figure 6.14 shows log entries from an ESXi 6 host connected to a NetApp storage array.

06fig14.jpg

Figure 6.14 VMkernel.log entries of a NetApp ALUA device

Notice that the ScsiScan lines show the TPGS field with a value of 1. This means that the array supports implicit ALUA mode only. This is printed in English as well at the end of each line.

Figure 6.15 shows log entries from an ESXi 6 host connected to an IBM DS8000 storage array.

06fig15.jpg

Figure 6.15 VMkernel.log entries of an IBM DS8000 ALUA device

This log shows the array as Model: '2107900'. Notice that the ScsiScan lines show the TPGS field with a value of 1. This means that the array supports Implicit ALUA mode only. This is spelled out in English at the end of each line as well.

Figure 6.16 shows log entries from an ESXi 6 host connected to an IBM XIV storage array.

This log shows the array as Model: '2810XIV'. Notice that the ScsiScan lines show the TPGS field with a value of 1. This means that the array supports Implicit ALUA mode only. This is spelled out in English at the end of each line as well.

06fig16.jpg

Figure 6.16 VMkernel.log entries of an IBM XIV ALUA device

At this time, I don’t have access to an array that supports an explicit ALUA-only mode. However, the log from such an array would show the value of the TPGS field as 2.

Table 6.3 summarizes the different values of TPGS field and their meaning.

Table 6.3 TPGS field value meanings

TPGS Field Value  ALUA Mode
0                 Not Supported
1                 Implicit only
2                 Explicit only
3                 Both Implicit and Explicit
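In SPC-3, the TPGS field occupies two bits of byte 5 of the standard INQUIRY data, so decoding Table 6.3 from a raw INQUIRY response can be sketched as follows. This is illustrative only, and the sample INQUIRY bytes are made up:

```python
# Illustrative decode of the TPGS field (byte 5, bits 5:4 of the standard
# INQUIRY data per SPC-3). The sample INQUIRY bytes below are made up.

TPGS_MODES = {
    0: "Not Supported",
    1: "Implicit only",
    2: "Explicit only",
    3: "Both Implicit and Explicit",
}

def tpgs_from_inquiry(data):
    """Extract the two-bit TPGS field from raw INQUIRY data."""
    return (data[5] >> 4) & 0b11

inquiry = bytes([0x00, 0x00, 0x05, 0x02, 0x1F, 0x30, 0x00, 0x02])
print(TPGS_MODES[tpgs_from_inquiry(inquiry)])  # byte 5 = 0x30 -> value 3
```

A value of 3, as in the VNX log shown earlier, indicates both Implicit and Explicit mode support; the NetApp and IBM examples reported 1, Implicit only.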

Identifying ALUA Devices’ Path State

The next step in troubleshooting is to identify the state of the path or paths to the ALUA device. (I cover the details of multipathing in Chapter 7.) In this section, I show you how to identify the path states. Figure 6.17 shows output from the following command:

esxcli storage nmp path list

Figure 6.17 shows four paths to LUN 2, which is on an EMC VNX array configured for ALUA.

06fig17.jpg

Figure 6.17 Listing of paths to an EMC VNX ALUA device

The following are the troubleshooting-related fields in this output:

  • Group State—Shows the target port group AAS; Active means AO, and Active unoptimized means ANO.

  • Storage Array Type Path Config—This field can be set to TPG_id, TPG_state, RTP_id, or RTP_health:

    • TPG_id—As in the output of the device list, this is the target port group ID.

    • TPG_state—As in the output of the device list, this matches the value equivalent to the previous field, Group State (for example, AO or ANO).

    • RTP_id—This is the relative target port ID, the port ID from which the inquiry response was sent to the initiator. The vital product data (VPD) included in this string includes the relative target port ID. So, with two paths per HBA in this example, two inquiry strings were received by each HBA. The first, on vmhba2, came from RTP ID 13, and the second came from RTP ID 5. In contrast, on vmhba3, the first inquiry string was received from RTP ID 14, and the second was received from RTP ID 6.

    • RTP_health—This is the health status of the RTP. It can be either UP or DOWN. In the output shown in Figure 6.17, it is UP. If it were DOWN, the Group State value would be Dead instead of Active or Active Unoptimized.
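These field descriptions condense to a simple rule, sketched here in Python (my own summary of the text, not NMP code): the Group State mirrors the TPG_state unless the RTP is DOWN.

```python
# My own condensation of the field descriptions above (not NMP code):
# a path's Group State follows its TPG_state, except that a DOWN RTP
# reports the path as Dead.

def group_state(tpg_state, rtp_health):
    """Derive the path's Group State from its TPG_state and RTP_health."""
    if rtp_health == "DOWN":
        return "Dead"
    return {"AO": "Active", "ANO": "Active unoptimized"}[tpg_state]

print(group_state("AO", "UP"))     # Active
print(group_state("ANO", "UP"))    # Active unoptimized
print(group_state("AO", "DOWN"))   # Dead
```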
