System Level Diagnostic Guide


NetApp System-Level Diagnostics Guide

NetApp, Inc.
495 East Java Drive
Sunnyvale, CA 94089 U.S.A.
Telephone: +1 (408) 822-6000
Fax: +1 (408) 822-4501
Support telephone: +1 (888) 4-NETAPP
Documentation comments: [email protected]
Information Web: www.netapp.com

Part number 215-05496_A0


December 2010

Contents
Copyright information
Trademark information
About this guide
  Audience
  Terminology
  Where to enter commands
  Keyboard and formatting conventions
  Special messages
  How to send your comments
Introduction to system-level diagnostics
  Requirements for running system-level diagnostics
  Accessing Data ONTAP man pages
  How to use online command-line help
Running system installation diagnostics
Running system panic diagnostics
Running slow system response diagnostics
Running hardware installation diagnostics
Running device failure diagnostics
Index

Copyright information
Copyright 1994-2011 NetApp, Inc. All rights reserved. Printed in the U.S.A.
No part of this document covered by copyright may be reproduced in any form or by any means
(graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an
electronic retrieval system) without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and
disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE,
WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice.
NetApp assumes no responsibility or liability arising from the use of products described herein,
except as expressly agreed to in writing by NetApp. The use or purchase of this product does not
convey a license under any patent rights, trademark rights, or any other intellectual property rights of
NetApp.
The product described in this manual may be protected by one or more U.S.A. patents, foreign
patents, or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer
Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).

Trademark information
NetApp, the NetApp logo, Network Appliance, the Network Appliance logo, ApplianceWatch,
ASUP, AutoSupport, Bycast, Campaign Express, ComplianceClock, Cryptainer, CryptoShred, Data
ONTAP, DataFabric, DataFort, Decru, Decru DataFort, FAServer, FilerView, FlexCache, FlexClone,
FlexScale, FlexShare, FlexSuite, FlexVol, FPolicy, GetSuccessful, gFiler, Go further, faster, Imagine
Virtually Anything, Lifetime Key Management, LockVault, Manage ONTAP, MetroCluster,
MultiStore, NearStore, NetCache, NOW (NetApp on the Web), ONTAPI, OpenKey, RAID-DP,
ReplicatorX, SANscreen, SecureAdmin, SecureShare, Select, Shadow Tape, Simulate ONTAP,
SnapCopy, SnapDirector, SnapDrive, SnapFilter, SnapLock, SnapManager, SnapMigrator,
SnapMirror, SnapMover, SnapRestore, Snapshot, SnapSuite, SnapValidator, SnapVault,
StorageGRID, StoreVault, the StoreVault logo, SyncMirror, Tech OnTap, The evolution of storage,
Topio, vFiler, VFM, Virtual File Manager, VPolicy, WAFL, and Web Filer are trademarks or
registered trademarks of NetApp, Inc. in the United States, other countries, or both.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. A complete and current list of
other IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml.
Apple is a registered trademark and QuickTime is a trademark of Apple, Inc. in the U.S.A. and/or
other countries. Microsoft is a registered trademark and Windows Media is a trademark of Microsoft
Corporation in the U.S.A. and/or other countries. RealAudio, RealNetworks, RealPlayer,
RealSystem, RealText, and RealVideo are registered trademarks and RealMedia, RealProxy, and
SureStream are trademarks of RealNetworks, Inc. in the U.S.A. and/or other countries.
All other brands or products are trademarks or registered trademarks of their respective holders and
should be treated as such.
NetApp, Inc. is a licensee of the CompactFlash and CF Logo trademarks.
NetApp, Inc. NetCache is certified RealSystem compatible.

About this guide


You can use your product more effectively when you understand this document's intended audience
and the conventions that this document uses to present information.
This guide explains how to use system-level diagnostics to resolve five common troubleshooting
situations. This guide does not provide detailed definitions of specific tests, error messages, or
conditions. The na_sldiag(1) man page describes the various subcommands available for running
system-level diagnostics.
Note: This guide applies to systems running Data ONTAP 8.x 7-Mode, including V-Series
systems. The 7-Mode in the Data ONTAP 8.x 7-Mode product name means that this release has
the features and functionality you are used to if you have been using the Data ONTAP 7.0, 7.1,
7.2, or 7.3 release families. If you are a Data ONTAP 8.x Cluster-Mode user, you use the Data
ONTAP 8.x Cluster-Mode guides plus any Data ONTAP 8.x 7-Mode guides for functionality you
might want to access with 7-Mode commands through the nodeshell.

Next topics
Audience
Terminology
Where to enter commands
Keyboard and formatting conventions
Special messages
How to send your comments

Audience
This document is written with certain assumptions about your technical knowledge and experience.
This guide is for qualified system administrators and service personnel who are familiar with NetApp
storage systems.

Terminology
To understand the concepts in this document, you might need to know how certain terms are used.

Storage terms
storage controller      The component of a storage system that runs the Data ONTAP operating system
                        and controls its disk subsystem. Storage controllers are also sometimes
                        called controllers, storage appliances, appliances, storage engines, heads,
                        CPU modules, or controller modules.

storage system          The hardware device running Data ONTAP that receives data from and sends
                        data to native disk shelves, third-party storage, or both. Storage systems
                        that run Data ONTAP are sometimes referred to as filers, appliances,
                        storage appliances, V-Series systems, or systems.

Cluster and high-availability terms


cluster                 In Data ONTAP 8.x Cluster-Mode, a group of connected nodes (storage
                        systems) that share a global namespace and that you can manage as a single
                        virtual server or multiple virtual servers, providing performance,
                        reliability, and scalability benefits.
                        In the Data ONTAP 7.1 release family and earlier releases, a pair of
                        storage systems (sometimes called nodes) configured to serve data for each
                        other if one of the two systems stops functioning.
                        In the Data ONTAP 7.3 and 7.2 release families, this functionality is
                        referred to as an active/active configuration.
                        For some storage array vendors, cluster refers to the hardware component
                        on which host adapters and ports are located. Some storage array vendors
                        refer to this component as a controller.

HA (high availability)  In Data ONTAP 8.x, the recovery capability provided by a pair of nodes
                        (storage systems), called an HA pair, that are configured to serve data
                        for each other if one of the two nodes stops functioning.
                        In the Data ONTAP 7.3 and 7.2 release families, this functionality is
                        referred to as an active/active configuration.

HA pair                 In Data ONTAP 8.x, a pair of nodes (storage systems) configured to serve
                        data for each other if one of the two nodes stops functioning.
                        In the Data ONTAP 7.3 and 7.2 release families, this functionality is
                        referred to as an active/active configuration.

Where to enter commands


You can use your product more effectively when you understand how this document uses command
conventions to present information.
You can perform common administrator tasks in one or more of the following ways:
Note: Data ONTAP commands shown in this document are for Data ONTAP 8.x 7-Mode and the
Data ONTAP 7.x release families. However, some of these commands might also be available at
the nodeshell prompt on systems running Data ONTAP 8.x Cluster-Mode. See the Data ONTAP
Cluster-Mode Administration Reference for more information.

• You can enter commands either at the system console or from any client computer that can obtain
  access to the storage system using a Telnet or Secure Shell (SSH) session.
• In examples that illustrate command execution, the command syntax and output shown might
  differ from what you enter or see displayed, depending on your version of the operating system.

Keyboard and formatting conventions


You can use your product more effectively when you understand how this document uses keyboard
and formatting conventions to present information.

Keyboard conventions

Convention              What it means

The NOW site            Refers to the NetApp Support site at now.netapp.com.

Enter, enter            • Used to refer to the key that generates a carriage return; the key is
                          named Return on some keyboards.
                        • Used to mean pressing one or more keys on the keyboard and then pressing
                          the Enter key, or clicking in a field in a graphical interface and then
                          typing information into the field.

hyphen (-)              Used to separate individual keys. For example, Ctrl-D means holding down
                        the Ctrl key while pressing the D key.

type                    Used to mean pressing one or more keys on the keyboard.

Formatting conventions

Convention              What it means

Italic font             • Words or characters that require special attention.
                        • Placeholders for information that you must supply.
                          For example, if the guide says to enter the arp -d hostname command,
                          you enter the characters "arp -d" followed by the actual name of the
                          host.
                        • Book titles in cross-references.

Monospaced font         • Command names, option names, keywords, and daemon names.
                        • Information displayed on the system console or other computer monitors.
                        • Contents of files.
                        • File, path, and directory names.

Bold monospaced font    Words or characters you type. What you type is always shown in lowercase
                        letters, unless your program is case-sensitive and uppercase letters are
                        necessary for it to work properly.

Special messages
This document might contain the following types of messages to alert you to conditions that you
need to be aware of.
Note: A note contains important information that helps you install or operate the system
efficiently.

Attention: An attention notice contains instructions that you must follow to avoid a system crash,
loss of data, or damage to the equipment.

How to send your comments


You can help us to improve the quality of our documentation by sending us your feedback.
Your feedback is important in helping us to provide the most accurate and high-quality information.
If you have suggestions for improving this document, send us your comments by e-mail to
[email protected]. To help us direct your comments to the correct division, include in the
subject line the name of your product and the applicable operating system, for example,
FAS6070 - Data ONTAP 7.3, Host Utilities - Solaris, or Operations Manager 3.8 - Windows.

Introduction to system-level diagnostics


System-level diagnostics provides a command-line interface to tests that search for and determine
hardware problems on supported storage systems. You use system-level diagnostics to confirm that a
specific component is operating properly or to help identify faulty components.
System-level diagnostics is available for supported storage systems only. Entering system-level
diagnostics at the command-line interface of unsupported storage systems generates an error
message.
You run system-level diagnostics after one of the following events:
• Initial system installation
• Addition or replacement of hardware components
• System panic caused by an unidentified hardware failure
• Access to a specific device becomes intermittent or the device becomes unavailable
• System response time becomes sluggish
To run system-level diagnostics, you must get to the Maintenance mode boot option in Data
ONTAP. Several approaches lead to this option, but some adapters in your system might require a
specific one. The procedures documented in this guide use that recommended approach.
After you enter the command, the tests run in the background, and the passed or failed outcome of
each test is recorded in an internal memory-based log of fixed size. Some tests are utilities and
simply report completed rather than passed or failed. After you run the appropriate tests, the
procedures documented in this guide help you generate a status report. When the test results show
that system-level diagnostics completed successfully, it is a recommended best practice to clear
the log.
If tests fail, the status reports help technical support make appropriate recommendations. A
failure might be resolved by reinstalling a field-replaceable unit, by ensuring that cables are
connected, or by enabling specific tests recommended by technical support and then rerunning the
tests. If the failure cannot be resolved, there is a hardware failure and the affected hardware
must be replaced.
There are no error messages that require further definitions or explanations.

Next topics
Requirements for running system-level diagnostics
Accessing Data ONTAP man pages
How to use online command-line help

Requirements for running system-level diagnostics


Depending on the system-level diagnostic tests you are running, you need to be aware of time and
system hardware requirements.
The following requirements must be met when running system-level diagnostics; otherwise, parts of
the tests fail and error messages appear in the status report:
• Each documented task has slight differences; use the recommended procedure for the task.
• Running memory tests takes time; the larger the memory capacity of your storage system, the
  longer it takes.
• Each system being tested must be on a separate network.
  The network interface test assigns unique static IP addresses, beginning with 172.25.150.23, to
  all available network interfaces on a storage system. This results in network interface ports on
  different storage controllers being assigned the same IP address. If all the systems being tested
  are on the same network, then duplicate ip address warning messages appear on the connected
  consoles. These warning messages do not affect the test results.
• Adjacent network interface ports must be connected for best performance.
  Examples of adjacent ports are e0a and e0b or e2c and e2d.
• When running the SAS system-level diagnostic tests, adjacent SAS ports must be connected for
  best performance; storage shelves must be disconnected from the ports.
• When running the FC-AL system-level diagnostic tests, you must have loopback hoods on FC-AL
  interfaces on the motherboard or expansion adapters for best performance; all other cables for
  storage or Fibre Channel networks must be disconnected from the ports.
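
The separate-network requirement can be illustrated with a short sketch. It assumes, purely for illustration, that the network interface test hands out addresses sequentially from 172.25.150.23 in interface order; this guide states only the starting address. The function name and the interface lists are hypothetical.

```python
import ipaddress

# Starting address stated in the requirements above.
FIRST_TEST_ADDRESS = ipaddress.IPv4Address("172.25.150.23")


def assign_test_addresses(interfaces):
    """Map each interface name to a static test address.

    Assumption: addresses increase by one per interface, in list order.
    The guide only documents the starting address.
    """
    return {
        name: str(FIRST_TEST_ADDRESS + offset)
        for offset, name in enumerate(interfaces)
    }


# Two hypothetical controllers tested on the same network.
controller_a = assign_test_addresses(["e0a", "e0b"])
controller_b = assign_test_addresses(["e0a", "e0b"])

# Both controllers start from the same address, so their assignments collide.
duplicates = sorted(set(controller_a.values()) & set(controller_b.values()))
print(duplicates)
```

Because every controller starts from the same address, any two systems that share a network end up with overlapping assignments, which is what produces the duplicate-address warnings on the consoles.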

Accessing Data ONTAP man pages


You can use the Data ONTAP manual (man) pages to access technical information.

About this task


Data ONTAP manual pages are available for the following types of information. They are grouped
into sections according to standard UNIX naming conventions.

Types of information               Man page section

Commands                           1
Special files                      4
File formats and conventions       5
System management and services     8



Step

1. View man pages in the following ways:
   • Enter the following command at the console command line:
     man command_or_file_name
   • Click the manual pages button on the main Data ONTAP navigational page in the FilerView
     user interface.
   Note: All Data ONTAP 8.x 7-Mode man pages are stored on the system in files whose
   names are prefixed with the string "na_" to distinguish them from other man pages. The
   prefixed names sometimes appear in the NAME field of the man page, but the prefixes are
   not part of the command, file, or service.
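
The naming rule in the note can be sketched as a pair of helpers. This is only an illustration of the "na_" prefix convention described above; the function names are hypothetical and are not part of Data ONTAP.

```python
MAN_PAGE_PREFIX = "na_"  # prefix described in the note above


def stored_page_name(command):
    """Return the prefixed name under which the man page is stored."""
    return MAN_PAGE_PREFIX + command


def command_from_page_name(page_name):
    """Strip the storage prefix to recover the name you actually type."""
    if page_name.startswith(MAN_PAGE_PREFIX):
        return page_name[len(MAN_PAGE_PREFIX):]
    return page_name


print(stored_page_name("sldiag"))
print(command_from_page_name("na_sldiag"))
```

For example, the page referred to elsewhere in this guide as na_sldiag(1) documents the command that you enter as sldiag.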

How to use online command-line help


You can get command-line syntax help from the command line by entering the name of the
command followed by help or the question mark (?).
The fonts or symbols used in syntax help are as follows:

keyword                 Specifies the name of a command or an option that must be entered as
                        shown.
< > (less than,         Specify that you must replace the variable identified inside the symbols
greater than symbols)   with a value.
| (pipe)                Indicates that you must choose one of the elements on either side of the
                        pipe.
[ ] (brackets)          Indicate that the element inside the brackets is optional.
{ } (braces)            Indicate that the element inside the braces is required.

You can also type the question mark at the command line for a list of all the commands that are
available at the current level of administration (administrative or advanced).
The following example shows the result of entering the environment help command at the
storage system command line. The command output displays the syntax help for the environment
commands.

toaster> environment help


Usage: environment status |
[status] [shelf [<adapter>]] |
[status] [shelf_log] |
[status] [shelf_stats] |
[status] [shelf_power_status] |
[status] [chassis [all | list-sensors | Fan | Power | Temp | Power Supply
| RTC Battery | NVRAM4-temperature-7 | NVRAM4-battery-7]]
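
As a rough illustration of the notation, the sketch below pulls the variables (the names written between < and >) out of a syntax-help string and checks that every opening symbol has a matching close. It is a reading aid under simple assumptions, not a parser for the real help output, and the function names are hypothetical.

```python
import re

# Opening symbol -> matching closing symbol, per the notation table above.
PAIRS = {"[": "]", "{": "}", "<": ">"}


def variables(usage):
    """Return the names written as <name>, which you must replace with values."""
    return re.findall(r"<([^<>]+)>", usage)


def is_balanced(usage):
    """Check that every bracket, brace, and angle symbol is properly closed."""
    expected = []
    for char in usage:
        if char in PAIRS:
            expected.append(PAIRS[char])
        elif char in PAIRS.values():
            if not expected or expected.pop() != char:
                return False
    return not expected


line = "[status] [shelf [<adapter>]]"
print(variables(line))
print(is_balanced(line))
```

In the line taken from the example above, shelf is an optional keyword and adapter is the only variable you would replace with a value.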

Running system installation diagnostics


You run diagnostics after an initial system installation to identify the version of system-level
diagnostics and the supported devices on your storage system, and to verify that the installation is
successful and that all hardware is functioning properly.

Steps

1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt

2. Enter the following command at the Loader prompt:


boot_diags

Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.

The Maintenance mode prompt (*>) appears.


3. Enter the following command at the Maintenance mode prompt:
sldiag

For details about the sldiag command, see the sldiag man page.
4. View the version of system-level diagnostics present on your storage system by entering the
following command:
sldiag version show

The version is displayed in the format System Level Diagnostics X.nn.nn. The X is an
alpha reference, and nn.nn are major and minor numeric references, respectively.
5. Identify the device types in your new system installation so that you know which components to
verify by entering the following command:
sldiag device types

Your storage system displays some or all of the following devices:
• mem is system memory.
• nvram is nonvolatile RAM.
• nvmem is a hybrid of NVRAM and system memory.
• ata is an Advanced Technology Attachment device.
• fcal is a Fibre Channel-Arbitrated Loop device not connected to a storage device or Fibre
  Channel network.
• sas is a Serial Attached SCSI device not connected to a disk shelf.
• storage is an ATA, FC-AL, or SAS storage device that has an attached disk shelf.
• toe is a TCP Offload Engine, a type of NIC.
• nic is a Network Interface Card.
• cna is a Converged Network Adapter.
• env is motherboard environmentals.
• serviceproc is the Service Processor.
• fcache is the Performance Acceleration Module 2, also known as the Flash Cache adapter.
• bootmedia is the system booting device.
• interconnect or nvram-ib is the high-availability interface.

6. Run all the default selected diagnostic tests on your storage system by entering the following
command:
sldiag device run

7. View the status of the test by entering the following command:


sldiag device status

Your storage system provides the following output while the tests are still running:

There are still test(s) being processed.

After all the tests are complete, the following response appears by default:

*> <SLDIAG:_ALL_TESTS_COMPLETED>
8. Verify that there are no hardware problems on your new storage system by entering the following
command:
sldiag device status -long -state failed

The example shows that the tests were run without the appropriate hardware.

If the system-level diagnostics tests were completed without any failures:
There are no hardware problems and your storage system returns to the prompt.
a. Clear the status logs by entering the following command:
sldiag device clearstatus
b. Verify that the log is cleared by entering the following command:
sldiag device status
The following default response is displayed:

SLDIAG: No log messages are present.

c. Exit Maintenance mode by entering the following command:
halt
d. Enter the following command at the firmware prompt to reboot the storage system:
boot

You have completed system-level diagnostics.

If the system-level diagnostics tests resulted in some test failures:
Determine the cause of the problem.
a. Exit Maintenance mode by entering the following command:
halt
b. Perform a clean shutdown and disconnect the power supplies.
c. Verify that you have observed all the considerations identified for running system-level
diagnostics, that cables are securely connected, and that hardware components are properly
installed in the storage system.
d. Reconnect the power supplies and power on the storage system.
e. Repeat Steps 1 through 8 of Running system installation diagnostics.

Example
The following example pulls up the full status of failures that occurred:

*> sldiag device status -long -state failed

TEST START ------------------------------------------


DEVTYPE: nvram_ib
NAME: external loopback test
START DATE: Sat Jan 3 23:10:55 GMT 2009

STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------

TEST START ------------------------------------------


DEVTYPE: fcal
NAME: Fcal Loopback Test
START DATE: Sat Jan 3 23:10:56 GMT 2009

STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2

Started adapter get WWN string test.


Adapter get WWN string OK wwn_str: 5:00a:098300:035309

Started adapter interrupt test


Adapter interrupt test OK

Started adapter reset test.


Adapter reset OK

Started Adapter Get Connection State Test.


Connection State: 5
Loop on FC Adapter 0b is OPEN

Started adapter Retry LIP test


Adapter Retry LIP OK

ERROR: failed to init adaptor port for IOCTL call

ioctl_status.class_type = 0x1

ioctl_status.subclass = 0x3

ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------
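
When a failure report is long, it can help to split it into one record per test. The sketch below does that for text laid out like the example output above. The field labels come from that example; the function name is hypothetical, and the real output might differ depending on your version of the operating system.

```python
# Labels that appear as "LABEL: value" lines in the example output above.
FIELD_NAMES = {"DEVTYPE", "NAME", "START DATE", "STATUS", "END DATE", "LOOP"}


def parse_status_report(text):
    """Split long-format status output into one dict per test record."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("TEST START"):
            current = {"fields": {}, "messages": []}
        elif line.startswith("TEST END"):
            if current is not None:
                records.append(current)
            current = None
        elif current is not None and line:
            key, separator, value = line.partition(":")
            if separator and key in FIELD_NAMES:
                current["fields"][key] = value.strip()
            else:
                # Anything else is a free-form message from the test itself.
                current["messages"].append(line)
    return records


sample = """\
TEST START ------------------------------------------
DEVTYPE: nvram_ib
NAME: external loopback test
START DATE: Sat Jan 3 23:10:55 GMT 2009
STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
"""

for record in parse_status_report(sample):
    print(record["fields"]["DEVTYPE"], record["messages"])
```

Keeping the free-form messages separate from the labeled fields makes it easier to pass only the failure text to technical support.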

Running system panic diagnostics


Running diagnostics after your storage system suffers a system panic can help you to identify the
possible cause of the panic.

Steps

1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt

2. Enter the following command at the Loader prompt:


boot_diags

Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.

The Maintenance mode prompt (*>) appears.


3. Enter the following command at the Maintenance mode prompt:
sldiag

For details about the sldiag command, see the sldiag man page.
4. Run diagnostics on all the devices by entering the following command:
sldiag device run

5. View the status of the test by entering the following command:


sldiag device status

Your storage system provides the following output while the tests are still running:

There are still test(s) being processed.

After all the tests are complete, you receive the following default response:

*> <SLDIAG:_ALL_TESTS_COMPLETED>
6. Identify the cause of the system panic by entering the following command:
sldiag device status -long -state failed

The example shows that the tests were run without the appropriate hardware.

If the system-level diagnostics tests were completed without any failures:
There are no hardware problems and your storage system returns to the prompt.
a. Clear the status logs by entering the following command:
sldiag device clearstatus
b. Verify that the log is cleared by entering the following command:
sldiag device status
The following default response is displayed:

SLDIAG: No log messages are present.

c. Exit Maintenance mode by entering the following command:
halt
d. Enter the following command at the firmware prompt to reboot the storage system:
boot

You have completed system-level diagnostics.

If the system-level diagnostics tests resulted in some test failures:
Determine the cause of the problem.
a. Exit Maintenance mode by entering the following command:
halt
b. Perform a clean shutdown and disconnect the power supplies.
c. Verify that you have observed all the considerations identified for running system-level
diagnostics, that cables are securely connected, and that hardware components are properly
installed in the storage system.
d. Reconnect the power supplies and power on the storage system.
e. Repeat Steps 1 through 6 of Running system panic diagnostics.

Example
The following example pulls up the full status of failures that occurred:

*> sldiag device status -long -state failed

TEST START ------------------------------------------


DEVTYPE: nvram_ib
NAME: external loopback test
START DATE: Sat Jan 3 23:10:55 GMT 2009

STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------

TEST START ------------------------------------------


DEVTYPE: fcal
NAME: Fcal Loopback Test
START DATE: Sat Jan 3 23:10:56 GMT 2009

STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2

Started adapter get WWN string test.


Adapter get WWN string OK wwn_str: 5:00a:098300:035309

Started adapter interrupt test


Adapter interrupt test OK

Started adapter reset test.


Adapter reset OK

Started Adapter Get Connection State Test.


Connection State: 5
Loop on FC Adapter 0b is OPEN

Started adapter Retry LIP test


Adapter Retry LIP OK

ERROR: failed to init adaptor port for IOCTL call

ioctl_status.class_type = 0x1

ioctl_status.subclass = 0x3

ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------

After you finish


If the failures persist after repeating the steps, you need to replace the hardware.
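
The status responses quoted in the steps above can be told apart mechanically, which is convenient if you capture console output to a file. The sketch below matches on the exact strings shown in this guide; they might differ on your version of the operating system, and the enum and function names are hypothetical.

```python
from enum import Enum


class RunState(Enum):
    RUNNING = "tests still running"
    COMPLETED = "all tests completed"
    LOG_EMPTY = "status log cleared"
    UNKNOWN = "unrecognized response"


# Response strings copied from the steps in this guide.
MARKERS = [
    ("There are still test(s) being processed.", RunState.RUNNING),
    ("<SLDIAG:_ALL_TESTS_COMPLETED>", RunState.COMPLETED),
    ("SLDIAG: No log messages are present.", RunState.LOG_EMPTY),
]


def classify_status(output):
    """Classify the text returned by 'sldiag device status'."""
    for marker, state in MARKERS:
        if marker in output:
            return state
    return RunState.UNKNOWN


print(classify_status("*> <SLDIAG:_ALL_TESTS_COMPLETED>"))
```

A response that matches none of the known strings is reported as UNKNOWN rather than guessed at, so unexpected output is surfaced instead of being treated as success.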

Running slow system response diagnostics


Running diagnostics can help you identify the causes of slow system response times.

Steps

1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt

2. Enter the following command at the Loader prompt:


boot_diags

Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.

The Maintenance mode prompt (*>) appears.


3. Enter the following command at the Maintenance mode prompt:
sldiag

For details about the sldiag command, see the sldiag man page.
4. Run diagnostics on all the devices by entering the following command:
sldiag device run

5. View the status of the test by entering the following command:


sldiag device status

Your storage system provides the following output while the tests are still running:

There are still test(s) being processed.

After all the tests are complete, the following response appears by default:

*> <SLDIAG:_ALL_TESTS_COMPLETED>
6. Identify the cause of the system sluggishness by entering the following command:
sldiag device status -long -state failed

The example shows that the tests were run without the appropriate hardware.
26 | System-Level Diagnostics Guide

If the system-level     Then...
diagnostics tests...

Were completed          There are no hardware problems and your storage system
without any failures    returns to the prompt.

                        a. Clear the status logs by entering the following command:
                           sldiag device clearstatus
                        b. Verify that the log is cleared by entering the following command:
                           sldiag device status
                           The following default response is displayed:

                           SLDIAG: No log messages are present.

                        c. Exit Maintenance mode by entering the following command:
                           halt
                        d. Enter the following command at the firmware prompt to reboot
                           the storage system:
                           boot

                        You have completed system-level diagnostics.

Resulted in some        Determine the cause of the problem.
test failures
                        a. Exit Maintenance mode by entering the following command:
                           halt
                        b. Perform a clean shutdown and disconnect the power supplies.
                        c. Verify that you observed all the requirements for running
                           system-level diagnostics, that cables are securely connected,
                           and that hardware components are properly installed in the
                           storage system.
                        d. Reconnect the power supplies and power on the storage system.
                        e. Repeat Steps 1 through 6 of Running slow system response diagnostics.
Resulted in the same    Technical support might recommend modifying the default settings
test failures           on some of the tests to help identify the problem.

                        a. Modify the selection state of a specific device or type of
                           device on your storage system by entering the following command:
                           sldiag device modify [-dev devtype] [-name device] [-selection enable|disable|default]
                           -selection enable|disable|default allows you to enable, disable,
                           or accept the default selection of a specified device type or
                           named device.
                        b. Verify that the tests were modified by entering the following command:
                           sldiag option show
                        c. Repeat Steps 4 through 6 of Running slow system response diagnostics.
                        d. After you identify and resolve the problem, reset the tests to
                           their default states by repeating substeps a and b.
                        e. Repeat Steps 1 through 6 of Running slow system response diagnostics.
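
The status messages quoted in the steps above can be recognized programmatically, for example when console output is captured to a file during a diagnostics session. The following Python sketch is illustrative only and is not part of Data ONTAP. The names DiagState and classify_status are hypothetical, and the sketch assumes the three messages appear exactly as quoted in this guide.

```python
from enum import Enum


class DiagState(Enum):
    """Coarse states inferred from captured console text (hypothetical names)."""
    RUNNING = "running"
    COMPLETED = "completed"
    CLEARED = "cleared"
    UNKNOWN = "unknown"


# Message text as quoted in this guide for `sldiag device status`.
RUNNING_MSG = "There are still test(s) being processed."
COMPLETED_MSG = "<SLDIAG:_ALL_TESTS_COMPLETED>"
CLEARED_MSG = "SLDIAG: No log messages are present."


def classify_status(output: str) -> DiagState:
    """Map captured `sldiag device status` output to a DiagState."""
    if RUNNING_MSG in output:
        return DiagState.RUNNING
    if COMPLETED_MSG in output:
        return DiagState.COMPLETED
    if CLEARED_MSG in output:
        return DiagState.CLEARED
    return DiagState.UNKNOWN


if __name__ == "__main__":
    print(classify_status("*> <SLDIAG:_ALL_TESTS_COMPLETED>"))
```

A result of RUNNING means the status command should be entered again later. COMPLETED means the failure report in Step 6 can be requested.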

Example
The following example pulls up the full status of failures that occurred:

*> sldiag device status -long -state failed

TEST START ------------------------------------------


DEVTYPE: nvram_ib
NAME: external loopback test
START DATE: Sat Jan 3 23:10:55 GMT 2009

STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------

TEST START ------------------------------------------


DEVTYPE: fcal
NAME: Fcal Loopback Test
START DATE: Sat Jan 3 23:10:56 GMT 2009

STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2

Started adapter get WWN string test.


Adapter get WWN string OK wwn_str: 5:00a:098300:035309

Started adapter interrupt test


Adapter interrupt test OK

Started adapter reset test.


Adapter reset OK

Started Adapter Get Connection State Test.


Connection State: 5
Loop on FC Adapter 0b is OPEN

Started adapter Retry LIP test


Adapter Retry LIP OK

ERROR: failed to init adaptor port for IOCTL call

ioctl_status.class_type = 0x1

ioctl_status.subclass = 0x3

ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------
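
Output in this format can be split into one record per test when it has been captured as text. The following Python sketch is a minimal illustration, not a NetApp tool. The names DiagRecord and parse_failed_tests are hypothetical, and the rule that treats any line containing "ERROR" or "failed" as an error line is an assumption based only on the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiagRecord:
    """One TEST START ... TEST END block (hypothetical structure)."""
    devtype: str = ""
    name: str = ""
    status: str = ""
    errors: List[str] = field(default_factory=list)


def parse_failed_tests(text: str) -> List[DiagRecord]:
    """Split captured status output into one DiagRecord per test block."""
    records: List[DiagRecord] = []
    current: Optional[DiagRecord] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("TEST START"):
            current = DiagRecord()
        elif line.startswith("TEST END"):
            if current is not None:
                records.append(current)
            current = None
        elif current is None or not line:
            continue  # ignore text outside a block and blank lines
        elif line.startswith("DEVTYPE:"):
            current.devtype = line.split(":", 1)[1].strip()
        elif line.startswith("NAME:"):
            current.name = line.split(":", 1)[1].strip()
        elif line.startswith("STATUS:"):
            current.status = line.split(":", 1)[1].strip()
        elif "ERROR" in line or "failed" in line.lower():
            # Assumed heuristic for spotting error lines in the example output.
            current.errors.append(line)
    return records


SAMPLE = """\
TEST START ------------------------------------------
DEVTYPE: nvram_ib
NAME: external loopback test
STATUS: Completed
ib3a: could not set loopback mode, test failed
LOOP: 1/1
TEST END --------------------------------------------
TEST START ------------------------------------------
DEVTYPE: fcal
NAME: Fcal Loopback Test
STATUS: Completed
ERROR: failed to init adaptor port for IOCTL call
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
LOOP: 1/1
TEST END --------------------------------------------
"""

if __name__ == "__main__":
    for record in parse_failed_tests(SAMPLE):
        print(record.devtype, "|", record.name, "|", len(record.errors), "error line(s)")
```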

After you finish


If the failures persist after repeating the steps, you need to replace the hardware.
Running hardware installation diagnostics


Run diagnostics after you add or replace a hardware component in your storage system to verify
that the component has no problems and that the installation was successful.

Steps

1. As the node boots, interrupt the boot process by pressing Ctrl-c.


2. Complete the applicable step, depending on where the node halted during the boot process.
If the node halted at the...   Then...

Loader prompt                  Continue with the procedure.

Boot menu                      a. Select the Maintenance mode option from the displayed menu.
                               b. Enter the following command at the prompt:
                                  halt
                               c. Continue with the procedure.

3. Enter the following command at the Loader prompt:


boot_diags

Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.

The Maintenance mode prompt (*>) appears.


4. Enter the following command at the Maintenance mode prompt:
sldiag

For details about the sldiag command, see the sldiag man page.
5. Run the default tests on the particular device you added or replaced by entering the following
command:
sldiag device run [-dev devtype] [-name device]

-dev devtype specifies the type of device to be tested.

    mem is system memory.
    nvram is nonvolatile RAM.
    nvmem is a hybrid of NVRAM and system memory.
    ata is an Advanced Technology Attachment device.
    fcal is a Fibre Channel-Arbitrated Loop device not connected to a storage device or
    Fibre Channel network.
    sas is a Serial Attached SCSI device not connected to a disk shelf.
    storage is an ATA, FC-AL, or SAS storage device that has an attached disk shelf.
    toe is a TCP Offload Engine, a type of NIC.
    nic is a Network Interface Card.
    cna is a Converged Network Adapter.
    env is motherboard environmentals.
    serviceproc is the Service Processor.
    fcache is the Performance Acceleration Module 2, also known as the Flash Cache adapter.
    bootmedia is the system booting device.
    interconnect or nvram-ib is the high-availability interface.

-name device specifies a given device class and type.

6. View the status of the test by entering the following command:


sldiag device status

Your storage system provides the following output while the tests are still running:

There are still test(s) being processed.

After all the tests are complete, the following response appears by default:

*> <SLDIAG:_ALL_TESTS_COMPLETED>
7. Verify that no hardware problems resulted from the addition or replacement of hardware
components on your storage system by entering the following command:
sldiag device status [-dev devtype] [-name device] -long -state failed

The example shows that the tests were run without the appropriate hardware.
If the system-level     Then...
diagnostics tests...

Were completed          There are no hardware problems and your storage system
without any failures    returns to the prompt.

                        a. Clear the status logs by entering the following command:
                           sldiag device clearstatus
                        b. Verify that the log is cleared by entering the following command:
                           sldiag device status
                           The following default response is displayed:

                           SLDIAG: No log messages are present.

                        c. Exit Maintenance mode by entering the following command:
                           halt
                        d. Enter the following command at the firmware prompt to reboot
                           the storage system:
                           boot

                        You have completed system-level diagnostics.

Resulted in some        Determine the cause of the problem.
test failures
                        a. Exit Maintenance mode by entering the following command:
                           halt
                        b. Perform a clean shutdown and disconnect the power supplies.
                        c. Verify that you have observed all the considerations identified
                           for running system-level diagnostics, that cables are securely
                           connected, and that hardware components are properly installed
                           in the storage system.
                        d. Reconnect the power supplies and power on the storage system.
                        e. Repeat Steps 1 through 7 of Running hardware installation diagnostics.
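
The command syntax used in Steps 5 and 7 can be assembled from a device type and an optional device name. The following Python sketch only builds the command strings to be typed at the Maintenance mode prompt; it does not communicate with a storage system. The function names are hypothetical, and the set of device types is copied from the list in Step 5.

```python
from typing import List, Optional

# Device types listed in this guide for the -dev option.
DEVICE_TYPES = frozenset({
    "mem", "nvram", "nvmem", "ata", "fcal", "sas", "storage", "toe", "nic",
    "cna", "env", "serviceproc", "fcache", "bootmedia", "interconnect",
    "nvram-ib",
})


def _target_args(devtype: Optional[str], name: Optional[str]) -> List[str]:
    """Build the optional [-dev devtype] [-name device] arguments."""
    args: List[str] = []
    if devtype is not None:
        if devtype not in DEVICE_TYPES:
            raise ValueError(f"unknown device type: {devtype!r}")
        args += ["-dev", devtype]
    if name is not None:
        args += ["-name", name]
    return args


def run_command(devtype: Optional[str] = None, name: Optional[str] = None) -> str:
    """Return the `sldiag device run` command line (hypothetical helper)."""
    return " ".join(["sldiag", "device", "run"] + _target_args(devtype, name))


def failed_status_command(devtype: Optional[str] = None,
                          name: Optional[str] = None) -> str:
    """Return the status command that reports failed tests in long form."""
    return " ".join(["sldiag", "device", "status"]
                    + _target_args(devtype, name)
                    + ["-long", "-state", "failed"])


if __name__ == "__main__":
    print(run_command("fcal"))
    print(failed_status_command("fcal"))
```

Validating the device type before building the command catches a mistyped value before it reaches the console.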

Example
The following example pulls up the full status of failures resulting from testing a newly installed
FC-AL adapter:

*> sldiag device status -dev fcal -long -state failed

TEST START ------------------------------------------


DEVTYPE: fcal
NAME: Fcal Loopback Test
START DATE: Sat Jan 3 23:10:56 GMT 2009

STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.


Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2

Started adapter get WWN string test.


Adapter get WWN string OK wwn_str: 5:00a:098300:035309

Started adapter interrupt test


Adapter interrupt test OK

Started adapter reset test.


Adapter reset OK

Started Adapter Get Connection State Test.


Connection State: 5
Loop on FC Adapter 0b is OPEN

Started adapter Retry LIP test


Adapter Retry LIP OK

ERROR: failed to init adaptor port for IOCTL call

ioctl_status.class_type = 0x1

ioctl_status.subclass = 0x3

ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------
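
When reporting a failed adapter test to technical support, it can help to pull the adapter details out of the captured output. The following Python sketch is illustrative only. The function name adapter_info is hypothetical, and the four field labels it looks for are taken from the example above; other adapters or releases might print different labels.

```python
import re
from typing import Dict

# Field labels as they appear in the example output in this guide.
ADAPTER_FIELD = re.compile(
    r"^Adapter (fc_data_link_rate|name|firmware rev|hardware rev):\s*(.+?)\s*$"
)


def adapter_info(text: str) -> Dict[str, str]:
    """Collect the 'Adapter <field>: <value>' lines from captured output."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        match = ADAPTER_FIELD.match(line.strip())
        if match:
            info[match.group(1)] = match.group(2)
    return info


SAMPLE = """\
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2
"""

if __name__ == "__main__":
    for key, value in adapter_info(SAMPLE).items():
        print(f"{key}: {value}")
```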

After you finish


If the failures persist after repeating the steps, you need to replace the hardware.
Running device failure diagnostics


Running diagnostics can help you determine why access to a specific device becomes intermittent or
why the device becomes unavailable in your storage system.

Steps

1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt

2. Enter the following command at the Loader prompt:


boot_diags

Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.

The Maintenance mode prompt (*>) appears.


3. Enter the following command at the Maintenance mode prompt:
sldiag

For details about the sldiag command, see the sldiag man page.
4. Run diagnostics on the device causing problems by entering the following command:
sldiag device run [-dev devtype] [-name device]

-dev devtype specifies the type of device to be tested.

    mem is system memory.
    nvram is nonvolatile RAM.
    nvmem is a hybrid of NVRAM and system memory.
    ata is an Advanced Technology Attachment device.
    fcal is a Fibre Channel-Arbitrated Loop device not connected to a storage device or
    Fibre Channel network.
    sas is a Serial Attached SCSI device not connected to a disk shelf.
    storage is an ATA, FC-AL, or SAS storage device that has an attached disk shelf.
    toe is a TCP Offload Engine, a type of NIC.
    nic is a Network Interface Card.
    cna is a Converged Network Adapter.
    env is motherboard environmentals.
    serviceproc is the Service Processor.
    fcache is the Performance Acceleration Module 2, also known as the Flash Cache adapter.
    bootmedia is the system booting device.
    interconnect or nvram-ib is the high-availability interface.

-name device specifies a given device class and type.

5. View the status of the test by entering the following command:


sldiag device status

Your storage system provides the following output while the tests are still running:

There are still test(s) being processed.

After all the tests are complete, the following response appears by default:

*> <SLDIAG:_ALL_TESTS_COMPLETED>
6. Identify any hardware problems by entering the following command:
sldiag device status [-dev devtype] [-name device] -long -state failed

The example shows that the tests were run without the appropriate hardware.

If the system-level     Then...
diagnostics tests...

Resulted in some        Determine the cause of the problem.
test failures
                        a. Exit Maintenance mode by entering the following command:
                           halt
                        b. Perform a clean shutdown and disconnect the power supplies.
                        c. Verify that you have observed all the considerations identified
                           for running system-level diagnostics, that cables are securely
                           connected, and that hardware components are properly installed
                           in the storage system.
                        d. Reconnect the power supplies and power on the storage system.
                        e. Repeat Steps 1 through 6 of Running device failure diagnostics.
Resulted in the same    Technical support might recommend modifying the default settings
test failures           on some of the tests to help identify the problem.

                        a. Modify the selection state of a specific device or type of
                           device on your storage system by entering the following command:
                           sldiag device modify [-dev devtype] [-name device] [-selection enable|disable|default]
                           -selection enable|disable|default allows you to enable, disable,
                           or accept the default selection of a specified device type or
                           named device.
                        b. Verify that the tests were modified by entering the following command:
                           sldiag option show
                        c. Repeat Steps 4 through 6 of Running device failure diagnostics.
                        d. After you identify and resolve the problem, reset the tests to
                           their default states by repeating substeps a and b.
                        e. Repeat Steps 1 through 6 of Running device failure diagnostics.

Were completed          There are no hardware problems and your storage system
without any failures    returns to the prompt.

                        a. Clear the status logs by entering the following command:
                           sldiag device clearstatus
                        b. Verify that the log is cleared by entering the following command:
                           sldiag device status
                           The following default response is displayed:

                           SLDIAG: No log messages are present.

                        c. Exit Maintenance mode by entering the following command:
                           halt
                        d. Enter the following command at the firmware prompt to reboot
                           the storage system:
                           boot

                        You have completed system-level diagnostics.
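
The three outcomes in the table above depend on whether a repeated run fails in the same tests as the previous run. The following Python sketch shows one way that comparison could be expressed. It is an illustration only: the function name triage and the returned labels are hypothetical, and each failure is assumed to be identified by its DEVTYPE and NAME values from the status output.

```python
from typing import Set, Tuple

# A failure is identified here by (DEVTYPE, NAME), an assumption for this sketch.
Failure = Tuple[str, str]


def triage(previous: Set[Failure], current: Set[Failure]) -> str:
    """Pick the table row that applies after the latest diagnostics run."""
    if not current:
        # "Were completed without any failures"
        return "clear-status-and-reboot"
    if previous & current:
        # "Resulted in the same test failures"
        return "contact-support-and-modify-tests"
    # "Resulted in some test failures"
    return "check-hardware-and-repeat"


if __name__ == "__main__":
    first_run = {("fcal", "Fcal Loopback Test")}
    second_run = {("fcal", "Fcal Loopback Test")}
    print(triage(set(), first_run))       # first run with failures
    print(triage(first_run, second_run))  # repeat run, same failure
    print(triage(second_run, set()))      # clean run
```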

Example
The following example pulls up the full status of failures resulting from testing the FC-AL
adapter:

*> sldiag device status -dev fcal -long -state failed

TEST START ------------------------------------------


DEVTYPE: fcal
NAME: Fcal Loopback Test
START DATE: Sat Jan 3 23:10:56 GMT 2009

STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2

Started adapter get WWN string test.


Adapter get WWN string OK wwn_str: 5:00a:098300:035309

Started adapter interrupt test


Adapter interrupt test OK

Started adapter reset test.


Adapter reset OK

Started Adapter Get Connection State Test.


Connection State: 5
Loop on FC Adapter 0b is OPEN

Started adapter Retry LIP test


Adapter Retry LIP OK

ERROR: failed to init adaptor port for IOCTL call

ioctl_status.class_type = 0x1

ioctl_status.subclass = 0x3

ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009

LOOP: 1/1
TEST END --------------------------------------------
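
The error count, the reported run time, and the start and end timestamps in a test block can be summarized from captured text. The following Python sketch is a minimal illustration that handles a single test block. The function name summarize is hypothetical, and the timestamp and "Error Count" line formats are assumed to match the example above exactly.

```python
import re
from datetime import datetime
from typing import Dict, Optional

MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Matches lines such as "START DATE: Sat Jan 3 23:10:56 GMT 2009".
DATE_RE = re.compile(
    r"(START|END) DATE:\s+\w{3} (\w{3})\s+(\d{1,2}) (\d{2}):(\d{2}):(\d{2}) GMT (\d{4})"
)
# Matches lines such as "Error Count: 2 Run Time: 70 secs".
COUNT_RE = re.compile(r"Error Count:\s*(\d+)\s+Run Time:\s*(\d+)\s*secs")


def summarize(text: str) -> Dict[str, Optional[int]]:
    """Summarize one TEST START ... TEST END block of captured output."""
    stamps = {}
    for kind, mon, day, hh, mm, ss, year in DATE_RE.findall(text):
        stamps[kind] = datetime(int(year), MONTHS[mon], int(day),
                                int(hh), int(mm), int(ss))
    elapsed = None
    if "START" in stamps and "END" in stamps:
        elapsed = int((stamps["END"] - stamps["START"]).total_seconds())
    counts = COUNT_RE.search(text)
    return {
        "error_count": int(counts.group(1)) if counts else None,
        "reported_run_secs": int(counts.group(2)) if counts else None,
        "elapsed_secs": elapsed,
    }


SAMPLE = """\
START DATE: Sat Jan 3 23:10:56 GMT 2009
STATUS: Completed
Error Count: 2 Run Time: 70 secs
END DATE: Sat Jan 3 23:12:07 GMT 2009
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The elapsed time computed from the two timestamps can differ slightly from the reported run time, because the timestamps cover the whole test block.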

After you finish


If the failures persist after repeating the steps, you need to replace the hardware.
