System Level Diagnostic Guide
NetApp, Inc.
495 East Java Drive
Sunnyvale, CA 94089 U.S.A.
Telephone: +1 (408) 822-6000
Fax: +1 (408) 822-4501
Support telephone: +1 (888) 4-NETAPP
Documentation comments: [email protected]
Information Web: www.netapp.com
Contents
Copyright information
Trademark information
About this guide
    Audience
    Terminology
    Where to enter commands
    Keyboard and formatting conventions
    Special messages
    How to send your comments
Introduction to system-level diagnostics
    Requirements for running system-level diagnostics
    Accessing Data ONTAP man pages
    How to use online command-line help
Running system installation diagnostics
Running system panic diagnostics
Running slow system response diagnostics
Running hardware installation diagnostics
Running device failure diagnostics
Copyright information
Copyright 1994-2011 NetApp, Inc. All rights reserved. Printed in the U.S.A.
No part of this document covered by copyright may be reproduced in any form or by any means
(graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an
electronic retrieval system) without prior written permission of the copyright owner.
Software derived from copyrighted NetApp material is subject to the following license and
disclaimer:
THIS SOFTWARE IS PROVIDED BY NETAPP "AS IS" AND WITHOUT ANY EXPRESS OR
IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE,
WHICH ARE HEREBY DISCLAIMED. IN NO EVENT SHALL NETAPP BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE
GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER
IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
NetApp reserves the right to change any products described herein at any time, and without notice.
NetApp assumes no responsibility or liability arising from the use of products described herein,
except as expressly agreed to in writing by NetApp. The use or purchase of this product does not
convey a license under any patent rights, trademark rights, or any other intellectual property rights of
NetApp.
The product described in this manual may be protected by one or more U.S.A. patents, foreign
patents, or pending applications.
RESTRICTED RIGHTS LEGEND: Use, duplication, or disclosure by the government is subject to
restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer
Software clause at DFARS 252.277-7103 (October 1988) and FAR 52-227-19 (June 1987).
Trademark information
NetApp, the NetApp logo, Network Appliance, the Network Appliance logo, ApplianceWatch,
ASUP, AutoSupport, Bycast, Campaign Express, ComplianceClock, Cryptainer, CryptoShred, Data
ONTAP, DataFabric, DataFort, Decru, Decru DataFort, FAServer, FilerView, FlexCache, FlexClone,
FlexScale, FlexShare, FlexSuite, FlexVol, FPolicy, GetSuccessful, gFiler, Go further, faster, Imagine
Virtually Anything, Lifetime Key Management, LockVault, Manage ONTAP, MetroCluster,
MultiStore, NearStore, NetCache, NOW (NetApp on the Web), ONTAPI, OpenKey, RAID-DP,
ReplicatorX, SANscreen, SecureAdmin, SecureShare, Select, Shadow Tape, Simulate ONTAP,
SnapCopy, SnapDirector, SnapDrive, SnapFilter, SnapLock, SnapManager, SnapMigrator,
SnapMirror, SnapMover, SnapRestore, Snapshot, SnapSuite, SnapValidator, SnapVault,
StorageGRID, StoreVault, the StoreVault logo, SyncMirror, Tech OnTap, The evolution of storage,
Topio, vFiler, VFM, Virtual File Manager, VPolicy, WAFL, and Web Filer are trademarks or
registered trademarks of NetApp, Inc. in the United States, other countries, or both.
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corporation in the United States, other countries, or both. A complete and current list of
other IBM trademarks is available on the Web at www.ibm.com/legal/copytrade.shtml.
Apple is a registered trademark and QuickTime is a trademark of Apple, Inc. in the U.S.A. and/or
other countries. Microsoft is a registered trademark and Windows Media is a trademark of Microsoft
Corporation in the U.S.A. and/or other countries. RealAudio, RealNetworks, RealPlayer,
RealSystem, RealText, and RealVideo are registered trademarks and RealMedia, RealProxy, and
SureStream are trademarks of RealNetworks, Inc. in the U.S.A. and/or other countries.
All other brands or products are trademarks or registered trademarks of their respective holders and
should be treated as such.
NetApp, Inc. is a licensee of the CompactFlash and CF Logo trademarks.
NetApp, Inc. NetCache is certified RealSystem compatible.
About this guide
Next topics
Audience
Terminology
Where to enter commands
Keyboard and formatting conventions
Special messages
How to send your comments
Audience
This document is written with certain assumptions about your technical knowledge and experience.
This guide is for qualified system administrators and service personnel who are familiar with NetApp
storage systems.
Terminology
To understand the concepts in this document, you might need to know how certain terms are used.
Storage terms
storage controller
    The component of a storage system that runs the Data ONTAP operating system
    and controls its disk subsystem. Storage controllers are also sometimes called
HA (high availability)
    In Data ONTAP 8.x, the recovery capability provided by a pair of nodes
    (storage systems), called an HA pair, that are configured to serve data for each
    other if one of the two nodes stops functioning.
    In the Data ONTAP 7.3 and 7.2 release families, this functionality is referred to
    as an active/active configuration.
HA pair
    In Data ONTAP 8.x, a pair of nodes (storage systems) configured to serve data
    for each other if one of the two nodes stops functioning.
    In the Data ONTAP 7.3 and 7.2 release families, this functionality is referred to
    as an active/active configuration.
the nodeshell prompt on systems running Data ONTAP 8.x Cluster-Mode. See the Data ONTAP
Cluster-Mode Administration Reference for more information.
You can enter commands either at the system console or from any client computer that can obtain
access to the storage system using a Telnet or Secure Shell (SSH) session.
In examples that illustrate command execution, the command syntax and output shown might
differ from what you enter or see displayed, depending on your version of the operating system.
Keyboard conventions
Enter, enter
    Enter: Used to refer to the key that generates a carriage return; the key is named
    Return on some keyboards.
    enter: Used to mean pressing one or more keys on the keyboard and then pressing the
    Enter key, or clicking in a field in a graphical interface and then typing
    information into the field.
hyphen (-)
    Used to separate individual keys. For example, Ctrl-D means holding down the
    Ctrl key while pressing the D key.
Formatting conventions
Monospaced font
    Command names, option names, keywords, and daemon names.
    Information displayed on the system console or other computer monitors.
    Contents of files.
    File, path, and directory names.
Bold monospaced font
    Words or characters you type. What you type is always shown in lowercase
    letters, unless your program is case-sensitive and uppercase letters are
    necessary for it to work properly.
Special messages
This document might contain the following types of messages to alert you to conditions that you
need to be aware of.
Note: A note contains important information that helps you install or operate the system
efficiently.
Attention: An attention notice contains instructions that you must follow to avoid a system crash,
loss of data, or damage to the equipment.
Next topics
Requirements for running system-level diagnostics
Accessing Data ONTAP man pages
How to use online command-line help
Commands 1
Special files 4
Step
You can also type the question mark at the command line for a list of all the commands that are
available at the current level of administration (administrative or advanced).
The following example shows the result of entering the environment help command at the
storage system command line. The command output displays the syntax help for the environment
commands.
Running system installation diagnostics
Steps
1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt
Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.
For details about the sldiag command, see the sldiag man page.
4. View the version of system-level diagnostics present on your storage system by entering the
following command:
sldiag version show
storage is an ATA, FC-AL, or SAS storage device that has an attached disk shelf.
toe is a TCP Offload Engine, a type of NIC.
nic is a Network Interface Card.
cna is a Converged Network Adapter.
env is motherboard environmentals.
serviceproc is the Service Processor.
fcache is the Performance Acceleration Module 2, also known as the Flash Cache adapter.
bootmedia is the system booting device.
interconnect or nvram-ib is the high-availability interface.
6. Run all the default selected diagnostic tests on your storage system by entering the following
command:
sldiag device run
Your storage system provides the following output while the tests are still running:
After all the tests are complete, the following response appears by default:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
8. Verify that there are no hardware problems on your new storage system by entering the following
command:
sldiag device status -long -state failed
The example shows that the tests were run without the appropriate hardware.
Example
The following example pulls up the full status of failures that occurred:
STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2
ioctl_status.class_type = 0x1
ioctl_status.subclass = 0x3
ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
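When the long status output is saved to a file, the failure messages can be pulled out of each test record automatically. The sketch below is a hedged illustration, not a NetApp tool: it assumes that every record ends with a TEST END line and that failure messages either start with ">>>>> ERROR" or end with "test failed", which matches the example above but might not cover every test.

```python
from typing import List


def split_records(text: str) -> List[List[str]]:
    """Split saved status output into records, one per TEST END line."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith("TEST END"):
            if current:
                records.append(current)
            current = []
        elif line.strip():
            current.append(line.strip())
    if current:  # keep a trailing record that has no TEST END line
        records.append(current)
    return records


def failure_lines(record: List[str]) -> List[str]:
    """Return lines that look like failures (assumed patterns, see lead-in)."""
    return [
        line for line in record
        if line.startswith(">>>>> ERROR") or line.endswith("test failed")
    ]


# Abbreviated copy of the example output shown in this guide.
SAMPLE = """STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
STATUS: Completed
Starting test on Fcal Adapter: 0b
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
"""

for record in split_records(SAMPLE):
    for message in failure_lines(record):
        print(message)
```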
Running system panic diagnostics
Steps
1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt
Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.
For details about the sldiag command, see the sldiag man page.
4. Run diagnostics on all the devices by entering the following command:
sldiag device run
Your storage system provides the following output while the tests are still running:
After all the tests are complete, you receive the following default response:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
6. Identify the cause of the system panic by entering the following command:
sldiag device status -long -state failed
The example shows that the tests were run without the appropriate hardware.
Example
The following example pulls up the full status of failures that occurred:
STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2
ioctl_status.class_type = 0x1
ioctl_status.subclass = 0x3
ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
Running slow system response diagnostics
Steps
1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt
Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.
For details about the sldiag command, see the sldiag man page.
4. Run diagnostics on all the devices by entering the following command:
sldiag device run
Your storage system provides the following output while the tests are still running:
After all the tests are complete, the following response appears by default:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
6. Identify the cause of the system sluggishness by entering the following command:
sldiag device status -long -state failed
The example shows that the tests were run without the appropriate hardware.
Example
The following example pulls up the full status of failures that occurred:
STATUS: Completed
ib3a: could not set loopback mode, test failed
END DATE: Sat Jan 3 23:11:04 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
ioctl_status.class_type = 0x1
ioctl_status.subclass = 0x3
ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
Running hardware installation diagnostics
Steps
Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.
For details about the sldiag command, see the sldiag man page.
5. Run the default tests on the particular device you added or replaced by entering the following
command:
sldiag device run [-dev devtype] [-name device]
fcal is a Fibre Channel-Arbitrated Loop device not connected to a storage device or Fibre
Channel network.
sas is a Serial Attached SCSI device not connected to a disk shelf.
storage is an ATA, FC-AL, or SAS storage device that has an attached disk shelf.
toe is a TCP Offload Engine, a type of NIC.
nic is a Network Interface Card.
cna is a Converged Network Adapter.
env is motherboard environmentals.
serviceproc is the Service Processor.
fcache is the Performance Acceleration Module 2, also known as the Flash Cache
adapter.
bootmedia is the system booting device.
interconnect or nvram-ib is the high-availability interface.
-name device specifies a given device class and type.
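The optional arguments in the syntax above can be assembled with a small helper when commands are prepared in advance, for example for a runbook. This Python sketch only builds the command string; it does not run anything on a storage system. The helper name is an assumption, the device-type table contains only the types listed in this section (a given release might support others), and the device name 0b is borrowed from the example output in this guide purely for illustration.

```python
from typing import Optional

# Device types and descriptions as listed in this section of the guide.
DEVICE_TYPES = {
    "fcal": "Fibre Channel-Arbitrated Loop device not connected to storage",
    "sas": "Serial Attached SCSI device not connected to a disk shelf",
    "storage": "ATA, FC-AL, or SAS storage device with an attached disk shelf",
    "toe": "TCP Offload Engine, a type of NIC",
    "nic": "Network Interface Card",
    "cna": "Converged Network Adapter",
    "env": "motherboard environmentals",
    "serviceproc": "Service Processor",
    "fcache": "Performance Acceleration Module 2 (Flash Cache adapter)",
    "bootmedia": "system booting device",
    "interconnect": "high-availability interface",
    "nvram-ib": "high-availability interface",
}


def build_run_command(devtype: Optional[str] = None,
                      name: Optional[str] = None) -> str:
    """Build 'sldiag device run [-dev devtype] [-name device]' as a string."""
    parts = ["sldiag", "device", "run"]
    if devtype is not None:
        # Assumption: reject types that this section does not list.
        if devtype not in DEVICE_TYPES:
            raise ValueError(f"device type not listed in this guide: {devtype}")
        parts += ["-dev", devtype]
    if name is not None:
        parts += ["-name", name]
    return " ".join(parts)


print(build_run_command())              # sldiag device run
print(build_run_command("fcal", "0b"))  # sldiag device run -dev fcal -name 0b
```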
Your storage system provides the following output while the tests are still running:
After all the tests are complete, the following response appears by default:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
7. Verify that no hardware problems resulted from the addition or replacement of hardware
components on your storage system by entering the following command:
sldiag device status [-dev devtype] [-name device] -long -state failed
The example shows that the tests were run without the appropriate hardware.
Example
The following example pulls up the full status of failures resulting from testing a newly installed
FC-AL adapter:
STATUS: Completed
Starting test on Fcal Adapter: 0b
ioctl_status.class_type = 0x1
ioctl_status.subclass = 0x3
ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
Running device failure diagnostics
Steps
1. At the storage system prompt, enter the following command to get to the Loader prompt:
halt
Note: You must run this command from the Loader prompt for system-level diagnostics to
function properly. The boot_diags command starts special drivers designed specifically for
system-level diagnostics.
For details about the sldiag command, see the sldiag man page.
4. Run diagnostics on the device causing problems by entering the following command:
sldiag device run [-dev devtype] [-name device]
fcache is the Performance Acceleration Module 2, also known as the Flash Cache
adapter.
bootmedia is the system booting device.
interconnect or nvram-ib is the high-availability interface.
-name device specifies a given device class and type.
Your storage system provides the following output while the tests are still running:
After all the tests are complete, the following response appears by default:
*> <SLDIAG:_ALL_TESTS_COMPLETED>
6. Identify any hardware problems by entering the following command:
sldiag device status [-dev devtype] [-name device] -long -state failed
The example shows that the tests were run without the appropriate hardware.
Were completed without any failures
    There are no hardware problems and your storage system returns to the prompt.
    a. Clear the status logs by entering the following command:
       sldiag device clearstatus
    b. Verify that the log is cleared by entering the following command:
       sldiag device status
       The following default response is displayed:
Example
The following example pulls up the full status of failures resulting from testing the FC-AL
adapter:
DEVTYPE: fcal
NAME: Fcal Loopback Test
START DATE: Sat Jan 3 23:10:56 GMT 2009
STATUS: Completed
Starting test on Fcal Adapter: 0b
Started gathering adapter info.
Adapter get adapter info OK
Adapter fc_data_link_rate: 1Gib
Adapter name: QLogic 2532
Adapter firmware rev: 4.5.2
Adapter hardware rev: 2
ioctl_status.class_type = 0x1
ioctl_status.subclass = 0x3
ioctl_status.info = 0x0
Started INTERNAL LOOPBACK:
INTERNAL LOOPBACK OK
Error Count: 2 Run Time: 70 secs
>>>>> ERROR, please ensure the port has a shelf or plug.
END DATE: Sat Jan 3 23:12:07 GMT 2009
LOOP: 1/1
TEST END --------------------------------------------
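The header fields and counters in a record such as the one above can be turned into structured data for reporting. The following Python sketch is an assumption-laden illustration rather than a supported parser: the field names, the regular expressions, and the date format string are inferred from this single example, and other tests or releases might format their output differently.

```python
import re
from datetime import datetime
from typing import Dict, List, Union

# Date layout inferred from the example, e.g. "Sat Jan 3 23:10:56 GMT 2009".
DATE_FORMAT = "%a %b %d %H:%M:%S GMT %Y"
HEADER = re.compile(r"^(DEVTYPE|NAME|START DATE|END DATE|STATUS|LOOP):\s*(.*)$")
COUNTS = re.compile(r"Error Count:\s*(\d+)\s+Run Time:\s*(\d+)\s*secs")


def parse_record(lines: List[str]) -> Dict[str, Union[str, int]]:
    """Collect header fields and the error/run-time counters of one record."""
    fields: Dict[str, Union[str, int]] = {}
    for raw in lines:
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            fields[header.group(1)] = header.group(2)
        counts = COUNTS.search(line)
        if counts:
            fields["error_count"] = int(counts.group(1))
            fields["run_time_secs"] = int(counts.group(2))
    return fields


def elapsed_seconds(fields: Dict[str, Union[str, int]]) -> int:
    """Seconds between START DATE and END DATE of a parsed record."""
    start = datetime.strptime(str(fields["START DATE"]), DATE_FORMAT)
    end = datetime.strptime(str(fields["END DATE"]), DATE_FORMAT)
    return int((end - start).total_seconds())


# Abbreviated copy of the example record shown in this guide.
RECORD = [
    "DEVTYPE: fcal",
    "NAME: Fcal Loopback Test",
    "START DATE: Sat Jan 3 23:10:56 GMT 2009",
    "STATUS: Completed",
    "Adapter name: QLogic 2532",
    "Error Count: 2 Run Time: 70 secs",
    "END DATE: Sat Jan 3 23:12:07 GMT 2009",
    "LOOP: 1/1",
]

parsed = parse_record(RECORD)
print(parsed["DEVTYPE"], parsed["error_count"], elapsed_seconds(parsed))
```

Note that the wall-clock difference between START DATE and END DATE can differ slightly from the Run Time value that the test itself reports.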