Aix Hacmp Cookbook
SG24-4553-00
Take Note! Before using this information and the product it supports, be sure to read the general information under Special Notices on page xiii.
Abstract
This document deals with HACMP/6000 Version 3.1.1. Its goal is to serve as a reminder, checklist and operating guide for the steps required in order to install and customize HACMP/6000. It describes a set of tools developed by the HACMP services team in IBM France, which make it easier to design, customize and document an HACMP cluster. Included in the book are the following:
- How to install the HACMP product
- Description of the tools developed by the HACMP services team in IBM France
- Steps to be carried out during an installation, including customization
- Testing suggestions

Following the instructions in the checklist will help you achieve a smooth and error-free installation. A basic understanding of HACMP is assumed, so introductory material is not included in the book. (215 pages)
An HACMP Cookbook
Contents

Abstract
Special Notices
Preface
  How This Document is Organized
  Related Publications
  International Technical Support Organization Publications
  ITSO Redbooks on the World Wide Web (WWW)
  Acknowledgments
Chapter 1. Overview of the Tools
  1.1 Installation Tips
Chapter 2. Inventory Tool
  2.1 Inventory - Communication adapters
  2.2 Inventory - Disks
  2.3 Output from the Inventory Tool
  2.4 Output Files
  2.5 Sample Configuration
  2.6 Example of Anomalies Report
  2.7 When to Run the Inventory Tool
Chapter 3. Setting up a Cluster
  3.1 Cluster Description
  3.2 Planning Considerations
    3.2.1 Network Considerations
    3.2.2 Disk Adapter Considerations
    3.2.3 Shared Volume Group Considerations
    3.2.4 Planning Worksheets
Chapter 4. Pre-Installation Activities
  4.1 Installing the Tools
  4.2 TCP/IP Configuration
    4.2.1 Adapter and Hostname Configuration
    4.2.2 Configuration of /etc/hosts File
    4.2.3 Configuration of /.rhosts File
    4.2.4 Configuration of /etc/rc.net File
    4.2.5 Testing
  4.3 Non-TCP/IP Network Configuration
    4.3.1 RS232 Link Configuration
    4.3.2 SCSI Target Mode Configuration
  4.4 Connecting Shared Disks
  4.5 Defining Shared Volume Groups
    4.5.1 Create Shared Volume Groups on First Node
    4.5.2 Import Shared Volume Groups to Second Node
Chapter 5. Installing the HACMP/6000 Software
  5.1 On Cluster Nodes
  5.2 On Cluster Clients
  5.3 Installing HACMP Updates
  5.4 Loading the Concurrent Logical Volume Manager
  5.5 Customizing the /usr/sbin/cluster/etc/clhosts File
  5.6 Customizing the /usr/sbin/cluster/etc/clinfo.rc File
Chapter 6. Cluster Environment Definition
  6.1 Defining the Cluster ID and Name
  6.2 Defining Nodes
  6.3 Defining Network Adapters
    6.3.1 Defining mickey's Network Adapters
    6.3.2 Defining goofy's Network Adapters
  6.4 Synchronizing the Cluster Definition on All Nodes
Chapter 7. Node Environment Definition
  7.1 Defining Application Servers
  7.2 Creating Resource Groups
  7.3 Verify Cluster Environment
Chapter 8. Starting and Stopping Cluster Services
  8.1 Starting Cluster Services
  8.2 Stopping Cluster Services
  8.3 Testing the Cluster
Chapter 9. Error Notification Tool
  9.1 Description
  9.2 Error Notification Example
    9.2.1 Checking the ODM
  9.3 Testing the Error Scripts
  9.4 Deleting Error Notification Routines
Chapter 10. Event Customization Tool
  10.1 Description
  10.2 Primary Events
  10.3 Secondary or Sub Events
  10.4 How the Event Customization Tool Works
  10.5 Event Customization Tool Example
    10.5.1 Looking at the ODM
    10.5.2 Customizing the Scripts
  10.6 Synchronizing the Node Environment
    10.6.1 Logging the Events
  10.7 Testing the Event Customizations
Chapter 11. Cluster Documentation
  11.1 Generating your Cluster Documentation
  11.2 Printing the Report on a UNIX System
  11.3 Printing the Report on a VM System
Appendix A. Qualified Hardware for HACMP
  A.1 The HAMATRIX Document
Appendix B. RS232 Serial Connection Cable
  B.1 IBM Standard Cable
  B.2 Putting together Available Cables and Connectors
  B.3 Making your Own Cable
Appendix C. List of AIX Errors
Appendix D. Disk Setup in an HACMP Cluster
  D.1 SCSI Disks and Subsystems
    D.1.1 SCSI Adapters
    D.1.2 Individual Disks and Enclosures
    D.1.3 Hooking It All Up
    D.1.4 AIX's View of Shared SCSI Disks
  D.2 RAID Subsystems
    D.2.1 SCSI Adapters
    D.2.2 RAID Enclosures
    D.2.3 Connecting RAID Subsystems
    D.2.4 AIX's View of Shared RAID Devices
  D.3 Serial Disk Subsystems
    D.3.1 High-Performance Disk Drive Subsystem Adapter
    D.3.2 9333 Disk Subsystems
    D.3.3 Connecting Serial Disk Subsystems in an HACMP Cluster
    D.3.4 AIX's View of Shared Serial Disk Subsystems
  D.4 Serial Storage Architecture (SSA) Subsystems
    D.4.1 SSA Software Requirements
    D.4.2 SSA Four Port Adapter
    D.4.3 IBM 7133 SSA Disk Subsystem
    D.4.4 SSA Cables
    D.4.5 Connecting 7133 SSA Subsystems in an HACMP Cluster
    D.4.6 AIX's View of Shared SSA Disk Subsystems
Appendix E. Example Cluster Planning Worksheets
  E.1 Preface of the Report
  E.2 SYSTEM CONFIGURATION
    E.2.1 Cluster Diagram
    E.2.2 Hostname
    E.2.3 Defined Volume Groups
    E.2.4 Active Volume Groups
    E.2.5 Adapters and Disks
    E.2.6 Physical Volumes
    E.2.7 Logical Volumes by Volume Group
    E.2.8 Logical Volume Definitions
    E.2.9 Filesystems
    E.2.10 Paging Spaces
    E.2.11 TCP/IP Parameters
    E.2.12 NFS: Exported Filesystems
    E.2.13 NFS: Mounted Filesystems
    E.2.14 NFS: Other Parameters
    E.2.15 Daemons and Processes
    E.2.16 Subsystems : Status
    E.2.17 BOS and LPP Installation/Update History
    E.2.18 TTY: Definitions
    E.2.19 ODM: Customized Attributes
  E.3 HACMP CONFIGURATION
    E.3.1 Cluster (Command: cllsclstr)
    E.3.2 Nodes (Command: cllsnode)
    E.3.3 Networks (Command: cllsnw)
    E.3.4 Adapters (Command: cllsif)
    E.3.5 Topology (Command: cllscf)
    E.3.6 Resources (Command: clshowres -n All)
    E.3.7 Daemons (Command: clshowsrv -a)
  E.4 HACMP EVENTS and AIX ERROR NOTIFICATION
    E.4.2 Script: /usr/HACMP_ANSS/script/CMD_node_down_remote
    E.4.3 Script: /usr/HACMP_ANSS/script/CMD_node_up_remote
    E.4.4 Script: /usr/HACMP_ANSS/script/POS_node_down_remote
    E.4.5 Script: /usr/HACMP_ANSS/script/PRE_node_down_remote
    E.4.6 Script: /usr/HACMP_ANSS/script/PRE_node_up_remote
    E.4.7 Script: /usr/HACMP_ANSS/script/error_NOTIFICATION
    E.4.8 Script: /usr/HACMP_ANSS/script/error_SDA
    E.4.9 Script: /usr/HACMP_ANSS/script/event_NOTIFICATION
    E.4.10 Script : /usr/HACMP_ANSS/tools/tool_var
  E.5 SYSTEM FILES
    E.5.1 File: /etc/rc
    E.5.2 File: /etc/rc.net
    E.5.3 File: /etc/hosts
    E.5.4 File: /etc/filesystems
    E.5.5 File: /etc/inetd.conf
    E.5.6 File: /etc/syslog.conf
    E.5.7 File: /etc/inittab
  E.6 CONTENTS OF THE HACMP OBJECTS IN THE ODM
    E.6.1 odmget of /etc/objrepos/HACMPadapter
    E.6.2 odmget of /etc/objrepos/HACMPcluster
    E.6.3 odmget of /etc/objrepos/HACMPcommand
    E.6.4 odmget of /etc/objrepos/HACMPevent
    E.6.5 odmget of /etc/objrepos/HACMPfence
    E.6.6 odmget of /etc/objrepos/HACMPgroup
    E.6.7 odmget of /etc/objrepos/HACMPnetwork
    E.6.8 odmget of /etc/objrepos/HACMPnim
    E.6.9 odmget of /etc/objrepos/HACMPnim.120195
    E.6.10 odmget of /etc/objrepos/HACMPnim_pre_U438726
    E.6.11 odmget of /etc/objrepos/HACMPnode
    E.6.12 odmget of /etc/objrepos/HACMPresource
    E.6.13 odmget of /etc/objrepos/HACMPserver
    E.6.14 odmget of /etc/objrepos/HACMPsp2
    E.6.15 odmget of /etc/objrepos/errnotify
List of Abbreviations
Index
Figures

1. Example of an inventory on a NODE
2. Example of a /tmp/HACMPmachine-anomalies file
3. Cluster disney
4. Defining Shared LVM Components for Non-Concurrent Access
5. Termination Resistor Blocks on the SCSI-2 Differential Controller
6. Termination Resistor Blocks on the SCSI-2 Differential Fast/Wide Adapter/A and Enhanced SCSI-2 Differential Fast/Wide Adapter/A
7. 7204-215 External Disk Drives Connected on an 8-Bit Shared SCSI Bus
8. 7204-315 External Disk Drives Connected on a 16-Bit Shared SCSI Bus
9. 9334-011 SCSI Expansion Units Connected on an 8-Bit Shared SCSI Bus
10. 9334-501 SCSI Expansion Units Connected on an 8-Bit Shared SCSI Bus
11. 7134-010 High Density SCSI Disk Subsystem Connected on Two 16-Bit Shared SCSI Buses
12. 7135-110 RAIDiant Arrays Connected on Two Shared 8-Bit SCSI Buses
13. 7135-110 RAIDiant Arrays Connected on Two Shared 16-Bit SCSI Buses
14. 7137 Disk Array Subsystems Connected on an 8-Bit SCSI Bus
15. 7137 Disk Array Subsystems Connected on a 16-Bit SCSI Bus
16. 9333-501 Connected to Eight Nodes in an HACMP Cluster (Rear View)
17. SSA Four Port Adapter
18. IBM 7133 SSA Disk Subsystem
19. High Availability SSA Cabling Scenario 1
20. High Availability SSA Cabling Scenario 2
21. Worksheet 1 - Cluster
22. Worksheet 2 - Network Adapters
23. Worksheet 3 - 9333 Serial Disk Subsystem Configuration
24. Worksheet 4 - Shared Volume Group test1vg
25. Worksheet 5 - Shared Volume Group test2vg
26. Worksheet 6 - Shared Volume Group conc1vg
Tables

1. Wiring scheme for the RS232 connection between nodes
2. Serial Storage Architecture (SSA) Cables
Special Notices
This publication is intended to help customers and IBM services personnel to more easily plan, install, set up, and document their HACMP clusters. The information in this publication is not intended as the specification of any programming interfaces that are provided by HACMP/6000 Version 3.1.1. See the PUBLICATIONS section of the IBM Programming Announcement for HACMP Version 3.1.1 for more information about what publications are considered to be product documentation.

References in this publication to IBM products, programs or services do not imply that IBM intends to make these available in all countries in which IBM operates. Any reference to an IBM product, program, or service is not intended to state or imply that only IBM's product, program, or service may be used. Any functionally equivalent program that does not infringe any of IBM's intellectual property rights may be used instead of the IBM product, program or service.

Information in this book was developed in conjunction with use of the equipment specified, and is limited in application to those specific hardware and software products and levels.

IBM may have patents or pending patent applications covering subject matter in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to the IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA.

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers attempting to adapt these techniques to their own environments do so at their own risk.

Reference to PTF numbers that have not been released through the normal distribution process does not imply general availability. The purpose of including these reference numbers is to alert IBM customers to specific information relative to the implementation of the PTF when it becomes available to each customer according to the normal IBM PTF distribution process.

The following terms are trademarks of the International Business Machines Corporation in the United States and/or other countries:
AIX
HACMP/6000
IBM
OS/2
POWERserver
POWERstation
RISC System/6000
RS/6000
SP
The following terms are trademarks of other companies:

C-bus is a trademark of Corollary, Inc.

PC Direct is a trademark of Ziff Communications Company and is used by IBM Corporation under license.

UNIX is a registered trademark in the United States and other countries licensed exclusively through X/Open Company Limited.

Windows is a trademark of Microsoft Corporation.

NFS          SUN Microsystems, Inc.
PostScript   Adobe Systems, Inc.
Preface
This publication is intended to help customers and IBM services personnel to more easily plan, install, set up, and document their HACMP clusters. It contains a description of a set of tools developed by the professional services team of IBM France for this purpose. This document is intended for anyone who needs to implement an HACMP cluster.
How This Document is Organized

Chapter 1, Overview of the Tools
This chapter briefly describes each of the configuration and documentation tools included with the book.

Chapter 2, Inventory Tool
This chapter describes, with sample output, a tool that takes an initial inventory of a system that will be a cluster node and reports any potential problems.

Chapter 3, Setting up a Cluster
This chapter begins the description of setting up our example cluster. It introduces and describes the example cluster we will set up and use throughout the book, and covers the major planning considerations to be made before starting a cluster setup.

Chapter 4, Pre-Installation Activities
This chapter covers the AIX configuration tasks that need to be done before the installation of HACMP. These include TCP/IP network adapter definitions, tty and SCSI target mode definitions, connecting shared disks, and defining shared volume groups.

Chapter 5, Installing the HACMP/6000 Software
This chapter describes how to install the HACMP/6000 software and its updates. It also covers the necessary customizations to the clhosts and clinfo.rc files.

Chapter 6, Cluster Environment Definition
This chapter gives the definition of the cluster, its nodes, and the network adapters for HACMP. The example cluster is used for the definitions.

Chapter 7, Node Environment Definition
This chapter describes how to define application servers, resource groups, and resources belonging to those resource groups.

Chapter 8, Starting and Stopping Cluster Services
The options involved in starting and stopping the HACMP software on a machine are described here.

Chapter 9, Error Notification Tool
Once the basic cluster has been set up and tested, error notification can be used to take special action when specified errors occur in the AIX error log. The tools included with this book include one that makes the setup and testing of these error notification methods quite easy.

Chapter 10, Event Customization Tool
This chapter describes a tool provided with the book that makes the customization of cluster events easier. It provides an example of using the tool.

Chapter 11, Cluster Documentation
The documentation tool provided with this book generates extensive documentation of a cluster node and cluster definitions. This documentation report can be used to allow a new administrator to understand the original setup of the cluster. This chapter describes how to run the documentation tool and generate a report.

Appendix A, Qualified Hardware for HACMP
This appendix includes the HAMATRIX document, which lists the tested and supported hardware for HACMP, as of the date of publication. This document is continually updated as new devices are introduced.

Appendix B, RS232 Serial Connection Cable
This appendix describes the options for buying or building the RS232 connection cable that is used to connect nodes with a non-TCP/IP network.

Appendix C, List of AIX Errors
This appendix provides a list of AIX errors that can be put into the AIX error log. It can be used as a reference in using the error notification tool.

Appendix D, Disk Setup in an HACMP Cluster
This appendix gives detailed descriptions of the cable requirements and other activities involved in connecting any of the supported shared disks for HACMP.

Appendix E, Example Cluster Planning Worksheets
This appendix includes completed cluster planning worksheets for the example cluster whose setup we describe in the document.

Part 1, Cluster Documentation Tool Report
This appendix includes a cluster documentation report, generated by the documentation tool included with this redbook.
Related Publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this document.
- HACMP/6000 Concepts and Facilities, SC23-2699
- HACMP/6000 Planning Guide, SC23-2700
- HACMP/6000 Installation Guide, SC23-2701
- HACMP/6000 Administration Guide, SC23-2702
- HACMP/6000 Troubleshooting Guide, SC23-2703
- HACMP/6000 Programming Locking Applications, SC23-2704
- HACMP/6000 Programming Client Applications, SC23-2705
- HACMP/6000 Master Index and Glossary, SC23-2707
- HACMP/6000 Licensed Program Specification, GC23-2698
- Common Diagnostics and Service Guide, SA23-2687
- RISC System/6000 System Overview and Planning, GC23-2406

International Technical Support Organization Publications

- HACMP/6000 Customization Examples, SG24-4498
- High Availability on the RISC System/6000 Family, SG24-4551
- A Practical Guide to the IBM 7135 RAID Array, SG24-2565
ITSO Redbooks on the World Wide Web (WWW)

A complete list of International Technical Support Organization publications, known as redbooks, with a brief description of each, may be found at:

http://www.redbooks.ibm.com/redbooks

IBM employees may access LIST3820s of redbooks as well. The internal Redbooks home page may be found at the following URL:

http://w3.itsc.pok.ibm.com/redbooks/redbooks.html
Acknowledgments
This project was designed and managed by:

David Thiessen
International Technical Support Organization, Austin Center

The authors of this document are:

Nadim Tabassum
IBM France

David Thiessen
International Technical Support Organization, Austin Center

The document is based on a version in the French language used in IBM France. The authors of the original document are:

C. Castagnier
IBM France

J. Redon
IBM France

Nadim Tabassum
IBM France

This publication is the result of a residency conducted at the International Technical Support Organization, Austin Center.

Thanks to the following people for the invaluable advice and guidance provided in the production of this document:

Marcus Brewer
International Technical Support Organization, Austin Center
# tar xvf /dev/rfd0

The tools are installed in the /usr/HACMP_ANSS directory. All the tools are written to use this directory. Changing this takes considerable effort, and your scripts would then no longer be in the same place at every site where you use the tools.

The main subdirectories are:

tools
    This directory contains the tools provided to help you customize your environment. There is a subdirectory for each tool under this directory. Certain files which are common to all of the tools are also stored here.

    DOC_TOOL - there are two tools here. The first, inventory, is used to obtain the state of the system before installing HACMP. It also gives you a list of any problems you may encounter due to different machines having similar logical volume names, SCSI IDs, or other characteristics. The second tool, doc_dossier, produces a detailed description of your cluster configuration and should be run after installing HACMP. You can print the report in ASCII, VM, or PostScript format.

    ERROR_TOOL - this tool allows you to customize the handling of system errors.

    EVENT_TOOL - this tool allows you to customize the actions taken in response to cluster events.

script
    This directory is not created at install time. It is created the first time one of the tools needs to write something into it. You should place all of your customized scripts here, and this directory should never be deleted. Skeleton files are created here for certain events and errors; these should be tailored to suit your needs.

This directory contains site specific scripts which are created by the tools.

This directory contains the files used to draw the cluster configuration.

This directory is created the first time it is called. It contains the output files for the tools when they are run.
Log files for the messages, errors and warnings generated by the customized scripts are stored in the directory /var/HACMP_ANSS/log. This directory is automatically created the first time that the tools are used. It contains two files which are created when they are first invoked. The files are called:
hacmp.errlog
hacmp.eventlog
As you use the tool, you will notice a French flavor in the variable names and file names. This has been preserved to recognize the heritage of the tools.
# /usr/HACMP_ANSS/tools/SAVE
- Lists the disk adapters
- Checks the SCSI ID of each adapter so you will know whether you will have to change it (SCSI disks ONLY)
- Lists the disks connected to an adapter
- Lists the logical volumes (LVs) and indicates whether they are mirrored or not
- Checks that LV names and mount points are unique for each filesystem on the cluster nodes
- Checks that LV names are not trivial (like lv00 or lv01)
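The last two checks in the list above can be illustrated with a small shell sketch. It is not the inventory tool's own code: the helper names is_trivial_lv, trivial_lv_names and common_lv_names are hypothetical, the LV names are assumed to be passed as whitespace-separated lists, and "trivial" is assumed to mean the two-digit lvNN pattern mentioned above.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of the inventory tool.
# LV names are passed as whitespace-separated lists.

# Succeed when the name looks like a default name such as lv00 or lv01.
is_trivial_lv() {
    case "$1" in
        lv[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every name in the list that is_trivial_lv flags.
trivial_lv_names() {
    for name in $1; do
        if is_trivial_lv "$name"; then
            echo "$name"
        fi
    done
    return 0
}

# Print every name from the first list that also appears in the second,
# that is, LV names that are not unique across the two nodes.
common_lv_names() {
    for a in $1; do
        for b in $2; do
            if [ "$a" = "$b" ]; then
                echo "$a"
            fi
        done
    done
    return 0
}
```

On a real node the lists would have to come from the LVM commands on each machine. Calling common_lv_names with the LV names of the two future cluster nodes would then print the names that need to be changed before the volume groups are shared.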
The following serial ports were found:
ADAP  ADDRESS
sa1   00-00-S1
sa2   00-00-S2

The following ttys are configured:
TTY   TERM     LOGIN    STOPS  BPC
tty0  ibm3151  enable   1      8
tty1  dumb     disable  1      8

The following network adapters were found:
ent0  00-00-0E

The scsi0 adapter has its SCSI ID set to id 7 and has the following disks connected to it:
ADAPT  DISK    ADDRESS      VOLUME GROUP
scsi0  hdisk0  00-00-0S-00  rootvg
scsi0  hdisk1  00-00-0S-40  nadvg
scsi0  hdisk2  00-00-0S-50  nadvg

Volume group rootvg contains the following logical volumes
VG NAME  LV NAME  TYPE     MOUNT POINT  MIRROR
rootvg   hd6      paging   N/A          no mirrored copies defined
rootvg   hd5      boot     /blv         no mirrored copies defined
rootvg   hd7      sysdump  /mnt         no mirrored copies defined
rootvg   hd8      jfslog   N/A          no mirrored copies defined
rootvg   hd4      jfs      /            no mirrored copies defined
rootvg   hd2      jfs      /usr         no mirrored copies defined
rootvg   hd1      jfs      /home        no mirrored copies defined
rootvg   hd3      jfs      /tmp         no mirrored copies defined
rootvg   hd9var   jfs      /var         no mirrored copies defined
rootvg   lvtmp    jfs      /netview     no mirrored copies defined

Volume group nadvg contains the following logical volumes
VG NAME  LV NAME     TYPE    MOUNT POINT      MIRROR
nadvg    fslv00      jfs     /alpha           mirror 2 copies
nadvg    beta        jfs     /beta            mirror 2 copies
nadvg    gamma       jfs     /gamma           mirror 2 copies
nadvg    delta       jfs     /delta           mirror 2 copies
nadvg    nadlog      jfslog  N/A              mirror 2 copies
nadvg    zeta        jfs     N/A              mirror 3 copies
nadvg    theta       jfs     N/A              mirror 3 copies
nadvg    lv_netview  jfs     /usr/OV          no mirrored copies defined
nadvg    lv_sm6000   jfs     /usr/adm/sm6000  no mirrored copies defined
ANOMALIES: CONFIGURATION INFORMATION COMPARING THE TWO NODES

IDENTIFYING rs232 PORTS ON THE TWO NODES
NODE: jack - tty0 dumb disable 1
NODE: nadim - tty1 dumb disable 1

CHECKING THE SCSI IDs OF THE SHARED ADAPTERS
NODE: jack: The scsi0 adapter has its SCSI ID set to id 7
NODE: nadim: The scsi0 adapter has its SCSI ID set to id 7

CHECKING THE MOUNT POINTS
The /lll directory has the same mount point on the 2 nodes
The /mountp directory has the same mount point on the 2 nodes

CHECKING THE LOGICAL VOLUME NAMES
logical volume : zz has the same name on the 2 systems
logical volume lv00 has a non significant name on NODE: jack
Figure 2. Example of a /tmp/HACMPmachine-anomalies file
Planning Considerations
Pre-Installation Activities
Installing HACMP
Cluster Environment Definition
Node Environment Definition
Starting and Stopping HACMP
Error Notification Customization
Event Customization
Documenting your Cluster
Spread throughout our example will be descriptions of the correct times to run each of the various tools provided.
The cluster nodes are evenly matched 5XX model CPUs. This makes them good candidates for Mutual Takeover, since each node is able to handle an equal application load during normal operations.
The main or public network is a Token-Ring network. Each node has two interfaces on this network, a service and a standby. Since we will be configuring each node to be able to take over the IP address of the other, each node will also have a boot address to be used on its service interface. This will allow the machine to boot and connect to the network without conflicts when its service address has been taken over and is still active on the other node.
There is a second network, an Ethernet network called etnet1. This network will be defined to HACMP as a private network. As such, it will be used to carry Cluster Lock Manager traffic between nodes. A private network is highly recommended in any configuration using concurrent access. The private network has only service interfaces, and no standby interfaces. Standby interfaces can, of course, also be used in private networks, but since Cluster Lock Manager traffic automatically shifts to the public network if there is a private network failure, standby interfaces on a private network are not essential.
The cluster has IBM 9333 Serial disks as its shared disks. There are two 9333 subsystems connected. The first one includes four disk drives, which will be configured into two volume groups, each containing two disks. The second subsystem includes two disks, which will be contained in a single concurrent volume group. The node mickey has two 9333 disk adapters, each connected to one of the subsystems. The other node, goofy, has only one 9333 disk adapter, which is connected to both 9333 subsystems.
There is also a raw RS232 link between native serial ports on the two nodes, each of which has a tty device defined. This link will be defined as an HACMP network called rsnet1, and will be used so that the cluster can continue to send keepalive packets between nodes, even if the TCP/IP subsystems fail on one or more nodes.
Node goofy has two internal disks in its rootvg volume group, while node mickey has only one. This will cause the shared disks to have different device names on each of the nodes. For example, one of the shared disks will be named hdisk1 on node mickey, and hdisk2 on node goofy. This is a common situation in clusters, and is nothing to worry about. There is a client system, connected on the token-ring network, called pluto. We will be installing the client component of the HACMP software on this system.
The same subnet mask must be in use for all adapters on a node.
Standby adapters must be on a different logical subnet from their service adapters.
If a system will be having its service IP address taken over by another system, it must have a boot address configured. This boot address will be on the same logical subnet as the service address. The TCP/IP interface definition for the service adapter should be set to the boot address in this situation.
If IP address takeover will not be used for this node, no boot address is necessary.
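The subnet rules above can be checked mechanically before any adapter is configured. The following is a minimal sketch, not part of the tools described in this book: the function names ip_to_int, network_of and same_subnet are hypothetical, and the addresses are the ones used for node mickey in our example cluster.

```shell
#!/bin/sh
# Convert a dotted-decimal IPv4 address to a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print the network number (address AND mask) as an integer.
network_of() {
    echo $(( $(ip_to_int "$1") & $(ip_to_int "$2") ))
}

# Succeed (exit status 0) when both addresses are on the same logical subnet.
#   $1 = first address, $2 = second address, $3 = subnet mask
same_subnet() {
    [ "$(network_of "$1" "$3")" -eq "$(network_of "$2" "$3")" ]
}

# Boot and service addresses must share a subnet.
same_subnet 9.3.1.45 9.3.1.79 255.255.255.0 &&
    echo "boot and service addresses are on the same subnet"

# Standby and service addresses must not.
same_subnet 9.3.1.79 9.3.4.79 255.255.255.0 ||
    echo "standby address is on a different subnet"
```

Comparing network numbers rather than address strings keeps the check valid for masks other than 255.255.255.0.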
Please see the Planning Worksheets for our cluster in Appendix E, Example Cluster Planning Worksheets on page 131 to see how we have defined our adapters.
3.2.2.1 Termination
A SCSI bus must be terminated at each end. Normally, in a single system configuration, SCSI bus termination is done on the adapter at one end, by use of terminating resistor blocks. At the other end, the bus is terminated by a terminator plug, which is attached to the last device on the string. In an HACMP cluster, you will have at least two and possibly more systems sharing the same set of SCSI disks. To be able to create a SCSI string, including both disk devices and SCSI adapters in systems, special Y-Cables are used. Also, the termination of the bus must be moved off the adapters themselves, and on to the Y-cables, to allow more than just two systems to share the bus. Therefore, if you are using SCSI shared disks, you must use the correct Y-cables to connect them, and you must be sure to remove the terminating resistor blocks from each of your shared SCSI adapters. Depending on whether you are using 8-bit or 16-bit Fast/Wide adapters, the location of these terminating resistor blocks will be different. There are pictures of the locations of these blocks on each of the adapters, as well as a full description of how to cable each of the types of shared disks with HACMP in Appendix D, Disk Setup in an HACMP Cluster on page 107.
# mkdir /usr/HACMP_ANSS
# tar -xvf /dev/fd0

If you do not have enough space in the /usr filesystem, and do not wish to make it bigger, you can make a separate filesystem for the tools by issuing the following commands:

# mklv -y toolhacmp rootvg 2
# crfs -v jfs -d toolhacmp -m /usr/HACMP_ANSS -A yes -p rw -t no
# mount /usr/HACMP_ANSS
# tar -xvf /dev/fd0
Configuration of adapters and hostnames
Configuration of the /etc/hosts file
Configuration of the /.rhosts file
Testing
It is recommended to configure the hostname of the system to be the same as the IP label for your service address, even if the IP address of the service adapter is initially set to the boot address. You will issue the command smit mktcpip to take you to the panel where you will configure your service adapter:
                     Minimum Configuration & Startup

To Delete existing configuration data, please use Further Configuration menus

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* HOSTNAME                                         [mickey]
* Internet ADDRESS (dotted decimal)                [9.3.1.45]
  Network MASK (dotted decimal)                    [255.255.255.0]
* Network INTERFACE                                tr0
  NAMESERVER
      Internet ADDRESS (dotted decimal)            []
      DOMAIN Name                                  []
  Default GATEWAY Address                          []
  (dotted decimal or symbolic name)
  RING Speed                                       16               +
  START Now                                        yes              +

F4=List     F8=Image
Note that we have assigned a hostname of mickey, even though we have configured the IP address to be the boot address. If you are using a nameserver, be sure also to include the information about the server, and the domain, in this panel. From here, we will use the command smit chinet to take us to the panel to configure the other network adapters. Here is the example for node mickey's standby adapter:
              Change / Show a Token-Ring Network Interface

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
  Network Interface Name                           tr1
  INTERNET ADDRESS (dotted decimal)                [9.3.4.79]
  Network MASK (hexadecimal or dotted decimal)     [255.255.255.0]
  Current STATE                                    up               +
  Use Address Resolution Protocol (ARP)?           yes              +
  Enable Hardware LOOPBACK Mode?                   no               +
  BROADCAST ADDRESS (dotted decimal)               [9.3.4.255]
  Confine BROADCAST to LOCAL Token-Ring?           no               +

F4=List     F8=Image
Continue with this for each of the TCP/IP network adapters on each of the nodes. If you have more than one network defined, also configure any service, boot, and standby adapters from those networks to TCP/IP.
# Cluster 1 - disney
9.3.1.45    mickey_boot
9.3.1.79    mickey
9.3.4.79    mickey_sb
9.3.5.79    mickey_en
9.3.1.46    goofy_boot
9.3.1.80    goofy
9.3.4.80    goofy_sb
9.3.5.80    goofy_en
Once you have created the /etc/hosts file on one system, you can use ftp to transfer it to each of your other cluster nodes.
mickey_boot
mickey
mickey_sb
mickey_en
goofy_boot
goofy
goofy_sb
goofy_en
mickey_boot.itsc.austin.ibm.com
mickey.itsc.austin.ibm.com
mickey_sb.itsc.austin.ibm.com
mickey_en.itsc.austin.ibm.com
goofy_boot.itsc.austin.ibm.com
goofy.itsc.austin.ibm.com
goofy_sb.itsc.austin.ibm.com
goofy_en.itsc.austin.ibm.com

Be sure the permissions on the /.rhosts file are set to 600; that is, read/write for root, and no access for anyone else. Again, once you have created this file correctly on one node, you can use ftp to transfer it to each of the others. Remember that any new files delivered by ftp will be set up with default permissions. You may need to sign on to each of the other nodes and change the permissions on the /.rhosts file.
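Because a file delivered by ftp arrives with default permissions, it can help to reset and verify the mode on every node after the transfer. The sketch below is one possible way to do this; the function names file_mode and secure_rhosts are hypothetical and are not part of HACMP or of the tools described here.

```shell
#!/bin/sh
# Print the permission string of a file, for example "-rw-------".
file_mode() {
    ls -ld "$1" | cut -c1-10
}

# Restrict a file to read/write for its owner only (mode 600),
# then confirm that the change took effect.
secure_rhosts() {
    chmod 600 "$1" || return 1
    [ "$(file_mode "$1")" = "-rw-------" ]
}

# Example call, to be run as root on each cluster node:
#   secure_rhosts /.rhosts || echo "could not secure /.rhosts"
```

The function returns a non-zero status if either the chmod or the verification fails, so it can be used directly in a larger checking script.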
Again, if you are using your cluster nodes as gateways or routers, please skip this step.
4.2.5 Testing
Once you have completed this configuration, test it by using the ping command to contact each of your defined adapters, including standby adapters. If there is any problem here, do not continue until you have corrected it.
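With several adapters per node, the ping test is easy to script. The following sketch is an assumption about how you might do so, not a tool shipped with HACMP: check_adapters and PING_CMD are hypothetical names, and the labels in the example call should be replaced by whichever addresses are actually configured on your interfaces at this stage.

```shell
#!/bin/sh
# Command used to probe one address. "ping -c 1" sends a single echo
# request; set PING_CMD beforehand to use different options.
PING_CMD=${PING_CMD:-"ping -c 1"}

# Probe every label given as an argument and print those that do not answer.
# Returns non-zero if at least one label is unreachable.
check_adapters() {
    failed=0
    for label in "$@"; do
        if ! $PING_CMD "$label" >/dev/null 2>&1; then
            echo "unreachable: $label"
            failed=1
        fi
    done
    return $failed
}

# Example call (labels taken from our /etc/hosts file):
#   check_adapters mickey_boot mickey_sb goofy_boot goofy_sb
```

A non-zero return status signals that at least one adapter did not answer, which is the point at which you should stop and correct the configuration.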
                              Add a TTY

  TTY type                                         tty
  TTY interface                                    rs232
  Description                                      Terminal asynchrone
  Parent adapter                                   sa0
* PORT number                                      [s1]
  BAUD rate                                        [9600]
  PARITY                                           [none]
  BITS per character                               [8]
  Number of STOP BITS                              [1]
  TERMINAL type                                    [dumb]
  STATE to be configured at boot time              [available]
  ...
  ...
  Enable LOGIN                                     disable
Use all the default settings, including leaving the Enable LOGIN field set to disable, and the TERMINAL type set to dumb. Take note of the tty device number returned by the SMIT panel, since you will need it later. If this is the first tty device defined, it will be /dev/tty0, which we will use in our example. Do this definition on each of your nodes.
After you have entered the command, nothing should happen until you run the same command on the second node:
If the connection has been properly set up, you should now see the output of the stty command on both nodes. Make sure that this is working correctly before proceeding.
# chdev -l scsi2 -a tm=yes

It can also be done through SMIT, by entering the command smit chgscsi. The following panel is presented:
  SCSI adapter
  Description
  Status
  Location
  Adapter card SCSI ID
  BATTERY backed adapter
  ...
  Enable TARGET MODE interface  =================>  Target Mode interface enabled
[PLUS...2]
A reboot is not necessary but you must rerun the configuration manager.
Run the following command to find the name of the target mode SCSI link device:

# lsdev -Cc tmscsi

If this is the first link you have created, the device name will be tmscsi0. Note this name down, since it will be used in our testing and in HACMP configuration.
# cat /etc/motd > /dev/tmscsi0.im

The contents of the /etc/motd file should be listed on the node where you entered the first command.
For concurrent access, the steps are the same, if you omit those steps concerning the jfslog and filesystems.
  VOLUME GROUP name
  Physical partition SIZE in megabytes                        +
* PHYSICAL VOLUME names                                       +
  Activate volume group AUTOMATICALLY at system restart?      +
* ACTIVATE volume group after it is created?                  +
  Volume Group MAJOR NUMBER                                   +#

F4=List     F8=Image
Here, you provide the name of the new volume group, the disk devices to be included, and the major number to be assigned to it. It is also important to specify that you do not want the volume group activated (varied on) automatically at system restart, by changing the setting of that field to no.
The varyon of shared volume groups needs to be under the control of HACMP, so that it is coordinated correctly. Regardless of whether you intend to use NFS or not, it is good practice to specify a major number for the volume group. To do this, you must select a major number that is free on each node. Be sure to use the same major number on all nodes. Use the lvlstmajor command on each node to determine a free major number common to all nodes.
2. Because test1vg and test2vg contain mirrored disks, you can turn off quorum checking. On the command line, enter smit chvg and set quorum checking to no:
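Choosing a major number that is free on every node amounts to intersecting the lists of free numbers from each node. The sketch below illustrates only that step. It assumes you have already written down the free major numbers of each node as a plain space-separated list; converting the output of lvlstmajor into such a list is not shown. The function name common_free_major and the sample lists are hypothetical.

```shell
#!/bin/sh
# Print the lowest number that appears in both space-separated lists.
#   $1 = free major numbers on the first node
#   $2 = free major numbers on the second node
# Prints nothing when the lists have no number in common.
common_free_major() {
    for candidate in $1; do
        for other in $2; do
            if [ "$candidate" -eq "$other" ]; then
                echo "$candidate"
            fi
        done
    done | sort -n | head -n 1
}

# Hypothetical free lists for two nodes; prints 60.
common_free_major "44 45 60 61 62" "43 60 61 62 63"
```

For clusters with more than two nodes, the result of one call can be checked against the list of each further node in turn.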
                         Change a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                              [Entry Fields]
* VOLUME GROUP name                                           test1vg
* Activate volume group AUTOMATICALLY at system restart?      no          +
* A QUORUM of disks required to keep the volume group         no          +
  on-line ?

F4=List     F8=Image
Now repeat the two steps above for volume group test2vg, using major number 61. For our concurrent volume group conc1vg, with major number 62, repeat the two steps almost exactly, except that quorum protection must be left on for a concurrent volume group.
3. Vary on the three volume groups on node mickey:
# varyonvg test1vg
# varyonvg test2vg
# varyonvg conc1vg
4. Before you create any filesystems on the shared disk resources, you need to explicitly create the jfslog logical volume. This is so that you can give it a unique name of your own choosing, which is used on all nodes in the cluster to refer to the same log. If you do not do this, it is possible and likely that naming conflicts will arise between nodes in the cluster, depending on what user filesystems have already been created.
Use SMIT to add the log logical volumes loglvtest1 for the filesystems in volume group test1vg, and loglvtest2 for the filesystems in volume group test2vg. Enter smit mklv, and select the volume group test1vg to which you are adding the first new jfslog logical volume.
                          Add a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Logical volume NAME                                   [loglvtest1]
* VOLUME GROUP name                                     test1vg
* Number of LOGICAL PARTITIONS                          [1]               #
  PHYSICAL VOLUME names                                 [hdisk1 hdisk2]   +
  Logical volume TYPE                                   [jfslog]
  POSITION on physical volume                           midway            +
  RANGE of physical volumes                             minimum           +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                    []                #
  to use for allocation
  Number of COPIES of each logical partition            2                 +
  Mirror Write Consistency?                             yes               +
  Allocate each logical partition copy                  yes               +
  on a SEPARATE physical volume?
[MORE...9]

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do
The fields that you need to change or add to are shown in bold type. After you have created the jfslog logical volume, be sure to format the log logical volume with the following command:
# /usr/sbin/logform /dev/loglvtest1
logform: destroy /dev/loglvtest1 (y)?
Answer yes (y) to the prompt about whether to destroy the old version of the log. Now create the log logical volume loglvtest2 for volume group test2vg and format the log, using the same procedure.
5. Now use SMIT to add the logical volumes lvtest1 in volume group test1vg and lvtest2 in volume group test2vg. It would be possible to create the filesystems directly, which would save some time. However, it is recommended to define the logical volume first, and then to add the filesystem on it. This procedure allows you to set up mirroring and logical volume placement policy for performance. It also means you can give the logical volume a unique name.
On node mickey, enter smit mklv, and select the volume group test1vg, to which you will be adding the new logical volume.
                          Add a Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

[TOP]                                                   [Entry Fields]
  Logical volume NAME                                   [lvtest1]
* VOLUME GROUP name                                     test1vg
* Number of LOGICAL PARTITIONS                          [20]              #
  PHYSICAL VOLUME names                                 [hdisk1 hdisk2]   +
  Logical volume TYPE                                   []
  POSITION on physical volume                           center            +
  RANGE of physical volumes                             minimum           +
  MAXIMUM NUMBER of PHYSICAL VOLUMES                    []                #
  to use for allocation
  Number of COPIES of each logical partition            2                 +
  Mirror Write Consistency?                             yes               +
  Allocate each logical partition copy                  yes               +
  on a SEPARATE physical volume?
  RELOCATE the logical volume during reorganization?    yes               +
  Logical volume LABEL                                  []
  MAXIMUM NUMBER of LOGICAL PARTITIONS                  [128]
  Enable BAD BLOCK relocation?                          yes               +
  SCHEDULING POLICY for writing logical                 sequential        +
  partition copies
  Enable WRITE VERIFY?                                  no                +
  File containing ALLOCATION MAP                        []
[BOTTOM]

F1=Help     F2=Refresh     F3=Cancel     F4=List
F5=Reset    F6=Command     F7=Edit       F8=Image
F9=Shell    F10=Exit       Enter=Do
The bold type illustrates those fields that need to have data entered or modified. Notice that SCHEDULING POLICY has been set to sequential. This is the best policy to use for high availability, since it forces one mirrored write to complete before the other may start. In your own setup, you may elect to leave this option set to the default value of parallel to maximize disk write performance.
Again, repeat this procedure to create a 25 partition logical volume lvtest2 on volume group test2vg.
6. Now, create the filesystems on the logical volumes you have just defined. At the command line, you can enter the following fastpath: smit crjfslv. Our first filesystem is configured on the following panel:
     Add a Journaled File System on a Previously Defined Logical Volume

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* LOGICAL VOLUME name                              lvtest1          +
* MOUNT POINT                                      [/test1]
  Mount AUTOMATICALLY at system restart?           no               +
  PERMISSIONS                                      read/write       +
  Mount OPTIONS                                    []               +
  Start Disk Accounting?                           no               +

F4=List     F8=Image
Repeat the above step to create the filesystem /test2 on logical volume lvtest2.
7. Mount the filesystems to check that creation has been successful.
# mount /test1
# mount /test2
8. If there are problems mounting the filesystems, there are two suggested actions to resolve them:
a. Execute the fsck command on the filesystem.
b. Edit the /etc/filesystems file, check the stanza for the filesystem, and make sure it is using the new jfslog you have created for that volume group. Also, make sure that the jfslog has been formatted correctly with the logform command.
Assuming that the filesystems mounted without problems, now unmount them.
# umount /test1
# umount /test2
9. Now, create the logical volumes for our concurrent volume group conc1vg. From checking on the worksheet, you will see that we will be creating the following logical volumes:
                         Import a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
  VOLUME GROUP name                                [test1vg]
* PHYSICAL VOLUME name                             [hdisk2]         +
* ACTIVATE volume group after it is imported?      yes              +
  Volume Group MAJOR NUMBER                        [60]             +#

F4=List     F8=Image
2. Change the volume group to prevent automatic activation of test1vg at system restart and to turn off quorum checking. This must be done each time you import a volume group, since these options will reset to their defaults on each import. Enter smit chvg:
                         Change a Volume Group

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                              [Entry Fields]
* VOLUME GROUP name                                           test1vg
* Activate volume group AUTOMATICALLY at system restart?      no          +
* A QUORUM of disks required to keep the volume group         no          +
  on-line ?

F4=List     F8=Image
3. Repeat the two steps above for volume group test2vg, using major number 61, and for conc1vg, using major number 62. For volume group conc1vg, leave quorum protection turned on, since this is a requirement for concurrent volume groups.
4. Vary on the volume groups and mount the filesystems on goofy to ensure that there are no problems.
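The import steps above are the same for every shared volume group, so they lend themselves to a small script. The following is a dry-run sketch under stated assumptions: it only prints the commands it would issue, the function names run and import_shared_vg are hypothetical, and the command-line flags shown (importvg -y and -V, chvg -a and -Q) are our reading of the command equivalents of the SMIT panels, which you should check against the AIX Commands Reference for your level of AIX.

```shell
#!/bin/sh
# Print a command instead of executing it. Replace the body with "$@"
# to run the commands for real once the output has been reviewed.
run() {
    echo "$*"
}

# Emit the import sequence for one shared volume group on the second node.
#   $1 = volume group name      $2 = major number
#   $3 = one disk of the group  $4 = "yes" to leave quorum checking on
import_shared_vg() {
    vg=$1 major=$2 disk=$3 keep_quorum=$4
    run importvg -y "$vg" -V "$major" "$disk"
    if [ "$keep_quorum" = "yes" ]; then
        run chvg -a n "$vg"         # no automatic varyon; quorum left on
    else
        run chvg -a n -Q n "$vg"    # no automatic varyon; quorum turned off
    fi
    run varyonvg "$vg"
}

# test1vg as seen from node goofy, using the values from the panels above.
import_shared_vg test1vg 60 hdisk2 no
```

The fourth argument exists because a concurrent volume group such as conc1vg must keep quorum protection on, while the mirrored volume groups have it turned off.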
Select your picks using F7. In our example, we are selecting the option to install all components, including cluster.clvm which gives us the ability to do concurrent access. If we were not running concurrent access, we would select cluster.server, which will automatically install cluster.client as a prerequisite.
Select your picks using F7. For non-RS/6000 clients, we can still carry out ARP cache refreshes using /usr/sbin/cluster/etc/clinfo.rc. Refer to Section 5.6, Customizing the /usr/sbin/cluster/etc/clinfo.rc File on page 29 to see how this is done.
mickey 9.3.1.80
PING_CLIENT_LIST=
For instance:
PING_CLIENT_LIST="mickey goofy"

Clinfo is started automatically by the /etc/inittab file on cluster clients.
These definitions can be entered from one node for the entire cluster. After this has been completed, the cluster environment definitions are synchronized from one node to all the others. Finally, the cluster environment should be verified, using the cluster verification utility, to ensure there are no errors before proceeding.
F1=Help     F2=Refresh     F3=Cancel     F8=Image
F9=Shell    F10=Exit       Enter=Do
2. Select Manage Cluster Environment and press Enter to display the following menu:
                       Manage Cluster Environment

Move cursor to desired item and press Enter.

  Configure Cluster
  Configure Nodes
  Configure Adapters
  Synchronize All Cluster Nodes
  Show Cluster Environment
  Configure Network Modules

F1=Help     F2=Refresh     F3=Cancel     F8=Image
F9=Shell    F10=Exit       Enter=Do
3. Select Configure Cluster and press Enter to display the following menu:
                           Configure Cluster

Move cursor to desired item and press Enter.

  Add a Cluster Definition
  Change / Show Cluster Definition
  Remove Cluster Definition

F1=Help     F2=Refresh     F3=Cancel     F8=Image
F9=Shell    F10=Exit       Enter=Do
4. Choose the Add a Cluster Definition option and press Enter to display the following panel.
                        Add a Cluster Definition

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
**NOTE: Cluster Manager MUST BE RESTARTED
  in order for changes to be acknowledged.**

* Cluster ID                                       [1]              #
* Cluster Name                                     [disney]

F4=List     F8=Image
5. Press Enter. The cluster ID and name are entered in HACMP's own configuration database managed by the ODM.
6. Press F3 to return to the Manage Cluster Environment screen. From here, we will move to the next stage, defining the cluster nodes.
F1=Help     F2=Refresh     F3=Cancel     F8=Image
F9=Shell    F10=Exit       Enter=Do
2. Choose the Add Cluster Nodes option and press Enter to display the following screen:
                           Add Cluster Nodes

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Node Names                                       [mickey goofy]

F4=List     F8=Image
Remember to leave a space between names. If you use a duplicate name, an error message will be displayed. You need only to enter this information on one node, because you can later execute Synchronize All Cluster Nodes to propagate the information, using HACMP's Global ODM (GODM), to all other nodes configured in the cluster.
3. Press Enter to update HACMP's configuration database.
4. Press F3 to return to the Manage Cluster Environment screen. From here, we will move to the next stage, defining the network adapters to HACMP.
                           Configure Adapters

Move cursor to desired item and press Enter.

  Add an Adapter
  Change / Show an Adapter
  Remove an Adapter

F1=Help     F2=Refresh     F3=Cancel     F8=Image
F9=Shell    F10=Exit       Enter=Do
2. Choose the Add an Adapter option. Press Enter to display the following panel, where you will fill out the fields for the service adapter:
                             Add an Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Adapter Label                                    [mickey]
* Network Type                                     [token]            +
* Network Name                                     [trnet1]
* Network Attribute                                public             +
* Adapter Function                                 service            +
  Adapter Identifier                               [9.3.1.79]
  Adapter Hardware Address                         [0x42005aa8b484]
  Node Name                                        [mickey]           +

F4=List     F8=Image
3. Press Enter to store the details in HACMP's configuration database.

The following observations can be made about the fields to be filled in on this panel:

Adapter Label
    This is the IP label of the adapter, which should be the same as the label you have defined in the /etc/hosts file and in your nameserver.

Network Type
    If you list this field with F4, you will see the various Network Interface Modules (NIMs) available. There is a NIM for each type of network medium supported, as well as a Generic IP NIM. Since this adapter is on a token-ring network, we have selected the token NIM.

Network Name
    This is an arbitrary name of your own choosing, to define to HACMP which of its adapters are on the same physical network. It is important that you use the same network name for all of the adapters on a physical network.

Network Attribute
    This field can be set to public, private, or serial. A public network is one that is used by cluster nodes and client systems for access, as is this token-ring network. A private network is used for communications between cluster nodes only. The Cluster Lock Manager uses any private networks that are defined as its first choice to communicate between nodes. The most common reason to define a network as private is to reserve it for the exclusive use of the Cluster Lock Manager. A serial network is a non-TCP/IP network. This is the value you will define for your RS232 connection, and your SCSI Target Mode network if you have one.

Adapter Function
    This field can be set to service, standby, or boot. A service adapter provides the IP address that is known to the users, and that is in use when the node is running HACMP and is part of the cluster. The standby adapter, as we have said before, is an adapter that is configured on a different subnet from the service adapter, and whose function is to be ready to take over the IP address of a failed service adapter in the same node, or the service adapter address of another failed node in the cluster. The boot adapter provides an alternate IP address to be used, instead of the service IP address, when the machine is booting up, and before HACMP Cluster Services are started. This address is used to avoid address conflicts in the network, because if the machine is booting after previously failing, its service IP address will already be in use, since it will have been taken over by the standby adapter on another node. A node rejoining the cluster will only be able to switch from its boot to its service address after that service address has been released by the other node.

Adapter Identifier
    For a TCP/IP network adapter, this will be the IP address of the adapter. If you have already done your definitions in the /etc/hosts file, as you should have at this point, you do not have to fill in this field, and the system will find its value, based on the Adapter IP Label you have provided. For a non-TCP/IP (serial) network adapter, this will be the device name of the adapter, for instance /dev/tty0 or /dev/tmscsi0.

Adapter Hardware Address
    This is an optional field. If you want HACMP to also move the hardware address of a service adapter to a standby adapter at the same time that it moves its IP address, you will want to fill in a hardware address here. This hardware address is of your own choosing, so you must make sure that it does not conflict with that of any other adapter on your network. For token-ring adapters, the convention for an alternate hardware address is that the first two digits of the address are 42. In our example, we have found out the real hardware address of the adapter by issuing the command lscfg -v -l tok0. Our alternate hardware address is the same as the real address, except that we have changed the first two digits to 42. This ensures that there is not a conflict with any other adapter, since all real token-ring hardware addresses start with 10.
    If you fill in an alternate hardware address here, HACMP will change the hardware address of the adapter from its real address, which it has at boot time, to the alternate address, at the same time as it is changing the IP address from the boot address to the service address. If this is done, client users, who only know about the service address, will always have a constant relationship between the service IP address and its hardware address, even through adapter and node failures, and will have no need to flush their ARP caches when these failures occur. Alternate hardware addresses are only used with service adapters, since these are the only adapters that ever have their IP addresses taken over.

Node Name
    This is the name of the node to which this adapter is connected. You can list the nodes that you have defined earlier with the F4 key, and choose the appropriate node.
4. Select the Add an Adapter option again. Press Enter to display the following panel and fill out the fields for the boot adapter:
                             Add an Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Adapter Label                                    [mickey_boot]
* Network Type                                     [token]            +
* Network Name                                     [trnet1]
* Network Attribute                                public             +
* Adapter Function                                 boot               +
  Adapter Identifier                               [9.3.1.45]
  Adapter Hardware Address                         []
  Node Name                                        [mickey]           +

F4=List     F8=Image
Notice that we have defined this adapter as having the same network name as the service adapter. Also, you should note that the IP address for the boot adapter is on the same subnet as the service adapter. These two HACMP adapters, boot and service, actually represent different IP addresses to be used on the same physical adapter. In this case, token-ring adapter tok0 will start out on the boot IP address when the machine is first booted, and HACMP will switch the adapter's IP address to the service address (and the hardware address to the alternate address we have defined) when HACMP Cluster Services are started.
5. Press Enter to store the details in HACMP's configuration database.
6. Select the Add an Adapter option again. Press Enter and fill out the fields for the IP details for the standby adapter:
                             Add an Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Adapter Label                                    [mickey_sb]
* Network Type                                     [token]            +
* Network Name                                     [trnet1]
* Network Attribute                                public             +
* Adapter Function                                 standby            +
  Adapter Identifier                               [9.3.4.79]
  Adapter Hardware Address                         []
  Node Name                                        [mickey]           +

F4=List     F8=Image
Notice again that we have used the same network name, since this adapter is on the same physical network. We should also point out that this adapter has been configured on a different subnet from the boot and service adapter definitions. Our subnet mask was set earlier in the TCP/IP setup to 255.255.255.0.
7. Press Enter to store the details in HACMP's configuration database.
8. Select the Add an Adapter option again. Press Enter and fill out the details for the RS232 connection:
                             Add an Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Adapter Label                                    [mickey_tty0]
* Network Type                                     [rs232]            +
* Network Name                                     [rsnet1]
* Network Attribute                                serial             +
* Adapter Function                                 service            +
  Adapter Identifier                               [/dev/tty0]
  Adapter Hardware Address                         []
  Node Name                                        [mickey]           +

F4=List     F8=Image
Note here that we have chosen a different network type and network attribute, and assigned a different network name. Also, the adapter identifier is defined as the device name of the tty being used.
                             Add an Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                   [Entry Fields]
* Adapter Label                                    [goofy]
* Network Type                                     [token]            +
* Network Name                                     [trnet1]
* Network Attribute                                public             +
* Adapter Function                                 service            +
  Adapter Identifier                               [9.3.1.80]
  Adapter Hardware Address                         [0x42005aa8d1f3]
  Node Name                                        [goofy]            +

F4=List     F8=Image
Here note that we have defined an alternate hardware address for this adapter also, which corresponds to the real hardware address of adapter tok0, with the first two digits changed to 42.
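The rule described above (keep the real address, but change its first two digits to 42) can be sketched in a few lines of shell. This is only an illustration: the function name alt_hwaddr and the burned-in address used in the example are our own assumptions, chosen so that the result matches the value entered in the panel above.

```shell
#!/bin/sh
# Sketch: derive an alternate hardware address from a real one by
# replacing the first two hex digits with 42.
# alt_hwaddr is a hypothetical helper, not part of HACMP.
alt_hwaddr() {
    real=$1
    # ${real#??} strips the first two characters; the panel above shows
    # the value entered with a leading 0x.
    echo "0x42${real#??}"
}

# Hypothetical burned-in address of tok0 (an assumption for this example).
real_addr=10005aa8d1f3
alt_hwaddr "$real_addr"     # prints 0x42005aa8d1f3
```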
Add an Adapter
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
Adapter Label                     [goofy_boot]
Network Type                      [token]
Network Name                      [trnet1]
Network Attribute                 public
Adapter Function                  boot
Adapter Identifier                [9.3.1.46]
Adapter Hardware Address          []
Node Name                         [goofy]
F4=List  F8=Image
Add an Adapter
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
Adapter Label                     [goofy_sb]
Network Type                      [token]
Network Name                      [trnet1]
Network Attribute                 public
Adapter Function                  standby
Adapter Identifier                [9.3.4.80]
Adapter Hardware Address          []
Node Name                         [goofy]
F4=List  F8=Image
Add an Adapter
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
Adapter Label                     [goofy_tty0]
Network Type                      [rs232]
Network Name                      [rsnet1]
Network Attribute                 serial
Adapter Function                  service
Adapter Identifier                [/dev/tty0]
Adapter Hardware Address          []
Node Name                         [goofy]
F4=List  F8=Image
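The subnet relationships between the adapters defined above (boot and service on one subnet, standby on another) can be checked before synchronizing. The following is a minimal sketch using the addresses entered for node goofy and the 255.255.255.0 mask; the helper names network_of and same_subnet are our own and are not HACMP utilities.

```shell
#!/bin/sh
# network_of ADDRESS MASK: print the network part of a dotted-decimal address.
network_of() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2          # split address and mask into eight octets
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# same_subnet ADDR1 ADDR2 MASK: succeed if both addresses share a network.
same_subnet() {
    [ "$(network_of "$1" "$3")" = "$(network_of "$2" "$3")" ]
}

mask=255.255.255.0
# goofy service vs. boot, then goofy service vs. standby
for pair in "9.3.1.80 9.3.1.46" "9.3.1.80 9.3.4.80"; do
    set -- $pair
    if same_subnet "$1" "$2" "$mask"; then
        echo "$1 and $2: same subnet"
    else
        echo "$1 and $2: different subnets"
    fi
done
```

With these values, the first pair is reported as being on the same subnet and the second pair on different subnets, which is the arrangement the adapter definitions call for.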
Manage Cluster Environment
Move cursor to desired item and press Enter.
  Configure Cluster
  Configure Nodes
  Configure Adapters
  Synchronize All Cluster Nodes
  Show Cluster Environment
  Configure Network Modules
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
1. Select the Synchronize All Cluster Nodes option on the Manage Cluster Environment menu and press Enter. SMIT responds: ARE YOU SURE?
2. Press Enter.
Note: Before synchronizing the cluster definition, all nodes must be powered on, and the /etc/hosts and /.rhosts files must include all HACMP IP labels.
The cluster definition, including all node, adapter, and network module information, is copied from mickey to goofy. For more information, refer to Chapter 8, Defining the Cluster Environment, in the HACMP/6000 Installation Guide .
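The prerequisite in the note above (every HACMP IP label present in /etc/hosts and /.rhosts) is easy to check by hand before synchronizing. The sketch below is one possible way to do so; the function name missing_labels is hypothetical, and the label list in the usage comment assumes that the boot label on mickey follows the same naming pattern as on goofy.

```shell
#!/bin/sh
# missing_labels FILE LABEL...: report each label that does not appear as a
# whole word on a non-comment line of FILE. Returns non-zero if any is missing.
missing_labels() {
    file=$1
    shift
    rc=0
    for label in "$@"; do
        if grep -v '^[[:space:]]*#' "$file" | grep -w -- "$label" >/dev/null
        then
            :                         # label found, nothing to report
        else
            echo "$label missing from $file"
            rc=1
        fi
    done
    return $rc
}

# Usage on each node (label list is an assumption based on the panels above):
#   missing_labels /etc/hosts mickey mickey_boot mickey_sb goofy goofy_boot goofy_sb
#   missing_labels /.rhosts   mickey mickey_boot mickey_sb goofy goofy_boot goofy_sb
```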
Defining application servers
Defining resource groups and resources
Verifying the cluster
Using this information, the application can be defined as a resource protected by HACMP. HACMP will then be able to start and stop the application at the appropriate time, and on the correct node. Application Server start and stop scripts should be contained on the internal disks of each node, and must be kept in the same path location on each node.
To define an Application Server, perform the following tasks:
1. At the command prompt, enter the SMIT fastpath smit hacmp. The following panel is presented:
HACMP/6000
Move cursor to desired item and press Enter.
  Manage Cluster Environment
  Manage Application Servers
  Manage Node Environment
  Show Environment
  Verify Environment
  Manage Cluster Services
  Cluster Recovery Aids
  Cluster RAS Support
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
Manage Application Servers
Move cursor to desired item and press Enter.
  Add an Application Server
  Change / Show an Application Server
  Remove an Application Server
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
F4=List F8=Image
4. Enter an arbitrary Server Name, and then enter the full pathnames for the start and stop scripts. Remember that the start and stop scripts must reside on each participating cluster node. Our script names are:
/usr/local/mickey_start
/usr/local/mickey_stop
Once this is done, an Application Server named mickeyapp1 has been defined, and can be included in a resource group to be controlled by HACMP. You can now repeat a similar procedure to define an application server for goofy's application, called goofyapp1. Finally, you could create an application server for the concurrent application, called concapp1.
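Since the start and stop scripts must exist, at the same path, on every participating node, it can be worth checking them on each node before verifying the cluster. The following is a minimal sketch under that assumption; check_server_scripts is a hypothetical helper name, not an HACMP command.

```shell
#!/bin/sh
# check_server_scripts PATH...: report scripts that are missing or not
# executable on this node. Returns non-zero if any problem is found.
check_server_scripts() {
    rc=0
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            echo "$script: not found"
            rc=1
        elif [ ! -x "$script" ]; then
            echo "$script: not executable"
            rc=1
        fi
    done
    return $rc
}

# Usage, run on each node with the script names used in our example:
#   check_server_scripts /usr/local/mickey_start /usr/local/mickey_stop
```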
As a final step, we will define our concurrent resource group concrg. Resource group concrg will consist of the following resources:
The steps required to set up this configuration of resource groups are as follows:
1. Configure the resource group mickeyrg on node mickey by using the SMIT fastpath command:
# smit cl_mng_res
Then select Add / Change / Show / Remove a Resource Group from the following menu:
Manage Resource Groups
Move cursor to desired item and press Enter.
  Add / Change / Show / Remove a Resource Group
  Configure Resources for a Resource Group
  Configure Run Time Parameters
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
Add / Change / Show / Remove a Resource Group
Move cursor to desired item and press Enter.
  Add a Resource Group
  Change / Show a Resource Group
  Remove a Resource Group
F2=Refresh  F3=Cancel
F10=Exit  Enter=Do
+ +
F4=List F8=Image
In the field Participating Node Names, be sure to name the highest priority node first. For resource group mickeyrg, this is mickey, since it is the owner. The other participating nodes are then named in decreasing order of priority. In a two node cluster, there is only one other name, but in a larger cluster, you may have more than two nodes (but not necessarily all nodes) participating in any resource group.
4. Press Enter to store the information in HACMP's configuration database.
5. Press F3 twice to go back to the Manage Resource Groups panel. Select Configure Resources for a Resource Group.
Manage Resource Groups
Move cursor to desired item and press Enter.
  Add / Change / Show / Remove a Resource Group
  Configure Resources for a Resource Group
  Configure Run Time Parameters
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
6. The list that appears should show only one resource group, mickeyrg. Select this item.
Select a Resource Group
Move cursor to desired item and press Enter.
  mickeyrg
F1=Help  F2=Refresh  F3=Cancel
F8=Image  F10=Exit  Enter=Do
/=Find  n=Find Next
7. In the SMIT panel that follows, fill out the fields as shown. Make sure that the Inactive Takeover Activated and the 9333 Disk Fencing Activated fields are set to false.
Configure Resources for a Resource Group
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
Resource Group Name               mickeyrg
Node Relationship                 cascading
Participating Node Names          mickey goofy
Service IP label                  [mickey]
Filesystems                       [/test1]
Filesystems to Export             [/test1]
Filesystems to NFS mount          []
Volume Groups                     []
Concurrent Volume groups          []
Raw Disk PVIDs                    []
Application Servers               [mickeyapp1]
Miscellaneous Data                []
Inactive Takeover Activated       false
9333 Disk Fencing Activated       false
F1=Help  F2=Refresh  F3=Cancel  F4=List
F5=Reset  F6=Command  F7=Edit  F8=Image
The following comments should be made about some of these parameters:
Service IP label
    By filling in the label of mickey here, we are activating IP address takeover. If node mickey fails, its service IP address (and hardware address, since we have defined it) will be transferred to the other node in the cluster. If we had left this field blank, there would be no IP address takeover from node mickey to node goofy.
Filesystems
    Any filesystems that are filled in here will be mounted when a node takes over this resource group. The volume group that contains the filesystem will first be automatically varied on as well.
Filesystems to Export
    Filesystems listed here will be NFS exported, so they can be mounted by NFS client systems or other nodes in the cluster.
Filesystems to NFS mount
    Filling in this field sets up what we call an NFS cross mount. Any filesystem defined in this field will be NFS mounted by all the participating nodes, other than the node that currently is holding the resource group. If the node holding the resource group fails, the next node to take over breaks its NFS mount of this filesystem, and mounts the filesystem itself as part of its takeover processing.
Volume Groups
    This field does not need to be filled out in our case, because HACMP will automatically discover which volume group it needs to vary on in order to mount the filesystem(s) we have defined. The field exists so that we can specify one or more volume groups to vary on in the case where there are no filesystems, but only raw logical volumes being used by our application.
Raw Disk PVIDs
    This field is very rarely used, but would be used in the case where an application is not using the logical volume manager at all, but is accessing its data directly from the hdisk devices. One example of this might be an application storing its data in a RAID-3 LUN. RAID-3 is not supported at all by the LVM, so an application using RAID-3 would have to read and write directly to the hdisk device.
Application Servers
    For any Application Servers that are defined here, HACMP will run their start scripts when a node takes over the resource group, and will run the stop script when that node leaves the cluster.
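The effect of the ordering in Participating Node Names can be pictured as a simple rule: among the nodes that are currently up, the one listed first holds the resource group. The sketch below only illustrates that ordering; it is not the cluster manager's actual logic, and takeover_node is a hypothetical name.

```shell
#!/bin/sh
# takeover_node "PRIORITY LIST" "NODES UP": print the first node in the
# priority list that is also in the list of nodes currently up.
takeover_node() {
    for candidate in $1; do
        for up in $2; do
            if [ "$candidate" = "$up" ]; then
                echo "$candidate"
                return 0
            fi
        done
    done
    return 1      # no participating node is up
}

# mickeyrg is defined with the participating nodes "mickey goofy"
takeover_node "mickey goofy" "mickey goofy"    # prints mickey
takeover_node "mickey goofy" "goofy"           # prints goofy
```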
F1=Help F9=Shell
F2=Refresh F10=Exit
F3=Cancel Enter=Do
F8=Image
Add / Change / Show / Remove a Resource Group
Move cursor to desired item and press Enter.
  Add a Resource Group
  Change / Show a Resource Group
  Remove a Resource Group
F2=Refresh  F3=Cancel
F10=Exit  Enter=Do
Select Add a Resource Group. On the resulting panel, fill in the fields, as shown below, to define your second resource group.
Add a Resource Group
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
Resource Group Name               [goofyrg]
Node Relationship                 cascading
Participating Node Names          [goofy mickey]
F4=List  F8=Image
F1=Help F9=Shell
F2=Refresh F10=Exit
F3=Cancel Enter=Do
F8=Image
Resource Group Name Node Relationship Participating Node Names Service IP label Filesystems Filesystems to Export Filesystems to NFS mount Volume Groups Concurrent Volume groups Raw Disk PVIDs Application Servers Miscellaneous Data Inactive Takeover Activated 9333 Disk Fencing Activated
+ +
F1=Help F5=Reset
F2=Refresh F6=Command
F3=Cancel F7=Edit
F4=List F8=Image
Fill in the appropriate fields, as shown above, and press Enter to save the configuration.
9. Finally, we will set up our concurrent resource group concrg.
# smit cl_mng_res
F1=Help F9=Shell
F2=Refresh F10=Exit
F3=Cancel Enter=Do
F8=Image
Add / Change / Show / Remove a Resource Group
Move cursor to desired item and press Enter.
  Add a Resource Group
  Change / Show a Resource Group
  Remove a Resource Group
F2=Refresh  F3=Cancel
F10=Exit  Enter=Do
Select Add a Resource Group. On the resulting panel, fill in the fields, as shown below, to define the concurrent resource group.
Add a Resource Group
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                  [Entry Fields]
Resource Group Name               [concrg]
Node Relationship                 concurrent
Participating Node Names          [mickey goofy]
F4=List  F8=Image
F1=Help F9=Shell
F2=Refresh F10=Exit
F3=Cancel Enter=Do
F8=Image
Resource Group Name Node Relationship Participating Node Names Service IP label Filesystems Filesystems to Export Filesystems to NFS mount Volume Groups Concurrent Volume groups Raw Disk PVIDs Application Servers Miscellaneous Data Inactive Takeover Activated 9333 Disk Fencing Activated
+ +
F1=Help F5=Reset
F2=Refresh F6=Command
F3=Cancel F7=Edit
F4=List F8=Image
Fill in the appropriate fields, as shown above, and hit Enter to save the configuration. In a concurrent resource group, the only two resources to be defined are:
Concurrent volume group - this gives access to the logical volumes
Application server
10. The next job is to synchronize the node environment configuration to the other node. Press F3 three times to return to the Manage Node Environment panel, as shown below:
Manage Node Environment
Move cursor to desired item and press Enter.
  Manage Resource Groups
  Change/Show Cluster Events
  Sync Node Environment
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
Select Sync Node Environment. You will see a series of messages, as the ODMs on the other node(s) are updated from the definitions on your node. You can also synchronize the resource group configuration from the command line by executing the /usr/sbin/cluster/diag/clconfig -s -r command.
Note for HACMP Version 2.1 Users
If you have used HACMP Version 2.1, note that in HACMP/6000 Version 3.1 and HACMP 4.1 for AIX, the node environment must also be synchronized explicitly, along with the cluster environment. This is a change from HACMP Version 2.1, where the node environment was automatically synchronized by the Global ODM.
# smit hacmp
HACMP/6000
Move cursor to desired item and press Enter.
  Manage Cluster Environment
  Manage Application Servers
  Manage Node Environment
  Show Environment
  Verify Environment
  Manage Cluster Services
  Cluster Recovery Aids
  Cluster RAS Support
F1=Help  F2=Refresh  F3=Cancel  F8=Image
F9=Shell  F10=Exit  Enter=Do
Verify Environment Type or select values in entry fields. Press Enter AFTER making all desired changes. [Entry Fields] both []
+ #
F4=List F8=Image
Take the default on this panel, which is to verify both the network configurations and the resource configurations. The Global ODM of HACMP will check the definitions on all nodes, to make sure they are correct and consistent. It will also check various AIX system parameters and system files, to make sure they are set correctly for HACMP, and will check any application server scripts you have defined, to make sure they are on all the nodes where they need to be, and that they are executable. You should see several verification messages, but the results should yield no errors. If you encounter errors, you must diagnose and rectify them before starting the cluster managers on each node. Failure to rectify verification errors will cause unpredictable results when the cluster starts.
* Start now, on system restart or both
  BROADCAST message at startup?
  Startup Cluster Lock Services?
  Startup Cluster Information Daemon?
F4=List  F8=Image
Here, you can select all the defaults, and press Enter to start cluster services on the node. Since we are running a concurrent access environment in our example, we would want to change the last two fields to true. Here are some comments on some of the fields:
Start now, on system restart or both
    The recommended setting for this field is now. If you set it to system restart or both, it will put a record into the /etc/inittab file, so that HACMP cluster services are started automatically on the machine each time it boots. This is not a very good idea, because it may result in a node trying to join the cluster before fixes have been fully tested, or at a time when the impact of resource group movement in the cluster is not desired. It is much better to have explicit control over when cluster services are started on a node, and for that reason, the now setting is recommended.
Startup Cluster Lock Services?
    Cluster Lock Services are, in almost all cases, only needed in a concurrent access configuration. The Cluster Lock Manager is normally used to control access to concurrently varied on volume groups. Therefore, we will want to change the setting to true, since we have a concurrent access configuration.
Startup Cluster Information Daemon?
    The cluster information daemon, or clinfo, is the subsystem that manages the cluster information provided through the clinfo API to applications. This option would need to be set to true if you were going to be running applications directly on the cluster node that used the clinfo API. An example of such an application would be the cluster monitor clstat, which is provided as part of the product. If you are not running such an application, or are running such an application, but on a client machine, this option can be left with its default of false. If you are running a clinfo application on a client machine, it gets its information from the clsmuxpd daemon on a cluster node, and does not need clinfo to be running on that cluster node.
When you start cluster services on a node, you will see a series of messages on the SMIT information panel, and then its status will switch to OK. This does not mean the cluster services startup is complete, however. To track the cluster processing, and to know when it is completed, you must watch the two main log files of HACMP:
/var/adm/cluster.log
    This log file tracks the beginning and completion of each of the HACMP event scripts. Only when the node_up_complete event completes has the node finished its cluster processing.
/tmp/hacmp.out
    This is a more detailed log file, as it logs each command of the HACMP event scripts as they are executing. In this case, you not only see the start and completion of each event, but also each command being executed in running those event scripts.
It is recommended to run the tail -f command against each of these log files when you start up nodes in the cluster, so that you can track the successful completion of events, and so that you can know when the processing is completed.
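If you would rather have a script tell you when the processing is complete than watch the log yourself, a check along the following lines could be used. This is only a sketch: it assumes that the completion of an event is recorded in /var/adm/cluster.log with the text "EVENT COMPLETED:" followed by the event name, which you should confirm against your own log before relying on it. The function name event_completed is hypothetical.

```shell
#!/bin/sh
# event_completed LOGFILE EVENT: succeed if LOGFILE records the completion
# of EVENT. The "EVENT COMPLETED:" marker is an assumption about the log
# format; adjust it to match what your cluster.log actually contains.
event_completed() {
    grep "EVENT COMPLETED: $2" "$1" >/dev/null 2>&1
}

# Usage: poll until the local node has finished joining the cluster.
#   until event_completed /var/adm/cluster.log node_up_complete; do
#       sleep 5
#   done
#   echo "node_up_complete has been logged"
```

Note that an old log may already contain a completion record from a previous startup, so a check like this is most useful on a freshly rotated log.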
* Stop now, on system restart or both
  BROADCAST cluster shutdown?
* Shutdown mode
F4=List  F8=Image
Here are some comments on the field choices:
Stop now, on system restart or both
    If you select now, the default, HACMP will be stopped immediately, and no further action controlling future behavior will be taken. If you choose system restart or both, the system will also remove any automatic startup line for HACMP from the /etc/inittab file.
BROADCAST cluster shutdown?
    Controls whether a broadcast message is sent to all users when HACMP is shut down on a node.
Shutdown mode
    If you choose graceful, HACMP will be shut down on the machine, and any resources being held will be released. However, no other nodes in the cluster will take over the resources. This is a good option when you want to just shut down HACMP on all nodes, one at a time.
    If you choose graceful with takeover, the HACMP software will be shut down and the resources released from the node. The next highest priority node defined for the resource groups will then take over the appropriate resources.
    If you choose forced, the HACMP software will be stopped on the node, but the resources that it is holding will be retained.
9.1 Description
Hardware and software errors, incidents and operator messages are logged in the AIX error log. To avoid the need for someone to periodically examine the error log in search of particular errors, we can configure Error Notification Methods to react automatically to the arrival of these errors. The errors that you will want to trap and treat will be dependent upon your installation. The error notification tool will do the following:
Create the templates for the scripts in the script subdirectory. These scripts can then be customized so that they react in the desired way to the arrival of errors. A possible example would be to promote a serial disk adapter failure to a node failure.
Customize the relevant error notification objects in the ODM.
Provide a test environment so that errors can be sent by you into the error log, without any real errors actually occurring. This will allow you to test your scripts. For example, we can generate SCSI_ERR3 without physically touching the SCSI adapter or the attached disks.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                 +
+  Choose one option at a time                                    +
+  You can choose different errors successively                   +
+                                                                 +
+  Enter: end (when you have finished)                            +
+                                                                 +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1) end
2) **************
3) X25 - X25 adapter error
4) DISK - SCSI disk error
5) LVM - LOGICAL VOLUME MANAGER error
6) SCSI - SCSI adapter error
7) TOK - TOKEN RING adapter error
8) EPOW - POWER SUPPLY problem
9) FDDI - FDDI adapter error
10) SDA - SERIAL disk ADAPTER error
11) SDC - SERIAL disk CONTROLLER error
12) TMSCSI - SCSI network problem
Amongst this list, which errors would you like to treat:
We could also choose more errors at the same time, if we wished. Here is what we will see on the screen:
***************************************
** UPDATING ODM
***************************************
/usr/HACMP_ANSS/utils/error_SDA applied
********************************************************
** In order to delete your choice from the ODM        **
** use error_del                                      **
********************************************************
This procedure, as well as the procedures used to deselect the errors, (created automatically by the tool) are put into the utils subdirectory.
/usr/HACMP_ANSS/utils/error_SDA
The following routines, which will be executed as soon as the relevant error is logged in the error log, will be automatically created in the /usr/HACMP_ANSS/script subdirectory.
error_SDA
error_NOTIFICATION
It is up to you to modify these scripts so that they behave as you require. As they are created by the tool, they are just empty template scripts.
The error_NOTIFICATION script, which is automatically invoked by the error_SDA script, logs the incident in the /var/HACMP_ANSS/log/hacmp.errlog file and sends a mail message to the root user. Here is a listing of the error_SDA script, as we have modified it to our requirements:
#!/bin/ksh
###############################################################################
# Written by: AUTOMATE
# Last modification by *** who ***
#
# script: error_SDA
# parameters: 9 parameters (documented in error_NOTIFICATION)
#
# ARGUMENTS received :
# sequence number in the error log = $1
# error ID = $2
# error class = $3
# error type = $4
# alert flag = $5
# resource name = $6
# resource type = $7
# resource class = $8
# error label = $9
###############################################################################
# Variables:
. /usr/HACMP_ANSS/tools/tool_var
STATUS=0
(
echo "\n=error_SDA===============$(date)"
echo "ERROR DETECTED: error_SDA"
) | tee -a $ERREURS/hacmp.errlog > /dev/console
. $SCRIPTS/error_NOTIFICATION
####################### START OF CUSTOMIZATION ##############################
#
LOCALNODENAME=$(/usr/sbin/cluster/utilities/get_local_nodename)
mail -s "Error Alert" [email protected] << END
An error has been detected on the HACMP cluster node $LOCALNODENAME
look at the $LOG file on the node.
DEVICE = $6 ADAPTER = $8
The system will be shut down and the users moved to a backup node.
END
wall "System will be shutting Down in 20 Seconds. Please log off now. You will be able to login to your application again within 5 minutes."
sleep 20
# This command does a shutdown with takeover of HACMP
/usr/sbin/cluster/utilities/clstop -y -N -gr
sleep 5
#
#
We now want to shutdown the machine, until our administrator can investigate the problem.
The error_NOTIFICATION script, automatically created along with error_SDA in the script subdirectory, looks like this:
#!/bin/ksh
########################################################################
#
# name : error_NOTIFICATION
# INPUT parameters : $1 to $9 sent by errpt
# Description : called by each error, sends a message
# into hacmp.errlog
########################################################################
# Variables:
. /usr/HACMP_ANSS/tools/tool_var
STATUS=0
G=$(tput smso)
F=$(tput rmso)
LOG=$ERREURS/hacmp.errlog
################################################################
# main
################################################################
(print "************ Source and cause of error ****************"
print "HOSTNAME=$(hostname) DATE=$(date)"
print "sequence number in error log = $1"
print "error ID = $2"
print "error class = $3"
print "error type = $4"
print "alert flag = $5"
print "resource name = $6"
print "resource type = $7"
print "resource class = $8"
print "error label = $9") >> $LOG
#######################################################################
# DO NOT FORGET TO set TO_WHOM in error_MAIL
. /usr/HACMP_ANSS/tools/ERROR_TOOL/error_MAIL $1 $2 $3 $4 $5 $6 $7 $8 $9
#######################################################################
# DO NOT FORGET TO set QUEUE in error_PRINT
# . /usr/HACMP_ANSS/tools/ERROR_TOOL/error_PRINT $1 $2 $3 $4 $5 $6 $7 $8 $9
#######################################################################
return $STATUS
The only customization required to this script might be to uncomment the line near the end that will cause a record of the error to be printed to the printer of your choice. The /usr/HACMP_ANSS/tools/ERROR_TOOL/error_MAIL script, in its default form, will send mail to the root user on the system on which the error occurs. This could also be changed as required. The script is shown below:
#!/bin/ksh
# this script is executed if it has been uncommented in
# error_NOTIFICATION
#
# variable: TO_WHOM should be set to the name of a user
# and should be in the form
# user or user@hostname
#######################################################################
. /usr/HACMP_ANSS/tools/tool_var
TO_WHOM=root
LOCALNODENAME=$(/usr/sbin/cluster/utilities/get_local_nodename)
mail $TO_WHOM << END
An error has been detected on the HACMP cluster node $LOCALNODENAME
look at the $LOG file
DEVICE = $6 ADAPTER = $8
END

Finally, if you wish to use the printing option, you will need to set the QUEUE variable in the /usr/HACMP_ANSS/tools/ERROR_TOOL/error_PRINT script to the name of a valid print queue for your system. The script is shown below:
#!/bin/ksh
# this script is executed if it has been uncommented in
# error_NOTIFICATION
#
# variable: QUEUE should be set to a local or remote print queue
# which has been defined in /etc/qconfig
#######################################################################
QUEUE=NONE
if [ $QUEUE = NONE ]
then
    FILE_CIBLEE=
else
    FILE_CIBLEE="-P $QUEUE"
fi
(banner Machine: $(hostname)
print ===================================================================
print $(date)
print ===================================================================
print "refer to $LOG and look at errpt"
banner error on device $6
) | qprt $FILE_CIBLEE
#####################################################################
Our error notification tool actually set up two error notification methods, for the errors sda_err1 and sda_err3. If we choose the first one, the following panel is presented:
Change/Show a Notify Method
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                        [Entry Fields]
* Notification Object Name              sda_err1
* Persistence across system restart?    Yes
  Process ID for use by Notify Method   [0]
  Select Error Class                    All
  Select Error Type                     All
  Match ALERTable errors?               All
  Select Error Label                    [SDA_ERR1]
  Resource Name                         []
  Resource Class                        []
  Resource Type                         []
* Notify Method                         [/usr/HACMP_ANSS/script>
F4=List  F8=Image
Once we have customized these scripts as we want them, and have checked that they are correctly in the ODM, we are able to test the error notification method, simulating the actual error with the error testing tool.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                 +
+  MENU: Testing errors                                           +
+                                                                 +
+  Choose one option at a time                                    +
+  You can choose different errors successively                   +
+                                                                 +
+  Enter: end (when you have finished)                            +
+                                                                 +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1) end
2) SDA_ERR1
3) SDA_ERR3
Which of the above errors would you like to generate:

If you wanted to run error_test to simulate SDA_ERR1, then you would do the following:
You will have to enter the adapter for which you wish to simulate the error.
For which device are you simulating this error
For example enter: scsi2 hdisk4 ent0
The defective device is: serdasda0
The defective unit is: serdasda0
Error id : b135ae8b
B135AE8B   1214112795 P H serdasda0      STORAGE SUBSYSTEM FAILURE
FEC31570   1213144095 P H serdasda0      UNDETERMINED ERROR
B135AE8B   1213141195 P H serdasda0      STORAGE SUBSYSTEM FAILURE
B135AE8B   1213120895 P H serdasda0      STORAGE SUBSYSTEM FAILURE
FEC31570   1213115495 P H serdasda0      UNDETERMINED ERROR
B135AE8B   1213114095 P H serdasda0      STORAGE SUBSYSTEM FAILURE
FEC31570   1213104695 P H serdasda0      UNDETERMINED ERROR
B135AE8B   1213101995 P H serdasda0      STORAGE SUBSYSTEM FAILURE
FEC31570   1212180795 P H serdasda0      UNDETERMINED ERROR
B135AE8B   1212180595 P H serdasda0      STORAGE SUBSYSTEM FAILURE
B135AE8B   1212175595 P H serdasda0      STORAGE SUBSYSTEM FAILURE
BAECC981   1128181495 P H serdasda0      MICROCODE PROGRAM ERROR
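The ten-digit number that follows each error identifier in the listing above is the errpt timestamp, in the form MMDDhhmmYY. When comparing the listing with the entries in hacmp.errlog, it can help to decode it. A minimal sketch follows; decode_errpt_ts is our own helper name.

```shell
#!/bin/sh
# decode_errpt_ts MMDDhhmmYY: print an errpt timestamp as MM/DD/YY hh:mm.
decode_errpt_ts() {
    echo "$1" | sed 's|^\(..\)\(..\)\(..\)\(..\)\(..\)$|\1/\2/\5 \3:\4|'
}

decode_errpt_ts 1214112795    # prints 12/14/95 11:27
decode_errpt_ts 1213114095    # prints 12/13/95 11:40
```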
Each time this error is generated, the following entry will be added to the /var/HACMP_ANSS/log/hacmp.errlog file. This file should be checked periodically, since it will grow over time. The entry added is formatted by the error_NOTIFICATION program which can also send mail messages if desired.
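Because the file grows with every error, the periodic check can be reduced to a size test run from cron or by hand. The sketch below is one way to do this; the function name log_exceeds and the 1 MB limit in the usage comment are assumptions for illustration only.

```shell
#!/bin/sh
# log_exceeds FILE LIMIT: succeed if FILE exists and is larger than
# LIMIT bytes.
log_exceeds() {
    [ -f "$1" ] || return 1
    # The arithmetic expansion strips any padding that wc may emit.
    size=$(( $(wc -c < "$1") ))
    [ "$size" -gt "$2" ]
}

# Usage (the limit is an arbitrary example value):
#   if log_exceeds /var/HACMP_ANSS/log/hacmp.errlog 1048576; then
#       echo "hacmp.errlog is larger than 1 MB; review and archive it"
#   fi
```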
=error_SDA===============Wed Dec 13 11:40:55 CST 1995
ERROR DETECTED: error_SDA
************ Source and cause of error ****************
HOSTNAME=goofy DATE=Wed Dec 13 11:40:55 CST 1995
sequence number in error log = 1790
error ID = 0xb135ae8b
error class = H
error type = PERM
alert flag = TRUE
resource name = serdasda0
resource type = serdasda
resource class = adapter
error label = SDA_ERR1
At the same time as the hacmp.errlog is being updated, the error_SDA shell script will be executed, carrying out whatever instructions you have added there. For more information about error notification refer to the AIX Problem Solving Guide .
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                 +
+  REMOVING AN ERROR NOTIFICATION OBJECT CLASS                    +
+                                                                 +
+  Choose one option at a time                                    +
+  You can remove different errors successively                   +
+                                                                 +
+  Enter: end (when you have finished)                            +
+                                                                 +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1) end
2) SDA
Amongst this list, which errors would you like to remove: 2

Suppose you choose number 2. The errnotify object class within the ODM will automatically be modified, deleting the entry for the treatment of errors generated by the failure of the 9333 serial disk adapter. The error_SDA script will be removed from the script subdirectory. The script is not actually deleted. Rather, it is moved to the backup subdirectory and its name is suffixed with YYYYMMDDhhmmss.
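The move-and-suffix behavior described above can be pictured with the following sketch. It is not the tool's own code: the function name retire_script is hypothetical, and the dot placed between the script name and the timestamp is an assumption, since the exact form of the suffixed name is not shown here.

```shell
#!/bin/sh
# retire_script SCRIPT BACKUPDIR: move SCRIPT into BACKUPDIR, appending a
# YYYYMMDDhhmmss timestamp to its name instead of deleting it.
retire_script() {
    stamp=$(date +%Y%m%d%H%M%S)
    # The "." separator is an assumption made for this example.
    mv "$1" "$2/$(basename "$1").$stamp"
}

# Usage (the backup path is assumed to sit beside the script subdirectory):
#   retire_script /usr/HACMP_ANSS/script/error_SDA /usr/HACMP_ANSS/backup
```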
10.1 Description
HACMP constantly surveys the states of the nodes in the cluster and at any given moment knows if:
A node has failed
A node has come up and has rejoined the cluster
Sometimes you need to customize HACMP's reactions to an event because the event script, as provided with HACMP, does not fulfill your needs. For instance, you may have some of the following requirements:
A node goes down. The cluster clients access this node through X.25. What must I do on the backup machine so that HACMP will correctly restart all the applications?
A node goes down. The database has also crashed. What procedures do I have to run (rollback, redologs) before restarting the application on the backup machine?
A node goes down. How do I recover the print jobs and cron jobs?
HACMP handles all changes to the cluster with cluster events. There are two types of events:
Primary Events - 14 of them, called by the cluster manager
Secondary or Sub Events - 16 of them, called by primary event scripts
join_standby network_down
network_down_complete
network_up
    Occurs when the cluster determines that a network has become available. The event script provided takes no default action, since the appropriate action will be site/LAN specific.
network_up_complete
    Occurs only after a network_up event has successfully completed. The event script provided takes no default action, since the action will be site/LAN specific.
node_down
    Occurs when a node is detaching from the cluster, either voluntarily or due to a failure. Depending on whether the node is local or remote, either the node_down_local or node_down_remote sub event is called.
node_down_complete
    Occurs only after a node_down event has successfully completed. Depending on whether the node is local or remote, either the node_down_local_complete or node_down_remote_complete sub event is called.
node_up
    Occurs when a node is joining the cluster. Depending on whether the node is local or remote, either the node_up_local or node_up_remote sub event is called.
node_up_complete
    Occurs only after a node_up event has successfully completed. Depending on whether the node is local or remote, either the node_up_local_complete or node_up_remote_complete sub event is called.
swap_adapter
    Exchanges or swaps the IP addresses of two network interfaces. NIS and name serving are temporarily turned off during this event.
swap_adapter_complete
    Occurs only after a swap_adapter event has successfully completed. Ensures that the local ARP cache is updated by deleting entries and pinging cluster IP addresses.
event_error
    Occurs when an HACMP event script fails for some reason.
acquire_takeover_addr
get_disk_vg_fs
node_down_local
Releases resources taken from a remote node, stops application servers, releases a service address taken from a remote node, releases concurrent volume groups, unmounts file systems and reconfigures the node to its boot address.

node_down_local_complete
Instructs the cluster manager to exit when the local node has completed detaching from the cluster. This event only occurs after a node_down_local event has successfully completed.

node_down_remote
Unmounts any NFS file systems and places a concurrent volume group in non-concurrent mode when the local node is the only surviving node in the cluster. If the failed node did not go down gracefully, acquires the failed node's resources: file systems, volume groups and disks, and service address.

node_down_remote_complete
Starts takeover application servers if the remote node did not go down gracefully. This event only occurs after a node_down_remote event has successfully completed.

node_up_local
When the local node attaches to the cluster: acquires the service address, clears the application server file, acquires file systems, volume groups and disks resources, exports file systems and either activates concurrent volume groups or puts them into concurrent mode depending upon the status of the remote node(s).

node_up_local_complete
Starts application servers and then checks to see if an inactive takeover is needed. This event only occurs after a node_up_local event has successfully completed.

node_up_remote
Causes the local node to release all resources taken from the remote node and to place the concurrent volume groups into concurrent mode.

node_up_remote_complete
Allows the local node to do an NFS mount only after the remote node is completely up. This event only occurs after a node_up_remote event has successfully completed.

release_service_addr
Detaches the service address and reconfigures to its boot address.

release_takeover_addr
Identifies a takeover address to be released because a standby adapter on the local node is masquerading as the service address of the remote node. Reconfigures the local standby into its original role.

release_vg_fs
Releases volume groups and file systems that the local node took from the remote node.

start_server
Starts application servers.

stop_server
Stops application servers.
HACMPevent:
        name = swap_adapter
        desc = Swap adapter event happens. Swapping adapter.
        setno = 0
        msgno = 0
        catalog =
        cmd = /usr/sbin/cluster/samples/swap_adapter
        notify =
        pre =
        post =
        recv =
        count = 0

HACMPevent:
        name = swap_adapter_complete
        desc = Swap adapter event completed.
        setno = 0
        msgno = 0
        catalog =
        cmd = /usr/sbin/cluster/samples/swap_adapter_complete
        notify =
        pre =
        post =
        recv =
        count = 0

HACMPevent:
        name = network_up
        desc = Network up event happens.
        setno = 0
        msgno = 0
        catalog =
        cmd = /usr/sbin/cluster/samples/network_up
        notify =
        pre =
        post =
        recv =
        count = 0

The event you choose to modify with the Event Customization Tool is copied from its original location in /usr/sbin/cluster/events into the /usr/HACMP_ANSS/script directory. The copied event script has its name prefixed by CMD_. The tool will also ask you whether you want to configure a pre, post or recovery event for this event. You can choose one, some, all or none. Depending on your choice(s), the tool will copy one or more shell templates into the /usr/HACMP_ANSS/script directory. These templates will have the same name as the event but will be prefixed by PRE_, POS_, or REC_, appropriate to your choice.
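The naming convention described above can be sketched as a small shell function. This is only an illustration: the function name event_script_names and its argument order are hypothetical and are not part of the Event Customization Tool; only the directory and the CMD_, PRE_, POS_ and REC_ prefixes come from the text.

```shell
#!/bin/sh
# Hypothetical helper (not part of the tool): list the files expected in
# the script directory for one customized event.
SCRIPT_DIR=/usr/HACMP_ANSS/script

# usage: event_script_names EVENT [PRE] [POS] [REC]
event_script_names() {
    event=$1
    shift
    # The copied event script always carries the CMD_ prefix.
    echo "$SCRIPT_DIR/CMD_$event"
    # Each requested customization adds one template.
    for kind in "$@"; do
        case $kind in
            PRE|POS|REC) echo "$SCRIPT_DIR/${kind}_$event" ;;
            *) echo "unknown customization type: $kind" >&2
               return 1 ;;
        esac
    done
}

# Example: PRE and POST events for node_down_remote, PRE only for node_up_remote.
event_script_names node_down_remote PRE POS
event_script_names node_up_remote PRE
```

Comparing this list with the contents of the script directory is one way to check that the tool created everything you asked for.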
# /usr/HACMP_ANSS/tools/EVENT_TOOL/event_select
After replying to the questions asked, you will see the following panel:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+                                                                 +
+                  MENU: Modifying the events                     +
+                                                                 +
+   Choose one option at a time                                   +
+   You can choose different events successively                  +
+                                                                 +
+   Enter: end (when you have finished)                           +
+                                                                 +
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1) end                       17) node_down_local
 2) swap_adapter              18) node_down_local_complete
 3) swap_adapter_complete     19) node_down_remote
 4) network_up                20) node_down_remote_complete
 5) network_down              21) node_up_local
 6) network_up_complete       22) node_up_local_complete
 7) network_down_complete     23) node_up_remote
 8) node_up                   24) node_up_remote_complete
 9) node_down                 25) release_service_addr
10) node_up_complete          26) release_takeover_addr
11) node_down_complete        27) release_vg_fs
12) join_standby              28) start_server
13) fail_standby              29) stop_server
14) acquire_service_addr      30) unstable_too_long
15) acquire_takeover_addr     31) config_too_long
16) get_disk_vg_fs            32) event_error
Which event would you like to modify: 19
The tool will create the necessary templates and also create the corresponding event notification script. Suppose, for example, you chose the following two events:
node_down_remote node_up_remote
For each event you have chosen, the tool will ask you whether you would like to add a PRE, POS or REC event with the aid of the following menu:
You have selected: 19 node_down_remote
Do you want to configure the PRE, POS and REC events ?
Choose one option at a time, run as many times as desired
Enter end or 4 to exit

You cannot use this procedure to delete events from the ODM
To do this you will have to use smit

1) PRE event
2) POST event
3) RECOVERY event
4) end
enter your choice ?

We will choose PRE and POST events for node_down_remote and a PRE event for node_up_remote.
HACMPevent:
        name = swap_adapter
        desc = Swap adapter event happens. Swapping adapter.
        setno = 0
        msgno = 0
        catalog =
        cmd = /usr/sbin/cluster/events/swap_adapter
        notify =
        pre =
        post =
        recv =
        count = 0
. . .
HACMPevent:
        name = node_down_remote
        desc = Script run when it is a remote node which is leaving the cluster.
        setno = 0
        msgno = 0
        catalog =
        cmd = /usr/HACMP_ANSS/script/CMD_node_down_remote
        notify = /usr/HACMP_ANSS/script/event_NOTIFICATION
        pre = /usr/HACMP_ANSS/script/PRE_node_down_remote
        post = /usr/HACMP_ANSS/script/POS_node_down_remote
        recv =
        count = 0
. . .
HACMPevent:
        name = node_up_remote
        desc = Script run when it is a remote node which is joining the cluster.
        setno = 0
        msgno = 0
        catalog =
        cmd = /usr/HACMP_ANSS/script/CMD_node_up_remote
        notify = /usr/HACMP_ANSS/script/event_NOTIFICATION
        pre = /usr/HACMP_ANSS/script/PRE_node_up_remote
        post =
        recv =
        count = 0

A list of the shell scripts the tool will have created in the script subdirectory is given below. The scripts are copies of the standard HACMP scripts, put into this alternate location, so future PTF updates to the HACMP scripts will not immediately overwrite any customizations. If you wish, you can modify or customize them so that the event behaves as you require for your specific cluster configuration.
CMD_node_up_remote CMD_node_down_remote
The templates for the PRE (before), POS (after) and REC (recovery) are also created, where they are requested. For the above example, a PRE event was requested for the node_up_remote event, and PRE and POS events were requested for the node_down_remote event, so the following files are created:
PRE_node_up_remote
PRE_node_down_remote
POS_node_down_remote

Also, you can see that the event_NOTIFICATION script is automatically identified as an event notification customization, for any event chosen with the tool. You can also look at the ODM entries for the HACMP events by entering smit hacmp, and selecting the following options:

Manage Node Environment
Change/Show Cluster Events

Selecting, for example, our local node and the node_down_remote event results in the following panel:
                      Change/Show Cluster Events

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                        [Entry Fields]
  Node Name                             mickey
  Event Name                            node_down_remote
  Description                           Script run when it is >
  Event Command                         [/usr/HACMP_ANSS/script>
  Notify Command                        [/usr/HACMP_ANSS/script>
  Pre-event Command                     [/usr/HACMP_ANSS/script>
  Post-event Command                    [/usr/HACMP_ANSS/script>
  Recovery Command                      []
  Recovery Counter                      [0]                      #

F4=List   F8=Image
If you pressed the right arrow key in the appropriate fields, you could see the locations of the event customization scripts.
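The same information can be read without scrolling in SMIT by filtering the stanza text shown earlier. The sketch below is an assumption-laden illustration, not part of the tools: the function name event_attr is hypothetical, and it only relies on the "HACMPevent:" / "name = ..." stanza layout printed in this chapter, read from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print one attribute of one HACMPevent stanza.
# usage: event_attr EVENT ATTRIBUTE < stanza-text
event_attr() {
    awk -v ev="$1" -v attr="$2" '
        $1 == "HACMPevent:" { in_event = 0; next }   # a new stanza starts
        $1 == "name" && $2 == "=" {
            v = $3
            gsub(/"/, "", v)                         # tolerate quoted values
            in_event = (v == ev)
        }
        in_event && $1 == attr && $2 == "=" {
            v = $0
            sub(/^[^=]*=[ \t]*/, "", v)              # drop the "attr = " part
            gsub(/"/, "", v)
            print v
        }
    '
}

# Possible use on a cluster node, assuming the ODM class is named
# HACMPevent as in the stanzas above:
#   odmget HACMPevent | event_attr node_down_remote pre
```

An empty line in the output means the attribute is defined but has no value, as with recv in the stanzas above.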
#!/bin/ksh
# Program   : PRE_node_up_remote
# Role      : run before the event
# Arguments : $1 = event name
#             and the parameters passed in
# Written   : Wed Dec 13 16:50:41 CST 1995
# Modified  :
. /usr/HACMP_ANSS/tools/tool_var
STATUS=0
# Record the event name and its parameters in the tool log
(print "\n=PRE-EVENT===============$(date)"
print "on : $(hostname)"
print "BEFORE : $1"
shift
print "Input Parameters: $*" ) >> $LOG
#####################################################################
# Enter your customizing code here
mail -s "Event Alert" [email protected] << END
Node goofy is about to re-enter the cluster.
Users will be migrated back from node mickey.
END
wall "Machine goofy has been recovered and is coming on-line.
There will be a short interruption for users of machine goofy.
Please logoff your application now. You will be able to login
to your application again within 5 minutes."
sleep 10
##################### END OF CUSTOMIZATION ##########################
return $STATUS
In a similar way, you can customize the other PRE and POST event scripts.
=ODM_EVENT====================Wed Dec 13 16:43:27 CST 1995
Modification of object ++ node_up_remote ++ in HACMPevent
adding customized procedures PRE
return code = 0
=ODM_EVENT====================Wed Dec 13 16:50:43 CST 1995
Modification of object ++ node_down_remote ++ in HACMPevent
adding customized procedures PRE POS
return code = 0
=NOTIFICATION===============Mon Dec 18 14:21:11 CST 1995
on: mickey
=PRE-EVENT===============Mon Dec 18 14:21:12 CST 1995
on : mickey
BEFORE : node_down_remote
Input Parameters: goofy graceful
START: node_down_remote arguments: goofy graceful
=POST-EVENT===============Mon Dec 18 14:21:12 CST 1995
on : mickey
AFTER : node_down_remote
return code : 0
=NOTIFICATION===============Mon Dec 18 14:21:13 CST 1995
on: mickey
OUTPUT: node_down_remote
return code : 0
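A log in this layout can be scanned for entries that did not end with return code 0. The following is a minimal sketch under stated assumptions: the function name failed_entries is hypothetical, each entry is assumed to start with a header line beginning with "=", and the return code is assumed to appear on its own line as "return code = N" or "return code : N", as in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: report log entries whose return code is not 0.
# usage: failed_entries < logfile
failed_entries() {
    awk '
        /^=/ { header = $0; subject = ""; next }         # new log entry
        /^(BEFORE|AFTER|OUTPUT|START)/ { subject = $0 }  # event being run
        /^Modification of object/      { subject = $0 }  # ODM change
        /^return code/ {
            code = $NF
            if ((code + 0) != 0)
                print header " | " subject " | rc=" code
        }
    '
}

# Possible use, assuming $LOG is the log file variable set by
# /usr/HACMP_ANSS/tools/tool_var as in the PRE event template:
#   failed_entries < "$LOG"
```

No output means every logged entry that reported a return code reported 0.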
Cluster configuration
Details of any HACMP customization you have carried out
Scripts you have written
System files used/modified by HACMP
You have three options for printing the output:
1. ASCII file which can be printed out under AIX
2. Bookmaster file for printing out on a VM host
3. PostScript file produced by the troff command

The report for each machine is called /tmp/HACMPdossier-<hostname>-vm, /tmp/HACMPdossier-<hostname>-ascii or /tmp/HACMPdossier-<hostname>-ps, depending upon whether you replied vm, ascii or postscript when you ran the documentation tool. Nothing prevents you from producing all three formats, but you need to run the tool once for each. An example report, from the doc_dossier tool, is provided in Part 1, Cluster Documentation Tool Report on page 137.
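The mapping from the reply given to the tool to the report file name can be written down as a short function. This is only a sketch: dossier_path is a hypothetical name, and the optional second argument (a host name to use instead of the local one) is an assumption added for illustration.

```shell
#!/bin/sh
# Hypothetical helper: name of the report file for a given output format.
# usage: dossier_path vm|ascii|postscript [hostname]
dossier_path() {
    case $1 in
        vm)         suffix=vm ;;
        ascii)      suffix=ascii ;;
        postscript) suffix=ps ;;
        *) echo "unknown format: $1" >&2
           return 1 ;;
    esac
    # Default to the local host name when none is given.
    echo "/tmp/HACMPdossier-${2:-$(hostname)}-$suffix"
}

dossier_path postscript mickey
```

After running the documentation tool once per format, the three names produced by this function are the files to collect from each node.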
# /usr/HACMP_ANSS/tools/DOC_TOOL/doc_dossier
Once the command has executed, a menu will appear on the screen. Select option 4) Save the output on a UNIX diskette. If you don't have a formatted diskette, choose option 3 first. Take the diskette produced by the first step to the second cluster node, and restore it by issuing the following command:

# tar -xvf /dev/fd0

Once you have run doc_dossier on this machine, and returned to the menu, choose option 4. The diskette now contains the configurations of the two machines.
This document designates which hardware has been qualified for use with HACMP for AIX (hereafter referred to as HACMP). The designated hardware should only be used on an appropriate RISC System/6000 Platform or 9076 Scalable POWERParallel Platform (SP/2). Please refer to the processor documentation to be sure that appropriate hardware is obtained. This document contains the following information:

The main body of the document and Appendix A contain the disk adapters, disk enclosures and associated cabling;
Appendix B contains other hardware, e.g. processors and network adapters.
The document is intended to convey information pertinent to HACMP support so cabling methods and hardware features unrelated to HACMP are not shown. If a piece of hardware is not listed it should be assumed that the hardware is not supported by HACMP. The following are the major changes since the last version of this matrix:
Serial Storage Architecture (SSA) supported on HACMP Version 3.1.1
Enhanced SCSI-2 Fast/Wide Adapter/A (FC 2412) supported on HACMP Version 3.1.1
Target Mode on SCSI-2 Fast/Wide Adapters (FC 2412 and FC 2416) supported on HACMP Version 3.1.1
IBM RISC System/6000 7013 Model 591, 7015 Model R21 and 7015 Model R3U
The disk storage portions of the document contain brief descriptions of many of the disk drive adapters, disk enclosures and associated cabling in tabular form. These tables are grouped as follows and unless specifically noted otherwise, the hardware in one group can not be used with hardware in another group:
One of the columns in the disk tables is titled HACMP Rlse and contains two subheadings:
Non-concurrent disk access, denoted by an NC in the column heading (Modes 1 and 2)
Concurrent disk access, denoted by a CC in the column heading (Mode 3)
Under each subheading in the disk tables is noted the release of HACMP in which the hardware was first supported for that configuration. The following conventions were used for this data:
If the specified release is prior to the current release, then the hardware is still supported unless noted otherwise.
If the column has a TBD in it then no commitment has been made to support the hardware; the hardware might or might not be supported in the future.
If the column has an N/A in it then there are no plans to support the hardware.
Attachment A contains the SCSI-1 SE and SCSI-2 SE device support. Existing HACMP configurations using SCSI SE devices continue to be supported. New HACMP installations must use SCSI-2 differential or serial devices due to the unavailability of the PTT cables. If you have further questions about disk cabling you can also consult the following information:
RISC System/6000, System Overview and Planning, Chapter 7: Cables and Cabling (GC23-2406)
A copy of the SCSI cabling portion of publication GC23-2406 can be found on MKTTOOLS(RS6CABLE)
A pictorial view of some of the SCSI cabling for HACMP is available in MKTTOOLS(HASCSI6)
(The proper hardware documents take precedence over the hardware information contained in these tables and should be used to resolve any conflicts.)
SCSI-2 DIFFERENTIAL DEVICE SUPPORT
==================================

The following conventions are used in this section:

All 16 bit adapters and enclosures have an * next to their feature codes.
All 16 bit cables or 8 bit to 16 bit cables have an * next to their feature codes. The 16 bit implementation is generally known as SCSI Fast/Wide.
Enclosures which can be cabled with either 16 bit or a combination of 8 bit and 16 bit cables have @ next to their feature codes.
All 8 bit adapters, enclosures and cables have no indication next to their feature codes.
ADAPTERS
--------
                     Maximum     HACMP Rlse
Feature              Cable       ------------
(FRU #)      MBPS    Length      NC      CC
---------    ----    -------     -----   -----
2412*        20      25 m        3.1.1   3.1.1
2416*        20      25 m        2.1     2.1
(65G7315)
2420         10      19 m        1.2     1.2
(43G0176)
Notes:
------
1 - Eight external SCSI IDs and eight LUNs are available on these buses. In an HACMP environment two or more of the addresses are used for hosts so the bus can have up to a maximum of six other devices (subject to cabling length and device constraints).
2 - Only SCSI-2 differential devices can be attached to a SCSI-2 differential adapter.
3 - Cable length is measured from end to end and includes the cabling which is within any attached subsystems. Exception: For the 7135, no internal SCSI-2 SE cabling is included.
4 - In HACMP configurations the differential terminating resistors U8 and U26 must be removed from the 2420 adapter; these resistors are located next to the external SCSI bus connector on the adapter card.
5 - The 2412 and 2416 adapters can execute in either 8 bit or 16 bit mode; a SMIT option exists to set the adapter to the desired width. All the devices on the bus must be of the same type.
6 - HACMP does not support target mode SCSI on the 2412 or the 2416 adapter prior to HACMP Version 3.1.1; on HACMP Version 3.1.1 APAR IX52772 is required.
7 - In HACMP configurations the three built-in differential terminating resistors (labelled RN1, RN2 and RN3) must be removed from the 2412 and 2416 adapters.
8 - In HACMP Version 4.1 sixteen external SCSI IDs and 32 LUNs are available on these buses. In an HACMP environment two or more of the addresses are used for hosts so the bus can have up to a maximum of fourteen other devices (subject to cabling length and device constraints). Prior to HACMP Version 4.1 eight external SCSI IDs and eight LUNs are available on these buses. In an HACMP environment two or more of the addresses are used for hosts so the bus can have up to a maximum of six other devices (subject to cabling length and device constraints).
9 - The 2412 and 2416 can not be assigned SCSI IDs 0, 1 or 8 through 15.
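The device counts quoted in notes 1 and 8 are the number of external SCSI IDs minus the IDs taken by the hosts. A minimal sketch of that subtraction follows; the function name free_device_ids is hypothetical, and the limit of two to four hosts is taken from the bus checklist later in this document.

```shell
#!/bin/sh
# Hypothetical helper: SCSI IDs left for devices on a shared bus.
# usage: free_device_ids TOTAL_IDS HOSTS
free_device_ids() {
    total_ids=$1
    hosts=$2
    # A shared bus carries at least two and no more than four hosts.
    if [ "$hosts" -lt 2 ] || [ "$hosts" -gt 4 ]; then
        echo "expected 2 to 4 hosts, got $hosts" >&2
        return 1
    fi
    echo $((total_ids - hosts))
}

free_device_ids 8 2     # 6:  eight IDs, two hosts
free_device_ids 16 2    # 14: sixteen IDs (HACMP Version 4.1), two hosts
```

The result is an upper bound only; cabling length and device constraints can reduce it further, as the notes state.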
ENCLOSURES
----------
#     #                     Media    HACMP Rlse
Per   Dsk   Size   Disk     Rate     ----------
Bus   Drv   GB     Feat     MBPS     NC    CC     Notes
---   ---   ----   ----     -----    ---   ---    -----
4     1     2.0             5.22     2.1   N/A    (1)
6     1     2.0             5.22     2.1   N/A    (1,8)
14    1     2.2             9-12     3.1   N/A    (1,8)
14    1     4.5             9-12     3.1   N/A    (1,8)
2     4     1.0    2565     3.0      1.2   N/A    (1,4)
2     4     2.0    2585     5.22     1.2   N/A    (1,4)
2     4     1.0    2565     3.0      1.2   N/A    (1,4)
2     4     2.0    2585     5.22     1.2   N/A    (1,4)
1     16    2.0    2821     5.22     2.1   N/A    (1,5)
1     16    2.2    2712     9-12     3.1   N/A    (1,5)
1     16    4.5    2714     9-12     3.1   N/A    (1,5)
      12    2.0    2720     5.22     N/A   N/A    (1)
2     30    1.3    2715     5.22     (7)   (7)    (1,2,3,7)
2     30    2.0    2725     5.22     (7)   (7)    (1,2,3,7)
2     30    2.2    2825     9-12     (7)   (7)    (1,2,3,7)
2     30    4.5    2845     9-12     (7)   (7)    (1,2,3,7)
2     30    1.3    2715     5.22     4.1   4.1    (1,2,3)
2     30    2.0    2725     5.22     4.1   4.1    (1,2,3)
2     30    2.2    2825     9-12     4.1   4.1    (1,2,3)
2     30    4.5    2845     9-12     4.1   4.1    (1,2,3)
2     8     1.0    1011     5-6      2.1   2.1    (1,6)
2     8     2.0    1008     5.22     2.1   2.1    (1,6)
2     8     1.0    1020     5.22     2.1   2.1    (1,6)
2     8     2.0    1030     9-12     2.1   2.1    (1,6)
2     8     4.4    1040     9-12     2.1   2.1    (1,6)
2     8     1.0    1020     5.22     2.1   2.1    (1,6)
2     8     2.0    1030     9-12     2.1   2.1    (1,6)
2     8     4.4    1040     9-12     2.1   2.1    (1,6)
7135-010 7135-110@
7135-210@
Notes:
------
1 - All SCSI-2 Differential devices use one bus address per disk except the 7135, 3514 and the 7137 which use one address per controller. All devices on the same bus must be of the same type unless stated otherwise.
2 - For maximum availability the 7135 array should be configured with two controllers. HACMP supports RAIDs 1, 3 and 5. The external interface for the 7135 is SCSI-2 differential; however, internally the disk drives are SCSI-2 SE.
3 - The specified disk feature provides a full bank of five disks. Disks in the 7135 array are normally configured in banks of 5 disks each, for a total capacity of 30 disks.
4 - 9334-011 and 9334-501 enclosures can be daisy chained with up to two enclosures and six disk drives on a SCSI bus. No tape drives are permitted.
5 - With two hosts the 7134-010 without an internal expansion unit can support up to eight drives on one bus. With an internal expansion unit the maximum number of drives with two hosts and one bus is fourteen. With an internal expansion unit the maximum number of drives with two hosts and two buses is sixteen.
6 - Even though the 3514 and 7137 are RAID devices, they have single points of failure in the SCSI bus and in the controller. If this is unacceptable, one or more additional enclosures with LVM mirroring are required; a total of three enclosures with quorum provides the highest availability. Concurrent access mode (HACMP Mode 3) will not support mirroring on SCSI devices so the single points of failure noted above would exist in this configuration.
7 - HACMP Version 4.1 does not support the 7135-110. The 7135-110 is supported in HACMP Version 2.1 and later releases, up to but not including HACMP Version 4.1.
8 - 7204 Models 315, 317 and 325 can be used on the same SCSI-2 differential bus.
CABLES
------
Feature     Attachd        Attachd           Len
(Part #)    From           To                (m)    Notes
---------   -------        -----------       ----   --------------------------------

CONFIGURED ON SERVERS WITH 8 BIT WIDE ADAPTER
*********************************************
2422        Adapter        9334 cable,       .765   Y-cable:
(52G7348)   (2420)         3514 cable*,             o base to adapter;
                           7137 cable*,             o 8 bit long leg to
                           7204-215 cable,            - 9334 cable,
                           terminator,                - 3514 cable,
                           2423                       - 7137 cable or
                                                      - 7204-215 cable;
                                                    o 8 bit short leg is
                                                      - terminated or
                                                      - connected to a 2423
                                                        cable to add additional
                                                        processors (>2 processors)
                                                        to a shared differential
                                                        8-bit bus

N/A         Y-cable        self              0      Terminator, 8 bit, included
(52G7350)   (2422)                                  when the Y-cable is ordered.

2423        Y-cable                                 Cable can be used to attach a
(52G7349)   (2422, 2427)                            third and fourth system to a
                                                    shared differential 8 bit bus.
CONFIGURED ON SERVERS WITH 16 BIT WIDE ADAPTER ********************************************** 2427* Adapter 9334 cable, .765 Y-cable: (52G4349) (2412*, 7204-215 o 16 bit base to adapter; 2416*) cable, o 8-bit long leg to 2424*/2425*, - 9334 cable or terminator* - 7204-215 cable; o 8-bit short leg is - terminated or - connected to a 2423 cable to add additional processors (>2 processors) 2426* Adapter (52G4234) (2412*, 2416*) 7204-3XX .94 cable*, 3514 cable*, 7137 cable*, 7134-010 cable*, 2424*, 2425*, terminator* Y-cable: o 16 bit base to adapter; o 16-bit long leg to - 7204-3XX cable, - 3514 cable, - 7137 cable or - 7134-010 cable; o 16-bit short leg is terminated or is connected to a 2424 or 2425 cable to add additional processors (>2 processors) Y-cable: o base to adapter; o 16-bit long leg to - 7135-210 cable; o 16-bit short leg is terminated
or is connected to a 2424 or 2425 cable to add additional processors (>2 processors) 2426* Adapter (52G4234) (2416*) 7135-110 .94 cable*, 2424*, 2425*, terminator* Y-cable: o base to adapter; o 16-bit long leg to - 7135-110 cable; o 16-bit short leg is terminated or is connected to a 2424 or 2425 cable to add additional processors (>2 processors) Terminator, 16-bit, included when the Y-cable is ordered. Terminator, 8 bit, included when the Y-cable is ordered. Cable can be used to attach a third and fourth system to a shared differential 16 bit bus. 2424 (52G4291) 2425 (52G4233)
N/A* Y-cable (61G8324) (2426*) N/A Y-cable (52G7350) (2427*) 2424*/2425*Y-cable (2426*)
self
self
.6 2.5
CONFIGURED ON 7204-215 ********************** 2854/2921 Y-cable 7204-215 (2422, 2427*) 2848 7204-215 (74G8511) 7204-215
Needed on 7204-215 at each end of the shared unit. 0.6 2854 (87G1358) 4.75 2921 (67G0593) 2.0 Used between 7204-215 s on the shared string.
CONFIGURED ON 7204-315, 7204-317, 7204-325 ****************************************** 2845*/2846* Y-cable 7204-315*, 0.6 2845 (52G4291) (2426*) 7204-317*, 2.5 2846 (52G4233) 7204-325* Needed on 7204-3XX at each end of the shared unit. 2845*/2846* 7204-315* 7204-315*, 0.6 2845 (52G4291) 7204-317* 7204-317*, 2.5 2846 (52G4233) 7204-325* 7204-325* Used between 7204-3XX s on the shared string. CONFIGURED ON 9334-011 ********************** 2921/2923 Y-cable 9334-011 (2422, 2427*)
Needed on 9334-011 at each end of the shared unit. 4.75 2921 (67G0593) 8.0 2923 (95X2494) To conform to the cable length limit, the 8.0 meter cable must be paired with the 4.75 meter cable. 2.0 Allows daisy chaining of two 9334-011 enclosures
2939 9334-501 9334-501 (95X2498) CONFIGURED ON 7134-010 ********************** 2902-2918* Y-cable 7134-010* (2426*)
Needed on 9334-501 at each end of the shared unit. 1.48 2931 (70F9188) 2.38 2933 (45G2858) 4.75 2935 (67G0566) 8.0 2937 (67G0562) To conform to the cable length limit, the 8.0 meter cable must be paired with a shorter cable. 2.0 Allows daisy chaining of two 9334-501 enclosures
2.4 4.5 12.0 14.0 18.0 CONFIGURED ON 7135-110 AND 7135-210 *********************************** 2919 Y-cable 7135 0 (61G8323) (2422) cable* 2901*-14* 2919, Y-cable (2426*) 7135@
Needed on 7134-010 at each end of the shared unit. 2902 (88G5750) 2905 (88G5749) 2912 (88G5747) 2914 (88G5748) 2918 (88G5746)
Cable interposer; connects 8 bit Y-cable to 16 bit 29XX cable for 7135 Connects 7135 array controller to an interposer (2919) or to a 16 bit Y-cable 2901 (67G1259) 2902 (67G1260) 2905 (67G1261) 2912 (67G1262) 2914 (67G1263) 2918 (67G1264) To conform to the cable length limit, the 12m, 14m and 18m cables must be paired with shorter cables.
CONFIGURED ON 3514 ****************** 2002* Y-cable (2422*) 2014* 3001* Y-cable (2426*) 3514*
3514@
4.0
3514@ 3514@
4.0 2.0
Needed on 3514 at each end of the shared unit (8-bit to 16-bit cable) Needed on 3514 at each end of the shared unit Allows daisy chaining of two 3514 units
7137@
4.0
7137@
4.0
Needed on 7137 at each end of the shared unit (8-bit to 16-bit cable) Needed on 7137 at each end of
3001*
(2426*) 7137*
7137@
2.0
Notes:
------
1 - After configuring a SCSI-2 differential bus for the HACMP environment, use the following checklist to validate the configuration:
    - At least two and no more than four processors are attached to the bus.
    - Only SCSI-2 differential cables, adapters and devices were used.
    - A Y-cable is attached to each processor on the bus.
    - The bus must have a terminator on the short leg of each Y-cable which is at the end of the bus (total of 2 terminators per bus).
    - 8 bit wide and 16 bit wide enclosures can not be used on the same bus.
    - You must not exceed maximum SCSI-2 differential bus lengths, including the cabling within enclosure cabinets. Cable lengths within enclosure cabinets are:
        - 7204-215   nil
        - 7204-315   nil
        - 7204-317   nil
        - 7204-325   nil
        - 9334-011   3.1 meters
        - 9334-501   2.66 meters
        - 7134-010   3.0 meters/bus
        - 7135-110   0.66 meters/controller
        - 7135-210   0.66 meters/controller
        - 3514-2XX   1.0 meters
        - 7137-XXX   0.2 meters
    The publication Common Diagnostics and Service Guide (SA23-2687) contains additional information about cabling.
2 - For a given cable, any item listed in the Attachd From column can be connected to any item in the Attachd To column. Y-cables do not follow this rule; they have three legs and the above tables show what connects to each of the legs.
3 - The configurations in this table assume that processors are at the two ends of the bus (just prior to each terminator) and all the storage devices are connected to the bus between the processors.
4 - The recommended 7135 configuration for HACMP is:
    - Two controllers on the 7135, each controller on a separate SCSI-2 differential bus
    - Each controller is attached to every processor in the cluster.
    This yields two different SCSI-2 differential buses; each bus is connected to one controller and to every processor in the cluster. The Disk Array Manager software in the processors manages access to the different controllers and will switch controllers if one of the controllers fails; this occurs independently of HACMP.
5 - SCSI buses can not include non-disk devices (i.e. tape, CD ROM).
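The bus length item in the checklist is a sum compared against the adapter's maximum cable length. The sketch below adds up a list of lengths in meters and checks them against a limit; the function name bus_length_ok is hypothetical, and the example assumes that the Y-cable lengths count towards the end-to-end total. The figures used (0.765 m Y-cables, 8.0 m and 4.75 m cables, 3.1 m inside a 9334-011, 19 m limit for the 2420) are taken from the tables in this section.

```shell
#!/bin/sh
# Hypothetical helper: check a SCSI-2 differential bus against its limit.
# usage: bus_length_ok MAX_METERS LENGTH [LENGTH ...]
# Prints the total and returns non-zero when the limit is exceeded.
bus_length_ok() {
    max=$1
    shift
    printf '%s\n' "$@" | awk -v max="$max" '
        { total += $1 }                      # one length per input line
        END {
            printf "total=%.2f m, limit=%.2f m\n", total, max
            exit (total > max)               # exit status 1 when too long
        }'
}

# Two Y-cables, an 8.0 m and a 4.75 m cable, one 9334-011 enclosure:
bus_length_ok 19 0.765 0.765 8.0 4.75 3.1 && echo "within limit"

# Two 8.0 m cables on the same bus would exceed the 19 m limit:
bus_length_ok 19 0.765 0.765 8.0 8.0 3.1 || echo "too long"
```

Under these assumptions the second case comes out above 19 m, which is consistent with the rule that the 8.0 meter cable must be paired with the 4.75 meter cable.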
ADAPTERS
--------
                     HACMP Rlse
Feature              ----------
(FRU #)      MBPS    NC    CC     Notes
---------    ----    ---   ---    -------
6210         8       1.1   1.2    (1,2,3)
(52G1071)
6211         8       1.1   1.2    (1,2,3)
(00G3357)
6212         8       1.2   1.2    (1,2,3)
(67G1755)

Notes:
------
1 - Only serial devices can be attached to a serial adapter.
2 - For serial adapters the maximum cable length is measured from the adapter to the subsystem controller. The cabling which might be within a subsystem is not included.
3 - Serial adapters contain four serial link connectors to allow the attachment of up to four serial subsystems (e.g. four 9333s). Data transfer rates on the microchannel side of the adapter are:
    6210 - 40 MBPS, used for 9333 Model 010 or Model 500
    6211 - 80 MBPS, used for 9333 Model 010 or Model 500
    6212 - 40 or 80 MBPS, used for 9333 Model 011, Model 501, Model 010 or Model 500
ENCLOSURES
----------
            #                       Media    HACMP Rlse
            Dsk    Size    Disk     Rate     ----------
Model       Drv    -GB-    Feat     MBPS     NC    CC     Notes
--------    ---    -----   ----     ----     ---   ---    -----
9333-010    4      .857    3100     3.0      1.1   1.2
            4      1.07    3110     3.0      1.1   1.2
9333-011    4      .857    3100     3.0      1.2   1.2
            4      1.07    3110     3.0      1.2   1.2
            4      2.0     3120     5.22     1.2   1.2
9333-500    4      .857    3100     3.0      1.1   1.2
            4      1.07    3110     3.0      1.1   1.2
9333-501    4      .857    3100     3.0      1.2   1.2
            4      1.07    3110     3.0      1.2   1.2
            4      2.0     3120     5.22     1.2   1.2
Notes: -----1 - The following table shows AIX Release 3.2.3E -----HACMP Release 1.2 -----Configuration NC CC -- -9333 010/500 2 2 PTF # - 88
N -
N -
2 -
2 a
2 -
2 -
2 -
2 b
4 -
4 c
N = Not supported
2 = 2-way is supported, if PTF# is not specified then the support is in the base system. Under AIX 3.2.4 Feature codes 4001 and 4002 of the 9333-011 and -501 subsystem are not permitted.
4 = 2-, 3- and 4-way are supported, if PTF# is not specified then the support is in the base system. If either 3- or 4-way is desired then Feature 4001 must be installed on the 9333-011 or -501.
a = U421401 or supersede
b = U425614 or supersede
c = U426577 or supersede

2 - 9333 Models 010 and 500 come standard with two ports connected to one controller card; the controller card controls up to 4 disks inside the enclosure. The ports can be connected to two different hosts using one serial link connector on each host adapter. An upgrade is available to go from a 9333 Model 010 to a 9333 Model 011, or from a 9333 Model 500 to a 9333 Model 501.
3 - 9333 Models 011 and 501 come standard with two ports connected to one controller card; the controller card controls up to 4 disks inside the enclosure. The ports can be connected to two different hosts using one serial link connector on each host adapter. With the 9333 Models 011 or 501, the number of attachable hosts can be expanded by ordering the appropriate expansion features, either to 4 systems (feature 4001) or to 8 systems (features 4001 and 4002).
4 - The data transfer rate for a serial bus is 8 MB/sec.
CABLES
------
Notes:
------
1 - There are no special cabling requirements for HACMP for AIX. The publication Common Diagnostics and Service Guide (SA23-2687) contains information about cabling serial buses.
2 - Each 9333 enclosure comes standard with one attachment cable. Additional cables need to be ordered to attach it to more than one system.
ADAPTERS
--------
                     HACMP Rlse
Feature              ----------
(FRU #)      MBPS    NC    CC     Notes
---------    ----    ---   ---    -------
6214         80      (1)   (1)    (1,2)

Notes:
------
1 - The 6214 adapter is supported on HACMP Version 3.1.1 only; APAR IX52776 is required.
2 - Only two 6214 adapters can be put into a single SSA loop; one in each processor in the cluster.
ENCLOSURES
----------
            #                       Media    HACMP Rlse
            Dsk    Size    Disk     Rate     ----------
Model       Drv    -GB-    Feat     MBPS     NC    CC     Notes
--------    ---    -----   ----     -----    ----  ----   -----
7133-010    16     1.1     31XX     35       (1)   (1)    (1,2,3)
            16     2.2     32XX     35       (1)   (1)    (1,2,3)
            16     4.5     34XX     35       (1)   (1)    (1,2,3)
7133-500    16     1.1     31XX     35       (1)   (1)    (1,2,3)
            16     2.2     32XX     35       (1)   (1)    (1,2,3)
            16     4.5     34XX     35       (1)   (1)    (1,2,3)

Notes:
------
1 - The 7133-010 and 7133-500 are supported on HACMP Version 3.1.1 only; APAR IX52776 is required.
2 - The disk features are YYXX where YY is as shown in the table above and XX is 01, 08 or 16 for one, eight or sixteen
3 - Up to 96 disks can be supported in a single SSA loop.

CABLES
------
Notes:
------
1 - There are no special cabling requirements for HACMP. The publication Common Diagnostics and Service Guide (SA23-2687) contains information about cabling.
ATTACHMENT A
Attachment A contains the SCSI-1 SE and SCSI-2 SE device support. Existing HACMP configurations using SCSI SE devices continue to be supported. New HACMP installations must use SCSI-2 differential or serial devices due to the unavailability of the PTT cables. The SCSI SE PTT cables (FC 2914 and FC 2915) are available via an RPQ but only with prior Austin lab approval of the specific configurations. Two of these cables are required for a minimum HACMP configuration. None of the equipment in this attachment can be configured in a new HACMP installation.
ADAPTERS
--------
                           T   Maximum     HACMP Rlse
Feature                    Y   Cable       ----------
(FRU #)             MBPS   P   Length      NC    CC    Notes
-----------------   ----   -   ---------   ---   ---   -----------
2835                4      1   6 m         1.1   N/A   (1,2,3,4)
(31G9729)
2410                10     2   4.75 m      1.2   N/A   (1,2,3,5)
(52G5484 52G7509)
2415                20         note 7      N/A   N/A   (1,2,3,6,7)
Notes:
------
1 - Eight external device addresses are available on these buses. In an HACMP environment two of the addresses are used for hosts, so the bus can have up to six other devices (subject to cabling length constraints).
2 - Only SCSI SE devices can be attached to a SCSI SE adapter.
3 - Cable length is measured from one end of the bus to the other and includes the cabling which is within any attached disk subsystem enclosures.
4 - In an HACMP environment the 2835 adapter can only be used with SCSI-1 SE disk enclosures. The minimum assembly numbers which can be used for an HACMP configuration are part #31G9722 and Field Replaceable Unit (FRU) #31G9729. For HACMP configurations the 50 position card edge terminator must be removed, and the jumper J1 must be removed. The removed jumper can be moved over and attached to only one row of pins for storage, the row furthest from the external SCSI connector.
5 - In an HACMP environment the 2410 adapter can only be used with the 7203 and/or 7204 enclosures utilizing the 1 GB SCSI-2 SE disk (7203-001 with feature 2320 or 7204-001). For HACMP configurations the 50 position card edge terminator must be removed, and the jumper P3 must be removed. The removed jumper can be moved over and attached to only one row of pins for storage, the row furthest from the external SCSI connector.
6 - This adapter can execute in either 8 bit or 16 bit mode; a SMIT option exists to set the adapter to the desired width. All the devices on the bus must be of the same type.
7 - Maximum cable length varies with the configuration:
    - 6m when attached to 9334-500
    - 3m when attached to anything else.
ENCLOSURES
----------
           T   #     #                   Trans. Rate
           Y   Per   Dsk   Size   Disk      MBPS        HACMP Rlse
Model      P   Bus   Drv   -GB-   Feat   Media   Bus    NC    CC     Notes
--------   -   ---   ---   -----  ----   -----   ---    ----  ---    ------
7203-001   1   4     1     .355   2300   1.87    4      1.1   N/A    (5)
           1   4     1     .670   2310   1.87    4      1.1   N/A    (5)
           2   2     1     1.0    2320   5.0     5      1.2   N/A    (3,5)
7204-320   1   5     1     .320          2.0     4      1.1   N/A
7204-001   2   2     1     1.0           3.0     5      1.2   N/A    (3,5)
7204-010   2         1     1.0           3.0     5      N/A   N/A
9334-010   1         4     .670   2510   1.87    4      N/A   N/A    (1)
           1         4     .857   2530   3.0     4      N/A   N/A    (1)
           1         4     1.37   2570   4.5     5      N/A   N/A    (1)
           2         4     2.0    2580   5.22    10     N/A   N/A    (1)
           2   -     3+1   2.4    2590   3.0     10     N/A   N/A    (1,4)
           2               1.0    2555   3.0     10     N/A   N/A    (1,4)
9334-500   1   1     4     .670   2510   1.87    4      1.1   N/A    (6)
           1   1     4     .857   2530   3.0     4      1.1   N/A    (6)
           1   1     4     1.37   2570   4.5     5      1.2   N/A    (2,6)
           2         4     2.0    2580   5.22    10     N/A   N/A
           2   -     3+1   2.4    2590   3.0     10     N/A   N/A    (4)
           2               1.0    2555   3.0     10     N/A   N/A    (4)
Notes:
------
1 - The internal cabling of the 9334-010 makes it unsuitable for sharing between systems. Therefore it is not supported by HACMP. Only the 9334-500 is supported, with the features as noted in the table above.
2 - Disk fencing must not be enabled in an HACMP environment unless the fix documented in the HACMP Version 1.2 Release Notes is applied.
3 - For use with HACMP in a twin-tailed environment, 1 GB disks for the 7203 and 7204 enclosures (7203-001 with feature 2320, 7204-001) are only tested and supported using the SCSI-2 SE adapter (feature 2410).
4 - The 2590, which uses two bus addresses, is two 1.2 GB disks within a single package. The 2555 drive is available only as the fourth drive within a 9334 which contains three 2590s.
5 - The limitation in the table under # Per Bus is not a cabling limitation but a testing limitation, and only the specified number of devices is supported on the bus. (Cable limitations allow one more device to be connected than is shown.)
6 - The 9334-500 in an HACMP environment is supported only on the 2835 adapter.
CABLES -----Feature Attachd (Part #) Type From --------- ----------- ------3130 SCSI-1/2 SE 7203, (31F4222) 7204 2915 SCSI-1 SE (00G0959)
Attachd Len To (m) Notes -------- ---- -------------------------------7203, 0.66 Device-to-Device cable. 7204 Used between devices in a shared string. Adapter 7203, 1.57 Passthru terminator (2835) 7204 (PTT) cable, withdrawn from marketing. See note #4. Adapter 9334-500 1.48 Passthru terminator (2835) (PTT) cable, withdrawn from marketing. See note #4. Adapter 7203, 1.57 Passthru terminator (2410) 7204 (PTT) cable, withdrawn from marketing. See note #4.
Notes:
------
1 - After configuring a SCSI SE bus for the HACMP environment, use the following checklist to validate the configuration:
    - Two processors must be attached to the bus.
    - Only SCSI SE cables, adapters and enclosures can be used.
    - A shared SCSI SE bus requires two PTT cables, one attached to each adapter.
    - You must not exceed maximum SCSI SE bus lengths, including the cabling within enclosure cabinets.
      The SCSI SE maximum bus cable lengths are:
      - SCSI-1 SE 6 meters
      - SCSI-2 SE 4.75 meters
      Cable lengths within enclosure cabinets:
      - 7203 nil
      - 7204 nil
      - 9334-010 not supported by HACMP
      - 9334-500 2.66 meters
    The publication Common Diagnostics and Service Guide (SA23-2687) contains additional information about cabling.
2 - For a given cable, any item listed in the Attachd From column can be connected to any item in the Attachd To column
3 - SCSI bus can not include non-disk devices (i.e. tape, CD ROM)
4 - The PTT cables are available via an RPQ but only after the Austin lab approves the specific SCSI SE bus configuration(s) involved. FC 2915 is available via RPQ #8A0759; FC 2914 is available via RPQ #8A0758.
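The checklist in note 1 can be sketched as a small validation routine. This is only an illustration, not an IBM tool: the function name `check_se_bus` and its parameters are hypothetical, and the only figures it uses are the bus length limits and enclosure cabling lengths quoted in note 1.

```python
# Limits taken from note 1 above (meters).
MAX_BUS_LENGTH_M = {"SCSI-1 SE": 6.0, "SCSI-2 SE": 4.75}
# Cabling inside each supported enclosure cabinet (meters), also from note 1.
INTERNAL_CABLE_M = {"7203": 0.0, "7204": 0.0, "9334-500": 2.66}


def check_se_bus(bus_type, processors, ptt_cables, cable_lengths_m, enclosures):
    """Return a list of checklist violations; an empty list means the bus passes.

    Hypothetical helper: the argument names are assumptions made for this sketch.
    """
    if bus_type not in MAX_BUS_LENGTH_M:
        return ["unknown bus type: %s" % bus_type]
    problems = []
    if processors != 2:
        problems.append("two processors must be attached to the bus")
    if ptt_cables != 2:
        problems.append("two PTT cables are required, one attached to each adapter")
    for model in enclosures:
        if model not in INTERNAL_CABLE_M:
            problems.append("enclosure %s is not in the supported list" % model)
    # Bus length = external cables plus the cabling inside enclosure cabinets.
    total = sum(cable_lengths_m) + sum(
        INTERNAL_CABLE_M.get(model, 0.0) for model in enclosures
    )
    limit = MAX_BUS_LENGTH_M[bus_type]
    if total > limit:
        problems.append("bus length %.2f m exceeds the %.2f m limit" % (total, limit))
    return problems


if __name__ == "__main__":
    # Two 1.48 m PTT cables into one 9334-500 on a SCSI-1 SE bus.
    print(check_se_bus("SCSI-1 SE", 2, 2, [1.48, 1.48], ["9334-500"]))
```

The cable lengths passed in are whatever the installer measures or reads from the cable table; the routine only adds them to the internal enclosure cabling and compares the total with the limit for the bus type.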
PROCESSORS
7009-C10 7009-C20 7011-22W 7011-220 7011-23S 7011-23T 7011-23W 7011-230 7011-25S 7011-25T 7011-25W 7011-250 7012-32E 7012-32H
7012-320 7012-34H 7012-340 7012-350 7012-355 7012-36T 7012-360 7012-365 7012-37T 7012-370 7012-375 7012-380 7012-39H 7012-390
7013-52H 7013-520 7013-53E 7013-53H 7013-530 7013-540 7013-55E 7013-55L 7013-55S 7013-550 7013-56F 7013-560 7013-57F 7013-570
7013-58F 7013-58H 7013-580 7013-59H 7013-590 7013-591 7015-R10 7015-R20 7015-R21 7015-R24 7015-930 7015-95E 7015-950 7015-97B
7015-97E 7015-97F 7015-970 7015-98B 7015-98E 7015-98F 7015-980 7015-99E 7015-99F 7015-99J 7015-99K 7015-990
Symmetric Multi-Processors 7012-G30, 7013-J30, 7015-R30 and 7015-R3U
9076 Scalable POWERParallel Platforms (SP/2) - supported on HACMP Version 3.1.1 but not HACMP Version 4.1
Asynchronous Communication Adapters
===================================
FC 2930 - 8 Port Async Adapter - EIA-232
FC 2950 - 8 Port Async Adapter - MIL-STD 188
FC 2955 - 16 Port Async Adapter - EIA-232
FC 6400 - 64 Port Async Controller
FC 8128 - 128 Port Async Controller
Local Area Network (LAN) Communication Adapters
===============================================
FC 2402 - Network Terminal Accelerator - High performance ethernet adapter permitting up to 256 login sessions when used in conjunction with a 7318 Model S20 Serial Communications Network Server. HACMP supports only the MAC Layer Interface for the adapter, not the HTY functionality.
FC 2403 - Network Terminal Accelerator - High performance ethernet adapter permitting up to 2048 login sessions when used in conjunction with a 7318 Model S20 Serial Communications Network Server. HACMP supports only the MAC Layer Interface for the adapter, not the HTY functionality.
FC 2720 - Fiber Distributed Data Interface Adapter
FC 2722 - Fiber Distributed Data Interface Dual Ring Upgrade KIT
FC 1906 - Fiber Channel Adapter/266
FC 2723 - FDDI / Fiber Dual-Ring Upgrade
FC 2724 - FDDI - Fiber Single-Ring Adapter
FC 2725 - FDDI - STP Single-Ring Adapter
FC 2726 - FDDI - STP Dual-Ring Upgrade
FC 2970 - Token-Ring High-Performance Network Adapter
FC 2972 - Auto Token-Ring Lanstreamer 32 MC Adapter
FC 2980 - Ethernet High-Performance LAN Adapter
FC 4224 - Ethernet 10BASET Transceiver (Twisted Pair)
RS-232 Serial Network
=====================
FC 3107 - C10 Serial Port Converter
FC 3124 - 3.7 Meter Serial to Serial Port Cable
FC 3125 - 8 Meter Serial to Serial Port Cable
Other Adapters / Subsystems
===========================
7318-P10 Serial Communications Network Server - allows attachment of async devices and parallel printers to an Ethernet LAN attached RISC System/6000 (Most commonly concerned with HACMP configurations when used with FC 2402/3 Network Terminal Accelerator)
7318-S20 Serial Communications Network Server - allows attachment of async devices and parallel printers to an Ethernet LAN attached RISC System/6000 (Most commonly concerned with HACMP configurations when used with FC 2402/3 Network Terminal Accelerator)
FC 2860 - Serial Optical Channel Converter
FC 4018 - High Performance Switch (HPS) Adapter-2 - supports node fallover on an SP/2
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = end of document = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Feature 3124 (Part number 88G4853) - 3.7 meter cable
Feature 3125 (Part number 88G4854) - 8.0 meter cable
Each of these cables has the null modem pinout connections required to make a direct connection between serial ports.
[Figure: RS232 port cabling. Labels in the figure: CPU RS232 port; (*) 59F3740, connect 10-pin to 25-pin (30cm long); 6323741; 58F2861; terminal/printer interposer.]
Id       Label               Type CL Error_Description
00530EA6 DMA_ERR             UNKN H  UNDETERMINED ERROR
01F2D769 X25_ALERT25         PERM H  X-25 RESTART REQUEST BY X.25 ADAPTER
0299F00B FDDI_NOMBUFS        TEMP S  RESOURCE UNAVAILABLE
03348B46 CXMA_MEM_BD         PERM S  Cant Allocate bd_t Structures
0375DFC2 X25_ALERT9          TEMP H  X-9 FRAME TYPE W RECEIVED
038F2580 SCSI_ERR7           UNKN H  UNDETERMINED ERROR
038F3117 MPQP_DSRDRP         TEMP H  COMMUNICATION PROTOCOL ERROR
03ACD152 NB20                PERM S  SOFTWARE PROGRAM ERROR
04B1C8C0 VCA_INITZ           TEMP S  Host independent initialization failed
0502F666 SCSI_ERR1           PERM H  ADAPTER ERROR
069DB93B MEM2                PERM H  Memory failure
06ABB2EB COM_CFG_BUST        PERM S  Configuration failed: bad bus type
06CC7029 CXMA_CFG_FEPOS      PERM S  Adapter FEPOS Execution Failed
0733FFA0 SDA_ERR2            TEMP H  STORAGE SUBSYSTEM FAILURE
0734DA1D DISKETTE_ERR3       PERM H  DISKETTE MEDIA ERROR
08502E29 FDDI_TRACE          PERM H  ADAPTER ERROR
0873CF9F TTY_TTYHOG          TEMP S  ttyhog over-run
087468D0 PSLA002             TEMP S  SOFTWARE PROGRAM ERROR
08784A20 TOK_RMV_ADAP2       TEMP S  REMOVE ADAPTER COMMAND RECEIVED
0A667C32 WHP0001             TEMP S  SOFTWARE PROGRAM ERROR
0A940597 NB9                 TEMP S  SOFTWARE PROGRAM ERROR
0C1EC9FA LVM_SA_WRTERR       UNKN H  Failed to write Volume Group Status Area
0CACEC26 RS_PROG_IOCC        UNKN S  Software error: iocc not configured
0CFAD921 RS_PROG_SLIH        UNKN S  Software error: cannot find slih
0D5C1698 X25_ALERT33         PERM H  X-33 (DCE) RESET INDICATION X.25 ADAPTER
0E017ED1 MEMORY              PERM H  Memory failure
0E37FE58 LVM_BBEPOOL         UNKN H  Bad block relocation failure - PV no lon
0EC7E7E5 EPOW_RES            UNKN H  Electrical power resumed
0F27AAE5 CORE_DUMP           PERM S  SOFTWARE PROGRAM ABNORMALLY TERMINATED
0F568474 IENT_ERR2           TEMP S  CONFIGURATION OR CUSTOMIZATION ERROR
103F1912 X25_ALERT34         TEMP H  X-34 (DCE) RESTART INDICATION X.25 ADAPT
10C6CED6 MPQP_RCVERR         TEMP H  COMMUNICATION PROTOCOL ERROR
1104AA28 SYS_RESET           TEMP S  System reset interrupt received
1251B5B7 LION_HRDWRE         PERM S  Cannot access memory: 64 port controller
13881423 SCSI_ERR4           TEMP H  MICROCODE PROGRAM ERROR
13C8A0AA NB22                PERM S  SOFTWARE PROGRAM ERROR
150ACBA4 X25_ALERT39         TEMP H  X-39 (DCE) TIMEOUT ON CLEAR IND, T13
1581762B DISK_ERR4           TEMP H  DISK OPERATION ERROR
1588DDD9 CDROM_ERR3          PERM H  OPTICAL DISK DRIVE ERROR
160544E1 SLA_DRIVER_ERR      PERM H  SLA LINK CHECK fault in laser driver
1642B5A7 X25_ALERT26         PERM H  X-26 TIMEOUT ON RESTART REQUEST, T20
173D5818 CDROM_ERR7          TEMP H  OPTICAL DISK DRIVE ERROR
17A1F1E4 ACPA_INITZ          TEMP S  Host independent initialization failed
18A546CD LVM_BBDIRERR        UNKN H  Bad block relocation failure - PV no lon
18B25E18 ACPA_INTR2          TEMP S  Unexpected interrupt
192AC071 ERRLOG_OFF          TEMP O  Error logging turned off
1A1D42F9 ACPA_LOAD           TEMP S  Failed loading microcode
1A2E7186 LVM_MISSPVADDED     UNKN S  Physical volume defined as missing
1A660730 C327_START          PERM S  C327 Start error
1A9465A3 LVM_MWCWFAIL        UNKN H  Mirror Write Cache write failed
Copyright IBM Corp. 1995
1AC82784 LVM_SA_FRESHPP      UNKN S  Physical partition marked active
1B1647DF MPQP_XMTUND         PERF H  COMMUNICATIONS UNDERRUN
1CCD189F NB21                PERM S  SOFTWARE PROGRAM ERROR
1D5588BE WHP0013             TEMP S  SOFTWARE PROGRAM ERROR
1E629BB1 RS_8_16_ARB         PERM S  Invalid 8/16 port arbitration register
1F05D2DE FDDI_DWNLD          TEMP H  MICROCODE PROGRAM ABNORMALLY TERMINATED
1FD6C71A X25_ALERT32         PERM H  X-32 (DCE) CLEAR INDICATION X.25 ADAPTER
20188DE1 TOK_WIRE_FAULT      PERM H  WIRE FAULT
20FAED7F DSI_PROC            PERM S  Data Storage Interrupt, Processor
21D5B396 NB28                TEMP S  SOFTWARE PROGRAM ERROR
21F54B38 DISK_ERR1           PERM H  DISK OPERATION ERROR
225E3B63 KERNEL_PANIC        TEMP S  SOFTWARE PROGRAM ABNORMALLY TERMINATED
22F7B47B RS_MEM_IOCC         PERM S  Cannot allocate memory: iocc structure
233E36D2 NB26                PERM S  SOFTWARE PROGRAM ERROR
24247FB2 WHP0006             TEMP S  SOFTWARE PROGRAM ERROR
24DCDBA8 NB24                TEMP S  SOFTWARE PROGRAM ERROR
25D74748 EU_DIAG_ACC         PERM S  Cannot perform destructive diagnostics
270CB959 VCA_INTR2           TEMP S  Unexpected interrupt
273FE0AC NB14                PERM S  SOFTWARE PROGRAM ERROR
27C1EFFF DSI_IOCC            PERM H  Data Storage Interrupt, IOCC
28935927 NLS_MAP             PERM S  SOFTWARE PROGRAM ERROR
289590AE NB13                PERM S  SOFTWARE PROGRAM ERROR
29202CA2 COM_MEM_SLIH        PERM S  Cannot allocate memory: slih structure
2929FD6D FDDI_RCVRY_EXIT     TEMP H  PROBLEM RESOLVED
29975223 COM_CFG_DEVD        PERM S  Configuration failed: devswdel failed
2A53071F FDDI_PATH_ERR       PERM H  ADAPTER ERROR
2A7392A2 COM_CFG_MNR         PERM S  Configuration failed: bad minor number
2AA90CCD CXMA_IO_ATT         PERM S  I/O Segment Attach Failed
2B60DD24 WHP0012             TEMP S  SOFTWARE PROGRAM ERROR
2B76062D MPQP_BFR            PERF S  OUT OF RESOURCES
2BFA76F6 REBOOT_ID           TEMP S  System shutdown by user
2C7CE30E EU_BAD_ADPT         PERM H  Expansion unit error
2CF9AB6C CFGMGR_MEMORY       UNKN S  Not enough memory for configuration mgr
2D3BDDD6 BADISK_ERR8         PERM H  DISK OPERATION ERROR
2DACEE65 FDDI_ADAP_CHECK     PERM H  ADAPTER ERROR
2F24221A ENT_ERR4            TEMP H  ADAPTER ERROR
2F65D788 X25_ALERT7          PERM H  X-7 MODEM FAILURE: ACU NOT RESPONDING
30911E21 X25_ALERT5          PERM H  X-5 MODEM FAILURE: DCD, DSR, CABLE
30F182A4 CDROM_ERR1          PERM H  OPTICAL DISK OPERATION ERROR
342CB115 FDDI_TX_ERR         TEMP H  ADAPTER ERROR
345707F5 TTY_INTR_HOG        TEMP H  PIO exception
34FC3203 CDROM_ERR2          TEMP H  OPTICAL DISK OPERATION ERROR
3503BDBA X25_ALERT30         PERM H  X-30 DIAGNOSTIC PACKET RECEIVED
35890E9F TOK_NOMBUFS         UNKN S  RESOURCE UNAVAILABLE
358D0A3E DOUBLE_PANIC        TEMP S  SOFTWARE PROGRAM ABNORMALLY TERMINATED
35BE4BC0 IENT_ERR1           TEMP H  ADAPTER ERROR
35BFC499 DISK_ERR3           PERM H  DISK OPERATION ERROR
36C3328B ATE_ERR1            PERM S  COMMUNICATION PROTOCOL ERROR
3766B2C7 FDDI_BYPASS         PERM H  ADAPTER ERROR
384E0485 BADISK_ERR1         TEMP H  DISK OPERATION ERROR
39DCD110 SLA_PROG_ERR        TEMP S  SLA programming check
3A30359F INIT_RAPID          TEMP S  SOFTWARE PROGRAM ERROR
3A58ABE2 RS_PIN_IOCC         PERM S  Cannot pin memory: iocc structure
3A67AFE0 ATE_ERR6            PERM S  COMMUNICATION PROTOCOL ERROR
3A9C2352 DISKETTE_ERR2       UNKN H  DISKETTE DEVICE FAILURE
3B145117 IENT_ERR4           UNKN S  UNDETERMINED ERROR
3C19F251 NB2                 TEMP S  SOFTWARE PROGRAM ERROR
3CFF4028 DISK_ERR5           UNKN H  UNDETERMINED ERROR
3D858A1B MEM1                PERM H  Memory failure
3EC3C657 COM_CFG_NADP        PERM S  Configuration failed: adapter missing
3F86401A LION_BOX_DIED       PERM H  Lost communication: 64 port concentrator
419D40C2 NB23                PERM S  SOFTWARE PROGRAM ERROR
4224BA8C WHP0008             TEMP S  SOFTWARE PROGRAM ERROR
4287A984 COM_CFG_BUSID       PERM S  Configuration failed: bad bus id range
43D4ADCE TTY_PARERR          TEMP S  Parity/Framing error on input
44CB9ECE MPQP_DSRTO          TEMP H  UNABLE TO COMMUNICATE WITH DEVICE
4523CAA9 CMDLVM              PERF H  DISK OPERATION ERROR
476B351D TAPE_ERR2           PERM H  TAPE DRIVE FAILURE
47E84916 IENT_ERR5           UNKN S  COMMUNICATIONS SUBSYSTEM FAILURE
484F5514 NB6                 TEMP S  SOFTWARE PROGRAM ERROR
4865FA9B TAPE_ERR1           PERM H  TAPE OPERATION ERROR
4A29D32A MACHINECHECK        PERM H  Machine Check
4A4FBE2B NB16                TEMP S  SOFTWARE PROGRAM ERROR
4AB56573 CAT_ERR2            PERM S  MICROCODE PROGRAM ERROR
4B0E39BB CXMA_MEM_CH         PERM S  Cant Allocate ch_t Structures
4C2BDA1E NB3                 TEMP S  SOFTWARE PROGRAM ERROR
4CEBE931 COM_CFG_UIO         PERM S  Configuration failed: resid not correct
4EDEF5A1 SCSI_ERR5           PERM S  SOFTWARE PROGRAM ERROR
4F3E9630 INIT_UNKNOWN        TEMP S  SOFTWARE PROGRAM ERROR
4F515DF0 WHP0005             TEMP S  SOFTWARE PROGRAM ERROR
504B04D3 NB18                PERM S  SOFTWARE PROGRAM ERROR
506E5213 ACPA_IOCTL2         TEMP S  Invalid ioctl request
50CA5315 LION_BUFFERO        TEMP S  Buffer overrun: 64 port concentrator
5114C792 COM_CFG_IFLG        PERM S  Configuration failed: bad interrupt flag
51F9313A NB17                PERM S  SOFTWARE PROGRAM ERROR
52DB7218 SCSI_ERR6           TEMP S  SOFTWARE PROGRAM ERROR
532D1C49 TOK_DOWNLOAD        PERM H  MICROCODE PROGRAM ABNORMALLY TERMINATED
53920B1F ACPA_IOCTL1         PERM S  Invalid ioctl request
5416CE51 COM_TEMP_PIO        TEMP H  PIO exception
544FF289 COM_CFG_SLIH        PERM S  Configuration failed: i_init of slih
54B73180 LVM_BBDIRFUL        UNKN H  Bad block relocation failure
54E423ED SCSI_ERR9           PERM H  Potential data loss condition
5529E45B X25_ALERT21         PERM H  X-21 CLEAR INDICATION RECEIVED
5537AC5F TAPE_ERR4           PERM H  TAPE DRIVE FAILURE
56816728 MPQP_CTSTO          TEMP H  COMMUNICATION PROTOCOL ERROR
57797644 X25_ADAPT           PERM H  ADAPTER ERROR
592D5E9D TOK_WRAP_TST        PERM H  OPEN FAILURE
59792439 X25_ALERT12         TEMP H  X-12 FRAME TYPE Z RECEIVED
59853D4A CXMA_CFG_TALLOC     PERM S  talloc failed
59D54E37 X25_ALERT16         TEMP H  X-16 FRAME TYPE Z SENT
5A48B4FF FDDI_RCVRY_TERM     PERM H  ADAPTER ERROR
5AE97EAA MSLA_PROTOCOL       TEMP S  COMMUNICATION PROTOCOL ERROR
5CC986A0 SCSI_ERR3           PERM H  MICROCODE PROGRAM ERROR
5CE03B80 INIT_OPEN           TEMP S  SOFTWARE PROGRAM ERROR
5CFBFA4A WHP0004             TEMP S  SOFTWARE PROGRAM ERROR
5D1F16FA CAT_ERR8            TEMP H  ADAPTER ERROR
5D66BBC4 DUMP_STATS          UNKN S  System dump
5DFEADCB LVM_HWREL           UNKN H  Hardware disk block relocation achieved
5E9573AA CXMA_ERR_ASSRT      PERM S  Driver Assert Message
5F504A40 SLA_SIG_ERR         PERM H  SLA LINK CHECK signal failure
60D5349F COM_PIN_SLIH        PERM S  Cannot pin memory: slih structure
618DB24A X25_ALERT24         PERM H  X-24 CLEAR REQUEST BY X.25 ADAPTER
627A4F55 BADISK_ERR3         TEMP H  DISK OPERATION ERROR
6297CA97 DUMP                TEMP H  Dump device error
66C3412B RS_MEM_EDGE         PERM S  Cannot allocate memory: edge structure
680A6C7C CXMA_CFG_PORT       PERM S  Bad Adapter I/O Port Address
684B0E5C LVM_BBDIR90         UNKN H  Bad block directory over 90% full
68F9701C CXMA_ADP_FAIL       PERM H  Async Adapter Failed
69221791 MSLA_START          TEMP S  OUT OF RESOURCES
6B0B47FA CFGMGR_LOCK         UNKN S  Could not acquire configuration lock
6D6B57F9 TOK_BAD_ASW         PERM H  MICROCODE PROGRAM ERROR
6F7D7290 X25_ALERT11         TEMP H  X-11 FRAME TYPE Y RECEIVED
6FD1189E X25_ALERT15         TEMP H  X-15 FRAME TYPE Y SENT
70559CAE NB4                 PERM S  SOFTWARE PROGRAM ERROR
71248BF5 ISI_PROC            PERM S  Instruction Storage Interrupt
7239AC3D FDDI_LLC_ENABLE     TEMP H  PROBLEM RESOLVED
72CBC436 TMSCSI_UNKN_SFW_ERR UNKN S  SOFTWARE PROGRAM ERROR
74533D1A EPOW_SUS            UNKN H  LOSS OF ELECTRICAL POWER
74E0CEA8 X25_IPL             PERM H  ADAPTER ERROR
760470A6 IENT_ERR3           TEMP H  Data Storage Interrupt, IOCC
76C9D063 DSI_SLA             PERM H  Data Storage Interrupt, SLA
770F9606 BADISK_ERR2         PERM H  DISK OPERATION ERROR
773D6C8E NB7                 TEMP S  SOFTWARE PROGRAM ERROR
77E0148A MEM3                PERM H  Memory failure
7873CE72 X25_ALERT31         PERM H  X-31 RESET INDICATION PACKET RECEIVED
794A4421 X25_ALERT37         TEMP H  X-37 (DCE) TIMEOUT ON RESET IND, T12
7993098B COM_CFG_UNK         PERM S  Configuration failed: bad adapter type
79FED1ED NB29                PERM S  SOFTWARE PROGRAM ERROR
7A9C71E6 X25_ALERT18         PERM H  X-18 UNEXPECTED DISC RECEIVED
7A9E20BB MPQP_XFTO           PERM H  ADAPTER ERROR
7AB881D9 MISC_ERR            UNKN H  Miscellaneous interrupt
7B3D4206 SLA_EXCEPT_ERR      PERM H  Internal serial link adapter exception
7BDD117A TOK_RCVRY_ENTER     TEMP H  ADAPTER ERROR
7C197591 SLA_FRAME_ERR       TEMP H  SLA LINK CHECK possible lost frame
7D1E4727 TOK_DUP_ADDR        TEMP S  OPEN FAILURE
7EF0A4FF CFGMGR_NONFATAL_DB  UNKN S  Configuration mgr nonfatal database err
7F0052C6 COM_CFG_UNPIN       PERM S  Configuration failed: unpincode failed
7FF45EC0 WHP0003             TEMP S  SOFTWARE PROGRAM ERROR
804055EB NB15                PERM S  SOFTWARE PROGRAM ERROR
804C1878 COM_CFG_RESID       PERM S  Configuration failed: resid not correct
80A357F9 INIT_CREATE         TEMP S  SOFTWARE PROGRAM ERROR
80F672FF CAT_ERR4            TEMP S  RESOURCE UNAVAILABLE
813E4B9A X25_ALERT10         TEMP H  X-10 FRAME TYPE X RECEIVED
81922194 X25_ALERT14         TEMP H  X-14 FRAME TYPE X SENT
835C5977 ACPA_INTR1          TEMP S  Interrupt handler registration failed
836A2443 X25_CONFIG          PERM H  X.25 CONFIGURATION ERROR
83E4C0B2 LVM_SWREL           UNKN H  Software disk block relocation achieved
84917289 LVM_BBRELMAX        UNKN H  Bad block relocation failure - PV no lon
84EE0148 MPQP_QUE            TEMP H  MPQP unable to access queue
861365E7 EU_CFG_NPLN         PERM S  Configuration failed: adapter missing
868921F2 TMSCSI_READ_ERR     TEMP H  Attached SCSI initiator error
86922CCD X25_ALERT27         PERM H  X-27 TIMEOUT ON RESET REQUEST, T22
89B52AA5 CONSOLE             PERM S  SOFTWARE PROGRAM ERROR
89C695BB ACPA_INTR4          TEMP S  Interrupt timed out
8B5D61E6 CXMA_MEM_TTY        PERM S  Cant Allocate tty_t Structures
8BBE428E TOK_BEACON3         TEMP S  TOKEN-RING TEMPORARY ERROR
8C0353CB MPQP_X21CECLR       PERM S  X.21 ERROR
8D2CC3AA MSLA_WRITE          TEMP S  ADAPTER ERROR
8DCE65AF FDDI_MC_ERR         TEMP H  ADAPTER ERROR
8DD34341 CDROM_ERR6          TEMP H  OPTICAL DISK DRIVE ERROR
8EA094FF CHECKSTOP           TEMP H  Checkstop
8FEF9795 DISKETTE_ERR6       PERM H  PIO exception
904C6053 VCA_INTR1           TEMP S  Interrupt handler registration failed
9060A2F8 CAT_ERR3            TEMP S  RESOURCE UNAVAILABLE
90809FD9 TOK_ERR10           PERM H  MANAGEMENT SERVER REPORTING LINK ERROR
91D6C4F8 CDROM_ERR5          PERM H  OPTICAL DISK DRIVE ERROR
91E8D590 TOK_MC_ERR          PERM H  ADAPTER ERROR
91F9700D LVM_SA_QUORCLOSE    UNKN H  Quorum lost, volume group closing
91FDA5E4 CFGMGR_OPTION       UNKN S  Invalid option: configuration manager
925A4C9B SLA_CRC_ERR         TEMP H  SLA LINK CHECK crc error
92A72C14 COM_CFG_ILVL        PERM S  Configuration failed: interrupt level
9359F226 LVM_MISSPVRET       UNKN S  Physical volume is now active
974CC901 X25_ALERT19         PERM H  X-19 DM RXD DURING LINK ACTIVATION
9844042C NB27                TEMP S  SOFTWARE PROGRAM ERROR
98A70F55 ENT_ERR5            UNKN S  RESOURCE UNAVAILABLE
98F39A90 TMSCSI_RECVRD_ERR   TEMP H  Attached SCSI target device error
99227331 ENT_ERR3            PERM H  ADAPTER ERROR
9A335282 EXCHECK_RSC         PERM H  External Check, DMA
9AD6AC9F VCA_INTR4           TEMP S  Interrupt timed out
9B55A553 FDDI_RMV_ADAP       PERM H  REMOVE ADAPTER COMMAND RECEIVED
9C7FE90B LION_MEM_ADAP       PERM S  Cannot allocate memory: adap structure
9D30B78E TTY_OVERRUN         TEMP S  Receiver over-run on input
9DBCFDEE ERRLOG_ON           TEMP O  Error logging turned on
9E45396D NB5                 TEMP S  SOFTWARE PROGRAM ERROR
A194D797 TOK_ERR15           UNKN H  ADAPTER ERROR
A28B68BD MSLA_ADAPTER        PERM H  ADAPTER ERROR
A386E435 ENT_ERR1            PERM H  ADAPTER ERROR
A38E8CF2 CDROM_ERR4          TEMP H  OPTICAL DISK DRIVE ERROR
A5417864 WHP0011             TEMP S  SOFTWARE PROGRAM ERROR
A668F553 DISK_ERR2           PERM H  DISK OPERATION ERROR
A6BAD8E6 CORRECTED_SCRUB     TEMP H  Memory scrubbing corrected ECC error
A741AD52 MPQP_DSROFFTO       TEMP H  UNABLE TO COMMUNICATE WITH DEVICE
A80659F3 WHP0014             TEMP S  SOFTWARE PROGRAM ERROR
A84C681B VCA_MEM             TEMP S  Failed pinning memory
A853F9CE EU_DIAG_MEM         PERM S  Cannot allocate memory: wrap buffer
A92AE715 DISKETTE_ERR1       TEMP H  DISKETTE OPERATION ERROR
A9844FEE EXCHECK_DMA         PERM H  External Check, DMA
A9ED5BB6 SDC_ERR1            PERM H  LINK ERROR
AA8AB241 OPMSG               TEMP O  OPERATOR NOTIFICATION
AAD5C121 TOK_AUTO_RMV        PERM H  AUTO REMOVAL
ABB81CD5 ENT_ERR2            TEMP H  COMMUNICATION PROTOCOL ERROR
ABEC9F35 TOK_RMV_ADAP1       PERM H  OPEN FAILURE
AC47FA8A X25_ALERT38         TEMP H  X-38 (DCE) TIMEOUT ON CALL IND, T11
ACDAE3FC TOK_ADAP_CHK        PERM H  UNABLE TO COMMUNICATE WITH DEVICE
AD682624 CDROM_ERR8          UNKN H  UNDETERMINED ERROR
AD917FBA MPQP_ASWCHK         PERM S  MICROCODE PROGRAM ERROR
AEC7B1B0 TOK_BEACON2         PERM H  TOKEN-RING INOPERATIVE
AFF4BD94 NB30                PERM S  SOFTWARE PROGRAM ERROR
B135AE8B SDA_ERR1            PERM H  STORAGE SUBSYSTEM FAILURE
B1462F15 SDC_ERR3            TEMP H  STORAGE SUBSYSTEM FAILURE
B18287F3 SDA_ERR4            TEMP H  UNDETERMINED ERROR
B188909A LVM_SA_STALEPP      UNKN S  Physical partition marked stale
B216DB3E COM_CFG_PORT        PERM S  Configuration failed: port configured
B29547EF CXMA_CFG_RST        PERM S  Adapter Reset Failed
B3683B72 FDDI_XCARD          PERM H  ADAPTER ERROR
B5982183 EU_CFG_BUSY         PERM S  Configuration failed: in use
B598ECB3 PSLA001             TEMP H  DEVICE ERROR
B617E928 TAPE_ERR6           TEMP H  TAPE OPERATION ERROR
B63E9C5E RS_BAD_INTER        PERM S  Interrupt from non-existant port
B6A6F2B7 CXMA_CFG_MPORT      PERM S  Bad or Missing Port on Adapter
B7164FA8 WHP0007             TEMP S  SOFTWARE PROGRAM ERROR
B73A1D33 X25_ALERT13         TEMP H  X-13 FRAME TYPE W RECEIVED
B73BC3CD DISKETTE_ERR4       UNKN H  DISKETTE OPERATION ERROR
B76A0A99 LION_CHUNKNUMC      TEMP S  Bad chunk count: 64 port controller
B7BF9C85 CXMA_CFG_MEM        PERM S  Bad Adapter Memory Address
B7F0EC53 NB10                TEMP S  SOFTWARE PROGRAM ERROR
B8892A14 DSI_SCU             PERM H  Data Storage Interrupt, SCU
BAB1383B NB8                 TEMP S  SOFTWARE PROGRAM ERROR
BAECC981 SDM_ERR1            PERM H  MICROCODE PROGRAM ERROR
BB5C513F ACPA_MEM            TEMP S  Failed pinning memory
BBA1D78B ACPA_UCODE          TEMP S  Failed loading microcode onto M-ACPA/A
BC8F0BBB COM_CFG_DEVA        PERM S  Configuration failed: devswadd failed
BDA444C8 SLA_PARITY_ERR      TEMP H  SLA buffer parity error
BE42630E REPLACED_FRU        PERM H  Repair action
BE7E5290 LION_PIN_ADAP       PERM S  Cannot pin memory: adap structure
BE7F0C5D COM_CFG_DMA         PERM S  Configuration failed: dma level conflict
BE910C7F CAT_ERR7            TEMP S  RESOURCE UNAVAILABLE
BF06FA0D FDDI_LLC_DISABLE    TEMP H  LAN ERROR
BF3F8438 PSLA003             TEMP H  LINK ERROR
BF6D9219 LION_UNKCHUNK       TEMP S  Unknown error code: 64 port concentrator
BF93B600 TOK_RCVRY_TERM      PERM H  ADAPTER ERROR
BFEA74DC CXMA_MEM_ATT        PERM S  Memory Segment Attach Failed
C0073BB4 TTY_BADINPUT        TEMP S  Bad ttyinput return
C0514A3F X25_ALERT35         TEMP H  X-35 (DCE) RESTART RESET RECEIVED
C1423E5B WHP0010             TEMP S  SOFTWARE PROGRAM ERROR
C14C511C SCSI_ERR2           TEMP H  ADAPTER ERROR
C2B80BFB X25_ALERT36         TEMP H  X-36 (DCE) TIMEOUT ON RESTART IND, T10
C580DED6 WHP0009             TEMP S  SOFTWARE PROGRAM ERROR
C5C09FFA PGSP_KILL           PERM S  SOFTWARE PROGRAM ABNORMALLY TERMINATED
C67E7D0F LVM_HWFAIL          UNKN H  Hardware disk block relocation failed
C6ACA566 SYSLOG              UNKN S  Message redirected from syslog
C6EB3E75 FDDI_SELF_TEST      TEMP H  LAN ERROR
C70E1E46 X25_ALERT17         PERM H  X-17 FRAME RETRY N2 REACHED
C88D3DD8 MPQP_X21CPS         PERM S  X.21 ERROR
C89DE914 C327_INTR           PERM S  C327 Interrupt error
C8F22E8E FLPT_UNAVAIL        PERM S  OPERATOR NOTIFICATION
C92F456F NB11                TEMP S  SOFTWARE PROGRAM ERROR
C9A0C741 X25_UCODE           PERM H  X.25 MICROCODE ERROR
C9E358D3 CXMA_LINE_ERR       PERM H  Synchronous Line Errors
C9F4EE17 EU_CFG_NADP         PERM S  Configuration failed: adapter missing
CBE1D1A5 LVM_SA_PVMISS       UNKN H  Physical volume declared missing
CBE25456 MSLA_INTR           TEMP S  COMMUNICATION PROTOCOL ERROR
CEDCB90F FDDI_PIO            TEMP H  PIO exception
CF4781D3 BADISK_ERR4         PERM H  DISK OPERATION ERROR
CFC1A4DD MPQP_ADPERR         PERM H  ADAPTER ERROR
CFCDE8F6 FDDI_DOWN           TEMP H  ADAPTER ERROR
CFFF77BD TOK_ADAP_ERR        PERM H  Potential data loss condition
D080E08D CAT_ERR5            TEMP H  ADAPTER ERROR
D2360951 TOK_CONGEST         PERF S  COMMUNICATIONS OVERRUN
D2B9B5A9 BADISK_ERR5         PERM H  DISK OPERATION ERROR
D3B0ECBF X25_ALERT8          PERM H  X-8 X.21 NOT CONNECTED
D3F26EC3 NB1                 TEMP S  SOFTWARE PROGRAM ERROR
D41B92E8 RS_PIN_EDGEV        PERM S  Cannot pin memory: edge vector
D62AAFD8 LVM_BBDIRBAD        UNKN H  Bad block relocation failure - PV no lon
D7BDE2AD INTR_ERR            UNKN H  UNDETERMINED ERROR
D7DDDC46 CAT_ERR1            PERM H  MICROCODE PROGRAM ABNORMALLY TERMINATED
D824DB48 VCA_INTR3           TEMP S  Invalid interrupt
D84B1C5B LION_MEM_LIST       PERM S  Cannot allocate memory: ttyp_t list
D8EA614B FDDI_USYS           UNKN S  UNDETERMINED ERROR
D9EE4AC1 EU_CFG_GONE         PERM S  Configuration failed: unconfigured
DA244DCA COM_CFG_PIN         PERM S  Configuration failed: pincode failed
DA80B2D4 NB12                PERM S  SOFTWARE PROGRAM ERROR
DB3E3DFD ENT_ERR6            PERM H  CSMA/CD LAN COMMUNICATIONS LOST
DB451F82 MPQP_RCVOVR         PERF H  COMMUNICATIONS OVERRUN
DBF56911 EU_CFG_HERE         PERM S  Configuration failed: already configured
DBF832FF LVM_BBFAIL          UNKN H  Bad block relocation failure - PV no lon
DD0E4902 TOK_RCVRY_EXIT      TEMP H  PROBLEM RESOLVED
DD11B4AF PROGRAM_INT         PERM S  Program Interrupt
DD2201A9 X25_ALERT28         PERM H  X-28 TIMEOUT ON CALL REQUEST, T21
DDBCA0EE VCA_IOCTL2          TEMP S  Invalid ioctl request
DFC508F5 PPRINTER_ERR1       UNKN H  PRINTER ERROR
E0EA14BF TOK_BEACON1         TEMP S  OPEN FAILURE
E180FD0E CXMA_CONC_DOWN      PERM H  Concentrator Removed From System
E18E984F SRC                 PERM S  SOFTWARE PROGRAM ERROR
E2109F7A COM_PERM_PIO        PERM H  PIO exception
E225351D CXMA_ERR_EVNT       PERM S  Event handler Failure
E252FE92 MPQP_X21DTCLR       PERM S  X.21 ERROR
E2A4EC26 RS_MEM_EDGEV        PERM S  Cannot allocate memory: edge vector
E2B9E02B TTY_PROG_PTR        UNKN S  Software error: t_hptr field invalid
E47E212E INIT_UTMP           TEMP S  SOFTWARE PROGRAM ERROR
E4EF0A90 WHP0002             TEMP S  SOFTWARE PROGRAM ERROR
E4F5F86E MPQP_IPLTO          PERM H  ADAPTER ERROR
E61501A6 MPQP_X21TO          TEMP H  X.21 ERROR
E64EC259 TAPE_ERR3           PERM H  TAPE DRIVE FAILURE
E6599C95 X25_ALERT23         PERM H  X-23 RESET REQUEST BY X.25 ADAPTER
E6784BC4 X25_ALERT29         PERM H  X-29 TIMEOUT ON CLEAR REQUEST, T23
E6CDBCFC CFGMGR_PROGRAM_NF   UNKN S  Program or method not found
E70473E7 VCA_IOCTL1          PERM S  Invalid ioctl request
E79A3C09 ACPA_INTR3          TEMP S  Invalid interrupt
E7D0FE3F RS_PIN_EDGE         PERM S  Cannot pin memory: edge structure
E7E2E3E9 NLS_BADMAP          PERM S  Software error: NLS map corrupted
E85C5C4C HFTERR              PERM S  SOFTWARE PROGRAM ERROR
E9645CC5 FDDI_RCV            UNKN H  ADAPTER ERROR
E97374FF RS_MEM_PVT          PERM S  Cannot allocate memory: priv. structure
EA388E60 X25_ALERT22         PERM H  X-22 RESTART INDICATION RECEIVED
EB5F98B2 RCMERR              PERM S  SOFTWARE PROGRAM ERROR
EE18DF01 TMSCSI_CMD_ERR      TEMP H  Attached SCSI target device error
EE8BC5D8 CXMA_CFG_BIOS       PERM S  Adapter BIOS Initialization Failed
EFEC314D DISKETTE_ERR5       TEMP H  PIO exception
F15F3C50 FDDI_RCVRY_ENTER    PEND H  Recovery logic initiated by device
F2F30ADF FDDI_PORT           TEMP H  ADAPTER ERROR
F3D17657 CXMA_CFG_MTST       PERM S  Adapter Memory Test Failed
F438E969 SDC_ERR2            PERM H  STORAGE SUBSYSTEM FAILURE
F4CB727F FDDI_SELFT_ERR      PERM H  ADAPTER ERROR
F5345AAB NB25                PERM S  SOFTWARE PROGRAM ERROR
F5458763 COM_CFG_ADPT        PERM S  Configuration failed: already configured
F6E3C547 ATE_ERR7            TEMP S  COMMUNICATION PROTOCOL ERROR
F734B194 NB19                PERM S  SOFTWARE PROGRAM ERROR
F7E70B81 EXCHECK_SCRUB       PERM H  OPERATOR NOTIFICATION
F81946D8 CFGMGR_CHILD        UNKN S  Configuration mgr child process failed
F9171B5C CFGMGR_FATAL_DB     UNKN S  Configuration mgr fatal database problem
F924E95E TOK_PIO_ERR         PERM H  ADAPTER ERROR
FB683A72 ACCT_OFF            TEMP S  EC26
FBD2B2B5 MSLA_IOCTL          TEMP S  ADAPTER ERROR
FBF0BFC1 TMSCSI_UNRECVRD_ERR PERM H  Attached SCSI target device error
FCA960CE TOK_ESERR           TEMP S  EXCESSIVE TOKEN-RING ERRORS
FDE6A5A1 COM_CFG_BUSI        PERM S  Configuration failed: bad bus ID
FE1DA20A TOK_ERR5            PERM H  OPEN FAILURE
FE6A2D60 COM_CFG_INTR        PERM S  Configuration failed: interrupt priority
FEC31570 SDA_ERR3            PERM H  UNDETERMINED ERROR
FED1497C MSLA_CLOSE          TEMP S  SOFTWARE PROGRAM ABNORMALLY TERMINATED
FFC9ECAA TOK_TX_ERR          PERM H  ADAPTER ERROR
FFE2F73A TAPE_ERR5           UNKN H  UNDETERMINED ERROR
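Each entry in the listing carries an 8-digit hexadecimal identifier, a label, a type (PERM, TEMP, PERF, PEND or UNKN in this listing), a class (H, S or O in this listing) and a description. A minimal sketch of reading such entries, assuming one entry per line with whitespace-separated columns; the names `ErrorTemplate` and `parse_row` are hypothetical and not part of any AIX or HACMP tool:

```python
import re
from typing import NamedTuple, Optional


class ErrorTemplate(NamedTuple):
    error_id: str
    label: str
    err_type: str
    err_class: str
    description: str


# Column layout assumed for this sketch: Id, Label, Type, CL, Error_Description.
_ROW = re.compile(
    r"^(?P<id>[0-9A-F]{8})\s+(?P<label>\S+)\s+"
    r"(?P<type>PERM|TEMP|PERF|PEND|UNKN)\s+(?P<cls>[HSO])\s+(?P<desc>.+)$"
)


def parse_row(line: str) -> Optional[ErrorTemplate]:
    """Parse one listing line; return None for headers or malformed lines."""
    match = _ROW.match(line.strip())
    if match is None:
        return None
    return ErrorTemplate(
        match.group("id"),
        match.group("label"),
        match.group("type"),
        match.group("cls"),
        match.group("desc"),
    )


if __name__ == "__main__":
    sample = [
        "Id       Label               Type CL Error_Description",
        "0502F666 SCSI_ERR1           PERM H  ADAPTER ERROR",
        "192AC071 ERRLOG_OFF          TEMP O  Error logging turned off",
        "21F54B38 DISK_ERR1           PERM H  DISK OPERATION ERROR",
    ]
    rows = [row for row in map(parse_row, sample) if row is not None]
    # Select the permanent hardware errors from the sample.
    print([r.label for r in rows if r.err_type == "PERM" and r.err_class == "H"])
```

Filtering on type and class in this way is one possible starting point for deciding which labels deserve attention in a cluster; the selection criteria themselves are a local decision.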
SCSI disks and subsystems
RAID subsystems
9333 Serial disk subsystems
Serial Storage Architecture (SSA) disk subsystems
SCSI-2 Differential Controller (FC: 2420, PN: 43G0176)
SCSI-2 Differential Fast/Wide Adapter/A (FC: 2416, PN: 65G7315)
Enhanced SCSI-2 Differential Fast/Wide Adapter/A (FC: 2412, PN: 52G3380) (This adapter was only supported under AIX 4.1 and HACMP 4.1 for AIX at the time of publishing, but testing was underway to certify the adapter under HACMP/6000 Version 3.1)
The non-RAID SCSI disks and subsystems that you can connect as shared disks in an HACMP cluster are:
7204 Models 215, 315, 317, and 325 External Disk Drives
9334 Models 011 and 501 SCSI Expansion Units
7134-010 High Density SCSI Disk Subsystem
Figure 6. Termination Resistor Blocks on the SCSI-2 Differential Fast/Wide Adapter/A and Enhanced SCSI-2 Differential Fast/Wide Adapter/A
The ID of a SCSI adapter, by default, is 7. Since each device on a SCSI bus must have a unique ID, the ID of at least one of the adapters on a shared SCSI bus has to be changed. The procedure to change the ID of a SCSI-2 Differential Controller is:

1. At the command prompt, enter smit chgscsi.
2. Select the adapter whose ID you want to change from the list presented to you.
108
An HACMP Cookbook
                              SCSI Adapter

Move cursor to desired item and press Enter.

  scsi0  Available  00-02  SCSI I/O Controller
  scsi1  Available  00-06  SCSI I/O Controller
  scsi2  Available  00-08  SCSI I/O Controller
  scsi3  Available  00-07  SCSI I/O Controller

F3=Cancel    Enter=Do
3. Enter the new ID (any integer from 0 to 7) for this adapter in the Adapter card SCSI ID field. Since the device with the highest SCSI ID on a bus gets control of the bus, set the adapter's ID to the highest available ID. Set the Apply change to DATABASE only field to yes.
            Change / Show Characteristics of a SCSI Adapter

Type or select values in entry fields.
Press Enter AFTER making all desired changes.

                                                      [Entry Fields]
  SCSI Adapter                                        scsi1
  Description                                         SCSI I/O Controller
  Status                                              Available
  Location                                            00-06
  Adapter card SCSI ID                                [6]                      +#
  BATTERY backed adapter                              no                       +
  DMA bus memory LENGTH                               [0x202000]               +
  Enable TARGET MODE interface                        yes                      +
  Target Mode interface enabled                       yes
  PERCENTAGE of bus memory DMA area for target mode   [50]                     +#
  Name of adapter code download file                  /etc/microcode/8d77.a0>
  Apply change to DATABASE only                       yes                      +

F4=List    F8=Image
4. Reboot the machine to bring the change into effect.
The same task can be executed from the command line by entering:
Also with this method, a reboot is required to bring the change into effect.
The procedure to change the ID of a SCSI-2 Differential Fast/Wide Adapter/A or Enhanced SCSI-2 Differential Fast/Wide Adapter/A is almost the same as the one described above. Here, the adapter that you choose from the list you get after executing the smit chgscsi command should be an ascsi device. Also, as shown below, you need to change the external SCSI ID only.
  SCSI adapter                      ascsi1
  Description                       Wide SCSI I/O Control>
  Status                            Available
  Location                          00-06
  Internal SCSI ID                  7        +#
  External SCSI ID                  [6]      +#
  WIDE bus enabled                  yes      +
  ...
  Apply change to DATABASE only     yes
As in the case of the SCSI-2 Differential Controller, a system reboot is required to bring the change into effect. The maximum length of the bus, including any internal cabling in disk subsystems, is limited to 19 meters for buses connected to the SCSI-2 Differential Controller, and to 25 meters for those connected to the SCSI-2 Differential Fast/Wide Adapter/A or Enhanced SCSI-2 Differential Fast/Wide Adapter/A.
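The ID and bus-length rules above can be checked before any cabling is done. The following is a minimal sketch, not an IBM tool: the `BusPlan` structure, the `check_bus_plan` function, and the sample IDs and cable lengths are all hypothetical. It assumes an 8-bit bus on which a higher ID means higher priority, as described in the text, and it uses the 19-meter and 25-meter limits quoted above.

```python
from dataclasses import dataclass
from typing import List

# Maximum bus lengths in meters, as stated in the text.
MAX_BUS_LENGTH_M = {
    "scsi2_differential": 19.0,  # SCSI-2 Differential Controller
    "fast_wide": 25.0,           # (Enhanced) SCSI-2 Differential Fast/Wide Adapter/A
}


@dataclass
class BusPlan:
    """Hypothetical description of one shared SCSI bus."""
    adapter_type: str             # key into MAX_BUS_LENGTH_M
    adapter_ids: List[int]        # SCSI ID of each node's adapter
    disk_ids: List[int]           # SCSI ID of each disk on the bus
    cable_lengths_m: List[float]  # every segment, including internal cabling


def check_bus_plan(plan: BusPlan) -> List[str]:
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    all_ids = plan.adapter_ids + plan.disk_ids
    # Each device on the bus must have a unique ID.
    if len(set(all_ids)) != len(all_ids):
        problems.append("duplicate SCSI ID on the bus")
    # Adapters should hold the highest IDs so that they win control of the bus.
    if plan.adapter_ids and plan.disk_ids and min(plan.adapter_ids) < max(plan.disk_ids):
        problems.append("an adapter has a lower SCSI ID than a disk")
    # Total cable length must stay within the limit for the adapter type.
    total = sum(plan.cable_lengths_m)
    limit = MAX_BUS_LENGTH_M[plan.adapter_type]
    if total > limit:
        problems.append(f"bus length {total:.2f} m exceeds the {limit:.0f} m limit")
    return problems


good = BusPlan("scsi2_differential", [7, 6], [0, 1, 2], [2.5, 4.75, 0.6, 0.6])
bad = BusPlan("scsi2_differential", [7, 7], [0], [8.0, 8.0, 4.75])
print(check_bus_plan(good))
print(check_bus_plan(bad))
```

The first plan leaves one adapter at the default ID 7 and moves the other to 6, which is the situation the SMIT screens above produce. The second plan leaves both adapters at 7 and uses too much cable, so it reports two problems.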
determined by the number of available SCSI IDs on the shared bus to which it is attached.
SCSI-2 Differential System-to-System Cable
  FC: 2423 (2.5m), PN: 52G7349
  This cable is used only if there are more than two nodes attached to the same shared bus.
SCSI-2 DE Controller Cable
  FC: 2854 or 9138 (0.6m), PN: 87G1358
  - OR -
  FC: 2921 or 9221 (4.75m), PN: 67G0593
Figure 7 shows four RS/6000s, each represented by one SCSI-2 Differential Controller, connected on an 8-bit bus to a chain of 7204-215s.
Figure 7. 7204-215 External Disk Drives Connected on an 8-Bit Shared SCSI Bus
D.1.3.2 7204 Model 315, 317, and 325 External Disk Drives
To attach a chain of 7204 Model 315s, 317s, or 325s, or a combination of them to SCSI-2 Differential Fast/Wide Adapter/As or Enhanced SCSI-2 Differential Fast/Wide Adapter/As on a shared 16-bit SCSI bus, you need the following 16-bit cables and terminators:
16-Bit SCSI-2 Differential System-to-System Cable
  FC: 2424 (0.6m), PN: 52G4291
  - OR -
  FC: 2425 (2.5m), PN: 52G4233
  This cable is used only if there are more than two nodes attached to the same shared bus.
16-Bit SCSI-2 DE Device-to-Device Cable
  FC: 2845 or 9131 (0.6m), PN: 52G4291
  - OR -
  FC: 2846 or 9132 (2.5m), PN: 52G4233
Figure 8 shows four RS/6000s, each represented by one SCSI-2 Differential Fast/Wide Adapter/A, connected on a 16-bit bus to a chain of 7204-315s. The connections would be the same for the 7204-317 and 7204-325 drives. You could also substitute the Enhanced SCSI-2 Differential Fast/Wide Adapter/A (feature code 2412) for the SCSI-2 Differential Fast/Wide Adapter/As shown in the figure, if you are running HACMP 4.1 for AIX.
Figure 8. 7204-315 External Disk Drives Connected on a 16-Bit Shared SCSI Bus
SCSI-2 Differential System-to-System Cable
  FC: 2423 (2.5m), PN: 52G7349
  This cable is used only if there are more than two nodes attached to the same shared bus.
SCSI-2 DE Controller Cable
  FC: 2921 or 9221 (4.75m), PN: 67G0593
  - OR -
  FC: 2923 or 9223 (8.0m), PN: 95X2494
SCSI-2 DE Controller Cable
  FC: 2931 (1.48m), PN: 70F9188
  - OR -
  FC: 2933 (2.38m), PN: 45G2858
  - OR -
  FC: 2935 (4.75m), PN: 67G0566
  - OR -
  FC: 2937 (8.0m), PN: 67G0562
Figure 9 on page 114 shows four RS/6000s, each represented by one SCSI-2 Differential Controller, connected on an 8-bit bus to a chain of 9334-011s. Figure 10 on page 114 shows four RS/6000s, each represented by one SCSI-2 Differential Controller, connected on an 8-bit bus to a chain of 9334-501s.
Figure 9. 9334-011 SCSI Expansion Units Connected on an 8-Bit Shared SCSI Bus
Figure 10. 9334-501 SCSI Expansion Units Connected on an 8-Bit Shared SCSI Bus
16-Bit SCSI-2 Differential System-to-System Cable
  FC: 2424 (0.6m), PN: 52G4291
  - OR -
  FC: 2425 (2.5m), PN: 52G4233
  This cable is used only if there are more than two nodes attached to the same shared bus.
16-Bit Differential SCSI Cable
  FC: 2902 (2.4m), PN: 88G5750
  - OR -
  FC: 2905 (4.5m), PN: 88G5749
  - OR -
  FC: 2912 (12.0m), PN: 88G5747
  - OR -
  FC: 2914 (14.0m), PN: 88G5748
  - OR -
  FC: 2918 (18.0m), PN: 88G5746
Figure 11 on page 116 shows four RS/6000s, each represented by two SCSI-2 Differential Fast/Wide Adapter/As, connected on a 16-bit bus to a 7134-010 with a base and an expansion unit. You could also substitute the Enhanced SCSI-2 Differential Fast/Wide Adapter/A (feature code 2412) for the SCSI-2 Differential Fast/Wide Adapter/As shown in the figure, if you are running HACMP 4.1 for AIX.
Figure 11. 7134-010 High Density SCSI Disk Subsystem Connected on Two 16-Bit Shared SCSI Buses
SCSI-2 Differential Controller (FC: 2420, PN: 43G0176)
SCSI-2 Differential Fast/Wide Adapter/A (FC: 2416, PN: 65G7315)
Enhanced SCSI-2 Differential Fast/Wide Adapter/A (FC: 2412)
(This adapter was only supported under AIX 4.1 and HACMP 4.1 for AIX at the time of publishing, but testing was underway to certify the adapter under HACMP/6000 Version 3.1)
The RAID subsystems that you can connect on a shared bus in an HACMP cluster are:
7135-110 (HACMP/6000 Version 3.1 only, at the time of publishing) and 7135-210 (HACMP 4.1 for AIX only) RAIDiant Array
7137 Model 412, 413, 414, 512, 513, and 514 Disk Array Subsystems
Note: Existing IBM 3514 RAID Array models continue to be supported as shared disk subsystems under HACMP, but since this subsystem has been withdrawn from marketing, it is not described here. As far as cabling and connection characteristics are concerned, the 3514 follows the same rules as the 7137 Disk Array subsystems.
SCSI-2 Differential System-to-System Cable
  FC: 2423 (2.5m), PN: 52G7349
  This cable is used only if there are more than two nodes attached to the same shared bus.
Differential SCSI Cable (RAID Cable)
  FC: 2901 or 9201 (0.6m), PN: 67G1259
  - OR -
  FC: 2902 or 9202 (2.4m), PN: 67G1260
  - OR -
  FC: 2905 or 9205 (4.5m), PN: 67G1261
  - OR -
  FC: 2912 or 9212 (12m), PN: 67G1262
  - OR -
  FC: 2914 or 9214 (14m), PN: 67G1263
  - OR -
  FC: 2918 or 9218 (18m), PN: 67G1264
Cable Interposer (I)
  FC: 2919, PN: 61G8323
  One of these is required for each connection between a SCSI-2 Differential Y-Cable and a Differential SCSI Cable going to the 7135 unit, as shown in Figure 12.
Figure 12 shows four RS/6000s, each represented by two SCSI-2 Differential Controllers, connected on two 8-bit buses to two 7135-110s each with two controllers.
Note: The diagrams in this book give a logical view of the 7135 subsystem. Please refer to the 7135 Installation and Service Guide for the exact positions of the controllers and their corresponding connections.
Figure 12. 7135-110 RAIDiant Arrays Connected on Two Shared 8-Bit SCSI Buses
To connect a set of 7135s to SCSI-2 Differential Fast/Wide Adapter/As or Enhanced SCSI-2 Differential Fast/Wide Adapter/As on a shared 16-bit SCSI bus, you need the following:
16-Bit SCSI-2 Differential System-to-System Cable
  FC: 2424 (0.6m), PN: 52G4291
  - OR -
  FC: 2425 (2.5m), PN: 52G4233
  This cable is used only if there are more than two nodes attached to the same shared bus.
16-Bit Differential SCSI Cable (RAID Cable)
  FC: 2901 or 9201 (0.6m), PN: 67G1259
  - OR -
  FC: 2902 or 9202 (2.4m), PN: 67G1260
  - OR -
  FC: 2905 or 9205 (4.5m), PN: 67G1261
  - OR -
  FC: 2912 or 9212 (12m), PN: 67G1262
  - OR -
  FC: 2914 or 9214 (14m), PN: 67G1263
  - OR -
  FC: 2918 or 9218 (18m), PN: 67G1264
Figure 13 shows four RS/6000s, each represented by two SCSI-2 Differential Fast/Wide Adapter/As, connected on two 16-bit buses to two 7135-110s, each with two controllers. The 7135-210 requires the Enhanced SCSI-2 Differential Fast/Wide Adapter/A for connection. Other than that, the cabling is exactly the same as shown in Figure 13, if you substitute the Enhanced SCSI-2 Differential Fast/Wide Adapter/A (FC: 2412) for the SCSI-2 Differential Fast/Wide Adapter/A (FC: 2416) in the picture.
Figure 13. 7135-110 RAIDiant Arrays Connected on Two Shared 16-Bit SCSI Buses
D.2.3.2 7137 Model 412, 413, 414, 512, 513, and 514 Disk Array Subsystems
To connect two 7137s to SCSI-2 Differential Controllers on a shared 8-bit SCSI bus, you need the following:
SCSI-2 Differential System-to-System Cable
  FC: 2423 (2.5m), PN: 52G7349
  This cable is used only if there are more than two nodes attached to the same shared bus.
Attachment Kit to SCSI-2 Differential High-Performance External I/O Controller
  FC: 2002, PN: 46G4157
  This includes a 4.0-meter cable, an installation diskette, and the IBM 7137 (or 3514) RISC System/6000 System Attachment Guide.
Multiple Attachment Cable
  FC: 3001, PN: 21F9046
  This includes a 2.0-meter cable, an installation diskette, and connection instructions.
Figure 14 shows four RS/6000s, each represented by one SCSI-2 Differential Controller, connected on an 8-bit bus to two 7137s.
Figure 14. 7137 Disk Array Subsystems Connected on an 8-Bit SCSI Bus
To connect two 7137s to SCSI-2 Differential Fast/Wide Adapter/As or Enhanced SCSI-2 Differential Fast/Wide Adapter/As on a shared 16-bit SCSI bus, you need the following:
16-Bit SCSI-2 Differential System-to-System Cable
  FC: 2424 (0.6m), PN: 52G4291
  - OR -
  FC: 2425 (2.5m), PN: 52G4233
  This cable is used only if there are more than two nodes attached to the same shared bus.
Attachment Kit to SCSI-2 Differential Fast/Wide Adapter/A or Enhanced SCSI-2 Differential Fast/Wide Adapter/A
  FC: 2014, PN: 75G5028
  This includes a 4.0-meter cable, an installation diskette, and the IBM 7137 (or 3514) RISC System/6000 System Attachment Guide.
Multiple Attachment Cable
  FC: 3001, PN: 21F9046
  This includes a 2.0-meter cable, an installation diskette, and connection instructions.
Figure 15 shows four RS/6000s, each represented by one SCSI-2 Differential Fast/Wide Adapter/A, connected on a 16-bit bus to two 7137s. The Enhanced SCSI-2 Differential Fast/Wide Adapter/A uses exactly the same cabling, and could be substituted for the SCSI-2 Differential Fast/Wide Adapter/A in an AIX 4.1 and HACMP 4.1 for AIX configuration.
Figure 15. 7137 Disk Array Subsystems Connected on a 16-Bit SCSI Bus
High-Performance Disk Drive Subsystem Adapter 40/80 MB/sec. (FC: 6212, PN: 67G1755)
The serial disk subsystems that you can connect as shared devices in an HACMP cluster are:
Serial-Link Cable (Quantity 2)
  FC: 9210 or 3010 (10m)
  FC: 9203 or 3003 (3m)
To connect a 9333-011 or 501 to three or more systems, each containing High-Performance Disk Drive Subsystem Adapters, you need the following:
Serial-Link Cable (One for each system connection)
  FC: 9210 or 3010 (10m)
  FC: 9203 or 3003 (3m)
FC: 4002 (Connect up to eight systems)
  Feature 4001 is a prerequisite for feature 4002.
Figure 16 shows eight RS/6000s, each having a High-Performance Disk Drive Subsystem Adapter, connected to one 9333-501 with the Multiple System Attachment Features 4001 and 4002 installed.
Figure 16. 9333-501 Connected to Eight Nodes in an HACMP Cluster (Rear View)
To connect SSA subsystems as shared devices in your HACMP cluster, the adapter that you will use is:
This adapter is shown in Figure 17 on page 125. The SSA disk subsystems that you can connect as shared devices in an HACMP cluster are:
IBM 7133-010 SSA Disk Subsystem
  This model is in a drawer configuration, for use in rack mounted systems.
IBM 7133-500 SSA Disk Subsystem
  This model is in a standalone tower configuration, for use in all models.
The labeled components of the adapter in the figure are as follows:
1. Connector B2
2. Green light for adapter port pair B
3. Connector B1
4. Connector A2
5. Green light for adapter port pair A
6. Connector A1
7. Type-number label
The green lights for each adapter port pair indicate the status of the attached loop as follows:
Off    Both ports are inactive. If disk drives are connected to these ports, then either the modules have failed or their SSA links have not been enabled.
Both ports are active.
Only one port is active.
The SSA loop that you create need not begin and end on the same SSA Four Port Adapter. Loops can be made to go from one adapter to another adapter in the same system or in a different system. There can be at most two adapters on the same loop.
As you can see in Figure 18, each group of four disk drives in the subsystem is internally cabled as a loop. Disk Group 1 includes disk drive positions 1-4 and is cabled between connectors J9 and J10. Disk Group 2 includes disk drive positions 5-8 and is cabled between connectors J5 and J6. You can also see Disk Groups 3 and 4 in the picture. These internal loops can either be cabled together into larger loops, or individually connected to SSA Four Port Adapters. For instance, if you were to connect a short cable between connectors J6 and J10, you would have a loop of eight drives that could be connected to the SSA Four Port Adapter from connectors J5 and J9.
The feature code numbers start with the number 5, and the next three digits give a rounded length in meters, which makes the feature numbers easy to understand and remember. As was mentioned before, the only difference between these cables is their length. They can be used interchangeably to connect any SSA components together.
If you obtain an announcement letter for the 7133 SSA Subsystem, you will also see a number of other cable feature codes listed, with the same lengths (and same prices) as those in Table 2. You need not worry or be confused about these, since they are the same cables as those in the tables. As long as you have the correct length of cable for the components you need to connect, you have the right cable.
The maximum distance between components in an SSA loop using IBM cabling is 25 meters. With SSA, there is no special maximum cabling distance for the entire loop. In fact, the maximum cabling distance for the loop would be the maximum distance between components (disks or adapters), multiplied by the maximum number of components (48) in a loop.
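The loop-length arithmetic in the last paragraph can be written out as a small calculation. This is only an illustrative sketch: the function names and the sample cable lengths are hypothetical, and the two limits (25 meters per cable, 48 components per loop) are the figures given in the text.

```python
# Limits quoted in the text for an SSA loop built with IBM cabling.
MAX_CABLE_M = 25      # maximum distance between two components
MAX_COMPONENTS = 48   # maximum number of components (disks or adapters) in a loop


def max_loop_length_m(max_cable_m=MAX_CABLE_M, max_components=MAX_COMPONENTS):
    """Upper bound on total loop cabling: per-cable limit times component count."""
    return max_cable_m * max_components


def check_loop(cable_lengths_m, component_count):
    """Return a list of violations for a planned loop (empty list if none)."""
    problems = []
    if component_count > MAX_COMPONENTS:
        problems.append(f"{component_count} components exceeds {MAX_COMPONENTS}")
    for index, length in enumerate(cable_lengths_m):
        if length > MAX_CABLE_M:
            problems.append(f"cable {index} is {length} m, over {MAX_CABLE_M} m")
    return problems


print(max_loop_length_m())            # 25 * 48
print(check_loop([2.5, 25, 10], 10))  # hypothetical loop within both limits
print(check_loop([30], 50))           # hypothetical loop breaking both limits
```

With the limits from the text, the bound works out to 25 x 48 = 1200 meters of cabling for a fully populated loop.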
The first scenario, shown in Figure 19, shows a single 7133 subsystem, containing eight disk drives (half full), connected between two nodes in a cluster. We have not labeled the cables, since their lengths will be dependent on the characteristics of your location. Remember, the longest cable currently marketed by IBM is 25 meters, and there are many shorter lengths, as shown in Table 2 on page 127. As we said before, all cables have the same connectors at each end, and therefore are interchangeable, provided they have sufficient length for the task. In the first scenario, each cluster node has one SSA Four Port Adapter. The disk drives in the 7133 are cabled to the two machines in two loops, the first group of four disks in one loop, and the remaining four in the other. Each of the loops is connected into a different port pair on the SSA Four Port Adapters.
In this configuration, LVM mirroring should be implemented across the two loops; that is, a disk on one loop should be mirrored to a disk on the other loop. Mirroring in this way will protect you against the failure of any single disk drive. The SSA subsystem is able to deal with any break in the cable in a loop by following the path to a disk in the other direction of the loop, even if it does go through the adapter on the other machine. This recovery is transparent to AIX and HACMP.
The only exposure in this scenario is the failure of one of the SSA Four Port Adapters. In this case, the users on the machine with the failed adapter would lose their access to the disks in the 7133 subsystem. The best solution to this problem is to add a second SSA Four Port Adapter to each node, as shown in Figure 20 on page 130. However, this adds an amount of cost to the solution that might not be justifiable, especially if there is a relatively small amount of disk capacity involved. An alternative solution would be to use HACMP's Error Notification feature to protect against the failure. You could define an error notification method, which is triggered by the AIX error log record for the failure of the adapter, and which would run a script to shut down the cluster manager in graceful with takeover mode. This would migrate the users to the other node, from which they would still have access to the disks.
Our second scenario, in Figure 20 on page 130, shows a second SSA Four Port Adapter added to each node. This allows each system to preserve its access to the SSA disks, even if one of the adapters were to fail. This solution does leave an adapter port pair unused on each adapter. These could be used in the future to attach additional loops, if the remaining disk locations in the 7133 were filled, and if additional 7133 subsystems were added into the loops.
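The cross-loop mirroring rule described for the first scenario can be sketched as a simple pairing of disks. This is a planning aid only, not an LVM command: the function names are hypothetical, and the hdisk names are invented for illustration (the text does not name the disks).

```python
def plan_mirrors(loop_a, loop_b):
    """Pair each disk on one loop with a disk on the other loop."""
    if len(loop_a) != len(loop_b):
        raise ValueError("both loops need the same number of disks to pair them")
    return list(zip(loop_a, loop_b))


def copies_on_separate_loops(pairs, loop_a, loop_b):
    """True if every mirror pair has one copy on each loop."""
    a, b = set(loop_a), set(loop_b)
    return all((x in a and y in b) or (x in b and y in a) for x, y in pairs)


# Hypothetical disk names: four drives per loop, as in the half-full 7133.
loop_1 = ["hdisk1", "hdisk2", "hdisk3", "hdisk4"]
loop_2 = ["hdisk5", "hdisk6", "hdisk7", "hdisk8"]

pairs = plan_mirrors(loop_1, loop_2)
print(pairs)
print(copies_on_separate_loops(pairs, loop_1, loop_2))
```

A pairing that passes this check keeps the two copies of each logical partition on different loops, which is what protects against the loss of any single drive.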
Any of the loops can be extended at any time, by reconnecting the cabling to include the new disks in the loop. If these additions are planned correctly, and cables are unplugged and plugged one at a time, this addition of disks can be done in a hot-pluggable way, such that the system does not have to be brought down, access to existing disks is not lost, and the new disks can be configured while the system continues running.
Its goal is to give a complete picture of a working cluster configuration, including any customizations, at the time it is put into production. In case of future malfunctions, this will allow the service personnel to understand any changes that have been made to the original cluster configuration.
E.2.2 Hostname
====> mickey
LV STATE        MOUNT POINT
open/syncd      N/A
open/syncd      N/A
open/syncd      /
open/syncd      /home
open/syncd      /tmp
open/syncd      /usr
open/syncd      /var
closed/syncd    /blv
open/syncd      /mnt
INTER-POLICY: minimum
INTRA-POLICY: middle
MOUNT POINT: N/A
MIRROR WRITE CONSISTENCY: off
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd4
LV IDENTIFIER: 000147325ccaf23c.3
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 1
LPs: 3
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: /
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd1
LV IDENTIFIER: 000147325ccaf23c.4
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 1
LPs: 1
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: /home
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd3
LV IDENTIFIER: 000147325ccaf23c.5
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 1
LPs: 5
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: /tmp
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd2
LV IDENTIFIER: 000147325ccaf23c.6
VG STATE: inactive
TYPE: jfs
MAX LPs: 512
COPIES: 1
LPs: 135
yes 32 None
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: /usr
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd9var
LV IDENTIFIER: 000147325ccaf23c.7
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 1
LPs: 1
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: /var
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd5
LV IDENTIFIER: 000147325ccaf23c.8
VG STATE: inactive
TYPE: boot
MAX LPs: 128
COPIES: 1
LPs: 2
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: edge
MOUNT POINT: /blv
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: hd7
LV IDENTIFIER: 000147325ccaf23c.9
VG STATE: inactive
TYPE: sysdump
MAX LPs: 128
COPIES: 1
LPs: 2
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: edge
MOUNT POINT: /mnt
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: loglvtest1
LV IDENTIFIER: 00014732b5a91022.1
VG STATE: inactive
TYPE: jfslog
MAX LPs: 128
COPIES: 2
LPs: 1
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: N/A
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: lvtest1
LV IDENTIFIER: 00014732b5a91022.2
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 2
LPs: 20
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: middle
MOUNT POINT: /test1
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: loglvtest2
LV IDENTIFIER: 00014732ca66234e.1
VG STATE: inactive
TYPE: jfslog
MAX LPs: 128
COPIES: 2
LPs: 1
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: middle
MOUNT POINT: N/A
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: lvtest2
LV IDENTIFIER: 00014732ca66234e.2
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 2
LPs: 25
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: middle
MOUNT POINT: /test2
MIRROR WRITE CONSISTENCY: on
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: conc1lv
LV IDENTIFIER: 00014732b5ac04be.1
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 2
LPs: 10
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: N/A
MIRROR WRITE CONSISTENCY: off
EACH LP COPY ON A SEPARATE PV ?: yes
_________________________________________
LOGICAL VOLUME: conc2lv
LV IDENTIFIER: 00014732b5ac04be.2
VG STATE: inactive
TYPE: jfs
MAX LPs: 128
COPIES: 2
LPs: 7
STALE PPs: 0
INTER-POLICY: minimum
INTRA-POLICY: center
MOUNT POINT: N/A
MIRROR WRITE CONSISTENCY: off
EACH LP COPY ON A SEPARATE PV ?: yes
E.2.9 Filesystems
Name             Nodename  Mount Pt         VFS  Size     Options  Auto  Accounting
/dev/hd4         -         /                jfs  24576    -        yes   no
/dev/hd1         -         /home            jfs  8192     -        yes   no
/dev/hd2         -         /usr             jfs  1105920  -        yes   no
/dev/hd9var      -         /var             jfs  8192     -        yes   no
/dev/hd3         -         /tmp             jfs  40960    -        yes   no
/dev/hd7         -         /mnt             jfs  -        -        no    no
/dev/hd5         -         /blv             jfs  -        -        no    no
/usr/bin/blv.fs  -         /usr/bin/blv.fs  -    -        -        no    no
/dev/extlv1      -         /inst            jfs  -        rw       no    no
/dev/lvtest1     -         /test1           jfs  -        rw       no    no
/dev/lvtest2     -         /test2           jfs  -        rw       no    no
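When reading this listing, it can help to convert the Size column into megabytes. The sketch below assumes the sizes are reported in 512-byte blocks, which is how lsfs reports them on AIX; the function name and the dictionary are hypothetical, and the values are copied from the listing above.

```python
BLOCK_BYTES = 512  # assumption: the Size column is in 512-byte blocks


def blocks_to_mb(blocks):
    """Convert a size in 512-byte blocks to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * BLOCK_BYTES / (1024 * 1024)


# Sizes taken from the filesystem listing above.
sizes_in_blocks = {
    "/": 24576,
    "/home": 8192,
    "/usr": 1105920,
    "/var": 8192,
    "/tmp": 40960,
}

for mount_point, blocks in sizes_in_blocks.items():
    print(f"{mount_point:<6} {blocks_to_mb(blocks):8.1f} MB")
```

Under that assumption, / is 12 MB and /usr is 540 MB, which is 4 MB for each of the 3 and 135 logical partitions listed for hd4 and hd2 in the logical volume section.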
inet 9.3.4.79 netmask 0xffffff00 broadcast 9.3.4.255
____________________________
Routing tables
Destination      Gateway             Flags  Refcnt  Use  Interface
Netmasks:
(root node)
(0)0 ff00 0
(0)0 ffff ff00 0
(root node)

Route Tree for Protocol Family 2:
(root node)
default          itsorusi.itsc.aust  UG     2
9.3.1            mickey_boot.itsc.a  U      3
9.3.4            mickey_sb.itsc.aus  U      1
9.3.5            mickey_en.itsc.aus  U      4
127              localhost           U      0
(root node)

Route Tree for Protocol Family 6:
(root node)
(root node)

Name  Mtu   Network  Address          Ipkts   Ierrs  Opkts   Oerrs  Coll
lo0   1536  <Link>                    279124  0      279124  0      0
lo0   1536  127      localhost        279124  0      279124  0      0
en0   1500  <Link>                    672530  0      672438  0      0
en0   1500  9.3.5    mickey_en.itsc.  672530  0      672438  0      0
en1*  1500  <Link>                    235     0      0       0      0
et0*  1492  <Link>                    0       0      0       0      0
et1*  1492  <Link>                    0       0      0       0      0
tr1   1492  <Link>                    748576  0      578803  0      0
tr1   1492  9.3.4    mickey_sb.itsc.  748576  0      578803  0      0
tr0   1492  <Link>                    71366   0      38425   0      0
tr0   1492  9.3.1    mickey_boot.its  71366   0      38425   0      0

nameserver 9.3.1.74
domain itsc.austin.ibm.com
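As a quick consistency check on the interface line in this output (inet 9.3.4.79 netmask 0xffffff00 broadcast 9.3.4.255), the broadcast address can be recomputed from the address and the hexadecimal netmask. This is a small sketch using only the Python standard library; the function name is hypothetical.

```python
import ipaddress


def broadcast_address(addr, netmask_hex):
    """Compute the broadcast address from a dotted address and a hex netmask."""
    mask = int(netmask_hex, 16)
    host_bits = ~mask & 0xFFFFFFFF          # bits not covered by the netmask
    value = int(ipaddress.IPv4Address(addr)) | host_bits
    return str(ipaddress.IPv4Address(value))


print(broadcast_address("9.3.4.79", "0xffffff00"))
```

The result matches the broadcast value shown in the listing, so the address, netmask, and broadcast settings recorded for this interface are mutually consistent.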
C C
U491105 U435220
C C
U491105 U428079
U491105
U491105
U491105
U491105
U491105
C C C C C
C C C C C
C C C
3250 X11fnt X11-R5 Maintenance Level X11rte.ext.obj 1.2.3.0 3250 X11rte X11-R5 Maintenance Level AIXwindows Run Time Environment Extensions AIXwindows Run Time Environment Extensions AIXwindows Run Time Environment Extensions X11-R5 Additional Postscript Fonts X11-R5 Extensions X11-R5 Info X11-R5 X Customize Utilities X11-R5 Motif SMIT X11-R5 X-Desktop X11-R5 Font Utility X11-R5 Additional Postscript Utilities X11rte.motif1.2.obj 1.2.3.0 3250 X11rte X11-R5 Maintenance Level Motif 1.2 Translated mwmrc Files Motif 1.2 Window Manager Program X11rte.obj 1.2.3.0 3250 X11rte X11-R5 Maintenance Level AIXwindows Run Time Environment AIXwindows Run Time Environment AIXwindows Run Time Environment X11-R5 Runtime Environment Fonts X11-R5 Runtime Environment Locales X11-R5 Runtime Environment X11-R5 Runtime Environment Examples X11-R5 Runtime Environment bos.data 3.2.0.0 3250 bos.data Maintenance Level Info Explorer Databases Terminal Capabilities Database bos.obj 3.2.0.0 3250 bos Maintenance Level 3251 AIX Maintenance Level Vital User Information Device Diagnostics POSIX Asynchronous I/O Services User Messaging Utilities ILS Locale Management Utilities C Language Preprocessor Trace Reporting and Error Logging Input Method Library & Keymaps Math Library Math Library(SYS-V/SAA Error Semantics) X10 Library Trace Reporting Library Network File System System Resource Controller Base Operating System
U491105
C C C C C C C C C C C C
U491119 U411705 U409194 U428192 U428193 U435058 U435060 U435062 U435064 U435070 U435222
C C C
C C C C C C C C C
C C C
C C C C C C C C C C C C C C C C C
U491123 U493251 U424153 U427865 U428206 U428212 U428215 U428218 U428223 U428226 U428231 U428232 U428233 U428236 U428243 U428249 U432415
Base Operating System Base Operating System The Base Operating System C library - Common Mode Network File System Trace Reporting and Error Logging Bourne Shell Korn Shell SYS-V IPC Utilities C Library Security Services Library Kernel Info Explorer Utilities tty Utilities and Device Drivers Printer Management Utilities Spooler Services Library Security Related Utilities and Files System IPL Utilities Base Device Drivers Device Drivers Reject Utilities Devices Message Catalog Logical Volume Manager Diskless Workstation Manager System Installation Utilities File Archival Utilities Device Configuration Utilities GXT100/GXT150 Device Drivers HFT Utilities and Device Drivers GXT1000 Device Drivers & Microcode GT3/4 Family Device Drivers & Microcode X11-R4 Library Motif 1.1.4 Library X11-R4 Toolkit Library File Scanning/Searching Utilities x25 Device Drivers Streams Devices, Interfaces & Utilities Base Network Utilities Lan Device Drivers Host Communications Device Drivers Communications Device Drivers awk Language Interpreter XCOFF File Management Utilities File Comparison Utilities System, Process, Boot Utilities File Attribute Utilities File System Management Utilities System Accounting Data Compression Utilities cron Daemon Utilities Date & Time Related Utilities DIRECTORIES Character Stream Editing Utilities Maintenance Level Update Utilities Character Set Tables & Libraries Device Configuration Library 150
C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C
U432416 U432447 U433283 U433342 U434427 U434922 U434992 U434993 U434996 U434997 U434998 U435001 U435066 U435110 U435111 U435112 U435113 U435115 U435116 U435117 U435119 U435120 U435123 U435125 U435126 U435127 U435155 U435156 U435157 U435158 U435159 U435160 U435161 U435165 U435171 U435178 U435180 U435181 U435182 U435184 U435228 U435229 U435230 U435231 U435232 U435233 U435234 U435235 U435236 U435237 U435238 U435239 U435240 U435241 U435243
Curses Standard and Extended Libraries Remote Procedure Call Services Library Error Logging Utilities Mail Facilities Man Page Facility MultiMedia Device Drivers Base NFS Network Utilities Object Data Manager BSD Disk Quota Utilities Service Information Tool System Management Interface Tool System Activity Reporting Terminal Capability Utilities Video Capture Adapter vi Text Editor Base Operating System HFT Utilities and Device Drivers POSIX Asynchronous I/O Services tty Utilities and Device Drivers Devices Message Catalog System IPL Utilities Device Drivers Reject Utilities GT3/4 Family Device Drivers & Microcode Streams Devices, Interfaces & Utilities Base Device Drivers Application Installation Utilities Object Data Manager cron Daemon Utilities GXT1000 Device Drivers & Microcode The Base Operating System The Base Operating System The Base Operating System The Base Operating System Communications Device Drivers Device Diagnostics Kernel Mail Facilities 3250 Packaging Requisite bosadt.bosadt.data 3.2.0.0 No Maintenance Level Applied. bosadt.bosadt.obj 3.2.0.0 3250 bosadt Maintenance Level The bs Program Locale Management Utilities lex Program yacc Program DOS Device Merge Utility Assembler Utilities C Language Source Utilities FORTRAN Language Utilities lint Program make Program Program Debug Utilities
C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C C
U435244 U435245 U435246 U435247 U435248 U435249 U435250 U435251 U435252 U435253 U435254 U435255 U435256 U435257 U435258 U435625 U436256 U436267 U436337 U436439 U436739 U436748 U436779 U436782 U436811 U437028 U437035 U437079 U437101 U437134 U437135 U437136 U437137 U437272 U437315 U437317 U437398 U491150
C C C C C C C C C C C C
U491125 U428255 U428260 U428263 U428265 U435121 U435259 U435260 U435261 U435262 U435263 U435264
Source Code Control (sccs) Utilities bosadt.lib.obj 3.2.0.0 3250 bosadt Maintenance Level BSD System Administration Help HFT Programming Examples New hardware fast library Base Development Libraries & Include files Base Development Libraries & Include files Base Development Libraries & Include files Programming Examples lint Program Rules Databases Include Files Include Files bosadt.prof.obj 3.2.0.0 3250 bosadt Maintenance Level Performance Profiling Utilities Base Profiling Support bosadt.xde.obj 3.2.0.0 3250 bosadt Maintenance Level xde Program Debugger bosext1.csh.obj 3.2.0.0 3250 bosext1 Maintenance Level C Shell bosext1.ecs.obj 3.2.0.0 3250 bosext1 Maintenance Level bosext1.extcmds.data 3.2.0.0 3250 bosext1.data Maintenance Level man Database bosext1.extcmds.obj 3.2.0.0 3250 bosext1 Maintenance Level Math Calculator Utilities Performance Monitoring Utilities bosext1.mh.obj 3.2.0.0 3250 bosext1 Maintenance Level mh Mail Program bosext1.uucp.obj 3.2.0.0 3250 bosext1 Maintenance Level uucp Program and Utilities bosext1.vdidd.obj 3.2.0.0 3250 bosext1 Maintenance Level Video Capture Adapter Utilities bosext2.acct.obj 3.2.0.0 3250 bosext2 Maintenance Level System Accounting Utilities 152
U435265
C C C C C C C C C C C
U491125 U428266 U428267 U428268 U432448 U432449 U432450 U435266 U435267 U435306 U436252
C C C
C C
U491125 U428269
C C
U491126 U434995
U491126
C C
U491127 U428271
C C C
C C
U491126 U435268
C C
U491126 U435269
C C
U491126 U435270
C C
U491128 U435271
bosext2.ate.obj 3.2.0.0 3250 bosext2 Maintenance Level Simple Terminal Emulator bosext2.dlc8023.obj 3.2.0.0 3250 bosext2 Maintenance Level 8023 Data Link Control bosext2.dlcether.obj 3.2.0.0 3250 bosext2 Maintenance Level Ethernet Data Link Control bosext2.dlcfddi.obj 3.2.0.0 3250 bosext2 Maintenance Level FDDI Data Link Control bosext2.dlcqllc.obj 3.2.0.0 3250 bosext2 Maintenance Level QLLC Data Link Control bosext2.dlcsdlc.obj 3.2.0.0 3250 bosext2 Maintenance Level SDLC Data Link Control bosext2.dlctoken.obj 3.2.0.0 3250 bosext2 Maintenance Level Token Ring Data Link Control bosext2.dosutil.obj 3.2.0.0 3250 bosext2 Maintenance Level DOS File & Disk Utilities bosext2.games.obj 3.2.0.0 3250 bosext2 Maintenance Level Miscellaneous Amusements bosext2.lrn.data 3.2.0.0 3250 bosext2.data Maintenance Level bosext2.x25app.obj 3.2.0.0 3250 bosext2 Maintenance Level X25 Applications bosnet.ncs.obj 3.2.0.0 3250 bosnet Maintenance Level Network Computing Services Network Computing Services bosnet.nfs.obj 3.2.0.0 3250 bosnet Maintenance Level NFS Client Utilities NFS Server Utilities NFS SMIT Utilities NFS Client Utilities
C C
U491128 U435272
C C
U491128 U435172
C C
U491128 U435174
C C
U491128 U435173
C C
U491128 U435176
C C
U491128 U435177
C C
U491128 U435175
C C
U491128 U435124
C C
U491128 U428284
U491129
C C
U491128 U435179
C C C
C C C C C
bosnet.snmpd.obj 3.2.0.0 3250 bosnet Maintenance Level Simple Network Management Protocol Daemon (Agent) SNMP Daemon bosnet.tcpip.obj 3.2.0.0 3250 bosnet Maintenance Level TCP/IP Client Utilities TCP/IP Server Utilities TCP/IP SMIT Utilities bsl.en_US.aix.loc 3.2.0.0 3250 bsl Maintenance Level bsl.en_US.pc.loc 3.2.0.0 3250 bsl Maintenance Level bsmEn_US.msg 3.2.0.0 3250 bsmEn_US Maintenance Level Base System Messages - U.S. English SMIT Install Messages - U.S. English Base System Messages - U.S. English bspiEn_US.info 3.2.5.0 No Maintenance Level Applied. bssiEn_US.info 3.2.5.0 No Maintenance Level Applied. cluster.client 3.1.0.0 No Maintenance Level Applied. HACMP/6000 cluster.clvm 3.1.0.0 No Maintenance Level Applied. cluster.server 3.1.0.0 No Maintenance Level Applied. HACMP/6000 sd6k_clnt.obj 2.3.0.11 No Maintenance Level Applied. serdasd.mc 3.2.0.16 No Maintenance Level Applied. sysback.obj 3.2.0.30 No Maintenance Level Applied. txtfmt.bib.data 3.2.0.0 3250 txtfmt.data Maintenance Level txtfmt.bib.obj 3.2.0.0 3250 txtfmt Maintenance Level 154
C C C
C C C C
U491131
U491131
C C C C
U438726
U438726
U491156
U491155
Text Formating Bibliography Utilities txtfmt.graf.obj 3.2.0.0 3250 txtfmt Maintenance Level Tektronics Terminal Drivers txtfmt.hplj.fnt 3.2.0.0 3250 txtfmt Maintenance Level txtfmt.ibm3812.fnt 3.2.0.0 3250 txtfmt Maintenance Level IBM-3812 Fonts txtfmt.ibm3816.fnt 3.2.0.0 3250 txtfmt Maintenance Level txtfmt.spell.data 3.2.0.0 3250 txtfmt.data Maintenance Level txtfmt.spell.obj 3.2.0.0 3250 txtfmt Maintenance Level Spell Checker Utilities txtfmt.tfs.data 3.2.0.0 3250 txtfmt.data Maintenance Level txtfmt.tfs.obj 3.2.0.0 3250 txtfmt Maintenance Level Text Formatting Utilities Text Formatting Utilities txtfmt.ts.obj 3.2.0.0 3250 txtfmt Maintenance Level Postscript Formatter txtfmt.xpv.obj 3.2.0.0 3250 txtfmt Maintenance Level X Preview Utility xlccmp.obj 1.3.0.0 3250 xlccmp 1.3 Maintenance Level
U428350
C C
U491155 U428351
U491155
C C
U491155 U428390
U491155
U491156
C C
U491155 U428352
U491156
C C C
C C
U491155 U428354
C C
U491155 U435300
U491204
State Codes:
 A -- Applied.
 B -- Broken.
 C -- Committed.
 N -- Not Installed, but was previously installed/seen on some media.
 - -- Superseded, not Applied.
 ? -- Inconsistent State...Run lppchk -v.
hd2          label          /usr
hd2          lvserial_id    000147325ccaf23c.6
hd2          size           135
hd3          intra          c
hd3          label          /tmp
hd3          lvserial_id    000147325ccaf23c.5
hd3          size           5
hd4          intra          c
hd4          label          /
hd4          lvserial_id    000147325ccaf23c.3
hd4          size           3
hd5          intra          e
hd5          lvserial_id    000147325ccaf23c.8
hd5          relocatable    n
hd5          size           2
hd5          type           boot
hd6          lvserial_id    000147325ccaf23c.2
hd6          size           20
hd6          type           paging
hd7          intra          e
hd7          lvserial_id    000147325ccaf23c.9
hd7          size           2
hd7          type           sysdump
hd8          intra          c
hd8          lvserial_id    000147325ccaf23c.1
hd8          type           jfslog
hd9var       intra          c
hd9var       label          /var
hd9var       lvserial_id    000147325ccaf23c.7
hdisk0       pvid           000111874109e6740000000000000000
hdisk1       pvid           0000411925a746100000000000000000
hdisk2       pvid           000002992679061e0000000000000000
hdisk3       pvid           00002819699e632f0000000000000000
hdisk4       pvid           000005080b85c6880000000000000000
hdisk5       pvid           000009854777a0910000000000000000
hdisk6       pvid           000009854777a5c60000000000000000
hft0         console        1
hft0         default_disp   gda0
hft0         swkb_path      /usr/lib/nls/loc/En_US.hftkeymap
inet0        hostname       mickey
inet0        route          net,,0,9.3.1.74
loglvtest1   copies         2
loglvtest1   intra          c
loglvtest1   lvserial_id    00014732b5a91022.1
loglvtest1   type           jfslog
loglvtest2   copies         2
loglvtest2   lvserial_id    00014732ca66234e.1
loglvtest2   type           jfslog
lvtest1      copies         2
lvtest1      label          /test1
lvtest1      lvserial_id    00014732b5a91022.2
lvtest1      size           20
lvtest2      copies         2
lvtest2      label          /test2
lvtest2      lvserial_id    00014732ca66234e.2
Part 1. Cluster Documentation Tool Report
lvtest2      size           25
mem0         desc           32
mem0         size           32
mem0         type           0x8
mem1         desc           32
mem1         size           32
mem1         type           0x8
rootvg       pv             000111874109e6740000000000000000
rootvg       state          0
rootvg       vgserial_id    000147325ccaf23c
sa1          dma_lvl        0x2
scsi0        ucode          /etc/microcode/8d77.32.54
serdasda0    dma_bus_mem    0x250000
serdasda0    ucode          /etc/microcode/8f78.00.16
serdasda1    bus_intr_lvl   0x7
serdasda1    bus_io_addr    0xc400
serdasda1    dma_bus_mem    0x800000
serdasda1    dma_lvl        0x9
serdasda1    ucode          8f78.00.16
serdasdc0    ucode          8f78.00.16
serdasdc1    desc           51
serdasdc1    ucode          8f78.00.16
siokb0       int_level      0x1
sys0         bootdisk       hd5
sys0         dcache         64K
sys0         icache         8K
sys0         keylock        normal
sys0         modelcode      0x0010
sys0         realmem        65536
sys0         rostime        9003071302
sys0         syscons        /dev/hft
test1vg      auto_on        n
test1vg      pv             000002992679061e0000000000000000
test1vg      pv             0000411925a746100000000000000000
test1vg      quorum         n
test1vg      state          0
test1vg      vgserial_id    00014732b5a91022
test2vg      auto_on        n
test2vg      pv             000005080b85c6880000000000000000
test2vg      pv             00002819699e632f0000000000000000
test2vg      quorum         n
test2vg      state          0
test2vg      vgserial_id    00014732ca66234e
tok0         alt_addr       0x42005aa8b484
tok0         dma_bus_mem    0x200000
tok0         ring_speed     16
tok1         alt_addr       0x42005aa8d1f3
tok1         bus_intr_lvl   0x5
tok1         bus_io_addr    0x96a0
tok1         dma_bus_mem    0x352000
tok1         dma_lvl        0x5
tok1         ring_speed     16
tr0          netaddr        9.3.1.45
tr0          netmask        255.255.255.0
tr0          state          up
rsnet1
serial
trnet1
public
0x42005aa8d1f3
0x42005aa8b484
Service Interface goofy_tty0:
    IP address: /dev/tty0
    Hardware Address:
    Network: rsnet1
    Attribute: serial
Service Interface goofy_tty0 has no standby interfaces.

Service Interface goofy:
    IP address: 9.3.1.80
    Hardware Address: 0x42005aa8d1f3
    Network: trnet1
    Attribute: public
Service Interface goofy has a possible boot configuration:
    Boot (Alternate Service) Interface: goofy_boot
    IP address: 9.3.1.46
    Network: trnet1
    Attribute: public
Service Interface goofy has 1 standby interfaces.
    Standby Interface 1: goofy_sb
    IP address: 9.3.4.80
    Network: trnet1
    Attribute: public
Service Interface mickey_en:
    IP address: 9.3.5.79
    Hardware Address:
    Network: etnet1
    Attribute: private
Service Interface mickey_en has no standby interfaces.

Service Interface mickey_tty0:
    IP address: /dev/tty0
    Hardware Address:
    Network: rsnet1
    Attribute: serial
Service Interface mickey_tty0 has no standby interfaces.

Service Interface mickey:
    IP address: 9.3.1.79
    Hardware Address: 0x42005aa8b484
    Network: trnet1
    Attribute: public
Service Interface mickey has a possible boot configuration:
    Boot (Alternate Service) Interface: mickey_boot
    IP address: 9.3.1.45
    Network: trnet1
    Attribute: public
Service Interface mickey has 1 standby interfaces.
    Standby Interface 1: mickey_sb
    IP address: 9.3.4.79
    Network: trnet1
    Attribute: public
Breakdown of network connections:

Connections to network etnet1
    Node goofy is connected to network etnet1 by these interfaces:
        goofy_en
    Node mickey is connected to network etnet1 by these interfaces:
        mickey_en

Connections to network rsnet1
    Node goofy is connected to network rsnet1 by these interfaces:
        goofy_tty0
    Node mickey is connected to network rsnet1 by these interfaces:
        mickey_tty0

Connections to network trnet1
    Node goofy is connected to network trnet1 by these interfaces:
        goofy_boot
        goofy
        goofy_sb
    Node mickey is connected to network trnet1 by these interfaces:
        mickey_boot
        mickey
        mickey_sb
In the following pages you will find shell scripts whose names are prefixed by CMD, PRE, POS and REC. The explanations below describe what each prefix means; once you understand them, the contents of the scripts are easy to follow.
The HACMP daemons which run on the various cluster nodes all communicate amongst themselves. They react to the 32 predefined cluster events, such as: node 2 has just rejoined the cluster, or a network has just failed.
Default shell scripts for all of the events are in the directory /usr/sbin/cluster/events. Some of the scripts are just empty shells which you can customize according to your needs. It is advisable NOT to modify the original scripts. Instead, select the event you wish to customize; its script is copied into the /usr/HACMP_ANSS/script directory and prefixed by CMD_ (for example, network_down --> CMD_network_down). The events are configured in the ODM, in the object class /etc/objrepos/HACMPevent. As the location of the event script to be executed is stored within the object, it is necessary to modify the path name, either with SMIT or by letting the tool do it for you automatically.
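The naming convention just described can be sketched in a few lines of shell. This is only an illustration, not the tool's own code: the function name customized_script_path and the variable SCRIPT_DIR are assumptions made for the example; only the two directories and the prefixes come from the text.

```sh
#!/bin/sh
# Hypothetical helper (not part of HACMP or of the tool): build the path
# of a customized event script from a prefix and an event name.
SCRIPT_DIR=${SCRIPT_DIR:-/usr/HACMP_ANSS/script}

customized_script_path() {
    # $1 = prefix (CMD, PRE, POS or REC), $2 = event name
    case "$1" in
        CMD|PRE|POS|REC) echo "$SCRIPT_DIR/${1}_${2}" ;;
        *) echo "unknown prefix: $1" >&2; return 1 ;;
    esac
}

# Copying the default script rather than editing it in place would then be:
#   cp /usr/sbin/cluster/events/network_down \
#      "$(customized_script_path CMD network_down)"
```

The copy itself is left as a comment because it only makes sense on a cluster node; the path stored in the HACMPevent object must still be changed with SMIT or by the tool.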
Sometimes it is necessary to carry out a certain action before (PRE) or after (POS) an event script is executed. For example, PRE_stop_server might send a message before the server application is stopped through CMD_stop_server, and POS_stop_server might send another message once the stop has taken place. The PRE and POS events are also configured with SMIT or by the tool. Their scripts are placed in the /usr/HACMP_ANSS/script directory as well.
Each event should send a return code of 0 if it has successfully completed execution. If not, then HACMP will not terminate the event properly and you will see a number of messages on the console. You can customize the reaction to a script terminating with a non-zero exit status by configuring a RECOVERY script. This script will be executed one or more times, depending on how you have set the Retry Counter field in the SMIT Event Customization panel. Once again, the RECOVERY script is configured either through SMIT or with the tool. A template is created for you (if you use the tool) in /usr/HACMP_ANSS/script with the event name prefixed by REC_ (for example, REC_network_down). The shell script is empty, and you are free to customize it as you wish.
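The relationship between the return code, the RECOVERY script and the Retry Counter can be pictured with the sketch below. It only models the behaviour described above and is not how the cluster manager is implemented. The function name run_with_recovery and its arguments are hypothetical, and the sketch assumes that the event is retried after each run of the recovery script.

```sh
#!/bin/sh
# Sketch only: models "event fails -> run recovery -> retry the event".
# run_with_recovery EVENT_CMD RECOVERY_CMD RETRY_COUNT
run_with_recovery() {
    event_cmd=$1
    recovery_cmd=$2
    retries=$3
    while :; do
        if "$event_cmd"; then
            return 0                # event finished with exit status 0
        fi
        if [ "$retries" -le 0 ]; then
            return 1                # no retries left: report the failure
        fi
        "$recovery_cmd"             # for example a REC_network_down script
        retries=$((retries - 1))
    done
}
```

With a Retry Counter of 0 the recovery script is never run in this model, and the failure of the event is reported immediately.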
fail_standby
    Sends a console message when a standby adapter fails or is no longer available because it has been used to take over the IP address of another adapter.

join_standby
    Sends a console message when a standby adapter becomes available.

network_down
    Occurs when the cluster determines that a network has failed. The event script provided takes no default action, since the appropriate action will be site/LAN specific.

network_down_complete
    Occurs only after a network_down event has successfully completed. The event script provided takes no default action, since the appropriate action will be site/LAN specific.

network_up
    Occurs when the cluster determines that a network has become available. The event script provided takes no default action, since the appropriate action will be site/LAN specific.

network_up_complete
    Occurs only after a network_up event has successfully completed. The event script provided takes no default action, since the action will be site/LAN specific.

node_down
    Occurs when a node is detaching from the cluster, either voluntarily or due to a failure. Depending on whether the node is local or remote, either the node_down_local or node_down_remote sub event is called.

node_down_complete
    Occurs only after a node_down event has successfully completed. Depending on whether the node is local or remote, either the node_down_local_complete or node_down_remote_complete sub event is called.

node_up
    Occurs when a node is joining the cluster. Depending on whether the node is local or remote, either the node_up_local or node_up_remote sub event is called.

node_up_complete
    Occurs only after a node_up event has successfully completed. Depending on whether the node is local or remote, either the node_up_local_complete or node_up_remote_complete sub event is called.

swap_adapter
    Exchanges or swaps the IP addresses of two network interfaces. NIS and name serving are temporarily turned off during this event.

swap_adapter_complete
    Occurs only after a swap_adapter event has successfully completed. Ensures that the local ARP cache is updated by deleting entries and pinging cluster IP addresses.

event_error
    Occurs when an HACMP event script fails for some reason.
acquire_takeover_addr get_disk_vg_fs
node_down_local
    Releases resources taken from a remote node, stops application servers, releases a service address taken from a remote node, releases concurrent volume groups, unmounts file systems and reconfigures the node to its boot address.

node_down_local_complete
    Instructs the cluster manager to exit when the local node has completed detaching from the cluster. This event only occurs after a node_down_local event has successfully completed.

node_down_remote
    Unmounts any NFS file systems and places a concurrent volume group in non-concurrent mode when the local node is the only surviving node in the cluster. If the failed node did not go down gracefully, acquires a failed node's resources: file systems, volume groups and disks and service address.

node_down_remote_complete
    Starts takeover application servers if the remote node did not go down gracefully. This event only occurs after a node_down_remote event has successfully completed.

node_up_local
    When the local node attaches to the cluster: acquires the service address, clears the application server file, acquires file systems, volume groups and disks resources, exports file systems and either activates concurrent volume groups or puts them into concurrent mode depending upon the status of the remote node(s).

node_up_local_complete
    Starts application servers and then checks to see if an inactive takeover is needed. This event only occurs after a node_up_local event has successfully completed.

node_up_remote
    Causes the local node to release all resources taken from the remote node and to place the concurrent volume groups into concurrent mode.

node_up_remote_complete
    Allows the local node to do an NFS mount only after the remote node is completely up. This event only occurs after a node_up_remote event has successfully completed.

release_service_addr
    Detaches the service address and reconfigures to its boot address.

release_takeover_addr
    Identifies a takeover address to be released because a standby adapter on the local node is masquerading as the service address of the remote node. Reconfigures the local standby into its original role.

release_vg_fs
    Releases volume groups and file systems that the local node took from the remote node.

start_server
    Starts application servers.

stop_server
    Stops application servers.
AIX has a daemon, errdemon, which is alerted by the kernel whenever a HARDWARE or SOFTWARE incident takes place. Errors are logged into the AIX error log, and can be examined with the errpt command. There exists an object class /etc/objrepos/errnotify in the ODM which can be customized for the special handling of errors. The customization can be carried out with SMIT, and consists of configuring the types of errors to be dealt with, and the action to be taken when such an error occurs. This is done through the definition of a script to be executed when this error is put into the AIX error log. The program err_select can also be used for the customization of error handling. It creates templates in /usr/HACMP_ANSS/script for you to customize. All of these templates are prefixed by error_. The name of the file depends on the type of error selected (for example, error_SCSI).
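As an illustration of what an entry in the errnotify object class looks like, the sketch below prints a stanza in the format accepted by odmadd. It is a minimal example under stated assumptions, not the output of err_select: the function name errnotify_stanza, the object name hacmp_scsi and the choice of attributes are made up for the example. The nine arguments passed to the notify method are the ones AIX supplies (sequence number, error ID, class, type, alert flag, resource name, resource type, resource class, error label).

```sh
#!/bin/sh
# Sketch: print an errnotify stanza for use with odmadd.
# The object name and script path used below are examples only.
errnotify_stanza() {
    # $1 = object name, $2 = error class (H = hardware, S = software),
    # $3 = script to run when a matching error is logged
    cat <<EOF
errnotify:
        en_name = "$1"
        en_persistenceflg = 1
        en_class = "$2"
        en_method = "$3 \$1 \$2 \$3 \$4 \$5 \$6 \$7 \$8 \$9"
EOF
}

errnotify_stanza hacmp_scsi H /usr/HACMP_ANSS/script/error_SCSI
```

On an AIX node the output would be saved to a file and loaded with odmadd; here it is only printed, so the sketch can be read and tried without touching the ODM.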
print error class = $3
print error type = $4
print alert flag = $5
print resource name = $6
print resource type = $7
print resource class = $8
print error label = $9) >> $LOG
#######################################################################
# DO NOT FORGET TO set TO_WHOM in error_MAIL
. /usr/HACMP_ANSS/tools/ERROR_TOOL/error_MAIL $1 $2 $3 $4 $5 $6 $7 $8 $9
#######################################################################
# DO NOT FORGET TO set QUEUE in error_PRINT
# . /usr/HACMP_ANSS/tools/ERROR_TOOL/error_PRINT $1 $2 $3 $4 $5 $6 $7 $8 $9
#######################################################################
return $STATUS
wall System will be shutting Down in 20 Seconds. Please log off now. You will be able to login to your application again within 5 minutes.
sleep 20
# This command does a shutdown with takeover of HACMP
/usr/sbin/cluster/utilities/clstop -y -N -gr
sleep 5
# We now want to shutdown the machine, until our administrator can
# investigate the problem.
/etc/shutdown -Fr
####################### END OF CUSTOMIZATION ##############################
return $STATUS
# dspmsg rc.cat 6 Write system start up record to /usr/adm/sa/sa`date`
#/bin/su - root -c /usr/lib/sa/sadc /usr/adm/sa/sa`date +%d`
# Manufacturing post install process.
# This must be at the end of this file, /etc/rc.
if [ -x /etc/mfg/rc.preload ]
then
        /etc/mfg/rc.preload
fi
dspmsg rc.cat 5 'Multi-user initialization completed\n'
exit 0
#
# Close file descriptor 1 and 2 because the parent may be waiting
# for the file desc. 1 and 2 to be closed. The reason is that this shell
# script may spawn a child which inherit all the file descriptor from the parent
# and the child process may still be running after this process is terminated.
# The file desc. 1 and 2 are not closed and leave the parent hanging
# waiting for those desc. to be finished.
#LOGFILE=/dev/null              # LOGFILE is where all stdout goes.
LOGFILE=/tmp/rc.net.out         # LOGFILE is where all stdout goes.
>$LOGFILE                       # truncate LOGFILE.
exec 1<&-                       # close descriptor 1
exec 2<&-                       # close descriptor 2
exec 1< /dev/null               # open descriptor 1
exec 2< /dev/null               # open descriptor 2
no -d lowclust                  # set cluster low water mark
##################################################################
# Part I - Configuration using the data in the ODM database:
# Enable network interface(s):
##################################################################
# This should be done before routes are defined.
# For each network adapter that has already been configured, the
# following commands will define, load and configure a corresponding
# interface.
/usr/lib/methods/defif >>$LOGFILE 2>&1
/usr/lib/methods/cfgif $* >>$LOGFILE 2>&1
##################################################################
# Special X25 and SLIP handling
##################################################################
# In addition to configure the network interface, X25 and SLIP
# interfaces require special commands to complete the configuration
# The x25xlate command bring the x25 translation table into the
# kernel while the slattach changes the tty handling for the tty
# port used by the the SLIP interface. A separate slattach command is
# execute for every tty port used by configured SLIP interfaces.
X25HOST=`lsdev -C -c if -s XT -t xt -S available`
if [ ! -z "$X25HOST" ]
then
        x25xlate >>$LOGFILE 2>&1
fi
SLIPHOST=`lsdev -C -c if -s SL -t sl -S available | awk '{ print $1 }'`
for i in $SLIPHOST
do
        echo $i >>$LOGFILE 2>&1
        TTYPORT=`lsattr -E -l $i -F value -a ttyport`
        TTYBAUD=`lsattr -E -l $i -F value -a baudrate`
        TTYDIALSTRING=`lsattr -E -l $i -F value -a dialstring`
        rm -f /etc/locks/LCK..$TTYPORT
        if [ -z "$TTYBAUD" -a -z "$TTYDIALSTRING" ]
        then
                FromHOST=`lsattr -E -l $i -F value -a netaddr`
                DestHOST=`lsattr -E -l $i -F value -a dest`
                SLIPMASK=`lsattr -E -l $i -F value -a netmask`
                if [ -z "$SLIPMASK" ]
                then
                        ifconfig $SLIPHOST inet $FromHOST $DestHOST up
                else
                        ifconfig $SLIPHOST inet $FromHOST $DestHOST netmask $SLIPMASK up
                fi
                ( slattach $TTYPORT ) >>$LOGFILE 2>&1
        else
                eval DST=$TTYDIALSTRING >>$LOGFILE 2>&1
                ( eval slattach $TTYPORT $TTYBAUD $DST ) >>$LOGFILE 2>>$LOGFILE
        fi
done
##################################################################
# Configure the Internet protocol kernel extension (netinet):
##################################################################
# The following commands will also set hostname, default gateway,
# and static routes as found in the ODM database for the network.
/usr/lib/methods/definet >>$LOGFILE 2>&1
/usr/lib/methods/cfginet >>$LOGFILE 2>&1
##################################################################
# Part II - Traditional Configuration.
##################################################################
# An alternative method for bringing up all the default interfaces
# is to specify explicitly which interfaces to configure using the
# ifconfig command. Ifconfig requires the configuration information
# be specified on the command line. Ifconfig will not update the
# information kept in the ODM configuration database.
#
# Valid network interfaces are:
# lo=local loopback, en=standard ethernet, et=802.3 ethernet
# sl=serial line IP, tr=802.5 token ring, xt=X.25
#
# e.g., en0 denotes standard ethernet network interface, unit zero.
#
# Below are examples of how you could bring up each interface using
# ifconfig. Since you can specify either a hostname or a dotted
# decimal address to set the interface address, it is convenient to
# set the hostname at this point and use it for the address of
# an interface, as shown below:
#
#/bin/hostname robo.austin.ibm.com >>$LOGFILE 2>&1
#
# (Remember that if you have more than one interface,
# you'll want to have a different IP address for each one.
# Below, xx.xx.xx.xx stands for the internet address for the
# given interface).
#
#/usr/sbin/ifconfig lo0 inet loopback up >>$LOGFILE 2>&1
#/usr/sbin/ifconfig en0 inet `hostname` up >>$LOGFILE 2>&1
#/usr/sbin/ifconfig et0 inet xx.xx.xx.xx
#/usr/sbin/ifconfig tr0 inet xx.xx.xx.xx
#/usr/sbin/ifconfig sl0 inet xx.xx.xx.xx
#/usr/sbin/ifconfig xt0 inet xx.xx.xx.xx
#
#
# Now we set any static routes.
#
# /usr/sbin/route add 0 gateway
# /usr/sbin/route add 192.9.201.0 gateway
##################################################################
# Part III - Miscellaneous Commands.
##################################################################
# Set the hostid and uname to `hostname`, where hostname has been
# set via ODM in Part I, or directly in Part II.
# (Note it is not required that hostname, hostid and uname all be
# the same).
/usr/sbin/hostid `hostname` >>$LOGFILE 2>&1
/bin/uname -S`hostname|sed 's/\..*$//'` >>$LOGFILE 2>&1
###################################################
# The socket default buffer size (initial advertized TCP window) is being
# set to a default value of 16k (16384). This improves the performance
# for ethernet and token ring networks. Networks with lower bandwidth
# such as SLIP (Serial Line Internet Protocol) and X.25 or higher bandwidth
# such as Serial Optical Link and FDDI would have a different optimum
# buffer size.
# ( OPTIMUM WINDOW = Bandwidth * Round Trip Time )
###################################################
if [ -f /usr/sbin/no ] ; then
        /usr/sbin/no -o tcp_sendspace=16384
        /usr/sbin/no -o tcp_recvspace=16384
fi
/etc/no -o ipforwarding=0
/etc/no -o ipsendredirects=0
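The rule quoted in the rc.net comment, OPTIMUM WINDOW = Bandwidth * Round Trip Time, can be worked through with a small calculation. The sketch below is an illustration only: the function name optimum_window is hypothetical, and the 8 ms round-trip time is an assumed figure chosen for the example (the 16 Mbit/s value corresponds to the ring_speed of 16 reported for the token-ring adapters in this report).

```sh
#!/bin/sh
# Sketch: bandwidth-delay product in bytes, using integer arithmetic.
optimum_window() {
    # $1 = bandwidth in bits per second, $2 = round-trip time in milliseconds
    echo $(( $1 / 8 * $2 / 1000 ))
}

# 16 Mbit/s token ring with an assumed 8 ms round trip:
optimum_window 16000000 8       # prints 16000
```

The result, 16000 bytes, is close to the 16384 set for tcp_sendspace and tcp_recvspace above. A slower link such as SLIP, or a faster one such as FDDI, would give a different figure, which is the point the comment makes.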
#
# /etc/hosts
#
# This file contains the hostnames and their address for hosts in the
# network. This file is used to resolve a hostname into an Internet
# address.
#
# At minimum, this file must contain the name and address for each
# device defined for TCP in your /etc/net file. It may also contain
# entries for well-known (reserved) names such as timeserver
# and printserver as well as any other host name and address.
#
# The format of this file is:
# Internet Address      Hostname        # Comments
# Items are separated by any number of blanks and/or tabs. A #
# indicates the beginning of a comment; characters up to the end of the
# line are not interpreted by routines which search this file. Blank
# lines are allowed.
# Internet Address      Hostname        # Comments
# 192.9.200.1           net0sample      # ethernet name/address
# 128.100.0.1           token0sample    # token ring name/address
# 10.2.0.2              x25sample       # x.25 name/address
127.0.0.1               loopback localhost      # loopback (lo0) name/address
# Cluster 1 - disney
9.3.1.79        mickey.itsc.austin.ibm.com mickey
9.3.4.79        mickey_sb.itsc.austin.ibm.com mickey_sb
9.3.5.79        mickey_en.itsc.austin.ibm.com mickey_en
9.3.1.46        goofy_boot.itsc.austin.ibm.com goofy_boot
9.3.1.80        goofy.itsc.austin.ibm.com goofy
9.3.4.80        goofy_sb.itsc.austin.ibm.com goofy_sb
9.3.5.80        goofy_en.itsc.austin.ibm.com goofy_en
# Cluster 2 - dave
9.3.1.3         hadave1_boot.itsc.austin.ibm.com hadave1_boot
9.3.1.16        hadave1.itsc.austin.ibm.com hadave1
9.3.4.16        hadave1_sb.itsc.austin.ibm.com hadave1_sb
9.3.1.6         hadave2_boot.itsc.austin.ibm.com hadave2_boot
9.3.1.17        hadave2.itsc.austin.ibm.com hadave2
9.3.4.17        hadave2_sb.itsc.austin.ibm.com hadave2_sb
# Client & Others
9.3.1.43        pluto
9.3.1.74        gandalf
9.209.46.194    surveyor
9.209.41.111    aix11
9.209.32.4      jd560
9.3.4.16        hadave1_sb.itsc.austin.ibm.com hadave1_sb
9.3.1.3 9.3.1.45
/home:
        dev = /dev/hd1
        vol = /home
        mount = true
        check = true
        free = false
        vfs = jfs
        log = /dev/hd8

/usr:
        dev = /dev/hd2
        vfs = jfs
        log = /dev/hd8
        mount = automatic
        check = false
        type = bootfs
        vol = /usr
        free = false

/var:
        dev = /dev/hd9var
        vol = /var
        mount = automatic
        check = false
        free = false
        vfs = jfs
        log = /dev/hd8
        type = bootfs

/tmp:
        dev = /dev/hd3
        vfs = jfs
        log = /dev/hd8
        mount = automatic
        check = false
        vol = /tmp
        free = false

/mnt:
        dev = /dev/hd7
        vol = spare
        mount = false
        check = false
        free = false
        vfs = jfs
        log = /dev/hd8

/blv:
        dev = /dev/hd5
        vol = spare
        mount = false
        check = false
        free = false
        vfs = jfs
        log = /dev/hd8
/usr/bin/blv.fs: dev = /usr/bin/blv.fs vol = / /inst: dev vfs log mount check options account /test1: dev vfs
= /dev/lvtest1 = jfs
log mount check options account /test2: dev vfs log mount check options account
echo stream tcp nowait root internal
echo dgram udp wait root internal
discard stream tcp nowait root internal
discard dgram udp wait root internal
daytime stream tcp nowait root internal
daytime dgram udp wait root internal
chargen stream tcp nowait root internal
chargen dgram udp wait root internal
ftp stream tcp nowait root /etc/ftpd ftpd
telnet stream tcp nowait root /etc/telnetd telnetd
time stream tcp nowait root internal
time dgram udp wait root internal
#bootps dgram udp wait root /etc/bootpd bootpd
#tftp dgram udp wait nobody /etc/tftpd tftpd -n
#finger stream tcp nowait nobody /etc/fingerd fingerd
#rexd sunrpc_tcp tcp wait root /usr/etc/rpc.rexd rexd 100017 1
executiond sunrpc_tcp tcp wait root /usr/lpp/sd/executiond executiond 300201 1
comp_ed sunrpc_tcp tcp wait root /usr/lpp/sd/executiond comp_ed 33333332 1
rstatd sunrpc_udp udp wait root /usr/etc/rpc.rstatd rstatd 100001 1-3
rusersd sunrpc_udp udp wait root /usr/etc/rpc.rusersd rusersd 100002 1-2
rwalld sunrpc_udp udp wait root /usr/etc/rpc.rwalld rwalld 100008 1
sprayd sunrpc_udp udp wait root /usr/etc/rpc.sprayd sprayd 100012 1
pcnfsd sunrpc_udp udp wait root /etc/rpc.pcnfsd pcnfsd 150001 1
exec stream tcp nowait root /etc/rexecd rexecd
#biff dgram udp wait root /etc/comsat comsat
login stream tcp nowait root /etc/rlogind rlogind
shell stream tcp nowait root /etc/rshd rshd
#talk dgram udp wait root /etc/talkd talkd
ntalk dgram udp wait root /etc/talkd talkd
uucp stream tcp nowait root /etc/uucpd uucpd
#instsrv stream tcp nowait netinst /u/netinst/bin/instsrv instsrv -r /tmp/netinstalllog /u/netinst/scripts
godm stream tcp nowait root /usr/sbin/cluster/godmd
#
# Each line must consist of two parts:#
# 1) A selector to determine the message priorities to which the
#    line applies
# 2) An action.
#
# The two fields must be separated by one or more tabs or spaces.
#
# format:
#
# <msg_src_list> <destination>
#
# where <msg_src_list> is a semicolon separated list of <facility>.<priority>
# where:
#
# <facility> is:
#       * - all (except mark)
#       mark - time marks
#       kern,user,mail,daemon, auth,... (see syslogd(AIX Commands Reference))
#
# <priority> is one of (from high to low):
#       emerg/panic,alert,crit,err(or),warn(ing),notice,info,debug
#       (meaning all messages of this priority or higher)
#
# <destination> is:
#       /filename - log to this file
#       username[,username2...] - write to user(s)
#       @hostname - send to syslogd on this machine
#       * - send to all logged in users
#
# example:
# mail messages, at debug or higher, go to Log file. File must exist.
# all facilities, at debug and higher, go to console
# all facilities, at crit or higher, go to all users
# mail.debug /usr/spool/mqueue/syslog
# *.debug /dev/console
# *.crit *
# HACMP/6000 Critical Messages from HACMP/6000
local0.crit /dev/console
# HACMP/6000 Informational Messages from HACMP/6000
local0.info /usr/adm/cluster.log
# HACMP/6000 Messages from Cluster Scripts
user.notice /usr/adm/cluster.log
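The phrase "all messages of this priority or higher" in the syslog.conf comments decides which of the HACMP entries a given message reaches. The sketch below models that rule only; it is not syslogd code, and the function names prio_level and selector_matches are hypothetical. The numeric levels are the standard syslog severities, where a smaller number means a higher priority.

```sh
#!/bin/sh
# Sketch: does a message of a given priority match a selector priority?
prio_level() {
    case "$1" in
        emerg|panic)  echo 0 ;;
        alert)        echo 1 ;;
        crit)         echo 2 ;;
        err|error)    echo 3 ;;
        warn|warning) echo 4 ;;
        notice)       echo 5 ;;
        info)         echo 6 ;;
        debug)        echo 7 ;;
        *)            return 1 ;;
    esac
}

selector_matches() {
    # $1 = priority named in the selector, $2 = priority of the message
    sel=$(prio_level "$1") || return 2
    msg=$(prio_level "$2") || return 2
    [ "$msg" -le "$sel" ]       # same or higher priority matches
}
```

Under this rule a local0 message at crit matches both local0.crit and local0.info, so it goes to the console and to /usr/adm/cluster.log, while a local0 message at info matches only local0.info and goes to the log file alone.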
:
: US Government Users Restricted Rights - Use, duplication or
: disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
:
: Note - initdefault and sysinit should be the first and second entry.
:
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail >/dev/console 2>&1 # d51225
rc:2:wait:/etc/rc > /dev/console 2>&1 # Multi-User checks
fbcheck:2:wait:/usr/lib/dwm/fbcheck >/dev/console 2>&1 # run /etc/firstboot
srcmstr:2:respawn:/etc/srcmstr # System Resource Controller
harc:2:wait:/usr/sbin/cluster/etc/harc.net # HACMP6000 network startup
rctcpip:a:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcnfs:a:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cons:0123456789:respawn:/etc/getty /dev/console
piobe:2:wait:/bin/rm -f /usr/lpd/pio/flags/* # Clean up printer flags files
cron:2:respawn:/etc/cron
qdaemon:a:wait:/bin/startsrc -sqdaemon
writesrv:a:wait:/bin/startsrc -swritesrv
uprintfd:2:respawn:/etc/uprintfd
rcncs:a:wait:sh /etc/rc.ncs
infod:2:once:startsrc -s infod
tty0:2:off:/etc/getty /dev/tty0
clvm6000:2:wait:/usr/sbin/cluster/cllvm -c status # Check CLVM stat
clinit:a:wait:touch /usr/sbin/cluster/.telinit # HACMP6000 This must be last entry in inittab!
E.6 CONTENTS OF THE HACMP OBJECTS IN THE ODM
E.6.1 odmget of /etc/objrepos/HACMPadapter
HACMPadapter:
        type = ether
        network = etnet1
        nodename = goofy
        ip_label = goofy_en
        function = service
        identifier = 9.3.5.80
        haddr =

HACMPadapter:
        type = rs232
        network = rsnet1
        nodename = goofy
        ip_label = goofy_tty0
        function = service
        identifier = /dev/tty0
        haddr =

HACMPadapter:
        type = token
        network = trnet1
        nodename = goofy
        ip_label = goofy
        function = service
        identifier = 9.3.1.80
        haddr = 0x42005aa8d1f3

HACMPadapter:
        type = token
        network = trnet1
        nodename = goofy
        ip_label = goofy_boot
        function = boot
        identifier = 9.3.1.46
        haddr =

HACMPadapter:
        type = token
        network = trnet1
        nodename = goofy
        ip_label = goofy_sb
        function = standby
        identifier = 9.3.4.80
        haddr =

HACMPadapter:
        type = ether
        network = etnet1
        nodename = mickey
        ip_label = mickey_en
        function = service
        identifier = 9.3.5.79
        haddr =

HACMPadapter:
        type = rs232
        network = rsnet1
        nodename = mickey
        ip_label = mickey_tty0
        function = service
        identifier = /dev/tty0
        haddr =

HACMPadapter:
        type = token
        network = trnet1
        nodename = mickey
        ip_label = mickey
        function = service
        identifier = 9.3.1.79
        haddr = 0x42005aa8b484

HACMPadapter:
        type = token
        network = trnet1
        nodename = mickey
        ip_label = mickey_boot
        function = boot
        identifier = 9.3.1.45
        haddr =

HACMPadapter:
        type = token
        network = trnet1
        nodename = mickey
        ip_label = mickey_sb
        function = standby
        identifier = 9.3.4.79
        haddr =
    numargs = 0
    args =
    help = Tools for verifying that a cluster is properly installed and configured
    catalog = command.cat
    setno = 0
    msgno = 2

HACMPcommand:
    command = clverify
    options = cluster
    optflag = 1
    path =
    numargs = 0
    args =
    help = Tools for verifying that a cluster is properly installed and configured
    catalog = command.cat
    setno = 0
    msgno = 3

HACMPcommand:
    command = clverify.software
    options = bos
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that your software environment is compatible with HACMP
    catalog = command.cat
    setno = 0
    msgno = 6

HACMPcommand:
    command = clverify.software
    options = prereq
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that your software environment is compatible with HACMP
    catalog = command.cat
    setno = 0
    msgno = 7

HACMPcommand:
    command = clverify.software
    options = badptfs
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that your software environment is compatible with HACMP
    catalog = command.cat
    setno = 0
    msgno = 8
HACMPcommand:
    command = clverify.software
    options = lpp
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that your software environment is compatible with HACMP
    catalog = command.cat
    setno = 0
    msgno = 8

HACMPcommand:
    command = clverify.cluster
    options = topology
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that your cluster is configured properly
    catalog = command.cat
    setno = 0
    msgno = 9

HACMPcommand:
    command = clverify.cluster
    options = config
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that your cluster is configured properly
    catalog = command.cat
    setno = 0
    msgno = 10

HACMPcommand:
    command = clverify.software.prereq
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clvreq
    numargs = 0
    args =
    help = Verifies that all fixes to AIX required by HACMP have been installed
    catalog = command.cat
    setno = 0
    msgno = 13

HACMPcommand:
    command = clverify.software.lpp
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clvhacmp
    numargs = 0
    args =
    help = Verifies that HACMP is properly installed
    catalog = command.cat
    setno = 0
    msgno = 14

HACMPcommand:
    command = clverify.software.bos
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clvbos
    numargs = 0
    args =
    help = Verifies that the AIX level is correct for HACMP
    catalog = command.cat
    setno = 0
    msgno = 15

HACMPcommand:
    command = clverify.software.badptfs
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clvinval
    numargs = 0
    args =
    help = Verifies that no known PTFs that break HACMP are installed
    catalog = command.cat
    setno = 0
    msgno = 16

HACMPcommand:
    command = clverify.cluster.topology
    options = check
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that all cluster nodes agree on cluster topology
    catalog = command.cat
    setno = 0
    msgno = 17

HACMPcommand:
    command = clverify.cluster.topology
    options = sync
    optflag = 1
    path =
    numargs = 0
    args =
    help = Forces all cluster nodes to agree on cluster topology
    catalog = command.cat
    setno = 0
    msgno = 18

HACMPcommand:
    command = clverify.cluster.topology.check
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clconfig
    numargs = 1
    args = -t
    help = Verifies that all cluster nodes agree on cluster topology
    catalog = command.cat
    setno = 0
    msgno = 19

HACMPcommand:
    command = clverify.cluster.topology.sync
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clconfig
    numargs = 2
    args = -s -t
    help = Forces all cluster nodes to agree on cluster topology
    catalog = command.cat
    setno = 0
    msgno = 20

HACMPcommand:
    command = clverify.cluster.config
    options = networks
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that cluster resources are properly installed
    catalog = command.cat
    setno = 0
    msgno = 23

HACMPcommand:
    command = clverify.cluster.config
    options = resources
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that cluster resources are properly installed
    catalog = command.cat
    setno = 0
    msgno = 22

HACMPcommand:
    command = clverify.cluster.config
    options = both
    optflag = 1
    path =
    numargs = 0
    args =
    help = Verifies that cluster resources are properly installed
    catalog = command.cat
    setno = 0
    msgno = 21

HACMPcommand:
    command = clverify.cluster.config.networks
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clconfig
    numargs = 2
    args = -v -t
    help = Checks for proper configuration of network adapters and tty lines
    catalog = command.cat
    setno = 0
    msgno = 25

HACMPcommand:
    command = clverify.cluster.config.resources
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clconfig
    numargs = 2
    args = -v -r
    help = Checks for agreement on resource ownership and takeover distribution
    catalog = command.cat
    setno = 0
    msgno = 26

HACMPcommand:
    command = clverify.cluster.config.both
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/clconfig
    numargs = 1
    args = -v
    help = Runs both the networks and resources programs
    catalog = command.cat
    setno = 0
    msgno = 24

HACMPcommand:
    command = cldiag
    options = logs
    optflag = 1
    path =
    numargs = 0
    args =
    help = Allows for selected viewing of HACMP log files, enables debugging of the
        Cluster Manager, or enables dumping of all Lock Manager resources.
    catalog = command.cat
    setno = 0
    msgno = 27

HACMPcommand:
    command = cldiag.logs
    options = scripts
    optflag = 1
    path =
    numargs = 0
    args =
    help = Allows for selected viewing of script output or syslog output.
    catalog = command.cat
    setno = 0
    msgno = 28

HACMPcommand:
    command = cldiag.logs.scripts
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_logfiles
    numargs = 2
    args = -t scripts
    help = scripts [-h host] [-s] [-f] [-d days] [-R file] [event ...]
        where:
        -h host is the name of a remote host from which to gather log data
        -s filters Start/Complete events
        -f filters failure events
        -d days defines the number of previous days from which to retrieve log
        -R file is file to which output is saved
        event is a list of cluster events
        Allows for parsing the /tmp/hacmp.out file
    catalog = command.cat
    setno = 0
    msgno = 29

HACMPcommand:
    command = cldiag.logs
    options = syslog
    optflag = 1
    path =
    numargs = 0
    args =
    help = Allows for selected viewing of script output or syslog output.
    catalog = command.cat
    setno = 0
    msgno = 30

HACMPcommand:
    command = cldiag.logs.syslog
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_logfiles
    numargs = 2
    args = -t syslog
    help = syslog [-h host] [-e] [-w] [-d days] [-R file] [process ...]
        where:
        -h host is the name of a remote host from which to gather log data
        -e filters error events
        -w filters warning events
        -d days defines the number of previous days from which to retrieve log
        -R file is file to which output is saved
        process is a list of cluster daemon processes
        Allows for parsing the /usr/adm/cluster.log file.
    catalog = command.cat
    setno = 0
    msgno = 31

HACMPcommand:
    command = cldiag
    options = debug
    optflag = 1
    path =
    numargs = 0
    args =
    help = Allows for selected viewing of HACMP log files, enables debugging of the
        Cluster Manager, or enables dumping of all Lock Manager resources.
    catalog = command.cat
    setno = 0
    msgno = 32

HACMPcommand:
    command = cldiag.debug
    options = clstrmgr
    optflag = 1
    path =
    numargs = 0
    args =
    help = Enables debugging of the Cluster Manager or the dumping of the lock
        resource table.
    catalog = command.cat
    setno = 0
    msgno = 33

HACMPcommand:
    command = cldiag.debug.clstrmgr
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_debug
    numargs = 2
    args = -t clstrmgr
    help = clstrmgr [-l level] [-R file]
        where:
        -l level is the level of debugging performed (0 - 9, where 0 turns debugging off)
        -R file is the file to which output is saved
HACMPcommand:
    command = cldiag.debug
    options = cllockd
    optflag = 1
    path =
    numargs = 0
    args =
    help = Enables debugging of the Cluster Manager or the dumping of the lock
        resource table.
    catalog = command.cat
    setno = 0
    msgno = 35

HACMPcommand:
    command = cldiag.debug.cllockd
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_debug
    numargs = 2
    args = -t cllockd
    help = cllockd [-R file]
        where:
        -R file is the file to which output is saved
        Allows dumping of the Lock Resource Table.
    catalog = command.cat
    setno = 0
    msgno = 36

HACMPcommand:
    command = cldiag
    options = vgs
    optflag = 1
    path =
    numargs = 0
    args =
    help = Finds volume group inconsistencies among hosts and the disks.
    catalog = command.cat
    setno = 0
    msgno = 37

HACMPcommand:
    command = cldiag.vgs
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_vgs
    numargs = 0
    args =
    help = vgs hostnames [-v volume_groups]
        where:
        -h hostnames is a list of 2 to 8 hostnames separated by commas
        -v volume_groups is a list of volume group names separated by commas
        Note: Spaces are not allowed between hostname entries or volume group entries
        Checks for consistencies of volume groups among hosts, ODMs, and disks.
    catalog = command.cat
    setno = 0
    msgno = 38

HACMPcommand:
    command = cldiag
    options = trace
    optflag = 1
    path =
    numargs = 0
    args =
    help = Obtains a sequential flow of time stamped system events.
    catalog = command.cat
    setno = 0
    msgno = 39

HACMPcommand:
    command = cldiag.trace
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_trace
    numargs = 0
    args =
    help = trace [-t time] [-R file] [-l] daemon ...
        where:
        -t time is the number of seconds to perform the trace
        -R file is file to which output is saved
        -l chooses a more detailed trace option
        daemon is a list of cluster daemons to trace
        Allows for tracing HACMP daemons (clstrmgr, cllockd, clsmuxpd, clinfo).
    catalog = command.cat
    setno = 0
    msgno = 40

HACMPcommand:
    command = cldiag
    options = error
    optflag = 1
    path =
    numargs = 0
    args =
    help = Displays errors from the error log (hardware, software, system) that
        occur in the cluster.
    catalog = command.cat
    setno = 0
    msgno = 41

HACMPcommand:
    command = cldiag.error
    options =
    optflag = 0
    path = /usr/sbin/cluster/diag/cld_error
    numargs = 0
    args =
    help = error type [-h host] [-R file]
        where:
        type is one of:
            short - short eror report
            long - long error report
            cluster - HACMP/6000 specific short error report
        -h host is the name of a remote host from which to gather log data
        -R file is file to which output is saved
        Allows for parsing the system error log.
    catalog = command.cat
    setno = 0
    msgno = 42
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = network_down
    desc = Script run when a network has failed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/network_down
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = network_up_complete
    desc = Script run after the network_up script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/network_up_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = network_down_complete
    desc = Script run after the network_down script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/network_down_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_up
    desc = Script run when a node is attempting to join the cluster.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_up
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_down
    desc = Script run when a node is attempting to leave the cluster.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_down
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_up_complete
    desc = Script run after the node_up script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_up_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_down_complete
    desc = Script run after the node_down script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_down_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = join_standby
    desc = Script run after a standby adapter has become active.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/join_standby
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = fail_standby
    desc = Script run after a standby adapter has failed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/fail_standby
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = acquire_service_addr
    desc = Script run to configure a service adapter with a service address.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/acquire_service_addr
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = acquire_takeover_addr
    desc = Script run to configure a standby adapter with a service address.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/acquire_takeover_addr
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = get_disk_vg_fs
    desc = Script run to acquire disks, varyon volume groups, and mount filesystems.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/get_disk_vg_fs
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_down_local
    desc = Script run when it is the local node which is leaving the cluster.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_down_local
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_down_local_complete
    desc = Script run after the node_down_local script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_down_local_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_down_remote
    desc = Script run when it is a remote node which is leaving the cluster.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/HACMP_ANSS/script/CMD_node_down_remote
    notify = /usr/HACMP_ANSS/script/event_NOTIFICATION
    pre = /usr/HACMP_ANSS/script/PRE_node_down_remote
    post = /usr/HACMP_ANSS/script/POS_node_down_remote
    recv =
    count = 0

HACMPevent:
    name = node_down_remote_complete
    desc = Script run after the node_down_remote script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_down_remote_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_up_local
    desc = Script run when it is the local node which is joining the cluster.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_up_local
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_up_local_complete
    desc = Script run after the node_up_local script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_up_local_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = node_up_remote
    desc = Script run when it is a remote node which is joining the cluster.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/HACMP_ANSS/script/CMD_node_up_remote
    notify = /usr/HACMP_ANSS/script/event_NOTIFICATION
    pre = /usr/HACMP_ANSS/script/PRE_node_up_remote
    post =
    recv =
    count = 0

HACMPevent:
    name = node_up_remote_complete
    desc = Script run after the node_up_remote script has successfully completed.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/node_up_remote_complete
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = release_service_addr
    desc = Script run to configure the boot address on the service adapter.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/release_service_addr
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = release_takeover_addr
    desc = Script run to configure a standby address on a standby adapter.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/release_takeover_addr
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = release_vg_fs
    desc = Script run to unmount filesystems and varyoff volume groups.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/release_vg_fs
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = start_server
    desc = Script run to start application servers.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/start_server
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = stop_server
    desc = Script run to stop application servers.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/stop_server
    notify =
    pre =
    post =
    recv =
    count = 0
HACMPevent:
    name = unstable_too_long
    desc = Script run when the Cluster Manger has been unstable for too long.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/unstable_too_long
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = config_too_long
    desc = Script run when the Cluster Manger has been in configuration for too long.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/config_too_long
    notify =
    pre =
    post =
    recv =
    count = 0

HACMPevent:
    name = event_error
    desc = Script run when a previously executed script has failed to complete successfully.
    setno = 0
    msgno = 0
    catalog =
    cmd = /usr/sbin/cluster/events/event_error
    notify =
    pre =
    post =
    recv =
    count = 0
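In the HACMPevent listing, the customized events are the ones whose cmd points outside /usr/sbin/cluster/events or whose notify, pre, or post fields are filled in. The sketch below shows one possible way to pick them out. It is illustrative only: the EVENTS list holds three stanzas copied by hand from the listing, and the names EVENTS, DEFAULT_EVENT_DIR, and is_customized are hypothetical.

```python
# Directory that holds the event scripts shipped with HACMP,
# as seen in the uncustomized stanzas of the listing.
DEFAULT_EVENT_DIR = "/usr/sbin/cluster/events/"

# Three stanzas copied from the HACMPevent listing (other fields omitted).
EVENTS = [
    {
        "name": "node_down_local",
        "cmd": "/usr/sbin/cluster/events/node_down_local",
        "notify": "",
        "pre": "",
        "post": "",
    },
    {
        "name": "node_down_remote",
        "cmd": "/usr/HACMP_ANSS/script/CMD_node_down_remote",
        "notify": "/usr/HACMP_ANSS/script/event_NOTIFICATION",
        "pre": "/usr/HACMP_ANSS/script/PRE_node_down_remote",
        "post": "/usr/HACMP_ANSS/script/POS_node_down_remote",
    },
    {
        "name": "node_up_remote",
        "cmd": "/usr/HACMP_ANSS/script/CMD_node_up_remote",
        "notify": "/usr/HACMP_ANSS/script/event_NOTIFICATION",
        "pre": "/usr/HACMP_ANSS/script/PRE_node_up_remote",
        "post": "",
    },
]


def is_customized(event):
    """Return True if the event differs from the shipped defaults."""
    hooks = (event["notify"], event["pre"], event["post"])
    return not event["cmd"].startswith(DEFAULT_EVENT_DIR) or any(hooks)


customized = [e["name"] for e in EVENTS if is_customized(e)]
print(customized)
```

Run against the full listing, the same test would report only node_down_remote and node_up_remote, the two events customized in this sample cluster.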
    name = rs232
    desc = RS232 Serial Protocol
    addrtype = 1
    path = /usr/sbin/cluster/nims/nim_sl
    para =
    grace = 30
    hbrate = 1500000
    cycle = 6

HACMPnim:
    name = socc
    desc = Serial Optical Protocol
    addrtype = 0
    path = /usr/sbin/cluster/nims/nim_socc
    para =
    grace = 30
    hbrate = 500000
    cycle = 12

HACMPnim:
    name = fddi
    desc = Fiber Data Optical Protocol
    addrtype = 0
    path = /usr/sbin/cluster/nims/nim_fddi
    para =
    grace = 30
    hbrate = 500000
    cycle = 12

HACMPnim:
    name = IP
    desc = Generic IP
    addrtype = 0
    path = /usr/sbin/cluster/nims/nim_genip
    para =
    grace = 30
    hbrate = 500000
    cycle = 12

HACMPnim:
    name = slip
    desc = Serial IP protocol
    addrtype = 0
    path = /usr/sbin/cluster/nims/nim_slip
    para =
    grace = 30
    hbrate = 1000000
    cycle = 12

HACMPnim:
    name = tmscsi
    desc = TMSCSI Serial protocol
    addrtype = 1
    path = /usr/sbin/cluster/nims/nim_tms
    para =
    grace = 30
    hbrate = 1500000
    cycle = 6

HACMPnim:
    name = fcs
    desc = Fiber Channel Switch
    addrtype = 0
    path = /usr/sbin/cluster/nims/nim_fcs
    para =
    grace = 30
    hbrate = 500000
    cycle = 12

HACMPnim:
    name = hps
    desc = High Performance Switch
    addrtype = 0
    path = /usr/sbin/cluster/nims/nim_hps
    para =
    grace = 60
    hbrate = 500000
    cycle = 32
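The hbrate and cycle values in the HACMPnim stanzas can be combined into a rough failure detection time per network type. The sketch below rests on two assumptions that the listing itself does not state: that hbrate is the keepalive interval in microseconds, and that cycle is the number of consecutive missed keepalives after which the network module reports a failure. The names NIMS and detection_seconds are hypothetical.

```python
# (hbrate, cycle) pairs copied from the HACMPnim listing.
NIMS = {
    "rs232": (1500000, 6),
    "socc": (500000, 12),
    "fddi": (500000, 12),
    "IP": (500000, 12),
    "slip": (1000000, 12),
    "tmscsi": (1500000, 6),
    "fcs": (500000, 12),
    "hps": (500000, 32),
}


def detection_seconds(hbrate_us, cycle):
    """Approximate time to detect a failure, in seconds.

    ASSUMPTION: hbrate is in microseconds and cycle counts missed
    keepalives; neither unit is confirmed by the listing.
    """
    return hbrate_us * cycle / 1_000_000


if __name__ == "__main__":
    for name, (hbrate, cycle) in NIMS.items():
        print(f"{name:8s} {detection_seconds(hbrate, cycle):5.1f} s")
```

Under these assumptions, lowering hbrate or cycle shortens detection time at the cost of more keepalive traffic and a higher risk of false failure reports on a busy network.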
    name = DISK_FENCING
    value = false

HACMPresource:
    group = goofyrg
    name = SSA_DISK_FENCING
    value = false

HACMPresource:
    group = concrg
    name = CONCURRENT_VOLUME_GROUP
    value = conc1vg

HACMPresource:
    group = concrg
    name = INACTIVE_TAKEOVER
    value = false

HACMPresource:
    group = concrg
    name = DISK_FENCING
    value = false

HACMPresource:
    group = concrg
    name = SSA_DISK_FENCING
    value = false
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = CDROM_ERR4
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = CDROM_ERR6
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = TAPE_ERR3
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = MEMORY
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = MEM1
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = MEM2
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name =
    en_persistenceflg = 1
    en_label = MEM3
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name = TAPE_ERR6
    en_persistenceflg = 1
    en_label = TAPE_ERR6
    en_crcid = 0
    en_class =
    en_type =
    en_alertflg =
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/lib/ras/notifymeth -l $1 -r $6 -t $9

errnotify:
    en_pid = 0
    en_name = sda_err1
    en_persistenceflg = 1
    en_label = SDA_ERR1
    en_crcid = 0
    en_class = -
    en_type = -
    en_alertflg = -
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/HACMP_ANSS/script/error_SDA $1 $2 $3 $4 $5 $6 $7 $8 $9

errnotify:
    en_pid = 0
    en_name = sda_err3
    en_persistenceflg = 1
    en_label = SDA_ERR3
    en_crcid = 0
    en_class = -
    en_type = -
    en_alertflg = -
    en_resource =
    en_rtype =
    en_rclass =
    en_method = /usr/HACMP_ANSS/script/error_SDA $1 $2 $3 $4 $5 $6 $7 $8 $9
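The en_method fields above pass positional parameters ($1 through $9) to the notify method. According to the AIX error notification documentation, these are, in order: the error log sequence number, error ID, class, type, alert flags, resource name, resource type, resource class, and error label. The sketch below only illustrates that mapping. It is not the error_SDA script from the tools; the function name name_errnotify_args and the sample values are hypothetical.

```python
# Positional parameters that AIX error notification substitutes into
# en_method, in order $1 .. $9.
ERRNOTIFY_ARGS = (
    "sequence_number",  # $1
    "error_id",         # $2
    "class",            # $3
    "type",             # $4
    "alert_flags",      # $5
    "resource_name",    # $6
    "resource_type",    # $7
    "resource_class",   # $8
    "error_label",      # $9
)


def name_errnotify_args(args):
    """Map the nine positional values onto descriptive names."""
    if len(args) != len(ERRNOTIFY_ARGS):
        raise ValueError("expected %d arguments, got %d"
                         % (len(ERRNOTIFY_ARGS), len(args)))
    return dict(zip(ERRNOTIFY_ARGS, args))


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    sample = ["123", "0xABCDEF01", "H", "PERM", "FALSE",
              "hdisk1", "disk", "disk", "SDA_ERR1"]
    info = name_errnotify_args(sample)
    print(info["error_label"], "on", info["resource_name"])
```

Read this way, a method such as notifymeth -l $1 -r $6 -t $9 receives the sequence number, the resource name, and the error label, while error_SDA receives all nine values.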
List of Abbreviations
ADSM/6000   Adstar Distributed Storage Manager/6000
AIX         Advanced Interactive Executive
APAR        Authorized Program Analysis Report. The description of a problem to be fixed by IBM defect support. This fix is delivered in a PTF (see below).
ARP         Address Resolution Protocol
ASCII       American Standard Code for Information Interchange
AS/400      Application System/400
CDF         Cumulative Distribution Function
CD-ROM      Compact Disk - Read Only Memory
CLM         Cluster Lock Manager
CLVM        Concurrent Logical Volume Manager
CPU         Central Processing Unit
CRM         Concurrent Resource Manager
DE          Differential Ended
DLC         Data Link Control
DMS         Deadman Switch
DNS         Domain Name Service
DSMIT       Distributed System Management Interface Tool
FDDI        Fiber Distributed Data Interface
F/W         Fast and Wide (SCSI)
GB          Gigabyte
GODM        Global Object Data Manager
GUI         Graphical User Interface
HACMP       High Availability Cluster Multi-Processing
HANFS       High Availability Network File System
HCON        Host Connection Program
IBM         International Business Machines Corporation
I/O         Input/Output
IP          Internet Protocol
IPL         Initial Program Load (System Boot)
ITSO        International Technical Support Organization
JFS         Journaled Filesystem
KA          Keepalive Packet
KB          Kilobyte
kb          kilobit
LAN         Local Area Network
LU          Logical Unit (SNA definition)
LUN         Logical Unit (RAID definition)
LVM         Logical Volume Manager
MAC         Medium Access Control
MB          Megabyte
MIB         Management Information Base
MTBF        Mean Time Between Failure
NETBIOS     Network Basic Input/Output System
NFS         Network File System
NIM         Network Interface Module. Note: This is the definition of NIM in the HACMP context. NIM in the AIX 4.1 context stands for Network Installation Manager.
NIS         Network Information Service
NVRAM       Non-Volatile Random Access Memory
ODM         Object Data Manager
PAD         Packet Assembler/Disassembler
POST        Power On Self Test
PTF         Program Temporary Fix. A fix to a problem described in an APAR (see above).
RAID        Redundant Array of Independent (or Inexpensive) Disks
RISC        Reduced Instruction Set Computer
SCSI        Small Computer Systems Interface
SLIP        Serial Line Internet Protocol
SMIT        System Management Interface Tool
SMP         Symmetric Multi-Processor
SMUX        SNMP (see below) Multiplexor
SNA         Systems Network Architecture
SNMP        Simple Network Management Protocol
SOCC        Serial Optical Channel Converter
SPOF        Single Point of Failure
SPX/IPX     Sequenced Packet Exchange/Internetwork Packet Exchange
SRC         System Resource Controller
SSA         Serial Storage Architecture
TCP         Transmission Control Protocol
TCP/IP      Transmission Control Protocol/Internet Protocol
UDP         User Datagram Protocol
UPS         Uninterruptible Power Supply
VGDA        Volume Group Descriptor Area
VGSA        Volume Group Status Area
WAN         Wide Area Network